In Memoriam
Norman Ernest Borlaug
(25 March 1914–12 September 2009)
Norman Borlaug was one of the greatest men of our times, a steadfast champion and
spokesman against hunger and poverty. He dedicated his 95 richly lived years to filling the
bellies of others, and is credited by the United Nations World Food Program with saving
more lives than any other man in history.
Dr Borlaug was an American plant pathologist who spent most of his years in Mexico. His
high-yielding dwarf wheat varieties prevented widespread famine in South
Asia, specifically India and Pakistan, and also in Turkey. Known as the Green Revolution,
this feat earned him the Nobel Peace Prize in 1970. He was instrumental in establishing
the International Maize and Wheat Improvement Center, known by its Spanish acronym
CIMMYT, and later the Consultative Group on International Agricultural Research (CGIAR),
a network of 15 agricultural research centres.
Dr Borlaug spent time as a microbiologist with DuPont before moving to Mexico in
1944 as a geneticist and plant pathologist to develop stem rust resistant wheat cultivars. In
1966 he became the director of CIMMYT's Wheat Program, seconded from the Rockefeller
Foundation. His full-time employment with the Center ended in 1979, although he remained
a part-time consultant until his death. In 1984 he began a new career as a university pro-
fessor and went on to establish the World Food Prize, which honours the achievements of
individuals who have advanced human development by improving the quality, quantity or
availability of food in the world. In 1986, he joined forces with former US President Jimmy
Carter and the Nippon Foundation of Japan, under the chairmanship of Ryoichi Sasakawa,
to establish the Sasakawa Africa Association (SAA) to address Africa's food problems. Since
then, more than 1 million small-scale African farmers in 15 countries have been trained by
SAA in improved farming techniques.
Dr Borlaug influenced the thinking of thousands of agricultural scientists. He was a
pathbreaking wheat breeder and, equally important, his stature enabled him to influence
politicians and leaders around the world. His legacy and his work ethic, to get things done and
not mind getting your hands dirty, influenced us all and remain CIMMYT guiding principles.
We will honor Dr Borlaug's memory by carrying forward his mission and spirit of
innovation: applying agricultural science to help smallholder farmers produce more and
better-quality food using fewer resources. At stake is no less than the future of humanity, for, as
Borlaug said: 'The destiny of world civilization depends upon providing a decent standard
of living for all.' His presence will never really leave CIMMYT; it is embedded in our soul.
Thomas A. Lumpkin
Director General, CIMMYT
Marianne Bänziger
Deputy Director General for Research and Partnerships, CIMMYT
Hans-Joachim Braun
Director for Global Wheat Program, CIMMYT
Molecular Plant Breeding
Yunbi Xu
Contents
Preface
Foreword by Dr Norman E. Borlaug
Foreword by Dr Ronald L. Phillips
1 Introduction
1.1 Domestication of Crop Plants
1.2 Early Efforts at Plant Breeding
1.3 Major Developments in the History of Plant Breeding
1.4 Genetic Variation
1.5 Quantitative Traits: Variance, Heritability and Selection Index
1.6 The Green Revolution and the Challenges Ahead
1.7 Objectives of Plant Breeding
1.8 Molecular Breeding
References
Index
The colour plate section can be found following p. 270
Preface
The genomics revolution of the past decade has greatly enhanced our understanding of
the genetic composition of living organisms including many plant species of economic
importance. Complete genomic sequences of Arabidopsis and several major crops, together
with high-throughput technologies for analyses of transcripts, proteins and mutants, pro-
vide the basis for understanding the relationship between genes, proteins and phenotypes.
Sequences and genes have been used to develop functional and biallelic markers, such as
single nucleotide polymorphism (SNP), that are powerful tools for genetic mapping, germ-
plasm evaluation and marker-assisted selection.
The road from basic genomics research to impacts on routine breeding programmes has
been long, winding and bumpy, not to mention scattered with wrong turns and unexpected
blockades. As a result, genomics can be applied to plant breeding only when an integrated
package becomes available that combines multiple components such as high-throughput
techniques, cost-effective protocols, global integration of genetic and environmental fac-
tors and precise knowledge of quantitative trait inheritance. More recently, the end of the
tunnel has come in sight, and the multinational corporations have ramped up their invest-
ments in and expectations from these technologies. The challenge now is to translate and
integrate the new knowledge from genomics and molecular biology into appropriate tools
and methodologies for public-sector plant breeding programmes, particularly those in low-
income countries. It is expected that harnessing the outputs of genomics research will be
an important component in successfully addressing the challenge of doubling world food
production by 2050.
The term 'molecular plant breeding' has been much used and abused in the literature, and
thus loved or maligned in equal measure by the readership. In the context of this book, the
term is used to provide a simple umbrella for the multidisciplinary field of modern plant
breeding that combines molecular tools and methodologies with conventional approaches
for improvement of crop plants. This book is intended to provide comprehensive coverage
of the components that should be integrated within plant breeding programmes to develop
crop products in a more efficient and targeted way.
The first chapter introduces some basic concepts that are required for understanding
fundamentally important issues described in subsequent chapters. The concepts include
crop domestication, critical events in the history of plant breeding, basics of quantitative
genetics (variance, heritability and selection index), plant breeding objectives and molecu-
lar breeding goals. Chapters 2 and 3 introduce the key genomics tools that are used in
molecular breeding programmes, including molecular markers, maps, omics technolo-
gies and arrays. Different types of molecular markers are compared and construction of
molecular maps is discussed. Chapter 4 describes common types of populations that have
been used in genetics and plant breeding, with a focus on recombinant inbred lines, dou-
bled haploids and near-isogenic lines. Chapter 5 provides an overview of marker-assisted
germplasm evaluation, management and enhancement. Chapters 6 and 7 discuss the theory
and practice, respectively, of using molecular markers to dissect complex traits and locate
quantitative trait loci (QTL). Chapters 8 and 9 cover the theory and practice, respectively,
of marker-assisted selection. Genotype-by-environment interaction (GEI) is discussed in
Chapter 10, including multi-environment trials, stability of genotype performance, molecu-
lar dissection of GEI and breeding for optimum GEI. Chapter 11 provides a summary of
gene isolation and functional analysis approaches, including in silico prediction of genes,
comparative approaches for gene isolation, gene cloning based on cDNA sequencing, posi-
tional cloning and identification of genes by mutagenesis. Chapter 12 describes the use of
isolated and characterized genes for gene transfer and the generation of genetically modi-
fied plants, focusing on the vital elements of expression vectors, selectable marker genes,
transgene integration, expression and localization, transgene stacking and transgenic crop
commercialization. Chapter 13 is devoted to intellectual property rights and plant
variety protection, including plant breeders' rights, international agreements affecting plant
breeding, plant variety protection strategies, intellectual property rights affecting molec-
ular breeding and the use of molecular techniques in plant variety protection. The last
two chapters (14 and 15) discuss supporting tools that are required in molecular breeding
for information management and decision making, including data collection, integration,
retrieval and mining and information management systems. Decision support tools are
described for germplasm and breeding population management and evaluation, genetic
mapping and marker-trait association analysis, marker-assisted selection, simulation and
modelling, and breeding by design.
Intended audience and guidance for reading and using this book
This book is intended to provide a handbook for biologists, geneticists and breeders, as well
as a textbook for final year undergraduates and graduate students specializing in agronomy,
genetics, genomics and plant breeding. Although the book has attempted to cover all rel-
evant areas of molecular breeding in plants, many examples have been drawn from the
genomics research and molecular breeding of major cereal crops. It is hoped that the book
can also serve as a resource for training courses as described below. As each chapter covers
a complete story on a special topic, readers can choose to read chapters in any order.
Advanced Course on Quantitative Genetics: Chapters 1, 2, 4, 6, 7, 10 and 14, which
cover all molecular marker-based QTL mapping, including markers, maps, populations,
statistics and genotype-by-environment interaction.
Comprehensive Course on Marker-assisted Plant Breeding: Chapters 1, 2, 3, 4, 5, 8, 9,
10, 13, 14 and 15, which cover basic theories, tools, methodologies about markers, maps,
omics, arrays, informatics and support tools for marker-assisted selection.
Short Course on Genetic Transformation: Chapters 1, 11, 12 and 13, which provide
a brief introduction to gene isolation, transformation techniques, genetic-transformation-
related intellectual property and genetically modified organism (GMO) issues.
This book has been almost a decade in preparation. In fact, the initial idea for the book
was stimulated by the impact from my previous book Molecular Quantitative Genetics
published by China Agriculture Press (Xu and Zhu, 1994), which was well received by
colleagues and students in China and used as a textbook in many universities. Preliminary
ideas related to the book were developed in a review article on QTL separation, pyramiding
and cloning in Plant Breeding Reviews (Xu, 1997). Much of the hopeful thinking described
in this paper has fortunately come true during the following 10 years, and the
manipulation of QTL has been revolutionized and become mainstream. Complete sequences for
several plant genomes have become available, with more anticipated, and
numerous genes and QTL have been separated and cloned individually; some of them
have been pyramided for plant breeding through genetic transformation or marker-assisted
selection.
I started making tangible progress on this book while working as a molecular breeder
for hybrid rice at RiceTec, Inc., Texas (1998–2003). This experience shaped my thinking
about how an applied breeding programme could be integrated with molecular approaches.
With numerous QTL accumulating for a model crop, taking all the QTL into consideration
becomes necessary. Initial thoughts on this were described in 'Global view of QTL . . .',
published in the proceedings on quantitative genetics and plant breeding, which considered
various genetic background effects and genotype-by-environment interaction (Xu, Y., 2002).
Hybrid rice breeding, which involves a three-line system, requires a large number of test-
crosses in order to identify traits that perform well in seed and grain production. My expe-
rience in development of marker-assisted selection strategies for breeding hybrid rice was
then summarized in a review article in Plant Breeding Reviews (Xu, Y., 2003), which also
covered general strategies for other crops using hybrids.
Moving on to research at Cornell University with Dr Susan McCouch helped me to bet-
ter understand how molecular techniques could facilitate breeding of complex traits such
as water-use efficiency, which is a difficult trait to measure and requires strong collabora-
tion among researchers across many disciplines. In addition, this experience with rice as a
model crop raised the issue of how we can use rice as a reference genome for improvement
of other crops, which was discussed in an article published in a special rice issue of Plant
Molecular Biology (Xu et al., 2005).
With over 20 years' experience in rice, I decided to shift to another major crop by
working for the International Maize and Wheat Improvement Center (CIMMYT) as the principal
maize molecular breeder. CIMMYT has given me exposure to an interface connecting basic
research with applied breeding for developing countries and the resource-poor. Comparing
public- and private-sector breeding programmes has given me an intense understanding
of the importance of making the type of breeding systems that have been working well
for the private sector a practical reality for the public sector, particularly in developing
countries. This has been addressed in a recent review paper published in Crop Science
(Xu and Crouch, 2008), which discussed the critical issues for achieving this translation.
My most recent research has focused on the development of various molecular breeding
platforms that can be used to facilitate breeding procedures through seed DNA-based
genotyping, selective and pooled DNA analysis, and chip-based large-scale germplasm
evaluation, marker–trait association and marker-assisted selection (see Xu et al., 2009b for further
details). Thus, my career has evolved alongside the transition from molecular biology
research to routine molecular plant breeding applications and I strongly believe that now
is the right time for a mainstream publication providing comprehensive coverage of all
fields relevant for a new generation of molecular breeders.
Acknowledgements
The dream of writing this book could not have become reality without the wonderful sup-
port of Dr Susan McCouch at Cornell University and Dr Jinhua Xiao, now at Monsanto, who
have both fully supported my proposal since 2002. Their support and consistent
encouragement have greatly motivated me throughout the process. While I was working with
Susan, she allowed me great flexibility in my research projects and working hours so that I
could continue to make progress on the writing of this book. At the same time the Cornell
libraries were an indispensable source of the major references cited throughout the book.
Susan's encouragement provided the impetus to keep working on the book through a very
difficult time in my life. I also extend my appreciation to Dr Jonathan Crouch, the Director
of the Germplasm Resources Program at CIMMYT, whose full understanding
and support allowed me to complete the second half of the book. Jonathan's guidance and
contribution to my research projects and publications while at CIMMYT have significantly
impacted the preparation of the book.
I would also like to thank the chief editors of the three journals for which I have served
on the editorial boards during the preparation of this book: Dr Paul Christou for Molecular
Breeding, Dr Albrecht Melchinger for Theoretical and Applied Genetics, and Dr Hongbin
Zhang for International Journal of Plant Genomics. I thank them for their patience, support
and flexibility with my editorial responsibilities during the preparation of the book. In addi-
tion, Drs Christou and Melchinger also reviewed several chapters in their respective fields.
My appreciation also goes to Yanli Lu (a graduate student from Sichuan Agricultural
University of China) and Dr Zhuanfang Hao (a visiting scientist from the Chinese Academy
of Agricultural Sciences) who helped prepare some figures and tables during their work
in my lab at CIMMYT, Mexico. I would like to give special thanks to Dr Rodomiro Ortiz
at CIMMYT for his consistent information sharing and stimulating discussions during our
years together at CIMMYT. Finally, I would like to thank my colleagues at CIMMYT, par-
ticularly Drs Kevin Pixley, Manilal William, Jose Crossa and Guy Davenport, who provided
useful discussions on various molecular breeding-related issues.
Forewords
I am greatly indebted to Dr Norman E. Borlaug, visionary plant breeder and Nobel laureate
for his role in the Green Revolution, and Dr Ronald L. Phillips, Regents Professor and
McKnight Presidential Chair in Genomics, University of Minnesota, who each contributed
a foreword for the book. Their contributions emphasized the importance of molecular
breeding in crop improvement and the role that this book will play in molecular breeding
education and practice.
Reviewers
Each chapter of the book has undergone comprehensive peer review and revision before
finalization. The constructive comments and critical advice of these reviewers have greatly
improved this book. The reviewers were selected for their active expertise in the field of the
respective chapter. Reviewers come from almost all continents and work in various fields
including plant breeding, quantitative genetics, genetic transformation, intellectual prop-
erty protection, bioinformatics and molecular biology, many of whom are CIMMYT scien-
tists and managers. Considering that each chapter is relatively large in content, reviewers
had to contribute a lot of time and effort to complete their reviews. Although these inputs
were indispensable, any remaining errors are my sole responsibility. The names and
affiliations of the reviewers (alphabetically) are:
Several editors at CABI have been working with me over the years: Tim Hardwick (2002–2006),
Sarah Hulbert (2006–2007), Stefanie Gehrig (2007–2008), Claire Parfitt (2008–2009),
Meredith Caroll (2009) and Tracy Head (2009). These editors and their associates have
done a superb job of converting a series of manuscripts into a useable and coherent book.
I thank them for their effort, consideration and cooperation.
Research grants
During the preparation of the book, my research on genomic analysis of plant water-use
efficiency at Cornell University was supported by the National Science Foundation (Plant
Genome Research Project Grant DBI-0110069). My molecular breeding research at CIMMYT
has been supported by the Rockefeller Foundation, the Generation Challenge Programme
(GCP), Bill and Melinda Gates Foundation and the European Community, and through
other attributed or unrestricted funds provided by the members of the Consultative Group
on International Agricultural Research (CGIAR) and national governments of the USA,
Japan and the UK.
Family
It is difficult to imagine writing a book without the full support and understanding of one's
family. My greatest thanks go to my wife, Yu Wang, who has given me her wholehearted and
unwavering support, and to my sons, Sheng, Benjamin and Lawrence, who have retained
great patience during this long adventure. And finally to my parents, for their love, encour-
agement and vision that unveiled in me from my earliest years the desire to thrive on the
challenge of always striving to reach the highest mountain in everything I do.
Foreword
DR NORMAN E. BORLAUG
The past 50 years have been the most productive period in world agricultural history.
Innovations in agricultural science and technology enabled the Green Revolution, which
is reputed to have spared one billion people the pains of hunger and even starvation.
Although we have seen the greatest reductions in hunger in history, it has not been enough.
There are still one billion people who suffer chronic hunger, with more than half being
small-scale farmers who cultivate environmentally sensitive marginal lands in developing
countries.
Within the next 50 years, the world population is likely to increase by 60–80%,
requiring global food production to nearly double. We will have to achieve this feat on a shrinking
agricultural land base, and most of the increased production must occur in those countries
that will consume it. Unless global grain supplies are expanded at an accelerated rate, food
prices will remain high, or be driven up even further.
Spectacular economic growth in many newly industrializing developing countries,
especially in Asia, has spurred rapid growth in global cereal demand, as more people eat
better, especially through more protein-heavy diets. More recently, the subsidized
conversion of grains into biofuels in the USA and Europe has accelerated demand even further.
On the supply side, a slowing in research investment in the developing world and more
frequent climatic shocks (droughts, floods) have led to greater volatility in production.
Higher food prices affect everyone, but especially the poor, who spend most of their
disposable income on food. Increasing supply, primarily through the generation and diffu-
sion of productivity-enhancing new technologies, is the best way to bring food prices down
and secure minimum nutritional standards for the poor.
Today's agricultural development challenges are centred on marginal lands and on
regions that were bypassed during the Green Revolution, such as Africa and resource-poor
parts of Asia, and that are experiencing the ripple effects of food insecurity through
hunger, malnutrition and poverty.
Despite these serious and daunting challenges, there is cause for hope. New science
and technology including biotechnology have the potential to help the world's poor and
food insecure. Biotechnologies have developed invaluable new scientific methodologies
and products for more productive agriculture and added-value food. This journey deeper
into the genome to the molecular level is the consequence of our progressive understanding
of the workings of nature. Genomics-based methods have given breeders greater
precision in selecting and transferring genes, which has not only reduced the time needed to
eliminate undesirable genes, but has also allowed breeders to access useful genes from
distant species.
Bringing the power of science and technology to bear on the challenges of these riskier
environments is one of the great challenges of the 21st century. With the new tools of bio-
technology, we are poised for another explosion in agricultural innovation. New science
has the power to increase yields, address agroclimatic extremes and mitigate a range of
environmental and biological challenges.
Molecular Plant Breeding, authored by my CIMMYT colleague Yunbi Xu, is an out-
standing review and synthesis of the theory and practice of genetics and genomics that
can drive progress in modern plant breeding. Dr Xu has done a masterful job in integrating
information about traditional and molecular plant breeding approaches. This encyclope-
dic handbook is poised to become a standard reference for experienced breeders and stu-
dents alike. I commend him for this prodigious new contribution to the body of scientific
literature.
Foreword
DR RONALD L. PHILLIPS
The road is long from basic research findings to final destinations reflecting important
applications but it is a road that can ultimately save time and money. There may be obsta-
cles along the way that delay building that road but they are generally overcome by careful
thought and timely considerations. A new road may involve the former road but with some
widening and the filling in of certain potholes. We seldom look back and think that the
improvements were not useful.
The road to improved varieties by traditional plant breeding has served and continues to serve
society well. That approach has been based on careful observation, evaluation of multi-
ple genotypes (parents and progenies), selection at various generational levels, extensive
testing and the sophisticated utilization of statistical analyses and quantitative genetics.
About 50% of the increased productivity of new varieties is generally attributed to genetic
improvements, with the remaining 50% due to many other factors such as time of planting,
irrigation, fertilizer, pesticide applications and planting densities.
The statistical genetics associated with traditional plant breeding can now be supple-
mented by extensive genomic information, gene sequences, regulatory factors and linked
genetic markers. We can now draw on a broader genetic base, the identification of major
loci controlling various traits and expression analyses across the entire genome under vari-
ous biotic and abiotic conditions. One can anticipate a future when the networking of
genes, genotype-by-environment (G × E) interactions, and even hybrid vigour will be better
understood and lead to new breeding approaches. The importance of de novo variation may
modify much of our current interpretation of breeding behaviour; de novo variation such as
mutation, intragenic recombination, methylation, transposable elements, unequal crossing
over, generation of genomic changes due to recombination among dispersed repeated ele-
ments, gene amplification and other mechanisms will need to be incorporated into plant
breeding theory.
This book calls for an integration of approaches, traditional and molecular, and
represents a theoretical/practical handbook reflecting modern plant breeding at its
finest. I believe the reader will be surprised to find that this single-authored book is so
full of information that is useful in plant genetics and plant breeding. Students as well
as established researchers wanting to learn more about molecular plant breeding will be
well-served by reading this book. The information is up-to-date with many current refer-
ences. Even many of the tables are packed with information and references. A good rep-
resentation of international and domestic breeding is reflected through many examples.
The importance of G × E interactions is clearly demonstrated. Various statistical models
are provided as appropriate. The importance of defining mega-environments for varietal
development is made clear. The role of core germplasm collections, appropriate population
sizes, major databases and data management issues are all integrated with various plant
breeding approaches. Marker-assisted selection receives considerable attention, includ-
ing its requirements and advantages, along with the multitude of quantitative trait locus
(QTL) analysis methods. Transformation technologies leading to the extensive use of trans-
genic crops are reviewed along with the increased use of trait stacking. The procurement of
intellectual property that, in part, is driving the application of molecular genetics in plant
breeding provides the reader with an understanding of why private industry is now more
involved and why some common crops represent new business opportunities.
Molecular Plant Breeding is not like other plant breeding books. The interconnecting
road that it depicts is one where you can look at the beautiful new scenery and appreciate
the current view, yet see the horizon down the road.
1 Introduction
Several definitions of plant breeding have and technologies discussed in the following
been put forward, such as the art and sci- chapters of this book.
ence of improving the heredity of plants for
the benefit of humankind (J.M. Poehlman),
or evolution directed by the will of man
(N.I. Vavilov). Bernardo (2002), however, 1.1 Domestication of Crop Plants
offers the most universal description: Plant
breeding is the science, art, and business of The earliest records indicate that agricul-
improving plants for human benefit. ture developed some 11,000 years ago in
Plants are employed in the manufac- the so-called Fertile Crescent, a hilly region
ture of a multitude of products for domes- in south-western Asia. Agriculture devel-
tic (cosmetics, medicines and clothing), oped later in other regions. Archaeologists
industrial (manufacture of rubber, cork suggest that plant domestication began
and engine fuel) and recreational uses because of the increasing size of popula-
(paper, art supplies, sports equipment and tions and changes in the exploitation of
musical instruments) and plant breeders local resources (see http://www.ngdc.noaa.
have therefore been driven by the chal- gov/paleo/ctl/10k.html for further details).
lenges of meeting the ever increasing Domestication is a selection process carried
demands of the manufacturers of these out by man to adapt plants and animals to
products. Lewington has described the their own needs, whether as farmers or con-
diverse uses of plants in his book Plants sumers. Successive selection of desirable
for People (2003). plants changed the genetic composition of
Plant breeding began by the domesti- early crops. Primitive farmers, knowing little
cation of crop plants and has become ever or nothing about genetics or plant breeding,
more sophisticated. New developments accomplished much in a short time. They
in molecular biology have now led to an did so by unconsciously altering the natural
increasing number of methods which can process of evolution. Indeed, domestication
be used to enhance breeding effective- is nothing more than directed evolution; as
ness and efficiency. This chapter includes a result, the process of evolution is acceler-
a brief history of plant breeding together ated. The key to domestication is the selec-
with breeding objectives and some back- tive advantage of rare mutant alleles, which
ground information relevant to the theories are desirable for successful cultivation,
but unnecessary for survival in the wild. domesticated plants is another example.
The process of selection continues until For further information see http://oregon-
the desired mutant phenotype dominates state.edu/instruct/css/330/index.htm and
the population. There are three important Swaminathan (2006).
steps in the domestication process. Man not only planted seeds, but also: (i) moved seeds from their native habitat and planted them in areas to which they were perhaps not as well adapted; (ii) removed certain natural selection pressures by growing the plants in a cultivated field; and (iii) applied artificial selection pressures by choosing characteristics that would not necessarily have been beneficial for the plants under natural conditions. Cultivation also creates selection pressure, resulting in changes in allele frequency, gradations within and between species, fixation of major genes, and improvement of quantitative traits. By the end of the 18th century, the informal processes of selection practised by farmers everywhere led to the worldwide creation of thousands of different cultivars or landraces for each major crop species.

More than 1000 species of plants have been domesticated at one time or another, of which about 100–200 are now major components of the human diet. The 15 most important examples can be divided into the following four groups:

1. Cereals: rice, wheat, maize, sorghum, barley.
2. Roots and stems: sugarbeet, sugarcane, potato, yam, cassava.
3. Legumes: bean, soybean, groundnut.
4. Fruits: coconut, banana.

Certain characteristics may have been selected deliberately or unwittingly. When farmers set aside a portion of their harvest for planting in the next season, they were selecting seeds with specific characteristics. This selection has resulted in profound differences between crop plants and their progenitors. For example, many wild plants have a seed dispersal mechanism that ensures that seeds will be separated from the plants and distributed over as large an area as possible, while modern crops have been modified by selection against seed dispersal. The absence of seed dormancy mechanisms in some

It is generally believed that domestication of crop plants was undertaken in several regions of the world independently. The Russian geneticist and plant geographer N.I. Vavilov collected plants from all over the world and identified regions where crop species and their wild relatives showed great genetic diversity. In 1926 he published Studies on the origin of cultivated plants, in which he described his theories regarding the origins of crops. Vavilov concluded that each crop had a characteristic primary centre of diversity which was also its centre of origin. He identified eight areas and hypothesized that these were the centres from which all our modern major crops originated. Later, he modified his theory to include secondary centres of diversity for some crops. These centres of origin included China, India, Central Asia, the Near East, the Mediterranean, East Africa, Mesoamerica, and South America. From these foci, agriculture was progressively disseminated to other regions such as Europe and North America. Subsequently, others, including the American geographer Jack Harlan, challenged Vavilov's hypothesis because many cultivated plants did not fit Vavilov's pattern, and appeared to have been domesticated over a broad geographical area for a long period of time.

In recent years, variation in DNA fractions and other approaches have been used to study the diversity of crop species. In general, these studies have not confirmed Vavilov's theory that the centres of origin are the areas of greatest diversity, because while centres of diversity have been identified, these are often not the centres of origin. For some crops there is little connection between the source of their wild ancestors, areas of domestication, and the areas of evolutionary diversification. Species may have originated in one geographic area but been domesticated in a different region, and some crops do not appear to have centres of diversity; thus a continuum of evolutionary activity is perceived rather than discrete centres.

In 1971, Jack Harlan described his own views on the origins of agriculture. He proposed three independent systems, each with a centre and a noncentre (larger, diffuse areas where domestication is thought to have occurred): Near East + Africa, China + South-east Asia, and Mesoamerica + South America.

Evidence gathered since that time suggests that these centres are also more diffuse than he had envisioned. After the initial phases of evolution, species spread out over large, ill-defined areas. This is probably due to the dispersal and evolution of crops associated with itinerant populations. Regional and/or multiple areas of origin may prove to be more accurate than the hypothesis of a unique, localized origin for many crops. However, the probable geographic origin of many crops is listed in Table 1.1.

Table 1.1.
Region                          Crops
Near East (Fertile Crescent)    Wheat and barley, flax, lentils, chickpea, figs, dates, grapes, olives,
                                lettuce, onions, cabbage, carrots, cucumbers, melons; fruits and nuts
Africa                          Pearl millet, Guinea millet, African rice, sorghum, cowpea, groundnut,
                                yam, oil palm, watermelon, okra
China                           Japanese millet, rice, buckwheat, soybean
South-east Asia                 Wet- and dryland rice, pigeon pea, mung bean, citrus fruits, coconut,
                                taro, yams, banana, breadfruit, sugarcane
Mesoamerica and North America   Maize, squash, common bean, lima bean, peppers, amaranth,
                                sweet potato, sunflower
South America                   Lowlands: cassava; Mid-altitudes and uplands (Peru): potato,
                                groundnut, cotton, maize

1.2 Early Efforts at Plant Breeding

For thousands of years selective breeding has been employed to re-engineer plants to produce traits or qualities that were considered to be desirable to consumers. Selective breeding began with the early farmers, ranchers and vintners who selected the best plants to provide seed for their next crop. When they found particular plants that fared well even in bad weather, were especially prolific, or resisted disease that had destroyed neighbouring crops, they naturally tried to capture these desirable traits by crossbreeding them into other plants. In this way, they selected and bred plants to improve their crop for commercial purposes. Although unbeknown to them, farmers have been utilizing genetics for centuries to modify the food we eat by selecting and growing seeds which produce a healthier crop that has a better flavour, richer colour and stronger resistance to certain plant diseases.

Modern plant breeding started with sedentary agriculture and the domestication of the first agricultural plants, cereals. This led to the rapid elimination of undesirable characters such as seed-shattering and dormancy, and we can only speculate on how much foresight or what kind of planning based on experience was used by the first selectors of non-shattering wheat and rice, compact-headed sorghum, or soft-shelled gourds. For 10,000 years man has consciously been moulding the phenotype (and so the genotype) of hundreds of plant species as one of the many routine activities in the normal course of making a living (Harlan, 1992). Over long periods of time there was a transition from the collection of
wild plants for food to the selection of those to be cultivated, which began to guide the evolutionary process. Now plant breeders accelerate the evolution of major crop species through skilful manipulation of breeding procedures. High-input agriculture emerged as a result of voyages of discovery and modern science.

Many traits important to early agriculturists were heritable and, therefore, could be reliably selected. However, this phase of breeding was empirical and generally not considered scientific in the modern sense because changes in these plant and animal populations were not analysed in an attempt to explain biological phenomena. At this stage of agriculture, the focus was on the practical goal of producing food rather than finding rational explanations for nature (Harlan, 1992). Ideas about heredity during the period when many early crops were domesticated ranged from mythological interpretations to near-scientific notions of trait transmission. In his Presidential Address to the American Society for Horticultural Science in 1987, Janick (1988) stated:

    The origin of new information in horticulture derives from two traditions: empirical and experimental. The roots of empiricism stem from efforts of prehistoric farmers, Hellenic root diggers, medieval peasants, and gardeners everywhere to obtain practical solutions to problems of plant growing. The accumulated successes and improvements passed orally from parent to child, from artisan to apprentice, have become embodied in human consciousness via legend, craft secrets, and folk wisdom. This information is now stored in tales, almanacs, herbals, and histories and has become part of our common culture. More than practices and skills were involved as improved germplasm was selected and preserved via seed and graft from harvest to harvest and generation to generation. The sum total of these technologies makes up the traditional lore of horticulture. It represents a monumental achievement of our forbears, unknown and unsung.

Large-scale breeding activities began very early in Europe, often under the auspices of commercial seed production enterprises. Besides selecting plants with useful characteristics, breeders also arrange marriages between plants with different traits in the hope of producing fertile offspring carrying both traits. The use of artificial crosses in pre-Mendelian breeding is exemplified by the case of Fragaria ananassa, developed in the botanical garden of Paris by Duchesne in the 18th century by crossing Fragaria chiloensis with Fragaria virginiana. In England, at about the same time, new cultivars of fruits, wheat and peas were being obtained by artificial hybridization (Sánchez-Monge, 1993).

Hybridization combined with selection was adopted by Patrick Shirreff in 1819 in wheat and rice, where the new selections were grown along with cultivars for comparative purposes. He considered introduction and hybridization to be the important sources of new cultivars and stressed crossing of carefully selected parents to meet the aims of new cultivars. Although the essential elements of plant breeding were known by this time, there was still a lack of knowledge regarding the scientific basis of variation among plants. For example, the first generation of crossed materials was mistakenly expected to inevitably produce new cultivars, but instead it took several generations to stabilize. Many historical examples of successful plant breeding can be found in the literature, although there were still many important discoveries to be made before it could be called a technology (Chahal and Gosal, 2002).

1.3 Major Developments in the History of Plant Breeding

Plant breeders of today use various methods to accelerate the evolutionary process in order to increase the usefulness of plants by exploiting genetic differences within a species. This has been made possible by the determination of the genetic basis for developing crop breeding procedures, and this in turn has a long history.
The role of reproduction in plants was first reported in 1694 by Camerarius, who noticed the difference between male and female reproductive organs in maize and produced the first artificial hybrid plant. He established that seed could not be produced without the participation of pollen produced in male reproductive organs of plants. The first hybridization experiment was carried out on wheat by Fairchild in 1719 and the current technique of hybridization is largely based on the work of Kölreuter (1733–1806), a German researcher who carried out his experiments in the 1760s. Hybridization freed the breeder from the severe constraints of working within a limited population, enabled him to bring together useful traits from two or more sources, and allowed specific genes to be introduced.

By understanding the reproductive capacities of plants, plant breeders can manipulate these crosses to produce fertile offspring which carry traits from both parents. Crossing has been very valuable to plant breeders, because it allows some measure of control over the phenotype of a plant. Nearly all modern plant breeding involves some use of hybridization.

In 1859 Darwin proposed in The Origin of Species that natural selection is the mechanism of evolution. Darwin's thesis was that the adaptation of populations to their environments resulted from natural selection and that if this process continued for long enough, it would ultimately lead to the origin of new species. Darwin's Theory of Evolution through Natural Selection hypothesized that plants change gradually by natural selection operating on variable populations and was the outstanding discovery of the 19th century with direct relevance to plant breeding.

1.3.4 Breeding types and polyploidy

Other historical developments in plant breeding include pedigree breeding, backcross breeding (Harlan and Pope, 1922) and mutation breeding (Stadler, 1928). Natural and artificial polyploids also offered new possibilities for plant breeding. Blakeslee and Avery (1937) demonstrated the usefulness of colchicine in the induction of chromosome doubling and polyploidy, enabling plant breeders to combine entire chromosome sets of two or more species to evolve new crop plants.
1.3.2 Mendelian genetics
in Major Crops that brought into focus the causes and levels of genetic uniformity and its consequences. It was a turning point in the history of germplasm resources, and the International Board for Plant Genetic Resources (IBPGR) was established in 1974, and was later renamed the International Plant Genetic Resources Institute (IPGRI) and now Bioversity International, to collect, evaluate and conserve plant germplasm for future use.

1.3.6 Quantitative genetics and genotype-by-environment interaction

Quantitative genetics is the study of the genetic control of those traits which show continuous variation. It is concerned with the level of inheritance of these differences between individuals rather than the type of differences, that is quantitative rather than qualitative (Falconer, 1989). Several important books have been published which document the major developments in quantitative genetics and these include Animal Breeding Plans (Lush, 1937), Population Genetics and Animal Improvement (Lerner, 1950), Biometrical Genetics (Mather, 1949), Population Genetics (Li, 1955), An Introduction to Genetic Statistics (Kempthorne, 1957) and Introduction to Quantitative Genetics (Falconer, 1960).

Many of the misconceptions regarding the inheritance of quantitative traits, which include most of the economically important characters, were corrected by the classical work of Fisher (1918), who successfully applied Mendelian principles to explain the genetic control of continuous variation. He divided the phenotypic variance observed into three variance components: additive, dominance and epistatic effects. This approach has been substantially refined and applied to the improvement of the efficiency of plant breeding. Fisher also laid the foundations for scientific crop experimentation by developing the theory of experimental designs that is an essential part of any plant breeding programme. Quantitative genetics has however evolved considerably in the past two decades because of the development of plant genomics, particularly molecular markers, and other molecular tools that can be used to dissect complex traits into single Mendelian factors (Xu and Zhu, 1994; Buckler et al., 2009; Chapters 6 and 7).

Genotype-by-environment interaction (GEI) and its importance to plant breeding were first recognized by Mooers (1921) and Yates and Cochran (1938). Since then, various statistical methods have been developed for the evaluation of GEI using joint linear regression, heterogeneity of variance and lack of correlation, ordination, clustering, and pattern analysis. As an important field in quantitative genetics, GEI has been receiving more attention in recent years and is covered in Chapter 10 along with molecular methods for GEI analysis.

1.3.7 Heterosis and hybrid breeding

Although early botanists had observed increased growth when unrelated plants of the same species were crossed, it was Charles Darwin who carried out the first seminal experiments. In 1877, he showed that crosses of related strains did not exhibit the vigour of hybrids. He observed heterosis, i.e. the tendency of cross-bred individuals to show qualities superior to those of both parents, in crops like maize and concluded that cross-fertilization was generally beneficial and self-fertilization injurious. In 1879, William Beal demonstrated hybrid vigour in maize by using two unrelated cultivars. The best combinations yielded 50% more than the mean of the parents. Reports by Sanborn in 1890 and McClure in 1892 confirmed Beal's earlier reports and extended the generality of the superiority of hybrids over the average of the parental forms.

1.3.8 Refinement of populations

Several different population breeding methods can be used: (i) bulk; (ii) mass selection; and (iii) recurrent selection. One
of the methods used for managing large populations of segregates was the bulk method proposed by Harlan et al. (1940) for multi-parent crosses. This concept changed the breeding methodologies for self-pollinated species. Mass selection is a system of breeding in which seeds from individuals selected on the basis of phenotype are bulked and used to grow the next generation. Mass selection is the oldest breeding method for plant improvement and was employed by early farmers for the development of cultivated species from their ancestral forms.

The enhancement of open-pollinated populations of crops such as rye, maize and sugarbeet, herbage grasses, legumes, and tropical trees such as cacao, coconut, oil palm, and some rubber, depends essentially on changing the gene frequencies so that the favourable alleles are fixed, while maintaining a high (but far from maximal) degree of heterozygosity. Recurrent selection is a method of plant breeding associated with quantitatively inherited traits by which the frequencies of favourable genes are increased in populations of plants. The methodology is cyclical with each cycle encompassing two phases: (i) selection of genotypes that possess the favourable or required genes; and (ii) crossing among the selected genotypes. This leads to a gradual increase in the frequencies of the desired alleles. While recurrent selection is often successful, it also has potential limitations in closed populations and this has led to numerous modifications and alternative schemes (see Hallauer and Miranda, 1988). Recurrent selection breeding methods have been applied to a wide range of plant species, including self-pollinated crops.

1.3.9 Cell totipotency, tissue culture and somaclonal variation

The discovery of auxins, by Went and colleagues, and cytokinins, by Skoog and colleagues, preceded the first success of in vitro culture of plant tissues (White, 1934; Nobécourt, 1939).

All the genes necessary to make an entire organism can be induced to function in the correct sequence from a living cell isolated from a mature tissue (called totipotency). Regeneration of whole plants from single cells is an important new source of genetic variability for refining the properties of plants because when somatic embryos derived from single cells are grown into plants, the plants' characteristics vary somewhat. Larkin and Scowcroft (1981) coined the term somaclonal variation to describe this observed phenotypic variation among plants derived from micro-propagation experiments. When it was recognized as a genuine phenomenon, somaclonal variation was considered to be a potential tool for the introduction of new variants of perennial crops that can be asexually propagated (e.g. banana). Somaclonal variation has also been exploited by plant breeders as a new source of genetic variation for annual crops.

1.3.10 Genetic engineering and gene transfer

The discovery of the structure of DNA by Watson and Crick has enhanced traditional breeding techniques by allowing breeders to pinpoint the particular gene responsible for a particular trait and to follow its transmission to subsequent generations. Enzymes that cut and rejoin DNA molecules allow scientists to manipulate genes in the laboratory. In 1973 Stanley Cohen and Herbert Boyer spliced the gene from one organism into the DNA of another to produce recombinant DNA which was then expressed normally, and this formed the basis of genetic engineering. The goal of plant genetic engineers is to isolate one or more specific genes and introduce these into plants. Improvement in a crop plant can often be achieved by introducing a single gene, and genes can now be transferred to plants using the natural gene transfer system of a promiscuous pathogenic soil bacterium, Agrobacterium tumefaciens. DNA can also be introduced into cells by bombardment with DNA-coated particles or by electroporation. Transgenic breeding
has the potential to decrease or increase the environmental impact of agricultural practices.

The initial successes in plant genetic engineering marked a significant turning point in crop research. In the 1990s in particular, there was an upsurge of private sector investment in agricultural biotechnology. Some of the first products were plant strains capable of synthesizing an insecticidal protein encoded by a gene isolated from the bacterium Bacillus thuringiensis (Bt). Bt cotton, maize, and other crops are now grown commercially. There are also crop cultivars which are tolerant to or capable of degrading herbicides. Proponents stress the value of these crops in conserving tillage soil, reducing the use of harmful chemicals and reducing the labour and costs involved in crop production.

1.3.11 DNA markers and genomics

During the 1980s and 1990s, various types of molecular markers such as restriction fragment length polymorphism (RFLP) (Botstein et al., 1980), randomly amplified polymorphic DNA (RAPD) (Williams et al., 1990; Welsh and McClelland, 1990), microsatellites and single nucleotide polymorphism (SNP) were developed. Because of their abundance and importance in the plant genome, molecular markers have been widely used in the fields of germplasm evaluation, genetic mapping, map-based gene discovery and marker-assisted plant breeding. Molecular marker technology has become a powerful tool in the genetic manipulation of agronomic traits.

Initiated by the complete sequencing of the Arabidopsis genome in 2000 (The Arabidopsis Genome Initiative, 2000) and the rice genome in 2002 (Goff et al., 2002; Yu et al., 2002), the genomes of an increasing number of plants have been or are being sequenced. Technological developments in bioinformatics, genomics and various omics fields are creating substantial data on which future revolutions in plant breeding can be based.

1.3.12 Breeding efforts in the public and private sectors

Agricultural research has mainly been the responsibility of a national and/or state government department. To accelerate progress in food production, especially in developing countries, international agricultural research centres were established with major emphasis on the development of high-yielding cultivars. Two centres, the International Rice Research Institute (IRRI), Philippines, and the Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Mexico, established in the 1960s, made phenomenal contributions to food production by developing shorter and higher-yielding rice, wheat and maize cultivars. Encouraged by the astonishing success of these centres and two others which were established later, the Consultative Group on International Agricultural Research (CGIAR) was established in 1971. The CGIAR now has 15 international agricultural research centres, of which eight concentrate on specific crop plants and one on genetic resources, with a mission to contribute towards sustainable agriculture for food security, especially in developing countries. The breeding materials developed at these centres are distributed to public and private sector research programmes for utilization in the development of locally adapted cultivars. Through the National Agricultural Research Systems (NARS), these centres work in close coordination with public and private breeding programmes in each country and share their breeding technologies and stocks of germplasm.

In the USA, crop breeding, with the exception of cotton, began largely as a tax-supported endeavour with breeding programmes taking place in most State Agricultural Experimental Stations and in the United States Department of Agriculture (USDA). This pattern changed with the advent of hybrid maize, when inbred lines were initially developed by public institutions and utilized to produce hybrids by private companies. With the implementation of a Plant Variety Protection Act in the USA in 1974, private breeding was expanded to
include forages, cereals, soybean, and other crops. The activities of private companies contributed to the total crop breeding effort and offered a large number of cultivar options for farmers and consumers. In the USA and other industrialized countries today, the new life-science companies, notably the big multinationals such as Dow, DuPont and Monsanto, dominate the application of biotechnology to agriculture, and have developed many proprietary products.

leading to the proliferation of specific traits within that population. The degree of gene flow varies widely and is dependent on the type of organism and population structure. For example, genes in a mobile population are likely to be more widely distributed than those in a sedentary population, resulting in high and low rates of gene flow, respectively.
1.4.2 Mutation
to phenotypic variation and to identify the molecular pathway from gene to function. The recent progress made in humans by combining linkage disequilibrium mapping (Chapter 6) and transcriptomics (Chapter 3) holds great promise for high-resolution association mapping and identification of regulatory genetic factors (Dixon et al., 2007). Information from omics research will be integrated with our current knowledge at the phenotypic level to increase the effectiveness and efficiency of plant breeding.

1.5.1 Qualitative and quantitative traits

In general, qualitative traits are genetically controlled by one or a few major genes, each of which has a relatively large effect on the phenotype but is relatively insensitive to environmental influences. Trait distribution in a typical segregating population such as an F2 shows multi-peak distribution, although individuals within a category show continuous variation. Each individual in the population can be classified unambiguously into distinct categories that correspond to different genotypes so that they can be studied using Mendelian methods.

Quantitative traits are genetically controlled by many genes, each of which has a relatively small effect on the phenotype, but is largely influenced by environmental factors (Buckler et al., 2009). Trait distribution in an F2 population usually shows a normal or bell-shaped distribution and, as a result, individuals cannot be classified into phenotypic categories that correspond to different genotypes, thus making the effects of individual genes indistinguishable. Quantitative genetics is traditionally described as the study of all these genes as a whole, and the total variation observed in a population results from the combined effects of genetic (polygenes as a whole) and environmental factors. However, quantitative variation is not due solely to minor allelic variation in structural genes, as regulatory genes no doubt also contribute to this variation. We expect polygenes to show all the typical properties of chromosomal genes both in terms of action and in transmission through meiosis.

1.5.2 The concept of allelic and genotypic frequencies

A biological population is defined genetically as a group of individuals that exist together in time and space and that can mate or be crossed to each other to produce fertile progeny. Statistically, this group is called a population. Breeding populations are created by breeders to serve as a source of cultivars that meet specific breeding objectives.

At the population level, genetics can be characterized by allelic and genotypic frequencies. The allele frequencies refer to the proportion of each allele in the population, while the genotypic frequencies refer to the proportion of individuals (plants) in the population that have a particular genotype. A gene may have many allelic states. Some of the alleles of a given gene may have such marked effects as to be clearly recognized as a classical major mutant. Other alleles, though potentially separable at the DNA level, may well cause only minor differences at the level of the external phenotype. For example, one allele at a locus involved with growth hormone production could be inactive and result in a dwarf plant, while others may simply reduce or increase height by a few percent.

Allele and genotypic frequencies can be calculated by simple counting in the population. For a gene with n alleles, there are n(n + 1)/2 possible genotypes. The relationship between allele frequency and genotypic frequency for a single gene at the population level can be used to infer the genetic status of the gene in that population, relative to the expected equilibrium under some assumed mating system. Allele frequencies are generally not an issue in breeding populations created from non-inbred parents or from three or more inbred parents. But breeding populations in both self-pollinated and cross-pollinated crops are often created by crossing two inbred individuals.
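The counting described in Section 1.5.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the allele labels A1, A2, A3 and the sample of 100 plants are hypothetical, and random mating is used here merely as one possible "assumed mating system" against which observed genotypic frequencies could be compared.

```python
from collections import Counter
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """Unordered diploid genotypes at one locus; there are n(n + 1)/2 of them."""
    return list(combinations_with_replacement(sorted(alleles), 2))


def count_frequencies(genotype_counts):
    """Allele and genotypic frequencies obtained by simple counting.

    genotype_counts maps an unordered allele pair such as ("A1", "A2")
    to the number of plants observed with that genotype.
    """
    total_plants = sum(genotype_counts.values())
    genotype_freq = {g: c / total_plants for g, c in genotype_counts.items()}
    allele_counts = Counter()
    for (a, b), c in genotype_counts.items():
        # Each diploid plant carries two alleles, so homozygotes count twice.
        allele_counts[a] += c
        allele_counts[b] += c
    allele_freq = {a: c / (2 * total_plants) for a, c in allele_counts.items()}
    return allele_freq, genotype_freq


def expected_random_mating(allele_freq):
    """Expected genotypic frequencies if random mating is the assumed system."""
    expected = {}
    for a, b in combinations_with_replacement(sorted(allele_freq), 2):
        pa, pb = allele_freq[a], allele_freq[b]
        expected[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return expected


# Hypothetical sample of 100 plants (illustrative numbers, not from the text).
counts = {("A1", "A1"): 36, ("A1", "A2"): 48, ("A2", "A2"): 16}
allele_freq, genotype_freq = count_frequencies(counts)
expected = expected_random_mating(allele_freq)
n_genotypes_three_alleles = len(possible_genotypes({"A1", "A2", "A3"}))

print(allele_freq)
print(genotype_freq)
print(expected)
print(n_genotypes_three_alleles)
```

Comparing `genotype_freq` with `expected` is one simple way to see how far a sampled population sits from the proportions expected under the chosen mating assumption; a formal test would need a statistical procedure that is not shown here.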
1.5.3 Hardy–Weinberg equilibrium (HWE)

A population is in equilibrium if the allele and genotypic frequencies are constant from generation to generation. A collection of pure selfers is also at equilibrium if all are completely selfed, with $P_{A_1A_1} = p$ and $P_{A_2A_2} = q$. This implies that the allele frequency and genotypic frequency share a simple relationship:

$P_{A_1A_1} = p^2$
$P_{A_1A_2} = 2pq$
$P_{A_2A_2} = q^2$

or

$(p + q)^2 = p^2 + 2pq + q^2$

With one generation of random mating, i.e. when each individual in the population is equally likely to mate with any other individual, the above simple relationship will be obeyed. However, HWE represents idealized populations and breeders routinely use procedures that cause deviations from HWE. These procedures include the lack of random mating, the use of small population sizes, assortative mating, selection, and inbreeding during the development of progenies. Some of these procedures, such as inbreeding and the use of small population sizes, affect all loci in the population while others affect only certain loci. Suppose that two traits are controlled by different sets of loci, and a change in one trait does not affect the other. If selection occurs only for the first trait, the loci affecting that trait may deviate from HWE, but the loci for the other trait will remain in equilibrium. In large natural populations, migration, mutation, and selection are the forces that can change allelic frequencies from generation to generation.

1.5.4 Population means and variances

Theoretically, a population can be described by its parameters such as the mean and variance, which depend on the probability distribution of the population. The arithmetic mean, $\mu$, also known as the first moment about the origin, is a parameter used to measure the central location of a frequency distribution. The population variance, $\sigma^2$, also known as the second moment about the mean, provides a measure of the dispersion of the distribution. If the yield trait for a cultivar that is genetically homogeneous is taken as an example, the genetic effect for this cultivar population is a constant. The yield for all individuals should also be a constant, equal to the population mean, provided that environmental factors do not affect the yield. However, the yield for each individual is affected not only by its genotype but also by environmental factors such as temperature, sunlight, water, and various nutrients. As a result, individuals may have different phenotypic values, in this case yield, resulting in continuous variation among individuals. Therefore, the individual yield measures vary either positively or negatively around the population mean, so that they are either higher or lower than the population mean by a certain number which is determined by its variance.

1.5.5 Heritability

The response of traits to selection depends on the relative importance of the genetic and non-genetic factors which contribute to phenotypic variation among genotypes in a population, a concept referred to as heritability. The heritability of a trait has a major impact on the methods chosen for population improvement, inbreeding, and selection. Selection for single plants is more efficient when the heritability is high. The extent to which replicated testing is required for selection depends on the heritability of the trait.

The question of whether a trait variation is a result of genetic or environmental variation is meaningless in practice. Genes cannot cause a trait to develop unless the organism is growing in an appropriate environment, and, conversely, no amount of manipulation will cause a phenotype to
develop unless the necessary gene or genes are present. Nevertheless, the variability observed in some traits might result primarily from differences in the numbers and the magnitude of the effect of different genes, whereas variability in other traits might stem primarily from the differences in the environments to which various individuals have been exposed. It is therefore essential to identify reliable measures to determine the relative importance of not only the numbers and magnitude of the effects of the genes involved, but also of the effects of different environments on the expression of phenotypic traits (Allard, 1999).

Heritability is defined as the ratio of genetic variance to phenotypic variance:

genetic gain, and predicted progress or gain, and has been denoted as R, GS, G and G.

Starting with a parental population of mean, $\mu$, a subset of individuals is selected. The selected individuals have a mean $\bar{x}$, while the offspring of the selected population has a mean $\bar{y}$. The difference between the selected population and the original population is defined as the selection differential, and denoted by S, i.e.

$S = \bar{x} - \mu$

The response to selection, R, can be written as

$R = \bar{y} - \mu$
[Figure 1.1: distributions of the parental population and the progeny population, marking the individuals selected (k = 5%) with mean $\bar{x}$ and the selection differential $S = \bar{x} - \mu$.]

Fig. 1.1. Distribution of parental and progeny populations with a selection intensity of 5%. Because the phenotypic values of the selected plants include both a genetic and an environmental component, the progeny means depend on the heritability of the trait selected.
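
As a minimal numerical sketch of the quantities defined above, the following Python snippet computes the selection differential S and the response to selection R from made-up trait values. The ratio R/S, often called the realized heritability, does not appear in the text above; it is included here as an assumption based on the standard breeder's equation R = h²S. All function names and data are hypothetical.

```python
from statistics import mean


def selection_differential(parental, selected):
    """S: mean of the selected individuals minus the parental population mean."""
    return mean(selected) - mean(parental)


def selection_response(parental, progeny):
    """R: mean of the progeny of the selected individuals minus the parental mean."""
    return mean(progeny) - mean(parental)


# Hypothetical plant heights (cm); the values are invented for illustration.
parental = [90, 95, 100, 100, 105, 110, 100, 95, 105, 100]  # population mean = 100
selected = [105, 110, 105]                                   # plants kept as parents
progeny = [100, 104, 102, 106, 103]                          # offspring mean = 103

S = selection_differential(parental, selected)   # about 6.67
R = selection_response(parental, progeny)        # 3
realized_h2 = R / S                              # assumption: R = h^2 * S, giving 0.45
```

In this invented example the progeny recover less than half of the superiority of their parents, which is the pattern Fig. 1.1 illustrates for a trait of moderate heritability.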
1.5.7 Selection index and selection for multiple traits

In most plant breeding programmes, there is a need to improve more than one trait at a time. For example, a high-yielding cultivar susceptible to a prevalent disease would be of little use to a grower. Recognition that improvement of one trait may cause improvement or deterioration in associated traits serves to emphasize the need for the simultaneous consideration of all traits which are important in a crop species. Three selection methods, which are recognized as appropriate for the simultaneous improvement of two or more traits in a breeding programme, are index selection, independent culling, and tandem selection. Independent culling requires the establishment of minimum levels of merit for each trait. An individual with a phenotypic value below the critical culling level for any trait will be removed from the population. That is, only individuals meeting requirements for all traits will be selected.

With tandem selection, one trait is selected until it is improved to a satisfactory level or a critical phenotypic value. Then, in the next generation or programme, selection for a second trait is carried out within the population selected for the first trait, and so on for the third and subsequent traits. A selection index is a single score which reflects the merits and demerits of all target traits. Selection among individuals is based on the relative values of the index scores.

Selection indices provide one method for improving multiple traits in a breeding programme. The use of a selection index in plant breeding was originally proposed by Smith (1936) who acknowledged critical input from Fisher (1936). Subsequently, methods of developing selection indices were modified, subjected to critical evaluation, and compared to other methods of multiple trait selection.

It is generally recognized that a selection index is a linear function of observable phenotypic values of different traits. There are a number of forms of the equations available from index selection for multiple traits in grain. To construct a selection index, the observed value of each trait is weighted by an index coefficient,

$$I = b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$

where I is an index of merit of an individual, $x_i$ represents the observed phenotypic value of the ith trait, and $b_1, \ldots, b_n$ are weights assigned to the phenotypic trait measurements $x_1, \ldots, x_n$. The b values are obtained as the product of the inverse of the phenotypic variance-covariance matrix, the genotypic variance-covariance matrix, and a vector of economic weights. A number of variations of this index, most changing the manner of computing the b values, have been developed. These include the base index of Williams (1962), the desired gain index of Pesek and Baker (1969), and retrospective indices proposed by Johnson et al. (1988) and Bernardo (1991). The emphasis in the retrospective index developments is on quantifying the knowledge experienced breeders have obtained. Baker (1986) summarized all selection indices in plant breeding developed before that time.

1.5.8 Combining ability

Combining ability is a very important concept in plant breeding and it can be used to compare and investigate how two inbred lines can be combined together to produce a productive hybrid or to breed new inbred lines. Selection and development of parental lines or inbreds with strong combining ability is one of the most important breeding objectives, no matter whether the goal is to create a hybrid with strong vigour or develop a pure-line cultivar with improved characteristics compared to its parental lines. In maize breeding, Sprague and Tatum (1942) partitioned the genetic variability among crosses into effects due primarily to either additive or non-additive effects, which correspond to two categories of combining ability, general combining ability (GCA) and specific combining ability (SCA). The relative importance of GCA and SCA depends on the extent of previous testing of the parents included in the crosses. Although these concepts were developed for breeding maize, an open-pollinated crop, they are generally applicable to self-pollinated crops.

The GCA for an inbred line or a cultivar can be evaluated by the average performance of yield or other economic traits in a set of hybrid combinations. The SCA for a cross combination can be evaluated by the deviation in its performance from the value expected from the GCA of its two parental lines. If the crosses among a set of inbred lines are made in such a way that each line is crossed with several other lines in a systematic manner, the total variation among crosses can be partitioned into two components ascribable to GCA and SCA. The mean performance of a cross ($\bar{x}_{AB}$) between two inbred lines A and B can be represented as

$$\bar{x}_{AB} = \mathrm{GCA}_A + \mathrm{GCA}_B + \mathrm{SCA}_{AB}$$

Here $\mathrm{GCA}_A$ and $\mathrm{GCA}_B$ are the GCA of the parents A and B, respectively, and the cross A × B is expected to have a performance equal to the sum ($\mathrm{GCA}_A + \mathrm{GCA}_B$) of the GCA of its parents. The actual performance of the cross, however, may differ from the expectation by an amount equivalent to the SCA. Sprague and Tatum (1942) interpreted these combining abilities in terms of type of gene action. The differences due to GCA of lines are the results of additive genetic variance and additive by additive interaction whereas SCA is a reflection of non-additive genetic variances.

1.5.9 Recurrent selection

Recurrent selection can be broadly defined as the systematic selection of desirable individuals from a population followed by recombination of the selected individuals to form a new population. The basic feature of recurrent selection methods is that they are procedures conducted in a repetitive manner, or recycling, including development of a base population with which to begin selection, evaluation of individuals from
the population, and selection of superior individuals as parents that can be crossed to produce a new population for the next cycle of selection, as shown below:

Develop a population -> Evaluate individuals in the population -> Select superior individuals as parents -> Develop a population (next cycle)

A cycle of selection is completed each time a new population is formed. The initial population that is developed for a recurrent selection programme is referred to as the base, or cycle 0, population. The population formed after one cycle of selection is called the cycle 1 population; the cycle 2 population is developed from the second cycle of selection, and so on.

Recurrent selection procedures are conducted primarily for quantitatively inherited traits. The objective of recurrent selection is to improve the mean performance of a population of plants by increasing the frequency of favourable alleles in a consistent manner in order to enhance the value of the population and to maintain the genetic variability present in the population as effectively as possible. In addition, separation of the genetic and environmental effects is an important facet of effective recurrent selection methods. The improved populations can be used as a cultivar per se, as parents of a cultivar-cross hybrid and as a source of superior individuals that can be used as inbred lines, pure-line cultivars, clonal cultivars, or parents of a synthetic line. Successful recurrent selection results in an improved population that is superior to the original population in mean performance and in the performance of the best individuals within it. Ideally, the population will be improved without its genetic variability being significantly reduced so that additional selection and improvement can occur in the future. Recurrent selection is complementary to inbred development procedures; in fact the concept of recurrent selection was developed, particularly for outcrossing crops, to rectify limitations in inbred development by continuous selfing, which rapidly leads to inbreeding and allele fixation and thus inadequate opportunity for selection. There are two ways by which recurrent selection addresses this limitation in inbred development (Bernardo, 2002). First, recurrent selection increases the frequency of favourable alleles in the population by repeated cycles of selection. Secondly, recurrent selection maintains the degree of genetic variation in the population to allow sustained progress from subsequent cycles of selection. Genetic variation is maintained by recombining a sufficiently large number of individuals to reduce random fluctuations in allele frequencies, i.e. genetic drift.

Since the late 1950s, extensive research has been conducted to determine the relative importance of different genetic effects on the inheritance of quantitative traits for most cultivated plant species. As indicated by Hallauer (2007), quantitative genetic research has provided extensive information to assist plant breeders in developing breeding and selection strategies. Directly and/or indirectly, the principles for the inheritance of quantitative traits are pervasive in developing superior cultivars to meet the worldwide food, feed, fuel and fibre demands. The principles of quantitative genetics will have continued importance in the future.

1.6 The Green Revolution and the Challenges Ahead

The application of science and technology to crop production in the second half of the 20th century resulted in significant yield improvements for rice, wheat and maize in the developed countries, and the final result of these efforts was the Green Revolution, which led to a new type of agriculture (high-input or chemical-genetic agriculture) which replaced the more traditional system. Countries involved in the Green Revolution, a term coined by Borlaug (1972), included Japan, Mexico, India and China among others.
and social access to a balanced diet and safe drinking water will be threatened, with a holistic approach to nutritional and non-nutritional factors needed to achieve success in the eradication of hunger. Science and technology can play a very important role in stimulating and sustaining an Evergreen Revolution leading to long-term increases in productivity without any associated ecological harm (Borlaug, 2001; Swaminathan, 2007). The objectives of the plant breeder can be realized through conventional breeding integrated with various biotechnology developments (e.g. Damude and Kinney, 2008; Xu et al., 2009c).

Plant breeding can be defined as an evolving science and technology (Fig. 1.2). It has gradually been evolving from art to science over the last 10,000 years, starting as an ancient art and developing into the present molecular design-based science. With the development of molecular tools which will be discussed further in Chapters 2 and 3, plant breeding is becoming quicker, easier, more effective and more efficient (Phillips, 2006). Plant breeders will be well equipped with innovative approaches to identify and/or create genetic variation, to define the genetic feature of the genes related to the variation (position, function and relationship with other genes and environments), to understand the structure of breeding populations, to recombine novel alleles or allele combinations into specific cultivars or hybrids, and to select the best individuals with desirable genetic features which enable them to adapt to a wide range of environments.

[Figure 1.2: timeline running from art-based plant breeding to molecular plant breeding]
Art-based Plant Breeding
  Starting from 10,000 years ago: collection of wild plants for food; selection of wild plants for cultivation
  1700s-1800s: large-scale breeding activities supported by commercial seed production enterprises; hybridization combined with selection; evolution through natural selection
  1900s: Mendelian genetics; quantitative genetics; mutation; polyploidy; tissue culture
  2000s and beyond: gene cloning and direct transfer; genomics-assisted breeding
Molecular Plant Breeding

Fig. 1.2. The steps of evolution of plant breeding. With the availability of more sophisticated tools, the art of plant breeding became science-based technology, molecular plant breeding.

Sequencing data for many plants is now readily available and the GenBank database is doubling every 15 months. Over 20 plant species including many important crops are in the process of being sequenced (Phillips, 2008). The next challenge is to determine the function of every gene and eventually how genes interact to form the basis of complex traits. Fortunately, DNA chips and other technologies are being developed to study the expression of multiple or even all genes simultaneously. High throughput robotics and bioinformatics tools will play an essential role in this endeavour.

New information about our crop species is expanding our capabilities to use molecular genetics. For example, we did not previously realize how similar broadly related species are in terms of their gene content and gene order. Since these species cannot usually be crossed, there was no means of assessing their relatedness. With the advent of DNA-based molecular markers, the extensive genetic mapping of chromosomes became readily possible for a variety of species. We learned that the genomes were highly similar and that this similarity allowed the prediction of gene locations among species. For example, rice has become the model or reference species for the cereals as many of the gene sequences on the rice chromosomes are shared with other cereals such as maize, sorghum, sugarcane, millet, oats, wheat and barley (Xu et al., 2005). Knowing the complete DNA sequence of a model or reference genome allows genes/traits from this
model to be tracked to other genomes. We have come to realize that the differences between species of plants are not due to novel genes, but to novel allelic specifications and interactions.

Since many fundamental aspects of current plant breeding procedures are not well understood, further data relating to the genetics of crop species may help to shed light on the genetic gains obtained from plant breeding. For example, in successful plant breeding programmes, the genetic base often becomes narrower rather than broader. Elite by elite crosses may be the rule in these programmes. Molecular genetic markers have been widely employed to identify cryptic and novel genetic variation among cultivars and related species and used to increase the efficiency of selection for agronomic traits and the pyramiding of genes from different genetic backgrounds.

Long-term selection programmes would be expected to lead to genetic fixation; however, this has not been found to be the case so far and variation is still observed. Several mechanisms for de novo variation have been described, including intragenic recombination, unequal crossing over among repeated elements, transposon activity, DNA methylation, and paramutation. Another important feature in plant breeding whose molecular basis is not understood is heterosis, although it is used as the basis for many seed-producing industries. Genomics and particularly transcriptomics are now being used to identify the heterotic genes responsible for increasing crop yields. Comprehensive quantitative trait locus-based phenotyping (phenomics), combined with genome-wide expression analysis, should help to identify the loci controlling heterotic phenotypes and thus improve the understanding of the role of heterosis in evolution and the domestication of crop plants (Lippman and Zamir, 2007), and finally to make it possible to predict hybrid performance.

Messenger RNA transcript profiling is an obvious candidate for functional genomic application to plant breeding. Although direct selection at the gene transcript level using microarray or real-time PCR may be a long-term goal, other genomic tools can be used to achieve shorter term goals with more practical applications (Crosbie et al., 2006). Genetic modification of crops today involves the interfacing of molecular biology, cell and tissue culture, and genetics/breeding. The transfer of genes by cellular and molecular means will increase the available gene pool and lead to second generation biotechnology plant products such as those with a modified oil, protein, vitamin, or micronutrient content or those that have been engineered to produce compounds that can be used as vaccines or anticarcinogens.

While all these innovations have been useful, practical plant breeding continues to be based on hybridization and selection with little change in the basic procedures. A more complete understanding of the mechanisms by which genetic and environmental variation modify yield and composition is needed so that specific quantitative and qualitative targets can be identified. To achieve this aim, the expertise of plant genomics (including various omics), physiology and agronomy, as well as plant modelling techniques, must be combined (Wollenweber et al., 2005) and many logistic and genetic constraints also need to be resolved (Xu and Crouch, 2008).
2
Molecular Breeding Tools:
Markers and Maps
Table 2.1 lists the major molecular marker technologies that are currently available. Only a selection of widely-used representative types of markers will be discussed in this section. Figure 2.1 shows the molecular mechanism of several major DNA markers and the genetic polymorphisms that can be generated by restriction site or PCR priming site mutation, insertion, deletion or by changing the number of repeat units between two restriction or PCR priming sites, and by nucleotide mutation resulting in a single nucleotide polymorphism (SNP). There are several comprehensive reviews that cover all the important DNA markers, e.g. Reiter (2001), Avise (2004), Mohler and Schwarz (2005) and Falque and Santoni (2007). Further information regarding the application of DNA markers in genetics and breeding can be found in Lörz and Wenzel (2005). After a brief review of the classical markers, DNA markers will be discussed in more detail in this section.

2.1.1 Classical markers

Morphological markers

In the late 1800s, following his studies on the garden pea (Pisum sativum), G.J. Mendel proposed two basic rules of genetics,
[Figure 2.1 panels]
A. Mutation at enzyme restriction or PCR priming site (RFLP, AFLP, CAPS)
B. Insertion between enzyme restriction or PCR priming sites
C. Deletion between enzyme restriction or PCR banding sites
D. Change of tandem repeat units between enzyme restriction or PCR banding sites (SSR, VNTR, ISSR)

Fig. 2.1. Molecular basis of major DNA markers. Parts A-E show different ways in which DNA markers (listed below each diagram) can be generated. The cross in part A indicates that mutation has eliminated the priming site. Abbreviations: as defined in Table 2.1; VNTR, variable number of tandem repeat; CAPS, a DNA marker generated by specific primer PCR combined with RFLP; ISSR, inter simple sequence repeat.
which were later known as the Mendelian laws of equal segregation and independent assortment. Mendel selected individuals which differed in a particular trait and used them as the parental lines in a cross breeding experiment to determine the phenotype of the offspring with regard to the selected trait. The term phenotype (derived from Greek) literally means 'the form that is shown' and is used by both geneticists and breeders. The seven pairs of contrasting phenotypes studied by Mendel included round versus wrinkled seeds, yellow versus green seeds, purple versus white petals, inflated versus pinched pods, green versus yellow pods, axial versus terminal flowers and long versus short stems. The plants in the segregated populations of the pea, such as F2 and backcross, were classified into two distinct groups depending on their phenotypes. These contrasting morphological phenotypes are the starting point for any genetic analysis and can be mapped to particular chromosomes using the Mendelian laws of inheritance and can thus be used as morphological markers of the genome and the particular trait.

Morphological markers therefore generally represent genetic polymorphisms which are visible as differences in appearance, such as the relative difference in plant height and colour, distinct differences in response to abiotic and biotic stresses, and the presence/absence of other specific morphological characteristics. A large number of variants showing particular morphological or physiological phenotypes have been generated by tissue culture and mutation breeding. Using selection techniques these variants can be genetically stabilized and then used as morphological markers.

Some genetic stocks contain more than one morphological marker, for example there are a total of over 300 morphological markers available for genetic studies in rice (Khush, 1987) and more are being created for functional genomics. Many morphological marker stocks are also available for tomato (http://www.plantpath.wisc.edu/GeminivirusResistantTomatoes/MERC/Tomato/Tomato.html), maize (Neuffer et al., 1997) and soybean (Palmer and Shoemaker, 1998). Many of these markers have been linked with other agronomic traits.

Morphological markers are usually mapped by classical two- or three-point linkage tests. The linkage groups are established and the order of the markers and the relative distance between any two are determined by their recombinant frequencies. Relatively complete linkage maps have been constructed in many crop species using morphological markers and these maps provide the fundamental information for the genetic mapping of many physiological and biochemical traits.

However, it is difficult to construct a relatively saturated genetic map because of the limitation in the number of morphological markers with distinguishable polymorphisms. In addition, many morphological markers have deleterious effects on phenotypes and some are significantly affected by other factors such as environments or maturity which results in potential problems when these markers are used for genetics and plant breeding.

Cytological markers

By studying the morphology, number and structure of chromosomes from different species, particular cytogenetic features can be found, such as various types of aneuploidy, variants of chromosome structure and abnormal chromosomes. These can be used as genetic markers to locate other genes on to chromosomes and determine their relative positions, or used for genetic mapping via chromosome manipulations such as chromosome substitution.

The structural features of chromosomes can be shown by chromosome karyotype and bands. The banding patterns are indicated by colour, width, order and position, revealing the difference in distributions of euchromatin and heterochromatin. There are Q bands (produced by quinacrine hydrochloride), G bands (produced by Giemsa stain) and R bands (reversed Giemsa). These chromosome landmarks are not only useful for characterizing normal chromosomes but also for detecting chromosome mutation.
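
The two- and three-point linkage tests mentioned above for morphological markers rest on simple arithmetic, which the following sketch illustrates. It assumes backcross progeny that have already been scored as parental or recombinant for each pair of markers, and it orders three markers by taking the pair with the largest recombination frequency as the outer two. The marker names and counts are invented, and no mapping function or double-crossover correction is applied.

```python
def recombination_frequency(parental, recombinant):
    """Fraction of progeny that are recombinant for a pair of markers."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny scored")
    return recombinant / total


def middle_marker(rf_by_pair):
    """Given recombination frequencies for the three pairs among three markers,
    return the marker lying between the two most distant ones."""
    outer = max(rf_by_pair, key=rf_by_pair.get)   # most distant pair flanks the third marker
    markers = set().union(*rf_by_pair)            # all marker names appearing in the pairs
    (middle,) = markers - set(outer)
    return middle


# Hypothetical counts from 200 backcross plants: (parental types, recombinant types).
counts = {("a", "b"): (180, 20), ("b", "c"): (170, 30), ("a", "c"): (154, 46)}

rf = {pair: recombination_frequency(p, r) for pair, (p, r) in counts.items()}
order_middle = middle_marker(rf)   # "b": the a-c pair shows the most recombination
```

Here the a-c frequency (0.23) is slightly smaller than the sum of a-b and b-c (0.10 + 0.15), as would be expected when some double crossovers go undetected.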
Cytological markers have been widely used to identify linkage groups within specific chromosomes and have been widely applied in physical mapping. However, because of their limited number and resolution, they have limited applications in genetic diversity analysis, genetic mapping and marker-assisted selection (MAS).

Protein markers

Isozymes are structural variants of an enzyme and while they differ from the original enzyme in molecular weight and mobility in an electric field, they have the same catalytic activity. The difference in enzyme mobility is caused by point mutations that result in amino acid substitutions, such that isozymes reflect the products of different alleles rather than different genes. Therefore, isozymes can be genetically mapped on to chromosomes and then used as genetic markers for mapping other genes. Isozyme markers are based on their biochemistry and thus are also known as biochemical or protein markers.

However, their use as markers is limited. For example a total of 57 isozymes representing about 100 loci have been identified in plants (Vallegos and Chase, 1991) but for specific species only 10-20 isozymes are available so that they cannot be used to construct a complete genetic map. Each isozyme can only be identified with a specific stain which also limits their use in practice.

2.1.2 DNA markers

RFLP

Botstein et al. (1980) first used DNA restriction fragment length polymorphism (RFLP) in human linkage mapping and this pioneered the utilization of DNA polymorphisms as genetic markers. It is known that the genomes of all organisms show many sites of neutral variation at the DNA level. These neutral variant sites do not have any effect on the phenotype. In some cases a neutral site is nothing more than a single nucleotide difference within a gene or between genes; and in others it represents the site of a variable number of tandem repeats of 'junk' DNA present between genes. The development of RFLP markers has accelerated the construction of molecular linkage maps for many organisms, improved the accuracy of gene location, and reduced the time required to establish a complete linkage map.

The digestion of purified DNA using restriction enzymes, which cut the DNA strand wherever there is a recognition site sequence (usually four to eight base pairs), leads to the formation of RFLPs which yield a molecular fingerprint that may be unique to a particular individual. If the bases are positioned at random in the genome, an enzyme having a recognition site with six bases will cleave the DNA once every 4096 bases on average ($4^6$). A genome of $10^9$ bases could thus produce around 250,000 restriction fragments of variable length. Gel electrophoresis on such a large number of genomic DNA digestion products produces a continuous smear image. Particular fragments that are homologous between several individuals, and possibly allelic, can be separated only by means of molecular probes using the Southern technique (Southern, 1975). RFLP analysis includes the following steps (Fig. 2.2):

1. DNA isolation: a significant amount of DNA must be isolated from multiple individuals from target genotypes (parents and segregating populations, germplasm survey, garden blot, etc.) and purified to a fairly stringent degree as contaminants can often interfere with the restriction enzyme and inhibit its ability to digest the DNA.
2. Restriction digestion: restriction enzyme is added to purified genomic DNA under buffered conditions. The enzyme cuts at recognition sites throughout the genome and leaves behind hundreds of thousands of fragments.
3. Gel electrophoresis: digested products (restriction fragments) are electrophoresed on agarose gel and when visualized appear as smears because of the large number of fragments.
Fig. 2.2. RFLP workflow from DNA extraction to radio-autograph. Modified from Xu and Zhu (1994).
4. The DNA in the agarose gel is denatured using NaOH solution and the gel is then neutralized.
5. The DNA fragments are transferred to a nitrocellulose membrane using Southern blotting.
6. Probe visualization: the membrane-bound genomic DNA is probed by hybridization using a cloned fragment of the genome of interest or a genome from a relatively close species as the probe.
7. The membrane is washed to remove non-specifically hybridized DNA.
8. In most cases the sizes of the fragments are determined by radioactive methods. The probe-restriction enzyme combinations may identify two or more differently sized fragments. Polymorphism is revealed whenever the recognized fragments are of non-identical lengths.

Differences in size of restriction fragments are due to: (i) base pair changes that result in gain and loss of restriction sites; and (ii) insertions/deletions at the restriction sites within the restriction fragments on which the probe sequence is located.

Molecular probes are DNA fragments isolated and individualized by cloning or PCR amplification. They may originate from fragmented total genomic DNA and thus contain coding or non-coding sequences, unique or repeated, of nuclear or cytoplasmic origin. They may also be complementary DNA (cDNA). The standard procedure for developing genomic DNA probes is to digest total DNA with a methylation-sensitive enzyme (e.g. PstI), thereby enriching the library for single-copy sequences (Burr et al., 1988). Typically, the digested DNA is size fractionated on a preparative agarose gel. DNA fragments ranging from 500 to 2000 bp are excised and eluted for cloning into a plasmid vector (e.g. pUC18). Digests of the plasmids are screened for inserts and their lengths can be estimated. Southern blots of the inserts can be probed with total sheared genomic DNA to select clones that hybridize to single- and low-copy sequences and to eliminate clones that hybridize to medium- and high-copy repeated sequences. Single- and low-copy probes are screened for RFLPs among a sample of genotypes using genomic DNAs digested with restriction endonucleases (one per assay). Typically, in species with moderate to high polymorphism rates, two to four restriction endonucleases with hexanucleotide recognition sites are tested. EcoRI, EcoRV and HindIII are widely used. In species with low polymorphism rates, additional restriction endonucleases can be tested to increase the chance of finding a polymorphism. Both the theory and the techniques for RFLP analysis in plant genome mapping have been intensively reviewed (Botstein et al., 1980; Tanksley et al., 1988).
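
The arithmetic and the logic behind an RFLP can be illustrated with a small in-silico sketch. The first function reproduces the expected cutting frequency for a random-sequence genome; the others digest two toy "alleles" at the EcoRI recognition site (GAATTC, cut after the G) and report the length of the fragment detected by a probe. The sequences, the probe and the function names are invented for illustration, and the model ignores everything a real Southern blot involves beyond fragment length.

```python
def expected_fragments(genome_size, site_length, alphabet=4):
    """Average spacing between sites and expected fragment count for random sequence."""
    spacing = alphabet ** site_length
    return spacing, genome_size / spacing


def digest(seq, site, cut_offset):
    """Cut a linear sequence at every occurrence of `site`; return the fragments."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


def probed_lengths(fragments, probe):
    """Lengths of the fragments that contain the probe sequence."""
    return [len(fragment) for fragment in fragments if probe in fragment]


spacing, n_fragments = expected_fragments(10**9, 6)   # 4096 bases; about 244,000 fragments

ECORI = "GAATTC"                        # EcoRI recognition site, cut between G and AATTC
left, mid, right = "ACGT" * 5, "TTGCA" * 6, "CCGTA" * 4
allele_1 = left + ECORI + mid + ECORI + right      # two EcoRI sites
allele_2 = left + ECORI + mid + "GAGTTC" + right   # point mutation removes the second site
probe = "TTGCATTGCA"                               # hypothetical probe within `mid`

len_1 = probed_lengths(digest(allele_1, ECORI, 1), probe)   # [36]
len_2 = probed_lengths(digest(allele_2, ECORI, 1), probe)   # [61]
```

The single base change in the second allele removes one recognition site, so the same probe detects a 36-base fragment in one allele and a 61-base fragment in the other. This corresponds to cause (i) in the list of sources of fragment-size differences above.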
Most RFLP markers are co-dominant and locus specific. RFLP genotyping is highly
reproducible and the methodology is simple and requires no special
instrumentation. High-throughput markers (e.g. cleaved amplified polymorphic
sequence (CAPS) or insertion/deletion (indel) markers) can be developed from
RFLP probe sequences. The CAPS technique, also known as PCR-RFLP, consists of
digesting a PCR-amplified fragment with one or several restriction enzymes, and
detecting the polymorphism by the presence/absence of restriction sites
(Konieczny and Ausubel, 1993).

RFLP markers are powerful tools for comparative and synteny mapping. However,
RFLP analysis requires large amounts of high quality DNA, has low genotyping
throughput and is very difficult to automate. Most genotyping involves
radioactive methods so its use is limited to specific laboratories. RFLP probes
must be physically maintained and it is therefore difficult to share them
between laboratories. In addition, the level of RFLP is relatively low and
selection for polymorphic parental lines is a limiting step in the development
of a complete RFLP map.

RAPD

Williams et al. (1990) and Welsh and McClelland (1990) independently described
the utilization of a single, random-sequence oligonucleotide primer in a low
stringency PCR (35–45°C) for the simultaneous amplification of several discrete
DNA fragments referred to as random amplified polymorphic DNA (RAPD) and
arbitrary primed PCR (AP-PCR), respectively. Another related technique is DNA
amplification fingerprinting (DAF) (Caetano-Anollés et al., 1991). These methods
differ from one another in primer length, the stringency of the conditions and
the method of separation and detection of the fragments. They all can be used to
identify RAPD.

The principle of RAPD consists of a PCR on the DNA of the individual under
study using a short primer, usually ten nucleotides, of arbitrary sequence. The
primer, which binds to many different loci, is used to amplify random sequences
from a complex DNA template that is complementary to it (or includes a limited
number of mismatches). This means that the amplified fragments generated by PCR
depend on the length and size of both the primer and the target genome. Ten-base
oligomers of varying GC content (ranging from 40 to 100%) are usually used. If
two hybridization sites are close to one another (within about 3000 bp) and in
opposite directions, that is, in a configuration that will allow the PCR,
amplification will take place. The amplified products (of up to 3.0 kb) are
usually separated on agarose gels and visualized using ethidium bromide
staining. The use of a single 10-mer oligonucleotide promotes the generation of
several discrete DNA products and these are considered to originate from
different genetic loci. Polymorphisms result from mutations or rearrangements
either at or between the primer binding sites and are visible in conventional
agarose gel electrophoresis as the presence or absence of a particular RAPD
band. RAPDs predominantly provide dominant markers but homologous allele
combinations can sometimes be identified with the help of detailed pedigree
information.

RAPDs have several advantages and for this reason they are widely used (Karp
and Edwards, 1997). (i) Neither DNA probes nor sequence information is required
for the design of specific primers. (ii) The procedure does not involve blotting
or hybridization steps, thus making the technique quick, simple and efficient.
(iii) RAPDs require relatively small amounts of DNA (about 10 ng per reaction)
and the procedure can be automated; they are also capable of detecting higher
levels of polymorphism than RFLPs. (iv) Development of markers is not required
and the technology can be applied to virtually any organism with minimal initial
development. (v) The primers can be universal and one set of primers can be used
for any species. In addition, RAPD products of interest can be cloned, sequenced
and converted into other types of PCR-based markers such as sequence tagged
sites (STS), sequence characterized amplified regions (SCAR), etc.
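
The RAPD principle described in this section (a single arbitrary 10-mer amplifies wherever two of its binding sites lie in opposite orientation and close enough together, giving products of up to 3.0 kb that are scored as present or absent) can be sketched in silico as follows. This is an illustration under stated assumptions, not a model of a real reaction: the primer and templates are invented, only exact matches are considered (no mismatches), and the names `rapd_products`, `find_all` and `revcomp` are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq, motif):
    """Start positions of every (possibly overlapping) occurrence of motif."""
    hits, pos = [], seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def rapd_products(template, primer, max_len=3000):
    """Sizes of products expected from a single primer on a linear template.

    A product needs the primer sequence on the top strand (extension to the
    right) and, downstream, its reverse complement on the top strand (the
    primer annealing to the top strand, extension to the left).
    """
    forward = find_all(template, primer)
    reverse = find_all(template, revcomp(primer))
    sizes = []
    for f in forward:
        for r in reverse:
            size = r + len(primer) - f
            if r > f and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


PRIMER = "GGTGACGCAG"  # hypothetical 10-mer, 70% GC

# Invented templates: one with both sites, one with a mutated second site,
# one with the two sites too far apart to amplify.
with_band = "AT" * 10 + PRIMER + "AC" * 50 + revcomp(PRIMER) + "TA" * 10
site_mutated = "AT" * 10 + PRIMER + "AC" * 50 + "CTGCGTCTCC" + "TA" * 10
too_far = PRIMER + "AC" * 2000 + revcomp(PRIMER)

for name, template in [("with_band", with_band),
                       ("site_mutated", site_mutated),
                       ("too_far", too_far)]:
    print(name, rapd_products(template, PRIMER))
```

Only the first template yields a band; a point mutation in one binding site or an excessive distance between sites removes it, which is why RAPD polymorphisms are read as band presence versus absence and behave mostly as dominant markers.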
Reproducibility affects the way in which RAPD bands can be standardized for
comparison across laboratories, samples and trials and whether RAPD marker
information can be accumulated or shared. Due to frequently observed problems
with reproducibility of overall RAPD profiles and specific bands, this marker
class is often treated with reserve. In replication studies by Pérez et al.
(1998), mispriming error amounted to 60%. Several factors have been shown to
affect the number, size and intensity of bands. These include PCR buffers,
deoxynucleotide triphosphates (dNTPs), Mg2+ concentration, cycling parameters,
source of Taq polymerase, condition and concentration of template DNA and primer
concentration. Results obtained by RAPDs are highly prone to user error and
bands obtained can vary considerably between different runs of the same sample.
To correct the problems that may be encountered when carrying out RAPD-PCR, it
is important to bear in mind the following: (i) the concentration of DNA can
alter the number of bands; (ii) RAPD profiles vary depending on the Mg2+
concentration and the PCR buffer provided by Taq polymerase suppliers may or may
not contain Mg2+ ions; (iii) there are different sources of Taq polymerase and
there is great variation between profiles produced using Taq polymerase obtained
from different companies; (iv) there are a large number of alternative cycling
times and temperatures which are equally important and depend on the type of
machine used and even the wall thickness of the PCR tubes.

Generally if a PCR does not work there is likely to be something wrong with the
template DNA, primers, Taq polymerase or choice of conditions. Initially it is
important to try and repeat the PCR under the same conditions to ensure that
there was not a simple error that resulted in the failure. In addition it is
recommended that both positive and negative controls are included. A positive
control with a template known to amplify well will ensure that all reagents have
been added and that they are all functioning. A negative control without
template DNA will reveal any contamination. In most cases if the PCR does not
work and it is not clear what might be causing the problem, it is worth starting
from the beginning by disposing of all the reagents used and preparing fresh
ones. A careful experiment revealed that reproducibility could be improved and
Taberner et al. (1997) reported that 3396 out of 3422 bands (99.2%) were
reproducible.

On the other hand, low reproducibility is a major limitation of RAPD markers,
particularly in ongoing genetic and plant breeding programmes in which the
accumulated information, markers and marker data are shared between
laboratories and experiments. RAPD markers may still find their applications in
independent genetic diversity and phylogenetic studies that do not depend on
data sharing or accumulation. As RAPD markers can be converted into other types
of markers, they have a unique role in the development of target markers for
crop species that have limited molecular markers available to cover the whole
genome.

To overcome the problem associated with RAPD analysis, Paran and Michelmore
(1993) converted RAPD fragments into simple and robust PCR markers known as
SCARs. This procedure increases the reproducibility of RAPD markers and also
avoids the occurrence of non-homologous markers of equal molecular weight. These
specific markers are obtained by introducing RAPD bands (polymorphic) into
single markers which are then sequenced, and specific primers are designed,
usually by expanding the original decamer primer sequence with 10–15 bases so
that only the band of interest is amplified. In general, DNA can be isolated
from agarose gels, cloned and sequenced to produce the starting DNA template for
the development of a variety of PCR-based markers. The cloned and sequenced DNA
fragments can then be used for the development of CAPS, single strand
conformation polymorphism (SSCP) or SNP markers.

AFLP

Amplified fragment length polymorphism (AFLP; Zabeau and Vos, 1993; Vos et al.,
1995) is based on the selective PCR amplification of restriction fragments from
a total
with a complementary sequence for the rare cutter and the other with the
complementary sequence for the frequent cutter. In this way only fragments which
have been cut by the frequent cutter and rare cutter will be amplified. Primers
are designed from the known sequence of the adaptor, plus one to three selective
nucleotides which extend into the fragment sequence. Sequences not matching
these selective nucleotides in the primer will not be amplified so that the
specific amplification of only those fragments matching the primers is achieved.
The option to permutate the order of the selective bases and to recombine the
primers with each other will theoretically lead to the gradual collection of all
restriction fragments from a particular enzyme combination that is of a suitable
size for DNA fragment analysis from a genotype. The multiplex ratio of an AFLP
assay is a function of the number of selective nucleotides in the AFLP primer
combination, the selective nucleotide motif, GC content and physical genome size
and complexity. Typically, two selective nucleotides are used for species with
small genomes (1 × 10⁸ to 5 × 10⁸ bp), e.g. Arabidopsis thaliana L. (1 × 10⁸ bp)
and rice (Oryza sativa L.) (4 × 10⁸ bp), and three selective nucleotides are
used for species with large genomes (5 × 10⁸ to 6 × 10⁹ bp), e.g. maize,
soybean, sunflower and many others. It is theoretically possible to use several
tens of combinations of restriction enzymes at sites of four to six bases and a
large number of combinations of selective bases on the amplification primers.
Thus, as indicated by Falque and Santoni (2007), the restriction–amplification
combinations are nearly infinite.

AFLP products can be separated in high-resolution electrophoresis systems. The
number of bands produced can be manipulated by the number of selective
nucleotides and the nucleotide motifs used. A well-balanced number of amplified
restriction fragments ranges from 50 to 150 bp. A major improvement has been
made by switching from radioactive to fluorescent dye-labelled primers for the
detection of fragments in gel-based or capillary DNA sequencers in which
fluorescently labelled fragments pass the detector near the bottom of the
gel/end of the capillary, resulting in a linear spacing of DNA fragments and
therefore increasing the resolution over the whole size range (Schwarz et al.,
2000).

In general, AFLP assays can be carried out using relatively small DNA samples
(typically 1–100 ng per individual). AFLP has a very high multiplex ratio and
genotyping throughput and is relatively reproducible across laboratories. Simple
off-the-shelf technology can be applied to virtually any organism with no formal
marker development required and, in addition, a set of primers can be used for
different species. However, there are limitations to the AFLP assay. (i) The
maximum polymorphic information content for any bi-allelic marker is 0.5.
(ii) High quality DNA is needed to ensure complete restriction enzyme digestion.
Rapid methods for isolating DNA may not produce sufficiently clean template DNA
for AFLP analysis. (iii) Proprietary technology is needed to score heterozygotes
and ++ homozygotes, otherwise AFLPs must be dominantly scored. (iv) AFLP markers
often cluster densely in centromeric regions in species with large genomes, e.g.
barley (Qi et al., 1998) and sunflower (Gedil et al., 2001). (v) Developing
locus-specific markers from individual fragments can be difficult. (vi) AFLP
primer screening is often necessary to identify optimal primer specificities and
combinations; otherwise the assays can be carried out using off-the-shelf
technology. (vii) There are relatively high technical demands in AFLP analysis
including radio-labelling and skilled manpower. (viii) Marker development is
complicated and not cost-effective. (ix) Reproducibility is relatively low
compared to RFLP and simple sequence repeat (SSR) markers but better than RAPD
markers, as AFLP reveals large numbers of bands and not all the bands will be
comparable across laboratories or trials due to potential false positives, false
negatives and complicated gel backgrounds.

The AFLP technique can be modified so that one primer is obtained from a known
multi-copy sequence to detect sequence-specific amplification polymorphisms.
This approach was used successfully to generate
libraries enriched for one or more repeat motifs (although SSR-enriched
libraries can be commercially purchased) and the high start-up costs for
automated methods.

SNP

A single nucleotide polymorphism or SNP (pronounced snip) is an individual
nucleotide base difference between two DNA sequences. SNPs can be categorized
according to nucleotide substitution as either transitions (C/T or G/A) or
transversions (C/G, A/T, C/A or T/G). For example, sequenced DNA fragments from
two different individuals, AAGCCTA to AAGCTTA, contain a single nucleotide
difference. In this case there are two alleles: C and T. C/T transitions
constitute 67% of the SNPs observed in humans, and about the same rate was also
found in plants (Edwards et al., 2007a). In practice, single base variants in
cDNA (mRNA) are considered to be SNPs as are single base insertions and
deletions (indels) in the genome. As a nucleotide base is the smallest unit of
inheritance, SNPs provide the ultimate form of molecular marker.

For a variation to be considered a SNP, it must occur in at least 1% of the
population. SNPs make up about 90% of all human genetic variation and occur
every 100–300 bases. Two of every three SNPs involve the replacement of cytosine
(C) with thymine (T). This is supported by a genome-wide analysis in rice. A
polymorphism database constructed to define polymorphisms between cultivars
Nipponbare (from subspecies japonica) and 93-11 (from subspecies indica)
contains 1,703,176 SNPs and 479,406 indels (Shen et al., 2004), which equates to
approximately 1 SNP/268 bp in the rice genome. Using alignments of the improved
whole-genome shotgun sequences for japonica and indica rice, SNP frequencies
varied from 3 SNPs/kb in coding sequences to 27.6 SNPs/kb in the transposable
elements with a genome-wide measure of 15 SNPs/kb or 1 SNP/66 bp (Yu et al.,
2005). Based on partial genomic sequence information, SNP frequencies have been
revealed in many crops, including barley, soybean, sugarbeet, maize, cassava and
potato; typical SNP frequencies are also in the range of one SNP every
100–300 bp in plants (see Edwards et al., 2007a for a review).

SNPs may fall within coding sequences of genes, non-coding regions of genes or
in the intergenic regions between genes at different frequencies in different
chromosome regions. In Arabidopsis the distribution of SNPs was found to be even
across the five chromosomes with the exception of centromeric regions which
contain few transcribed genes (Schmid et al., 2003). SNPs within a coding
sequence will not necessarily change the amino acid sequence of the protein that
is produced due to redundancy in the genetic code. A SNP in which both forms
lead to the same polypeptide sequence is termed synonymous, while if a different
polypeptide sequence is produced they are non-synonymous. SNPs that are not in
protein coding regions may still have consequences for gene splicing,
transcription factor binding or the sequence of non-coding RNA. Of the 3–17
million SNPs found in the human genome, 5% are expected to occur within genes.
Therefore, each gene may be expected to contain 6 SNPs.

A variety of approaches have been adopted for discovery of novel SNPs in a wide
range of organisms including plants. These fall into three general categories
(Edwards et al., 2007b): (i) in vitro discovery, where new sequence data is
generated; (ii) in silico methods that rely on the analysis of available
sequence data; and (iii) indirect discovery, where the base sequence of the
polymorphism remains unknown. On the other hand, a large number of different SNP
genotyping methods and chemistries have been developed based on various methods
of allelic discrimination and detection platforms. A convenient method for
detecting SNPs is RFLP (SNP-RFLP) or by using the CAPS marker technique. If one
allele contains a recognition site for a restriction enzyme while the other does
not, digestion of the two alleles will give rise to fragments of different
length. A simple procedure is to analyse the sequence data stored in the
major databases and identify SNPs. Four alleles can be identified when the
complete base sequence of a segment of DNA is considered and these are
represented by A, T, G and C at each SNP locus in that segment.

Sobrino et al. (2005) assigned the majority of SNP genotyping assays to one of
four groups based on the molecular mechanisms: allele-specific hybridization,
primer extension, oligonucleotide ligation and invasive cleavage. These four are
described below. Chagné et al. (2007) added three methods to this list
(sequencing, allele-specific PCR amplification and DNA conformation methods) and
also generalized the enzymatic cleavage method to include the Invader assay as
well as dCAPS and targeting induced local lesions in genomes (TILLING).

1. Allele-specific hybridization (ASH), also known as allelic-specific
oligonucleotide hybridization, is based on distinguishing by hybridization
between two DNA targets differing at one nucleotide position (Wallace et al.,
1979). Allelic discrimination can be achieved using two allele-specific probes
labelled with a probe-specific fluorescent dye and a generic quencher that
reduces fluorescence in the intact probe. During amplification of the sequence
surrounding the SNP, probes complementary to the DNA target are cleaved by the
5' exonuclease activity of Taq polymerase. Spatial separation of the dye and
quencher results in an increase in probe-specific fluorescence which can be
detected with a plate reader.

Under optimized assay conditions, the SNP can be detected by the difference in
Tm of the two probe–template hybrids as only the perfectly matched probe–target
hybrids are stable and those with one-base mismatch are unstable. To increase
the reliability of SNP genotyping the probes should be as short as possible.
Originally, ASH used the dot blot format in which probes are hybridized to
membrane-bound genomic DNA or PCR fragments. However, the more advanced
PCR-based dynamic allele-specific hybridization (DASH) method uses a microtitre
plate format (Howell et al., 1999). Since one of the PCR primers is biotinylated
at the 5' end, the PCR products can be bound to streptavidin-coated wells and
denatured under alkaline conditions. An oligonucleotide probe complementary to
one allele is added to the single-strand target DNA molecules. The differences
in melting curves are measured by slowly heating and observing the changes in
fluorescence of a double-strand-specific, intercalating dye. The 5' nuclease or
TaqMan assay, molecular beacon and the scorpion assays are all examples of ASH
SNP genotyping technologies. Large-scale scanning of SNPs in a vast number of
loci using allele-specific hybridization can be carried out on high-density
oligonucleotide chips.

2. The Invader assay, also known as flap endonuclease discrimination, is based
on the specificity of recognition and cleavage by a three-dimensional flap
endonuclease which is formed when two overlapping oligonucleotides hybridize
perfectly to a target DNA (Lyamichev et al., 1999). The cleaved fragment may be
labelled with a probe-specific fluorescent dye which fluoresces following probe
cleavage due to spatial separation from the quencher. Alternatively, the flap
may act as the invader probe in a secondary reaction to amplify the fluorescent
signal (Invader squared) (Hall et al., 2000). Third Wave Technologies Inc.
(http://www.twt.com) has manufactured an Invader assay for flap endonuclease
discrimination which can be carried out in solid phase using
oligonucleotide-bound streptavidin-coated particles (Wilkins-Stevens et al.,
2001).

3. Primer extension is a term used to describe mini-sequencing, single-base
extension or the GOOD assay (Sauer et al., 2002). A popular method which was
designed specifically for genotyping SNPs is the mini-sequencing technique
(Syvänen, 1999; Syvänen et al., 1990). The method forms the basis of a number of
methods for allelic discrimination. The robust detection of known mutations
employs oligonucleotides which anneal immediately upstream of the query SNP and
are then extended by a single dideoxynucleotide triphosphate (ddNTP) in cycle
sequencing reactions. The fidelity of thermostable proof-reading DNA polymerases
guarantees that only the complementary ddNTP is incorporated. Several
[Figure 2.5: schematic of SNP genotyping options. Labels recoverable from the
image: allele-specific hybridization, oligonucleotide ligation assay, single
nucleotide primer extension, allele-specific PCR, allele-specific extend/ligate;
homogeneous, semi-homogeneous, solid phase microspheres, solid phase microarray,
capillary electrophoresis; fluorescence, fluorescence resonance energy transfer
(FRET), fluorescence polarization, mass spectrometry; Illumina BeadArray,
Luminex 100 Flow Cytometry, Sequenom iPlex Mass Spec., ABI SNPlex, microarray
minisequencing, ABI TaqMan 5'-Nuclease, ABI SNaPshot, DASH, Amplicon Tm,
Perkin-Elmer FP-TDI.]
Fig. 2.5. Chemistry, demultiplexing, detection options in SNP genotyping. From
Syvänen (2001) reprinted by permission from Macmillan Publishers Ltd.
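
As a brief aside on the SNP definitions given earlier in this section, the transition/transversion categories and the density conversions quoted for rice (for example 15 SNPs/kb, or 1 SNP/66 bp) can be checked with a short sketch. The function names `classify_snp`, `find_snps` and `bp_per_snp` are hypothetical, and the comparison assumes two pre-aligned sequences of equal length with no indels.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(base_1, base_2):
    """Transition = purine<->purine or pyrimidine<->pyrimidine; else transversion."""
    if base_1 == base_2:
        raise ValueError("identical bases are not a polymorphism")
    pair = {base_1, base_2}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def find_snps(seq_1, seq_2):
    """List (position, allele_1, allele_2, class) for each mismatching column."""
    if len(seq_1) != len(seq_2):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b, classify_snp(a, b))
        for i, (a, b) in enumerate(zip(seq_1, seq_2))
        if a != b
    ]


def bp_per_snp(snps_per_kb):
    """Convert a density in SNPs per kilobase into the average spacing in bp."""
    return 1000 / snps_per_kb


# The example fragments from the text: one C/T difference.
print(find_snps("AAGCCTA", "AAGCTTA"))

# Genome-wide rice figure from the text: 15 SNPs/kb.
print(bp_per_snp(15))
```

The two example fragments differ at a single position (index 4, counting from 0), which is a C/T transition, and 15 SNPs/kb corresponds to one SNP roughly every 66 to 67 bp, in line with the 1 SNP/66 bp quoted above.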
LIGHT DETECTION. Pyrosequencing involves hybridization of a sequencing primer to
a single stranded template and sequential addition of individual dNTPs.
Incorporation of a dNTP into a primer releases pyrophosphate which triggers a
luciferase-catalysed reaction. The genotype of a SNP is determined by the
sequential addition (and degradation) of nucleotides. The light produced is
detected by a charge coupled device camera and each light signal is proportional
to the number of nucleotides incorporated (http://www.pyrosequencing.com), for
which reason pyrosequencing is suitable for the quantitative estimation of
allele frequencies in pooled DNA samples. Furthermore, pyrosequencing proved to
be an appropriate method for genotyping SNPs in polyploid plant genomes such as
potato because all possible allelic states of binary SNP could be accurately
distinguished (Rickert et al., 2002).

There are various SNP detection systems which differ in their chemistry,
detection platform, multiplex level and application; some of these will be
discussed below. The reader is also referred to Bagge and Lübberstedt (2008) for
further information.

The TaqMan SNP Genotyping Assay (Applied Biosystems, Foster City, USA) is a
single-tube PCR assay that exploits the 5' exonuclease activity of AmpliTaq Gold
DNA polymerase. The assay kit includes two locus-specific PCR primers that flank
the SNP of interest and two allele-specific oligonucleotide TaqMan probes. These
probes have a fluorescent reporter dye at the 5' end and a non-fluorescent
quencher with a minor groove binder at the 3' end. Upon cleavage by the 5'
exonuclease activity of Taq polymerase during PCR, the reporter dye will
fluoresce as it is no longer quenched and the intensity of the emitted light can
be measured. Modified probes such as locked nucleic acids, a modified nucleic
acid analogue, showed better hybridization properties than standard TaqMan
probes (Kennedy et al., 2006). TaqMan is a simple assay, since all the reagents
are added to the microtitre well at the same time in a 96- or 384-well format.
Although the assay can be carried out at the monoplex or duplex level, the
multiple steps can be assembled and automated so that one laboratory technician
can produce 10,000 data points per day. The TaqMan platform is highly suitable
for genetically modified organism tests and MAS using a few markers for a large
number of samples.

The SNaPshot Multiplex Assay (Applied Biosystems, Foster City, USA) is based on
mini-sequencing, i.e. a single-base extension using fluorescent labelled ddNTPs.
The system's multiplex ready reaction mix enables robust multiplex SNP
interrogation of PCR-generated templates. Multiplexing can be accomplished by
representing multiple SNP products spatially. This is achieved by tailing the 5'
end of the unlabelled SNaPshot primers with different lengths of
non-complementary oligonucleotide sequences that serve as mobility modifiers.
The reactions may be carried out in 5- to 10-plex using capillary
electrophoresis for data detection in a 96-well format so that one individual
can generate over 10,000 data points per day. SNaPshot is suitable for MAS of
several traits simultaneously and if multiple sets of 10-plex are combined, it
can be used for rough mapping and marker-assisted backcrossing with several
hundreds of samples and markers involved.

The SNPlex Genotyping System (Applied Biosystems, Foster City, USA) uses OLA/PCR
technology for allelic discrimination and ligation product amplification.
Genotype information is then encoded into a universal set of dye-labelled,
mobility modified fragments known as Zipchute Mobility Modifiers, for rapid
detection by capillary electrophoresis. The same set of Zipchute Mobility
Modifiers can be used for every SNPlex pool regardless of which SNPs are chosen.
The SNPlex System allows for multiplexed genotyping of up to 48 SNPs
simultaneously against a single sample with the ability to detect up to 4500
SNPs in parallel in 15 min. This integrated system delivers cost-efficient,
medium- to high-throughput genotyping and is suitable for various genetic and
breeding applications including fingerprinting, gene mapping and MAS for both
foreground and
background. Both SNaPshot and SNPlex can be used with capillary electrophoresis
systems as the genotyping platform which can also be used for SSR genotyping.

MassARRAY iPLEX Gold (SEQUENOM, San Diego, USA) combines the benefits of the
simple and robust single-base primer extension biochemistry with the sensitivity
and accuracy of MALDI-TOF mass spectrometry (see Chapter 3) detection. It uses a
single termination mix and universal reaction conditions for all SNPs. The
primer is extended, dependent upon the template sequence, resulting in an
allele-specific difference in mass between extension products. The assays can be
multiplexed up to 40 SNPs in a 384-well format allowing for throughput levels of
up to 150,000 genotypes per instrument per day. MassARRAY is flexible and
suitable for generating both small and large marker numbers for each sample so
that it can be used for a variety of genetic and breeding purposes.

There are two major chip-based high-throughput genotyping systems, DNA
microarrays developed by Affymetrix (Santa Clara, USA) and a high-density
biochip assay by Illumina Inc. (San Diego, USA), both of which offer different
levels of multiplexing, up to several thousand plexes or more (Yan et al.,
2009). As an increasing number of sets of these chips become available,
outsourced genotyping through companies or service centres becomes one of the
options for genotyping large numbers of samples using the same set of markers
(e.g. fingerprinting) to achieve high efficiency and low cost per data point.

THE FUTURE OF SNP TECHNOLOGY. A key technical obstacle in the development of
microarray-based methods for genome-wide SNP genotyping is the PCR amplification
step which is required to reduce the complexity and improve the sensitivity of
genotyping SNPs in large, diploid genomes. The level of complexity that can be
achieved in PCR does not match that of current microarray-based methods thus
making PCR the limiting step in these assays (Syvänen, 2005). Highly multiplexed
microarray systems have recently been developed by combining well-known reaction
principles for DNA amplification and SNP genotyping.

Identification of a specific single-base change among up to billions of bases
that constitute a plant species is a challenging task. PCR offers a means of
reducing the complexity of a genome and increasing the copy number of the DNA
templates to the levels required for the specific and sensitive detection of
single-base changes. However, the design of robust PCR assays with multiplexing
levels exceeding 10–20 amplicons has proven to be more difficult than initially
anticipated because in multiplex PCR the number of undesired interactions
between the PCR primers increases exponentially as the number of primers
included in the reaction mixture increases. This interaction usually results in
preferential amplification of unwanted primer–dimer artefacts instead of the
intended DNA templates (amplicons). Another problem in multiplex PCR is the
sequence-dependent differences in PCR efficiency between the amplicons. The
problems of multiplexing can be reduced to some extent by using PCR primers that
are as similar to one another as possible. The multiplexing level that can be
readily achieved in standard PCRs is less than that offered by current
technology for producing high-density DNA microarrays. Simultaneous analysis of
a reasonable amount of genomic DNA with the current detection sensitivity of
microarray scanners requires an amplification step. The PCR step complicates the
molecular reactions underlying the assays and introduces multiple laboratory
steps into the procedures and is therefore the chief obstacle to highly
multiplexed SNP genotyping.

Diversity array technology

Diversity array technology (DArT) is a novel type of DNA marker which employs a
microarray hybridization-based technique developed by CAMBIA
(http://www.diversityarrays.com) that enables the simultaneous genotyping of
several hundred polymorphic loci spread over the genome (Jaccoud et al., 2001;
Wenzl et al., 2004). DArT can be
[Figure 2.6: two-panel flow diagram. Labels recoverable from the image:
(A) Gx, Gy, Gn: DNAs of interest. (B) Gx, Gy: choose two genomes to analyse;
same complexity reduction as used to make the diversity panel; hybridize to
chip.]
Fig. 2.6. Procedure of diversity array technology (DArT). (A) Preparing the
array. RE, restriction enzyme. (B) Genotyping a sample.
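
Since the readout of a DArT assay is hybridization signal per arrayed clone, dominant (present versus absent) scoring can be pictured with a very small sketch. Everything specific here is an assumption made for illustration: the relative signal values are invented, the 0.5 threshold is arbitrary, and the function names are hypothetical; this is not how any particular DArT analysis package calls genotypes.

```python
def score_dominant(signal, threshold=0.5):
    """Score one clone as present (1) or absent (0) from a relative signal."""
    return 1 if signal >= threshold else 0


def fingerprint(signals, threshold=0.5):
    """Convert per-clone signals for one genotype into a 0/1 fingerprint."""
    return [score_dominant(s, threshold) for s in signals]


def mismatch_fraction(fp_x, fp_y):
    """Fraction of clones scored differently between two fingerprints."""
    if len(fp_x) != len(fp_y):
        raise ValueError("fingerprints must cover the same clones")
    return sum(a != b for a, b in zip(fp_x, fp_y)) / len(fp_x)


# Invented relative hybridization signals for five clones in two genotypes.
signals_gx = [0.91, 0.08, 0.77, 0.12, 0.85]
signals_gy = [0.88, 0.81, 0.05, 0.10, 0.90]

fp_gx = fingerprint(signals_gx)
fp_gy = fingerprint(signals_gy)
print(fp_gx, fp_gy, mismatch_fraction(fp_gx, fp_gy))
```

Clones that differ between the two fingerprints are the candidate polymorphic markers; clones scored identically in every genotype carry no information for fingerprinting.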
DArT markers are biallelic and behave in a dominant (present versus absent) or
co-dominant (two doses versus one dose versus absent) manner. DArT detects
single-base changes as well as indels. It is a good alternative to currently
used techniques including RFLP, AFLP, SSR and SNP in terms of cost and speed of
marker discovery and analysis for whole-genome fingerprinting. It is
cost-effective, sequence-independent, non-gel based technology that is amenable
to high-throughput automation and the discovery of hundreds of high quality
markers in a single assay. An open source software package, DArTsoft, is
available for automatic data extraction and analysis. The weaknesses of this
technology include marker dominance and its technically demanding nature. Also
there is some concern as to whether DArT markers are randomly distributed across
the whole genome, as DArT markers in barley appear to have a moderate tendency
to be located in hypomethylated, gene-rich regions in distal chromosome areas
(Wenzl et al., 2006).

DArT technology has been successfully developed for Arabidopsis, cassava,
barley, rice, wheat, sorghum, ryegrass, tomato and pigeon pea, while work is in
progress to establish DArT in chickpea, sugarcane, lupins, quinoa, banana and
coconut (http://www.diversityarrays.com). For example, a genetic map with 385
unique DArT markers spanning the 1137 cM barley genome (Wenzl et al., 2004) was
constructed, DArT markers along with AFLP and SSR markers were mapped on the
wheat genome (Semagn et al., 2006), and a cassava DArT genotyping array
containing approximately 1000 polymorphic clones (Xia, L. et al., 2005) is now
available.

Genic and functional markers

DNA markers can be classified into random markers (RMs) (also known as anonymous
or neutral markers), gene targeted markers (GTMs) (also known as candidate gene
markers) and functional markers (FMs) (Anderson and Lübberstedt, 2003). RMs are
derived at random from polymorphic sites across the genome whereas GTMs are
derived from polymorphisms within genes. FMs are derived from polymorphic sites
within genes that are causally associated with phenotypic trait variation and
are superior to RMs as a result of their complete linkage with trait locus
alleles and functional motifs (Anderson and Lübberstedt, 2003). The major
drawback of the RMs is that their predictive value depends on the known linkage
phase between marker and target locus alleles (Lübberstedt et al., 1998b).

Genetic diversity at or below the species level has mostly been characterized by
molecular markers that more or less randomly sampled genetic variation in the
genome. RM is a very effective tool among others for the establishment of a
breeding system, the study of gene flow among natural populations, and the
determination of the genetic structure of GeneBank collections (Chapter 5; Xu
et al., 2005). RM systems are still the systems of choice for marker-assisted
breeding (Xu, Y., 2003). However, users of biodiversity are often not interested
in random variation but rather in variation that might affect the evolutionary
potential of a species or the performance of an individual genotype. Such
functional variation can be tagged with neutral molecular markers using
quantitative trait loci (QTL) and linkage disequilibrium mapping approaches.
Alternatively, DNA-profiling techniques may be used that specifically target
genetic variation in functional parts of the genome.

GENIC MARKERS. A wealth of DNA sequence information from many fully
characterized genes and full-length cDNA clones has been generated and deposited
in online databases for an increasing number of plant species and the sequence
data for ESTs, genes and cDNA clones can be downloaded from GenBank and scanned
for identification of SSRs. Subsequently, locus-specific primers flanking EST-
or genic SSRs can be designed to amplify the microsatellite loci present in the
genes. In maize for example, gene-derived SSR markers that have been developed
from genes and their primer sequences are available at www.maizeGDB.org. Genic
SSRs have some intrinsic advantages over
genomic SSRs because they can be obtained quickly by electronic sorting, are present in expressed regions of the genome and are expected to be transferable across species (when the primers are designed from more conserved coding regions; Varshney et al., 2005a). The potential use of EST-SSRs developed for barley and wheat has been demonstrated for comparative mapping in wheat, rye and rice (Yu et al., 2004; Varshney et al., 2005a). These studies suggested that EST-SSR markers could be used in related species for which little information is available on SSRs or ESTs. In addition, the genic SSRs are good candidates for the development of conserved orthologous markers for the genetics and breeding of different species. For example, a set of 12 barley EST-SSRs was identified that showed significant homology with the ESTs of four monocotyledonous species (wheat, maize, sorghum and rice) and two dicotyledonous species (Arabidopsis and Medicago) which could potentially be used across these species (Varshney et al., 2005a).
Kumpatla and Mukopadhyay (2005) examined the abundance of SSR in more than 1.54 million ESTs belonging to 55 dicotyledonous species. They found that the frequency of ESTs containing SSR among species ranged from 2.65 to 16.82%, with dinucleotide repeats being most abundant followed by tri- or mononucleotide repeats, thus demonstrating the potential of in silico mining of ESTs for the rapid development of SSR markers for genetic analysis and application to dicotyledonous crops. Although EST-SSRs produce high quality markers, these are often less polymorphic than genomic SSRs (Cho et al., 2000; Eujayl et al., 2002; Thiel et al., 2003). EST resources are also being used to mine SNPs (Picoult-Newberg et al., 1999; Kota et al., 2003). ESTs provide a quantitative method of measuring specific transcripts within a cDNA library and represent a powerful tool for gene discovery, gene expression, gene mapping and the generation of gene profiles. The National Center for Biotechnology Information (NCBI) database, dbEST 0900409 (http://www.ncbi.nlm.nih.gov/dbEST_summary.html) contains the largest collection of ESTs in rice, wheat, barley, maize, soybean, sorghum and potato.
Novel markers can be developed from the transcriptome and specific genes. As summarized by Gupta and Rustgi (2004), these include EST polymorphisms (developed using EST databases); conserved orthologue set markers (developed by comparing the sequences of target genomes with sequences of the closely related species); amplified consensus genetic markers (based on the known genes from model species); gene-specific tags (with primers designed using gene sequences); resistance gene analogues (with primers designed to identify consensus domains conferring resistance); exon-retrotransposon amplification polymorphism (with primers designed to combine with a long terminal repeat retrotransposon-specific primer or a randomly selected microsatellite-containing oligonucleotide); and PCR-based markers targeting exons, introns and promoter regions of known genes with high specificity.
Target region amplification polymorphism (TRAP) markers are derived from a rapid and efficient PCR-based technique which uses bioinformatics tools and EST database information to generate polymorphic markers around targeted candidate gene sequences (Hu and Vick, 2003). This TRAP technique uses two primers of 18 nucleotides to generate markers. TRAP markers are amplified by one fixed primer designed from a target EST sequence in the database and a second primer of arbitrary sequence except for AT- or GC-rich cores that anneal with introns and exons, respectively. The TRAP technique should be useful in genotyping germplasm collections and in tagging genes with beneficial traits in crop plants.

FUNCTIONAL MARKERS. Functional markers (FMs) are derived from polymorphic sites within genes causally affecting phenotypic variation. The development of FMs requires allele-specific sequences of functionally characterized genes from which polymorphic, functional motifs affecting plant phenotype can be identified. Some theoretical and application issues relevant to functional markers in wheat have been addressed (Bagge et al., 2007; Bagge and Lübberstedt, 2008).
Genomic coverage: Low copy coding region | Whole genome | Whole genome | Whole genome | Whole genome
Amount of DNA required: 5010 g | 1-100 ng | 1-100 ng | 50-120 ng | 50 ng
Quality of DNA required: High | Low | High | Medium high | High
Type of polymorphism: Single base changes, indels | Single base changes, indels | Single base changes, indels | Changes in length of repeats | Single base changes, indels
Level of polymorphism: Medium | High | High | High | High
Effective multiplex ratio: Low | Medium | High | High | Medium to high
Inheritance: Co-dominant | Dominant | Dominant/co-dominant | Co-dominant | Co-dominant
Type of probes/primers: Low copy DNA or cDNA clones | Usually 10 bp random nucleotides | Specific sequence | Specific sequence | Allele-specific PCR primers
Technically demanding: High | Low | Medium | Low | High
Radioactive detection: Usually yes | No | Usually yes | Usually no | No
Reproducibility: High | Low to medium | High | High | High
Time demanding: High | Low | Medium | Low | Low
Automation: Low | Medium | High | High | High
Development/start-up cost: High | Low | Medium | High | High
Proprietary rights required: No | Yes and licensed | Yes and licensed | Yes and some licensed | Yes and some licensed
Suitable utility in diversity, genetics and breeding: Genetics | Diversity | Diversity and genetics | All purposes | All purposes
telophase and cytokinesis, two new daughter cells are formed. Each of these daughter cells has half the chromosomes (n) of the parental cell (2n). The second meiotic division closely resembles mitosis with each of the nuclei generated during the first meiotic division splitting to form two more nuclei. Thus, four haploid gametes are produced.
Crossing over is the process by which homologous chromosomes exchange portions of their chromatids during meiosis, resulting in new combinations of genetic information and thus affecting inheritance and increasing genetic diversity. Genes that are present together on the same chromosome tend to be inherited together and are referred to as linked. Genes that are normally linked may be inherited independently during crossing over.
The proportion of recombinant gametes depends on the rate of crossover during meiosis and is known as the recombination frequency (r). The maximum proportion of recombinant gametes is 50% and in this case crossover between two genetic loci has occurred in all the cells. This is equivalent to the case of non-linked genes, i.e. the two loci are inherited independently.
The recombination frequency depends on the rate of crossovers which in turn depends on the linear distance between two genetic loci. Recombination frequencies range from 0 (complete linkage) to 0.5 (complete independent inheritance).

2.2.2 Genetic linkage mapping

In order to utilize the genetic information provided by molecular markers more efficiently, it is important to know the locations and relative positions of molecular markers on chromosomes. The construction of genetic linkage maps using molecular markers is based on the same principles as those used in the preparation of classical genetic maps: selection of molecular markers and genotyping system; selection of parental lines from the germplasm collection that are highly polymorphic at marker loci; development of a population or its derived lines with an increased number of molecular markers in the segregated population; genotyping each individual/line using molecular markers; and constructing linkage maps from the marker data.
The recombination frequency between two linked genetic markers can be defined in units of genetic distance known as centiMorgans (cM) or map units. If two markers are found to be separated in one of 100 progeny, those two markers are 1 cM apart. However, 1 cM does not always correspond to the same length of physical distance or the same amount of DNA. The amount of DNA per cM is referred to as the physical to genetic distance. Areas in the genome where recombination is frequent are known as recombination hot spots; there is relatively little DNA per cM in these hot spots and it can be as low as 200 kb/cM. In other areas recombination may be suppressed and 1 cM will represent more DNA and in some regions the physical to genetic distance can be up to 1500 kb/cM.

Developing mapping populations

In population development, several factors should be taken into consideration including the selection of parental lines and population types and the determination of population size.

CHOICE OF PARENTAL LINES. Four factors should be considered in selecting appropriate parental lines (Xu and Zhu, 1994):
1. DNA polymorphism: genetic polymorphism between parental lines usually depends on how closely related they are, which can be determined by criteria such as geographical distribution and morphological and isozyme polymorphisms. In general, DNA polymorphism is greater in open-pollinated species than in self-pollinated species. For example, RFLP polymorphism is very high among maize lines so that a population derived from any two inbred lines would be desirable for RFLP mapping. Genetic polymorphism is very low in tomato so that only interspecific populations are sufficiently polymorphic to allow for RFLP
Fig. 2.8. Average and maximum distance (cM, vertical axis) expected between markers on a linkage map depending on the number of random markers mapped (horizontal axis, 0 to 2400) for a genome with 1200 cM, e.g. 12 chromosomes of 100 cM each. The maximum distance curve is for the 95% confidence level. From Tanksley et al. (1988) with kind permission of Springer Science and Business Media.
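The trend in Fig. 2.8 can be reproduced approximately by simulation. The sketch below is illustrative only and is not the calculation of Tanksley et al. (1988): it assumes that markers fall uniformly at random on 12 chromosomes of 100 cM each, treats chromosome ends as gap boundaries, and takes the 95th percentile of the largest gap over repeated draws. The function names, the number of replicates and the seed are arbitrary choices.

```python
import random


def average_spacing(genome_cm, n_markers):
    """Average distance between markers if they were evenly spread."""
    return genome_cm / n_markers


def simulate_max_gap(n_markers, n_chrom=12, chrom_cm=100.0, reps=200, seed=1):
    """95th percentile of the largest marker-free interval (assumed model)."""
    rng = random.Random(seed)
    max_gaps = []
    for _ in range(reps):
        # Drop each marker on a random chromosome at a random position.
        positions = [[] for _ in range(n_chrom)]
        for _ in range(n_markers):
            positions[rng.randrange(n_chrom)].append(rng.uniform(0.0, chrom_cm))
        worst = 0.0
        for chrom in positions:
            # Chromosome ends count as boundaries of the first and last gap.
            points = [0.0] + sorted(chrom) + [chrom_cm]
            worst = max(worst, max(b - a for a, b in zip(points, points[1:])))
        max_gaps.append(worst)
    max_gaps.sort()
    return max_gaps[int(0.95 * (reps - 1))]


for n in (300, 600, 1200):
    print(n, average_spacing(1200.0, n), round(simulate_max_gap(n), 1))
```

For 300 markers the average spacing is 1200/300 = 4 cM, while the simulated largest gap is several times larger, which is the contrast the two curves in the figure illustrate.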
Population: F2 | BC | DH (RIL)
Co-dominance: 1 M1M1 : 2 M1M2 : 1 M2M2 | 1 M1M2 : 1 M2M2 | 1 M1M1 : 1 M2M2
M1 is dominant: 3 M1_ : 1 M2M2 | 1 M1M2 : 1 M2M2 | 1 M1M1 : 1 M2M2
F2 gamete frequency: M1N1 (1 - r)/2; M1N2 r/2; M2N1 r/2; M2N2 (1 - r)/2
Fig. 2.10. Theoretical ratios in an F2 population derived from two parents M1M1N1N1 and M2M2N2N2 with
recombinant frequency r.
Fig. 2.11. Genotypes and their frequencies for three linkage combinations at two loci in F2 populations
(each frequency divided by 4).
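The expected ratios summarized in Figs 2.10 and 2.11 are what a chi-square test of linkage compares the observed counts against. A minimal sketch for the (1:2:1)-(1:2:1) case follows, using the nine counts of the data example in Fig. 2.12. It assumes the classes n1 to n9 are laid out row by row in a 3 x 3 table (rows are the genotypes at locus M, columns the genotypes at locus N), which is the layout implied by the sums in that example; the function name is hypothetical. The critical values are those quoted in the text.

```python
def partition_chi_square(counts):
    """Split the total chi-square of a 3 x 3 F2 table into M, N and linkage parts."""
    ratio = (1, 2, 1)  # expected 1:2:1 segregation at each locus
    n = sum(sum(row) for row in counts)

    def chi_square(observed, expected):
        return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Expected cell counts if the two loci segregate independently.
    cells = [counts[i][j] for i in range(3) for j in range(3)]
    cell_exp = [n * ratio[i] * ratio[j] / 16 for i in range(3) for j in range(3)]

    m_obs = [sum(row) for row in counts]                         # row totals
    n_obs = [sum(row[j] for row in counts) for j in range(3)]    # column totals
    marginal_exp = [n * w / 4 for w in ratio]

    chi_t = chi_square(cells, cell_exp)
    chi_m = chi_square(m_obs, marginal_exp)
    chi_n = chi_square(n_obs, marginal_exp)
    return {"total": chi_t, "M": chi_m, "N": chi_n,
            "linkage": chi_t - chi_m - chi_n}


# Counts n1..n9 from the data example (n = 132).
fig_2_12 = [[27, 6, 1], [5, 56, 4], [0, 3, 30]]
result = partition_chi_square(fig_2_12)

# Critical values at the 0.05 level as quoted in the text.
critical = {"total": 15.5, "M": 5.99, "N": 5.99, "linkage": 9.49}
for name, value in result.items():
    print(name, round(value, 3), "significant" if value > critical[name] else "n.s.")
```

Summing (O - E)^2/E over classes is algebraically the same as the sum of O^2/E minus n used in the worked example, so the two forms give identical values.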
\[ \chi^2_T = \frac{4}{n}\{n_5^2 + 2(n_2^2 + n_4^2 + n_6^2 + n_8^2) + 4(n_1^2 + n_3^2 + n_7^2 + n_9^2)\} - n, \qquad df_T = 8 \]

\[ \chi^2_M = \frac{4}{3n}\left((n_1 + n_3 + n_5)^2 + 3(n_2 + n_4 + n_6)^2\right) - n, \qquad df_M = 1 \]

\[ \chi^2_N = \frac{2}{n}\left(2(n_1 + n_2)^2 + (n_3 + n_4)^2 + 2(n_5 + n_6)^2\right) - n, \qquad df_N = 2 \]

\[ \chi^2_L = \chi^2_T - \chi^2_M - \chi^2_N, \qquad df_L = 2 \]

For linkage combination (3:1)-(3:1)

\[ \chi^2_M = \frac{1}{3n}(n_1 + n_2 - 3n_3 - 3n_4)^2, \qquad df_M = 1 \]

\[ \chi^2_T = \frac{4}{132}\{56^2 + 2(6^2 + 5^2 + 4^2 + 3^2) + 4(27^2 + 1^2 + 0^2 + 30^2)\} - 132 = 165.818 \]

\[ \chi^2_M = \frac{2}{132}\{2(27 + 6 + 1)^2 + (5 + 56 + 4)^2 + 2(0 + 3 + 30)^2\} - 132 = 0.045 \]

\[ \chi^2_N = \frac{2}{132}\{2(27 + 5 + 0)^2 + (6 + 56 + 3)^2 + 2(1 + 4 + 30)^2\} - 132 = 0.167 \]

\[ \chi^2_L = 165.818 - 0.045 - 0.167 = 165.606 \]

Fig. 2.12. Data example used for test of linkage for (1:2:1)-(1:2:1) in an F2 population.

We have

\[ \chi^2_T > \chi^2_{0.05(8)} = 15.5, \quad \chi^2_M < \chi^2_{0.05(2)} = 5.99, \quad \chi^2_N < \chi^2_{0.05(2)} = 5.99, \quad \chi^2_L > \chi^2_{0.05(4)} = 9.49 \]

which indicates that both loci M and N show normal Mendelian segregation and are linked.

We have $p_1$ (M1_N1_) = $(3 - 2r + r^2)/4$, $p_2$ (M1_N2N2) = $p_3$ (M2M2N1_) = $(2r - r^2)/4$, $p_4$ (M2M2N2N2) = $(1 - 2r + r^2)/4$, and $\sum p_i = 1$. Considering the number of individuals observed for each category, $n_1$, $n_2$, $n_3$ and $n_4$, and $\sum n_i = n$, they have a probability distribution of $(p_1 + p_2 + p_3 + p_4)^n$. For a specific set of observations ($n_1$, $n_2$, $n_3$ and $n_4$), the likelihood function is:

\[ L(r) = \frac{n!}{n_1!\,n_2!\,n_3!\,n_4!}\,(p_1)^{n_1}(p_2)^{n_2}(p_3)^{n_3}(p_4)^{n_4} = \frac{n!}{n_1!\,n_2!\,n_3!\,n_4!}\,(1/4)^n (3 - 2r + r^2)^{n_1}(2r - r^2)^{n_2 + n_3}(1 - 2r + r^2)^{n_4} \]

The natural logarithm of L(r) is called support or log-likelihood. Here we have
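\[ \ln L(r) = C + n_1 \ln(3 - 2r + r^2) + (n_2 + n_3)\ln(2r - r^2) + n_4 \ln(1 - 2r + r^2) \]

where

\[ C = \ln\frac{n!}{n_1!\,n_2!\,n_3!\,n_4!} + n\ln(1/4) \]

is a constant.

The first derivative is the slope of a function, and the slope is zero at a maximum or minimum (global or local). The derivative with respect to r is therefore set to zero:

\[ \frac{d \ln L(r)}{dr} = 0 \]

The derivative of ln L(r) is usually denoted as the score, S:

\[ S = -n_1\frac{2(1 - r)}{3 - 2r + r^2} + (n_2 + n_3)\frac{2(1 - r)}{2r - r^2} - n_4\frac{2(1 - r)}{1 - 2r + r^2} = 0 \]

That is

\[ -\frac{n_1}{3 - 2r + r^2} + \frac{n_2 + n_3}{2r - r^2} - \frac{n_4}{1 - 2r + r^2} = 0 \]

\[ -\frac{n_1}{2 + (1 - r)^2} + \frac{n_2 + n_3}{1 - (1 - r)^2} - \frac{n_4}{(1 - r)^2} = 0 \]

If $(1 - r)^2 = k$, then

\[ -\frac{n_1}{2 + k} + \frac{n_2 + n_3}{1 - k} - \frac{n_4}{k} = 0 \]

therefore

\[ k = \frac{(n_1 - 2n_2 - 2n_3 - n_4) \pm \sqrt{(n_1 - 2n_2 - 2n_3 - n_4)^2 + 8 n n_4}}{2n} \]

and the MLE is

\[ \hat{r} = 1 - \sqrt{k} \]

According to the Rao-Cramér inequality, the sampling variance of $\hat{r}$ is given by

\[ \frac{1}{V_r} = -E\left\{\frac{d^2[\ln L(r)]}{dr^2}\right\} = I \]

where $d^2[\ln L(r)]/dr^2$ is the second derivative:

\[ \frac{d^2[\ln L(r)]}{dr^2} = -\sum_i \frac{n_i}{p_i^2}\left(\frac{dp_i}{dr}\right)^2 + \sum_i \frac{n_i}{p_i}\frac{d^2 p_i}{dr^2} \]

\[ E\left\{\frac{d^2[\ln L(r)]}{dr^2}\right\} = -n\sum_i \frac{1}{p_i}\left(\frac{dp_i}{dr}\right)^2 + n\sum_i \frac{d^2 p_i}{dr^2} = -n\sum_i \frac{1}{p_i}\left(\frac{dp_i}{dr}\right)^2 \]

Because

\[ \sum_i \frac{d^2 p_i}{dr^2} = \frac{d^2}{dr^2}\sum_i p_i = 0, \]

\[ \frac{1}{V_r} = n\sum_i \frac{1}{p_i}\left(\frac{dp_i}{dr}\right)^2 = n\sum_i i_i = I \]

where I is the total information content and $\sum_i i_i = I/n$ is the information derived from a single observation.

From the above formula, the variance of r can be calculated using the information provided in Table 2.3.

To estimate k, the values of $n_i$ listed in the table are used in the formula, taking the positive root:

\[ k = \frac{1927 + \sqrt{1927^2 + 8 \times 6952 \times 1338}}{2 \times 6952} = 0.7743 \]

\[ \hat{r} = 1 - \sqrt{0.7743} = 0.1201 \]

\[ V_r = 1.76702 \times 10^{-5} \]

Thus,

\[ \hat{r} = 0.1201 \pm \sqrt{1.76702 \times 10^{-5}} = 12.01\% \pm 0.42\% \]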
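The estimate and its standard error can be checked numerically. The sketch below assumes the (3:1)-(3:1) F2 model of this section, in which the score equation -n1/(2 + k) + (n2 + n3)/(1 - k) - n4/k = 0 with k = (1 - r)^2 rearranges to the quadratic n k^2 - (n1 - 2 n2 - 2 n3 - n4) k - 2 n4 = 0, and it uses the per-observation information k/(2 + k) + 2k/(1 - k) + 1. The function name is hypothetical and the counts are those of Table 2.3.

```python
import math


def mle_recombination_f2_dominant(n1, n2, n3, n4):
    """MLE of r and its sampling variance for two dominant loci in an F2."""
    n = n1 + n2 + n3 + n4
    b = n1 - 2 * n2 - 2 * n3 - n4
    # Positive root of n*k^2 - b*k - 2*n4 = 0, where k = (1 - r)^2.
    k = (b + math.sqrt(b * b + 8 * n * n4)) / (2 * n)
    r_hat = 1 - math.sqrt(k)
    # Information from a single observation, summed over the four classes
    # (undefined if k reaches 1, i.e. when no recombinant classes are seen).
    info_single = k / (2 + k) + 2 * k / (1 - k) + 1
    variance = 1 / (n * info_single)
    return r_hat, variance


r_hat, variance = mle_recombination_f2_dominant(4831, 390, 393, 1338)
print(f"r = {r_hat:.4f}, V_r = {variance:.4e}, SE = {math.sqrt(variance):.4f}")
```

Only the positive root of the quadratic is used because k must lie between 0 and 1.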
Table 2.3. Calculation of the variance of recombinant frequency for two linked loci each with complete dominance.
Group | $n_i$ | $p_i$ | $dp_i/dr$ | $i_i = \frac{1}{p_i}\left(\frac{dp_i}{dr}\right)^2$
M1_N1_ | 4831 | $(3 - 2r + r^2)/4$ | $-2(1 - r)/4$ | $i_1 = \frac{(1 - r)^2}{3 - 2r + r^2}$
M1_N2N2 | 390 | $(2r - r^2)/4$ | $2(1 - r)/4$ | $i_2 = \frac{(1 - r)^2}{2r - r^2}$
M2M2N1_ | 393 | $(2r - r^2)/4$ | $2(1 - r)/4$ | $i_3 = \frac{(1 - r)^2}{2r - r^2}$
M2M2N2N2 | 1338 | $(1 - r)^2/4$ | $-2(1 - r)/4$ | $i_4 = \frac{4(1 - r)^2}{4(1 - r)^2} = 1$
Total | 6952 = n | 1 | 0 | $\sum i_i = \frac{(1 - r)^2}{3 - 2r + r^2} + \frac{2(1 - r)^2}{2r - r^2} + 1$
This is an example of the (3:1)-(3:1) linkage combination. Allard (1956) derived formulas for r and Vr for almost all possible linkage combinations and for different populations.

Likelihood ratio and linkage test

In human genetics the linkage phase (repulsion or coupling) is usually unknown, thus making it impossible to calculate recombinant frequency based on the observable recombinants. As a result, likelihood ratios or odds ratios (Fisher, 1935; Haldane and Smith, 1947; Morton, 1955) have been used for linkage testing. The method is based on comparing the probability of the observed data under one hypothesis, for example two linked loci, with the probability under the alternative hypothesis, two independent loci. The ratio of the two probabilities L(r)/L(1/2) is tested as follows: r = 1/2 is entered into the likelihood function (equation (a)).

\[ L(1/2) = \frac{n!}{n_1!\,n_2!\,n_3!\,n_4!}\,(1/4)^n (2.25)^{n_1}(0.75)^{n_2 + n_3}(0.25)^{n_4} \qquad \text{(a)} \]

To simplify the calculation, the log base 10 of the ratio L(r)/L(1/2), known as LOD, is used:

\[ \mathrm{LOD} = \log_{10}\frac{L(r)}{L(1/2)} \]

With n = 6952, n1 = 4831, n2 = 390, n3 = 393, and n4 = 1338, logarithm of odds (LOD) scores can be calculated for different r values (see (b) at bottom of page).
The result indicates that LOD scores vary with r and reach the maximum when r = 0.12.
If M and N are linked, L(r)/L(1/2) > 1, and thus LOD is positive. When L(r)/L(1/2) < 1, LOD is negative.
In human genetics the likelihood ratio should be greater than 1000:1, i.e. LOD > 3, in order to establish linkage unequivocally. The concept of the likelihood ratio is now widely used in genetic mapping of other organisms including plant species to judge the reliability of linkage estimation and to verify its existence.
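The LOD profile for this data set can be sketched directly from the likelihood function, since the multinomial coefficient and the factor (1/4)^n cancel in the ratio L(r)/L(1/2). The code below is an illustration under the (3:1)-(3:1) F2 model of this section; the function name and the grid of r values are arbitrary choices.

```python
import math


def lod_score(r, n1, n2, n3, n4):
    """log10 of L(r)/L(1/2) for two dominant loci in an F2 population."""
    return (n1 * math.log10((3 - 2 * r + r * r) / 2.25)
            + (n2 + n3) * math.log10((2 * r - r * r) / 0.75)
            + n4 * math.log10((1 - r) ** 2 / 0.25))


counts = (4831, 390, 393, 1338)

# Evaluate the LOD score on a grid r = 0.01, 0.02, ..., 0.50.
grid = [i / 100 for i in range(1, 51)]
best_r = max(grid, key=lambda r: lod_score(r, *counts))

for r in (0.05, 0.10, 0.12, 0.15, 0.30, 0.50):
    print(f"r = {r:.2f}  LOD = {lod_score(r, *counts):.2f}")
print("grid maximum at r =", best_r)
```

By construction the score is zero at r = 0.5, and a value above 3 at the maximum corresponds to the 1000:1 criterion mentioned in the text.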
Multi-point analysis and ordering a set of markers

The methods discussed above are all based on two-point analysis using two markers at a time. However, when more than two markers from one chromosome are considered, they can theoretically be arranged in many different orders but only one particular order will match the genetic order on the chromosome and this particular order can be determined by multi-point analysis.
Consider m genetic markers, M1, M2, . . . , Mm, ordered by their real locations on a chromosome. For m genetic markers, there are a total of m!/2 possible orders. Assume the recombinant frequency between two flanking markers, $M_i$ and $M_{i+1}$, is $r_i$. The objective is to find $r_1, r_2, \ldots, r_{m-1}$ to maximize the likelihood L(r),

\[ L(r) \propto p_1(r_1, r_2, \ldots, r_{m-1})^{n_1}\, p_2(r_1, r_2, \ldots, r_{m-1})^{n_2} \cdots p_m(r_1, r_2, \ldots, r_{m-1})^{n_m} \]

Using the natural logarithm, the partial derivative is then set with respect to $r_1, r_2, \ldots, r_{m-1}$. The EM algorithm (Dempster et al., 1977) can be used to obtain the MLE for $r_1, r_2, \ldots, r_{m-1}$, which involves multiple iteration steps of Expectation (E) and Maximization (M). The multiple steps include: (i) providing an initial set of estimates, $r^{old} = (r_1, r_2, \ldots, r_{m-1})$; (ii) using the initial estimates as the estimates of recombinant frequencies to obtain the E, i.e. the expected numbers of recombinants and non-recombinants in each marker interval; (iii) using these expected values as true values to obtain the MLE for $r^{new} = (r_1, r_2, \ldots, r_{m-1})$; (iv) repeating steps (ii) and (iii) until the MLE has converged to its maximum.
Lander and Green (1987) provided an example of the EM method for multi-point linkage analysis. Using 15 marker intervals on human chromosome 7 determined by 16 markers and initial recombinant frequencies of $r_i$ = 0.05, the log-likelihood was found to be -351.45. To reduce the difference of log-likelihoods between two consecutive iterations to less than a given critical value (tolerance value, T = 0.01), 12 iterations were needed, which resulted in convergence at log-likelihood -303.28. The probability of the observed data at the converged iteration is $10^{-303.28 - (-351.45)} = 10^{48}$ times higher than that for the initial $r_i$ = 0.05.

Linkage mapping in the presence of genotyping errors

As generating marker data is time consuming and expensive, maximum use should be made of the information generated. Without accounting for genotyping errors, each error in a non-terminal marker causes two apparent recombinations in the dataset. Thus every 1% error rate in a marker adds 2 cM of inflated distance to the map. If there is an average of one marker every 2 cM, then an average of a 1% error rate will double the size of the map. There will be large distances between adjacent markers with very high error rates. These cases can be detected, either manually or automatically, and the markers removed. Such genotyping errors can be identified by simply sorting the marker data by a given linkage order to determine whether there are a large number of crossovers involved.
For the markers with low error levels that cannot be detected easily, the best strategy is to integrate error detection with the map-building procedure. Cartwright et al. (2007) extended the traditional likelihood model used for genetic mapping to include the possibility of genotyping errors. Each individual marker is assigned an error rate which is inferred from the data as are the genetic distances. A software package, TMAP, was developed to use this model to identify maximum-likelihood maps for phase-known pedigrees. The methods were tested using a data set in Vitis and a simulated data set, which confirmed that the method dramatically reduced the inflationary effect caused by increasing the number of markers and resulted in more accurate orders.

Molecular maps in plants

Table 2.4 lists some representative molecular maps that have been developed for major crop plants including legumes, cereals and clonal crops, which vary in marker density,
Table 2.4. Representative genetic maps in plants.
Crop | Marker types; mapping population | Map description | Reference
Azuki bean | SSR, RFLP, AFLP; 187 BC1F1 (JP81481 × Vigna nepalensis) | 486 markers mapped into 11 linkage groups spanning 832.1 cM with an average marker distance of 1.85 cM, 95% genome coverage | Han et al. (2005)
Barley | AFLP, SSR, STS and vrs1; 95 RILs (Russia 6 × H.E.S. 4) | 1172 markers with a total distance of 1595.7 cM, and average marker density of 1.4 cM per locus | Hori et al. (2003)
Barley | SNP, SSR, RFLP, AFLP; three DH populations | 1237 loci based on three mapping populations, with a total map length of 1211 cM and an average marker density of 1 locus per cM | Rostoks et al. (2005)
Lettuce | AFLP, RFLP, SSR, RAPD; seven inter- and intraspecific populations | 2744 markers assigned to nine linkage groups that spanned a total of 1505 cM. The mean interval between markers is 0.7 cM | Truco et al. (2007)
Maize | SSR markers; one intermated RIL (IBM) and two immortalized F2s | The IBM map: 748 SSR and 184 RFLP markers with a total map length of 4906 cM; two immortalized F2 maps: 457 and 288 SSR markers with total map length of 1830 and 1716 cM, respectively | Sharopova et al. (2002)
Maize | cDNA probes; two RIL populations: IBM (B73 × Mo17) and LHRF (F2 × F252) | Framework maps: 237 and 271 loci in IBM and LHRF populations; the two maps contain 1454 loci (1056 on IBM_Gnp2004 and 398 on LHRF_Gnp2004) corresponding to 954 cDNA probes | Falque et al. (2005)
Oat | RFLP, AFLP, RAPD, STS, SSR, isozyme, morphological; 136 F6:7 RIL (Ogle × TAM O-301) | 426 loci (with 243 loci each) spanning 2049 cM of the oat genome | Portyanko et al. (2001)
Pearl millet | RFLP and SSR; four populations | A consensus genetic map: 353 RFLP and 65 SSR markers, marker density in four maps ranged from 1.49 cM to 5.8 cM | Qi et al. (2004)
Potato | AFLP markers; heterozygous diploid potato | > 10,000 AFLP loci, with marker density proportional to physical distance and independent of recombination frequency | van Os et al. (2006)
Rice | 726 markers; 113 BC1 (BS125 × WL02) × BS125 | 726 markers with a total distance of 1491 cM and average marker density of 4.0 cM on the framework map, and 2.0 cM overall | Causse et al. (1994)
Rice | 2275 markers; 186 (Nipponbare × Kasalath) F2 | 2275 markers with a total distance of 1521.6 cM, and average marker density of 0.67 cM per locus | Harushima et al. (1998)
Sorghum | 2590 PCR-based markers and 137 RIL (BTx623 × IS3620C) | The 1713 cM map encompassed 2926 loci | Menz et al. (2002)
Sorghum | RFLP probes; 65 F2 (Sorghum bicolor × Sorghum propinquum) | The S. bicolor × S. propinquum map is composed of 2512 loci, spanning 1059.2 cM, a marker per 0.4 cM | Bowers et al. (2003a)
Sweet potato | AFLP; (Tanzania × Bikilamaliya) F2 population | 632 (Tanzania) and 435 (Bikilamaliya) AFLP markers, with a total of 3655.6 cM and 3011.5 cM, and a marker per 5.8 cM and 6.9 cM, respectively | Kriegner et al. (2003)
Wheat | SSR and DArT markers; 152 RILs from a cross between durum wheat and wild emmer wheat | 14 linkage groups, 690 loci (197 SSR and 493 DArT markers), spanning 2317 cM, a marker per 7.5 cM | Peleg et al. (2008)
and genomic coverage. For example, crops such as barley, maize, potato, rice, sorghum and wheat have high-density genetic maps while cassava, Musa, oat, pearl millet, sweet potato and yam have less saturated maps. The large variation in map length results from differences in the number of chromosomes and total size of the genomes as well as from the use of different numbers of markers (increasing the number of markers will generally give a larger total map length up to a certain threshold), the inclusion of skewed markers (that tend to exaggerate map distances) and the use of different mapping software (which vary in estimates of genetic distances). In addition, many published maps report more linkage groups than the basic chromosome number of that species. This is frequently the result of insufficient marker density as most saturated maps can be directly aligned with the basic chromosome complement (Tekeoglu et al., 2002).
The sophistication of molecular map construction has developed from the RFLP maps of the 1980s to PCR-based markers of the 1990s to more integrated maps, as a result of the use of different types of molecular markers including genic markers, over the past decade. Linkage maps have been used in gene mapping for major genes and QTL (Chapters 6 and 7), MAS (Chapters 8 and 9) and map-based gene cloning (Chapter 11).

2.2.3 Integration of genetic maps

Integration of conventional and molecular maps

During the period 1980-1990 molecular maps were developed for many plant species. The first generation of molecular maps have been integrated with conventional genetic maps constructed using morphological and isozyme markers through cytological markers and markers shared by different maps. The 12 molecular linkage groups in rice (McCouch et al., 1988) were assigned to classical linkage groups using trisomics for each of the 12 rice chromosomes. Shared markers and those which segregate in the population can be integrated with the molecular linkage map by using the same population for both conventional and molecular markers. As only very few morphological markers can segregate simultaneously in one population, integration of many of these markers requires multiple populations each with an available preliminary molecular map. If a complete linkage map for morphological markers is available, the positions of these markers relative to molecular markers can be inferred from the linkage relationship revealed by both morphological and molecular markers. In addition, morphological markers, including some traits of agronomic importance, can be mapped much more precisely if they are integrated with a dense molecular map and this has now become an integral step in trait and gene mapping. Integration of conventional and molecular maps has been very successful for crop plants for which relatively complete genetic linkage maps are available as a result of the use of morphological markers.
Some representative examples of such maps include rice, maize, tomato and soybean. In rice, 39 morphological markers and 82 RFLP markers were mapped together based on the segregation analysis of 19 F2 populations derived from the crosses between indica cultivar IR24 and japonica lines with different morphological markers (Ideta et al., 1996). In tomato, a number of morphological and isozyme markers were mapped with respect to RFLP markers by orienting the molecular linkage map to both morphological and cytological maps. An integrated high-density RFLP-AFLP map of tomato based on two independent Lycopersicon esculentum × Lycopersicon pennellii F2 populations was constructed (Haanstra et al., 1999), which spanned 1482 cM and contained 67 RFLP and 1175 AFLP markers. Integrated maps were also developed for maize (Neuffer et al., 1997; Lee et al., 2002) and soybean (Cregan et al., 1999).

Integration of multiple molecular maps

For many crop plants, several molecular maps have been constructed using different populations. These populations are of
variable size and structure and maps have been created using different numbers and types of markers. To build an integrated reference or consensus map, the order and genetic distance between specific markers are compared across populations and maps.
Stam (1993) developed a computer program, JOINMAP, for the construction of genetic linkage maps for several types of mapping populations: BC1, F2, RILs, DHs and outbreeder full-sib family. JOINMAP can be used to combine (join) data derived from several sources into an integrated map.
For each crop all the molecular maps developed from different populations will finally be integrated into a consensus map. This process has been very successful for several major crops and it can be expected that it will be extended to all crops when sufficient maps become available. In wheat, an SSR consensus map was constructed by fusing several genetic maps to maximize the integration of genetic mapping information from different sources (Somers et al., 2004). In cotton, chromosome identities were assigned to 15 linkage groups in the RFLP joinmap developed from four intraspecific cotton (Gossypium hirsutum L.) populations with different genetic backgrounds (Ulloa et al., 2005). In maize, two populations of intermated RILs (IRILs) were used to build a consensus map: the first panel (IBM) was derived from B73 × Mo17 and the second panel (LHRF) from F2 × F252. Framework maps of 237 loci were built from the IBM panel and 271 loci from the LHRF panel. Both maps were used to locate 1454 loci (1056 on map IBM_Gnp2004 and 398 on map LHRF_Gnp2004) that corresponded to 954 previously unmapped cDNA probes (Falque et al., 2005). In barley, Wenzl et al. (2006) built a high-density consensus linkage map from the combined data sets of ten populations, most of which were simultaneously typed with DArT and SSR, RFLP and/or STS markers. The map comprised 2935 loci (2085 DArT, 850 other loci), spanned 1161 cM and contained a total of 1629 bins (unique loci). The arrangement of loci was very similar to, and almost as optimal as, the arrangement of loci in component maps created for individual populations.

Integration of genetic and physical maps

Integrated genetic and physical genome maps are extremely valuable for map-based gene isolation, comparative genome analysis and as sources of sequence-ready clones for genome sequencing projects. A well-defined correlation between the physical and genetic maps will greatly facilitate molecular breeding efforts through associating candidate genes with important biological or agronomic traits, positional cloning and comparative analysis across populations and species, and whole genome sequences, which will in turn facilitate the development of various molecular breeding tools.
Various methods have been developed for assembling physical maps of complex genomes and integrating them with genetic maps. To create an integrated genetic and physical map resource for maize, a comprehensive approach was used that included three core components (Cone et al., 2002). The first was a high-resolution genetic map that provided essential genetic anchor points for ordering the physical map and for utilizing comparative information from other smaller genome plants. The physical map component consisted of contigs (sets of overlapping fingerprint clones) assembled from clones from three deep-coverage genomic libraries. The third core component was a set of informatics tools designed to analyse, search and display the mapping data. In rice, most of the genome (90.6%) was anchored genetically by overgo hybridization, DNA gel blot hybridization and in silico anchoring (Chen et al., 2002). In wheat, the genetic-physical map relationship of microsatellite markers was established using the deletion bin system (Sourdille et al., 2004). In sorghum, Klein et al. (2000) developed a high-throughput PCR-based method for building bacterial artificial chromosome (BAC) contigs and locating BAC clones on the genetic map in order to construct an integrated genetic and physical map. It was found that 30% of the overlapping BACs aligned by AFLP analysis provided information for merging contigs and singletons that could not
58 Chapter 2
be joined using fingerprint data alone. In automated matching of BACs were then
the grasses Lolium perenne and Festuca anchored on to IBM2 and IBM2 neighbour
pratensis, the physical map was integrated maps. In the Gramene database, a web-
with a genetic map using genomic in situ based tool, CMAP, was developed to allow
hybridization, which was composed of 104 users to view comparisons of genetic and
F. pratensis-specific AFLPs. The integrated physical maps (Ware et al., 2002). In addi-
map demonstrated the large-scale analy- tion, an integrated bioinformatic tool, the
sis of the physical distribution of AFLPs Comparative Map and Trait Viewer (CMTV),
and variation in the relationship between was developed to construct consensus
genetic and physical distance from one part maps and compare QTL and functional
of the F. pratensis chromosome to another genomics data across genomes and exper-
(King et al., 2002). iments (Sawkins et al., 2004). All these
An integrated genetic and physi- tools can be used to build integrated maps
cal mapping tool has been developed by based on shared markers and a reference
the Maize Mapping Project, Columbia, map to initiate the process. The integra-
Missouri, USA (http://www.maizemap. tion of genetic, cytological and physical
org/iMapDB/iMap.html). Contigs that maps is illustrated in the example shown
were assembled by fingerprinting and the in Fig. 3.6.
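The idea of building an integrated map from shared markers and a reference map can be illustrated with a minimal sketch. It is not the method used by CMAP, CMTV or any of the cited studies: it simply assumes that shared markers are collinear between two maps and places the remaining markers on the reference scale by linear interpolation. The function name, marker names and positions are hypothetical.

```python
from bisect import bisect_right


def project_onto_reference(reference, other):
    """Place markers found only in `other` on the `reference` scale (cM).

    Both arguments map marker name -> position. Shared markers act as
    anchors; other markers are interpolated linearly between the two
    flanking anchors (an illustrative assumption, not a published method).
    """
    shared = sorted((m for m in other if m in reference), key=lambda m: other[m])
    if len(shared) < 2:
        raise ValueError("need at least two shared markers")
    xs = [other[m] for m in shared]      # anchor positions on the other map
    ys = [reference[m] for m in shared]  # anchor positions on the reference map
    if any(b <= a for a, b in zip(xs, xs[1:])) or any(b <= a for a, b in zip(ys, ys[1:])):
        raise ValueError("shared markers are not collinear between the maps")

    integrated = dict(reference)
    for marker, x in other.items():
        if marker in reference:
            continue
        # Pick the anchor interval containing x; positions beyond the last
        # or before the first anchor are extrapolated from the end interval.
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        integrated[marker] = ys[i] + slope * (x - xs[i])
    return dict(sorted(integrated.items(), key=lambda kv: kv[1]))


# Hypothetical maps: m1..m3 are shared, a and b occur only on the second map.
reference_map = {"m1": 0.0, "m2": 10.0, "m3": 30.0}
second_map = {"m1": 0.0, "a": 4.0, "m2": 8.0, "b": 14.0, "m3": 20.0}
integrated_map = project_onto_reference(reference_map, second_map)
print(integrated_map)
```

Real consensus mapping also has to handle conflicting marker orders and unequal recombination rates between populations, which this sketch deliberately rejects rather than resolves.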
3
Molecular Breeding Tools:
Omics and Arrays
The success of molecular breeding depends upon the various tools that can be used for the efficient manipulation of genetic variation. All kinds of omics, arrays and high-throughput technologies make it possible to carry out more large-scale genetic analyses and breeding experiments than ever before. These technologies have been incorporated into many novel genetic and breeding processes, some of which were described in Chapter 2. In this chapter, microarrays, high-throughput technologies and several aspects of genomics will be briefly discussed to provide some of the fundamental knowledge required for molecular breeding.

3.1 Molecular Techniques in Omics

Developments in molecular techniques have contributed to the various fields of omics, which include genomics, transcriptomics, proteomics, metabolomics and phenomics. These underlying developments include advanced gel, hybridization and expression systems, cell imaging by light and electron microscopy, high density microarrays and array experiments, and genetic readout experiments.

Using proteomics as an example, classical techniques used in proteomics involve the use of two-dimensional gel electrophoresis (2DE). The proteins can be identified by excising the spot from the gel, digesting the polypeptide into smaller peptide fragments with specific proteases, and sequencing the peptides directly or analysing them by mass spectrometry (MS). Although this method is still useful and widely used, it is limited in sensitivity, resolution, and the range of abundance of the different proteins in the sample (Zhu et al., 2003; Baginsky and Gruissem, 2004). For example, abundant proteins in the sample dominate the gel whereas less abundant proteins might not be visible. New approaches involve both improved separation methods and advanced detection equipment, and several other new technologies are available for use in proteomic research (Kersten et al., 2002; Zhu et al., 2003; De Hoog and Mann, 2004). New detection methods and proteomic technologies are also being developed in an array format, which is increasingly being focused on protein–protein interactions, post-translational modification, and elucidation of three-dimensional protein structure.

3.1.1 2-Dimensional gel electrophoresis

2DE is a form of gel electrophoresis commonly used to analyse proteins. Mixtures of proteins are separated by two properties in two dimensions in 2DE. During the early years of proteomics and until recently, profiling of protein expression relied primarily on the use of two-dimensional polyacrylamide gel electrophoresis (2D PAGE), which was later combined with MS. The basic procedure is to solubilize the protein contents of an entire cell population, tissue or biological fluid, followed by separation of the protein components in the lysate using 2DE and visualization of the separated proteins with silver staining. This approach allows only a limited display of the total protein content and can identify only the relatively abundant proteins.

2DE begins with one-dimensional electrophoresis and then separates the molecules by a second property in a direction at 90° to the first. In this technique proteins are separated in one dimension by isoelectric point and in the second dimension by mass. In one-dimensional electrophoresis, proteins (or other molecules) are separated in one dimension, so that all the proteins/molecules in one lane will be separated from one another according to the differences in a particular property (e.g. isoelectric point) between each component. The result is a gel with proteins separated out on its surface (Fig. 3.1a). The proteins can then be visualized by a variety of staining methods; the most commonly used stains are silver nitrate and Coomassie blue. By combining electrophoresis with MS, individual proteins can be profiled (Fig. 3.1b, c) and theoretical and acquired MS profiles can be matched by a database search.

An important development in 2D PAGE is the use of immobilized pH gradients
[Figure 3.1: image not reproduced. Recoverable labels: (a) pI (10 to 3) versus molecular weight; trypsin digestion to peptides; peptide separation over time; peptide LLEAAAQSTK, 516.27 (2+); fragment ions y7 and y8.]
Fig. 3.1. Standard protein analysis by two-dimensional electrophoresis followed by mass spectrometry proteomics. (a) Protein is separated by two-dimensional electrophoresis: in one dimension by isoelectric point (pI) and in the second dimension by mass (molecular weight). Individual peptides are obtained using trypsin to cleave peptide chains. (b) Peptides are separated by chromatography and then peptides are ionized using electrospray ionization (ESI): they pass through the first quadrupole (q1) and collision chamber (q2). (c) Individual ions are separated based on their mass-to-charge (m/z) by a mass analyser. (d) From the MS spectrum, an individual peptide ion (516.27 (2+)) is selected for MS/MS analysis to produce peptide ion fragmentation patterns. Letters S, Q, A, A, E, L and L represent amino acids in the selected peptide and a2, b2, y3, etc. represent different ions.
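Matching acquired and theoretical MS profiles starts from theoretical masses. As a worked illustration, the sketch below computes the m/z of a multiply protonated peptide and its singly charged y-ion series from standard monoisotopic residue masses. The function names are hypothetical, and the residue table, water mass and proton mass are standard reference values rather than figures from this chapter. For the peptide LLEAAAQSTK with charge 2+ it gives about 516.29, close to the 516.27 (2+) label in Fig. 3.1.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of the charge-carrying proton


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in sequence) + WATER


def precursor_mz(sequence, charge):
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def y_ions(sequence):
    """Singly charged y-ion m/z values y1 .. y(n-1), C-terminal fragments."""
    return [
        sum(MONO[residue] for residue in sequence[-n:]) + WATER + PROTON
        for n in range(1, len(sequence))
    ]


peptide = "LLEAAAQSTK"
print(round(precursor_mz(peptide, 2), 2))
print([round(mz, 2) for mz in y_ions(peptide)])
```

A database search repeats this calculation for every predicted peptide and compares the results with the measured values within an instrument-dependent tolerance.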
There are many types of mass analysers, which use static or dynamic fields and magnetic or electric fields. Each analyser type has its strengths and weaknesses. Four basic types of mass analyser used in proteomic research are: ion trap, time-of-flight (TOF), quadrupole and Fourier transform mass spectrometry (FT-MS) analysers. In ion-trap analysers, the ions are first captured or trapped for a certain time interval and are then subjected to MS or tandem MS (MS/MS) analysis. Ion traps are robust, sensitive and relatively inexpensive. A disadvantage is their relatively low mass accuracy, due in part to the limited number of ions that can be accumulated at their point-like centre before space-charging distorts their distribution and thus the accuracy of the mass measurement. The linear or two-dimensional ion trap is a recent development where ions are stored in a cylindrical volume that is considerably larger than that of the traditional, three-dimensional ion traps, allowing increased sensitivity, resolution and mass accuracy. The FT-MS instrument is also a trapping mass spectrometer, although it captures the ions under high vacuum in a high magnetic field. It measures mass by detecting the image current produced by ions cyclotroning in the presence of a magnetic field. Its strengths are high sensitivity, mass accuracy, resolution and dynamic range. In spite of the enormous potential, the expense, operational complexity and low peptide-fragmentation efficiency of FT-MS instruments have limited their routine use in proteomic research (Aebersold and Mann, 2003). The TOF analyser uses an electric field to accelerate the ions through the same potential and then measures the time they take to reach the detector.

Techniques for ionization have been key to determining what types of samples can be analysed by MS. Electrospray ionization (ESI; Fenn et al., 1989) and matrix-assisted laser desorption/ionization (MALDI; Karas and Hillenkamp, 1988) are the two techniques most commonly used to volatilize and ionize proteins or peptides for MS analysis, while inductively coupled plasma sources are used primarily for metal analysis on a wide array of sample types. MALDI is usually coupled to TOF analysers that measure the mass of intact peptides, whereas ESI has mostly been coupled to ion traps and triple quadrupole instruments and used to generate fragment ion spectra (collision-induced spectra) of selected precursor ions (Aebersold and Goodlett, 2001). ESI creates ions by application of a potential to a flowing liquid, causing the liquid to charge and subsequently spray. The electrospray creates very small droplets of solvent-containing analyte. Solvent is removed by heat or some other form of energy (e.g. energetic collisions with a gas) as the droplets enter the mass spectrometer, and multiply-charged ions are formed in the process. ESI ionizes the analytes out of a solution and is therefore readily coupled to liquid-based (for example, chromatographic and electrophoretic) separation tools (Fig. 3.1). MALDI creates ions by excitation of molecules that are isolated from the energy of the laser by an energy-absorbing matrix. The laser energy strikes the crystalline matrix to cause rapid excitation of the matrix and subsequent ejection of matrix and analyte ions into the gas phase. MALDI-MS is normally used to analyse relatively simple peptide mixtures, whereas integrated liquid-chromatography ESI-MS systems (LC-MS) are preferred for the analysis of complex samples.

Key developments leading to improved detection of proteins include TOF MS and relatively non-destructive methods for converting proteins into volatile ions (Zhu et al., 2003). MALDI and ESI have made it possible to analyse large molecules such as peptides and proteins. Although MALDI-TOF MS is a relatively high-throughput method compared with ESI, the latter is more easily coupled with separation techniques such as LC or high pressure LC (HPLC) (Zhu et al., 2003). This has provided an attractive alternative to 2DE, because even low-abundance proteins and insoluble transmembrane proteins can be detected (Ferro et al., 2002; Koller et al., 2002). Other MS techniques include gas chromatography–mass spectrometry (GC-MS), and ion mobility spectrometry/mass spectrometry (IMS/MS). All MS-based techniques require a substantial and searchable database of predicted proteins, ideally
representing the entire genome. Protein identification is possible by comparing the deduced masses of the resolved peptide fragments with the theoretical masses of predicted peptides in the database.

Mass spectrometers are restricted in the number of ions that can be detected at any point in time. Pre-fractionation of proteins on the basis of isolation of specific cell types or subcellular organelles is often necessary to reduce the complexity (Lonosky et al., 2004). Another method of fractionating a complex sample is to introduce a chromatographic technique before MS analysis. This method, referred to as multidimensional protein identification technology (MudPIT) (Whitelegge, 2002), has been used to conduct a shotgun survey of metabolic pathways in the leaves, roots and developing seeds of rice (Koller et al., 2002). Compared with 2DE-MS, each method identifies unique proteins, supporting the complementary nature of the different proteomic technologies.

3.1.3 Yeast two-hybrid system

The yeast two-hybrid assay (Fields and Song, 1989) provides a genetic approach to the identification and analysis of protein–protein interactions. Yeast two-hybrid (Y2H) systems detect not only members of known complexes but also weak or transient interactions (Jansen et al., 2005). The Y2H assay makes use of the molecular organization found in many transcription factors that have a DNA-binding domain and activation domains that can function independently, but when these domains are fused to two proteins that interact, the ability of the domains to control transcriptional activity is reconstituted. In this assay, hybrid proteins are generated that fuse a protein X to the DNA-binding domain and protein Y to the activation domain of a transcription factor (Fig. 3.2a). Interaction between X and Y reconstitutes the activity of the transcription factor and leads to expression of a reporter gene with a recognition site for the DNA-binding domain. In the typical practice of this method, a protein of interest fused to the DNA-binding domain (the so-called bait) is screened against a library of activation-domain hybrids (prey) to select interaction partners (Phizicky et al., 2003).

The key advantages of the Y2H assay are its sensitivity and flexibility (Phizicky et al., 2003). The sensitivity derives in part from overproduction of protein in vivo, their designed direction to the nuclear compartment where interactions are monitored, the large number of variable inserts of the interacting proteins that can be examined at once, and the potency of the genetic selections. This sensitivity leads to the detection of interactions with dissociation constants around 10⁻⁷ M, which is in the range of most weak protein interactions found in the cell and is more sensitive than co-purification. It also allows detection of certain transient interactions that might affect only a subpopulation of the hybrid proteins. Flexibility of the assay is provided by calibration to detect interactions of varying affinity by altering the expression levels of the hybrid proteins, the number and nature of the DNA-binding sites and the composition of the selection media.

Some disadvantages of the Y2H assay include the unavoidable occurrence of false negatives and false positives (Phizicky et al., 2003). False negatives include proteins such as membrane proteins and secretory proteins that are not usually amenable to nuclear-based detection systems, proteins that failed to fold correctly and interactions dependent on domains occluded in the fusions or on post-translational modifications. False positives include colonies not resulting from a bona fide protein interaction, as well as colonies resulting from a protein interaction not indicative of an association that occurs in vivo.

There are several variations of the Y2H system. In the reverse Y2H system, induced URA3 expression leads to 5-FOA being converted into the toxic substance 5-fluorouracil by Ura3p, leading to growth prohibition. Mutated or fragmented genes are created and then subjected to analysis, and only loss-of-interaction mutants are able to grow in the presence of 5-FOA. In the one-hybrid system, the bait is a target DNA fragment fused to a reporter gene. Preys that are able to bind to the DNA fragment–reporter fusion will
lead to activation of the reporter genes (lacZ, HIS3 and URA3). In the repressed transactivator system, the interaction of bait–DNA binding domain fusion proteins and the prey–repressor domain fusion proteins can be detected by repression of the reporter URA3. The interaction of bait and prey enables cells to grow in the presence of 5-FOA, whereas non-interactors are sensitive to 5-FOA as a result of Ura3p production. In the three-hybrid system, the interaction of bait and prey proteins requires the presence of a third interacting molecule to form a complex. The third interacting molecule can be a protein used with a nuclear localization acting as a bridge between bait and prey to cause transcriptional activation.

Different genome-wide two-hybrid strategies have been used to analyse protein interactions in Saccharomyces cerevisiae. One approach involved screening a large number of individual proteins against a

[Figure 3.2: image not reproduced; panels (a) to (e) show bait X (or pools X1 to X96) screened against prey hybrids Y, Y1 to Yn or Y1 to Y96, as described in the caption.]
Fig. 3.2. Yeast two-hybrid approaches. (a) The yeast two-hybrid system. DNA binding and activation domains (circles) are fused to two proteins X and Y; the interaction of X and Y leads to reporter gene expression (arrow). (b) A standard two-hybrid search. Protein X, present as a DNA binding domain hybrid, is screened against a complex library of random inserts in the activation domain vector (shown in square brackets). (c) A two-hybrid array approach. Protein X is screened against a complete set of full length open reading frames (ORFs) present as activation domain hybrids (shown as yeast transformant spotted on to microtitre plates). (d) A two-hybrid search using a library of full length ORFs. The set of ORFs as activation-domain hybrids (microtitre plates in square brackets) is combined to form a low-complexity library. (e) A two-hybrid pooling strategy. Pools of ORFs as both DNA-binding domain and activation domain hybrids (in square brackets) are screened against each other. From Phizicky et al. (2003) reprinted by permission from Macmillan Publishers Ltd.
[Figure: image not reproduced. Recoverable step labels: divide in half; ligate to linkers (A + B); TE, AE and tag sites; ditag; cleave with anchoring enzyme; isolate ditags; concatenate and clone; concatenated Tag 1 to Tag 4 separated by AE sites.]
[Figure: bar chart, image not reproduced. Vertical axis: # Tags (0 to 70); series: wild type and Pti4; horizontal axis: Genes (Ca/b, Pti4, PDF1.2, Di19, Lhcb5, TIP, Catalase, Oxygen-evolving protein, Germin1, TF MYB60, BAC clone T18N14, ATPase, Chrom. 5 clone, Peroxidase).]
DNA oligonucleotide probes that fluoresce when hybridized with a complementary DNA (cDNA).

Real-time RT-PCR uses fluorophores in order to detect levels of gene expression. As mRNA becomes translated at the ribosome to produce functional proteins, mRNA levels tend to roughly correlate with protein expression. In order to adapt PCR to the measurement of RNA, the RNA sample first needs to be reverse transcribed to cDNA via an enzyme known as a reverse transcriptase. The original RT-PCR technique required extensive optimization

[Figure: image not reproduced. Axes: fluorescence versus cycle number (0 to 40); curves in panel B are labelled by template copy number.]
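The relationship between starting copy number and the cycle at which fluorescence becomes detectable can be sketched with an idealized model. This assumes perfect exponential amplification, where the product doubles each cycle at an efficiency of 1.0, and an arbitrary detection threshold. The function name and numbers are illustrative only, not taken from the figure.

```python
import math


def threshold_cycle(start_copies, threshold_copies, efficiency=1.0):
    """Cycle at which start_copies * (1 + efficiency)**cycle reaches the threshold."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.log(threshold_copies / start_copies) / math.log(1.0 + efficiency)


THRESHOLD = 1e10  # arbitrary detection threshold (copies), an assumption

ct_high = threshold_cycle(1e4, THRESHOLD)  # abundant template
ct_low = threshold_cycle(1e2, THRESHOLD)   # 100-fold less template
delta_ct = ct_low - ct_high

print(f"Ct (1e4 copies) = {ct_high:.2f}")
print(f"Ct (1e2 copies) = {ct_low:.2f}")
print(f"delta Ct = {delta_ct:.2f}, implied fold difference = {2 ** delta_ct:.0f}")
```

Under these assumptions a 100-fold difference in template shifts the curve by log2(100), about 6.6 cycles, which is why amplification curves for different copy numbers appear as parallel curves displaced along the cycle axis.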
the same source as sample to be tested) and driver cDNA (from a normal sample) to obtain shorter fragments; (iii) divide tester cDNA into two portions and ligate each to a different adaptor, while driver cDNA has no adaptors; (iv) hybridization kinetics lead to equalization and enrichment of differentially expressed sequences among single strand tester molecules; and (v) ultimately generate templates for PCR amplification from differentially expressed sequences. As a result, only differentially expressed sequences are amplified exponentially.

Transcriptional analysis may also be carried out by inserting a reporter gene such as lacZ or GFP (green fluorescent protein) downstream from the promoter under study. lacZ encodes β-galactosidase and its expression is detected by the blue colour obtained in the presence of X-Gal. GFP is a protein containing a chromophore which fluoresces under blue light (395 nm). These reporters are used to evaluate the expression levels and identify the tissues in which the normal gene is expressed under the chosen promoter.
highly repetitive sequences, while prokaryotes have small genomes, single and circular chromosomes (few linear) with no centromere or telomere, high gene density without introns and very few or no repetitive sequences. The genome size refers to the haploid genome since different cells within a single organism can be of different ploidy. Germ cells are usually haploid and somatic cells diploid. The size of the genome is known as the C-value and is measured by re-association kinetics. After denaturation, the rate of re-association is dependent on genome size. The larger the genome, the more repeated DNA sequences and the longer time to re-anneal, the higher the C-value. C₀t₁/₂ is the product of the DNA concentration and the time required to reach the halfway point of re-association. It is directly related to the amount of DNA in the genome.

The DNA content of haploid genomes ranges from 5 × 10³ bp for viruses to 10¹¹ bp for flowering plants. Within mammals, there is only a twofold difference between the largest and smallest C-value. However, there is up to a 100-fold variation in size within flowering plants. The minimum genome size found in each phylum increases from prokaryotes to mammals (Fig. 3.5).

[Figure 3.5: image not reproduced. Horizontal axis: DNA content (bp), 10³ to 10¹¹; groups shown: flowering plants, birds, mammals, reptiles, amphibians, bony fish, cartilaginous fish, echinoderms, crustaceans, insects, molluscs, worms, fungi, algae, bacteria, mycoplasmas, virus.]
Fig. 3.5. DNA contents of organisms. Modified from Primrose (1995) and Arumuganathan and Earle (1991).

Among the most important food crops, rice has the smallest genome (389 Mb) (IRGSP, 2005) and wheat the largest (15,966 Mb). According to Arumuganathan
and Earle (1991), other crops can be grouped into seven classes: Musa, cowpea and yam (873 Mb); sorghum, bean, chickpea and pigeonpea (673–818 Mb); soybean (1115 Mb); potato and sweet potato (1597–1862 Mb); maize, pearl millet and groundnut (2352–2813 Mb); pea and barley (4397–5361 Mb); and oat (11,315 Mb).

Genome size is often correlated with plant growth and ecology, and extremely large genomes may be limited both ecologically and evolutionarily. The manifold cellular and physiological effects of large genomes may be a function of selection of the major components that contribute to genome size such as transposable elements and gene duplication (Gaut and Ross-Ibarra, 2008).

Sequence complexity

called selfish DNA). Some of these sequences, such as Alu, are found to cause insertional or deletion mutations.

3.2.2 Physical mapping

Physical mapping entails constructing a physical map which consists of continuous overlapping fragments of cloned DNA that have the same linear order as found in the chromosomes from which they are derived. A series of overlapping clones or sequences that collectively span a particular chromosomal region and form a contiguous segment is called a contig. Recommended references for physical mapping include Zhang and Wing (1997), Brown (2002), Meyers et al. (2004) and Lolle et al. (2005).
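The notion of a contig as a set of overlapping clones spanning a contiguous region can be made concrete with a short sketch. It assumes, purely for illustration, that each clone's start and end coordinates on the chromosome are already known. In real physical mapping the overlaps are inferred from fingerprints or shared markers rather than from coordinates. Clone names and positions are hypothetical.

```python
def build_contigs(clones):
    """Group clones (name, start, end) into contigs of overlapping clones."""
    contigs = []
    for name, start, end in sorted(clones, key=lambda clone: clone[1]):
        if contigs and start < contigs[-1]["end"]:
            # Clone overlaps the current contig: extend it.
            contigs[-1]["clones"].append(name)
            contigs[-1]["end"] = max(contigs[-1]["end"], end)
        else:
            # No overlap with the previous contig: a gap, so start a new one.
            contigs.append({"start": start, "end": end, "clones": [name]})
    return contigs


# Hypothetical clone coordinates in kb.
library = [("c1", 0, 120), ("c2", 100, 230), ("c3", 400, 520), ("c4", 210, 300)]
contigs = build_contigs(library)
for contig in contigs:
    print(contig["start"], contig["end"], contig["clones"])
```

The gap between 300 and 400 kb in this example is the kind of region that has to be closed by screening for additional clones.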
because the fixed capacity of the phage head prevents genomes that are too long being packaged into progeny particles. Cosmids are one type of hybrid vector that replicate like a plasmid but can be packaged in vitro into λ phage coats. The vector can accommodate DNA inserts as large as 45 kb.

The YAC vector was developed to maintain inserts of up to 1000 kb. The YAC cloning system includes Tel (yeast telomeres), ARS1 (autonomously replicating sequence), CEN4 (centromere from yeast chromosome 4), URA3 (uracil) and TRP1 (tryptophan) yeast selection marker genes, Amp (ampicillin-resistance gene) and Ori (origin of replication of pBR322). Although the YAC clones played a major role in several genome projects and map-based cloning of many genes in the early 1990s, the following four problems have prevented their further use in genome studies: (i) high percentage of chimaeric clones; (ii) difficulty in DNA preparation and storage; (iii) low transformation efficiency; and (iv) instability of some inserts in yeast. In the rice cultivar Nipponbare, for example, 40% of the clones in the YAC library alone were chimaeric, thus limiting its use for genome sequencing or map-based cloning.

The BAC cloning system is based on the E. coli single copy F factor (Shizuya et al., 1992). The cloned DNA is easy to manipulate, screen and maintain. The system is non-chimaeric and has high transformation efficiency.

To facilitate gene identification in plant species, second-generation BAC vectors such as BIBAC were constructed (Hamilton et al., 1996). A 150-kb human DNA fragment in the BIBAC vector was transferred into the tobacco genome by Agrobacterium-mediated transformation. A similar vector, called TAC, was developed and used to complement a mutant phenotype in Arabidopsis (Liu et al., 1999). Table 3.1 provides characteristics of several artificial chromosome vectors.

ISOLATION OF HIGH MOLECULAR WEIGHT DNA. Preparing quality high molecular weight (HMW) DNA (most of the DNA > 1 Mb) suitable for large insert library construction can be one of the most difficult steps in constructing a large-insert plant genomic library. There are four predominant problems involved in isolating plant nuclear DNA: (i) plant cell walls must be physically broken or enzymatically digested without damaging nuclei; (ii) chloroplasts must be separated from nuclei and/or preferentially destroyed, an important process since copies of the chloroplast genome may comprise the majority of the DNA within a plant cell; (iii) volatile secondary compounds such as polyphenols must be prevented from interacting with the nuclear DNA; and (iv) carbohydrate matrices that often form after tissue homogenization must be prevented from trapping nuclei.

Several different isolation methods have been developed. The first method was to isolate the protoplast from leaf tissue and then embed the protoplast in low-melting point agarose in the form of a plug or bead. This method is expensive and time consuming. In addition, chloroplast DNA is not separated. The development of methods to isolate nuclei from leaf tissue has dramatically improved the procedure and quality of the HMW DNA for library construction.
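Insert capacity matters for library construction because it determines how many clones are needed to represent a genome. A rough sketch follows, using the widely quoted Clarke and Carbon estimate N = ln(1 - P) / ln(1 - I/G), which is not a formula given in this chapter. Here P is the desired probability that any given sequence is present, I the average insert size and G the haploid genome size. The function names are hypothetical, and the example uses the rice genome size (389 Mb) together with the 150 kb and 45 kb insert sizes mentioned in the text.

```python
import math


def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Clones required so that a given sequence is present with `probability`."""
    if not 0 < probability < 1:
        raise ValueError("probability must be between 0 and 1")
    fraction = insert_bp / genome_bp  # share of the genome carried by one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


def fold_coverage(n_clones, insert_bp, genome_bp):
    """Genome equivalents represented by the library."""
    return n_clones * insert_bp / genome_bp


RICE_GENOME_BP = 389e6

n_large = clones_needed(RICE_GENOME_BP, 150e3)   # BAC-sized inserts
n_cosmid = clones_needed(RICE_GENOME_BP, 45e3)   # cosmid-sized inserts

print(n_large, round(fold_coverage(n_large, 150e3, RICE_GENOME_BP), 1))
print(n_cosmid, round(fold_coverage(n_cosmid, 45e3, RICE_GENOME_BP), 1))
```

Under this estimate the required fold coverage is set by P alone (about 4.6 genome equivalents for P = 0.99), so the clone number scales inversely with insert size.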
PREPARATION OF INSERT DNA FOR LIGATION. The average size of DNA fragments produced by complete digestion with restriction enzymes with four- or six-base recognition sequences is too small for large insert library construction. To obtain relatively HMW restriction fragments (100–300 kb), the popular method is to partially digest the target DNA with a four-base-cut enzyme. Partial DNA digestion not only yields fragments of the desired size but also fragments the genome randomly without exclusion of any sequence.

To determine the conditions that yield a maximum percentage of fragments between 100 and 300 kb, a series of partial digestions are carried out by using different amounts of restriction enzyme for a specific digestion period. Once the optimal conditions for producing fragments between 100 and 300 kb are determined, a mass digestion using several plugs is carried out to obtain sufficient DNA for size selection. Partially digested HMW DNA is then subjected to pulsed field gel analysis.

If there is no size selection of partially digested DNA, a random library will have a preponderance of small inserts since small fragments ligate more efficiently and clones with small inserts transform with higher efficiency. Contour-clamped homogeneous electrical field (CHEF) is the most common method for separating large DNA molecules. It uses a hexagonal array of fixed electrodes, and a homogeneous electrical field is generated for enhancing DNA resolution. After two-size selection using CHEF Mapper, the HMW restriction fragments must be removed from surrounding agarose before they can be used in ligation reactions. After developing the large-insert library, a number of random clones can be selected to confirm the successful cloning of the inserts and the average insert size. The average insert size will then determine how many clones are needed to achieve the desired amount of genome coverage.

Physical mapping

There are five physical mapping methods: optical mapping; restriction fragment fingerprinting; chromosome walking; sequence-tagged site (STS) mapping; and fluorescent in situ hybridization (FISH). In restriction fragment fingerprinting, individual clones are first digested with different restriction enzymes. The digested DNA is then labelled with radioactive or fluorescent dye and run on a sequencing gel. The fingerprint data are collected and analysed for contig assembly. During the procedure, markers with known map position are used as probes to screen the large insert library. Clones hybridized with the same single copy marker are considered to be overlapping. PCR amplification of DNA pools using primers derived from DNA markers with known position can also be used for physical map construction. The disadvantages of this method are that it is labour intensive and filling the gaps is difficult.

STS mapping uses a sequence-tagged site (STS), which is a short region of DNA about 200–300 bases long whose exact sequence is found nowhere else in the genome. Two or more clones containing the same STS must overlap and the overlap must include the STS. There are two disadvantages to this method: it is still very labour intensive and the primer synthesis is expensive.

FISH uses synthetic polynucleotide strands that bear sequences known to be complementary to specific target sequences at specific chromosomal locations. The polynucleotides are bound via a series of linked molecules to a fluorescent dye that can be detected with a fluorescence microscope.

In addition, physical mapping can be achieved by a combination of fingerprinting, molecular linkage mapping, STS mapping, end sequencing and FISH mapping. A by-product of physical mapping is the integration of genetic, physical and sequence maps as shown in Fig. 3.6.

[Figure 3.6: image not reproduced. Panels for human chromosome 16: cytogenetic map (FRA16B, FRA16D; site of hybridization with labelled probe); somatic cell hybridization map from cultured human–mouse hybrid cells (breakpoints CY2 to CY180; region of interest between breakpoints CY8 and CY7); genetic linkage map (markers D16S40 to D16S160 and 16AC6.5; region of interest between genetic markers 16AC6.5 and D16S150); BAC and/or PAC contigs; STS sequences GATCAAGGCGTTACATGA and AGTCAAACGTTTCCGGCCTA. The region of interest can be localized either on the physical map (somatic cell hybrid map) or the genetic map.]
Fig. 3.6. Example of physical mapping and integration of genetic, cytological and physical maps.

3.2.3 Genome sequencing

The sequencing of DNA in laboratories first began in 1978. The first genome of a multicellular eukaryote, Caenorhabditis elegans, was published in 1998. The rationale behind genome sequencing includes
identification of all the genes in the sequenced genome, elucidation of the functions and the interactions of genes in the genome, functional analysis of orthologues in related complex genomes, evolutionary analysis of genes or genomes and product development and commercial application. As next-generation sequencing technologies continue to facilitate genome sequencing, new applications and new assay concepts (e.g. Huang et al., 2009) have emerged that are vastly increasing our ability to understand genome function, including sequence census methods for functional genomics (Wold and Myers, 2008; Varshney et al., 2009).

Technical developments in DNA sequencing

There are three major milestones in DNA sequencing: (i) the invention of sequencing reactions; (ii) automated fluorescent DNA sequencers; and (iii) PCR. Until the late 1970s, obtaining the DNA sequences of even five to ten nucleotides was difficult and very laborious. The development of two new methods in 1977, one by Maxam and Gilbert (chemical sequencing) and the other by Sanger and Coulson (enzymatic sequencing), made it possible to sequence large DNA molecules. Later refinements of Sanger's chain termination method made it the preferred procedure since it has proven to be technically simpler.

The modified Sanger sequencing method or chain terminator procedure capitalizes on two properties of DNA polymerases: (i) their ability to synthesize faithfully a complementary copy of a single-stranded DNA template; and (ii) their ability to use 3'-dideoxynucleotides as substrates. Once the analogue is incorporated at the growing
point of the DNA chain, the 3' end lacks a hydroxyl group and is no longer a substrate for chain elongation. Thus, the dideoxynucleotides act as chain terminators.

The development of labelling and detection techniques has contributed to an acceleration of sequencing procedures, which include 33P labelled primer (1970s); 33P or 35S labelled primer with sharper image and lower radiation (early 1980s); and fluorescently labelled primers and dyes in four different reactions (1986). DNA sequencing became automated in the late 1980s when the primer used for each reaction was labelled with a differently coloured fluorescent tag. This technology allowed thousands of nucleotides to be sequenced in a few hours and the sequencing of large genomes then became a reality. With ABI PRISM technology, up to four different dyes can be used to label DNA, each of which can be differentiated when run together in the same lane of a gel or injected into a capillary. For DNA sequencing, this means that the four different dyes representing each of the DNA bases (A, C, G and T) can be electrophoresed together.

The improvement of polyacrylamide gel electrophoresis (in the late 1980s and early 1990s) led to high resolution, thinner gels and a sharper image. Capillary electrophoresis (CE) (1998) offers a number of performance advantages such as faster runs, small sample volumes and the ability to eliminate manual gel pouring and sample loading tasks. Walk-away automation reduces instrument-associated labour time by more than 80% over slab-gel systems. The introduction of CE resulted in the availability of automated electrophoresis instruments with much lower cost per sample (Amersham's MegaBACE and Applied Biosystems ABI3700, 3730, etc.). High-throughput sequencing can also incorporate full automation in colony picking, 96-well plasmid isolation and purification, PCR reactions, sample loading and sequence data analysis.

The new generation of high-throughput sequencing technologies promises to transform the scientific enterprise, potentially supplanting array-based technologies and opening up many new possibilities (Kahvejian et al., 2008; Shendure and Ji, 2008). There are three commercial next-generation DNA sequencing systems available (Schuster, 2008) which promise vastly more sequencing capability (> 1 Gb of sequence per run) than standard capillary-based technology can produce. A high-throughput DNA sequencing technique using a novel massively parallel sequencing-by-synthesis approach called pyrosequencing was developed more recently by 454 Life Sciences (Margulies et al., 2005; www.454.com). 454 Sequencing employs clonal DNA fragment amplification on beads in droplets of an aqueous-oil emulsion, followed by loading the beads into nanoscale (approximately 44 µm) wells of a PicoTiterPlate, which is a fibre optic chip. In each reaction cycle, one of the four deoxynucleotide triphosphates (dNTPs) is delivered to the reactor along with DNA polymerase, ATP sulfurylase and luciferase. Incorporation, which is accompanied by a chemiluminescent signal, is detected by a high-resolution charge-coupled device (CCD) sensor. 454 Sequencing is capable of sequencing roughly 100 Mb of raw DNA sequence per 7-h run with their 2007 sequencing machine, the GS FLX Genome Analyzer.

454 Sequencing allows large amounts of DNA to be sequenced at low cost compared to the Sanger chain-termination methods; G-C rich content is not as much of a problem, and the lack of reliance on cloning means that unclonable segments are not skipped; it is also capable of detecting mutations in an amplicon pool at a low sensitivity level. However, each read of the 2005 sequencing machine GS20 is only 100 bp long, resulting in some problems when dealing with highly repetitive genomes, as repetitive regions of over 100 bp cannot be bridged and thus must be left as separate contigs. Also, the nature of the technology lends itself to problems with long homopolymer runs. As one of the projects using 454 sequencing, Project Jim determined the first sequence of an individual, the complete genome sequence of James Dewey Watson, in May 2007.
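The flow-by-flow chemistry of 454 Sequencing described above can be illustrated with a toy simulation. The sketch below is an idealized model under stated assumptions only: the flow order "TACG", the function names `ideal_flowgram` and `decode_flowgram`, and the noise-free integer signals are illustrative choices and are not taken from any 454 Life Sciences software. It shows that a homopolymer run is read from the intensity of a single signal rather than from separate incorporation events, which is why long runs are hard to call once real signals become noisy.

```python
def ideal_flowgram(synthesized, flow_order="TACG"):
    """Return (nucleotide, signal) pairs for an idealized, noise-free run.

    `synthesized` is the strand being built on the template. One nucleotide
    is flowed at a time; the signal equals the number of identical bases
    incorporated during that flow (0 if the next base does not match).
    The flow order is an assumption made for this illustration.
    """
    if any(base not in flow_order for base in synthesized):
        raise ValueError("sequence contains a base that is never flowed")
    flows = []
    pos = 0
    flow_index = 0
    while pos < len(synthesized):
        nucleotide = flow_order[flow_index % len(flow_order)]
        run_length = 0
        # A whole homopolymer run is incorporated within a single flow.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            run_length += 1
            pos += 1
        flows.append((nucleotide, run_length))
        flow_index += 1
    return flows


def decode_flowgram(flows):
    """Rebuild the sequence by repeating each nucleotide `signal` times."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


if __name__ == "__main__":
    flows = ideal_flowgram("TTTACGG")
    print(flows)
    print(decode_flowgram(flows))
```

In this idealized setting the run TTT produces one signal of height 3. Distinguishing a run of, say, 8 from a run of 9 then depends on resolving a small relative difference in light intensity, which is consistent with the homopolymer problem noted in the text.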
The second high-throughput sequencing technique is Solexa (Illumina, Inc.; http://www.illumina.com) which depends on sequencing by synthesis. Diluted DNA templates are attached to a solid planar surface and then amplified clonally. Sequencing is performed by delivering a mixture of four differentially labelled reversible chain terminators along with DNA polymerase. The resulting signal is detected at each cycle and a new cycle can be initiated after terminator removal (Bennet et al., 2005). Current average read lengths are about 30 to 40 bases with 1 Gb per run.

The third high-throughput sequencing technique is the SOLiD System, which enables massively parallel sequencing of clonally-amplified DNA fragments linked to beads. The SOLiD sequencing methodology is based on sequential ligation with dye-labelled oligonucleotides. The SOLiD technology provides unmatched accuracy, ultra-high throughput and application flexibility. It delivers advancements in throughput approaching 20 Gb per run. The flexibility of two independent flow cells, each capable of running 1, 4 or 8 samples, allows multiple experiments to be conducted in a single run. With unparalleled throughput and greater than 99.9% overall accuracy, the SOLiD System enables large-scale sequencing and tag-based experiments to be completed more cost effectively than previously possible.

There are several emerging sequencing methods: sequencing by hybridization; mass spectrometric techniques; direct visualization of single DNA molecules by atomic force microscopy; and single-molecule sequencing strategies. The intense drive towards developing technology that can sequence a complete human genome for under US$1000 will ensure that the speed and cost of sequencing will continue to improve rapidly (Schuster, 2008). For example, a nanopore-based device provides single-molecule detection and analytical capabilities that are achieved by electrophoretically driving molecules in solution through a nano-scale pore. Further research and development to overcome current challenges to nanopore identification of each successive nucleotide in a DNA strand offers the prospect of third generation instruments that will sequence a diploid mammalian genome for US$1000 in 24 h (Branton et al., 2008).

Sequencing strategies

There are two general genome sequencing strategies: (i) clone-by-clone or hierarchical sequencing (International Human Genome Sequencing Consortium, 2001); and (ii) whole-genome shotgun sequencing (Venter et al., 2001). After constructing the complete physical map, clone-by-clone sequencing can be started in any specific region. The clone-by-clone or hierarchical sequencing strategy has the following advantages: (i) the ability to fill gaps and re-sequence the uncertain regions; (ii) the ability to distribute the clones to other laboratories; and (iii) the ability to check the produced sequence by restriction enzymes. The main disadvantages are that it is expensive and time consuming for the construction of a physical map and experienced personnel are required.

The shotgun sequencing strategy consists of making small insert libraries (1-10 kb) from the genomic DNA of an organism, sequencing a large number of clones (six to eight times redundancy) and assembling contigs using bioinformatics software. It has no physical map construction and less risk of recombinant clones. It is cost effective and fast and ideal for small genome sequencing. However, it is difficult to fill gaps and re-track all the sequenced plasmids and the resulting data is less useful for positional cloning. Figure 3.7 compares the two sequencing methods.

COMBINING CLONE-BY-CLONE AND SHOTGUN SEQUENCING STRATEGIES. In 1997 The Institute for Genomic Research (TIGR) launched the initiative of a whole-genome shotgun approach for the human genome. But BACs, BAC end sequences and STS markers were used extensively in assembling the sequencing data from shotgun clones. The first draft of the human genome was completed within 3 years compared with the 12 years taken by the Human Genome Project which is funded by government agencies.
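The redundancy quoted above for shotgun sequencing (six to eight times) can be related to the number of reads required and to the fraction of the genome expected to remain unsequenced. The sketch below assumes the simple Poisson model usually attributed to Lander and Waterman, which is not discussed in this chapter: reads are taken to fall uniformly at random, so that at redundancy c the expected unsequenced fraction is e^-c. The genome size and read length in the example are illustrative values and the function names are hypothetical.

```python
import math


def reads_needed(genome_bp, read_bp, redundancy):
    """Number of reads of length `read_bp` giving the requested redundancy."""
    return math.ceil(redundancy * genome_bp / read_bp)


def expected_unsequenced_fraction(redundancy):
    """Expected fraction of bases hit by no read, assuming random placement."""
    return math.exp(-redundancy)


if __name__ == "__main__":
    genome_bp = 125_000_000  # illustrative genome size (assumption)
    read_bp = 500            # illustrative read length (assumption)
    for redundancy in (6, 8):
        n_reads = reads_needed(genome_bp, read_bp, redundancy)
        gap = expected_unsequenced_fraction(redundancy)
        print(f"{redundancy}x: {n_reads} reads, "
              f"about {gap * genome_bp:.0f} bp expected in gaps")
```

Under this model even eightfold redundancy leaves a small expected unsequenced fraction, and repeats and cloning bias, which the model ignores, make real gaps harder to close. This is in line with the remark above that gaps are difficult to fill in a pure shotgun project.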
[Fig. 3.7 panel labels: 3. Take subset of clones, fragment and sequence; U-unitigs; Rock; 50 kb mates; Scaffold; Stones; Gap; Link mapped scaffold to existing map; STSs.]
Fig. 3.7. Comparison of two sequencing strategies: assembly of a mapped scaffold. U-unitigs are assembled into scaffolds using mate-pair information to bridge gaps between two U-unitigs, and by linking unitigs to rocks, which are less-well supported unitigs that nevertheless fit in place according to at least two independent large insert mate pairs. Stones are single short contigs whose position is supported by only a single read. Gaps are filled in the finishing stage by further site-directed sequencing. Scaffolds are placed against existing genetic and physical maps by sequence tagged site (STS) matches and against the cytological map by fluorescent in situ hybridization (FISH).
high C0t and MF, Martienssen et al. (2004) generated up to twofold coverage of the gene space with less than one million sequencing reads and simulations using sequenced BAC clones predicted that 5× coverage of gene-rich regions, accompanied by less than 1× coverage of subclones from BAC contigs, will generate a high quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. Haberer et al. (2005) selected 100 random regions averaging 144 kb in size, representing about 0.6% of the genome, to define their content of genes and repeats for characterizing the structure and architecture of the maize genome. Combining CBCS with genome filtration can greatly reduce the cost while retaining the high coverage of genic regions. An alternative approach is the identification of gene-rich regions on a detailed physical map and sequencing large-insert clones from these regions.

Plant genomic sequences

The first complete plant genome to be sequenced was that of Arabidopsis. The sequenced regions cover 115.4 Mb of the 125-Mb genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole genome duplication followed by subsequent gene loss and extensive local gene duplications. The genome contains 25,498 genes encoding proteins from 11,000 families (The Arabidopsis Genome Initiative, 2000). Arabidopsis contains many families of new proteins but also lacks several common protein families. The proportion of predicted Arabidopsis genes in different functional categories is provided in Fig. 3.8. The complete genome sequence provides the foundation for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic methods of identifying genes for crop improvement (Varshney et al., 2009).
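The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers given in the text (115.4 Mb of a 125-Mb genome; 25,498 genes in 11,000 families; 100 regions averaging 144 kb that represent about 0.6% of the maize genome). The variable names are illustrative, and the implied maize genome size is a back-calculation from those rounded figures, not a value stated by the cited authors.

```python
# Arabidopsis: fraction of the genome covered by the sequenced regions.
arabidopsis_sequenced_mb = 115.4
arabidopsis_genome_mb = 125.0
fraction_sequenced = arabidopsis_sequenced_mb / arabidopsis_genome_mb

# Arabidopsis: average number of genes per protein family.
genes_per_family = 25_498 / 11_000

# Maize survey: total sequence sampled and the genome size this implies.
maize_sampled_mb = 100 * 144 / 1000          # 100 regions of about 144 kb
implied_maize_genome_mb = maize_sampled_mb / 0.006  # regions are about 0.6%

print(f"Arabidopsis sequenced fraction: {fraction_sequenced:.1%}")
print(f"Genes per family (average): {genes_per_family:.2f}")
print(f"Maize sampled: {maize_sampled_mb:.1f} Mb; "
      f"implied genome size: about {implied_maize_genome_mb:.0f} Mb")
```

The sequenced regions therefore cover a little over 92% of the Arabidopsis genome, and the maize sample of about 14.4 Mb corresponds to a genome in the region of 2400 Mb if the 0.6% figure is taken at face value.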
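[Fig. 3.8 (pie chart, labels only partly recovered): Unclassified; Metabolism; Cellular organization 5%; Intracellular traffic 3%; Protein destination 12%. Two further values, 10% and 11%, appear beside the Unclassified and Metabolism labels.]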
Rice was the first crop to be fully sequenced because of its importance as one of the major cereals and also because of its small genome size, small number of chromosomes (n = 12), well characterized genetic and genomic resources and availability of a large number of DNA markers and a high density genetic linkage map. Two draft sequences were completed in 2002 (Goff et al., 2002; Yu et al., 2002) and a complete sequence was published in 2005 (IRGSP, 2005) which is available in the National Center for Biotechnology Information (NCBI) database.

Many sequencing projects for important crop species are currently ongoing. The US Department of Energy's Joint Genome Institute (JGI) is providing funding and technical assistance to decode the genomes of several major plants, including cassava (Manihot esculenta), cotton (Gossypium), foxtail millet (Setaria italica), sorghum, soybean and sweet orange (Citrus sinensis L.) (http://www.jgi.doe.gov/sequencing/). Other plants for which there are ongoing genome sequencing projects include Medicago truncatula (http://www.medicago.org/genome), Lotus japonicus (http://www.kazusa.or.jp), poplar, tomato (http://www.sgn.cornell.edu) and grapevine.

The International Wheat Genome Sequencing Consortium (IWGSC) has been formed to advance agricultural research for wheat production and utilization by developing DNA-based tools and resources that result from the complete sequencing of the expressed genome of common (hexaploid) bread wheat and to ensure that these tools and the sequences are available for all to use without restriction and without cost (Gill et al., 2004; http://www.wheatgenome.org/). A Global Musa Genomics Consortium (GMGC) is decoding the Musa genome (http://www.newscientist.com/article.ns?id-dn1037). A Global Cassava Partnership, an alliance of the world's leading cassava researchers and developers, has proposed that sequencing the cassava genome should be a priority (Fauquet and Tohme, 2004).

To sequence the maize genome, two consortia in the USA began a pilot study: one with Jo Messing (Rutgers University), Rod Wing (Arizona University), Ed Coe (University of Missouri), Mark Vaudin (Monsanto) and Steve Rousley (Cereon); the other included Jeff Bennetzen (Purdue University), Karel Schubert and Roger Beachy (Danforth Center), Cathy Whitelaw and John Quackenbush (TIGR) and Nathan Lakey (Orion). These two pioneer programmes have been extended by a massive US programme from the National Science Foundation (NSF), USDA and the Department of Energy (DOE) led by Rick Wilson (Washington University). The sequencing strategy is a hybrid between a BAC-by-BAC approach and a whole-genome shotgun.

3.2.4 cDNA sequencing

Why cDNA sequencing

Large-scale DNA sequencing can be carried out on genomic DNA or cDNAs. There are four advantages to performing cDNA sequencing. First is the cost of sequencing a whole genome. Although DNA sequencing costs have fallen more than 50-fold over the past decade, it still costs around US$10 million to sequence three billion base pairs. It will take years to realize the goal to lower the cost of sequencing a mammalian-sized genome to US$100,000 and ultimately to cut the cost of whole-genome sequencing to US$1000 or less.

Secondly, the interpretation of the genomic sequence of eukaryotes is not straightforward in contrast to prokaryotes: coding regions are separated by non-coding regions; introns and alternative splicing occur; one gene can lead to multiple mRNAs and gene products; a significant fraction of genomic DNA does not code for proteins (non-coding sequences).

Thirdly, cDNA sequencing helps in annotation and identification of exons and introns. Estimates of the number of human genes vary from 30,000 to 80,000. The accuracy of the Arabidopsis genome annotation varied from 50 to 70% in the first draft. Many Arabidopsis genes are still not accurately annotated.

Fourthly, sequencing cDNAs helps gain information about the transcriptome.
mRNA populations are variable among cells. The transcriptome is dynamic and constantly changing. Cells adapt to environmental, developmental and other signals by modulating their transcriptome. mRNA populations form an important level of regulation between signal perception and response. Genetically identical cells can exhibit distinct phenotypes. cDNA sequencing allows direct insight into mRNA populations and allows the dissection of the transcriptome which genomic sequencing alone does not provide. Sequencing of random cDNA clones prepared from different tissues also allows analyses of mRNA abundance.

cDNA libraries

When constructing a representative cDNA library, the source of the mRNA for the cDNA library is critical and will vary depending on the goal of the study. To estimate the diversity of mRNAs expressed in a given plant, the mRNA should represent most plant tissues and organs. On the other hand, to define the diversity of mRNAs represented in a specific tissue, organ or developmental stage, the library should be prepared from the most highly defined source feasible. As indicated by Nunberg et al. (1996), it is better to invest the time to harvest sufficient quantities of scarce tissue for a library rather than using materials which will contain a significant proportion of extraneous messages.

If large quantities of RNA are available, it is possible to create a plasmid library directly. This is particularly feasible since electroporation transformation efficiencies are so high. Plasmid libraries may or may not be directional and are easily arranged in an ordered array. Constructing plasmid libraries directly avoids any sequence bias, including internal deletion and trans recombination that may occur during the excision process.

The frequency of full-length cDNAs depends on the length of transcript (the longer the transcript the lower the frequency of obtaining full-length cDNAs). Carninci and Hayashizaki (1999) discussed the high-efficiency of full-length cDNA cloning using a cap trapper method (biotinylated cap) and thermoactivation of reverse transcriptase (cDNA synthesis at 60 °C: RNA secondary structures are melted). Some normalization and subtraction methods also allow enrichment of full-length cDNAs.

For a given mRNA, multiple expressed sequence tags (ESTs) can be obtained. Depending on the extent of sampling, ESTs may or may not overlap. EST processing is needed to remove vector and linker sequences, check quality using a sequence quality filter, clean up contaminants and chimaeric sequences and store the results in databases. To construct EST contigs, there are two commonly used programs: Phrap/consed and TIGR assembler. These programs generate a unigene set (contigs or Tentative Consensus): a consensus sequence for all overlapping ESTs that (supposedly) correspond to a single mRNA.

Several factors affect the quality of EST contigs: contaminating sequences, bad quality sequences, non-overlapping ESTs from the same mRNA, alternative splicing resulting in one gene with multiple mRNAs and closely related genes (chimaeric contigs). EST annotation can be carried out using similarity searches against GenBank and other databases, e.g. protein motif databases, to assign a putative function or identify functional categories. This process can be automated or manual (usually a combination of the two).

Non-random (normalized or subtracted) cDNA libraries are needed to overcome some of the problems with redundant ESTs, so that EST databases can be saturated when the budget is limited or when there is a specific interest in a particular stage. Hybridization-based methods are most commonly used to decrease redundancy (reduce representation of abundant cDNAs and increase rare cDNAs). Normalized cDNA libraries are used when gene discovery is the main objective of the EST project.

cDNA sequencing

Strategies for cDNA sequencing include single-pass cDNA sequencing (ESTs),
be laborious to clone full-length cDNA) and simple gene identification that is limited by sequences that are already in a database (otherwise the corresponding gene must be cloned).

Several alternative technologies have emerged for measuring transcript abundance in a parallel fashion. Essentially, these methods can be divided into three categories according to their underlying principle, namely PCR-, sequencing- or hybridization-based technologies. Therefore, strategies that are currently available for analysis of transcriptomes include RT-PCR (qualitative and quantitative), hybridization methods (northern blots, macroarrays, DNA microarrays, oligonucleotide microarrays), cDNA fingerprinting (differential display, cDNA-AFLP), cDNA sequencing (full-length cDNAs, subtracted cDNAs, normalized cDNA libraries, SAGE, massively parallel signature sequencing (MPSS)) and combinations of the above techniques.

The most straightforward and unbiased method of analysing an RNA population is the sequencing of cDNA libraries and quantitative analysis of the resulting ESTs. Traditionally, ESTs with read-lengths of about 200-900 nucleotides have been produced by Sanger-sequencing but the associated costs have severely limited the resolution of this approach (Busch and Lohmann, 2007). Deep sequencing has become a viable alternative for unbiased large-scale expression profiling because of the development of new protocols and entirely new sequencing techniques. Non-gel-based sequencing techniques promise to deliver greatly increased throughput and a considerable cost reduction. MPSS combines in vitro cloning of millions of template tags on separate microbeads with ligation-mediated sequence detection. In each reaction cycle, a four-base overhang is produced on every tag to which a fluorescently labelled adaptor of defined sequence is ligated. The position and fluorescence of every microbead is monitored by a high resolution camera in each of the reaction cycles, allowing the sequences of the 17-nucleotide tags to be reconstructed (Brenner et al., 2000). As indicated by Busch and Lohmann (2007), the limited length of the sequenced tags precludes the use of MPSS for de novo sequencing but makes it a very powerful tool for expression profiling of organisms with pre-existing sequence information. By contrast, two other high-throughput sequencing techniques as described previously, 454 and Solexa, are ideally suited for expression-profiling purposes. Short tags are sufficient to identify a transcript unambiguously and therefore problems arising from assembling short tags into larger contigs can be ignored.

PCR product-based arrays were heavily used in the early days of global transcriptome analysis. However, the low level of standardization among laboratories, high levels of noise and experimental variation and cross-hybridization between homologous transcripts have eroded the attractiveness of these arrays. Oligonucleotide-based microarrays are now becoming the most popular technology for large-scale expression profiling because they allow the simultaneous detection of tens of thousands of transcripts at a reasonable cost. The expression level of any gene represented on the array can be deduced from the fluorescence intensity of the corresponding probe. However, microarrays only offer linear expression measurements over a range of three orders of magnitude compared to quantitative RT-PCR which has a dynamic range of five orders of magnitude. Microarrays perform with less precision and sensitivity than other techniques when used for measuring low abundance transcripts in particular and this is manifested in their greater inter-assay variability (Busch and Lohmann, 2007). Another major limitation of microarrays designed for expression analysis is that they rely on current genome annotations, which precludes the identification of novel or very small transcription units.

Microarrays and quantitative RT-PCR have dominated expression profiling to date but deep sequencing and whole-genome tiling arrays will become increasingly important because these techniques are not limited to the detection of known transcripts. Tiling arrays, on which the entire
genome is represented by evenly spaced probes, provide a novel means of transcript identification. In Arabidopsis, tiling arrays have been used to map transcriptionally active regions by profiling four different tissues (Yamada et al., 2003).

The interaction transcriptome is the sum of all microbe and host transcripts that are produced during the interaction. The challenges in studying interaction transcriptomes include how to discriminate pathogen from host ESTs, similarity searches to genome/cDNA sequences, GC analyses and determination of hexamer frequency (windows of 6 bp). Systems genomics/transcriptomics can be used to analyse complex transcriptomes, for example the mixtures of mRNAs from different species (e.g. infected tissue, environmental samples such as soil or seawater, etc.). One challenge is to identify the species of origin in the mixtures.

3.3.2 Proteomics

Proteomics is the study of the identification, function and regulation of complete sets of proteins in a tissue, cell or subcellular compartment. Such information is crucial to understanding how complex biological processes occur at a molecular level and how they differ in various cell types, stages of development or environmental conditions (Bourgualt et al., 2005). Proteomics is important as proteins are active agents in cells and they execute the biological functions encoded by genes. Sequences of genes (or genomes) and transcriptome analyses are not sufficient to elucidate biological functions. Proteomics complements transcriptomics by providing information about the time and place of protein synthesis and accumulation, as well as identifying those proteins and their post-translational modifications. Gene expression does not necessarily indicate whether a protein is synthesized, how fast it is turned over or which possible protein isoforms are synthesized (Mathesius et al., 2003). In some cases, the correlation between gene expression and protein presence is as low as 0.4. First, the level of transcription of a gene gives only a rough estimate of its level of expression into a protein. An mRNA produced in abundance may be degraded rapidly or translated inefficiently, resulting in a small amount of protein. Secondly, many proteins experience post-translational modifications that profoundly affect their activities; for example some proteins are not active until they become phosphorylated. Methods such as phosphoproteomics and glycoproteomics are used to study post-translational modifications. Thirdly, many transcripts give rise to more than one protein through alternative splicing or post-translational modifications. It is generally supposed that if genomes contain tens of thousands of gene sequences, the proteome comprises several hundred thousand proteins as a result of alternative splicing and post-translational modifications. Finally, many proteins form complexes with other proteins or RNA molecules and only function in the presence of these molecules.

Proteomics has become an important approach for investigating cellular processes and network functions. Significant improvements have been made in technologies for high-throughput proteomics, both at the level of data analysis software and mass spectrometry (MS) hardware (Baginsky and Gruissem, 2006). In this section, proteomics will be briefly discussed. For further details, readers are referred to the following review articles: van Wijk (2001), Molloy and Witzmann (2002), de Hoog and Mann (2004), Saravanan et al. (2004), Baginsky and Gruissem (2006), Cravatt et al. (2007) and Zivy et al. (2007).

Protein extraction

Obtaining high quality protein is the first step in proteomic research. Extracting protein from plant tissue requires tissue disruption by grinding and sonication, separation of proteins from unwanted cell materials (cell wall, water, salt, phenolics, nucleic acids) by centrifugation after precipitation of proteins with acetone-trichloroacetic acid, resolubilizing protein in a solution that dissolves the maximum number of different proteins and inactivation of protease
by acetone-trichloroacetic acid treatment or specific protease inhibitors. Pre-fractionation of tissue is optional for the analysis of proteins from different organelles or microsomal fractions. Solubilization requires urea or, for more hydrophobic proteins, thiourea, as a chaotrope which solubilizes, denatures and unfolds most proteins. Non-ionic or zwitterionic detergents, e.g. 3-[3-cholamidopropyl-dimethyl-ammonio]-1-propane sulfonate (CHAPS), Triton-X, or amidosulfobetaines are used to solubilize and separate proteins in a mixture. Sodium dodecyl sulphate (SDS) is also a strong detergent and used to solubilize membrane proteins. However, it renders a negative charge to proteins and, therefore, interferes with isoelectric focusing (Mathesius et al., 2003). Reducing agents (usually dithiothreitol [DTT], 2-mercaptoethanol or tributyl phosphine) are needed to disrupt disulfide bonds.

Protein identification and quantification

N- or C-terminal sequencing has made protein identification possible on a small scale although with limitations. Improvements in MS have made it possible to identify proteins faster, on a larger scale, using smaller amounts of protein. In addition, post-translational modifications can be determined by MS/MS analysis and proteins can be identified even when bound to other proteins in complexes. A standard technique for protein identification with MALDI-TOF MS is peptide mass fingerprinting. Protein spots in a gel can be visualized using a variety of chemical stains or fluorescent markers. Proteins can often be quantified by the intensity with which they stain. Once proteins have been separated and quantified, they can be identified. Individual spots are cut out of the gel and cleaved into peptides with proteolytic enzymes. These peptides can then be identified by MS, specifically MALDI-TOF MS. The MALDI-TOF analysis will measure very precisely (< 0.1 Da) the mass of peptides formed by this digestion. Since the cleavage sites are known, the digestion can be simulated by informatics, that is, the masses of all the peptides produced by this digestion can be calculated for all the proteins of known sequence of a given organism (Zivy et al., 2007). These masses will depend on the length of peptides and their composition since most amino acids have different masses. Thus, masses predicted from sequences stored in databases can simply be compared with the masses actually measured by the MALDI-TOF equipment. The greater the number of positive mass matches the more likely it is that the peptides originate from the same protein, thus facilitating the rapid identification of proteins.

Protein profiling

Protein mixtures of considerable complexity can now be routinely characterized in some detail. One measure of technical progress is the number of proteins identified in each study. Such numbers can now reach the thousands for suitably complex samples. Large-scale proteomic studies are needed to solve three types of biological problem (Aebersold and Mann, 2003): (i) the generation of protein-protein linkage maps; (ii) the use of protein identification technology to annotate and, if necessary, correct genomic DNA sequences; and (iii) the use of quantitative methods to analyse protein expression profiles as a function of the cellular state as an aid to inferring cellular function.

The sequences of many mature proteins in higher eukaryotes after processing and splicing are often not directly apparent from their cognate DNA sequences. Peptide sequence data of sufficient quality provides unambiguous evidence of translation of a particular gene and can, in principle, differentiate between alternatively spliced or translated forms of a protein (Aebersold and Mann, 2003). Thus, it might be tempting to systematically analyse the proteins expressed by a cell or tissue, that is, to generate comprehensive proteome maps.

The more common and versatile use of large-scale MS-based proteomics has been to document the expression of proteins as a function of cell or tissue state. Aebersold and Mann (2003) argued that to be meaningful, such data must be at least
semi-quantitative and that a simple list of proteins detected in the different states is
insufficient. This is because analyses of complex mixtures are often not comprehensive and
therefore the non-appearance of a particular sequence in the list of identified peptides
does not indicate that the peptide or protein was not originally present in the sample.
Additionally, it is often impossible to prepare a certain cell type, cell fraction or tissue
in completely pure form without trace contamination from other fractions. And because the
ion current of a peptide is dependent on a multitude of variables that are difficult to
control, this measure is not a good indicator of peptide abundance. If stable-isotope
dilution has not been used, a rough relative estimate of the quantity of a protein can be
obtained by integrating the ion current of its peptide-mass peaks over their elution time
and comparing these extracted ion currents between states, provided that highly accurate and
reproducible methods are used. Increasingly, stable-isotope dilution and LC-MS/MS are used
to accurately detect changes in quantitative protein profiles and to infer biological
function from the observed patterns (Aebersold and Mann, 2003).
Protein–protein interactions
Protein–protein interactions occur among most proteins and there are six types of
interfaces found in protein–protein interactions: domain–domain, intra-domain,
hetero-oligomer, hetero-complex, homo-oligomer and homo-complex. The analysis of
protein–protein interactions can be either qualitative or quantitative. Traditional
biochemical methods such as co-purification and co-immunoprecipitation have been used to
identify the members of protein complexes. Proteomics-based strategies have been used to
determine the composition of complexes and to establish interaction networks. The
systematic, large-scale, high-throughput approaches now being taken to build maps of the
interactions between proteins predicted by genome sequence information have become known as
interactomics (Causier et al., 2005).
There are many important characteristics of a protein–protein interaction. Obviously, it is
important to know which proteins are interacting. In many experiments and computational
studies, the focus is on interactions between two different proteins. However, one protein
can interact with other copies of itself (oligomerization) or with three or more different
proteins. The stoichiometry of the interaction is also important, that is, how many of each
protein involved are present in a given reaction. Some protein interactions are stronger
than others because they bind together more tightly. The strength of binding is known as
affinity. Proteins will only bind to each other spontaneously if it is energetically
favourable. Energy changes during binding are another important aspect of protein
interactions. Many of the computational tools that predict interactions are based on the
energy of interactions.
Protein interaction maps represent essential components of the post-genomic tool kits needed
for understanding biological processes at a systems level. Over the past decade, a wide
variety of methods have been developed to detect, analyse and quantify protein interactions,
including surface plasmon resonance spectroscopy, nuclear magnetic resonance (NMR), Y2H
screens, peptide tagging combined with MS and fluorescence-based technologies. Lalonde
et al. (2008) and Miernyk and Thelen (2008) reviewed the latest techniques and current
limitations of biochemical, molecular and cellular approaches for the detection of
protein–protein interactions. In vitro biochemical strategies for identifying and
characterizing interacting proteins include co-immunoprecipitation, blue native gel
electrophoresis, in vitro binding assays, protein cross-linking and rate-zonal
centrifugation. Fluorescence techniques range from co-localization of tags, which may be
limited by the optical resolution of the microscope, to fluorescence resonance energy
transfer (FRET)-based methods that have molecular resolution and can also report on the
dynamics and localization of the interactions within a cell. Proteins interact via highly
evolved complementary surfaces with affinities that
can vary over many orders of magnitude. Some of the techniques such as surface plasmon
resonance provide detailed information regarding the physical properties of these
interactions. To analyse protein complexes systematically at a sub- or full-genome level,
several methods have been adapted for high-throughput screens using robotics: (i) Y2H
systems; (ii) the mating-based split-ubiquitin system (mbSUS); and (iii) affinity
purification of protein complexes followed by identification of proteins by MS (AP-MS).
One of the first questions usually asked about a new protein, apart from where it is
expressed, is to what proteins does it bind? To study this question by MS, the protein
itself is used as an affinity reagent to isolate its binding partners. Compared with
two-hybrid and array-based approaches, this strategy has the advantages that the fully
processed and modified protein can serve as the bait, that the interactions take place in
the native environment and cellular location and that multi-component complexes can be
isolated and analysed in a single operation (Ashman et al., 2001). However, because many
biologically relevant interactions are of low affinity, transient and generally dependent on
the specific cellular environment in which they occur, MS-based methods in a straightforward
affinity experiment will detect only a subset of the protein interactions that actually
occur (Aebersold and Mann, 2003). Bioinformatics methods, correlation of MS data with those
obtained by other methods or iterative MS measurements, possibly in conjunction with
chemical crosslinking (Rappsilber et al., 2000), can often help to further elucidate direct
interactions and the overall topology of multi-protein complexes.
The ability of quantitative MS to detect specific complex components within a background of
non-specifically associated proteins increases the tolerance for high background and allows
for fewer purification steps and less stringent washing conditions, thus increasing the
chance of finding transient and weak interactions. The same methods can be used to study the
interaction of proteins with nucleic acids, small molecules and in fact with any other
substrate. For example, drugs can be used as affinity baits in the same way as proteins to
define their cellular targets and small molecules such as cofactors can be used to isolate
interesting sub-proteomes (MacDonald et al., 2002).
The Y2H system has become one of the standard laboratory techniques for the detection and
characterization of protein–protein interactions. It can be used to map individual amino
acid residues involved in a specific protein–protein interaction. It can also be used to
identify novel interactions from complex libraries of expressed proteins. The Y2H system has
been widely used for determination of protein interaction networks within different
organisms. In plants, the Y2H system has been successfully applied to detect interactions
with phytochromes, cryptochromes, transcription factors, proteins involved in
self-incompatibility mechanisms, the circadian clock and plant disease resistance (Causier
et al., 2005). Given the recent progress made in the development of large-scale Y2H
screening procedures, the time is now ripe for large-scale Y2H screens to be applied to
organisms such as Arabidopsis and rice.
Another potential method to detect protein–protein interactions involves the use of FRET
between fluorescent tags on interacting proteins. FRET is a non-radiative process whereby
energy from an excited donor fluorophore is transferred to an acceptor fluorophore that is
within 60 Å of the excited fluorophore (Wouters et al., 2001). After excitation of the first
fluorophore, FRET is detected either by emission from the second fluorophore using
appropriate filters or by alteration of the fluorescence lifetime of the donor. Two
fluorophores that are commonly used are variants of GFP: cyan fluorescent protein (CFP) and
yellow fluorescent protein (YFP) (Tsien, 1998). The potential of FRET is considerable, for
two reasons (Phizicky et al., 2003). First, it can be used to make measurements in living
cells, which allows the detection of protein interactions at the location in the cell where
they normally occur in the presence of the normal cellular
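The distance dependence of FRET described above can be made concrete with the Förster
relation, E = 1/(1 + (r/R0)^6), where r is the donor–acceptor separation and R0 is the
Förster radius at which transfer is 50% efficient. The Python sketch below is illustrative
only: the function name is hypothetical, and the default R0 of 49 Å is an assumed,
approximate value for a CFP–YFP pair rather than a figure taken from the studies cited here.

```python
def fret_efficiency(r_angstrom: float, r0_angstrom: float = 49.0) -> float:
    """Return the FRET efficiency E = 1 / (1 + (r / R0)^6).

    r_angstrom  -- donor-acceptor separation in angstroms
    r0_angstrom -- Forster radius in angstroms (assumed value, for illustration)
    """
    if r_angstrom < 0 or r0_angstrom <= 0:
        raise ValueError("distances must be non-negative and R0 must be positive")
    return 1.0 / (1.0 + (r_angstrom / r0_angstrom) ** 6)


if __name__ == "__main__":
    # Efficiency at a few separations, including the 60 angstrom limit quoted in the text.
    for r in (30.0, 49.0, 60.0, 100.0):
        print(f"r = {r:5.1f} A  ->  E = {fret_efficiency(r):.3f}")
```

Under these assumptions the efficiency falls steeply once r exceeds R0, which is why a
measurable FRET signal implies that the two tagged proteins lie within a few nanometres of
each other.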
change in environmental conditions on particular metabolites (Verdonk et al., 2003). Sample
preparation is focused on isolating and concentrating the compound of interest to minimize
detection interference from other components in the original extract. Metabolite profiling
refers to a qualitative and quantitative evaluation of metabolite collections, for example
those found in a particular pathway, tissue or cellular compartment (Burns et al., 2003).
Finally, metabolic fingerprinting focuses on collecting and analysing data from crude
extracts to classify whole samples rather than separating individual metabolites (Johnson
et al., 2003; Weckwerth, 2003).
In stark contrast to transcriptomics and proteomics, metabolomics is mainly
species-independent, which means that it can be applied to widely diverse species with
relatively little time required for re-optimizing protocols for a new species. Metabolite
profiling can monitor variation in the accumulation of metabolites in plant cells in culture
which are ectopically expressing transcription factors, as a hypothesis-generating tool to
establish the possible pathways regulated by particular regulatory proteins. The first step
consists of generating a transgenic cell line expressing the regulator from a constitutive
or inducible promoter. The second step is to subject extracts from transformed and control
cells to various metabolic profiling approaches to determine the qualitative and
quantitative differences in metabolite accumulation. A more practical approach than
monitoring and purifying individual metabolites is to profile hundreds or thousands of small
molecules biochemically and to screen for changes in the relative levels of these compounds.
By comparing two conditions, a profile of the differences can be obtained that is then used
as a blueprint to identify the individual compounds affected (Dias et al., 2003). The
immense chemical diversity of small biomolecules makes comprehensive metabolome screens
difficult. The lack of unifying principles such as genetic codes that would assist molecule
identification, comparison and causal connection is another important challenge (Breitling
et al., 2006).
The global study of the structure and dynamics of metabolic networks has been hindered by a
lack of techniques that identify metabolites and their biochemical relationships in complex
mixtures. Recent advances in ultra-high mass accuracy MS provide two advantages that can
enable ab initio determination of metabolic networks: (i) the ability to identify molecular
formulae based on exact masses; and (ii) the inference of biosynthetic relationships between
masses directly from the mass spectrum. Mass spectrometers with the necessary performance
parameters (mass accuracy around 1 ppm and resolution above 100,000 m/Δm) are now within the
reach of many researchers and will change the way we think about metabolomics (Breitling
et al., 2006). The recent application of Fourier transform ion cyclotron resonance MS
(FTICR-MS) to metabolomic analysis suggests a way to tackle the problem. A lower-cost
alternative to high-field FTICR-MS, the Orbitrap mass analyser, promises accelerated
activity in this area. These two analysers are able to achieve high resolution and mass
accuracy in the 1-ppm range for biomolecular samples. In both instruments, the ionized
metabolite mixture is trapped in an orbital trajectory. The frequency of the ions' orbit
depends on their mass-to-charge ratio and can be measured precisely, which is the basis of
the exceptional accuracy. In FTICR-MS, trapping is achieved in a strong magnetic field which
exerts a force on the charged particles that is perpendicular to their direction of motion
and thus confines them to a circular path. The Orbitrap needs no magnetic field: ions are
trapped in a radial electric field between a central and an outer cylindrical electrode.
Theoretically, the resolving power of the FTICR-MS and Orbitrap is sufficiently high to
resolve even the most complex metabolite mixtures using direct infusion.
Gas chromatography (GC)-MS or LC-MS is the tool of choice for generating high-throughput
data for identification and quantification of small-molecular-weight metabolites
(Weckwerth, 2003). Capillary electrophoresis (CE) is an alternative method which separates
particular types of
compound more efficiently and can be coupled with MS or other types of detectors. NMR,
infrared (IR), ultraviolet (UV) and fluorescence spectroscopy can be used as alternative
means of detection, often in parallel with MS (Weckwerth, 2003). TOF MS technology has also
been used in metabolite analysis and provides a means of high sample-throughput. In the end,
a combination of methods enables analysis of a broad range of metabolites.
NMR is a spectroscopic technique that exploits the magnetic properties of the atomic
nucleus (Macomber, 1998). In NMR, the sample is immersed in a strong external magnetic field
and transitions between the nuclear magnetic energy levels are induced by a suitably
oriented radiofrequency field. In theory, any molecule containing one atom with a non-zero
nuclear spin (I) is potentially visible by NMR. Considering the isotopes with a non-zero
nuclear spin such as ¹H, ¹³C, ¹⁴N, ¹⁵N and ³¹P, all biological molecules have at least one
NMR signal. There is wide variation in the sensitivity of the experiment for different
nuclei, hence ¹H NMR remains the best choice for metabolite profiling by NMR mainly due to
its natural abundance (99.98%) and sensitivity (Moing et al., 2007). The NMR spectrum
generally consists of a series of discrete lines (resonances) which are characterized not
only by the familiar spectroscopic quantities of frequency (chemical shift), intensity and
line shape, but also by relaxation times. Although less sensitive than GC-MS or LC-MS,
proton NMR spectroscopy is a powerful complementary technique for the identification and
quantitative analysis of plant metabolites either in vivo or in tissue extracts (Krishnan
et al., 2005). Typically, 20–40 metabolites have been identified in metabolite profiling of
plant extracts and the number of metabolites quantified can be increased with higher field
strength (increasing spectral resolution) and by using microprobes for small quantity
samples together with cryogenic probe heads (increasing sensitivity). One of the main
advantages of ¹H-NMR is that structural and quantitative information can be obtained on
numerous chemical species with a wide range of concentration in a single NMR experiment with
excellent reproducibility.
HPLC and GC are the most widely used analytical techniques for the separation of small
metabolites. GC is used to separate compounds on the basis of their relative vapour pressure
and affinities for the stationary phase in the chromatographic column. It offers very high
chromatographic resolution but requires chemical derivatization for many biomolecules: only
volatile chemicals can be analysed without derivatization. Some large and polar metabolites
cannot be analysed by GC. GC tends to give much greater chromatographic resolution than HPLC
but has the disadvantage of being limited to compounds that are volatile and heat stable. A
big advantage of GC is that it can be easily combined with MS, which greatly increases its
utility for multi-component profiling because of its inherent high specificity, high
sensitivity and positive peak confirmation (Dias et al., 2003). HPLC is a form of column
chromatography used frequently in biochemistry and analytical chemistry. It is used to
separate components in a mixture by using a variety of chemical interactions between the
substance being analysed (analyte) and the chromatography column. Compared to GC, HPLC has
lower chromatographic resolution but it does have the advantage that a much wider range of
analytes can potentially be measured.
The generation of reproducible and meaningful metabolomic data requires great care in the
acquisition, storage, extraction and preparation of samples (Fiehn, 2002). The true
metabolic state of samples must be maintained and additional metabolic activity or chemical
modification after collection must be prevented. Depending on the type of sample and the
analysis performed, this can be achieved in various ways. The most common strategies are
freezing in liquid nitrogen, freeze-drying and heat denaturation to halt enzymatic activity
(Fiehn, 2002). Metabolomic experiments are typically conducted by comparing experimental
plants possessing an expected metabolic modification (i.e. because of the introduction of a
transgene or exposure to a particular treatment) to control plants. Statistically
is accounted for. Map comparisons between closely related species are largely unaffected
because most duplications pre-date them. Comparative maps lay the groundwork for asking
whether specific linkage blocks or gene arrangements are statistically associated with
increased fitness, or whether there is a relationship between polyploidy and plant
adaptation. For example, comparative linkage mapping and chromosome painting in the close
relatives of Arabidopsis have inferred an ancestral karyotype of these species. In addition,
comparative mapping to Brassica has identified genomic blocks that have been maintained
since the divergence of the Arabidopsis and Brassica lineages (Schranz et al., 2007).
An example: Arabidopsis–tomato comparative map
DEVELOPMENT OF ARABIDOPSIS–TOMATO COMPARATIVE MAP TO DETECT MACROSYNTENY. Fulton et al.
(2002) identified over 1000 conserved orthologous sequences (COS) between tomato and
Arabidopsis by comparison of Arabidopsis genomic sequence with 130,000 tomato ESTs
(representing 27,000 unigenes or approximately 50% of the tomato gene content). Of the 1025
COS markers developed, 927 were screened against tomato DNA using Southern analysis to
classify them as single, low or multiple copy, among which 85% were considered to be single
or low copy (> 95% hybridization signal assigned to three or fewer restriction fragments)
and 50% matched a gene of unknown function (Gene Ontology classification). A total of 550
COS markers were mapped on to the tomato genome. The size of conserved segments was
generally smaller than 10 cM. Results indicated that multiple polyploidization events
punctuate the evolution of Arabidopsis and tomato. Distinguishing orthologues from
paralogues is difficult due to reciprocal loss of genes and chromosome segments following
polyploidization events.
PHYLOGENETIC ANALYSIS OF CHROMOSOMAL DUPLICATION EVENTS TO DETECT MICROSYNTENY. The
Arabidopsis genome sequence was used to analyse internal duplication events based on
inferred protein matches between 26,028 genes. A total of 34 non-overlapping chromosomal
segment pairs were identified consisting of 23,177 (89%) Arabidopsis genes (Bowers et al.,
2003b). To relate this alpha duplication to the angiosperm family tree, all duplicated
syntenic Arabidopsis gene pairs were compared to individual genes from pine, rice, tomato,
Medicago, cotton and Brassica. It was determined whether inferred protein sequences were
from duplicated syntenic gene pairs. Arabidopsis genes were more similar to one another than
to the heterologous protein in another species.
RELATIVE AGE OF CHROMOSOMAL DUPLICATION EVENTS. It was concluded that the alpha duplication
event pre-dated divergence from Brassica about 14.5–20.4 million years ago but post-dated
divergence from cotton about 83–86 million years ago.
About 50% (49–64%) of Brassica sequences were more similar to one duplicated Arabidopsis
sequence than was the other Arabidopsis sequence to its paralogue. Only 6–19% of cotton,
rice, pine, etc. sequences clustered internally to the Arabidopsis syntenic duplicates
(Bowers et al., 2003b).
POLYPLOID ANCESTRY OF MOST PLANT SPECIES. As more data accumulate, the history of
angiosperms emerges as a history of genome-wide duplication followed by massive gene loss
(and return to diploidy). Only 30% of Arabidopsis genes have retained syntenic copies in
less than 86 million years since the alpha duplication. In contrast, mammals appear to
harbour fewer polyploidization events and less cycling of duplicated genes; 70% of human and
mouse proteins show conserved synteny after 100 million years of evolution.
3.5.2 Collinearity
Orthology and paralogy
Figure 3.9 shows the concepts of orthology and paralogy. Orthologues and paralogues are two
types of homologous sequence.
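The distinction can be stated operationally: two homologous genes are orthologues if their
most recent common ancestor in the gene tree is a speciation event, and paralogues if it is
a duplication event. The Python sketch below applies this rule to a toy gene tree. The gene
names, the tree and the helper functions are hypothetical and chosen only for illustration;
they are not taken from Fig. 3.9.

```python
# A toy gene tree. Internal nodes are (event, left, right) tuples, where event is
# "speciation" or "duplication"; leaves are gene names (hypothetical identifiers).
TOY_TREE = (
    "duplication",
    ("speciation", "At_gene1a", "Os_gene1a"),
    ("speciation", "At_gene1b", "Os_gene1b"),
)


def leaves(node):
    """Return the set of gene names found below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def last_common_event(node, gene_a, gene_b):
    """Return the event label of the deepest node that contains both genes."""
    if isinstance(node, str) or gene_a == gene_b:
        raise ValueError("expected two different genes from the tree")
    event, left, right = node
    for child in (left, right):
        # Descend while a single subtree still holds both genes.
        if not isinstance(child, str) and {gene_a, gene_b} <= leaves(child):
            return last_common_event(child, gene_a, gene_b)
    if {gene_a, gene_b} <= leaves(node):
        return event
    raise ValueError("both genes must be present in the tree")


def classify_pair(tree, gene_a, gene_b):
    """Classify a pair of homologues as 'orthologues' or 'paralogues'."""
    event = last_common_event(tree, gene_a, gene_b)
    return "orthologues" if event == "speciation" else "paralogues"


if __name__ == "__main__":
    print(classify_pair(TOY_TREE, "At_gene1a", "Os_gene1a"))
    print(classify_pair(TOY_TREE, "At_gene1a", "At_gene1b"))
```

In this toy tree the two copies within one species are paralogues, and so are copies from
different species that trace back to the duplication rather than to the speciation. This is
one reason why reciprocal gene loss after polyploidization makes orthologues hard to
distinguish from paralogues.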
gene families and, accordingly, it is often difficult to determine if a gene mapped in the
second species is orthologous or paralogous to that in the first species. Fourthly, the
collinearity of gene order and content observed at the recombinational map level is often
not observed at the level of local genome structure (Bennetzen and Ramakrishna, 2002).
Finally, in most early studies, no statistical analysis was used to evaluate whether the
presence of a few markers in the same order on two chromosomal segments in two species
occurs by chance or is truly significant.
The genome collinearity of several Camelineae and Brassicaceae species has recently been
compared to that of A. thaliana by comparative genetic linkage mapping and comparative
chromosome painting (Schranz et al., 2007). A comprehensive study identified 21 syntenic
blocks that are shared by Brassica napus and A. thaliana genomes, corresponding to 90% of
the B. napus genome (Parkin et al., 2005).
Microcollinearity
Comparing molecular marker information from other cereals against the rice genome sequence
as the reference revealed many more rearrangements than had been expected from Gale and
Devos's (1998) concentric circles model. One such comparison involved more than 2600 mapped
sequenced markers in maize among which only 656 putative orthologous genes could be
identified (Salse et al., 2004). The comparison of the wheat genetic map with the rice
sequence also suggests numerous rearrangements between the two genomes with a high frequency
of breakdowns in collinearity (Sorrells et al., 2003). Extensive comparisons have also been
made between sorghum and rice (Klein et al., 2003; The Rice Chromosome 10 Sequencing
Consortium, 2003). To align the sorghum physical map with the rice map, sorghum BAC clones
were selected from the minimum tiling path of chromosome 3. Unique partial sequences were
obtained from each BAC clone and could be directly compared with the rice sequence. This
approach revealed excellent conservation between the overall structure and gene order of
sorghum chromosome 3 and rice chromosome 1 but also indicated several rearrangements.
Together, these studies indicate a general conservation of large syntenic blocks within
cereals but with many more rearrangements and synteny breakdowns than originally
anticipated.
This trend is even more obvious when synteny is analysed at the sequence level.
Rearrangements may occur that involve regions smaller than a few centimorgans and would be
missed by most recombinational mapping studies. Comparative sequence analysis involving
large genomic segments can detect these rearrangements. Such analyses reveal the
composition, organization and functional components of genomes and provide insight into
regional differences in composition between related species. Recently, the sequencing of
genomic segments in the cereals has enabled microcollinearity across genes or gene clusters
to be investigated. Sequencing of the domestication locus Q in Triticum monococcum revealed
excellent collinearity with the bread wheat genetic map (Faris et al., 2003). Following the
sequencing of the leaf-rust-resistance locus Rph7 from barley, it was observed that this
locus is flanked by two HGA genes. The orthologous locus in rice chromosome 1 consists of
five HGA genes. In barley, only four of the five HGA genes are present, one is duplicated as
a pseudogene and six additional genes have been inserted in between the HGA genes. These six
genes have homologues on eight different rice chromosomes (Brunner et al., 2003). The most
striking rearrangement was revealed by the comparison of 100 kb around the Bronze locus of
two maize lines. Not only does the retrotransposon distribution differ between the two lines
but the genes themselves could also be different (Fu and Dooner, 2002). Comparison of the
low molecular weight glutenin locus between T. monococcum and Triticum durum also revealed
dramatic rearrangements: more than 90% of the sequence diverged because of retro-element
insertions and because different genes are present at this locus (Wicker
et al., 2003). Therefore collinearity can be lost very rapidly between two genomes from the
same species.
With the sequencing of long regions, several studies in cereals have demonstrated
incomplete microcollinearity at the sequence level. Song et al. (2002) identified
orthologous regions from maize, sorghum and two subspecies of rice. It was found that gross
macrocollinearity is maintained but microcollinearity is incomplete among these cereals.
Deviations from gene collinearity are attributable to micro-rearrangement or small-scale
genomic changes such as gene insertions, deletions, duplications or inversions. In the
region under study, the orthologous region was found to contain six genes in rice, 15 in
sorghum and 13 in maize. In maize and sorghum, gene amplification caused a local expansion
of conserved genes but did not disrupt their order or orientation. As indicated by Bennetzen
and Ma (2003), numerous local rearrangements differentiate the structures of different
cereal genomes. On average, any comparison of a ten-gene segment between rice and a distant
grass relative such as barley, maize, sorghum or wheat shows one or two rearrangements that
involve genes. A simple extrapolation to the rice genome of about 40,000 genes (Goff et al.,
2002) suggests that about 6000 genic rearrangements occurred which differentiate rice from
any of the other cereals. Most of these rearrangements appear to be tiny and thus would not
interfere with the macrocollinearity observed by recombinational mapping. There are
exceptions, however, which include chromosomal arm translocations and movements of single
genes to different chromosomes (Bennetzen and Ma, 2003).
As expected, there is a high degree of gene conservation between the two shotgun-sequenced
subspecies of rice, japonica and indica, which diverged more than 1 million years ago. On
careful inspection, however, narrow regions of divergence can be found in these genomes
(Song et al., 2002). These regions correspond to areas of increased divergence among rice,
sorghum and maize, suggesting that the alignment of the two rice subspecies might be useful
for identifying regions of cereal genomes that are prone to rapid evolution. Similar
comparative analyses of Arabidopsis accessions have shown that both the relocation of genes
and the sequence polymorphisms between accessions (in both coding and non-coding regions)
are common in the Arabidopsis genome (The Arabidopsis Genome Initiative, 2000). Intraspecific
violation of collinearity has also been identified in maize (Fu and Dooner, 2002). Han and
Xue (2003) also discovered significant numbers of rearrangements and polymorphisms when
comparing indica and japonica genomes in rice. The deviations from collinearity are
frequently due to insertions or deletions. Intraspecific sequence polymorphisms commonly
occur in both coding and non-coding regions. These variations often affect gene structures
and may contribute to intraspecific phenotypic adaptations.
Implications of genome collinearity
Genomics would be much simpler if the order of genes were common (syntenic) across the
major groups of plants. The usefulness of the collinearity between the genomes of model
plants and important crops can be assessed by the number of failures or successes in its
exploitation. For example, the analysis of the Arabidopsis sequence provides information
that will facilitate the annotation of the rice sequence and likewise sequencing Medicago
provides a resource for research on important crop legumes. Furthermore, the effort put into
sequencing and annotating the rice genome has also been rewarded, as this annotation will be
transferred to related sequences and used repeatedly in the future. The synteny between the
monocots will help decipher the structure and function of the more complex genomes. A fully
assembled rice sequence allows more accurate assessment of the macro- and microsynteny of
rice with other cereals (Xu et al., 2005).
The advent of technologies for mapping genomes directly at the DNA level has made
comparative genetic mapping among sexually incompatible species possible. Extensive
comparative maps for marker
genes have been constructed for a number of plant taxa, including species in the Poaceae (rice, maize, sorghum, barley and wheat), Solanaceae (tomato, potato and pepper) and Brassicaceae (Arabidopsis, cabbages, mustard, turnip and rape). As a result, the concept of a single genetic or ancestral map for all grasses, with species-specific modifications, is emerging (Moore et al., 1995). The extensive collinearity of wheat, rye, barley, rice and maize suggests that it may be possible to reconstruct a map of the ancestral cereal genome. These conserved gene orders and the possibility of sharing DNA probes and PCR primers across species will greatly extend the power of mapping analysis by facilitating the molecular analysis of the corresponding chromosomal regions in different species and allowing information, and perhaps DNA sequences and genes, to be transferred quickly and efficiently between different species.
Meeting the challenge of finding which map, sequence and eventually functional genomic information from one species can be accessed, compared and exploited across all plant species will require the identification of a subset of plant genes that have remained relatively stable in both sequence and copy number since the radiation of flowering plants from their last common ancestor. Identification of such a set of genes would also facilitate taxonomic and phylogenetic studies in higher plants that are presently based on a very small set of highly conserved sequences, such as those of chloroplast and mitochondrial genes. The conserved orthologue set of markers, identified computationally and experimentally, may further studies on comparative genomics and phylogenetics and elucidate the nature of genes conserved throughout plant evolution.
Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. For example, Feltus et al. (2006) designed 384 PCR primers to conserved exonic regions flanking introns using sorghum and millet EST alignments to the rice genome. These conserved-intron scanning primers (CISP) amplified single-copy loci with 37–80% success rates, sampling most of the approximately 50 million years of divergence among grass species. When evaluating 124 CISPs across rice, sorghum, millet, Bermuda grass, teff, maize, wheat and barley, about 18.5% of them seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Likewise, about 487 conserved non-coding sequence motifs were identified in 129 CISP loci. As pointed out by Feltus et al. (2006), CISP provides the means to effectively explore poorly characterized genomes for both polymorphism and non-coding sequence conservation on a genome-wide or candidate gene basis and also provides anchor points for comparative genomics across a diverse range of species. After the whole genomes of the major food crops have been sequenced, plant breeders will be able to access new gene tools that will facilitate the selection of outstanding individuals characterized by resistance to biotic and abiotic stresses and good seed quality, thus enabling breeders to produce new cultivars in addition to those currently available.
As a fundamental tool in biology, comparative analysis has been extended from being focused on a specific field to biology as a whole. With the growing availability of phenotypic and functional genomic data, comparative paradigms are now also being extended to the study of other functional attributes, most notably gene expression. Microarray techniques present an alternative method of studying differences between closely related genomes. Advances in microarray-based approaches (see Section 3.6) have enabled the main forms of genomic variation (amplifications, deletions, insertions, rearrangements and base-pair changes) to be detected using techniques that can easily be undertaken in individual laboratories using simple experimental approaches (Cresham et al., 2008).
Tirosh et al. (2007) reviewed recent studies in which comparative analysis was applied to large-scale gene expression databases and discussed the central principles and challenges of such approaches. As different functional properties often co-evolve and complement one another, their combined analysis reveals additional insights. Unlike sequence-based genetic map information,
however, most functional properties are condition-dependent, a property that needs to be accounted for during interspecies comparisons. Furthermore, functional properties often reflect the integrated function of multiple genes, calling for novel methods that allow network-centred rather than gene-centred comparisons. Finally, one of the main challenges in comparative analysis is the integration of different data types, which is becoming particularly important as additional data types are being accumulated. The lack of appropriate descriptors and metrics that succinctly represent the new information originating from genomic data is one of the roadblocks on this path. Galperin and Koller (2006) outlined recent trends in comparative genomic analysis and discussed some new metrics that have been used. This issue is related to the ontology concept and is discussed in detail in Chapter 14.
3.6 Array Technologies in Omics
It is widely believed that thousands of genes and their products (i.e. RNA and proteins) in any given living organism function in a complicated and orchestrated manner. However, traditional methods in molecular biology generally work on a 'one gene in one experiment' basis, which means that the throughput is very limited and the whole picture of gene function is difficult to obtain. In the late 1990s, a new technology known as a biochip or DNA microarray attracted great interest among biologists. This technology promised to monitor the whole genome on a single array so that researchers would have a better picture of the interactions among thousands of genes at the same time.
Various terminologies have been used in the literature to describe this technology; for DNA microarrays these include, but are not limited to, biochip, DNA chip, DNA microarray and gene array. Affymetrix, Inc. owns a registered trademark, GeneChip, which refers to its high-density, oligonucleotide-based DNA arrays. However, in some articles appearing in professional journals, popular magazines and on the Internet, the term gene chip(s) has been used as a general terminology that refers to DNA microarray technology. Depending on the type of molecules that are arrayed, microarrays can also be based on proteins, tissues or carbohydrates.
An array is an orderly arrangement of samples. It provides a medium for matching known and unknown molecular samples based on base-pairing (i.e. A-T and G-C for DNA; A-U and G-C for RNA) or hybridization and automating the process of identifying the unknowns. From its origin as a new technique for large-scale DNA mapping and sequencing and initial success as a tool for transcript-level analyses, microarray technology has spread into many areas by adapting the basic concept and combining it with other techniques. Microarray-based processes, either mature or under development, include transcriptional profiling, genotyping, splice-variant analysis, identification of unknown exons, DNA structure analysis, chromatin immunoprecipitation (ChIP)-on-chip, protein binding, protein–RNA interaction, chip-based comparative genomic hybridization, epigenetic studies, DNA mapping, re-sequencing, large-scale sequencing, gene/genome synthesis, RNA/RNAi synthesis, protein–DNA interaction, on-chip translation and universal microarrays (Hoheisel, 2006).
In this section, the basic procedures of arraying will be introduced and several major microarray technologies and platforms will be briefly described. The two volumes of DNA Microarrays (Kimmel and Oliver, 2006a, b) provide comprehensive coverage of all the related fields from technologies and platforms to data analysis. The reader is also referred to Zhao and Bruce (2003), Amaratunga and Cabrera (2004), Mockler and Ecker (2004), Subramanian et al. (2005), Allison et al. (2006), Hoheisel (2006) and Doumas et al. (2007).
3.6.1 Production of arrays
Complementary strands of DNA and nucleic acids in general can pair in a duplex via non-covalent binding. This fundamental characteristic is used in all DNA array techniques. Amaratunga and Cabrera (2004), Arcellana-Panilio (2005) and Doumas et al. (2007) describe the principles of DNA microarray technology and how the arrays are prepared and
used. First, two terms related to microarrays, probe and target, should be introduced. The gene-specific DNA spotted on to the array is referred to as the probe and the sample to be tested that will hybridize with the probe is referred to as the target. The same probe spotted on to the array can be repeatedly hybridized with many different targets (samples). An experiment using a single DNA chip can provide researchers with information on thousands of genes simultaneously, resulting in a dramatic increase in throughput. In GeneChips (http://www.affymetrix.com/) the probe array was designed using an optimal set of oligonucleotides selected using computer algorithms and manufactured using Affymetrix light-directed chemical synthesis. Fluorescent labels were used for hybridization and detection and the Affymetrix software suite was used for data analysis and database management. Figure 3.10 illustrates a flowchart showing
[Fig. 3.10 (image not reproduced). Labels recoverable from the flowchart: EST database or cDNA library; PCR inserts from EST clones; spotting; RNA 1 and RNA 2 from Treatment 1 and Treatment 2; hybridization; wash; dry; laser scanning; log-scale plots labelled Treatment 1 and Treatment 2 (axis ticks 1 to 10,000); mean fold-increase and mean fold-decrease compared to 0 h (axis ticks 1.0 to 10).]
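The probe and target terms introduced above rest on base-pairing: a labelled target is retained at a spot only if it carries a stretch complementary to the probe fixed there. The following is a minimal sketch of that idea, not a model of real hybridization. It assumes perfect Watson-Crick matches only and ignores temperature, buffer stringency and mismatch tolerance; the function names, probe names and sequences are all hypothetical.

```python
# Toy model of probe-target hybridization by Watson-Crick base pairing.
# Illustrative only: real hybridization tolerates some mismatches and
# depends on temperature and buffer, which this sketch ignores.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq` in an antiparallel duplex."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridizes(probe: str, target: str) -> bool:
    """True if the target contains a stretch perfectly complementary to the probe."""
    return reverse_complement(probe) in target.upper()


def detect(probes: dict, targets: list) -> dict:
    """For each named probe (array spot), report whether any target binds it."""
    return {
        name: any(hybridizes(seq, t) for t in targets)
        for name, seq in probes.items()
    }


if __name__ == "__main__":
    # Hypothetical spots and a single hypothetical labelled target.
    probes = {"spot_1": "ATGGCGTACG", "spot_2": "TTTTCCCCGG"}
    sample = ["GGCGTACGCCATAA"]
    print(detect(probes, sample))
```

Because the same probes dictionary can be passed to `detect` with any number of different target lists, the sketch also mirrors the point that one set of spotted probes can be hybridized with many different samples.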
numbers of identical DNA copies can be generated by growing them in bacteria.
The DNA spots on a microarray are produced either by synthesis in situ or by deposition of the pre-synthesized product. DNA synthesis in situ methods have largely been within the purview of commercial companies. In this method, 20–25-bp long gene-specific oligonucleotides are generated in situ on a silicon surface by combining a standard DNA synthesis protocol with phosphoramidite reagents modified with photolabile 5'-protecting groups (Doumas et al., 2007). The activation for oligonucleotide elongation is achieved using a mask (Affymetrix; http://www.affymetrix.com) or maskless (NimbleGen; www.nimblegen.com) method. Alternatively, the reagents can be delivered to each spot using ink-jet technology (Agilent; http://www.agilent.com). Ongoing research and development efforts ensure the optimum design of the DNA content and continued technological advancements enable the production of increasingly higher-density arrays.
Array content
The choice of DNA type to print is fundamental. The sequence of the cDNA could be several hundred to a few thousand base pairs long. The DNA spotted on oligonucleotide arrays consists of synthesized chains of oligonucleotides corresponding to part of a known gene or putative ORF; each oligonucleotide is usually about 25–70 bp long. In an oligonucleotide array, a gene is generally represented by several different oligonucleotides and they are carefully chosen for maximal specificity. Longer stretches of DNA such as those obtained from PCR of cDNA clones produce robust hybridization signals but less specificity. Short oligonucleotides (24–30 nt) have greater discrimination and are also suitable for assessing single-nucleotide changes. Long oligonucleotides (50–70 nt) afford an excellent compromise between signal strength and specificity and their use has increased among academic core facilities (Arcellana-Panilio, 2005). Choosing oligos corresponding to the 3' untranslated regions (3'UTR) increases the likelihood of their being specific and designing oligos close to the 3' end might also boost signal intensity.
Slide substrates
Glass microscope slides are the solid support of choice and they should be coated with a substrate that favours binding of the DNA. Development of substrates on atomically flat slide surfaces and minimum background for higher signal-to-noise ratios has contributed to the improvement of data quality (Arcellana-Panilio, 2005). Different versions of silane, amine, epoxy and aldehyde substrates which attach DNA by either ionic interaction or covalent bond formation are commercially available.
Arrays and spotting pins
The physical process of delivering the DNA to pre-determined coordinates on the array involves spotting pens or pins carried on a print head that is controlled in three dimensions by gantry robots with sub-micron precision. A total of 30,000 features of 90-µm diameter can easily be spotted on to a 25 × 75-mm slide with a maximum spotting density of over 100,000 features per slide. There are several DNA arraying technologies, including high speed robotic printing of DNA fragments on glass (usually PCR amplified cDNAs), high speed robotic printing of long oligonucleotides (70-mers; Agilent technology and many academic facilities), synthesis of oligonucleotides (25-mers) on microchips using photolithographic masks (Affymetrix GeneChips) and synthesis of oligonucleotides (25–70-mers) on microchips using maskless aluminium mirrors (NimbleGen GeneChips). Improvements in arraying systems have included shorter printing times and longer periods of walk-away operation. Arrayers are invariably installed within controlled-humidity cabinets to maintain an optimum environment for printing.
3.6.2 Experimental design
Careful experimental design is required to determine the type of array to run; how
many replicates to use; and which samples will be hybridized to obtain meaningful data amenable to statistical analysis, upon which sound conclusions can be drawn. A biological question must first be framed and a microarray platform then chosen, followed by a decision on biological and technical replicates and the design of a series of hybridizations.
Microarray experimental design is usually governed by the aim of the experiment. An important aspect of experimental design is deciding how to minimize variation, which can be thought of as occurring in three layers: biological variation, technical variation and measurement error. Replication is the easy answer to dealing with variation. To make the best use of available resources, it is important to know what to replicate and how many replicates to apply. Hybridization of two samples to the same slide is made possible by labelling each sample with chemically distinct fluorescent tags. This also provides the opportunity to make direct comparisons between samples of primary interest (Arcellana-Panilio, 2005). Using a common reference becomes more efficient when a large number of samples need to be compared. When an experiment is testing the effect(s) of multiple factors, a well-thought-out design is extremely critical so that resources are not wasted on eventually useless comparisons.
to 10–25 µg total RNA for cDNA spots and long oligonucleotide arrays. In some circumstances it becomes necessary to amplify the RNA in the sample to obtain adequate amounts for labelling and hybridization to an array.
To prepare the labelled sample, the first step is to purify mRNA from total cellular contents. There are several challenges involved: (i) mRNA accounts for only a small fraction (less than 3% of all RNA in a cell) so isolating mRNA in sufficient quantity for an experiment (1–2 µg) can be a challenge. Common mRNA isolation methods take advantage of the fact that most mRNAs have a poly-adenine, poly(A), tail. These poly(A) mRNAs can be purified by capturing them using complementary oligodeoxythymidine (oligo(dT)) molecules bound to a solid support such as a chromatographic column or a collection of magnetic beads. (ii) The more heterogeneous the cells, the more difficult it is to isolate mRNA specific to the study. (iii) Captured mRNA degrades very quickly and the mRNA has to be immediately reverse-transcribed into more stable cDNA (for cDNA microarrays). The reverse transcription reaction usually starts from the poly(A) tail of the mRNA and moves toward its head; such a reaction is described as oligo(dT)-primed.
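The first sample-preparation challenge is largely a matter of arithmetic. If mRNA makes up less than 3% of cellular RNA, obtaining the mRNA quantity needed for one experiment (read here as 1 to 2 µg) implies a correspondingly larger amount of total RNA. The short sketch below works this through under a deliberately optimistic assumption of perfect recovery. The function name is hypothetical, and the 3% figure is used as an upper bound on the mRNA fraction, so the results are lower bounds on the total RNA required.

```python
def min_total_rna_ug(mrna_needed_ug: float, mrna_fraction: float) -> float:
    """Least total RNA (in µg) that contains `mrna_needed_ug` of mRNA.

    Assumes perfect recovery of mRNA from total RNA, which real
    oligo(dT) capture does not achieve; treat the result as a lower bound.
    """
    if not 0 < mrna_fraction <= 1:
        raise ValueError("mrna_fraction must be in (0, 1]")
    return mrna_needed_ug / mrna_fraction


if __name__ == "__main__":
    # mRNA taken as at most 3% of total RNA.
    for needed in (1.0, 2.0):
        total = min_total_rna_ug(needed, 0.03)
        print(f"{needed:.0f} µg mRNA needs at least {total:.1f} µg total RNA")
```

On these assumptions, 1 to 2 µg of mRNA corresponds to at least roughly 33 to 67 µg of total RNA, which is one reason amplification of the RNA is sometimes necessary.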
3.6.4 Labelling
coloured fluor is used for each sample so that the two samples can be differentiated on the array.
The cDNA or mRNA can be labelled either directly or indirectly. In the direct labelling procedure, a fluorescently labelled nucleotide is incorporated into the cDNA product as it is being synthesized. With this method, a difference in the steric hindrance conferred by different label moieties causes some labelled nucleotides to be more efficiently used than others, producing a dye bias in which one sample is labelled at a higher level overall than the other. Cyanine 3 (Cy3) and cyanine 5 (Cy5) are large molecules that reduce reverse transcriptase efficiency of long transcripts and certain sequences. Cy3-nucleotide tends to be incorporated at a higher frequency than Cy5 although this does not necessarily translate into a better labelled target. To prevent the dye bias, the indirect labelling approach was developed where RNA is reverse transcribed in the presence of an amino allyl-modified nucleotide that enables the chemical coupling of fluorescent labels after the cDNA is synthesized. If the coupling reaction goes to completion, the frequency of labelling becomes independent of the fluorophore (Arcellana-Panilio, 2005).
The labelled sample is the target for the experiment. The number of fluor molecules that label each cDNA depends on its length and also possibly its sequence composition. For an RNA sample, either total RNA or mRNA is typically isolated and labelled using a first strand cDNA synthesis step either by direct incorporation of a fluorescent dye or by coupling the dyes to a modified nucleotide. For non-expression-based experiments, DNA rather than RNA can be labelled and hybridized to the array.
3.6.5 Hybridization and post-hybridization washes
The array holds hundreds or thousands of spots, each of which contains a different DNA sequence. If a sample contains a cDNA whose sequence is complementary to the DNA on a given spot, that cDNA will hybridize to the spot where it will be detectable by its fluorescence. In this way, every spot on an array is an independent assay for the presence of a different cDNA. Hybridization is achieved by pouring the labelled sample on to the array and allowing it to diffuse uniformly. It is then sealed in a hybridization chamber and incubated at a specific temperature for a period of time sufficient to allow hybridization reactions to complete. The experimental conditions should ensure that all areas of the array are exposed to a uniform amount of labelled sample throughout the hybridization.
Hybridizations are processed directly on the slides after target synthesis. The hybridization step is literally where everything comes together, i.e. the labelled molecules find their complementary sequences on the array and form double stranded hybrids which are strong enough to withstand stringent washes. As in the hybridization of classical Southern and northern blots, the objective is to favour the formation of hybrids and the retention of those which are specific. Hybridization conditions depend on the length of probes arrayed on the slide and need to be extensively tested before analysis. As an example, probe melting temperatures range from 42 to 70°C depending on the nature of the buffer: the presence of formamide exerts a positive effect on buffer stringency in Denhardt-type buffers which are used at 42°C, whereas Sarkosyl-based buffers are commonly used around 70°C. Exogenous DNA (e.g. salmon sperm and Cot-1 DNA) reduces background by blocking areas of the slide with a general affinity for nucleic acid or by titrating out labelled sequences that are non-specific. Denhardt's reagent (containing equal parts of Ficoll, polyvinylpyrrolidone and bovine serum albumin) is also used as a blocking agent. Detergents such as SDS reduce surface tension and improve mixing while helping to lower background at the same time. Temperature is an important factor that can be manipulated during the hybridization and post-hybridization washes of microarrays and here much can be learned from
what has already been established for Northern or Southern blots. For microarrays to be useful as a means of quantifying expression, the target has to be present in limiting concentrations and the probe must be present in sufficient excess so as to remain virtually unchanged even after hybridization (Arcellana-Panilio, 2005). One important feature of fluorescence detection is that it allows the simultaneous hybridization of two to several targets that have been differently labelled.
The quality of the hybridization can be assessed by spotting the array with a set of hybridization control genes, spiking the labelled sample with a known amount of these controls prior to exposure to the array and verifying that these control genes are indeed showing up as having been hybridized.
3.6.6 Data acquisition and quantification
Once the wet phase (e.g. slide hybridization and washing off any excess labelled sample) is completed, the signal from each hybridized target can be captured; that is, the array must be scanned to determine how much of each labelled sample is bound to each spot. The signal is acquired using array scanners, either a charge-coupled device (CCD) or a confocal microscope, typically equipped with lasers to excite the fluorophores at a specific wavelength and photo-multiplier tubes to detect the emitted light. Spots with more bound sample will have more reporters and will therefore fluoresce more intensely. Whatever the scanner resolution, the microarray spot diameter needs to be five to ten times larger than the scanner resolution, which can be as little as 5 µm for the most recent models. The end-product of a microarray experiment is a scanned grey scale image whose intensity measurements range from 0 to 2^16. The image is usually stored in a 16-bit tagged image file format (TIFF). The most basic scanner models offer excitation and detection of the two most commonly used fluorophores (Cy3 and Cy5) whereas higher-end models enable excitation at several wavelengths and offer dynamic focus, linear dynamic range over several orders of magnitude and options for high-throughput scanning. The objective of the scanning procedure is to obtain the best image, where the best is not necessarily the brightest (to avoid over-saturation beyond the signal range) but is the most faithful representation of the data on the slide.
Although it is only supposed to pick up the light emitted by the target cDNAs bound to their complementary spots, the scanner will inevitably pick up light from various other sources, including the labelled sample hybridizing non-specifically to the glass slide, residual (unwanted) labelled probe adhering to the slide, various chemicals used in processing the slide and even the slide itself. This extra light creates background signals. Once signal and background values are clearly defined, which is specific to each experiment, data can be extracted from the image by counting the pixels within each probe and background area and recording this in a computer readable format.
Data extraction from the image involves several steps (Arcellana-Panilio, 2005): (i) gridding or locating the spots on the array; (ii) segmentation or assignment of pixels either to foreground (true signal) or background; and (iii) intensity extraction to obtain a value for foreground and background associated with each spot. Subtracting the background intensity from the foreground yields the true spot intensity which can be used as an approximation of relative gene expression.
3.6.7 Statistical analysis and data mining
Huge data sets are generated by microarray experiments. For example, 20 hybridization experiments with the Arabidopsis GeneChip generate a set of 2,624,000 data points (8200 genes × 16 oligonucleotides × 20 hybridizations). Such a massive amount of data prohibits any manual treatment. Also, experimental variability is generally significant and has to be managed in order
to exploit the data properly. Allison et al. (2006) examined five key components of microarray analysis: (i) design (the development of an experimental plan to maximize the quality and quantity of information obtained); (ii) pre-processing (processing of the microarray image and normalization of the data to remove systematic variation; other potential pre-processing steps include transformation of data, data filtering and, in the case of two-colour arrays, background subtraction); (iii) inference (testing statistical hypotheses, e.g. which genes are differentially expressed); (iv) classification (analytical approaches that attempt to divide data into classes with no prior information or into predefined classes); and (v) validation (the process of confirming the veracity of the inferences and conclusions drawn in the study).
Reproducible and reliable microarray results can only be achieved through quality control starting with data generation. Good laboratory proficiency and appropriate data analysis practices are essential (Shi et al., 2008). Numerous software packages, both free and commercial, are available for quantifying microarray data. Typically, the interpreted array data will highlight a relatively small number of spots that deserve further investigation. Alternatively, the overall pattern of profiling can be used as a fingerprint to characterize specific phenotypes.
The quantified data from the images are typically obtained as tab-delimited text files. First, dust artefacts, comet tails and other spot anomalies should be identified and flagged so that they will not enter the analysis. Pre-processing the quantified data before formal analysis includes the flagging of ambiguous spots with intensities lower than a threshold defined by the mean intensity plus two standard deviations of supposedly negative spots (no DNA, buffer and/or non-homologous DNA controls).
Interpreting the data from a microarray experiment can be challenging. Quantification of the intensities of each spot is subject to noise from irregular spots, dust on the slide and non-specific hybridization. Deciding the intensity threshold between spots and background can be difficult especially when the spots fade gradually around their edges. Detection efficiency might not be uniform across the slide, leading to excessive red intensity on one side of the array and excessive green on the other.
Data normalization addresses systematic errors that can skew the search for biological effects. One of the most common sources of systematic error is the dye bias introduced by the use of different fluorophores to label the target. Print-tip differences can also lead to sub-grid biases within the same array while scanner anomalies can cause one side of an array to seem brighter than the other. Normalization across multiple slides to remove bias can be accomplished by scaling the within-slide normalized data. In practice, examining the box plots of the normalized data of individual arrays for consistency of width can usually indicate whether normalization across arrays is required.
Spatial plots can locate background problems and extreme values. The shape and spread of scatter plots and the height and width of box plots give an overall view of data quality that can give clues about the effects of filtering and different normalization strategies. Gene expression profiling will be taken as an example for the rest of this section. Clustering algorithms are means of organizing microarray data according to similarities in expression patterns. The working assumption in this case is that co-expressed genes are co-regulated, and a logical follow-up to this analysis is the search for regulatory motifs and the common upstream or downstream factors that may tie these co-expressed genes together. Treatments can be clustered based on similarity in gene expression profiles. Genes can be clustered based on similarity in expression patterns across profiles. Two mathematical approaches are often used, hierarchical or k-means clustering (Stanford) and self-organizing maps (SOMs) (Whitehead Institute).
A strategy for identifying differentially expressed genes is to compute the t-statistic and correct for multiple testing using adjusted P-values. The B-statistic, derived using an empirical Bayes approach, has
been shown in simulations to be far superior to either log ratios or the t-statistic for ranking differentially regulated genes (Lonnstedt and Speed, 2002). The twofold change continues to be a benchmark for researchers perusing lists of microarray data in order to validate the data by PCR, which can provide independent confirmation of the expression patterns of specific genes. However, fold change has become more of a secondary criterion for the selection of candidates for follow-up from a list of genes ranked according to more reliable measures of differential expression (Arcellana-Panilio, 2005). After preliminary data mining and statistical analysis, validation and follow-up experiments can be designed.
There are many examples of the array technologies described in this section. In yeast, 260,000 oligonucleotides corresponding to all the genes in yeast have been synthesized on to a 1.28 cm² chip. These chips have allowed the identification of genes expressed in various mutants under different culture conditions or at different stages of growth. Numerous genes of unknown function have thus been recognized, regulated in a manner similar to or opposite to that of genes of known function; transcription of the genome is thus incorporated into a vast combinatorial network. In plants, Affymetrix has commercialized microchips to evaluate the expression of Arabidopsis genes, allowing the identification of genes that are active during pathogen infection or during treatment with herbicides, fungicides or insecticides. This also facilitates the determination of which genes are transcribed in which tissues under which conditions or during which stages of development. Commercial microarrays are also available from Affymetrix for several other crop plants such as maize and tomato.
3.6.8 Protein microarrays and others
A protein chip or microarray is a piece of glass on which different molecules of protein have been affixed at separate locations in an ordered manner to form a microscopic array. Compared with DNA microarrays, the development of protein-based approaches poses technical problems for several reasons (Bernot, 2004): (i) proteins consist of 20 distinct amino acids while there are only four bases in DNA; (ii) depending on their amino-acid composition, proteins may be hydrophilic, hydrophobic, acidic or basic (while DNA is always hydrophilic and negatively charged); and (iii) proteins are often post-translationally modified (by glycosylation, phosphorylation, etc.).
Although detection of protein microarrays can be carried out using general detection methods as described above, the problem is that protein concentrations in a biological sample may be many orders of magnitude different from that of mRNAs. Therefore, protein array detection methods must have a much larger range of detection. The preferred method of detection is currently fluorescence detection. Fluorescent detection is safe, sensitive and can produce high resolution. The fluorescent detection method is compatible with standard microarray scanners but some minor alterations to software may need to be made.
Protein microarrays have been made in the following manner (MacBeath and Schreiber, 2000; Bernot, 2004). Proteins are deposited on to a support and subsequently fixed to it. Thus 1600 distinct proteins may be arranged per cm². These arrays are ordered so that it is known which protein is represented by any given spot. The microarrays are then incubated with other ligands (fluorescently labelled) and the result of the hybridization is analysed by confocal microscopy (it is also possible to employ radioactively labelled ligands). The protein recognized may be identified using the signal localization data obtained. The intensity of the signal obtained is proportional to the level of ligand–protein interaction.
Apart from the most frequently used DNA and protein microarrays discussed above, other microarrays include those built using tissues (cells) and carbohydrates. Similar to other microarrays, a tissue chip or microarray is a piece of glass on which different tissues have been affixed, while
[Figure image not reproduced. Recoverable labels: base discrimination by primer extension in solution (genomic DNA; ddATP, ddCTP, ddGTP, ddTTP; D-form 5' gene-specific primer with L-form zip-code); molecule separation on zip-code array. Applications shown: genotyping; protein–dsDNA interactions; protein–ssDNA interactions; epigenetic studies; protein selection or attachment by aptamers; CGH; transcriptional profiling (primer extension and labelling of Sample 1 and Sample 2, followed by hybridization to zip-code array).]
Fig. 3.11. The concept of universal microarray. dsDNA, double-stranded DNA; SSDNA, single-stranded
DNA; CGH, comparative genomic hybridization.
data have led to the advent of high-density DNA oligonucleotide-based whole-genome tiling microarrays (WGAs) which can be employed to interrogate a full genome's worth of sequence data in a single experiment. This technology allows a more complete understanding of an organism's genomic content and should provide a dramatic improvement in the understanding of numerous biological processes. WGAs comprise relatively short (< 100-mer) oligonucleotide features. Furthermore, they can be created with > 6,000,000 discrete features, each comprising millions of copies
2006). Perlegen designed SNP-discovery arrays to include all possible SNP variations with multiple levels of redundancy.

Edwards et al. (2008) developed a microarray platform for rapid and cost-effective genetic mapping using rice as a model. In contrast to methods employing genome tiling microarrays for genotyping, the method is based on low-cost spotted microarray production, focusing only on known polymorphic features. A genotyping microarray was produced comprising 880 SFP elements derived from indels identified by aligning genomic sequences of the japonica cultivar Nipponbare and the indica cultivar 93-11. The SFPs were experimentally verified by hybridization with labelled genomic DNA prepared from the two cultivars. Using the genotyping microarrays, high levels of polymorphism were found across diverse rice accessions.

In soybean, the GoldenGate assay, which is capable of multiplexing from 96 to 1536 SNPs in a single reaction, has been tested to determine the success rate of converting verified SNPs into working arrays (Hyten et al., 2008). Allelic data were successfully generated for 89% of the 384 SNP loci when it was used in three recombinant inbred line (RIL) mapping populations. Using the same system, two panels of 1536 SNP markers have been developed in maize through collaboration between Cornell, CIMMYT and Illumina, one with SNPs developed from candidate genes relevant to drought tolerance and the other with SNPs randomly distributed on the maize genome (Yan et al., 2009).
4 Populations in Genetics and Breeding
these crosses are then genetically analysed. The mating design is as follows:

Parent    P1    P2    P3    ...    Pn
P1
P2
P3
...
Pn

A full diallel analysis will include all one-way hybrids and parents while a partial or incomplete diallel analysis may contain just half the diallel without reciprocals or parents. Diallel crosses are usually used to estimate general combining ability for the parents and special combining ability for specific crosses, providing information for producing hybrids.

NORTH CAROLINA DESIGNS. There are three North Carolina designs, denoted by NCI, NCII, and NCIII. These designs are most often used in cross-pollinated crops and to study broad-based populations. Their use in self-pollinated crops usually involves many inbred lines that can reasonably be considered to represent a large, reference population, e.g. late maturing soybean adapted to a geographical belt of USA. To simplify the description, however, inbreds are taken as an example.

NCI: two inbred lines are crossed to produce F2, and then some individuals are randomly selected from the F2 population as males to intermate with other randomly selected females. The offspring derived from this intermating will be used in genetic analysis. The design can be described as below:

Males    1    2    3

females, to produce crosses of all possible combinations.

Cultivar    1    2    3    ...    n1
n1+1
n1+2
n1+3
...
n1+n2

NCIII: n individuals are selected from an F2 population to backcross with two parents, P1 and P2:

F2 individual    1    2    3    ...    n
P1
P2

TRIPLE TESTCROSS (TTC) AND SIMPLIFIED TTC (sTTC). TTC is an extension of NCIII, where n individuals (n > 20) are selected from an F2 population to backcross with both parental lines, P1 and P2, and the F1 (P1 × P2):

F2 individual    1    2    3    ...    n
P1
P2
F1

In sTTC: n cultivars or strains (n > 20) are selected from the germplasm pool to cross with two cultivars or strains, PH and PL, which show extreme phenotypes (with the highest and lowest phenotypic values), respectively.

Strain    1    2    3    ...    n
PH
PL
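
The cross lists implied by these designs can be enumerated mechanically. The sketch below is illustrative only: the function names, the (female, male) tuple convention and the `reciprocals` / `include_parents` flags are assumptions made for this example and are not notation from the text. It lists the crosses for a diallel, for NCII (every male crossed to every female), and for NCIII and TTC (each F2 individual crossed to the testers P1 and P2, plus F1 for TTC).

```python
from itertools import combinations, permutations, product


def diallel(parents, reciprocals=True, include_parents=True):
    """List (female, male) pairs for a diallel among `parents`.

    reciprocals=False keeps only one direction of each cross (half diallel);
    include_parents=True adds the parental (selfed) entries.
    """
    if reciprocals:
        crosses = list(permutations(parents, 2))
    else:
        crosses = list(combinations(parents, 2))
    if include_parents:
        crosses += [(p, p) for p in parents]
    return crosses


def nc2(males, females):
    """NCII: every male is crossed to every female."""
    return list(product(females, males))


def nc3(f2_individuals, testers=("P1", "P2")):
    """NCIII: each F2 individual is backcrossed to both parents."""
    return list(product(f2_individuals, testers))


def ttc(f2_individuals):
    """TTC: NCIII extended with the F1 as a third tester."""
    return nc3(f2_individuals, testers=("P1", "P2", "F1"))


if __name__ == "__main__":
    parents = ["P1", "P2", "P3", "P4"]
    print(len(diallel(parents)))                      # 12 crosses + 4 parents
    print(len(diallel(parents, reciprocals=False,
                      include_parents=False)))        # half diallel
    f2 = [f"F2_{i}" for i in range(1, 21)]
    print(len(nc3(f2)), len(ttc(f2)))                 # 2n and 3n crosses
```

With n F2 individuals, NCIII produces 2n families and TTC produces 3n, which is why the number of sampled individuals drives the size of either experiment.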
Mather and Jinks (1982) or sections discussing quantitative genetics in plant breeding texts for details regarding the genetic information that can be derived from the study of hybrids or families formed using each of these mating designs. Some of these designs have also been used in genetic mapping of quantitative traits.

Inbreeding populations

This type of population includes segregating populations such as F2 and F3 populations which are derived from selfing or sibmating an F1 hybrid, BC populations that are derived from backcrossing the F1 to one of the parents or advanced BC populations derived by multiple backcrossings of the F1 to one of the parents.

Populations used in genetic studies and plant breeding can be derived from any of the mating designs discussed above. For breeding purposes, the sizes of populations that will be maintained can be much smaller than those used in genetic studies because breeders only need to retain the populations with desirable traits. For genetic studies, however, geneticists need to maintain as large a population as possible and all types of segregates including those with undesirable traits.

4.2 Doubled Haploids (DHs)

Cells or plants that contain a single complete set of chromosomes are called haploid. Haploids derived from diploids are called monoploid, while haploids derived from polyploids are called poly-haploid. Diploids produced from chromosome doubling of haploids are called doubled or double haploid (DH). The DH approach has several advantages that make it useful in genetics and plant breeding. DHs can be produced via in vivo and in vitro systems. Haploid embryos are produced in vivo by parthenogenesis, pseudogamy or chromosome elimination after extensive crossing. The haploid embryo is rescued and cultured, and chromosome doubling produces DHs. The in vitro methods include gynogenesis (ovary and flower culture) and androgenesis (anther and microspore culture). Forster et al. (2007) reviewed various approaches for haploid production in plants. Forster and Thomas (2004) and Szarejko and Forster (2007) reviewed the use of DHs in genetic studies and plant breeding. Recent reviews on specific crop species are available for tomato (Bal and Abak, 2007) and nutraceutical species (Ferrie, 2007).

4.2.1 Haploid production

There are several approaches to haploid production. Naturally occurring haploids have been reported in a number of species including tobacco, rice and maize. In barley, the hap initiator gene was reported to control haploidy and spontaneous haploids were recovered at high frequency (Hagberg and Hagberg, 1980), with up to 8% haploid offspring being recovered when a cultivar that was homozygous for the hap allele was used as the female parent to cross with other cultivars, but none were produced from the reciprocal cross. In maize, the indeterminate gametophyte gene (ig) results in a monoploid embryo either from the sperm cell or the egg cell (Kermicle, 1969). Although DHs can be recovered from such spontaneous haploids, their frequencies are usually too low for genetics and breeding purposes.

With the recognition of the importance of DHs in plant breeding, extensive efforts have been made to induce haploid embryogenesis and increase the frequency at which DHs can be recovered. The benefits of DHs have already been demonstrated in many research and breeding programmes. This progress has led to DH cultivars for commercial production and DH populations for genetics and breeding studies. In barley, over 100 cultivars have been released and similar numbers of rice and rapeseed DH cultivars have been listed (Forster and Thomas, 2004). DHs have also been used successfully in recalcitrant species such as oat (Kiviharju et al., 2005) and rye (Tenhola-Roininen et al., 2006).

Maluszynski et al. (2003) edited a manual presenting a set of protocols for the production of DH in 22 major crop plant species including four tree species. The manual contains various protocols and approaches to DH production that have been successfully used for different germplasm resources in each species. The protocols describe in detail all the steps in DH production, from donor plant growth conditions, through in vitro procedures, media composition and preparation to regeneration of haploid plants and methods for chromosome doubling. The manual enables the researcher to choose the most suitable method for production of DH for their particular laboratory conditions and plant materials, e.g. microspore versus anther culture, wide hybridization or gynogenesis. The manual also contains information on the organization of a DH laboratory, basic DH media and associated simple cytogenetic methods for ploidy level analysis. An excellent overview of haploid induction and the application of doubled haploids is provided for Brassicaceae, Poaceae and Solanaceae in Haploids in Crop Improvement II (Biotechnology in Agriculture and Forestry) edited by Palmer et al. (2005).

There are now five methods generally applicable to the production of haploids in plants with frequencies that are useful for genetics and breeding programmes (Palmer and Keller, 2005):

• Extensive hybridization crosses followed by chromosome elimination from one parent of a cross, usually the pollination parent.
• Gynogenesis: cultured unfertilized isolated ovules and ovaries of flower buds develop embryos from cells of the embryo sac.
• Androgenesis: cultured anthers or isolated microspores undergo embryogenesis or organogenesis directly or through intermediate callus.
• Parthenogenesis: development of an embryo by pseudogamy, semigamy or apogamy.
• Inducer-based approach: haploid-inducing lines are used to produce haploids.

Chromosome or genome elimination

Haploid embryos can be produced in plants after pollination by distantly related species. In most cases, normal double fertilization takes place to form a hybrid zygote and endosperm. Chromosome or genome preferential or uniparental elimination arises as a result of certain crosses; fertilization occurs but soon afterwards the genome of one parent is preferentially eliminated. Haploids can be produced by interspecific hybridization followed by chromosome elimination. In barley, this extensive hybridization method consists of crossing cultivated barley, Hordeum vulgare (2n = 2x = 14) with the wild, diploid cross-pollinated perennial Hordeum bulbosum (2n = 2x = 14). Most progeny (95%) are barley haploids, while the remainder is made up of diploid hybrids. This technique, called the bulbosum method, has been extensively utilized for the production of haploids in barley. Haploids can also be produced in hexaploid wheat (var. Chinese Spring) by chromosome elimination following hybridization of wheat with H. bulbosum (both 2x and 4x). Frequencies of 13.7% grain set with 2x bulbosum and 43.7% grain set with 4x bulbosum were obtained (Barclay, 1975). During formation of the embryo the chromosomes of H. bulbosum are eliminated. The immature embryos are cultured in vitro and plantlets from these monoploid embryos can be induced via an efficient chromosome doubling technique to produce fertile flowers bearing homozygous hexaploid seeds.

The production of embryos as a result of wheat × maize crosses was first reported by Zenkteler and Nitzsche (1984). Laurie and Bennett (1986) cytologically examined embryos produced via this system and found maize chromosomes to be preferentially eliminated during the first three cell divisions, leaving a haploid complement of wheat chromosomes. This method was used in wheat haploid production and applied with some success in generating genetic and mapping populations (Laurie and Reymondie, 1991). Mean frequencies of fertilization, embryo formation, embryo germination and haploid regeneration of 83, 20, 45 and 8%, respectively, have been reported (Chen et al., 1999). Significant differences in the percentage of embryo germination and haploid regeneration were observed among crosses suggesting that the efficacy of haploid production could be improved by
hybrid embryo development in plants: for example, differences in timing of essential mitotic processes due to asynchronous cell cycles or asynchrony in nucleoprotein synthesis leading to a loss of the most retarded chromosomes. Other hypotheses propose the formation of multipolar spindles, spatial separation of genomes during interphase and metaphase, parent-specific inactivation of centromeres and, by analogy with the host-restriction and modification systems of bacteria, degradation of alien chromosomes by host-specific nuclease activity. Gernand et al. (2005) provide evidence for a novel chromosome elimination pathway in wheat × pearl millet hybrids that involves the formation of nuclear extrusions during interphase in addition to post-mitotically formed micronuclei. They found that the chromatin structure of nuclei and micronuclei was different and that heterochromatinization and DNA fragmentation of micronucleated pearl millet chromatin was the final step during haploidization.

The mechanism of chromosome elimination in Hordeum hybrids was studied by Subrahmanyam and Kasha (1975) and Bennett et al. (1976) and the following conclusions were drawn: (i) normal double fertilization occurs in interspecific crosses as confirmed by cytological study; and (ii) after fertilization there is a gradual and selective elimination of H. bulbosum chromosomes from nuclei of endosperm as well as embryo cells so that eventually haploid embryos are produced. A sudden shortage of proteins in the developing embryo and endosperm and the better ability of vulgare chromosomes to form spindle attachments relative to bulbosum chromosomes may be responsible for elimination of the bulbosum chromosomes. Other possible causes such as differences in mitotic cycle, congression during mitosis, etc. were ruled out by the authors.

It has also been demonstrated that the elimination of bulbosum chromosomes is under genetic control (Subrahmanyam and Kasha, 1975). The above-mentioned authors used primary trisomics and monotelotrisomics in crosses with tetraploid H. bulbosum and concluded that both arms of chromosome 2 and the short arm of chromosome 3 of H. vulgare are responsible for chromosome elimination, although their effect may be neutralized or offset if a sufficient dose of bulbosum chromosomes is available.

Ovary culture or gynogenesis

Ovary culture involves production of a haploid individual by culture of unfertilized ovaries to obtain haploid plants from egg cells or other haploid cells of the embryo sac; the process is known as gynogenesis. Under the appropriate culture conditions the unfertilized cell of the embryo sac develops into an embryo by as yet unknown mechanisms. Haploid plants generally originate from egg cells in most species (in vitro parthenogenesis) but in some species, e.g. rice, they arise chiefly from the synergids; in at least Allium tuberosum even antipodal cells produce haploid plants (in vitro apogamy) (Mukhambetzhanov, 1997).

Gynogenesis may occur either via embryogenesis or plantlet regeneration from callus. In rice 2-methyl-4-chlorophenoxyacetic acid (MCPA) generally leads to a small amount of protocorm-like callus formation from which shoots and roots regenerate, while picloram promotes embryo regeneration. In contrast, sugarbeet usually shows embryo development while in sunflower embryos regenerate following a callus phase. In general, regeneration from a callus phase appears, at least for the present, to be easier than direct embryogenesis.

Generally, gynogenesis has two or more stages and each stage may have distinct requirements. In rice, two stages, i.e. induction and regeneration, are recognized. During induction, ovaries are floated on a liquid medium containing low auxin levels and kept in the dark, while for regeneration they are transferred on to an agar medium containing a higher auxin concentration and incubated in the light.

Depending on the species, unfertilized ovules, ovaries or flower buds can be cultured. In some members of the Chenopodiaceae, Liliaceae and Cucurbitaceae, gynogenesis is the main route to DH production (Palmer and Keller, 2005). Even where anther or microspore culture is successful, gynogenetic
haploids have been produced, e.g. in barley, maize, rice and wheat.

San Noeum (1976) was the first to demonstrate that gynogenesis can be induced under in vitro conditions. She obtained gynogenic haploids using an ovary culture of H. vulgare. Subsequently, success has been obtained with many species, e.g. wheat, rice, maize, tobacco, petunia, gerbera, sunflower, sugarbeets, onions, rubber, etc. About 0.2–6% of the cultured ovaries show gynogenesis and one or two plantlets, rarely up to eight, originate from each ovary.

Embryogenic frequency is low in many cases, but relatively high frequencies have been reported in some cases (Alan et al., 2003; Martinez, 2003). The rate of success varies considerably with species and is markedly influenced by explant genotype so that some cultivars do not respond at all. In rice, japonica genotypes are far more responsive than indica cultivars. In most cases, the optimum stage for ovule culture is the nearly mature embryo sac, but in rice ovaries at the free nuclear embryo sac stage are the most responsive.

The culture response is still genotype dependent (Alan et al., 2003; Bohanec et al., 2003). Generally, for culture of whole flowers, ovary and ovules attached to placenta respond better, but in gerbera and sunflower isolated ovules give a better response. Cold pretreatment (24–48 h at 4°C in sunflower and 24 h at 7°C in rice) of the inflorescence before ovary culture enhances gynogenesis.

The composition of the culture medium and stage of embryo sac development are important considerations for successful culture (Keller and Korzun, 1996). Growth regulators are crucial in gynogenesis and at higher levels they may induce callusing of somatic tissues and even suppress gynogenesis. Growth regulator requirements seem to depend on species. For example, in sunflower growth regulator-free medium is the best and even a low level of MCPA induces somatic calli and somatic embryos. But in rice, 0.125–0.5 mg l⁻¹ MCPA is optimal for gynogenesis. The sucrose level also appears to be critical; in sunflower 12% sucrose leads to gynogenic embryo production while at lower levels somatic calli and somatic embryos were also produced. Ovaries are generally cultured in the light but in some species at least, e.g. sunflower and rice, incubation in the dark favours gynogenesis and minimizes somatic callusing; in rice light may lead to the degeneration of gynogenic pro-embryos.

Ovary culture has two main limitations: (i) it is not successful in all species; and (ii) the frequency of responding ovaries and the number of plantlets per ovary are usually low. Therefore, anther culture is preferred over ovary culture; ovary culture assumes significance only in those cases where anther culture fails, e.g. in sugarbeet and for male sterile lines.

Anther culture or androgenesis

Anther culture or androgenesis is a process by which a haploid individual develops from a pollen grain. Anther culture is often the method of choice for DH production in crop plants (Sopory and Munshi, 1996). Good aseptic techniques are required but the methods are generally simple and applicable to a wide range of crops (Maluszynski et al., 2003). In general, haploid plants are generated in vitro from the microspores contained in the anther and require chromosome doubling treatments. The number of chromosomes in haploid plants can be doubled either naturally or by colchicine treatment.

The process involved in anther culture is poorly understood. Investigations have been hampered by the presence of the sporophytic anther wall that prevents direct access to the microspores contained within. This has become an important issue because although many species respond to anther culture, responsive genotypes can be a limiting factor, thus making it necessary to study, understand and manipulate the microspore embryogenesis in order to develop genotype-independent methods (Forster et al., 2007). Many factors influence the production of anther-culture-derived plants including the physiological status of the donor plants, pre-treatment of anthers, developmental stage of the pollen,
to other cereals such as wheat yielded a low frequency of green plants. Although a high frequency of green plants is produced for most barley crosses, androgenesis still poses some problems that need to be addressed. There are barley genotypes which are extremely recalcitrant to microspore division and/or with a high rate of albinism. The rate of embryogenesis is still low and poorly-developed embryos are formed very frequently. New methods are needed that reduce the cost of DH production and are effective for all genotypes.

Future objectives in plant androgenesis include the development of efficient androgenesis protocols for a wide range of genotypes, a better understanding of the biological processes involved in the stress pre-treatment, the study of the influence of different micronutrients on the induction of gametic embryogenesis and possible gametophytic selection. Identification of genetic loci associated with the anther culture response process will facilitate the understanding of the mechanisms underlying androgenesis. Identification and localization of molecular markers linked to the yield of green plants per anther and the evaluation of their potential use for the prediction of the anther culture response of genotypes will also help to optimize the production of DHs.

Semigamy

Semigamy is a form of parthenogenesis and occurs when the nucleus of the egg cell and the generative nucleus of the germinated pollen grain divide independently, resulting in a haploid chimera (a plant whose tissues are of two different genotypes). Semigamy is a type of facultative apomixis in which the male sperm nucleus does not fuse with the egg nucleus after penetrating the egg in the embryo sac. Subsequent development can give rise to an embryo containing haploid chimaeral tissues of paternal and maternal origins. In cotton, the semigametic phenomenon was first reported by Turcotte and Feaster (1963), who developed the Pima line 57-4 that produced haploid seeds at a high frequency. Currently semigamy is the only feasible means for production of haploids in cotton (Zhang and Stewart, 2004).

There are many examples of DH lines developed from cultivars and intra- and interspecific hybrids between upland cotton (Gossypium hirsutum L.) and American Pima cotton (Gossypium barbadense L.) using semigamy. The semigametic trait has also been transferred into different cotton cytoplasms to facilitate rapid replacement of nuclei. Stelly et al. (1988) proposed a scheme called hybrid elimination and haploid production system using a cotton strain with semigamy (Se), lethal gene (Le2dav), virescent (v7) and male sterility or glandless (gl2gl3).

Semigametic lines can produce 30–60% haploids when self-pollinated and about 0.7–1.0% androgenic haploids when used as female parents in crosses with normal non-semigametic cottons (Turcotte and Feaster, 1967). A unique feature of semigamy is that the inheritance of the gene is conveyed by both male and female gametes but expression of the trait in terms of haploid production occurs only in the female parent. As a consequence, for example, in reciprocal crosses between SeSe and sese parents, haploids will be produced only when SeSe or Sese is the female parent.

The results reported by Zhang and Stewart (2004) verified that semigamy in cotton is controlled by one gene, previously designated Se. The gene functions sporophytically and gametophytically resulting in an incomplete dominance mode of action. Consistent with the difference between the two parental isogenic lines, semigametic F2.3 lines had significantly lower chlorophyll content than non-semigametic F2.3 lines, an observation that was confirmed by a significant association between haploid production and chlorophyll content. The Se gene and the gene for reduced chlorophyll content could be either the same or closely linked.

Inducer-based approach

Haploid inducing lines have been used in maize to produce haploids by development of the unfertilized egg cells (Eder and
Chalyk, 2002). A haploid induction rate of up to 2.3% was detected by Coe (1959) in crosses with the inbred line Stock 6. A higher rate (about 6%) was later obtained by Sarker et al. (1994) and Shatskaya et al. (1994) in progenies of crosses between Stock 6 and Indian and Russian germplasm, respectively. Inducer lines are now available with haploid seed induction rates of 8–12% in temperate maize germplasm (Melchinger et al., 2005; Röber et al., 2005).

Segregation studies (Lashermes and Beckert, 1988; Deimling et al., 1997) and quantitative trait loci (QTL) analysis (Röber, 1999) demonstrated that in vivo haploid induction in maize is a quantitative trait under the control of an unknown large number of loci. Individual QTL explained only small parts of the genetic variation.

Compared with other methods of DH production such as anther culture, the inducer-based approach is rather efficient, less dependent on the genotype and can be practised in almost every maize breeding programme without access to expensive laboratory facilities (Röber et al., 2005; http://www.uni-hohenheim.de/ipspwww/350a/linien/indexl.html).

Requirements for in vivo DH production in practical breeding include: (i) availability of inducer genetic stocks; (ii) high induction rate; (iii) the inducer is a good pollinator; (iv) reproducibility with reasonable seed quantities; (v) availability of a marker system that is independent of the genetic background of the female and of environmental effects and can be used for effective and unambiguous identification of haploid kernels; and (vi) availability of an artificial chromosome doubling system with high doubling rates that is safe, simple and cost-effective.

Since the late 1990s, these requirements have been partially met in maize with: (i) inducer lines (e.g. RWS and UH400 developed at the University of Hohenheim) with improved induction rates of 10% or higher; (ii) a combination of two dominant markers (anthocyanin colour of endosperm and embryo for identification of haploids and anthocyanin coloration of stalk for identification of false positives in the field); and (iii) improved chromosome doubling systems using colchicine that gave a doubling rate of greater than 10%.

A scheme to show in vivo haploid induction includes the following steps:

• Creating new variation by intercrossing with selected lines.
• In vivo haploid induction in generation F1.
• Chromosome doubling of haploid seedlings:
  ◦ selection of haploid kernels;
  ◦ germination of kernels;
  ◦ cutting of coleoptile;
  ◦ doubling procedure: treatment of seedlings with colchicine;
  ◦ planting of treated seedlings in greenhouse;
  ◦ transplanting DH plants at the three-leaf stage to the field and selfing (generation D0); and
  ◦ formation of testcross hybrids.
• Evaluation of testcrosses in multi-environment yield trials (two stages).

4.2.2 Diploidization of haploid plants

As described above, haploids can be produced through various approaches. Haploid plants may grow normally under in vitro or greenhouse conditions up to the flowering stage, but viable gametes are not formed due to the absence of one set of homologous chromosomes and consequently, there is no seed set.

The only mechanism for perpetuating the haploids is by duplicating the chromosome complement in order to obtain homozygous diploids. In pollen-derived plants duplication of chromosomes may occur spontaneously in cultures. However, the spontaneous chromosome doubling rate of haploids is usually low. In maize, for example, the rate ranges from 0 to 10% (Chase, 1969; Beckert, 1994; Deimling et al., 1997; Kato, 2002). Therefore, it is necessary to diploidize the haploids by chemical means. Thus, artificial chromosome doubling (diploidization) is necessary for the efficient large-scale use of haploid plants.
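
The induction and doubling rates quoted in this section combine multiplicatively, which makes it easy to see why both matter for large-scale DH production. The following is a rough sketch rather than a breeding protocol: it assumes that induction and doubling are independent steps and ignores every other loss (misclassified kernels, germination failure, seedling mortality, failed selfing). The function names are hypothetical and were chosen for this example.

```python
import math
from fractions import Fraction


def dh_rate(induction_rate, doubling_rate):
    """Expected fraction of pollinated kernels that become doubled haploids.

    Assumes the two steps are independent; Fraction(str(x)) keeps decimal
    inputs such as 0.1 exact so the rounding below is not affected by
    floating-point error.
    """
    return Fraction(str(induction_rate)) * Fraction(str(doubling_rate))


def kernels_needed(target_dh_lines, induction_rate, doubling_rate):
    """Smallest kernel count whose expected DH yield reaches the target."""
    return math.ceil(target_dh_lines / dh_rate(induction_rate, doubling_rate))


if __name__ == "__main__":
    # 10% induction and 10% doubling, the approximate rates cited for
    # modern inducers and colchicine treatment.
    print(kernels_needed(100, 0.10, 0.10))  # 10000
```

Under these simplifying assumptions, 100 DH lines require on the order of 10,000 kernels from induction crosses, and halving either rate doubles that number.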
in the latter case can be either genetic or epigenetic in origin. Typical genetic alterations are: changes in chromosome numbers (polyploidy and aneuploidy), chromosome structure (translocations, deletions and duplications) and DNA sequence (base mutations).

GENETIC VARIATION ARISING FROM SOURCE PLANTS. The source plants used to initiate cultures are likely to be heterogeneous with respect to the state of differentiation, ploidy level and age. These explant-related factors will affect the genetic make-up of the cells produced in the culture and thus the callus arising from such a group of cells with diverse genetic make-up will inevitably lead to a mixed population of cells. Depending on the cell types from which the plants originate, those regenerated from such a genetically mosaic callus will undoubtedly be of different genetic make-up. Taji et al. (2002) indicated that such genetic mosaicness seems to occur commonly in polyploid plants rather than in diploids or haploids.

GENETIC VARIATION ARISING DURING CULTURE. Although a significant degree of genetic variability can be traced to the genetically heterogeneous cell types of explant at least in polyploid species, there is substantial evidence to indicate that much of the variability observed in regenerated plants stems from the culture process itself. Aneuploids, polyploids or cells with structurally altered chromosomes may arise in culture. Many differentiated cells, when induced to divide in culture, undergo endoduplication of chromosomes resulting in the production of tetraploid or octaploid cells with distinct phenotypes.

Various phenomena have been observed in tissue culture of various plant species which explain the production of cells with unusual ploidy levels (Bhojwani, 1990). Occurrence of multi-polar spindles due to failure of spindle formation during cell division is one of the contributing factors. Absence of spindle formation during mitosis results in the appearance of cells with doubled chromosome number while the formation of multi-polar spindles or chromosomes lagging at anaphase cause the development of cell lines with haploid, triploid or other uneven ploidy status.

Many studies have indicated that cryptic structural modification of individual chromosomes is more likely to cause somaclonal variation than modification induced by ploidy changes in many tissue-cultured plants. Chromosomal changes occurring during tissue culture include transposition of mobile genetic elements (transposons), chromosome breakage and repositioning of chromosome segments.

As summarized by Taji et al. (2002), several mechanisms have been proposed to explain the genetic variability that occurs in tissue culture. The most likely causes are:

1. Reduced regulatory control of mitotic events in culture: the ploidy status of plants generated from callus, cell suspension or protoplast cultures of certain species differs significantly despite the fact that the cultures originate from a highly homogeneous genetic background. This indicates a lack of tight regulation of cell-cycle-related controls during cell proliferation in culture.
2. Use of growth regulators: plant growth regulators, particularly synthetic auxins such as 2,4-D, are considered to be the major cause of genetic variability in culture. For example, cytokinins at low concentrations have been shown to reduce the range of ploidy in culture while low levels of both auxins and cytokinins appear to preferentially activate the division of cytologically stable meristematic cells enabling the regeneration of genetically uniform plantlets.
3. Medium components: some of the mineral nutrients influence the establishment of genetic variability in culture. For instance, by altering the levels of phosphate and nitrogen as well as the form of nitrogen in the medium, the genetic composition (ploidy level) of the cultured cells can be controlled to a considerable extent. A marked increase in chromosome breakage has been observed in plant cell cultures grown with different levels of magnesium or manganese.
4. Culture conditions: some culture conditions, such as incubation temperatures above 35°C and long duration of culture, have been implicated in inducing genetic variability in regenerated plants.

5. Inherited genomic instability: molecular studies indicate the existence of certain regions of the genome that are more susceptible to tissue-culture-induced structural alterations, although the reason for the increased susceptibility of these genomic loci, known as hot spots, is not fully understood.

CAUSES OF EPIGENETIC VARIATION IN TISSUE CULTURE. Any culture-induced changes which are stable but not heritable have frequently been considered as epigenetic variation. However, a greater understanding of genetic and epigenetic alterations in tissue culture in the recent past has led to a clear distinction between these two types of variation. For instance, genetic mutations occur randomly and at a much lower rate than epigenetic variations. Genetic changes are usually stable and heritable. Epigenetic variation may also lead to stable traits; however, reversal can occur at high rates under non-selective conditions. Epigenetic traits are often transmitted through mitosis in a stable manner but rarely through meiosis, and the level of induction of epigenetic traits is directly related to the selection pressure experienced by the cells. Epigenetic changes are generally assumed to reflect alteration in expression rather than in the information content of genes.

As Taji et al. (2002) summarized, the epigenetic variation observed in cultured cells or regenerated plants is mainly due to three cellular events: (i) gene amplification; (ii) DNA methylation; and (iii) increased activity of transposable elements. In plants, nearly 25% of the genome can be methylated at cytosine residues but the significance of this cytosine methylation is not apparent. It has been suggested that methylation (and demethylation) of DNA is one of the ways of controlling transcriptional activity and that this process can be affected by the tissue culture process. The non-heritable genetic variability observed in many tissue culture systems could thus be attributed to tissue-culture-induced methylation or demethylation of DNA. The activity of transposons and retrotransposons induced by tissue culture could also be responsible for some of the genetic and epigenetic variability observed in culture.

4.2.4 Quantitative genetics of DHs

DH lines that are derived randomly from an array of gametes produced by F1 plants are very useful in quantitative genetics. Compared with diploid genetic models for populations such as F2, F3 or BC, there are no dominance or dominance-related epistasis effects involved in the genetic model of DH populations. As a result, additive, additive-related epistasis and linkage effects can be investigated properly. As a permanent population, DH lines can be replicated as many times as desired across different environments, seasons and laboratories, providing endless genetic material for phenotyping and genotyping, particularly for understanding the genotype-by-environment interaction. In DH populations, the additive component of genetic variance is larger than that of diploid populations such as F2 and BC. Choo et al. (1985) discussed in detail the quantitative genetics associated with DH populations, including detection of epistasis, estimation of genetic variance components, linkage tests, estimation of gene numbers, genetic mapping of polygenes and tests of genetic models and hypotheses. Röber et al. (2005) compared the expected gain from selection for DH lines and other populations and the implications of epistatic effects, which are briefly described here.

Expected gain from selection

As is well known from quantitative genetics (see e.g. Falconer and Mackay (1996) and also Chapter 1), the expected gain from selection can be described by $G = i\,h_x\,r_G\,\sigma_y$, where $i$ is the selection intensity, $h_x$ the square root of the heritability of the selection
criterion, $r_G$ the genetic correlation between selection criterion and gain criterion and $\sigma_y$ the standard deviation of the gain criterion. In long-term breeding programmes, the decisive gain criterion for evaluating selection progress in hybrid breeding is the general combining ability (GCA) of the improved lines. At the beginning of a breeding cycle the test units are the DH lines per se and later on in the cycle their testcrosses.

Strong selection (large $i$) leads to a small effective population size and consequently to a loss of genetic variance due to random drift. To keep this loss within certain limits, a minimum number of lines should be recombined after each breeding cycle. This number depends on the inbreeding coefficient ($F$) of the candidate lines. The number should be ($2F$) times larger for inbred lines than for non-inbred genotypes. Assuming that S2 lines ($F = 0.75$) are recombined in conventional breeding, the number of DH lines ($F = 1$) would have to be increased 1:0.75 = 1.33-fold to preserve an equivalent level of genetic variation. This means that the selection intensity must be reduced accordingly when using DH lines.

In contrast to the selection intensity, $h_x$ and $r_G$ increase when using DH lines. This increase is particularly large in the first testcross stage. Neglecting epistasis, the GCA variance of inbred lines is equal to $\tfrac{1}{2} F \sigma_A^2$ (Falconer and Mackay, 1996), where $\sigma_A^2$ is the additive variance of the base population. Thus the GCA variance of DH lines is 1:0.75 = 1.33 times larger than that of S2 lines. This leads to better differentiation among the testcrosses and consequently to higher heritability. Seitz (2005) compared three sets of S2 and S3 lines each with DH lines derived from the same crosses and evaluated with the same testers in the same environments. On average, the estimated genetic testcross variances for grain yield (bu acre$^{-1}$) amounted to 50, 94 and 124 for S2, S3 and DH lines, respectively.

The genetic correlation between selection and gain criterion ($r_G$) also increases with the degree to which the tested lines have been inbred. For example, the correlation between S$_t$ lines and their homozygous progenies for GCA is equal to $\sqrt{F_t}$, whereas for DH lines this correlation is 1. Thus compared with S2 lines, the correlation of DH lines is $1:\sqrt{0.75} = 1.15$ times stronger.

Implications of epistatic effects

Epistatic gene action may positively or negatively affect hybrid performance (Lamkey and Edwards, 1999). In most cases, epistatic effects have been reported to cause a decrease in the testcross performance of segregating generations (Lamkey et al., 1995) or to penalize three-way and double crosses compared to their non-parental single crosses (Sprague et al., 1962; Melchinger et al., 1986). These effects are commonly referred to as recombinational loss and may be explained by a disruption during meiosis of co-adapted gene arrangements assorted by prior natural and artificial selection. Marker-based analyses of QTL partially corroborate this hypothesis (Stuber, 1999). To avoid recombinational loss and still offer a chance to select for new positive interactions, a balance between recombination and fixation of gene arrangements is needed. The DH-line approach might offer the method for achieving this goal, as homozygosity can be reached in one cycle of recombination when F1 is used for DH development or in different cycles when segregating populations of different generations are used.

4.2.5 Applications of DH populations in genomics

In genetics, DHs may serve to recover recessives. Using DHs, linkage data can be obtained directly by sampling gametes as monoploids. DHs are ideal for the study of mutation frequencies and spectra. As DHs represent homozygous, immortal and true breeding lines, they can be repeatedly phenotyped and genotyped so phenotypic and genotypic information can be accumulated over years and across laboratories. In genomics, DHs are therefore ideal for studying complex traits that are quantitatively inherited, which may require replicated trials over many years and locations for accurate phenotyping.
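The ratios quoted in Section 4.2.4 for DH lines relative to S2 lines can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, the additive variance is set to 1 because it cancels in the ratio, and the correlation is assumed to equal the square root of the inbreeding coefficient, which is what the quoted value of 1.15 implies.

```python
import math


def expected_gain(i, h_x, r_g, sigma_y):
    """Expected gain from selection, G = i * h_x * r_G * sigma_y."""
    return i * h_x * r_g * sigma_y


def gca_variance(f, sigma_a2):
    """GCA variance of lines with inbreeding coefficient f, neglecting epistasis."""
    return 0.5 * f * sigma_a2


F_S2, F_DH = 0.75, 1.0  # inbreeding coefficients used in the text

# Ratio of GCA variances (DH over S2); the additive variance cancels out.
variance_ratio = gca_variance(F_DH, 1.0) / gca_variance(F_S2, 1.0)

# Ratio of correlations between tested lines and their homozygous progenies,
# assuming the correlation is sqrt(F).
correlation_ratio = math.sqrt(F_DH) / math.sqrt(F_S2)

# Ratio of the numbers of lines to recombine, taken as proportional to 2F.
lines_ratio = (2 * F_DH) / (2 * F_S2)

print(round(variance_ratio, 2), round(correlation_ratio, 2), round(lines_ratio, 2))
```

Because the gain formula is a simple product, the larger $h_x$ and $r_G$ of DH lines and the smaller admissible $i$ pull the expected gain in opposite directions, which is why the comparison depends on the breeding scheme.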
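[Figure: development of doubled haploids. Parents P1 (AB/AB) and P2 (ab/ab) are crossed to give the F1 (AB/ab); the F1 gametes AB, Ab, aB and ab are recovered as haploids (AB, Ab, aB, ab) and doubled to give the doubled haploids AB/AB, Ab/Ab, aB/aB and ab/ab.]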
1991). A genetic map was constructed using 55 RFLP markers and two known genes and is the first complete molecular map to be constructed using DH populations in crops. Since then, many DH populations have been developed using the different approaches described above and have been used for map construction and genetic mapping.

4.2.6 Application of DHs in plant breeding

The benefits of DHs in plant breeding have been widely reviewed; readers should refer to Forster and Thomas (2004), Forster et al. (2007), and the five volumes on In Vitro Haploid Production in Higher Plants edited by Jain et al. (1996-1997).

Application of DHs in plant breeding can be described by comparison of the time required to obtain fixed inbreds relative to inbreeding, starting from a heterozygote:

            Selfing of a heterozygote      Haploids of a heterozygote
Gametes:    1/2 A + 1/2 a                  1/2 A + 1/2 a
F2          1/4 AA, 1/2 Aa, 1/4 aa         chromosome doubling
F3          1/4 Aa                         1/2 AA + 1/2 aa
F4          1/8 Aa
F5          1/16 Aa
F6          1/32 Aa
            1/2 AA + 1/2 aa

Apparently, the DH approach has a time reduction of three to four generations compared to inbreeding-based breeding. The DH approach features many logistical advantages, simplifying breeding to a large extent and enabling evaluation of genetically fixed hybrid components from the very beginning of the selection process. Depending on the material, the costs and the breeding scheme adopted, the DH approach can reduce the time for development and commercialization of new inbred lines and lead to a higher expected genetic gain per unit of time.

As outlined above, DH lines extracted from a heterozygote or a segregating population represent immortalized, reproducible gametes that can be immediately evaluated to assess their true breeding potential for target traits. They have the following advantages and clear beneficial applications (Melchinger et al., 2005; Röber et al., 2005; Longin et al., 2006; W. Schipprack, University of Hohenheim, personal communication):

• providing the quickest possible route to complete homozygosity;
• giving an immediate product of stable recombinants from species crosses;
• no masking effects because of the high homogeneity attained in the first generation of DH populations;
• increased performance per se due to selection pressure in the haploid phase and/or during the first generation of DHs;
• complete genetic variance accessible from the very beginning of the selection process;
• easy integration of line/hybrid development with recurrent selection;
• reduced efforts in the nursery after the first multiplication of DH lines compared to a conventional breeding nursery;
• maximum genetic variance in line per se and testcross trials;
• high reproducibility of early-selection results;
• high efficiency in stacking specific targeted genes in homozygous lines; and
• simplified logistics for seed exchange between main and off-season programmes since each line is fixed and can be represented by a single plant.

DHs have been used in plant breeding programmes to produce homozygous genotypes in a number of important species, e.g. tobacco (Nicotiana tabacum L.), wheat, barley, canola (Brassica napus L.), rice and maize (Maluszynski et al., 2003), but only rarely in triticale, oat, rye and others. Research in crops such as rice, wheat and maize has shown that significant progress in haploid technology is attainable given an intensive research effort. Well-established methods in these crops have allowed major parts or whole breeding programmes to be based on DH production. Oat, triticale, wild barley, potato and cabbage are examples of crops where DH technologies are less advanced but in which hundreds of
DHs may still be obtained (Tuvesson et al., 2007). In other crops, including some vegetable species and forage and turf grass species, DH methods are being developed, but applications in crop improvement are rare. The DH approach has yet to be exploited in leguminous species, predominantly due to their cultivation in developing countries and consequent paucity of research funding. Difficulties have also been posed by the small anther size and relatively low number of microspores per anther in legume crops (Croser et al., 2006).

The DH technique offers an efficient tool for extracting individual gametes from heterozygous materials and transforming them into homozygous lines that can be reproduced ad libitum by selfing. DHs extracted from a heterogeneous population, e.g. landraces, represent immortalized, reproducible gametes that can be immediately evaluated to assess their true breeding potential for target traits. They can also serve as source material for breeding programmes of hybrids and synthetics. Furthermore, DH lines may be used for long-term conservation of heterogeneous germplasm resources such as landraces without the risk of genetic drift and other changes in gene frequencies, as well as for in-depth characterization of the breeding potential of each heterogeneous germplasm collection, because each of the extracted DH lines can be evaluated in replicated trials in diverse environments.

With some DH methods, only a tiny fraction of the haploid seedlings will germinate and survive to the adult stage due to the uncovered genetic load and the stress in plant development exerted by colchicine treatment for chromosome doubling. Nevertheless, because the DH technique is rather simple, it is feasible to generate and identify large numbers of haploid seeds, treat them with colchicine and transplant them to the field. Hence, by starting with a sufficiently large number of haploid seeds it is possible to generate hundreds of viable DH lines with acceptable agronomic performance.

DHs are especially important in the evaluation of diversity, because they fix rare alleles and aid the efficient selection for quantitative traits in breeding. In outcrossing species, DHs enable undesirable recessive genes to be eliminated from lines at any breeding stage (Forster and Thomas, 2004).

Development of DHs through anther culture has been very successful, with many cultivars released in barley breeding worldwide and in rice breeding in China since the 1970s. The production of DHs has become the preferred tool in many advanced plant breeding institutes and commercial companies for breeding many crop species. Due to the obvious advantages of DH lines and the enhancements made in in vivo haploid induction in recent years, many commercial breeding companies such as AgReliant, Monsanto and Pioneer are presently adopting or are already routinely using this technology in their maize breeding programmes (Seitz, 2005). Recurrent selection for testcross performance using DHs has reduced the cycle length and improved genetic advance (Gallais and Bordes, 2007). In some companies in vivo haploid induction has more or less replaced conventional line development, with up to 15,000 DH lines per year per breeding programme and over 100,000 DH lines per year across all programmes at costs of US$10 or less per DH line. The first maize hybrids produced using DH lines have been commercialized in the USA and Europe (W. Schipprack, University of Hohenheim, personal communication). However, the development of new, more efficient and cheaper large-scale production protocols has meant that DHs have also recently been applied in less advanced breeding programmes.

4.2.7 Limitations and future prospects

Genetics and breeding in DHs have not given the desired and expected dividends, despite the substantial investments made in haploid research since the late 1980s. Some of the widely recognized limitations of DH breeding are as follows: (i) haploids cannot be obtained in the high frequency
required for selection in most important crop species; (ii) the cost-benefit ratio in DH breeding is often not favourable, thus discouraging its use despite the obvious advantages; (iii) haploids and DHs will express recessive deleterious traits and deleterious mutations may arise during the DH development process including anther culture, particularly for open-pollinated species; (iv) different ploidy levels may be present so that haploid status may need to be confirmed cytologically; alternatively, pollen culture may be necessary, which is expensive, has a relatively low success rate and is also genotype-dependent in many species; (v) doubled haploidy may also decrease genetic diversity, which is better maintained in heterozygous lines; (vi) the success of the DH method is highly genotype dependent, so it is not yet suitable for all breeding programmes; (vii) some techniques, e.g. inducers in maize (especially the good ones), are proprietary and not available to all interested breeders; and (viii) health and legal concerns related to handling the chemical doubling agents.

The Third International Conference on Haploids in Higher Plants (12-15 February 2006, Vienna, Austria) highlighted the following issues that are important to future studies on DHs: (i) new methods of haploid and DH plant formation; (ii) mechanism of initiation of haploids; (iii) application of haploid cells, gametes, haploid and DH plants in fundamental and applied science; (iv) genes controlling haploid formation from female and male gametes; and (v) methods of diploidization of haploids.

4.3 Recombinant Inbred Lines (RILs)

Recombinant inbred lines or random inbred lines (RILs) are usually a part of the ultimate products of many breeding programmes and are also used as genetic materials. They can be produced by various inbreeding procedures. To help understand the whole process of development and applications of RILs, the inbreeding procedure and its effects will be discussed first.

4.3.1 Inbreeding and its genetic effects

RILs result from continuous inbreeding such as selfing or sibmating starting from an F2 population until homozygosity is reached. There are two genetic responses to inbreeding, gene recombination and genotype homogenization. Starting from a heterozygote at a locus A-a, for example, selfing will produce three genotypes, AA, Aa and aa. With continuous selfing, the two homozygotes, AA and aa, will not segregate, while the heterozygote Aa will continue to segregate, producing the three genotypes. However, the proportion of heterozygotes in the population will decrease with continuous selfing and will approach zero. This process can be described as below.

Consider one locus with two alleles, A and a, undergoing continuous selfing. With each generation of selfing the proportion of heterozygotes is halved, and the lost half is added to the homozygotes. At generation $t$, the proportion of heterozygotes in the population will be $(1/2)^t$, while the proportion of homozygotes will be $1 - (1/2)^t$; the homozygotes AA and aa each account for $[1 - (1/2)^t]/2 = (2^t - 1)/2^{t+1}$ (Table 4.1).

When two or more loci, for example $k$ loci, are involved, successive selfing from F1 hybrids will produce $(1/2)^{tk}$ heterozygotes (heterozygous at all $k$ loci) and $[1 - (1/2)^t]^k = [(2^t - 1)/2^t]^k$ homozygotes (homozygous at all $k$ loci) at generation $t$. The more loci are involved, the longer it takes to reach homozygosity (Fig. 4.2). In the seventh generation of selfing starting from a heterozygous hybrid, for example, the proportion of homozygotes will be 99% for the population with one heterozygous locus involved, 96% for the population with five heterozygous loci involved, 89% for 15 loci, 79% for 30 loci and 46% for 100 loci.

If heterozygous loci are linked, successive inbreeding can still produce a homozygous population. However, the rate of approach to homozygosity depends on the recombination frequencies between the linked loci. The lower the recombination frequency, the higher the proportion of homozygotes in the population and the more rapidly the population becomes homogenized. If the recombination frequency, r,
Table 4.1. Genotypes derived from a single-locus heterozygote and their frequencies in selfing generations.

                       Genotype
                                                   Frequency of     Frequency of
Generation   AA          Aa        aa              heterozygotes    homozygotes (%)
0            -           1         -               1                0
1            1/4         2/4       1/4             1/2              50.0
2            3/8         2/8       3/8             1/4              75.0
3            7/16        2/16      7/16            1/8              87.5
4            15/32       2/32      15/32           1/16             93.8
5            31/64       2/64      31/64           1/32             96.9
10           1023/2048   2/2048    1023/2048       1/1024           99.9
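The rows of Table 4.1 follow directly from the halving of heterozygotes in each selfing generation. A minimal sketch using exact fractions is given below; the function name is hypothetical and generation 0 is taken to be the heterozygote itself.

```python
from fractions import Fraction


def selfing_frequencies(t):
    """Return (AA, Aa, aa) frequencies after t generations of selfing an Aa heterozygote."""
    het = Fraction(1, 2) ** t      # heterozygotes are halved each generation
    hom_each = (1 - het) / 2       # the remainder is split equally between AA and aa
    return hom_each, het, hom_each


for t in (1, 2, 3, 4, 5, 10):
    freq_AA, freq_Aa, freq_aa = selfing_frequencies(t)
    percent_homozygous = float(freq_AA + freq_aa) * 100
    print(t, freq_AA, freq_Aa, freq_aa, f"{percent_homozygous:.1f}")
```

Fractions are printed in lowest terms, so the heterozygote frequency at generation 10 appears as 1/1024, which is the same quantity as the 2/2048 shown in the genotype column.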
[Figure 4.2: homozygotes (%) plotted against generations of selfing (1 to 12), with one curve for each of 1, 5, 10, 20, 40 and 100 loci.]

Fig. 4.2. Effects of generations and genetic loci on the proportion of homozygotes in self-pollinated populations (numbers of loci are 1, 5, 10, 20, 40, 100).
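The multi-locus percentages quoted in Section 4.3.1 for the seventh generation of selfing can be reproduced from the formula $[1 - (1/2)^t]^k$. The sketch below assumes that the $k$ loci segregate independently (no linkage); the function name is hypothetical.

```python
def fully_homozygous(t, k):
    """Proportion of plants homozygous at all k independent loci after t generations of selfing."""
    return (1 - 0.5 ** t) ** k


# Percentages for generation 7 and the numbers of loci used in the text.
for k in (1, 5, 15, 30, 100):
    print(k, round(100 * fully_homozygous(7, k)))
```

The same function, evaluated over a range of t for a fixed k, traces one curve of the kind plotted in Fig. 4.2.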
is close to zero or two loci are completely linked, the rate of homogenization will be close or equal to the rate for the population with one heterozygous locus. If r is about 50%, the rate of homogenization will be about the same as that for the population with two heterozygous loci. It can be estimated that for two linked loci and after one generation of selfing, the proportion of homozygotes will be 41% for r = 10%, 34% for r = 20%, 26% for r = 40% and 25.25% for r = 45%.

Continuous inbreeding (e.g. selfing) results in the fixation of segregation so that the genetic combinations of two parental genomes represented in individual F2 plants are each represented by an RIL (Fig. 4.3). The genetic combinations of two parental genomes are fixed in a group of RILs.

For quantitative traits that are controlled by polygenes or multiple QTL, the mean value of the population will return to the average value of the parental lines because dominance and dominance-related epistasis will dissipate with increasing homogenization. The variance will also change with increasing homogenization but the direction of change will depend
[Figure 4.4: two panels. (A) Mean plotted against generation (P, F1, F2 to F6), with the parental values P1 and P2 marked; (B) variance plotted against generation (F2 to F6). Each panel shows curves I to IV.]

Fig. 4.4. Change of mean (A) and variance (B) in RIL populations derived by SSD. (I) Additive increasing alleles are completely dominant. (II) Additive without dominance effect. (III) Additive increasing alleles are completely dominant with complementary interaction. (IV) Additive increasing alleles are completely dominant with duplicate interaction.
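The one-generation proportions of homozygotes quoted in this section for two linked loci (41% at r = 10%, and so on) can be reproduced from the gamete frequencies of a double heterozygote. The sketch below is an illustration under stated assumptions: recombination is taken to be the same in male and female gametes, r is entered as a fraction, and the function name is hypothetical.

```python
def double_homozygotes_f2(r):
    """Proportion of F2 plants homozygous at both loci after selfing a double heterozygote.

    r is the recombination fraction between the two loci (0 to 0.5).
    """
    parental = (1 - r) / 2      # frequency of each of the two parental gamete types
    recombinant = r / 2         # frequency of each of the two recombinant gamete types
    # A plant is homozygous at both loci only when it receives two identical gametes.
    return 2 * parental ** 2 + 2 * recombinant ** 2


for r in (0.0, 0.10, 0.20, 0.40, 0.45, 0.50):
    print(r, round(100 * double_homozygotes_f2(r), 2))
```

At r = 0 the function returns 0.5, the single-locus value, and at r = 0.5 it returns 0.25, the value for two independent loci, matching the two limiting cases described in the text.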
because of lack of seed germination or failure of plant establishment to produce seed. It is necessary to decide on the number of inbred plants that are desired in the last generation and begin with an appropriate population size in the F2 generation. The single-seed procedure ensures that each individual in the final population traces to a different F2 individual. However, the procedure cannot ensure that a particular F2 will be represented in the final population because failure of any seed to germinate or generate a productive plant automatically eliminates that seed's F2 family.

SINGLE-HILL PROCEDURE. The single-hill procedure can be used to ensure that each F2 plant will have progeny represented in each generation of inbreeding. Progeny from individual plants are maintained as separate lines during each generation of inbreeding by planting a few seeds in a hill or row, harvesting self-pollinated seeds from the hill and planting them in another hill the following generation. An individual plant is harvested from each line when the population has reached the desired level of homozygosity.

With the single-hill procedure the identity of each F2 plant and its progeny can be maintained during self-pollination. When the identity of an F2 is maintained, the seed packet and hill must be properly identified with a line designation for planting and harvest.

MULTIPLE-SEED PROCEDURE. Use of the single-seed procedure requires that the size of the populations in F2 be larger than in later generations, due to lack of seed germination or plant establishment for seed set. Usually, two samples are harvested, one for planting in the next generation and one for a reserve. Researchers sometimes bulk two or three seeds from each plant during harvest. Part of the sample is planted and the remainder is reserved. The procedure is referred to as modified SSD. The number of seeds planted
and harvested each season depends on the number of lines desired from the population and the anticipated percentages of seed germination, seedling establishment and seed set.

Advantages and disadvantages of SSD procedures

Fehr (1987) summarized the merits of the SSD procedures and indicated the following advantages:

• They are an easy way to maintain populations during inbreeding.
• Natural selection does not influence the population unless genotypes differ in their ability to produce at least one viable seed each generation.
• The procedures are well suited to greenhouse and off-season nurseries where the performance of genotypes may not be representative of their performance in the area in which they are normally grown.

The disadvantages are: (i) artificial selection is based on the phenotype of individual plants, not on progeny performance, when SSD is used for cultivar development rather than genetic population development; and (ii) natural selection cannot influence the populations in a positive manner unless undesirable genotypes do not germinate or set any seed.

opportunities to recombine in RIL populations. This property was discovered by Haldane and Waddington (1931) by studying inbreeding populations. For tightly linked loci, the number of recombinants observed in RILs is twice that observed in the populations with only one cycle of meiosis. At the beginning stage of genetic mapping, this multiple recombination in RILs makes it difficult to detect linkage. Once linkage relationships are roughly established among loci, the greater frequency of recombination makes it easy to detect non-allelism among loci. It also makes the estimation of genetic distances more accurate because the confidence interval for an estimated genetic distance is a function of recombination frequency. With the increased number of meiosis events, there are more opportunities to find recombinants between two tightly linked loci (Fig. 4.5).

[Figure 4.5: recombinant frequency r (%) plotted against map distance; the curve for RILs derived from selfing is labelled $r = 2R/(1 + 2R)$.]

In populations that have undergone only one cycle of meiosis, recombinant frequency r (%) is linearly related to map distance R (cM), as indicated by the dashed line in Fig. 4.5. In RIL populations derived from selfing, r is almost equal to 2R when the map distance is small, which is indicated by the solid line and formula $R = r/(2 - 2r)$ (Fig. 4.5). For the RIL populations derived
from sibmating, the skew becomes more significant with r nearly equal to 4R when the map distance is small.

4.3.4 Construction of genetic maps using RILs

As each RIL is inbred, like a DH line, and thus can be propagated indefinitely, a panel of RILs has a number of advantages for genomic studies: (i) each line needs to be genotyped only once; (ii) multiple individuals can be phenotyped from each line to reduce individual, environmental and measurement variability; (iii) multiple invasive (destructive) phenotypes can be obtained on the same set of genomes; and (iv) as recombinations are more frequent in RILs than in populations with only one meiosis, greater resolution can be achieved in genetic mapping.

In genetic mapping with RIL populations, recombinant frequency should be converted into map distance using the formula $R = r/(2 - 2r)$ proposed by Haldane and Waddington (1931). There are no mapping functions available for RIL populations to adjust for double crossover events as there are for populations with one cycle of meiosis as discussed in Chapter 2. When the map distance is within the range that allows confidence about linkage detection, recombinant frequency has a linear relationship with map distance (Fig. 4.5; Silver, 1985).

Non-linked loci may appear to be linked simply due to chance. These false linkages can often be checked by whether a linkage detected with one marker is also judged to be linked by other markers in the same linkage group and whether the suspected linkage found in one population can also be detected in other RIL populations. Mouse geneticists discussed the case when a linkage could not be certain because of small population sizes, and Silver (1985) provided a table for the 95 and 99% confidence intervals for estimated map distances based on RILs derived from sibmating. At low rates of recombination, these intervals are relatively small when compared with those obtained from the binomial distribution for F2 and BC1 populations of comparable population size. According to Taylor (1978), RILs derived from sibmating were more powerful in the estimation of map distances than populations undergoing single meiosis when R < 12.5 cM. Based on Taylor's method, it can be inferred that RILs derived from self-pollination have greater influence on the estimation of map distance when R < 23 cM.

Because of the advantages of RILs, they have been receiving great attention in genomics research. Numerous RIL populations have been developed in plant species, especially in maize and rice. Burr et al. (1988) reported RFLP maps constructed using two maize RIL populations, T232 × CM37 and CO159 × Tx303. Among 334 mapped genetic loci, 220 were polymorphic in both populations. By comparing the map distances obtained from these two populations with each other and with publicly accepted map distances, they found that the differences could be twice as large in some cases. Although these differences were still within the range of confidence intervals, they might be due to the genetic difference in recombinant frequencies at specific chromosomal regions. In maize there is no significant polymorphism caused by chromosome rearrangement, except for chromosome 10. Therefore, it is not surprising that there was no significant difference in map distance between the two maize RIL populations. Table 4.2 provides some examples of RIL populations developed in maize (Burr et al., 1988) and in rice (Xu, Y., 2002) that have been widely used for linkage mapping and gene tagging.

4.3.5 Intermated RILs and nested RIL populations

Intermated RILs

The production of RILs allows for the accumulation of recombination breakpoints during the inbreeding phase. However, the accumulation in RILs is limited by the fact that each generation of inbreeding makes the recombining chromosomes more similar to one another so that meiosis ceases to
Table 4.2. Some examples of RIL populations developed in maize (Burr et al., 1988) and rice (Xu, 2002).
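For RIL populations such as those referred to in Table 4.2, the conversion between the recombinant frequency observed among RILs and the single-meiosis value discussed in Section 4.3.4 can be sketched as follows. The selfing relationship is the one given in the text, $R = r/(2 - 2r)$. The sib-mating relationship, $r = 4R/(1 + 6R)$, is not stated explicitly in the text and is included here as an assumption taken from the Haldane and Waddington result; it agrees with the statement that r is nearly 4R for small distances. Both r and R are treated as fractions rather than percentages, and all function names are hypothetical.

```python
def ril_r_from_selfing(R):
    """Recombinant frequency among selfed RILs for single-meiosis recombination fraction R."""
    return 2 * R / (1 + 2 * R)


def R_from_selfed_ril(r):
    """Single-meiosis recombination fraction from the frequency r observed in selfed RILs."""
    return r / (2 - 2 * r)


def ril_r_from_sibmating(R):
    """Recombinant frequency among sib-mated RILs (assumed form r = 4R / (1 + 6R))."""
    return 4 * R / (1 + 6 * R)


def R_from_sibmated_ril(r):
    """Inverse of ril_r_from_sibmating."""
    return r / (4 - 6 * r)


for R in (0.01, 0.05, 0.10, 0.20):
    print(R, round(ril_r_from_selfing(R), 4), round(ril_r_from_sibmating(R), 4))
```

For small R the printed values are close to 2R and 4R respectively, and both relationships return 0.5 for unlinked loci (R = 0.5), so the expansion of the map is confined to tightly linked intervals.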
The Complex Trait Consortium for mouse proposed the development of a large panel of eight-way
RILs (Complex Trait Consortium, 2004). An eight-way RIL, also known as Collaborative Cross, is
formed by intermating eight parental inbred strains followed by repeated sibling mating to
produce a new set of inbred lines whose genome is a mosaic of the eight parental strains
(Broman, 2005). Such a panel of RILs would serve as a valuable resource for mapping the loci that
contribute to complex phenotypes in mouse and would support studies that incorporate multiple
genetic, environmental and developmental variables into comprehensive statistical models of
complex traits (Complex Trait Consortium, 2004). The genomes of eight founder strains are rapidly
combined and are then inbred to produce finished RIL strains. Eight-way RIL strains achieve 99%
inbreeding by generation 23. Each strain captures 135 unique recombinant events. With genetic
contributions from multiple parental strains including several wild derivatives, the eight-way
RILs will capture an abundance of genetic diversity and will retain segregating polymorphisms
every 100-200 bp. This level of genetic diversity will be sufficient to drive phenotypic
diversity in almost any trait of interest. An estimated 1000 strains will be required to
guarantee high mapping resolution and detect extended networks of epistatic and gene-environment
interactions. This estimate is based on the statistical power necessary to detect biologically
relevant correlations among thousands of measured traits. A set of 1000 strains containing
135,000 recombinant events is a far more powerful and flexible research tool than a set of 100
strains containing the same total number of recombinant events. Mounting evidence suggests that
gene-gene interactions (epistasis) are crucial in many complex disease aetiologies. A set of 1000
strains will readily support simultaneous mapping of many two-way and three-way epistatic
interactions.
Similar resources have been developed in maize, Arabidopsis and Drosophila. To map much of the
segregating variation in maize, 25 RIL mapping populations were created. Twenty-five diverse
lines were selected to capture 80% of the nucleotide polymorphism in maize. In order to provide a
uniform evaluation background, each line was crossed to a common parent, B73 (the standard US
inbred), to form 25 RIL populations. Each of these RIL populations has at least 200 RILs, each
descended from a unique F2 plant, resulting in a total of 5000 RILs. Using SSD and low density
planting, 88% success in advancing lines per generation was achieved. This has developed as an
integrated mapping approach, called Nested Association Mapping (NAM), which exploits
simultaneously the advantages of linkage analysis and association (or linkage disequilibrium, LD)
mapping as discussed in Chapter 6. The power of NAM for genome-wide QTL mapping has been
demonstrated by computer simulation with varied numbers of QTL and trait heritabilities (Yu
et al., 2008). With a dense coverage (2.6 cM) of common-parent-specific (CPS) markers, the genome
information for 5000 RILs can be inferred based on the parental genome information. Essentially,
the linkage information captured by the CPS markers and the LD information among loci residing
between CPS markers was then projected to RIL based on parental information, ultimately allowing
for genome-wide high-resolution mapping. The power of NAM with 5000 RILs allowed 30-79% of the
simulated QTL to be precisely identified. In the ongoing genome sequencing projects, NAM would
greatly facilitate complex trait dissection in many species in which a similar strategy can be
readily applied.

4.4 Near-isogenic Lines (NILs)

Near-isogenic lines (NILs) derived from inbreeding are in most cases the product of successive
backcrossing. The methods for obtaining NILs, including the genetic effects of backcrossing, will
be described first, followed by their applications in genomics and plant breeding.
seed from individual plants of 25 BC2F1 lines from each cross were bulk-harvested to form a
single bulk BC2F2 population. In addition, 30-50 superior high-yielding BC2F1 plants from each
cross were further backcrossed with the RPs to produce BC3F1 lines and likewise BC4F1 lines.
Similarly, BC3F2 and BC4F2 bulks were generated by bulk-harvesting seeds of all BC3F1 and BC4F1
lines from each cross. The BC bulks were then screened for their resistance or tolerance to
different abiotic and biotic stresses, including drought, salinity, submergence, anaerobic
germination, zinc deficiency, brown planthopper, etc. and in all cases the severity of the
stresses was strong enough to kill the RPs and only the surviving BC progeny were selected.
Selection for many agronomic traits, such as flowering time, plant height, plant type related
traits (leaf and culm angles), grain quality parameters, yield related traits, etc., was also
undertaken based on visual observation of BC bulks. Each of the selected BC progeny was tested
for selected phenotypes and selfed for two or more generations to form a homozygous IL. ILs
within each genetic background are phenotypically similar to their RP but each carries one or a
few traits introgressed from a known donor. Together, these ILs contain a significant portion of
the alleles affecting the selected complex phenotypes in which allelic diversity exists in the
primary gene pools of rice. A forward genetics strategy was proposed and demonstrated with
examples for using these ILs for large-scale functional genomics research (Chapter 11).
Complementary to the genome-wide insertional mutants, these ILs provide new methods for highly
efficient gene discovery, candidate gene identification and cloning of important QTL for specific
phenotypes based on the convergent evidence from QTL position, expression profiling and
functional and molecular diversity analysis of candidate genes.
In another example in rice, a single-segment substitution line (SSSL) library was developed using
Hua-Jing-Xian, an elite indica cultivar from South China, as the recipient and 24 cultivars
including 14 indica and ten japonica cultivars from worldwide sources as donors (Xi et al., 2006).
The current library consists of 1529 SSSLs with an average substitution segment length of
18.8 cM. The library covers 28705.9 cM of genome in total which is equal to 18.8
genome-equivalents. This library has been used for QTL mapping of many traits (Xi et al., 2006;
Liu et al., 2008).

The Lycopersicon pennellii ILs

Using whole genome marker analysis, Eshed and Zamir (1994) developed a permanent mapping
population designed for QTL analysis. This resource is composed of a tomato cultivar
(Lycopersicon esculentum cv. M82) which includes single introgressed genomic regions from the
wild green-fruited species L. pennellii. This congenic resource, composed of 76 ILs, provides
nearly complete coverage of the wild species genome. The IL map is connected to the
high-resolution F2 map which is composed of 1500 markers (IL chromosome maps) with seed of the ILs
available through the C.M. Rick Tomato Genetics Resource Center, University of California Davis.
Applications of the ILs developed in rice and tomato will be further discussed in later chapters.

4.4.4 Gene tagging strategy using NILs

Due to genetic linkage, the chromosome fragments around the target locus will be dragged into
backcross progeny and may be retained in the subsequent progeny. This phenomenon is called
linkage drag. The basic idea behind gene tagging using NILs is to use the opportunity provided by
linkage drag to identify the molecular markers located on the chromosome segments around the
target locus. This can be accomplished by comparing the marker genotypes among RP, DP and NILs.
When the genotype at a particular marker locus is the same for
the NIL and DP but different for the RP, possible linkage between this marker and the target
locus can be determined (Fig. 4.7).
Success in gene tagging using NILs relies on the assumption that there is a genetic difference
between DP and RP in the chromosome region flanking the target locus. The likelihood of detecting
this difference depends on the length of the chromosome regions in the DP which have been
retained during the backcross; this parameter decreases with the increasing number of
backcrosses. Detecting these differences also depends on the molecular polymorphism in this
region in DP and RP. When DP and RP are from different species such as cultivated and wild
species, there is a high probability of finding polymorphisms between them. Conversely, the
probability will be low if DP and RP are genetically closely related to each other. The
likelihood of detecting molecular polymorphisms between DP and RP at the target region can be
increased by using large numbers of markers and/or different types of markers.
With the increasing availability of complete sets of BILs or NILs that cover the whole genome,
NIL mapping strategies provide a convenient approach to tagging and isolating numerous genes. In
the following section, the discussion will be focused on major gene-related issues while QTL
mapping using NILs is described in Chapter 7.

4.4.5 Theoretical considerations in genetic mapping using NILs

One source of errors in NIL-based mapping is the possibility that the NIL not only differs from
the RP in the regions around the target locus but also differs at other loci distributed all over
the genome. This is because for a limited number of backcrosses, t, the DP genome retained in the
backcross progeny is 1/2^t, which creates the possibility that the polymorphic markers in the
retained regions could be falsely identified as being located in the target region. These false
positive markers are not actually linked with the target region. As calculated theoretically by
Muehlbauer et al. (1988), for a genome containing 20 chromosomes each of 50 cM, progeny derived
from five backcrosses will retain DP alleles at four out of 100 randomly selected marker loci.
Among these four retained DP loci, it is estimated that only one or two will not be linked to the
target gene. This estimation is based on the assumption that there is no selection for the RP
phenotype, that is, individuals that are heterozygous at the target marker loci are selected
randomly to be backcrossed with RP.

Backcross introgression without selection for the RP phenotype
Fig. 4.7. Gene tagging using NILs. Each panel shows lanes DP, NIL and RP scored at Marker 1 and
Marker 2. (A) Possible linkage, NIL has the same allele as DP at Marker 1 locus. (B) No linkage,
NIL has different alleles from DP at both marker loci (1 and 2). NIL, near-isogenic line; DP,
donor parent; RP, recurrent parent.

Assume that a plant species has n chromosomes, each of length L Morgans, and the objective is to
transfer the target gene (classic marker) located in the middle of one of the chromosomes from DP
to RP by t backcrosses (b = t + 1). Suppose there are 100 polymorphic markers between DP and RP
and these markers are randomly distributed over the genome. In the final backcross progeny, the
proportion of the marked chromosome (M, where the target gene is located) from the DP will be

$$U_{Mb} = \frac{1}{L}\left[\frac{2\,(1 - e^{-bL/2})}{b}\right]$$

with variance

$$V_{Mb} = \frac{2}{L^{2}}\left\{\frac{2 - (bL + 2)\,e^{-bL/2}}{b^{2}} - \left[\frac{1}{b}\left(1 - e^{-bL/2}\right)\right]^{2}\right\}$$
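The two expressions for the marked chromosome can be evaluated numerically. The following is a minimal sketch, not a reproduction of the calculations of Muehlbauer et al. (1988): the function names are hypothetical, L is taken in Morgans, and for chromosomes that do not carry the target gene the sketch assumes an expected donor fraction of (1/2)^b, the usual expectation for unlinked loci, which is an assumption of this example rather than a formula given in this section.

```python
import math


def donor_fraction_marked(b, L):
    """Expected DP proportion U_Mb of the marked chromosome.

    b = t + 1 for t backcrosses; L = chromosome length in Morgans;
    the target gene is assumed to sit at the chromosome midpoint.
    """
    return (2.0 * (1.0 - math.exp(-b * L / 2.0)) / b) / L


def donor_variance_marked(b, L):
    """Variance V_Mb of the DP proportion on the marked chromosome."""
    e = math.exp(-b * L / 2.0)
    first_term = (2.0 - (b * L + 2.0) * e) / b ** 2
    second_term = ((1.0 / b) * (1.0 - e)) ** 2
    return (2.0 / L ** 2) * (first_term - second_term)


def expected_donor_markers(n_chrom, L, t, n_markers):
    """Expected numbers of markers still carrying DP alleles.

    Markers are spread evenly over n_chrom chromosomes of equal length.
    Returns (linked, unlinked): markers on the marked chromosome and
    markers on the remaining chromosomes.
    """
    b = t + 1
    per_chrom = n_markers / n_chrom
    linked = per_chrom * donor_fraction_marked(b, L)
    # ASSUMPTION (not from the text): unlinked loci retain DP alleles
    # with probability (1/2)^b.
    unlinked = per_chrom * (n_chrom - 1) * 0.5 ** b
    return linked, unlinked


# Scenario quoted in the text: 20 chromosomes of 50 cM (0.5 M),
# five backcrosses, 100 random markers.
linked, unlinked = expected_donor_markers(n_chrom=20, L=0.5, t=5, n_markers=100)
print(f"U_Mb = {donor_fraction_marked(6, 0.5):.3f}")
print(f"V_Mb = {donor_variance_marked(6, 0.5):.4f}")
print(f"linked = {linked:.2f}, unlinked = {unlinked:.2f}, total = {linked + unlinked:.2f}")
```

Under these assumptions the sketch gives roughly 2.6 retained loci on the marked chromosome and 1.5 elsewhere, about four in total, which is in line with the estimate of four loci, one or two of them unlinked, quoted from Muehlbauer et al. (1988).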
144 Chapter 4
be determined. However, these factors vary depending on specific DP × RP crosses so that it is
impossible to develop general formulas that are applicable to different crosses. Instead,
Muehlbauer et al. (1988) provided two examples to explain the effect of selection for the RP
phenotype on the retention of the DP genome for marked and unmarked chromosomes. In each case,
selection for the RP phenotype significantly reduced the DP genome retained in backcross progeny.

4.4.6 Application of NILs in gene tagging

NILs have been successfully used in gene tagging for almost all crop species with available
molecular marker systems and NILs. Some pioneering examples in this field include identifying
molecular markers for the Tm-2a gene controlling resistance to tobacco mosaic virus in tomato
(Young et al., 1988), the Dm1, Dm3 and Dm11 genes for downy mildew resistance in lettuce (Paran
et al., 1991) and the Pto gene for bacterial speck (Pseudomonas) resistance in tomato (Martin
et al., 1991). Since then, numerous studies using NILs for gene tagging and map-based gene
cloning have been reported. From these studies, some general conclusions can be drawn:
• Backcrossing significantly reduces the linkage drag around the target region. The more
  backcrosses that are carried out, the smaller the linkage drag fragment in backcross progeny
  and the less frequently will false positives be found between the NILs.
• Molecular markers can be used to improve the efficiency of backcrossing by significantly
  reducing the linkage drag and increasing the RP genome ratio.
• Linkage drag has significant influence on the recovery of the RP genome around the target
  region, indicating its important impact on backcross breeding programmes.
• Using multiple NILs significantly reduces the probability of reporting false positives for
  linked loci.
• Selection for the RP phenotype reduces the DP genome ratio during the backcross process
  compared to cases where the RP phenotype has not been selected.

4.5 Cross-population Comparison: Recombination Frequency and Selection

4.5.1 Recombination frequencies across populations

The recombinant frequencies in DH and RIL populations have been compared in maize (Murigneux
et al., 1993), wheat (Henry et al., 1988) and rice (Courtois, 1993; Antonio et al., 1996) using
populations derived from different crosses.
DH lines are used extensively in barley breeding programmes to reduce the time required to obtain
pure lines and to increase breeding efficiency. The availability of high-density linkage maps of
barley makes it possible to compare the recombination frequencies in H. bulbosum (Hb) and anther
culture-derived DH lines (see Section 4.2.1) across most of the barley genome. These methods
differ in three major aspects: (i) the Hb and anther culture-derived DH lines arise from female
and male recombinant products, respectively; (ii) the optimal donor plant growth conditions
differ (Pickering and Devaux, 1992); and (iii) the in vitro culture phases are distinct: in
anther culture, microspores evolve into embryoids that give rise to plantlets, while in the Hb
method, plantlets develop from zygotic embryos. Recombinant frequency is likely to be affected by
the first two features. Devaux et al. (1995) reported the results of an experiment comparing map
distances observed in Hb and anther culture-derived DH lines obtained from an F1 (Steptoe ×
Morex) hybrid. Male (anther culture-derived) and female (Hb-derived) DH populations were used to
map the barley genome and thus determine the different recombination rates occurring during
meiosis in the F1 hybrid donor plants. The anther culture-derived (male
derived from anther culture (male gametophytes), the observed distortion in segregation can be
attributed to differential viability or lethality of pollen or to selective regeneration in in
vitro culture and clearly not to the selective influences of the gynoecium or the differential
competitive ability of pollen. The distortion in three chromosomal regions (two on chromosome 2
and one on chromosome 10) detected in the DH populations by Xu et al. (1997) indicated an
overrepresentation of alleles from the japonica parent that has been proven to be easily
regenerated by anther culture. The other parent is indica which belongs to a subspecies that is
more recalcitrant to anther culture (Shen et al., 1982; Yang et al., 1983). It has been suggested
that these regions may be associated with the preferential regeneration of japonica genotypes
during anther culture. Yamagishi et al. (1996) also identified markers in several chromosomal
regions that showed aberrant segregation ratios favouring japonica alleles in a DH population,
although these markers segregated normally in the corresponding F2 population. They concluded
that these regions contained genetic factors which conferred a selective advantage on the
japonica genotypes during anther culture.
Selective regeneration of genotypes has also been reported in other plants. Very strong
distortions of single locus segregations were observed in an anther culture-derived barley
population (Devaux et al., 1995). Devaux and Zivy (1994) demonstrated that some markers showing
distorted segregation are linked to genes involved in the anther culture response. In another
barley DH population, a significant proportion (44%) of the mapped markers showed distorted
segregation which was caused mainly by the prevalence of alleles from the parent that responded
better to in vitro culture (Graner et al., 1991). Although segregation distortion may arise from
genetic, physiological and/or environmental causes and the relative contribution of each of these
factors may differ in specific populations, much of the reported segregation distortion in anther
culture-derived populations is likely to be the result of using parental genotypes that differ in
their response to anther culture.

Selection pressure involved in RIL development

Deviation from randomness due to selection pressure in the production of RILs is a potential
problem that needs more attention. In contrast to the populations derived from a one-step
homogenization, RILs are produced by many generations of inbreeding during which plants are
subjected to selection pressures generated by various environmental disturbances and competition
among plants that may well occur for many years and seasons and in many locations. The
distortions resulting from selection pressures involved in RIL development can be understood by
comparison of multiple populations of different genetic structures derived from the same crosses
and by comparison of populations produced by different approaches.
He et al. (2001) compared molecular marker segregations between DH and RIL populations derived by
anther culture and SSD respectively from the same rice cross, ZYQ8 (indica) × JX17 (japonica). In
the RIL population, 27.3% of the markers showed distorted segregation at the P < 0.01 level, of
which 90% of the markers favoured indica alleles while in the DH population, 18.2% of the markers
were skewed almost equally towards indica and japonica alleles. This might reflect the different
types of selection pressures to which the DH and RIL populations were subjected. Eight commonly
distorted regions on chromosomes 1, 3, 4, 7, 8, 10, 11 and 12 were detected in both RIL and DH
populations of which seven skewed towards indica alleles and one towards a japonica allele. Five
of them were located near gametophytic gene loci (ga) and/or sterility gene loci (S).
To compare the frequency and location of loci showing distorted allele frequencies between
different population types (F2, DHs and RILs), information from 53 populations with a known
number of distorted markers was summarized and analysed (Xu et al., 1997). In summary, RIL
populations had significantly higher frequencies of distorted markers (39.4 ± 2.5%) than other
population structures (DH: 29.4 ± 3.5%; BC: 28.6 ± 2.8%;
F2: 19.3 ± 11.2%), which may indicate the cumulative effects of selection pressures during the
process of RIL development. Distorted segregation in RIL populations derived via SSD represents
the cumulative effect of both genetic (G) and environmental (E) factors on multiple generations
and the G × E interaction becomes more pronounced with the progress of selfing. Thus, it is
difficult to distinguish genetic from environmental causes of distortion in RIL populations.
However, an over-representation of indica alleles in two chromosomal regions on chromosomes 3 and
6 was specific to one RIL population. These chromosomal regions may be associated with a
selective advantage in the indica growth environment in which the RIL population was developed.
In contrast to DH and RIL populations where genotypic frequencies are a perfect reflection of the
allele frequencies due to lack of heterozygotes, F2 populations offer the potential to detect an
advantage or disadvantage associated with the heterozygote class at specific loci, even when the
parental allele frequency is normal.
Expression of distorters with low heritability will be influenced by the environment and
therefore these will be detected only in experiments carried out under well-controlled
conditions. Because the segregation distortion occurs either during, or just before or after
meiosis, the experimental environment must be controlled during the reproductive phase of the
parental lines, although the effect will only be detected in the offspring.

Genetics of selection-associated segregation distortion

The genetic control of distorted segregation has been studied in rice (as summarized by Xu
et al., 1997) and barley (Konishi et al., 1990, 1992) using morphological and isozyme markers.
The genetic basis of segregation distortion may be the abortion of male or female gametes or the
selective fertilization of particular gametic genotypes. Distortion at a marker locus in rice may
be caused by linkage between the marker and the gene conferring lower pollinating ability, the
gametophyte gene (ga) (Nakagahra, 1972), also referred to as a gamete eliminator or pollen
killer, causing abortion of gametes (Sano, 1990). A large number of ga loci and sterility gene
loci (S) have been identified using morphological markers.
If segregation distorters have high heritability, they will be detected in almost any population
if the parents differ at the genetic locus in question and in almost any environment in which the
population is grown. For a specific chromosomal region, the probability of a distortion locus
being falsely assigned will decrease with the number of populations sharing the same distortion
and with the number of markers in a cluster of distorted markers. Use of multiple populations
developed in multiple environments would facilitate the detection of highly heritable genetic
segregation distortion factors. Chromosomal regions associated with marker segregation distortion
in rice were compared using six molecular linkage maps (Xu et al., 1997). Mapping populations
were derived from one interspecific backcross and five inter-subspecific (indica/japonica)
crosses including two F2 populations, two DH populations and one RIL population. Marker loci
associated with skewed allele frequencies were distributed on all 12 chromosomes. Distortion in
eight chromosomal regions showed the grouping of previously identified gametophyte (ga) or
sterility genes (S). Three additional clusters of skewed markers were observed in more than one
population in regions where no gametophytic or sterility loci had been reported previously. A
total of 17 segregation distortion loci were postulated and their locations in the rice genome
were estimated. Using a single F2 cross, Harushima et al. (1996) identified 11 major segregation
distortions at ten positions on chromosomes 1, 3, 6, 8, 9 and 10 and at least two of these
segregation distortion regions (on chromosomes 1 and 3) were also detected by Xu et al. (1997).
A similar comparison was undertaken among four maize mapping populations using 1820 co-dominant
markers (Lu et al., 2002). On a given chromosome nearly all of the markers showing segregation
distortion favoured the allele from the same parent.
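Whether a marker counts as distorted in comparisons such as these is typically decided by a goodness-of-fit test of the observed genotype counts against the expected Mendelian ratio. The following is a minimal sketch, assuming a DH or RIL population in which the two parental homozygotes are expected in a 1:1 ratio and using the P < 0.01 threshold mentioned for He et al. (2001). The function names and marker counts are hypothetical and are not taken from any of the cited studies.

```python
import math


def chi_square_1to1(count_a, count_b):
    """Chi-square statistic (1 d.f.) for a 1:1 expectation of two genotype classes."""
    expected = (count_a + count_b) / 2.0
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 d.f.

    For 1 d.f. the statistic is a squared standard normal, so the tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_distorted(count_a, count_b, alpha=0.01):
    """True if the 1:1 hypothesis is rejected at significance level alpha."""
    return p_value_1df(chi_square_1to1(count_a, count_b)) < alpha


# Hypothetical counts per marker: (lines homozygous for parent A, for parent B).
markers = {"M1": (70, 30), "M2": (55, 45), "M3": (38, 62)}

distorted = [name for name, (a, b) in markers.items() if is_distorted(a, b)]
favoured = {name: ("A" if markers[name][0] > markers[name][1] else "B")
            for name in distorted}
percent_distorted = 100.0 * len(distorted) / len(markers)

print(distorted, favoured, round(percent_distorted, 1))
```

For F2 populations the same idea applies with three genotype classes and a 1:2:1 expectation (2 d.f.), which is what allows a heterozygote advantage or disadvantage to be detected; that case is not covered by this sketch.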
Plant genetic resources are one of the most important tools in agricultural research and are used
for the improvement of productivity and sustainability of production systems, both in the
developed and in the developing world. The beginnings of the contemporary international system
for germplasm conservation can be traced back to the brilliant and pioneering work of the Russian
botanist, Nikolai Vavilov. He was the first, in the 1920s, to realize the importance and
potential benefits to be derived from gathering plant genetic resources from around the world and
organizing them into a collection. He noted that the task was not only to gather plants for the
immediate breeding needs of Soviet agriculture, but also to save seeds from extinction. He also
recognized that modern cultivars were replacing the local landraces and that plant research was
destroying the very foundation of its own existence, thereby threatening global food security.
Since the late 1950s, there has been an increasing awareness and documentation of the benefits of
biodiversity and the risks associated with genetic erosion. Different methods have been developed
to conserve plant genetic resources and make them available to breeders. Analytical methods
including molecular markers and geographic information systems (GIS) have been developed to
characterize genetic diversity and make its management, evaluation and enhancement more
efficient. In this chapter, most fields related to plant genetic resources will be covered,
including germplasm collection, maintenance, evaluation, enhancement, utilization and
documentation. As an introduction, biodiversity and genetic diversity will be discussed first.
Biodiversity is of ecological, economic and cultural importance. Diversity within an ecosystem
allows it to survive and be productive while providing an enormous range of products and services
for exploitation by man. Agrobiodiversity, as a component of the total biodiversity, is important
to agriculture. It helps ensure sustainability, stability and productivity (Hawtin, 1998). As
recognized by the Convention on Biological Diversity, there are three interdependent levels of
biodiversity: ecosystem level, species level and genetic level, each of which is influenced by,
and influences the other.
As described by Hawtin (1998), ecosystem diversity can be defined as the variability between
interdependent communities of species and the physical environment in which they live. Diverse
agroecosystems can lead to a wide variety of enterprises on a national, regional or community
scale which in turn contributes to maximizing food security, helps to increase employment
opportunities and increase local or national self-reliance. Species diversity relates to the
cultivars of species within an area. A diversity
of crop and animal species on the community, farm or field levels can help add stability by
reducing reliance on a single enterprise. Such diversity can also lead to a more efficient use of
resources, for example by providing increased opportunities for nutrient recycling. Species
diversity at the field level, e.g. by planting crop mixtures, can help to provide a buffer
against adverse conditions, pests and diseases. In addition, diversity in agricultural
enterprises allows for a more efficient use of inputs such as labour. Genetic diversity refers to
the variation within a species which represents the total genetic information contained in
individual plants of a species, each consisting of a unique assembly of genes constituting its
evolutionary heritage. This diversity begins at the molecular level, is carried as sequences of
instructions on chromosomes and provides the foundation for environmental adaptation and
ultimately for the evolution of species. Genetic diversity enables species to adapt to new
ecosystems and environments or changes in the current environment, by natural and/or human
selection. Diversity within a crop species helps diminish the risk of losses through diseases or
pests and provides opportunities for the exploitation of different features of the
microenvironment by, for example, the presence of diverse growth habits and rooting patterns.
Such factors can contribute both to greater stability and, in many circumstances, greater
productivity. At the same time, genetic diversity within crops helps to provide a reservoir of
genes for future crop improvement by farmers and professional plant breeders. Figure 5.1 provides
an example of genetic diversity in maize kernel phenotypes which exists in the maize germplasm,
although only a few of these phenotypes now exist in cultivated maize.
Of the three levels which comprise global biological diversity, genetic diversity has received
the greatest attention within the agricultural community. As the raw material of future elite
cultivars and an indicator of sustainability of agricultural production, the status of genetic
diversity is of utmost concern for agricultural production and particularly for societies in
general. This chapter will focus on genetic diversity within crops, i.e. the genetic resources
that lie at the heart of sustainable agricultural development and provide the basis for the
continued evolution and adaptation of crops.

5.1 Genetic Erosion and Potential Vulnerability

5.1.1 Genetic erosion

For centuries human intervention has altered the dynamic relationships among the various
ecosystems used for food, feed, fuel, fibre, shelter and medicines. However, one of the most
profound and irreplaceable changes that humans have wrought is the acceleration of the rates of
extinction of species caused by human colonization, extension and intensification of agriculture
and industrialization. For example, deforestation of tropical rainforests at the current rate may
result in the elimination of somewhere between 5 and 15% of the world's species between 1990 and
2020. Given the current estimate of about 10 million species in the world, these rates would
translate into a loss of 15,000-50,000 species per year or 50-150 species per day. Thomas et al.
(2004) estimated the total extinction of plant species in Amazonia following the maximum expected
climate change leading to habitat destruction or climatic unsuitability to be 69% for species
with seed dispersal and 87% for those without seed dispersal. The clearing of forests and the
spread of urban areas is also resulting in the disappearance of the wild relatives of crop
plants. On the other hand, as indicated by Brown and Brubaker (2002), several hundred crop and
wild plant species previously used by humans are now classified as underutilized or neglected.
The erosion in biodiversity is caused by a multiplicity of factors, including loss, fragmentation
and degradation of habitats; introduction of alien species into ecological niches; overgrazing;
excessive harvesting beyond the levels of natural regeneration
Plant Genetic Resources 153
Fig. 5.1. Maize kernel phenotype. After Neuffer et al. (1997; the original plate is from Correns, 1901).
as in the case of trees harvested for timber; pollution of various media that sustain the
biological nutrient cycles in ecosystems; deforestation and land clearance which is cited as
being the most frequent cause of genetic erosion in Africa; adverse environmental conditions such
as drought and flooding; introduction of new pests and diseases; population pressure and
urbanization; war and civil strife; technological advances in agriculture, particularly the Green
Revolution which resulted in the abandonment of traditional crops in favour of new ones, the
settlement of new lands, changes in cultivation methods and changes in agricultural systems.
Apart from the narrowing of genetic diversity by extensive mono-cropping, the Green Revolution
has contributed indirectly to the loss of biodiversity by soil mining. As a result of the heavy
use of fertilizers, chemical inputs and irrigation, agricultural crops which rely on Green
Revolution technology have rendered the soil sub-fertile and in some cases inhospitable to other
species. In addition, the phenomena of genetic drift and selection pressure produce cumulative
genetic erosion that may sometimes exceed the genetic erosion actually taking place in the field (Esquinas-Alcázar, 1993). Concerns about the genetic erosion of plant genetic resources were first articulated by scientists in the mid-20th century and have since become an important part of national policies and international treaties.
Genetic erosion, or the reduction in genetic diversity in crop plants, takes on various shapes depending on one's viewpoint, including the reduction in the number of different crop species being grown and the decrease in genetic diversity (number of unrelated cultivars being grown within crop species). In addition, other organisms, both within and across agroecosystems, are increasingly taken into account when assessing biodiversity as it relates to agriculture (Collins and Qualset, 1999; Hillel and Rosenzweig, 2005). As one of the major factors for genetic erosion, the transition from primitive to advanced cultivars as a result of plant breeding is worthy of further discussion. This has occurred by two distinct pathways: (i) selection for relative uniformity, resulting in pure lines, multi-lines, single or double hybrids, etc.; and (ii) selection for closely defined objectives. Both processes have resulted in a marked reduction in genetic variation. At the same time there has been a tendency to restrict the gene pool from which parental material has been drawn. This is a result of the high level of productivity achieved when breeding within a restricted but well-adapted gene pool and of breeding methods that have made it possible to introduce specifically desired improvements, such as disease resistance and quality characteristics, into breeding stocks with a minimum of disturbance to the genotypic structure by backcrossing or transgenic approaches.
In the process of modern plant improvement, the traditional cultivars (landraces) of farmers have been replaced by modern cultivars. In the 1990s, only about 15% of the global area devoted to rice and 10% of the developing world's wheat area were planted to landraces (Day Rubenstein et al., 2005). Other examples include the following: (i) of the nearly 8000 cultivars of apple that grew in the USA at the turn of the 20th century, more than 95% no longer exist; (ii) only 20% of the maize types recorded in 1930 in Mexico can now be found; and (iii) only 10% of the 10,000 wheat cultivars grown in China in 1949 remain in use (Day Rubenstein et al., 2005; Gepts, 2006). The process is under way in all countries, both developed and developing, and unfortunately includes some of the richest primary and secondary gene centres of several important food crops (Dodds, 1991). As the demand for uniform performance and grain quality has increased, new cultivars including hybrids are increasingly derived from adapted, genetically related and elite modern cultivars. The more genetically variable but less productive primitive ancestors have been almost excluded from most breeding programmes. In a study of pedigree relationships among 140 US rice accessions, Dilday (1990) concluded that all parental germplasm in public cultivars used in the southern USA could be traced back to 22 plant introductions in the early 1900s and those used in California could be traced back to 23 introductions. The same situation is true for soybean and wheat. Virtually all modern US soybean cultivars can be traced back to a dozen strains from a small area in north-eastern China and the majority of hard red winter wheat cultivars in the USA originated from just two lines imported from Poland and Russia (Duvick, 1977; Harlan, 1987).

5.1.2 Genetic vulnerability

Genetic vulnerability is the potentially dangerous condition which results from a narrow genetic base. One of the most tragic cases on record caused by genetic vulnerability is the Irish potato famine of the 1840s in which more than one million Irish starved to death as a consequence of a massive attack of late blight (Phytophthora infestans) that destroyed the Irish potato crop. The potato had been the main staple of the Irish diet for the preceding centuries. The underlying cause of the catastrophe
was the narrow genetic base of the potato plants in that country; all had originated from a small quantity of uniform materials brought from Latin America in the 16th century. Other famous examples include the coffee rust epidemic in Ceylon (1868) and Southern corn leaf blight epidemic in the USA (1970).
Genetic vulnerability stems from genetic uniformity, examples of which are homozygosity (often recessive) as a result of clonal reproduction and the formation of F1 hybrids from inbred parents (e.g. hybrid maize). The types of uniformity desired in a crop are: (i) rapid and uniform germination of seeds; (ii) nearly simultaneous flowering and maturation; (iii) stature that promotes mechanical harvest; (iv) product uniformity for taste, flavour and chemical composition; and (v) year-to-year stability of yield (Wilkes, 1993). With the substitution and consequent loss of a primitive cultivar, the genetic diversity contained in it is eliminated. To prevent such losses, samples of the replaced landraces should be adequately conserved for possible future use. The tendency to eliminate the genetic diversity contained in primitive landraces of plants jeopardizes the possible development of future cultivars adapted to tomorrow's unforeseeable needs.
As a few elite cultivars have come to dominate the major crops worldwide, genetic vulnerability has increased. As Wilkes (1993) indicated, we are now promoting a carpet of closely related dwarf-stature cultivars across the grain belts of the world. The magnitude of this potential is made clear by the fact that most of the hybrid rice planted in Asia now shares the same maternal cytoplasm and most of the high-yielding bread wheat cultivars are presently based on only three types of cytoplasm. There are many more such examples in other important crops. The burden of genetic vulnerability has been placed primarily on the shoulders of plant breeders because elements in the technology of plant breeding can be designed to minimize its impact, for example by developing synthetic or composite cultivars and multi-lines. Biotechnology, including genomics, promises the potential to both enhance and further endanger diversity. As a double-edged sword, biotechnology can enhance or jeopardize the greater utilization of genetic resources.

5.2 The Concept of Germplasm

5.2.1 A generalized concept of germplasm

Germplasm can be defined as the genetic materials that represent an organism. The expression of plant genetic resources usually refers to the sum total of genes, gene combinations or genotypes embodied as cultivars that are available for the genetic improvement of crop plants. Following the proposal of Harlan and de Wet (1971), plant genetic resources were classified into three gene pools that reflected the increasing difficulties in carrying out sexual crosses and obtaining viable and fertile progenies. Gene Pool I includes the crop species itself and its wild progenitor. Crosses within Gene Pool I can generally be made easily and the resulting progeny is viable and fertile. This gene pool corresponds closely to the biological species concept. Gene Pools II and III include other species that are less related to the crop species of interest. Crosses between Gene Pools I and II are possible but are usually more difficult to achieve. The progeny shows reduced viability and fertility. Finally, crosses between Gene Pools I and III are the most difficult. Special techniques such as tissue culture and embryo rescue must be used to obtain a progeny from these crosses. The progeny often show a severe reduction in viability and fertility. The operational definition of Harlan and de Wet (1971) has been very useful because it reflects the realities of the breeding process, particularly the introduction of new genetic diversity into the populations of a breeding programme by sexual hybridization (Gepts, 2006). However, it could be argued that this definition may need to be expanded to include a Gene Pool IV based on the advances in scientific technology
and increased awareness of the benefits of biodiversity in general. Availability of plant transformation techniques (as discussed in Chapter 12) has extended the reach of plant breeding beyond the limitations imposed by sexual cross-compatibility and as a result, the Gene Pool IV should include all organisms as a potential source of genetic diversity.
Classical germplasm can be defined according to reproduction systems to include seeds from sexual plants and all types of tissue such as roots, stems and other organs that can be used for reproduction in asexual plants. Therefore, germplasm is traditionally defined as a morphologically distinct biological object. Different plant species or cultivars from the same species can be distinguished from each other morphologically based on size, colour and shape. In the case of sexual plants, seeds are the major carrier of germplasm and for most plant species germplasm can be maintained and reproduced by the collection and regeneration of seeds. Seeds are of major importance in the process of germplasm management and are collected, maintained and reproduced. Evaluation and utilization of germplasm is dependent on the seeds that can be used to generate plants and on other useful organs such as roots, leaves, stems or even seeds themselves.
With the development of molecular biology, the concept of germplasm has been generalized and broadened. As a carrier of genetic material, germplasm can be anything that carries genetic information required for controlling and rebuilding an organism, which includes genes and their clones, chromosome segments and even pieces of functional DNA sequences. The generalization of the concept of germplasm depends on two major developments: cell totipotency (the potential for regenerating a whole plant from a single cell) and the development of the gene concept (genetic material can be traceable to a small piece of DNA that controls a biological trait and codes for a specific protein) (Xu and Luo, 2002; Fig. 5.2).
DNA as a type of genetic resource is rapidly increasing in importance. DNA from the nucleus, mitochondrion and chloroplasts is now routinely extracted and immobilized on to nitrocellulose sheets where the DNA can be probed with numerous cloned genes. With the development of PCR, specific fragments or entire genes from a mixture of genomic DNA can now be routinely amplified (Engelmann and Engels, 2002). Genetic information can be synthesized and living variants can be created or rebuilt by using DNA sequence information. These advances have led to the formation of an international network of DNA repositories for the storage of genomic DNA (Adams, 1997). The advantage of this technique is that it is efficient and simple and overcomes physical limitations or constraints. The disadvantage lies in problems with subsequent
[Figure 5.2: diagram linking seed, plant, cell or tissue, DNA and transgenic plants. Arrow labels: growth and development; sexual reproduction; tissue culture; synthetic seed or regeneration; DNA extraction; sexual reproduction or propagated; tissue culture, protoplast fusion, etc.; gene isolation and transfer.]
Fig. 5.2. Germplasm carriers and their conversion through biological, molecular and biotechnological
approaches. Modified from Xu and Luo (2002).
gene isolation, cloning and transfer (Maxted et al., 1997). A tool of potential importance is the development of collections of DNA samples in glacie from wild species.
Genome sequencing and the understanding of the function of all plant genes will have significant impacts on the conservation of plant genetic resources. The role that the genetic resources community might undertake in respect to conserving molecular genetic products has yet to be defined. As such, genebanks are facing new demands from the user community. There are an increasing number of resources being generated by the molecular community that are relevant to conservation work. Some genebanks might wish to store primers, probes and DNA libraries to facilitate their work, in addition to populations generated for gene isolation and plant improvement. In the future, users may want to receive functional DNA sequences, genes, clones or markers instead of seeds or other traditional means of transporting DNA. They may want a series of specific alleles rather than accessions where alleles are segregating. As Kresovich et al. (2002) indicated, a future-oriented analysis of these possible trends and their implications is very important in order to predict and thus prepare for the changing role of genebanks and curators.
On the other hand, the concept of germplasm is no longer defined for each species or crop plant and its relatives. With the development of technologies for gene cloning and transfer as discussed in Chapters 11 and 12, the genetic barriers that used to exist among different species or genera no longer exist and genes can be exchanged freely between different families and genera or even between plants, animals and bacteria. Useful genes identified in one species can be used to modify another species. Taxa that are evolutionarily related (e.g. grasses, legumes and members of the Solanaceae) have strikingly similar genome organizations as discussed in Chapter 2. At the fundamental level of the gene, many sequences are highly conserved across families. Therefore, users of genetic resources will acquire useful genes from repositories independent of their source (Kresovich et al., 2002). As a carrier for genetic materials, therefore, germplasm is no longer limited to a specific species. Germplasm can be managed based on the classification of genes or properties across species rather than the terminology and classification of the plant. With the new technologies that have been developed in molecular biology and genomics, the gene pool for any given species has expanded well beyond the tertiary gene pool and can be taken to include any gene from any source, perhaps in the future, even to new synthetic or shuffled sequences.
As germplasm collection turns increasingly to tissues, cells and DNA, methods for collecting and preserving these materials may need to be modified or revolutionized by using diversified techniques for preservation and reproduction. For example, preservation of tissues and cells can be achieved by subculture or regeneration instead of the storage of seeds. Using cloning and transformation, DNA can be transferred into other plants to obtain transgenic plants (Fig. 5.2). Germplasm management in the future will become an integrated science that is closely linked with biotechnology including tissue culture, gene cloning and transformation, molecular marker technology and synthetic seed technology. As summarized by Taji et al. (2002), there are five main areas of biotechnology that can directly assist plant conservation programmes: (i) molecular biology, particularly molecular markers: assisting in germplasm collection, aiding genebank design and accession structure and assessing genome stability, genetic diversity, population structure and distribution patterns; (ii) molecular diagnostics: assessing phytosanitary status; (iii) in vitro culture: micropropagation, slow growth and embryo rescue; (iv) cryopreservation: long-term conservation of seed-recalcitrant species, vegetatively propagated species and biotechnological products; and (v) information technology: documentation, training, transfer technology, germplasm exchange, DNA databases, genome maps, genebank inventories and international networking. These areas will be discussed in various sections of this chapter.
These types of germplasm resources have become very important materials in genetics and plant breeding.

5.2.4 In situ and ex situ conservation

As discussed above, the prospect of species extinction is ever present, with or without human intervention. Therefore, the best preparation for future uncertainties is to conserve as many gene pools, species and ecosystems as possible, whether they have actual or potential utility to humankind. Use for the benefit of humankind is the strongest justification for the conservation of plant genetic resources. During recent decades awareness has been raised of the importance of conserving crop gene pools to ensure that the breeder has adequate raw materials. This process is dynamic since the breeder is continually seeking new alleles and allelic combinations to improve the performance of a crop species in its target environments.
Conservation of germplasm resources goes far beyond the preservation of a species. The objectives must be to conserve sufficient diversity within each species to ensure that its genetic potential will be fully available in the future. Conservation of germplasm resources has been generalized to include all activities relating to germplasm management such as collection, maintenance, rejuvenation and multiplication, evaluation, exchange and documentation. In this section the discussion will focus on the methods for germplasm conservation.
Conservation and control are two issues broadly related to biodiversity that have a bearing on the role of biotechnology. Conservation refers to the maintenance or enhancement of biodiversity, particularly the plant species, and control refers to accessing this diversity. Two basic conservation strategies, in situ and ex situ conservation, each composed of various techniques, are employed to conserve genetic diversity for various research and development programmes including plant breeding. In situ conservation refers to the preservation of genetic resources within the evolutionary dynamic ecosystems of their original or natural environment including conservation in nature reserves and on the farm. This type of conservation control is most suited to wild related species. Ex situ conservation entails removing germplasm resources (seed, pollen, sperm and individual organisms) from their original habitats and preserving them in botanical gardens or gene/seed banks. Examples of different methods of ex situ conservation include field genebanks, seed storage, pollen storage, in vitro conservation and DNA storage.
There is an obvious fundamental difference between these two strategies: ex situ conservation involves the sampling, transfer and storage of target taxa remotely from the collection areas whereas in situ conservation involves the designation, management and monitoring of target taxa where they are encountered (Maxted et al., 1997). Another difference lies with the more dynamic nature of in situ conservation as opposed to the more static nature of ex situ conservation. Each technique has its own advantages and limitations. The major drawback of ex situ conservation is that the evolution of species would be frozen since no further adaptation to environmental or biotic stresses indigenous to their origin can take place and that the processes of selection and continuous adaptation to those local habitats are halted. Other disadvantages are that long-term integrity of the germplasm remains in question and that high rates of mutation exist among the ex situ stored plants. Further drawbacks are the occurrence of genetic drift (random loss of diversity due to the fact that the samples collected and multiplied are necessarily very small) and selection pressure (the materials are usually multiplied in ecogeographical areas which differ from those in the areas where they were originally collected).
In situ techniques allow the conservation of greater inter- and intraspecific genetic diversity than is possible in ex situ facilities. They also permit continued evolution and adaptation to take place, whether in the wild or on the farm where selection by man also plays a critical role. For some species such as many tropical trees, it is the only feasible
method of conservation. The main drawback is the difficulty in characterizing, evaluating and assessing genetic resources and susceptibility to hazards such as extreme weather conditions, pests and diseases. In addition, the monetary expense may be quite high, especially where there is pressure for alternative uses of the land. The method selected for in situ conservation depends on the nature of the species. Traditional crop cultivars may be conserved on the farm while undomesticated relatives of food crops may require land to be set aside as reserves (Hawtin, 1998).
In situ conservation is especially appropriate for wild species and for landrace materials on the farm while ex situ conservation techniques are particularly appropriate for the conservation of crops and their wild relatives (Engelmann and Engels, 2002). In situ conservation of biodiversity enables the preservation of the knowledge of farming systems, including biological and social knowledge associated with them. Ex situ conservation, on the other hand, divorces the biological from the social context.
In situ and ex situ systems for conservation of germplasm resources should be considered as complementary and not antagonistic. The current approach is to combine both methods of conservation depending on such factors as reproductive biology, nature of the storage organs and propagules and availability of human, financial and institutional resources (Bretting and Duvick, 1997). Many major food plants produce seeds that undergo maturation drying or can be dried to low water content due to their tolerance to extensive desiccation and can therefore be stored dry at low temperatures. Seeds of this type are known as orthodox (Roberts, 1973). Storage of such orthodox seeds is the most widely practised method for ex situ conservation of plant genetic resources and about 90% of the 6.1 million accessions stored in genebanks are maintained as orthodox seed (Engelmann and Engels, 2002). For most species, stored seed is the most genetically relevant, i.e. it is the raw material with which the breeder works and seed propagation is an integral part of the growth cycle of the crop.
There are a significant number of crops which fall outside the category of orthodox seed for several reasons. First, some species do not produce seeds at all and consequently are propagated vegetatively; these include banana and plantain (Musa spp.). Secondly, some species such as potato, other root and tuber crops such as yams (Dioscorea spp.), cassava (Manihot esculenta), sweet potato (Ipomoea batatas) and sugarcane (Saccharum spp.) either have some sterile genotypes and/or do not produce orthodox seed. However, if they are capable of seed production, these seeds are highly heterozygous and are therefore of limited utility for the conservation of particular genotypes. These crops are usually propagated vegetatively to maintain the genotypes as clones (Simmonds, 1982). Thirdly, a considerable number of species, predominantly tropical or subtropical in origin such as coconut, cacao and many forest and fruit tree species, produce seeds which do not undergo maturation drying and are shed at relatively high moisture content. Such seeds are unable to withstand desiccation and are often sensitive to chilling. Seeds of this type are called recalcitrant and need to be kept in moist, relatively warm conditions to maintain viability (Roberts, 1973; Chin and Roberts, 1980). Even when recalcitrant seeds are stored in an optimal manner, their lifespan is limited to weeks or occasionally months.
Other conservation methods are needed for these recalcitrant species. These include conservation as living collections in field genebanks, as described above for ex situ conservation, or in vitro conservation either as living plantlets or plant tissue on appropriate media, often under conditions of slow growth, or by cryopreservation at very low temperatures, generally using liquid nitrogen. For those problem species whose seeds do not survive under conventional storage conditions, largely because they cannot tolerate desiccation and die when exposed to low temperatures, the field genebank is the conventional approach to their conservation. However, there are many drawbacks to this, not least being that field genebanks cannot provide secure, long-term conservation as compared to the safety and low input requirements of a seed genebank.
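The storage categories described above can be read as a simple decision rule. The Python sketch below is illustrative only: the `Crop` record, its three flags, the `Method` names and the trait values given for each example crop are assumptions made for this sketch, not a published classification scheme, and a real genebank decision would weigh many more factors.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    SEED_GENEBANK = "orthodox seed stored dry at low temperature"
    FIELD_GENEBANK = "living collection in a field genebank"
    IN_VITRO = "in vitro culture, often under slow growth"
    CRYOPRESERVATION = "cryopreservation, generally in liquid nitrogen"


@dataclass(frozen=True)
class Crop:
    # Hypothetical record; the flags simplify the categories in the text.
    name: str
    produces_seed: bool        # False for seedless crops such as banana
    seed_is_orthodox: bool     # seed tolerates desiccation and cold storage
    maintained_as_clone: bool  # genotype is kept by vegetative propagation


def ex_situ_options(crop: Crop) -> list[Method]:
    """Return candidate ex situ methods for a crop (illustrative rule only)."""
    if crop.produces_seed and crop.seed_is_orthodox and not crop.maintained_as_clone:
        return [Method.SEED_GENEBANK]
    # Seedless, clonal or recalcitrant material needs one of the alternatives.
    return [Method.FIELD_GENEBANK, Method.IN_VITRO, Method.CRYOPRESERVATION]


# Example trait values are assumptions chosen to mirror the three cases above.
examples = [
    Crop("wheat", produces_seed=True, seed_is_orthodox=True, maintained_as_clone=False),
    Crop("banana", produces_seed=False, seed_is_orthodox=False, maintained_as_clone=True),
    Crop("cacao", produces_seed=True, seed_is_orthodox=False, maintained_as_clone=False),
]

for crop in examples:
    print(crop.name, "->", [m.name for m in ex_situ_options(crop)])
```

The rule deliberately returns all three alternatives for non-orthodox material rather than choosing among them, since the text presents field genebanks, in vitro culture and cryopreservation as complementary options with different costs and risks.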
facilitating more rational and effective sampling. Molecular markers can be used to measure the degree of divergence within species, analyse inter- and intrapopulational diversity and monitor genetic erosion within genebank collections. Secondly, in vitro propagation methods can be modified for application in the field to provide new ways of collecting problem materials.
For clonally propagated and recalcitrant seed-producing species, the materials collected are often bulky and heavy. Furthermore, they are often soil bearing, thereby introducing a plant health hazard. Recalcitrant seeds and vegetative explants such as shoots, suckers or tubers have a limited lifespan and may be prone to decomposition through microbial attack. In some cases, suitable materials for collection may not even be available and seed may be immature or absent as a result of grazing. However, new in vitro collecting techniques involve the principles of in vitro inoculation and culture without the cumbersome and complex conditions that normally pertain to the laboratory. This was originally explored for cacao buds and the coconut embryo and was also successfully adapted for several other materials (Withers, 1993).
The observance of adequate quarantine, disease indexing and disease eradication procedures is essential for the safe movement of germplasm from its origin to genebanks and among genebanks and users. Clonally-propagated crops present particular problems in that they are commonly collected in the form of vegetative propagules that carry a relatively high risk of disease transmission. They may accumulate systemic pathogens since they lack the pathogen filter that the seed production stage can offer. The potential for eliminating pathogens via meristem-tip culture, sometimes linked to other therapeutic processes such as thermotherapy, is now an important component of the process of introducing many clonally-propagated crops into conservation collections. The introduction of the enzyme-linked immunosorbent assay (ELISA) and other methods based on nucleic acid, biochemical and molecular technology provides new methods for detecting pathogens.
Wild relatives of our present crop plants, although agronomically undesirable, may also have acquired many desirable stress-resistant characteristics as a result of their long exposure to nature's pressures. Many recent studies using wild relatives in genetic mapping have identified cryptic alleles that do not exist in cultivated plants (for details see Chapter 7) which make conservation of wild species a more important component in germplasm resources than ever before. Requirements for the development of collection strategies suitable for wild relatives have been increasing and genomic tools including molecular markers can help to identify the genetic diversity and merits that exist in the wild relatives by the methods discussed in Section 5.5.

5.3.2 Core collections

As germplasm collections of major crop plants continue to grow in number and size around the world, better access to and use of the genetic resources in collections have become important issues. Potential users require either populations representative of the diversity or accessions that describe particular agronomic characters (e.g. disease resistance, drought tolerance). In either case the managers of collections may find it difficult to meet such needs. The very size and heterogeneous structure of many collections have hindered efforts to increase the use of genebank materials in plant breeding. Recognizing this, Frankel (1984) proposed that a collection could be represented by what he termed a core collection, which would represent, with minimum repetitiveness, the genetic diversity of a crop species and its relatives. The accessions excluded from the core collection would be retained as the reserve collection. Construction of a core collection involves selecting approximately 10% of the germplasm accessions to represent at least 70% of the genetic variation (e.g. Brown, 1989a, b) unless the entire germplasm collection is very large, in which case less than 10% would be necessary. This proposal was further developed by Frankel and
Brown (1984) and Brown (1989a), who outlined how to achieve core coverage of the collection by using information regarding the origin and characteristics of the accessions. In terms of practical use, the three major objectives of the core collection are to set up as wide a representation as possible of the genetic diversity, to be able to conduct intensive studies on a reduced set of genotypes, and to attempt to extrapolate the results thus obtained to facilitate research on appropriate genotypes in the base collection (Noirot et al., 2003).
The core proposal was a radical departure in thinking regarding genetic resources (Frankel, 1986). Until then, the main emphasis had been on the open-ended task of collecting as many samples as possible and securing their survival in storage, irrespective of continuing cost and use. Frankel and Brown (1984) introduced the notion of adequacy of sampling of the species range. Analysis of climatic, ecological and geographical information on the species range could be used to suggest where distinctly different environments or separated localities occurred for that species. This analysis could be checked with the available collections and used to identify places or habitats where collections had been excessive and others where further collection is warranted. In this way, a complete collection can be built up, from which a core collection can be extracted.
Using all the available data, core collections are arranged to make their entries representative of genetic diversity. The basic procedure is to recognize groups of related or similar accessions within the collection and sample from each group. Presently, in the constitution of a core collection, most researchers agree on the need for stratification prior to the sampling. In other words, the organization of the variability in groups and subgroups should be taken into account. There are clear benefits to the greater use of these more precise measures of genetic variation. Equally clearly, it is costly in human and financial resources to generate these measures so they can only be employed in a limited number of collections. Therefore, the selection of which species and which samples to include is crucial. Since the aim is to obtain the maximum amount of useful information from a limited sample, the use of core collections is an obvious approach.
A general procedure for the selection of a core collection can be divided into four steps:
Definition of the domain: the first step in creating a core collection is defining the material that should be represented, i.e. the domain of the core collection.
Division into groups: the second step is dividing the domain into groups which should be as genetically distinct as possible.
Allocation of entries: the size of the core collection should be determined and the choice of number of entries per group should be made.
Choice of accessions: the last step is the choice of accessions from each group that are to be included in the core.
Several different methods have been used to construct core collections and these aim to represent most of the genetic diversity with the fewest number of accessions possible (see for example Noirot et al., 2003). Many reports have been published on the formation of core subsets. Hintum (1999) described one such system, the Core Selector, to generate representative selections of germplasm accessions. Upadhyaya and Ortiz (2001) developed a two-stage strategy for developing a mini-core collection, again based on selecting 10% of the accessions from the core collection representing 90% of the variability of the entire collection. In this process, a representative core collection is first developed using all the available information on geographic origin, characterization and evaluation data. In the second stage, the core collection is evaluated for various morphological, agronomic and quality traits to select a subset of 10% accessions from this core subset (or 1% of the entire collection) that captures a large proportion (i.e. more than 80% of the entire collection) of the useful variation. At both
stages in selection of core and mini-core collections, standard clustering procedures are used to separate groups of similar accessions, combined with various statistical tests to identify the best representatives.

Molecular markers have been used to construct core subsets which preserve as much of the diversity present in the original collection as possible (Franco et al., 2005, 2006). Genetic markers on three maize data sets and 24 stratified sampling strategies were used to investigate which strategy conserved the most diversity in the core subset as compared with the original sample (Franco et al., 2006). The strategies were formed by combining three factors: (i) two clustering methods (unweighted pair-group method with arithmetic mean (UPGMA) and Ward); based on (ii) two initial genetic distance measures; and using (iii) six allocation criteria (two based on the size of the cluster and four based on maximizing distances in the core (the D method) used with four diversity indices), giving 2 × 2 × 6 = 24 strategies. The success of each strategy was measured on the basis of maximizing genetic distances (Modified Rogers and Cavalli-Sforza and Edwards distances) and genetic diversity indices (Shannon index, proportion of heterozygous loci and number of effective alleles) in each core. For the three data sets, the UPGMA with D allocation methods produced core subsets with significantly more diversity than the other methods and were better than the M strategy implemented in the MSTRAT algorithm for maximizing genetic distance.

Using the advanced M strategy with a heuristic search for establishing core sets, a program known as POWERCORE has been developed (Kim et al., 2007). The program supports development of core sets by reducing the redundancy of useful alleles and thus enhancing their richness. The output of POWERCORE has been validated using some case studies and the program effectively simplifies the process of generating core sets while significantly cutting down the number of core entries, maintaining 100% of the diversity. POWERCORE is applicable to various types of genomic data including SNPs.

Based on phenotypic evaluation of economically important traits and the use of DNA markers, studies of genetic diversity aimed at developing core collections have been reported for several plant species. Crops with cores established at the early stage include lucerne, barley, chickpea, clover, lentil, medic, groundnut, bean, pea, safflower and wheat (Clark et al., 1997). Mini-core collections are reported for crops such as chickpea (Upadhyaya and Ortiz, 2001), groundnut (Upadhyaya et al., 2002), pigeonpea (Upadhyaya et al., 2006b) and rice (1536 accessions, D.J. Mackill, International Rice Research Institute (IRRI), personal communication). Such efforts have led to the identification of diverse germplasm with beneficial traits of significant economic value in barley and many legume crops (Dwivedi et al., 2005, 2007; Brick et al., 2006). Table 5.1 provides examples of core collections that have been established with a relatively large number of germplasm accessions included. Several types of data were used for each crop, with geographic origin usually being one of the first criteria used for selection.

In rice, methods for selecting accessions to construct a core collection were investigated based on shared allele frequencies (SAFs) and the frequency of unique RFLP and SSR alleles (Xu et al., 2004; Fig. 5.3). Subsets of various sizes were selected (representing 5–50% of the US and world collections) using random selection as a control. For each sample size, 200 replications were analysed using a re-sampling technique and the number of alleles in each subgroup was compared with the total number of alleles identified in the larger collection from which the subsets were sampled. A cultivar subset (13% of the entire collection) selected on the basis of both SAFs and number of unique alleles detected represented 94.9% of the RFLP alleles but only 74.4% of the SSR alleles. It can be expected that selection criteria based on additional sources of information will further improve the value and representativeness of core collections. This resource may serve as a source of novel alleles for genetic studies and for broadening the genetic base of US rice cultivars. In addition, the following conclusions were drawn (Xu et al., 2004): (i) more samples were needed to represent
Table 5.1. Description of core collections in barley, cassava, finger millet, maize, pearl millet, potato, rice,
sorghum and wheat (modified from Dwivedi et al., 2007).
Crop    Description^a    Number of accessions    Reference
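The stratify, allocate and select steps described for assembling a core collection can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the 10% default core fraction and the two allocation rules (proportional and logarithmic) are assumptions made for the example, not the procedure of any particular study cited here, and accessions are chosen at random within groups rather than by a diversity criterion.

```python
import math
import random
from collections import defaultdict


def allocate(group_sizes, core_size, rule="proportional"):
    """Split core_size entries among groups (hypothetical helper).

    rule="proportional" weights each group by its size;
    rule="log" weights it by log(size + 1), which favours small groups.
    Rounding means the quotas may not sum exactly to core_size.
    """
    if rule == "proportional":
        weights = {g: float(n) for g, n in group_sizes.items()}
    elif rule == "log":
        weights = {g: math.log(n + 1) for g, n in group_sizes.items()}
    else:
        raise ValueError("rule must be 'proportional' or 'log'")
    total = sum(weights.values())
    # Each group gets at least one entry and never more than it holds.
    return {g: min(group_sizes[g], max(1, round(core_size * w / total)))
            for g, w in weights.items()}


def select_core(accessions, core_fraction=0.10, rule="proportional", seed=0):
    """accessions: list of (accession_id, group_label) pairs."""
    groups = defaultdict(list)
    for acc_id, group in accessions:          # division into groups
        groups[group].append(acc_id)
    sizes = {g: len(ids) for g, ids in groups.items()}
    core_size = max(1, round(core_fraction * len(accessions)))
    quota = allocate(sizes, core_size, rule)  # allocation of entries
    rng = random.Random(seed)
    core = []
    for g in sorted(groups):                  # choice of accessions
        core.extend(rng.sample(groups[g], quota[g]))
    return core
```

With groups of very unequal size, the logarithmic rule gives small groups more entries than the proportional rule does, which is one way of protecting rare strata from being swamped by heavily collected ones.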
Fig. 5.3. Comparison of selection methods based on shared allele frequency (SAF) or random selection (RS) for identifying members of a core collection in rice. Proportion of RFLP (A) and SSR (B) alleles detected in US and World collections based on SAF or RS. Modified from Xu et al. (2004).
[Panels A and B: x-axis, varieties selected (%), 5 to 50; y-axis, alleles detected (%), 0 to 100; series USA-SAF, USA-RS, World-SAF and World-RS.]
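The random-selection control used in comparisons of this kind can be illustrated with a small re-sampling sketch. It assumes a hypothetical data layout in which each accession maps to the set of (locus, allele) pairs it carries; the function names and the toy data are invented for the example, and only the idea of repeated random subsets (200 replications per sample size) is taken from the text.

```python
import random


def alleles_in(subset, genotypes):
    """Set of (locus, allele) pairs carried by the accessions in subset."""
    found = set()
    for acc in subset:
        found.update(genotypes[acc])
    return found


def random_capture(genotypes, fraction, replications=200, seed=0):
    """Mean share of all alleles captured by random subsets of one size."""
    rng = random.Random(seed)
    ids = sorted(genotypes)
    total = len(alleles_in(ids, genotypes))
    size = max(1, round(fraction * len(ids)))
    shares = []
    for _ in range(replications):
        subset = rng.sample(ids, size)  # sampling without replacement
        shares.append(len(alleles_in(subset, genotypes)) / total)
    return sum(shares) / len(shares)


# Toy collection: four accessions scored at two loci (illustrative only).
toy = {
    "acc1": {("L1", "x")},
    "acc2": {("L1", "y")},
    "acc3": {("L1", "x"), ("L2", "z")},
    "acc4": {("L2", "w")},
}
baseline = random_capture(toy, fraction=0.5)
```

A subset chosen by any other criterion can be scored with `alleles_in` in the same way and compared against this random baseline at the same sample size.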
the world collection, which was more diverse than the US collection, which contained more pedigree-related cultivars; (ii) combining the use of SAF and unique alleles improved the representativeness of the core collection; (iii) core collections selected by SAF required fewer samples than random selection for the same level of representativeness; and (iv) more samples were needed to adequately represent genetic diversity if highly polymorphic markers were used (e.g. SSRs versus RFLPs).

The core collection concept has aroused considerable worldwide interest and debate within the plant germplasm resources community. It has been welcomed as a way of
making existing collections more accessible through the development of a small group of accessions that would be the focus of evaluation and use and provide an entry point to the large collections that it aims to represent. However, a concern that still remains is that the available knowledge base regarding genetic diversity in any crop is insufficient to enable a meaningful core to be developed and that the most useful characters often occur at such a low frequency that they would be omitted from any small core collection. Other concerns regarding core collections include rendering the reserve collection more vulnerable to loss, the lack of representation of rare, endemic alleles and a poor relationship with the specific needs of users (Gepts, 2006).

When molecular markers are developed from DNA sequences with unknown or no function, identical marker alleles among collections may not necessarily mean that these collections share identical functional alleles linked to the marker locus. Genetic variation for important phenotypic traits could be lost if core collections are based solely on the use of such anonymous DNA markers. As the genome sequence is deciphered and the function of many genes is determined, gene-specific markers with identified functional nucleotide polymorphisms (FNPs) will become available for many genes. Core collections of germplasm constructed using FNPs could be assembled to represent a core collection of genes.

As gene structure–function relationships are clarified with greater precision, it will be possible to focus attention on genetic diversity within the active sites of a structural gene or within key promoter regions. This will make it productive to screen large germplasm collections for FNPs, targeting the search for alleles that are likely to be phenotypically relevant at specific loci. From a primary collection, a user who had identified an accession or accessions of interest would move to the next level of information where clusters of germplasm known to represent a broader spectrum of diversity within a specific gene pool, or a specific trait, could be defined. The second level of investigation could be conducted using carefully designed sets of molecular markers known to target specific traits or regions of the genome. The construction of core collections using these approaches may help establish heterotic groups from which parents can be chosen to establish populations for breeding hybrid crops.

5.4 Maintenance, Rejuvenation and Multiplication

The main task of a germplasm bank is to conserve germplasm in a state in which it can be indefinitely propagated without loss of genetic diversity or integrity. In general, the term 'base collection' is applied to collections stored under long-term conditions, whereas the term 'active collection' is used for collections stored under medium-term conditions and 'working collection' refers to breeders' collections usually stored under short-term conditions. Monitoring the health of collections, particularly of field genebanks, assessment of accession viability and rejuvenation and multiplication of collections are essential housekeeping functions. For most crops with seed as the germplasm carrier, maintenance, rejuvenation and multiplication processes have been well established. This section will focus on problem crops and the methodologies that will become increasingly important in the field.

5.4.1 In vitro storage techniques

In the late 1970s and early 1980s, tissue culture or in vitro culture techniques had begun to make an impact in plant physiological studies, vegetative propagation, disease eradication and genetic manipulation. In vitro storage techniques were then recognized as a way of conserving the genetic resources of problem crops and also of providing a conservation method for the emerging field of plant biotechnology. Figure 5.4 provides a flowchart for plant tissue culture from various tissues to generate plantlets. A common factor involved in plant tissue culture is the
[Figure residue, flowchart of plant tissue culture (cf. Fig. 5.4). Recoverable labels: meristem; meristem culture; meristem or shoot apex growth; shoot apex; shoot tip culture (axillary branching); shoot tip culture (multiple branching).]
There are some examples of the application of in vitro storage techniques. Techniques have been developed for the collection of species that produce recalcitrant seeds and for vegetatively propagated material, which enable a collector to introduce the material in vitro, under aseptic conditions, directly in the field (Withers, 1995). This approach will allow germplasm collection to be made in remote areas (e.g. in the case of highly recalcitrant cacao seeds) or when the transport of the collected fruits would become prohibitively expensive (e.g. collecting coconut germplasm). Also, in cases where the target species does not have seeds or other storage organs to be collected, or when budwood would quickly lose viability or is contaminated, establishment of aseptic culture in the field will facilitate collection and improve its efficiency (Engelmann and Engels, 2002).

The disadvantages of in vitro maintenance are relatively high inputs of time and labour for culture establishment and maintenance, potential losses due to contamination or mislabelling, risk of microbial infection at each subculture, the cumulative risk of somaclonal variation with time and accidental loss through equipment failure. With certain tissue cultures, the morphogenetic potential of the cultures may decline after growth under in vitro conditions for an extended period of time. Prolonged maintenance of de-differentiated plant cells and tissues in vitro by repeated subculture is expensive, time consuming and labour intensive, and often results in a reduction in morphogenic or biosynthetic capacity and changes in genetic, chromosomal or genomic composition, such as mutations, aneuploidy and polyploidy. Somaclonal variation resulting from subculture may manifest itself at the molecular, biochemical or phenotypic level. Its extent can be limited by controlling environmental factors such as medium formulation and subculture interval, but it cannot be eliminated with certainty unless metabolism is suspended. The type of explant and the preservation method can have a significant impact on survival and the extent of somaclonal variation. In general, organized cultures (meristems, shoot tips and embryos) are more stable than non-organized cultures (protoplasts, suspensions and calli). Thus, organized cultures have a better likelihood of retaining their genetic integrity during prolonged culture in vitro. There is a need to develop and utilize storage methods that reduce the maintenance requirements of plant cultures, while maintaining genetic, biochemical and phenotypic stability. For short- to medium-term maintenance, cell and tissue cultures may benefit from some form of growth reduction, but for long-term maintenance, growth suspension must be recommended.

In vitro culture requires strict control of environmental conditions such as medium constituents, and often such conditions are not immediately applicable to a wide range of species or even to every selection within a species. Thus, culture conditions often need to be developed for each particular species, subspecies or even culture of interest. The International Board for Plant Genetic Resources (IBPGR, now known as Bioversity International) published general recommendations for in vitro storage, including recommendations on the design and operation of culture facilities (IBPGR, 1986). Extensive research has been carried out to reduce growth rates by reducing the components in the culture medium and modifying the physical environment. Modification of the gaseous environment by mineral oil overlay or control of the gas balance can retard growth. However, the most practical and effective slow growth methods to date involve reducing the culture temperature and/or adding osmotic retardants to the culture medium. Cultures of many species can be maintained in this way for 6 months to 2 years without the need for subculturing.

5.4.2 Cryopreservation

Efforts have been made to reduce or eliminate the risks discussed above by the development of slow growth methods for medium-term storage and cryopreservation
to envisage a role for the discrete collection of genes governing particular traits. Furthermore, as the barriers between gene pools are reduced through biotechnology, thereby facilitating the selective transfer of genes, relevant materials stored in the form of DNA may not necessarily originate from the target crop's own gene pool.

Plant preservation initiatives have, by necessity, focused on the conservation of species and landraces of international agricultural importance and, to a lesser extent, endangered and threatened species. Conversely, little effort has been made to systematically collect and preserve the increasing number of genotypes being developed with biotechnological applications (Owen, 1996). Since these elite cultures are being maintained by individual researchers or laboratories, they are in danger of being lost. Owen (1996) highlighted several methods for the maintenance and storage of plant germplasm, with particular emphasis on those techniques most applicable to the preservation of elite cultures used in biotechnology and the plants derived from them.

Somatic and zygotic embryos have been suggested as useful propagules for preservation. Somatic embryogenesis involves adventitious propagation and produces cultures that are convenient to handle and amenable to some technologies such as artificial seed production, which can be linked to storage by cryopreservation. Preservation of some recalcitrant species has been made possible by the observation that excised embryos behave in an orthodox manner and can be cryopreserved. Research has also been conducted to determine the feasibility of using desiccated somatic embryos or encapsulated somatic embryos. Preservation of these synthetic seeds would be especially useful for the preservation of clonal lines; however, more research is needed to elucidate how to increase viability after drying and to inhibit precocious germination.

The total genetic information of a plant can be readily isolated and DNA segments can be stored in lyophilized form. Thus, it would be a useful method for the storage of genes of interest to gene transfer technologies, similar to stored DNA. Pollen storage would also be a useful adjunct for germplasm preservation of lines developed for breeding programmes; however, cytoplasmic genes may not be conserved. Pollen storage can preserve genes, but may not preserve desirable gene combinations.

5.4.4 Rejuvenation and multiplication

The loss of the germination capacity of stored seeds necessitates their periodic rejuvenation. As seed ages, before losing germination capacity, mutations increase and if rejuvenation of the material does not take place within a particular period of time, the genetic structure of the population can vary. The multiplication site should have ecological characteristics similar to those where the material was collected in order to prevent selection that can change the allelic frequencies, even eliminating those alleles most sensitive to certain soil–climate factors.

Tissue culture can be applied to mass produce carbon copies of a selected (elite) plant whose agronomic characteristics are known (Fig. 5.4). It allows propagation of plant material with high multiplication rates in an aseptic environment. Since the 1970s, in vitro propagation techniques, mainly based on micropropagation and somatic embryogenesis, have been extensively developed and applied to thousands of different species.

As indicated previously, in vitro propagation techniques can be used for rapid clonal multiplication of germplasm in the vegetative form and also for other materials such as recalcitrant seeds which are often available in relatively small numbers. Preference is generally given to propagation methods that confer the lowest risk of somaclonal variation in culture, such as shoot or meristem cultures reproduced by non-adventitious means. However, other factors must be included in the total equation; these include availability of the preferred propagation technique, ease and rate of multiplication and amenability to storage.
Plants lose many of their distinguishing phenotypic characteristics when transferred to in vitro culture. Therefore, accurate records and vigilance are essential to ensure that genetic integrity is preserved. The risk of somaclonal variation can be reduced by monitoring, but frequent regeneration of plants from stored cultures and monitoring them in the field is costly and inefficient. Techniques are needed that can be applied easily and economically to cultured material. Visual monitoring in vitro will only detect the most gross of variants, e.g. variegated leaves or extreme dwarfism. More accurate, wide-ranging and reliable monitoring may be achieved by biochemical methods and molecular marker techniques. Minute changes at the DNA level can be detected with molecular markers that provide an ideal means of determining genetic integrity. It may become possible to detect undesirable variants in an overnight procedure at the culture stage, thereby eliminating the need for establishing in vitro propagated plantlets in the field for monitoring.

5.5.3 Genetic diversity

New genetic technologies, especially large-scale DNA sequencing (Chapter 3), have led to the development of molecular systematics and new methods of measuring genetic similarity and divergence in plant species and populations. It is now possible to compare organisms from the genome level (using, for example, fluorescent in situ hybridization or FISH) down to the level of single nucleotides (DNA sequencing and SNPs). Molecular markers have been used for genetic diversity studies for the following purposes: (i) examination of genotype frequencies for deviations at individual loci and characterization of molecular variation within or between populations; (ii) construction of phylogenetic trees or classification of germplasm accessions based on genetic distance and determination of heterotic groups for hybrid crops; (iii) analysis of the correlation between the genetic distance and hybrid performance, heterosis and specific combining ability; and (iv) comparison of genetic diversity among different groups of maize germplasm. Taking maize as an example, some applications in these areas can be found in Melchinger (1999), Warburton et al. (2002), Betrán et al. (2003), Reif et al. (2004), Xia, X.C. et al. (2005) and Lu et al. (2009). Such studies have provided useful information for genebank curation, gene identification and breeding.

Understanding the range of diversity and the genetic structure of gene pools is critical for the effective management and use of germplasm resources. The first question to ask might be about the distinctiveness of the concerned entities since the issue of what level of diversity we should actually try to maintain is still under debate. Some have argued that highly unique entities should be given preference over equally rare taxa with close relatives of abundant distribution (Vane-Wright et al., 1991) while others argue that evolutionary potential is highest in species-rich groups since the ability to adapt is seemingly greater (Erwin, 1991). On the other hand, the importance of species versus subspecies, hybrids and populations has generated considerable debate about the scientific legitimacy of legal conservation units (O'Brien and Mayr, 1991). Therefore, as indicated by Hahn and Grifo (1996), the first measures to be taken with molecular methods are taxon-specific markers and estimation of the degree of differentiation between units.

Diversity studies are generally undertaken using molecular markers that are assumed to be neutral, that is, not within expressed regions of the DNA. The correlation between molecular variation and quantitative variation in expressed traits has rarely been studied in detail but is an issue that must be addressed if studies in genetic diversity are to be used more effectively in biodiversity assessment and conservation (Butlin and Tregenza, 1998). Across a large genome, such as that of maize, diversity can accumulate so that 150 million sites are commonly polymorphic. A small but important proportion of these polymorphisms is responsible for the complex variation in phenotypic traits. Molecular markers have increased our understanding of the spatial and temporal patterns of genetic variation and of the evolutionary mechanisms that generate and maintain variation. However, the direct benefit of these data to either practical biodiversity conservation or germplasm collection management is equivocal (Harris, 1999).

Several past studies have highlighted the decline of genetic diversity in modern cultivars compared to landraces or wild relatives. In maize, for example, Liu et al. (2003) evaluated the genetic diversity among 260 diverse maize inbred lines with 94 SSR markers and found that tropical and subtropical inbreds contain a greater number of alleles and gene diversity than temperate inbreds. It was also found that maize inbreds capture less than 80% of the alleles seen in the landraces, suggesting that landraces can provide substantial additional genetic diversity for maize breeding. After analysing over 100 maize inbred lines and teosinte accessions with 462 SSRs, Vigouroux et al. (2005) concluded that many alleles in the progenitor species of maize (teosinte) are not present in maize. Wright et al. (2005) compared SNP diversity between maize and teosinte in 774 genes and concluded that maize accessions had much less genetic diversity, consistent with products of artificial selection and crop improvement. These reports in maize, along with genetic mapping studies involving wild relatives in other crops, support earlier conclusions that non-adapted and wild related species contain untapped sources of new alleles for future crop breeding improvement (Tanksley and McCouch, 1997).

Factors impacting genetic diversity

The extent of polymorphism differs substantially between species and sampled loci. In a comprehensive study of variation within a maize chromosome, the diversity at 21 loci varied by 16-fold (Tenaillon et al., 2001). The variation between loci may partly reflect sampling effects but selection and other factors play a more important role (Table 5.2). Although many factors influence diversity, the neutral theory of evolution suggests that the level of polymorphism (θ) should be the product of the effective population size (Ne) and the mutation rate (μ) with θ = 4Neμ (Kimura, 1969). Unfortunately, there is little empirical proof of this in plants. Background selection is likely to be one of the major factors determining nucleotide diversity and it suggests that diversity should be shaped by recombination at the intragenomic scale and by the outcrossing rate at the species level. Strong selection pressure is important in decreasing the nucleotide diversity of some plant species. During the selection of advantageous phenotypes, some crops appear to have passed through bottlenecks that substantially reduced diversity
Table 5.2. Factors that impact nucleotide diversity (reprinted from Buckler and Thornsberry (2002) with
permission from Elsevier).
(Doebley, 1992). Balancing selection and/or frequency-dependent selection may also play an important role in increasing diversity at specific loci within a genome. In these selection regimes, selection favours the maintenance of multiple alleles with different effects over evolutionary time.

Measurement of diversity

The estimation of genetic similarity is vital to the formulation of optimal germplasm management strategies and lies at the core of modern plant systematics and evolutionary biology. Plant systematicists and evolutionary geneticists have developed techniques

information content of the sample accession (Brown and Brubaker, 2002).

When genetic marker data can be interpreted by a locus/allele model, allelic diversity can be described by: (i) the percentage of polymorphic loci, calculated by dividing the number of polymorphic loci by the total number of loci assayed; (ii) the mean number of alleles per locus, calculated by dividing the total number of alleles detected by the number of loci assayed; (iii) total gene diversity or average expected heterozygosity (Nei, 1973; Brown and Weir, 1983), calculated by
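The expression for total gene diversity (average expected heterozygosity) is not legible in this extract. As a hedged reconstruction, the standard form of this statistic, written with the notation of Table 5.3 (m loci, n_i alleles at the ith locus, p_ij the frequency of the jth allele at the ith locus), is shown below; it is given as an assumption about the intended formula rather than as a transcription.

```latex
H = 1 - \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n_i} p_{ij}^{2}
```

For a single locus with two alleles at frequencies 0.5 and 0.5 this gives H = 1 − (0.25 + 0.25) = 0.5, and for a monomorphic locus it gives H = 0.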
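Table 5.3. Dissimilarity coefficients d for allelic informative marker data. p_ij and q_ij are allelic frequencies of the jth allele at the ith locus in the two operational taxonomic units under consideration, n_i is the number of alleles at the ith locus, and m refers to the number of loci.

Euclidean; range 0, \sqrt{2m}
\[ d_E = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(p_{ij}-q_{ij}\right)^2} \]

Rogers (1972); range 0, 1
\[ d_R = \frac{1}{m}\sum_{i=1}^{m}\sqrt{\frac{1}{2}\sum_{j=1}^{n_i}\left(p_{ij}-q_{ij}\right)^2} \]

Modified Rogers; range 0, 1
\[ d_W = \frac{1}{\sqrt{2m}}\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(p_{ij}-q_{ij}\right)^2} \]

Cavalli-Sforza and Edwards (1967); range 0, 1
\[ d_{CE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(1-\sum_{j=1}^{n_i}\sqrt{p_{ij}q_{ij}}\right)} \]

Row label and range not recoverable; the surviving expression is the ratio below, which has the form of the normalized identity used in Nei's (1972) standard genetic distance
\[ \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i} p_{ij}q_{ij}}{\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n_i} p_{ij}^{2}\;\sum_{i=1}^{m}\sum_{j=1}^{n_i} q_{ij}^{2}}} \]

Nei et al. (1983); range 0, 1
\[ d_{N83} = \frac{1}{m}\sum_{i=1}^{m}\left(1-\sum_{j=1}^{n_i}\sqrt{p_{ij}q_{ij}}\right) \]

Row label and range not recoverable; the surviving definition of θ has the form of the coancestry coefficient of Reynolds et al. (1983)
\[ \theta = \frac{\sum_{i=1}^{m}\left[\frac{1}{2}\sum_{j=1}^{n_i}\left(p_{ij}-q_{ij}\right)^2-\frac{1}{2(2n-1)}\left(2-\sum_{j=1}^{n_i}\left(p_{ij}^{2}+q_{ij}^{2}\right)\right)\right]}{\sum_{i=1}^{m}\left(1-\sum_{j=1}^{n_i}p_{ij}q_{ij}\right)} \]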
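Several of the coefficients named in Table 5.3 are simple to compute once allele frequencies are tabulated per locus. The sketch below assumes the standard published forms of the Euclidean, Rogers, Modified Rogers, Cavalli-Sforza and Edwards, and Nei et al. (1983) coefficients, and a hypothetical data layout in which each population is a list of loci and each locus is a list of allele frequencies in the same allele order. Function names are illustrative only.

```python
import math


def _check(p, q):
    """Both populations must describe the same loci and alleles."""
    if len(p) != len(q) or any(len(a) != len(b) for a, b in zip(p, q)):
        raise ValueError("populations must share loci and allele order")


def _sq_diff(pi, qi):
    """Sum over alleles of (p_ij - q_ij)^2 at one locus."""
    return sum((a - b) ** 2 for a, b in zip(pi, qi))


def _root_overlap(pi, qi):
    """Sum over alleles of sqrt(p_ij * q_ij) at one locus."""
    return sum(math.sqrt(a * b) for a, b in zip(pi, qi))


def euclidean(p, q):
    _check(p, q)
    return math.sqrt(sum(_sq_diff(pi, qi) for pi, qi in zip(p, q)))


def rogers(p, q):
    _check(p, q)
    return sum(math.sqrt(0.5 * _sq_diff(pi, qi)) for pi, qi in zip(p, q)) / len(p)


def modified_rogers(p, q):
    return euclidean(p, q) / math.sqrt(2 * len(p))


def nei83(p, q):
    _check(p, q)
    return sum(1 - _root_overlap(pi, qi) for pi, qi in zip(p, q)) / len(p)


def cavalli_sforza_edwards(p, q):
    # max() guards against tiny negative values from floating-point rounding
    return math.sqrt(max(nei83(p, q), 0.0))
```

Two populations fixed for different alleles at every locus sit at the upper end of each range (1 for the bounded coefficients), while identical populations give 0 for all of them.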
being analysed using similarity measures utilization strategies may change tangibly.
derived decades ago. Similarity measures For example, since at least the mid-1980s,
and classification methods are needed spe- maize evolutionists in general have accepted
cifically for handling molecular marker data the tripartite hypothesis of Mangelsdorf
from polyploid species. (originating in the 1930s and reviewed in
Mangelsdorf (1974) ). This hypothesis pos-
Phylogenetics tulated that maize evolved directly from
an undiscovered wild maize and that teo-
One of the most important roles of genetic sinte was derived from a hybrid between
markers in plant germplasm management maize and Tripsacum species. During that
is in the elucidation of the systematic rela- period, substantial resources (relative to
tionships within genera, tribes and families those devoted to similar programmes with
and obtaining characteristic genetic profiles teosinte) were allocated to improving maize
of germplasm. Using the similarity meas- with introgressed Tripsacum germplasm
ures and classification methods described (Galinat, 1977). To summarize, a clear under-
above, genetic markers of all types have standing of the systematic relationships
been instrumental in characterizing system- among a crop and its wild relatives is vital
atic and evolutionary genetic relationships and in establishing a germplasm's taxonomic
identity which will probably change how the germplasm accessions are managed and
utilized. As indicated by Bretting and Widrlechner (1995), clarifying evolutionary
relationships among intermediate taxa may challenge the germplasm manager's judgement
and acuity. Molecular taxonomy will substantially improve our knowledge of the primary,
secondary and tertiary gene pools of many crops and evolutionary studies will help
identify crop ancestors, past genetic bottlenecks and opportunities for introducing
useful variation. It is particularly vital for germplasm management purposes to
discriminate recently synthesized, naturally occurring F1 hybrids and/or hybrid
derivatives from taxonomically intermediate taxa originating from convergent-parallel
evolution, clonal variation, recombinational speciation and/or the retention of
intermediate ancestral traits (where the latter includes the phenomenon known as
lineage sorting; Avise, 1986).

Supraspecific systematic relationships are best elucidated by phylogenetic methods.
These methods can sometimes help estimate phylogenetic relationships among crops and
related taxa and accordingly, may help determine whether a weedy crop relative is a
crop progenitor or a feral crop derivative. As the exact systematic relationships among
a crop and its relatives are better understood, germplasm conservation and

for sound genetic resource management and for crop improvement as a whole.

Taxonomic relationships have been re-evaluated for many crop plants by using molecular
markers and genomic sequences that cover some part of the genome for specific traits,
attempting to replace the classical morphological survey with a point survey using data
obtained from one or more marker or sequence loci. For example, studies of the genetic
architecture of key yield-related components (e.g. flower and seed production, maturity
and photoperiod response) will enable us to focus on areas of the genome where
diversity is particularly important for this trait (Hodgkin and Ramanatha Rao, 2002).
Phylogenetic studies provide a fundamental gain in genetic knowledge not only to prove
that two individuals or gene copies differ but also to place them in a hierarchy of
relationships based on the timing of a shared ancestor.

Phylogenetic diversity can also be estimated by whole genome analysis and genome-scale
phylogenetic trees can be created. Such genome-trees can be built based on gene
content, gene order, evolutionary distances between orthologues and concatenated
alignments of orthologous protein sequences (see Wolf et al. (2003) for a review). Both
the initial results and the general notion that using genome-wide information helps
enhance the phylogenetic signal suggest that the future belongs to these approaches.
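As a rough illustration of the gene-content idea mentioned above, the sketch below computes a Jaccard-type distance between genomes from the sets of gene families each one carries. The taxon labels and gene-family identifiers are invented for illustration, and the Jaccard measure is simply one convenient choice; it is not claimed to be the measure used in the studies reviewed by Wolf et al. (2003).

```python
from itertools import combinations


def gene_content_distance(genes_a, genes_b):
    """Jaccard distance: 1 - (shared gene families / all gene families)."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0  # two empty inventories are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes):
    """Pairwise distances for a dict {taxon: iterable of gene-family ids}.

    Keys of the result are (taxon_x, taxon_y) tuples in sorted name order.
    """
    return {
        (x, y): gene_content_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


# Hypothetical gene-family inventories (illustrative only).
genomes = {
    "crop": {"g1", "g2", "g3", "g4"},
    "weedy_relative": {"g1", "g2", "g3", "g5"},
    "distant_relative": {"g1", "g6", "g7", "g8"},
}
distances = distance_matrix(genomes)
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 3))
```

A distance table of this kind could then be passed to any standard clustering or tree-building routine; gene order and sequence-based distances would need different inputs.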
5.5.4 Collection redundancies and gaps

As a large number of germplasm accessions are available for each cultivated plant, many
likely represent duplicate or nearly identical samples of the same cultivar while
others embody those with rare alleles or highly unusual allele combinations or those
where many of their genes or alleles are underrepresented in current collections.
Molecular technology will help us to understand the genetic structure of existing
collections and to design appropriate acquisition strategies. In particular, genetic
distance can be calculated as described previously to identify particularly divergent
subpopulations that might harbour valuable genetic variation complementary to that in
current holdings.

Germplasm redundancy exists in many germplasm collections due to the different names
given to the same cultivars or duplicate samplings of the same accessions. Duplication
of germplasm among collections is substantial. Eliminating this type of duplication has
often been suggested as a way of reducing the costs associated with the operation of
genebanks. Lyman (1984) estimated that at least 50% of the germplasm held consists of
duplicated accessions. The Food and Agriculture Organization (FAO) (1998) estimates
that of the 6 million accessions stored worldwide, only between 1 and 2 million are
unique. On the other hand, it has been recognized that all germplasm should be backed
up in at least two different sites to avoid complete loss of the collections at any
given site. Pedigree-related cultivars, sibling lines and early isogenic lines may
represent another type of redundancy because they are genotypically duplicated at most
genetic loci. For example, US rice cultivars M5, M301, M103, S201, Calrose, Calrose 76,
CS-M3 and Calmochi-202 shared the same panel of alleles at all of the 100 RFLP loci
surveyed. Each of these cultivars can be traced back to a common ancestor, Caloro. In
addition, no genetic polymorphism could be detected at another 60 loci between Calrose
and Calrose 76 when a more polymorphic marker type, SSR, was used (Xu, Y. et al., 2004).
This is probably due to the fact that they are isolines, with Calrose 76 representing a
variant derived from Calrose via chemical mutagenesis. Using 15 SSR markers, Dean et al.
(1999) assayed 19 sorghum (Sorghum bicolor (L.) Moench) accessions identified as
'Orange' currently maintained by the US National Plant Germplasm System (NPGS). They
found that most accessions were genetically distinct, but two redundant groups were
found. The variance analysis also indicated that it should be possible to reduce the
number of 'Orange' accessions held by NPGS by almost half without seriously
jeopardizing the overall genetic variation contained in these holdings.

Germplasm collections can be compared for the frequencies of alleles at all genetic
loci so that distinctive alleles, allele combinations and allele frequency patterns can
be identified for a given population. Chromosomal regions containing loci that show the
greatest changes in allele frequency between the collections can be located. The
rationale for this analysis is to define genomic regions where selection gave rise to
allele combinations or allele frequency patterns that distinguish a group of accessions
with less diversity from those in a more diverse group. Alleles originally found in
ancestral cultivars or the wild relatives may be gradually lost through domestication
and breeding. Modern breeding programmes generally rely on a small number of superior
accessions which results in genetic uniformity and loss of diverse alleles that could
be important to future breeding programmes. Valuable lost genes or alleles can be
recovered by going back to the ancestors or wild relatives of our crop species. Using
47 SSR markers, Christiansen et al. (2002) determined the variation of genetic
diversity in 75 Nordic spring wheat cultivars bred during the 20th century. They found
that some alleles were lost during the first quarter of the century whereas several new
alleles were introduced in the Nordic spring wheat material during the second quarter
of the century.

The allele frequencies at 100 RFLP and 60 SSR loci in rice were compared between the US
and world collections and between the two major types, indica and japonica, within the
world collection (Xu et al.,
2004). Among 34 alleles at 20 RFLP and 14 SSR loci that were found most frequently
(allele frequencies of 20.4–59.5%) in the world collection, three were completely lost
and 31 were underrepresented (less than 5%) in the US collection (japonica), while some
of them were also lost or underrepresented in indica types. As examples, the lost
alleles and the underrepresented alleles with frequencies of less than 2% are listed in
Table 5.4. Selection against these alleles is clear and stems from the fact that modern
US rice cultivars have been developed from a small set of germplasm introductions.

5.5.5 Genetic drifts/shifts and gene flow

To generate stocks for distribution or to maintain the seed viability, a variable
accession needs to be regenerated regularly. During this process there is a risk that
the genetic integrity of the accession will be compromised by genetic drift, selection
or gene flow (Sackville Hamilton and Chorlton, 1997). Genetic drift is a stochastic
phenomenon of fluctuations in allele frequencies in the offspring deviating from the
parental population which may result in random fluctuations in allele frequencies from
generation to generation or the eventual loss of alleles from the population. For
diploid organisms, the variance of the change in allele frequency in one generation is
$q(1 - q)/2N_e$, where $q$ represents the frequency of the allele and $N_e$ the
effective population size (Falconer, 1981). Thus, the effective population size
determines the extent of genetic drift. Effective population sizes are generally
smaller than actual population sizes because of unequal numbers of females and males,
overlapping generations, non-random mating, differential fertility and fluctuations in
population size (Falconer, 1981; Barrett and Kohn, 1991). Genetic drift may be
controlled by adjusting the size of the regenerating population or developing improved
regeneration methods (Engels and Visser, 2003). Genetic drift can be measured using
molecular markers that are neutral and co-dominant because of the random nature of the
process.

The genetic composition of populations may also change during regeneration due to
selection. Selection is different from genetic drift because it does not affect all
loci simultaneously and usually occurs towards particular genotypes or loci. Selection
during regeneration can be inferred from strong shifts in marker allele frequencies for
certain loci between parental and offspring populations (Spooner et al., 2005).
Maintaining genetic diversity and preventing genetic shifts are important objectives
for germplasm
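To make the drift expression of Section 5.5.5 concrete, the sketch below evaluates q(1 − q)/2Ne for a few regeneration population sizes and checks it against a small simulation. The simulation assumes idealized random sampling of 2Ne gene copies from the parental pool (a Wright-Fisher-style model, which is an assumption made here and not a description of any particular regeneration protocol); the function names, starting frequency and population sizes are illustrative.

```python
import math
import random


def drift_variance(q, ne):
    """Expected variance of the one-generation change in allele frequency,
    q(1 - q) / (2 * Ne), for a diploid population."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    if ne <= 0:
        raise ValueError("Ne must be positive")
    return q * (1.0 - q) / (2.0 * ne)


def simulate_change(q, ne, rng):
    """One generation of random sampling: draw 2*Ne gene copies from the
    parental pool and return the resulting change in allele frequency."""
    copies = 2 * ne
    carriers = sum(rng.random() < q for _ in range(copies))
    return carriers / copies - q


def empirical_variance(q, ne, replicates=2000, seed=1):
    """Sample variance of the simulated change over many replicate regenerations."""
    rng = random.Random(seed)
    changes = [simulate_change(q, ne, rng) for _ in range(replicates)]
    mean = sum(changes) / replicates
    return sum((c - mean) ** 2 for c in changes) / (replicates - 1)


# Illustrative values only: an allele at frequency 0.2 regenerated with
# different effective population sizes.
for ne in (10, 50, 250):
    expected = drift_variance(0.2, ne)
    print(f"Ne={ne:4d}  variance={expected:.5f}  sd={math.sqrt(expected):.3f}")
```

Because the variance is inversely proportional to Ne, a fivefold increase in the effective size of the regenerating population reduces the expected variance of the frequency change fivefold.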
Table 5.4. SSR and RFLP alleles lost or underrepresented in the US collection but most frequent in the
world collection (the markers designated with the RM prefix are SSRs and others are RFLPs) (selected
from Xu et al., 2004).
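The kind of comparison summarized in Table 5.4 can be sketched as follows: allele frequencies are computed per locus in a reference (e.g. world) collection and in a target (e.g. national) collection, and alleles that are common in the former but absent or rare in the latter are flagged. The 5% cut-off for "underrepresented" follows the text; the 20% cut-off for "common", the data layout (one allele call per accession per locus, i.e. homozygous inbred lines) and all locus and allele labels are assumptions made for illustration.

```python
from collections import Counter


def allele_frequencies(calls):
    """Allele frequencies at one locus from a list of allele calls
    (one call per accession; None marks missing data)."""
    counts = Counter(c for c in calls if c is not None)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


def flag_alleles(reference, target, common_min=0.20, under_max=0.05):
    """Flag alleles that are common in `reference` but lost or rare in `target`.

    Both arguments map locus name -> list of allele calls.
    Returns {(locus, allele): "lost" | "underrepresented"}.
    """
    flags = {}
    for locus, ref_calls in reference.items():
        ref_freq = allele_frequencies(ref_calls)
        tgt_freq = allele_frequencies(target.get(locus, []))
        if not tgt_freq:
            continue  # locus not scored in the target collection
        for allele, freq in ref_freq.items():
            if freq < common_min:
                continue  # only alleles common in the reference are of interest
            tgt = tgt_freq.get(allele, 0.0)
            if tgt == 0.0:
                flags[(locus, allele)] = "lost"
            elif tgt < under_max:
                flags[(locus, allele)] = "underrepresented"
    return flags


# Hypothetical marker data (labels and counts are invented).
world = {
    "locus_A": ["a1"] * 6 + ["a2"] * 4,
    "locus_B": ["b1"] * 5 + ["b2"] * 5,
}
national = {
    "locus_A": ["a1"] * 39 + ["a2"],   # a2 present at 2.5%
    "locus_B": ["b1"] * 40,            # b2 absent
}
flags = flag_alleles(world, national)
print(flags)
```

Loci with no calls in the target collection are skipped so that missing data are not reported as lost alleles.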
assimilation (Para et al., 2005) or a combination of both at different times and in
different locations. Molecular marker analysis can be used to monitor gene flow among
cultivars developed in a long history of plant breeding (vertical flow), among
pedigree-related cultivars developed in a relatively short period (horizontal flow) and
among cultivated species and weeds. Tracing specific alleles or genes has become an
important objective in parentage control and cultivar identification which has been
used to protect the rights of breeders. Further discussion on gene flow between
transgenic plants and weeds can be found in Chapter 12.

5.5.6 Unique germplasm

To broaden the genetic base of specific cultivated species, the genetic diversity
within collections must be assessed in the context of the total available genetic
diversity for each species. With the use of DNA profiles, the genetic uniqueness of
each accession in a germplasm collection or in a population can be determined and the
identity and frequency of individual alleles can be clearly described and characterized
(Brown and Kresovich, 1996; Smith and Helentjaris, 1996; Lu et al., 2009). A DNA bank
can be developed to undertake allele mining for identifying unique germplasm containing
novel alleles and allele combinations.

The sampling of exotic germplasm should emphasize the genetic composition rather than
the appearance of something very different. Accessions with DNA profiles most distinct
from that of modern germplasm are likely to contain the greatest number of novel
alleles (different from those already present in the elite gene pool). Marker analysis
could be used to identify accessions harbouring rare or novel alleles so that the
functional significance of the resident genes can be determined using both traditional
crossing and sequence-based genomics approaches. Considering the allele frequency
profiles across all cultivars and germplasm accessions will give us some idea of which
germplasm may retain or contain the rare genes/alleles and further phenotypic
characterization may determine whether these alleles will be important to our future
breeding programmes. The germplasm that holds unique alleles may contain unique genetic
variation required for trait improvement. For example, 15 (6.4%) of the 236 rice
accessions examined by Xu et al. (2004) contained unique alleles (those present in only
one of the cultivars) for at least one RFLP locus and 81 (34.3%) rice accessions had
unique SSR alleles. The germplasm accessions identified as having unique alleles also
had unusual geographic origins with high genetic diversity and could have potential use
in the exploitation of heterosis and novel alleles for agronomic traits.

The degree of genetic similarity between any two cultivars can be calculated as the
proportion of shared alleles. The most similar accessions share alleles at almost all
marker loci while the least similar accessions have few or no alleles in common. When
evaluating genetic similarity, shared allele frequencies (SAFs) can be averaged over
all possible pairs of cultivars in a sample. A smaller average similarity indicates a
greater genetic difference with respect to the rest of the cultivars in the collection.
Based on the averaged SAF, the most diverse accessions can be selected to represent
cultivars that host the least-frequent alleles and are genotypically most different
from other accessions. From 236 rice cultivars, Xu et al. (2004) selected the 16 most
diverse accessions (with SAF < 50%) based on RFLP markers and 49 accessions based on
SSR markers. Most of these selections, such as Caloro, Cina, Badkalamkati, DGWG and
TN1, were ancestral cultivars that had been used as parents in breeding programmes more
than 40 years previously; none of the selections includes lines from the US collection
which has a much narrower genetic basis.

Genetic mapping studies involving interspecific crosses have identified novel/superior
alleles originating from phenotypically unfavorable distant relatives that enhance the
performance of modern cultivars (Xiao et al., 1998; Moncada et al., 2001; Brondani
et al., 2002; Nguyen et al., 2003;
Thomson et al., 2003). These novel alleles are present in germplasm collections but
have not been previously identified because they are hidden in the inferior phenotype.
The valuable alleles identified from Oryza rufipogon that increase yield in commercial
cultivars have been used to improve the best hybrids that have been commercialized in
China since before 1989 (Xiao et al., 1998) and new hybrids containing these
O. rufipogon introgressions demonstrate more than a 30% yield advantage over previous
Chinese hybrids (Yuan, 2002). The novel genes/alleles identified from a germplasm
collection can also be utilized in transformation experiments by one of the methods
described in Chapter 12 if sexual transfer is impossible or too slow, especially during
the phase of testing new genes and alleles in a common genetic background. Utilization
of genetic resources in plant breeding is the major task of plant breeders, a topic
which is discussed in detail in Chapters 7 and 9.

5.5.7 Allele mining

There are several options for identifying or capturing diversity that might not exist
in the germplasm pool of existing breeding lines: allele mining, transformation,
mutation breeding, use of landraces or synthetic polyploids and wide crossing (Able
et al., 2008). Allele mining, which is important for utilizing novel alleles hidden in
genetic diversity, will be discussed here.

Molecular and functional diversity of crop genomes can be characterized by allele
mining, identification of distinct haplotypes for different inbred lines, single
feature polymorphism (SFP) analysis, discovery of nearly identical paralogues (NIPs;
Emrich et al., 2007) and determination of their evolutionary implications. In general,
there are two approaches that have been elaborated for allele mining: re-sequencing
(e.g. Huang et al., 2009) and EcoTILLING (discussed in Chapter 11) (Comai et al.,
2004). Whole genome genotyping using gene-based markers can be used as the foundation
of the re-sequencing method. Allele mining from germplasm collections is in its
infancy, currently facing the fundamental challenge to establish which of the various
alleles present is functionally different from the wild type and where possible to
identify which new alleles beneficially influence the target trait. Methods to
ascertain allele function include marker-assisted backcrossing (MABC), transformation,
transient expression assays and association analysis using an independent set of
germplasm for association mapping from that used to identify the original allele. As
more of these studies are carried out, it is hoped that the growing database of
comparisons between sequence variation and phenotype will allow bioinformaticians to
identify patterns that can form the basis of future predictive methods. The current
rate-limiting factor for the effective use of outputs from allele mining in breeding
programmes is that there is insufficient information on the relationship between SNP
variation and changes in phenotypes that may be useful for breeders. However, the
resources and tools necessary to perform in silico trait-targeted selection of the
outputs from allele mining are becoming available. Thus, proof-of-concept projects are
now being carried out in model organisms in order to study the relationships between
SNP haplotypes and changes in phenotypes. This has already led to the development of
predictive tools that can identify those SNPs with a high probability of conferring
deleterious phenotypes. However, the next big step in this area is the development of
bioinformatics tools to compare sequence variation with protein and functional domain
variation or with public databases including associated phenotype data, in order to
predict which sub-selections of SNP haplotype variants have the maximum likelihood of
providing beneficial phenotypic variation in the target trait. It is likely that SNPs
in promoter and non-coding regions will also be important for predictive phenotype
analysis.

The same methodology used in association mapping may also be used for allele mining of
the diverse core subsets of germplasm being created from breeders' lines, genebank
accessions and wild relatives. Once a gene of interest is positively
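Returning to the marker-based screening described in Section 5.5.6, the sketch below shows one way to compute the proportion of shared alleles between accessions, average it per accession, and list alleles carried by a single accession only. It assumes complete, single-allele-per-locus profiles (homozygous inbred lines); the accession names, loci and allele codes are invented, and the per-accession averaging is one reading of the "averaged SAF" described in the text, not a reproduction of the procedure in Xu et al. (2004).

```python
from collections import defaultdict
from itertools import combinations


def shared_allele_proportion(profile_a, profile_b):
    """Proportion of loci scored in both accessions at which they carry
    the same allele. Profiles map locus -> allele."""
    loci = profile_a.keys() & profile_b.keys()
    if not loci:
        raise ValueError("profiles have no loci in common")
    return sum(profile_a[l] == profile_b[l] for l in loci) / len(loci)


def average_similarity(profiles):
    """Mean shared-allele proportion of each accession against all others."""
    totals = {name: 0.0 for name in profiles}
    for a, b in combinations(profiles, 2):
        s = shared_allele_proportion(profiles[a], profiles[b])
        totals[a] += s
        totals[b] += s
    others = len(profiles) - 1
    return {name: total / others for name, total in totals.items()}


def unique_alleles(profiles):
    """Alleles present in only one accession: {accession: [(locus, allele), ...]}."""
    carriers = defaultdict(list)
    for name, profile in profiles.items():
        for locus, allele in profile.items():
            carriers[(locus, allele)].append(name)
    result = defaultdict(list)
    for key, names in carriers.items():
        if len(names) == 1:
            result[names[0]].append(key)
    return dict(result)


# Hypothetical marker profiles (illustrative only).
profiles = {
    "acc_A": {"l1": 1, "l2": 1, "l3": 1, "l4": 1},
    "acc_B": {"l1": 1, "l2": 1, "l3": 1, "l4": 2},
    "acc_C": {"l1": 1, "l2": 1, "l3": 2, "l4": 2},
    "acc_D": {"l1": 3, "l2": 2, "l3": 3, "l4": 3},
}
averages = average_similarity(profiles)
most_diverse = [name for name, s in averages.items() if s < 0.5]  # threshold as in the text
print(averages)
print(most_diverse)
print(unique_alleles(profiles))
```

In this toy data set the accession sharing no alleles with the others has the lowest average similarity and also carries the largest number of unique alleles, which is the pattern the text describes for ancestral cultivars.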
typical plants or the plants developed by breeders. They may result from mechanical
mixtures, outcrossing, mutation or residual genetic variation. From the point of view
of germplasm conservation, the off-types could be mixed with the real type and when the
proportion of the off-types is sufficiently high within accessions, they can dominate
the collection and cannot be differentiated from the typical plants. From the point of
view of germplasm utilization, the presence of off-type plants will reduce the
uniformity of the crop and thus reduce its productivity and quality. Phenotypic
off-types can be easily rogued if there are not too many and if they can be
distinguished from the real type by phenotype. In addition to the off-types that are
visible phenotypically, many off-types are genetically different from the typical
plants but are difficult to distinguish visually. The presence of genotypic off-types
may impose a more severe effect on germplasm and could be one of the reasons for
genetic drift and cryptic loss of germplasm accessions. Both genotypic and phenotypic
off-types may be exaggerated by multiplication. Molecular technology provides a
powerful mechanism for distinguishing both the phenotypic and the genotypic off-types
from the typical plants. Marker–trait associations and high-resolution molecular
markers such as SSRs and SNPs could be used to distinguish two plants with very similar
genetic backgrounds. With ten or more co-dominant molecular markers for example,
breeders can identify distinct off-types from their breeding populations and hybrid
seed bulks and obtain detailed genotypic information such as the source of the off-type
genotypes and proportion of the off-types to typical plants. A selection and
purification decision can then be made to refine the germplasm collection and breeding
materials.

Heterogeneity existing in a germplasm collection reduces the potential of utilization
and reduces the interest of plant breeders. The first step in genetic enhancement for
this kind of germplasm is purification by selecting typical plants to obtain
true-breeding genotypes. This is extremely important for wild relatives of
self-pollinated crops.

5.6.2 Tissue culture and transformation in germplasm enhancement

A new level of enthusiasm and activity in plant cell culture research developed in the
early 1970s with reports of plant cell lines resistant to amino acid analogues,
nucleotide analogues, antibiotics and plant pathogen toxins. The real excitement of
these reports for plant breeding was the potential to generate crop plant germplasm
that expressed new sources of resistance to herbicides, plant pathogens and mineral and
salt stresses that could not be obtained through conventional breeding methodologies.
The manipulation of plant cells, tissues and organs in vitro is producing an increasing
number of unique clones of industrial, biochemical, genotypic and agronomic importance.
Examples include regenerable genotypes, transformants, haploids, polyploids, mutants,
isogenic lines, somaclonal variants, somatic hybrids and secondary product-producing
cultures. In addition, a wide array of industrial chemicals is derived from plants,
including flavours, pigments, gums, resins, waxes, dyes, essential oils, edible oils,
agrochemicals, enzymes, anaesthetics, analgesics, stimulants, sedatives, narcotics and
anticancer agents. The ultimate strategy was to put the most advanced breeding
germplasm into cell culture and obtain, either by selection or somaclonal variation, a
derived line improved by the addition of one new trait.

Advances in tissue culture and molecular biology have opened new avenues for the
precise transfer of novel genes into crop plants from diverse biological systems
(plants, animals and microorganisms), which was previously not feasible. The
development of efficient procedures for culture of somatic cells, pollen, protoplasts
and for plant regeneration from a large number of plant species combined with a broad
suite of tools, including improved DNA vector systems based on Ti and Ri plasmids of
Agrobacterium, direct DNA transfer methods, transposable elements, series of promoters,
marker genes and a large number of cloned genes, have made gene transfer more precise
and directed
(for details see Chapter 12). As a result of these developments, transgenic plants have
been produced in many plant species with foreign genes inserted for a wide range of
traits. These developments have resulted in a large number of germplasm/cultivars with
enhanced agronomic traits.

5.6.3 Gene introgression in germplasm enhancement

Up until now the primitive cultivars and related wild populations have been a fruitful,
and sometimes the sole, source of genes for pest and disease resistance, adaptations to
difficult environments and other agricultural traits. The proportions of unadapted and
adapted genomes and/or genotypes persisting in enhanced germplasm will differ according
to the particular goals of the enhancement programme. Incorporation programmes seek to
increase genetic diversity by maximizing the proportion of unadapted genome/genotypes
that is retained. In contrast, when introgressing adapted germplasm with high-value
traits, only the requisite high-value genes should be transferred. Finally, yield
enhancement efforts identify whichever proportion of the unadapted and adapted
genome/genotypes that optimizes the yield of the desired end product.

Gene introgression from wild species through molecular marker-assisted selection will
be discussed in detail in Chapters 8 and 9. Only two examples that relate to germplasm
enhancement will be given here. Isozyme, RFLP and morphological markers diagnostic for
chromosomes of one of the wild-weedy relatives of tomato facilitated efforts to
introgress wild genomic segments into elite tomato-breeding germplasm (DeVerna et al.,
1987, 1990). As a result, the elite germplasm received wild genomic segments that
improved horticulturally valuable traits to increase the yield (Rick, 1988). Notably,
as a result of this programme, modern tomato cultivars may be more genetically diverse
than heirloom, vintage cultivars (Williams and St Clair, 1993).

Stuber and Sisco (1991) and collaborators introgressed into a maize inbred line adapted
to Iowa, yield-enhancing genomic segments from an inbred line adapted to Texas by:
(i) identifying the favourable segments through yield trials coordinated with molecular
(RFLP and isozyme) marker genotyping; and (ii) transferring into the Iowa line, with
the help of molecular marker genotyping, only the favourable segments from the Texas
line. Although favourable segments were identified in field trials conducted in the
diverse environments of North Carolina, Iowa and Illinois, both the recipient and the
donor lines could be considered somewhat alien to the primary breeding site for this
programme in North Carolina. Nevertheless, these two examples do represent cases in
which genetic markers apparently facilitated yield enhancement successfully. With
numerous genes identified from wild relatives of crop species by molecular markers,
marker-assisted gene introgression will be used increasingly in germplasm enhancement.

The importance of the various base-broadening procedures that are being explored at
present should be emphasized. Genetic resources workers need to collaborate with plant
breeders in the development of procedures that allow effective testing of new materials
and their introduction into improvement programmes in a systematic manner. As indicated
by Hodgkin and Ramanatha Rao (2002), both introgression and incorporation programmes
will be needed.

5.7 Information Management

Information management has become increasingly important in plant breeding and
germplasm conservation as a large amount of data is accumulating. Since
breeding-related information is described in detail in Chapter 14, only issues related
to germplasm management will be discussed in this section.

5.7.1 Information system

There are two areas in which the rapid developments of the past few years have
had a major effect on plant genetic resources work: molecular genetics as discussed in
Chapters 2 and 3 and information technology. Information generated throughout germplasm
conservation activities must be stored in an easily accessible form. Dissatisfaction
with the quantity, quality and availability of information on the accession level is
the most frequent concern expressed by genebank clients (Fowler and Hodgkin, 2004). The
information situation has improved in the last few years. The CGIAR System-wide
Information Network for Genetic Resources (SINGER) provides access to data on the plant
collections held in trust by the Future Harvest Centres while the US Genetic Resources
Information Network (GRIN) DA system and the European EURISCO system provide access to
data on collections held in the USA and Europe, respectively. The information
revolution enables us to manage and process the very large amounts of data generated in
various areas and the Internet potentially provides global access to that data.
Information management has always had a central place in plant genetic resources
conservation. The need to identify, record and communicate information about accessions
has led to the development of a substantial infrastructure with relatively highly
developed database structures and information management systems. The information
revolution will profoundly affect our understanding of the organisms we conserve. All
scientists involved in germplasm collection, conservation and utilization would benefit
from professionally built, deployed and maintained information resources for their
favourite organisms.

Application of GIS technologies to the management of information on global plant
diversity is one of the greatest achievements made in plant genetic resources
conservation since the late 1950s (Kresovich et al., 2002). A GIS is a database
management system that can simultaneously handle digital spatial data and associated
non-spatial attribute data. Spatial or location data are acquired via global
positioning system devices which are now quite inexpensive and have become part of the
obligatory equipment for field explorations. In addition, an increasing body of
geo-referenced data has become available, i.e. data associated with coordinate and
altitude information. These geo-referenced data include both biological (e.g.
landcover, cattle density) and non-biological (e.g. climate, topography, soil and human
activity) data. The non-spatial attributes are any biological, including genetic, data
associated with the individual accessions collected. Thus, GIS is a tool designed to
visualize and analyse spatial patterns in genetic data in relation to ecological data;
it is also a hypothesis-generating tool for investigating processes that shape genomes.
With a free mapping program, DIVA-GIS, we can create grid maps of the distribution of
biological diversity to identify hotspots and areas that have complementary levels of
diversity (Hijmans et al., 2001; http://www.diva-gis.org/). Furthermore, information
generated by GIS analysis can help in conserving and using genetic diversity as
effectively and efficiently as possible (Greene and Guarino, 1999; Jarvis et al., 2005).

Examples of GIS applications, as summarized by Gepts (2006), include: (i) study of
isolation by distance and its effect on the genetic structure of gene pools by
comparing genetic and geographic distances; (ii) linking diversity and environmental
heterogeneity; (iii) determination of species distribution and areas of greatest
diversity; (iv) identification of germplasm with specific adaptation; (v) predicting
the distribution of species of interest and identifying new areas for germplasm
exploration; (vi) planning germplasm exploration trips by identifying highly diverse
areas, ecologically dissimilar areas, under-conserved areas and areas containing
threatened species, timing of the exploration and additions to passport data;
(vii) designing zoning plans for in situ conservation integrated with socio-economic
and indigenous knowledge data; and (viii) establishment of core collections (e.g. those
based, in part, on environmental variables such as length of the growing season,
photoperiod, soil types and moisture regimes).
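Application (i) in the list above, comparing genetic and geographic distances, can be sketched with passport coordinates and marker profiles alone. The example below computes great-circle (haversine) distances between collecting sites, a simple allele-mismatch genetic distance, and the Pearson correlation between the two sets of pairwise distances. The accession names, coordinates and profiles are invented, and the mismatch distance is an assumption chosen for brevity. Because pairwise distances are not independent, a formal analysis would normally assess significance with a permutation procedure such as a Mantel test, which this sketch does not attempt.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def genetic_distance(profile_a, profile_b):
    """Proportion of shared loci at which two accessions carry different alleles."""
    loci = profile_a.keys() & profile_b.keys()
    if not loci:
        raise ValueError("profiles have no loci in common")
    return sum(profile_a[l] != profile_b[l] for l in loci) / len(loci)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def isolation_by_distance(accessions):
    """Correlation between pairwise geographic and genetic distances.

    `accessions` maps name -> (latitude, longitude, marker profile)."""
    geo, gen = [], []
    for a, b in combinations(sorted(accessions), 2):
        lat_a, lon_a, prof_a = accessions[a]
        lat_b, lon_b, prof_b = accessions[b]
        geo.append(haversine_km(lat_a, lon_a, lat_b, lon_b))
        gen.append(genetic_distance(prof_a, prof_b))
    return pearson(geo, gen)


# Hypothetical passport and marker data (illustrative only).
accessions = {
    "acc1": (0.0, 0.0, {"l1": 1, "l2": 1, "l3": 1, "l4": 1}),
    "acc2": (0.0, 1.0, {"l1": 1, "l2": 1, "l3": 1, "l4": 2}),
    "acc3": (0.0, 10.0, {"l1": 2, "l2": 2, "l3": 2, "l4": 2}),
}
r = isolation_by_distance(accessions)
print(round(r, 3))
```

A strongly positive correlation, as in this toy data set, is the pattern expected under isolation by distance; real collections would of course involve many more accessions and loci.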
of SNP analysis is now about US$0.20–0.30 per genotype, with a cost of only a few cents
per genotype expected in the coming years (Jenkins and Gibson, 2002). With
well-established marker systems and sequencing facilities, genotyping with SSR markers
costs about US$0.30–0.80 per data point, depending on marker multiplexing and the
number of markers genotyped for each sample (Xu et al., 2002). There are several ways
to reduce the cost of genotyping. First, increasing the throughput using automated
genotyping and data-scoring systems can help increase the daily data output (Coburn
et al., 2002). Secondly, the optimization of marker systems, including facilities and
personnel, will result in a lower cost per data point. Now there are powerful new
techniques for screening thousands of plants for sequence variation in any particular
gene which is known to be of importance in a breeding programme. Detection of SNPs can
reduce a collection of many thousands of accessions to some tens of plants with
sequence changes in the gene of interest. These can then be screened for their
phenotypic characteristics and used where appropriate (Peacock and Chaudhury, 2002).

Genomic research has helped establish an information flow from molecular markers to
genetic maps to sequences to genes and to functional alleles. Apparently, however,
there is still a gap between sequence-based information targeting genes and alleles and
breeding-related information targeting germplasm, pedigrees and phenotypes. Phenotypic
evaluation still provides the foundation for the functional analysis of many genes even
when a complete genomic sequence becomes available. Integration of breeding-related
phenotypic evaluation with the high-throughput evaluation of mutants in a genomics
context will hasten progress towards understanding the functions of all plant genes in
the years ahead. This information will enhance the efforts to use MAGE effectively to
achieve a substantial increase in the efficiency of germplasm management in the years
to come.

With new genotyping methods such as the locus-specific microarray and resequencing, it
is now possible to scan variation at a locus from many accessions on a single chip as
described in Chapter 3. Consequently, the breadth of genetic information from thousands
of DNA polymorphisms and the depth of phenotypic measurements hold promise for
identifying marker–trait correlations through linkage disequilibrium-based association
genetics. The current QTL cloning procedures are time consuming; for example, in
species that have two growing seasons per year, it may take 5 years to produce the
population needed for fine-scale mapping. With thousands of genes evaluated for QTL
effects, a more efficient approach is needed to complement map-based cloning. This role
may be fulfilled by the application of association tests to naturally occurring
populations (Buckler and Thornsberry, 2002). This process, which can be called
association mapping or linkage disequilibrium mapping within a sample of known pedigree
(described in detail in Chapter 6), exploits related individuals that differ for a
particular trait to establish which region of the genome is associated with the
phenotype among the population members. In order to apply this method to mapping genes
using a plant genetic resources collection, the following prerequisite resources will
be required: (i) a dense set of molecular markers; (ii) passport and phenotypic data;
(iii) information on population structure; and (iv) a sample with contrasting genotypes
for the trait of interest (Kresovich et al., 2002). For several reasons, there is great
enthusiasm at present about the promise of linkage disequilibrium-based association
studies for uncovering the genetic components of complex traits in humans: dense SNP
maps across the genome, elegant high-throughput genotyping techniques, simultaneous
comparison of groups of loci, statistical measures for assessing genome-wide
significance and phenotypic insights as the basis for comparative genomic studies among
different human groups are all available. These conditions have already been or will
soon be satisfied in some plant species as well, and association studies have been
reported in many plant species including maize, rice, barley and Arabidopsis. These
studies bring together the power of genomics with the
richness of crop germplasm collections and promise to provide new insights into the genetic bases of domestication and productivity of our major food crops.

In the age of genomics, new attention is being focused on the value of germplasm resources, including whole plants, seeds, plant parts, tissues and clones from distinct species and synthetic germplasm and all types of mutants. The ultimate goal of germplasm conservation is to maintain diversity of the genes and gene combinations (Xu and Luo, 2002). Information regarding germplasm resources is being increasingly extracted from studies involving the relationship between genome sequence and its biological and evolutionary significance in the context of genetic resources. This information can be translated across species in a comparative context and thus the effective management of germplasm resources today involves both the practical management of seeds, tissues, clones, cells and mutant stocks and the effective management of large reservoirs of electronic information that helps us decipher the value and meaning of the genetic information contained within each germplasm accession. An important question will be whether the increased use of molecular tools in genebanks will make genebanks into true banks of genes and whether, for example, associated sequence data will be freely available.

A user who has identified an accession or accessions of interest from a primary collection would then move to the next level of information, where clusters of germplasm known to represent a broader spectrum of diversity within a specific gene pool (a subspecies or ecotype within a species) or a specific trait (resistance to diseases and pests) could be defined. The second level of investigation could be conducted using carefully designed sets of molecular markers known to target specific traits or designed to provide haplotype data for specific regions of the genome. With the availability of the genome sequences from rice and other plants, accelerated efforts are underway to determine the function of all genes in a plant. As gene structure-function relationships are clarified with greater precision, it will be possible to focus attention on genetic diversity within the active sites of a structural gene or within key promoter regions. This will make it productive to screen large germplasm collections for FNPs, targeting the search for alleles that are phenotypically relevant and have high breeding value.

Locating useful genes in collections will require an integrated approach that brings together information from molecular studies and other areas. This might include using an extensive set of molecular markers for diversity studies, analysis of the extent of linkage disequilibrium and identification of areas of the genome where important genes may occur combined with more conventional approaches using passport data, GIS and collections (Kresovich et al., 2002). These techniques could together provide an optimum set of candidate accessions for phenotypic and genetic analyses. In all cases, efficient characterization methods will remain an essential component of any plant genetic resources studies.

Finally, the domestication stories of maize and tomato should provide a warning to all curators and users of genetic resources that major phenotypic differences between accessions do not always mean that there are equally extensive genetic differences. In addition, significant contributions to agronomically desirable traits may result from the regulation, both spatial and temporal, of gene expression rather than from differences in amino acid sequences or protein structures. The challenge for curators now is to interpret how this knowledge affects not only genetic resource conservation in general, but where and how to look for alleles that will be useful for genetic diversity characterization and plant breeding.
6
Molecular Dissection of Complex Traits: Theory
is more likely that we can work indirectly with QTL via linked marker loci. Early QTL studies were based on manipulations of whole chromosomes, including substitution of one chromosome from one inbred line into another. The approach was refined to apply to small segments of chromosomes, delineated first by morphological markers (Thoday, 1961).

As large numbers of molecular markers became available and thus the whole genome mapping of quantitative traits became feasible, Xu (1997) proposed the concept of separating, pyramiding and cloning QTL, describing how multiple quantitative trait loci (QTL), either clustered together or dispersed in different chromosomes, can first be separated or dissected by molecular marker-assisted QTL mapping and selection and then pyramided into one genetic background, either by marker-assisted selection (MAS) or transformation of cloned multiple QTL, to create transgressive progeny in plant breeding. At about the same time, Molecular Dissection of Complex Traits became an attractive title for a multi-author book (Paterson, 1998).

The basic method used in Mendelian genetics to find linkage relationship is to classify individuals based on phenotypes and then compare the proportions of these groups with the theoretical ratio expected from independent loci and estimate the recombination fraction. QTL mapping is establishing linkage between marker loci and QTL. The fundamental principle is the same, i.e. classifying individuals. Depending on what criterion is used for classification of individuals, there are two major types of QTL mapping methods: marker-based analysis and trait-based analysis.

1. Marker-based analysis: methods for locating chromosomal regions or loci affecting quantitative traits or QTL based on their linkage relationships to Mendelian marker loci were first presented by Thoday (1961) and applied in experimental and agricultural species. These studies are all based on differences generated by QTL linked to the marker locus in the mean value of a quantitative trait between marker genotypes in a segregating population such as F2, BC and DH between two inbred lines (Chapter 4). If a marker is linked to a QTL, marker and QTL alleles co-segregate to some extent in the progeny. As a result, the frequencies of QTL genotypes will be different among marker genotypes (Fig. 6.1) and hence the distribution (e.g. the mean and variance) of the quantitative trait will vary over the marker genotypes. Marker-based analysis of linkage proceeds by testing for phenotypic differences among marker genotypes (Soller and Beckmann, 1990).

[Figure 6.1: cross P1 (M1Q1/M1Q1) x P2 (M2Q2/M2Q2), giving the F1 and then the DH lines M1Q1/M1Q1, M1Q2/M1Q2, M2Q1/M2Q1 and M2Q2/M2Q2 with frequencies (1 - r)/2, r/2, r/2 and (1 - r)/2, grouped into the marker classes M1M1 and M2M2.]

Fig. 6.1. The frequency distribution of QTL genotypes, Q1Q1 and Q2Q2, within two marker genotypes, M1M1 and M2M2, in a double haploid (DH) population. r is the recombination frequency between the marker and QTL loci. When r = 0.5 (there is no linkage between them), the frequencies of Q1Q1 and Q2Q2 are the same between the two marker genotypes, which means that there is no phenotypic difference between the two marker genotypes.

Before the discovery of molecular markers, marker-based analysis utilized the data from single markers (e.g. Sax, 1923). Here, only one or a few markers could be analysed in an experiment because the number of markers available at that time was limited. Further, most were morphological or biochemical markers, making it impossible to construct a complete linkage map by using single or even multiple populations. With the development of high-density molecular maps, it became apparent that simple (one-locus) marker-based
analysis alone could not fully utilize the genetic information harboured in complete linkage maps for QTL mapping. To fully exploit the potential of complete linkage maps to locate QTL more efficiently and accurately, many QTL mapping approaches have been developed using multiple markers simultaneously.

2. Trait-based analysis: an alternative approach to marker-based analysis is to examine marker allele frequencies in the lines originating from a segregating population but selected for specific phenotypes (Stuber et al., 1980, 1982). In such populations, selection, especially for the phenotypic extremes, would be expected to change the allelic frequencies of segregating plus or minus alleles at QTL thus affecting the trait in question. Although variation in quantitative traits is continuous in a population, two extremes of phenotypes could be distinguishable if the intermediate phenotypes are excluded. The frequency of plus alleles increases in the high extreme and the frequency of minus alleles increases in the low extreme (ref. Fig. 7.6B). Hitchhiking effects between such QTL alleles and nearby marker alleles would be expected to generate corresponding changes in the allelic frequencies of the coupled marker alleles. Consequently, marker loci at which allele frequencies differed significantly in the high and low extremes would be considered to be in linkage with QTL having an effect on the trait under selection. In this way, the number of segregating QTL affecting the trait under selection and their general map locations could be determined. The deviation of allele frequency from the Mendelian ratio in each of the two extremes can be tested by the χ² statistic. This method is called trait-based analysis (Keightley and Bulfield, 1993). As less statistical but more practical issues are associated with trait-based analysis, utilizing this method is discussed in Chapter 7 along with selective genotyping and pooled DNA analysis.

This chapter discusses theoretical or statistical aspects of QTL mapping. To understand better the story behind QTL mapping, the reader should have some basic knowledge of statistics, experimental design, linear modelling and the theory of probability. Although the readers who have little background in these fields are recommended to focus on Chapter 7 for more practical concepts of molecular dissection of complex traits, they will still benefit by scanning through each section of this chapter. On the other hand, however, it is a challenge to have all the statistical issues in this field fully described in a single chapter. The following references are highly recommended for a full coverage of QTL mapping statistics: Xu and Zhu (1994), Lynch and Walsh (1997), Liu (1998), Sorensen and Gianola (2002) and Wu et al. (2007). Furthermore, there are several websites which provide free access to statistical genomics and QTL mapping courses (e.g. http://www.stat.wisc.edu/yandell/statgen/course/).

6.1 Single Marker-based Approaches

Even though trait variation is often known to be genetic, the number and location of the genes controlling this variation are generally unknown. On the other hand, marker genotypes can be scored precisely. If there is an association between marker type and trait value, it is likely that a trait locus is close to that marker locus. Therefore, the simplest analyses consider each marker locus in turn.

6.1.1 Assumptions

Assume two inbred parental lines, P1 and P2, and their F1, with a marker locus M with two alleles, M1 and M2, linked with a QTL (Q) with two alleles, Q1 and Q2. If the recombination fraction between M and Q is r and the trait values are normally distributed, we have

$$P_1\,(Q_1Q_1M_1M_1):\quad Y \sim N(\mu_{Q_1Q_1}, \sigma^2)$$

$$P_2\,(Q_2Q_2M_2M_2):\quad Y \sim N(\mu_{Q_2Q_2}, \sigma^2)$$

$$F_1\,(Q_1Q_2M_1M_2):\quad Y \sim N(\mu_{Q_1Q_2}, \sigma^2)$$

For the populations derived from this cross, there are three marker types in F2 (M1M1,
M1M2 and M2M2), two marker types in the double haploid (DH) (M1M1 and M2M2) or in the backcross (BC) (M1M1 and M1M2 for BC derived from backcrossing F1 to P1 and M1M2 and M2M2 for BC derived from backcrossing F1 to P2). Similarly, QTL genotypes for each population can be obtained. Tables 6.1, 6.2 and 6.3 provide the genotypic frequencies, means and variances for each marker type and QTL genotypes in populations F2, DH and BC.

In most cases, we assume that trait variances are homogeneous over the different QTL genotypes, so that $\sigma^2_{Q_1Q_1} = \sigma^2_{Q_1Q_2} = \sigma^2_{Q_2Q_2} = \sigma^2$ for F2, and we also make this same assumption for other populations such as BC, DH and recombinant inbred lines (RILs).

For the convenience of discussion, frequencies for genotypes in F2 and DH (BC) populations are expressed by a matrix:

$$p_{ij} = \begin{pmatrix} (1-r)^2 & 2r(1-r) & r^2 \\ r(1-r) & 1-2r+2r^2 & r(1-r) \\ r^2 & 2r(1-r) & (1-r)^2 \end{pmatrix} \qquad (i, j = 1, 2, 3 \text{ for } F_2)$$

$$p_{ij} = \begin{pmatrix} 1-r & r \\ r & 1-r \end{pmatrix} \qquad (i, j = 1, 2 \text{ for DH or BC})$$
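The frequency matrices can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names are hypothetical, and it assumes that row i is a marker genotype and column j a QTL genotype, so that each row is the distribution of QTL genotypes within one marker class (the rows of both matrices sum to one, which supports this reading).

```python
def qtl_given_marker_f2(r):
    """F2 matrix p_ij: rows M1M1, M1M2, M2M2; columns Q1Q1, Q1Q2, Q2Q2."""
    s = 1.0 - r
    return [
        [s * s, 2 * r * s, r * r],
        [r * s, 1 - 2 * r + 2 * r * r, r * s],
        [r * r, 2 * r * s, s * s],
    ]


def qtl_given_marker_dh_bc(r):
    """DH (or BC) matrix p_ij: two marker classes by two QTL genotypes."""
    return [[1 - r, r], [r, 1 - r]]


def marker_class_means(p, qtl_means):
    """Expected trait mean of each marker class: sum over j of p_ij * mu_j."""
    return [sum(pij * mu for pij, mu in zip(row, qtl_means)) for row in p]


if __name__ == "__main__":
    # Illustrative (made-up) QTL genotype means for Q1Q1, Q1Q2, Q2Q2.
    for r in (0.0, 0.1, 0.5):
        print(r, marker_class_means(qtl_given_marker_f2(r), [10.0, 8.0, 6.0]))
```

With r = 0 each marker class contains a single QTL genotype, while with r = 0.5 all marker classes share the same mixture and therefore the same expected mean.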
Table 6.1. Genotypic frequencies, means and variances in marker and QTL genotypes in the F2 population.
Genotypic frequency
Marker genotype Q1Q1 Q1Q2 Q2Q2 Sample mean Sample variance Sample size
Table 6.2. Genotypic frequencies, means, and variances in marker and QTL genotypes in the DH
population.
Genotypic frequency
Marker genotype Q1Q1 Q2Q2 Sample mean Sample variance Sample size
Table 6.3. Genotypic frequencies, means and variances in marker and QTL genotypes in the BC
population derived by backcrossing to P1.
Genotypic frequency
Marker genotype Q1Q1 Q1Q2 Sample mean Sample variance Sample size
6.1.2 Comparison of marker means

Backcross design

Now we take the BC population as an example to show how to detect marker-trait association through comparison of means for the different marker classes.

The genotypic array for the BC population derived from backcrossing F1 to P1 is

$$\frac{1-r}{2}\,Q_1Q_1M_1M_1 + \frac{r}{2}\,Q_1Q_2M_1M_1 + \frac{r}{2}\,Q_1Q_1M_1M_2 + \frac{1-r}{2}\,Q_1Q_2M_1M_2$$

For a BC population derived from backcrossing F1 to P2, the two marker genotypes are M1M2 and M2M2 and the two QTL genotypes are Q1Q2 and Q2Q2, with other items unchanged.

Only the marker genotypes can be directly observed and BC individuals have two classes of segregates: marker types M1M1 and M1M2. The trait distributions in these two classes are

$$M_1M_1:\quad Y \sim (1-r)\,N(\mu_{Q_1Q_1}, \sigma^2) + r\,N(\mu_{Q_1Q_2}, \sigma^2)$$

$$M_1M_2:\quad Y \sim r\,N(\mu_{Q_1Q_1}, \sigma^2) + (1-r)\,N(\mu_{Q_1Q_2}, \sigma^2)$$

The means and variance of the two mixture distributions are

$$\mu_{M_1M_1} = (1-r)\mu_{Q_1Q_1} + r\mu_{Q_1Q_2}$$

$$\mu_{M_1M_2} = r\mu_{Q_1Q_1} + (1-r)\mu_{Q_1Q_2}$$

$$\sigma^2_{M_1M_1} = \sigma^2_{M_1M_2} = \sigma^2 + r(1-r)(\mu_{Q_1Q_1} - \mu_{Q_1Q_2})^2$$

The expected difference in average trait values is

$$\mu_{M_1M_1} - \mu_{M_1M_2} = (1-2r)(\mu_{Q_1Q_1} - \mu_{Q_1Q_2})$$

Therefore, only when r = 0.5, i.e. there is no linkage between M and Q,

$$\mu_{M_1M_1} - \mu_{M_1M_2} = 0$$

The smaller the value of r (the closer the linkage), the bigger the difference between $\mu_{M_1M_1}$ and $\mu_{M_1M_2}$. The difference reaches the maximum, $\mu_{M_1M_1} - \mu_{M_1M_2} = \mu_{Q_1Q_1} - \mu_{Q_1Q_2}$, when r = 0, i.e. M and Q are completely linked. In this case, all differences between marker genotypes can be attributed to the effect of the putative QTL.

The difference between marker genotypes can be tested by the t-test statistic

$$t = \frac{\hat{\mu}_{M_1M_1} - \hat{\mu}_{M_1M_2}}{\sqrt{s^2\left(\dfrac{1}{n_{M_1M_1}} + \dfrac{1}{n_{M_1M_2}}\right)}}$$

The quantity $s^2$ is a pooled estimate of the variance within each marker class of BC individuals. The higher the t value, the more significant the difference and the closer the linkage is between M and Q.

The above discussion can be extended to DH populations where $\mu_{M_1M_2}$ is replaced by $\mu_{M_2M_2}$ and $\mu_{Q_1Q_2}$ by $\mu_{Q_2Q_2}$.

F2 design

For an F2 population, there are ten trait-marker genotypes with three distinguishable marker classes. The trait distributions are

$$M_1M_1:\quad (1-r)^2 N(\mu_{Q_1Q_1}, \sigma^2) + 2r(1-r)\,N(\mu_{Q_1Q_2}, \sigma^2) + r^2 N(\mu_{Q_2Q_2}, \sigma^2)$$

$$M_1M_2:\quad r(1-r)\,N(\mu_{Q_1Q_1}, \sigma^2) + [r^2 + (1-r)^2]\,N(\mu_{Q_1Q_2}, \sigma^2) + r(1-r)\,N(\mu_{Q_2Q_2}, \sigma^2)$$

$$M_2M_2:\quad r^2 N(\mu_{Q_1Q_1}, \sigma^2) + 2r(1-r)\,N(\mu_{Q_1Q_2}, \sigma^2) + (1-r)^2 N(\mu_{Q_2Q_2}, \sigma^2)$$

The trait means of these three marker classes are

$$\mu_{M_1M_1} = (1-r)^2\mu_{Q_1Q_1} + 2r(1-r)\mu_{Q_1Q_2} + r^2\mu_{Q_2Q_2}$$

$$\mu_{M_1M_2} = r(1-r)\mu_{Q_1Q_1} + [r^2 + (1-r)^2]\mu_{Q_1Q_2} + r(1-r)\mu_{Q_2Q_2}$$
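$$\sigma^2_{M_1M_1} = \sigma^2 + 2r(1-r)\left[(\mu_{Q_1Q_1} - \mu_{Q_1Q_2}) - r(\mu_{Q_1Q_1} + \mu_{Q_2Q_2} - 2\mu_{Q_1Q_2})\right]^2 + r^2(1-r)^2(\mu_{Q_1Q_1} + \mu_{Q_2Q_2} - 2\mu_{Q_1Q_2})^2$$

$$\sigma^2_{M_1M_2} = \sigma^2 + r(1-r)\left[(\mu_{Q_1Q_1} - \mu_{Q_1Q_2})^2 + (\mu_{Q_1Q_2} - \mu_{Q_2Q_2})^2\right] - r^2(1-r)^2(\mu_{Q_1Q_1} + \mu_{Q_2Q_2} - 2\mu_{Q_1Q_2})^2$$

The variance pooled over the two homozygous marker classes is

$$S^2 = \frac{(n_{M_1M_1} - 1)s^2_{M_1M_1} + (n_{M_2M_2} - 1)s^2_{M_2M_2}}{n_{M_1M_1} + n_{M_2M_2} - 2}$$

To test marker dominance effect, the test statistic uses the variance pooled over all three marker classes,

$$S^2 = \frac{(n_{M_1M_1} - 1)s^2_{M_1M_1} + (n_{M_1M_2} - 1)s^2_{M_1M_2} + (n_{M_2M_2} - 1)s^2_{M_2M_2}}{n_{M_1M_1} + n_{M_1M_2} + n_{M_2M_2} - 3}$$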
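Returning to the backcross design of Section 6.1.2, the expected difference between the marker-class means, (1 - 2r)(μQ1Q1 - μQ1Q2), and the pooled t statistic can be explored by simulation. The sketch below is illustrative only: the function names, the simulated parameter values and the fixed random seed are assumptions, not taken from the text.

```python
import math
import random


def expected_difference(r, mu_q1q1, mu_q1q2):
    """Expected difference between BC marker-class means, (1 - 2r) * (mu_Q1Q1 - mu_Q1Q2)."""
    return (1 - 2 * r) * (mu_q1q1 - mu_q1q2)


def simulate_bc(n, r, mu_q1q1, mu_q1q2, sigma, rng):
    """Simulate n backcross (F1 x P1) individuals; return trait values by marker class."""
    m1m1, m1m2 = [], []
    for _ in range(n):
        qtl_is_q1q1 = rng.random() < 0.5      # QTL genotypes segregate 1:1
        recombinant = rng.random() < r        # crossover between marker and QTL
        # Non-recombinants carry M1M1 with Q1Q1 and M1M2 with Q1Q2;
        # recombinants carry the opposite marker genotype.
        marker_is_m1m1 = qtl_is_q1q1 != recombinant
        y = rng.gauss(mu_q1q1 if qtl_is_q1q1 else mu_q1q2, sigma)
        (m1m1 if marker_is_m1m1 else m1m2).append(y)
    return m1m1, m1m2


def pooled_t(a, b):
    """Two-sample t statistic using the pooled within-class variance s^2."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((y - mean_a) ** 2 for y in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    s2 = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(s2 * (1 / na + 1 / nb))


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, for reproducibility only
    for r in (0.0, 0.1, 0.3, 0.5):
        m1m1, m1m2 = simulate_bc(2000, r, 10.0, 6.0, 1.0, rng)
        print(r, expected_difference(r, 10.0, 6.0), round(pooled_t(m1m1, m1m2), 2))
```

In such a simulation the t value shrinks towards zero as r approaches 0.5, which is the behaviour the text describes: the statistic reflects both the size of the QTL effect and the tightness of linkage, and cannot separate the two.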
as shown in Table 6.4. A significant treatment effect implies linkage to a segregating QTL.

Table 6.4. One-way ANOVA for quantitative trait values among marker genotypes. df, degree of freedom; SS, sum of squares; MS, mean square; EMS, expected mean square.

Source              df      SS      MS      EMS
Between genotype    df_g    SS_g    MS_g    σ_e² + n_0 σ_t²
Error               df_e    SS_e    MS_e    σ_e²
Total               df_T

In the formulas below, the subscript t denotes the between-genotype (treatment) term, k is the number of marker genotypes and n_i is the number of individuals with the ith marker genotype:

$$df_T = \sum_i n_i - 1, \qquad df_t = k - 1, \qquad df_e = df_T - df_t$$

$$SS_T = \sum_i \sum_j y_{ij}^2 - \frac{\left(\sum_i \sum_j y_{ij}\right)^2}{\sum_i n_i}$$

$$SS_t = \sum_i \frac{\left(\sum_j y_{ij}\right)^2}{n_i} - \frac{\left(\sum_i \sum_j y_{ij}\right)^2}{\sum_i n_i}$$

$$n_0 = \frac{\left(\sum_i n_i\right)^2 - \sum_i n_i^2}{\left(\sum_i n_i\right)(k - 1)}$$

$$Y_j = b_0 + b_{YX}X_j + e_j$$

where the indicator variable $X_j$ takes the values 1 or 0 according to whether the individual has marker genotype M1M1 or M1M2. The means, variances, covariance and correlation of the variables Y and X are:

$$\mu_X = \frac{1}{2}, \qquad \mu_Y = \frac{1}{2}(\mu_{Q_1Q_1} + \mu_{Q_1Q_2})$$

$$\sigma_X^2 = \frac{1}{4}, \qquad \sigma_Y^2 = \sigma^2 + \frac{1}{4}\delta^2$$

$$\sigma_{XY} = \frac{1}{4}(1-2r)\delta$$

$$\rho_{XY} = \frac{1-2r}{\sqrt{1 + 4\sigma^2/\delta^2}}$$

(these expressions follow from the backcross class means when $\delta = \mu_{Q_1Q_1} - \mu_{Q_1Q_2}$). So, the regression coefficient for Y on X is

$$b_{YX} = (1-2r)\delta$$

From this statistic, we know that the environmental error will affect $s_{b_{YX}}$ but not $b_{YX}$, so that reducing the environmental error by controlling environmental factors will improve the QTL mapping effect. The major drawback of such linear model-based (e.g. ANOVA, regression) single marker approaches is that they do not indicate on which side of the marker the QTL is located nor how far it is from the marker.

6.1.5 Likelihood approach

For a normal variable, $Y \sim N(\mu, \sigma^2)$, the likelihood for the parameters (see Chapter 2
for a basic description of the likelihood method) is

$$L(\mu, \sigma^2) = \frac{e^{-(Y-\mu)^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}$$

If $Y_{1i}$ and $Y_{2i}$ are the trait values for the ith individuals in BC marker classes M1M1 and M1M2, then the likelihood from all $n_{M_1M_1}$ and $n_{M_1M_2}$ backcross individuals is

$$L = \prod_{i=1}^{n_{M_1M_1}} \left[\frac{1-r}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_{1i} - \mu_{Q_1Q_1})^2}{2\sigma^2}\right) + \frac{r}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_{1i} - \mu_{Q_1Q_2})^2}{2\sigma^2}\right)\right]$$
$$\quad \times \prod_{i=1}^{n_{M_1M_2}} \left[\frac{r}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_{2i} - \mu_{Q_1Q_1})^2}{2\sigma^2}\right) + \frac{1-r}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_{2i} - \mu_{Q_1Q_2})^2}{2\sigma^2}\right)\right]$$

The hypothesis of no linkage can be tested by the likelihood ratio statistic

DH) have genotypes M1M1Q1Q1N1N1 and M2M2Q2Q2N2N2, respectively, at two marker loci M (with alleles M1 and M2) and N (with alleles N1 and N2) and a QTL, each with two alleles. The QTL is located between the two marker loci. Its genetic distances to M and N are $\theta_{MQ}$ and $\theta_{QN}$, respectively. The genetic distance (cM) between markers M and N is θ. With θ expressed in Morgans (cM/100), θ can be converted into recombinant frequency, r, by

$$r = \frac{1}{2}\tanh(2\theta) = \frac{1}{2} \cdot \frac{e^{2\theta} - e^{-2\theta}}{e^{2\theta} + e^{-2\theta}}$$
and the genetic effects of three QTL genotypes, Q1Q1, Q1Q2 and Q2Q2, follow normal distributions $N(\mu_{Q_1Q_1}, \sigma^2)$, $N(\mu_{Q_1Q_2}, \sigma^2)$ and $N(\mu_{Q_2Q_2}, \sigma^2)$. Therefore, the effect of QTL on the quantitative trait can be described by the mixture of these three normal distributions, with proportions $P_{m1}$, $P_{m2}$ and $P_{m3}$, respectively, for the marker locus M. $P_{m1}$ is zero for the BC population derived from backcrossing F1 to P2, $P_{m2}$ is zero for DH populations and $P_{m3}$ is zero for the BC population derived from backcrossing F1 to P1. The probability density function for the phenotypic value of an individual or line is

$$f(y_i \mid m_i; r_M) = \sum_{q=1}^{3} P_{mq} f_q(y)$$

where m is the marker genotype; $r_M$ is the recombinant frequency between the marker M and QTL and $P_{mq}$ is the probability of QTL genotype q ($q \in \{1, 2, 3\}$ for an F2 population), which depends on the marker genotype m and $r_M$, and

$$f_q(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu_q)^2}{2\sigma^2}\right)$$

which is the probability density function for a normal distribution with mean $\mu_q$ and variance $\sigma^2$, with $\mu_1 = \mu_{Q_1Q_1}$, $\mu_2 = \mu_{Q_1Q_2}$ and $\mu_3 = \mu_{Q_2Q_2}$ for an F2 population.

6.2.2 Likelihood approach

For two specific markers M and N, the F2 and BC (DH) populations have nine and four marker genotypes, respectively, and any individual in the population must have one of these genotypes. For the individuals or lines with a specific marker genotype, the sum of the probabilities for three QTL genotypes, Q1Q1, Q1Q2 and Q2Q2, is $P_{m1} + P_{m2} + P_{m3} = 1$. From the genotypic frequencies provided in Tables 6.5 and 6.6, probabilities for three QTL genotypes can be obtained. The combined probability density function or likelihood function for all individuals/lines in a population can be expressed as:

$$L = L(\mu_q, \mu_j, \sigma^2, y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i \mid m_i; r_M) = \prod_{i=1}^{n} \sum_{q=1}^{3} P_{mq} f_q(y_i)$$
$$\quad = \prod_{i=1}^{n} \sum_{q=1}^{3} P_{mq} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mu_q)^2}{2\sigma^2}\right)$$
Table 6.5. The expected genotypic frequencies in the F2 population (each frequency 4).
Table 6.6. The expected genotypic frequencies in the DH (BC) population (each frequency 2).
where $m_i$ is the marker genotype for the ith individual/line, with a total of n individuals/lines in the population.

Maximum likelihood estimates (MLEs) of parameters $\mu_j$ and $\sigma^2$ are those maximizing the above likelihood function. In order to maximize the function, we take the logarithm of the likelihood,

$$\ln L = \sum_{i=1}^{n} \ln f(y_i \mid m_i; r_M) = n \ln \frac{1}{\sqrt{2\pi\sigma^2}} + \sum_{i=1}^{n} \ln\left[\sum_{q=1}^{3} P_{mq} \exp\left(-\frac{(y_i - \mu_q)^2}{2\sigma^2}\right)\right] \qquad (6.3)$$

If we define

$$W_q(y_i \mid m_i; r_M) = \frac{P_{mq} f_q(y_i)}{f(y_i \mid m_i; r_M)}$$

then the probability that the QTL genotype is q, when an individual or line has phenotype y and marker genotype m, is determined by $W_q$.

Set the derivative of Eqn 6.3 as zero and solve the equation to obtain

$$\hat{\mu}_q = \frac{\sum_{i=1}^{n} [W_q(y_i \mid m_i; r_M)\, y_i]}{\sum_{i=1}^{n} W_q(y_i \mid m_i; r_M)} \qquad (6.4a)$$

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{q=1}^{3} [W_q(y_i \mid m_i; r_M)(y_i - \hat{\mu}_q)^2] \qquad (6.4b)$$

When QTL is located between the two marker loci, Eqn 6.4 has no explicit solution. However, it can be solved using the EM iteration method (Dempster et al., 1977). The E (expectation) step in the EM method is to obtain the expectations for unknown missing data by using known data (y and m) and initial approximations of the parameters, for example $\lambda = (\mu_1, \mu_2, \mu_3, \sigma^2)$ for an F2 population (using the average phenotypes of the quantitative trait, $\bar{x}_1$, $\bar{x}_2$ and $\bar{x}_3$, for individual/line groups of marker genotypes and the sample variance of the population, $s^2$, as initial values). The M (maximization) step is to maximize the likelihood function (Eqn 6.3) to obtain a new cycle of λ values, by using the initial values of λ and the expectation obtained for missing data. E and M steps are processed alternately by using the new λ to replace the old λ, until the likelihood function (Eqn 6.3) does not increase (the difference between the two iterations is less than a predetermined critical value).

Under the null hypothesis H0: $\mu_i = \mu_L$ ($i \neq L$) (there is no linked QTL), the likelihood function becomes

$$L_0 = L(\mu_P, \sigma_P^2, y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i) \qquad (6.5)$$

where

$\hat{\mu}_P = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the mean of the mapping population;

$\hat{\sigma}_P^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{\mu}_P)^2$ is the variance of the mapping population; and

$f(y_i) = \frac{1}{\sqrt{2\pi\sigma_P^2}} \exp\left(-\frac{(y_i - \mu_P)^2}{2\sigma_P^2}\right)$ is the normal density function with mean $\mu_P$ and variance $\sigma_P^2$.

The statistic for the likelihood ratio test of the alternative hypothesis (at least one QTL exists at this location) can be converted into a logarithm of odds (LOD) score,

$$\mathrm{LOD} = \log_{10} \frac{L(\mu_j, \sigma^2, y_1, y_2, \ldots, y_n)}{L_0(\mu_P, \sigma_P^2, y_1, y_2, \ldots, y_n)}$$

For an interval bracketed by the two markers M and N, a LOD score is calculated for every scanning position. LOD scores obtained for all marker intervals located on the same chromosome form a likelihood profile to show possible position(s) of QTL associated with the quantitative trait. This method uses two flanking markers at a time to test whether there is any QTL located in the interval bracketed by the two markers. For a specific interval, the test is carried out at any point by moving a step from one marker to the other. After completion of the test for the interval, the test moves to the next two flanking markers. The LOD score
does not provide a test for the presence of a QTL between the two markers and so is not a formal test of a QTL within the interval. Instead the LOD compares the likelihood of the QTL being at the position characterized by recombination fractions $r_{MQ}$, $r_{QN}$ against the likelihood that it is at some position unlinked to the interval.

The amount of support for a QTL at a particular map position is often displayed graphically through use of likelihood (or profile) maps (Fig. 6.2), which plot the likelihood-ratio statistic (or a closely related quantity) as a function of the map position of the putative QTL. Lander and Botstein (1989) plotted the LOD scores defined by Morton (1955).

Empirically, a QTL is claimed when the LOD is larger than a critical value that is either predetermined (for example, 2 or 3) or generated by permutation. The location of the QTL should be the chromosome region that corresponds to the highest likelihood map if LOD scores in several flanking marker intervals are larger than the critical values. The two-LOD support interval determined by the range of the highest LOD minus two LOD provides an empirical confidence interval for the range of QTL location (Fig. 6.2). Simulation studies show that a two-LOD interval is close to the 95% interval.

6.3 Composite Interval Mapping

Most of the single-QTL methods can be extended to multiple QTL by conditioning on additional marker loci and using conditional probabilities for multi-locus genotypes. This approach has been used to develop explicit models for two or three linked QTL (e.g. Knapp, 1991; Haley and Knott, 1992; Martinez and Curnow, 1992; Jansen, 1996; Satagopan et al., 1996). Kearsey and Hyne (1994), Hyne and Kearsey (1995) and Wu and Li (1994, 1996) also proposed a very simple

[Figure 6.2: LOD score (y-axis, 0 to 8) plotted against linkage map position in cM (x-axis, 0 to 60), marking the estimated map location, the 2-LOD decrease and the 2-LOD support interval.]

Fig. 6.2. Hypothetical likelihood map for the marker-QTL association on a linkage map in interval mapping analysis. A QTL is indicated if any part of the likelihood map exceeds a critical value. In such cases, the estimated QTL location is the value of centimorgans giving the highest likelihood. Approximate confidence intervals for QTL position (two-LOD support intervals) are often constructed by including the set of all centimorgan values giving likelihoods within two-LOD scores of the maximum value.
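The EM iteration (Eqns 6.4a and 6.4b), the LOD score against the no-QTL likelihood (Eqn 6.5) and the two-LOD support interval can be illustrated with a short Python sketch. This is a hedged illustration rather than the implementation of any QTL mapping package: the function names are hypothetical, the prior probabilities P_mq are supplied directly per individual instead of being derived from Tables 6.5 and 6.6, every QTL genotype is assumed to have a non-zero total prior, and the data in the usage block are made up.

```python
import math


def normal_pdf(y, mu, var):
    """Normal density with mean mu and variance var."""
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_likelihood(y, prior, mu, var):
    """ln L of the mixture model: sum over i of ln(sum over q of P_mq * f_q(y_i))."""
    return sum(
        math.log(sum(p[q] * normal_pdf(v, mu[q], var) for q in range(len(mu))))
        for v, p in zip(y, prior)
    )


def em_interval(y, prior, n_iter=500, tol=1e-8):
    """EM estimates of the QTL genotype means and common variance at one test position.

    prior[i][q] is P_mq for individual i, fixed by its marker genotype and the position.
    """
    n, k = len(y), len(prior[0])
    grand_mean = sum(y) / n
    var = sum((v - grand_mean) ** 2 for v in y) / n
    # Starting values: prior-weighted phenotype averages.
    mu = [sum(p[q] * v for p, v in zip(prior, y)) / sum(p[q] for p in prior)
          for q in range(k)]
    old = log_likelihood(y, prior, mu, var)
    for _ in range(n_iter):
        # E step: posterior weights W_q for each individual.
        w = []
        for v, p in zip(y, prior):
            dens = [p[q] * normal_pdf(v, mu[q], var) for q in range(k)]
            total = sum(dens)
            w.append([d / total for d in dens])
        # M step: Eqn 6.4a for the means, Eqn 6.4b for the variance.
        mu = [sum(wi[q] * v for wi, v in zip(w, y)) / sum(wi[q] for wi in w)
              for q in range(k)]
        var = sum(wi[q] * (v - mu[q]) ** 2 for wi, v in zip(w, y) for q in range(k)) / n
        new = log_likelihood(y, prior, mu, var)
        if new - old < tol:
            break
        old = new
    return mu, var, log_likelihood(y, prior, mu, var)


def lod_score(y, prior):
    """LOD = log10(L / L0), with L0 the single-normal (no QTL) likelihood of Eqn 6.5."""
    n = len(y)
    mu_p = sum(y) / n
    var_p = sum((v - mu_p) ** 2 for v in y) / n
    ln_l0 = sum(math.log(normal_pdf(v, mu_p, var_p)) for v in y)
    _, _, ln_l1 = em_interval(y, prior)
    return (ln_l1 - ln_l0) / math.log(10)


def two_lod_support_interval(positions, lods):
    """Range of scanned positions whose LOD is within two units of the maximum."""
    peak = max(lods)
    inside = [x for x, lod in zip(positions, lods) if lod >= peak - 2]
    return min(inside), max(inside)


if __name__ == "__main__":
    y = [10.1, 9.9, 10.0, 6.1, 5.9, 6.0]                 # made-up phenotypes
    prior = [[0.95, 0.05]] * 3 + [[0.05, 0.95]] * 3      # made-up P_mq values
    print(em_interval(y, prior))
    print(lod_score(y, prior))
    print(two_lod_support_interval([0, 10, 20, 30, 40], [1.0, 4.0, 7.0, 5.5, 2.0]))
```

In a genome scan, `lod_score` would be evaluated at each test position with the corresponding P_mq values, and the resulting profile passed to `two_lod_support_interval`.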
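where $X_jB = \mu + \sum_k b_k x_{jk}$. The MLEs of the various parameters can be found in a similar manner as for interval mapping. For b*, see Eqn a:

$$\frac{\partial \ln L}{\partial b^*} = \sum_{j=1}^{n} \frac{p_{1j}\,\phi([y_j - X_jB - b^*]/\sigma)}{p_{1j}\,\phi([y_j - X_jB - b^*]/\sigma) + p_{0j}\,\phi([y_j - X_jB]/\sigma)} \cdot \frac{y_j - X_jB - b^*}{\sigma^2} \qquad (a)$$

$$P_j = \frac{p_{1j}\,\phi([y_j - X_jB - b^*]/\sigma)}{p_{1j}\,\phi([y_j - X_jB - b^*]/\sigma) + p_{0j}\,\phi([y_j - X_jB]/\sigma)} \qquad (b)$$

Setting this derivative to zero gives

$$\hat{b}^* = \sum_{j=1}^{n} (y_j - X_j\hat{B})\hat{P}_j \Big/ \sum_{j=1}^{n} \hat{P}_j = (Y - X\hat{B})'\hat{P}/\hat{c}$$

where $c = \sum_{j=1}^{n} P_j$, $Y = \{y_j\}_{n \times 1}$, $P = \{P_j\}_{n \times 1}$, and a prime denotes transposition.

Differentiating the log-likelihood with respect to B

$$\frac{\partial \ln L}{\partial B} = \sum_{j=1}^{n} [P_j X_j'(y_j - X_jB - b^*) + (1 - P_j)X_j'(y_j - X_jB)]/\sigma^2$$

Expressed in matrix notation, the equation $\partial \ln L/\partial B = 0$ becomes

$$X'(Y - XB) = X'Pb^*$$

$$\hat{B} = (X'X)^{-1}X'(Y - \hat{P}\hat{b}^*)$$

Differentiating the log-likelihood with respect to $\sigma^2$:

$$\frac{\partial \ln L}{\partial \sigma^2} = \sum_{j=1}^{n} [P_j(y_j - X_jB - b^*)^2 + (1 - P_j)(y_j - X_jB)^2]/(2\sigma^4) - \frac{n}{2\sigma^2}$$

Setting this derivative to zero leads to the solution

$$n\hat{\sigma}^2 = (Y - X\hat{B})'(Y - X\hat{B}) - \hat{b}^{*2}\hat{c}$$

Under the null hypothesis b* = 0, the likelihood is maximized with the MLEs

$$\tilde{B} = (X'X)^{-1}X'Y$$

$$\tilde{\sigma}^2 = (Y - X\tilde{B})'(Y - X\tilde{B})/n$$

The likelihood ratio (LR) test statistic is

$$LR = -2\ln\frac{L(b^* = 0, \tilde{B}, \tilde{\sigma}^2)}{L(\hat{b}^*, \hat{B}, \hat{\sigma}^2)} \quad \text{or} \quad \mathrm{LOD} = -\log_{10}\frac{L(b^* = 0, \tilde{B}, \tilde{\sigma}^2)}{L(\hat{b}^*, \hat{B}, \hat{\sigma}^2)}$$

Like Lander and Botstein's interval mapping, this test can be performed at any position in a genome. Thus it gives a systematic strategy to search for QTL in a genome. As the test statistic is almost independent for each interval, a test on each interval is more likely to test for a single QTL only.

6.3.5 Selection of markers as cofactors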
depends on the number and positions of underlying QTL, information that is not available a priori. Suppose the interval of interest is delimited by markers i and i + 1. Additional markers i - 1 and i + 2 as cofactors account for all linked QTL to the left of marker i - 1 and to the right of marker i + 2. Thus, while these cofactors do not account for the effect of linked QTL in the intervals immediately adjacent to the one of interest, they do account for all the other linked QTL.

The number of cofactors should not exceed $2\sqrt{n}$, where n is the number of individuals in the analysis (Jansen and Stam, 1994), or alternatively it can be determined automatically by the F-to-enter or F-to-drop criterion in forward or backward stepwise regression analysis. A first approach would be to include all unlinked markers showing significant marker-trait association (detected, for example, by standard single-marker regression). If several linked markers from a single chromosome all show significant effects, one might just use the marker having the largest effect. A related strategy is to first perform a multiple regression using all markers unlinked to the region of interest and then eliminate those that are not significant.

In the computer program designed for CIM, QTL CARTOGRAPHER, a two-step procedure for practical data analysis was implemented. In the first step, $n_p$ markers that are significantly associated with the trait are selected by (forward or backward) stepwise regression. In the second step (mapping step), for each testing interval, apart from the markers for the putative QTL, two markers that are at least $W_s$ cM away from the test interval (one for each direction) are first picked up to fit in the model to define a testing window for blocking other possible linked QTL effects on the test. Then, those selected $n_p$ markers that are outside of the testing window are also fitted into the model to reduce the residual variance.

The accuracy of locating QTL provided by CIM is at the cost of reduced statistical power because markers selected as cofactors around the test interval will pick up some effect of the QTL that is located in the test interval. Therefore, markers that are close to the interval under test are not suitable as cofactors. To solve this problem, only markers that are some distance away from the test interval can be selected. Because the size of this test window depends on test intervals, different sizes should be tested to find a suitable window size for each test interval.

6.3.6 Inclusive composite interval mapping

In Zeng's (1993, 1994) algorithm, the QTL effect at the current testing position and regression coefficients of the marker variables used to control genetic background are estimated simultaneously in an expectation and conditional maximization (ECM) algorithm. Thus, the same marker variable may have different coefficient estimates as the testing position changes along the chromosomes. The algorithm used in CIM cannot completely ensure that the effect of QTL at the current testing interval is not absorbed by the background marker variables, which may result in biased estimation of the QTL effect.

A modified algorithm called inclusive composite interval mapping (ICIM) was proposed by Li et al. (2007). In ICIM, marker selection is conducted only once through stepwise regression by considering all marker information simultaneously and the phenotypic values are then adjusted by all markers retained in the regression equation except the two markers flanking the current mapping interval. The adjusted phenotypic values are finally used in interval mapping. The modified algorithm has a simpler form than that used in CIM and faster convergence. ICIM retains all the advantages of CIM over interval mapping and avoids the possible increase of sampling variance and the complicated background marker selection process. Extensive simulations using two genomes and various genetic models indicated that ICIM has increased detection power, reduced false detection rate and less biased estimates of QTL effects. ICIM has been extended to map digenic interacting
Molecular Dissection of Traits: Theory 209
search for number, positions, effects and interaction of significant QTL. Using markers for simultaneous multiple QTL analysis was first suggested by Lander and Botstein (1989), although the idea was pursued only with a very limited scope. Bayesian statistics via MCMC for mapping QTL is also based on multiple QTL, particularly when it is combined with a reversible-jump process, which will be discussed in Section 6.7.

MIM consists of four components:

• an evaluation procedure to analyse the likelihood of the data given a genetic model (number, position and epistasis of QTL);
• a search strategy to select the best genetic model (among those sampled) in the parameter space;
• an estimation procedure to estimate all parameters of interest in the genetic architecture of quantitative traits (number, positions, effects and epistasis of QTL; genetic variances and covariances explained by QTL effects) given the selected genetic model; and

genotype of putative QTL r (defined by 1/2 or −1/2 for the two genotypes), which is unobserved but can be inferred from marker data in the sense of probability;
• $\beta_{rs}$ is the epistatic effect between putative QTL r and s;
• $r \neq s \in (1, \ldots, m)$ denotes a subset of QTL pairs that each show a significant epistatic effect, because if all pairs of m QTL are fitted in the model, the model can be over-parameterized;
• m is the number of putative QTL chosen by either their significant marginal effects or significant epistatic effects;
• t is the number of significant pairwise epistatic effects; and
• $e_i$ is a residual effect of the model, assumed to be normally distributed with mean zero and variance $\sigma^2$.

As the genotypes of an individual at many genomic locations are not observed (but marker genotypes are), the model contains missing data. So the likelihood function
210 Chapter 6
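of the data given the model is a mixture of normal distributions

$$L(E, \mu, \sigma^2) = \prod_{i=1}^{n}\left[\sum_{j=1}^{2^m} p_{ij}\,\phi\!\left(y_i \mid \mu + D_{ij}E,\ \sigma^2\right)\right] \tag{6.7}$$

The M-step is shown in Eqns 6.9–6.11:

$$E_r^{[t+1]} = \frac{\sum_i \sum_j \pi_{ij}^{[t+1]} D_{ijr}\left[\left(y_i - \mu^{[t]}\right) - \sum_{s=1}^{r-1} D_{ijs}E_s^{[t+1]} - \sum_{s=r+1}^{m+t} D_{ijs}E_s^{[t]}\right]}{\sum_i \sum_j \pi_{ij}^{[t+1]} D_{ijr}^2} \tag{6.9}$$

$$\mu^{[t+1]} = \frac{1}{n}\sum_i \left(y_i - \sum_j \sum_r \pi_{ij}^{[t+1]} D_{ijr} E_r^{[t+1]}\right) \tag{6.10}$$

$$\sigma^{2[t+1]} = \frac{1}{n}\left[\sum_i \left(y_i - \mu^{[t+1]}\right)^2 - 2\sum_i \left(y_i - \mu^{[t+1]}\right)\sum_j \sum_r \pi_{ij}^{[t+1]} D_{ijr} E_r^{[t+1]} + \sum_r \sum_s \sum_i \sum_j \pi_{ij}^{[t+1]} D_{ijr} D_{ijs} E_r^{[t+1]} E_s^{[t+1]}\right] \tag{6.11}$$

where $E_r$ is the rth element of E and $D_{ijr}$ is the rth element of $D_{ij}$. These equations can be expressed in a general form in matrix notation as (Kao and Zeng, 1997)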
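As a rough illustration of the updates in Eqns 6.9–6.11, the following Python sketch performs one EM iteration for the mixture likelihood of Eqn 6.7. It is a minimal sketch under stated assumptions, not the implementation used in any MIM software: the function name `em_step`, the array layout and the simplification that the design row $D_{ij}$ is the same for every individual (stored as `D[j]`) are all hypothetical. The E-step shown is the standard posterior-probability update for a normal mixture, which is assumed here to correspond to Eqn 6.8.

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y (broadcasts over arrays)."""
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def em_step(y, p, D, E, mu, sigma2):
    """One EM iteration for the normal-mixture likelihood (Eqn 6.7).

    y      : (n,)   phenotypes
    p      : (n, G) genotype probabilities given markers (p_ij), G = 2**m
    D      : (G, K) design matrix, assumed identical for all individuals
    E      : (K,)   current QTL effects (marginal and epistatic)
    mu     : float  current mean
    sigma2 : float  current residual variance
    """
    # E-step (assumed form of Eqn 6.8): posterior genotype probabilities pi_ij
    dens = p * normal_pdf(y[:, None], (mu + D @ E)[None, :], sigma2)
    pi = dens / dens.sum(axis=1, keepdims=True)

    # M-step, Eqn 6.9: update each effect in turn; effects s < r are already
    # updated, effects s > r still hold their previous values
    E = E.copy()
    resid = y - mu
    for r in range(E.size):
        others = D @ E - D[:, r] * E[r]          # sum over s != r of D_js * E_s
        num = np.sum(pi * D[:, r][None, :] * (resid[:, None] - others[None, :]))
        den = np.sum(pi * D[:, r][None, :] ** 2)
        E[r] = num / den

    # Eqn 6.10: update the mean
    fitted = pi @ (D @ E)                        # sum_j pi_ij * sum_r D_jr E_r
    mu = np.mean(y - fitted)

    # Eqn 6.11: update the residual variance
    quad = pi @ ((D @ E) ** 2)                   # sum_j pi_ij * (sum_r D_jr E_r)^2
    sigma2 = np.mean((y - mu) ** 2 - 2.0 * (y - mu) * fitted + quad)
    return pi, E, mu, sigma2
```

In use, the step would be repeated from starting values such as E = 0, mu = mean(y) and sigma2 = var(y) until the estimates stop changing.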
values for other parameters). Equation 6.9 is numerically more stable than Eqn 6.12: Eqn 6.12 can lead to divergence in certain cases, whereas Eqn 6.9 always leads to convergence, although at a slightly slower pace (Z.-B. Zeng, North Carolina State University, personal communication).

Note on the meaning and difference between $p_{ij}$ and $\pi_{ij}$: $p_{ij}$ is the probability of each multi-locus QTL genotype conditional on marker genotype, and $\pi_{ij}$ is the probability of each multi-locus QTL genotype conditional on marker genotype and also phenotypic value.

The test for each QTL effect, say $E_r$, is performed by a likelihood ratio test conditional on other selected QTL effects:

$$\mathrm{LOD} = \log_{10}\frac{L(E_1 \neq 0, \ldots, E_{m+t} \neq 0)}{L(E_1 \neq 0, \ldots, E_{r-1} \neq 0,\ E_r = 0,\ E_{r+1} \neq 0, \ldots, E_{m+t} \neq 0)}$$

For given positions of m putative QTL and m + t QTL effects, the likelihood analysis can proceed as outlined above. Now the task is to search for and select the best genetic model (number, positions and interaction of QTL) that fits the data well.

6.4.2 Model selection

Pre-model selection

As the evaluation of the MIM model is computationally intensive, it is important to select a good pre-model for MIM analysis. The following procedure can be used. First, select a subset of significant markers. Then, use the results from marker selection to perform CIM to scan the genome for candidate positions. Finally, evaluate and test each parameter in the pre-model under MIM and drop any non-significant estimate in a stepwise manner.

Model selection using multiple interval mapping

After the first evaluation of the pre-model, perform the following stepwise selection analysis to finalize the search for a genetic model under MIM.

1. Begin with a model that contains m QTL and t epistatic effects.
2. Scan the genome to search for the best position of an (m + 1)th QTL and then perform a likelihood ratio test for the marginal effect of this putative QTL. If the test statistic exceeds the critical value, this effect is retained in the model.
3. Search for the (t + 1)th epistatic effect among the pairwise interaction terms not yet included in the model and perform the likelihood ratio test on the effect. If LOD exceeds the critical value, the effect is retained in the model. Repeat the process until no more significant epistatic effects are found.
4. Re-evaluate the significance of each QTL effect currently fitted in the model. If LOD for a QTL (marginal or epistatic) effect falls below the significance threshold conditional on other fitted effects, the effect is removed from the model. However, if the marginal effect of a QTL that has a significant epistatic effect with other QTL falls below the threshold, this marginal effect is still retained. This process is performed in a stepwise manner until the test statistic for each effect is above the significance threshold.
5. Optimize estimates of QTL positions based on the currently selected model. Instead of performing a multi-dimensional search around the regions of current estimates of QTL positions (which is an option), estimates of QTL positions are updated in turn for each region. For the ith QTL in the model, the region between its two neighbour QTL is scanned to find the position that maximizes the likelihood (conditional on the current estimates of positions of other QTL and QTL epistasis). This refinement process is repeated sequentially for each QTL position until there is no change in the estimates of QTL positions.
6. Return to step 2 and repeat the process until no more significant QTL effects can be added into the model and estimates of QTL positions are optimized.

Stopping rules

An important issue associated with model selection is a stopping rule for the model search algorithm or a criterion for comparing different models. In regression analysis with model selection, the stopping rules are usually decided by minimizing the final prediction error (FPE) criterion or information criteria (IC).

The FPE criterion is

$$S_k = (n + k)\,RSS_k/(n - k)$$

where $RSS_k$ is the residual sum of squares and k is the number of parameters fitted in the model. The IC of the general form is

$$IC = -2\left[\log L_k - k\,c(n)/2\right] \tag{6.13}$$

where $L_k$ is the likelihood of data (Eqn 6.7) given a genetic model with k parameters and c(n) is a weighting function of the sample size (examples given below). This is approximately equivalent to

$$IC = \log\left[RSS_k/n\right] + k\,c(n)/n$$

in regression analysis.

The IC criteria can be related to the F-to-enter statistic (for regression analysis) or LR-to-enter statistic (for likelihood analysis) in the stepwise selection procedure. It was shown (Miller, 1990: p. 208) that Eqn 6.12 leads to the F-to-enter statistic for regression analysis at the minimum

$$\frac{SSR_k - SSR_{k+1}}{SSR_{k+1}/(n - k - 1)} \geq (n - k - 1)\left(e^{c(n)/n} - 1\right) \approx c(n)\left[1 - \frac{k+1}{n}\right] \tag{6.14}$$

provided that c(n)/n is small. As $LR = n \log(SSR_k/SSR_{k+1})$ in the setting of regression analysis, Eqns 6.13 and 6.14 imply that the LR-to-enter statistic for likelihood analysis at minimum is

$$LR_k = -2\log\frac{L_k}{L_{k+1}} \geq n\log\left(c(n)/n + 1\right) \approx c(n)$$

The criterion is basically defined by the choice of the penalty c(n). Using c(n) = 2 as suggested by Akaike (1969) would mean that the final threshold in LOD is 0.43, since LOD = LR/(2 ln 10). c(n) can take a variety of forms, such as: c(n) = log(n), which is the classical Bayesian information criterion (BIC); c(n) = 2, which is the Akaike information criterion (AIC) (Zou and Zeng, 2008).

In reference to QTL analysis on markers, Broman (1997) suggested using $c(n) = \delta \log n$ and recommended that $\delta$ be between 2 and 3. For n = 100–500, the threshold in LOD would be 2–2.7 for $\delta$ = 2 and 3–4 for $\delta$ = 3. However, this argument is still rather arbitrary and does not relate to the genetic length of the linkage map, the number of markers and linkage groups or the distribution of markers.

6.4.3 Estimating genotypic values and variance components of QTL effects

Given estimates of the QTL parameters, the genotypic values of an individual can be estimated. This estimation is complicated by the fact that QTL genotypes are not observed directly; only marker genotypes are observed. Thus, the estimate for an individual is the weighted mean of all possible genotypic values, weighted by the probability ($\hat{\pi}_{ij}$) of each QTL genotype conditional on both the marker and the phenotypic data. From Eqn 6.10, this estimation equation is

$$\hat{y}_i = \hat{\mu} + \sum_{j=1}^{2^m}\sum_{r=1}^{m+t} \hat{\pi}_{ij} D_{ijr} \hat{E}_r$$
where the first summation is over all possible $2^m$ QTL genotypes and the second summation is over all effects of the model (m main effects and t epistatic effects). $\hat{\mu}$ is the MLE of $\mu$ obtained from Eqn 6.10 at the equilibrium of the final model and $\hat{E}_r$ is the MLE of QTL effect $E_r$ obtained from Eqn 6.9. $\hat{\pi}_{ij}$ is the MLE of $\pi$ obtained from Eqn 6.8.

To predict the genotypic values of quantitative traits based on marker information only, we need to use

$$\hat{y}_i = \hat{\mu} + \sum_j \sum_r p_{ij} D_{ijr} \hat{E}_r$$

as $\hat{\pi}_{ij}$ is a function of phenotype $y_i$, which is unavailable in early selection.

The genetic variances and covariances explained by each QTL effect can be estimated directly from the likelihood analysis. Applying the EM algorithm, Eqn 6.12 leads to

$$\hat{E} = \hat{V}^{-1} D' \hat{\Pi}' (Y - \hat{\mu})$$

This implies

$$\hat{\sigma}^2 = \frac{1}{n}\left[(Y - \hat{\mu})'(Y - \hat{\mu}) - \hat{E}'\hat{V}\hat{E}\right]$$

or

$$\begin{aligned}
\hat{\sigma}^2 &= \frac{1}{n}\left[\sum_{i=1}^{n}(y_i - \hat{\mu})^2 - \sum_{r=1}^{m+t}\sum_{s=1}^{m+t}\sum_{i=1}^{n}\sum_{j=1}^{2^m} \hat{\pi}_{ij} D_{ijr} D_{ijs} \hat{E}_r \hat{E}_s\right] \\
&= \frac{1}{n}\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - \sum_{r=1}^{m+t}\sum_{s=1}^{m+t}\sum_{i=1}^{n}\sum_{j=1}^{2^m} \hat{\pi}_{ij} (D_{ijr} - \bar{D}_r)(D_{ijs} - \bar{D}_s) \hat{E}_r \hat{E}_s\right]
\end{aligned} \tag{6.15}$$

where $\bar{y} = \sum_{i=1}^{n} y_i/n$ and $\bar{D}_r = \sum_{i=1}^{n}\sum_{j=1}^{2^m} \hat{\pi}_{ij} D_{ijr}/n$.

In this form $\hat{\sigma}^2$ is expressed as a difference between the MLE of total phenotypic variance $\hat{\sigma}_p^2$ (the first part of Eqn 6.15) and that of the genetic variance $\hat{\sigma}_g^2$ (the second part of Eqn 6.15). $\hat{\sigma}_g^2$ can be further partitioned as

$$\begin{aligned}
\hat{\sigma}_g^2 &= \sum_{r=1}^{m+t}\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{2^m} \hat{\pi}_{ij}(D_{ijr} - \bar{D}_r)^2 \hat{E}_r^2 + \sum_{r=2}^{m+t}\sum_{s=1}^{r-1}\frac{2}{n}\sum_{i=1}^{n}\sum_{j=1}^{2^m} \hat{\pi}_{ij}(D_{ijr} - \bar{D}_r)(D_{ijs} - \bar{D}_s)\hat{E}_r \hat{E}_s \\
&= \sum_{r=1}^{m+t}\hat{\sigma}_{E_r}^2 + \sum_{r=2}^{m+t}\sum_{s=1}^{r-1}\hat{\sigma}_{E_r,E_s}
\end{aligned}$$

$\hat{\sigma}_{E_r}^2$ estimates the genetic variance due to the QTL effect $E_r$ and $\hat{\sigma}_{E_r,E_s}$ estimates the genetic covariance between QTL effects $E_r$ and $E_s$.

It is convenient and informative to combine the variance due to each QTL effect with half of the covariances between this QTL effect and other effects, and report this as the variance component explained by this QTL effect

$$\hat{\sigma}_r^2 = \hat{\sigma}_{E_r}^2 + \frac{1}{2}\sum_{s \neq r}\hat{\sigma}_{E_r,E_s}$$

Whereas $\hat{\sigma}_{E_r}^2$ estimates the variance of the rth QTL effect in linkage equilibrium (in which $\sigma_{E_r,E_s} = 0$), $\hat{\sigma}_r^2$ estimates the contribution to the total variance in the current population with linkage disequilibrium. Estimates of these variances, covariances and variance components can be given as a ratio of the total phenotypic variance. Note that $\hat{\sigma}_g^2/\hat{\sigma}_p^2$ is the coefficient of determination ($R^2$) of the MIM model. Note also that whereas $\hat{\sigma}_{E_r}^2$ is always positive, $\hat{\sigma}_r^2$ is not necessarily positive.
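The variance partition described above can be sketched numerically. The Python function below is an illustrative sketch only: the name `variance_components`, the argument layout and the simplification that the design row $D_{ij}$ is shared by all individuals (stored as `D[j]`) are assumptions made for the example. It takes posterior genotype probabilities and effect estimates as given and returns the per-effect variances, the pairwise covariances (including the factor 2/n used in the partition of the genetic variance), the combined components $\hat{\sigma}_r^2$, and the genetic and phenotypic variances.

```python
import numpy as np

def variance_components(y, pi, D, E):
    """Partition the genetic variance among QTL effects.

    y  : (n,)   phenotypes
    pi : (n, G) posterior genotype probabilities (rows sum to 1)
    D  : (G, K) design matrix, assumed identical for all individuals
    E  : (K,)   estimated QTL effects
    """
    n = y.size
    w = pi.sum(axis=0) / n                  # (G,) average weight of each genotype
    D_bar = w @ D                           # (K,) weighted column means, D-bar_r
    Dc = D - D_bar                          # centred design
    # C[r, s] = (1/n) * sum_i sum_j pi_ij (D_jr - Dbar_r)(D_js - Dbar_s)
    C = Dc.T @ (w[:, None] * Dc)
    terms = C * np.outer(E, E)              # C[r, s] * E_r * E_s

    var_E = np.diag(terms).copy()           # variance due to each effect
    cov_E = 2.0 * terms                     # covariance terms carry the factor 2/n
    np.fill_diagonal(cov_E, 0.0)
    comp = var_E + 0.5 * cov_E.sum(axis=1)  # variance plus half the covariances
    var_g = terms.sum()                     # total genetic variance
    var_p = np.mean((y - y.mean()) ** 2)    # total phenotypic variance (MLE)
    return var_E, cov_E, comp, var_g, var_p
```

The components returned in `comp` sum to `var_g`, so dividing each by `var_p` gives the share of the phenotypic variance attributed to each effect, and `var_g / var_p` gives the $R^2$ of the model.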
interaction and epistatic interaction. For all multiple crosses, a more reasonable analysis would be the combined analysis over crosses. In this way, crosses created or evaluated at different times can be combined and multiple projects in a team or across research groups can be related and shared. Disadvantages include more complications for analysis, with few software packages available, and having to account for the multiple related crosses (where individuals may be correlated to each other both genotypically and phenotypically). QTL mapping approaches have been developed for four-way crosses (Xu, 1996) and crosses derived from multiple inbred lines (Liu and Zeng, 2000).

Broman et al. (2003a) discussed how to combine multiple crosses in QTL analysis. For crosses with founders unrelated to each other, a naïve sum of separate LODs by cross can be used assuming a different gene action in different crosses, or combined analysis can be used for independent crosses. For crosses with related founders, QTL analysis depends on genetic relationships within and between crosses. With constant genetic covariance within a cross, all individuals have the same genetic relationship and combined analysis has no effect on single cross analysis. However, genetic covariance may differ between crosses, depending on the expected number of alleles shared by identity by descent (IBD). It should be noted that covariance across multiple crosses is not constant. In these cases, combined analysis will provide results that are different from single cross analysis. The problems with multiple cross analysis can be fixed simply by the introduction of blocking factors for crosses as a random effect for genetic relationships. This addresses the constant covariance within each cross and different covariances between crosses, which provides an appropriate recombination model for crosses to relate the recombination rate to distance and a common phenotype model across all crosses to allow cross by genetic effect interactions.

Ignoring polygenic effects will result in a biased additive effect estimate, detecting dominance when it does not exist and increased variance and thus bias QTL results, although the location estimate is unbiased. Statistical power can be increased by combining crosses, which is important when several related crosses are created. The threshold idea for testing and loci intervals has been extended to multiple crosses (Zou et al., 2001).

Jannink and Jansen (2001) developed a method to map epistatic QTL by identifying loci with strong interaction between QTL and genetic background. The approach requires large populations derived from multiple related inbred-line crosses. The method was applied to simulated DH populations derived from a diallel among three inbred parents. This approach allows detection of QTL involved not only in pairwise but also higher-order interaction and does so with one-dimensional genome searches.

The North Carolina Experimental Design III (design NCIII in Chapter 4), originally designed by Comstock and Robinson (1952), is the first complex design that was exploited for QTL mapping. In NCIII, the experimental units are produced from BC matings of F2 plants to the two parental lines from which the F2 was derived. Additive and dominance components of variance can be estimated with nearly equal precision under the assumption of diploidy, biallelic loci with equal gene frequencies and absence of linkage and epistasis. Cockerham and Zeng (1996) extended Comstock and Robinson's ANOVA to include linkage and epistasis for F2 and F3 progenies and developed orthogonal contrasts for QTL mapping using single-marker ANOVA. Melchinger et al. (2007) demonstrated the exceptional features of NCIII for identification of QTL contributing to heterosis. They defined a new type of heterotic gene effect, denoted as the augmented dominance effect $d_i^*$, which is equal to the net contribution of QTL i to mid-parent heterosis (MPH). It comprises the dominance effect d minus half the sum of additive × dominance epistatic interactions with genetic background. The novelty of their
approach is that QTL that significantly contribute to MPH are identified and both dominance and epistasis are accounted for.

An elegant experimental design that can provide a test of significance for the presence of epistasis is the triple testcross (TTC) design (Chapter 4), proposed by Kearsey and Jinks (1968), which is an extension of NCIII. In the TTC design, testcrosses are produced not only with the two parental lines but also with the F1 derived from them. For every progeny from a segregating population, e.g. F2 plant or RIL, three sets of data can be generated: (i) the average parental testcross performance; (ii) the difference between the parental testcross performances; and (iii) the deviation of testcross progenies with the F1 from the mean of the parental testcrosses. Kearsey et al. (2003) and Frascaroli et al. (2007) presented experimental results from QTL analyses based on the TTC design with data from Arabidopsis and maize, respectively. Melchinger et al. (2008) gave genetic expectations of QTL effects estimated with the TTC design in the presence of epistasis. With the TTC design, dominance × additive epistatic interactions of individual QTL with the genetic background can be estimated with one-dimensional genome scans. They demonstrated that the limitation of NCIII in the analysis of heterosis to separate QTL main effects and their epistatic interactions with all other QTL can partially be overcome with the TTC design. They also presented genetic expectations of variance components for the analysis of TTC progeny tested in a split-plot design, assuming digenic epistasis and arbitrary linkage. Kusterer et al. (2007) used the theory to study heterosis for biomass-related traits in Arabidopsis.

6.5.3 Pooled analysis

Very often, more than two mapping populations are studied for the same or related traits. QTL analysis on pooled data from multiple mapping populations was suggested by Lander and Kruglyak (1995). Pooled analysis provides a means for evaluating, as a whole, evidence for the existence of a QTL from different studies and examining differences in gene effect of a QTL among different populations.

Walling et al. (2000) extended least square interval mapping (Haley et al., 1994) to analysis of combined data from seven porcine populations, while Li, R. et al. (2005) extended the Bayesian QTL analysis method (Sen and Churchill, 2001) to analysis of combined data from four mouse populations. The former (Walling et al., 2000) is simple in computation, and general statistical software such as SAS is applicable. The latter (Li, R. et al., 2005) adopted a new QTL analysis method and this requires special software. Some earlier studies (Rebai and Goffinet, 1993; Xu, 1998; Liu and Zeng, 2000) also developed QTL analysis methods for data which may be produced from several populations.

Guo, B. et al. (2006) provided an example of pooled analysis of data from multiple QTL mapping populations. Least square interval mapping was extended for pooled analysis by inclusion of populations and cofactor markers as indicator variables and covariate variables, respectively, in the multiple linear models. The general linear test approach was applied to the detection of QTL. Single population-based and pooled analyses were conducted on data from two F2:3 mapping populations, Hamilton (susceptible) × PI 90763 (resistant) and Magellan (susceptible) × PI 404198A (resistant), for resistance to cyst nematode in soybean. It was demonstrated that where a QTL was shared among populations, pooled analysis showed increased LOD values for the QTL candidate region over single population analyses. Where a QTL was not shared among populations, however, the pooled analysis showed decreased LOD values for the QTL candidate region over single population analyses. Pooled analysis on data from genetically similar populations may have a higher power of QTL detection relative to single population-based analyses. An important issue emerges from such pooled analyses: because of this dilution effect, a QTL with strong effects, but existing in only one or few populations, may become undetectable if a large number of populations are pooled.

6.6 Multiple QTL

6.6.1 Reality of multiple QTL

A multiple QTL model is designed to: (i) effectively search over the space of genetic architecture for the number and positions of loci and gene action (additive, dominance, epistasis); (ii) select the best or better model(s), including what criteria to use and where to draw the line; and (iii) estimate features of the model such as means, variances and covariances, confidence regions and marginal or conditional distributions (Broman et al., 2003a).

The multiple QTL approach should have several advantages relative to single QTL approaches. First, statistical power and precision can be improved so that the number of QTL detected will increase and better estimates of loci (less bias, smaller intervals) will be provided. Secondly, the inference of complex genetic architecture including patterns and individual elements of epistasis can be improved; means, variances and covariances can be estimated appropriately and the relative contributions of different QTL can be assessed. Thirdly, estimates of genotypic values can be improved with less bias (more accurate) and smaller variance (more precise).

Is there any limit of estimation for QTL? As indicated by Bernardo (2001), the reasonable number of QTL for an efficient MAS is 10. A larger number such as 50 is too big. Phenotype is a better predictor than genotype when there are a large number of QTL. Increasing sample size does not give multiple QTL any advantage. Also it is hard to select many QTL simultaneously because there are $3^m$ possible genotypes to choose from when a trait is controlled by m QTL. Genetic linkage between QTL, i.e. multi-collinearity, will lead to correlated estimates of gene effects and the precision of each effect drops as more predictors are added. There is a need to balance bias and variance. A few QTL can dramatically reduce bias while many predictors (QTL) can increase variance. Finally, estimation of QTL parameters depends on sample size, heritability and environmental variation.

What can we do with the QTL below the limits of detection? There is a problem of selection bias: QTL of modest effect can sometimes be detected but their effects are biased upwards when detected (Beavis, 1994). To avoid a sharp in/out dichotomy, caution should be taken about only examining the best model, and the probability that a QTL is in the model should be considered. Building m detected loci into the QTL model will directly allow uncertainty in genetic architecture and model selection over number of QTL.

6.6.2 Selecting a class of QTL models

There are many parameters to be considered when selecting a class of QTL models (Broman et al., 2003a): (i) number of QTL, single QTL or multiple QTL of known or unknown number; (ii) location of QTL with known positions and widely spaced (no two QTL within a marker interval) or arbitrarily close; (iii) gene action including additive and/or dominance effects, epistatic effects (four combinations for diploid species aa, ad, da, dd; more combinations for species with higher levels of ploidy) and phenotypic distribution (normal, binomial, Poisson, etc.).

Consider a phenotype normally distributed with

$$\Pr(Y \mid Q, \theta) = N(G_Q, \sigma^2)$$

Typical assumptions that are required for building a model are: (i) normally-distributed environmental variation, i.e. residuals e (not Y!) give a bell-shaped histogram; (ii) genetic value $G_Q$ is a composite of m QTL, i.e. $Q = (Q_1, Q_2, \ldots, Q_m)$; and (iii) genetic effect uncorrelated with environment. That is,

$$Y = \mu + G_Q + e, \qquad e \sim N(0, \sigma^2)$$

$$E(Y \mid Q, \theta) = \mu + G_Q, \qquad \mathrm{var}(Y \mid Q, \theta) = \sigma^2$$

$$\theta = (\mu, G_Q, \sigma^2)$$
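To make the model Y = μ + G_Q + e concrete, the following Python sketch simulates phenotypes for m unlinked QTL with purely additive effects. It is a toy illustration under stated assumptions: the function names, the F2-style genotype coding (−1, 0, 1 with probabilities 1/4, 1/2, 1/4) and the absence of dominance, epistasis and linkage are choices made for the example and are not taken from the text. The helper `n_genotype_classes` simply evaluates the $3^m$ count of multi-locus genotypes mentioned in Section 6.6.1.

```python
import numpy as np

def simulate_additive_qtl(n, effects, mu=0.0, sigma2=1.0, seed=0):
    """Simulate Y = mu + G_Q + e for unlinked, purely additive QTL.

    effects : additive effect of each QTL (length m)
    Returns the genotype matrix Q (n x m), genotypic values G and phenotypes Y.
    """
    rng = np.random.default_rng(seed)
    effects = np.asarray(effects, dtype=float)
    m = effects.size
    # Assumed F2-style coding: -1, 0, 1 with probabilities 1/4, 1/2, 1/4
    Q = rng.choice([-1, 0, 1], size=(n, m), p=[0.25, 0.5, 0.25])
    G = Q @ effects                                  # G_Q as a sum over QTL
    e = rng.normal(0.0, np.sqrt(sigma2), size=n)     # residuals e ~ N(0, sigma2)
    return Q, G, mu + G + e

def n_genotype_classes(m, classes_per_locus=3):
    """Number of possible multi-locus genotypes for m QTL."""
    return classes_per_locus ** m
```

With a few large effects, a histogram of Y can look clearly non-normal while the residuals Y − μ − G_Q remain bell-shaped, which is the point of assumption (i). The genotype count also grows quickly: `n_genotype_classes(10)` already gives 59049.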
Considering multiple QTL, the genotypic value can be partitioned (assuming no epistasis) as

$$G_Q = \theta_{Q(1)} + \theta_{Q(2)} + \cdots + \theta_{Q(m)} \quad \text{or} \quad G_Q = \sum_j \theta_{Q(j)}$$

Thus genetic variance can be partitioned as

$$\mathrm{var}(G_Q) = \sigma_G^2 = \sum_j \sigma_{G(j)}^2,$$

3. Models can be compared based on re-sampling techniques, which include bootstrap (re-sampling with replacement from data), cross validation (repeatedly dividing data into estimation and test sets) and sequential permutation tests, which are conditional on the QTL already in the model and stop when added QTL are not significant.
4. There are some information criteria for

6.6.3 Multiple QTL with epistasis

When a trait is controlled by multiple QTL, it is quite possible that some epistasis may occur between loci. With two QTL involved, there are four types of epistasis, aa, ad, da and dd. With more than two loci involved, there would be higher-order epistasis. Considering genetic models with epistasis, genotypic values can be partitioned with epistasis as

$$G_Q = \sum_k \sum_j \theta_{kjQ}, \qquad \theta_{kjQ} = \theta_{(j_1, j_2, \ldots, j_k)Q}$$

Genetic variance is partitioned as

$$\sigma_G^2 = \sum_k \sigma_{kG}^2, \qquad \sigma_{kG}^2 = \sum_j \sigma_{kGj}^2,$$
The properties of the posterior distribution can be studied by using prior distributions that are independent between QTL and by drawing samples from posterior probabilities. The conditional posterior probability for multiple imputation or MCMC is shown in the equation at the bottom of the page.

To construct a Markov chain around the posterior distribution, we need the posterior probability as a stable distribution of the Markov chain. In practice, the chain tends towards the stable distribution. The MCMC algorithm starts with given values of the parameters in the prior distributions and the initial values for all the unknowns generated from their prior distributions

$$(\lambda, Q, \theta, m) \sim \Pr(\lambda, Q, \theta, m \mid Y, X)$$

and m-QTL model components from full conditionals are updated with the following updating steps:

• update genetic effects $\theta$ given genotypes and traits;
• update locus $\lambda$ given genotypes and marker map; and
• update genotypes Q given traits, marker map, locus and effects.

This generates the following chain of estimates:

$$(\lambda, Q, \theta, m)_1 \rightarrow (\lambda, Q, \theta, m)_2 \rightarrow \cdots \rightarrow (\lambda, Q, \theta, m)_N$$

To ensure that the chain mixes well, the initial values may have low posterior probability at the period of burn-in (initial iterations of the MCMC process that are used to locate the sampler in this part of the sample space).

After the burn-in period, realizations of $(\lambda, Q, \theta, m)$ are sampled from the chain and stored. Once enough realizations have been sampled, empirical posterior distributions for parameters in $(\lambda, Q, \theta, m)$ can be created from the posterior sample.

From full conditionals for a model with m QTL, it is hard to sample from the joint posterior probability

$$\Pr(\lambda, Q, \theta \mid Y, X) = \Pr(\theta)\,\Pr(\lambda)\,\Pr(Q \mid X, \lambda)\,\Pr(Y \mid Q, \theta)/\text{constant}$$

But it is easy to sample parameters from the full conditionals as follows:

$$\Pr(\theta \mid Y, X, \lambda, Q) = \Pr(\theta \mid Y, Q) = \Pr(\theta)\,\Pr(Y \mid Q, \theta)/\text{constant} \quad \text{(for genetic effects)}$$

$$\Pr(\lambda \mid Y, X, \theta, Q) = \Pr(\lambda \mid X, Q) = \Pr(\lambda)\,\Pr(Q \mid X, \lambda)/\text{constant} \quad \text{(for QTL locus)}$$

$$\Pr(Q \mid Y, X, \lambda, \theta) = \Pr(Q \mid X, \lambda)\,\Pr(Y \mid Q, \theta)/\text{constant} \quad \text{(for QTL genotypes)}$$

6.7.3 Bayesian mapping methods

When fully structuring the gene mapping problem in the Bayesian framework, the types of models considered to be suitable (e.g. no epistasis) need to be defined a priori. A prior opinion concerning the plausible values of the model dimension (the number of parameters), in addition to plausible values of the parameters themselves, then needs to be incorporated. This includes the prior distributions attached to the number of influential genes (QTL) and to their effects, which together reflect the prior beliefs concerning sensitivity towards small gene effects. In general, the MCMC analysis requires specific prior, or proposal, distributions and involves many iterations. In order for the MCMC process to provide useful estimates, it is necessary for the sampler to move around the sample space successfully.

Bayesian mapping was initiated by Hoeschele and VanRaden (1993a,b) and subsequently developed by Satagopan
et al. (1996) and Sillanp and Arjas effects to be reduced towards zero, while
(1998, 1999). Since then, various Bayesian QTL with large effects are estimated with
mapping methods have been developed virtually no shrinkage. To do this, each
for different models and genetic sys- marker effect is allowed to have its own var-
tems, including the reverse-jump MCMC iance parameter, which in turn has its own
Bayesian method (Green, 1995; Satagopan prior distribution so that the variance can be
et al., 1996; Sillanp and Arjas, 1998, estimated. Henceforth, prior distributions
1999; Sillanp and Corander, 2002), model for all parameters are firstly assumed, i.e.
selection framework (Yi, 2004; Yi et al., p(b0) 1, p(se2) 1/se2, p(bj) = N(0,sj2) and
2005, 2007) and the shrinkage estimation p(sj2) 1/sj2 (j = 1, , q); then conditional
(SE) method (Xu, S., 2003; Zhang and posterior distributions (CPD) for all param-
Xu, 2004; Wang, H. et al., 2005). Wu and eters and hyperparameters are deduced,
Lin (2006) concluded that the SE method i.e. CPD for bj is N(bj, sj2) where
allows analytical strategies for QTL map-
ping to expand to whole-genome mapping of epistatic QTL by use of all markers. However, the number of variables involved is so large that the computation time is too long. To solve this problem, Zhang and Xu (2005) proposed the penalized maximum likelihood (PML) method. Yi and Shriner (2008) reviewed Bayesian mapping methods and associated computer software for mapping multiple QTL in experimental crosses. They compared and contrasted the various methods to clearly describe the relationship between them.
Bayesian shrinkage estimation (BSE) method
With BSE, the number of effects that can be handled can be larger than the number of observations. The BSE method has been extended to map multiple QTL (Zhang and Xu, 2004; Wang, H. et al., 2005) and epistatic QTL.
Assuming m QTL, $Q_1$, $Q_2$, ..., $Q_m$, the model for the quantitative trait value can be written as
$$y_i = b_0 + \sum_{j=1}^{q} x_{ij} b_j + e_i$$
where $y_i$ is the quantitative trait value for individual i, $b_0$ is the mean, $b_j$ is the main effect of $Q_j$, $x_{ij}$ is coded as $1/2$ or $-1/2$ if the genotype of $Q_j$ is $Q_jQ_j$ or $Q_jq_j$ (j = 1, ..., m), and m is not equal to the number of markers for multiple-marker analysis but the number of marker intervals for multiple QTL analysis. The BSE method allows spurious QTL
$$\bar{b}_j = \left[\sum_{i=1}^{n} x_{ij}^2 + \sigma_e^2/\sigma_j^2\right]^{-1} \sum_{i=1}^{n} x_{ij}\left(y_i - b_0 - \sum_{k \neq j}^{q} x_{ik} b_k\right)$$
and
$$s_j^2 = \left[\sum_{i=1}^{n} x_{ij}^2 + \sigma_e^2/\sigma_j^2\right]^{-1} \sigma_e^2$$
The CPD for $\sigma_j^2$ is an inverted chi-square distribution; and finally, we sample observations of all parameters from the corresponding CPD. When the sampling chain converges to the stationary distribution, the sampled parameters actually follow the joint posterior distribution. When the sample of a single parameter is considered, this univariate sample is actually the marginal posterior sample for this parameter. Therefore, the number, positions and effects of QTL can be estimated.
Provided the jth QTL is false (i.e. effect size is zero), the estimate of $\sigma_j^2$ will tend to zero and the mean and variance of the posterior distribution for $b_j$ regress to zero so that the sampled observations of $b_j$ are close to zero. Note that updating the variance $\sigma_j^2$ for the jth QTL is important because this either overcomes the shortcomings of the fixed ridge parameter in ridge regression or reflects the information of the data. If $b_j \to 0$ in the formula $\sigma_j^2 = b_j^2/\chi^2_{\nu=1}$, then $\sigma_j^2 \to 0$; however, dividing $b_j^2$ by a chi-square variable allows $\sigma_j^2$ a chance to recover because $\chi^2_{\nu=1}$ can be very small by chance.
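The sampling scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the published implementation: each effect is drawn from a normal CPD with mean $[\sum_i x_{ij}^2 + \sigma_e^2/\sigma_j^2]^{-1}\sum_i x_{ij}(y_i - b_0 - \sum_{k \neq j} x_{ik} b_k)$ and variance $[\sum_i x_{ij}^2 + \sigma_e^2/\sigma_j^2]^{-1}\sigma_e^2$, and each effect variance is updated as $b_j^2/\chi^2_{\nu=1}$, as in the text. The updates for the mean and the residual variance (flat prior on $b_0$, residual sum of squares divided by a chi-square variable with n degrees of freedom), the numerical floors, the function name `bse_gibbs` and the simulated data are assumptions made for the example.

```python
import math
import random


def chi2(df, rng):
    # A chi-square(df) draw, obtained as Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2.0, 2.0)


def bse_gibbs(x, y, n_iter=1200, burn_in=300, seed=1):
    """Posterior means of b_1..b_q from a simple Gibbs sampler (sketch)."""
    rng = random.Random(seed)
    n, q = len(y), len(x[0])
    b0, b = sum(y) / n, [0.0] * q
    var_j, var_e = [1.0] * q, 1.0
    sum_x2 = [sum(x[i][j] ** 2 for i in range(n)) for j in range(q)]
    # resid_i = y_i - b0 - sum_j x_ij b_j (all b_j start at zero)
    resid = [y[i] - b0 for i in range(n)]
    post_mean = [0.0] * q
    for it in range(n_iter):
        # Assumption: flat prior on the mean b0.
        new_b0 = rng.gauss(b0 + sum(resid) / n, math.sqrt(var_e / n))
        resid = [r - (new_b0 - b0) for r in resid]
        b0 = new_b0
        for j in range(q):
            # Add effect j back: y_i - b0 - sum_{k != j} x_ik b_k
            s = sum(x[i][j] * (resid[i] + x[i][j] * b[j]) for i in range(n))
            denom = sum_x2[j] + var_e / var_j[j]
            new_bj = rng.gauss(s / denom, math.sqrt(var_e / denom))
            for i in range(n):
                resid[i] -= x[i][j] * (new_bj - b[j])
            b[j] = new_bj
            # sigma_j^2 = b_j^2 / chi2(1); the floors are numerical guards.
            var_j[j] = max(b[j] ** 2 / max(chi2(1, rng), 1e-12), 1e-12)
        # Assumption: sigma_e^2 = residual sum of squares / chi2(n).
        var_e = sum(r * r for r in resid) / chi2(n, rng)
        if it >= burn_in:
            for j in range(q):
                post_mean[j] += b[j] / (n_iter - burn_in)
    return post_mean


# Hypothetical simulated data: 120 individuals, 8 loci, only locus 2 is real.
_rng = random.Random(7)
x_demo = [[_rng.choice((0.5, -0.5)) for _ in range(8)] for _ in range(120)]
y_demo = [1.0 + 2.0 * row[2] + _rng.gauss(0.0, 1.0) for row in x_demo]
est = bse_gibbs(x_demo, y_demo)
```

In this simulated example the posterior mean of the single real effect should stay near its true value, while the effects of the spurious loci are pulled towards zero, which is the shrinkage behaviour the section describes.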
Bayesian shrinkage analysis was used to develop a QTL model for mapping multiple QTL for dynamic traits (such as growth trajectories) under the maximum likelihood framework (Yang and Xu, 2007). The growth trajectory was fitted by Legendre polynomials. The method combines the shrinkage mapping for individual quantitative traits with the Legendre polynomial analysis for dynamic traits. The multiple-QTL model was implemented in two ways: (i) a fixed-interval approach where a QTL is placed in each marker interval; and (ii) a moving-interval approach where the position of a QTL can be searched in a range that covers many marker intervals. Simulation showed that the Bayesian shrinkage method generated much better signals for QTL than the interval mapping approach.
Model selection
A composite model space approach was proposed by Yi (2004) for mapping multiple non-epistatic QTL and extended by Yi et al. (2005) to epistatic QTL mapping for continuous traits. The key advantage of this approach is that it provides a convenient way to reasonably reduce the model space and to construct efficient algorithms for exploring the complicated posterior distribution. Yi et al. (2007) proposed a Bayesian model selection approach of genome-wide interacting QTL for ordinal traits in experimental crosses. They first developed a Bayesian ordinal probit model for multiple interacting QTL on the basis of the composite model space framework and then used this framework to develop an efficient MCMC algorithm for identifying multiple interacting QTL for ordinal traits.
Penalized maximum likelihood (PML) method
Integrating the shrinkage estimation with the maximum likelihood (ML) method can decrease the running time. PML is different from the ML method because the function to be maximized is a penalized likelihood function rather than a likelihood function. Penalized likelihood is similar to the posterior distribution of the parameters, with the prior distribution of the parameters serving as the penalty. So the PML method depends on the prior distribution. It estimates means and variances of prior distributions of QTL effects together with QTL effects, i.e. QTL effects can be estimated by using the following equation:
$$b_j = \left[\sum_{i=1}^{n} x_{ij}^2 + \sigma_e^2/\sigma_j^2\right]^{-1} \left[\sum_{i=1}^{n} x_{ij}\left(y_i - b_0 - \sum_{k \neq j}^{q} x_{ik} b_k\right) + \mu_j \sigma_e^2/\sigma_j^2\right]$$
If $\sigma_j^2 \to 0$ then $b_j \to \mu_j$. Additionally, $\mu_j = b_j/(m + 1)$, so $b_j \to 0$. This explains why the estimate of a false-QTL effect is close to zero. Note that the PML method can select variables in the estimation of parameters, handle a model with the number of considered effects ten times larger than the sample size (Zhang and Xu, 2005; Hoti and Sillanpää, 2006) and be a refined method of mapping QTL (Yi et al., 2006) because of small residual variance at the beginning of parameter estimation. However, the PML method cannot detect epistasis between nearby markers because of their multi-collinearity. For real data analysis, two approaches are available for epistatic analysis. With the PML method along with the variable-interval approach, whole-genome mapping of epistatic QTL may be carried out by the use of all markers, or the BSE method along with the variable-interval approach can be used to map epistatic QTL.
MCMC, and especially the Gibbs sampler, allows for the efficient exploration of very complex likelihood surfaces and calculation of Bayesian posterior distributions. For these reasons, Walsh (2001) predicted that the next 20 years will likely be marked by a strong influx of Bayesian methods replacing their likelihood counterparts. In contrast to classical methods, the Bayesian MCMC approach necessitates more human
effort and care to ensure that the simulation produces a representative sample from the posterior distribution. This requires careful monitoring of the convergence and the mixing properties of the MCMC sampler.
(2002), Flint-Garcia et al. (2003), Breseghello and Sorrels (2006b), De Silva and Ball (2007), Mackay and Powell (2007), Oraguzie et al. (2007), Zhu et al. (2008), Buckler et al. (2009), Myles et al. (2009) and Yu et al. (2009).
usually selected from highly diverse germplasm. Thirdly, the relatively low number of generations after maximum LD, where the maximum LD is reached in the F1, implies a reduced number of sampled meioses within designed populations (typically a few hundred), leading to relatively long stretches of chromosome being in LD. Consequently, the characteristic size of confidence intervals for QTL locations is between 10 and 20 cM (Darvasi et al., 1993). In addition, germplasm resources and breeding populations that have been accumulating in breeding programmes with available phenotypic information cannot be used so that genetic mapping and breeding are usually two separate, independent procedures.
LD mapping takes advantage of events that created association in the relatively distant past. Assuming many generations and therefore meioses have elapsed since these events, recombination will have removed association between a QTL and any markers not tightly linked to it. LD mapping thus allows for much finer mapping than standard biparental cross approaches. At a fundamental level, both LD and linkage rely on the co-inheritance of adjacent DNA variants, with linkage capitalizing on this by identifying haplotypes that are inherited intact over several generations and LD relying on the retention of adjacent DNA variants over many generations. Thus, LD studies can be regarded as very large linkage studies of unobserved, hypothetical pedigrees (Cardon and Bell, 2001). LD analysis has the potential to identify a single polymorphism within a gene that is responsible for the difference in phenotype and is perfectly suited for sampling a wide range of alleles from germplasm collections with high resolution (Flint-Garcia et al., 2003). A less obvious additional attractive property is that LD mapping approaches offer possibilities for QTL identification in polyploid crops with hard-to-model segregation patterns (Malosetti et al., 2007).
For marker-trait association, differences in both phenotype and allele frequency can be identified in a group of cultivars that are derived from a common ancestral gene pool (Xu and Zhu, 1994). The procedure is regarded as an initial screening for identification of QTL (Bar-Hen et al., 1995; Virk et al., 1996). The development of saturated linkage maps and highly informative microsatellite and SNP markers in plants makes it possible to systematically survey marker-trait association on a whole-genome scale. Compared with transmission-based linkage mapping, LD mapping provides more opportunities for breeding applications since hundreds of germplasm accessions that are useful as parents in breeding are involved. An important asset of LD mapping strategies is the straightforward utilization of large amounts of historical phenotypic data that are available for mapping efforts at little or no extra cost, especially when evaluation of the trait is time and money consuming, as is the case with mean yield, adaptability and stability. As an increasing number of germplasm accessions are evaluated with molecular markers and phenotyped for agronomic traits, it is essential to consider using the LD mapping approach to map genes or at least to provide a pre-screen for linkage-based genetic mapping (Xu, Y., 2002).
6.8.2 Measurement of linkage disequilibrium
A variety of statistics have been used to measure LD. Devlin and Risch (1995) and Jorde (2000) reviewed the relative advantages and disadvantages of each statistical approach. Here, we introduce the two most common statistics for measuring LD: $r^2$ and $D'$. Consider a pair of loci with alleles A and a at locus one and B and b at locus two, with allele frequencies $p_A$, $p_a$, $p_B$ and $p_b$, respectively. The resulting haplotype frequencies are $p_{AB}$, $p_{Ab}$, $p_{aB}$ and $p_{ab}$. The basic component of all LD statistics is the difference between the observed and the expected haplotype frequencies,
$$D_{ab} = (p_{AB} - p_A p_B)$$
The distinction between these statistics lies in the scaling of this difference (Flint-Garcia et al., 2003).
The first of the two measures, $r^2$, also described in the literature as $\Delta^2$, is calculated as
$$r^2 = \frac{(D_{ab})^2}{p_A p_a p_B p_b}$$
It is convenient to consider $r^2$ as the square of the correlation coefficient between the two loci. However, unless the two loci have identical allele frequencies, a value of 1 is not possible. Statistical significance (P-value) for LD is usually calculated using either Fisher's exact test to compare sites with two alleles at each locus or multifactorial permutation analysis to compare sites with more than two alleles at either or both loci.
Alternatively, the LD statistic $D'$ (Lewontin, 1964) is calculated as
$$|D'| = \frac{|D_{ab}|}{\min(p_A p_b,\ p_a p_B)} \quad \text{for } D_{ab} > 0$$
$$|D'| = \frac{|D_{ab}|}{\min(p_A p_B,\ p_a p_b)} \quad \text{for } D_{ab} < 0$$
$D'$ is scaled based on the observed allele frequencies, so it will range between 0 and 1 even if allele frequencies differ between the loci. $D'$ will only be less than 1 if all four possible haplotypes are observed; hence, a presumed recombination event has occurred between the two loci.
The statistics $r^2$ and $D'$ reflect different aspects of LD and perform differently under various conditions. Figure 6.3 presents three
LD when the polymorphisms are not completely correlated, but there is no evidence of recombination. One way this type of LD structure can develop is when the mutations occur on different allelic lineages. This situation can reflect the same recombinational history but different mutational histories. This is the situation in which $r^2$ and $D'$ act differently, with $D'$ still equal to 1, but where $r^2$ can be much smaller. Figure 6.3C shows an example of polymorphisms in linkage equilibrium. If the sites are linked, then equilibrium could be produced by a recombination event between the two sites. In this case, the recombinational history differs for the various haplotypes but the mutational history is the same. Hence, both $r^2$ and $D'$ will be zero.
Although neither $r^2$ nor $D'$ performs extremely well with small sample sizes and/or low allele frequencies, each has distinct advantages. Whereas $r^2$ summarizes both recombinational and mutational history, $D'$ measures only recombinational history and is therefore the more accurate statistic for estimating recombination differences. However, $D'$ is strongly affected by small sample sizes, resulting in highly erratic behaviour when comparing loci with low allele frequencies. This is due to the decreased probability of finding all four allelic combinations of low-frequency polymorphisms even if the loci are unlinked. For the purpose of examining the resolution of association studies, the $r^2$ statistic is preferred, as it is indicative of how markers might correlate with the QTL of interest.
There are two common ways to visualize the extent of LD between pairs of loci
[Figure 6.3, panels A to C. Each panel shows a 2 × 2 table of haplotype counts for Locus 1 and Locus 2 (alleles 1 and 2). Panel A: counts 6, 0 / 0, 6; |D'| = 1, r2 = 1. Panel B: counts 6, 0 / 3, 3; |D'| = 1, r2 = 0.33. Panel C: counts 3, 3 / 3, 3; |D'| = 0, r2 = 0.]
Fig. 6.3. Hypothetical scenarios of linkage disequilibrium (LD) between linked polymorphisms caused by different mutational and recombinational histories, demonstrating the behaviour of the r2 and D' statistics. Images in the left column represent the allelic states of two loci. The middle column represents the 2 × 2 contingency table of haplotypes and the resulting r2 and D' statistics. The right column represents a possible tree responsible for the observed LD present. (A) An example of absolute LD, where the two polymorphisms are completely correlated with one another. (B) An example of LD when the polymorphisms are not completely correlated, but there is no evidence of recombination. (C) An example of when polymorphisms are in linkage equilibrium. Modified from Rafalski (2002).
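As a quick check on the three panels of Fig. 6.3, the two statistics can be computed directly from the haplotype counts. The sketch below is illustrative only: the function name `ld_stats` and its argument order are invented for the example, $D'$ is taken as $|D_{ab}|$ divided by its maximum attainable value given the allele frequencies (Lewontin's scaling), and both loci are assumed to be polymorphic so that no denominator is zero.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2, |D'|) from the four haplotype counts of two biallelic loci."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at locus one
    p_B = (n_AB + n_aB) / n          # frequency of allele B at locus two
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    d = p_AB - p_A * p_B             # observed minus expected haplotype frequency
    r2 = d * d / (p_A * p_a * p_B * p_b)
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(d) / d_max if d != 0 else 0.0
    return d, r2, d_prime


# Haplotype counts read from the three panels of Fig. 6.3.
panel_a = ld_stats(6, 0, 0, 6)   # absolute LD
panel_b = ld_stats(6, 0, 3, 3)   # only three haplotypes observed
panel_c = ld_stats(3, 3, 3, 3)   # linkage equilibrium
```

For panel B the sketch gives $|D'| = 1$ and $r^2 \approx 0.33$, matching the values printed in the figure: only three of the four haplotypes are present, so $D'$ stays at 1 while $r^2$ drops because the allele frequencies at the two loci differ.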
random variation in LD owing to a variety of forces discussed below.
The limits of linkage analysis and LD mapping when they are used alone can be overcome by a joint mapping strategy as demonstrated by Wu and Zeng (2001) in which a random sample from a natural population and the open-pollinated progeny of the sample were analysed jointly. The joint linkage and LD mapping strategy was extended to map QTL segregating in a natural population (Wu et al., 2002b). The extension allows for simultaneous estimates of a number of genetic and genomic parameters including the allele frequency of QTL, its effects, its location and its population association with a known marker locus.
6.8.3 Factors affecting linkage disequilibrium
In a large, randomly mated population with loci segregating independently, but in the absence of selection, mutation or migration, polymorphic loci will be in linkage equilibrium (Falconer and
for control individuals or for the opposite extreme. The TDT has been extended to study haplotype transmissions, quantitative traits, the use of sib pairs rather than parents and progeny and information from extended pedigrees.
In crops, parental and progeny lines are usually separated by several generations of gametogenesis rather than by one. In this case, the TDT is still valid, but might no longer be so robust: the process of breeding might itself distort segregation patterns. A family-based association test that is applicable to plant breeding programmes has been proposed by Stich et al. (2006). The authors point out that for candidate gene studies, this method is more cost-effective than the alternative methods described below given that no additional control markers are required. However, some power will be lost because only progeny derived from F1s known to have a heterozygous marker genotype are informative. Laird and Lange (2006) reviewed TDT and other family-based association tests.
Structured association
Structured association provides a sophisticated approach to detecting and controlling population structure (Pritchard et al., 2000a,b; Falush et al., 2003; Mackay and Powell, 2007). To deal with non-functional, spurious associations between a phenotype and an unlinked candidate gene caused by the presence of population structure and unequal distribution of alleles within sub-populations, several methods have been proposed. Pritchard et al. (2000b) proposed a method of testing association that depends on the inferred ancestries of individuals. Ancestries were inferred by a Bayesian method proposed by Pritchard et al. (2000a). Thornsberry et al. (2001) extended this method to deal with a quantitative trait and studied a candidate gene for the control of flowering time in maize.
The computer program STRUCTURE (http://pritch.bsd.uchicago.edu/software/structure2_1.html; Pritchard et al., 2000a) uses computationally intensive methods to partition individuals into populations given molecular marker data. Many individuals or lines will not belong uniquely to one population, but will be the descendants of crosses between two or more ancestral populations. STRUCTURE also estimates the proportion of ancestry attributable to each population. Following allocation of individuals to populations, the test for association is carried out in a model fitting exercise. Here, the principle is that variation attributable to population membership is accounted for first, using estimates of population membership from STRUCTURE, and then the presence of any residual association between the marker and phenotype is tested. For example, to test for association between a quantitative trait and a microsatellite, the trait is first regressed on the estimated coefficients of population membership and then on the marker coded as a factor as if in an analysis of variance (Aranzana et al., 2005). Alternatively, the groups can be integrated as an extra factor or a set of covariables in a statistical model relating phenotype to genotype (Thornsberry et al., 2001; Wilson et al., 2004).
As a valid alternative to the use of STRUCTURE, classical multivariate analysis methods can be used to classify genotypes. In that case a matrix of genetic/genotypic distances is calculated from molecular marker information and used as input for clustering and/or scaling techniques (Ivandic et al., 2002; Kraakman et al., 2004). For collections of cultivars and breeding lines, genotypic relationships as obtained from the pedigree or from similarities in neutral marker profiles (Yu et al., 2006) can be translated into distances that are subsequently analysed by cluster analysis. The groups detected by such a cluster analysis can be interpreted as representing population structure and form an approximation to the original relationships between the genotypes as present before the grouping. The identified groups can be used as a kind of correction factor in association analyses.
Principal component analysis
A method termed EIGENSTRAT (Price et al., 2006) is based on principal component
and Sorrels, 2006a; Auzanneau et al., 2007; Crossa et al., 2007; Brown et al., 2008; D'hoop et al., 2008; Raboin et al., 2008; Weber et al., 2008; Buckler et al., 2009; Chan et al., 2009; McMullen et al., 2009; Stich et al., 2009). Earlier attempts at establishing association between traits and markers across germplasm collections concerned rice, oats, maize, sea beet and barley. In rice, Virk et al. (1996) predicted the value for six traits using multiple linear regression. In oats, Beer et al. (1997) found associations between markers and 13 quantitative traits in a set of 64 landraces and cultivars. In maize, Thornsberry et al. (2001) found associations between Dwarf8 polymorphisms and flowering time. In sea beet, Hansen et al. (2001) mapped the bolting gene, using AFLP markers in four populations.
In barley, Igartua et al. (1999) concluded that marker-trait associations for heading date, found in mapping populations, were, to some extent, maintained in 32 cultivars. Ivandic et al. (2003) found association between markers and the traits of water-stress tolerance (chromosome 4H) and powdery mildew resistance in 52 wild barley lines. Chromosome 4H is, according to Forster et al. (2000), known for many loci involving abiotic stress tolerance, including salt tolerance, water use efficiency and adaptation to drought environments.
Using 237 rice accessions collected from around the world and genotypic data for 100 restriction fragment length polymorphism (RFLP) and 60 simple sequence repeat (SSR) marker loci and phenotypic data for 12 traits, a stronger marker-marker association was found in the cultivar groups that had greater genetic variation or closer pedigree relationship (Xu, Y., 2002). Markers within linkage groups showed stronger allelic association than markers between linkage groups. The statistical associations, however, could not be interpreted solely from genetic linkage. Comparison of marker-trait association in different cultivar groups demonstrated that both phenotypic variation and pedigree relationship among rice accessions strongly influenced the association detection. A highly consistent allele-trait association was revealed among multiple alleles at a given locus. Several chromosomal regions as hot spots for marker-trait associations have been assigned to QTL clusters. Several highly consistent allele-trait associations were revealed among multiple alleles at specific loci.
The same data set was used to evaluate the potential of discriminant analysis, a multivariate statistical procedure, to detect candidate markers associated with agronomic traits (Zhang et al., 2005). Model-based methods revealed population structure among the lines. Marker alleles associated with all traits were identified by discriminant analysis at high levels of correct percentage classification within sub-populations and across all lines. Associated marker alleles pointed to the same and different regions on the rice genetic map when compared to previous QTL mapping experiments. Results suggested that candidate markers associated with agronomic traits can be readily detected among inbred lines of rice using discriminant analysis combined with other methods.
Using 236 AFLP markers and 146 modern two-row spring barley cultivars, associations between markers were found for markers as far apart as 10 cM (Kraakman et al., 2004). Subsequently, for the 146 cultivars the complex traits mean yield, adaptability (Finlay-Wilkinson slope) and stability (deviations from regression) were estimated from the analysis of cultivar trial data. Regression of those traits on individual marker data disclosed marker-trait associations for mean yield and yield stability. Many of the associated markers were located in regions where earlier QTL were found for yield and yield components. In tetraploid potato, LD mapping has been successfully applied to studying disease resistances for which candidate genes were defined (Gebhardt et al., 2004; Simko et al., 2004a,b).
Historical multi-environmental trial data provide comprehensive phenotypic data for LD mapping and modelling genotype-by-environment interaction. Crossa et al. (2007) reported a comprehensive study using historical wheat data. Mapped diversity array technology (DArT) markers (Chapter 2) were used to find association with resistance to stem rust, leaf rust, yellow rust and powdery mildew, plus grain yield
location of a QTL and its effect as compared with any single study. However, there are many challenges in combining the results of QTL mapping across studies, including differences in marker density, linkage map, sample size, study design, as well as statistical methods used. One aspect that might transcend the meta-analysis problem and benefit the whole field of QTL detection and location is the reliability of the principal parameters which characterize QTL: position, confidence interval, $R^2$ and LOD score (Hanocq et al., 2007). These parameters are critical to the meta-analysis process but are often only partially reported in research papers.
the curvature (Fisher information) of the log-likelihood profile at the estimated map position
$$s_i = \left[-\left.\frac{\partial^2 \ln L}{\partial d^2}\right|_{d = \hat{d}_i}\right]^{-1/2}$$
In particular, the curvature is estimated by fitting a local quadratic near the maximum of ln L and determining the coefficient of the quadratic term. These standard errors are used to construct a weighted estimate of QTL location, the weights being inversely proportional to the squared standard errors ($w_i = s_i^{-2}$).
For studies that did not include an interval map, average standard errors
$$\bar{s} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} s_i^2}$$
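The weighting scheme described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the procedure of any particular package: the standard error is taken as $(-\partial^2 \ln L/\partial d^2)^{-1/2}$, with the second derivative obtained from a quadratic fitted through three points of the log-likelihood profile; the weights are $w_i = s_i^{-2}$; the standard error attached to the weighted location, $(\sum_i w_i)^{-1/2}$, is the usual inverse-variance result and is not stated in the text; and the average standard error is assumed to be a root mean square. All function names and numbers are hypothetical.

```python
import math


def se_from_curvature(points):
    """Standard error from three (position, lnL) points around the maximum."""
    (d0, l0), (d1, l1), (d2, l2) = points
    # Coefficient of the quadratic term (second divided difference).
    c = ((l2 - l1) / (d2 - d1) - (l1 - l0) / (d1 - d0)) / (d2 - d0)
    if c >= 0:
        raise ValueError("profile is not concave near the maximum")
    # Second derivative of the fitted quadratic is 2c, so s = (-2c)^(-1/2).
    return (-2.0 * c) ** -0.5


def weighted_location(positions, std_errors):
    """Inverse-variance weighted QTL position and its standard error."""
    weights = [s ** -2 for s in std_errors]
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, positions)) / total
    return mean, total ** -0.5


def average_se(std_errors):
    """Root-mean-square standard error (assumed form of the average)."""
    return math.sqrt(sum(s * s for s in std_errors) / len(std_errors))


# Hypothetical profile: lnL = -(d - 50)^2 / (2 * 4^2), so the SE should be 4 cM.
se_demo = se_from_curvature([(48.0, -0.125), (50.0, 0.0), (52.0, -0.125)])
# Two hypothetical studies placing a QTL at 49 and 52 cM with SEs of 2 and 4 cM.
loc_demo, loc_se_demo = weighted_location([49.0, 52.0], [2.0, 4.0])
avg_demo = average_se([2.0, 4.0])
```

The more precise study dominates the consensus: with these made-up numbers the weighted position is 49.6 cM, much closer to the study with the smaller standard error.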
congruency. Even if the average number of QTL per experiment is around four in plants (Kearsey and Farquhar, 1998; Xu, Y., 2002; Chardon et al., 2004), it would be expected that more than four genes can be involved in the trait variation on a single chromosome.
A meta-analysis of flowering time and related traits in maize from 22 QTL detection studies concluded that a total of 62 different QTL are likely to be involved in the variation of these traits, whereas on average four to five QTL were detected in single-population analyses (Chardon et al., 2004). To remove these impediments, Veyrieras et al. (2007) developed a new two-stage meta-analysis procedure in order to integrate multiple independent QTL mapping experiments with the aim of creating a global framework to evaluate the homogeneity of both genetic marker and QTL mapping results from literature and public databases. First, it implements a new statistical approach to merge multiple distinct genetic maps into a single consensus map which is optimal in terms of weighted least squares and can be used to investigate recombination rate heterogeneity between studies. Secondly, assuming that QTL can be projected on the consensus map, METAQTL, a computational and statistical package developed for the whole-genome meta-analysis of QTL mapping experiments,
behind this meta-analysis is to estimate the variance of these effects, $\sigma_A^2$. Next assume that for each sire in the available studies, the estimate of the QTL allelic substitution effect, $a_i$, is $\hat{a}_i$ with corresponding standard error $V_i = \mathrm{se}(\hat{a}_i)$ and variance $V_i^2$, i = 1, 2, ..., n, where n is the number of sires. To model the imprecision of $\hat{a}_i$ estimating $a_i$, we assume that $\hat{a}_i \mid a_i \sim N(a_i, V_i^2)$ and consequently, the unconditional distribution of estimated effects will be $\hat{a}_i \sim N(0, V_i^2 + \sigma_A^2)$. As also considered by Hayes and Goddard (2001), there are two other features that need to be modelled in the meta-analysis. First, since it is to a certain extent arbitrary which sire allele is labelled as having a positive effect, we will ignore the sign and condition on $a_i > 0$ and $\hat{a}_i > 0$. Secondly, only significant QTL tend to be published (resulting in potential publication bias), so we assume that $\hat{a}_i > c$ where c is the threshold QTL effect that just reaches publication level. With these constraints, the probability density function, $h(\cdot)$, for the observed QTL effects will be
$$h(\hat{a}_i \mid a_i > c) = n_i(\hat{a}_i)/[1 - N_i(c)], \quad \hat{a}_i > c$$
where say,
$$n_i(y) = \frac{1}{\sqrt{2\pi (V_i^2 + \sigma_A^2)}} \exp\left(-\frac{y^2}{2(V_i^2 + \sigma_A^2)}\right)$$
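The truncated density above is straightforward to evaluate numerically. The following Python sketch is illustrative only: it takes $N_i$ to be the cumulative distribution function belonging to the normal density $n_i$ (an assumption, since its definition is not shown here), and the function names and the numerical values of $V_i$, $\sigma_A^2$ and c are made up for the example.

```python
import math


def normal_pdf(y, var):
    # Density of N(0, var), i.e. n_i(y) with var = V_i^2 + sigma_A^2.
    return math.exp(-y * y / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(y, var):
    # Cumulative distribution function of N(0, var), used here as N_i.
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0 * var)))


def h(a_hat, c, v_i, var_a):
    """Density of a reported effect a_hat, given that only effects above c are published."""
    if a_hat <= c:
        return 0.0
    var = v_i ** 2 + var_a
    return normal_pdf(a_hat, var) / (1.0 - normal_cdf(c, var))


# Hypothetical values: standard error 0.3, sigma_A^2 = 0.5, threshold c = 0.4.
V_I, VAR_A, C = 0.3, 0.5, 0.4

# Midpoint-rule check that h integrates to about 1 over (c, infinity).
upper = C + 10.0 * math.sqrt(V_I ** 2 + VAR_A)
steps = 20000
width = (upper - C) / steps
area = sum(h(C + (k + 0.5) * width, C, V_I, VAR_A) for k in range(steps)) * width
```

Dividing by $1 - N_i(c)$ rescales the part of the normal curve that lies above the publication threshold so that it is again a proper density, which is why `area` comes out close to 1.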
this was not reported. Consequently, the consensus estimate of $\sigma_A^2$ will be the proportion of the phenotypic variance explained by the consensus QTL.
6.9.4 Examples of meta-analysis
Meta-analysis of all identified QTL promises to contribute to our understanding of fundamental questions and to expedite crop improvement. Khatkar et al. (2004) reviewed the results of QTL mapping in dairy cattle. Based on the information available in the public domain, they developed an online QTL map for milk production traits. To extract the most information from these published records, a meta-analysis was conducted to obtain consensus on QTL location and allelic substitution effect of these QTL. The meta-analysis indicated a number of consensus regions, the most striking being two distinct regions affecting milk yield on chromosome 6 at 49 cM and 87 cM explaining 4.2 and 3.6% of the genetic variance of milk yield, respectively. Outputs from such analyses highlight the specific areas of the genome where future resources should be directed to refine characterization of the QTL.
To identify the genome regions of bread wheat involved in the control of earliness and its three components (photoperiod sensitivity, vernalization requirement and intrinsic earliness), Hanocq et al. (2007) carried out a QTL meta-analysis to examine the replicability of QTL across 13 independent studies and to propose meta-QTL. QTL were projected on to the reference map using the BIOMERCATOR 2.0 software (Arcade et al., 2004). To assess the reliability of this projection, five variables were calculated for each QTL: (i) the percentage of QTL confidence interval (CI) included in the linked region; (ii) Nm (the number of common markers characterizing a QTL CI region, i.e. within and flanking it); (iii) local map density (which is computed as the local average distance of the Nm markers on the projected map); (iv) maximum gap size or the size of the largest interval between adjacent markers when considering the Nm markers; and (v) the weighted standard deviation standardized to 100 cM (which evaluates the heterogeneity in homothetic coefficients for intervals within and flanking a QTL CI region). Chromosomes of groups 2 and 5 had greater control over the incidence of earliness as they carry the known, major genes Ppd and Vrn. The other four chromosome regions played an intermediate role in control of earliness.
In cotton, a total of 432 QTL involving cotton fibre quality, leaf morphology, flower morphology, resistance to bacteria, trichome distribution and density and other traits that were mapped in one diploid and ten tetraploid interspecific cotton populations were aligned using a reference map which consisted of 3475 loci in total and was depicted in a CMAP resource (Rong et al., 2007). Meta-analysis of polyploid cotton QTL showed unequal contributions of sub-genomes to a complex network of genes and gene clusters implicated in lint fibre development. QTL correspondence across studies was only modest, suggesting that additional QTL for the target traits remain to be discovered. Crosses between closely related genotypes differing by single-gene mutants yield profoundly different QTL landscapes, suggesting that fibre variation involves a complex network of interacting genes. Meta-analysis linked to synteny-based and expression-based information provides clues about specific genes and families involved in QTL networks.
Munafò and Flint (2004) described how meta-analysis works and considered whether it will solve the problem of underpowered studies or whether it is another affliction visited by statisticians on geneticists. A crucial question for any meta-analysis is the degree of heterogeneity that exists between the individual studies, which is, perhaps not surprisingly, common. Ioannidis et al. (2001) conducted a meta-analysis of 370 studies addressing 36 genetic associations. They found that significant between-study heterogeneity is frequent and that the results of the first study often correlate
only modestly with subsequent research on the same association. It has been argued that meta-analysis is analogous to averaging the characteristics of apples and oranges (Hunt, 1997) and consequently, its outcome is meaningless. Another concern in meta-analysis is publication bias that can exist when non-significant findings remain unpublished, thereby artificially inflating the apparent magnitude of the effect. The concern is not new and was raised in the late 1950s in relation to psychiatric and psychological research in humans (Sterling, 1959).
As indicated by Munafò and Flint (2004), meta-analysis has been successful in revealing unexpected sources of heterogeneity, such as publication bias. If heterogeneity is adequately recognized and taken into account, meta-analysis can confirm the involvement of a genetic variant, but it is not a substitute for an adequately powered primary study.
6.10 In Silico Mapping
As an alternative to designed mapping experiments using an F2 or BC mapping population, in silico mapping was developed to detect genes by simultaneously exploiting existing phenotypic, genotypic and pedigree data available in breeding programmes and genomic databases. Grupe et al. (2001) were the first to use this approach to investigate whether chromosomal regions regulating quantitative traits (QTL intervals) could be computationally predicted with the use of the mSNP database and available phenotypic information obtained from mouse inbred strains. The phenotypic and genotypic information was analysed in silico to identify candidate QTL intervals. The ability of the computational method to correctly predict QTL intervals was evaluated, and 19 of 26 experimentally verified QTL intervals for ten phenotypic traits were correctly identified. In silico mapping can eliminate many months to years of laboratory work required to generate, characterize and genotype intercross progeny, reducing the time required for QTL interval identification to milliseconds when a large number of related data become available.
6.10.1 Pros and cons
As massive amounts of phenotypic data for different traits have accumulated in public and private plant breeding programmes in major crop species, in silico mapping in plants has become possible and attractive. Compared with designed mapping experiments, in silico mapping has several advantages (Grupe et al., 2001; Parisseaux and Bernardo, 2004). First, in silico mapping exploits larger populations than designed mapping experiments. In maize, for example, thousands of experimental hybrids are evaluated each year (Smith et al., 1999). In contrast, the small populations (e.g. fewer than 500 progenies) often used in designed mapping experiments lead to a low power for detecting QTL (Melchinger et al., 1998), overestimation of QTL effects (Beavis, 1994) and imprecise estimates of QTL location (van Ooijen, 1992; Visscher et al., 1996). Secondly, phenotypic data used for in silico mapping are obtained through more extensive testing under multiple, diverse environments. An experimental maize hybrid is typically evaluated in 20 environments; those that are eventually released as cultivars are evaluated in up to 1500 location-year combinations (Smith et al., 1999). The use of many environments permits the sampling of a sufficient set of QTL × environment interactions. Thirdly, the hybrids and inbreds tested typically represent a wide sample of the germplasm and genetic backgrounds. In contrast, only a narrow genetic background is exploited in designed mapping experiments that use F2 or BC populations. Fourthly, the data used for in silico mapping are already available without extra cost.
Offsetting these advantages are three main complications to in silico mapping (Parisseaux and Bernardo, 2004). First, the performance data are highly unbalanced:
the same set of hybrids or inbreds are evaluated in a different set of environments, as some hybrids or inbreds that fail to perform well are discarded and those that perform well are subjected to more testing. Secondly, the hybrids or inbreds do not comprise a single homogeneous population. Any in silico mapping procedure would therefore have to account for pedigree relationships and differences in the genetic backgrounds among tested hybrids or inbreds. Thirdly, few crops have enough data available for in silico mapping.

6.10.2 Mixed-model approach

The usefulness of in silico mapping has been explored via a mixed-model approach in maize (Zea mays L.) to determine whether the procedure gave results that were repeatable across populations (Parisseaux and Bernardo, 2004). Multi-location data were obtained from the 1995–2002 hybrid testing programme of Limagrain Genetics in Europe, which included: (i) multi-location phenotypic data for 22,774 single-cross hybrids; (ii) SSR marker data at 96 loci for the 1266 parental inbreds of the single-cross hybrids; and (iii) pedigree records for the 1266 parental inbreds, which were classified into nine different heterotic groups. Using a mixed-model approach, the general combining ability effect associated with marker alleles in each heterotic pattern was estimated. The numbers of marker loci with significant effects (37 for plant height, 24 for smut (Ustilago maydis (DC.) Cda.) resistance and 44 for grain moisture) were consistent with previous results from designed mapping experiments. Each trait had many loci with small effects and few loci with large effects. For smut resistance, a marker in bin 8.05 on chromosome 8 had a significant effect in seven (out of a maximum of 18) instances. For this major QTL, the maximum effect of an allele substitution ranged from 5.4% to 41.9%, with an average of 22.0%. It is concluded that in silico mapping via a mixed-model approach can detect associations that are repeatable across different populations.

Because of differences in the germplasm used, the numbers of QTL identified through in silico mapping were not directly comparable with those previously detected through designed mapping experiments. On the one hand, the wide range of germplasm sampled with in silico mapping enhances the detection of many QTL. On the other hand, mapping populations are often developed by crossing two parents that are widely divergent for a trait, e.g. a susceptible parent and a resistant parent for smut. A diverse mapping population also enhances the detection of many QTL. In the largest QTL mapping study published in maize (976 families from an F2 population, genotyped with 172 markers and evaluated in 19 environments), Openshaw and Frascaroli (1997) detected 36 significant markers for plant height and 32 for grain moisture (data for smut resistance were absent). This result for plant height (36 QTL) was consistent with the number of significant markers detected for plant height (37) via in silico mapping. The number of significant markers (44) for grain moisture was larger than that detected by Openshaw and Frascaroli (1997), perhaps because of a wider range of maturities sampled in the in silico mapping germplasm than in the single F2 population used by Openshaw and Frascaroli (1997). For smut resistance, Lübberstedt et al. (1998a) detected 19 significant markers across four populations, whereas Kerns et al. (1999) detected 22 significant markers in one population. These previous results were consistent with the number of significant markers (24) detected for smut resistance in in silico mapping.

6.10.3 Statistical power

It has been shown that the heritability and genetic architecture (e.g. number of QTL and distribution of effects) of the trait and resources available for QTL mapping (e.g. sample size and number of markers) affect
the statistical power of designed QTL mapping experiments as discussed in this chapter. These genetic and non-genetic factors are also expected to affect the power of in silico mapping via a mixed-model approach.

The statistical power of the in silico mapping method was evaluated via a mixed-model approach in hybrid crops (Yu et al., 2005). Simulation mimicked a two-stage breeding process in maize, with inbred development and hybrid testing. First, two opposite heterotic groups were considered, each having a total of n1 = n2 = 112 inbreds developed from different ancestral inbreds. Secondly, it was assumed that n = 600 or 2400 hybrids, among all potential single-cross hybrids (112 × 112 = 12,544) between the two heterotic groups, had data available from multi-location performance trials. The number of inbreds in each heterotic group and the number of hybrids with available phenotypic data were chosen to agree with the empirical data of Parisseaux and Bernardo (2004).

A total of 64 simulation experiments was conducted. These 64 experiments had contrasting values of six different parameters: level of initial LD (t = 10 or 20 generations of random mating), significance level (α = 0.01 or 0.0001), number of QTL (l = 20 or 80), heritability (H = 0.40 or 0.70), number of markers (m = 200 or 400) and sample size (n = 600 or 2400 hybrids). For each experiment, 50 runs were conducted with different locations of QTL and markers on the genetic map and different inbreds and hybrids.

It was found that the average power to detect QTL ranged from 0.11 to 0.59 for a significance level of α = 0.01 and from 0.01 to 0.47 for α = 0.0001. The false discovery rate ranged from 0.22 to 0.74 for α = 0.01 and from 0.05 to 0.46 for α = 0.0001. As with designed mapping experiments, a large sample size, high marker density, high heritability and small number of QTL led to the highest power for in silico mapping via a mixed-model approach. The power to detect QTL with large effects was greater than the power to detect QTL with small effects. It is concluded that gene discovery in hybrid crops can be initiated by in silico mapping. Finding an acceptable compromise, however, between the power to detect QTL and the proportion of false QTL would be necessary.

In plant breeding programmes, the phenotypic data are highly unbalanced and the inbreds and hybrids have a pedigree structure. In silico mapping via a mixed-model approach accommodates unbalanced data, pedigree relationships and different heterotic groups of parental inbreds by fitting relevant terms in the mixed model. Furthermore, the relative effects of the QTL are measured by the regression coefficients of the significant markers and the approximate positions of the QTL are indicated by the location of the significant markers.

As with other QTL mapping methods, the results from in silico mapping should be followed by fine mapping at the target regions, sequence analysis and functional tests of gene effects (Glazier et al., 2002). In hybrid crops for which multiple heterotic groups exist, in silico mapping via a mixed-model approach can be applied to different heterotic patterns. Subsequently, the markers or the genomic regions that show a repeatable association with the trait of interest across different populations can be considered as the prime targets for further analysis (Parisseaux and Bernardo, 2004). Cross validation by conducting in silico mapping in multiple heterotic patterns would result in better control of the overall false discovery rate and provide increased confidence for conducting further investigation in putative QTL regions.

6.11 Sample Size, Power and Thresholds

6.11.1 Power and sample size

There are two types of errors that can be made when carrying out a statistical test. A false positive (a Type I error) occurs when the null hypothesis is rejected when in fact it is correct. We control for this by setting
a low significance level $\alpha$ for a test (the probability of a false positive). The other source of error is a false negative (a Type II error), i.e. failing to reject the null hypothesis when in fact it is false. The power of a test is defined to be the probability that [...]

[...] hypothesis $t = 0$ and $\Phi(x)$ is the standard normal cumulative distribution function. For given $\alpha$ and $\beta$, the sample size $n$ required for the test is

$$ n = \left( \frac{z_\alpha + z_\beta}{\ldots} \right)^{2} \ldots $$
Now it remains to determine the likely magnitudes of $2a/\sigma_e$. Suppose that a QTL contributes a proportion $f$ of the genetic variance $\sigma_g^2$ in an F2 population. Assuming that no other genes are linked to the QTL and ignoring dominance ($d = 0$),

$$ \frac{(2a)^2}{8\sigma_e^2} = f\,\sigma_g^2/\sigma_e^2 $$

$\sigma_g^2/\sigma_e^2$ is an unknown quantity. For example, assuming $h^2_{F_2} = \sigma_g^2/(\sigma_g^2 + \sigma_e^2) = 0.6$ means

$$ \frac{\sigma_g^2}{\sigma_e^2} = 1.5 \quad \text{and} \quad \frac{(2a)^2}{\sigma_e^2} = 12f $$

Given that $\alpha^* = 0.001$ and $\beta = 0.1$ ($z_{0.001} + z_{0.1} = 3.09 + 1.28 = 4.37$), the required sample sizes for detecting leading QTL for $f$ = 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4 and 0.5 are $n$ = 1653, 826, 330, 165, 82, 55, 41 and 33.

Effects of dominance

Depending on the degree of the dominance effect, the sample size required for detecting a dominance effect may need to be substantially increased. Dominance does not, however, affect the calculation of the power of detecting QTL. For example, suppose $d = a$. In this case we may use

$$ t_3 = \frac{\mu_{M_1\_} - \mu_{M_2M_2}}{\sqrt{\dfrac{\sigma^2}{3n/4} + \dfrac{\sigma^2}{n/4}}} = \frac{(1 - 2r)\,2a}{\sqrt{16\sigma^2/(3n)}} $$

But because of dominance

$$ \frac{3(2a)^2}{16} = f\,\sigma_g^2 $$

Thus as long as $f$, the proportion of the genetic variation attributed to the QTL, is fixed, the required sample size for the test is unchanged.

Effect of linkage: multiple linked QTL

There are two issues that need to be considered:

1. Detection of QTL on the chromosome: for two linked QTL, if the model is misidentified (two QTL analysed as one), the power to identify the one QTL is based on the joint effect of the QTL (a weighted sum). If the two QTL are in coupling linkage, the joint effect is aggregated and thus power is increased. If the two QTL are in repulsion linkage, the joint effect is reduced. Thus power is decreased and it can be very low. However, if the model can be identified correctly (searching for two QTL or conditional searching), the issue is about separating linked QTL and the power to identify repulsion-linked QTL is not necessarily very low.
2. Separating linked QTL (identifying both QTL): the required sample size is increased by a factor (Zeng, 1993)

$$ \frac{\sigma_i^2}{\sigma_{i \cdot j}^2} = \frac{1/4}{r(1 - r)} $$

where $\sigma_i^2$ is the variance of marker $i$ and $\sigma_{i \cdot j}^2$ is the variance of marker $i$ conditional on marker $j$.

The values for these factors corresponding to the recombination frequency, $r$, between the two QTL are shown at the bottom of the page.
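The factor of Zeng (1993) quoted above is easy to tabulate. The following is a minimal sketch, not part of the original text: the function name `separation_factor` and the chosen values of r are illustrative assumptions, and only the expression (1/4)/(r(1 − r)) is taken from the text.

```python
def separation_factor(r):
    """Sample-size inflation factor (1/4) / (r * (1 - r)) for separating two linked QTL.

    r is the recombination frequency between the two QTL, 0 < r <= 0.5.
    """
    if not 0.0 < r <= 0.5:
        raise ValueError("recombination frequency must satisfy 0 < r <= 0.5")
    return 0.25 / (r * (1.0 - r))


# Illustrative recombination frequencies (chosen here, not taken from the source table).
for r in (0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01):
    print(f"r = {r:<5} factor = {separation_factor(r):.2f}")
```

For unlinked QTL (r = 0.5) the factor is 1, so no extra progeny are needed; it grows without bound as r approaches zero, which is why very tightly linked QTL are so hard to separate.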
QTL detection and power calculation depend on the QTL mapping analysis procedure: CIM is more powerful than simple interval mapping; MIM is more powerful than CIM. The power of the test can be increased by combining information from multiple related traits, multiple crosses and multiple environments. The genetic structure becomes more complex in this case and so does the statistical analysis. But there are definite advantages in the joint multiple trait analysis for QTL identification (Jiang and Zeng, 1995) and of course for hypothesis testing (pleiotropy) and parameter estimation.

There are many factors that determine how large a sample size is required for a specific QTL mapping experiment. The sample size depends on the heritability of the trait of interest (any knowledge or guess), how large an effect of a QTL (as a minimum) is expected to be detected (for example, detect a QTL that explains 5% variation), what is the likely complexity of the genetic architecture of QTL and how many QTL, distribution of effects, epistasis, etc.

6.11.2 Cross validation and sample size

Cross validation (CV) is a re-sampling technique that samples from a genetic cross (e.g. F2 intercross) and divides a large-size sample into several subsamples (e.g. k = 5). As an example, cross validation that was used for a sample size study (Melchinger et al., 2004) will be discussed in this section.

Melchinger et al. (2000) compiled a literature survey based on 45 published QTL studies in crops encompassing 34 complex traits. Sample sizes ranged between 60 and 380 with a median of 150. In most studies only a small number of QTL (median of six) were detected, which generally explained a surprisingly large proportion (50% and more) of the genetic variance. Although these findings seem to contradict Fisher's (1918) infinitesimal model upon which quantitative genetics is based, Beavis (1998) conjectured from simulation results that different conclusions might be drawn if larger experimental populations were evaluated.

Experimental QTL studies in plant species have been inadequate for drawing inferences about numbers, magnitudes and distribution of QTL for most quantitative traits. Unless large numbers of progeny are evaluated for QTL, MAS will have minimal impact on plant breeding (Gimelfarb and Lande, 1994a) and new breeding strategies based on evaluation of large numbers of progeny will be necessary to realize the potential of MAS.

To test this hypothesis, the largest QTL experiment available in plants, conducted by Pioneer Hi-Bred in maize, was analysed in detail by Schön et al. (2004). This study consisted of testcrosses of 976 F4:5 lines derived from the cross of two elite lines. These materials together with testcrosses of their parents were evaluated in 16 environments. The F4:5 lines were assayed with 172 RFLP markers covering the entire genome. With the entire data set (comprising N = 976 genotypes and E = 16 environments) the number of detected QTL confirmed the infinitesimal model of quantitative genetics (e.g. 30 QTL detected with LOD ≥ 2.5 for plant height, explaining 61% of the genetic variance).

For studying the effects of sample size as well as genotypic and environmental sampling on the outcome of QTL analyses, the entire data set was partitioned into smaller data sets with N = 488, 244, 122 and E = 16, 4, 2. After randomization of genotypes and environments, the partitioning of the experimental data reference population, PED(N, E), was repeated to obtain a total of 120 different small data sets for given values of N and E. Within each PED(N, E), heritabilities were estimated and QTL analyses were performed for each data set with LOD = 2.50 and 3.21. Fivefold CV accounting for genotypic sampling was applied by subdividing each small data set into five genotypic samples. Four genotypic samples were used as the estimation set for localization of QTL and estimation of their effects. The fifth sample was used as a test set to obtain asymptotically unbiased estimates of the proportion of the genotypic variance explained by QTL in each test set. For each small data set five
[Figure 6.5: two panels plotting the average number of detected QTL against sample size N (976, 488, 244, 122) and number of environments E (16, 4, 2); vertical axes run from 0 to 30 and from 0 to 15.]
Fig. 6.5. Average number of QTL (nQTL) detected in the estimation set of 120 different data sets and partitionings PED(N, E) using standard cross validation and LOD = 2.50 and 3.21 for plant height. From Melchinger et al. (2004) with kind permission of Springer Science and Business Media.
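The fivefold partition described in this section (four genotypic samples for estimation, the fifth as a test set) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Melchinger et al. (2004): the function name `fivefold_splits` is hypothetical, genotypes are represented only by integer identifiers, and rotating the test set through all five samples is the usual cross-validation convention, assumed here.

```python
import random


def fivefold_splits(genotype_ids, k=5, seed=0):
    """Return k (estimation_set, test_set) pairs from a random partition of the genotypes."""
    ids = list(genotype_ids)
    random.Random(seed).shuffle(ids)            # randomize genotypes before partitioning
    samples = [ids[i::k] for i in range(k)]     # k disjoint genotypic samples
    splits = []
    for i in range(k):
        test_set = samples[i]                   # one sample held out for validation
        estimation_set = [g for j, s in enumerate(samples) if j != i for g in s]
        splits.append((estimation_set, test_set))
    return splits


# Example with one small data set of N = 488 genotypes.
for estimation_set, test_set in fivefold_splits(range(488)):
    print(len(estimation_set), len(test_set))
```

QTL positions and effects would be estimated on each estimation set only, and the proportion of genotypic variance explained would then be evaluated on the matching test set, so that no genotype contributes to both.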
Visscher and Goddard (2004) derived by theory simple equations that can be used to predict any confidence interval and gave expressions for the 95% interval. This confidence interval in centimorgans is

$$ \mathrm{CI}_{(1-\beta)} \approx (200x)\,X_{1-\beta}/(nd^2) $$

with $x = 2$ for F2 and $x = 4$ for BC and $X_{1-\beta}$ the threshold of a central chi-squared distribution with 1 degree of freedom corresponding to a cumulative density of $(1 - \beta)$. For example, for BC and F2 populations, the 95% confidence intervals are predicted as

$$ \mathrm{CI95}_{BC} \approx (200)(4)(3.84)/(nd^2) = 3073/(nd^2) $$

and

$$ \mathrm{CI95}_{F_2} \approx (200)(2)(3.84)/(nd^2) = 1537/(nd^2) $$

The prediction of the CI as a function of the proportion of the variance explained by the QTL ($q$) is [...]

For example, the 95% CI for a QTL that explains 35% of the variation in either a BC or F2 population is

$$ \mathrm{CI95} \approx (200)(3.84)(1 - 0.35)/(nq^2) = 499/(nq^2) $$

A general form of the prediction of the CI for dense marker maps that also applies to other population structures is [...]

[...] can protect against one or more false positives and roughly adjust the size of each individual test. The LOD distribution under the null hypothesis at any particular location $l$ depends on design (1 degree of freedom for BC, 2 degrees of freedom for F2) with

$$ \frac{\mathrm{LOD}(l)}{0.217} = \mathrm{LR} \sim \chi^2_1 \ \text{or} \ \chi^2_2 $$

Some point-wise P-values for different levels of LR and LOD are provided in Table 6.7.

Genome-wide threshold

Assume a dense marker map with markers everywhere. LR test statistics are correlated and correlation drops off quickly with distance but there is no correlation for unlinked markers.

Based on Lander and Botstein (1989), a genome-wide threshold value, $t$, can be determined using the Ornstein–Uhlenbeck process as follows

[...]

with $\Pr(\chi^2_1 > t) = \alpha_t$, where $C$ = number of chromosomes; $G$ = length of genome in centimorgans; $t$ = genome-wide threshold value; and $\alpha_t$ = corresponding point-wise significance level.

Figure 6.6 shows LOD thresholds for different point-wise and genome-wide P-values in BC and F2 populations (Broman et al., 2003a).
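The BC and F2 confidence-interval predictions of Visscher and Goddard (2004) quoted earlier in this section follow from a single expression and can be checked numerically. The sketch below is illustrative only: the function name `predicted_ci_cm` is hypothetical, the chi-squared threshold is taken as 3.8415 (the text rounds it to 3.84), and d is treated simply as the standardized QTL effect that appears in the formula.

```python
CHI2_1DF_95 = 3.8415  # 95% point of a central chi-squared distribution with 1 df


def predicted_ci_cm(n, d, population="F2", chi2_threshold=CHI2_1DF_95):
    """Predicted confidence interval (cM): 200 * x * X / (n * d**2), x = 2 (F2) or 4 (BC)."""
    x = {"F2": 2, "BC": 4}[population]
    return 200.0 * x * chi2_threshold / (n * d ** 2)


# Constants quoted in the text: 3073/(n d^2) for BC and 1537/(n d^2) for F2.
print(round(predicted_ci_cm(1, 1.0, "BC")), round(predicted_ci_cm(1, 1.0, "F2")))

# Illustrative case (values assumed here): 500 F2 progeny, standardized effect d = 0.5.
print(f"{predicted_ci_cm(500, 0.5, 'F2'):.1f} cM")
```

Because the interval scales with 1/(n d²), halving the QTL effect requires four times as many progeny to keep the same predicted interval.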
[Figure 6.6: panels A and B, each plotting P-value against LOD threshold (0 to 6), once on a linear scale (0.00 to 0.20) and once on a logarithmic scale.]
Fig. 6.6. Point-wise and genome-wide P-values and LOD threshold for BC (A) and F2 intercross (B).
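The point-wise relationship between a LOD score and its P-value can be computed directly from the chi-squared approximation given in this section. The following is a minimal sketch using only the Python standard library; the function name `pointwise_p_value` is an assumption, and the exact constant 2 ln 10 is used in place of the rounded 1/0.217. Genome-wide P-values are not covered by this sketch.

```python
import math


def pointwise_p_value(lod, df):
    """Point-wise P-value of a LOD score, with LR = 2 ln(10) LOD ~ chi-squared(df).

    df = 1 corresponds to a backcross, df = 2 to an F2 intercross.
    """
    lr = 2.0 * math.log(10.0) * lod          # equivalent to LOD / 0.217
    if df == 1:
        return math.erfc(math.sqrt(lr / 2.0))   # upper tail of chi-squared, 1 df
    if df == 2:
        return math.exp(-lr / 2.0)              # upper tail of chi-squared, 2 df
    raise ValueError("only df = 1 (BC) or df = 2 (F2) are handled here")


for lod in (1, 2, 3, 4):
    print(lod, f"BC: {pointwise_p_value(lod, 1):.2e}", f"F2: {pointwise_p_value(lod, 2):.2e}")
```

With 2 degrees of freedom the upper tail simplifies to 10 to the power of minus LOD, so a LOD of 3 in an F2 corresponds to a point-wise P-value of 0.001.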
hypothesis by shuffling the quantitative trait values among the individuals in the data set. If there is a QTL effect at specific location(s) in the genome, there will be an association between the trait values and the point of analysis on the genetic map. If there is no QTL present in the genome, or it is unlinked to the point of analysis, there is no marker–trait association (i.e. exactly the situation described under the null hypothesis). Permutation tests resolve the problem of finding a significance threshold by simulating a large number of permutations (say 1000) of the observed data set (marker observations are shuffled with respect to traits) so that the distribution of the test statistic (LOD score) can be estimated under a null hypothesis of no relationship between traits and markers. This distribution determines how large the LOD values obtained from a particular data set by chance can be.

Shuffling trait values among individuals in the data set represents the situation under the null hypothesis (Churchill and Doerge, 1994), i.e. randomness. Shuffling the trait values does not change the summary statistics such as the number of individuals, mean and variance of the individuals. Estimating significance threshold values by permutation includes four steps (Doerge and Churchill, 1996):

1. Hold the genetic map fixed (i.e. keep the marker information from a sampled individual intact. If the individual has m and y, the elements of the vector m should be kept together and the trait values y should be shuffled over these).
2. Shuffle the trait values.
3. Analyse the shuffled data set by applying a t-test, likelihood ratio test or calculation of LOD score.
4. Store the test statistic from each analysis point of step 3 in an analysis matrix.

Repeat steps 2–4 N times.

Threshold values can be created for comparison-wise (per marker), chromosome-wise (chromosome specific) and experiment-wise (experiment specific) tests. In the computer software developed for the
incorporated into the model so that empirical thresholds for declaring significant QTL can be calculated.

[Table residue: rows "Declare QTL" (cells labelled "positives", numbered (3) and (1)) and "Fail to declare QTL"; the remaining cells are not recoverable.]
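The four permutation steps of Doerge and Churchill (1996) described earlier in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in any QTL software: marker genotypes are coded 0/1 as in a backcross, the test statistic is an absolute two-sample t statistic at each marker, the demonstration data are simulated, and every function name is hypothetical. Chromosome-wise thresholds are omitted.

```python
import math
import random


def abs_t_statistic(genotypes, trait):
    """Absolute pooled-variance t statistic between the two marker classes (coded 0/1)."""
    class0 = [y for g, y in zip(genotypes, trait) if g == 0]
    class1 = [y for g, y in zip(genotypes, trait) if g == 1]
    if len(class0) < 2 or len(class1) < 2:
        return 0.0
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    ss = sum((y - mean0) ** 2 for y in class0) + sum((y - mean1) ** 2 for y in class1)
    pooled = ss / (len(class0) + len(class1) - 2)
    if pooled == 0.0:
        return 0.0
    return abs(mean0 - mean1) / math.sqrt(pooled * (1 / len(class0) + 1 / len(class1)))


def upper_quantile(values, alpha):
    """Value exceeded by roughly a fraction alpha of the permuted statistics."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))
    return ordered[index]


def permutation_thresholds(markers, trait, n_perm=1000, alpha=0.05, seed=0):
    """Return (experiment-wise threshold, list of comparison-wise thresholds per marker)."""
    rng = random.Random(seed)
    shuffled = list(trait)               # step 1: marker vectors are never modified
    analysis_matrix = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)            # step 2: shuffle the trait values
        row = [abs_t_statistic(m, shuffled) for m in markers]   # step 3: analyse
        analysis_matrix.append(row)      # step 4: store one row per permutation
    experiment_wise = upper_quantile([max(row) for row in analysis_matrix], alpha)
    comparison_wise = [upper_quantile(col, alpha) for col in zip(*analysis_matrix)]
    return experiment_wise, comparison_wise


# Simulated demonstration data: 8 markers scored on 60 individuals, no true QTL.
demo_rng = random.Random(1)
n_individuals, n_markers = 60, 8
markers = [[demo_rng.randint(0, 1) for _ in range(n_individuals)] for _ in range(n_markers)]
trait = [demo_rng.gauss(0.0, 1.0) for _ in range(n_individuals)]
experiment_wise, comparison_wise = permutation_thresholds(markers, trait, n_perm=200)
print("experiment-wise threshold:", round(experiment_wise, 2))
print("comparison-wise thresholds:", [round(c, 2) for c in comparison_wise])
```

The experiment-wise threshold is taken from the largest statistic in each permutation, so it can never be lower than any comparison-wise threshold, which reflects the cost of testing many markers at once.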
6.11.5 False discovery rate
0.2 led to the largest responses to marker-based recurrent selection, despite the high FDR. To prevent false QTL from confusing the literature and databases, a detected QTL should, in general, be reported as a QTL only if it was identified at a stringent significance level, e.g. $\alpha_C$ ≈ 0.0001. In conclusion, the question "what proportion of declared QTL in plants are false?" cannot be answered definitely because QTL studies have used different significance levels, traits differ in the number of underlying QTL and experiments have used different types of mapping populations (e.g. BC instead of F2 populations).

As a potential alternative to the FDR, Chen and Storey (2006) proposed a generalized version of the genome-wide error rate (GWER). Rather than guarding against any single false positive linkage from occurring, the generalized GWER allows the researcher to guard against exceeding k false positive linkages, where k is chosen by the user. For example, if we set k = 1, then the goal for GWERk is to prevent more than one false positive linkage from occurring. The user can apply this significance criterion at the appropriate value of k or at several values of k. GWERk allows the user to provide a more liberal balance between true positives and false positives at no additional cost in computation or assumptions.

6.12 Summary and Prospects

QTL mapping has evolved from single marker-based, two flanking marker-based, to multiple marker-based approaches and finally to all-marker-based whole genome approaches. It started by using simple and well-characterized F2 or RIL populations derived from biparental lines and now extends to any population including those derived from multiple parents or randomly selected materials. As an increasing number of populations, maps and QTL information have been accumulating worldwide, it has become increasingly important to use all available data for meta-analysis, pooled analysis and in silico mapping.

QTL mapping has also been moving from single trait-based analysis to integrated analysis of multiple traits or even thousands of expression traits simultaneously. Methods for some specific traits, including triploid endosperm and dynamic traits across different developmental stages, and methods for complicated genetic effects, including epistasis and genotype-by-environment interaction (Chapter 10), have also been developed.

Statistical methods have been developed for almost all types of complicated situation that would be encountered in the genetic dissection of complex traits. However, most methods still remain at the stage of theory, published by statisticians who keep moving forward to new publication opportunities or optimizing their methodologies with endless efforts, leaving few feasible options for geneticists. It should be noted that the best method means nothing but statisticians' games unless it can be translated into user-friendly software.

Practically, there might be no general method that can meet all requirements. This is because, on the one hand, the simplest methods (such as phenotypic comparison of alternative marker genotypes) are the best for simple traits; and on the other hand, modelling or simulation, no matter how many parameters can be incorporated into the model, might be too complicated for complex traits that interact with various environments. Ultimately, dissection of complex traits will rely on continuous research effort in applied QTL mapping and integrated utilization of various information and materials that have been accumulating, including genetic and breeding materials (populations, structured materials), molecular markers (sequences and genes) and various phenotypic data collected across environments (years, seasons and locations).

Commercial breeding programmes grow thousands of progenies per year derived from multiple, related and unrelated crosses and evaluate them for many agronomically important traits in diverse environments. With the use of high-throughput facilities (DNA sequencers, DNA microarrays, protein
chips, etc.), these materials can be assayed in parallel with the use of genomic tools. Thus, the current limitations of classical QTL mapping studies may soon be overcome by pedigree-based and/or haplotype-based QTL mapping approaches (Jannink et al., 2001; Jansen et al., 2003). The main ideas include: (i) the exploitation of pedigree and phenotypic data, routinely collected in applied plant breeding programmes, for QTL mapping; and (ii) the sampling of full QTL variation present in a wide range of germplasm, which allows the breeders to search for the best alleles (allele mining) present in elite materials as well as in genetic resources. On the other hand, molecular dissection of complex traits will depend on the utilization of both linkage- and LD-based methods (Manenti et al., 2009; Myles et al., 2009).
7
Molecular Dissection of Complex Traits: Practice
In recent years, quantitative trait loci (QTL) research has been attracting many scientists, resulting in numerous publications every year. From the viewpoint of practice, however, one should consider the whole picture of QTL from single QTL to multiple QTL, single traits to trait complexes, homogeneous to heterogeneous genetic backgrounds and static mapping to dynamic mapping. Advances in molecular dissection of complex traits could answer the following questions: How many genes are involved in genetic control of each quantitative trait in a segregating population? Can we separate closely linked QTL into single units? How can we compare QTL across different genetic backgrounds and developmental stages? How can we handle multiple traits and expression QTL? Issues related to these questions will be addressed in this chapter. Some theoretical considerations will also be discussed as complementary to Chapter 6. For general resources, readers may refer to Xu (1997), Liu (1998), Lynch and Walsh (1998), Paterson (1998), Flint and Mott (2001), Xu, Y. (2002), Collard et al. (2005), Gibson and Weir (2005) and Wu and Lin (2006).

QTL mapping practice will usually include creating, genotyping and phenotyping mapping populations, generating genetic linkage maps and establishing marker–trait association. QTL mapping requires three major data sets, which are for molecular marker, phenotype and linkage map, plus genetic mapping software that provides an appropriate mapping procedure with user-friendly output. The linkage map can be specific to the mapping population or be inferred from physical positions available for the molecular markers.

7.1 QTL Separating

Most quantitative traits can be genetically associated with molecular markers that are located in different chromosomal regions. These regions represent either separate single QTL or multiple closely linked QTL. The number and distribution of multiple QTL on chromosomes determine their manipulability in genetics and breeding. Generally, multiple QTL affecting a specific trait have four possible distributions on chromosomes (Xu, 1997; Fig. 7.1): (i) independent QTL: genes are independently distributed on each chromosome; (ii) loosely linked QTL: genes are located on the same chromosome but separated by large distances so that they recombine with high frequency and can be easily separated; (iii) clustered QTL: genes are closely linked or clustered in a specific chromosomal region so that
Fig. 7.1. Models for QTL distribution. Three traits (A, B and C) are used as examples for independent QTL
(Trait A), loosely linked QTL (Trait B), closely linked or clustered QTL (Trait C, Chromosome II and III), and
mixed model (Trait C). Detectable QTL are indicated by circles, and their effects are represented by the cir-
cle sizes. Likelihood maps for Trait C are given at the right side of each linkage map, and for Chromosomes
I and III, two likelihood maps are given to show expected results from minor QTL mapping and regional
mapping. From Xu (1997). This material is reproduced with permission of John Wiley & Sons, Inc.
they behave as one gene with major effect; and (iv) mixed distribution: for a specific trait, QTL have a combined distribution of the three models above.

Because of continuous variation in quantitative traits, QTL genotypes cannot be easily determined by inspecting the distribution of trait phenotype alone. This is one of the fundamental problems of quantitative genetics. Historically important genetic parameters, e.g. genetic variances and heritability (Chapter 1), summarize the effects caused by all QTL but do not provide information to distinguish the effects of individual QTL. In order to understand the genetic structure of QTL and ultimately clone them, multiple QTL affecting the same trait must be mapped on to chromosomes and their effects must be well separated. Theoretically, multiple QTL can be separated into single manipulable factors by different methods including mapping and selection approaches.

7.1.1 Mapping approaches

In theory, molecular-marker-based QTL mapping can be used to draw inferences about QTL allelic differences. Normally, however, it is difficult to determine whether the effect detected with a particular molecular marker is due to one QTL with large effect or linked QTL each with relatively small effect. For this reason, the term QTL usually describes a region of a chromosome defined by linkage to a marker gene (Tanksley, 1993). Using a mapping procedure, one can partition multiple QTL into single manipulable units and determine whether a QTL is comprised of one or more genes (Fig. 7.1). This strategy
Molecular Dissection of Traits: Practice 251
depends on both the resolution of molecular maps and the mapping power for the QTL with
small effect and requires improvement of the statistical power in QTL analysis, saturation
of molecular maps and optimization of population structures.

Fine mapping

It is generally acknowledged that a typical higher plant genome includes 10,000–100,000
genes, scattered through a total of 10^8–10^10 bp of DNA. Consequently, 0.1% of the genome
would include an average of 10–100 genes. Several genes lying close together, each with a
small effect on a trait, could appear to be a single QTL of large effect (Michelmore and
Shaw, 1988; Paterson et al., 1988). Reducing the size of the regions identified as containing
QTL through fine mapping has been envisioned as an initial step in identifying single QTL
that ultimately could be manipulated using transformation (recombinant DNA) technology
(Stuber, 1994a; Tanksley et al., 1995). Current strategies for mapping QTL depend on
comparing the means of recombinant and non-recombinant classes. Given the practical size of
segregating populations (n = 200–300) and the marker density of molecular maps most
frequently used for QTL studies, preliminary mapping resolution of QTL has been limited to
approximately 10–20 cM, which is inadequate for distinguishing between single gene and
multi-gene composition. To reveal just what lies at a locus, techniques with much higher
resolution are necessary.

Conventional mapping populations such as backcross (BC) and F2 have limitations in fine
mapping of QTL due to the lack of sufficient recombinational events even in large
populations. Therefore, an alternative approach for fine mapping is to exploit populations
whose derivations are based on multiple cycles of recombination. These populations include
recombinant inbred lines (RILs) (Burr and Burr, 1991) and advanced intercrossing lines (AILs)
(Darvasi and Soller, 1995) or intermated recombinant inbred lines (IRILs) (Liu et al., 1996).
As described in Chapter 5, RILs are produced by continually selfing or sibmating the progeny
of individual members of an F2 population until virtual homozygosity is achieved. Considering
the fact that more and more RILs have been accumulated in breeding programmes and such
populations can be exploited for the mapping of one or more traits, the RIL approach is
practical for most crops. AILs are initiated by a cross between two inbred lines and derived
by sequentially and randomly intercrossing each generation until advanced generations are
attained. For these two kinds of populations, many recombinational events required for fine
mapping of QTL are accumulated in a single relatively small population over the course of
many generations. Due to more opportunity for meiotic recombination, they have the advantage
of possibly distinguishing more closely linked QTL. For example, with the same population
size and QTL effect, the 95% confidence interval of a QTL map location of 20 cM in the F2 is
reduced fivefold after eight additional random mating generations (F10; Darvasi and Soller,
1995). RILs have similar effects on the resolution power of QTL mapping. It is worth noting
that increases in recombinational events will reduce the effect due to the QTL associated
with any particular marker and thus, these populations are more suitable for fine mapping of
QTL of moderate and large effects. In another approach, Paterson et al. (1990) suggested that
recombinant individuals could be identified in primary generations and selectively multiplied
in subsequent generations so that the recombinant classes occur at near equal frequency with
the non-recombinant classes, increasing the power for statistical comparisons among the
classes.

The benefits of using designs such as RILs other than the conventional F2 and BC, where
genotyping and phenotyping could be done on the same set of individuals, include reducing
cost and environmental variance and taking advantage of the changes in population structures
of other RIL populations. Kao (2006) proposed a statistical method considering the
differences in population structures between different RIL populations on the basis of a
multiple-QTL model to map for QTL in different designs.
The proposed method has the potential to improve the resolution of genetic architecture of
quantitative traits and can serve as an effective tool to explore the QTL mapping study in
the system of RIL populations. Martin and Hospital (2006) described the non-independence of
multiple recombinations arising in RIL recombination data even though there may be no
interference in each meiosis. They also provide formulas for interference tests, gene mapping
and QTL detection in RIL populations.

A new genetic map of maize, ISU–IBM Map4, that integrates 2029 existing markers with 1329 new
indel polymorphism (IDP) markers has been developed using IRILs from the intermated
B73 × Mo17 (IBM) population (Fu et al., 2006). The mosaic structures of the genomes of 91
IRILs, an important resource for identifying and mapping QTL and expression QTL (eQTL), were
defined. When this RIL population was evaluated in four environments for resistance to
southern leaf blight (SLB) disease caused by Cochliobolus heterostrophus race O (Balint-Kurti
et al., 2007), four common SLB resistance QTL were identified in all environments, two in bin
3.04 and one each in bins 1.10 and 8.02/3. A comparison was made between SLB QTL detected in
two populations, independently derived from the same parental cross: the IBM advanced
intercross population and a conventional RIL population. Several QTL for SLB resistance were
detected in both populations, with the IBM providing between 5 and 50 times greater mapping
resolution.

Population size and mating design are two important aspects to be given adequate
consideration during the development of IRILs. Although random intermating of F2 populations
has been suggested for obtaining precise estimates of recombination frequencies between
tightly linked loci, Frisch and Melchinger (2008) in a recent simulation study showed that
sampling effects due to small population sizes in the intermating generations have abolished
the advantages of random intermating that were reported in previous theoretical studies
considering an infinite population size. They also propose a mating scheme for intermating
with planned crosses that yields more precise estimates than those under random intermating.

Minor QTL mapping

In most QTL identification studies, rather stringent threshold probability levels have been
set so that there is a low risk in making Type I errors (i.e. false positives). Thus, only
those QTL with sufficiently large phenotypic effects to be detected statistically can be
identified while QTL with smaller effects will fall below the threshold of detection (cf.
Fig. 7.1). When multiple QTL are located in the same chromosomal region, the ones with
smaller effects cannot be detected in most instances. This overshadow effect of major QTL
over minor QTL makes the molecular marker approach biased towards the detection of QTL of
large phenotypic effects. It should be pointed out that these major QTL would be ones with
high heritabilities, easily manipulated through traditional breeding practices and may
already be fixed in many breeding lines. There are accumulated data from numerous QTL studies
to establish definitely that QTL affecting a number of quantitative traits are distributed
throughout the genome and certain chromosomal regions appear to contribute greater effects
than others. More surprising has been the finding that in many instances a large proportion
of quantitative variation can be explained by the segregation of a few major QTL. It is not
uncommon to find individual QTL that can account for more than 20% of the phenotypic
variation in a population (Table 1 of Tanksley, 1993) and values as high as 85.7% (Lin et
al., 1995) (a major gene with distinct bimodal distribution) have been reported for a single
major QTL. It may be reasonable, therefore, to use marker technology as a means for placing
greater emphasis on those QTL showing only relatively minor effects (minor QTL).

Detectability of a trait locus is severely limited by its genetic background. A
straightforward background effect is dilution, i.e. the more QTL alleles that exist, the
smaller the relative contribution of a given locus (Frankel, 1995). The smallest effects
a QTL can have and still be detected by the marker method depend on a number of factors
(Tanksley, 1993; Chapter 6), which include:

1. Map distance: the closer a QTL is to a marker, the smaller the QTL effect that can still
be detected statistically. This relationship indicates that the power of QTL mapping can be
improved with the saturation of molecular maps. High-density linkage maps that serve as
high-quality reference maps for QTL mapping are now available for many crop plants.
2. Sample size: the larger the sample (population) size, the more likely the effects of
smaller QTL will reach statistical significance. This relationship indicates that the
detection of QTL with a relatively small effect largely depends on the size of the mapping
population. Using a typical sample size (n < 500), two or more genes closely linked (within
20 cM) will usually be detected as a single QTL (i.e. they cannot be distinguished as
separate QTL when mapped with the interval approach of two flanking markers). In maize, using
an F2 population size of 1700 individuals and a probability threshold of 0.05, a QTL
contributing as little as 0.3% of phenotypic variance was reported (Edwards et al., 1987). In
experiments with smaller sample sizes and higher probability thresholds, QTL that explain
less than 3% of the phenotypic variance are not normally detected. The bias towards detecting
QTL with larger effects means that it is unlikely that one will ever detect, map and
characterize all of the QTL affecting a character in any single segregating population.
3. Heritability: the larger the environmental effect on the character (i.e. low
heritability), the less likely a QTL will be detected. Estimates of heritability can be
improved by controlling environmental error. Permanent mapping populations such as RILs,
doubled haploids (DHs) and advanced backcrossing populations or near-isogenic lines (NILs)
can be used to improve the mapping power by replicate phenotyping in different environments
(years, seasons or locations).
4. QTL threshold: higher probability thresholds for declaring a QTL effect significant reduce
the chances of spurious QTL being reported, but also reduce the chances of detecting QTL with
smaller effects. This relationship indicates that development of QTL mapping methods which
can improve the mapping power with a specific sample size would benefit the separation of
minor QTL. Based on the concept of permutation tests, Churchill and Doerge (1994) described a
method to determine an appropriate threshold value for declaring significant QTL effects,
providing an alternative approach to the logarithm of odds (LOD) drop-off method (Lander and
Botstein, 1989). The conditional empirical threshold and the residual empirical threshold
yield critical values that can be used to construct tests for the presence of minor QTL
effects while accounting for effects of known major QTL (Doerge and Churchill, 1996). The
permutation method is now widely used in various QTL mapping approaches and some QTL mapping
software such as QTL CARTOGRAPHER (http://statgen.ncsu.edu/qtlcart/WQTLCart.htm) provides the
permutation function for the statistical methods incorporated.

In rice, a total of 15 QTL for heading date (Hd1–Hd3, Hd3b–Hd14) were identified in several
populations derived from crosses between Nipponbare, a rice cultivar from Japan, and
Kasalath, a rice cultivar from India (as reviewed by Yano et al., 2001). Nine of these have
been mapped as single Mendelian factors and studies have shown that Hd1, Hd2, Hd3a, Hd3b, Hd5
and Hd6 are involved in day-length response (reviewed by Uga et al., 2007). Using an
extremely late heading (202 days to heading) cultivar Nona Bokra from India and japonica
cultivar Koshihikari (105 days) from Japan, QTL analysis identified 12 QTL on seven
chromosomes. The Nona Bokra alleles of all QTL contributed to an increase in heading date.
Comparison of chromosomal locations between heading date QTL detected between these two
cultivars and 15 QTL identified from Nipponbare × Kasalath populations revealed that eight of
the heading date QTL were located near Hd1, Hd2, Hd3a, Hd4, Hd5, Hd6, Hd9 and Hd13. The
results suggested that the strong photoperiod sensitivity in Nona Bokra was
generated mainly by the accumulation of additive effects of particular alleles at previously
identified QTL (Uga et al., 2007). This also indicates that multiple QTL for complex traits
like extremely late-heading can be dissected by QTL mapping.

Regional mapping

The most common method of placing molecular markers on a linkage map is by random cloning of
genomic/cDNA sequences, by PCR-based detection of polymorphism, or by single nucleotide
polymorphism (SNP) markers based on chip technology (Chapter 3), followed by linkage
analysis. This whole genome mapping method is extremely useful and can be used to construct
both low and high resolution maps of complex genomes. Nevertheless, this approach is limited
when one is interested in targeting a particular chromosomal region. A majority of random
markers will ultimately be mapped outside of a target interval and as the interval size
decreases the odds of any new randomly generated marker being placed within it decrease
(Tanksley et al., 1995).

Two strategies have been proposed to target the chromosomal region of interest (regional
mapping; Xu, 1997) and have proven effective for identifying from a large number of markers
the few that reside near a targeted locus. Both involve the use of genetic stocks that are
(almost) genetically identical, except in the regions flanking the targeted gene. The first
strategy uses NILs, which are generated by introgression (Wehrhahn and Allard, 1965). As
discussed in Chapter 4, the inbred lines differ at the targeted locus or region. If the donor
parent and the recurrent parent are sufficiently divergent, it is possible to detect
polymorphisms between the pairs of NILs. The marker that detects such polymorphisms will
likely be linked to the target gene. As early examples, Young et al. (1988) used NILs and
pools of restriction fragment length polymorphism (RFLP) probes to detect new markers within
the Tm-2a region of tomato. Using a similar strategy, Martin et al. (1991) were able to use
random PCR amplification on NILs to isolate new markers near the tomato Pto disease
resistance locus. NILs are now widely used in map-based gene cloning through fine mapping.

With the accumulation of permanent mapping populations such as RILs and DHs, it is possible
to select lines that are almost genotypically identical in the whole genome except for only
one or a few marker loci. Combined with phenotypic similarity, this information can be used
to obtain NILs for qualitative or quantitative traits (Xu, 1997).

The second strategy, referred to as DNA pooling or the bulked segregant analysis (BSA), which
is discussed later in this chapter, relies on the use of segregating populations (Michelmore
et al., 1991; Giovannoni et al., 1991), which does not require highly specialized genetic
stocks. This strategy is derived from the concept of selective genotyping based on selection
for contrasting phenotypes or bracket DNA markers.

Both regional mapping methods described above, when used in conjunction with high-volume DNA
marker technology, permit one to screen thousands of loci and selectively identify those
adjacent to the gene of interest in a specific chromosomal region and are very suitable for
analysis of clustered QTL. Moreover, these regional mapping approaches can be accomplished
without having a genetic map for the species. For major genes, the efficiency of the NIL and
BSA approaches has been demonstrated in plants and early examples of success include Young et
al. (1988), Michelmore et al. (1991), Giovannoni et al. (1991), Schüller et al. (1992),
Mackill et al. (1993) and Pineda et al. (1993).

To solve the issue associated with the masking effects of major QTL and epistatic
interactions of multiple QTL involved in RIL-based QTL mapping, Keurentjes et al. (2007a)
empirically compared the QTL mapping power of a genome-wide NIL population with an already
existing RIL population derived from the same parents in Arabidopsis thaliana. By analysing
and mapping QTL affecting six developmental traits with different heritability, overall, QTL
with smaller effects could be detected in the NIL population more easily than in the RIL
population, although
the localization resolution was lower. In general, population size is more important than the
number of replicates to increase the mapping power of RILs, whereas for NILs several
replicates are absolutely required.

In an effort to identify putative candidate genes underlying drought tolerance in rice,
Nguyen et al. (2004) developed several expression sequence-based markers using BSA for
saturation mapping of QTL regions. Thirteen of the markers were localized in the close
vicinity of the targeted QTL regions. In rice, substitution mapping of a flowering-time QTL
associated with transgressive variation has separated a previously located QTL, dth1.1, into
at least two sub-QTL (Thomson et al., 2006). The QTL dth1.1 was associated with transgressive
variation for days to heading in an advanced BC population derived from the Oryza sativa
cultivar Jefferson and an accession of the wild rice relative Oryza rufipogon. A series of
NILs containing different O. rufipogon introgressions across the target region were
constructed to dissect dth1.1 using substitution mapping. In contrast to the late-flowering
O. rufipogon parent, O. rufipogon alleles in the substitution lines caused early flowering
under both short and long day-lengths and provided evidence for at least two distinct
sub-QTL: dth1.1a and dth1.1b. Potential candidate genes underlying these sub-QTL included
genes with sequence similarity to Arabidopsis GI, FT, SOC1 and EMF1 and Pharbitis nil PNZIP.
Evidence from families with non-target O. rufipogon introgressions in combination with dth1.1
alleles also detected an early flowering QTL on chromosome 4 and a late-flowering QTL on
chromosome 6 and provided evidence for additional sub-QTL in the dth1.1 region.

7.1.2 Screening for allele dispersion

When multiple QTL control a trait, their alleles of positive or negative effect (increasing
or decreasing trait value) tend to be dispersed among genetic stocks, with positive alleles
at one or some loci but negative alleles at others. The phenomenon where QTL alleles of
similar effect are dispersed among genetic stocks is referred to as allele dispersion.
However, a genetic stock may contain (associate) all the alleles of similar (positive or
negative) effect at the multiple QTL; this is referred to as allele association. For traits
naturally selected towards intermediate phenotype, alleles of similar effect at multiple loci
are more likely dispersed than associated. In natural or breeding populations with neutral
selection, there are many genetic stocks that have positive alleles at some loci but negative
alleles at others. The extreme phenotypes of quantitative traits come from the association of
QTL alleles while the intermediate phenotype usually indicates allele dispersion. Therefore,
different QTL alleles with similar effect can be identified from the existing populations. On
the other hand, if one has allele-associated stocks in hand, the QTL alleles could be
separated by selection for different genotypes. Allele dispersion differs from linkage
equilibrium in two respects. First, allele dispersion refers to independent or linked loci
controlling the same trait; while in linkage equilibrium the related genetic loci are usually
supposed to be genetically linked and control different traits. Secondly, allele dispersion
represents a situation in which any two non-genetically related genotypes (strains) within a
species show allelic differences at the same genetic locus, while linkage equilibrium
represents a situation in which a constant gene frequency has been reached in a given
population derived from two related strains.

Genetic stocks with dispersed QTL alleles usually show similar phenotype, making it difficult
to identify genetic differences only by phenotypic evaluation. When these stocks are used as
the parents to produce segregating populations, however, a part of the progeny will have
transgressive phenotypes, i.e. they are phenotypically outside of the range of the parents,
because these progeny associate all alleles of similar effect with the result of
recombination of different QTL alleles. Positive and negative transgressive individuals will
arise from the associations of positive and negative alleles, respectively. Transgression
caused by
dominance and/or overdominance can also be excluded by successively selfing the transgressive
individuals to determine if they maintain the same phenotype in advanced generations. If no
significant epistasis can be detected by biometrical genetic analysis, the transgressive
segregation in populations derived from two genetic stocks provides evidence for allele
dispersion. Based on the additive-dominance model, Xu and Shen (1992a) suggested three
methods to screen the separable QTL alleles by detection of allele (gene) dispersion,
including: (i) testing the homogeneity between F2 phenotypic variance and the environmental
variance estimated from phenotypic variances of non-segregating populations (P1, P2 and F1);
(ii) testing the differences of means among F1, F2, F1 × P1 and F1 × P2; and (iii) comparing
the genetic parameters such as gene effects and genetic variances estimated from the cross
derived by intermating transgressive individuals of two kinds with those estimated from the
cross of the original stocks.

Classical genetic analysis provides some examples for allele dispersion. The first example in
plants may come from Nicotiana rustica. The allelic differences for final height, flowering
time and related characters were largely dispersed between two cultivars (genotypes) 1 and 5
(Jinks and Perkins, 1969, 1972; Perkins and Jinks, 1973) with 127 and 103 cm of final height
and 77 and 72 days of flowering time (days after sowing), respectively. Among the random
sample of 82 inbred lines derived from the cross between these two cultivars, transgressive
lines were found and two of them, B2 and B35, were the shortest and tallest in final height
(92 and 144 cm, respectively) and the earliest and latest to flower (70 and 84 days,
respectively). The simultaneous analysis of the two contrasting crosses (1 × 5 and B2 × B35)
indicated the allele dispersion in the original cultivars (Jayasekara and Jinks, 1976).
Another example is rice tiller angle (the angle between the main stem and its tillers).
Transgressive segregation was found in the two crosses derived from four indica rice
cultivars with similar tiller angle and the extreme strains with largest and smallest tiller
angles were obtained by successively selfing the transgressive individuals (Xu and Shen,
1992b). Comparative genetic analysis of two contrasting crosses (from the original cultivars
and from the corresponding extreme strains) revealed that two loci were responsible for the
genetic difference of tiller angle in each pair of the original cultivars, and alleles of
similar effect were dispersed in the original cultivars but associated in the extreme strains
(largest tiller angle strains having all the positive alleles and smallest tiller angle
strains having all the negative alleles). The second cycle of crossing between the extreme
strains derived from different original crosses revealed further transgression. Biometrical
genetic analysis and selection response indicate that four loci controlled the total
variation of tiller angle in four original cultivars, each cultivar carrying two positive
alleles at only one locus (Xu et al., 1998).

Allele-dispersion can also be identified based on QTL mapping results. In QTL mapping,
phenotypic difference between parents is not necessary for detection of QTL. In most cases
where no parental difference is found, QTL are still detected, which could be due to the
complementary patterns of positive and negative allelic effects. QTL mapping can provide
information about the genetic constitution of each segregate in the mapping population so
that one can infer which individual carries desirable alleles and then separate the multiple
QTL by selection for individuals with different allele combinations. For example, if four QTL
are inferred to control a trait, the allelic constitution can be determined for all
individuals and each QTL. Therefore, one can easily screen the individuals carrying the
positive allele at each of the four QTL. Because of allele dispersion, it is not likely that
one QTL mapping experiment using any single population can detect all the QTL affecting a
given trait. Therefore, independent experiments tend to reveal different QTL or QTL alleles.
Comparison of QTL effects and mapping positions may result in separation of the multiple QTL.
However, this approach largely depends on the precision of QTL mapping results available. So
far, numerous QTL affecting the same traits
have been identified in most crops and they are different in locations and effects. Variation
among investigations and among populations can logically be expected for the following
reasons (Smith and Beavis, 1996): (i) different polymorphisms in the populations studied;
(ii) different number and location of polymorphic regions affecting the trait; (iii)
environmental effects or genotype–environment interaction; and (iv) small sample size. With
the development of highly polymorphic DNA markers such as simple sequence repeats (SSRs), the
first reason will become less important. Permanent populations can be phenotyped at different
seasons, years or locations, reducing the environmental effect on QTL identification. Using
relatively large sample size, combined with highly polymorphic markers, permanent population
and replicate phenotyping, will help to determine whether the variation of QTL mapping comes
from different QTL constitutions of populations or not. It is of interest to note that
cryptic factors were frequently uncovered (e.g. Stuber, 1995; Ragot et al., 1995), indicating
the possibility of QTL dispersion. For example, genetic factors contributing to high grain
yield and tall stature in maize occasionally have been associated with marker alleles from
low-yielding, short-statured parental lines (Edwards et al., 1992). A wild rice species with
low yield potential contains genes that may significantly increase the productivity of the
high-yielding cultivated rice (Xiao et al., 1996a). There are numerous recent examples
available to support these early reports.

As observed in rice QTL mapping, on the average, about four QTL are identified for each trait
(Table 7.1; Xu, Y., 2002), the same as the average obtained for 176 trial–trait combinations
as reviewed by Kearsey and Farquhar (1998). When QTL identified for the same trait are
summarized over different projects/populations, this number becomes much larger. For example,
rice plant height has been mapped using 13 populations, with 63 QTL reported. Some of the QTL
are allelic to each other, i.e. they were mapped to the same chromosomal region or intervals
of less than 15 cM. After elimination of possible allelic QTL, the total number of QTL for
plant height is reduced to 29, with up to five QTL existing on a chromosome (Xu, Y., 2002).
The QTL qPH1-1, which corresponded to a major semi-dwarf gene sd-1 and qPH8-1 were each
detected in six populations. QTL qPH2-2 and qPH3-3 were each detected in five populations.
Over 50 major
Table 7.1. The number of QTL identified in rice using permanent mapping populations. See Xu, Y. (2002)
for all references.
genes for dwarf and semi-dwarf mutants have been found (Kinoshita, 1995) and 14 of them were
linked to molecular markers (Huang et al., 1996; Kamijima et al., 1996), with 13 of them
(93%) co-localized with plant height QTL. More plant-height QTL will likely be co-localized
with major loci, as more major loci are linked with molecular markers. These co-localizations
support Robertson's (1985) hypothesis that alleles for qualitative mutants are simply
loss-of-function alleles at the same loci underlying quantitative variation. Until QTL are
mapped to higher degrees of precision and/or cloned, however, it would be difficult to prove
that the particular QTL actually correspond to known loci defined by macromutant alleles and
which QTL are allelic to each other. The QTL allelism test and the determination of the
major-gene and QTL correspondence depend on the availability of high-density molecular maps
with a common set of markers shared among researchers.

With the generalization of this concept, non-allelic alleles can be searched for among an
entire set of related species. The high incidence of transgressive segregation in
interspecific crosses tells us that individuals that do not exhibit a particular trait often
carry superior/hidden alleles that condition that trait. It is likely that non-allelic
alleles usually would be present in other strains but missed due to limitations in genetic
analysis. By using the entire set of the related species, it thus may be possible to identify
all of the genes involved in a given trait or physiological process because the genes
phenotypically hidden in one species may not be hidden in another species (Bennetzen, 1996).

7.2 QTL for Complicated Traits

7.2.1 Trait components

Many quantitative traits are a complex consisting of different or related components or
subtraits. For example, there are six direct component effects upon days to flowering in
legume seed crops. Four are genetic factors, which regulate three plant processes and one
plant characteristic and two are environmental factors, which modulate the ongoing gene
actions of one or more of the three plant processes. These six direct components are the rate
of node and leaf development, change from node to flower, vernalization requirement, node
number at the minimal days to flowering, impacting photoperiod and impacting temperature
(Wallace, 1985). In most crops, the yield trait is comprised of several yield components and
oil or protein content is related to many compounds and amino acids. In rice, low fertility
controlled by polygenes can be partitioned into several components including male and female
sterility, or ovary and pollen abortion so that polygenes can be divided into several single
genes with different effects and thus can be handled with ease. Dissecting or partitioning a
complex trait into separate components can benefit both QTL mapping and cloning.

7.2.2 Correlated traits

Trait correlation may arise from either pleiotropic effects of single genes or from tight
linkage of genes affecting different traits. Correlated traits often share some QTL mapped to
similar chromosomal regions. In most Poaceae, increased plant height often correlates with
late flowering. Comparative data support the possibility that different, closely linked
genes, rather than a single gene, account for correlated traits. In sorghum, two of the three
QTL affecting flowering time were associated with height QTL and two major QTL were mapped
within overlapping 90% confidence intervals, explaining 85.7% and 54.8% of phenotypic
variation, respectively, and showing similar gene action (dominance/additive = 0.72 and 0.73)
(Lin et al., 1995). Many pairs of independent discrete mutations affecting height and
flowering are closely linked in corresponding locations of wheat (Ppd1 and Rht8) (Worland and
Law, 1986; Hart et al., 1993) and rice (Se-1/Se-3 and d-4/d-9) (Kinoshita and Takahashi,
1991; Causse et al., 1994).
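
The distinction between pleiotropy and tight linkage as sources of trait correlation can be illustrated with a small simulation. The sketch below is not taken from the studies cited above; it assumes a hypothetical doubled-haploid population derived from a single F1, two loci separated by recombination fraction r, each locus affecting only one trait, and purely illustrative values for the allelic effect and the environmental noise. Under these assumptions the expected correlation between the two traits is (1 − 2r) multiplied by the per-trait heritability, so complete linkage (r = 0) behaves like a single pleiotropic gene while unlinked loci (r = 0.5) produce no correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def expected_correlation(r, effect=1.0, noise_sd=1.0):
    """Expected trait correlation under the assumed model: (1 - 2r) * h2."""
    h2 = effect ** 2 / (effect ** 2 + noise_sd ** 2)
    return (1.0 - 2.0 * r) * h2


def simulate_trait_correlation(r, n_lines=5000, effect=1.0, noise_sd=1.0, seed=1):
    """Simulate two traits, each controlled by one of two linked loci.

    r is the recombination fraction between the loci (0 = completely
    linked, 0.5 = unlinked). Genotypes are coded -1/+1 by parental
    origin; all parameter values are hypothetical.
    """
    rng = random.Random(seed)
    trait1, trait2 = [], []
    for _ in range(n_lines):
        g_a = rng.choice((-1, 1))                 # allele at locus A
        g_b = -g_a if rng.random() < r else g_a   # locus B recombines with probability r
        trait1.append(effect * g_a + rng.gauss(0.0, noise_sd))
        trait2.append(effect * g_b + rng.gauss(0.0, noise_sd))
    return pearson(trait1, trait2)


if __name__ == "__main__":
    for r in (0.0, 0.05, 0.2, 0.5):
        print(f"r = {r:.2f}  simulated = {simulate_trait_correlation(r):.3f}  "
              f"expected = {expected_correlation(r):.3f}")
```

Because the correlation falls off only linearly in r, two genes a few centimorgans apart give nearly the same trait correlation as a single pleiotropic gene in a population of ordinary size, which is one way to see why higher mapping resolution is needed to tell the two explanations apart.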
Correlated traits have been analysed separately in QTL mapping without using correlation
information and thus correlation between traits will affect the mapping of any single trait
involved. Considering the correlation between different physiological characters, a polygenic
complex controlling a physiological trait can be manipulated to a large extent by cloning
only one or a few QTL from this complex. That is, all functions of a polygenic complex
affecting multiple physiological characters can be started and pushed by making one of the
functions into a highly efficient system. This was observed in growth-hormone-transformed
mice (Palmiter et al., 1983): when growth hormone was produced largely by inserting
additional copies, all other components could respond appropriately to this change, although
growth hormone is only one of the components affecting development. It seems that QTL
controlling closely related quantitative traits could be manipulated to show linked response
by only manipulating one or a few of these QTL.

It is also likely that physiological interactions independent of genetic factors may result
in correlated phenotypic response. Li et al. (2006) introduced a method for the analysis of
multi-locus, multi-trait genetic data that provides an intuitive and precise characterization
of genetic architecture and they showed it is possible to infer the magnitude and direction
of causal relationships among multiple correlated phenotypes. They illustrated the technique
using body composition and bone density data from mouse intercross populations. The
identification of causal networks sheds light on the nature of genetic heterogeneity and
pleiotropy in complex genetic systems.

Genetic correlation can be understood at gene expression levels. Coordinated regulation of
gene expression levels across a series of experimental conditions provides valuable
information about the function of correlated transcripts. In order to annotate gene function
and identify potential members of regulatory networks, Lan et al. (2006) explored correlation
of expression profiles across a genetic dimension, namely
Multivariate analysis of complicated genotype segregating in a panel of 60 F2
traits can be used to investigate the struc- mice derived from a cross used to explore
ture of a genetic system that includes diabetes in obese mice. They first identi-
allelic variation at multiple loci, interme- fied 6016 seed transcripts for which they
diate phenotypes and their relationships. observed that gene expression is linked to a
Jiang and Zeng (1995) proposed a method particular region of the genome. Then they
for QTL detection based on a multivariate searched for transcripts whose expression is
normal model with unconstrained covari- highly correlated with the seed transcripts
ance structure. Alternatively, dimension and tested for enrichment of common bio-
reduction techniques, such as principal logical functions among the lists of corre-
component analysis, can be applied to a lated transcripts. They found and explored
set of correlated traits. Multivariate QTL the properties of 1341 sets of transcripts
analyses can provide enhanced power and that share a particular gene ontology term.
resolution in QTL mapping when traits Thirty-eight seeds in the G protein-coupled
are highly correlated and share common receptor protein signalling pathways were
genetic determinants (Korol et al., 2001). correlated with 174 transcripts, all of which
Mapping studies that investigate clus- are also annotated as G protein-coupled
ters of related phenotypes often reveal a receptor protein signalling and 131 of which
network of genetic effects, in which each share a regulatory locus on chromosome 2.
phenotype is influenced by multiple loci They noted that many of these findings
(heterogeneity) and different phenotypes would have been missed by simple eQTL
share one or more loci in common (plei- analysis without the correlation step. Trait
otropy). The complexity of observed QTL correlation combined with linkage mapping
networks will vary depending on the is more sensitive compared with linkage
traits and the power of the study design. mapping alone.
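The dimension-reduction route mentioned above (principal component analysis applied to a set of correlated traits) can be sketched as follows. This is a minimal illustration on simulated data, assuming a single pleiotropic marker and three correlated traits; single-marker regression stands in for a full interval mapping analysis, and all names and parameter values are hypothetical rather than drawn from Jiang and Zeng (1995) or Korol et al. (2001).

```python
import numpy as np


def principal_components(traits):
    """Return PC scores and explained-variance ratios for an (n, t) trait matrix."""
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    # SVD of the standardized matrix; rows of vt are the PC loadings.
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s
    explained = s**2 / np.sum(s**2)
    return scores, explained


def marker_r2(y, marker):
    """Proportion of variance in y explained by linear regression on one marker."""
    return np.corrcoef(y, marker)[0, 1] ** 2


def demo(n=5000, seed=1):
    rng = np.random.default_rng(seed)
    # F2 genotype scores at one pleiotropic marker (assumed 1:2:1 segregation).
    marker = rng.choice([-1, 0, 1], size=n, p=[0.25, 0.5, 0.25])
    # Component shared by all traits: marker effect plus a common background.
    shared = 0.6 * marker + rng.normal(0, 1, n)
    # Three correlated traits = shared component + trait-specific noise.
    traits = np.column_stack([shared + rng.normal(0, 1, n) for _ in range(3)])
    scores, explained = principal_components(traits)
    r2_single = [marker_r2(traits[:, j], marker) for j in range(3)]
    r2_pc1 = marker_r2(scores[:, 0], marker)
    return explained, r2_single, r2_pc1


if __name__ == "__main__":
    explained, r2_single, r2_pc1 = demo()
    print("variance explained by each PC:", np.round(explained, 3))
    print("marker R2, single traits:", np.round(r2_single, 3))
    print("marker R2, first PC:", round(r2_pc1, 3))
```

Because the first principal component averages out trait-specific noise, the marker tends to explain more of its variance than of any single trait in this setting. The gain depends on the traits actually sharing a genetic determinant, which is the condition stated in the text.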
260 Chapter 7
[Fig. 7.2 image not reproduced. Axes: Frequency against Phenotypic value. Panel and arrow labels: Major QTL; Major QTL + minor QTL; Minor QTL; Increase of gene number; Single environmental factors; Compound environmental factors; Uniformity of environments.]
Fig. 7.2. Relationship between phenotypes, genes and environments. Discrete phenotypic distribution for
qualitative traits arises from major genes, bimodal distribution for qualitative-quantitative traits from the
joint effect of a major QTL (with dominant effect) and some minor QTL, and normal distribution for typical
quantitative traits from many minor QTL. With partition and uniformity of environments, some continuously
distributed traits can be converted into bimodal or discretely distributed traits. From Xu (1997). This
material is reproduced with permission of John Wiley & Sons, Inc.
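The relationships summarized in Fig. 7.2 can be explored with a small simulation. The sketch below assumes an F2 population, a fully dominant major QTL, additive minor QTL of equal effect, and normally distributed environmental noise. It uses Sarle's bimodality coefficient, (skewness^2 + 1) / kurtosis, as a rough indicator: a normal distribution gives about 0.33 and a uniform distribution about 0.555, so larger values hint at bimodality. All effect sizes and function names are illustrative assumptions, not values from Xu (1997).

```python
import numpy as np


def simulate_f2_phenotypes(n, major_effect, n_minor, minor_effect, env_sd, seed=0):
    """Simulate F2 phenotypes from one dominant major QTL plus additive minor QTL."""
    rng = np.random.default_rng(seed)
    # Dominant major QTL: 3/4 of F2 plants carry at least one dominant allele.
    major = (rng.random(n) < 0.75).astype(float) * major_effect
    if n_minor > 0:
        # Minor QTL: independent loci with additive scores -1, 0, 1 (1:2:1).
        scores = rng.choice([-1, 0, 1], size=(n, n_minor), p=[0.25, 0.5, 0.25])
        minor = scores.sum(axis=1) * minor_effect
    else:
        minor = np.zeros(n)
    environment = rng.normal(0, env_sd, n)
    return major + minor + environment


def bimodality_coefficient(x):
    """Sarle's bimodality coefficient from simple (uncorrected) sample moments."""
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d**k) for k in (2, 3, 4))
    skewness = m3 / m2**1.5
    kurtosis = m4 / m2**2
    return (skewness**2 + 1) / kurtosis


if __name__ == "__main__":
    scenarios = {
        "major QTL, uniform environment": (4.0, 0, 0.0, 0.5),
        "major + minor QTL": (4.0, 10, 0.3, 0.5),
        "minor QTL only": (0.0, 20, 0.5, 0.5),
        "major QTL, compound environment": (4.0, 0, 0.0, 3.0),
    }
    for name, params in scenarios.items():
        y = simulate_f2_phenotypes(20_000, *params)
        print(f"{name:34s} BC = {bimodality_coefficient(y):.3f}")
```

In this toy model a large environmental variance hides the major gene and the distribution looks continuous, while a uniform environment exposes the two classes, in line with the caption's remark about partition and uniformity of environments.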
phenotypic effects can be accounted for in the search for secondary QTL. This
method is suitable for unlinked multiple QTL and/or QTL residing on different
chromosomes. Mapping methods suitable for major-minor genes warrant further
research.

7.2.4 Seed traits

The improvement of seed yield and quality is one of the most important
objectives in cereal breeding. As a major storage organ of cereal seeds,
endosperms provide humans with proteins, essential amino acids and oils. An
understanding of the inheritance of endosperm traits is critical for the
improvement of yield potential and seed quality. Genetic behaviour in
triploid endosperms is very different from that of the maternal plants that
supply the components for grain growth and development. Thus, methods suited
for genetic analysis of traits in maternal plants (diploids for most cereal
crops) cannot directly be used for endosperm traits (Xu, 1997). Based on
triploid models, biometrical methods have been proposed for conventional
genetic analysis of endosperm traits (Gale, 1975; Bogyo et al., 1988; Mo,
1988; Foolad and Jones, 1992; Pooni et al., 1992; Zhu and Weir, 1994). Any
analytical method for endosperm traits needs to combine a QTL analytical
method developed for diploid maternal plants with a triploid model proposed
for conventional genetic analysis.

On the other hand, the genetic system controlling endosperm traits may be
much more complicated than that which controls the traits of maternal plants.
Because maternal plants provide seeds with a portion of their genetic
material and almost all the nutrients required for growth and development,
seed traits are genetically affected by both the seed nuclear genes and the
maternal nuclear genes. In addition, cytoplasmic genes may also affect some
seed traits through their indirect effects on the biosynthetic processes of
chloroplasts and mitochondria. To understand endosperm traits with biological
accuracy, one should take into consideration maternal genetic effects and
cytoplasmic effects along with the direct genetic effects of seeds (Xu,
1997). As seeds initiate a new generation that differs from their maternal
plants, some seed traits should be considered as a generation advanced over
their maternal plants. Since the DNA used in most molecular analyses has been
extracted from leaves or tissues of maternal plants, genetic analysis of
endosperm traits should be based on the DNA extracted from both maternal
plants and endosperm tissues in order to understand the relative contribution
of the different genetic factors to the variation of endosperm traits.

Many years after Xu's (1997) advocacy, several articles were published
detailing the unique difference associated with triploid traits and some
statistical methods have been developed with consideration of the trisomic
inheritance of the endosperm and the generation difference between the
mapping population and the endosperm (Wu et al., 2002a,c; Xu, C. et al.,
2003; Kao, 2004; Cui and Wu, 2005; Wang, X. et al., 2007). In general, the
proposed triploid-based methods use the marker information either from only
the maternal plants or from both the maternal plants and their embryos for
mapping endosperm traits and provide better detection power and estimation
precision than diploid-based methods. The genetic models have also been
developed to handle epistatic effects (Cui and Wu, 2005) and to use bulked
grain samples (Wang, X. et al., 2007).

Zheng et al. (2008) conducted QTL analysis on maternal and endosperm genomes
for three cooking quality traits (amylose content, gel consistency and
gelatinization temperature) in rice using a genetic model with endosperm and
maternal effects and environmental interaction effects. The results suggested
that a total of seven QTL were associated with cooking quality of rice, which
were subsequently mapped to chromosomes 1, 4 and 6. Six of these QTL were
also found to have environmental interaction effects.

As we discussed earlier, several studies have shown that maternal genotypic
variation could greatly influence the estimation of the direct effects of QTL
underlying
endosperm traits. Recently, Wen and Wu (2008) proposed methods of interval
mapping of endosperm QTL using seeds of F2 or BC1 (an equal mixture of
F1 × P1 and F1 × P2 with F1 as the female parent) derived from a cross
between two pure lines (P1 × P2). The most significant advantage of these
experimental designs is that the maternal effects do not contribute to the
genetic variation of endosperm traits and therefore the direct effects of
endosperm QTL can be estimated without the influence of maternal effects. In
addition, these experimental designs could greatly reduce environmental
variation because a few F1 plants grown in a small block of field will
produce sufficient F2 or BC1 seeds for endosperm QTL analysis.

More recently, He and Zhang (2008) proposed mapping endosperm trait loci
(ETL) and epistatic ETL (eETL) as an efficient way to genetically improve
grain quality using an alternative random hybridization design. Using a
penalized maximum likelihood method, the endosperm trait means of random
hybrid lines together with known marker genotype information from their
corresponding parental F2 plants were used to estimate efficiently and
without bias the positions and all of the effects of eETL. This new method
may enable us to map triploid eETL in the same way as diploid quantitative
traits in the future.

7.3 QTL Mapping across Species

For species connected by parallel genome mapping, as described in Chapter 3,
it should be possible to compare the map positions of QTL for the same or
similar characters. In this case, breeders might be able to predict the
positions of important QTL (e.g. for growth rates in animals or yield in
plants) in one species based on mapping studies from the others. Coincidence
of map positions would support the hypothesis that loci underlying natural
quantitative variation have been conserved during long periods of
evolutionary divergence (i.e. they are orthologous genes). The collinearity
of genomes among related species provides a central tool for the
determination of the relative allelism of genes in different species
(Bennetzen, 1996).

Many QTL mapping studies have been published for species connected by
comparative linkage maps, which can be used to infer some conclusions
regarding the hypothesis of conserved QTL among divergent species. Perhaps
the first evidence for orthologous QTL comes from comparative mapping in mung
bean and cowpea (Fatokun et al., 1992), where the researchers showed that the
single most important QTL for determining seed weight in these two distinct
species mapped to the same locus in both genomes and that the chance
occurrence of such coincidental mapping is very unlikely. Lin et al. (1995)
and Xiao et al. (1996c) discussed the putative orthologous QTL across grass
species. Despite different chromosome numbers and ploidy levels,
homoeologous relationships among rice, maize, wheat, oats and barley
chromosomes have been defined by using common anchor probes (Ahn and
Tanksley, 1993; Ahn et al., 1993; van Deynze et al., 1995a, b). This
information allows the comparison of locations of QTL affecting the same or
corresponding traits in different species. Some of the QTL documented show
similarities in locations for the same or similar traits. As an example, take
flowering traits (days to heading, flowering and anthesis). The QTL close to
CDO1081 on rice chromosome 3 coincides with a similar QTL on the homoeologous
chromosomes 1 (Stuber et al., 1992) and 9 (Koester et al., 1993; Veldboom et
al., 1994) of maize, chromosome 4 of barley (Hayes et al., 1993) and
chromosome 5 of hexaploid oat (Siripoonwiwat, 1995). Across 15 maize
populations studied by seven groups of researchers, 55 QTL or mutants
affecting flowering time were reported. A total of 26 (47%) are clustered in
five regions that span 12.1% of the maize genome. One flowering QTL reported
in sorghum (Pereira et al., 1994), three QTL in rice (Li et al., 1995), three
discrete mutants in wheat (Hart et al., 1993) and one in barley (Laurie et
al., 1994) were included. Such a coincidence of QTL map positions among these
distinct species
suggests that this kind of locus can be traced back to the last common
ancestor of these species. Paterson et al. (1995) indicated that in sorghum,
rice and maize, three similar phenotypes (seed size, disarticulation of the
mature inflorescence and day-neutral flowering) are largely determined by a
small number of QTL that correspond closely in the three taxa, which impels
the comparative mapping of complex phenotypes across large evolutionary
distances.

Further studies identified QTL controlling important agronomic traits that
showed similarities in locations for the same or similar traits (e.g. Fatokun
et al., 1992; Lin et al., 1995; Xiao et al., 1996c; for a review, see Xu,
1997). Shattering and plant height are examples that were also mapped to
collinear regions among grass genomes (Paterson et al., 1995; Peng et al.,
1999). Chen et al. (2003) identified four QTL for quantitative resistance to
rice blast that showed corresponding map positions between rice and barley,
two of which had completely conserved isolate specificity and the other two
had partially conserved isolate specificity. Such corresponding locations and
conserved specificity suggested a common origin and conserved functionality
of the genes underlying the QTL for quantitative resistance.

In forest trees, comparative genetic and QTL mapping was performed between
Quercus robur L. and Castanea sativa Mill., two major forest tree species
belonging to the Fagaceae family (Casasoli et al., 2006). Oak EST-derived
markers (sequence tagged sites, STSs) were used to align the 12 linkage
groups of the two species. Fifty-one and 45 STSs were mapped in oak and
chestnut, respectively. These STSs, added to SSR markers previously mapped in
both species, provided a total number of 55 orthologous molecular markers for
comparative mapping within the Fagaceae family. Homologous genomic regions
identified between oak and chestnut allowed comparison of QTL positions for
three important adaptive traits. Co-location of the QTL controlling the
timing of bud burst was significant between the two species. However,
conservation of QTL for height growth was not supported by statistical tests.
No QTL for carbon isotope discrimination was conserved between the two
species. Putative candidate genes for bud burst can be identified on the
basis of co-locations between EST-derived markers and QTL.

Schaeffer et al. (2006) reported a strategy for consensus QTL maps that
leverages the highly curated data in MaizeGDB, in particular, the numerous
QTL studies and maps that are integrated with other genome data on a common
coordinate system. In addition, they exploited a systematic QTL nomenclature
and a hierarchical categorization of over 400 maize traits developed in the
mid-1990s; the main nodes of the hierarchy are aligned with the trait
ontology at Gramene, a comparative mapping database for cereals. Consensus
maps are presented for one trait category, insect response (80 QTL); and two
traits, grain yield (71 QTL) and kernel weight (113 QTL), representing over
20 separate QTL map sets of ten chromosomes each.

The use of anchor markers has enabled detection of possible orthologous QTL
by comparing QTL across cereals or construction of phylogenetic
relationships. Although it is unclear how many claimed orthologous QTL are
real, detection of QTL that are common across cereals at least indicates that
the same QTL could be identified from very different genetic backgrounds.

In a significant across-species study, Campbell et al. (2007) identified a
set of evolutionarily conserved and lineage-specific rice genes, termed
conserved Poaceae-specific genes (CPSGs), reflecting the presence of
significant sequence similarity across three separate Poaceae subfamilies.
Using the rice genome annotation, along with genomic sequence and clustered
transcript assemblies from 184 species in the plant kingdom, they have
identified a set of 861 rice genes that are evolutionarily conserved among
six diverse species within the Poaceae yet lack significant sequence
similarity with plant species outside the Poaceae. It was interesting to note
that the vast majority of rice CPSGs (86.6%) encode proteins with no putative
function or functionally characterized protein domain and for the remaining
CPSGs, 8.8% encode
using these two common types of segregating populations.

QTL have been fine mapped by applying a mapping strategy based on analysis of
large progenies derived from NILs. This approach requires the construction of
highly inbred lines involving many generations prior to generating the cross
needed for fine mapping. Instead of homogenizing the complete genetic
background, as in the NIL approach, Peleman et al. (2005) have chosen to
focus specifically on the loci involved in expression of the phenotype. The
strategy involved simultaneous fine mapping of QTL already at the F2 stage
rather than producing inbred lines prior to fine mapping. The main principle
of the approach is the selective genotyping and phenotyping of only those
plants that yield information on the map position of the QTL. Such plants are
selected after a first rough-scale mapping by standard methods (e.g. 200 F2
individuals). After identification of the QTL for the trait of interest, a
larger part of the population (e.g. 1000 F2 plants) is screened with markers
flanking the QTL to identify sets of QTL isogenic recombinants (QIRs). QIR
plants carrying a recombination event in one QTL while they are homozygous at
all other QTL are most informative. The trait complexity can thus be reduced
to a monogenic trait as plants with all but one QTL having an identical
homozygous genotype are selected. These QIRs are subsequently genotyped with
sufficient markers at the recombinant QTL region to precisely map the
recombinant event within the QTL-bearing interval. Phenotyping other QIRs
becomes more reliable by reducing the trait complexity as these plants are
nearly isogenic for all QTL that affect the trait. Peleman et al. (2005)
demonstrated that for fine mapping oligogenic traits, homogenizing the
background genome is not required. The method was demonstrated by fine
mapping a QTL responsible for erucic acid content in rapeseed. For
quantitative traits that are controlled by many QTL each with relatively
small effects, progeny test and background selection with markers to cover
the whole genome would be required in order to minimize the genetic
background effect and to confirm the phenotype for the recombinants selected.

7.4.2 Heterogeneous genetic backgrounds

Although genetic distances and order of DNA markers are comparable among very
different crosses, QTL mapping using different populations derived from the
same cross has identified very different QTL. Only some QTL are common across
populations of different structures, such as DHs and RILs derived from a
single cross (He et al., 2001) where there is an identical set of genes
segregating. Heterogeneous genetic backgrounds can also come from various
crosses derived from different cultivars. Genetic materials with
heterogeneous genetic backgrounds can be used to estimate epistasis, detect
non-allelic QTL and discover multiple alleles. QTL mapping for the same
traits using different populations can be illustrated using seed dormancy in
barley as an example, where QTL were compared across seven RIL populations
and one DH population derived from crosses including 11 cultivated strains
and one wild barley strain showing a wide range of seed dormancy levels
(Hori et al., 2007). Linkage maps were constructed based on EST, SSR, RFLP
and morphological markers, each map consisting of 82 to 1114 markers (Table
7.2). Using these populations, a total of 38 QTL clustered around 11 regions
were identified on the barley chromosomes except chromosome 2H among eight
populations. The QTL at the centromeric region of the long arm of chromosome
5H was identified in all populations with different degrees of dormancy depth
and period (Fig. 7.3).

Considering several populations derived from diverse parental materials
increases the probability that a QTL will be polymorphic in at least one
population. To go beyond comparison of results between populations, some
authors have proposed jointly analysing the different populations. This can
be done first for independent
Table 7.2. Summary of linkage map information for eight permanent mapping populations in barley
(from Hori et al. (2007) with kind permission of Springer Science and Business Media).
Population | Number of markers: Morphological, EST, SSR, RFLP, AFLP, Total | Total map length (cM)
[Table rows not recovered.]
[Fig. 7.3 image not reproduced. Extracted labels for chromosome 5H, in extraction order. Marker loci: Bmac113, k06317, k08607, k04509, k03390, k10895, k09404, k04703, k00950, k00584, k06860, Bmag223, srh, k09282, Bmag113, k03192, k03993, k04768, k09350, k03846, k07669, k04431, ABC155, Bmag222. QTL boxes: populations DHHS, RI2, RI4, RIA, RI5, RHI, RI1 and RI3, each labelled by year (2003 or 2005) and weeks after ripening (5w or 10w).]
Fig. 7.3. A consensus barley linkage map based on eight mapping populations (for codes see Table 7.2)
and positions of QTL for seed dormancy at 5 and 10 weeks (5w and 10w, respectively) after ripening
in 2003 and 2005. Linkage groups are oriented with short arms at the top. The anchor loci, including
SSR, RFLP and morphological markers, are underlined. QTL positions are indicated by grey
boxes. Peaks of the significant marker intervals are indicated by triangles in boxes. Only chromosome 5H
is included, which shows large-effect QTL near the centromere on the long arm in all of the populations.
From Hori et al. (2007) with kind permission of Springer Science and Business Media.
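The remark in Section 7.4.2 that several populations from diverse parents raise the chance of a QTL being polymorphic in at least one of them can be given a rough numerical form. The sketch below assumes, purely for illustration, that each population has the same probability p of segregating at the QTL and that populations are independent, so the probability of at least one polymorphic population among k is 1 - (1 - p)^k. Real crosses that share parents are not independent, so this is only a back-of-the-envelope guide; the function names and the value of p are hypothetical.

```python
import math


def prob_polymorphic_in_any(p_single, n_populations):
    """Probability that a QTL segregates in at least one of k independent populations."""
    return 1.0 - (1.0 - p_single) ** n_populations


def populations_needed(p_single, target):
    """Smallest number of independent populations reaching the target probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


if __name__ == "__main__":
    p = 0.3  # assumed chance that two random parents differ at the QTL
    for k in (1, 2, 4, 8):
        print(f"{k} population(s): {prob_polymorphic_in_any(p, k):.3f}")
    print("populations needed for 95%:", populations_needed(p, 0.95))
```

Under these assumptions, eight populations with p = 0.3 give a probability of roughly 0.94, which shows why multi-population designs can detect QTL that a single biparental cross may miss.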
populations (no known pedigree relationship between the parents of the
different populations) (Muranty, 1996; Xu, 1998). In this case, QTL effects
are nested (in the statistical sense) within populations and the number of
parameters to be estimated increases with the number of populations. Also,
the lack of connections between populations does not allow global comparison
of the effects of all QTL alleles segregating in the different populations.
An alternative approach is therefore to develop connected populations (common
parents among populations). Under the assumption of additivity, considering
identical allelic effects over populations rather than nesting effects within
populations reduces the total number of parameters and consequently increases
the power of QTL detection (Rebai and Goffinet, 1993; Jannink and Jansen,
2001). In such an analysis, the effects of alleles segregating are estimated
simultaneously, which facilitates a global comparison. This is of particular
interest to identify the parental origin(s) of favourable allele(s) at each
QTL.

QTL for six quality traits in tomato (fruit weight, firmness, locule number,
soluble solid content, sugar content and titratable acidity) were studied in
order to investigate their individual effect and their stability over years,
generations and genetic backgrounds (Chaïb et al., 2006). Three sets of
genotypes corresponding to three generations were compared: (i) an RIL
population containing 50% of each parental genome; (ii) three BC3S1
populations segregating simultaneously for the five regions carrying fruit
quality QTL, but almost fully homozygous for the recipient genome on the
eight chromosomes carrying no QTL; and (iii) three sets of QTL-NILs (BC3S3
lines) which differed from the recipient line only in one of the five
chromosome regions. Eight of the ten QTL detected in RILs were recovered in
the QTL-NILs with the genetic background used for the initial QTL mapping
experiment, with the exception of two QTL for fruit firmness. Several new QTL
were detected. In the two other genetic backgrounds, the number of QTL in
common with the RILs was lower, but several new QTL were also detected in
advanced generations.

7.4.3 Epistasis

Importance of epistasis

The importance of epistasis to the genetic control of quantitative traits has
been debated. As early support for the importance of epistasis, Eshed and
Zamir (1996) reported that QTL epistasis is a significant component in
determining phenotypic values using tomato NILs. For the five
yield-associated traits, 20 to 40% of the 45 dichromosome segment
combinations were epistatic, which is much higher than would be expected by
chance alone. The detected epistasis was predominantly less-than-additive,
i.e. the effect of double heterozygotes was smaller than the sum of the
effects of the corresponding single heterozygotes. Several other studies
showed that the epistatic variance can account for a large proportion of the
genetic variance of quantitative traits (Carlborg et al., 2005; Malmberg and
Mauricio, 2005; Malmberg et al., 2005). Epistatic interaction among loci
could contribute substantially to the variation in complex traits (Carlborg
and Haley, 2004; Marchini et al., 2005).

In contrast, after a review of most studies conducted at that time, Tanksley
(1993) suggested that strong epistatic interactions are the exception and not
the rule for naturally occurring polygenes. These conclusions are supported
to some extent by the few studies in which individual QTL have been
genetically identified by introgression from other QTL in NILs and have been
shown to continue producing their same individual effects (De Vicente and
Tanksley, 1993; Eshed and Zamir, 1995) and also by a recent report on
negligible interaction using a barley DH population (Harrington × TR306)
developed by the North American Barley Genome Mapping Project for QTL
mapping, which consisted of 145 lines and 127 markers covering a total genome
length of 1270 cM. These DH lines were evaluated in 25 environments for seven
quantitative
traits: heading, height, kernel weight, lodging, maturity, test weight and
yield. Xu and Jia (2007) applied an empirical Bayes method that
simultaneously estimates 127 main effects for all markers and the epistatic
effects for all pairs of markers. The largest main-effect QTL (single marker)
and the largest epistatic effect (single pair of markers) explained 18 and
2.6% of the phenotypic variance, respectively. On average, the sum of all
significant main effects and the sum of all significant epistatic effects
contributed 35 and 6% of the total phenotypic variance, respectively.
Epistasis seems to be negligible for all seven traits. They also found that
whether two loci interact does not depend on whether or not the loci have
individual main effects. This invalidates the common practice of epistatic
analysis in which epistatic effects are estimated only for pairs of loci of
which both have main effects.

The contradicting reports may result from the fact that QTL mapping studies
and analytical methods have not been able to detect epistasis and thus the
conclusions could be biased, preferentially identifying genes that have large
effects and/or act independently (Xu, 1997). This argument is supported by
the results that QTL with large effects are detected in very different
crosses and environments. The second reason is that ordinary QTL analysis was
made with populations segregating for the whole genome simultaneously so that
it may be difficult to detect an interaction in a specific combination of QTL
genotypes. For example, Yano et al. (1997) predicted an interaction between
the two largest QTL, Hd1 and Hd2, for heading date. But the existence of
another QTL, Hd6, and its interaction could not be detected in their primary
population (F2), where many epistatic interactions could exist in so-called
minor QTL. Successful examples for detection of epistatic interactions by
using primary populations seem to be related to population sizes and
structures, quantitative traits, the number of existing QTL and QTL effects.
The more QTL involved, the more difficult is the detection of significant
differences for individual QTL. Although using a large population size may
help to detect epistatic interactions, it increases experimental errors, due
to increasing difficulty in managing such a population effectively.

Statistical methods for epistatic QTL

Methods for mapping QTL with epistatic effects are still immature. Some
methods utilize models including a single epistatic effect at a time
(Holland, 1998; Malmberg et al., 2005), while others apply a model selection
strategy that searches for multiple epistatic effects (Carlborg et al., 2000;
Yi et al., 2003, 2005; Baierl et al., 2006). Xu (2007) developed an empirical
Bayes method that can simultaneously estimate main effects of all individual
markers and epistatic effects of all pairs of markers. Recently, epistatic
QTL analysis has been extended to the genome-wide level. For example, Yi et
al. (2007) proposed a Bayesian model selection approach for mapping
genome-wide interacting QTL for ordinal traits in experimental crosses. Stich
et al. (2007) examined a genome-wide QTL mapping strategy using genome
sequence information of RILs that were generated from several crosses of
parental inbreds. The SNP haplotype data of B73 and 25 diverse maize inbreds
were used to simulate the production of various RIL populations. Higher power
to detect three-way interactions was observed for RILs derived from optimally
allocated distance-based designs than from nested designs or diallel designs.
The power and proportion of false positives to detect three-way interactions
using a nested design with 5000 RILs were for both the 4-QTL and 12-QTL
scenario of a magnitude that seems promising for their identification. To
find an optimal model for the epistatic effect, Bayesian model selection
(George and McCulloch, 1993), by taking advantage of Markov chain Monte Carlo
(MCMC) sampling, is a more efficient algorithm than both the exhaustive and
the heuristic searches. The simulation experiments conducted by Xu (2007)
showed that the MCMC-based methods performed satisfactorily when the sample
size was 600. The empirical Bayes method is more robust to small sample size
than the MCMC-based full Bayes methods. Considering that most
QTL studies reported so far have sample sizes smaller, or much smaller, than 600, a larger mapping population has to be created in order to use the MCMC-based full Bayes methods developed so far for QTL mapping where epistatic effects are involved.
Population strategies for epistatic QTL studies
QTL interaction has been analysed using different types of plant materials including a series of chromosomal substitution lines or QTL-NILs. If NILs are used, interaction between the target QTL and other major genes/QTL can be eliminated and only epistasis between multiple target QTL needs to be considered. With removal of noise from heterogeneous backgrounds, the proportion of variance explained by the target QTL will increase and minor QTL can be identified.
Epistatic interactions between QTL and the genetic background can be addressed using connected designs of multiple populations, provided the mating design contains loops (in the simplest case, three populations derived from three parents: A × B, B × C and A × C). In such designs, epistasis can be tested through the comparison between: (i) a connected additive model where the allele effects at a QTL are assumed to be identical in the different populations; and (ii) a hierarchical model where allele effects are nested within populations, which accounts for possible interactions with the genetic background. Such an analysis tests for consistency of allelic effects over populations and therefore permits evaluation of the contribution of QTL-by-genetic-background epistatic effects to variation in QTL results observed among populations, relative to that of other factors such as allelic relationships between parental inbreds and statistical noise. Tests for epistasis in connected designs following this principle have been proposed by several authors (Rebai et al., 1994; Charcosset et al., 1994; Jannink and Jansen, 2000, 2001). One of the advantages of these tests, when compared to testing only for digenic interactions, is to enable the detection of epistatic interactions of higher order (Charcosset et al., 1994). The statistical properties of QTL-by-genetic-background interaction tests in the case of a single digenic interaction have been analysed by means of simulations (Jannink and Jansen, 2001). The results showed that it was possible to identify the two QTL involved by using an appropriate statistical test, and the authors also proposed guidelines for the interpretation of the sign of the QTL-by-genetic-background interaction effects. For more complex situations, the results are less predictable. Several digenic epistatic interactions that involve a given QTL may add up if similar in sign, yielding a significant interaction with the genetic background even though none of them is significant on its own. They may also cancel each other out if opposite in sign and lead to no detectable interaction with the genetic background. It is therefore interesting to compare both types of interactions.
Blanc et al. (2006) presented results from six connected F2 populations of 150 F2:3 families each, derived from four maize inbreds and evaluated for three traits of agronomic interest using the MCQTL software (Jourjon et al., 2005). This software permits the joint analysis of multiple populations using a composite interval mapping method based on a linearized regression model (Haley and Knott, 1992; Charcosset et al., 2000). They first detected QTL in each population independently (single-population analysis), secondly on the whole design without taking into account connections (multi-population disconnected analysis), and then on the global design using connections (multi-population connected analysis). Lastly, they tested for digenic interactions and for locus-by-genetic-background interactions, estimated the contribution of epistasis to the variation of the traits studied and checked if epistatic interactions could explain discrepancies among the analyses. The joint estimation of the different parental allele effects in a connected model allowed them to identify, for each QTL, the parental inbred line(s) that carried the most interesting allele(s). Taking into account the connections between populations increased the number of QTL detected and the accuracy of QTL position estimates. Many epistatic interactions were
270 Chapter 7
detected, particularly for grain yield QTL (R² increase of 9.6%). Allelic relationships and epistasis both contribute to the lack of consistency for QTL positions observed among populations, in addition to the limited power of the tests.
Melchinger et al. (2008) derived quantitative genetic expectations of QTL effects obtained from one-dimensional genome scans with the triple testcross (TTC) design and pairwise interactions between marker loci using two-way analyses of variance (ANOVA) under the F2- and the F∞-metric model. It was demonstrated that the TTC design can partially overcome the limitations of design III in separating QTL main effects and their epistatic interactions in the analysis of heterosis, and that dominance × additive epistatic interactions of individual QTL with the genetic background can be estimated with a one-dimensional genome scan.
7.4.4 Multiple alleles at a locus
Two-parent derived populations in diploid crops have only two alleles segregating at each locus. Identification of multiple alleles requires comparison of populations derived from different crosses. To distinguish QTL alleles identified in one cross from those in another, all mapped alleles must be accurately sized and documented.
Rice amylose content, mainly controlled by the wx gene, can be taken as an example of multiple alleles at a locus. Wide variation in amylose content occurs and cultivars with different amylose content, varying from waxy (0–2%), very low (3–9%), low (10–19%) and intermediate (20–25%) to high (> 25%), have been selected in breeding programmes. Conventional genetic studies using cultivars with different amylose contents revealed transgressive segregation in F2s in almost all possible parental combinations (Pooni et al., 1993). A polymorphic microsatellite was identified in the wx gene (Bligh et al., 1995) located 55 bp upstream of the putative 5'-leader intron splice site. Ayres et al. (1997) determined the relationship between polymorphism at that locus and variation in amylose content. Eight wx microsatellite alleles were identified from 92 long-, medium- and short-grain US rice cultivars, which explained 85.9% of the variation. The amplified products ranged from 103 to 127 bp in length and contained (CT)n repeats, where n ranged from 8 to 20. Average amylose content in cultivars with different alleles varied from 14.9 to 25.2%. Using more diverse rice germplasm accessions (n = 243), Zeng et al. (2000) identified 15 alleles at the wx locus, using microsatellite class and G-T polymorphism, resulting in a total of 16 alleles identified so far. Now the question is whether the multiple alleles identified at the waxy locus can be associated with QTL alleles and whether the case can be extended to other traits or genetic loci.
Using molecular markers with multiple alleles in QTL mapping will help identify multiple QTL alleles. QTL studies using different populations have identified some common QTL. It is necessary, however, to further clarify whether they identified common or different alleles at those QTL. Reporting the sizes of associated alleles and using allele-rich markers in QTL studies will provide information required for this clarification, with the assumption that each marker allele has a corresponding QTL allele.
7.5 QTL across Growth and Developmental Stages
Classical breeding methods rely heavily on end-point measurements of agricultural productivity, which are influenced by different parameters and consequently different genes, in different environments. If more specific measures of agricultural productivity can be identified, for example, physical or chemical properties of the plant which relate directly to productivity under a particular environmental stress, it will be much more feasible to identify the underlying genes by mapping. However,
Plate 1. Circle diagram for comparative genomics in cereals. (From Gale and Devos (1998) © 1998 National Academy of Sciences, USA.)
Plate 2. Disequilibrium matrix for polymorphic sites within sh1. Polymorphic sites are plotted on both the x-axis and the y-axis. The pair-wise calculation of linkage disequilibrium (LD) (r²) is displayed above the diagonal with the corresponding P-values for Fisher's exact test displayed below the diagonal. Coloration is indicative of the corresponding P-value or r² values from the bars on the right. Notice that some blocks of LD do persist over larger distances within the gene, which do not necessarily correspond to tight linkage. (Reproduced with permission of Annual Reviews Inc., from Flint-Garcia et al. (2003); permission conveyed through Copyright Clearance Center, Inc.)
time intervals (e.g. Bradshaw and Stettler, 1995; Plomion et al., 1996; Verhaegen et al., 1997), from which the incremental or net effect of a QTL at each time interval can be estimated. This is called effect-increment analysis or conditional QTL mapping (Yan et al., 1998a). Phenotypic data collected at different growth stages or time intervals can be analysed either separately or jointly. Compared to separate analysis, joint analysis can synthesize all the information from different times or time intervals to give a comprehensive estimate of each QTL position, according to which a corresponding complete expression (or expression rate) curve of each QTL can be estimated (Wu et al., 1999). In practice, both separate and joint analyses should be conducted. A third approach to looking at the QTL over time is to do a multivariate analysis based on fitting the parameters of the growth curve (animal breeders call this general approach random regression).
The significant advantage of dynamic mapping is that it provides a quantitative framework for testing the interplay between genetic (inter)actions and the pattern of development in a time course. Dynamic mapping constructs a setting for precisely estimating and predicting a number of fundamental events in the genetic control of development (Wu et al., 2004), which include: (i) the timing of a QTL to turn on and off to affect growth in a time course; (ii) the duration of the dynamic genetic effect of a QTL; (iii) the magnitude of the genetic effect of a QTL on maximal growth rate; and (iv) the pleiotropic effect of the growth QTL on other developmental traits related to growth processes.
In general, there are four types of QTL effects in time-to-event experiments (as modified from Wu and Lin (2006) and Johannes (2007)). These include: (i) early-acting: QTL expressed at the early stage of the developmental process but not during the rest of the process; (ii) late-acting: QTL expressed only at the late stage of the developmental process; (iii) inversely acting: QTL highly expressed at the early stage but with low expression at the late stage, or vice versa; and (iv) proportionally acting: QTL either expressed with a proportionally increased or decreased rate or with a consistent rate.
As an early example in dynamic QTL mapping, Yan et al. (1998a,b) used rice IR64/Azucena DHs to study the developmental characteristics of QTL for tiller numbers and plant height by conditional and unconditional interval mapping, in combination with phenotyping these traits every 10 days after transplanting. They concluded that many QTL identified at the early stages were undetectable at the final stage. Conditional mapping identified more QTL than unconditional mapping. Temporal patterns of gene expression changed with developmental stages. Genes at a specific genomic region might have opposite genetic effects at various growth stages. For chromosomal regions significantly associated with plant height, conditional QTL were found only at one to several specific periods and no QTL for plant height was continually active during the entire period of growth.
7.5.3 Statistical methods for dynamic mapping
Several dynamic mapping methods that have been developed (Ma et al., 2002; Wu, W. et al., 2002; Wu et al., 2004; Wu and Lin, 2006) have made it possible to test interesting hypotheses about the quantitative genetic control of the rate of change in the phenotype as well as the time specificity of the genetic effects. To be informative, these latter methods require that a measured trait value can be obtained from the same individual at different time points and that the phenotype can be described as a process unfolding along a continuous trajectory.
The biological and statistical advantages of dynamic mapping result from joint modelling of the mean-covariance structures for developmental trajectories of a complex trait measured at a series of time points. While an increased number of time points can better describe the dynamic
pattern of trait development, significant difficulties in performing dynamic mapping arise from prohibitive computational times required as well as from modelling the structure of a high-dimensional covariance matrix. An efficient approach for applying dynamic mapping to high-dimensional data is through dimension reduction, i.e. the transformation that brings data from a high- to low-order dimension. Zhao et al. (2007) developed a statistical model for dynamic mapping of QTL that govern the developmental process of a quantitative trait on the basis of wavelet dimension reduction. By breaking an original signal down into a spectrum by taking its averages (smooth coefficients) and differences (detail coefficients), they used the discrete Haar wavelet shrinkage technique to transform an inherently high-dimensional biological problem into its tractable low-dimensional representation within the framework of dynamic mapping constructed by a Gaussian mixture model. The wavelet-based parametric dynamic mapping holds great promise as a powerful statistical tool to unravel the genetic machinery of developmental trajectories with large-scale high-dimensional data.
To be informative, methods based on the test of time specificity of the genetic effects require that a measured trait value can be obtained from the same individual at different time points and that the phenotype can be described as a process unfolding along a continuous trajectory. Johannes (2007) developed the idea of time-varying QTL effects in the context of time-to-event analysis. An extension of the Cox model (EC model) (Therneau and Grambsch, 2000) was applied to an interval-mapping framework. In its simplest form, this model assumes that the QTL effect changes at some time point t0 and follows a linear function before and after this change point. The approximate time point at which this change occurs is estimated. Using simulated and real data, the mapping performance of the EC model was compared to the Cox proportional hazards (CPH) model, which explicitly assumes a constant effect. The results showed that the EC model detects time-dependent QTL, which the CPH model fails to detect. At the same time, the EC model recovered all of the QTL the CPH model detects. It was concluded that potentially important QTL may be missed if their time-dependent effects are not accounted for.
The most cumbersome issue in multiple QTL mapping for dynamic traits is how to determine the optimal number of QTL. To do this, variable selection via stepwise regression is commonly used in maximum-likelihood mapping. Reversible-jump Markov chain Monte Carlo (RJ-MCMC) is the corresponding variable selection procedure used in Bayesian analysis. However, RJ-MCMC has been shown to be subject to poor mixing and slow convergence to the stationary distribution. Bayesian shrinkage analysis and stochastic search are more efficient approaches to variable selection than RJ-MCMC (reviewed in Yang and Xu, 2007). In these methods, no variable selection is conducted in an explicit manner; rather, a treatment similar to variable selection is made implicitly by shrinking the effects of excess QTL to zero. Yang, R.Q. et al. (2006) developed an interval-mapping procedure to map QTL for dynamic traits under the maximum-likelihood framework. They fitted the growth trajectory by Legendre polynomials. The method was intended to map one QTL at a time and the entire QTL analysis involved scanning the entire genome by fitting multiple single-QTL models. Yang and Xu (2007) proposed a Bayesian shrinkage analysis for estimating and mapping multiple QTL in a single model. The method combines shrinkage mapping for individual quantitative traits with Legendre polynomial analysis, an extensively used linear growth model in animals, for dynamic traits. A simulation study showed that the method generated a much better signal for QTL than the interval-mapping approach.
Although various statistical methods have been developed to meet the requirements of dynamic mapping for different trait categories, the effectiveness and efficiency of these methods need further study. Application of these methods to QTL mapping needs full support of user-friendly mapping software.
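As a rough illustration of the Legendre polynomial step mentioned in this section, the sketch below fits a single growth trajectory with a low-order Legendre series using NumPy. It covers only the trajectory-fitting idea, not the interval-mapping or shrinkage procedures of the cited authors. The function names, the polynomial order and the synthetic measurements are all assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import legendre


def rescale_time(t):
    """Map measurement times linearly onto [-1, 1], the Legendre domain."""
    t = np.asarray(t, dtype=float)
    return 2.0 * (t - t.min()) / (t.max() - t.min()) - 1.0


def fit_trajectory(t, y, order=2):
    """Least-squares Legendre coefficients for one trajectory (hypothetical helper)."""
    return legendre.legfit(rescale_time(t), y, order)


def predict_trajectory(t, coef):
    """Evaluate the fitted series; t must span the same time range as the fit."""
    return legendre.legval(rescale_time(t), coef)


# Synthetic example: a trait scored every 10 days (values are made up).
days = np.arange(10, 90, 10)
true_coef = np.array([50.0, 30.0, -5.0])          # assumed Legendre coefficients
heights = legendre.legval(rescale_time(days), true_coef)

coef = fit_trajectory(days, heights, order=2)
print(np.round(coef, 3))
```

With noise-free synthetic data the fit recovers the coefficients used to generate it. With real measurements, the few coefficients per individual (rather than every time point) would be the quantities carried into a mapping model.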
7.6 Multiple Traits and Gene Expression
Plant breeders manage numerous phenotypes simultaneously in order to develop a suitable breeding product for a suitable environment. Geneticists face the same challenge when handling many transcripts in genetic mapping for gene expression. In terms of the complexity and variability, both phenotypes handled by breeders and transcripts handled by geneticists belong to the same category: multiple traits. However, discussion in this section will focus on gene expression. All issues discussed in this section can be found in the corresponding discussion on multiple quantitative traits in Chapter 1.
7.6.1 Features of gene expression
An emerging approach is to ask whether the parameters of gene activity at the level of transcription regarding additivity, heritability and complexity parallel those of classical phenotypic traits (Gibson and Weir, 2005). There are six features that characterize gene expression studies. First, there is now a reasonable expectation that for any tissue from any organism sampled under a particular set of environmental conditions, 10–50% of the transcripts will be found to vary as a result of heritable differences (Stamatoyannopoulos, 2004).
Secondly, numerous examples were observed of non-additivity of transcription including over- and under-dominance (F1 with higher or lower expression, respectively, than either parent), parent-of-origin, maternal and reciprocal F1 effects, indicating an unexpected complexity to the mapping of genotype on to the transcriptional phenotype. Similar results have been observed in studies targeting specific candidate genes in maize (Auger et al., 2005) and wheat (Sun et al., 2004) and in a massively parallel signature sequencing (MPSS) analysis of hybrid oysters (Hedgecock et al., 2002).
Thirdly, genetic complexity of transcript levels is reflected by the QTL number and effect size and also many forms of genetic complexity. Direct evidence for genetic complexity of transcript levels comes from detecting multiple QTL for at least some expression traits. Even the detected QTL typically explain only a minority of trait variation (Rockman and Kruglyak, 2006). In yeast, the median phenotypic effect of a detected QTL was 27% of genetic (inheritable) variance explained and only 23% of traits had a QTL that explained > 50% of the genetic variance (Brem and Kruglyak, 2005; Fig. 7.4). Whereas visible trait variation is often described by several QTL that collectively account for up to half of the genetic variance and individually rarely > 20% of it, eQTL accounting for 25–50% of transcriptional variation are prevalent (as summarized by Gibson and Weir, 2005). It is clear that major-effect QTL are more prevalent than many investigators would have expected.
Fourthly, transcriptional variation is probably highly polygenic. It is important to recognize that even in the cases where a major-effect eQTL explains half of the genetic variance for transcript abundance, the other half remains to be accounted for and in most cases will be caused by undetected loci. Because conservative thresholds of detection are required to adjust for the extraordinarily large number of comparisons involved in a genome-wide linkage scan for several thousand transcripts (the so-called multiple comparison problem), most true eQTL remain undetected. Based on a yeast data set studied by Brem and Kruglyak (2005), transcription is more often likely to be highly polygenic than monogenic: only 3% of highly heritable transcripts are consistent with single locus inheritance, 18% suggest control by two loci and > 50% require at least five loci under an additive model (Fig. 7.5). They also argue that more than half of the transcripts show transgressive segregation (transcript abundance in F2 progeny falls outside the range of both grandparents) and that > 15% are better explained by models that include epistatic interaction. Clearly, the landscape of gene expression in yeast is genetically complex and it can be expected that it will be, if anything, more complex in higher eukaryotes.
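To give a feel for the multiple comparison problem raised under the fourth feature, the sketch below computes a Bonferroni-adjusted per-test significance threshold for a scan of many transcripts against many markers. Bonferroni is only one possible correction and is an assumption here; the text does not say which adjustment the cited studies used. The function name and the example numbers are likewise hypothetical.

```python
def bonferroni_threshold(alpha, n_transcripts, n_markers):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha.

    Assumes every transcript is tested against every marker and that a plain
    Bonferroni correction is applied (a deliberately conservative choice).
    """
    n_tests = n_transcripts * n_markers
    return alpha / n_tests


# Hypothetical scan: 5000 transcripts tested at 1000 marker positions.
threshold = bonferroni_threshold(alpha=0.05, n_transcripts=5000, n_markers=1000)
print(f"{threshold:.1e}")
```

With five million tests, a single eQTL has to reach a p-value near 1e-08 to be declared significant, which illustrates why many true eQTL of modest effect go undetected.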
[Figure 7.4: histogram. x-axis: Fraction of genetic variance (0 to 1.0); y-axis: Fraction of loci (0 to 0.30).]
Fig. 7.4. Most gene expression traits are affected by multiple loci. Each bar represents the fraction of
QTL that explain a percentage of genetic variance in the range on the x-axis. For each trait with significant
linkage(s), only the single most significant QTL is included. Data are derived from the first table in Brem
and Kruglyak (2005). The panels below the plot show examples of QTL that explain, from left to right,
low (10%), average (29%) and high (94%) percentages of genetic variance. In each panel, the left-most
column shows the relative expression of the corresponding genes in all 112 segregants (Seg), the next
two columns show the expression in replicates of the two parent strains (BY, RM), and the last two col-
umns show the expression in the segregants that inherit the QTL alleles for the first and second parent
strains (Seg BY, Seg RM). From Rockman and Kruglyak (2006) reprinted by permission from Macmillan
Publishers Ltd.
[Figure 7.5: plot. x-axis: Number of eQTL (n), 1 to 10; y-axis ticks from 0 to 70; series labelled Max = n, Min < n + 1 and Min = n.]
Fig. 7.5. Inference of polygenic regulation from eQTL analysis. A plot of the data from Brem and Kruglyak (2005) showing the range of complexity of transcriptional regulation in yeast inferred by their likelihood analysis. The error bars indicate the range of the percentage of differentially expressed transcripts in the F2 segregation that are predicted to be regulated by n genes indicated on the x-axis. The large Xs indicate the minimum number of transcripts regulated by more than n eQTL: for example, at least 20% of transcripts are predicted to have more than ten eQTL. The circles place a lower limit on the number of transcripts regulated by up to n eQTL: for example, at least 10% of transcripts are regulated by four or fewer eQTL. Reprinted from Gibson and Weir (2005) with permission from Elsevier.
genes and environmental factors (Jansen and Nap, 2001). This approach has facilitated the identification of genomic regions or eQTL associated with transcript variation in co-regulated genes and, when correlated with phenotypic data from a quantitative character, has successfully identified candidate genes by co-localizing gene eQTL and trait QTL (Brem et al., 2002; Klose et al., 2002; Wayne and McIntyre, 2002; Schadt et al., 2003; Rockman and Kruglyak, 2006; Keurentjes et al., 2007b).
Plants exhibit massive changes in gene expression during morpho-physiological and reproductive development as well as when exposed to a range of biotic and abiotic stresses. These have been observed as differences in transcriptional profiles in many crops. Variation in transcript abundance is underlying variation in growth in an interspecific BC population of eucalyptus. QTL analysis of transcript levels of lignin-related genes showed that their mRNA abundance is regulated by two genetic loci, coordinating genetic control of lignin biosynthesis. These two loci co-localize with QTL for growth, suggesting that the same genomic regions are regulating growth and lignin content and composition. Using a high-density oligonucleotide array and phenotypically divergent rice accessions and their transgressive segregants, Hazen et al. (2005) measured the expression of approximately half of the genes in rice (21,000) to associate changes in stress-regulated gene expression with QTL for osmotic adjustment (OA), which is a known mechanism of drought tolerance. A total of 662 transcripts were observed to be expressed differentially between the parental lines. Only 12 genes were induced in the low OA parent (CT9993) at moderate dehydration stress levels while over 200 genes were induced in the high OA parent (IR62266). Sixty-nine genes were upregulated in all high OA lines and nine of those genes were not induced in any of the low OA lines, of which four could be annotated as follows: sucrose synthase, a pore protein, a heat shock protein and a late embryogenesis abundant (LEA) protein. Previous conventional QTL mapping using the same two rice accessions showed that the parental genotypes differed for five of the OA QTL, that two of these QTL are syntenic with other cereal drought stress QTL (Zhang et al., 2001) and that a major OA QTL in the same genomic region on rice chromosome 7 is also reported in a different cross (Lilley et al., 1996). Of the 3954 probes that correspond to this part of the chromosome, few showed a differential expression pattern between the high and low OA lines. Thus, these preliminary results demonstrate the power of integrating quantitative analysis of gene expression data with genetic map information to identify genetic and metabolic networks that would not have been identified through conventional QTL analysis.
Guo, M. et al. (2006) applied genome-wide transcript profiling to gain a global picture of the ways in which a large proportion of genes are expressed in the immature ear tissues of a series of 16 maize hybrids that vary in their degree of heterosis. Key observations include: (i) the proportion of allelic additively expressed genes is positively associated with hybrid yield and heterosis; (ii) the proportion of genes that exhibit a bias towards the expression level of the paternal parent is negatively correlated with hybrid yield and heterosis; and (iii) there is no correlation between the over- or under-expression of specific genes in maize hybrids with either yield or heterosis. The relationship of the expression patterns with hybrid performance is substantiated by analysis of a genetically improved modern hybrid (Pioneer hybrid 3394) versus a less improved older hybrid (Pioneer hybrid 3306) grown at different levels of plant density stress. The proportion of allelic additively expressed genes is positively associated with the modern high-yielding hybrid, heterosis and high-yielding environments, whereas the converse is true for the paternally biased gene expression. The dynamic changes of gene expression in hybrids responding to genotype and environment may result from differential regulation of the two parental alleles. Their findings suggested that differential allele regulation may play an important role in hybrid yield or heterosis and provide a new insight into the molecular understanding of the underlying mechanisms of heterosis.
Recently, Keurentjes et al. (2007b) described the results of genome-wide expression variation analysis in an RIL population of A. thaliana and for many genes variation in expression could be explained by eQTL. The nature and consequences of this variation are discussed based on additional genetic parameters, such as heritability and transgression and by examining the genomic position of eQTL versus gene position, polymorphism frequency and gene ontology. In addition, the authors have developed an approach for genetic regulatory network construction by combining eQTL mapping and regulatory candidate gene selection.
Most eQTL mapping studies to date have searched for eQTL by analysing gene expression traits one at a time. As thousands of expression traits are typically analysed, this can reduce power because of the need to correct for the number of hypothesis tests performed. In addition, gene expression traits exhibit a complex correlation structure, which is ignored when analysing traits individually. To address these issues, Biswas et al. (2008) applied two different multivariate dimension reduction techniques, the singular value decomposition (SVD) and independent component analysis (ICA), to gene expression traits derived from a cross between two strains of Saccharomyces cerevisiae. In total, 21 eQTL were found, of which 11 were novel, and both cis- and trans-linkages to the meta-traits were observed. These results demonstrated that dimension reduction methods are a useful and complementary approach for probing the genetic architecture of gene expression variation.
As we have discussed earlier, a range of biological and statistical tools enable research on natural variation to move from simple reductionistic studies focused on individual genes to integrative studies connecting molecular variation at multiple loci with physiological consequences. Hansen et al. (2008) provide a comprehensive review focusing on recent examples that demonstrate how expression QTL data can be used for gene discovery and to untangle complex regulatory networks. The latter is also briefly discussed in Chapter 10.
7.7 Selective Genotyping and Pooled DNA Analysis
As introduced in Chapter 6, replacing individual genotyping by selecting only the individuals from the high and low tails of the population distribution (selective genotyping) or DNA analysis in pools of the selected individuals (pooled DNA analysis) was proposed for QTL analysis and for testing of linkage between markers and a major gene. This concept is referred to as
tail analysis (Hillel et al., 1990; Dunnington et al., 1992; Plotsky et al., 1993), bulked
segregant analysis (Giovannoni et al., 1991; Michelmore et al., 1991), or selective DNA
pooling (Darvasi and Soller, 1994) and is an effective solution to reduce costs associated
with genotyping large mapping populations. As reducing the size of a QTL mapping population
will decrease the detection power (Charcosset and Gallais, 1996) and also increase the QTL
confidence interval, as well as the risk of detecting false QTL, selective genotyping can
save much more cost than using smaller population sizes while maintaining the same mapping
power as the large populations. Take a large population with 500 individuals and select 25
individuals from each tail, which means that selective genotyping will only cost 10%
(= 2 × 25/500) of the total cost required for genotyping the whole population. When pooled
DNA analysis is used, two tails can be genotyped as two individuals, which brings the
genotyping cost down to 0.4% (= 2/500) of the total cost. Apparently, the bigger the
original population size, the more saving there will be on all related costs including
genotyping.

one for the other at the two marker loci. The result is that each DNA pool is homozygous
at all loci within and adjacent to the target region. However, the homozygous target region
differs between the two pools in parental origin, thus providing the basis for selection of
polymorphic markers specific to the target region. When pooled DNA samples are subsequently
utilized as templates for random primer amplification via PCR, polymorphism should result
only if the primer primes within or adjacent to the target interval. This polymorphism can
also be detected by probing with other molecular markers.

The genotyping in this approach becomes very simple since it relies on just two DNA pools
each of plants from one or other of the phenotypic extremes (Giovannoni et al., 1991;
Michelmore et al., 1991). The pools that have been used in plants usually consist of 10–15
individuals taken from as large a population as possible. This approach has already been
successfully used in many plants (e.g. Barua et al., 1993; Hormaza et al., 1994; Villar
et al., 1996; van Treuren, 2001; Zhang et al., 2002).
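The cost comparison given above for a population of 500 individuals with 25 plants per tail can be checked with a few lines of Python. This is only a minimal sketch: the function name and parameters are illustrative and do not come from the text, and it assumes that genotyping cost is simply proportional to the number of samples genotyped.

```python
def genotyping_cost_fraction(population_size, tail_size, pooled=False):
    """Cost of selective genotyping as a fraction of whole-population cost.

    Assumption (not from the text): cost scales linearly with the number
    of samples that are genotyped.
    """
    if population_size <= 0 or tail_size <= 0:
        raise ValueError("sizes must be positive")
    if 2 * tail_size > population_size:
        raise ValueError("the two tails cannot exceed the population")
    # With pooled DNA analysis each tail is genotyped as one bulked sample.
    samples = 2 if pooled else 2 * tail_size
    return samples / population_size


if __name__ == "__main__":
    print(f"{genotyping_cost_fraction(500, 25):.1%}")               # 10.0%
    print(f"{genotyping_cost_fraction(500, 25, pooled=True):.1%}")  # 0.4%
```

Because the number of samples stays fixed while the population grows, the same call with a larger `population_size` shows the relative saving increasing with population size.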
[Fig. 7.6 graphic: panel A, R plants and S plants; panel B, high tail and low tail; rows show population distribution, selection, DNA pools and allele frequency]
Fig. 7.6. Selective genotyping and pooled DNA analysis. (A) Pooled analysis using disease resistant
(R) and susceptible (S) plants as an example. DNA pools are constructed from R and S plants selected
from a mapping population and then genotyped by molecular markers. When the two DNA pools show
different alleles at a specific marker locus, the marker is linked with the disease response, while when
both pools show the same heterozygous genotype, the marker is unlinked with the disease response.
(B) Pooled DNA analysis using extreme plants selected for a target quantitative trait from two tails of
a normal distribution in the mapping population. Marker–trait linkage is revealed by allele frequency at
specific marker loci. When allele frequencies are significantly different between two pools at a marker
locus, the marker is linked with the target trait, while when the allele frequencies are very close to each
other (each approximately 0.5), the marker is unlinked with the target trait. In both A and B, assume
that the marker is dominant and reveals polymorphisms between the parental lines that are used to
derive the mapping population.
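The decision rule in panel B of Fig. 7.6 (a marker is linked when the allele frequencies of the two tails differ significantly, and unlinked when both are close to 0.5) can be illustrated with a simple two-proportion z-test. The text does not say which test is used, so the test, the function name and the example counts below are all assumptions made for illustration; the counts are numbers of sampled alleles, not plants.

```python
import math


def allele_frequency_z_test(count_high, n_high, count_low, n_low):
    """Compare the frequency of one parental allele between two tails.

    count_*: copies of the allele observed in each tail
    n_*:     total alleles sampled in each tail
    Returns (frequency difference, z statistic, two-sided p-value).
    """
    p_high = count_high / n_high
    p_low = count_low / n_low
    pooled = (count_high + count_low) / (n_high + n_low)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_high + 1 / n_low))
    if se == 0:  # allele fixed or absent in both tails: no difference
        return p_high - p_low, 0.0, 1.0
    z = (p_high - p_low) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return p_high - p_low, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: a linked marker, then an unlinked one.
    print(allele_frequency_z_test(50, 60, 12, 60))
    print(allele_frequency_z_test(31, 60, 29, 60))
```

With pooled DNA the allele counts are not observed directly, so a real analysis would have to work from estimated frequencies and their measurement error rather than from counts as done here.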
plants (Foolad and Jones, 1993; Zhang, L.P., et al. 2003; Wingbermuehle et al., 2004; Coque
and Gallais, 2006). It is especially useful for genes that have large effects on the trait
of interest. It can also be used for traits controlled by a few major-effect QTL (Quarrie
et al., 1999). Furthermore, in maize, with two cycles of recurrent selection on phenotype
from a population of F4 independent families, Moreau et al. (2004) have shown that the
significant changes in marker allele frequency were for a marker locus located in the
vicinity of the detected QTL. An additional interest of the marker frequency approach is
that it makes it possible to use DNA pooling of selected individuals to estimate the
frequencies needed for the tests (Darvasi and Soller, 1994).

7.7.3 The power of selective genotyping and pooled DNA analysis

There are several problems associated with pooled or bulked DNA analysis in plants
as summarized by Xu and Crouch (2008). These include: (i) a relatively small number of
markers has been used to try to cover the whole genome with the assumption that the
recombinant frequencies are consistent across the genome and genes of interest can be
readily identified within a marker density of 15–25 cM; (ii) contrasting individuals have
been selected from a relatively small population size so that the phenotypic difference
between the pools may be only big enough for identification of large-effect genes/QTL;
(iii) when allele signal is judged by a gel-based genotyping system, allele frequency in
each pool cannot be quantified accurately and the allele signal generated by a small
percentage of individuals in the pool cannot be detected and thus, the genetic difference
between the pools can be only scored as presence and absence; and (iv) because of the above
reasons, a relatively small number of individuals (about 15) is included in each pool to
guarantee that the real associated markers will not be missed, at a cost of a high level of
false positives (a marker and a trait are not really associated with each other but are
still indicated as such statistically). The false positive markers have to be eliminated by
a whole population validation step with all putative markers.

Simulation studies have been carried out by Xu et al. (2008) and Sun et al. (2009) using
QTL ICIMAPPING (available at http://www.isbreeding.net), an integrated computing package
for common QTL mapping methods including single marker analysis, traditional interval
mapping (Lander and Botstein, 1989), and inclusive composite interval mapping for additive
(Li et al., 2007) and interacting (Li et al., 2008) QTL. Several parameters associated with
selective genotyping were simulated based on the assumption that phenotypic extremes from
two tails of a recombinant inbred population can be reliably selected and that they can be
genotyped either individually so that the allele frequency in each tail can be inferred or
genotyped using bulked DNA from each tail so that the allele frequencies can be estimated
based on the relative signal strength of two DNA pools. Simulated parameters include total
population size (ranged from 200 to 3000), tail population size (15–100 plants in each
tail, equivalent to 1350% of selection rate), number of QTL (1–5), marker density
(1–15 cM), QTL effect (explaining 1–20% of phenotypic variation), two linked QTL and two
QTL with epistatic interaction. One hundred simulation runs were carried out for each
scenario from which the power of QTL detection and the mean LOD score were then calculated.

Comparative analysis of two selective genotyping strategies (Fig. 7.7) indicated that
conventional selective genotyping (Fig. 7.7A, Strategy A, where relatively small total and
tail population sizes were used with a low density of marker coverage), resulted in the
detection of only one marker in the target region with an average LOD score of 3.94 and
power of detection of 67%. In contrast, Strategy B (Fig. 7.7B), where large total and tail
population sizes were used along with a high density of marker coverage, resulted in the
detection of multiple markers around the target region with the highest having a LOD score
of 10.37 and a power of detection of 98%.

When various QTL effects (responsible for 1–20% of the total phenotypic variation), tail
sizes (15–100) and total population sizes (200, 500, 1000 and 3000) (Fig. 7.8) were used in
the simulation analysis, the power of QTL detection indicated the optimum total and tail
population sizes required for detection of small QTL. To identify QTL explaining 15% of the
phenotypic variation with a 95% or higher power of QTL detection, a population size of 200
or more with a minimum tail size of 15 is required, which matches most reported cases of
successful use of bulked DNA analysis. However, to detect QTL of small effect, ranging from
3 to 10% of the phenotypic variation, 50–100 individuals needed to be selected from each
tail of a population with 1000 individuals, in order to have a 95% power of QTL detection
(Fig. 7.8). The simulation analysis also indicated that the power of detection would not
change when multiple QTL (two to five) are involved, provided they are independent of each
other. The simulation also indicated that selective genotyping can be also used to
Molecular Dissection of Traits: Practice 281
[Fig. 7.7 graphic: panels A and B plot power (%) and LOD against map position (cM, 0–150); reference lines at LOD = 3.0 in panel A and LOD = 6.0 in panel B]
Fig. 7.7. Effects of selective genotyping strategies on detection power and mean LOD score around the
target region (15 cM, grey area) assuming the QTL explain 10% of phenotypic variation. (A) Strategy A:
population size = 200, tail size = 15, marker density = 15 cM, resulting in only one marker showing posi-
tive in the target region with an LOD score = 3.94 and power of detection = 67%, which has been widely
used in conventional bulked DNA analysis. (B) Strategy B: population size = 500, tail size = 30, marker
density = 1 cM, resulting in multiple markers showing positive in the target region with LOD = 10.37 and
power of detection = 98%, which is proposed for selective genotyping-based fine mapping.
[Fig. 7.8 graphic: four panels (n = 200, 500, 1000 and 3000), each plotting power (%) against tail size (15–100) and QTL effect (1–20%)]
Fig. 7.8. Detection power of selective genotyping under various QTL effects (1–20%, percentage
phenotypic variation explained by identified QTL), tail sizes (15–100) and total population sizes
(200–3000). A total of 100 permutations were implemented for each case and each combination.
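The way detection power depends on population size, tail size and QTL effect can be explored with a toy Monte Carlo simulation. The sketch below is not the QTL ICIMAPPING procedure behind Figs 7.7 and 7.8 and will not reproduce their values. It assumes a single QTL in a recombinant inbred population, a marker located exactly at the QTL, a normally distributed residual, and a hypothetical z-score threshold in place of a LOD threshold; all names and defaults are illustrative.

```python
import math
import random


def simulate_power(n, tail, pve, runs=100, z_threshold=3.5, seed=1):
    """Fraction of simulated populations in which the marker is detected.

    n:    total population size (homozygous lines, genotype -1 or +1)
    tail: number of lines selected from each phenotypic tail
    pve:  proportion of phenotypic variance explained by the QTL (0 < pve < 1)
    """
    rng = random.Random(seed)
    # Residual SD is fixed at 1, so the allele effect follows from pve.
    effect = math.sqrt(pve / (1.0 - pve))
    detected = 0
    for _ in range(runs):
        lines = []
        for _ in range(n):
            g = rng.choice((-1, 1))
            lines.append((effect * g + rng.gauss(0.0, 1.0), g))
        lines.sort()  # ascending phenotype
        low = sum(1 for _, g in lines[:tail] if g == 1)
        high = sum(1 for _, g in lines[-tail:] if g == 1)
        pooled = (high + low) / (2 * tail)
        se = math.sqrt(pooled * (1 - pooled) * 2 / tail)
        # Two-proportion z statistic for the tail allele frequencies.
        if se > 0 and abs(high - low) / tail / se > z_threshold:
            detected += 1
    return detected / runs


if __name__ == "__main__":
    for n, tail, pve in [(200, 15, 0.01), (200, 15, 0.15), (1000, 50, 0.05)]:
        print(n, tail, pve, simulate_power(n, tail, pve))
```

Marker spacing, recombination between marker and QTL, multiple QTL and epistasis are all ignored here, so the sketch only conveys the direction of the trends discussed in the text.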
ANOVA or is better whereas USG is less powerful or equivalent.

All-in-one plate genetic mapping of all target traits in one step

agronomic traits of importance in a crop species (Xu et al., 2008; Sun et al., 2009).

Genome-wide association mapping

Developments in SNP genotyping technologies and methodologies recently reported in human
genomics have made it possible now to carry out genome-wide linkage-disequilibrium-based
association mapping in human beings by using an integrated technology package including
selective genotyping, pooled DNA analysis and microarray-based SNP genotyping with 100,000
markers (Sham et al., 2002; Meaburn et al., 2006; Yang, H.-C. et al., 2006a). This system
has the power to estimate allele frequencies and identify unique alleles from a pooled DNA
sample of several hundreds of individuals. If this approach is successfully translated to
plants it will resolve many of the constraints of pooled DNA analysis. The high frequency
of false positive markers that would be detected when substantially fewer plants are used
in each pool could be avoided if a DNA pool can be formed using many more plants selected
from a large population. However, optimizing SNP genotyping systems for pooled DNA analysis
is considerably more complicated than for SSR markers and suffers a much higher level of
redundancy. Where this has been achieved in human genomics, it required at least half a
million SNPs as a starting point in order to identify 100,000 optimized SNPs suitable for
pooled analysis. This density of SNP markers is available in rice and maize and in due
course other crops when whole genome sequences are generated.

Genome-wide association mapping may provide a shortcut to discovering functional alleles
and allelic variations that are associated with agronomic traits of interest. Selective
genotyping and pooled DNA analysis can be extended to using inbred lines with extreme
phenotypes selected from various collections of germplasm. This is in principle similar to
linkage-disequilibrium-based association mapping but using selected phenotypic extremes.
For association mapping of quantitative traits governed by a large number of minor genes
that interact with each other and the environment, selective genotyping will face the same
challenges as experienced with linkage-based QTL mapping using entire population
genotyping.

Integration with selective phenotyping

The selective phenotyping method involves preferentially selecting individuals to maximize
their genotypic dissimilarity. Selective phenotyping is most effective when prior knowledge
of genetic architecture allows focus on specific genetic regions (Jin et al., 2004;
Jannink, 2005) and specific allele combinations. As genotyping becomes cheaper, it may be
more efficient to first carry out low density genotyping of the whole population in order
to identify the most informative subset of individuals in terms of minimum level of
relatedness between individuals plus optimum subpopulation structure and allele
representativeness. Precision phenotyping of this subset can then be carried out,
particularly for the traits that are difficult or expensive to evaluate. Finally, dense
whole genome genotyping can be carried out on the individuals from the tails of the
phenotypic distribution. In this way, the total number of individuals to be phenotyped and
genotyped may not change, but the power of the analysis will be dramatically increased.
This approach could also be achieved for traits where phenotypic extremes can be easily
identified by using a simple screening method, for example abiotic stress tolerance where a
large number of plants/families can be eliminated easily under stress conditions through
visual scoring. As the original population can be selected under a strong environmental
stress to eliminate a large proportion of the plants, only the most stress tolerant, and
probably also the most stress sensitive, plants are selected for genotyping. Following
selective genotyping of the individuals with extreme phenotypes, precision phenotyping of
the resultant subset of individuals can be carried out using physiological component and
surrogate traits. High-density planting and selection at early stages of plant development,
combined with selective phenotyping
[Fig. 7.9 flowchart. (A) Large-size segregating populations (F2, BC, composite F1, RIL, DH, etc., n > 500); high phenotypic extreme and low phenotypic extreme (30–50 plants/fixed lines each, selected from 500+ plants/fixed lines); extraction of DNA. (B) Large-size segregating populations (F2, BC, composite F1, RIL, DH, etc., n > 500); phenotypic extremes (each with 30–50 plants/fixed lines, selected under a target environment) and phenotypic control (30–50 plants/fixed lines, randomly selected under a normal environment); extraction of DNA.]
Fig. 7.9. Flowchart for large-scale selective genotyping and genetic mapping, including: selection of phenotypic extremes from large-size segregating populations,
phenotype confirmation, DNA extraction, genotyping and marker–trait association analysis. (A) A procedure for most target traits which can be scored phenotypically
for all individuals/fixed lines, and then high- and low-phenotypic extremes are selected for further analysis. (B) A procedure particularly suitable for abiotic and biotic
stress tolerance where only the phenotypic extreme for tolerance is available under a target environment and comparison is made between the extreme and the
phenotypic control that is randomly selected from the individuals/fixed lines under a normal environment. From Xu and Crouch (2008) with permission.
and genotyping should also be investigated as a potential option for some traits in order
to allow one to work with more plants/families at the same cost (Xu and Crouch, 2008).
Where the target trait is influenced by planting density or strong selection pressure this
will clearly confound the ability to make genetic gain. However, many major-gene controlled
traits can be investigated in this way without much disturbance.

Figure 7.9 shows this method for detection of marker–trait association for stress tolerance
and other traits that can be selected for phenotypic extremes under a target environment.
It can be inferred that phenotypic extremes or extremely stress tolerant plants are those
with an accumulation of favourable alleles from multiple loci, each with small to large
effects, so that genetic mapping will identify the genetic regions with relatively large
accumulative effect on the target trait. In this case, lessons learnt from long-term
selection of protein and oil content in maize may be highly instructive in this endeavour
(Dudley and Lambert, 2004), particularly regarding the success of marker-assisted recurrent
selection (MARS) to accumulate favourable alleles at numerous loci. In this approach,
pyramiding of minor genes can be achieved using MARS to accumulate minor QTL where decades
of breeding efforts have resulted in the fixation of all major genes.

It can be expected that selective genotyping and pooled DNA analysis, which have been
widely used with mixed success in genetic mapping, will become increasingly important in
genetic mapping and MAS and will gradually replace entire population genotyping in many
cases. Selective genotyping will greatly facilitate and improve genetic mapping and
marker-assisted breeding procedures in general. As genome-wide selective genotyping becomes
possible, an effective information management and data analysis system will be required to
make full use of the potentialities of selective genotyping in genetics, genomics and plant
breeding.
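The selection step of the two procedures in Fig. 7.9 can be sketched in Python. This is an illustration only: the function names, the use of a dictionary of trait scores, the convention that a higher score means greater tolerance, and the fixed random seed are all assumptions, not part of the published procedure.

```python
import random


def select_extremes(phenotypes, tail_size):
    """Procedure A: return (low_ids, high_ids) ranked by trait value.

    phenotypes: dict mapping a plant/line ID to its trait value.
    """
    if tail_size <= 0 or 2 * tail_size > len(phenotypes):
        raise ValueError("tail_size must be positive and at most half the population")
    ranked = sorted(phenotypes, key=phenotypes.get)  # ascending trait value
    return ranked[:tail_size], ranked[-tail_size:]


def select_extreme_and_control(stress_scores, normal_env_ids, tail_size, seed=0):
    """Procedure B: the most tolerant lines under the target environment plus a
    random control drawn from lines grown under a normal environment.

    Assumes a higher score means greater tolerance.
    """
    if tail_size <= 0 or tail_size > len(stress_scores):
        raise ValueError("tail_size must be positive and no larger than the population")
    ranked = sorted(stress_scores, key=stress_scores.get, reverse=True)
    rng = random.Random(seed)  # fixed seed so the control sample is repeatable
    control = rng.sample(sorted(normal_env_ids), tail_size)
    return ranked[:tail_size], control


if __name__ == "__main__":
    scores = {f"line{i:03d}": float(i) for i in range(500)}  # made-up data
    low, high = select_extremes(scores, 30)
    print(low[:3], high[-3:])
```

In practice the selected extremes would still go through the phenotype confirmation step named in the figure caption before DNA extraction.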
8
Marker-assisted Selection: Theory
reliably been assessed, the breeder is able to monitor the transmission of trait genes via
closely linked markers, thus enabling genotype building, i.e. construction of desired
genotypes by deliberate crossing and selection, using the marker genotype as a selection
criterion.

The potential value of genetic markers, linkage maps and indirect selection in plant
breeding has been known for over 80 years. Since its advent in the 1980s, DNA marker
technology has dramatically enhanced the efficiency of plant breeding. In the past 20
years, a number of breeding companies have, to varying degrees, used markers to increase
the effectiveness of selection in breeding and to significantly shorten the development
time of cultivars (Dwivedi et al., 2007). Now, advances in automated technology enable a
new approach in marker-assisted breeding, called Breeding by Design. The advances in
applied genomics and the possibility to generate large-scale marker data sets provide us
with the tools to determine the genetic basis for all traits of agronomical importance.
Also, methods for assessing the allelic variation at these agronomically important loci are
now available. This combined knowledge will eventually allow the breeder to combine
favourable alleles at all these loci in a controlled manner, leading to superior cultivars
(Peleman and van der Voort, 2003).

Changing concepts and molecular approaches provide opportunities to develop rational and
refined breeding strategies. Knowledge about map position and allelic variation at
agronomically important loci, in concert with available, easy-to-assay molecular markers,
has made possible the design of superior cultivars. Compared to phenotypic assays, as
summarized from Xu, Y. (2002), Peleman and van der Voort (2003), Xu, Y. (2003) and Xu and
Crouch (2008), DNA markers offer great advantages to accelerate the cultivar development
time as a result of the following.

1. Increased reliability: the outcome of phenotypic assays is affected, among other things,
by environmental factors, the heritability of the trait, the number of genes involved, the
magnitude of their effects and the way these loci interact. Hence, error margins on the
measurement of phenotypes tend to be significantly larger than those of genotyping scores
based on DNA markers.
2. Increased efficiency: DNA markers can be scored at seedling stage or even based on seed
before germination. This is especially advantageous when selecting for traits which are
expressed only at later stages of development, such as traits associated with flower, fruit
and seed. By selecting at the seedling stage or based on seed DNA, considerable amounts of
time and space can be saved.
3. Reducing costs: there are ample traits where the determination of the phenotype costs
more than the performance of genotyping using a PCR assay or hybridization. In a
high-throughput setting, the material and consumable cost for a PCR assay will typically
not exceed US$2. In comparison, the growth of a tomato or pepper plant to full maturity in
a heated greenhouse will cost approximately US$20. Every plant that can be rejected before
planting, particularly for those with the seed that is big enough for single seed-based DNA
extraction, will in such settings save a considerable amount of money.

The use of DNA markers for indirect selection offers greatest benefits for quantitative
traits with low heritability as these are the most difficult characters to assess in field
experiments. Obviously, the development of marker-assisted assays for such traits is
difficult and costly due to the extensive phenotypic assays required for such traits.
However, once the knowledge exists to estimate the parameters which determine the trait of
interest, a well-designed experimental set up will result in the availability of
marker-assisted selection (MAS) tools, which can reduce to a major extent future
application of phenotypical assays (Peleman and van der Voort, 2003). As described in
previous chapters, molecular marker technology will help identify favourable alleles for
agronomic traits, associate these alleles with specific molecular markers and introduce
them from one genetic background
288 Chapter 8
to another through MAS. Theoretical considerations in MAS will be discussed in this chapter
and the practical issues in MAS will be discussed in Chapter 9.

8.1 Components of Marker-assisted Selection

for efficient MAS, including: (i) suitable genetic markers and their characterization;
(ii) high-density molecular maps; (iii) established marker–trait associations for traits of
interest; (iv) high-throughput genotyping systems; and (v) functional data analysis and
delivery.
(i) chromosomal location associated with the trait must be reduced to a manageable piece of
DNA if cloning of specific genes is necessary; (ii) to identify all the related genes for a
specific trait, a high-density genetic map is required because the fewer markers are used,
the smaller proportion of genetic factors contributing to that trait will be sampled;
(iii) large genetic distances between markers and target traits will contribute to the
rapid decrease of MAS efficiency after several successive cycles of selection; and (iv) to
minimize linkage drag involved in gene introgression, closely linked markers around the
target region are needed.

QTL mapping presumes accurate phenotypic scoring methods, something that can be difficult
to optimize and even more difficult to keep consistent for months or years. Just a few
mis-scored individuals can totally confound QTL discovery and placement (Young, 1999). This
is also true for fine mapping of major genes for map-based cloning, where mis-scoring of
several plants in a population with thousands of individuals will result in a large error
(up to 1 cM) in estimating genetic distances. High levels of accuracy are required to
dissect a chromosomal region associated with a given trait and narrow down the candidate
region to several megabases or a single contig, that is, a set of clones that can be
assembled into a linear order.

8.1.4 Genotyping and high-throughput genotyping systems

To make marker-based technology practical for breeding applications, an automated
genotyping system is required. As an ultimate marker type, SNPs have gained wide acceptance
as genetic markers for use in linkage and association studies, especially for human
genetics and many crop plants as well. High-throughput SNP genotyping has great potential
for many applications, including MAS on the basis of whole genome approaches. This has led
to a requirement for high-throughput SNP genotyping platforms. Development of such a
platform depends on coupling reliable chemical assays with an appropriate detection system
to maximize efficiency with respect to accuracy, speed and cost. With current technology
platforms (e.g. Illumina), one lab can deliver throughputs in excess of one million data
points per day, with an accuracy of > 99%, at a cost of US$0.06–0.10 per data point using
an array-based SNP genotyping system. In order to meet the demands of the coming years,
however, genotyping platforms need to deliver throughputs in the order of one million
genotypes per day at a cost of only a few cents per genotype (instead of per data point).
In addition, DNA template requirements must be minimized such that hundreds of thousands of
SNPs can be interrogated using a relatively small amount of genomic DNA. Released whole
genomic sequences in model and crop plants including Arabidopsis, rice and maize have been
used to develop gene-based SNPs for other related species.

8.1.5 Data management and delivery

To handle the daily data flow from the laboratory to the breeder and integrate information
from molecular markers, genetic mapping and phenotyping, many informatics tools are needed.
Decision support tools required in molecular breeding are fully discussed in Chapter 15 so
only data management and delivery are briefly described here as a component of MAS.

For efficient data management and delivery, it is important for all researchers to follow
general rules through all these procedures. A standard reporting system is also critical
for comparative genomics, QTL allelism tests, data sharing and mining and the
correspondence between major genes and QTL. As discussed by Xu, Y. (2002), a standard
system for marker–trait association should include associated alleles and allele
characterization such as allele sizes, gene effects, variation explained by each gene or
all genes in the model, gene interaction if more than one gene is identified and
genotype-by-environment interaction
that the donor segment dragged along with the target gene is smaller than the segment
bracketed by these flanking markers?

Does it pay to increase the number of markers for background selection? If so, to what
extent does this depend on population sizes used and/or genome size?

If a certain pre-set goal, e.g. less than 0.05 DGC, is to be achieved in a given number of
generations, should population size in successive generations be constant or is it better
to vary population size over generations?

If the number of generations is not a limiting factor, but the total number of plants to be
genotyped is, then what is the optimal distribution of plant numbers over generations?

Do the same guidelines for optimal transfer of a single target gene also hold for the
transfer of multiple genes?

Some of these issues on gene introgression have been also addressed by a number of authors,
using an analytical approach, numerical methods, computer simulations or a combination
thereof (Hospital et al., 1992; Hospital and Charcosset, 1997; Hospital, 2001; van Berloo
et al., 2001; Stam, 2003). As a special case, Frisch (2004) discussed the issues related to
introgression of a recessive gene, where recurrent backcrossing without the aid of
molecular markers requires progeny tests in each BC generation in order to determine
whether a plant is a heterozygous carrier of the recessive gene or not.

8.2.1 Marker-assisted foreground selection

There are several approaches to using molecular markers to select an associated target gene
or allele (foreground selection). Foreground selection can be used for gene introgression
from one genetic background to another and pyramiding multiple genes/alleles to a genotype
from multiple donors as well. For a specific target gene or allele, foreground selection
can involve one to several markers. The simplest way is to use one closely linked marker
(on either side of the target locus). The most complicated approach is to integrate
foreground selection with background selection using multiple markers for the target locus
and many others for covering the entire genome; this is referred to as whole genome
selection in this book and differs from genome-wide selection which will be discussed later
in this chapter. The most frequently used approach is to use a triplet,
marker–target–marker. Depending on how close the linked markers are to the target, the
population sizes required for identification of a particular genotype, and the cost and
efficiency of foreground selection compared to phenotypic selection, vary significantly.
For example, a two-genetic locus model with one marker and one target locus involved can be
simplified as selection for a single gene-based marker when the marker is developed from
the target gene.

Selection using single markers

The reliability of foreground selection largely depends on the genetic distance between the
markers and the target gene. If only one marker, located on one side of the target gene, is
used in selection, the linkage between the marker and the gene has to be very tight in
order to have relatively high selection efficiency. Suppose a marker locus (M/m) is linked
with the target locus (Q/q) with recombination frequency of r and the F1 has genotype
MQ/mq, where Q is the target allele to be selected; when M is linked to Q, Q can be
selected on the basis of M. The probability that the Q/Q genotype can be obtained through
selection of marker genotype M/M, that is, the probability for selecting the correct
individuals, is

$$P_1 = (1 - r)^2 \tag{8.1}$$

From Fig. 8.1, the probability for selecting the correct individuals decreases rapidly with
the increase of recombination frequency. In order to have over 90% probability, the
recombination frequency between
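Equation 8.1 is simple enough to evaluate directly, and inverting it gives the largest recombination frequency compatible with a chosen probability of correct selection. The sketch below uses illustrative function names that are not from the text; for a 90% probability the inversion gives r below about 0.051.

```python
import math


def p_correct_single_marker(r):
    """Eq. 8.1: probability that a plant selected as M/M is Q/Q."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5]")
    return (1.0 - r) ** 2


def max_r_for_probability(p_target):
    """Largest r for which (1 - r)^2 >= p_target (inverse of Eq. 8.1)."""
    if not 0.25 <= p_target <= 1.0:
        raise ValueError("p_target must lie in [0.25, 1] for r in [0, 0.5]")
    return 1.0 - math.sqrt(p_target)


if __name__ == "__main__":
    for r in (0.01, 0.05, 0.10, 0.20):
        print(f"r = {r:.2f}  P1 = {p_correct_single_marker(r):.4f}")
    print(f"r needed for P1 >= 0.90: {max_r_for_probability(0.90):.4f}")
```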
marker in the middle could be developed from the cloned gene. This system will be very
useful when the target gene is only available in a wild species and linkage drag is
associated with the chromosome segment to be introgressed.

Suppose there are two marker loci (M1/m1 and M2/m2) which are located on each side of the
target gene (Q/q) with recombination frequencies r1 and r2 and the F1 has genotype
M1QM2/m1qm2. The F1 will produce two gametes with the marker genotype M1M2, one of which is
the parental type containing the target allele (M1QM2) and the other is the double-crossed
containing non-target allele (M1qM2). Because the frequency of double crossing is very low,
the double-crossed gamete is very rare. As a result, the probability of making the correct
selection for the target allele Q based on the presence of M1 and M2 is very high. Under no
interference, the probability of obtaining the target genotype Q/Q based on selection of
M1M2/M1M2 in the F2 generation is

$$P_1 = \frac{(1 - r_1)^2 (1 - r_2)^2}{[(1 - r_1)(1 - r_2) + r_1 r_2]^2} \tag{8.3}$$

When the target gene is located in the middle of two flanking markers, i.e. r1 = r2, the
probability of making the right selection is minimized. Figures 8.1 and 8.2 show the
relationship between the minimum number of plants required and r1 (or r2), when P2 = 0.99
and r1 = r2. Selection efficiency is much higher using two flanking markers than using one
marker. With interference between single crossovers (as is generally the case), the
frequency of actual double crossovers is lower than the expected value (which assumes no
interference). Therefore, the actual probability for making the right selection based on
flanking markers should be higher than the theoretical expectation.

The population size required to generate (in a single BC generation) a high probability of
obtaining at least one plant recombinant between the target gene and both flanking markers
is greater than the reproductive rate for most crop species. For example, for a flanking
marker distance of 5 cM on each side of the target gene, about 4000 individuals are
required to find a double recombinant with a probability of 0.99 (Frisch et al., 1999b).
Therefore, Frisch (2004) proposed a sequential strategy to find an individual with
recombination between the target gene and one flanking marker in generation BC1 and a
recombinant between the target gene and the second flanking marker in generation BC2 (also
see Fig. 8.4b for further explanation of this strategy).

Table 8.1 gives the optimum population size n1 in generation BC1 and corresponding expected
population size E(n2) in generation BC2 such that the expected total number of individuals
E(n) = n1 + E(n2) required to introgress one gene with a minimum number of individuals in a
two-generation BC programme is minimized. The values depend on the map distances d1 and d2
between the target gene and two flanking markers (Frisch, 2004).
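Both the flanking-marker probability of Eq. 8.3 and the figure of about 4000 individuals quoted above for 5 cM flanking distances can be explored numerically. The sketch below rests on several assumptions that are not stated in the text: the Haldane mapping function is used to convert map distance to recombination frequency, crossovers are taken to be independent (no interference), and the event of interest in BC1 is a plant that carries the target allele and is recombinant in both flanking intervals, with probability r1 r2 / 2. The function names are illustrative.

```python
import math


def p_correct_flanking(r1, r2):
    """Eq. 8.3: probability that an M1M2/M1M2 F2 plant is Q/Q (no interference)."""
    parental = (1 - r1) * (1 - r2)
    return (parental / (parental + r1 * r2)) ** 2


def haldane_r(d_cm):
    """Haldane mapping function: map distance in cM to recombination frequency."""
    return 0.5 * (1 - math.exp(-2 * d_cm / 100))


def bc_population_for_double_recombinant(d1_cm, d2_cm, prob=0.99):
    """Plants needed in one BC generation to obtain, with probability `prob`,
    at least one carrier of the target allele recombinant on both sides."""
    r1, r2 = haldane_r(d1_cm), haldane_r(d2_cm)
    p = 0.5 * r1 * r2  # assumed per-plant probability of the desired gamete
    return math.ceil(math.log(1 - prob) / math.log(1 - p))


if __name__ == "__main__":
    print(f"single marker, r = 0.05: {(1 - 0.05) ** 2:.4f}")
    print(f"flanking markers, r1 = r2 = 0.05: {p_correct_flanking(0.05, 0.05):.4f}")
    print(bc_population_for_double_recombinant(5, 5))
```

Under these assumptions the required population for 5 cM on each side comes out slightly above 4000, in line with the order of magnitude cited from Frisch et al. (1999b); a different mapping function or allowance for interference would change the exact number.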
Table 8.1. The expected total number of individuals E(n) = n1 + E(n2) required to introgress
one gene with a minimum number of individuals in a two-generation BC programme (from
Frisch (2004) with kind permission of Springer Science and Business Media).
Map distance 4 6 8 12 16
d1(cM) n1/E(n2)a
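As a rough numerical companion to Eq. 8.3 and to the figure of about 4000 individuals quoted from Frisch et al. (1999b), the following Python sketch evaluates both. It rests on assumptions that are not stated in the text: map distances are converted to recombination fractions with Haldane's mapping function, there is no interference, and a BC1 individual counts as a success only if it carries the target allele (probability 1/2) and is recombinant on both sides of the gene. All function names are illustrative.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def p_target_f2(r1, r2):
    """Eq. 8.3: probability that an M1M2/M1M2 F2 plant is Q/Q."""
    parental = (1.0 - r1) * (1.0 - r2)   # M1-Q-M2 gametes (no crossover)
    double_co = r1 * r2                  # M1-q-M2 gametes (double crossover)
    return (parental / (parental + double_co)) ** 2

def min_plants(p_success, certainty):
    """Smallest n with 1 - (1 - p_success)**n >= certainty."""
    return math.ceil(math.log(1.0 - certainty) / math.log(1.0 - p_success))

r = haldane_r(5.0)                       # 5 cM on each side of the target gene
p1 = p_target_f2(r, r)
# Assumed success event in BC1: target allele present AND recombinant on both sides.
n_double = min_plants(0.5 * r * r, 0.99)
print(f"r = {r:.4f}, P1 = {p1:.4f}, plants needed = {n_double}")
```

Under these assumptions the required number of plants comes out slightly above 4000, which is consistent with the order of magnitude quoted above; a different mapping function or success criterion would shift the value.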
Selection using multiple markers for multiple targets

MAS provides opportunities for simultaneous selection of multiple traits/genes using multiple markers. In some cases, multiple pathogen races or insect biotypes must be used to identify plants for multiple resistances, but in practice phenotypic selection may be difficult or impossible because different genes may produce similar phenotypes that cannot be distinguished from each other. Marker-trait association can be used to simultaneously select multiple resistances from different disease races and/or insect biotypes and pyramid them into a single line through MAS.

For example, to find a restorer for cytoplasmic male sterility (CMS) in rice through testcrossing and progeny test, a candidate male plant has to be testcrossed with a CMS line to find out if it has fertility restorability based on the fertility of testcross progeny. However, sterility in testcross progeny could result from the absence of either restorability genes or wide compatibility genes or both when an intersubspecific cross is involved. MAS using multiple markers could be used to distinguish the two different types of sterility. As another example, consider phenotypic selection for multiple traits in rice, such as thermal-sensitive genic male sterility (TGMS), amylose and wide compatibility. Candidate plants must be tested in two different environments where TGMS can be identified. Each plant must be testcrossed with wide compatibility testers, followed by a progeny test in the next season. At the same time, a relatively large amount of seed must be harvested for amylose measurement. While conventional selection methods require a delay until a large number of seeds are available and a reasonable level of homozygosity is reached, in MAS only a leaf harvested at any growth stage in any segregating population is required, provided that associated markers are available for these traits.

As genetic mapping information accumulates from different mapping populations, it may be possible to establish a complete profile for all the genes associated with a specific trait or trait category. The multiple-marker approach can be used to select the best trait/gene combinations based on selection for each of the target loci whose position in the genome is known. It is possible to select the best cassette for any traits and/or trait combinations.

When single chromosomes are distinguishable, partial genome selection or whole chromosome selection is possible as an alternative to whole genome selection, so that the other chromosomes remain unchanged. MAS could be focused on a chromosomal region/arm if it is separable from the rest of the genome. Genes controlling the same traits or trait category may cluster in some specific chromosomal regions, which are called gene blocks. Regional mapping strategies (Xu, 1997; Monna et al., 2002), combined with a high-density genetic map, can help construct high-density regional maps that target gene blocks for separation of closely linked genes.

8.2.2 Marker-assisted background selection

In a BC programme, molecular markers can be used for indirect selection for the presence of a favourable allele (Tanksley, 1983) and for selection against the undesirable genetic background of the donor genotype (Tanksley et al., 1989). Selection for the remainder of the genome excluding the target gene(s), i.e. genetic background, is called background selection (Hospital and Charcosset, 1997).

Background selection is aimed at the whole genome. In a segregating population, each chromosome represents a random combination of two parental chromosomes. So we have to know the parental combination of each chromosome in order to do whole genome selection, i.e. the entire genome has to be covered by molecular markers. For an individual plant, we can infer the parental origin for each marker allele across the whole genome when genotypes at all marker loci are known and thus we can infer the parental combination for each chromosome.
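To make the idea of inferring parental origin from marker genotypes concrete, here is a minimal Python sketch for a single BC individual. It is an illustration rather than a published procedure: the data layout, the function name and the rule of placing each crossover at the midpoint between two adjacent markers with different genotypes are assumptions made for this example. Chromosome ends beyond the outermost markers take the genotype of the nearest marker, and the quantity returned is the fraction of map length that is still heterozygous, i.e. still carries a donor segment on one homologue.

```python
def heterozygous_fraction(chromosomes):
    """Fraction of total map length inferred to carry a donor segment.

    chromosomes: dict mapping a chromosome name to
                 (length_cM, [(position_cM, genotype), ...]),
    where genotype is 'R' (homozygous recurrent) or 'H' (heterozygous).
    """
    het_cm = 0.0
    total_cm = 0.0
    for length_cm, markers in chromosomes.values():
        markers = sorted(markers)
        total_cm += length_cm
        # Segment boundaries: chromosome start, midpoints between
        # adjacent markers (assumed crossover position), chromosome end.
        bounds = [0.0]
        for (pos_a, _), (pos_b, _) in zip(markers, markers[1:]):
            bounds.append((pos_a + pos_b) / 2.0)
        bounds.append(length_cm)
        # Each marker "owns" the segment between its two boundaries.
        for (_, genotype), start, end in zip(markers, bounds, bounds[1:]):
            if genotype == "H":
                het_cm += end - start
    return het_cm / total_cm

# Hypothetical BC plant with two 100 cM chromosomes.
plant = {
    "chr1": (100.0, [(0.0, "H"), (50.0, "H"), (100.0, "R")]),
    "chr2": (100.0, [(10.0, "R"), (90.0, "R")]),
}
print(heterozygous_fraction(plant))
```

With the midpoint rule, two adjacent markers of the same genotype assign the whole interval to that genotype, while two different genotypes split the interval in half; denser markers reduce the error introduced by this simplification.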
298 Chapter 8
and discussed several issues relating to the potential power and application of graphical genotypes. The term RFLP markers used in their paper can be extended to include genotypes derived from all types of molecular markers that are co-dominant and haplotypes derived from di-allelic markers such as SNPs. This concept (developed on the basis of structured populations such as BC and F2) can be extended to all populations including natural populations that consist of germplasm accessions or cultivars.

Requirements for deducing graphical genotypes

In order to construct a graphical genotype, certain conditions must be met. First, a well-populated or high-density molecular map for the entire genome of the species must be available. This map should consist of a large number of markers that cover the entire genome with at least one marker every 10 cM or less. In addition, it is also necessary that the cis-trans configuration for the molecular markers be known in order to prepare a graphical genotype. In populations derived from inbred lines, such as breeding populations consisting of BC or F2 progeny, the cis-trans configuration can be inferred simply from knowledge of the breeding scheme. In more complex situations, complete molecular marker data must be obtained for three generations in order to prepare graphical genotypes for individuals in the third generation. In humans, for example, molecular marker data must be determined for grandparents and parents in order to develop graphical genotypes for the children in the pedigree. Without this knowledge of cis-trans configuration, molecular marker data from some regions of the genome may have more than one possible graphical genotype, all of which are equally likely to be correct.

Assumptions employed in developing graphical genotypes

The primary assumption required for the development of graphical genotypes is that the genotype of a region between two molecular markers is inferred from the genotypes of the markers that delimit the interval. When inferring the graphical genotype of an interval from the genotypes of the marker endpoints, there are often alternative configurations that will satisfy the available marker data. Young and Tanksley (1989a) used the most likely configurations to develop a graphical genotype. Thus, simple configurations requiring the fewest crossover events were utilized in developing a graphical genotype, while alternative configurations that require one or more multiple crossover events were not. In practice, this means that if two consecutive loci have the same genotype, the genotype of the segment between the markers is inferred to be that of the two flanking markers. When two adjacent loci have different marker genotypes, it is inferred that a crossover event has taken place somewhere between the two loci.

Since the genotype of a non-recombinant interval is inferred from the genotype of its marker endpoints, double crossovers (or other even numbers of crossovers) in a given interval will falsify this inference, and the likelihood of double crossovers increases with the square of the probability of a crossover between the adjacent molecular markers. Thus, for any interval, the probability that the inferred genotype will be correct is 1 − r², where r is the probability of a crossover event between adjacent molecular markers. For the total genome, the probability that there are no incorrect intervals is

$$P_t = \prod_{n=1}^{\text{Total intervals}} \left(1 - r_n^2\right) \qquad (8.4)$$

This equation considers only double crossovers and assumes interference between crossovers to be negligible. As an example, consider an organism with a total genome size of 1000 cM in which molecular markers are evenly spaced over the entire genome. The expected proportion of the genome which is described correctly by the graphical genotype is calculated by first determining the probability of 0, 1, 2, . . . intervals that are incorrectly described for a given spacing of molecular markers. These
probabilities, along with the spacing size, are then used to determine the expected length of the genome correctly inferred, which is then divided by the total genome size to yield the expected proportion of the genome that is accurately portrayed by the graphical genotype. With molecular markers spaced every 10 cM, an inferred graphical genotype will have a probability of only 30% of being exactly correct for all regions (i.e. no incorrect intervals). However, this same graphical genotype will be accurate in describing the genome constitution for over 99% of the genome. Even when the spacing between molecular markers increases to 30 cM, the inferred graphical genotype will be accurate for approximately 95% of the genome. Clearly, as the number of manageable and available molecular markers is now practically unlimited compared with the number available when the concept was proposed, the probability of a correct inference will improve significantly.

Cis-trans ambiguity happens in an F2 population when heterozygous loci are separated by a stretch of one or more homozygous loci. In this situation, two equally likely graphical genotypes are possible that differ in the cis-trans configuration of the flanking heterozygous regions (see Fig. 5 of Young and Tanksley, 1989a). Calculations based on the Poisson distribution indicated that only 6% of a genome consisting of ten chromosomes of 100 cM each will be ambiguous. The utility of graphical genotypes in F2 populations will not generally be seriously impaired by cis-trans ambiguities.

Application of graphical genotypes

A graphical representation of a genotype deduced from RFLP data for a randomly selected individual from a tomato F2 population provided by Young and Tanksley (1989a) is shown in Fig. 8.3. Note that it is possible to see not only which portions of each set of homologues are derived from each parent, but also the regions in which crossovers took place.

Using graphical genotypes, plants can be selected that not only contain the gene(s) of interest, but also have the highest probability that the rest of the genome will return to that of the recurrent parent with additional crossing. Although the concept of the graphical genotype was proposed a long time ago, it has been widely used in different fields of genomics. It has been used, as described in Chapter 4, for selection of genome-wide introgression lines as a library to cover all traits and the whole genome segment by segment. As molecular marker data increase exponentially with the availability of high-throughput genotyping systems, the concept of the graphical genotype and its derivatives have received more attention and are widely used in MAS, near-isogenic line (NIL) construction, introgression line library development and association mapping. As numerous points in the genome can be covered by markers, graphical genotypes can be simplified by displaying them using the physical positions of markers rather than the intervals determined by flanking markers.

8.2.3 Donor genome content in BC generations

DNA marker-based whole genome selection or background selection can be used to accelerate recovery of a recurrent genotype in the backcrossing process for improving parental lines. The basic principle of background selection (as opposed to foreground selection on the target gene) is that in any given BC generation the actual DGC varies around the theoretical mean value.

Once QTL alleles of interest in the resource (donor) parent have been identified by linkage to resource-specific marker alleles, repeated backcrosses to the cultivar (while choosing in each cycle only those backcross progeny carrying the exotic QTL-linked marker alleles) will allow the effective introgression of the linked quantitative alleles from the donor into the cultivar. Depending on the number of alleles to be introgressed, it may be possible to expedite matters by actively selecting against exotic marker alleles (and hence against the associated chromosomal regions) that are not in linkage with introgressed alleles.

Table 8.2 shows the frequency of a favourable allele after one to six BC generations, with and without selection for a
Marker-assisted Selection: Theory 301
linked marker allele. Also shown are results if selection is for a pair of marker alleles bracketing the QTL to be introgressed. Suppose that the proportion of recombination between marker allele and linked favourable allele is 0.10 when a single marker is used and the proportion of recombination between the two markers of the bracket is 0.40. The comparison of interest is the frequency of the introgressed alleles after three generations of marker-assisted backcrossing (MAB) (single marker, 0.66; marker bracket, 0.85) compared to the frequency of the introgressed allele after five to six unassisted BC generations (0.01). In the former case, the introgressed allele will have an immediate effect on cultivar value and can be rapidly brought to fixation by selfing or selection.

Table 8.2. The frequency of a favourable allele after a given number of BC generations, with and without selection for a linked marker allele or for a pair of markers bracketing the favourable allele (marker bracket) and the proportion of recipient genome recovered with and without MAS against the remaining exotic genome. From Beckmann and Soller (1986a) by permission of Oxford University Press.

With two markers per chromosome used for MAS against the remainder of the exotic genome, the proportion of recipient (recurrent) genome recovered in BC2 will be equal to that obtained in BC6 without MAS (Table 8.2). This result is also shown in Fig. 8.4 and is well recognized by many authors (e.g. Tanksley et al., 1989; Hospital et al., 1992; Frisch et al., 1999a, 2000). Therefore, selection based on markers that distinguish between donor and recurrent parent genome may considerably accelerate the recovery of the recurrent parent genome.

If only two background selection markers on the target chromosomes are used (assuming direct selection for the target gene), the distances d1 and d2 between the target gene and markers can be chosen such that the expected DGC on the target chromosome is minimized if both markers are fixed for the recipient alleles (Hospital et al., 1992) by applying

$$d_1 = d_2 = \frac{1}{2}\ln(1 + 2s) \qquad (8.5)$$

where s is the proportion of selected BC1 individuals. This approach is based on the assumption of an infinite population size and the optimum properties only hold true if two markers in the carrier chromosome of the target gene are used (Frisch, 2004).

8.2.4 Linkage drag in gene introgression

When transferring a single gene from a donor into the genetic background of a recurrent parent by repeated backcrossing, genetic linkage will cause fragments of the donor genome surrounding the target gene to be dragged along. This is called linkage drag, a persistent problem in plant breeding for gene introgression. Small donor genome fragments, not linked to the target gene,
[Fig. 8.4 graphic: panels labelled 'Marker-assisted backcrossing breeding'; time axis in years (0.5, 1, 1.5).]
Fig. 8.4. Comparison of traditional and marker-assisted backcrossing breeding (assuming that
co-dominant markers are used). (a) Rate to return to recurrent parent genotype in regions of genome
unlinked to gene(s) being introduced. (Top) Traditional backcrossing breeding. Graphical genotypes were
generated for randomly selected individuals from various BC generations derived from a single BC1
individual by computer simulation. Only one homologue of each of the 12 tomato chromosomes is shown
(the other homologue can be derived exclusively from the recurrent parent). Darkened regions indicate
donor genome segments, striped regions indicate segments in which crossovers occurred, and white
regions indicate recurrent genome segments. Each interval is 20 cM in length. The numbers beneath each
graphical genotype indicate the percentage of the genome derived from the recurrent parent. The average
number of generations required to return to the recurrent genome, as estimated from 20 independent
simulations, was 6.5 ± 1.7 generations. (Bottom) Graphical genotypes of individuals from marker-assisted
backcrossing breeding programme showing return to the recurrent parentage in only three generations.
In each BC generation, 30 progeny were generated and the best (in terms of percentage recurrent parent
genome) was used as the parent for the next BC generation. (b) Expected linkage drag around a selected
gene held heterozygous during backcrossing. (Top) Traditional backcrossing breeding. (Bottom) MAS for
plants carrying chromosomes with recombination near the selected gene. Markers tightly linked to the
gene of interest are used to identify individuals with crossovers within 1 cM on one side of the selected
gene in BC1. These recombinant individuals are then backcrossed to the recurrent parent and other tightly
linked markers are used to select recombinants within 1 cM on the other side of the target gene in BC2.
The expected number of years to obtain a given level of linkage drag (for a typical crop with a generation
time of 0.5 years) is shown below. From Tanksley et al. (1989) reprinted by permission from Macmillan
Publishers Ltd.
may also end up in the recipient's genetic background. The removal of linked segments occurs in a complex fashion that was described by Hanson (1959) and further elaborated by Stam and Zeven (1981). Their work showed that it takes many generations to remove the linked donor segments. For example, even after 20 BCs, one expects to find a sizable piece (10 cM) of the donor chromosome still linked to the gene being selected (Stam and Zeven, 1981), which is shown in Fig. 8.4b. In practice, this region may be larger or smaller than the expected value owing to the large variance associated
with the expected value and because a breeder inevitably practises selection among the progeny. In most plant genomes, 10 cM is enough DNA to contain hundreds of genes. Therefore, backcrossing results in the transfer, not only of gene(s) of interest, but also of additional linked genes from the donor. This phenomenon can often result in a new cultivar modified for characters other than those originally targeted. Not surprisingly, many examples of linkage drag are known in which undesirable traits that are closely linked to a target gene are carried along during the breeding programme, particularly when exotic germplasm is involved.

In addition to linkage drag, unlinked DNA from the donor parent must also be removed during a BC breeding programme. In order to obtain a better idea of the relative importance of linked versus unlinked donor segments in BC breeding, a simple curve was derived from the works of Hanson (1959) and Stam and Zeven (1981) to compare the amount of foreign DNA due to these two sources as a function of the number of BC generations (Young and Tanksley, 1989b). The results of this analysis demonstrated that for a hypothetical genome of ten chromosomes of 100 cM each, the proportion of unlinked DNA derived from the donor genome is greater than that of remaining linked DNA only in the first four BC generations. After this time, the proportion of donor DNA due to linkage drag far exceeds that of unlinked DNA, by a factor of 50, and in the 20th BC generation linked donor DNA exceeds unlinked by a factor of more than 10^5. This simple analysis clearly emphasizes the importance of linkage drag as the prominent problem in BC breeding programmes.

In a traditional BC programme, the linked segments usually remain large for many generations not because recombination has not occurred in these regions, but because there is no effective way to identify recombinant individuals. In classical breeding it is usually only by chance that such recombinants, which contribute to a reduction in the size of the donor segment, are occasionally selected. With high-density molecular maps it is possible to directly select individuals that have experienced recombination near the gene of interest. In approximately 150 BC plants there is a 95% chance that at least one plant will have experienced a crossover within 1 cM on one side or the other of the gene being selected. Molecular markers allow unequivocal identification of these individuals (Young and Tanksley, 1989b). With one additional BC generation of 300 plants, there would be a 95% chance of a crossover within 1 cM of the other side of the gene, generating a segment surrounding the target gene of less than 2 cM. This would have been accomplished in two generations with molecular markers, while it would have required, on average, 100 generations without molecular markers (Fig. 8.4b). It is apparent that the ability to select for desirable recombinants in a region of interest is a function of the number of markers mapped in that region, as well as the number of plants assayed. As plant molecular maps become more saturated, the efficiency of selecting recombinants will increase.

Peleman and van der Voort (2003) provided an example of linkage drag that occurred in gene introgression in lettuce. In the 1990s, Keygene was involved in a marker-assisted breeding approach that led to the development of a novel lettuce cultivar resistant to the aphid Nasonovia ribisnigri (Jansen, 1996). This aphid is a major problem in field-grown lettuce areas in Europe and California, causing reduced and abnormal growth in addition to the spread of viral diseases. Resistance to this aphid could be introgressed from a wild relative of lettuce, Lactuca virosa, by repeated backcrossing. However, despite many rounds of backcrossing the new product was of extremely poor quality, bearing yellow leaves and a greatly reduced head. This could have been caused either by a pleiotropic effect of the resistance gene or by linkage drag, a negative trait closely linked to the positive trait of interest. Marker analysis eventually demonstrated that the reduced quality was caused by linkage drag. In this case, the linkage drag was recessive, only visible in the homozygous state, which made it much more difficult to select for recombinants on the basis of phenotype.
It was decided to use DNA markers flanking the introgression to pre-select for individuals that are recombinant in the vicinity of the gene. More than a thousand F2 plants were screened this way, leading to the selection of some 100 individuals bearing a recombination or even double recombinations in the vicinity of the gene. Only those individuals needed to be phenotyped for the resistance and, at the F3 level, for the absence of the negative characteristics. This approach eventually led to the selection of an individual bearing recombination events very close to each side of the gene, thereby removing the linkage drag. The results demonstrated that the (recessive) linkage drag was due to tightly linked factors on both sides of the resistance gene. As indicated by Peleman and van der Voort (2003), this result would have been very hard to obtain by classical selection methods.

donor genome substitution. The distribution of DGC in a BC1 generation is shown in Fig. 8.5 for three genome sizes (haploid number of chromosomes × map length): small: 5 × 100 cM; medium: 10 × 100 cM; large: 15 × 150 cM (Stam, 2003). The important feature that can be observed is that the variance in DGC decreases as genome size (total centimorgans) increases.

From the tabulated cumulative distribution of Fig. 8.5 the probability of less than a given DGC can be read. For example, the probability that DGC is less than 0.35 equals 0.21, 0.12 and 0.06 for the small, medium and large genome, respectively. From these probabilities one can calculate the population size required to ensure that with e.g. 90% certainty at least one plant will occur with less than a given DGC. Let the threshold DGC be x and let the corresponding probability be px. Then from Stam (2003) the required minimum population size N satisfies
[Fig. 8.5 graphic: cumulative distribution (0.0 to 1.0) plotted against DGC (%) from 0 to 100, with curves labelled Small, Medium and Large.]
Fig. 8.5. The cumulative distribution of donor genome content (DGC) in a BC1 generation for a small,
medium or large genome. Results based on 50,000 replicate simulation runs. After Stam (2003).
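The kind of simulation summarized in Fig. 8.5 can be approximated with a short Python sketch. This is not the program used by Stam (2003); it is a minimal model under stated assumptions: crossovers follow a Poisson process with one expected crossover per 100 cM (no interference), chromosomes segregate independently, and DGC is taken to be the fraction of map length that the BC1 plant inherits from the donor through the F1 gamete. The function names, the seed and the number of replicates are illustrative.

```python
import random
import statistics

def simulate_bc1_dgc(n_chrom, chrom_len_cm, rng):
    """Simulate the DGC of one BC1 plant (fraction of map length from the donor)."""
    donor_cm = 0.0
    for _ in range(n_chrom):
        # Crossover positions: exponential gaps with a mean of 100 cM.
        points = []
        pos = rng.expovariate(1.0 / 100.0)
        while pos < chrom_len_cm:
            points.append(pos)
            pos += rng.expovariate(1.0 / 100.0)
        on_donor = rng.random() < 0.5   # parental origin at the chromosome start
        prev = 0.0
        for p in points + [chrom_len_cm]:
            if on_donor:
                donor_cm += p - prev
            on_donor = not on_donor     # origin switches at every crossover
            prev = p
    return donor_cm / (n_chrom * chrom_len_cm)

def dgc_summary(n_chrom, chrom_len_cm, reps=5000, seed=1):
    """Mean, variance and P(DGC < 0.35) over simulated BC1 plants."""
    rng = random.Random(seed)
    values = [simulate_bc1_dgc(n_chrom, chrom_len_cm, rng) for _ in range(reps)]
    below = sum(v < 0.35 for v in values) / reps
    return statistics.mean(values), statistics.pvariance(values), below

genomes = {"small": (5, 100.0), "medium": (10, 100.0), "large": (15, 150.0)}
for label, (n_chrom, length) in genomes.items():
    mean, var, below = dgc_summary(n_chrom, length)
    print(f"{label}: mean={mean:.3f} variance={var:.4f} P(DGC<0.35)={below:.3f}")
```

The mean stays near 0.5 for all three genomes, while the variance shrinks as the total map length grows, which is the feature of Fig. 8.5 emphasized in the text. The simulated tail probabilities depend on the crossover model and need not match the published values exactly.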
DGC        At least one plant            At least two plants
           Small   Medium   Large        Small   Medium   Large
< 0.45       5        5        6           8        9       11
< 0.40       7        9       14          14       16       25
< 0.35      10       18       40          17       31       68
< 0.30      16       41      169          28       69      285
< 0.25      28      111      822          48      187    >1000
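A minimal Python sketch of the population-size calculation that underlies such figures is given below. It assumes that plants are independent trials, each falling below the DGC threshold with probability px, so that the number of such plants among N is binomial; the requirement is then that the probability of at least k successes reaches the chosen certainty. This condition is an assumption of the example, not a quotation of Stam (2003), and the function names are illustrative.

```python
import math

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k)
    )

def min_population(p_x, k=1, certainty=0.90):
    """Smallest N such that at least k of N plants fall below the DGC
    threshold with the requested certainty."""
    n = k
    while prob_at_least(k, n, p_x) < certainty:
        n += 1
    return n

# Probabilities that DGC < 0.35 in BC1, as quoted in the text for Fig. 8.5.
p_below_035 = {"small": 0.21, "medium": 0.12, "large": 0.06}
for genome, p_x in p_below_035.items():
    print(genome, min_population(p_x, k=1), min_population(p_x, k=2))
```

Because the probabilities quoted in the text are rounded to two decimals, values computed this way can differ by a few plants from figures derived from the unrounded simulation results.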
least two plants with less than a given DGC in a BC1 generation, which shows the importance of genome size. For example, for DGC to be less than 0.40 in at least one plant, a large genome requires approximately a twofold larger population size as compared to a small genome (14 versus 7). As DGC decreases, this tendency increases rapidly, up to tenfold for DGC less than 0.30 and thirty times for DGC less than 0.25. From these simple calculations, the price to be paid for a rapid decline of DGC in a large genome is twofold (Stam, 2003): (i) the larger the genome size, the more markers (the more marker data points per plant) required; and (ii) the larger the genome size, the larger the population size required to attain a given rate of donor genome substitution.

When multiple BC generations are considered, different strategies for allocating population sizes are possible. Employing increasing, constant, or decreasing population sizes from generations BC1 to BC3 in a simulation study had little effect on the recurrent parent genome values of the selected BC3 plants (Frisch et al., 1999a). For example, allocating a total of n = 300 plants such that 100 plants are generated in each of generations BC1 to BC3 (ratio n1:n2:n3 = 1:1:1) resulted in a lower 10% percentile of the recurrent parent genome (Q10) of 97.4%, while various ratios from 3:2:1 on the one extreme to 1:3:9 on the other resulted in Q10 values of 97.3 and 97.4%, respectively. In contrast, employing a large population size in generation BC1 multiplies the number of marker data points required for the marker-assisted BC programme. For example, only 2650 marker data points were required for n1:n2:n3 = 1:3:9, while 5000 or even 7250 marker data points were required for ratios 1:1:1 and 3:2:1, respectively. However, in a multi-stage selection for a quantitative trait, large populations in early generations are advantageous because when high selection intensity is applied, a large selection gain is expected due to the large segregation variance (Frisch, 2004).

8.2.6 Background selection at carrier chromosome

Donor genome substitution is most important, and at the same time most difficult, for chromosomes that carry the target gene(s). Suppose that the target gene is flanked by two markers at map distances d1 and d2 which can be used for background selection as described previously. Within a given number of generations the introgressed segment must be smaller than the segment covered by d1–d2. Then, given a pre-set probability of reaching this goal (with a 99% success rate), what are the optimal population sizes in successive BC generations?

The answer to this question has been given by Hospital and Decoux (2002) and can readily be obtained with the software package POPMIN (http://moulon.inra.fr/fred/programs). Table 8.4 provides three important features based on the results from POPMIN. First, the smaller the segment (interval)
Table 8.4. Optimum population sizes (expressed as number of individuals) required in successive BC
generations to achieve with 99% certainty two markers flanking the target gene becoming detached in at
least one plant of the last BC generation. N, accumulated number of plants. (N), average accumulated
number of plants; this is less than N because with a certain probability the goal may be reached before
the final generation. Figures indicated in configuration column are distances in centimorgans (cM).
T, target locus; d1, d2, flanking markers (Stam, 2003).
bracketed by the markers, the more plants are required, because recombinants within a short interval are rare and are unlikely to be found in small populations. Secondly, population size should increase as generations proceed, because two-sided detachment (crossover) is in most cases a two-stage process. If no detachment (crossover) occurs on either side in a given generation, more plants are required in the generation(s) thereafter. Thirdly, allowing more generations (three versus two) to achieve the goal requires fewer plants to be grown and genotyped in total, indicating a trade-off between speed and cost (total sample size) of the introgression programme.

In addition, the POPMIN software also allows the user to specify the initial genotype at both markers and the target locus. Given an initial condition of BC generation, e.g. BC1, the user can optimize population sizes in the following BC2, BC3 etc. Conversely, if no single recombinant has been obtained in a given BC generation, an increase of the originally planned population sizes in generations thereafter is needed.

In terms of the relative importance of background selection for the carrier chromosomes and the remainder of the genome, Hospital (2002) considered background selection on carrier chromosomes to be more important than on non-carriers and thus assigned different weights to carrier and non-carrier markers. Frisch and Melchinger (2001) considered multi-stage selection of markers: after selection of the target gene(s), one selects plants based on carrier markers and finally, from the obtained subset, one selects based on non-carrier markers.

8.2.7 Whole genome selection for genetic background

The question arises about the number of markers (per chromosome) that should be used for whole genome selection for genetic background and how this depends on genome and/or population sizes. Several authors (see e.g. Hospital and Charcosset, 1997; Frisch et al., 1999a,b) have shown that in a moderately sized population from which the most promising plant is selected for further backcrossing, an increase in the number of markers per chromosome beyond two is hardly rewarding (Table 8.5). An increase from 1 to 8 markers reduces DGC in a relative sense (from 0.13 to 0.07 in BC2), but the absolute effect is limited. However, when rapid progress requires using larger population sizes, especially in the case of a large genome (where larger population sizes are required anyway), the situation is different (Table 8.6).

Table 8.5. Average decrease of DGC in a BC programme with a medium genome size and single target gene. Each chromosome has 1, 2 or 8 markers, uniformly distributed over the chromosome. One plant out of 50 is selected in each generation for backcrossing. The selected plant satisfies the following conditions: (i) it carries the target allele; and (ii) it has the smallest number of markers of donor signature. Results based on 5000 replicate simulation runs (Stam, 2003).

Number of markers    BC1     BC2     BC3
1                    0.34    0.13    0.07
2                    0.31    0.09    0.04
8                    0.30    0.07    0.02
Table 8.6. Average DGC attained in BC2 for small and large genome sizes with various population
sizes and number of markers per chromosome (Stam, 2003).
Number of markers    Population size    Small genome    Large genome
2                    50                 0.082           0.121
2                    200                0.079           0.095
2                    400                0.078           0.088
8                    50                 0.040           0.100
8                    200                0.021           0.067
8                    400                0.019           0.055
Two general conclusions can be drawn (Stam, 2003): (i) For a small genome with few markers per
chromosome, increasing the population size makes little sense. When many markers are available,
however, an increase of population size does reduce DGC, but hardly so beyond N = 200. (ii) For
a large genome, increasing the population size is beneficial, irrespective of the number of
markers per chromosome. Obviously, with increasing genome size more independent recombination
events are required to attain a given reduction in DGC, which in turn demands larger populations
for their discovery.
Whole genome selection for background will help reduce the DGC. The question of what final level
of DGC is acceptable cannot easily be answered in general terms. When relying only on estimated
DGC, based on markers, one still runs the risk that after finalization a tiny donor fragment
contains a few wild type genes that confer an undesirable trait. Especially in a rapid cycling
introgression programme that hardly allows phenotypic selection for general agronomic
performance, undesired donor traits may unexpectedly turn up despite an expensive and
theoretically powerful BC scheme (Stam, 2003). On the other hand, desirable DGC levels across
different BC populations largely depend on the genetic difference between the donor and
recurrent parents. In many cases, unsaturated backcrossing with selection for the target gene
may be enough, particularly when the donor parent is also a commercial cultivar.
8.2.8 Multiple gene introgression by repeated backcrossing
Since little additional effort is required to screen with multiple molecular markers after
sampling and DNA extraction, one could consider adding many genes simultaneously to a cultivar
through MAB. For example, batteries of disease resistance genes could be added in a few
generations, as opposed to the many generations required with traditional breeding. The ability
to rapidly adjust existing cultivars should allow breeders to respond more quickly to market
demands, as well as unexpected environmental pressures, such as the appearance of new pathogens.
With marker-assisted introgression, allele frequencies for the introgressed alleles are
sufficiently high that two or three alleles could be readily introduced and brought to fixation
in a given breeding cycle (Beckmann and Soller, 1986a). Without MAS, many BC progeny will have
to be screened for the introduced trait, due to the extreme rarity of BC progeny carrying
desired exotic alleles.
Several authors have considered optimization aspects of multiple gene transfer by repeated MAB
(van Berloo et al., 2001; Frisch and Melchinger, 2001; Hospital, 2002; Stam, 2003). Frisch
(2004) discussed the introgression of two dominant genes. It is clear that, roughly speaking,
the effects of population size, genome size and total number of markers on the efficiency of
recurrent parent genome recovery are similar to those for single gene transfer. As an example,
Table 8.7 shows the effect of population size for the introgression of three target genes in a
genome of medium size using eight markers per chromosome for background selection (Stam, 2003).
With multiple targets an increase of population size enhances the efficiency. However, the
average DGC in the BC population decreases most markedly when the population size increases
from small to medium, such as from 50 to 100.
The number of target genes does affect the answer to the question whether a given total number
of plants should be distributed
over two or three generations. Comparison of average DGC attained with a total of 900 plants,
distributed over two or three BC generations with medium genome size and eight selection markers
per chromosome (Stam, 2003), showed that three BC generations each with 300 plants is more
effective than two BC generations each with 450 plants. The average DGC for the former is 0.010
for one target gene and 0.036 for three target genes while for the latter these two numbers are
0.023 and 0.083, respectively. Again, there is a trade-off among time (how many generations
involved), cost (how many data points to generate per generation) and efficiency (how soon the
recurrent parent genome can be recovered).
Table 8.7. Average DGC in BC2 and BC3 in an example of the simultaneous introgression of three
target genes in a genome of medium size, using eight markers per chromosome for background
selection. A single plant was selected for further backcrossing, carrying the three target
alleles and having the smallest number of markers of donor signature. Averages based on 1000
replicate simulation runs (Stam, 2003).
Population size    BC2     BC3
50                 0.18    0.09
100                0.14    0.06
200                0.11    0.04
400                0.09    0.03
A complication arising with multiple QTL transfer is the uncertainty about the exact location of
QTL. Hospital and Charcosset (1997) investigated the optimal location of markers to be used in
foreground selection. This optimization process should also consider the relative economic
importance of the target traits for the multiple QTL to be introgressed.
8.3 Marker-assisted Gene Pyramiding
Agricultural productivity is the result of growing superior genotypes in an environment which
allows them to express their superiority (Boyer, 1982). Increasing genetic value relies on
increasing the frequency of favourable genes controlling that trait. To create a superior
genotype, the breeder must assemble many genes which work well together and, for a specific
trait, assemble the alleles with similar effects from different loci. This process is called
pyramiding, by which different QTL alleles can be recombined and the true-breeding lines
associating alleles of similar (positive or negative) effect can be selected (Xu, 1997;
Fig. 8.6). Related techniques include effectively identifying the individuals with favourable
allele combinations, assembling different alleles into a common genetic stock to produce new
genotypes and determining the joint effects of alleles at different loci. In the words of
Allard (1988):
    Emphasis was therefore shifted . . . to a particulate approach . . . determining the
    individual effects of single marker loci on adaptive change, then determining the joint
    effects of pairs of loci.
8.3.1 Gene-pyramiding schemes
If all genes cannot be fixed in a single step of selection, it is necessary to cross again
selected individuals with incomplete, but complementary, sets of homozygous loci (Xu et al.,
1998). However, such strategies are limited to small numbers of target loci. To accumulate more
loci in a single genotype by selection on markers, Hospital et al. (2000) proposed a
marker-based recurrent selection (MBRS) method using a QTL complementary strategy in a randomly
mating population. When evaluating this method using simulations with 50 detected QTL in a
population of 200, they found that the frequency of favourable alleles went up to 100% in ten
generations when markers were located exactly on the QTL, but up to only 92% when marker–QTL
distance was 5 cM. The reduced efficiency in the latter case comes from the probability of
losing the QTL during the breeding scheme because of recombination between the markers and QTL.
This effect becomes more severe with increasing duration of the breeding scheme because of the
accumulation of meiosis;
[Fig. 8.6, flow diagram. Panel labels: Germplasm (lines A to E, pairwise crosses); Screening for
non-allelic QTL; Map-based whole genome selection; Allele-dispersed materials (the favourable
alleles Q1 to Q4 are homozygous in different lines); Crossing; F2; Divergent selection;
Pyramiding; Separating; Allele-associated materials (Q1Q1 Q2Q2 Q3Q3 Q4Q4 and q1q1 q2q2 q3q3
q4q4).]
Fig. 8.6. A procedure for QTL separating and pyramiding. Non-allelic QTL with dispersed QTL alleles are
identified by observation of transgressive segregation and map-based whole genome selection, and then
recombinants are obtained by divergent phenotypic selection from crosses derived from non-allelic
QTL materials. Two cycles of cross-selection are exemplified to pyramid non-allelic QTL at four loci
(Q1-q1, Q2-q2, Q3-q3 and Q4-q4). QTL separating is a reverse process of pyramiding, in which
allele-associated materials are used as parents to produce a segregating population (F2) and
intermediate phenotype is selected in order to get allele-dispersion individuals. From Xu (1997).
This material is reproduced with permission of John Wiley & Sons, Inc.
As shown in Fig. 8.7, the gene-pyramiding scheme has two parts. The first part is called a
pedigree and is aimed at accumulating all target genes in a single genotype (called the root
genotype). The second part is called the fixation steps, which aims at fixing the target genes
into a homozygous state, that is, to derive the ideotype from the root genotype. A pedigree can
be represented by a binary tree with n leaves corresponding to the n founding parents and n - 1
nodes. Each node of the tree is called an intermediate genotype and has two parents. Each
intermediate genotype, which is a particular genotype selected from among the offspring,
becomes a parent in the next cross. Denote the gametes (subsets of genes) passed on from the
parents to the intermediate genotype as s. Taking H(s1)(s2) as an example, the intermediate
genotype must produce and pass on to its offspring a gamete carrying all the favourable alleles
in s1 and s2.
There are many possible procedures that can be used to fix the root genotype, one of which is to
generate a population of doubled haploids (DHs) as described in detail in Chapter 4. Using the
DH procedure, the ideotype can be developed in just one additional generation after the root
genotype is obtained, plus one more generation for seed increase to produce large populations.
The fixation steps using the DH procedure can be outlined as follows.
First, obtain a genotype carrying all favourable alleles in coupling, namely H(1, 2, ..., n)(B),
by crossing the root genotype with a blank parent (denoted as H(B)(B)) containing none of the
favourable alleles. This guarantees that the linkage phase of the offspring is known and that
the H(1, 2, ..., n)(B) genotype can be identified without ambiguity.
Second, self H(1, 2, ..., n)(B) to give the ideotype in one generation.
Pedigree height
The number of generations a pedigree spans is called the pedigree height, denoted h. If the
fixation steps span two generations, the complete gene-pyramiding scheme spans h + 2
generations. A pedigree is of maximum height when just one cross is performed at each generation
(involving an intermediate genotype H and a founding parent). This type of pedigree is called a
cascading pedigree. Conversely, a pedigree is of minimum height when the maximum number of
crosses is performed at each generation. The pedigree height is therefore bounded by
\[ \lceil \log_2 n \rceil \le h \le n - 1 \tag{8.7} \]
where \lceil x \rceil denotes the smallest integer larger than or equal to x.
[Fig. 8.7, diagram of a gene-pyramiding scheme. Labels: Founding parents P1 to P6 (generation
G0); intermediate genotypes H(1)(2) and H(3)(4); Node; Pedigree; root genotype H(1,2,3,4)(5,6)
(generation G3); Fixation steps.]
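The bounds in Eqn 8.7 are easy to evaluate for a given number of founding parents. The sketch
below is illustrative only; the function name and the constant for the two fixation generations
are assumptions introduced here, the latter following the case described in the text.

```python
from typing import Tuple


def pedigree_height_bounds(n_parents: int) -> Tuple[int, int]:
    """Return (minimum, maximum) pedigree height h for n founding parents (Eqn 8.7)."""
    if n_parents < 2:
        raise ValueError("a pedigree needs at least two founding parents")
    lower = (n_parents - 1).bit_length()  # exact integer form of ceil(log2(n))
    upper = n_parents - 1                 # cascading pedigree: one cross per generation
    return lower, upper


FIXATION_GENERATIONS = 2  # the two-generation fixation case described in the text

for n in (2, 4, 6, 8):
    low, high = pedigree_height_bounds(n)
    print(f"n={n}: h in [{low}, {high}], "
          f"whole scheme spans {low + FIXATION_GENERATIONS} to "
          f"{high + FIXATION_GENERATIONS} generations")
```

For six founding parents the minimum height is 3, which agrees with the root genotype being
reached at generation G3 in Fig. 8.7.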
all possible values of p, the number N(n) of pedigrees cumulating n genes can be computed via
\[ N(n) = \frac{1}{2} \sum_{p=1}^{n-1} \binom{n}{p} N(p)\, N(n-p) \tag{8.8} \]
The factor 1/2 is there to ensure that the crossing of two given parents is counted only once.
This recursion can be solved (see the Appendix in Servin et al., 2004) and leads to
\[ N(n) = \prod_{k=2}^{n} (2k-3) = (2n-3)(2n-5) \cdots 1 \tag{8.9} \]
Note that other target genes might be on the map, located between the a_i, but not belonging to
the set s; recombinations between these genes do not matter here. As an example illustrating
Eqn 8.10, consider the genotype H(1,3)(2,5,6). The probability that it passes the set
(1, 2, 3, 5, 6) is
\[ P\left(H_{(1,3)(2,5,6)} \rightarrow (1,2,3,5,6)\right) = \frac{1}{2}\, r_{1,2}\, r_{2,3}\, r_{3,5}\, (1 - r_{5,6}) \tag{8.11} \]
Knowing these probabilities, the overall probability of obtaining the root genotype of a given
pedigree is the product, over all the pedigree's nodes (other than the root node), of the
probabilities calculated as in Eqn 8.10.
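As a quick numerical check, the recursion of Eqn 8.8 and the closed form of Eqn 8.9 can be
compared for small n. This is a minimal sketch: the function names are illustrative, and the
base case N(1) = 1 (a single founding parent needs no cross) is an assumption made here to
start the recursion.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def n_pedigrees_recursive(n: int) -> int:
    """Number of pedigrees cumulating n genes, from the recursion of Eqn 8.8."""
    if n == 1:
        return 1  # assumed base case: one founding parent, nothing to cross
    total = sum(
        comb(n, p) * n_pedigrees_recursive(p) * n_pedigrees_recursive(n - p)
        for p in range(1, n)
    )
    return total // 2  # the factor 1/2 counts each cross of two parents only once


def n_pedigrees_closed_form(n: int) -> int:
    """Closed form of Eqn 8.9: the product of (2k - 3) for k = 2..n."""
    result = 1
    for k in range(2, n + 1):
        result *= 2 * k - 3
    return result


for n in range(1, 9):
    print(n, n_pedigrees_recursive(n), n_pedigrees_closed_form(n))
```

The count grows very quickly with n (15 pedigrees for four genes, 945 for six), which is why
exhaustive comparison of pedigrees is practical only for a small number of target genes.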
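[Fig. 8.8, diagram of three pedigrees (schemes a, b and c). Recoverable node annotations:
transmission probabilities (1/2)(1 - r12)(1 - r23)r34 and
[(1/2)(1 - r12)(1 - r23)(1 - r34)]^2; node population sizes of 102, 65, 65 and 68; cumulated
population sizes Ntot = 961, 1001 and 325.]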
Fig. 8.8. Representation of three different gene-pyramiding schemes cumulating four loci. Scheme a is based on a cascading pedigree. Schemes b and c
differ by the order of crosses of the founding parents. The target genes are represented by solid circles and other genes by shaded boxes. At each node the
transmission probabilities of the targeted genes from parent to offspring are given. When the probability is equal to one, it is not indicated. The population sizes
needed at each node (N ) and the cumulated population sizes (Ntot) are provided. From Servin et al. (2004) with permission.
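The node probabilities of the kind shown in Fig. 8.8 and in Eqn 8.11 can be sketched in code.
The rule used below is inferred from the worked example of Eqn 8.11 and is an assumption, not
a quotation of Eqn 8.10: a factor 1/2 for the starting chromosome, then r for each pair of
adjacent target genes carried by different parental gametes and 1 - r for each pair carried by
the same gamete. The population-size helper assumes the usual criterion that at least one
offspring of the desired type is recovered with probability 1 - alpha; whether this matches the
settings behind the N values in Fig. 8.8 is not known. All names are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple


def transmission_probability(
    s1: Iterable[int], s2: Iterable[int], r: Dict[Tuple[int, int], float]
) -> float:
    """Probability that H(s1)(s2) transmits a gamete carrying all genes in s1 and s2.

    r maps each pair of adjacent target genes (i, j), i < j, to their
    recombination fraction.
    """
    set1, set2 = set(s1), set(s2)
    if set1 & set2:
        raise ValueError("s1 and s2 are assumed to be disjoint")
    genes = sorted(set1 | set2)
    prob = 0.5  # choice of the chromosome at the first target gene
    for a, b in zip(genes, genes[1:]):
        switch = (a in set1) != (b in set1)  # genes sit on different parental gametes
        prob *= r[(a, b)] if switch else 1.0 - r[(a, b)]
    return prob


def min_population_size(p: float, alpha: float = 0.01) -> int:
    """Smallest N such that at least one target offspring appears with prob. 1 - alpha."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log1p(-p))


# Example of Eqn 8.11 with made-up recombination fractions.
r = {(1, 2): 0.1, (2, 3): 0.2, (3, 5): 0.3, (5, 6): 0.4}
p_node = transmission_probability((1, 3), (2, 5, 6), r)
print(p_node, min_population_size(p_node))
```

With the same rule, the genotype H(1,2,3)(4) gives (1/2)(1 - r12)(1 - r23)r34, the form that
appears among the node annotations of Fig. 8.8.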
marker. For any set of markers, M will be minimized if the marker with the lowest retained
fraction f (or highest culling rate) is used first, followed by the next lowest and so on. The
total cost (C) of marker assays can be determined from Eqn 8.15 by inclusion of the cost of each
assay
\[ C = N c_1 + N f_1 c_2 + N f_1 f_2 c_3 + \cdots + N f_1 f_2 \cdots f_{n-1} c_n \tag{8.16} \]
where c_1, c_2, ..., c_n are the costs of the marker assays. The total cost, C, is minimized when
\[ \frac{c_1}{1 - f_1} < \frac{c_2}{1 - f_2} < \cdots < \frac{c_n}{1 - f_n}. \]
It should be noted that the analytic expression for the cost of sequential culling ignores the
costs of plant/line handling (tagging, leaf sampling, etc.) and DNA extraction, which are fixed
with total sample size and cannot be reduced by sequential culling. If these fixed costs are
major parts of the expense for genotyping, the order of markers used in the sequential culling
may become less important. The same holds where high-throughput genotyping systems run all
markers on all samples to make genotyping most cost-effective overall.
Enrichment of favourable alleles at early generations
When many (unlinked) markers have to be selected, the frequency of a target homozygous genotype
will be low and a large population size will be required. For example, in the F2 of a biparental
cross between two inbreds segregating at five unlinked loci, the frequency of the target
genotype is 0.25^5 = 0.00098 and the minimum population size (Eqn 8.2) to recover at least one
target genotype is 4714 (α = 0.01). If selection is made among homozygous lines (i.e. DH or RIL
populations) from the same cross, the frequency of the target genotype is 0.5^5 = 0.03125 with a
minimum population size of only 146 (α = 0.01), i.e. the target genotype is more readily
recovered with a smaller population size if selection is delayed until greater homozygosity has
been reached.
For more segregating loci, population sizes quickly increase even in DH or RIL populations. For
example, in a biparental population with eight unlinked segregating loci, the frequency of the
target genotype in a homozygous population is 0.5^8 = 0.0039 and the minimum population size is
1177. In these instances, Bonnet et al. (2005) proposed a two-stage selection strategy. The
first stage is F2 enrichment, where F2 individuals carrying the entire set of target alleles in
either homozygous or heterozygous form are selected. F2 enrichment takes advantage of the high
expected frequency of carriers (either homozygous or heterozygous) at each locus of 0.75. The
value of the technique can be seen in a population segregating at 12 loci, where the frequency
of genotypes selected in an F2 enrichment step is 0.75^12 = 0.031676, resulting in a minimum
population size of 144 F2 individuals, compared to a frequency of 0.25^12 = 5.960464 × 10^-8 and
a population size of > 77 million to identify a single homozygous individual in the F2.
After F2 enrichment, the frequency of each of the 12 target alleles in the selected population
is increased from 0.5 to 0.67. The second step is to generate a population of more or less
homozygous lines from the selected F2. The frequency of the target genotype in DH/RIL
populations generated from the enriched F2 will have been increased from 0.5^12 to 0.67^12,
resulting in a decrease in minimum population size from 18,861 to 596. Thus, with enrichment,
both the F2 and the DH/RIL populations are of a more practical size for breeding.
The allele enrichment can be done for more than one generation when multiple-generation
selection is involved. Enrichment at two selection stages (e.g. in F2 and F3) always requires
greater assay numbers than simple F2 enrichment (Wang et al., 2007). As indicated by Bonnet
et al. (2005), F2 enrichment increased the frequency of selected alleles, allowing large
reductions in minimum population size for recovery of target genotypes (commonly around 90%)
and/or selection at a greater number
of loci. So the gain from another cycle of allele enrichment selection in F3 following
enrichment in F2 is at best minor and often results in a small net increase in minimum
population size.
For a top-cross of three adapted wheat lines from an existing breeding programme, simulation of
changes in allele frequencies at nine target genes (seven unlinked) showed that population size
was minimized with a three-stage selection strategy in the F1 generation of the top-cross
(TCF1), the F2 generation of the top-cross (TCF2) and DHs. Enrichment of allele frequencies in
TCF2 reduced the total number of lines screened from > 3500 to < 600. Eight of the genes were
present at frequencies > 0.97 after selection (Wang et al., 2007).
8.3.3 Gene pyramiding for different traits
The methods discussed above are for pyramiding genes affecting a specific trait. However,
aggregating favourable genes from different traits in a genotype has long challenged plant
breeders. The principles discussed above can be used in the same way to accumulate QTL alleles
controlling different traits. A distinct difference in concept is that alleles at different
trait loci to be accumulated may have different favourable directions, i.e. negative alleles are
favourable for some traits but positive alleles are favourable for others. Therefore, one may
need to combine the positive QTL alleles of some traits with the negative alleles of others to
meet breeding objectives. Marker-assisted gene pyramiding is also important when considering
multiple traits, as in phenotypic selection each of these traits has to be tested in different
environments, different developmental stages or different stages of a breeding programme.
Attention should be paid to trait correlation when one practises pyramiding of alleles for
different traits. Positive correlation will facilitate the pyramiding process involved in
selection for the alleles with the same favourable direction, but impede the process of
selection for QTL alleles with different favourable directions and vice versa for negative
correlation. If correlation results from pleiotropic effects of a marker gene rather than
linkage, it is difficult, if not impossible, to select towards the direction opposite to the
correlation.
8.3.4 Marker-assisted recurrent selection versus genome-wide selection
Recurrent selection is considered as one of the selection approaches to combine favourable
alleles distributed among different sources of germplasm. There are various new versions of
recurrent selection available with which molecular marker information is incorporated. The key
advantages of these new versions are the availability of genetic data for all progeny at each
generation of selection, the integration of genotypic and phenotypic data and the rapid cycling
of generations of selection and information-directed matings at continuous nurseries.
Marker-assisted recurrent selection (MARS) was proposed in the 1990s (Edwards and Johnson, 1994;
Lee, 1995; Stam, 1995) which uses markers at each generation to target all traits of importance
and for which genetic information can be obtained. Genetic information is usually obtained from
QTL analyses performed on experimental populations, which includes QTL locations and effects.
When the QTL mapping is conducted based on a biparental population, both parents often
contribute favourable alleles. As a result, the ideal genotype is a mosaic of chromosomal
segments from the two parents. The goal of MARS is to obtain individuals with as many
accumulated favourable alleles as possible. However, the ideal genotype, defined as the mosaic
of favourable chromosomal segments from two parents, will usually never occur in any Fn
population of realistic size (Stam, 1995). As discussed previously, a breeding scheme to produce
or approach this ideal genotype based on individuals of the experimental population could
involve several successive generations of crossing individuals (Stam, 1995; Peleman and van der
Voort, 2003) and would therefore constitute
to genome-wide selection was 18–43% larger than the response to MARS. Regardless of the
heritability and the number of QTL, responses to genome-wide selection were smallest when
NM = 64 markers were used. A minimum of NM = 128–256 polymorphic markers should be used in
genome-wide selection in maize and more markers should be used for complex traits that have, at
the same time, a high heritability. In contrast, responses to MARS were largest with NM = 64 or
128 markers. Genome-wide selection is most useful for complex traits that are controlled by many
QTL and have low heritability. Responses to selection were maintained when the number of DHs
phenotyped and genotyped in Cycle 0 was reduced and the number of plants genotyped in Cycles 1
and 2 was increased. Such schemes that minimize phenotyping and maximize genotyping would be
feasible only if the cost per marker data point is reduced to about US$0.02. With the
availability of large numbers of SNP markers in many crop plants and of cheap array-based
genotyping systems, genome-wide selection, as a brute-force and black-box procedure that
exploits cheap and abundant molecular markers, is superior to MARS in plants. Please note that
in genome-wide selection, one does not need any QTL information. Rather, one uses a general
regression approach in a test set to obtain an estimate of breeding value from a very dense
marker set and then selects on this marker set.
8.4 Selection for Quantitative Traits
The most significant distinction of quantitative inheritance is that there is no corresponding
(simple) relationship between genotype and phenotype, although conventional plant breeding is
based on selection of phenotypes. This is the major reason why the efficiency of conventional
plant breeding is often low. Therefore, the main objective of MAS should be for quantitative
traits according to their importance and necessity. In principle, the methodologies developed
for qualitative traits in MAS are also applicable to quantitative traits. However, more factors
should be taken into consideration when quantitative traits are involved. First, QTL mapping so
far has provided limited results so that there is no trait for which all related QTL have been
located precisely. Therefore, it is very difficult, if not impossible, to make a comprehensive
selection for any specific trait. It is also a complicated issue to simultaneously select for
multiple QTL. Secondly, epistasis will affect both the efficiency and the final products of MAS.
Thirdly, there are certain genetic correlations among quantitative traits so MAS for one trait
may also modify other correlated traits. Therefore, it is much more difficult to apply MAS to
quantitative traits.
8.4.1 Selection based on phenotypic values
The theoretical basis for phenotypic selection is that phenotypic value is an approximate
estimate of genotypic value and thus, selection based on phenotypic value can be considered
approximately as a selection based on genotypic value. The higher the relatedness between
phenotypic and genotypic values, the higher the efficiency of phenotypic selection.
It should be noted that, under random mating, only a fraction of the total genotypic value,
namely the component contributed by additive effects, can be transmitted from one generation to
the next and therefore, only selection for the additive component of genotypic value is
effective. More precisely, the more closely the additive effect of an individual resembles its
phenotypic value, the higher the efficiency of phenotypic selection. In animal breeding, the
additive value of an individual is often known as its breeding value. The relatedness of
phenotypic value to the additive effect depends on narrow-sense heritability (h² = σA²/σP²),
where σA² and σP² are additive genetic variance and phenotypic variance, respectively. The
higher h², the greater is the relatedness between phenotypic value and additive effect. When
h² = 1, the phenotypic value is equal to the additive effect value. The efficiency of phenotypic
selection increases as the narrow-sense heritability increases.
\[ RE_{IP} = \frac{G_I}{G_P} = \sqrt{\frac{p}{h^2} + \frac{(1-p)^2}{1 - p h^2}} \tag{8.24} \]
where GI is the genetic advance of selection index. Fig. 8.11 shows how REIP changes with p
under different levels of h². For a given p, REIP increases with the decrease of h², that is,
MAS is more efficient when heritability is low; while REIP increases with the increase of p,
but the rate of increase becomes slow when h² is high. When h² reaches an intermediate level
(h² = 0.5), index selection has no apparent advantage. When h² = 1, REIP does not change with p,
with a constant value of 1, indicating that in this case molecular markers do not provide any
extra information so that MAS has no positive contribution at all.
[Fig. 8.11, plot: relative efficiency (vertical axis, ticks 0 to 3) against the proportion of
additive variance explained by markers (horizontal axis, 0.0 to 1.0), with curves for h² = 0.10,
0.20, 0.50 and 1.00.]
Fig. 8.11. Efficiency of MAS in the improvement of a single trait relative to traditional
individual selection index with the same selection intensity, assuming very large sample sizes.
Relative efficiency is plotted as a function of the proportion of the additive genetic variance
in the trait significantly associated with the marker loci, for various values of the
heritability of the trait. From Lande and Thompson (1990) with permission.
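The behaviour of REIP described for Fig. 8.11 can be checked numerically from Eqn 8.24. The
sketch below assumes the square-root form of the expression; the radical does not survive in
the extracted equation and is inferred here from the curve heights in Fig. 8.11 (about 3 for
h² = 0.10 at p = 1). The function name is illustrative.

```python
import math


def relative_efficiency_index_vs_phenotypic(p: float, h2: float) -> float:
    """RE_IP of Eqn 8.24 (square-root form assumed): index vs phenotypic selection."""
    if not (0.0 <= p <= 1.0 and 0.0 < h2 <= 1.0):
        raise ValueError("need 0 <= p <= 1 and 0 < h2 <= 1")
    if p == 1.0 and h2 == 1.0:
        return 1.0  # second term is 0/0 here; its limiting contribution is zero
    return math.sqrt(p / h2 + (1.0 - p) ** 2 / (1.0 - p * h2))


# Tabulate the four heritabilities labelled in Fig. 8.11.
for h2 in (0.10, 0.20, 0.50, 1.00):
    row = [relative_efficiency_index_vs_phenotypic(p / 5, h2) for p in range(6)]
    print(f"h2={h2:.2f}: " + "  ".join(f"{value:.2f}" for value in row))
```

Each printed row corresponds to one curve of the figure, evaluated at p = 0.0, 0.2, ..., 1.0.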
The relative efficiency of index selection and marker-score selection can be expressed as
\[ RE_{IM} = \frac{G_I}{G_M} = \frac{RE_{IP}}{RE_{MP}} = \sqrt{1 + \frac{h^2 (1-p)^2}{p\,(1 - p h^2)}} \tag{8.25} \]
which indicates that no matter what values p and h² take, REIM ≥ 1. Therefore, index selection
always has higher selection efficiency than marker-score selection, which has been proven by
computer
simulation (Whittaker et al., 1997) and is different from the situation where selection is
based on marker scores.
Index selection depends on both phenotypic value and marker score. Therefore, the factors that
affect the efficiency of marker-score selection will also affect the efficiency of index
selection. Computer simulation (Gimelfarb and Lande, 1994a,b, 1995) indicated that index
selection is more efficient than phenotypic selection at least for the first several
generations, but this advantage disappears very quickly with the advance of generations. In
advanced generations, index selection might be less efficient than phenotypic selection. This
could happen when the degree of the additive effect explained by marker score is not as good as
that by phenotypic value (i.e. p < h²), while sampling errors of weight coefficients in Eqn 8.20
amplify the relative importance of marker score so that the proportion of additive genetic
variance explained by the selection index is not as good as that by phenotypic value
(h_I² < h²). Therefore, both marker-score and index selection have an advantage only at the
early stages of selection and, in advanced generations, phenotypic selection works better.
Although index selection utilizes more genetic information and is thus more efficient than
marker-score selection, it costs more and needs more work in order to gain the extra information
for phenotypic value. Furthermore, measurement of phenotypic value is limited to the stage at
which the trait is expressed, which cancels the advantage that MAS can be done at any stage. In
addition, when the measurement of phenotype needs to be progeny tested, the cycle of index
selection becomes longer so that the advantage of index selection may not offset this
disadvantage. For example, hybrid maize yield has to be measured by progeny test and each cycle
of index selection will take 2 years, while four cycles of marker-score selection can be done in
2 years (Edwards and Page, 1994). Although marker-score selection has lower genetic advance per
cycle compared to index selection, it has higher genetic advance per unit time because more
cycles can be done in a given period. Based on this result, Hospital et al. (1997) proposed a
selection strategy that alternates one generation of index selection with several generations
of marker-score selection. In the generation of index selection, a relatively larger population
is required for re-evaluation and selection based on molecular markers, in order to maintain the
reliability of marker–trait regression. Conversely, relatively smaller populations can be used
in the generations of marker-score selection.
8.4.4 Genotypic selection
Both marker-score selection and index selection depend on genotypic value or more specifically
the additive components of genotypic value rather than the genotype itself. Therefore, both
selection methods are for selection of genotypes through genotypic values, which is indirect and
they have no virtual difference from phenotypic selection. This is not exactly the concept of
MAS which has been proposed and expected. Because genotypic value is the result of genotypic
expression, different genotypes may have the same genotypic value, that is, a genotypic value
could match up with many genotypes. There is a loss or degeneracy of genetic information from
genotype to genotypic value. This information degeneracy will result in a low efficiency of
selection and the loss of some favourable QTL alleles with relatively smaller effects. The more
QTL involved in selection, the higher the chance that favourable alleles will be lost.
Therefore, a more efficient selection method should be that based on the genotype itself (which
is called genotypic selection) as MAS for the qualitative trait. More specifically, each target
QTL is selected based on its two flanking markers, a single closely linked marker, or a
gene-based marker.
Currently, genotypic selection of quantitative traits is limited by the availability of QTL
that have been fine mapped. For most quantitative traits, only QTL with large effects have been
mapped on a relatively rough scale, leaving a lot of minor
QTL non-detectable. To improve the efficiency and reliability of MAS, markers that are flanking the target QTL should be tightly linked. However, if the target region is too small, it might not contain the target QTL because the primary QTL mapping is not so accurate. It is important to develop flanking markers that bracket the target QTL with high confidence in order to have high selection efficiency.

As discussed in the previous section for selection of qualitative traits, it is better to use three linked markers in selection and the best positions of these markers will be determined by the confidence interval of the QTL. The middle marker should be tightly linked to, located exactly at, or identical to, the QTL, which should be bracketed by two flanking markers. The optimized window size for the target region determined by the flanking markers is positively proportional to the confidence interval of the QTL. The larger the QTL confidence interval, the larger the target region bracketed by the flanking markers must be to guarantee that the QTL is located within it.

As indicated by Hospital and Charcosset (1997), using position-optimized markers to follow the target QTL in BC breeding, favourable alleles at four independent QTL can be transferred from the donor parent to the recurrent parent, with a population consisting of several hundreds of individuals. If there is linkage between QTL and QTL are precisely mapped or larger population sizes are used, more QTL can be transferred simultaneously.

8.4.5 Integrated marker-assisted selection

As discussed previously, a marker–trait association identified in one population has to be validated before it can be used for MAS in other populations. One of the best ways to avoid the marker validation step is to integrate genetic mapping with MAS, that is, marker–trait associations identified from a breeding population will be used for MAS of the same population. This is critical for quantitative traits that are genetically controlled by many genes and interact with environments. Advanced backcrossing QTL (AB-QTL) analysis, proposed by Tanksley and Nelson (1996) to accelerate the process of molecular breeding, is one of the approaches that can be used for this purpose. Stuber et al. (1999) discussed their effort to test a marker-based breeding scheme for systematically generating superior lines without any prior identification of genes in the donor sources. Identifying and mapping of genes in the donor is a bonus obtained when the derived NILs are evaluated. This method is somewhat similar to AB-QTL analysis. Other approaches include using associations identified in F2 populations to select the subsequent self-pollinated populations.

The AB-QTL strategy postpones QTL mapping until the BC2 or BC3 generation. The delay of QTL analysis offers advantages for QTL characterization such that the probability is reduced for the detection of QTL displaying epistatic interactions among donor alleles due to their overall low frequency. In fact, there will be a higher probability of detecting additive QTL which still function in a near-isogenic background. During the generation of BC2 or BC3 populations, negative selection is being exercised to minimize the occurrence of unfavourable donor alleles. The advantage of focusing on the BC2 or BC3 population is that they offer sufficient statistical power for QTL identification on the one hand and on the other hand provide sufficient similarity to the recurrent parent to select for QTL-NILs in a short time span (within 1–2 years). By use of QTL-NILs, the QTL discovered can be verified and the NILs may serve directly either as improved cultivars or as a parent cultivar in case of hybrid crops (Peleman and van der Voort, 2003).

The AB-QTL approach can be exploited for pyramiding QTL alleles. Each time that AB-QTL analysis is applied, the map positions of donor QTL affecting key traits will likely be discovered so that QTL mapping information derived from AB-QTL analysis is cumulative. Based on this knowledge, as indicated by Tanksley and Nelson (1996),
324 Chapter 8
marker loci needed; (ii) sample size needed to detect trait loci with low heritability; and (iii) sample errors in the estimation of relative weights in the selection indices.

Frisch and Melchinger (2005) developed a theoretical framework for MAS for the genetic background of the recurrent parent in a BC programme to predict the response to selection and give criteria for selecting the most promising BC individuals for further backcrossing or selfing. The approach dealt with selection in generation n of the BC programme, taking into account pre-selection for the presence of one or several target genes, the linkage map of the target gene(s) and markers and the marker genotype of the individuals used as non-recurrent parents for generating BC generations.

Response to selection R is defined as the difference between the expected donor genome proportion m in the selected fraction of a BCn population and the expected donor genome proportion m' in the unselected BCn population:

$R = m - m'$    (8.26)

Prediction of the response to selection can be employed to compare alternative scenarios with respect to population size and required number of markers. This application was illustrated by the example of a BC1 population using model genomes close to maize (ten chromosomes of length 2 M) and sugarbeet (nine chromosomes of length 1 M) with markers evenly distributed across all chromosomes, a target gene located 66 cM from the telomere on a chromosome and one individual selected as the non-recurrent parent of generation BC2.

The expected response to selection for maize ranged from 5% of the donor genome (20 markers, 20 plants) to 12% (120 markers, 1000 plants) and for sugarbeet it ranged from 7% to 15% (Fig. 8.12). To obtain a response to selection of 10% with 60 markers, a population size of 180 is required in maize, corresponding to 180/2 × 60 = 5400 marker data points (MDP). By comparison, in sugarbeet a population size of 60 is sufficient, resulting in only 30% of the MDP required for maize. The result indicates that the efficiency of MAB in crops with smaller genomes is much higher than that in crops with larger genomes.

Using > 80 markers in maize (corresponding to a marker density of 25 cM) or > 60 markers in sugarbeet (marker density 15 cM) resulted in only a marginal increase of the response to selection, irrespective of the population size employed (Fig. 8.12). Increasing the population size up to 100 plants resulted in a substantial increase in response to selection in both crops and using even larger populations still improves the expected response to selection. Frisch and Melchinger (2005) concluded that increasing the response to selection by increasing the number of markers employed is possible only up to an upper limit that depends on the number and length of chromosomes. In contrast, increasing the response to selection by increasing the population size is possible up to population sizes that exceed the reproduction coefficient of most crop species.

An optimum criterion for the design of MAS in a BC population can be defined by the expected response to selection reached with a fixed number of MDP. For a fixed number of MDP in sugarbeet, designs with large populations and few markers always reached larger values of response to selection than designs with small populations and many markers (Fig. 8.12). For maize, the same trend was observed for 500 and 1000 MDP, while for a larger number of MDP the optimum design ranged between 40 and 50 markers. Therefore, in BC1 populations of maize and sugarbeet and a fixed number of MDP, MAS is, within certain limits, more efficient for larger populations than for higher marker densities.

In theory, MAS is proposed to be more efficient than phenotypic selection when the heritability of a trait is low, where there is tight linkage between QTL and markers (Dudley, 1993; Knapp, 1998), with larger population sizes (Moreau et al., 1998) and in earlier generations of selection before recombinational erosion of marker–trait associations (Lee, 1995). Edwards and Page (1994) proposed that the distance between markers and QTL was the factor that most
[Figure 8.12 (plot not reproduced): vertical axis, response to selection (%); curves labelled by population size m.]
Fig. 8.12. Expected response to selection throughout the entire genome and expected number of
required marker data points (MDP) when selecting the best out of m = 20, 40, 60, 80, 100, 200, 500 and
1000 BC1 individuals. Model of the maize genome with ten chromosomes of length 2 M (left-hand side).
Model of the sugarbeet genome with nine chromosomes of length 1 M (right-hand side). From Frisch and
Melchinger (2005) with permission.
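The marker densities and marker data point (MDP) totals quoted in the text for Fig. 8.12 follow from simple arithmetic, sketched below in Python. The function names are illustrative and do not come from Frisch and Melchinger (2005). The factor of one-half in the MDP count is taken directly from the expression 180/2 × 60 = 5400 given in the text; reading it as "only half of the BC1 plants are genotyped genome-wide" is an assumption.

```python
def marker_density_cm(n_chromosomes, chromosome_length_morgans, n_markers):
    """Average spacing (cM) of evenly distributed markers over the model genome."""
    genome_length_cm = n_chromosomes * chromosome_length_morgans * 100
    return genome_length_cm / n_markers


def marker_data_points(population_size, n_markers):
    """MDP count, using the population/2 x markers expression quoted in the text."""
    # Assumption: the divisor 2 is copied from the text, not derived here.
    return population_size / 2 * n_markers


# Model genomes described in the text.
maize_density = marker_density_cm(10, 2, 80)      # ten chromosomes of 2 M, 80 markers
sugarbeet_density = marker_density_cm(9, 1, 60)   # nine chromosomes of 1 M, 60 markers

# Designs reaching a 10% response with 60 markers.
maize_mdp = marker_data_points(180, 60)
sugarbeet_mdp = marker_data_points(60, 60)

print(f"maize: {maize_density:.0f} cM spacing, {maize_mdp:.0f} MDP")
print(f"sugarbeet: {sugarbeet_density:.0f} cM spacing, {sugarbeet_mdp:.0f} MDP")
print(f"sugarbeet/maize MDP ratio: {sugarbeet_mdp / maize_mdp:.2f}")
```

With 60 markers the sugarbeet design needs 1800 MDP against 5400 for maize, about one-third, in line with the roughly 30% quoted in the text.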
limited genetic gains from MAS. Yousef and Juvik (2001a) reported an empirical experiment that provided equivocal results regarding the relative efficiency of MAS and phenotypic selection in enhancing economically important quantitative traits in sweet corn. MAS and phenotypic selection were applied to three F2:3 base populations with either the sugary 1 (su1), sugary enhancer 1 (se1), or shrunken 2 (sh2) endosperm mutations. One cycle of selection was applied to both single and multiple traits such as seedling emergence. Selection efficiencies were evaluated on the basis of gains over one cycle. Among 52 paired comparisons between MAS and phenotypic selection composite populations, MAS resulted in significantly higher gain than phenotypic selection for 38% of the comparisons, while phenotypic selection was significantly greater in only 4% of the cases. The average MAS and phenotypic selection gains, calculated as percent increase or decrease from the randomly selected controls, were 10.9% and 6.1%, respectively.

Recognizing that small mapping populations are not adequate for QTL mapping is the first and most important realization needed in the research community (Young, 1999). Scientists must understand that simply demonstrating that a complex trait can be dissected into QTL and mapped to
approximate genomic regions using DNA markers is not enough. Projects need to utilize better scoring methods, larger population sizes, multiple replications and environments, appropriate quantitative genetic analysis, various genetic backgrounds and, whenever possible, independent verification through advanced generations or parallel populations (Melchinger et al., 1998; Utz et al., 2000; Schön et al., 2004). Only then will sufficient experimental evidence be in place for a successful MAS programme.

What if we knew all the genes for a quantitative trait in hybrid crops? This was asked by Bernardo (2001), when working on the prediction of hybrid performance through computer simulation. With maize as a model species, he found through trait and gene best linear unbiased prediction (TG-BLUP) that gene information is most useful in selection when few loci (e.g. ten) control the trait. With many loci (≥ 50), the least squares estimates of gene effects become imprecise. Gene information consequently improves selection efficiency among hybrids by only 10% or less and actually becomes detrimental to selection as more loci become known. Bernardo further indicated that increasing the population size and trait heritability to improve the estimates of gene effects also improves phenotypic selection, leaving little room for improvement of selection efficiency via gene information. He considered genomics to be of limited value in selection for quantitative traits in hybrid crops. Epistatic interactions, which were assumed absent in his study, would make the estimation of gene effects even more difficult. It is unknown whether methods other than TG-BLUP or multiple regression would substantially enhance the usefulness of gene information in selection.

8.5 Long-term Selection

As one of the most powerful tools available to biology, selection is used in the plant and animal sciences to develop improved crop cultivars and livestock breeds. Selection is also used in laboratory species to test many of the assumptions of the underlying quantitative genetic models and to test the limits of selection itself. The power of selection is best presented by the selection responses that have been observed in two important agricultural species. US maize yield increased from a pre-1930 average of 1.6 t ha⁻¹ (26.1 bushels acre⁻¹) to an average of 8.6 t ha⁻¹ (134.7 bushels acre⁻¹) for the 5-year period from 1998 to 2002, a fivefold increase over 70 years (http://www.usda.gov/nass/). Of course, not all of the increase is due to selection, but studies have consistently shown that genetics can account for 50% of the increase. Milk yield in Holsteins increased from 5870 kg in 1957 to 11,338 kg in 2001, representing a doubling in milk yields over 44 years (http://aipl.arsusda.gov/dynamic/trend/current/trndx.html). There is evidence that the genetic trend continues to increase with time in Holsteins. Molecular techniques have provided novel tools to analyse the final selection product and reveal the change of genetic structures with the progress of selection experiments.

Including long-term selection in this chapter is justified by a concept, reversed breeding-to-genetics, which starts with a selection programme to pyramid favourable alleles from various sources of germplasm to create transgressive variation and great selection response, followed by genetic analysis (usually marker-assisted evaluation) to identify the genes and alleles associated with the selection response. As it will take years to pyramid genes and alleles from multiple sources through genetic mapping and MAS, the reversed breeding-to-genetics approach can be used to exploit cumulated novel alleles and genes by taking advantage of the availability of plant materials that have been accumulating in genetic and breeding programmes. Combined with the strategy of selective genotyping revised by Xu et al. (2008) and Sun et al. (2009), it could be more realistic to start with selection to pyramid alleles followed by genetic analysis to identify the genes, compared to the genetics-to-breeding approach by which genes/QTL are mapped in separate genetic analyses and then pyramided by MAS. In this section, we will
discuss the long-term selection experiments in plants and marker-assisted evaluation of the selection results, which can be considered the reversed breeding-to-genetics approach.

8.5.1 Long-term selection in maize

There are several long-term selection experiments in maize (Duvick et al., 2004; Hallauer et al., 2004; Dudley and Lambert, 2004). The best known is the selection for oil and protein contents, which has been running for over 100 generations. The details of this experiment can be found in the special volume of Plant Breeding Reviews (Volume 24, Part 1, 2004). Only some significant procedures and results will be summarized here.

Procedure

The long-term selection experiment for oil and protein content in maize was initiated at the University of Illinois by C.G. Hopkins before the rediscovery of the Mendelian laws (Hopkins, 1899). The most recent update on this long-term selection can be found in Dudley and Lambert (2004). Although the original goal was to produce agriculturally valuable crops by increasing the oil and protein content of the kernels, the results are also quite remarkable from a theoretical viewpoint. One of the most interesting results was that the continued selection did not deplete the variability. Truly the results were not in full compliance with the simple Mendelian expectation.

In 1896, Hopkins initiated selection in the open-pollinated maize cultivar Burr's White (Hopkins, 1899). He analysed 163 ears for oil and protein concentration. The 24 ears highest in protein, the 12 ears lowest in protein, the 24 ears highest in oil and the 12 ears lowest in oil were selected to initiate the Illinois High Protein (IHP), Illinois Low Protein (ILP), Illinois High Oil (IHO) and Illinois Low Oil (ILO) strains, respectively. Both forward and reverse selection have been conducted at different times in the experiment. The forward phase of the experiment was divided into four segments as follows:

Segment 1. Generations 0–9, mass selection based on chemical composition. Numbers of ears analysed and selected varied but approximately 20% of the ears analysed were selected. Each strain was grown in a separate but isolated field.

Segment 2. Generations 10–25. 120 ears per strain were analysed and 24 were saved. Seed from each ear was planted ear-to-row. Alternate rows were detasselled and 20 ears were analysed from each of the six highest yielding rows. Four ears were saved per row.

Segment 3. Generations 26–52 in IHP and ILP; generations 26–58 in IHO and ILO. Twelve selected ears were arbitrarily divided into two lots (A and B) of six ears. Seed within each lot was bulked and planted in the nursery. Silks in lot A were pollinated by a bulk sample of pollen from 15–20 plants in lot B while silks in lot B were pollinated with pollen from lot A. Thirty ears from each lot were analysed and the 12 most extreme of the 60 ears analysed were saved.

Segment 4. Generations 53–90 in IHP and ILP; 59–90 in IHO and 59–87 in ILO. The selection procedure was the same as in segment 3 but 90–100 kg of N fertilizer ha⁻¹ were added to the soil. Only 87 generations were completed in ILO because of difficulties with seed set and seed quality, which caused the loss of some generations.

Following 48 generations of forward selection, reverse selection was initiated in each of the four strains to form four new strains: Reverse High Protein (RHP), Reverse Low Protein (RLP), Reverse High Oil (RHO) and Reverse Low Oil (RLO) (Figs 8.13 and 8.14). The objective was to determine the extent of residual variability available for selection. The selection procedure was the same as in the forward strains except that selection was for low protein in IHP, high protein in ILP, etc. Following seven generations of selection in RHO, selection was again reversed to initiate the Switchback High Oil (SHO)
[Figure 8.13 (plot not reproduced): oil means for IHO, RHO, SHO, ILO and RLO; vertical axis, oil (%), 0–25; horizontal axis, generation, 0–100.]
Fig. 8.13. Mean oil percentage plotted against generations for IHO, RHO, SHO, ILO and RLO derived
from 100 generations of selection. From Dudley and Lambert (2004). This material is reproduced with
permission of John Wiley & Sons, Inc.
[Figure 8.14 (plot not reproduced): protein means for ILP, RLP, IHP and RHP; vertical axis, protein (%), 0–40; horizontal axis, generation, 0–100.]
Fig. 8.14. Mean protein percentage plotted against generations for IHP, RHP, ILP and RLP derived
from 100 generations of selection. From Dudley and Lambert (2004). This material is reproduced with
permission of John Wiley & Sons, Inc.
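The selection pressure applied in the forward segments described under Procedure can be made concrete with a short calculation. The Python sketch below computes the proportion of ears saved from the counts given in the text and converts it to a standardized selection intensity. The conversion assumes truncation selection on a normally distributed trait in a large population, which is a textbook idealization and not a claim made by Dudley and Lambert (2004); the function names are illustrative.

```python
from statistics import NormalDist


def selected_fraction(saved, analysed):
    """Proportion of analysed ears that were saved as parents."""
    return saved / analysed


def selection_intensity(p):
    """Mean superiority of the selected group, in phenotypic SD units.

    Assumes the best fraction p is kept by truncation from a standard
    normal distribution: i = z / p, where z is the normal density at
    the truncation point.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - p)
    return std_normal.pdf(threshold) / p


# Ear counts quoted in the text for segments 2 and 3.
segments = {
    "Segment 2 (24 of 120 ears)": (24, 120),
    "Segment 3 (12 of 60 ears)": (12, 60),
}

for label, (saved, analysed) in segments.items():
    p = selected_fraction(saved, analysed)
    print(f"{label}: p = {p:.2f}, i = {selection_intensity(p):.2f}")
```

Both segments keep one ear in five, the same proportion as the approximately 20% reported for segment 1, so under these assumptions the nominal selection intensity stayed roughly constant while the mating scheme changed.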
strain. Beginning in generation 90 of ILP, a new strain called Reverse Low Protein 2 (RLP2) was initiated by selection for high protein in ILP. This strain was initiated to determine whether genetic variability still existed that could be exploited by selection after an apparent lack of progress in ILP for nearly 35 generations. The selection procedures were the same as in the regular and reverse selection strains, and have been described in detail (Dudley et al., 1974; Dudley and Lambert, 1992).

Limits to selection

Response over all generations is presented by means for each generation plotted against generation number for all strains (Figs 8.13 and 8.14). The data upon which the figures were based are available in detail in Dudley and Lambert's (2004) Appendix Tables 5.A1–5.A5.

One of the objectives of this experiment was to determine the limits to selection for oil and protein in maize. The question has been answered for low oil and low protein in that progress ceased when oil became so low it was no longer measurable with the analytical tools available. Protein apparently reached a lower limit after approximately 65 generations when no further progress was possible with the selection methods used. This low limit is likely physiological in nature.

Three types of evidence suggested an upper limit has not been reached for oil in IHO and SHO. Significant genetic variation still exists in generation 98 and has not changed since generation 65, and results from the per se evaluation trials showed significant increases in oil during the last five generations measured. Thus 100 generations of selection have not eliminated genetic variability and an upper limit has not been reached for either IHO or SHO.

For IHP, the results are not clear. Data from the evaluation trials indicated no significant increase in protein since generation 88. Genetic variance in generation 98 was not significantly different from that in generation 68. Also there is no apparent viability problem in IHP as in experiments with other species, where viability problems caused progress to cease while significant genetic variability was still found in strains which had plateaued. Thus, it is not clear whether an upper limit has been reached for protein in IHP.

Based on significant progress in the reverse selection strains, genetic variance had not been exhausted at generation 48. The results from RLP2 are inconclusive as to whether exploitable genetic variance for high protein still existed at generation 90 in ILP. Results from the per se evaluation trials suggest that progress is being made, but the generation data do not confirm this result.

One unusual result of reverse selection occurred in RHP. All the gain from the first 48 generations of selection was dissipated by the next 15 generations of selection (Fig. 8.14). The progress per generation for these 15 generations was approximately 0.68% per generation, a rate at least three times that in any other segment of any strain for protein.

Explanation of progress

For oil, total gain in IHO is approximately four times the total gain in ILO. For protein, the gain in IHP is approximately three times the gain in ILP. Gain in the high direction for both oil and protein is greater from generations 49 to 100 than from generations 0 to 48. In contrast, nearly 90% of the gain from selection in ILO and ILP came in the first 48 generations of selection. Given that the lower limit to selection for ILO and ILP is near zero and the upper limit could approach 100%, this greater gain in the high direction is not surprising. The gain between generations 48 and 100 for IHO (9.7% oil) is similar to that in RHO (9.0% oil) and the gain in IHP (12.5% protein) from generations 48 to 100 is similar to that in RHP (12.2% protein). The gain in RLO from generations 48 to 100 is nearly ten times that in ILO and the gain in RLP over the same generations is 13 times that in ILP.

These results are consistent with the gene frequency estimates assuming a model with a relatively large number of genes
affecting the traits, each with relatively similar effects and additive gene action. The frequency of favourable alleles (q) in the original population was estimated as approximately 0.2; therefore, greater gain should be possible towards higher oil or protein than towards lower values. When reverse selection was initiated, q was estimated to be approximately 0.5 in both IHP and IHO. Thus, selection in either direction should be possible and the total possible change should be approximately the same in either direction. The switchback selection occurred at a gene frequency of 0.35, which could allow greater progress in the high direction than in the low, as was observed.

By evaluating the results from selection of 48 generations, Leng (1962) suggested four possible genetic interpretations: (i) accidental outcrossing; (ii) favouring of heterozygotes in selection; (iii) high rate of mutation of the chemical genes concerned; and (iv) release of variability by some unknown means. He immediately dismissed these interpretations because: (i) pollination has been under strict control throughout the long-term study. (ii) Favouring heterozygosity cannot be ruled out; however, "The rapid response to reverse selection in all four strains, if it were attributed to residual heterozygosity alone, would have required the level of heterozygosity to have remained at nearly the same level through 48 generations of successful selection. This appears highly improbable." (iii) "Since all four strains are relatively uniform and show no evidence of being highly mutable . . . mutation is not considered a likely explanation." (iv) A plausible mechanism is that continued recombination plays a role.

However, as indicated by Dudley (1977), it is possible to explain all the progress by segregation of a relatively large number of genes (n), each at a relatively low frequency (q) in the original population. The number of additive genetic standard deviations (σA) of progress possible for a given value of n and q as calculated by Dudley (1977) based on theory derived by Robertson (1970) is shown in Table 8.8 for a sample of values of q and n. For the progress of 21 σA made in IHO and 18 σA in IHP, gene frequency needs to be low, approximately 0.25, and n > 50. Such values are consistent with estimates of q of approximately 0.2 for both oil and protein and n of 54 and 123 for oil and protein obtained, respectively. Although these results suggest all the progress would be explained by segregation of a large number of genes in the original population, mutation cannot be eliminated as a possible source of some of the variation upon which selection continues to operate.

Table 8.8. Ultimate limits to selection, measured as number of σA, with varying values of n (number of loci segregating) and q (frequency of favourable alleles) (from Dudley (1977) with permission).

                  q
  n      0.1   0.25    0.5   0.75    0.9
  10      13      8      3      3      2
  50      30     17     10      6      3
  100     42     24     14      8      5
  200     60     35     20     12      7

Goodnight (2004) and Eitan and Soller (2004) suggested epistasis as an important factor to explain the negative or positive heterosis for oil and protein observed in the crosses involving the long-term selection strains, supporting the hypothesis that additive × additive epistasis was important. Further evidence comes from the Design III study of Moreno-Gonzalez et al. (1975) where crosses of both the F2 and the F6 of the cross of IHO × ILO back to the parents exhibited negative heterosis for oil. This hypothesis is further supported by the presence of significant negative heterosis for protein in the crosses of IHP × RLP and IHP × ILP (Dudley et al., 1977).

Dudley et al. (1974) suggested part of the continued response in IHP could be due to a change in environment because the addition of N-fertilization in generation 53 increased response per generation from 1.4 to 1.6 g kg⁻¹ protein per cycle. The increase of available N fertilizer presumably allowed alleles for higher protein to be expressed and selected.

Finally, Walsh (2004) argued that mutation was a necessary assumption to explain the result of the long-term selection experiment.
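The entries of Table 8.8 can be approximated with a short calculation. The Python sketch below assumes n unlinked, biallelic loci with equal additive effects, each with favourable-allele frequency q, in Hardy–Weinberg and linkage equilibrium. Under those assumptions the additive variance is 2nq(1 − q)a² and the gain from fixing every favourable allele is 2na(1 − q), so the limit in units of the initial additive genetic standard deviation is sqrt(2n(1 − q)/q). This simplified expression is offered only as a plausibility check and is not presented as Dudley's (1977) calculation, although it agrees with most entries of the table to within rounding.

```python
import math


def limit_in_sd_units(n_loci, q):
    """Total possible gain divided by the initial additive genetic SD.

    Assumes n_loci independent additive loci of equal effect, each with
    favourable-allele frequency q (an idealized model, see lead-in).
    """
    return math.sqrt(2.0 * n_loci * (1.0 - q) / q)


frequencies = (0.1, 0.25, 0.5, 0.75, 0.9)
print("n    " + "".join(f"{q:>7}" for q in frequencies))
for n in (10, 50, 100, 200):
    row = "".join(f"{limit_in_sd_units(n, q):7.1f}" for q in frequencies)
    print(f"{n:<5}{row}")

# Values estimated for oil in the text: n = 54, q of approximately 0.2.
print(round(limit_in_sd_units(54, 0.2), 1))
```

For n = 54 and q = 0.2, the estimates quoted for oil, the expression gives about 20.8, close to the 21 standard deviations of progress reported for IHO.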
Walsh indicated that gain based on mutational variance is expected to exceed gain based on residual segregation from the original population after about 46 generations for oil and 33 for protein. Although per-locus mutation rates are typically very small, for a wide range of traits the mutational variance introduced in each generation is on the order of 1/100th of the environmental variance. This can be quite significant after 10–20 generations. Keightley (2004) reviewed selection experiments in inbred lines and concluded that mutational variance was important in selection response. However, as indicated by Dudley (2007), neither Walsh nor Keightley considered the effects of epistatic interaction on selection response. They concluded that epistasis may be an important factor in explaining long-term response to selection, which has been supported by the results from the crosses of IHO × ILO and IHP × ILP for epistatic interactions, as more epistatic interactions were significant than expected by chance and the number of markers associated only with significant epistatic effects ranged from 46.3 to 72.2% of the total number of significant markers detected (Dudley, 2008). I would rather suggest that both large numbers of loci at low gene frequency in the original population and their recombination and epistatic interaction in the long-term selection lines should have contributed to the long-term selection response.

Marker-assisted evaluation

Response to phenotypic selection can be evaluated and associated genes can be identified using molecular markers. The Illinois long-term selection experiment on maize oil and protein contents (Dudley and Lambert, 1992, 2004) and marker-assisted evaluation (Goldman et al., 1993) provide such an example. The long-term divergent selection response can be attributed to the accumulative action of alleles with similar effect that had been dispersed among the individuals of the original population (Xu, 1997), while de novo mutations may be an alternative explanation for this divergence, as indicated by selection for bristle number in Drosophila (Mackay, 1995). The selection strains offer a unique opportunity to investigate the genetic basis of kernel chemical traits and have been used to produce maize populations to map the QTL responsible for the selection response (Goldman et al., 1993). By using 90 genomic and cDNA clones distributed throughout the maize genome to detect RFLPs between IHP and ILP strains, 22 loci distributed on ten chromosome arms were significantly associated with protein concentration and clusters of three or more significant loci were detected on chromosome arms 3L, 5S and 7L, suggesting the presence of QTL with large effects at these locations. A multiple linear regression model consisting of six significant loci on different chromosomes explained over 64% of the total variation (Goldman et al., 1993). These significant QTL associations can be used to account for the long-term selection response and the protein content difference between the IHP and ILP strains. It can be expected that the longer the selection proceeds, the bigger the difference of protein content will be in the resulting selection strains and thus the potential to detect additional QTL, as long as the populations continue to respond to selection. This expectation can be tested by QTL mapping using the crosses from the IHP and ILP strains derived from different cycles of selection.

Instead of using the extreme divergence of the parents to create mapping populations, Wassom et al. (2008) identified kernel QTL in a genetic background more relevant to practical breeding by using 150 BC1-derived S1 lines (BC1S1s) from IHO and recurrent parent B73. Oil, protein and starch were measured in BC1S1s and in Mo17-top-cross hybrids. Multiple regression models with 39 QTL detected for each trait by composite interval mapping explained 46.9, 45.2 and 44.3% of phenotypic variance for oil, protein and starch, respectively, in BC1S1s and 17.5, 22.9 and 40.1% for oil, protein and starch, respectively, in the testcross hybrids.

Laurie et al. (2004) used an association study to infer the genetic basis of dramatic changes that occurred in response to selection for changes in oil concentration. The study population was produced by a cross
between the high- and low-selection lines at generation 70 when the oil concentrations were
estimated for IHO as 16.7% and for ILO as 0.4%, followed by ten generations of random mating
and the derivation of 500 lines by selfing. These lines were genotyped for 488 genetic markers
and the oil concentration was evaluated in replicated field trials. As a single admixture event
between IHO and ILO created linkage disequilibrium (LD) between genes with different allelic
frequencies and the ten-generation random mating eliminated essentially all association between
unlinked markers and most of those between loosely linked markers, the population can be used
for LD mapping. Three methods of analysis were tested in simulations for ability to detect QTL.
Using the most effective method, model selection in multiple regression, 50 QTL were detected,
which accounted for 50% of the genetic variance, suggesting that > 50 QTL are involved. The QTL
effect estimates are small and largely additive. About 20% of the QTL have negative effects
(i.e. not predicted by the parental difference), which is consistent with hitchhiking and small
population size during selection. The large number of QTL detected accounts for the smooth and
sustained response to selection throughout the 20th century.
Mikkilineni and Rocheford (2004) characterized RFLP variant frequencies in two cycles (65 and
91) of IHP, ILP, RHP and RLP. As revealed by RFLPs, considerable variation at the DNA level was
maintained in the Illinois long-term selection protein strains even after 91 generations of
selection. Only one locus was observed with a unique RFLP variant detected in just one of the
four strains. Although only 35 RFLP loci were examined, it does not appear there was much
variation that might potentially be attributable to mutation. The inbreeding values calculated
from the RFLP data from cycles 65/69 and 91 were lower than those calculated on the strains
before molecular marker data were available. Maize undergoes inbreeding depression and thus
there may have been some natural selection within the selection strains for more vigorous and
more heterozygous plants. Also, the effective population size, due to the system used for
bulking of pollen from multiple tassels and using the bulked pollen to fertilize many ears, may
be larger than previously calculated, contributing to less inbreeding than estimated earlier
(Walsh, 2004). The lower levels of inbreeding observed for the reverse strains than the forward
strains suggest the change in direction of selection for protein levels may have contributed to
maintenance of heterozygosity over generations in the reverse strains. There were trends in the
variant frequencies in the forward and reverse strains that are consistent with response to
selection.
All the RFLP loci selected to assay on the strains based on association with QTL for protein
contents in IHP × ILP-derived mapping populations showed frequency trends consistent with
response to selection in one or both of the reverse strains. Only one RFLP locus that showed a
trend has not been identified as a QTL. The selection of probes based on previous QTL
associations most likely increased the probability of identifying loci with variant frequency
trends. These probes are more likely to reveal variants that respond to reverse selection, if
they were not fixed by cycle 48 when the reverse selection was initiated. These loci are
therefore good candidates to look for changes in variant frequencies in response to reverse
selection. Eight probes (23%) showed reverse trends for the RHP strain. Twelve probes (34%)
showed trends for the RLP strain. One probe (3%) showed a trend for just the RHP strain. Five
probes (14%) showed trends for just the RLP strain. Seven probes (20%) showed trends in common
for both the strains. All seven loci that displayed trends in both directions were associated
with QTL in IHP × ILP mapping populations (Goldman et al., 1993, 1994; Dijkhuizen et al., 1998).
RFLP genotypic and variant frequency differences among cycle 90 of the oil strains, IHO, ILO,
RHO and RLO, were also determined (Sughroue and Rocheford, 1994). A high degree of variant
polymorphism was found among the four oil strains and many RFLP loci were still segregating
within the oil strains after 90 generations
of selection. RFLP variant trends consistent with response to directional selection were
detected in comparisons among the four oil strains.

Application to plant breeding

The total gain from selection, both in absolute value and in number of additive genetic
standard deviations, is well beyond what might have been expected from the distribution of oil
and protein values in the original population. Likewise, it is well beyond what has been
possible by selection for agronomic traits such as grain yield. To illustrate the possible
increases in maize grain yield if selection for yield was as effective as for oil and protein,
estimates of grain yield and σA from two maize synthetics, RSSSC (a stiff-stalk synthetic) and
RSL (a Lancaster derivative) obtained in Illinois were used. The original means were
6.66 t ha⁻¹ for RSL and 9.23 t ha⁻¹ for RSSSC. Assume a gain of 24 σA, approximately the average
of what was observed for oil and protein. The gain would be 33.28 t ha⁻¹ for RSL and
27.44 t ha⁻¹ for RSSSC or a yield at the limit of 39.94 t ha⁻¹ for RSL and 36.68 t ha⁻¹ for
RSSSC. Assuming some heterosis, the ultimate yield would be around 43.96 t ha⁻¹. These values
are not unreasonable given that a yield of over 31.4 t ha⁻¹ was reported in Iowa in 2002. As
indicated by Dudley and Lambert (2004), these results suggest the existence of more genetic
variability and more plasticity in the maize genome than is usually expected. They also suggest
that limits to selection for yield have not been reached. To mark the importance of the
long-term selection experiment for protein and oil in maize at the University of Illinois, the
conference titled 'Long-term Selection: A Celebration of 100 Generations of Selection for Oil
and Protein in Maize' was held on 17–19 June 2002 in Urbana, Illinois.

8.5.2 Divergent selection in rice

Xu et al. (1998) reported a divergent selection experiment in rice. Transgressive segregation
of tiller angle was found in two rice F2 populations, 5002 × Zhu-Fei 10 and HA79317-7 ×
Zhen-Nong13. By divergent selection for tiller angle in each F2 population, two types of
true-breeding extremes were obtained, one with larger tiller angle and the other with smaller
tiller angle. Transgression of tiller angle was confirmed in the two extreme crosses (Xu and
Shen, 1992b). For loci contributing to variation in tiller angle, the alleles of similar effect
were shown to be dispersed in the original parents but associated (pyramided) in the extreme
selections. By crossing two extreme strains each derived from one original cross, new
transgression was found in the F2 and then two types of extremes were obtained by the second
cycle of divergent selection. By crossing the second-cycle extremes with each other and applying
a third cycle of divergent selection for larger tiller angle, all positive alleles from the
four original parents were pyramided. The transgression in each original cross can be explained
by the complementary action of the genes, which had been dispersed between the original parents
and complemented each other when they were pyramided in the extreme strains (Xu et al., 1998).
Since this transgression was observed in replicate experiments (Xu and Shen, 1992b,c), it is
unlikely that the results are due to mutation events as reported for experiments on divergent
selection for bristle number in Drosophila (Mackay, 1995).
We would expect genetic fixation with long-term selection programmes. However, selection
experiments discussed above for maize for high and low protein or oil and in Drosophila for
bristle numbers (Yoo, 1980) show no indication of genetic fixation from long-term selection
resulting in remarkable changes in phenotype. Frequent identification of large-effect QTL, as
reviewed by Tanksley (1993), Kearsey and Farquhar (1998) and Xu, Y. (2002), makes steady and
sustained selection response puzzling: alleles of large effects should be fixed rapidly, after
which no further response would be seen. Barton and Keightley (2002) named two factors that
might explain this apparent paradox. First, QTL-mapping experiments
underestimate the number of QTL and overestimate their effects. Secondly, mutation generates
alleles of large effect, which can be picked up quickly enough by selection to sustain a
continuing selection response. Several mechanisms have been described that can create de novo
variation, including intragenic recombination, unequal crossing over among repeated elements,
transposon activity, DNA methylation and paramutation. Barton and Keightley (2002) listed
several factors that make it difficult to estimate the true numbers and effects of loci
influencing a quantitative trait. Hyne and Kearsey (1995) pointed out that in a typical
experiment (heritability 40%, 300 F2 individuals), no more than 12 QTL are ever likely to be
detected, which is supported by empirical data on the numbers of QTL detected in plants as
reviewed by Tanksley (1993), Kearsey and Farquhar (1998) and Xu, Y. (2002). Both Beavis (1994)
and Utz and Melchinger (1994) indicated that unless samples are large (> 500, for example), the
effects of statistically significant QTL are substantially overestimated. As the results from
RFLP-based evaluation of long-term selected maize lines show, the numbers of QTL discovered in
various experiments have begun to get close to the numbers of genes required to explain the
long-term selection response.
Studies of this type would not be possible without the availability of the long-term selection
strains. This fact points to the importance of maintaining longer-term selection programmes so
that these kinds of genetic stocks are available for various types of studies. Maintenance of
long-term breeding materials is becoming more challenging in an era frequently focused on
short-term genomic-based experiments funded by short-term competitive grants. However, it is
genetic stocks developed by public sector long-term breeding and selection programmes that
frequently facilitate many of these studies at the molecular level.
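The overestimation of significant QTL effects in small samples, noted above for Beavis (1994) and Utz and Melchinger (1994), can be illustrated with a minimal simulation sketch. Everything specific in it is an assumption made for illustration and is not taken from those studies: a single QTL with a hypothetical true effect of 0.2 residual standard deviations, a normal approximation for the estimated difference between two marker classes of equal size, and a hypothetical significance threshold of z = 3.

```python
import random
import statistics


def mean_significant_estimate(true_effect, n, z_threshold=3.0, reps=20000, seed=1):
    """Average effect estimate among replicates that pass the significance test.

    Assumption: the estimate is the difference between two marker-class means,
    each class holding n/2 individuals with unit residual standard deviation,
    so its standard error is sqrt(4/n) = 2/sqrt(n).
    """
    rng = random.Random(seed)
    se = 2.0 / n ** 0.5
    significant = []
    for _ in range(reps):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate) / se > z_threshold:
            significant.append(estimate)
    power = len(significant) / reps
    return statistics.mean(significant), power


TRUE_EFFECT = 0.2  # hypothetical true QTL effect, in residual SD units

results = {}
for n in (100, 300, 500, 1000):
    mean_est, power = mean_significant_estimate(TRUE_EFFECT, n)
    # Ratio above 1 means the significant estimates overstate the true effect.
    results[n] = (mean_est / TRUE_EFFECT, power)
    print(f"n={n:5d}  power={power:.3f}  inflation={results[n][0]:.2f}x")
```

Under these assumptions the ratio of the mean significant estimate to the true effect shrinks towards 1 as the sample grows, because fewer of the declared QTL owe their significance to a favourable sampling error.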
9
Marker-assisted Selection: Practice
9.1.3 Selection without laborious field or intensive laboratory work

Many important traits are phenotypically invisible or unscorable by visual observation and must
be measured in the labo-

In some cases, multiple pathogen races or insect biotypes must be used to identify plants for
multiple resistances, but, in practice, this may be difficult or impossible, because different
genes may produce similar phenotypes that cannot be distinguished
from each other. MTA can be used to select multiple resistances simultaneously.
Consider selection for multiple traits, for example, temperature-sensitive genic male sterility
(TGMS), amylose and wide compatibility in rice. Candidate plants must be tested under two
different environments where TGMS can be identified. Each plant must be testcrossed with
wide-compatibility testers, following up with a progeny test in the next season. At the same
time, a large amount of seed must be harvested for amylose measurement. Thus, using PS methods,
we must wait until a large number of seeds are available and a reasonable level of homozygosity
is reached.

9.1.6 Whole genome selection

MAS can also be practised at the whole genome level. Whole genome selection can be used to
eliminate the donor genome in backcross breeding or to get rid of linkage drag when a wide
cross is involved. Combined with MAS for multiple traits, whole genome selection allows the
breeder to transfer multiple traits through backcrossing simultaneously.
High density molecular maps can be used to determine the genotype of an individual at many,
sometimes thousands of, loci and make it possible to deduce the most favourable genetic
constitution for various regions throughout the entire genome in a given individual. By
portraying molecular data in a graphical form, as discussed in Chapter 8, a graphical genotype
can be inferred to show the genomic constitution and parental derivation for all points in the
genome (Young and Tanksley, 1989a), which opens up the possibility of conveniently analysing
quantitative traits in map-based whole genome selection. As an extension of this concept, the
graphical genotype can be described for QTL and used to identify from mapping populations the
desirable individuals with a favourable combination of different QTL alleles or with
association of all alleles of similar effects across the whole genome.

9.2 Bottlenecks in Application of Marker-assisted Selection

To analyse the bottlenecks that may limit the application of MAS in plant breeding, it is
necessary to have a brief overview of the current status of MAS. Several private companies have
been routinely using MAS in breeding programmes, benefiting from their long-term basic research
programmes and the availability of all the components of MAS. It is certainly a big investment
for a breeding company/institution to start from scratch to run an efficient and fully
operational MAS-based breeding programme. In contrast to conventional breeding schemes, the
methods and design of infrastructure needed to support MAS have been the areas of greatest
change. In order to utilize MAS, companies had to make significant investments to assemble or
modify various aspects of infrastructure such as methods to detect DNA polymorphism, manage
information, or analyse and track samples, software to relate genotype with phenotype and
off-season or continuous nurseries (Ragot and Lee, 2007). These components had to be integrated
with each other and with breeding activities, which meant that scientists needed to learn how
and when MAS provided a comparative advantage over other methods when taking into account time
and cost components.
MAS has been applied in the private sector for crops of great commercial interest including
maize, soybean, canola, sunflower and vegetables. MAS in maize cultivar development aims at
recovering an ideal genotype defined as a mosaic of favourable chromosomal segments from the
parents (referred to as genotype construction). More specifically, MAS in maize has been used
to simultaneously select for multiple traits (selection based on marker information only) such
as yield, biotic and abiotic stress resistance and quality attributes (Ragot et al., 2000;
Eathington, 2005; Eathington et al., 2007), several of which are polygenic in nature. Although
there is very limited information on successful breeding product delivery, the first commercial
products of molecular breeding (rather than limited MAS) have
Table 9.1. Examples of released crop cultivars developed through marker-assisted selection.
the Co-4² gene conferring resistance to all known North American races of anthracnose in the
USA (Miklas et al., 2003). In pearl millet, the parental lines of the original hybrid (HHB 67)
were improved for downy mildew resistance through MAS and conventional backcross breeding and a
new hybrid HHB 67-2 with improved resistance to downy mildew was released in India (Navarro et
al., 2006). Quality protein maize from an extra-early single cross maize hybrid, Vivek Maize
Hybrid-9, which was developed through MAS for the opaque2 gene (Babu et al., 2005) has recently
been released in India.
The limited success in breeding product delivery in MAS can be further illustrated by the
numbers of publications that have been generated on QTL mapping and MAS since the discovery of
the first generation of DNA markers (Xu and Crouch, 2008). The term 'marker-assisted-selection'
first appeared over two decades ago (Beckmann and Soller, 1986b) and was initially focused on
the potential uses. A decade later, the community became increasingly interested in application
of the genes tagged by molecular markers and the term appeared in over 100 journal articles in
1995 (Fig. 9.1). However, the first real article on application of MAS in plant breeding using
DNA markers is probably the one published by Concibido et al. (1996) for soybean cyst nematode
resistance. The volume of publications on the development and, to a lesser extent, application
of markers for assisting plant breeding has increased dramatically during the last decade. As a
result, the number of articles containing the term 'marker-assisted-selection' climbed to over
1000 in 2003 (Fig. 9.1). There is limited targeted public sector funding to support the
large-scale validation, refinement and application of MAS in field breeding. This can be seen
from the number of articles with the term 'marker-assisted-selection' (1390 in 2004) compared
to the number of articles with the term 'quantitative trait locus' or 'quantitative trait loci'
(1250 in 1998 and 4440 in 2005, Fig. 9.1). Most articles on MAS result from either investments
from donors with a scientific mandate or academic institutions with a specific interest in
showing promising applications of MAS in plant breeding. Converting promising publications into
practical application in field breeding requires breaking through many practical, logistical
and genetical bottlenecks (Xu and Crouch, 2008). This includes developing simple, quick and
cheap technical protocols for sampling, DNA extraction
Fig. 9.1. The numbers of articles with the terms 'QTL' ('quantitative trait locus' or
'quantitative trait loci') and 'MAS' ('marker-assisted-selection') by year (from Google
Scholar, 4 August 2007). From Xu and Crouch (2008) with permission. [Line plot not reproduced:
x-axis, Year (ticks 1984 to 2004); y-axis, Number of articles (0 to 5000); series, QTL and MAS.]
and genotyping that remain reliable and precise when routinely applied at high throughput. This
also includes developing tailored sample and data tracking and management systems plus powerful
decision support tools to ensure effective integration of genotyping into breeding programmes.
Xu and Crouch (2008) discussed the bottlenecks associated with translation of MAS from
publications to practice, particularly in public sector breeding programmes. William et al.
(2007a) provided technical, economic and policy considerations on MAS in crops based on lessons
from the experience at an international agricultural research centre. In principle, effective
MAS systems are the result of the following activities:

• Developing DNA extraction and tissue sampling and tracking systems appropriate for
  large-scale field trials.
• Establishing a platform for molecular data generation, management and analysis that meets the
  needs of plant breeding.
• Developing analytical methods for synthetic cultivar development, heterotic group
  construction and hybrid prediction using molecular marker information.
• Exploiting genetic and breeding materials including populations, hybrids, open-pollinated
  populations, landraces under selection and synthetic cultivars from ongoing breeding
  programmes.
• Validating MTAs using any population that is genotyped genome-wide for marker-assisted
  backcrossing (MABC) and phenotyped for target traits, leading to the update and refinement of
  the marker set.
• Optimizing MAS systems and refining MAS breeding programmes by improving the following
  procedures: high-throughput sampling, DNA extraction and genotyping, environment control and
  characterization, precision phenotyping, integration of diversity analysis, genetic mapping
  and MAS, and data generation, interpretation and delivery systems.

9.2.1 Effective marker–trait association

QTL publications have been increasing tremendously in the past two decades, as shown in Fig.
9.1, and involve almost all crop plants and all types of agronomic traits (as reviewed by
Dwivedi et al., 2007). However, reports of QTL mapping to date have tended to be based on
individual small to moderately sized mapping populations screened with a relatively small
number of markers, providing relatively low resolution of MTAs (Xu, Y., 2002, 2003; Salvi and
Tuberosa, 2005). Very few of the QTL reported have been utilized in plant breeding through MAS.
Thus the community is investing a large amount of money and labour in generating a lot of
publications with little impact on applied plant breeding. One of the approaches to effective
MTA is selective genotyping and pooled DNA analysis discussed in Section 7.7.
Some inherent limitations to MAS are related to the estimates of QTL position and genetic
effects and the rates of false positives and negatives. Confidence intervals for QTL are
typically 10–15 cM, a genetic region that should not be a major barrier for implementing MAS
although it could become a limitation to achieving genetic gain by preventing the selection of
desired recombination events. The advent of association mapping and a growing pool of candidate
genes should provide some resources needed to minimize problems related to the estimation of
QTL position. The genetic effects of QTL are overestimated for many reasons, some of which are
linked to experimental designs for phenotyping or population development while others are
inherent to the process of QTL detection (Lee, 1995; Beavis, 1998; Melchinger et al., 1998;
Holland, 2004).

9.2.2 Cost-effective and high-throughput genotyping systems

Private corporations have established or are developing the capacity to produce hundreds of
millions of data points per year in
service laboratories, distinct from research units. Besides, smaller biotech companies are
developing technologies that could reduce the cost of each marker data point to a mere few US
cents (Ragot and Lee, 2007). Without considering any other cost in MAS, however, the current
cost associated with DNA extraction alone is already a big burden for many plant breeding
programmes in terms of sample-based cost, especially in the early stages when few assays are
required on each sample. So a great effort will be needed first to minimize the cost associated
with each step of DNA extraction including sampling, labelling, reagents and plastic
consumables.
PCR amplification is a necessary and also expensive step for all PCR-based markers.
Multiplexing PCR primers has been an approach to significantly reduce the PCR-related cost but
it takes a lot of effort to optimize the protocol for suitable multiplex marker sets.
Multiplexed PCR primers work well for genetic diversity analysis. When they are used for
genetic mapping and MAS, however, they have to be optimized and even redesigned for each
specific cross or population because there is no universal marker set that contains markers
that are polymorphic across all crosses or populations.
Another significant cost related to MAS is the step of marker detection after PCR
amplification, which can be significantly different from one assay type to another. When
screening PCR-based markers by agarose gel electrophoresis, which is considered more suitable
for MAS of single traits, gel preparation and electrophoresis and scoring time for a
50–200-sample gel can take as long as 3–4 h. Using microtitre plates or dot blot detection of
allele-specific gene-based markers offers substantially higher throughput and lower costs than
gel-based assays. However, those systems are not suitable for large-scale MAS using large
numbers of markers for both genetic background selection and multiple target traits. Effective
and efficient marker genotyping systems for large-scale MAS depend on a high-throughput
detection system that works with a large number of markers. In general, developing and
optimizing such a detection system is time-consuming and also expensive.
Continual improvement in the capability of laboratories to generate molecular data has come
through the development of new types of markers allowing increasing automation. However, this
has tended to come with the negative consequence of an increase in the cost of equipment
required to achieve high-throughput low-cost genotyping and, in turn, the capacity to see
molecular genotyping achieve impacts at the scale of modern plant breeding programmes. Due to
the large up-front costs of assembling infrastructure and personnel for genotyping, it is
unlikely that individual national marker laboratories could produce data points in a
cost-efficient manner. In advanced laboratories and in animal and human research, this has led
to an increased tendency towards centralization and, in particular, a shift to an out-sourcing
mode of operation. Therefore, the actual genotyping might be most efficiently and effectively
carried out through regional hubs and/or out-sourcing services. Collard et al. (2008) discussed
genotyping systems that might be suitable for different situations and breeding programmes
including gel- and non-gel-based genotyping systems for remote breeding station laboratories
and capillary- and array-based genotyping systems for regional hub laboratories.

9.2.3 Phenotyping and sample tracking

Once a high-throughput system has been established for DNA extraction, PCR amplification and
marker detection, the bottleneck will be the phenotyping that is required for MTA before MAS
and sample tracking that is required for a large number of plants and families during MAS.
Phenotyping has been considered as critical in the era of post-genomics and is now receiving
greater attention than ever. Precision and global phenotyping of a large number of plant
samples is very expensive and time-consuming and is the limiting factor that affects the
accuracy of genetic mapping and the power
of MAS. Private corporations have realized the need for such high precision phenotyping as can
be seen from their active recruiting of trait-specific phenotyping scientists often located in
targeted areas where the trait of interest can be more easily measured (e.g. positions
dedicated to drought tolerance and located in arid regions of the world) (Ragot and Lee, 2007).
Beyond laboratories, plant handling is becoming a bottleneck to high-throughput protocols.
High-throughput facilities have to be established and equipped at continuous nursery sites
potentially to handle millions of plants per year.
The level of heritability of measured traits depends on whether the phenotyping can be repeated
across different seasons, locations and environments. Clustering target locations into
mega-environments and comparing these with selection at different locations has been used to
understand how the target environments for a breeding programme differentiate the germplasm
with respect to yield and other agronomic traits (e.g. Rajaram et al., 1994; Lillemo et al.,
2004; Chapter 10). Cross-population and environment comparison of phenotyping will determine
how the MTAs identified under one environment can be used for selection under another. In this
case, well characterized environments and well established selection criteria are essential
prerequisites for the development of a reliable precision phenotyping system. Precision and
high-throughput multi-locational phenotyping, together with effective sampling and data
acquisition systems being developed for many traits, provides the potential to develop a
phenomics-based protocol for trait-specific breeding programmes. This will not only help
understand the phenotypic profile a plant possesses but also improve the precision of genetic
mapping and thus MAS for the target phenotype.
Tracking samples from the field to the harvest bags to DNA plates for DNA extractions, PCR
amplification and marker detection and then tracing back to the field plants selected based on
the genotyping is a time-consuming and error-prone step, which translates into a large
proportion of cost as a whole for MAS. As plant breeders always work with a large number of
plants and populations, and some crop species cannot be organized in the field as easily as
others, sample collecting and tracking will ultimately determine whether MAS can be processed
in a high-throughput manner and thus whether MAS is practicable on a large scale.

9.2.4 Epistasis and genotype-by-environment interaction

Genetic effects related to epistasis are either poorly estimated or ignored by breeding
programmes (Holland, 2001; Crosbie et al., 2006). Such assessments of genetic effects will
inflate predictions of genetic gain. The relative merit of MAS will depend on the nature of
predictions, actual results and costs of alternative methods.
The importance of genotype-by-environment interaction (GEI; discussed in detail in Chapter 10),
as a bottleneck in marker-assisted breeding (MAB), has been recognized because it affects both
the power of QTL detection and the response to MAS. To evaluate QTL-by-environment interaction,
precision phenotyping at multiple location/environment trials is required. Selection of
suitable locations for phenotyping and accurate estimation of QTL effects across environments
are two factors that determine whether the QTL identified can be used for effective MAS. In
MTA, whether through linkage mapping or LD mapping, QTL-by-environment interaction effects
should also be incorporated into the statistical model.

9.3 Reducing Costs and Increasing Scale and Efficiency

Highly abundant single nucleotide polymorphism (SNP)-based genic markers provide great
potential for increasing scale and efficiency and thus reducing the cost of MAS because
genotyping can be automated. Developments of high-throughput
genotyping platforms are largely driven by molecular markers depends on marker type
human and animal research and applications. and its capacity in high-throughput analy-
However, there are many commonalities sis. With well-established marker systems
among MAB of livestock and human health and sequencing facilities, genotyping with
diagnostics, which will provide many impor- simple sequence repeat (SSR) markers costs
tant spillovers for molecular plant breeders. about US$0.300.80 per data point, depend-
The feasibility of marker-assisted approaches ing on marker multiplexing and the number
for plant breeding is heavily influenced by the of markers genotyped for each sample (Xu
relative cost (in time and money) compared et al., 2002). For example, the lowest cost of
with conventional breeding. SNP analysis in maize is now about US$90
There are several ways to reduce the for 1536 samples.
MAS cost. First, high-throughput analysis
using automated genotyping and data scor-
ing systems will help increase the daily data 9.3.1 Costbenefit analysis
output. Secondly, using the same sample for
selection of multiple traits will reduce the Costbenefit analysis will help us under-
trait-based cost. Thirdly, selection at an early stand which components in the system need
stage of plant development or before plant- to be improved and where the bottlenecks
ing and an early stage of the breeding process for large-scale application of MAS are, as
will minimize the number of plants that need preliminary outputs in this area have been
to be retained so that the overall breeding achieved in maize (Dreher et al., 2003; Morris
cost will come down. Fourthly, optimization et al., 2003) and wheat (Kuchel et al., 2005).
of MAS systems, including facilities and per- This analysis needs to be constantly updated
sonnel, will result in less cost per data point. as new genotyping systems become available
While not truly an inherent limitation and new optimizations are implemented
of the methods involved, one unavoidable in respective genotyping labs. Since many
limitation of MAS is the cost of assembling factors that can reduce cost may influence
and integrating the necessary infrastructure genetic gain, it is essential that costbenefit
and personnel. These can be substantial and analysis modules be integrated into those
beyond the means of many programmes. facilitating the genetic modelling and simu-
For such programmes, implementation of lation of different breeding systems (Wang et
MAS could lead to a delusional or unbal- al., 2003, 2004; Wang, J. et al., 2007).
anced reallocation of resources from vital The economic merit of MAS could
activities such as high-quality phenotypic include situations in which molecular costs
evaluation and selection in the target envir- are more than offset by savings in pheno-
onment (Ragot and Lee, 2007). Currently, typic evaluation. If molecular costs are in
only the largest maize breeding programmes addition to, not in place of, phenotypic
in a given market or region have the scale of costs, the economic merit of MAS will
sales and diversity of products that can jus- become questionable and more difficult
tify and support MAS and withstand some to evaluate. In other cases, the ability to
of the financial burdens of establishing and select early offsets the extra costs that are
replacing components of the system (e.g. associated with MAS. Detailed cost-benefit
changes in the methods and platforms for analysis of various elements of DNA marker
detecting DNA polymorphisms). development and application, including the
The economic story from DNA sequen- cost of the required genotyping platforms
cing may tell us what we can expect in and professional expertise, needs to be
terms of cost reduction in marker genotyp- assessed at the earliest possible stage. This
ing. Sequencing cost per finished base was is particularly important at this time when
US$10 in 1990, but was reduced to US$1 most public plant breeding programmes are
in 1996, US$0.10 in 2002 and US$0.01 in inadequately funded or too poorly equipped
2006, which is a thousand times cheaper to reach a critical threshold of marker assay
than in 1990. The cost of genotyping using throughput.
346 Chapter 9
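The sequencing cost figures quoted above can be turned into a fold reduction and an average yearly rate of decline. The following is a minimal sketch: the four cost points are taken from the text, while the function names and the assumption of a constant yearly decline between two points are illustrative only.

```python
# Cost per finished base (US$) by year, as quoted in the text.
COST_PER_BASE = {1990: 10.0, 1996: 1.0, 2002: 0.10, 2006: 0.01}


def fold_reduction(costs, start, end):
    """How many times cheaper a finished base was in `end` than in `start`."""
    return costs[start] / costs[end]


def mean_annual_decline(costs, start, end):
    """Constant yearly fractional decline linking the two cost points
    (an illustrative assumption; the real decline was not uniform)."""
    ratio = costs[end] / costs[start]
    return 1.0 - ratio ** (1.0 / (end - start))


if __name__ == "__main__":
    fold = fold_reduction(COST_PER_BASE, 1990, 2006)
    rate = mean_annual_decline(COST_PER_BASE, 1990, 2006)
    print(f"fold reduction 1990-2006: {fold:.0f}")
    print(f"mean annual decline: {rate:.1%}")
```

Under the constant-decline assumption, the thousand-fold reduction over 16 years corresponds to a fall of roughly 35% per year.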
offs between time and money, relative profit- single-seed-based DNA extraction system
ability can be evaluated using conventional will play a significant role in enhancing MAS
investment theory. Private firms, which can efficiency, particularly for traits expressed
raise operating capital by drawing on corpo- late in the cropping season. Compared to
rate cash reserves, floating shares in the stock MAS using DNA extracted from leaves and
market, or borrowing in commercial credit other tissues, seed-based DNA genotyping
markets, have been actively implementing has many advantages, including: (i) identi-
MAS to maximize the net benefits generated fication of desirable genotypes and discard
by their breeding programmes (also profits) of undesirable genotypes before planting;
by opting for technologies that allow them (ii) increasing the speed of breeding cycles
to bring new products into the market faster, by selecting genotypes during the off season;
even if these technologies are more costly to (iii) reducing the time-consuming and error-
implement. In contrast, public plant breed- prone sample collecting step that currently
ing programmes, which are more likely to involves harvesting leaf tissue from plants
face capital constraints in the sense that they in the field or glasshouse which then need
are usually required to operate within their to be retraced when the genotyping data is
budget allocation, have been much slower released; and (iv) saving land because only
to implement MAS. Public breeding pro- selected genotypes (seeds) are planted.
grammes can maximize the returns to their Although DNA extraction from single dried
limited resources by sticking to lower-cost PS seed has been studied in many plant species,
methods, even though this means that breed- most reports focus on destructive protocols.
ing projects will take longer to complete. To develop a comprehensive and operational
For many plant breeding projects, the system for MAS using single-seed-based
relative attractiveness of PS versus MAS will and non-destructive DNA extraction, the
not be in doubt. When switching between PS extracted DNA must have a high quality
and MAS implies a trade-off between time compared to leaf-tissue DNA so as not to
and money, the cost-effectiveness of DNA confound the PCR amplification and detec-
markers depends critically on four param- tion process. Similarly, the quantity of DNA
eters: (i) the relative cost of phenotypic should be large enough for whole genome
versus genotypic screening; (ii) the time sav- genotyping and DNA extraction should
ings achieved using MAS; (iii) the size and be high-throughput, while sampled seeds
temporal distribution of benefits associated should maintain a high level of germination.
with accelerated release of improved germ- A seed DNA-based genotyping sys-
plasm; and (iv) the availability to the breed- tem that is feasible for crop species with
ing programme of operating capital. All four relatively large seeds has been developed
of these parameters can vary significantly in CIMMYT for molecular breeding in maize
between breeding projects, suggesting that (Gao et al., 2008). An optimized genotyping
detailed economic analysis may be needed method using endosperm DNA sampled
to predict in advance which selection tech- from single maize seeds was developed
nology will be optimal for a given breeding (Fig. 9.2), which can be high-throughput
project (Morris et al., 2003). and is generally applicable to different types
of kernels. The seed DNA-based genotyping
method involved excising endosperm pieces
from imbibed maize seeds, then grinding
9.3.2 Seed DNA-based genotyping the pieces into powder in a 96-tube plate
and MAS system using a tissue shaker to improve efficiency.
Sampled seeds were stored in two 48-well
DNA extraction currently represents the plates as a unit for facilitating trace data
single largest cost in most MAS pipelines from desirable genotypes to corresponding
and often presents the rate limiting step candidate seeds. Using the seed DNA-based
for scale-up of the whole process. Develop- genotyping method, the DNA extraction
ment and optimization of a non-destructive process and following genotyping can be
Fig. 9.2. Flowchart of large-scale seed DNA-based genotyping system. From Gao et al. (2008) with kind
permission of Springer Science and Business Media.
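The trade-off between time and money discussed earlier in this section (PS versus MAS evaluated with conventional investment theory) can be sketched as a net present value comparison. This is a minimal illustration, not the model used by Morris et al. (2003): all costs, release times, benefits, the discount rate and the function names are hypothetical.

```python
def npv(cash_flows, discount_rate):
    """Net present value; cash_flows[t] is the net flow in year t (t = 0 is now)."""
    return sum(cf / (1.0 + discount_rate) ** t for t, cf in enumerate(cash_flows))


def scheme_cash_flows(annual_cost, years_to_release, annual_benefit, horizon):
    """Pay `annual_cost` each year until release, then earn `annual_benefit`."""
    return [
        -annual_cost if year < years_to_release else annual_benefit
        for year in range(horizon)
    ]


def compare(ps, mas, annual_benefit, horizon, discount_rate):
    """Return (NPV of PS, NPV of MAS); `ps` and `mas` are
    (annual_cost, years_to_release) tuples with hypothetical values."""
    ps_flows = scheme_cash_flows(ps[0], ps[1], annual_benefit, horizon)
    mas_flows = scheme_cash_flows(mas[0], mas[1], annual_benefit, horizon)
    return npv(ps_flows, discount_rate), npv(mas_flows, discount_rate)


if __name__ == "__main__":
    # Hypothetical: PS costs 100 per year and takes 8 years to release;
    # MAS costs 150 per year and takes 6 years.
    for benefit in (500, 20):
        ps_npv, mas_npv = compare((100, 8), (150, 6), benefit, 15, 0.10)
        better = "MAS" if mas_npv > ps_npv else "PS"
        print(f"annual benefit {benefit}: PS={ps_npv:.1f} MAS={mas_npv:.1f} -> {better}")
```

With these made-up numbers the costlier but faster scheme is preferred when the benefits of early release are large, and the cheaper scheme is preferred when they are small, which is the dependence on parameters (i) to (iii) described in the text. Parameter (iv), the availability of operating capital, is not modelled here.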
done in 96-well plates using regular extrac- changing population sizes and selection pres-
tion buffers, the DNA quality is function- sures to differences in field design and strat-
ally comparable with that of leaf DNA and egies in MAS. Over several breeding cycles,
the DNA amount extracted from 30 mg of this is likely to result in accelerated gain and
endosperm is sufficient for up to 200-400 improved efficiency. Another advantage of
agarose gel-based markers and several mil- seed DNA-based genotyping is that genotyp-
lion chip-based SNP markers. By compar- ing can continue until at least a minimum
ing endosperm and corresponding leaf number of desirable genotypes are identified.
DNA of an F2 population, genotyping errors This means that target genotypes can be iden-
caused by pericarp contamination and het- tified by genotyping populations as small as
ero-fertilization averaged 3.8% and 0.6%, possible, saving the cost of sampling all avail-
respectively, depending on the SSR markers able plants in the field while avoiding the
used. Endosperm sampling did not affect risk that no desirable genotypes can be found
germination rates under controlled con- with available plants in the field (as there is
ditions, while under field conditions the no way to go beyond the plants that have been
germination rate, seedling establishment planted), compared to leaf DNA-based
and normalized difference vegetation index genotyping. For example, a theoretical proportion
(NDVI) were significantly lower than those of of homozygotes at n target loci in an F2
controls for some genotypes. Careful field population is (1/4)^n and thus for three loci, 1/64
management could compensate for these plants in the population will have the desir-
slight effects on germination and seedling able genotypes. As seed DNA-based genotyp-
establishment. Seed DNA-based genotyping ing can stop at any stage once the suitable
lowered costs by 24.6% compared to leaf amount of target seeds that carry desirable
DNA-based genotyping due to reduced field genotype has been identified, the number
plantings and labour costs. of seeds that have to be genotyped could be
As seed DNA-based genotyping can much less than, or in the worst case equal to,
be processed before planting, for example the number of plants that have to be planted
selecting on F2 seeds harvested from an F1 in the field. For leaf DNA-based genotyping,
plant, it is possible to select desirable geno- to ensure a 99% probability of obtaining at
types before planting. This has a potentially least one desirable genotype, a minimum
large impact on breeding programmes, from number of plants that have to be planted is
Marker-assisted Selection: Practice 349
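The population-size argument above can be sketched numerically. The sketch assumes independent segregation at the target loci and independent sampling of plants, so that the number of desirable genotypes follows a binomial distribution; the function names are illustrative and the calculation is not taken from the source.

```python
import math


def target_frequency(n_loci):
    """Expected F2 frequency of one specified homozygote at n unlinked loci: (1/4)^n."""
    return 0.25 ** n_loci


def min_population_size(p, confidence=0.99):
    """Smallest N such that P(at least one target plant) = 1 - (1 - p)^N >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


def expected_seeds_to_genotype(p, targets=1):
    """Mean number of seeds genotyped one by one until `targets` desirable
    genotypes are found (mean of a negative binomial distribution)."""
    return targets / p


if __name__ == "__main__":
    p = target_frequency(3)  # 1/64 for three loci
    print("leaf-based, plants to grow for 99% assurance:", min_population_size(p))
    print("seed-based, expected seeds genotyped for one target:", expected_seeds_to_genotype(p))
```

Under these assumptions, three loci require 293 plants in the field for 99% assurance, whereas sequential seed genotyping that stops at the first desirable seed needs 64 seeds on average.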
Outcrossing
9.4 Traits Most Suitable for MAS
Evolutionary change in plant mating sys-
With currently available molecular mark- tems from outcrossing (cross-pollination) to
ers and genotyping systems, some traits are inbreeding (self-pollination) has occurred
more suitable for MAS than others. Xu, Y. frequently throughout the history of flower-
(2002) evaluated various traits and listed ing plants and has been described as the most
those most suitable for MAS, which include common evolutionary trend in angiosperm
reproduction (Stebbins, 1957, 1970). For parental lines (Lucken, 1986). Considering
example, wild rice is frequently all hybrid cereal crops with the CMS system,
cross-pollinated, while cultivated rice is self-pollinated. measures for increasing the outcrossing
Many characters involved in mating system rate will include choice of favourable climate
evolution, such as sizes of floral organs or conditions for seed production; ensuring
amount of pollen produced, are quantitative flowering synchronization of the two par-
in nature. Hybrid seed production depends ents; providing a suitable pollen source;
on the improvement of outcrossing-related developing male sterile lines with desirable
traits and for self-pollinated crops, it might outcrossing traits; supplementary pollina-
involve a reconstruction (or recovery) of the tion; and adjustment of flowering habit and
outcrossing mating system (Xu, Y., 2003). stigma characteristics using growth regula-
Various techniques to produce hybrids tors such as gibberellic acid (Xu, Y., 2003).
have been developed depending on the Many plants are naturally self-
crop, including hand emasculation, roguing pollinated. Their floral structure is adapted
of staminate plants in dioecious lines, use for inbreeding. Breeding parental lines may
of gynoecious or highly female lines, CMS need to completely convert the floral struc-
and genetic male sterility, protogyny, or ture and make them suitable for outcross-
self-incompatibility (Janick, 1998). The rate ing. Outcrossing in rice depends on the
of outcrossing is often the limiting factor capacity of stigmas to receive alien pollen
determining whether a hybrid has potential and the capacity of anthers to emit much
for commercialization: seed cost and price pollen to pollinate other plants in the
are both largely dependent on how easy it proximity (Oka, 1988). Linkage between long
is to produce high-quality hybrid seed that exerted stigma and undesirable agronomic
both seed providers and farmers accept. traits in wild rice species is quite strong
Maize was particularly suitable for hybrid and needs to be broken to incorporate these
breeding because of monoecism and the traits into selected genotypes. On the other
simple emasculation techniques practised hand, using the gene eui (elongated upmost
in breeding that allowed for easy inbreed- internode) to correct the panicle enclosure
ing and outbreeding (Simmonds, 1979). The associated with CMS has been used in China
necessity of high seeding rates in highly for high-yielding seed production with the
self-pollinated crops such as rice and wheat minimized gibberellic acid application.
introduces an economic problem: seed This gene has been cloned (Zhu et al., 2006)
production costs must be low enough and and hopefully the gene transfer can be
yield of hybrids in the farmers' fields must facilitated by MAS.
be high enough that farmers can profit from The floral structure of wheat is consid-
purchase and use of hybrid seed and com- ered to be oriented towards cross-pollination
panies can profit from their production and (Wilson, 1968). However, a close examina-
sale (Goldman, 1999). tion of its floral traits clearly indicated that
Yield of hybrid seed is determined by wheat is less suited, in its present form, to
many variables, both genetic and environ- cross-pollination than crops such as maize,
mental. In productive, favourable environ- sorghum and rye (Wilson and Driscoll, 1983).
ments, seed yield from seed set through After review of the status of hybrid wheat,
cross-pollination can approach those of con- Lucken and Johnson (1988) indicated the
ventional self-pollinated cultivars in wheat need for acquiring more knowledge about
(Lucken, 1986) or might be up to 80% of genetic variation of floral biology, including:
inbred lines in rice (Yuan and Chen, 1988; (i) spike and flower morphology; (ii) pollen
Lu et al., 2001). The breeder's approach to dispersal, buoyancy, durability and vigour;
high, stable seed production is: (i) to iden- (iii) stigma accessibility, receptivity and
tify those plant and flower features that durability; and (iv) development of selec-
affect cross-pollination; (ii) to find varia- tion screens for these traits.
tion for these traits; and (iii) to incorporate Many factors affecting outcrossing
genes for favourable expression of traits into provide opportunities for MAS. However,
start to flower when specific photoperiod mapped rather than the relative response
and/or temperature conditions are met. In measured under the NIEs. In rice, numer-
hybrid crops, flowering synchronization ous QTL for days-to-heading or -flowering
of two parents is one of the factors influ- have been mapped using molecular mark-
encing hybrid seed production and thus ers but very few of them have been tested
the economic advantage over the inbred under both long- and short-day conditions.
lines/cultivars. To understand photoperiod Using an F2 between japonica Nipponbare
and temperature responses, hybrids and and indica Kasalath, Yano et al. (1997)
their parents must be planted in a variety identified two major and three minor QTL
of environments or NIEs. Genetic study of for heading date. Three of them (Hd1, Hd2
these responses will finally characterize the and Hd3) were identified later as photo-
parental photoperiod-thermo response pat- period sensitivity genes by testing the QTL-
tern and its effect on their hybrids and thus NILs under different day-lengths (Lin et al.,
make hybrid photoperiod-thermo response 2000) and one of them (Hd1) was cloned
predictive. (Yano et al., 2000).
Using a rice doubled haploid (DH)
population between Zhaiyeqing 8 and Jingxi Environment-induced genic male sterility
17, days-to-heading and photo-thermo
sensitivity were investigated in two envi- Male sterility can be induced by specific
ronments (Beijing and Hangzhou, China) environmental factors. An EGMS was
that differ mainly in day-length and tem- first discovered in rice by Shi (1981) from
perature (Xu, 2002). Four chromosomal Nongken 58, a japonica cultivar. The
regions were significantly associated mutant Nongken 58S is sterile when the
with days-to-heading in either or both days are long (> 13.5 h) but becomes fertile
locations, whereas a different locus on when days are short (< 13.5 h). Thus, fertility
chromosome 7 (G397A-RM248) was signif- conversion is triggered by the length of pho-
icantly associated with photo-thermo sen- toperiod. EGMS has also been reported in
sitivity, indicating that the photo-thermo pepper, tomato, wheat, barley, sesame, pea,
sensitivity QTL was independent of the rape and soybean.
QTL for days-to-heading. By evaluating The dependency of male sterility on tem-
days-to-flowering of individual CO39/ perature or photoperiod-temperature inter-
Moroberekan RILs under 10 h and 14 h action requires two different environments in
day-lengths and greenhouse conditions, the breeding and selection process. Breeding
Maheswaran et al. (2000) identified 15 populations have to be planted in one envir-
QTL for days-to-flowering. Only four of onment where the plants will be sterile to
them were also identified as influencing make sure of the presence of sterility genes
response to photoperiod. and in another where the plants will be fer-
Different QTL have been identified tile to confirm the fertility conversion and
using direct and relative trait values and produce seeds. Using associated molecular
in rice, days-to-heading and photoperiod markers, confirmation of fertility conversion
are often controlled by different QTL as involving two environments can be avoided.
discussed above. On the other hand, direct Genetic mapping studies in rice have laid a
and relative traits could share some QTL. foundation for MAS in breeding EGMS lines.
That means days-to-heading and photo- To facilitate incorporation of the tms2 gene
period sensitivity are genetically related to in rice, an SSR marker, RM11, located on
some extent because both traits are related chromosome 7, was identified and found to
to the basic vegetative growth that plants be useful in identifying heterozygous fertile
must achieve to flower. There are QTL plants in F2 populations and F3-F4 progenies
mapping studies undertaken in NIEs, but QTL for selection of progenies in advance (Lu et
were mapped using trait values scored in al., 2004). Lang et al. (1999) reported that
each environment rather than using rela- PCR-based markers were 85% accurate in
tive measures. The traits themselves were identifying tms3 in the juvenile stage.
Biotic and abiotic stresses silk emergence. A short ASI means rapid
silk extrusion because time to anthesis is
Breeding of insect and disease resist- little affected by drought.
ance and tolerance to abiotic stresses has
become a worldwide issue. To identify
insect/disease resistance, plants must be 9.4.3 Seed and quality traits
inoculated artificially or naturally, or in
specific environments where the stress Seed traits
exists. Artificial inoculation is impractical
when the insects/diseases are under quar- As a major storage organ of crop seeds, endo-
antine control. On the other hand, evalua- sperms provide humans with proteins, essen-
tion of plant response to different insects/ tial amino acids and oils. An understanding
diseases or different biotypes/strains/races of the inheritance of endosperm traits is
of the same stress agents is very difficult, if critical for the improvement of seed quality.
not impossible, using traditional screening Genetic behaviour in triploid endosperms
methods. is very different from that of the maternal
In traditional breeding programmes, plants that supply assimilates for grain
selection for tolerance to abiotic stress such growth and development. Thus, methods
as salinity, drought and submergence tol- suited for genetic analysis of traits in mater-
erance and lodging resistance can only be nal plants (diploids for most cereal crops)
done in specific environments that are either cannot directly be used for endosperm traits
present at specific locations or created at (Xu, 1997). Any genetic analytical method
well-controlled environments. Selection for for endosperm traits needs to combine
these traits is considered most difficult in a genetic method developed for diploid
breeding programmes. maternal plants with a triploid model pro-
For effective MAS, development of posed for conventional genetic analysis.
suitable selection criteria is critical for both The genetic system controlling endo-
MTA and the following MAS, particularly sperm traits may be much more compli-
for abiotic stresses. Taking drought toler- cated than that which controls traits of the
ance in rice as an example, current know- plant per se. Because the plant provides
ledge on physiology suggests that drought seeds with a portion of their genetic mate-
tolerance depends on one or more of the fol- rial and almost all the nutrients required
lowing components: (i) the ability of roots for growth and development, seed traits
to exploit deep soil water to provide for eva- are genetically affected by both the seed
potranspirational demand; (ii) the capacity nuclear genes and the maternal nuclear
for osmotic adjustment that allows plants genes. In addition, cytoplasmic genes may
to retain turgidity and protect meristems also affect some seed traits through their
from extreme desiccation; and (iii) control indirect effects on the biosynthetic proc-
over non-stomatal water loss from leaves esses of chloroplasts and mitochondria.
(Nguyen et al., 1997). These components are To understand endosperm traits with bio-
generally applicable to other cereal crops. logical accuracy, one should take into
A large number of QTL had been identified consideration maternal genetic effects and
in rice for osmotic adjustment, dehydra- cytoplasmic effects along with the direct
tion tolerance, abscisic acid accumulation, genetic effects of seeds. As seeds initiate
stomatal behaviour, root penetration index, a new generation that differs from their
root thickness, total root number, root maternal plants, some seed traits should
length, total dry root weight, deep root dry be considered as one generation advanced
weight and root pulling force (Zhang et al., over their maternal plants. Genetic analy-
1999). In maize, grain yield under drought sis of endosperm traits should be based
stress is negatively correlated with the on the DNA extracted from both maternal
anthesis-silking interval (ASI), the differ- plants and endosperm tissues in order to
ence in days between pollen shedding and understand the relative contribution of the
different genetic factors to the variation of types may come from different genetic
endosperm traits (Xu, 1997). In many cases, factors or different alleles from the same
all endosperm traits have been treated the locus. PS for the same trait values may not
same as other traits of the plant, with few result in the same alleles or genes fixed in
reports (Tan et al., 1999) that considered parents.
the generation advancement issue. On the other hand, almost all quality
traits are only measurable at or after the
Hybrid seed traits reproductive stage. MAS will help distin-
guish different genetic loci that contribute
Although F1 plants are uniform, seeds borne to the same quality traits. Methods for non-
on them represent the F2 seed generation destructive extraction of DNA from single
and are expected to segregate for some grain dry seeds, as discussed in Section 9.3 and
characteristics. Major determinants of grain Xu et al. (2009d), provide an opportunity for
quality, for example in cereal crops, are selection of seed traits so that selection can
milling; grain size, shape and appearance; be processed before planting. Early-stage
and cooking and eating characteristics. selection also provides more opportunities
Some grain tissues are of maternal origin for selection of traits with relatively low
and some result from fertilization and union heritability. MAS could be used for early-
of genetically diverse gametes. For exam- stage quality tests or DNA-based quality
ple, the lemma and palea of the rice hull are tests, whereas such tests would be delayed
maternal tissues. Seed size and shape are in a conventional breeding programme
determined by the shape and size of hulls because a relatively large amount of seeds
and the latter is determined by the genotype is required.
of F1 plants. As a result, all F2 seeds borne Genetic contribution to quality comes
on F1 plants have nearly identical dimen- from both parents, but one of them could
sions even though the parents could have be more important than another in some
very different seed sizes. Endosperm is trip- specific situations. Endosperm properties
loid tissue resulting from the union of one might be affected more by female parents
male nucleus with two female nuclei. If the due to the maternal effect, or more by male
parents differ in endosperm traits, these parents due to the xenia effect. The compo-
traits among F2 grains on F1 plants show sition and development of the kernels can
clear-cut segregation (Kumar and Khush, be changed by the nature of pollen. This
1986; Tan et al., 1999). Single seed analysis was first shown by Kiesselbach (1926) as
of a rice hybrid, Shanyou 63, indicated the change of a sweet corn endosperm into
that the amylose content for seeds on an F1 a starchy endosperm after pollination of
plant could range from 8% to 32% when a sweet corn female by a flint endosperm
two parents had 15.8% and 27.2% amylose male. Large xenia effects were observed for
content, respectively. A similar situation sorghum malt quality in the F1 but this was
was reported for barley. If the parents dif- entirely lost in the F2 generation (Wenzel
fer significantly in malting quality charac- and Pretorius, 2000). Curtis et al. (1956)
ters, the grain produced by barley hybrids observed that the germ is markedly influ-
will be heterogeneous and heterozygous for enced in weight, oil and protein content by
characters critical to the malting process both the seed parent and the pollen parent
(Ramage, 1983). of corn, with a pronounced maternal effect.
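The segregation of endosperm traits among F2 grains on an F1 plant follows from the triploid constitution described above. The sketch below enumerates endosperm genotypes for a single locus, assuming that the two female (polar) nuclei carry the same allele as each other and that male and female gametes of each type are equally frequent; the allele labels and function name are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def endosperm_segregation(maternal=("A", "a"), paternal=("A", "a")):
    """Expected endosperm genotype frequencies at one locus.

    Each genotype is written as two maternal copies followed by one
    paternal copy, e.g. "AAa" = maternal AA + paternal a.
    """
    counts = Counter()
    for female_allele, male_allele in product(maternal, paternal):
        # Two identical maternal copies (assumed) plus one paternal copy.
        counts[female_allele * 2 + male_allele] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    # Selfed F1 (Aa): four endosperm dosage classes are expected.
    for genotype, freq in endosperm_segregation().items():
        print(genotype, freq)
```

For a selfed heterozygous F1 this gives four dosage classes (AAA, AAa, aaA and aaa) in equal proportions, which is why a diploid model cannot be applied directly to endosperm traits.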
Quality traits
Table 9.2. List of markers along with the chromosomal location of the target genes, currently in use for
MAS at CIMMYT (from William et al. (2007b) with kind permission of Springer Science and Business
Media).
(iii) generating NILs that can be used as the because of increases in panicle length, pani-
basis for gene isolation and also as parents cles per plant, grains per plant and grain
for further crossing in a cultivar develop- weight. These improved lines with 9311-
ment programme; and (iv) providing gene- type genetic backgrounds are being used to
based markers for targeted introgression of raise the existing yield potential of super
alleles using MAS. hybrid rice in China (Liang et al., 2004).
Development of exotic genetic libraries, O. grandiglumis (allotetraploid, CCDD
(also known as CSSL, chromosome seg- genome species) is another wild relative
ment substitution line; IL, introgression contributing positive alleles for increased
line; or CL, contig line) is another approach grain yield in rice. In contrast, only 6-8%
to enhance utilization of wild relatives increase in grain yield was reported when
to expand crop gene pools. These genetic positive alleles from Hordeum spontaneum
stocks provide a well characterized poten- were introgressed into barley. Wild rela-
tial resource for uplifting the yield barriers tives also contributed positive alleles for
through pyramiding beneficial loci and fix- improved grain characteristics in rice (long,
ing of positive heterosis. For example, when slender and translucent grains and grain
tomato introgression lines carrying three- weight), wheat (grain weight and hardness)
independent yield-promoting genomic and barley (grain weight, protein content and
regions were pyramided, the progenies some malt quality traits). Of particular
produced more than 50% greater yield interest is a locus for grain weight, tgw2, which
compared to controls (Gur and Zamir, 2004). contributed positive alleles from O.
Yoon et al. (2006) reported that several rice grandiglumis that are independent from
lines outperformed Hwaseongbyeo undesirable effects of height and maturity (Yoon
(approximately 1 t ha⁻¹ increase in grain yield). et al., 2006). In a similar study, Ishimaru
Several grain characteristics including (2003) identified a grain weight QTL, tgw6,
grain weight, were improved after crossing responsible for increased yield potential
an advanced introgression line contain- without any adverse effects on plant type,
ing Oryza grandiglumis segments, HG101 or grain quality in the Nipponbare genetic
(very similar to Hwaseongbyeo) with background. Similarly, alleles from Glycine
Hwaseongbyeo. The above examples soja conveyed 8-9% increase in grain yield
demonstrate that wild relatives contain and improved the protein content in
desirable alleles for agronomic traits even though soybean (Concibido et al., 2003).
their effect is phenotypically not evident
in wild relatives. It is important that more
emphasis should be given to exploit wild
relatives to identify yield enhancing alleles 9.5.2 Marker-assisted gene
to further raise the yield potential of crop introgression from elite germplasm
cultivars.
Using AB-QTL analysis, yield and grain- Unquestionably, the most pervasive and
quality enhancing alleles from wild relatives direct use of MAS by the private sector has
have been successfully introgressed in rice, been with backcrossing of transgenes into
wheat, barley, sorghum, common bean and elite inbred lines, the direct parents of the
soybean. Dramatic yield advantages have commercial hybrids, particularly in maize
been reported in rice, for example, through (Ragot et al., 1995; Crosbie et al., 2006).
the introduction of two yield-enhancing QTL Currently, the most widely deployed trans-
alleles (yld1.1 and yld2.1) from O. rufipogon genes and combinations thereof (i.e. gene
(AA genome) into 9311 (one of the top per- stacks) are for resistance to herbicides or
forming parental lines used in the produc- insects (e.g. Ostrinia and Diabrotica). As
tion of super hybrid rice in China). This the commercial maize crop of any region,
contributed in excess of 20% yield increases maturity zone, market or country is not yet
in rice; i.e. about 1 t ha⁻¹ gain in yield in uniform or homogeneous for any transgene,
some of the newly bred cultivars, largely maize breeders have elected to develop
near-isogenic versions (transgenic and non- showed resistance to late blight. RB has also
transgenic) of elite inbreds and commercial been cloned and transformed into Katahdin,
hybrids in order to satisfy combinations of a highly susceptible potato cultivar. The
licensing agreements, agronomic practices, Katahdin transformed plants with RB
regulatory requirements, market demands showed broad-spectrum resistance against a
and product development schemes (Ragot wide range of late blight isolates (Song et al.,
and Lee, 2007). This has required compa- 2003). Clearly, by having the full sequence
nies to have two parallel maize breeding of the target gene, it should be possible to
programmes, transgenic and non-transgenic. develop a highly efficient low cost assay
In this manner, MABC of transgenes and to system for this trait. The best example of the
a lesser degree, of native genes and QTL use of MAS in commercial barley breeding
for other traits, has expedited the develop- is the barley yellow mosaic virus complex
ment of commercial hybrids. Unless regula- where a variety of different markers have
tory issues change dramatically, MABC will been developed for selection of the rym4
remain the preferred means of delivering and rym5 resistance genes and one, the SSR
transgenes to the market. Bmac0029, is used by many European winter
MABC clearly provides the information barley breeders (Rae et al., 2007). The clon-
needed to reduce the number of generations ing of the rym4/5 locus (Stein et al., 2005)
of backcrossing, to combine (i.e. stack) provides the basis of a diagnostic marker for
transgenes, native genes or QTL into one rym4/5-based virus resistance.
inbred or hybrid quickly and to maximize As reviewed by Dwivedi et al. (2007),
the recovery of the recurrent parent's genome MAS coupled with backcross and pedigree
in the backcross-derived progeny. In several breeding methods and field evaluation has
private breeding programmes, MABC has led to reports in the literature of genetic
enabled the number of backcrossing genera- enhancement for resistance to bacterial
tions needed to recover 99% of the recurrent blight (Xa21), gall midge (Gm-6t) and brown
parent genome to be reduced from six to plant hopper (Bph1 and Bph2) in rice; to
three, reducing the time needed to develop leaf rust (Lr19, Lr51 and Yr15) in wheat; to
a converted cultivar by 1 year (Crosbie et al., yellow dwarf virus (Yd2), stripe rust (Yr4)
2006; Ragot et al., 1995). As a line derived and powdery mildew (mlo-9) in barley; and
by MABC can be made to be very similar to to downy mildew (major QTL) in pearl mil-
the original non-converted line, most of its let. The progenies showed the same resist-
attributes, including agronomic perform- ance level as the donor parental lines both
ance, can be assumed to be equal or similar in greenhouse and field evaluations.
to those of the original line.
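The six-generation figure for conventional backcrossing is consistent with the usual
expectation that, without marker selection, the donor share of the genome is halved at
each backcross, so that a BCn progeny carries on average 1 - (1/2)^(n+1) of the
recurrent parent genome. The short Python sketch below evaluates that expectation. It
assumes no selection and ignores linkage drag, the function names are illustrative
rather than taken from the cited studies, and it does not model the gain from marker
selection itself.

```python
def expected_recurrent_fraction(n_backcrosses: int) -> float:
    """Expected recurrent-parent genome fraction in BCn without selection."""
    # The F1 carries 1/2 donor genome; each backcross halves the donor share.
    return 1.0 - 0.5 ** (n_backcrosses + 1)


def backcrosses_needed(target: float) -> int:
    """Smallest n whose expected recovery reaches the target fraction."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    n = 1
    while expected_recurrent_fraction(n) < target:
        n += 1
    return n


for n in (1, 3, 6):
    print(f"BC{n}: {expected_recurrent_fraction(n):.2%}")
print("Backcrosses needed for 99%:", backcrosses_needed(0.99))
```

Under these assumptions BC3 is expected to reach about 94% and BC6 about 99% of the
recurrent parent genome, so reaching 99% in three generations depends on using markers
to pick the most recurrent-like individuals in each generation.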
Marker-assisted gene introgression is thought to be promising in rice because a number
of rice cultivars are widely grown for their adaptation, stable performance and
desirable grain quality. Chen et al. (2000) used such an approach to transfer the
bacterial blight resistance gene Xa21 into Minghui 63, a widely used parent for hybrid
rice production in China. Ahmadi et al. (2001) used a similar approach to introgress
two QTL controlling resistance to rice yellow mottle virus into the cultivar IR64.
Such approaches, however, can only sample a small number of accessions.

Using PCR-based DNA markers for tracking the RB gene in potato breeding populations,
several marker-positive selected lines

9.5.3 Marker-assisted gene introgression for drought tolerance

The International Rice Research Institute (IRRI) has several drought-tolerance
breeding programmes using identified QTL and MAS. QTL affecting root parameters were
identified using a rice DH population derived from the cross IR64 × Azucena. An MABC
programme was started to transfer the alleles of Azucena (upland rice) at four QTL for
deeper roots on chromosomes 1, 2, 7 and 9 from selected DH lines into IR64 (elite rice
cultivar) (Shen et al., 2001). The backcross progeny were selected
Marker-assisted Selection: Practice 361
strictly on the basis of their genotype at the marker loci in the target regions up to
the BC3F2, from which BC3F3 NILs were developed and compared to IR64 for the target
root traits. Of the three tested NILs carrying target 1 (QTL on chromosome 1), one had
significantly improved root traits over IR64. Three of the seven NILs carrying target
7 (QTL on chromosome 7) alone, as well as three of the eight NILs carrying both
targets 1 and 7, showed significantly improved root mass. Four of the six NILs
carrying target 9 (QTL on chromosome 9) had significantly improved maximum root
length.

Steele et al. (2006) initiated MABC to improve drought tolerance in Kalinga III, an
upland indica cultivar. After five backcrosses and over 3000 marker assays (2548
restriction fragment length polymorphism (RFLP) and 700 SSR) on 323 plants, the NILs
were developed and evaluated for root traits. The target segment on chromosome 9
(RM242-RM201) significantly increased root length under both irrigated and drought
stress environments. Azucena alleles at the locus RM248 (below the target root QTL on
chromosome 7) delayed flowering. However, selection for the recurrent parent allele at
this locus produced early-flowering NILs that are suited to upland environments in
eastern India.

Anthesis-silking interval (ASI) is an important trait associated with drought
tolerance in maize. Ribaut et al. (1996, 1997) initiated a major MAB programme to
transfer five genomic regions involved in the expression of a short ASI from Ac7643 (a
drought tolerant line) to CML247 (an elite tropical breeding line). Five genomic
regions were transferred using flanking PCR-based markers. Seventy of the best BC2F3
lines (i.e. S2 lines) were crossed with two testers, CML254 and CML274. These hybrids
and the BC2F4 families derived from selected BC2F3 plants were evaluated for 3 years
under drought stress conditions. The best five MABC-derived hybrids yielded, on
average, at least 50% more than the control hybrids under water stress conditions
(Ribaut et al., 2002b; Ribaut and Ragot, 2007). However, this difference became less
marked when the intensity of stress decreased: for a stress inducing less than 40%
yield reduction, performance of testcross hybrids resulting from MAS was no better
than the original version of CML274.

A major QTL on linkage group 2 (LG2) is associated with increased grain yield and
harvest index under terminal stress in pearl millet cultivar PRLT 2/89-33 (Yadav
et al., 2002). The performance of QTL MAS-derived top cross hybrids (TCH) was compared
with that of field-based TCH. Progenies with the best overall ability to maintain
under terminal stress environments were used to generate the TCH and these were
compared with randomly mated TCH made from randomly selected progenies from the entire
population (irrespective of performance under terminal drought stress). In both cases
progenies were selected irrespective of the presence or absence of favourable alleles
at the putative drought tolerance QTL and evaluated across 21 environments
(non-stress, terminal stress and gradient stress). The QTL MAS-derived hybrids were
significantly, but only modestly, higher yielding both in full and in partial terminal
stress environments. However, this advantage under stress was at the cost of lower
yield of the same hybrids under non-stressed environments. The QTL MAS-derived hybrids
flowered earlier and had limited effective basal tillers, low biomass and high harvest
index. All these traits are similar to those of the drought tolerant parent, thus
confirming the effectiveness of the putative drought tolerant QTL on LG2 (Bidinger
et al., 2005).

9.5.4 Marker-assisted gene introgression for quality traits

Rice

Rice amylose content, mainly controlled by the wx gene, is a good example of MAS.
Ayres et al. (1997) determined the relationship between polymorphism at that locus and
variation in amylose content. Eight wx microsatellite alleles were identified from 92
long-, medium- and short-grain US rice cultivars. When used as predictors
362 Chapter 9
value (Babu et al., 2004). A naturally occurring recessive mutant gene opaque2,
observed first in a Peruvian maize landrace, gives a chalky appearance to the kernels
and has improved protein quality due to increased levels of lysine and tryptophan in
the endosperm (Mertz et al., 1964). However, this trait appears to be associated with
inferior agronomic traits such as brittleness and increased susceptibility to insect
pests. With the discovery of modifier genes that alter the soft, starchy texture of
the endosperm, maize breeders developed hard endosperm o2 mutants designated as
Quality Protein Maize (QPM) (Prasanna et al., 2001; Nelson, 2001; Xu et al., 2009d)
which have the phenotypes and yield potential of normal maize but maintain the
increased lysine content of o2. Opaque2 is a recessive trait but, due to the effect of
the modifiers, QPM behaves as a quantitative trait. Using SSRs and backcross breeding,
Babu et al. (2005) developed maize lines that had twice the amount of lysine and
tryptophan as compared to local cultivars and recovered up to 95% of the recurrent
parent genome in two backcross generations.

Barley

Malt is a major raw material for the production of beer. Characters that affect
malting quality include malt extract content, α- and β-amylase activity, diastatic
power, malt β-glucan content, malt β-glucanase activity, grain protein content, kernel
plumpness and dormancy, all of which are quantitatively inherited and variously
influenced by the environment (Zale et al., 2000). There are a few barley cultivars
with good malt quality that brewers are reluctant to change from due to their concerns
about the resultant changes in flavour and brewing procedures. For example, the goal
of the US Pacific Northwest barley breeding programme is to produce high yielding NILs
that maintain traditional malting quality characteristics but transfer QTL associated
with yield, via MABC, from the high yielding cv. Baronesse to the North American
two-row malting barley industry standard cv. Harrington. Schmierer et al. (2004)
targeted the Baronesse chromosome 2HL and 3HL fragments presumed to contain QTL that
affect yield. Using backcross breeding and QTL/marker information, they identified a
NIL (00170) that, when evaluated for yield over 22 environments and for malt quality
over six environments, produced yield equal to Baronesse while maintaining a
Harrington-like malt quality profile. Other studies have also reported the development
of lines with improved malt quality: white aleurone colour and high α-amylase content
(Ayoub et al., 2003) and high in β-glucan and fine-coarse difference (Igartua et al.,
2000).

9.6 Marker-assisted Gene Pyramiding

Gene pyramiding is the process that brings the genes or alleles dispersed in different
cultivars into a cultivar/genotype. QTL pyramiding is an important strategy for
rebuilding the outputs from reductionist genomics research into whole traits of value
for crop improvement. Genes can be pyramided through pedigree breeding by crosses
involving multiple parental lines containing different favourable alleles or through
MABC to introgress those alleles into the same genetic background. One of the
approaches for the pedigree method is to use NILs. Once the desirable QTL have been
detected, then NILs are generated for each QTL in a common elite genetic background
and the effect of each QTL individually evaluated. The selected NILs containing the
most important QTL for the target trait are subjected to pairwise crosses to pyramid
two or more QTL for one or more target traits. For example, in rice, QTL for increased
grain number (Gn1) and QTL for reduced plant height [Ph1(sd1)] were pyramided in the
Koshihikari background, producing a 23% increase in grain yield while reducing the
plant height by 20% compared with Koshihikari (Ashikari et al., 2005).

9.6.1 Gene pyramiding for major genes

The great opportunity offered by MAS to select superior lines based on genotype
Table 9.3. Continued
Crop and target trait | Gene | Breeding scheme | Marker | MAS product | Reference
Barley yellow mosaic virus and barley mild mosaic virus | rym4, rym5, rym9 and rym11 | Simple and complex crosses using double haploids | RAPDs and SSRs | DHs carrying rym4, rym9 and rym11 and those with rym5, rym9 and rym11 | Werner et al. (2005)
Barley stripe rust | QTL (1H, 4H and 5H) | Backcross derived introgression lines | SSRs | Introgression lines carrying 1H, 4H or 5H individually or in combinations | Richardson et al. (2006)
Common bean rust | Nine major genes | Three backcrosses | RAPDs | Lines combining resistance | Faleiro et al. (2004)
Rice blast (BL) [Magnaporthe grisea (Hebert) Barr (anamorph Pyricularia oryzae Cav.)] | Pi1, Piz-5 and Pita | Pedigree breeding | RFLPs | The pyramided lines showing better resistance to blast | Hittalmani et al. (2000)
Rice blast (BL) and bacterial blight (BB) | Piz-1 and Piz-5 (BL) and Xa21 (BB) | Pedigree breeding | RZ536 and r10 (BL) and Xa21 (1.4 kb fragment of pC822) | The pyramids showing enhanced resistance to BL and BB | Narayanan et al. (2004)
Soybean corn earworm (CEW) (Helicoverpa zea Boddie) | QTL and Bt (cry1Ac) | Three backcrosses | Nine SSRs | The pyramid lines with a detrimental effect on larval weights and on defoliation by CEW | Walker et al. (2002)
Soybean corn earworm and soybean looper (Pseudoplusia includens) | cry1Ac and QTL (PI 229358) | Two backcrosses | Six SSRs and sequence-specific primers cry1Ac | Lines carrying cry1Ac and QTL alleles resistant to three lepidopteran pests | Walker et al. (2004)
Wheat Fusarium head blight (FHB) (Fusarium graminearum), orange blossom midge (Sitodiplosis mosellana) and leaf rust (Lr21) | Six FHB QTL, Sm1 for midge and Lr21 for leaf rust | Two backcrosses | gwm533, gwm493 and wmc808 | Resistant progenies containing chromosome segments FHB, Sm1 and Lr21 | Somers et al. (2005)
Wheat powdery mildew (Erysiphe graminis DC. F. tritici Em. Marchal) | Pm2, Pm4a, Pm6, Pm8 and Pm21 | Pedigree breeding | RAPD and SCAR markers^a | Lines with Pm2 and Pm4a immune to powdery mildew | Wang et al. (2001)
^a RAPD, randomly amplified polymorphic DNA; SCAR, sequence characterized amplified regions.
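As a rough guide to the population sizes implied by pyramiding schemes such as those
listed in Table 9.3, the Python sketch below computes the expected frequency of
individuals fixed for the favourable allele at every target locus, and the number of
progeny needed to recover at least one such individual with a chosen probability. It
assumes unlinked loci, an F1 heterozygous at all target loci and no selection while
the population is developed. These assumptions and the function names are illustrative
and are not taken from the studies in the table.

```python
import math


def target_genotype_frequency(n_loci: int, population: str = "F2") -> float:
    """Expected frequency of lines homozygous favourable at all target loci."""
    # Per-locus probability of the favourable homozygote:
    # 1/4 in an F2, 1/2 among doubled haploid (DH) lines derived from the F1.
    per_locus = {"F2": 0.25, "DH": 0.5}[population]
    return per_locus ** n_loci


def minimum_population_size(frequency: float, probability: float = 0.95) -> int:
    """Progeny needed to see at least one target genotype with given probability."""
    # Solve 1 - (1 - frequency)^N >= probability for the smallest integer N.
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - frequency))


for pop in ("F2", "DH"):
    freq = target_genotype_frequency(3, pop)
    print(pop, freq, minimum_population_size(freq))
```

The frequency falls geometrically with the number of loci, which is one reason why
pyramiding more than a few genes is usually done stepwise or through doubled haploids
rather than in a single F2.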
abiotic stress tolerance (Ragot et al., 2000) and multiple traits are being targeted
simultaneously. Selection indices were apparently based on ten to probably more than
50 loci, these being either QTL identified in the experimental population where MARS
was being initiated, QTL identified in other populations, or genes. Marker genotypes
are generated for all markers flanking QTL included in the selection indices (Ragot
et al., 2000). Plants are genotyped at each cycle and specific combinations of plants
are selected for crossing, as proposed by van Berloo and Stam (1998). Several,
probably three to four, cycles of MARS are conducted per year using continuous
nurseries. Results reported in these recent communications about private MARS
experiments (Ragot et al., 2000; Eathington, 2005) are in sharp contrast to those in
earlier publications (Openshaw and Frascaroli, 1997; Moreau et al., 2004). As
summarized by Ragot and Lee (2007), this selection response can be attributed to:
(i) rather large sizes of the populations submitted to selection at each cycle;
(ii) use of flanking versus single markers; (iii) selection before flowering; (iv) an
increase in the number of generations per year from one to four; and (v) lower cost of
marker data points.

Hybrid performance can be measured by the heterosis, the performance of a hybrid over
its parental lines.

Suppose a breeder has 100 inbreds from heterotic group 1 and 100 inbreds from
heterotic group 2. There are 10,000 possible (group 1 × group 2) single crosses. For
developing new hybrids, there are 495,000 possible (group 1 F2) × (group 2 tester)
combinations and 495,000 possible (group 1 tester) × (group 2 F2) combinations, if
testcrossing starts from the F2. Due to limited resources, breeders are unable to test
all combinations in all environments of interest but may test a limited set of single
crosses and F2 × tester combinations. Typically, < 1% of the maize single crosses
tested by a breeder eventually become commercial hybrids (Hallauer, 1990). Therefore,
predicting hybrid performance has always been a primary objective in all
hybrid-breeding programmes. Methods for predicting the performance of single crosses
would greatly enhance the efficiency of hybrid breeding programmes. Development of a
reliable method for predicting hybrid performance and/or heterosis without generating
and testing hundreds or thousands of single cross combinations has been the goal of
numerous studies using marker data and combinations of marker and phenotypic data,
particularly in maize and rice.
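The counts quoted above for 100 inbreds per heterotic group can be reproduced with a
few lines of Python. The sketch assumes that an F2 population is derived from each
unordered pair of distinct inbreds within a group (reciprocal crosses are not counted
separately) and that every inbred of the opposite group may serve as a tester; the
function and key names are illustrative.

```python
from math import comb


def count_hybrid_candidates(n_group1: int, n_group2: int) -> dict:
    """Count candidate crosses between two heterotic groups."""
    # Every inbred of group 1 crossed with every inbred of group 2.
    single_crosses = n_group1 * n_group2
    # One F2 population per unordered pair of inbreds within a group.
    f2_group1 = comb(n_group1, 2)
    f2_group2 = comb(n_group2, 2)
    return {
        "single_crosses": single_crosses,
        "group1_f2_x_group2_tester": f2_group1 * n_group2,
        "group1_tester_x_group2_f2": n_group1 * f2_group2,
    }


counts = count_hybrid_candidates(100, 100)
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

With 100 inbreds per group there are 4950 within-group pairs, and 4950 × 100 gives the
495,000 F2 × tester combinations on each side, roughly 50 times the number of single
crosses.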
the genome-wide epistasis. They detected 841 QTL for 35 diverse traits. NILs showing
greater reproductive fitness are characterized by the prevalence of ODO QTL, which
were virtually absent for the non-reproductive traits. ODO results from true ODO due
to allelic interactions of a single gene or from pseudo ODO involving linked loci with
dominant alleles in repulsion. In their study, dominant and recessive QTL were
detected for all phenotypic traits but ODO QTL only for the reproductive traits, which
indicates that pseudo ODO is unlikely to explain heterosis in the NILs; they therefore
favour the true ODO model, in which a single functional Mendelian locus is involved in
heterosis.

Gene expression analysis of heterosis

Using serial analysis of gene expression (SAGE), Bao et al. (2005) surveyed
transcriptomes in panicles, leaves and roots of a super-hybrid rice (LYP9) in
comparison to its parental inbred cultivar genotypes (93-11 and PA64s). They
identified 595 upregulated and 25 downregulated tags in LYP9 that were related to
enhancing carbon- and nitrogen-assimilation, including photosynthesis in leaves,
nitrogen uptake in roots and rapid growth in both roots and panicles. They found
massive complementation at the transcript level that further suggests that the
underlying mechanisms of heterosis may not be as simple as has been reported from
studies of a small number of genes (Birchler et al., 2003).

Yao et al. (2005) used an interspecific hybrid between common wheat (Triticum aestivum
L., 2n = 42, AABBDD) line 3338 and spelt (Triticum spelta L., 2n = 42, AABBDD) line
2463, which is highly heterotic both for aerial growth and for root-related traits. In
their research they included an expression assay using modified suppression
subtractive hybridization (SSH) to generate four subtracted cDNA libraries between the
wheat hybrid and its parental genotypes. Of the 748 non-redundant cDNAs obtained, 465
cDNAs had high sequence similarity to GenBank entries in diverse functional
categories, such as metabolism, cell growth and maintenance, signal transduction,
photosynthesis, response to stress, transcription regulation and others. They further
confirmed the expression patterns of 68.2% SSH-derived cDNAs by reverse Northern blot,
while semi-quantitative RT-PCR exhibited similar results (72.2%). This suggests that
the genes differentially expressed between hybrids and their parents are involved in
diverse physiological pathways, which may contribute to heterosis in wheat.

Maize inbred lines B73 and Mo17 produce a heterotic F1 hybrid. Based on analysis with
a 13,999-cDNA microarray, Swanson-Wagner et al. (2006) compared global patterns of
gene expression in seedlings of the hybrid (B73 × Mo17) with those of its parental
genotypes. A total of 1367 expressed sequence tags (ESTs) were observed to be
significantly differentially expressed, using an estimated 15% false discovery rate as
cut off. All possible modes of gene action were observed, including additivity, high-
and low-parent dominance, underdominance and overdominance. A total of 1062 of the
1367 ESTs (78%) exhibited expression patterns that are not statistically
distinguishable from additivity while the remaining 305 ESTs exhibited non-additive
gene expression. About 181 of the 305 non-additive ESTs exhibited high-parent
dominance, 23 ESTs showed low-parent dominance, while 44 ESTs displayed underdominance
or overdominance. These results suggest that multiple genetic mechanisms, including
overdominance, contribute to heterosis. This contrasts with previous studies that
reported heterosis was due to gene action of only a small set of maize genes (Song and
Messing, 2003; Guo et al., 2004; Auger et al., 2005).

Further analysis of allelic variation in gene expression in the maize hybrid and its
parental lines (B73 and Mo17) identified a subset of 27 genes that are differentially
expressed in parental lines. When the transcriptional contribution of each allele from
the inbred line was analysed in the hybrid, the majority of the differential
expression was observed to be due to cis-regulatory variation and not due to
differences in trans-acting regulatory factors. This suggests a predominance of
additive expression and a lack of epistatic effects,
in the early 1900s, various combinations of within-locus and inter-locus interactions
(especially dominance-by-dominance interaction) could contribute to the genetic
control of heterosis. For a specific cross and specific trait, heterosis might be
explainable by any single type of these interactions (Xu, Y., 2003). For different
crosses, species, or traits, however, their heterosis has to be explained by the
dominance of different degrees in combination with all possible inter-locus
interactions, as indicated by Goldman (1999). A full understanding of heterosis will
depend on cloning and functional analysis of all genes that are related to heterosis.
This process would be very similar to that for understanding disease resistance genes
that functionally appear much simpler than heterosis.

9.7.2 Heterotic groups

Heterotic groups are the backbone of successful hybrid breeding. In most cases,
breeding for heterosis without knowledge of heterotic patterns has proven to be a
hit-or-miss approach (Jordaan et al., 1999). The concept of heterotic groups or
heterotic pools was first developed in maize, based on the observation that inbreds
selected out of certain populations tended to produce better performing hybrids when
crossed to inbreds from other groups (Hallauer et al., 1988). This recognition
resulted from the systematic crossing of thousands of inbred lines from different
source populations and evaluation of the hybrids (Havey, 1998). In the review of
capturing heterosis in forage crop cultivar development, Brummer (1999) indicated that
the key to successful semi-hybrid production is to keep heterotic groups separate,
only intercrossing them for testing and release. Breeding highly heterotic hybrids
largely depends on selection of desirable parents as a prerequisite for most hybrid
breeding programmes and thus depends on genetic diversity in the germplasm resources
available to plant breeders. Therefore, construction or development of heterotic
groups has been one of the key components in hybrid breeding for many crops.
Introgressing exotic germplasm is often suggested as an approach to increase genetic
differences between opposing heterotic populations, thereby potentially increasing
heterotic response. An understanding of heterotic relationship between populations is
needed to exploit exotic germplasm intelligently. Melchinger and Gumber (1998)
reviewed the development of heterotic groups in five major crops with different
pollination systems: allogamous maize and rye; partially allogamous faba bean and
oilseed rape; and autogamous rice.

A possible explanation for heterotic groups is that populations of divergent genetic
backgrounds have unique allelic diversity that may have arisen from founder effects,
genetic drift, or the accumulation of unique allelic diversity by mutation or
selection. Significantly greater heterosis could result from this genetic diversity by
specific interallelic interactions (overdominance), repulsion-phase linkage among loci
showing dominance (pseudo-overdominance) (Havey, 1998) and/or inter-locus interaction
(epistasis). Apparently, the most obvious potential heterotic groups are either
geographically separated populations or separate subspecies and ecotypes. Melchinger
and Gumber (1998) recommended the following criteria for the identification of
heterotic groups and patterns in descending order of importance: (i) high mean
performance and large genetic variance in the hybrid population to ascertain future
selection response; (ii) high per se performance and good adaptation of both or at
least one of the parental heterotic groups; (iii) low inbreeding depression in the
source materials for the development of inbreds; and (iv) a stable CMS system without
deleterious side effects, as well as effective restorers and maintainers, if hybrid
breeding is based on CMS.

Construction of heterotic groups based on hybrid performance

With large numbers of inbred or open-pollinated lines or populations available, it is
not feasible in most crops to make diallel
crosses and produce sufficient F1 seed for multi-environment field-testing. Therefore,
Melchinger and Gumber (1998) suggested a multi-stage procedure to identify heterotic
groups, which consists of the following steps: (i) grouping the germplasm based on
genetic similarity; (ii) selection of representative genotypes (e.g. two or four lines
or one population) from each subgroup for producing diallel crosses; (iii) evaluation
of diallel crosses among the subgroups together with parents in replicated field
trials; and (iv) selection of the most promising cross combinations as potential
heterotic patterns using the identification criteria. If established heterotic
patterns are available, using selected elite genotypes from them as testers for the
production and evaluation of the germplasm to be classified is recommended. Based on
the testcross performance, populations or lines having similar combining ability and
heterotic response could be merged to constitute a new independent heterotic group, if
they behave differently from the existing heterotic groups; however, if their
behaviour is similar to an existing heterotic group, they could be merged with it to
enlarge its genetic base. Heterotic patterns in many crop species have been
established solely based on the large numbers of testcrosses and breeding experience,
without the use of molecular markers.

Ron Parra and Hallauer (1997) reviewed heterotic patterns used in the major maize
production regions of the world. Some patterns have had importance in specific
production regions. Others have been exploited on several continents, for example, the
heterotic patterns based on Reid Yellow Dent (RYD) and Lancaster Sure Crop (LSC) from
the temperate USA and Tuxpeño and Estación Tulio Ospina from tropical Mexico and South
America. Two heterotic groups from which inbreds commonly are selected and used to
produce superior maize hybrids are Iowa Stiff Stalk Synthetic (BSSS) and derivatives
of LSC (Darrah and Zuber, 1986; Gerdes and Tracy, 1993). Although both populations are
primarily comprised of southern dent germplasm, LSC has more northern flint germplasm
than BSSS (Smith, 1986; Gerdes and Tracy, 1993). With genetically balanced sets of
crosses, inter-group hybrids out-yielded the respective intra-group hybrids by 21% in
RYD × LSC crosses (Dudley et al., 1991) and by 16% in flint × dent crosses (Dhillon
et al., 1993). In both studies, the percentage of increase in heterosis for yield of
inter-group over intra-group crosses was about twice as large as for the hybrid yield
itself. Most heterotic grouping reports are for maize, with only very few on other
crops including summer squash (Anido et al., 2004) and rapeseed (Qian et al., 2007).

Rice might be the only crop where hybrids are widely grown but very few studies on
heterotic groupings have been reported. Heterosis in rice has been utilized largely
through CMS. Fortunately, rice breeders in China identified the restorers for CMS from
geographically distant rice cultivars from South-east Asia and used them in hybrid
rice breeding. This resulted in high levels of heterosis among intra-subspecies
(indica × indica) hybrids. A large-scale screening of diverse CMS maintainers and
restorers provided some clue as to heterotic pattern. Three ecotypes from different
subspecies, indica, japonica and javanica, have different morphological and
physiological characteristics and ecogeographical distribution and, therefore, serve
as a basis for defining distinct heterotic groups (Xu, Y., 2003). As summarized by
Yuan (1992), heterosis for grain yield in crosses among the three rice ecotypes has
the following trend: indica × japonica > indica × javanica > javanica × japonica >
indica × indica > japonica × japonica. This mirrors the current situation of heterotic
pools in rice. It is well known to hybrid rice breeders that a high level of heterosis
results from crosses between CMS lines bred in China and restorer lines derived from
South-east Asian indica cultivars, which is the heterotic pattern for indica × indica
hybrids.

Construction of heterotic groups using molecular marker information

Molecular markers have been playing an increasingly important role in the construction
of heterotic groups since the 1990s. Most
heterotic hybrids. In general, heterotic groups constructed on the basis of marker
information match up very well with pedigrees, but have the advantage that missing
historical information, such as the incomplete pedigree information or ambiguous
pedigree, will not affect the marker-based method.

In maize, different types of molecular markers have been successfully used to
differentiate heterotic groups with results that are consistent with pedigree-based
grouping (Mumm and Dudley, 1994; Liu et al., 1997; Peng et al., 1998; Wu et al., 2000;
Menkir et al., 2004). Based on heterosis and combining ability analyses using
cultivars from different heterotic groups, Peng et al. (1998) proposed seven heterotic
patterns for the utilization of maize heterosis. Divergence at molecular marker loci
has been useful in assigning maize inbreds to known heterotic groups previously
established in breeding programmes and the molecular information agreed with pedigree
information (Lee et al., 1989; Melchinger et al., 1991; Messmer et al., 1993).
Side-by-side phenotypic evaluation of a sequence of successful maize hybrids produced
by Pioneer Hi-Bred International, Inc., representing each decade from the 1930s to
present, provides a description of the phenotypic changes for a number of key traits
that the breeders have directly or indirectly changed. Genetic fingerprints of the
inbred parents of these hybrids provide a description of the genotypic changes that
have occurred in association with the sustained breeding effort (Fig. 9.3; Cooper
et al., 2004). Important phases can be identified over this period of breeding.
Initially double-cross hybrids (1920s–1960s) were developed. From the 1960s there was
a relatively rapid transition to the use of single-cross hybrids, the foundation of
which was the organization of the maize germplasm into heterotic groups, represented
in this example by the Stiff Stalk Synthetic (SS) and the Non Stiff Stalk Synthetic
(NSS) groups (Fig. 9.3).

[Figure 9.3: scatter plot of inbred scores; x-axis, Component 1 (31%); y-axis,
Component 2 (12%); points labelled SS and NSS.]
Fig. 9.3. A plot of the inbred scores on the first two principal components from
analysis of SSR marker profiles of the parents of the maize hybrids (SS, Stiff Stalk
Synthetic inbred line; NSS, Non Stiff Stalk Synthetic inbred line). The large
boundaries distinguish three main groups of lines: Old, the old inbred lines used
before the formation of the heterotic groups; the other two groups represent SS and
NSS inbred lines. The arrows indicate the direction of the progression of inbred
improvement in the SS and NSS heterotic groups. From Cooper et al. (2004) with
permission.

Using 160 RFLP markers and 21 wide-compatibility cultivars and three indica and three
japonica cultivars, Zheng et al. (1994) constructed a dendrogram tree and discussed
the potential of wide compatibility in hybrid breeding using indica × japonica
crosses. Based on diallel crosses among eight indica lines representing the parents of
the best-performing commercial rice hybrids grown in China, Zhang et al. (1995)
studied molecular divergence and hybrid performance. Their results suggest the
existence of two heterotic groups within indica, one comprised of rice strains from
southern China and the other comprised of strains from South-east Asia. Using two
types of molecular markers, RFLPs and amplified fragment length polymorphisms (AFLPs),
Mackill et al. (1996) obtained similar grouping results. Using RAPD and SSR markers,
Xiao et al. (1996b) separated the ten parental lines into two major groups that
correspond to indica and
japonica subspecies. These results and the results from barley (Melchinger et al.,
1994) and wheat (Sun et al., 1996; Ni et al., 1997) also supported the conclusion that
DNA markers are very useful tools for construction of heterotic groups.

Future direction

It is evident from the review of various studies that adapted populations, isolated by
time and/or space, are the most suitable candidates for promising heterotic patterns.
Genetic diversity can be related to geographic origin of parental lines. The
geographical variation can be related to ecological and environmental variations that,
in turn, dictate survival fitness, created by spontaneous and induced genetic
variation in natural and directed-selection situations. Consequently, the parental
lines derived from different geographic origins are considered to have more genetic
diversity than those derived from the same geographic origin. During
internationalization of plant breeding efforts and massive exchange of unimproved and
improved germplasm throughout the world, attention needs to be paid to avoid the
negative effect of using distant crosses that might mix up heterotic groups existing
among cultivars of different geographic origins. For example, breeding wide-compatible
inbred cultivars as a bridge for harnessing indica/japonica heterosis in rice has
reduced heterosis compared to what would be expected from crosses between typical
indica and japonica cultivars (Xu, Y., 2003).

ing, maintaining and improving heterotic groups. As discussed above, marker-based
grouping of germplasm and breeding populations will help establish heterotic groups
that hold maximum genetic diversity between groups but minimum diversity within
groups. Identification of marker alleles that are specific to each heterotic group
will help keep them genotypically separated. MAS can be used to improve the existing
heterotic groups through introgressing target genes from one heterotic group or
outsource germplasm to another with minimum linkage drag from the donor.

9.7.3 Marker-assisted hybrid prediction

It is reasonably believed that heterosis originates, in some way, from the genetic
differences or heterozygosity between the parents. Theoretically, hybrid performance
is equal to the average parental performance plus heterosis. In the past several
decades, hybrid prediction has been largely based on the evaluation of genetic
diversity among parental lines. It has been expected that understanding the
relationship between heterozygosity/parental difference and heterosis would help
predict hybrids. The development of molecular marker techniques has provided new tools
for hybrid prediction and DNA markers have been used extensively in investigating
correlations between parental genetic distance (GD) and hybrid performance.
Heterotic groups should not be consid-
ered as closed populations, but should be Genome-wide heterozygosity
broadened continuously by introgressing and hybrid prediction
unique germplasm to warrant medium- and
long-term gains from selection. Heterotic The relationship between parental genetic
groups consisting of poorly utilized and divergence and hybrid performance was
unadapted germplasm should be enhanced first studied in maize. Variability for molec-
through joint publicprivate breeding ven- ular markers generally agreed with pedi-
tures. Different phenotypes may or may gree information and assignment (based on
not reflect divergent genetic backgrounds. hybrid performance) to known heterotic
Phenotypically different populations may groups (Smith, O.S. et al., 1990; Dudley
possess the same genetic background and et al., 1991; Melchinger et al., 1991); how-
divergent phenotypes may be conditioned ever, variability at molecular marker loci
by allelic differences at relatively few loci was ineffective in predicting specific
(Havey, 1998). MAS can be useful in creat- hybrid performance from crosses among
Marker-assisted Selection: Practice 375
maize inbreds (Lee et al., 1989; Melchinger et al., 1992). Some reports indicated high
correlation between hybrid performance/heterosis and parental GDs or the degree of
heterozygosity (Lee et al., 1989; Smith, O.S. et al., 1990; Stuber et al., 1992; Reif
et al., 2003), while others revealed very weak correlations (Godshalk et al., 1990; Dudley
et al., 1991). Correlations between single-cross performance and molecular marker diversity
for unrelated parental inbreds have been too low to be of any predictive value (Godshalk
et al., 1990; Melchinger et al., 1990; Dudley et al., 1991), which is also supported by the
result from sorghum (Jordan et al., 2004). Molecular-based GD estimates also failed to
predict superior hybrid performance in oat (Moser and Lee, 1994), soybean (Gizlice et al.,
1993), chickpea (Sant et al., 1999) and pepper (Geleta et al., 2004). A recent large-scale
experiment in maize also supported this unpredictability. Using three sets of six
sister-line inbred lines, each set being highly related and derived from a common parent
cross and 45 sister-line hybrids generated by a partial diallel, Lee, E.A. et al. (2007)
re-examined the relationship between degree of relatedness, genetic effects and heterosis
in maize. The three sets of sister lines ranged between 47 and 77% identical-by-descent,
creating a series of lines that potentially vary in gene frequency. They reported three
relevant findings regarding heterosis for grain yield: (i) substantial genome-wide
heterozygosity is not a requirement for the expression of heterosis; (ii) there is not a
consistent relationship between degree of relatedness and the magnitude of heterosis; and
(iii) the presence of non-additive genetic effects is not a requirement for the
manifestation of heterosis.

Hybrids are more predictable within than between heterotic groups

Correlations between heterozygosity/GD and hybrid performance/heterosis varied for hybrids
between lines that belong to the same heterotic group (within-group hybrids). In maize,
correlations of GD with F1 performance and heterosis were significant and positive for all
traits of within-group hybrids, flint × flint crosses, but not for the subset of
flint × dent and dent × dent crosses (Boppenmaier et al., 1993). This was supported by
Benchimol et al. (2000) using 18 tropical maize inbred lines where correlations of parental
GDs with single crosses and their heterosis for grain yield were higher for line crosses
from the same heterotic groups than the crosses from different heterotic groups. In rice,
Xiao et al. (1996b) reported that yield potential and its heterosis showed significantly
positive correlations with GD for indica × indica or japonica × japonica crosses, but the
correlations were not significant for indica × japonica crosses. It was confirmed by Zhao
et al. (1999) that very little correlation was detected in intersubspecific crosses using
diallel crosses derived from 11 elite rice cultivars. In other cases, however, weak or no
correlation was found for within-group hybrids. Examples include weak or no significant
associations of GD with F1 performance and mid-parent heterosis in soybean (Cerna et al.,
1997), wheat (Martin et al., 1995) and US long-grain rice cultivars (Saghai Maroof et al.,
1997). These results may be due to the low levels of heterosis in these cultivar groups.

Based on results from various studies in maize, Melchinger (1993) summarized the
relationship between parental GD and mid-parent heterosis (MPH) in a schematic
representation. For crosses among related lines, there exists a tight association between
GD and MPH for yield characters because both measures are a linear function of co-ancestry,
f, and thus decrease with increasing f. For intra-group crosses, the correlation r(GD, MPH)
is generally positive, too. This can be explained by hidden relatedness between some
parents considered to be unrelated based on their pedigree and the presence of the same
linkage phase between QTL and marker loci in the maternal and paternal gametic arrays of
intra-group hybrids, which results in a positive covariance between GD and MPH (Charcosset
et al., 1991). In contrast, no significant association between both measures exists for
inter-group hybrids. In this case,
the maternal and paternal gametic arrays may differ in the linkage phase for many
QTL–marker pairs; as a consequence, positive and negative terms cancel each other in their
net contribution to covariance (GD, MPH), resulting in a low or zero correlation
(Charcosset and Essioux, 1994).

Heterosis-associated markers and hybrid prediction

It has been common practice in most studies to determine GD or heterozygosity estimates
from a set of DNA markers chosen for good coverage of the entire genome but not for linkage
to genes influencing heterosis of the target trait. Theoretical investigations (Charcosset
et al., 1991) and computer modelling (Bernardo, 1992) demonstrated that with intra- and
inter-group crosses the correlation between GD and MPH is expected to decrease if genes
influencing heterosis are not closely linked to markers used for calculation of genetic
estimates and vice versa if markers employed for calculation of GDs are not linked to genes
controlling the trait. Hence, increasing the marker density alone will not necessarily
improve the ability to predict MPH by GD estimates; rather, markers must additionally be
selected for tight linkage to genes affecting heterosis of the target trait in the
germplasm under study. This is corroborated by comparison of results obtained with 209
AFLPs versus 135 RFLPs (Ajmone Marsan et al., 1998) and a study by Dudley et al. (1991).
Using these associative loci will help establish strong correlations between heterozygosity
and heterosis. However, allelic differences at marker loci do not assure allelic
differences at linked loci for heterosis. For a limited number of markers to be useful as
predictors for hybrid performance, the effects of alleles at the loci linked to specific
marker alleles must be ascertained (Stuber et al., 1999).

Zhang et al. (1994) proposed two statistical parameters, general and specific
heterozygosity, to measure genotypic heterozygosity. The former is the heterozygosity
calculated from the GDs between the parents using all possible markers and the latter is
that from using marker loci that are significantly associated with the traits of interest
revealed by single factorial analysis of variance. The results from rice indicated that
there was a weak correlation between general heterozygosity and heterosis but a significant
correlation between specific heterozygosity and heterosis for yield and biomass.

Favourable allele combination and hybrid prediction

Heterogenic gene combinations may not always lead to heterosis and heterosis may ultimately
depend upon the balance between favourable and unfavourable interactions of genes. It is
reasonably inferred that heterosis could be caused by specific gene combinations derived
from the two parents. Those genes may simultaneously produce different genetic effects in
different genetic backgrounds. So, for parental improvement and hybrid prediction,
investigating the specific gene combinations that contribute to heterosis should be more
important than studying any single gene or QTL. Using 99 half-diallel rice hybrids derived
from nine CMS lines and 11 restorer lines, Liu and Wu (1998) found that four favourable
alleles and six favourable heterotic patterns on the parental lines significantly
contributed to the heterosis of their hybrids for grain yield, whereas six unfavourable
alleles and six unfavourable heterotic patterns significantly reduced heterosis. They
suggested that optimal hybrids with superior grain yield could be developed by assembling
those favourable alleles into and removing the unfavourable alleles from their parental
lines.

Conclusions and prospects

There are several conclusions that can be drawn from the numerous investigations on the
relationships between heterozygosity and GD with hybrid performance and heterosis. First,
the higher the heterozygosity between the parents, the stronger the heterosis is. Secondly,
using more markers alone will not improve the prediction.
Thirdly, prediction is possible using markers known to be associated with hybrid
performance or heterosis if the association is used to predict performance of a hybrid
derived from the same heterotic pattern. Fourthly, genetic variation (the presence of
heterosis) is a prerequisite for prediction. Fifthly, the relationship of heterozygosity
with heterosis and with hybrid performance will be different if the two involve different
genes (Xu, Y., 2003). The last conclusion was supported by results of Zhu et al. (2001)
that heterosis was highly significant but hybrid performance was not when 57 rice
accessions from six ecotypes and their hybrids were genotyped by 48 SSR and 50 RFLP
markers. It is anticipated that prediction could be possible if heterozygosity is derived
from specific marker loci that are associated with heterosis and hybrid performance and all
possible associated loci have been identified and their effects and interactions clearly
defined.

Considering the fact that only heterotic crosses are of commercial importance and of
interest to the breeder, the practical value of the genetic distance approach for
prediction of heterosis and hybrid performance is limited (Vuylsteke et al., 2000). This is
true for some crop species like maize. For rice, however, the reproductive barrier between
the two subspecies, indica and japonica, has enforced a limitation on the utilization of
indica/japonica heterosis, although the use of the wide-compatibility gene(s) has had a
great impact on the limitation. Hybrid breeding for indica rice has been based on crosses
within the indica group. The strong relationship between the heterozygosity at marker loci
and heterosis within the indica group as reported before (Xiao et al., 1996b) indicates
that GD estimates based on molecular markers could be very useful in assigning indica
cultivars into different subgroups for hybrid indica rice development.

Screening for heterosis-related molecular markers as suggested by Melchinger et al.
(1990b), using specific heterozygosity proposed by Zhang et al. (1994) and identifying
favourable combinations of allele and heterotic patterns (Liu and Wu, 1998) are among the
approaches that could be exploited further to improve the prediction of hybrid
performance/heterosis using molecular markers. Understanding genetic variation among
cultivars to be tested and identifying markers associated with heterosis and
heterosis-related traits are two important components in hybrid prediction. We should keep
in mind that marker–heterosis associations identified in one cross may not be suitable for
selection in others because heterosis could be controlled by many genes and each cross has
different genes and gene combinations in action.

Despite their low values, the inbred–hybrid yield correlations were positive. They
indicated a tendency for high-yielding inbreds to produce high-yielding hybrids. Hybrid
breeding is always accompanied by the improvement of parental lines. Modern maize inbreds,
grown at today's high density, can yield nearly as much as hybrids of the 1930s (Duvick,
1984; Meghi et al., 1984). Duvick (1999) has suggested that if as much effort had been put
into improvement of open-pollinated varieties (OPVs) as has been devoted to hybrid
improvement over the years, the gap between the best hybrids and the best OPVs might be
less than what it currently is. Some authors even argue that OPVs might be superior to
hybrids (Lewontin and Berlan, 1990), but their assumption is not backed up by data.

The potential application of DNA markers in hybrid breeding depends very much upon whether
divergent heterotic groups have been established or not and upon crop species. If
well-established heterotic groups are unavailable, marker-based GD estimates can be used to
avoid producing and testing crosses between closely related lines. Furthermore, crosses
with inferior MPH could be discarded prior to field-testing based on prediction. Another
potential application exists. If new lines of unknown heterotic pattern or inbreds
developed from crosses between parents from different heterotic groups (e.g. commercial
hybrids) are to be evaluated for testcross performance, GD estimates could assist the
breeder in the choice of appropriate testers for evaluating the combining ability of the
lines.
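
The quantities used throughout Section 9.7.3 can be illustrated with a minimal Python sketch. It computes a genetic distance between two inbred parents, mid-parent heterosis (MPH, the F1 value minus the parental average) and the correlation r(GD, MPH) over a set of crosses. All marker scores, yields and function names below are hypothetical and chosen only for illustration. The simple-mismatch distance is an assumed choice among several possible GD measures, and the optional `loci` argument is only a rough analogue of restricting the calculation to trait-associated markers (specific heterozygosity in the sense of Zhang et al., 1994).

```python
from math import sqrt

def genetic_distance(p1, p2, loci=None):
    """Proportion of marker loci at which two inbred parents differ.

    loci=None uses every marker (analogous to general heterozygosity of the F1);
    a list of indices restricts the count to selected markers (analogous to
    specific heterozygosity). Simple mismatch is an assumed distance measure.
    """
    idx = list(range(len(p1))) if loci is None else list(loci)
    return sum(p1[i] != p2[i] for i in idx) / len(idx)

def mid_parent_heterosis(f1, parent1, parent2):
    """MPH = F1 performance minus the average parental performance."""
    return f1 - (parent1 + parent2) / 2

def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical marker scores (one allele code per locus) and yields for three inbreds.
markers = {"P1": [0, 0, 1, 1, 0, 1], "P2": [0, 1, 1, 0, 0, 1], "P3": [1, 1, 0, 0, 1, 0]}
yields = {"P1": 5.0, "P2": 5.4, "P3": 4.8}
# Hypothetical F1 yields for the three single crosses.
f1_yield = {("P1", "P2"): 5.9, ("P1", "P3"): 7.1, ("P2", "P3"): 6.5}

gd, mph = [], []
for (a, b), f1 in f1_yield.items():
    gd.append(genetic_distance(markers[a], markers[b]))
    mph.append(mid_parent_heterosis(f1, yields[a], yields[b]))

r_gd_mph = pearson(gd, mph)
print(f"r(GD, MPH) = {r_gd_mph:.3f}")
```

The toy data were constructed so that MPH rises with GD; as the studies reviewed above show, real data often give weak or non-significant correlations, particularly for inter-group crosses.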
9.8 Opportunities and Challenges

Plant breeding has generally accounted for one-half of the increases in productivity of the
major crops and the future will continue to depend on its advances. However, the rate,
scale and scope of uptake of genomics in crop breeding programmes have continually lagged
behind expectations. This is little different to the adoption of quantitative genetics,
mechanization and computerization during the last century. This is partly due to the long
product development cycle in plant breeding and in turn the long-term nature of feedback
from the market regarding the impact of any changes in the cultivar development pipeline.
Opportunities and challenges we are facing in MAS will be discussed in this section.

9.8.1 Molecular tools and breeding systems

The prerequisite for increasing accessibility of MAS to breeders is developing a highly
efficient breeding system, particularly for resource-limited plant breeding programmes in
developing countries. Several strategies can be used to establish such a system through the
use of MAS, including: (i) selection at early breeding stages to eliminate most segregants,
particularly for highly heritable traits; (ii) selection at early developmental stage using
high selection pressure and an optimized selection rate, particularly for large-size
plants; (iii) one-step selection for multiple traits using high-throughput genotyping;
(iv) utilization of cost-effective genotyping systems; (v) highly efficient phenotyping,
sample tracking and data acquisition; (vi) development and utilization of quick fixation
and stabilization approaches; and (vii) genotyping once and phenotyping multiple times. To
increase accessibility of MAS to breeders, the most important thing is to build skills and
capacity in developing countries and to develop decision support tools to facilitate MAB
programmes. Over the next decade, MAS technologies will become cheaper and easier to apply
on a large scale, MAS can be carried out for all genes related to important target traits
and using information from genotyping of all germplasm in the breeding system.

9.8.2 Crop-specific issues

The bottlenecks in MAS could be specific to crops except for those discussed in the
previous section. For example, a possible limitation of MAS with maize is the structure and
content of various gene pools. Examples of maize gene pools would include European flint
and dent germplasm, US dents and various heterotic groups within each of these and other
larger pools. Surveys with DNA markers have established differences among such groups of
germplasm (Smith and Smith, 1992; Niebur et al., 2004). In addition, the efficacy for MAS
in relatively complex populations such as synthetics and OPVs has not been investigated.

For open-pollinated crops, breeding for complex traits is limited by an additional
bottleneck that there is no standardized protocol available for MAS that can be
automatically applied to the various breeding systems required for development of inbred,
hybrid, population and synthetic cultivars, where the material at many stages in the
breeding process is highly heterogeneous and highly heterozygous. This is very different
from breeding systems for inbred crops such as wheat that almost always start and end with
inbred lines (Koebner and Summers, 2003) and rice that may start with inbreds and end with
inbreds or inbred-based hybrids (Xu, Y., 2003). Thus, MAS efforts in open-pollinated crops
can consist of two simultaneous approaches, one using the MTAs that have been identified
previously and the other based on an integrated genetic diversity analysis, MTA analysis
and MAS approach to discover, validate and apply new marker associations all in the same
breeding populations albeit at different generations. It is a challenge to make MAS
applicable from the earliest possible stages of the breeding programme
while giving the flexibility to sequentially improve the power of the MAS as data
accumulates and information is integrated through subsequent breeding processes.

9.8.3 Quantitative traits

Traditionally the heritability of quantitative traits was the most common predictor of
genetic gains for different plant breeding methods. DNA markers may be used today to
accelerate and enhance overall breeding methods by combining DNA marker and phenotyping
data in a selection index. Geneticists and plant breeders need to deal with linkage
disequilibrium while using MAS in recurrent selection, especially when using polymorphic
markers arising from mapping populations, which tend to be from diverse parents and thus
may not be relevant for target breeding materials. The power of MAS will also continue to
rely heavily on the accuracy and precision of phenotyping and the characterization and
evaluation of germplasm in the field. Issues such as the error term to test for the
significance of a QTL, detecting small effects with narrow genetic variance, or the number
of QTL not related to genetic variance or divergence of parents are all under-researched
areas that need priority attention by geneticists. Addressing these issues will allow plant
breeders to define the optimum number of individuals/lines and markers to be used in their
MAS programmes.

Plant breeders are ready to apply MAS for quantitative traits when the genetic gain and
time or cost efficiency from doing so are clearly higher than through PS methods. Initial
emphasis in this area should be on traits for which a robust cost-effective phenotyping
system is not available. To quickly reach this stage requires a paradigm shift in strategy
among the marker–trait identification community: from efforts to identify all QTL
influencing the target trait to a focus on identification of a few QTL having the largest
effect on the target trait. QTL of major effect may be easier to detect (in the right
genetic material) and be less influenced by GEIs and genetic background effects. Of great
importance will be a shift away from analysis of entire genetic populations to an emphasis
on selected individuals with extreme phenotypes from relevant breeding populations and
genetic stocks and, likely, pooled DNA analysis using the selected individuals (Xu and
Crouch, 2008). Of equal importance will be a shift from linked markers to diagnostic
gene-based markers, which will generally be SNP-based and thus readily scalable for
high-throughput haplotyping.

9.8.4 Genetic networks

The potential for MAS to contribute to improvements in crops should increase in parallel
with our understanding of the relationships among genomes, the environment and phenotypes.
Candidate transgenes will be developed on a regular basis and their contributions to crop
improvement will be realized in the most efficient manner with MAS. Likewise, the
identification of candidate native genes and their gene products and functions and of other
DNA sequences (e.g. micro-RNA (miRNA), matrix attachment and regulatory regions), will
improve the power of methods such as association mapping and genome scans to assess their
genotypic value in the context of defined reference populations of significance to plant
breeding.

Plants exhibit massive changes in gene expression during morpho-physiological and
reproductive development as well as when exposed to a range of biotic and abiotic stresses.
A new field of genetics of global gene expression has emerged based on the application of
traditional techniques of linkage and association analysis for the thousands of transcripts
measured by microarrays. Dissecting the architecture of quantitative traits in this way
connects DNA sequence variation with phenotypic variation and is improving our
understanding of transcriptional regulation and regulatory variation (Rockman and Kruglyak,
2006).
[Figure 10.1: three panels (a), (b) and (c), each plotting Yield (y-axis, scale 20 to 120) against Environment (x-axis, E1 and E2) for cultivars A and B.]
Fig. 10.1. The relative performance of two cultivars (A and B) in two environments (E1 and E2).
(a) No GEI is present. (b) GEI is present but does not alter genotypic ranking. (c) GEI is present and
alters genotypic ranking. Modified from Allard and Bradshaw (1964).
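
The three cases shown in Fig. 10.1 can be told apart numerically by comparing the yield difference between the two cultivars (A minus B) in each environment. The Python sketch below is one simple way to do this. The function name and the yield values are hypothetical (they are not read from the figure), and treating a tie in one environment as a non-crossover case is an assumption made here for simplicity.

```python
def classify_gei(a, b, tol=1e-9):
    """Classify the interaction pattern of two cultivars in two environments.

    a, b: (yield in E1, yield in E2) for cultivars A and B.
    Returns 'no GEI', 'non-crossover GEI' or 'crossover GEI'.
    """
    d1 = a[0] - b[0]  # A minus B in environment E1
    d2 = a[1] - b[1]  # A minus B in environment E2
    if abs(d1 - d2) <= tol:
        return "no GEI"            # same differential in both environments
    if d1 * d2 < 0:
        return "crossover GEI"     # the sign of the difference flips: ranks change
    return "non-crossover GEI"     # differential changes, ranking does not

# Hypothetical yields given as (E1, E2); chosen to mimic the three panels.
panel_a = classify_gei((100, 120), (50, 70))   # difference of 50 in both environments
panel_b = classify_gei((80, 120), (60, 70))    # difference of 20 in E1, 50 in E2
panel_c = classify_gei((100, 60), (60, 100))   # A ahead in E1, B ahead in E2

for label, result in zip("abc", (panel_a, panel_b, panel_c)):
    print(f"panel ({label}): {result}")
```

With more than two genotypes or environments the same idea applies pairwise, although formal tests for crossover interaction are normally used in practice.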
because the yield differential between the cultivars is 50 units in both environments;
proportionality is maintained, that is, the difference between any two genotypes in any two
environments is the same. GEIs can occur in two ways. (i) The difference among genotypes
can vary without any alteration in their rank, which is referred to as non-crossover
interaction. In Fig. 10.1b, a GEI is present because cultivar A yields 20 units more than
cultivar B in environment E1 but 50 units more in environment E2. (ii) The rank among
cultivars changes across environments, which is referred to as crossover interaction (COI).
In Fig. 10.1c, cultivar A is more productive in environment E1, but cultivar B is more
productive in environment E2. The most important GEI for the plant breeder is the COI
caused by changes in rank among genotypes.

Existence of GEI has significant influence on the efficiency of crop improvement via plant
breeding, largely because GEIs confound comparisons among genotypes with the environment of
test and complicate the definition of breeding objectives. It is argued that to overcome
these constraints to crop improvement we need to develop an understanding of the
differences in plant adaptation associated with the differences in performance and in
particular the GEI. GEI is of interest to plant breeders for several reasons (Fehr, 1987).
(i) The need to develop cultivars for specific purposes is determined by an understanding
of GEI. Unique cultivars may be required for different row spacings, soil types or planting
dates. (ii) The potential need for unique cultivars in different geographical areas
requires an understanding of GEI. The importance of this interaction can determine if
division of a large geographical area into subareas is needed and justified for testing new
genotypes and recommending cultivars to crop producers. (iii) Effective allocation of
resources for testing genotypes across locations and years is based on the relative
importance of genotype × location, genotype × year and genotype × location × year
interactions. (iv) The response of genotypes to variable productivity levels among
environments provides an understanding of their stability of performance. An understanding
of the genotype stability across environments helps in determination of their suitability
for the fluctuations in growing conditions that are likely to be encountered.

There are several key areas in GEI study: (i) methodology for effective environmental
characterization and classification; (ii) strategies for partitioning GEIs into repeatable
and non-repeatable components; (iii) experimental evidence to quantify the relative
efficiencies of direct selection for target traits and indirect selection strategies based
on crop physiological principles; (iv) integrated utilization of multi-environment trial
data, pedigree information and genotypic data of cultivars; and (v) determination of
genetic loci responsible for GEI and molecular dissection of GEI components. Discussion in
Genotype-by-environment Interaction 383
this chapter is mainly based on several important references including Fehr (1987),
Romagosa and Fox (1993), Knapp (1994), Xu and Zhu (1994), Cooper and Hammer (1996),
Bernardo (2002), Chahal and Gosal (2002), Kang (2002), Crossa et al. (2004), Cooper et al.
(2005), van Eeuwijk et al. (2005) and Yan et al. (2007).

10.1 Multi-environment Trials

A major objective in plant breeding programmes is to assess the suitability of individual
crop genotypes for agricultural purposes across a range of agro-ecological conditions.
Appropriate experimental procedures are required to understand and determine the importance
of GEI. For this purpose breeders conduct so-called multi-environment trials (METs). In a
MET, a set of genotypes is evaluated across a number of environments that hopefully
represent the target environment to select widely or specifically adapted genotypes. As an
example, Table 10.1 provides a MET data set for 18 winter wheat cultivars tested at nine
Ontario locations in 1993 from Yan et al. (2007). The performance of genotypes in METs is
analysed by statistical models developed to describe and interpret genotype-by-environment
data (GED). The statistical analysis should provide estimates for parameters that indicate
both how well genotypes perform on average across the environmental range and how well they
perform in specific environmental conditions.

10.1.1 Experimental design

An understanding of the steps involved in the design, implementation, analysis and
interpretation of METs can be useful. Planning of any experiment begins with a statement of
the concept or hypothesis to be evaluated, sometimes phrased in the form of a question. Is
the relative performance among genotypes different with conservation tillage versus
conventional tillage? Do genotypes respond differently to high versus low rates of
inorganic nitrogen fertilization? The breeder may have a hypothesis about the answer to the
question on the basis of practical experience. It is critical that the hypothesis should
not be regarded as factual, an attitude that can bias the interpretation of the
experimental results. A MET that involves multiple genotypes, years and locations is
usually required. The GEI is considered to be absent if all genotypes perform similarly
across all the environments, i.e. total variation is explained only by main effects of
environments and genotypes.

The empirical mean response, $\bar{y}_{ij}$, of the ith genotype ($i = 1, 2, \ldots, I$) in
the jth environment ($j = 1, 2, \ldots, J$) with r replications in each of the IJ cells is
expressed as

$$\bar{y}_{ij} = \mu + \tau_i + \delta_j + (\tau\delta)_{ij} + \bar{\varepsilon}_{ij} \qquad (10.1)$$

where $\mu$ is the grand mean over all genotypes and environments, $\tau_i$ is the additive
effect of the ith genotype, $\delta_j$ is the additive effect of the jth environment,
$(\tau\delta)_{ij}$ is the non-additivity, GEI, of the ith genotype in the jth environment
and $\bar{\varepsilon}_{ij}$ is the (average) error assumed normally and independently
distributed, i.e. $\mathrm{NID}(0, \sigma^2/r)$, where $\sigma^2$ is the within-environment
error variance, assumed to be constant.

Except for $\mu$, all the terms in Eqn 10.1 are usually treated as random effects. To
provide a complementary framework for a genetic interpretation of the observed trait
variation, we can also consider the trait phenotypic variation as the combination of a
genetic signal component [$\tau_i + (\tau\delta)_{ij}$], an environmental context component
($\delta_j$) and an environmental noise component ($\bar{\varepsilon}_{ij}$). For the
variance-covariance (VCOV) structure of the error term, $\bar{\varepsilon}_{ij}$, various
choices are possible, the simplest being that $\bar{\varepsilon}_{ij}$ is independently
identically normally distributed. The terms of Eqn 10.1 can also be considered as fixed
effects depending on the sampling methods used and the general purpose of the study. For
example, if environment refers to locations, then they may be considered a fixed effect
when they are not randomly chosen from all possible sites in an area, while if the
environment
384 Chapter 10
Table 10.1. Mean yield (Mg ha⁻¹) of 18 winter wheat cultivars (G1–G18) tested at nine Ontario locations
(E1–E9) in 1993 (from Yan et al. (2007) with permission).
Test environments
Genotypes E1 E2 E3 E4 E5 E6 E7 E8 E9 Mean
G1 4.46 4.15 2.85 3.08 5.94 4.45 4.35 4.04 2.67 4.00
G2 4.42 4.77 2.91 3.51 5.70 5.15 4.96 4.39 2.94 4.31
G3 4.67 4.58 3.10 3.46 6.07 5.03 4.73 3.90 2.62 4.24
G4 4.73 4.75 3.38 3.90 6.22 5.34 4.23 4.89 3.45 4.54
G5 4.39 4.60 3.51 3.85 5.77 5.42 5.15 4.10 2.83 4.40
G6 5.18 4.48 2.99 3.77 6.58 5.05 3.99 4.27 2.78 4.34
G7 3.38 4.18 2.74 3.16 5.34 4.27 4.16 4.06 2.03 3.70
G8 4.85 4.66 4.43 3.95 5.54 5.83 4.17 5.06 3.57 4.67
G9 5.04 4.74 3.51 3.44 5.96 4.86 4.98 4.51 2.86 4.43
G10 5.20 4.66 3.60 3.76 5.94 5.35 3.90 4.45 3.30 4.46
G11 4.29 4.53 2.76 3.42 6.14 5.25 4.86 4.14 3.15 4.28
G12 3.15 3.04 2.39 2.35 4.23 4.26 3.38 4.07 2.10 3.22
G13 4.10 3.88 2.30 3.72 4.56 5.15 2.60 4.96 2.89 3.80
G14 3.34 3.85 2.42 2.78 4.63 5.09 3.28 3.92 2.56 3.54
G15 4.38 4.70 3.66 3.59 6.19 5.14 3.93 4.21 2.93 4.30
G16 4.94 4.70 2.95 3.90 6.06 5.33 4.30 4.30 3.03 4.39
G17 3.79 4.97 3.38 3.35 4.77 5.30 4.32 4.86 3.38 4.24
G18 4.24 4.65 3.61 3.91 6.64 4.83 5.01 4.36 3.11 4.48
Mean 4.36 4.44 3.14 3.49 5.68 5.06 4.24 4.36 2.90 4.19
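
As a worked illustration of the terms in Eqn 10.1 (grand mean, genotype effect, environment effect and GEI), the Python sketch below decomposes a small subset of Table 10.1 (genotypes G1, G8 and G13 in environments E1, E5 and E7). It uses simple marginal means with sum-to-zero effects, which is an assumption made here for illustration and not a method prescribed in the text. Because only cell means are available, the interaction residual also absorbs the average error term. Variable names are hypothetical.

```python
# Subset of Table 10.1: mean yield (Mg/ha) of three cultivars in three environments.
genotypes = ["G1", "G8", "G13"]
environments = ["E1", "E5", "E7"]
y = [
    [4.46, 5.94, 4.35],  # G1
    [4.85, 5.54, 4.17],  # G8
    [4.10, 4.56, 2.60],  # G13
]

n_g, n_e = len(y), len(y[0])

# Grand mean over all genotype-by-environment cells.
mu = sum(sum(row) for row in y) / (n_g * n_e)

# Genotype main effects: genotype mean minus grand mean.
tau = [sum(row) / n_e - mu for row in y]

# Environment main effects: environment mean minus grand mean.
delta = [sum(y[i][j] for i in range(n_g)) / n_g - mu for j in range(n_e)]

# Interaction residuals: what the additive model leaves unexplained
# (GEI plus average error, which cannot be separated without replicate data).
gei = [[y[i][j] - mu - tau[i] - delta[j] for j in range(n_e)] for i in range(n_g)]

print(f"grand mean = {mu:.3f}")
for g, t in zip(genotypes, tau):
    print(f"genotype effect {g}: {t:+.3f}")
for e, d in zip(environments, delta):
    print(f"environment effect {e}: {d:+.3f}")
for g, row in zip(genotypes, gei):
    print(g, " ".join(f"{v:+.3f}" for v in row))
```

If every interaction residual were zero, the genotypes would keep the same differences in all environments, which is the no-GEI situation described earlier.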
refers to years then they can be considered as randomly chosen. If years and locations
typically represent a normal combination of years and locations, they can perfectly well be
considered as random effects.
The genotypes chosen for an assessment of possible interactions are an important
consideration in designing the experiment. Some analyses of GEI are not based on an
experiment specifically designed for that purpose, particularly the assessment of the
importance of interactions with locations and years. Instead, breeders utilize data from
test genotypes including cultivars, hybrids, populations and experimental lines that have
been evaluated over locations and years as a part of normal testing programmes.
It is desirable to have at least two replications in each location and year to obtain an
estimate of experimental error so that it is possible to test the significance of the
interactions of interest. Any additional replications will allow a more reliable estimate of
the experimental error. However, sometimes resources are not available for replicating all
genotypes so that only some entries are replicated. In this case, it is an augmented design;
it is a perfectly legitimate design, although the precision is lower.

10.1.2 Basic data analysis and interpretation

For all MET data, basic analyses should include the calculation of mean values,
determination of the statistical significance of the sources of variation and estimation of
appropriate variance components. The sources of variation in an experiment are partitioned
into main effects and their interactions (Table 10.2). The mean squares for the sources of
variation are determined and appropriate F-tests are conducted to assess the probability
that a source of variation is significant. Components of variance can be calculated for the
main effect of the genotypes and their interactions with the locations and years. Standard
errors can be computed for each variance component.
Data interpretation includes the statistical significance of various variation sources
Table 10.2. Analysis of variance for experiments in an annual crop with different numbers of locations
and years (from Johnson et al. (1955) with permission).
and their practical implications. The genotype × location interaction measures the
consistency of performance among genotypes at different locations. The consistency of
performance of genotypes in different years is indicated by the genotype × year
interaction. The genotype × location × year interaction measures the consistency of the
genotype × location interaction across years. For all of these interactions, an examination
of mean values is necessary to determine if a significant interaction is due to a change in
rank among genotypes or to changes in the differences among genotypes without rank change
(see Fig. 10.1).

Genotype × location interaction

Wide fluctuations in the rank of genotypes across test locations suggest that it may be
desirable to develop genotypes for different locations through independent selection and
testing programmes. The cost of establishing independent programmes for different
geographical areas is substantial; therefore, the decision can be difficult. Before
establishing independent breeding programmes, the breeder should make a detailed examination
of the environmental factors responsible for the genotype × location interaction. As
suggested by Fehr (1987), if the differences among locations are due to soil type or other
factors that are consistent from year to year, independent programmes may be appropriate.
Temporary differences among locations associated with unusual climate conditions would not
justify this.
Another consideration in determining the implications of genotype × location interaction
is that fluctuations in rank may not preclude selection of superior genotypes for multiple
locations. Assume that a group of genotypes is divided into three classes: good,
intermediate and poor. A genotype ×
location interaction could be caused by fluctuations in rank among genotypes within the
three classes, but not among classes. Such an interaction would be unlikely to justify the
establishment of breeding programmes for independent locations, at least for the initial
stages of testing.

Genotype × year interaction

An inconsistent ranking among genotypes grown in different years is in some regards more
difficult to deal with than a genotype × location interaction. A breeder does not have the
option of establishing independent breeding programmes for different years (Fehr, 1987).
The primary option available is to identify genotypes that exhibit superior performance on
the average across years. This involves the testing of genotypes in several years before
selection of one for release as a cultivar. To reduce the length of time for genetic
improvement, multiple locations in 1 year often are used as a substitute for years. The
substitution is only effective when the range of climate conditions among locations in
single years is comparable to that among years.

Genotype × year × location interaction

This interaction can first be used to test if the genotype × location interaction is
repeatable across years and thereby whether mega-environments can be established. It can
be used secondly when there are fluctuations in the ranking of genotypes associated with
individual location–year combinations. Here the breeder must identify genotypes with
superior average performance over locations and years. When METs are performed across
several years, the data are referred to as a three-mode (three-way) data array, in which
the modes are genotypes, locations and years. By extension of a two-way additive main
effect and multiplicative interaction (AMMI) model to a three-way model, Varela et al.
(2006) offered a natural approach for assessing the response in locations and years or for
studying the multi-attribute response of genotypes in environments. The three-way model
was applied to two data sets. Data set 1 comprised genotype (25) × location (4) × sowing
time (4) interaction with eight traits measured. The structure of data set 2 is genotype
(20) × irrigation regimes (4) × year (3) on grain yield. Their results showed that the
three-way AMMI analysis gave sensible and useful information that would otherwise have been
unavailable to the breeder in relation to the differential responses of genotypes in
different locations and in several years and the different relationship between locations
in different years.

10.2 Environmental Characterization

The objectives of GED analysis (i.e. MET data for a single trait) should include three
major aspects: (i) mega-environment analysis; (ii) test-environment evaluation; and
(iii) genotype evaluation (Yan and Kang, 2003), all of which are associated with
environmental characterization. Yan et al. (2007) used the yield data listed in Table 10.1
as an example to illustrate the three aspects of bi-plot analysis. When supplemental
information (e.g. data on environmental or genotypic covariates) is available, a fourth
aspect, which is to understand the causes of genotype main effect (G) and GEI (Yan and
Kang, 2003; Yan and Tinker, 2006), can be included as described in Section 10.3.2.
Environmental characterization involves definition of the key factors which influence
both performance level and the relative performance of genotypes in an experiment, as well
as assessment of the relevance of these factors to the target environments. This provides
a basis for understanding the results from individual experiments and predicting their
applicability elsewhere. This can be extended to include the sociological factors that
influence the utilization of cultivars by farmers. In the majority of METs conducted there
is no clear definition of the environmental challenge. Further, in many METs there is no
measurement of how well the test environments match those of the target
of a crop's growing region into several mega-environments implies more work for plant
breeders and seed producers, but it also implies higher heritabilities, faster progress for
plant breeders and potentially stronger competitiveness for seed producers. On the other
hand, the necessary and sufficient condition for mega-environment division is a repeatable
which-won-where pattern rather than merely a repeatable environment-grouping pattern (Yan
and Rajcan, 2002; Yan and Kang, 2003). Mega-environments are used to allocate resources in
a breeding or research programme, to rationalize germplasm and information exchanges
between breeding programmes (allowing even small programmes to progress by focusing on the
most promising material), to increase heritabilities within relatively well-defined and
predictable environments, to increase the efficiency of testing and breeding programmes and
to target genotypes to appropriate production areas. Many other terms, however, have
essentially the same meaning, such as agro-climatic or eco-geographic regions.
Appropriate mega-environment analysis should classify the target environment into one of
three possible types (Table 10.3). Type 1 is the easiest target environment one can hope
for, but it is usually an overoptimistic expectation. Type 2 suggests opportunities for
exploiting some of the GEI. Such opportunities should not be overlooked if they exist,
which is the whole point of mega-environment analysis and GEI analysis. Type 3 is the most
challenging target environment and, unfortunately, also the most common one. Statistical
methods for grouping environments involve classification procedures, ordination procedures
(i.e. using coordinates in a graph to depict relationships among environments), or the
joint use of a classification and an ordination procedure (DeLacy and Cooper, 1990).
Clustering analysis, which can be used for environment classification, usually involves
the creation of hierarchical groups of environments in the same way as described for
germplasm classification in Chapter 5. A given environment is more similar to an
environment in the same cluster than to an environment in a different cluster, in terms of
genotype rankings, rather than the physical factors of the environments per se. The
clustering procedure requires some measure of dissimilarity or distance between
environments. Data from METs are typically unbalanced because the genotypes and locations
often vary from year to year. But the statistical distance between two environments can be
determined from the performance of the subset of genotypes that are grown in both
environments (DeLacy and Cooper, 1990). The distance measures for genotypes summarized by
Lin et al. (1986) can be used for environment clustering, whereas Ouyang et al. (1995)
measured the distance between j and j' as
Table 10.3. Three types of target environments based on mega-environment analysis (from Yan et al.
(2007) with permission).
$$D_{jj'} = \frac{1}{g}\sum_{i=1}^{g}\left(t_{ij} - t_{ij'}\right)^{2} \qquad (10.2)$$

where $t_{ij} = (y_{ij} - m_j)/s_j$ is the standardized performance of genotype i in
environment j; g is the number of genotypes grown in both j and j'; $m_j$ (or $m_{j'}$) is
the mean of all the genotypes in environment j (or j'); and $s_j$ ($s_{j'}$) is the
phenotypic standard deviation among all the genotypes in environment j (or j'). When all
genotypes grown in environment j are also grown in environment j', Eqn 10.2 can be
rewritten as (Ouyang et al., 1995)

$$D_{jj'} = 2\left(1 - \frac{1}{n}\right)\left(1 - r_{jj'}\right) \qquad (10.3)$$

where $r_{jj'}$ is the correlation between environment j and environment j' across
genotypes. This implies that the distance between two environments is $D_{jj'} = 0$ if the
performance of genotypes is perfectly correlated in the two environments, i.e.
$r_{jj'} = 1$. In contrast, the distance approaches $D_{jj'} = 2$ if $r_{jj'} = 0$. The
distance approaches a maximum of $D_{jj'} = 4$ when COIs occur and $r_{jj'}$ approaches −1.
A common procedure is the average linkage method, also called the unweighted pair-group
method using arithmetic averages (UPGMA), in which the distance between two clusters is
equal to the average distance between an environment in the first cluster and an
environment in the second cluster. The cluster diagram or dendrogram is used to graphically
illustrate the groups of environments (Fig. 10.2). The cluster diagram indicates the
hierarchical clustering of environments and the average distances at which they are joined.
The clusters based on $D_{jj'}$ from METs are often consistent with geographical groupings
(Bernardo, 2002). For example, Ouyang et al. (1995) partitioned the 90 counties in Iowa on
the basis of the performance of seven maize hybrids grown in a total of 2006 environments.
Cluster analysis partitioned the counties into a northern group and a southern group,
although two south-eastern Iowa counties were clustered with the northern Iowa group
(Fig. 10.2). The north–south groups are consistent with
Fig. 10.2. Cluster analysis of Iowa counties. Adapted from Ouyang et al. (1995); original figure provided
by Rex Bernardo.
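The distance of Eqn 10.2, its correlation form in Eqn 10.3 and average-linkage (UPGMA) clustering can be sketched as follows. This is a minimal illustration on a subset of Table 10.1 (G1–G6 in four environments), assuming that yields are standardized with the sample standard deviation; the function names are hypothetical and the code is not the procedure of Ouyang et al. (1995).

```python
from statistics import mean, stdev

# Subset of Table 10.1: yields (Mg/ha) of G1-G6 in four environments.
env_yields = {
    "E1": [4.46, 4.42, 4.67, 4.73, 4.39, 5.18],
    "E2": [4.15, 4.77, 4.58, 4.75, 4.60, 4.48],
    "E3": [2.85, 2.91, 3.10, 3.38, 3.51, 2.99],
    "E5": [5.94, 5.70, 6.07, 6.22, 5.77, 6.58],
}

def standardize(values):
    """t_ij = (y_ij - m_j) / s_j, with s_j the sample standard deviation (assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def distance(a, b):
    """Eqn 10.2: mean squared difference between standardized yields."""
    ta, tb = standardize(a), standardize(b)
    return sum((x - y) ** 2 for x, y in zip(ta, tb)) / len(a)

def correlation(a, b):
    """Pearson correlation across genotypes."""
    ta, tb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(ta, tb)) / (len(a) - 1)

def distance_from_r(a, b):
    """Eqn 10.3: D = 2 (1 - 1/n) (1 - r)."""
    n = len(a)
    return 2 * (1 - 1 / n) * (1 - correlation(a, b))

def upgma(names, dist):
    """Average-linkage clustering; returns the list of merges with their heights."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for k in range(i + 1, len(clusters)):
                # Average of all pairwise distances between members of the two clusters.
                pairs = [dist[a][b] for a in clusters[i] for b in clusters[k]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, k)
        avg, i, k = best
        merges.append((clusters[i], clusters[k], avg))
        merged = clusters[i] + clusters[k]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, k)] + [merged]
    return merges

names = list(env_yields)
dist = {a: {b: distance(env_yields[a], env_yields[b]) for b in names} for a in names}
merges = upgma(names, dist)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Note that with the factor (1 − 1/n) the distance for r = −1 is 4(1 − 1/n), which approaches 4 only as the number of genotypes grows.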
[Figure 10.3: GGE bi-plot of genotypes (G) and environments (E); horizontal axis PC1.]
Fig. 10.3. The which-won-where view of the GGE bi-plot based on the G × E data in Table 10.1. The
data were not transformed (Transform = 0), not scaled (Scaling = 0), and were environment-centred
(Centering = 2). The bi-plot was based on environment-focused singular value partitioning (SVP = 2)
and therefore is appropriate for visualizing the relationships among environments. It explained 78% of
the total G + GE. The genotypes are labelled as G1–G18 and the environments are labelled as E1–E9.
From Yan et al. (2007) with permission. PC, principal component.
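The settings named in the caption can be illustrated with a small sketch: environment centring leaves G + GE, the first two principal components are extracted, and the singular values are assigned to the environment scores. The code below uses a subset of Table 10.1 (G1–G5 in E1–E4) and a plain power iteration in place of a library SVD; the helper names and the reading of SVP = 2 as an exponent f = 0 on the genotype side are assumptions made for illustration.

```python
import math

# Subset of Table 10.1 (Mg/ha): rows are G1-G5, columns are E1-E4.
table = [
    [4.46, 4.15, 2.85, 3.08],
    [4.42, 4.77, 2.91, 3.51],
    [4.67, 4.58, 3.10, 3.46],
    [4.73, 4.75, 3.38, 3.90],
    [4.39, 4.60, 3.51, 3.85],
]
n_g, n_e = len(table), len(table[0])

# Environment centring (Centering = 2): subtract each column mean, leaving G + GE.
col_means = [sum(row[j] for row in table) / n_g for j in range(n_e)]
z = [[row[j] - col_means[j] for j in range(n_e)] for row in table]

def top_component(m, iters=500):
    """Leading singular value with left and right vectors, by power iteration."""
    rows, cols = len(m), len(m[0])
    v = [float(j + 1) for j in range(cols)]  # arbitrary non-uniform start vector
    sigma, u = 0.0, [0.0] * rows
    for _ in range(iters):
        u = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(m[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v

def deflate(m, sigma, u, v):
    """Remove one fitted multiplicative component from the matrix."""
    return [[m[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]

first = top_component(z)
second = top_component(deflate(z, *first))
components = [first, second]

total_ss = sum(x * x for row in z for x in row)
explained = sum(s * s for s, _, _ in components) / total_ss

# Environment-focused partitioning (assumed f = 0): genotype scores carry no singular value.
f = 0.0
geno_scores = [[s ** f * u[i] for s, u, _ in components] for i in range(n_g)]
env_scores = [[s ** (1 - f) * v[j] for s, _, v in components] for j in range(n_e)]

# Rank-2 fitted values are inner products of genotype and environment scores.
for j in range(n_e):
    fitted = [sum(a * b for a, b in zip(geno_scores[i], env_scores[j])) for i in range(n_g)]
    print(f"E{j + 1}: predicted winner G{fitted.index(max(fitted)) + 1}")
print(f"PC1 + PC2 explain {100 * explained:.0f}% of G + GE in this subset")
```

The percentage printed for this subset is not comparable with the 78% reported for the full table.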
of prevailing conditions during key growth stages and observed patterns of GEI, six major
environment classes (EC) were identified. The relative frequency of each EC varied greatly
from year to year and significant hybrid × EC interaction variance was observed. This
environmental classification system provided a useful description of some of the features
of both the TPE and the MET. Knowledge of the spatial (locations) and temporal (years)
distributions of ECs that influence the incidence of GEI can be used to improve cultivar
performance predictability in the US Corn Belt TPE.
Subdivision of a crop's growing regions into several mega-environments could be avoided
if genotypes could be found with yield superiority throughout the region, that is, if
cultivars bred in favourable environments would also perform best in different or
unfavourable environments. However, one can hardly expect a single cultivar or a hybrid to
flourish the world over, under all environments and management practices. A cultivar
planted outside its mega-environment frequently suffers yield reductions. Furthermore, even
if the breeding goal is wide adaptation (rather than mega-environment directed breeding),
it would still be the best strategy to identify several mega-environments and place a test
location in each to select for wide adaptation. It has been a normal practice that
multinational breeding companies have their programmes established to target specific
eco-geographic regions.

10.2.2 GIS and environment characterization

Modern plant breeding programmes increasingly use information from different sources,
including geographic information provided by GIS (http://www.gis.com). GIS integrates
hardware, software and data for capturing, managing, analysing and displaying all forms of
geographically referenced information. GIS allows us to view, understand, question,
interpret and visualize data in many ways that reveal relationships, patterns and trends in
the form of maps, globes, reports and charts. A GIS can be viewed in three ways. The
Database View: a GIS is a unique type of database of the world, a geographic database
(geodatabase). It is an information system for geography. Fundamentally, a GIS is based on
a structured database that describes the world in geographic terms. The Map View: a GIS is
a set of intelligent maps and other views that show features and feature relationships on
the earth's surface. Maps of the underlying geographic information can be constructed and
used as windows into the database to support queries, analysis and editing of the
information. The Model View: a GIS is a set of information transformation tools that derive
new geographic data sets from existing data sets. These geo-processing functions take
information from existing data sets, apply analytic functions and write results into new
derived data sets. To utilize GIS data more effectively, several software packages have
been developed. ESRI software offers scalable solutions for researchers at National
Agricultural Research Services (NARS), universities and international research centres.
From field-based products like ArcPad to the server level Spatial Database Engine (ArcSDE),
data can be collected and managed. The Internet Map Server (ArcIMS) allows research sites
separated by great geographical distances to be connected in real time and ArcGIS provides
all of the necessary tools to analyse the spatial components of agricultural data sets.
There is a growing need to classify production environments by combining biophysical
criteria with socio-economic factors. Geospatial technologies, especially GIS, are playing
a role in each of these areas and spatial analysis provides unique insights. Use of GIS to
characterize wheat production environments is described by Hodson and White (2007) by
drawing from examples at the International Maize and Wheat Improvement Center (CIMMYT).
Since the 1980s, the CIMMYT wheat programme has classified production regions into
mega-environments based on climatic, edaphic and biotic constraints. Advances in spatially
disaggregated data sets and GIS tools allow mega-environments to be
characterized and mapped in a much more quantitative manner. The combination of improved
crop distribution data and key biophysical data at high spatial resolutions also permits
exploring scenarios for disease epidemics, as illustrated for the stem rust race Ug99.
Availability of spatial data describing future climate conditions may provide insights into
potential changes in wheat production environments in the coming decades. Increased
availability of near real-time daily weather data derived from remote sensing should
further improve characterization of environments, as well as permit regional-scale
modelling of dynamic processes such as disease progression or crop water status. Below are
some examples where plant breeding research is benefiting from implementing a spatial
aspect to environment characterization.
The first example is to use GIS parameters in grouping sites to ensure that breeders
choose as many variable sites as possible to represent the target region. The present
mega-environments in the Southern African Development Community (SADC) countries are
confounded within each country, which limits the exchange of germplasm among them. A study
was undertaken to revise and group similar maize-testing sites across the SADC countries
that are not confounded within each country (Setimela et al., 2005). The study was based on
3 years (1999–2001) of regional maize yield trial data and GIS parameters from 94 sites.
Sequential retrospective (Seqret) pattern analysis methodology was used to stratify testing
sites and group them according to their similarity and dissimilarity based on mean grain
yield. The methodology used historical data, taking into account imbalances of data caused
by changes over locations and years, such as additions and omissions of genotypes and
locations. Cluster analysis grouped regional trial sites into seven mega-environments,
mainly distinguished by GIS parameters related to rainfall, temperature, soil pH and soil
nitrogen with an overall R² = 0.70. This analysis can reveal challenges and opportunities
to develop and deploy maize germplasm in the SADC region faster and more effectively.
The second example is to use GIS parameters to determine the Striga-prone areas in
Africa. Striga is an obligate parasitic weed that attacks cereal crops in sub-Saharan
Africa. In western Kenya, it has been identified by farmers as their major pest problem in
maize. A new technology, consisting of coating seed of imidazolinone resistant (IR) maize
cultivars with the imidazolinone herbicide, imazapyr, has proven to be very effective in
controlling Striga on farmer fields. To help extension agents and seed companies to develop
appropriate strategies, the potential for this technology was analysed by combining
different data sources into a GIS (De Groote et al., 2008). Superimposing secondary data,
field surveys, agricultural statistics and farmer surveys made it possible to clearly
identify the Striga-prone areas in western Kenya. By extrapolation over the maize area in
the zone, total potential demand for IR-maize seed is estimated at 2000–2700 t year−1.
Similar calculations, but based on much less precise data and expert opinion rather than
farmer surveys or trials, give an estimate of the potential demand for IR-maize seed in
Africa as 153,000 t year−1.
The third example is to classify maize growing environments based on drought related
parameters. GEIs in southern African maize growing environments result from factors related
to maximum temperature, season rainfall, season length, within-season drought, subsoil pH
and socio-economic factors that result in sub-optimal input application. The difficulty of
choosing appropriate selection environments has restricted breeding progress for abiotic
stress tolerance in highly variable target environments. Bänziger et al. (2006) applied
cluster analysis to the most prominent GEIs and grouped trial sites into eight
mega-environments mainly distinguished by season rainfall, maximum temperature, subsoil pH
and N application. GIS information available for season rainfall, maximum temperature and
subsoil pH (Hodson et al., 2002) was used to map maize mega-environments (Table 10.4;
Fig. 10.4). Classification by maximum temperature distinguished different elevations:
mega-environments A–E corresponding to the mid-altitudes; mega-environments F
Maize mega-environment   Maximum temperature (°C)   Season precipitation (mm)   Subsoil pH (water)   Area in southern Africa (10³ ha)   Area in southern Africa (%)
A
B
C
D
E
F
G
H
dynamic. Static stability, also referred to as the biological concept of stability,
implies that a genotype has a stable performance across environments with no
among-environment variance, i.e. a genotype is non-responsive to increased levels of
inputs. Dynamic stability implies that a genotype's performance is stable, but for each
environment, its performance corresponds to the estimated or predicted level, which is also
referred to as the agronomic concept of stability. Lin et al. (1986) classified statistical
methods for stability analysis into four groups:

Group A: based on deviation from average genotype effect (DE), expressed as sums of squares;
Group B: based on GEI, expressed as sums of squares;
Group C: based on either DE or GEI, expressed as a regression coefficient against the environment mean; and
Group D: based on either DE or GEI, expressed as deviations from regression.

In Group A (Type 1 stability), which is equivalent to biological stability, a genotype is
regarded as stable if its among-environment variance is small. In Groups B and C (Type 2
stability), which is equivalent to agronomic stability, a genotype is regarded as stable if
its response to environments is parallel to the mean response of all genotypes in a test.
In Group D (Type 3 stability), a genotype is regarded as stable if the residual mean square
following regression of genotype performance or yield on environmental index is small. Lin
and Binns (1988) proposed a Type 4 stability on the basis of predictable and unpredictable
non-genetic variation. They suggested the use of a regression approach for the predictable
portion. The mean square for years-within-locations for each genotype as a measure of the
unpredictable variation was referred to as Type 4 stability.
The stability of cultivar performance across environments is influenced by the genotype
of individual plants and the genetic structure of the plants. The terms homeostasis and
individual buffering have been used to describe the stability in performance of individual
plants or groups of plants over different environments (Allard and Bradshaw, 1964; Briggs
and Knowles, 1967). It has been shown that heterozygous individuals, such as F1 hybrids,
are more stable than their homozygous parents. The stability of heterozygous individuals
seems to be related to their ability to perform better under stress conditions than
homozygous plants. The terms genetic homeostasis and population buffering were used to
describe the stability of a group of plants that exceeds that of its individual members
(Lerner, 1954; Allard and Bradshaw, 1964). Heterogeneous cultivars generally have higher
stability than homogeneous cultivars.
A number of statistical procedures have been developed to enhance our understanding of
GEI and to select genotypes that perform consistently well across many environments. The
earliest approach was linear regression analysis. Finlay and Wilkinson (1963), Eberhart and
Russell (1966) and Tai (1971) popularized variations of the regression approach, assuming an
expected linear response of yield to environments. Other statistical methods that have
received significant attention are pattern analysis (DeLacy et al., 1996), the AMMI model
(Gauch and Zobel, 1996), the shifted multiplicative model (SHMM) (Cornelius et al., 1996;
Crossa et al., 1996), linear–bilinear and mixed models (Crossa et al., 2004) and the
non-parametric methods of Hühn (1996).
The methods of Hühn (1996) and Kang (1988, 1993) integrate yield and stability into one
statistic that can be used as a selection criterion. Flores et al. (1998) and Hussein
et al. (2000) conducted comparative evaluation of 22 and 15 stability statistics/methods,
respectively. Flores et al. (1998) classified 22 univariate and multivariate methods into
three main groups. Group 1 statistics are mostly associated with yield level and show
little or no correlation with stability parameters. In Group 2, both yield and stability of
performance are considered simultaneously to reduce the effect of GEI. Group 3 statistics
emphasize only stability. Recently, mixed model approaches have become increasingly
important in GEI and stability analyses.
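The first three stability types of Lin et al. (1986) can be illustrated on a subset of Table 10.1 (five genotypes in E1–E5). The sketch below is a simplified assumption-laden illustration: the environmental index is taken as the environment mean of the subset minus the grand mean, the regression follows the general Finlay and Wilkinson style, and the deviation mean square uses n − 2 degrees of freedom. Function and variable names are hypothetical.

```python
from statistics import mean, variance

# Subset of Table 10.1 (Mg/ha): five genotypes across environments E1-E5.
yields = {
    "G1": [4.46, 4.15, 2.85, 3.08, 5.94],
    "G4": [4.73, 4.75, 3.38, 3.90, 6.22],
    "G8": [4.85, 4.66, 4.43, 3.95, 5.54],
    "G12": [3.15, 3.04, 2.39, 2.35, 4.23],
    "G13": [4.10, 3.88, 2.30, 3.72, 4.56],
}
n_e = 5

# Environmental index: environment mean minus grand mean (sums to zero).
env_means = [mean(row[j] for row in yields.values()) for j in range(n_e)]
grand = mean(env_means)
env_index = [m - grand for m in env_means]
sxx = sum(x * x for x in env_index)

def stability(row):
    """Return (Type 1 variance, regression slope, deviation mean square)."""
    type1 = variance(row)  # among-environment variance (Type 1)
    g_mean = mean(row)
    # Slope of yield on environmental index; values near 1 mean a response
    # parallel to the mean response of all genotypes (Type 2).
    slope = sum(x * (y - g_mean) for x, y in zip(env_index, row)) / sxx
    # Mean square of deviations from the regression line (Type 3).
    resid = [y - g_mean - slope * x for x, y in zip(env_index, row)]
    dev_ms = sum(r * r for r in resid) / (n_e - 2)
    return type1, slope, dev_ms

results = {g: stability(row) for g, row in yields.items()}
for g, (type1, slope, dev_ms) in results.items():
    print(f"{g}: variance={type1:.3f} slope={slope:.2f} deviation MS={dev_ms:.3f}")
```

By construction the slopes average to 1 across the genotypes included, so a slope is always read relative to the particular set of genotypes in the test.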
10.3.1 Linear–bilinear models for studying GEI

Statistical methods for detecting and quantifying COI and for forming subsets of
environments and/or genotypes with negligible COI have been based on fixed effect
linear–bilinear models. Several classes of these models have been developed, some of which
are widely used. In this section, linear–bilinear model development will be discussed,
mainly based on the review by Crossa et al. (2005).
An early approach towards the analyses of GEI included the conventional fixed effect
two-way (FE2W) ANOVA model with the sum to zero constraints running over indices as shown
in Eqn 10.1. Yates and Cochran (1938) proposed to relate the GEI term in Eqn 10.1 linearly
to the environmental main effect, that is, $(\tau\delta)_{ij} = \xi_i\delta_j + d_{ij}$,
where $\xi_i$ is the linear regression coefficient of the ith genotype on the environmental
mean and $d_{ij}$ is a deviation. This approach was later used by Finlay and Wilkinson
(1963) and modified by Eberhart and Russell (1966). Williams (1952) linked the FE2W model
with principal component analysis (PCA) by considering the model
$y_{ij} = \mu + \tau_i + \lambda\alpha_i\gamma_j + \varepsilon_{ij}$, where $\lambda$ is the
largest singular value of $ZZ'$ and $Z'Z$ (for $Z = y_{ij} - \bar{y}_{i.}$) and $\alpha_i$
and $\gamma_j$ are the corresponding eigenvectors.
Gollob (1968) and Mandel (1969, 1971) extended Williams' (1952) work by considering the
bilinear GEI term as $(\tau\delta)_{ij} = \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk}$.
Thus, the general formulation for the linear–bilinear model is

$$y_{ij} = \mu + \tau_i + \delta_j + \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij} \qquad (10.4)$$

where the constant $\lambda_k$ is the singular value of the kth multiplicative component
(kth PCA axis), ordered so that $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_t$;
$\alpha_{ik}$ corresponds to the left singular vector of the kth component and represents
genotypic sensitivities to hypothetical environmental factors represented by the right
singular vector of the kth component, $\gamma_{jk}$. The $\alpha_{ik}$ and $\gamma_{jk}$
satisfy the ortho-normalization constraints
$\sum_i \alpha_{ik}\alpha_{ik'} = \sum_j \gamma_{jk}\gamma_{jk'} = 0$ for $k \neq k'$ and
$\sum_i \alpha_{ik}^2 = \sum_j \gamma_{jk}^2 = 1$ for $k = k'$. When Eqn 10.4 is saturated
the number of bilinear terms is $t = \min(I - 1, J - 1)$ and for any smaller value, the
model is said to be truncated. The interaction parameters $\lambda_k$, $\alpha_{ik}$ and
$\gamma_{jk}$ of the GEI subspace are estimated from the data themselves. The
linear–bilinear model of Eqn 10.4 is a generalization of the regression on the mean model
with more flexibility for describing GEI because more than one genotypic and environmental
dimension is considered.
Several classes of linear–bilinear models, described by Cornelius et al. (1996), which
are generally derived from Eqn 10.4, are the Genotypes Regression Model (GREG)
$y_{ij} = \mu_i + \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij}$, the
Sites (environments) Regression Model (SREG)
$y_{ij} = \mu_j + \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij}$, the
Completely Multiplicative Model (COMM)
$y_{ij} = \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij}$ and the Shifted
Multiplicative Model (SHMM)
$y_{ij} = \beta + \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij}$.
Two linear and bilinear models, SHMM and SREG, have been used for studying GEI and for
clustering genotypes or sites into groups with statistically negligible COI (Cornelius
et al., 1992, 1993; Crossa and Cornelius, 1997, 2002; Crossa et al., 1993, 1995). Only the
SREG model permits the detection of COIs (Bernardo, 2002).
The SREG model has been used for grouping environments without genotypic rank change
(Crossa and Cornelius, 1997). The interaction parameters $\alpha_{ik}$ and $\gamma_{jk}$ of
these linear–bilinear models define the behaviour of the genotypes and the environments, and
when $\alpha_{i1}$, $\alpha_{i2}$ and $\gamma_{j1}$, $\gamma_{j2}$ are plotted together in
the bi-plot (Gabriel, 1978) useful interpretations of the relationships between genotypes,
environments and GEI are obtained. In the bi-plot, the interaction between the ith genotype
and the jth environment is obtained from the projection of either vector on to the other.
Crossa et al. (2002) used SREG1 analysis (a reduced SREG model) to examine GEI among nine
maize genotypes. The primary effects of the 20 environments ranged from −0.41 to 0.43 and,
consequently, the ranking of the nine genotypes differed among environments (Fig. 10.5).
The primary effect for a given environment depends on the other environments included in the
[Figure 10.5: two panels, "20 environments" and "Subset of 10 environments"; vertical axis: SREG1 predicted yield (t ha−1); horizontal axis: primary effect of environments.]
Fig. 10.5. Predicted yield from SREG1: analysis of nine maize genotypes. Data from Crossa et al. (2002)
courtesy of R. Bernardo (2002).
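Why the sign of the environmental primary effects matters can be shown with a small SREG1 sketch. Under the one-term model the predicted yield is $\hat{y}_{ij} = \mu_j + \lambda\alpha_i\gamma_j$, so the difference between two genotypes in environment j is $\lambda(\alpha_i - \alpha_{i'})\gamma_j$ and its sign flips with the sign of $\gamma_j$. All parameter values below are hypothetical and chosen only for illustration; they are not estimates from Crossa et al. (2002).

```python
# Hypothetical SREG1 parameters (illustrative only, not fitted values).
lam = 3.0                                              # singular value
alpha = {"A": -0.6, "B": 0.1, "C": 0.5}                # genotype primary effects
gamma = {"env1": -0.41, "env2": 0.10, "env3": 0.43}    # environment primary effects
mu = {"env1": 4.0, "env2": 5.5, "env3": 6.5}           # environment means

def predicted(genotype, env):
    """SREG1 prediction: environment mean plus one multiplicative term."""
    return mu[env] + lam * alpha[genotype] * gamma[env]

def ranking(env):
    """Genotypes ordered from highest to lowest predicted yield in one environment."""
    return sorted(alpha, key=lambda g: predicted(g, env), reverse=True)

def has_crossover(env_effects):
    """Rank changes are possible only if the primary effects differ in sign."""
    values = list(env_effects.values())
    return any(v < 0 for v in values) and any(v > 0 for v in values)

for env in gamma:
    print(env, ranking(env))
print("crossover interaction possible:", has_crossover(gamma))

# Keeping only environments with positive primary effects removes the rank change.
subset = {e: g for e, g in gamma.items() if g > 0}
print("within positive subset:", has_crossover(subset))
```

In this toy case the ranking in env1 is the reverse of that in env2 and env3, which mirrors how a subset of environments with same-signed primary effects shows no COI.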
analysis. A subset of ten environments, in which the resulting primary effects are all
positive and COIs were absent, was found (Fig. 10.5).
Gabriel (1978) described the least square fit of Eqn 10.4 and explained how the residual
matrix of the GEI term, $Z = y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}$, is
subjected to a singular value decomposition (SVD) after adjusting the additive (linear)
terms. Zobel et al. (1988) and Gauch (1988) called Eqn 10.4 additive main effects and
multiplicative interaction (AMMI) and proposed a cross-validation procedure for determining
the number of important bilinear components. The AMMI model separates the multiplicative
portion of GEI into specific patterns of response of genotypes and environments. In this
analysis the information about GEI after taking out the main effects of genotypes and
environments from METs is used for PCA to extract patterns of GEI or residual variation to
understand the underlying causes of such interactions. It is thus a combination of ANOVA
and the PCA.
In AMMI analysis, the least square estimates of the parameters along with mean values of
genotypes and environments are interpreted to classify genotypes and environments for their
stability. A bi-plot is developed by placing both genotype and environment means on the
x-axis and placing the first PCA scores of the genotypes and environments on the y-axis.
This bi-plot can be used to facilitate identification of any pattern of GEI, i.e. specific
interactions of individual genotypes and environments based on the sign and magnitude of
PCA values especially,
property of the bi-plot is that the rank 2 approximation of any entry in the original matrix Z can be computed by taking the inner product of the corresponding genotype and environment vectors, i.e. $(\lambda_1^{f}\alpha_{i1}, \lambda_2^{f}\alpha_{i2})(\lambda_1^{1-f}\gamma_{j1}, \lambda_2^{1-f}\gamma_{j2})' = \lambda_1\alpha_{i1}\gamma_{j1} + \lambda_2\alpha_{i2}\gamma_{j2}$. This is known as the inner-product property of the bi-plot.
The GGE bi-plot methodology consists of a set of bi-plot interpretation methods, whereby important questions regarding genotype evaluation and test-environment evaluation can be visually addressed. Within a single mega-environment, cultivars should be evaluated for their mean performance and stability across environments.
Figure 10.6 is the Average Environment Coordination (AEC) view, which is based on genotype-focused singular value partitioning (SVP), that is, the singular values are entirely partitioned into the genotype scores (GGE bi-plot option SVP = 1). This AEC view with SVP = 1 is also referred to as the Mean versus Stability view because it facilitates genotype comparisons based on mean performance and stability across environments within a mega-environment. Since GGE represents G + GEI and since the AEC abscissa approximates the genotypes' contributions to G, the AEC ordinate must approximate the genotypes' contributions to GEI, which is a measure of their stability or instability. Thus, G4 in the figure was the most stable genotype, as it was located almost on the AEC abscissa and had a near-zero projection on to the AEC ordinate. This indicates that its rank was highly consistent across environments within this mega-environment. In contrast, G17 and G6 were two of the least stable genotypes with above average mean performance.
Several recent articles reviewed and compared the two statistical approaches discussed above, AMMI and GGE bi-plot analyses. For their pros and cons, readers are referred to Gauch (2006), Yan et al. (2007) and Gauch et al. (2008).
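The double-centring, SVD and inner-product property described above can be sketched numerically. This is a minimal illustration on made-up yields, not the procedure of any cited paper; the function names `ammi_decompose` and `biplot_scores` and the simulated 9 × 20 table are assumptions introduced here.

```python
import numpy as np


def ammi_decompose(y):
    """Double-centre a genotype-by-environment table and take its SVD."""
    y = np.asarray(y, dtype=float)
    # Z = y_ij - mean_i. - mean_.j + mean_.. (additive main effects removed)
    z = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + y.mean()
    alpha, lam, gamma_t = np.linalg.svd(z, full_matrices=False)
    return z, alpha, lam, gamma_t.T


def biplot_scores(alpha, lam, gamma, f=0.5, n_axes=2):
    """Split each singular value between genotype (lam**f) and environment (lam**(1-f)) scores."""
    geno = alpha[:, :n_axes] * lam[:n_axes] ** f
    env = gamma[:, :n_axes] * lam[:n_axes] ** (1.0 - f)
    return geno, env


# Hypothetical data: 9 genotypes (rows) in 20 environments (columns).
rng = np.random.default_rng(0)
yields = rng.normal(loc=6.0, scale=1.0, size=(9, 20))

z, alpha, lam, gamma = ammi_decompose(yields)
rank2 = (alpha[:, :2] * lam[:2]) @ gamma[:, :2].T  # rank-2 approximation of Z

# Inner-product property: the same rank-2 approximation for any partition f.
for f in (0.0, 0.5, 1.0):
    geno, env = biplot_scores(alpha, lam, gamma, f=f)
    assert np.allclose(geno @ env.T, rank2)
```

Whatever value of f is chosen, the product of genotype and environment scores is unchanged; f only decides whether distances among genotypes (f = 1) or among environments (f = 0) are displayed faithfully.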
[Figure 10.6: GGE bi-plot showing genotypes G1 to G18 and environments E1 to E9; vertical axis: PC2.]
Fig. 10.6. The mean versus stability view of the GGE bi-plot based on a subset of the G × E data in Table 10.1. The data were not transformed (Transform = 0), not scaled (Scaling = 0), and were environment-centred (Centering = 2). The bi-plot was based on genotype-focused singular value partitioning (SVP = 1) and therefore is appropriate for visualizing the similarities among genotypes. It explained 79.5% of the total G + GE for the subset. From Yan et al. (2007) with permission.
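The settings named in the caption (environment-centred data, SVP = 1) and the AEC reading of the bi-plot can be sketched as follows. This is a rough illustration on simulated data, not the GGE bi-plot software itself; `gge_biplot`, `aec_coordinates` and the simulated 18 × 9 table are names and data assumed here, and the AEC axis is taken to run from the bi-plot origin through the mean of the environment scores.

```python
import numpy as np


def gge_biplot(y, n_axes=2):
    """Environment-centred SVD with genotype-focused partitioning (SVP = 1)."""
    y = np.asarray(y, dtype=float)
    gge = y - y.mean(axis=0, keepdims=True)  # remove environment (column) means: G + GE remains
    u, s, vt = np.linalg.svd(gge, full_matrices=False)
    geno_scores = u[:, :n_axes] * s[:n_axes]  # singular values go entirely to genotypes
    env_scores = vt[:n_axes].T
    explained = float((s[:n_axes] ** 2).sum() / (s ** 2).sum())
    return geno_scores, env_scores, explained


def aec_coordinates(geno_scores, env_scores):
    """Project genotypes on the average-environment axis (abscissa) and its normal (ordinate)."""
    avg_env = env_scores.mean(axis=0)
    length = np.linalg.norm(avg_env)
    direction = avg_env / length
    perpendicular = np.array([-direction[1], direction[0]])
    return geno_scores @ direction, geno_scores @ perpendicular, length


# Hypothetical data: 18 genotypes (rows) by 9 environments (columns).
rng = np.random.default_rng(1)
g_effect = rng.normal(0.0, 1.0, size=(18, 1))
e_effect = rng.normal(0.0, 1.0, size=(1, 9))
y = 6.0 + g_effect + e_effect + rng.normal(0.0, 0.5, size=(18, 9))

geno_scores, env_scores, explained = gge_biplot(y)
abscissa, ordinate, length = aec_coordinates(geno_scores, env_scores)

most_stable = int(np.argmin(np.abs(ordinate)))  # smallest projection on the AEC ordinate
print(f"G + GE explained by two axes: {explained:.1%}; most stable row index: {most_stable}")
```

With SVP = 1, the AEC abscissa multiplied by the length of the average-environment vector equals each genotype's mean in the rank-2 approximation, which is why the abscissa is read as mean performance and the ordinate as (in)stability.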
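$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_s \end{bmatrix}
=
\begin{bmatrix} \mathbf{1}\mu_1 \\ \mathbf{1}\mu_2 \\ \vdots \\ \mathbf{1}\mu_s \end{bmatrix}
+
\begin{bmatrix}
Z_{R_1} & 0 & \cdots & 0 \\
0 & Z_{R_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & Z_{R_s}
\end{bmatrix} r
+
\begin{bmatrix}
Z_{G_1} & 0 & \cdots & 0 \\
0 & Z_{G_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & Z_{G_s}
\end{bmatrix} g
+ e \qquad (10.6)
$$

$$
\begin{bmatrix} r \\ g \\ e \end{bmatrix}
\sim N\left(
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
R = \mathrm{var}(\text{rep}) & 0 & 0 \\
0 & G = \mathrm{var}(\text{genotype}) & 0 \\
0 & 0 & E = \mathrm{var}(\text{error})
\end{bmatrix}
\right) \qquad (10.7)
$$

$$
G = \mathrm{var}(\text{genotype}) = \Sigma_g \otimes I_g
$$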
where the jth diagonal element of the $s \times s$ matrix $\Sigma_g$ is the genetic variance $\sigma^2_{g_j}$ within the jth site and the ijth element is the genetic covariance $\rho_{ij}\sigma_{g_i}\sigma_{g_j}$ of genotypic effects in sites i and j; thus $\rho_{ij}$ is the correlation of genotypic effects in sites i and j. The hypothesis of interest is that the $\rho_{ij}$ for pairs of sites within subsets of sites (or genotypes) previously identified as non-COI subsets by SHMM or SREG clustering are all unity (Crossa et al., 2004).
The approach proposed by Crossa et al. (2004) is a step in the right direction for incorporating the linear mixed model methodology into the quantification of COI. However, detection of COI using SREG (and SHMM) has generally been done within the fixed effect linear–bilinear framework (Cornelius and Seyedsadr, 1997), that is, the differences between any two genotypic effects in any two environments are linear functions of least squares solutions for model parameters regarded as fixed. Recently, Yang (2007) recognized that in statistical analyses of METs, either genotypes or environments, or both, should be considered as random effects and, therefore, the detection of COI must consider that the difference between genotypic effects in a random environment is a predictable function that involves Best Linear Unbiased Estimators (BLUEs) as well as Best Linear Unbiased Predictions (BLUPs).
As a development of Crossa et al.'s (2004) approach, Burgueño et al. (2008) presented an integrated methodology for clustering environments and genotypes with negligible COI based on results obtained from fitting FA to MET data, which was used to detect COI using predictable functions based on the linear mixed model with FA and BLUP of genotypes.
For the genotype factor, the identity matrix $I_g$ (of order g), described above, is used when it is assumed that the genotypes are not related and the breeding value of each genotype will be predicted only by the value of the empirical responses of the genotype itself. The genotypic component of G can be modelled by the identity matrix $I_g$, which assumes no relationship among genotypes.
The genetic environmental component ($\Sigma_g$) of the variance of the random effect vector, g, can be modelled by the FA, which expresses the random effect of the ith genotype in the jth environment as a linear function of latent variables $x_{ik}$ with coefficients $\delta_{jk}$ for k = 1, 2, …, t, plus a residual $\eta_{ij}$. Then

$$
G = (\Delta\Delta' + \Psi) \otimes I_g = FA(k) \otimes I_g \qquad (10.11)
$$

where

$$
\Delta =
\begin{bmatrix}
\delta_{11} & \delta_{12} & \cdots & \delta_{1t} \\
\delta_{21} & \delta_{22} & \cdots & \delta_{2t} \\
\vdots & \vdots & \ddots & \vdots \\
\delta_{s1} & \delta_{s2} & \cdots & \delta_{st}
\end{bmatrix}
$$

is a matrix of order $s \times t$ with the kth column containing the environment loadings for the kth latent factor (k = 1, …, t). The FA model can be interpreted as the linear regression of genotype and GEI on latent environmental covariates (environmental loadings, $\delta_{jk}$), with each genotype having a separate slope (genotypic scores, $x_{ik}$) but a common intercept (if main effects of genotypes are not distinguished from GEI). The slopes of genotypes measure the sensitivity of the genotypes to hypothetical environmental factors represented by the loadings of each environment.
As an example, two CIMMYT maize international METs were used to illustrate the method for searching for subsets of environments and genotypes with negligible COI. Results from both data sets showed that the proposed method formed subsets of environments and/or genotypes with negligible COI. The main advantage of the integrated approach is that one unique linear mixed model, the FA model, can be used for: (i) modelling the association among environments; (ii) forming subsets of environments without COI; (iii) grouping genotypes into non-COI subsets; and (iv) detecting COI using the appropriate predictable function.
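The factor analytic structure of Eqn 10.11 can be made concrete with a small numerical sketch. The loadings and specific variances below are invented for illustration, and `fa_genetic_covariance` is a hypothetical helper, not part of any mixed model package; in practice these parameters would be estimated by fitting the linear mixed model.

```python
import numpy as np


def fa_genetic_covariance(delta, psi_diag, n_genotypes):
    """Build Sigma_g = Delta Delta' + Psi and G = Sigma_g (kron) I_g for unrelated genotypes."""
    delta = np.asarray(delta, dtype=float)            # s x t matrix of environment loadings
    psi = np.diag(np.asarray(psi_diag, dtype=float))  # s x s diagonal matrix of specific variances
    sigma_g = delta @ delta.T + psi                   # genetic (co)variances among sites
    sd = np.sqrt(np.diag(sigma_g))
    corr = sigma_g / np.outer(sd, sd)                 # rho_ij, genetic correlations between sites
    big_g = np.kron(sigma_g, np.eye(n_genotypes))     # identity I_g: no relationship among genotypes
    return sigma_g, corr, big_g


# Hypothetical values: s = 3 sites, t = 1 latent factor, g = 4 genotypes.
delta = [[1.0], [0.8], [-0.5]]
psi_diag = [0.2, 0.3, 0.1]
sigma_g, corr, big_g = fa_genetic_covariance(delta, psi_diag, n_genotypes=4)

print(np.round(corr, 2))
```

In this made-up example the third site has a loading of opposite sign, so its genetic correlation with the other two sites is negative; site pairs with correlations far from unity are the ones where crossover interaction would be expected under the hypothesis stated above.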
The multivariate approaches help to identify patterns of variation for genotypes on the basis of their multi-dimensional response to many environments. Such grouping, however, does not represent any specific type of stability index, but it can be used to draw meaningful conclusions in relation to the response of a cultivar that has been known for its stability. All the new genotypes grouped along with a well-known cultivar or falling near it are considered to have the same type of stability, since groups of genotypes that are known for their stability have been quite stable over years. Along with these known cultivars, a large number of new lines can be screened for their stability from 1-year experiments.
In general the multivariate methods do not provide a simple measure of stability for a specific genotype which could be used as a trait in breeding programmes. A detailed description of all such techniques is beyond the scope of this book and the reader is encouraged to consult Freeman (1973), Kang (1990) and Fox et al. (1997). Commercial statistical software packages such as SAS can be used for different models described in this section. For example, mixed models can be fitted using SAS PROC MIXED.

10.4 Molecular Dissection of GEI

Recent advances in molecular biology have provided some of the best tools for obtaining insights into the molecular mechanisms associated with GEI. Molecular markers can be employed to find genomic regions with stable responses. Marker-assisted QTL-by-environment interaction (QEI) analysis will ultimately provide a better genetic understanding and possible regulation of this phenomenon. Regions of plant genomes that provide stable responses across diverse environments can be identified. Experimental strategies have been proposed for resolving environmental factors into several components that affect specific quantitative traits so their effects can be either estimated or controlled (Xu, Y., 2002). Various statistical methods have been developed for mapping quantitative trait loci (QTL) involved in GEIs. In this section, genetic models involving GEI and molecular dissection of GEI will be discussed.

10.4.1 Partition of environmental factors

METs involve various environmental factors and some of them can be partitioned into several key components. Environment partition can be used to understand the effect of each environmental component, the response of a genotype to specific environmental factors and the genetic control of environment-dependent traits such as temperature or photoperiod-induced male sterility.
Genetic analysis in general involves extracting a genetic signal from many sources of noise, such as those from external environments and internal genetic backgrounds. For accurate genetic analysis, the noise must be minimized or eliminated. Controlled environments or genetic backgrounds are usually created for filtering the noise. In Chapter 4, we described the development of a set of individuals such as near-isogenic lines (NILs) that have homogeneous genetic background. Similarly, Xu, Y. (2002) proposed the concept of near-iso environments (NIEs). Plant populations used for genetic analysis can be evaluated in either natural or controlled environments or both. Controlled environments can be compared with each other or with natural environments. If two environments mainly differ in one macro-environmental factor, they are considered contrasting or NIEs, if the standard plot-to-plot variation and other residual micro-environmental effects can be neglected. A relative trait value is then derived from two direct trait values measured in each environment to ascertain the sensitivity of plants to the stress (see for example Ni et al., 1998).
Xu, Y. (2002) provided an example of how rice plants respond to photoperiod and temperature. Using Zhaiyeqing 8/Jingxi 17
doubled haploids (DHs), days-to-heading (DTH) and photo-thermo sensitivity (PTS) were measured in two environments (Beijing and Hangzhou) that mainly differ in day-length and temperature. At the photo-thermo sensitive stage, Beijing has long day-length (14.5–15 h) and low temperature (20–27°C), whereas Hangzhou has short day-length (13–13.5 h) and high temperature (25.5–30°C). Rice is considered a short-day plant and development from vegetative to reproductive stages is promoted under short day-length and high-temperature conditions. Differences in photoperiod and temperature in the two locations resulted in differences in DTH of 0–39 days for individual DH lines (Fig. 10.7). Using the relative difference, ((DTH in Beijing − DTH in Hangzhou)/DTH in Beijing × 100), genes associated with PTS were mapped with 155 restriction fragment length polymorphism (RFLP) and 92 simple sequence repeat (SSR) markers. Four chromosomal regions were identified as significantly associated with DTH in either or both locations, whereas logarithm of odds (LOD) scores for the PTS in these regions were much lower than the threshold. A region on chromosome 7 (G397A–RM248) was significantly associated with PTS (LOD = 4.47), where LOD scores for DTH in both locations were much lower than the threshold (Fig. 10.7), indicating that this PTS QTL is independent of the QTL for heading date. As the rice breeding programme has been accelerated by growing rice in an off-season or an off-location where it is not a targeted environment, marker-assisted selection (MAS) for these types of traits would be important as they can only be identified under NIEs.
A second example in rice is from CO39/Moroberekan recombinant inbred lines (RILs), grown under greenhouse conditions and exposed to two different photoperiod regimes (Maheswaran et al., 2000). Days-to-flowering (DTF) of individual lines was evaluated under 10-h and 14-h day-lengths and loci associated with photoperiod sensitivity were identified based on the delay in flowering under the 14-h photoperiod
[Figure 10.7. Left panels: histograms of number of DH lines against days-to-heading (DTH) for Beijing (BJ) and Hangzhou (HZ). Top right panel: histogram of number of DH lines against photo-thermo sensitivity (PTS). Bottom right panel: LODs for DTH and PTS, tabulated below.]

LODs for DTH and PTS
Chr   Marker interval   DTH(BJ)   DTH(HZ)   PTS
1     RG400–RM84        2.41*     1.68      0.87
7     G379A–RM248       1.21      0.56      4.47*
8     RG885–RM44        7.35*     6.56*     1.07
10    C16–RM228         2.67*     3.04*     0.41
12    RG463–RG323       2.30      2.55*     0.82
Fig. 10.7. QTL mapping for photo-thermo sensitivity (PTS) in rice under two environments (Beijing and
Hangzhou). Left: Days-to-heading (DTH) distribution in Zhaiyeqing 8/Jingxi 17 DH population planted
in Beijing and Hangzhou. Top right: PTS distribution in the population when PTS was measured by the
difference of DTHs in the two environments divided by the DTH in Beijing. Bottom right: QTL identified for
DTH in Beijing and Hangzhou and for PTS (*LOD > 2.4). Modified from Xu, Y. (2002). Chr, chromosome.
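The PTS calculation and the LOD threshold used in Fig. 10.7 can be checked with a few lines of code. The LOD values are those listed in the figure; the function name `photo_thermo_sensitivity` and the example DTH values of 100 and 80 days are assumptions made here for illustration only.

```python
LOD_THRESHOLD = 2.4  # significance threshold stated in the caption of Fig. 10.7


def photo_thermo_sensitivity(dth_beijing, dth_hangzhou):
    """Relative difference in days-to-heading between the two sites, in per cent."""
    return (dth_beijing - dth_hangzhou) / dth_beijing * 100.0


# LOD scores from Fig. 10.7: chromosome -> (DTH in Beijing, DTH in Hangzhou, PTS).
lod_scores = {
    1: (2.41, 1.68, 0.87),
    7: (1.21, 0.56, 4.47),
    8: (7.35, 6.56, 1.07),
    10: (2.67, 3.04, 0.41),
    12: (2.30, 2.55, 0.82),
}

# Regions significant for DTH in either or both locations, and regions significant for PTS.
dth_regions = [c for c, (bj, hz, _) in lod_scores.items() if max(bj, hz) > LOD_THRESHOLD]
pts_regions = [c for c, (_, _, pts) in lod_scores.items() if pts > LOD_THRESHOLD]

# Hypothetical DH line heading after 100 days in Beijing and 80 days in Hangzhou.
example_pts = photo_thermo_sensitivity(100.0, 80.0)

print(dth_regions, pts_regions, example_pts)
```

Applying the threshold this way reproduces the pattern described in the text: four regions are significant for DTH only and the chromosome 7 region is significant for PTS only.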
(DTF at 14 h − DTF at 10 h). In total, 15 QTL were associated with DTF. Only four of them were also identified as influencing response to photoperiod. None of these QTL is allelic to the PTS QTL on chromosome 7.
Genetic mapping performed on environmental sensitivity has provided much better quantitative evaluations of QEI and has been used successfully to investigate plasticity and GEI for agriculturally relevant traits in animals and plants.

10.4.2 QTL mapping across environments

Do genes function similarly in different environments? The answer is no. Phenotypic expression of quantitative traits is affected by external environmental factors such as day-length, temperature, moisture and soil conditions, which can greatly modify the phenotype of quantitative traits. In many cases, external environments act as a regulator of expression of the traits. It was found that when the same mapping population was phenotypically evaluated in different environments, some QTL could be detected in all environments tested but others could be detected only in some of them (Paterson et al., 1991; Stuber et al., 1992; Lu et al., 1996; Veldboom et al., 1996), indicating that QTL detection depends on the specific environment. These QTL can be defined as environment-dependent (sensitive) QTL. For improving mapping power and efficient QTL cloning, therefore, the specific conditions highly suitable for expression of the quantitative trait of interest should be identified. The results from QTL mapping across multiple environments provide some evidence for QEIs, in addition to the information of how QTL detection depends on the environments and which traits are more environment-dependent.
QTL can be studied under adverse environments (abiotic stress), NIEs or a uniform environment by replicating DH or RIL populations and splitting tillers or ratooning a segregating population. Xu, Y. (2002) summarized the QTL mapping experiments in rice that have been done in two or more environments by using permanent populations. For the convenience of comparison, rice QTL mapped in two environments were selected for sharing analysis (Table 10.5). A total of 159 QTL was identified in ten QTL mapping reports for 11 categories of quantitative traits. For different traits, QTL-sharing frequencies between the two
Table 10.5. Comparison of QTL mapped in two environments using the same populations in rice (Xu, Y., 2002).
environments ranged from 9.5% for drought tolerance to 52.9% for 1000-grain weight and, for all traits, on average, 46 (30%) of them are shared or are common between the two environments. For all shared QTL, the mean variance explained is 16.7%, whereas for the unshared QTL, it is 10.9%. QTL with large effect (higher proportion of the variance explained) are shared more frequently. Major-gene-related QTL (for flooding tolerance and paste viscosity) had the highest QTL-sharing frequencies. When compared across three or more environments, QTL-sharing frequencies become lower. For example, a total of 22 QTL for six agronomic traits were identified in Zhaiyeqing 8/Jingxi 17 DHs, only seven of which were shared in all three tested environments (Lu et al., 1996). In three trials using Tesanai 2/CB F2 and its two equivalent F3s, eight QTL were identified, two of which were detected in all three trials (Zhuang et al., 1997). In another report, three of 11 QTL identified for leaf rolling were shared in the three trials with different drought-stress intensities (Courtois et al., 2000).
With grain yield and test weight evaluated in four trials and for grain yield components evaluated in eight trials, Blanco et al. (2001) detected a total of 52 QTL in durum wheat that were significant in at least one environment at P < 0.001 or in at least two environments at P < 0.01. Paterson et al. (2003) described the impact of well-watered versus water-limited growth conditions on the genetic control of fibre quality, a complex suite of traits that collectively determine the utility of cotton. Fibre length, length uniformity, elongation, strength, fineness and colour (yellowness) were influenced by 6, 7, 9, 21, 25 and 11 QTL, respectively, that could be detected in one or more treatments. The genetic control of cotton fibre quality was markedly affected both by general differences between growing seasons (years) and by specific differences in water management regimes. Seventeen QTL were detected only in the water-limited treatment while only two were specific to the well-watered treatment.
Inconsistent QTL detection across environments may also be the result of QEI, which presumably represent the genetic factors underlying the GEI observed in line-based phenotypes (Beavis and Keim, 1996). QEI has been predicted by comparing the QTL detected separately in different environments in many crops. That a QTL can be detected in one environment but not in others, as discussed earlier, could result from experimental noise, sampling error or experimental error and thus does not necessarily indicate QEI. As indicated by Jansen et al. (1995), the chance for simultaneous detection of QTL in multiple environments is small. On the other hand, sharing QTL among environments does not necessarily mean lack of QEI. This is supported by the fact that QEI was identified for some shared QTL by incorporating QEij into QTL analysis (e.g. Yan et al., 1999) and by the fact that QTL effects estimated across environments could be very different.

10.4.3 QTL mapping with incorporated GEI

There are two approaches for the analysis of QEIs (Leflon et al., 2005). The first approach deduced interactions by comparing QTL detected separately in different environments as described in the previous section: in many cases, an interaction was merely detected and no estimate made of the interaction itself. In other cases, QEIs were assessed by co-localization between QTL detected for the main effect and QTL detected for stability statistics (Emebiri and Moody, 2006). The second approach takes interaction effects into account in the analysis of multi-environment trials by introducing QTL main effects and QEI effects, like studies of GEI (see, for instance, Crossa et al., 1999; Campbell et al., 2003, 2004; Groos et al., 2003). These methods are powerful but a large number of environmental measurements is necessary for their application.
With data collected from multiple location trials on a core set of genotypes, GEI can be detected by ANOVA and various statistical procedures that measure genotype stability (Lin et al., 1986; Kang, 1993) as described in previous sections. To determine genetic
factors responsible for GEI, QEI can be evaluated on the basis of agronomic data collected on a mapping population in multiple location trials and comparing QTL detection across environments by ANOVA to test marker locus × environment interactions. More recent efforts in QTL mapping involving GEI have proven far more effective, largely due to the incorporation of a QEI component by integrating this interaction component into actual mapping algorithms (Jiang and Zeng, 1995; Wang, D.L. et al., 1999).
To analyse GED produced in METs, various statistical models have been proposed that differ in the extent to which additional genetic, physiological and environmental information is incorporated into the model formulation. The simplest model is the additive two-way ANOVA model, without GEI and with parameters whose interpretation depends strongly on the set of included genotypes and environments. The most complicated model is a synthesis of a multiple QTL model and an eco-physiological model to describe a collection of genotypic response curves. Among these models, factorial regression models allow direct incorporation of explicit genetic, physiological and environmental co-variables on the levels of the genotypic and environmental factor. They are also very suitable for the modelling of QTL main effects and QEI (van Eeuwijk et al., 2005).
In the framework of factorial regression, modelling of QEI is a natural extension of modelling main effect QTL, i.e. QTL that are supposed to have constant expression across environments. A model with a QTL main effect and QEI at the same location in the genome can be written as

$$
\mu_{ij} = \mu + x_i\rho + G_i^* + E_j + x_i\rho_j + (GE)^*_{ij} \qquad (10.12)
$$

The $(GE)_{ij}$ from the ANOVA model is partitioned into a part due to differential QTL expression, $x_i\rho_j$, and a residual, $(GE)^*_{ij}$, that is usually taken to be random and for that reason then disappears from the expression for the expectation. In the light of QEI, the parameter $\rho_j$ adjusts the average QTL expression across environments, $\rho$, to a more appropriate level for the individual environment j. The QEI parameters, $\rho_j$, can themselves be regressed on an environmental co-variable, z, in an attempt to link differential QTL expression directly to key environmental factors. The QEI term $x_i\rho_j$ is replaced by a regression term $x_i(\lambda z_j)$ and a residual term $x_i\rho^*_j$, that again disappears from the expectation when $\rho^*_j$ is assumed to be random. The parameter $\lambda$ is a proportionality constant that determines the extent to which a unit change in the environmental co-variable, z, influences the effect of a QTL allele substitution.

Mixed models

Using mixed models, several papers have been dedicated to the incorporation of the genetic basis of GEI: differential expression of QTL in relation to changing environmental conditions, or QEI. Early work on QEI was done by Jansen et al. (1995), Jiang and Zeng (1995) and Korol et al. (1998), who used a mixed model approach. Regression-based approaches were presented by Sari-Gorla et al. (1997), Caliński et al. (2000), Hackett et al. (2001) and van Eeuwijk et al. (2001, 2002). Piepho (2000) and Verbyla et al. (2003) presented other relevant work on QEI. These authors developed QTL mapping methods for the analysis of METs using mixed model theory, thereby giving special attention to the modelling of heterogeneity of variance across environments and correlations between environments, where the latter correlations may be due to undetected QTL.
Jansen et al. (1995) developed an analytic approach, multiple QTL mapping, which accommodates both the mapping of multiple QTL and GEI. This approach was compared to interval mapping in the mapping of QTL for flowering time in Arabidopsis thaliana under various photoperiod and vernalization conditions. Procedures developed by Jiang and Zeng (1995) for estimating the effect of QTL for multiple traits can be used to test the significance of QEI.
A least squares interval mapping approach developed by Sari-Gorla et al. (1997) allows inclusion in the model of the parameters describing the experimental and environmental situation so that the QEI can
be tested. The analysis was performed on data concerning two components of maize pollen competitive ability, obtained from an experiment over 2 years. The method, in comparison with the traditional single marker approach, has been shown to be more powerful in detecting QTL and more precise in determining their map position. The analysis has identified QTL expressed across years, putative QTL with major effects and QTL accounting for GEI.
Piepho (2000) proposed a mixed model method to detect QTL with significant mean effect across environments and to characterize the stability of effects across multiple environments. He treated environment main effects as random, which meant that both environment main effects and QEI effects were random.
Verbyla et al. (2003) developed an approach for multi-environment QTL analysis. To accommodate a multi-environment analysis, the size of a QTL effect was assumed to be a random effect. The approach resulted in a multiplicative mixed model for QEI of the factor analytic type. The full genetic model may also include a factor analytic model for the residual GEI, whereas the environmental model for the non-genetic variation involves local, global and extraneous variation. The approach was used to determine QTL for yield in the Arapiles × Franklin DH population of the National Barley Molecular Marker Program.
Malosetti et al. (2004) presented a strategy for modelling QEI using mixed model methodology in combination with regression ideas. They proposed a simple interval mapping approach that consists of fitting along the genome a mixed model with both a fixed QTL main effect and a fixed QEI term and, for the random part (the residual genetic variation), a factor analytic model with one multiplicative term and residual heterogeneity. For chromosome positions with identified QTL expression and QEI, a second modelling step regresses the QEI on one or more environmental co-variables. To illustrate the approach, they analysed grain yield data stemming from the North American Barley Genome Project (NABGP) (http://barleyworld.org/NABGP.html). QEI for yield, as identified at the 2H chromosome could be described as QTL expression in relation to the magnitude of the temperature range during heading.

Factorial regression model

If climatic data are available for precipitation, temperature and solar radiation, factorial regression models (van Eeuwijk et al., 1996) and partial least squares models (Aastveit and Martens, 1986) can be used to determine the degree to which each of these factors influences GEI and QEI (Crossa et al., 1999). Hence, just as molecular markers are commonly used to model the effects of chromosomal segments (QTL) on a particular quantitative trait, climatic data can also be used to model particular aspects of the environment that contribute to the differential performance of genotypes across a range of testing environments. Using factorial regression models, Crossa et al. (1999) were the first to explain QEI and found that temperature differences across environments accounted for a large portion of the QEI detected in a tropical maize (Zea mays L.) mapping population. They showed how regression methods such as the partial least squares regression and the factorial regression models, together with genetic markers and environmental co-variables (such as maximum and minimum temperature and sun hours), could be used to: (i) detect relevant sets of correlated markers and environmental variables that explain a significant proportion of the total GEI; and (ii) study the influence of environmental variables on the expression of QTL with the objective of assessing and interpreting the QEI that accounts for GEI. Vargas et al. (2006) used factorial regression and partial least squares methods for mapping QTL and QEI for the CIMMYT maize drought stress programme.
Van Eeuwijk et al. (2001, 2002) extended the factorial regression models for GEI and QEI developed by Crossa et al. (1999) from the original marker-based regressions to interval mapping and composite interval mapping. The authors presented: (i) a randomization test for controlling the genome-wise error rate, following the logic introduced by
Churchill and Doerge (1994); and (ii) a par- complex relationships of GEI where many
tial least square (PLS) strategy to deal with traits function as not only dependent vari-
the problem of multi-collinearity among ables to be predicted by environmental and
multiple cofactors. The PLS strategy con- genetic factors, but also as independent pre-
sisted of: (i) taking all the markers outside dictor variables of other traits further down-
the chromosome being evaluated as cofac- stream (Dhungana et al., 2007). To use SEM
tors; (ii) regressing the phenotypic responses on this set of markers using multivariate PLS; (iii) calculating the fitted values for the phenotypic responses; and (iv) using the corrected phenotypic observations, i.e. the residuals from the PLS regression, in a simple interval mapping (SIM) procedure for the chromosome being evaluated.

Structural equation model

Most agronomically important traits are the result of a number of genetic, molecular and physiological mechanisms that affect the trait of interest either directly or indirectly through other intermediate traits. The GEI of each trait in the network among variables will be influenced, either directly or indirectly, by a number of QEI and GEI of other traits, which may, in turn, be influenced by other factors (Campbell et al., 2003). A single dependent variable quantitative approach cannot describe the complicated relationship between traits, QTL and environments where some traits function simultaneously as both dependent variables to be predicted by other genetic and environmental factors and as independent predictor variables of other traits.

Dhungana et al. (2007) developed a systematic approach for understanding GEI of complex interrelated traits by combining chromosome substitution lines that allowed studying the effects of genes on a single chromosome with a structural equation model (SEM) that approximated the complex process involving genes, environmental conditions and traits. Structural equation modelling is a generalization of path analysis proposed by Wright (1921) and is used to quantitatively analyse the causal structure among a number of variables where each may function as a dependent variable in some equations and an independent variable in others (Bollen, 1989). Because of this, SEM is ideal for characterizing the

to analyse GEI, prior knowledge of the direction of the causal relationships is assumed and specified through a path diagram and the model is then algebraically specified by a system of regression-type equations where each variable is adjusted to contain only GEI effects. A final model is then developed by fitting successive models and retaining significant QEI variables which result in a better fitting model. The final model yields path coefficients and a path diagram that contains only significant paths, thus giving insight into important relationships between traits, QTL and the environmental variables.

The approach was applied to recombinant inbred chromosome wheat lines grown in multiple environments. The final model explained 74% of the yield GEI variation and it was found that spikes per square metre GEI had the highest direct effect on yield GEI and that the genetic markers were mostly sensitive to temperature and precipitation during the vegetative and reproductive periods. In addition, a number of direct and indirect causal relationships were identified that described how genes interact with environmental factors to affect GEI of several important agronomic traits.

QEI mapping examples

There are numerous examples now available for QEI mapping using some of the approaches described above. Only a few examples will be discussed here to represent different approaches.

Romagosa et al. (1996) assessed AMMI's value in QTL mapping. This was done through the analysis of a large two-way table of GED of barley (Hordeum vulgare L.) grain yields. Grain yield data of 150 DHs derived from the Steptoe × Morex cross, and the two parental lines, were taken by the NABGP at 16 environments throughout the barley production areas of the USA and Canada.
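As a rough illustration of the kind of AMMI decomposition referred to here, the following Python sketch splits a two-way genotype-by-environment table of means into additive main effects and multiplicative interaction axes obtained by singular value decomposition. It is not the analysis used by Romagosa et al.; the function name `ammi`, the table size and the simulated yields are all hypothetical and chosen only for illustration.

```python
import numpy as np

def ammi(table, n_axes=2):
    """Split a genotype x environment table of means into AMMI terms."""
    y = np.asarray(table, dtype=float)
    grand = y.mean()
    g_eff = y.mean(axis=1) - grand            # genotype main effects
    e_eff = y.mean(axis=0) - grand            # environment main effects
    # Interaction residuals left after removing the additive main effects
    inter = y - grand - g_eff[:, None] - e_eff[None, :]
    # SVD of the interaction matrix gives the multiplicative (IPCA) axes
    u, s, vt = np.linalg.svd(inter, full_matrices=False)
    k = min(n_axes, len(s))
    g_scores = u[:, :k] * np.sqrt(s[:k])      # genotype scores per axis
    e_scores = vt[:k].T * np.sqrt(s[:k])      # environment scores per axis
    total = np.sum(s ** 2)
    explained = s[:k] ** 2 / total if total > 0 else np.zeros(k)
    return grand, g_eff, e_eff, g_scores, e_scores, explained

# Hypothetical data: 6 genotypes tested in 4 environments (simulated yields)
rng = np.random.default_rng(0)
yields = rng.normal(5.0, 1.0, size=(6, 4))

grand, g_eff, e_eff, g_scores, e_scores, explained = ammi(yields, n_axes=2)
# AMMI2 fitted values: additive part plus the first two interaction axes
fitted = grand + g_eff[:, None] + e_eff[None, :] + g_scores @ e_scores.T
print("share of interaction sum of squares per axis:", explained)
```

The sign and size of the genotype and environment scores on each axis are what allow inferences about specific interactions: a genotype and an environment with same-sign scores on an axis interact positively on that axis.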
Four regions of the genome were identified to be responsible for most of the differential genotypic expressions across environments. They accounted for approximately 50% of the genotypic main effect and 30% of the GEI sums of squares. The magnitude and sign of AMMI scores for genotypes and sites facilitate inferences about specific interactions. The parallel use of classification (cluster analysis of environments) and ordination (PCA of the GE matrix) techniques allowed most of the variation present in the GE matrix to be summarized in just a few dimensions, specifically four QTL showing differential adaptation to four clusters of environments.

An illustration of the uncertainties occurring when attempting to find specific genetic factors for yield is presented by Reyna and Sneller (2001). They attempted to introgress alleles for yield into elite soybean material in the southern soybean area of the USA. As variation was scarce, the authors tried to exploit beneficial alleles for yield identified as QTL in cv. Archer from the northern soybean area of the country (Orf et al., 1999). Reyna and Sneller (2001) built up four NILs for each QTL and tested them under different environmental conditions. They found that the QTL for yield identified in a particular cultivar and environmental condition did not contribute significantly to improved yields in a different genetic background and different environmental conditions. The authors concluded: 'It may be difficult to capture the value assigned to QTL alleles when the alleles are introgressed into populations with different genetic backgrounds, or when tested in different environments.'

Ungerer et al. (2003) examined inflorescence development patterns in Arabidopsis under different, ecologically relevant photoperiod environments for two RIL mapping populations (Ler × Col and Cvi × Ler) using a combination of quantitative genetics and QTL mapping. Plasticity and GEI were regularly observed for the majority of 13 inflorescence traits. These observations can be attributed (at least partly) to variable effects of specific QTL. Pooled across traits, 12/44 (27.3%) and 32/62 (51.6%) of QTL exhibited significant QEIs in the Ler × Col and Cvi × Ler lines, respectively. These interactions were attributable to changes in magnitude of effect of QTL more often than to changes in rank order (sign) of effect. Multiple QEIs (in Cvi × Ler) clustered in two genomic regions on chromosomes 1 and 5, indicating a disproportionate contribution of these regions to the phenotypic patterns observed.

By using factorial regression models, agronomic and molecular genotype data and three environmental covariates (daily mean temperature, precipitation and solar radiation) recorded in each test environment, Campbell et al. (2004) sought to: (i) detect which of these three environmental covariates may account for GEI by testing individual genotype × environmental covariate interactions; and (ii) detect marker × environmental covariate interactions that provide explanations of variable QTL genotypic differences across environments. Agronomic performance and molecular marker data available for a population of chromosome 3A recombinant inbred chromosome lines (RICLs-3A) in seven environments were used along with environmental covariate data to construct individual factorial regressions to explain GEI and QEI. Precipitation and temperature before anthesis had the greatest influence on agronomic performance traits for the RICLs-3A and explained a sizeable portion of the total GEI for those traits. Individual molecular marker × environmental covariate interactions explained a large portion of the total marker × environment interactions for several agronomic traits.

Laperche et al. (2007) used three methodologies to reveal QTL × nitrogen interactions: (i) QTL detected separately under both types of N supply; (ii) QTL detected for global interaction variables assessed as N+/N− and N−/N+; and (iii) QTL considered for factorial regression slope and ordinate parameters, which represent a plant's sensitivity to N stress and plant performance under a limited N supply. In total, 233 QTL were detected for the traits measured in each combination of environment and N supply (N+: high supply; N−: low
supply). Comparison of QTL detected under N+ and N− levels identified 13 non-specific QTL, eight N+ specific loci and seven N− specific loci. For QTL for global interaction variables, four adaptive loci were validated and eight constitutive loci were found to be involved in G × nitrogen interaction. Nine interactive loci were validated and three new loci detected using factorial regression variables.

10.4.4 Utilization of MET and genotypic data

While the international METs have been used effectively to exchange germplasm, there have been only limited analyses of the large data sets that they generate. Further, many of these analyses have focused on a specific international MET conducted in 1 year. Analyses which integrate the information from international METs across years have been attempted in only a few cases. The strength of these studies is that they integrate large quantities of data on spatial and temporal GEIs and provide a basis for identifying repeatable interactions. However, their weakness is that in most cases there is only a limited information base for explanation of these interactions. There are great opportunities for synergy between the statistical, genetical and biophysical modelling methodologies (Cooper and Hammer, 1996). The complete data set which contains genotype, phenotype and environment information for numerous genetics and breeding materials opens the door for comprehensive use of them in both genetics and plant breeding including GEI through genome-wide association mapping. In Chapter 6, we provided an example of the use of cultivars in METs and their phenotypic and genotypic data in linkage disequilibrium mapping (Crossa et al., 2007).

10.5 Breeding for GEI

How can a breeder deal with GEI? Eisemann et al. (1990) listed three ways of dealing with GEI in a breeding programme: (i) ignoring them, i.e. using genotypic means across environments even when GEI exists; (ii) avoiding them; or (iii) exploiting them. Kang (2002) discussed these three ways. Interactions should not be ignored when they are significant and of the crossover type. The second way of dealing with these interactions, i.e. avoiding them, involves minimizing the impact of significant interactions. One approach is to group similar environments (forming mega-environments) via a cluster analysis as discussed in previous sections. With environments being more or less homogeneous, genotypes evaluated in them would not be expected to show COIs. By clustering environments, potentially useful information may be lost. International research centres such as CIMMYT aim to identify maize and wheat genotypes with broad adaptation (i.e. stable performance across diverse environments) at many international sites. If the subgrouping is used to eliminate the environments that share the same factors and are identical to each other (redundant test environments), optimization of the environment sites will also help determine the broad adaptation by using as few environment sites as possible. The third approach encompasses stability of performance across diverse environments by analysing and interpreting genotypic and environmental differences. This approach allows researchers to select genotypes with consistent performance, identify the causes of GEI and provide the opportunity to correct the problem. When the cause for the unstable performance of a genotype is known, either the genotype can be improved by genetic means or a proper environment (inputs and management practices) can be provided to enhance its productivity.

The best approach for breeders and geneticists would be to understand the nature and causes of GEI and to try to minimize its deleterious implications and exploit its beneficial potential through appropriate breeding, genetic and statistical methodologies (Singh et al., 1999). Appropriate analyses of data can provide an opportunity for exploiting GEI through applied analytical methods, such as AMMI and GGE bi-plot, using
climatic factors to explain GEI, evaluating risk of production and optimizing allocation of land resources to various genotypes for selection in heterogeneous environments.

10.5.1 Breeding for resource-limited environments

Breeding for resource-limited environments is one of the major objectives for many international breeding programmes. Commonly the performance of a genotype in an environment is a function of the influence of many interacting factors, and environments differ in the type, intensity and timing of these challenges.

The performance level is important. Where productivity is low due to an overriding environmental limitation, it may be comparatively easy to accomplish improvement in performance through genetic and/or environmental change (Kang, 2002). A relatively simple genetic change may have quite a fundamental influence on performance and hence adaptation; for example, use of an early maturity, vernalization requirement or even a morphological character to avoid frost damage, genetic resistance to a specific disease, genetic tolerance of a nutritional disorder, etc. Equally, environmental modification to overcome the limitation may be possible. In these situations, the key to plant improvement is the recognition of the nature of the stress or challenge and of the adaptive response (Cooper and Byth, 1996).

In order to increase crop productivity through enhanced yield potential, heterosis, modified plant types, improved yield stability, gene pyramiding and exotic and transgenic germplasm, it is important to identify the factors that are responsible for GEI. Brancourt-Hulmel (1999) used crop diagnosis with the analysis of interaction by factorial regression in wheat. She provided an agronomic explanation of GEI and defined the responses or parameters for each genotype and each environment. Earliness at heading, susceptibility to powdery mildew and susceptibility to lodging were the major factors responsible for GEI. In the same study, factorial regression revealed that water deficits during the formation of grain number and N level were also associated with GEI.

To alleviate GEI concerns caused by stresses, breeders need to know as much about the various characteristics of genotypes as possible. They also need to characterize environments as fully as possible (Kang, 2002). Knowledge of soil characteristics and ranges of weather variables and stresses that plant materials will be exposed to is a prerequisite to exploiting the beneficial potentials of the genotypes and environments and to targeting appropriate cultivars to specific environments.

Although biotic stresses and interactions among them and/or with abiotic factors remain poorly understood, they have significant relevance to GEI in plants. Plants may respond to pathogen infection by inducing a long-lasting, broad-spectrum, systemic resistance to subsequent infections. Induced disease resistance has been referred to as physiological acquired immunity, induced resistance or systemic acquired resistance. Differences in insect and disease resistance among genotypes can be associated with stable or unstable performance. It is highly desirable to identify QTL for a complex trait that is expressed in a number of environments. Crossa et al. (1999) found that higher maximum temperature in low- and intermediate-altitude sites affected the expression of some QTL, whereas minimum temperature affected the expression of other QTL, in tropical maize.

10.5.2 Breeding for adaptation and stability

An understanding of the genetic basis of adaptation and stability and their physiological and environmental causes is of fundamental importance for understanding GEI, for assessing the association between phenotypic and genotypic values and for enhancing the selection of superior and stable genotypes. The presence of COIs has important implications for breeding
strategies that aim to improve either broad or specific adaptation or some combination of both components of adaptation.

The broad adaptation concept, the need to minimize GEIs (and maximize G), was successful for the rapid adoption of the seed-based technology of the Green Revolution. But is it appropriate for the green evolution? Cultivars must be diversified and matched with the diversity of pest systems to ensure effective and durable pest management. Genotypes will need to be matched with a less predictable water supply in the irrigated system in rice. Scientists may also need to match genotypes with radiation levels (again unpredictable) to address the challenge of increasing the yield by 50%.

When we look at plant adaptation, which exploits spatial GEI, are we interested in this primarily as a proxy for temporal GEIs to help ensure the stability of performance of our chosen genotypes over time? We need to have clarity in our objectives in pursuing this particular topic. Are we searching for adaptability per se in order to exploit technological spillover, or are we using spatial GEIs to ensure stability of performance over time for farmers using improved cultivars in particular locations? Tools like modelling and simulation can complement spatial genotype-by-environment experimentation and analysis aimed at addressing either adaptability or stability issues.

More commonly, however, there may be a range of responses to the environmental challenge and the same adaptive response can result from different challenges. In these circumstances, the factors influencing adaptation are multivariate, quantitative and complex and may vary in an undefined manner between different genotypes. Thus it is more difficult to recognize the nature of the challenge and explain the adaptive response. In advanced testing programmes where genotypes exhibit reasonably high levels of performance, relative differences in adaptation and the specific nature of the GEI become increasingly important in defining breeding objectives and strategies. However, where adaptation is a function of response to environments differing quantitatively in time and degree for a number of uncontrollable factors, analysis of genetic differences becomes complex (Cooper and Byth, 1996).

With a MET, a breeder can identify cultivars with specific adaptation as well as those with broad adaptation, which will not be possible from testing in a single environment. Broad adaptation provides stability against the variability inherent in an ecosystem, but specific adaptation may provide a significant yield advantage in particular environments as discussed in Section 10.1. MET makes it possible to identify cultivars that perform consistently from year to year (small temporal variability) and those that perform consistently from location to location (small spatial variability). There is a need for developing cultivars with broad adaptation to a number of diverse environments (adaptability) and a need for farmers to use new cultivars with reliable or consistent performance from year to year (reliability) (Evans, 1993). Genetic improvement for low-input conditions would require capitalizing on GEI, and slower or limited gains in low-input or stress environments suggested that conventional high-input management of breeding nurseries and evaluation trials might not effectively select genotypes with improved performance at low-input levels (Smith, M.E. et al., 1990). Because of the success in favourable environments, plant breeders have tried to solve the problems of poor farmers living in unfavourable environments by simply extending the same methodologies and philosophies applied to favourable, high-potential environments, without considering the possible limitations associated with the presence of a large GEI (Ceccarelli et al., 2001). Responses to selection under stress and non-stress environments and to selection at high- and low-input levels need to be compared theoretically and practically.

As indicated by Kang (2002), stability of cultivars would be enhanced if multiple resistances/tolerances to stress factors were incorporated into the germplasm used for cultivar development. If every cultivar (different genotypes) possessed equal resistance/tolerance to every major stress encountered
10.5.3 Measurement of GEI in breeding programmes

Measure interaction at intermediate growth stages

A crop is exposed to variable environmental factors throughout the developmental stages and the growing season. Generally, researchers investigate the causes of GEI for a quantitative trait such as yield that is phenotyped at the final harvest stages. To critically investigate GEI, one may need to record environmental variables and plant-growth measurements at specific time intervals throughout the growing season, as suggested by Xu (1997) for dynamic QTL mapping. This would help determine what effect, if any, the environmental variables from an earlier period had on GEI at intermediate stages and on the final yield. This may provide a better understanding of the dynamic development process of a quantitative trait.

Unbalanced data

Plant breeders often deal with unbalanced data. Searle (1987) classified unbalancedness as planned unbalanced data and missing observations. When a set of genotypes is grown in a specific set of environments, oftentimes a balanced data set (without any missing scores) is not possible, especially when a wide range of environments is used, or long-term trials are conducted. Hybrids/cultivars are continually replaced year after year. Also the number of replications may not be equal for all genotypes because experimental plots may be discarded for one reason or another. In such cases, plant breeders must deal with unbalanced data.

Researchers have used different approaches for studying GEI in unbalanced data (Kang, 2002). Usually environmental effects are considered as random and cultivar effects as fixed. Inference on random effects using least squares, in the case of unbalanced data, is not appropriate because information on variation among random
of gene networks. This framework, called the E(NK) model, is an extension of the NK gene network model that was introduced and used by Kauffman (1993) to study the behaviour of gene networks and their influences on organism development and evolutionary processes. When the E(NK) model is applied to the study of issues relevant to plant breeding processes, it allows for the property that the influence of a gene network on the expression of a trait can differ in varying environmental conditions. Thus, E identifies different environment-types within the context of a defined TPE, N identifies the different genes and K identifies the degree of connection between subsets of the total set of N genes, i.e. the gene network topology (Kauffman, 1993; Cooper et al., 2005). Thus, in the terminology of quantitative genetics the E(NK) model is a finite locus polygenic model that can be defined to include effects of epistasis and GEIs. The parentheses around the NK term are used to indicate that the N genes can interact in K different ways to determine the trait phenotype in E different environment-types. K = 0 indicates the N genes act independently in the model and a larger K indicates increased levels of interactions among the N genes.

Kauffman's landscape concept can be used in combination with the E(NK) model to examine how the shape of the phenotype landscape changes with the genetic architecture of a trait, as determined by changes in the levels of E, N and K. The simple additive finite locus model is defined by the case where E = 1 and K = 0, thus E(NK) = 1(N:0) (Fig. 10.8). As E and K are increased for a given level of N, the effects of the alternative alleles for the N genes become increasingly context dependent on the genotypes of other genes and on the range of environment-types in the target population of environments. Thus, context dependent effects of genes due to epistasis and GEI can be simulated (Cooper and Podlich, 2002). Building on the landscape metaphor (Fig. 10.8), it is
[Fig. 10.8 (labels only): E(N:K) landscapes; 0 < K < N − 1; K = N − 1; increasing epistasis; target population of environments.]
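The E(NK) idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation of Cooper and co-workers: loci are biallelic (0/1), each gene's contribution is drawn at random for every combination of its own allele and those of K partner genes, and a separate set of contributions is drawn for each environment-type. The class name `ENKModel` and the helper `allele_effect` are hypothetical.

```python
import numpy as np

class ENKModel:
    """Toy E(NK) landscape: E environment-types, N genes, K partners per gene."""

    def __init__(self, n_env, n_genes, k, seed=0):
        if not 0 <= k <= n_genes - 1:
            raise ValueError("K must lie between 0 and N - 1")
        rng = np.random.default_rng(seed)
        self.n_env, self.n_genes, self.k = n_env, n_genes, k
        # For each gene, choose K other genes whose alleles modify its effect
        self.partners = []
        for i in range(n_genes):
            others = np.array([j for j in range(n_genes) if j != i], dtype=int)
            self.partners.append(rng.permutation(others)[:k])
        # Assumption: one random contribution table per environment-type and gene
        self.tables = rng.random((n_env, n_genes, 2 ** (k + 1)))

    def phenotype(self, genotype, env):
        g = np.asarray(genotype, dtype=int)
        total = 0.0
        for i in range(self.n_genes):
            # Index the table by the gene's own allele and its partners' alleles
            index = int(g[i])
            for allele in g[self.partners[i]]:
                index = 2 * index + int(allele)
            total += self.tables[env, i, index]
        return total / self.n_genes

def allele_effect(model, background, gene, env):
    """Effect of substituting allele 1 for allele 0 at one gene."""
    hi = np.array(background, dtype=int)
    lo = np.array(background, dtype=int)
    hi[gene], lo[gene] = 1, 0
    return model.phenotype(hi, env) - model.phenotype(lo, env)

zeros, ones = [0, 0, 0, 0, 0], [1, 1, 1, 1, 1]

# 1(N:0): additive model, allele effects do not depend on the background
additive = ENKModel(n_env=1, n_genes=5, k=0)
add_a = allele_effect(additive, zeros, gene=2, env=0)
add_b = allele_effect(additive, ones, gene=2, env=0)

# K = N - 1 with two environment-types: effects depend on background and environment
rugged = ENKModel(n_env=2, n_genes=5, k=4)
rug_a = allele_effect(rugged, zeros, gene=2, env=0)
rug_b = allele_effect(rugged, ones, gene=2, env=0)
rug_env = allele_effect(rugged, zeros, gene=2, env=1)
print(add_a, add_b, rug_a, rug_b, rug_env)
```

With K = 0 the two additive effects agree, whereas with K = N − 1 the same substitution generally gives a different effect in each genetic background and each environment-type, which is the context dependence the text attributes to epistasis and GEI.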
One of the challenges in molecular breeding is to understand how thousands of gene products interact with each other to control development and the ability of an organism to respond to its environment. Gene isolation and its functional analysis are not only for development of functional markers but also for manipulating plants through genetic transformation. For sequenced plant species, identification of function for each gene has become a major focus in the era of functional genomics. For example, the Arabidopsis community has developed an initiative to empirically identify the function of all Arabidopsis genes by the year 2010.

To isolate and characterize all the genes in plants, it is important to first define what we mean by a gene. A gene was initially defined as the nucleic acid sequence that codes for a peptide. The definition now is extended to encompass many more features including the presence of gene families within a plant, alternative splicing, RNA that functions without translation into a protein and other confounding factors that together make a simple universal definition more difficult (Cullis, 2004). A gene, when defined as a transcribed (and translated) unit, is usually split into coding pieces (exons) that are separated by intervening sequences (introns) in the eukaryotic genomes.

All techniques for gene isolation exploit one or more of the four characteristics that define genes (Gibson and Somerville, 1993): they have a defined primary structure (sequence); they occupy a particular location within the genome; they encode an RNA with a particular expression pattern; and many genes encode protein or mRNA products with a defined function. Therefore, identification and functional analysis of genes can start at various points in the process of gathering information about genomes: they can be identified from their locations relative to closely linked markers on a genetic map, from their presence in populations of RNAs, from an analysis of genomic sequence data with gene finding programs, from comparisons with the genomic sequence data from related organisms, or from their disruption with the subsequent appearance of a phenotypic variant (Fig. 11.1; Cullis, 2004).

In a simple organism such as E. coli, one can readily isolate individual genes associated with a particular function. In a few Petri dishes one might select among millions of individuals to identify mutants in the function of interest, then artificially introduce wild-type (i.e. non-mutant) DNA into the mutants and identify those segments which restore normal function. Having a gene in hand, one might determine the sequence of genetic information comprising the gene,
[Fig. 11.1 panel labels: Genomic sequence; Gene discovery]
Fig. 11.1. From genomic sequence to gene function. Steps and experimental approaches that are used
in the functional annotation of the genome. MPSS, massively parallel signature sequencing; SAGE, serial
analysis of gene expression. Modified from Alonso and Ecker (2006).
the protein product encoded by this information, the function of the protein, the regulation of activity of the gene or protein by environmental factors and so on. Such elegant schemes for gene cloning are now being used for plants, but with significant challenges. Higher plants tend to have relatively large quantities of DNA in their genome and more non-coding DNA than coding DNA, making it difficult to identify particular genes of interest. Further, higher plants have relatively long generation times, months to years (versus a few minutes for E. coli). Based on phenotypes, one is seldom able to study millions of individuals or know enough about the function of a gene in order to isolate it from the many thousands of other genes in the organism. Many major genes have been cloned based on various approaches, but for traits affected by several genes or quantitative trait loci (QTL), the effects of any one are often partly masked by others and/or by environments. Thus, it is difficult to discern the effect of a single gene (QTL) by merely looking at the appearance (phenotype).

The main source of empirical information about gene function and structure has been the capture and characterization of mRNA transcripts (Fig. 11.1). A variety of high throughput methodologies have been successfully used for the model plant, Arabidopsis thaliana (Alonso and Ecker, 2006), including expressed sequence tags (ESTs) and full-length cDNA sequencing, whole genome tiling microarrays and gene-expression arrays, a massively parallel signature sequencing (MPSS) technique and serial analysis of gene expression (SAGE). Such methods provide information on gene splicing and transcribed units. The most widely used methods for isolating genes based on their functions involve protein purification, complementation of mutant phenotypes, positional cloning using genetic maps and mutagenesis-based gene identification. The major limitation to
Isolation and Functional Analysis of Genes 419
[Fig. 11.2 labels: paths A, B, C and D; inputs: peptide data, EST/mRNA data and full-length cDNA data; tracks: predicted versus actual coding sequences; legend: false exon/coding region (A); false meaningful non-coding sequence (B); exon/intron discrepancy; true meaningful non-coding sequence; missed feature.]
Fig. 11.2. Intrinsic (light grey arrows) and extrinsic (dark grey arrows) methods for gene prediction. Thick
black lines represent query sequence data. Watson–Crick-strand coding sequences are indicated above
or below sequence strands. Path (A), ab initio gene prediction algorithms model gene content with data
from the query sequence itself. These methods can miss features such as small ORFs and small introns.
Exons are also missed, but ab initio methods can erroneously identify exons or whole coding sequences.
Generally, these methods are not applicable to the prediction of functional non-coding sequences.
Path (B), similarity-based gene prediction. These methods are comparative and incorporate data from
the alignment of one or more syntenic DNA sequences. Similarity-based methods display improved
sensitivity and specificity for coding and non-coding sequences over ab initio methods. The ability to
predict genes or conserved features is a function of the number of sequences compared, the evolutionary
distance of these sequences, and the degeneracy and size of the features in the homologous sequences.
Path (C), evidence-based gene prediction. These methods can be computational or experimental and
display high specificity but low sensitivity. The efficacy of the prediction is contingent on the quality/extent
of available expression data. Path (D), combinatorial approaches. In the example presented, similarity
evidence is combined with an ab initio prediction to improve the overall prediction of gene content.
Reprinted from Windsor and Mitchell-Olds (2006) with permission from Elsevier.
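The sensitivity and specificity mentioned in the caption can be made concrete with a small Python sketch that compares predicted and actual coding intervals base by base. It assumes one convention that is common in gene-prediction evaluation, namely sensitivity as the fraction of truly coding bases that are predicted and specificity as the fraction of predicted coding bases that are truly coding; other fields define specificity differently. The function names and exon coordinates are hypothetical.

```python
def coding_positions(exons):
    """Expand (start, end) exon intervals, end-exclusive, into a set of bases."""
    return {pos for start, end in exons for pos in range(start, end)}

def nucleotide_accuracy(predicted_exons, actual_exons):
    """Return (sensitivity, specificity) at the nucleotide level."""
    predicted = coding_positions(predicted_exons)
    actual = coding_positions(actual_exons)
    true_pos = len(predicted & actual)
    # Sensitivity: share of truly coding bases that were predicted as coding
    sensitivity = true_pos / len(actual) if actual else float("nan")
    # Specificity (assumed convention): share of predicted bases that are real
    specificity = true_pos / len(predicted) if predicted else float("nan")
    return sensitivity, specificity

# Hypothetical gene: the small third exon is missed and one false exon is called
actual = [(100, 200), (300, 400), (500, 520)]
predicted = [(100, 200), (310, 400), (600, 650)]

sn, sp = nucleotide_accuracy(predicted, actual)
print(f"sensitivity = {sn:.3f}, specificity = {sp:.3f}")
```

In this toy case the missed small exon lowers sensitivity, while the false exon and the exon/intron boundary discrepancy lower specificity, mirroring the error classes listed in the figure legend.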
(Davuluri and Zhang, 2003; Windsor and Mitchell-Olds, 2006; Nicolas and Chiapello, 2007). Most of the algorithms use one of three types of external information: protein sequences, mRNA sequences or DNA sequences.

Comparison with EST/cDNA databases

It has been demonstrated that homology-based cloning was a very effective way to identify tissue-, organ- and developmental stage-specific expressed genes by assembling EST sequences derived from the same library and performing homology searches in databases. In an early report, about 18% of 5000 human clones were assigned probable function (Adams et al., 1991) by simply picking random cDNA clones, obtaining partial sequence from the 5' end and comparing the six possible translations of the partial cDNA sequences with the sequences of known proteins in the various databanks. The sequencing of mRNA through the sequencing of cDNA is an experimental technique to determine the sequence of genes that are transcribed. As sequencing takes place after splicing, cDNA sequencing also allows precise determination of the intron–exon structure of genes. By directly comparing a genomic DNA sequence (query) with ESTs or cDNA, regions of the query sequence that correspond to processed mRNA can be identified.

BLASTN is a common program that identifies similar nucleotide sequences that exist in databases (nr/EST) to the query sequence (see Basic Local Alignment Search Tool (BLAST) help at http://www.ncbi.nlm.nih.gov/BLAST for further details about BLASTN and other programs). The similarity between the sequences is estimated by aligning the sequences as closely as possible. The BLASTN algorithm finds similar sequences by generating an indexed table or dictionary of short subsequences called words for both the query and the database. However, determination of the DNA structure of genes from ESTs is not trivial, because these sequences are generally incomplete (often sequenced at the 3' end), of poor quality (sequenced only once), redundant (as a function of the abundance of mRNA) and of alternative splicing. The problem is more severe when the mRNA sequence has been obtained in a different organism (Nicolas and Chiapello, 2007). If the query sequence is very long, MEGABLAST is a better choice, which is specifically designed to efficiently find long alignments between very similar sequences. MEGABLAST is also optimized for aligning sequences that differ slightly as a result of sequencing errors. Davuluri and Zhang (2003) suggested the use of an expected value (e-value) of 0.1 and filtering for low complexity repeats. When a larger word size (with the default value of 28) is used, it increases the search speed and limits the number of database hits. For BLASTN, the word size can be reduced from the default value of 11 to a minimum of 7 to increase the sensitivity. Algorithms that can be used for DNA–cDNA and DNA–EST alignments include SIM4 (Florea et al., 1998) and GENESEQER (Usuka et al., 2000).

Similarity-based methods (e.g. BLASTN, BLASTX) are perhaps the best to determine whether a given region of the genome is transcribed or not. A BLASTN match to a cDNA/EST or BLASTX match to a protein is good evidence that the region belongs to a gene. However, these methods have their own limitations (Davuluri and Zhang, 2003). Even the most comprehensive cDNA projects will miss low copy number transcripts and those transcripts whose expression is low, cell- or tissue-specific, or expressed only under unusual conditions. cDNA or ESTs can contain one or more introns, if the mRNA was partially spliced, which could lead to misclassification of intron regions as exons. Some cDNA sequences may result in incorrect protein prediction. Partial BLASTX alignment to a target protein should not be considered, as the protein may not be a true orthologue of the source gene and only shares some domains, although it would still give some information for gene prediction.

Comparison with protein sequence databases

DNA–protein similarity can be compared to predict the protein coding sequences
422 Chapter 11
SPL) is significantly higher than simple splice site prediction programs described
above, because these programs integrate splice site models with additional types of
information, such as compositional features of exons and introns. MZEF, based on
quadratic discriminant analysis, was specifically trained to predict internal exons
(Davuluri and Zhang, 2003). It was shown to perform better than FGENESP, GRAIL, GENSCAN
and GENEMARK.HMM in predicting internal exons for the Arabidopsis genome. For predicting
initial and terminal exons, GENSCAN and GENEMARK.HMM are the best options, even though
the accuracy of predicting these exons is significantly lower than that of internal exon
prediction.

Gene modelling

The accuracy of individual exon prediction can be further improved by combining the
compatibility of the reading frames of adjacent exons to make a full coding transcript.
Probabilistic models, such as Hidden Markov Models, have been used to incorporate this
information in GENSCAN and GENEMARK.HMM, which model different states (exon, intron,
intergenic regions, etc.) of a gene. The GENEMARK program implements a sliding window
strategy. The window slides along the sequence and at each position the program computes
the probability of the sequence contained in the window under seven models: non-coding
and coding on each of the two strands in each of three reading frames, to obtain a
probability of the locally coding nature of the sequence in each reading frame. A simple
alternative to sliding windows is implemented by GLIMMER (Nicolas and Chiapello, 2007).
The program first extracts ORFs longer than a certain threshold and, secondly, attempts
to classify them according to their coding or non-coding nature.

11.1.4 Gene prediction by integrated methods

Gene prediction by homology-based methods is perhaps the most efficient way of finding
genes in genomic sequences, since the evidence of support (mRNA, EST, protein) was
already derived experimentally (Davuluri and Zhang, 2003). Ab initio gene-prediction
programs do not rely on such data, but miss some known genes (false negatives) and
predict some that are not real (false positives). A combination of ab initio gene
prediction programs and homology-based approaches has been automated in several programs
such as GENOMESCAN and RICEGAAS to produce more reliable predictions of protein-coding
regions. GENOMESCAN incorporates protein homology information (BLASTX hits) with the
exon–intron predictions of GENSCAN. It first masks the interspersed repetitive elements
in the genomic sequences with REPEATMASKER and then combines the GENSCAN predicted
peptides with BLASTX hits. The program determines the most likely parse (gene
structure), conditional on the given similarity information under a probabilistic model
of the gene structural and compositional properties of genomic DNA for the given
organism.

There are two major ways to integrate different approaches to improve the prediction
(Nicolas and Chiapello, 2007). The first way is to use the programs separately, then
carry out a post-treatment of the results. JIGSAW (Allen and Salzberg, 2005) is an
example of a program developed for this task. It uses dynamic programming algorithms to
automatically combine predictions made with independent programs. The second way is to
develop programs that have their predictions simultaneously based on intrinsic and
extrinsic criteria.

Despite great progress, gene prediction by computational approaches alone is still far
from perfect. Comparing the performances of different approaches is a delicate and
difficult task, as indicated by Nicolas and Chiapello (2007). The difficulties arise
partly from the diversity of information taken into account as well as the diversity of
the predictions made (complete coding sequences, exons, splicing sites). Just as the
quality of intrinsic methods depends clearly on their adjustment to a given species,
that of extrinsic methods depends on the degree of similarity between the sequences
compared.
Isolation and Functional Analysis of Genes 425
Last but not least is the problem of a gold standard that could serve as a reference for
the comparison of the approaches.

Before running any gene-finding program, Davuluri and Zhang (2003) suggested the use of
programs such as REPEATMASKER, which identifies known classes of interspersed repeats
and long and short interspersed nuclear elements (LINEs and SINEs), which exist in
non-coding regions of the genome. Almost all gene-finding programs can predict only
protein coding regions and have not been trained to predict untranslated exons and
untranslated portions of first and last coding exons. In addition, identifying the exact
boundaries of all the exons and assembly of the exons into different genes is not
possible by computational approaches alone. As indicated by Davuluri and Zhang (2003),
however, even the partial predictions are of immense value to design the experiments
that can determine the complete gene structure faster than would be possible by
experimental methods alone.

11.1.5 Detecting protein function from genomic sequences

There are three major classes of in silico methods used to obtain information on protein
function: methods using information intrinsic to the sequence; homology search methods;
and methods based on the context of genes (Gibrat and Marin, 2007). Homology search
techniques provide precise information and thus occupy a central place, while others
give only general information.

Intrinsic methods

The methods using information intrinsic to the sequence detect recognisable protein
structure such as transmembrane segments, zones of low complexity, coiled coils and
cellular sorting signals (Gibrat and Marin, 2007). As transmembrane segments are mostly
made up of hydrophobic residues, the detection methods are based on the search for
segments that have an appropriate size and present a marked hydrophobic character. Most
of the transmembrane segments are made up of α helices and around 25–30 residues are
required for the polypeptide chain to cross the membrane in the form of an α helix. Some
segments of protein sequences have an over-representation of a small number of amino
acids or even show a more or less regular repetition of a particular peptide. They do
not have a conventional three-dimensional globular structure. These zones rich in
specific amino acids supply practically no information on the function of proteins and
they must be masked before using homology search methods, because their abnormal amino
acid composition disturbs the statistics associated with these techniques and often
results in false inference of homology. Coiled-coil zones are formed by a bundle of two
or three α helices and they can be detected based on statistical techniques that take
into account the probability of observing a particular amino acid at each of several
characteristic positions (Lupas, 1996). Cellular sorting of proteins towards organelles
that are their final destination depends on signals present in the primary structure.
Techniques based on the overall composition of amino acids can be used to predict the
location of proteins in various organelles.

Homology search methods

Homology search methods play a central role in in silico functional analysis to discover
similar proteins in databases. There are two principal categories of homologous
proteins: orthologous proteins (which result from a process of speciation) and
paralogous proteins (which result from a process of duplication). Gibrat and Marin
(2007) evaluated different means by which a relationship of homology between two
proteins can be inferred, including sequence comparison, profile detection, motif
detection and fold recognition.

As with DNA sequence comparison, protein sequence comparison is the most natural and the
oldest method of indicating a relationship of homology between two proteins. BLAST and
FASTA are sequence comparison methods based on the principle
that two sequences with a common ancestor should maintain some traces of that
relationship in the sequences. Profile detection, based on a multiple alignment of
similar sequences, can be used to estimate the variability of each of the positions
along the sequence of a protein. Multiple alignments can be made by using PSI-BLAST
(Altschul et al., 1997), which constructs the multiple alignments iteratively during the
search by comparing with profiles of families of proteins from databases. Motif
detection searches for motifs that correspond to a functional signature, or even to
residues necessary to maintain the correct geometry of the active site of a protein. As
these residues are crucial for the function of the protein they are strongly conserved.
Some programs, such as SCANREGEXP and PFSCAN, can be used to search for motifs
characteristic of particular proteins in the Prosite library of motifs (Hofmann et al.,
1999). Fold recognition methods are based on the alignment of a sequence on a
three-dimensional structure to indicate relationships of homology, which can be used to
reveal the distant homologues that cannot be detected by sequence comparison methods.

of functional models can be classified into those that occurred in an ancestor common to
most of the present lines of organisms, those concerning the genetically mobile domains
that are found in different proteins and those similar to cytochrome P450, providing
some information on interaction between proteins (Gibrat and Marin, 2007). Gene
proximity methods are based on the observation that functionally linked genes are
co-regulated and have a tendency to be close together in the genome and that the
position of a gene in the genome may provide information about its function. Protein
function can be predicted, e.g. by measuring a local proximity that involves the
conservation of nearby gene pairs in different genomes being compared. Gene
co-occurrence is based on the concept that genes that are implicated in a particular
cellular process, common to genomes of this set, can share an identical phylogenetic
profile. Thus, an unknown protein that shares a phylogenetic profile with proteins that
are known to be implicated in a particular cellular pathway has a high chance of playing
a role in that pathway.
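The gene co-occurrence idea can be illustrated with a minimal sketch. Each protein is
given a phylogenetic profile, represented here as a tuple of 1/0 flags recording whether
a homologue is present in each genome of a comparison set, and proteins that share the
profile of an uncharacterized query are reported as candidate pathway partners. The
protein names, the four-genome set and the profiles are invented for illustration and do
not come from any dataset cited in this chapter; practical analyses usually tolerate
near-identical profiles rather than requiring an exact match.

```python
from collections import defaultdict


def group_by_profile(profiles):
    """Group protein names that share an identical presence/absence profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[tuple(profile)].append(protein)
    return {profile: sorted(names) for profile, names in groups.items()}


def candidate_partners(query, profiles):
    """Return the proteins whose profile is identical to that of `query`."""
    target = tuple(profiles[query])
    return sorted(
        name for name, profile in profiles.items()
        if name != query and tuple(profile) == target
    )


# Hypothetical data: each position is one genome (G1..G4); 1 = homologue present.
example_profiles = {
    "pathA_1": (1, 0, 1, 1),
    "pathA_2": (1, 0, 1, 1),
    "unknown_X": (1, 0, 1, 1),
    "pathB_1": (1, 1, 0, 0),
    "housekeeping": (1, 1, 1, 1),
}

partners = candidate_partners("unknown_X", example_profiles)
print(partners)
```

In this toy set the uncharacterized protein shares its profile only with the two
proteins labelled as belonging to pathway A, which is the kind of evidence the
co-occurrence approach relies on.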
the exploitation of genomic information for crop improvement. This is because a large
number of gene functions are conserved across species, either directly or after
identifying the functional homologues. Perhaps the most exciting application of
comparative genomics will be the identification of different versions of genes for a
target species from other related species. Orthologous genes in related species will be
similar in sequence and function to those in the target species but could result in
markedly different phenotypes (Xu et al., 2005).

Conservation of gene content and gene order among closely related plant species greatly
assists in gene identification and annotation. Even in closely related plant genomes,
whose ancestors diverged from each other more than 10 million years ago (mya), only
genes are conserved in orthologous regions. All of the plant species with large genomes
studied have evolved through the movement of retrotransposons within the last 6 million
years and these sequences vary greatly between species (Ramakrishna and Bennetzen,
2003). Hence, plant species that diverged from each other more than 50 mya only have
exonic regions conserved among genes. This feature has been used to improve gene
annotation with great success (Tikhonov et al., 1999; Dubcovsky et al., 2001;
Ramakrishna et al., 2002). Gene structure can be predicted more accurately using
comparative sequence analysis than by the combined use of ESTs, homology to entries in
protein databases and gene prediction programs as described in the previous section.
Conservation of genomic collinearity, gene content and order among plant genomes greatly
assists in gene isolation from cross-species comparisons.

Differences in gene content are sometimes observed in otherwise collinear regions of
plant genomes (Tikhonov et al., 1999; Tarchini et al., 2000; Ramakrishna et al., 2002;
Bennetzen and Ramakrishna, 2002). This phenomenon can complicate gene isolation, but
does not completely invalidate the approach. Under almost all circumstances, a small
genome species will provide numerous DNA markers on a single bacterial artificial
chromosome (BAC), which permits more detailed mapping in the large genome species.
Chromosome walking, as a basic step in map-based cloning to be described in detail
later, is often difficult with large genomes such as barley, maize and wheat. In these
cases, related plant species with small genomes such as rice, which show genomic
collinearity with the large genome species, can be used to identify and isolate desired
genes. This approach has potential pitfalls, especially with respect to some disease
resistance genes (Kilian et al., 1997; Leister et al., 1998; Pan et al., 2000).
Resistance gene regions often undergo rapid rearrangement that results in a lack of
micro-collinearity caused by deletion or translocation of the target loci. However, at
the very least, the comparative genomic approach provides numerous probes from one
species, which can be used for gene mapping and isolation in another species
(Ramakrishna and Bennetzen, 2003).

Comparative genetics has been facilitated by the development of massive databases,
efficient querying and comparison software and ever-improving computers. Many of the
first genes sequenced in rice and other grasses were represented by abundant mRNAs (e.g.
those encoding storage proteins and photosynthetic proteins). Members of the same gene
families (e.g. paralogues), including those that were mapped to the same genomic
position and thus were derived by vertical descent from a common ancestral gene (i.e.
orthologues), were often cloned and analysed in multiple species (Bennetzen and Ma,
2003).

When sequences from two separate parts of the gene are moderately conserved, degenerate
oligonucleotides based on the two sequences can be used to attempt to amplify the
intervening sequence by PCR. An amplified RT-PCR product or a degenerate oligonucleotide
may be used in nucleic acid hybridization to screen the colonies or plaques of a
cDNA/genomic library for clones containing the gene of interest. Positive clones from a
genomic library need to be verified as actually containing the gene and then examined
further to identify where on the clone the gene is located.
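To make the notion of a degenerate oligonucleotide concrete, the following sketch counts
and enumerates the individual sequences represented by a primer written with the
standard IUPAC nucleotide ambiguity codes. The example primer is invented for
illustration and is not taken from any study cited here; the function names are likewise
hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer: R = A/G, Y = C/T, N = any base.
primer = "ATGGARYTN"
print(degeneracy(primer))
print(expand(primer)[:4])
```

The degeneracy grows multiplicatively with each ambiguous position, which is one reason
why only moderately conserved regions with few ambiguous positions are practical targets
for this kind of primer design.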
11.2.2 Experimental procedures involved in comparative analysis

Ramakrishna and Bennetzen (2003) described methods for plant gene isolation based on
comparative genetic map and/or genomic sequence information. This technique involves
identification of collinear regions, followed by clone selection and, finally, sequence
analyses to identify the gene of interest.

Basic procedures

IDENTIFICATION OF COLLINEAR REGIONS. The genetic map position of the targeted locus in
the plant species with a large genome size must be determined accurately by segregation
analysis of the locus with tightly linked markers. These markers should map to a
collinear region in a related plant species with a small genome to enable isolation of
the targeted locus. Comparative genetic linkage maps with common molecular markers serve
as the best starting point. For example, the maize genome is about 2400 Mb in size,
corresponding to a genetic map of about 2500 cM. This translates to an average of
1 Mb/cM for the maize genome. A large mapping population of 5000 gametes with no
recombinants in the segregating progeny makes it likely that the targeted gene is
present within a 500-kb region. The rice genome has a size of 380–450 Mb and a genetic
map of about 1600 cM. This makes map-based gene isolation much easier in rice than in
maize.

In cases where the gene of interest is absent in the small genome, we can still use
markers from the orthologous region in the small genome to fine-map in the large genome,
as markers are often the limiting factor for fine-mapping in some crop species. The
complete genome sequence in crop plants provides abundant information for choosing
suitable markers. For example, the maize BAC libraries can then be screened with
suitable markers from rice to identify BACs that harbour the gene of interest. The next
step is to look for the presence of flanking markers (tightly linked to the targeted
gene) on continuous BACs in maize. The results of such studies will show whether overall
collinearity is maintained in the region.

CLONE SELECTION AND MAPPING. Several thousand clones from the small genome BAC libraries
are screened for individual clones that show homology to DNA markers mapped in the
collinear regions in different plant species.

CONSTRUCTION OF SHOTGUN LIBRARIES AND SEQUENCING. These two steps can follow a standard
procedure described in Chapter 3.

SEQUENCE ANALYSES AND ANNOTATION. The first step in the sequence analysis of collinear
BACs (for instance, when a collinear sorghum BAC is sequenced to isolate a gene based on
the genetic map location in maize) is the delimitation of regions that are conserved and
not conserved relative to rice. Conserved regions are usually or always genes, while the
unconserved regions are usually not genes. Complete sequences from orthologous BACs are
then compared using the program DOTTER to identify the conserved regions. Genes can be
predicted as described in the previous section.

CONFIRMATION OF CANDIDATE GENES. The possible functions of candidate genes can be
investigated using several independent approaches. Sequence analyses and annotation, as
described above, using comparative sequence analyses, gene-finding programs and BLAST
searches, identify putative genes. Sequence variations and gene structure analysis of
the gene identified in the region, for instance in susceptible and resistant lines in
the case of disease resistance genes, can help verify a candidate gene. As an example,
preliminary mapping, cloning, sequencing, gene finding and BLAST searches identified two
candidate genes for barley Rpg1. These were differentiated by segregation analysis in
8518 gametes and by sequence analysis in barley lines susceptible or resistant to stem
rust (Brueggeman et al., 2002).

Additional experimental analyses can be performed to evaluate candidate gene
function. Several approaches can be used including mutation analysis and expression
analysis.

Mutation analysis can follow these steps (Ramakrishna and Bennetzen, 2003):

• Analysis of knock-out mutations (i.e. transfer DNA (T-DNA) or transposon insertions).
• Lines that either have a non-functional or an overexpressed gene of interest can be
  generated by transforming wild-type plants with antisense or sense gene constructs.
• RNA interference can be employed, where homologous double-stranded RNA (dsRNA) is used
  to suppress a gene, generally resulting in a null phenotype.
• Complementation studies, where a wild-type copy of the gene of interest is transformed
  into the mutant to see if the T1 progeny yields wild-type phenotype and whether
  complementation co-segregates with the transgene in subsequent generations.
• Searching for point mutations by targeting induced local lesions in genomes (TILLING)
  to provide an allelic series of mutations.

Tissue-specific expression of the genes can be studied using Northern blot analysis,
microarrays, reporter constructs, or reverse transcription-PCR to see if the expression
patterns agree with the predicted biology of the targeted gene.

Examples

Probably the most comprehensive application of collinearity in plants was the attempt to
clone specific barley disease resistance genes by chromosome walking using rice. The
collinearity provided numerous DNA markers from rice that facilitated the chromosome
walk in barley, leading to the isolation of the desired stem-rust resistance gene Rpg1,
although the synteny with rice failed to yield the gene because it does not seem to
exist in rice (Brueggeman et al., 2002). Another example is the Lr21 leaf-rust-resistance
gene of bread wheat that was successfully isolated using a strategy of shuttle-mapping
between diploid wheat as a model and bread wheat (Huang et al., 2003). Most of the time,
however, there are breakages in microsynteny that prevent the straightforward
identification of a candidate gene for a given trait. This was the case when attempts
were made to isolate the leaf-rust-resistance gene Rph7 (Brunner et al., 2003) or the
photoperiod response gene Ppd-H1 (Dunford et al., 2002) from barley. A similar story was
reported for the Rfo restorer genes isolated from radish: markers flanking these genes
in radish are collinear with the Arabidopsis sequence, but the gene itself is not
present in Arabidopsis although many homologues are present elsewhere in the Arabidopsis
genome (Brown et al., 2003; Desloire et al., 2003).

Examples for the use of a shuttle-mapping strategy have to be evaluated on a
case-by-case basis. The present information, from both successes and failures, strongly
suggests that the development of efficient tools for isolating genes of agronomic
importance within each important family should continue to be a priority and that, as
indicated by Delseny (2004), restricting ourselves to the use of a few model species
would be unwise, although collinearity has been useful in providing additional markers
with which to saturate fine genetic and physical maps.

11.2.3 Cloning QTL facilitated by related major genes

Robertson (1985) presented evidence that qualitative and quantitative traits may be the
result of different types of variation of genic DNA at the loci involved. At any given
locus, variation of a minor nature may result in wild-type alleles responsible for gene
products with different efficiencies (quantitative alleles) while major genic
rearrangements or changes in the region of the gene essential for a normal functioning
gene product may result in qualitative mutant alleles. Based on this hypothesis,
Robertson proposed a possible approach to
cloning QTL. It is apparent from previous work that the alleles for quantitative
variation assume possible allelic interactions and have a smaller individual effect than
alleles from qualitative variation. However, it is possible that alleles for qualitative
mutants are simply loss-of-function alleles at the same loci underlying quantitative
variation. Consider, for instance, a trait such as plant height. In maize, at least 17
known qualitative mutants that affect plant height have been identified (cf. Robertson,
1985). These are non-allelic mutants, all of which have been placed on chromosomes. In
rice, over 50 loci responsible for semi-dwarfism or dwarfism have been found and mapped
(Kinoshita, 1995). If all these loci had two or more wild-type alleles responsible for
quantitative variation, the combination of these would come close to being sufficient to
explain the quantitative inheritance pattern observed by breeders. Theoretically, QTL
mapping studies can provide a test of this hypothesis. If a gene contributing to
quantitative variation is allelic to a gene controlling qualitative variation, then
these genes should map to the same locus along the chromosome. For some organisms (e.g.
maize and Drosophila), many of the major qualitative loci controlling morphological
variation have been mapped with a high degree of precision on genetic maps and these
locations should be predictive of the locations of QTL for the same character. As
indicated by Robertson (1985), it seems unreasonable, to say nothing of wasteful, to
assume that a living organism would have two sets of loci, one for qualitative traits
and one for quantitative traits, when one set could account for both patterns.

Robertson (1989) gave two examples to support his hypothesis. One of them is that a
difference in gibberellin deficiency, controlled by a major gene, resulted in a
quantitative difference of plant height. He also listed a series of qualitative traits
which are related to the quantitative traits of the same kind. This hypothesis has been
tested in maize. Beavis and colleagues attempted to test the relationship of qualitative
mutants to quantitative variation by mapping QTL for plant height in four maize F2
populations and comparing the map position of these QTL with previously known positions
of qualitative variations for the same character (Beavis et al., 1991). The results
showed a general concordance in map positions of QTL and major genes affecting height,
which is consistent with the hypothesis.

With the development of practical QTL mapping, the similar location of QTL and major
genes is supported by many research results and only a few early examples will be
discussed here. In maize, many QTL affecting plant height were located at known major
loci (Edwards et al., 1992), indicating that some QTL may be allelic to the major genes.
In genetic analysis of rice blast resistance, a major gene was located on chromosome 8
by randomly amplified polymorphic DNA (RAPD) analysis on resistant and susceptible
plants of a doubled haploid (DH) population (Zhu et al., 1994). A QTL controlling
quantitative resistance was found in the same chromosomal region when using molecular
markers to map the resistance gene with quantitative phenotype data (Wang et al., 1994).
In A. thaliana, five QTL affecting flowering time were identified in a cross between two
ecotypes, H51 and Landsberg erecta, and four of them were located in regions containing
mutations or loci previously identified as conferring a late flowering phenotype (Clarke
et al., 1995). Generally, associations between qualitative mutants and QTL occur more
often than expected by chance. In maize, for example, 75% of chromosome intervals
harbouring discrete height mutants also harboured height QTL and 43% of intervals
harbouring QTL also harboured mutants (cf. Lin et al., 1995), although the association
is by no means absolute. A report on QTL mapping in five rice populations detected 23
plant height QTL. According to linkage relationships determined with restriction
fragment length polymorphism (RFLP) markers, all of the 13 major dwarfing or
semi-dwarfing genes were found to be in close proximity to these plant-height QTL (Huang
et al., 1996). In Drosophila, the map positions of bristle QTL in every case
corresponded approximately to those of candidate neurogenic loci or loci with major
bristle phenotypes
(Long et al., 1995). However, the QTL were organisms and use these tags to fish a gene
located on the map with a low degree of res- out of a portion of chromosomal DNA by
olution in most cases mentioned above, rais- matching base pairs. The challenge associ-
ing the possibility that the QTL are linked, ated with identifying genes from genomic
but not identical to the qualitative loci, as sequences varies among organisms and is
indicated by Tanksley (1993). Until QTL are dependent upon genome size as well as the
mapped to higher degrees of precision and/ presence or absence of introns, the interven-
or cloned, it will be difficult to prove that ing DNA sequences interrupting the protein
the particular QTL actually correspond to coding sequence of a gene. A large number of
known loci defined by macromutant alleles. new genes have been identified in many spe-
As more and more QTL are cloned, whether cies by randomly sequencing cDNA clones
there are any corresponding macromutant to produce ESTs. In view of the efficiency of
alleles can be tested. this approach as a mechanism for establish-
As suggested by Helentjaris et al. (1992), ing relationships between plant phenotypes
the theory proposed by Robertson (1985) and the large amount of sequence infor-
provides a possible approach to identifica- mation available for other organisms, it is
tion and cloning of important quantitative desirable to obtain large numbers of partial
genes. For QTL cloning based on Robertsons sequences.
proposal, both major and minor genes
should be examined based on their rela-
tive contribution to expression of traits and
their interaction with each other. It can be 11.3.1 Generation of ESTs
expected that genetic relationships can be
established between known major genes To generate EST sequences, the mRNA is
and QTL, or QTL can be paralleled with the extreme mutants and one can verify these relationships and facilitate QTL cloning through cloning the related major genes.
11.3 Cloning Based on cDNA Sequencing
One way to identify genes is to clone and sequence RNAs. Short stretches of cDNA sequences, derived from mRNA, are referred to as expressed sequence tags (ESTs). ESTs are usually 200–500 nucleotides long and generated by sequencing either one or both ends of an expressed gene. cDNA sequence-based approaches for gene cloning have been extensively applied in humans, Caenorhabditis elegans and plants. The precise nature of the sequence information obtained by EST analysis and the ever increasing number of gene sequences of known function make it possible and productive to identify specific genes by sequence similarity as discussed in the previous section. The idea is to sequence cDNA that represents genes expressed in certain cells, tissues, or organs from different
isolated and reverse transcribed into cDNA. The cDNA clones are sequenced from either the 5' or 3' ends of the cDNA or from both ends (Chapter 3). The sequences are then clustered to identify a series of tentative unique genes (TUGs) or tentative contigs (TCs) and an estimate of the number of different RNAs present in the initial sample. The TUGs/TCs can then be compared with the current databases to identify which of these have already been described in the species under consideration and which are still absent from the current databases. Where hits occur to ESTs from other organisms, a possible function may be ascribable to the sequence (Cullis, 2004). The sequencing of any given sample is continued until the rate of finding new sequences drops below an acceptable level. Although a huge redundancy of highly abundant RNAs will be produced, low-abundance RNAs and those genes that are only expressed in specialized cells are still likely to be missed. Therefore, techniques facilitating the isolation of specific tissues or cells, such as laser capture microscopy and RNA amplification, may help identify genes that are expressed at low levels or in very few cells.
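The stopping rule mentioned above (sequence a sample until the rate of finding new sequences drops below an acceptable level) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not a published protocol: the function name `reads_until_saturation`, the window size and the threshold are hypothetical choices, and each read is assumed to have already been assigned to a cluster (a TUG or TC) by a separate clustering step.

```python
from typing import Hashable, Iterable, Optional


def reads_until_saturation(
    cluster_ids: Iterable[Hashable],
    window: int = 100,
    min_new_fraction: float = 0.05,
) -> Optional[int]:
    """Return the number of reads after which sequencing could stop.

    cluster_ids holds, in sequencing order, the cluster (TUG/TC) assigned
    to each read. Reads are examined in consecutive, non-overlapping
    windows; the function returns the read count at the end of the first
    window in which the fraction of reads founding a new cluster falls
    below min_new_fraction. It returns None if that never happens.
    """
    seen = set()
    new_in_window = 0
    for count, cid in enumerate(cluster_ids, start=1):
        if cid not in seen:
            seen.add(cid)
            new_in_window += 1
        if count % window == 0:
            if new_in_window / window < min_new_fraction:
                return count
            new_in_window = 0  # start counting afresh for the next window
    return None


# Toy data (hypothetical): 20 reads that each found a new cluster,
# followed by 20 reads that all hit a cluster already seen.
toy_reads = list(range(20)) + [0] * 20
print(reads_until_saturation(toy_reads, window=10, min_new_fraction=0.2))
```

A low discovery rate only shows that abundant transcripts have been sampled repeatedly; as the text notes, rare or cell-specific transcripts can still be missing when the rule triggers.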
432 Chapter 11
with EST generation. A full-length first-strand cDNA is not efficiently produced by reverse transcription, especially if the mRNA has a stable secondary structure. Libraries made from cDNAs, therefore, can contain both full-length and partial cDNAs. One method for constructing cDNA libraries with a high content of full-length clones involves starting from the first transcribed nucleotide. A number of critical issues pertaining to synthesis and cloning of full-length cDNAs have been recently identified. Most important is the purity and integrity of the starting material. mRNA is often contaminated with heterogeneous nuclear RNA (hnRNA) because it is difficult to isolate cytoplasmic RNA exclusively from plant tissues. True full-length cDNAs will yield sequence information from both 5' and 3' non-coding regions as well. A full-length cDNA should encompass all sequences from the CAP site to the poly(A) addition site. However, it is generally agreed that a cDNA comprising the entire coding sequence of a protein should be considered worthy of full-length sequencing at high accuracy.
Cullis (2004) described a process of construction of full-length cDNA. A biotin label for the CAP structure has been developed based on the principle that the CAP site and 3' end of mRNA are the only sites that carry the diol structure. The diol groups at each end of the mRNA are biotinylated and then the first strand of cDNA is synthesized. This synthesis is primed with a degenerate primer (XTTTTTTTT(restriction site)). The reaction mixture is then digested with RNase I, which cleaves single-stranded RNA molecules at any site, to destroy RNA molecules, or parts of them, that are not paired with their cDNA. Therefore, the 5' ends of all the mRNAs not protected by their partial cDNAs and exposed as single-stranded are removed (along with the biotinylated CAP structure), as are all the biotinylated 3' ends. The full-length cDNAs are captured on streptavidin-coated magnetic beads, the cDNA is released from the beads and the mRNA is destroyed by treatment with RNase H and alkaline hydrolysis. The cDNA is then tailed with oligo(dG) that is used to prime the second-strand synthesis. Again, this primer also has an extension that includes a restriction enzyme site. After the second-strand synthesis the full-length cDNA is cloned with the restriction sites inserted with the first- and second-strand primers.
The full-length cDNA for a desired gene can be obtained using the 5' and 3' RACE (rapid amplification of cDNA ends) technique. RACE results in the production of a DNA copy of the RNA sequence of interest, produced through reverse transcription, followed by PCR amplification of the DNA copy. The amplified DNA copy is then sequenced to obtain a partial sequence of the original RNA. RACE can provide the sequence of an RNA transcript from a small known sequence within the transcript to the 5' end (5' RACE-PCR) or 3' end (3' RACE-PCR) of the RNA.
11.3.3 Full-length cDNA sequencing
The wide availability and usefulness of cDNA clones have spurred an interest in using a high-throughput approach to obtaining the complete sequence of full-length clones. The approach to obtaining the sequence of a full-length cDNA clone is different from that used to generate EST data. Many of the full-length cDNAs are likely to be longer than the reads resulting from sequencing both the 3' and the 5' ends of the insert. Therefore, additional sequencing strategies are necessary to obtain the full-length cDNA sequence. There are three possible strategies for full-length sequencing (Cullis, 2004):
Transposon mutagenesis (Kimmel et al., 1997): a transposon is randomly inserted, in vitro, into the cDNA insert and primers designed from both sides of the transposon are used for sequencing. Typically, each cDNA clone is subjected to a transposon reaction to produce a population of subclones, each harbouring a transposon at a distinct location. The subclones are then sequenced using transposon-specific primers, most often from each end of
DNA by bacteriophage lambda (λ) up to 400–700 kb by YAC vectors.
While chromosome walking is straightforward in organisms with small genomes, it is more difficult to apply in most plant species with large and complex genomes. The strategy of chromosome walking is based on the assumption that it is difficult and time-consuming to find DNA markers that are physically close to a gene of interest. Technological developments have invalidated this assumption for many species. As a result, the mapping paradigm has changed such that it is often possible to isolate one or more DNA marker(s) at a physical distance from the targeted gene that is less than the average insert size of the genomic library being used for clone isolation. The DNA marker is then used to screen the library and isolate (or land on) the clone containing the gene, without any need for chromosome walking and its associated problems (Tanksley et al., 1995). Through this chromosome landing approach, Martin et al. (1993) isolated the tomato gene Pto, conferring resistance to the bacterial pathogen Pseudomonas syringae pv. This exemplifies the advantages of chromosome landing, in that initial emphasis on the isolation of many closely linked DNA markers eliminated the need for chromosome walking and the development of a high-resolution linkage map expedited the identification of candidate cDNAs. This approach has become the main strategy by which positional cloning is applied to isolate both major genes and QTL in plant species.
Contig assembly, or chromosome walking/landing, facilitates positional cloning, that is, the isolation of genes based on genetic map information. Positional cloning has proven an effective means of isolation of genes in higher plants, but can be complicated by physically large genomes, prominent repetitive DNA fractions and polyploidy. Positional cloning has several basic requirements (Paterson, 1996b): (i) delineation of a target gene to a small chromosomal interval, preferably flanked by two DNA markers and spanned by a single megabase DNA clone, or by a contig of several megabase DNA clones; (ii) a means for identifying transcripts in the megabase DNA clone; and (iii) an efficient transformation system for introducing exogenous DNA into the plant species of interest, permitting identification of the target gene by mutant complementation. Currently, an essential requirement for positional cloning is the availability of comprehensive genomic libraries of relatively large DNA fragments, typically in YAC vectors. The chromosome landing paradigm can readily be applied to cloning QTL that can be mapped with a high degree of resolution. Progress has already been made in high-resolution mapping of QTL in plants and chromosome landing has been used to clone genes for quantitative traits, as exemplified in the following section. As anticipated more than a decade ago by Xu (1997), QTL that have been cloned so far are those with large effect that can be easily verified by transformation.
Genome sequence information has reshaped the procedures of positional cloning, as chromosome-aligned genome sequence information allows several of the steps in positional cloning to be skipped (Jander et al., 2002). With a larger number of sequence-based molecular markers available, a certain level of genetic mapping may quickly associate the target trait with a specific genomic region, and further fine mapping may narrow the target genomic region to several candidate genes based on sequence information. This would be followed by cloning, complementation by transformation and high quality de novo determination of the sequence of the entire region of interest without a previously determined wild-type DNA sequence as a guide. Fig. 11.4 provides a comparison of map-based cloning in Arabidopsis between 1995 and 2002 (with complete genomic sequence available). The total effort required for map-based cloning was reduced from 3–5 person-years to less than 1 person-year.
Methods have been proposed for cloning of multiple QTL and QTL with small effects. Peleman et al. (2005) proposed a method to fine map multiple QTL in a single population. As a first step, a rough mapping analysis is performed on a small part of the population. Once the QTL have been mapped to a
Fig. 11.4. Comparison of effort involved in map-based cloning in Arabidopsis. The key steps that have
become easier between 1995 and 2002 are presented. From Jander et al. (2002) reproduced with
permission of the American Society of Plant Biologists.
chromosomal interval by standard procedures, a large population of 1000 plants or more is analysed with markers flanking the defined QTL to select QTL isogenic recombinants (QIRs). QIRs bear a recombination event in the QTL interval of interest, while other QTL have the same homozygous genotype. Only these QIRs are subsequently phenotyped to fine map the QTL. By focusing at an early stage on the informative individuals in the population only, the efforts in population genotyping and phenotyping are significantly reduced as compared to prior methods. Linkage disequilibrium methods for fine mapping may also offer improved accuracy of QTL detection (Bink and Meuwissen, 2004; Grapes et al., 2004).
For QTL with small effects, fine-scale mapping and positional cloning will be very difficult in the absence of a whole genome sequence. However, in these cases, reverse genetics may offer a solution, through functional genomics analysis of candidate genes that underlie QTL. For example, Liu et al. (2004) identified five candidate defence response (DR) genes that co-located with QTL for resistance to blast disease and were associated with the level of blast resistance.
11.4.2 Examples of positional cloning
Positional/candidate gene isolation has been very successful. Some early and significant examples include: (i) identification of genes underlying qualitative phenotypes using mutant analysis and a sequenced genome (Jander et al., 2002) and no mutant in a sequenced or unsequenced genome (Büschges et al., 1997); (ii) identification of genes underlying quantitative phenotypes in an unsequenced genome (Frary et al., 2000) and using positional analysis and structure/function interpretation in a sequenced genome (Yano et al., 2000); and (iii) comparing the function of the gene with its orthologous counterparts in other species and exploring how the gene interacts with other genes in a pathway (Izawa et al., 2003).
With advances made in rice genomics, several QTL associated with the same
Isolation and Functional Analysis of Genes 439
traits or trait components have been cloned. These include four QTL for heading date, Hd1, Hd3a, Hd6 and Ehd1 (Yano et al., 2000; Takahashi et al., 2001; Kojima et al., 2002; Doi et al., 2004), and QTL for grain number (Gn1a) and grain size (GS3) (Ashikari et al., 2005; Fan et al., 2006). More recently, the first QTL with significant pleiotropic effects has been isolated (Xue et al., 2008). The QTL Ghd7, isolated from an elite hybrid rice and encoding a CONSTANS, CONSTANS-LIKE, TOC1 (CCT) domain protein, has major effects on an array of traits in rice, including number of grains per panicle, plant height and heading date. Enhanced expression of Ghd7 under long-day conditions delays heading and increases plant height and panicle size. Sakamoto and Matsuoka (2008) summarized the genes identified for rice grain yield and its component traits, including grain number, grain weight, grain filling, plant height and tillering. In this section, cloning of fw2.2 (Frary et al., 2000) will be discussed in detail as an example for identifying a gene(s) underlying a quantitative phenotype in an unsequenced genome.
Preliminary genetic mapping
1. Several QTL (11) associated with tomato fruit weight were identified in primary interspecific mapping populations: Lycopersicon pennellii (small fruit) × Lycopersicon esculentum (large fruit).
2. All wild Lycopersicon spp. contain small-fruit alleles at the locus fw2.2; modern cultivars have large-fruit alleles, which suggested that this locus is a domestication locus and partially recessive mutations lead to large fruit. The alleles from modern cultivars at fw2.2 increase fruit weight by 5–30% in segregating populations but 47% in near-isogenic line (NIL) populations.
3. NILs were developed with a total of 41.9 cM of L. pennellii DNA (containing fw2.2) in an L. esculentum background (Fig. 11.5A).
4. Fine mapping narrowed the region to two YAC clones (150 kb region) containing fw2.2 (Alpert and Tanksley, 1996) (Fig. 11.6; Fig. 11.5A).
Fine mapping and candidate gene identification
1. YAC clones containing fw2.2 were used as templates to screen the cDNA library that contains the dominant allele (Fw2.2), which allows a positionally targeted search for candidate genes.
2. 100 positive cDNA clones and four unique transcripts were identified.
3. 3472 F2 individuals (derived from a NIL × recurrent parent (RP) cross) were screened with four markers to establish the marker order of the cDNAs along the YACs (Fig. 11.5B). An alternative would be to sequence the YACs, which, however, is more expensive.
4. cDNAs were used to identify four cosmid clones (Fig. 11.5B) from an L. pennellii cosmid library consisting of 15–50 kb genomic clones, which are large enough to contain more than one gene per clone including enhancer/promoter, 5' and 3' UTRs, introns and exons.
Complementation tests
1. Identified cosmid candidate clones were used in transformation experiments with two cultivated genetic backgrounds (Mogeor, TG496). In the hemizygous R0 generation, the fruit weight of transformants was not significantly different from the controls due to partial dominance of the L. pennellii allele. Thus, R0 plants were selfed and homozygous R1 individuals with and without transgenes were compared for phenotype.
2. Significant differences in fruit weight were observed only for COS50 transformants and the differences were significant in both Mogeor and TG496 genetic backgrounds.
3. By sequencing the COS50 clone, two ORFs were identified: cDNA44 (used as probe) and ORFX (Fig. 11.5C).
4. Recombination with COS50 (XO33) delimited fw2.2 to a region containing ORFX.
Exploration of ORFX identity
1. A significantly higher level of ORFX transcript in small-fruited NILs (Fw2.2) was found, compared to large-fruited (cultivated) RP (fw2.2). No ORFX transcript was
[Fig. 11.5 graphic not reproduced. Panel A: linkage map of tomato chromosome 2 with marker names (including TG154, TG91, TG167 and CT9), distances in cM and an axis labelled Log P. Panel B: contig of the candidate region showing TG91, cDNA70, cDNA27, TG687, cDNA38, cDNA44, HSF24 and TG167, the cosmids COS62, COS84, COS69 and COS50, the recombination events XO31 and XO33, and a scale bar of about 5 kb.]
Fig. 11.5. High-resolution mapping of the fw2.2 QTL. (A) The location of fw2.2 on tomato chromosome
2 in a cross between Lycopersicon esculentum and a NIL containing a small introgression (grey area)
from Lycopersicon pennellii (Alpert and Tanksley, 1996). (B) Contig of the fw2.2 candidate region,
delimited by recombination events at XO31 and XO33 (from Alpert and Tanksley, 1996). Arrows
represent the four original candidate cDNAs (70, 27, 38 and 44), and heavy horizontal bars are the four
cosmids (COS62, 84, 69 and 50) isolated with these cDNAs as probes. The vertical lines are positions of
RFLP or cleaved amplified polymorphism (CAPs) markers. (C) Sequence analysis of COS50, including
the positions of cDNA44, ORFX, the A-T-rich repeat region, and the rightmost recombination event,
XO33. From Frary et al. (2000). Reprinted with permission from AAAS.
[Fig. 11.6 graphic not reproduced. Graphical genotypes are drawn across the markers CD66, TG91, TG686, TG687, HSF24, TG167 and TG361, with the location of fw2.2 marked.]
Average ten-fruit weight (g) by recombinant plant ID (CA, NY): #3 (NA, 62.1b); #11 (69.3b, NA); #12 (63.3b, NA); #31 (NA, 63.0b); #33 (NA, 50.5a); #34 (41.1a, 44.1a). Control M82-1-8 (72.4, 71.9).
Key: black = homozygous Lycopersicon pennellii DNA; white = homozygous Lycopersicon esculentum DNA; grey = interval in chromosome where crossover took place; a = significantly different (P < 0.01) from M82-1-8 large-fruited control; b = significantly different (P < 0.01) from NIL 939-2 small-fruited control; NA = not available.
Fig. 11.6. Graphical genotypes of homozygous recombinants in the fw2.2 region of chromosome 2. Five
replications of each recombinant plant were grown in California (CA) and New York (NY). The average gram
(g) weight of ten fruit from each recombinant was compared with the large-fruited, M82-1-8, and the small-
fruited, NIL 939-2, controls. Recombinants #3, #11, #12, and #31 were significantly larger (b; P < 0.01) for
average ten fruit weight in comparison to the small-fruited control, NIL 939-2, while recombinants #33 and
#34 were significantly smaller (a; P < 0.01) for average ten fruit weight in comparison to the large-fruited
control, M82-1-8. Recombinants #31 and #33 delineate the fw2.2 region (bracketed by arrows), based on
the smallest region demonstrating statistical significance. Plants for which few or no fruit were harvested
due to pest infection were not available (NA) for fruit weight analysis. The black and white boxes indicate
the homozygous condition for Lycopersicon pennellii (NIL 939-2) and Lycopersicon esculentum (M82-
1-8) at the molecular markers, respectively. The grey boxes indicate the approximate position between
two molecular markers where the genetic recombination event took place. The genetic distance between
molecular markers (separated by dashed lines) is indicated by the scale shown in centiMorgans (cM). From
Alpert and Tanksley (1996). Copyright National Academy of Sciences, USA 1996.
ORFX transcript, suggesting a role as a negative regulator of cell division.
11.5 Identification of Genes by Mutagenesis
Phenotypic variation that has been used in genetic analysis and plant breeding comes from either natural variation or induced mutations. Natural phenotypic variation is observed in germplasm collections and exists as a random collection of diverse mutations throughout the genome, although natural selection has led to maintenance of these mutations. Mutagenesis has been used as a tool in plant breeding for many years with numerous cultivars released. As described in Chapter 1, various chemical and physical mutagens have been used to create a wide variety of unique plant mutants to increase the amount of variation. Mutagenesis approaches have attracted the attention of plant molecular biologists as they provide a means for identifying desired genes (Xu et al., 2005). Whole genome mutagenesis offers an opportunity to mutate every gene contained in a plant species.
In functional genomics, mutant populations or libraries that cover all possible genes become an increasingly important tool. Mutant libraries can be constructed
region has been defined. Mutation rates as high as 10⁻³ alleles per gene have been reported in maize, suggesting that alleles of any given gene might be found by screening as few as 3000 M2 families, or 3000 M1 plants in the case of a non-complementation screen (Candela and Hake, 2008).
Knock-out mutagenesis, including most that are chemically induced, has the following limitations: (i) redundancy: a high level of gene duplication in plants provides genetic buffering (backup/second copy) such that knocking out one member of a gene family may not affect phenotype; and (ii) lethality: some genes confer essential functions; disruption will lead to lethality so knock-outs at those loci will never be retrieved in generated plants or in offspring. Conditional lethals may be retrieved, if necessary conditions, such as temperature sensitivity, are met.
Resource populations in model systems provide genome-wide resources for all biologists so they do not have to develop them for each experiment. These populations allow results of independent experiments to build on each other, because data on the same set of mutants can be maintained in a common database, facilitating worldwide collaboration.
11.5.2 Insertional mutagenesis
Insertional mutagenesis occurs naturally in a number of plant species through the excision and reintegration of endogenous transposable elements. The insertion of a known DNA segment into a gene of interest has been an extremely valuable genomic tool for a number of systems in mammals and plants. Insertional events can be classified as T-DNA tagging, transposon tagging, retrotransposon tagging, or entrapment tagging, depending on the type of element used. Insertion events can also be labelled according to their sites and types of insertions (Jeon et al., 2004). The knock-out is a null mutation with an insertion in the coding or regulatory region of a gene. Knock-down mutations cause reduced expression due to an insertion in the promoter or 3' UTRs. The knock-on (or activation tagging) mutation has an insertion element carrying a constitutive promoter, such as the cauliflower mosaic virus (CaMV) 35S promoter, that is capable of driving the expression of genes adjacent to the insertions. The knock-about mutation is an insertion that does not inhibit normal functioning of the gene. The knock-knock mutation has more than one insertion, causing multiple knock-outs. Finally, the knock-worst mutation includes insert events that lead to large-scale chromosomal rearrangement.
Gene knock-outs imply that the activity of a gene has been eliminated. In plants the two major methods for generating these are by inserting either a T-DNA or a transposon sequence (Azpiroz-Leehan and Feldmann, 1997). The unique advantage of using foreign DNA as a mutagen is that the inserted fragment not only disrupts the gene function but also tags the affected gene with known sequences, which greatly facilitates gene isolation. The DNA has a defined sequence and acts as a marker for the location of the mutation. Thus, by using oligonucleotide screening or specialized PCR, the mutagenized gene can be identified and sequenced easily. This method was first illustrated by the cloning of the white eye locus of Drosophila (Bingham et al., 1981).
As a principle of insertion tagging, an endogenous or an engineered DNA fragment (with known sequence) is allowed to insert at random into the genome. When it lands in a gene, it generally causes a recessive, loss-of-function mutation. For insertion mutagenesis to be useful for isolating all genes from a plant genome, it will be necessary to saturate the genome with insertions so that every single gene has been mutated. The probability that an insertion will be found within a given gene can be estimated based on the size of the gene, the size of the genome and number of inserts distributed among the population (Krysan et al., 1999, 2002). Assuming random chromosomal insertion, tagging efficiency can be calculated according to the formula
P = 1 - [1 - (L/C)]^{nf}
where P is the probability of finding an insertion within a given gene, L is an average
length for the gene, C is the haploid genome the ability to make and propagate large
size, n is the number of independent inser- numbers of transformants; (iv) the pre-
tional lines and f is the average number of dominance of loss-of-functional alleles; (v)
loci inserted per line. the biased distribution of insertion in the
Consider an example that: (i) the rice genome; (vi) the inability to characterize
haploid genome size is 3.8 × 10⁸ bp; (ii) lethal mutations; and (vii) the difficulty
the average rice gene is 3.0 kb long; and of generating populations that are large
(iii) the mean number of insertion loci per enough to reach complete saturation of the
line is 1.4. A total of 417,000 tagging lines genome.
would be required for establishing a popu-
lation in which a T-DNA insertion could T-DNA tagging
be found within a given gene at 99% prob- T-DNA tagging
ability. The number of tagging lines required The transfer DNA (T-DNA) is a defined seg-
for saturation mutagenesis of a genome is ment of the tumor-inducing (Ti) plasmid
highly dependent on the length of the target of Agrobacterium tumefaciens and delim-
genes. A group of 1 kb genes in rice requires ited by short (25 bp) imperfect-repeat bor-
1,250,000 lines to achieve 99% probability der sequences called left and right T-DNA
of being mutated, whereas 5 kb genes need borders. The insertion of a T-DNA element
250,000 lines in the T-DNA tagging popula- into a chromosome can lead to many dif-
tion. Jung et al. (2008) summarized the rice ferent outcomes: insertion into the coding
insertional mutants generated by different region can lead to partial or complete inac-
mutagens including T-DNA, Ac/Ds, Spm/ tivation of the gene; while insertion into
dSpm, T-DNA with enhancer, full-length the promoter region can lead to complete
cDNA over-expresser (FOX) system and inactivation of the gene, reduced expres-
Tos17. sion of the gene, or increased expression
Insertion tagging has the following of the gene.
advantages: (i) insertion tagging gener- Several methods have been developed
ally inactivates a gene which simplifies for introducing T-DNA into Arabidopsis.
phenotypic evaluation (disrupts an ORF, These include various tissue culture and
interrupts promoter, interferes with intron- whole plant techniques. However, most
splicing); (ii) it marks the gene for isola- tissue culture-based transformation pro-
tion via inverse-PCR, TAIL-PCR (thermal tocols developed for Arabidopsis were
asymmetric interlaced), transposon dis- not directed toward insertion mutagen-
play, AIMS (amplification of insertion- esis. The vast majority of T-DNA tagged
mutagenized sites), cDNA-AFLP, etc; and genes have been isolated from populations
(iii) it can be used for both forward and of transformants generated with whole
reverse genetics. In forward genetics, it can plant transformation protocols (Jenks and
be used to screen for an interesting pheno- Feldmann, 1996). A computer database
type and uses the tag to isolate the gene. In has been established for Arabidopsis that
reverse genetics, it can be used to identify contains the precise genomic locations of
insertion in a gene sequence of interest and over 50,000 T-DNA insertions. Any gene
to figure out the phenotypic consequences of interest can quickly be found, if the col-
of the insertion; three-dimensional pools of lection contains a mutation in that gene,
insertion line DNA can be used for efficient by performing a simple BLAST search. The
screening. database of these insertions can be found at
Insertional mutagenesis as a tool for http://signal.salk.edu/cgi-bin/tdnaexpress
gene cloning has its limitations: (i) redun- and the Arabidopsis Knock-out Facility at
dancy and lethality (the same as chemi- the University of Wisconsin. A number of
cal mutagenesis); (ii) some species lack other crop plants have similar resources,
endogenous TEs or cannot mobilize them especially rice, which is already well
efficiently; (iii) engineered systems require served with T-DNA insertion lines (Parinov
[Figure graphic not reproduced. Labels: retrotransposon (containing reverse transcriptase gene); donor DNA; RNA; cDNA.]
spot insertion by Ac/Ds), thus preventing complete genome coverage; and (iv) in some organisms, insertional mutagenesis has never reached the efficiency needed for large-scale mutagenesis. Due to these issues, other types of mutations are also used in plant species, including deletion and chemical mutagenesis.
Chemical and radiation-induced mutations have been widely used for random mutagenesis in plants, resulting in a broader spectrum of mutant alleles that occur randomly in the genome. Chemical mutagenesis produces a broad range of mutant alleles such as loss-of-function, gain-of-function, reduction-of-function and novel functions, in contrast to insertion and deletion mutagenesis that causes mainly loss-of-function mutations. When an efficient transformation tool is not available, it is not possible to adopt a gene tagging strategy, but these random, non-tagging systems can be utilized to create a mutagenized library.
Ionizing radiation has been widely used to induce mutations for plant breeding and classical genetic analysis, but the consequences of ionizing radiation have only recently been examined closely at the molecular level. Several genes have been identified in animals and plants using deletion mutants. Fast neutron, gamma ray, X-ray and UV radiations have been used in different systems. Usually, fast neutrons produce large deletions while the other three radiations yield small deletions or point mutations.
Besides ionizing radiation, a number of chemicals have been used to generate large mutant collections. Many chemicals can be used as mutagens but di-epoxy butane (DEB), N-ethyl-N-nitrosourea (ENU), ethylmethane sulfonate (EMS), di-epoxy octane (DEO), ultraviolet-activated trimethylpsoralen (UVTMP) and hexamethylphosphoramide (HMPA) are common mutagens used in animals and plants. Generally, deletions caused by chemical mutagens are relatively small, ranging from point mutations to mutations of several kilobases.
Chemical agents such as EMS and nitrosomethylurea (NMU) are extremely efficient mutagens in A. thaliana. Under optimal conditions, EMS treatment of seeds can generate about 4000 mutations per genome, compared with an average of 1.5 insertions per transfer DNA (T-DNA) mutant (Alonso et al., 2003; Till et al., 2003). Chemical agents generate a broader range of DNA alterations; these are predominantly single base-pair substitutions, but small insertions and deletions are also induced. Importantly, the distribution of EMS-induced mutations is unbiased (Alonso and Ecker, 2006).
Point mutations
EMS, a base-alkylating agent that generates point mutations (of which the vast majority are G/C-to-A/T transitions, which often lead to the creation of stop codons/nonsense mutations), has been used most commonly because of its ease of use and the diversity of potential mutants. As EMS causes a high density of mutations, fewer plants need to be screened in order to target all genes, compared with other mutagenesis systems.
However, point mutations induced by EMS are subtle changes whose detection can be challenging. Once a phenotypic mutant is identified, it is necessary to determine the locus in the genome by a positional cloning strategy in order to clone the corresponding gene, as discussed in the previous section. If a mutant that exhibits an identical or similar phenotype has already been identified, complementation crosses are a first step to determine whether the new mutation is allelic. The existence of multiple alleles can give information on gene function and be useful for breeding.
Strategies have been developed recently so that subtle changes like point mutations can be detected easily. To adapt chemically induced mutagenesis efficiently to reverse genetics in Arabidopsis and other plants, McCallum et al. (2000) developed a large-scale screening system, targeting induced local lesions in genomes (TILLING), which allows a point mutation to be identified. In the basic TILLING method, seeds are mutagenized by treatment with EMS. The resulting M1 plants are self-fertilized and DNA is prepared from the M2 individuals. To screen many individuals a pooling strategy is used. DNA samples are pooled
and pools are arrayed on microtitre plates EcoTILLING technologies and discussed the
and subjected to gene-specific PCR. High- progress that has been made in applying these
throughput TILLING (Colbert et al., 2001; methods to many different plant species.
Till et al., 2003) uses the CEL I mismatch In addition, new methods for efficient
cleavage enzyme, which recognizes base- genome-wide detection of point mutations
pair mismatches (Oleykowski et al., 1998). are appearing on the horizon, including a
The PCR is performed using a mixture of mismatch-repair detection on tag arrays
labelled and unlabelled primers. One primer (Faham et al., 2005). Mismatch-repair
is labelled with the IR Dye 700 and the other detection allows > 1000 amplicons to be
with IR Dye 800. Melting and re-annealing screened for variations in a single labora-
of PCR products is followed by CEL I tory reaction. This approach can be scaled
treatment, which preferentially cleaves up to allow sequence comparison in whole-
mismatches in heteroduplexes between genome coding regions among large sets of
wild-type and mutant DNA sequences. lines and controls at a reasonable cost.
CEL I-treated PCR products are applied to
slab gel electrophoresis, then detected in Deletion mutagenesis
two separate channels by LI-COR scanners.
Mutations are indicated by shorter, cleaved Ionizing radiation mutagenesis causes de-
PCR products. If a mutation is detected in a letions and other types of chromosomal
pool, the individual DNA samples that went alternations. In plants, fast-neutrons are
into the pool can be analysed separately to well-established, very effective deletion
identify the individual that carries the muta- mutagens (Koornneef et al., 1982; Li, X.
tion. Once this individual has been identi- et al., 2001). Approximately ten genes
fied, its phenotype can be determined. This are randomly deleted in each line when
screening procedure can locate a mutation treated with fast neutrons at a dose of 60 Gy
to within a few base pairs for PCR products (Koornneef et al., 1982). As fast neutron-
of up to 1 kb in size. A potential problem deletion mutagenesis can be performed on
with this method is that any one individual numerous dry seeds and plant transforma-
will carry multiple mutations. Genetic anal- tion is not necessary, it is easy to produce a
ysis is therefore necessary to confirm that great number of mutants with a high prob-
any observed phenotypic alteration is asso- ability of finding a mutation in every gene.
ciated with the mutation in the target gene Cloning a gene mutated by a deletion
and not with another mutation elsewhere in requires chromosome walking, as with
the genome. However, TILLING often results chemical mutagenesis. However, deletion
in a number of allelic mutations in different mutants can also be effective for reverse
lines, which can help to confirm phenotype genetics. Deletion libraries have been estab-
as well as provide information on protein lished that contain knock-out mutants in
function. An important advantage of the Arabidopsis and rice (Li, X. et al., 2001).
TILLING method is that it can be applied Deletions can be identified by gene-specific
to any species for which a gene sequence is PCR screening of pooled DNA, where PCR
known. extension time is shortened so that ampli-
TILLING has moved from proof-of con- fication of the longer wild-type fragment
cepts to production with the establishment is suppressed and only mutant lines yield
of publicly available services for Arabidopsis, products (Joen et al., 2004). Experimental
maize, lotus and barley. Pilot-scale projects approaches for identifying DNA deletions
have been completed on several other plant in pools of mutants that are generated by
species, including wheat. The protocols devel- high-energy ionizing radiation have been
oped for TILLING have been adapted for dis- developed (Li and Zhang, 2002), which
covery of natural nucleotide variation linked is called Deleteagene. Deleteagene can be
to important phenotypic traits, a process applied to plants in which transformation
termed EcoTILLING (Comai et al., 2004). Till is inefficient; it might also provide a means
et al. (2007) reviewed the current TILLING and of simultaneously mutate (delete) tandem
duplicated genes (Li and Zhang, 2002; Zhang, S. et al., 2003). In contrast to insertion
mutants, where the probability of finding a mutant is proportional to the size of
the target gene, meaning that identification is difficult in a small gene (Krysan et al.,
1999), it is easier to find a knock-out of a small gene from a deletion mutant pool.

11.5.4 RNA interference

All gene disruption approaches have some inherent limitations. For example, it is
difficult to identify the function of redundant genes or the functions of genes required in
early embryogenesis or gametophyte development. One way to overcome the redundant
gene problem is to simultaneously inhibit all the members of a gene family through
gene silencing. RNA interference (RNAi) is a mechanism that inhibits gene expression
by causing the degradation of specific RNA molecules or hindering the transcription
of specific genes (Fire et al., 1998). RNAi refers to the ability of homologous
double-stranded RNA (dsRNA) to specifically target a gene's product, resulting in null or
hypomorphic phenotypes. As long as the interference is targeted to a region of the gene
that is conserved within all the members of the gene family, all members of the family
will be similarly inhibited (Tang et al., 2003).
The most interesting aspects of RNAi are the following (Cullis, 2004):

• dsRNA, rather than single-stranded antisense RNA, is the interfering agent.
• It is highly specific.
• It is remarkably potent (only a few dsRNA molecules per cell are required
  for effective interference).
• The interfering activity (and presumably the dsRNA) can cause interference
  in cells and tissues far away from the site of introduction.

The RNAi pathway is initiated by the enzyme DICER, which cleaves long dsRNA
molecules into short fragments of 20-25 bp. One of the two strands of each fragment, known
as the guide strand, is then incorporated into an RNA-induced silencing complex
and pairs with complementary sequences. The most well-studied outcome of this
recognition event is post-transcriptional gene silencing. In this way all the RNA
transcripts from any of the members of a gene family can be simultaneously silenced if
highly homologous regions are used. Any resulting phenotype can then be attributed
to the functioning of that gene family, but it will still need to be determined whether the
family members contribute redundant functions or whether only one of the members
of the gene family actually conditions the particular phenotype observed.
Artificial microRNAs (amiRNAs), which are designed to target one or several
genes of interest, provide a new and highly specific approach for effective
post-transcriptional gene silencing in plants. Warthmann et al. (2008) devised an
amiRNA-based strategy for both japonica and indica types of cultivated rice. Using an
endogenous rice miRNA precursor and customized 21mers, amiRNA constructs were designed
to target three different genes (Phytoene desaturase, Pds; Spotted leaf, Spl11; and
Elongated uppermost internode, Eui1/CYP714D1). Upon constitutive expression of these
amiRNAs in the cultivars Nipponbare (japonica) and IR64 (indica), the target genes were
down-regulated by amiRNA-guided cleavage of the transcripts, resulting in the expected
mutant phenotypes. The effects were highly specific to the target gene, the transgenes
were stably inherited and they remained effective in the progeny. Ossowski et al. (2008)
reviewed various strategies for small RNA-based gene silencing, described the design
and application of artificial miRNAs for gene silencing in many plant species and
compared the small RNA pathways mediating transgene-induced gene silencing, including
post-transcriptional gene silencing, transcriptional gene silencing and virus-induced
gene silencing.

11.5.5 Gene isolation via mutagenesis

There are two main approaches for disrupting gene function on the basis of its DNA
sequence: using one of the targeted techniques such as RNAi or ectopic expression,
or screening a collection of randomly generated mutants for a knock-out. The
amplification and sequencing of genomic DNA next to an inserted transposon or T-DNA is
an essential step in identifying a mutation within a gene. Several steps are required
to isolate the disrupted flanking DNA, incorporating different possible techniques
(Jenks and Feldmann, 1996), including:

1. Screen for the mutant: this can be done through the generation of genomic
libraries from the mutants and screening with sequences homologous to the right or left
border regions. When the screen is based on visible phenotypes, all the mutants are
grown under regular growth conditions if screening for morphological variation, or
they are grown under a special condition if screening for conditional mutants such as
those responding to biotic or abiotic stresses.
2. Confirm co-segregation: because a large proportion of mutant lines are untagged in
T-DNA or transposon mutagenized collections, co-segregation analysis of the T-DNA
sequence or selection marker with the phenotype is the first step towards cloning the
gene. It is estimated that 35-40% of the mutants in Arabidopsis are possibly due to
deletion, rearrangement or somatic mutation during transformation. Once co-segregation
is established for a given mutant, isolation of the mutated gene may be achieved by
several methods such as plasmid rescue, IPCR or TAIL-PCR.
3. Plasmid rescue: this involves utilizing bacterial selectable markers and an origin of
replication from a linearized bacterial plasmid incorporated into the T-DNA to isolate
T-DNA-plant junctions in E. coli. It includes the following procedures:

• Restriction enzyme sites are present in the T-DNA at the ends of the bacterial
  plasmid sequence.
• After extracting purified genomic DNA, the genomic DNA is digested with the
  appropriate restriction enzyme. After removal of the enzyme, samples are ligated.
• The ligated DNA is precipitated and transformed by electroporation into
  recombination-deficient E. coli cells to maximize the stability of the multimerized
  CaMV 35S enhancers. Recovered plasmids can then be sequenced to identify captured
  flanking sequences.

4. Inverse PCR (IPCR): utilizing primers made from the left or right border sequences
on circularized genomic fragments. IPCR has been implemented to isolate DNA segments
of the genome that flank the inserted molecule in transgenic plants tagged by T-DNA,
transposons or retrotransposons. The technique involves digestion with appropriate
restriction enzymes to produce fragments containing the known sequence and its flanking
region (Joen et al., 2004). Many thousands of restriction fragments are circularized by
self-ligation with T4 DNA ligase and the circularized DNA is then used as a template in
PCR. The unknown flanking DNA segment is amplified by two primers located at the ends of
the known sequence. The first primer is designed to lie near the junction point between
the insert and plant sequences and the second primer is located near the enzyme site that
is used for digestion of the mutagenized DNA. At least 50 nucleotides should be left
between the primer sites and the junctions for nested PCR to isolate specific
amplification products and for DNA sequencing.
5. Thermal asymmetric interlaced (TAIL)-PCR using nested border-specific primers
and arbitrary degenerate primers. The TAIL-PCR strategy has been used to isolate
insert-end sequences from P1 and YAC clones (Liu and Whittier, 1995), genomic sequences
that flank T-DNA insertions from transgenic lines of Arabidopsis (Liu et al., 1995)
and genomic DNA flanking Tos17 in rice (Yamazaki et al., 2001). TAIL-PCR depends
on amplification between a set of three nested primers for the known sequences and
shorter, arbitrary degenerate primers with low Tm values. Accordingly, the PCR
programme is set to thermally control specific and non-specific products. In the primary
reaction, five high-stringency cycles are used to specifically amplify linear products from
the target flanking sequences by the known
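The inverse PCR procedure in item 4 above (digest, self-ligate into a circle, amplify outward from the known insert) can be illustrated with a small in-silico sketch. This is only a toy model under stated assumptions: the function names `digest` and `inverse_pcr` are hypothetical, the cut is placed at the start of each recognition site rather than at the enzyme's true cleavage position, the enzyme is assumed not to cut inside the known sequence, and primers are treated as sitting exactly at the two ends of the known sequence.

```python
def digest(seq, site):
    """Cut a linear sequence immediately before every occurrence of `site`.

    Simplification (assumption): real enzymes cut at a defined position
    inside or near the site; here the cut is placed at the site start.
    """
    fragments, start = [], 0
    pos = seq.find(site, 1)
    while pos != -1:
        fragments.append(seq[start:pos])
        start = pos
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def inverse_pcr(genome, known, site):
    """Return the flanking DNA recovered by an idealized inverse PCR.

    The fragment carrying the known insert is circularized (its end is
    joined to its start). Outward-facing primers at the ends of the known
    sequence then amplify around the circle: right flank, ligation
    junction, left flank.
    """
    for frag in digest(genome, site):
        i = frag.find(known)
        if i != -1:
            j = i + len(known)
            return frag[j:] + frag[:i]
    return None  # known sequence not found on any single fragment


# Hypothetical toy sequences, chosen only for illustration.
known_insert = "CATGCATGCATG"
genome = "TTGACC" + "GAATTC" + "ACGTAC" + known_insert + "TTAGGC" + "GAATTC" + "AAAA"
product = inverse_pcr(genome, known_insert, "GAATTC")
print(product)
```

In this toy case the product contains the right flank, the restriction site re-formed at the ligation junction, and the left flank, which is the reason the sequenced IPCR product has to be split at the restriction site to recover the two flanks in their genomic order.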
screens (Carpenter and Sabatini, 2004). This will enable researchers to test
simultaneously the role of all genes in the genome for involvement in a particular
biological process (Alonso and Ecker, 2006). The first step towards this goal involves
generating a non-redundant collection of homozygous mutants. From more than 300,000
gene-indexed mutant lines in A. thaliana, ideally two independent lines per gene need to
be selected, the mutations need to be confirmed and homozygous plants obtained. The
hypothetical end product for this step will be a collection of 521 96-well plates that
correspond to 50,000 mutant lines, two lines for each of the 25,000 genes. This seed
library could then be systematically screened to study the role of each one of the 25,000
represented genes in any given biological process. The identification of mutants affected
in the selected biological process allows the immediate identification of the underlying
genes. By having two independent mutant lines per gene, false positives and the need
for experimental replicates are substantially reduced.
Several important advances towards gene-function analysis in Arabidopsis are
on the horizon (Alonso and Ecker, 2006), which should provide some guidelines for
all plant species: the ability to do systematic forward genetics using reverse genetic
tools (simultaneous phenotypic analysis of all gene-indexed mutants), the development of
new phenomic platforms, improvements in targeted mutagenesis (specifically, homologous
recombination) and the utilization of natural variation in gene function studies.
Induced mutations (point mutations, deletions and transposon- and T-DNA-generated
mutations) and other means of reducing gene expression (such as the use of small
inhibitory RNAs (siRNAs), amiRNAs and artificial repressor proteins) will continue to be
used for some time. However, the importance of natural allelic variation to study gene
function in plants is likely to increase. The rapid advancement of ultra high-throughput
sequencing (UHTS) technologies, which allow 1 gigabase of sequence in 48 h for a
cost of US$3000 (Service, 2006), is likely to have a profound effect in two areas of
gene-function analysis (Alonso and Ecker, 2006). First, induced alleles with phenotypes
that have only been observed in a specific genetic background (Sanda and Amasino,
1996) point to the need for the creation and sequencing of insertions in large mutant
populations using various accessions or ecotypes, a process that will be facilitated
by UHTS. Secondly, UHTS technology will allow complete genome re-sequencing
of many hundreds, or even thousands, of accessions. With the concomitant development
of more phenotyping platforms and corresponding community phenotyping
databases, whole-genome association studies that link genotype and phenotype will
become the approach of choice to interrogate plant gene function and the role of
natural allelic variation in plant adaptation to a range of local growth habitats (Weigel
and Nordborg, 2005).

11.6 Other Approaches for Gene Isolation

As described in Chapter 3, DNA chip and microarray technology makes it possible
to do gene isolation in a high-throughput way. The aim of gene isolation through
microarrays is to identify target gene(s) from a genome. There are two different
approaches: (i) parallel analysis of gene expression by comparison of expression among
different species or different individuals within a species, or expression of the same
individuals at different growth or developmental stages or under different environments.
Microarray-based gene expression analysis can be used to detect the type and abundance
of mRNA in the cell by hybridization, which needs few samples and is
highly automatable. (ii) Genes can be isolated from cDNA or EST microarrays by
using homologous probes.

11.6.1 Gene expression analysis

One of the major applications of DNA microarrays is gene expression profiling (Tessier
et al., 2005). Gene profiling via microarrays involves determining the expression of
genes under specific conditions. Similarly, microarrays have enabled genome-wide
class comparisons such as organs, genotypes or conditions. Several studies have
identified genes that are consistently differentially expressed between two or more
predefined classes. The degree to which a gene is active in a certain organ or tissue
can be measured by the amount of mRNA found in the cells, although the correlation
between mRNA and active protein is not always absolute due to post-transcriptional
regulation. Such approaches hope to find the complete set of genes that differentiate
between cellular states and shed light on the underlying differences between these
classes at the molecular level.
Initial strategies for detecting differentially expressed genes between two
classes are straightforward. Essentially they involve two-sample comparisons of the
differences between mean log expressions of the classes. The significance of the
differences is estimated using t-tests modified specifically for array data (simple t-tests
are almost never used; usually more complex t-tests are needed due to the nature of
the data) or their non-parametric analogues. When more than two classes are involved,
F-statistics and non-parametric analogues can be applied.
A wide variety of statistical techniques is available for class prediction (Finak et al.,
2005), including linear discriminant analysis, weighted voting, nearest-neighbour
classifiers, support vector machines, neural nets and Bayesian methods. At the centre
of all of these methods is the issue of feature selection. The goal is to select a
subset of features (genes) that best distinguish between known classes and can predict
new, unseen samples.
Class discovery experiments attempt to determine biologically relevant subclasses
of a particular cellular state. Several methods are used for this purpose and the most
popular techniques are k-means clustering, hierarchical clustering, self-organizing
maps and principal component analysis. The goal of clustering is to organize similar
sample data together. Either the gene or the array dimensions can be clustered
according to similarity indices, enabling one to see both genes with similar expression
profiles across arrays and arrays that have underlying similarities across genes (Finak
et al., 2005).
Class prediction experiments aim to find subsets of genes that can best
distinguish between two or more classes of sample. Sequence analysis is done first.
Probes are designed using gene-specific DNA fragments and oligonucleotide or DNA arrays
are made for all genes of an organism. All mRNA probes reverse transcribed under
different classes from the organism are then hybridized with the DNA microarray.
Based on the intensity of the hybridization signal, differential gene expression
or co-expression under different classes can be detected. Class-dependent gene
expression can be identified by comparison of expression profiles across all genes
between different classes. The set of genes can be mapped on to the gene ontology
or to metabolic pathways. In this way, physiological functions of a gene can be
analysed and related functional genes can be determined. There are two early
examples (Lockhart et al., 1996; Wang, X. et al., 1999). In addition, several tools
for exploring mRNA expression data with known protein-DNA and protein-protein
interaction databases have been reported. An example is CYTOSCAPE, which maps
expression data on to the protein interaction network (Shannon et al., 2003). This
approach can yield important insights into the protein complexes perturbed in a given
experiment and establish functional roles for the genes distinguished.

11.6.2 Using homologous probes

Availability of adequate quantities of sufficiently pure protein for production of
specific antibodies or for partial peptide sequencing opens the door to cloning of the
gene specifying that protein. Physiological and biochemical investigations may lead
to the identification of a protein responsible for the phenotype or biological property
of interest. If the purified polypeptide has a non-blocked N-terminus, the N-terminal
sequence may be determined directly. Sequences from the interior of the protein
can be obtained by producing and analysing proteolytic fragments of the polypeptide. If
sufficient quantities of purified protein are available, the protein may be used to
immunize animals (commonly rabbits or mice). The animals usually produce, and secrete
into the serum, antibodies specifically recognizing the protein. The antisera may
be used to detect the immunizing protein specifically.
Peptides of ten to 30 residues in length can be chemically synthesized efficiently.
Small peptides chemically coupled to larger carriers can be effective immunogens.
Antisera can be used to recognize clones of an expression library that are
synthesizing the cognate antigen. The antibodies bind to protein from the colony or plaque.
Bound antibodies can be detected by any of a variety of methods such as radioimmune
precipitation and enzyme-linked immunosorbent assay (ELISA).
Nucleotide sequences that could code for the determined sequence of amino acids
can be deduced from the genetic code. Since the genetic code is redundant, multiple
nucleotide sequences can encode the same peptide sequence. To be sure that the
actual nucleotide sequence is present in a probe oligonucleotide, the oligonucleotide
is synthesized incorporating, where needed, multiple nucleotides at a given position.
The product is called a degenerate oligonucleotide.
When amino acid sequences from two separated parts of the polypeptide chain are
available, degenerate oligonucleotides based on the two sequences can be used to attempt
to amplify the intervening sequence by PCR. An amplified RT-PCR product may be used in
nucleic acid hybridization to screen the colonies or plaques of a cDNA library for clones
containing complementary sequences. Positive clones obtained from the cDNA
library can be used as nucleic acid hybridization probes to screen a library of genomic
DNA to identify clones containing the gene that can produce the corresponding mRNA.
Positively reacting genomic library clones need to be further examined to identify
where on the clone the gene is located and prove that the clone actually contains the
gene.
Functional genomics has been broadly applied to include many endeavours aimed
at determining functions of genes on a genome-wide scale, such as transcriptional
profiling to determine gene expression patterns, and yeast two-hybrid and other
interaction analyses to help identify pathways, networks and protein complexes (Chapter
3; Henikoff and Comai, 2003). Although this is a daunting task, several approaches have
already been established, including the use of T-DNA knock-out lines and over-expression
studies. In contrast to the previously prevalent gene-by-gene approaches, new
high-throughput methods are being developed for expression analysis as well as for
the recovery and identification of mutants. The experimental approach is consequently
changing from hypothesis-driven to non-biased data collection and an archiving
methodology that makes these data available for analysis by bioinformatics tools.
Reverse genetics (sequenced gene to mutant and function) may play a more prominent
role in functional genomics studies in the future (Xu et al., 2005).
In true directed mutagenesis, researchers choose the gene to be perturbed. The most
elegant and precise targeted mutagenesis approach relies on homologous recombination
to target foreign DNA to a homologous sequence in the host genome. This is rarely
possible in plants and therefore alternative approaches have been developed to alter the
expression of selected genes. There are two main variants of directed mutagenesis:
gene silencing (RNAi) and zinc-finger nucleases. In these strategies, specific sequences
that are unique for each gene to be disrupted must be engineered in vitro and then
introduced into the plant. It has been shown that the expression of a sequence-specific
zinc-finger nuclease in A. thaliana generates mutations in the target gene in planta (Lloyd
et al., 2005). The large battery of well-characterized zinc-fingers, each with different
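The degenerate-oligonucleotide idea from Section 11.6.2 can be made concrete with a short sketch: because the genetic code is redundant, the number of distinct nucleotide sequences that could encode a peptide is the product of the codon counts of its residues, and a probe is usually designed against the stretch of peptide where that product is smallest. The helper names `degeneracy` and `best_window` and the example peptide are hypothetical; only the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons for each amino acid (e.g. M -> 1, L -> 6).
CODONS_PER_AA = Counter(CODON_TABLE.values())


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode `peptide`."""
    return prod(CODONS_PER_AA[aa] for aa in peptide)


def best_window(peptide, length):
    """Return (start, sub-peptide, degeneracy) for the least degenerate
    window of `length` residues; the first such window wins ties."""
    windows = [
        (i, peptide[i:i + length], degeneracy(peptide[i:i + length]))
        for i in range(len(peptide) - length + 1)
    ]
    return min(windows, key=lambda w: w[2])


# Hypothetical peptide sequence, for illustration only.
print(degeneracy("LSR"))
print(best_window("LSRMWKDFAL", 5))
```

Stretches rich in methionine and tryptophan (one codon each) give the least degenerate probes, whereas leucine, serine and arginine (six codons each) inflate the mixture quickly; a 15-mer against the MWKDF window here would be a mixture of 8 sequences, against 216 for just the three residues LSR.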
As described in previous chapters, intraspecific transfer of genes is easily performed
by cross-hybridization in all plants with a sexual cycle. Gene transfer by
cross-hybridization becomes more difficult or impossible with increasing phylogenetic
distance and, as a result, inter-generic gene transfer is very rare. By genetic
transformation, DNA from any organism can be transferred into other species' genomes.
The inserted gene sequence (known as the transgene) may come from another unrelated
plant, or from a completely different species: transgenic Bt maize, for example,
which produces its own insecticide, contains a gene from a bacterium. This powerful
tool enables plant breeders to do what they have always done (generate more useful
and productive crop cultivars containing new combinations of genes), but it expands
the possibilities beyond the limitations imposed by traditional cross-pollination
and selection techniques. Plants containing transgenes are often called genetically
modified or GM crops, although in reality all crops have been genetically modified from
their original wild state by domestication, selection and controlled breeding over long
periods (Chapter 1). In this book, the term 'transgenic' is used to describe a crop plant
that has transgenes inserted.
Issues in gene transfer and GM crops have been a hot topic for many years and
some early reviews can be found in McElroy and Brettell (1994), Christou (1996) and
McElroy (1996) among many other books and journal articles. Only some basic concepts
will be introduced in this chapter. For full coverage, readers are recommended to
seek information in recent books, including Liang and Skinner (2004), Parekh (2004),
Peña (2004), Skinner et al. (2004) and the Transgenic Crops series by Springer.

12.1 Plant Tissue Culture and Genetic Transformation

12.1.1 Plant tissue culture

Plant tissue culture exploits the in vitro plasticity of plant growth and development
because whole plants can be regenerated from a wide range of plant cells
(totipotency). For the majority of species gene transfer is carried out using explants
competent for regeneration to obtain complete, fertile plants. Cell division and callus
(dedifferentiated tissue) formation, embryogenesis and organogenesis can be induced
using combinations of plant growth regulators. Auxins like 2,4-dichlorophenoxyacetic
acid (2,4-D), picloram and dicamba and cytokinins like benzylaminopurine
(BAP), kinetin and zeatin are usually used
in the tissue culture media. There are no universally applicable methods of plant
tissue culture and thus protocols need to be modified for each genus, species, cultivar
and tissue. Within individual cereal species the elite germplasm is usually least
amenable to tissue culture.

12.1.2 Genetic transformation

Goals of plant transformation for crop improvement are to produce fertile transgenic
plants with integrated transgenes at reasonable frequencies from elite backgrounds.
Once a gene has been isolated through one of the approaches described in Chapter 11
and cloned (amplified in a bacterial vector), it must undergo several modifications
before it can be effectively inserted into a plant. Components of any successful plant
transformation system include delivery of DNA to the plant genome without compromising
cell viability, selection of transformed cells, regeneration to produce intact fertile
plants and the transmission of transgenes into subsequent generations. A simplified
representation of a constructed transgene, containing the necessary components, which
need to be developed in parallel, for successful integration and expression is as follows:

1. The promoter is the on/off switch that controls gene expression at different
developmental stages and in response to certain environmental changes, or specifically in
certain tissues and organs. On the other hand, promoters like the most commonly used
cauliflower mosaic virus (CaMV) 35S are constitutive. The genes under constitutive
promoters are expected to be expressed throughout the life cycle of the plant in
most tissues and organs.
2. The gene of interest is modified to achieve greater expression in a plant. For example,
the Bt gene for insect resistance is of bacterial origin and has a higher percentage of
A-T nucleotide pairs compared to plants, which prefer G-C nucleotide pairs. In a
clever modification, researchers substituted A-T nucleotides with G-C nucleotides in the
Bt gene without significantly changing the amino acid sequence. The result was the
enhanced production of the gene product in plant cells.
3. The termination sequence signals to the cellular machinery that the end of the gene
sequence has been reached.
4. A selectable marker gene is included in the gene construct to identify plant cells with
the integrated transgene. This is necessary because achieving incorporation and expression
of transgenes in plant cells is a rare event, occurring in just a few of the targeted
tissues or cells. Selectable marker genes encode proteins that provide resistance to agents
that are normally toxic to plants, such as metabolic inhibitors, antibiotics or herbicides.
As explained below, only plant cells that have the integrated selectable marker gene
will survive when grown on a medium containing the appropriate antibiotic or herbicide.
Similar to the gene of interest, marker genes also require promoter and termination
sequences for their proper function.

Conventional plant breeding represents the principal approach to crop improvement.
It employs methods such as hybridization, introgression breeding, induced
mutagenesis and somatic hybridization to randomly modify genomes and, as a result,
create genetic variation (Fig. 12.1a). Genetic engineering is different from the traditional
methods in that any modification can be designed and tailored to achieve the desired
effect. This method often fuses promoters and genes to produce expression cassettes
that are introduced into plants using bacterial transfer DNAs (T-DNAs) (Fig. 12.1b).
It excludes the transfer of known allergen- or toxin-encoding genes and analyses the
sequence of insertion sites. The ability to identify rapidly and eliminate plants
containing inadvertent fusions or disruptions of genes is not available to traditional
plant breeding, where genes can be inactivated through unpredictable transposition
of resident mobile elements. The second advantage of transgenic applications is that
it generally takes less than a year to transform an existing cultivar with one or several
traits.
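The Bt gene modification described in item 2 of the list above (replacing A-T with G-C nucleotides without changing the encoded protein) relies on synonymous codon substitution. The following is a minimal sketch of that idea, not the method actually used for the Bt gene: the function names and the toy sequence are hypothetical, and the rule of simply picking the most G-C-rich synonymous codon is an assumption made for illustration, whereas practical codon optimization draws on host codon-usage tables and other sequence features.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_count(seq):
    """Number of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq)


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return gc_count(seq) / len(seq)


def translate(dna):
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gc_enrich(dna):
    """Swap every codon for its most G-C-rich synonym (assumed toy rule).

    Ties are resolved by taking the first synonym in table order.
    """
    out = []
    for i in range(0, len(dna), 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        synonyms = [codon for codon, a in CODON_TABLE.items() if a == aa]
        out.append(max(synonyms, key=gc_count))
    return "".join(out)


# Hypothetical A-T-rich coding sequence, for illustration only.
at_rich = "ATGAAATTAACTATATAA"
enriched = gc_enrich(at_rich)
print(at_rich, round(gc_fraction(at_rich), 2), translate(at_rich))
print(enriched, round(gc_fraction(enriched), 2), translate(enriched))
```

The two sequences translate to the same peptide while the G-C fraction rises, which is the property the text attributes to the modified Bt gene; whether expression actually improves is an experimental question that this sketch does not address.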
[Figure 12.1 (graphic not reproduced). Column headings recoverable from the figure: source
of foreign DNA by genetic distance (xenogenic, transgenic, famigenic, intragenic, with the
species boundary marked), % genome, transfer of unknown DNA, genetic complexity, development
time, issues (public concerns), regulation (proposed) and trait potential.
Panel (a), traditional breeding: variety crosses (transfer of existing traits, i.e. native
genes, from one variety to another; Basic); induced mutagenesis, M0 (modification of existing
traits); introgression breeding (> 1% of genome; new traits similar to existing traits, often
associated with disease or stress tolerance; Basic); interspecies somatic hybridization
(> 1% of genome; new traits similar to but possibly stronger than existing traits; Full).
Panel (b), genetic engineering via binary vector and plant transformation (Tn): transgenic
modification (< 0.1% of genome; new traits that may outperform native traits, using genes
from viral, bacterial, fungal or unrelated plant sources; Full); modification with synthetic
genes (Full).]
Fig. 12.1. Summary of various methods for crop improvement. The genetic distance between DNA
source and target crop is indicated in the left four columns, including foreign and sexually compatible.
The species barrier is shown as a dotted vertical line. Xenogenic, synthetic DNA; transgenic, DNA
from unrelated species, such as viruses, bacteria, fungi and plants that belong to different families;
famigenic, DNA from plants that belong to the same family; and intragenic, DNA from within the same
sexual compatibility group. The % genome column shows the estimated size of the introduced DNA as a
percentage of the entire genome. Proposed regulatory requirements are shown in bold letters with Basic
implying multi-year field tests on agronomic performance and an assessment of the nutritional profile,
and Full indicating more extensive studies, which include biosafety assessments of foreign proteins as
well as environmental studies. Regulatory requirements for cisgenic applications are dependent on the
trait (Dep.). In these cases, the transfer of traits that resemble native traits, such as those associated
with disease resistance, should be considered for the basic regulatory assessment described above.
However, traits that are new to the sexual compatibility group would require more extensive analyses.
(a) Methods in traditional breeding. M0 stands for an original plant derived from induced mutagenesis.
Random mutations are shown as triangles, and can represent hundreds of point mutations/chromosome
induced by ethylmethane sulfonate (EMS) or deletions of up to 100 kb pairs triggered by di-epoxy
butane (DEB) or low linear energy transfer radiation (LET). (b) Methods in genetic engineering. Tn, plant
transformation. Reprinted from Rommens et al. (2007) with permission from Elsevier.
Gene Transfer and GM Plants 461
There are several important fields in plant transformation that will not be discussed in detail in this chapter but are worthy of brief mention here: (i) high-throughput transformation, by which all candidate genes can be used for transformation prior to functional analysis; (ii) plastid transformation with a major advantage that in many plant species plastid DNA is not inherited, preventing gene flow from the GM-plant to other plants; and (iii) chromosome construction and transformation, by which high molecular weight DNA and multiple genes can be delivered into plant cells. Ogawa et al. (2008) established a large-scale, high-throughput protocol to construct Arabidopsis thaliana suspension-cultured cell lines, each of which carries a single transgene, using Agrobacterium-mediated transformation. They took advantage of RIKEN Arabidopsis full-length (RAFL) cDNA clones and the Gateway cloning system for high-throughput preparation of binary vectors carrying individual full-length cDNA sequences. Throughout all cloning steps, multiple-well plates were used to treat 96 samples simultaneously in a high-throughput manner. They evaluated the protocol by generating transgenic Arabidopsis T87 cell lines carrying 96 individual metabolism-related RAFL cDNA fragments and showed that the protocol was useful for high-throughput and large-scale production of gain-of-function lines for functional genomics. Plastid transformation is suitable only for certain crop species. For example, Ruf et al. (2007) studied genetically modified tobacco in which the transgene was integrated in chloroplasts. In a large screen, they detected low-level paternal inheritance of transgenic plastids in tobacco. Mini-chromosomes will be briefly discussed in Section 12.2.2.

industrial research process at large seed or agrochemical companies. Most university-based research groups do not have access to the physical or human resources necessary to establish a cereal transformation effort for their own target crop. These limitations have led to the establishment of core plant transformation facilities (PTFs) at a number of academic institutions. Examples of North American PTFs include Cornell University (tomato, algae, fungi), Iowa State University (maize and soybean), University of Nebraska at Lincoln (wheat and soybean), Texas A&M University (cotton, rice, sorghum, banana, conifers), University of Wisconsin at Madison (lucerne) and the National Research Council (NRC) of Canada (canola, wheat and pea). Core PTFs have several advantages. They exploit the economies of scale associated with the centralization of a labour-intensive activity, i.e. they assemble a critical mass of transformation specialists working on related problems with continuity of activity over long time frames; provide an in-house resource dedicated to fulfilling the exclusive transformation needs of the institution's own community; eliminate the need to compete for limited collaborative opportunities elsewhere; offer on-site teaching resources in plant tissue culture and transformation; facilitate funding from local grower groups; and generate funds from private enterprises that contract out transformation activities to public sector organizations as a matter of economy. The core PTFs have been working well at big companies and international centres. A problem with core PTFs is that they may forget their reason for being and go off on tangents, thus being of limited use to the wider community.
[Fig. 12.2, stage labels recovered from the flow diagram: explant (immature embryos with and without preculture treatment; embryogenic calli); encounter of tissues with Agrobacterium (liquid co-culture, 10 to 15 min, room temperature); co-cultivation (solid co-culture, 2 to 3 days, dark, 20 to 25 °C); resting phase (eliminating Agrobacterium; optional); selection phase-1; selection phase-2; pre-regeneration (optional); regeneration (light/dark 16/8 h); rooting (light/dark 16/8 h); transfer of rooted plants; transgenic plants.]
Fig. 12.2. General scheme for Agrobacterium-mediated transformation of cereal plants. From Shrawat
and Lörz (2006) with permission from Wiley-Blackwell.
[Fig. 12.3, labels recovered from the figure. (A) Instrument: helium pressure gauge, fire switch, vac/vent/hold switch, power switch (ON/OFF), bombardment chamber door, vacuum gauge, disk-retaining cap, microcarrier launch assembly, target shelf, vacuum/vent rate control valves. (B) Process, before and after firing: gas acceleration tube, rupture disk, macrocarrier, DNA-coated microcarriers, stopping screen, target cells.]
Fig. 12.3. Gene gun and system. (A) The biolistic system. The Biolistic PDS-1000/He instrument
consists of the bombardment chamber (main unit), connective tubing for attachment to vacuum source,
and all components necessary for attachment and delivery of high pressure helium to the main unit
(helium regulator, solenoid valve, etc.). (B) Biolistic process. The Biolistic PDS-1000/He system uses
high pressure helium, released by a rupture disk and partial vacuum to propel a macrocarrier sheet
loaded with millions of microscopic tungsten or gold microcarriers towards target cells at high velocity.
The microcarriers are coated with DNA or other biological materials for transformation. The macrocarrier
is halted after a short distance by a stopping screen. The DNA-coated microcarriers continue travelling
towards the target to penetrate and transform the cells. The launch velocity of microcarriers for each
bombardment is dependent upon the helium pressure (rupture disk selection), the amount of vacuum in
the bombardment chamber, the distance from the rupture disk to the macrocarrier, the macrocarrier travel
distance to the stopping screen, and the distance between the stopping screen and target cells.
transgene expression. The expression cassette typically comprises a promoter, open reading frame and polyadenylation site that are functional in plant cells, although other components may be present, such as a protein-targeting signal (Altpeter et al., 2005a). Once this plasmid has been isolated from the bacterial culture it can be purified and used directly for transformation.
During Agrobacterium-mediated transformation, the T-DNA is naturally excised from the vector during the transformation process. This frequently, although not always, prevents the integration of vector backbone sequence into the plant genome (Fang et al., 2002; Popelka and Altpeter, 2003), necessitating time-consuming sequence analysis of transgene insertion sites following Agrobacterium-mediated gene transfer. In contrast, particle bombardment involves no such processing. Cloning vectors are used in particle bombardment for convenience rather than necessity. Consequently, Fu et al. (2000) devised a clean DNA strategy in which all vector sequences were removed prior to particle loading. A standard plasmid vector was used to clone the plant expression cassette and transgene of interest in bacteria and then the cassette was excised from the plasmid and purified by agarose gel electrophoresis. This minimal, linear cassette was then used to coat the metal particles and carry out transformation.

High molecular weight DNA delivery into plant cells

Until recently, one serious limitation to plant transformation technology was the inability to introduce large intact DNA constructs into the plant genome. Such large constructs could incorporate multiple transgenes, or could comprise a segment of genomic DNA to facilitate the map-based cloning of plant genes. In Agrobacterium-mediated transformation, this limitation has been addressed by the development of binary bacterial artificial chromosome (BIBAC) and transformation-competent artificial chromosome (TAC) vectors (Shibata and Liu, 2000). The transfer of yeast artificial chromosome (YAC) DNA by particle bombardment was first demonstrated by Van Eck et al. (1995) using cell suspensions of two tomato cultivars. Only one of the cultivars yielded YAC transformants and initial studies suggested that the integrated YAC was fairly intact in four of the five transformants recovered, based on the presence of two marker genes. The most promising way of introducing high molecular weight DNA into plant cells is to create engineered mini-chromosomes in maize and add genes to those mini-chromosomes (Yu et al., 2007). Mini-chromosomes are able to function in many of the same ways as chromosomes but allow for genes to be stacked on them. The technique developed in maize should be transferable to other plant species.

Particle bombardment is the most convenient way to achieve organelle transformation

Thus far, most genetically engineered plants have been subject to nuclear transformation. An alternative approach is to introduce transgenes into the chloroplast genome. This strategy offers advantages such as very high levels of transgene expression, uniparental plastid gene inheritance in most crop plants (preventing pollen transmission of transgenes), the absence of gene silencing and position effects, integration via a homologous recombination process that facilitates targeted transgene insertion, elimination of vector sequences, precise transgene control and sequestration of foreign proteins in the organelle, which prevents adverse interactions within the cytoplasmic environment (as reviewed by Altpeter et al., 2005a).

Comparison with other methods

In addition to the properties discussed for particle bombardment, Altpeter et al. (2005a) provided a comprehensive review of this method by comparing it with other transformation methods. Transgene integration, mediated by either A. tumefaciens or particle bombardment, is a random process that appears to correlate with the position of naturally occurring chromosome breaks. Transcriptionally active regions of
the genome are favoured, particularly the subterminal regions of the chromosomes, perhaps because the DNA is more accessible in these areas. It is possible, although still a matter of speculation, that further breaks may be caused by particle bombardment since the microprojectiles may shear the ends of DNA loops in the nucleus (Abranches et al., 2000; Kohli et al., 2003), which may partially explain the relative efficiency of bombardment in terms of stable transformation compared to other techniques.
Compared to biolistic techniques, Agrobacterium-mediated transformation offers several advantages (Tzfira and Citovsky, 2006), such as simpler integration patterns resulting in lower mutational consequences for the transgenic plant and limited transgene silencing via co-suppression. In addition, the option for fine tuning the Agrobacterium-based transformation protocols renders more and more cereal species amenable for efficient genetic engineering (Shrawat and Lörz, 2006; Conner et al., 2007).

12.2.3 Electroporation and other direct gene transfer approaches

There are several less popular means of gene transfer that may be effective in specific cases: polyethylene glycol (PEG)-facilitated protoplast fusion, microinjection, sonication, in planta transformation and electroporation. The mechanism is to cause transient micro-wounds in the cell wall and the plasma membrane, allowing DNA in the medium to enter the cytoplasm before repair or fusion of the damaged cellular structures. The direct transfer of DNA to protoplasts using PEG, or electroporation resulting in the transient permeabilization of the cell membrane using high-voltage electric fields, has been shown to be possible in various plants. Leaf tissue or embryogenic calli are often used to isolate protoplasts by enzymatic treatment. Using protoplasts as the starting material for transformation in cereals often employs callus induction, suspension culture initiation and maintenance, protoplast isolation, callus formation and plant regeneration. It is important to generate and maintain a cell suspension culture for its embryogenic capacity. Extensive time in tissue culture often results in low reproducibility and poor regeneration capacity.
A breakthrough in Arabidopsis research was the invention of the vacuum-infiltration procedure, a simple and reliable method of obtaining transformants at high efficiency while avoiding the use of tissue culture (Bent, 2000). In planta transformation involves floral dip, vacuum infiltration and spraying. These methods yield transformants at frequencies ranging up to several percent, with the most common frequency being 0.1 to 1%.
Electroporation utilizes short, high-intensity electric fields to reversibly permeabilize the lipid bilayers of the cell membrane. It is widely believed that the electric pulse causes extensive compression and thinning of the plasmalemma. The resulting transient formation of pores permits free diffusion of various classes of macromolecules including dyes, antibodies, RNA, viral particles and DNA. Transient expression from electroporated plant cells has been used to define functional elements within a promoter, to examine the effects of antisense RNA on gene expression, to study the translocation of proteins into both plastids and nuclei of intact protoplasts, to examine cell-cycle-specific gene expression and to study responses to plant hormones.
As a method of DNA transfer, electroporation is convenient and the results are consistently duplicated as a daily routine. In most cases it is more efficient than other methods designed for the same purpose, such as particle bombardment. In addition, it does not suffer from host-range limitations imposed by biology-based systems such as those employing A. tumefaciens or toxicity problems sometimes encountered using a PEG-based procedure. Finally, electroporation coupled with a transient expression assay is rapid, allowing for the reproducible detection of gene products within hours of the introduction of DNA. This is in contrast to a stable transformation strategy that involves months to regenerate trans-
468 Chapter 12
Modular vectors
Plant cells
Fig. 12.4. From vectors to applications to cellular functions. Introduction of genetic information into target
plant cells and acquisition of new data as a result of transgene expression may require a network of
modular vectors, flexible gene cloning and expression systems, and specialized plasmids that result in
different modes of transgene expression. Modular vectors may represent a starting point for assembly of
custom-made expression vectors, multi-gene expression vectors, and other types of plant transformation
vectors. These vectors in turn provide the users with the abilities to overexpress and downregulate
genes, as well as with the capacity for specific, and often unique, applications, useful for obtaining novel
traits and functional data, protein imaging in living plant cells, and generating transgenic plants for plant
research and biotechnology.
A crucial improvement made to the first generation of binary plasmids was the introduction of an empty plant expression cassette, a feature that allowed plant biologists a simple and more direct route for cloning their gene of interest under the control of a constitutive promoter active in plants. The constant improvements in binary vectors extended even to the most famous of them, pBin19, which has dominated the landscape of binary plasmids for several decades (Bevan, 1984; Komori et al., 2007). This plasmid offers several features, including incorporation of the lacZ gene into the multiple cloning site (MCS) to facilitate identification of recombinant plasmids using a colorimetric assay, a bacterial kanamycin-resistance gene, an E. coli origin of replication, a complete plant selection marker expression cassette and an extended MCS.
New vectors were designed and constructed to provide users with a more specialized set of tools suitable for carrying out various tasks in plant cells, e.g. the transfer of extremely long DNA molecules (Hamilton, 1997), the expression of fluorescent protein fusions (Goodin et al., 2002) and the detection of protein–protein interactions (Bracha-Drori et al., 2004), while others were specifically designed for versatility and simplicity, allowing plant biologists not only a choice but also the ability to manipulate these vectors for their own needs. The latter group of vectors are typically constructed as families of plasmids and include, for example, the pCB mini-binary vector series, a collection of extremely small pBin19-derivative vectors (Xiang et al., 1999), and the pGreen series of versatile and flexible binary vectors (Hellens et al., 2000b). These and many other families of vectors provide the plant research community with a vast number of versatile tools for various plant expression analyses. Some well-known binary and superbinary vectors are listed in Table 12.1.
Table 12.1. Well-known binary and superbinary vectors (from Komori et al. (2007), reproduced with permission of the American Society of Plant Biologists).
Column headings: Vector; Plant selection marker (a); Bacterial selection marker (b); Replication origin for A. tumefaciens; Replication origin for E. coli; Mobilization; Reference; Frequency of use in recent literature (c).
The 2007 Focus Issue of Plant Physiology presented a collection of original articles describing the development of new vector systems useful for plant research and biotechnology, as well as a compilation of short review articles that highlight some of the major developments in vector-assisted plant research technologies (Tzfira et al., 2007). It includes papers describing an extensive collection of MultiSite Gateway-based plant expression vectors (Karimi et al., 2007), a guide to vectors for chloroplast transformation (Lutz et al., 2007) and a system of transformation vectors with the superpromoter (Lee, L.-Y., et al., 2007). For a recent update on binary vectors, the reader is referred to Komori et al. (2007).

12.3.1 Binary vectors

The binary vector was invented soon after it had been elucidated that crown gall tumorigenesis was caused by genetic transformation of plant cells with a piece of T-DNA from a Ti plasmid (tumour-inducing plasmid) harboured by A. tumefaciens (Fraley et al., 1986). A key finding was that the virulence genes, which are involved in the transfer of T-DNA, could be placed on a replicon separate from the one with T-DNA (Hoekema et al., 1983). Thus, combination of a disarmed strain, which carries a Ti plasmid without the wild-type T-DNA, and an artificial T-DNA within a plasmid that can be replicated both in E. coli and A. tumefaciens turned out to be fully functional in plant transformation. The term binary vector literally refers to the entire combination, but the plasmid that carries the artificial T-DNA is usually called a binary vector.
A binary vector consists of T-DNA and the vector backbone (Fig. 12.5). T-DNA is the segment delimited by the border sequences, the right border (RB) and the left border (LB), and may contain MCS, a selectable marker gene for plants, a reporter gene and other genes of interest. The vector backbone carries plasmid replication functions for E. coli and A. tumefaciens, selectable marker genes for the bacteria, and optionally a function for plasmid mobilization between the bacteria and other accessory components (Komori et al., 2007).
The RB and the LB are imperfect, direct repeats of 25 bases and said to be the only essential cis-elements for T-DNA transfer (Yadav et al., 1982). The RB and the LB are integrated in binary vectors as DNA fragments cloned from well-known Ti plasmids, of either the octopine or nopaline type.
Insertion of genes of interest into appropriate locations of a binary vector is traditionally carried out by standard subcloning techniques. MCS, which are similar or identical to those in pUC, pBluescript and other standard vectors, are still very useful in this regard, but recently constructed vectors are
Fig. 12.5. Typical structure of a binary vector. Key components and their major options are displayed.
From Komori et al. (2007) reproduced with permission of the American Society of Plant Biologists.
more user-friendly. Recognition sites for rare cutters, which are restriction enzymes with long recognition sequences, are very convenient in this respect because the DNA fragments that are to be inserted scarcely have such sites. In some of the recently created vectors termed modular vectors, a series of these rare sites are placed in the T-DNA (Chung et al., 2005). An extensive set of auxiliary plasmids, which have full sets or subsets of these rare sites and other restriction sites, are provided and some of the plasmids also carry frequently used promoters, marker genes and/or 3' signals. Various types of expression units may be constructed in auxiliary plasmids and then the units may be inserted into the modular binary vectors. Thus, several expression cassettes could easily be assembled in a binary vector.
Until the early 1990s, Agrobacterium-mediated transformation had been used mainly in dicotyledons and it had been difficult to apply the method to cereals. The finding that some of the virulence genes exhibited gene dosage effects led to the development of a superbinary vector, which carried additional virulence genes. The superbinary vector has been highly efficient in the transformation of various plants and especially useful in the transformation of recalcitrant plants, such as important cereals.
A superbinary vector was developed and successfully used for the transformation of monocotyledons, such as rice and maize (Hiei et al., 1994; Ishida et al., 1996). The superbinary vector is an improved version of a binary vector and carries the 14.8-kb KpnI fragment that contains the virB, virG and virC genes derived from pTiBo542, which is responsible for the supervirulence phenotype of an A. tumefaciens strain, A281 (Jin et al., 1987; Komari, 1990).
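The layout of a binary vector described in Section 12.3.1 (T-DNA delimited by RB and LB, plus a backbone carrying the bacterial functions) and its superbinary extension can be summarised as a small data structure. The Python sketch below is only an illustration of that description: the class names, field names and the example plasmids are hypothetical, and the placeholder strings do not describe any real vector from Table 12.1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TDNA:
    """Segment delimited by the right border (RB) and left border (LB)."""
    right_border: str
    left_border: str
    has_mcs: bool = True                          # multiple cloning site
    plant_selectable_marker: Optional[str] = None
    reporter_gene: Optional[str] = None
    genes_of_interest: List[str] = field(default_factory=list)


@dataclass
class Backbone:
    """Functions needed in the bacteria rather than in the plant."""
    ori_e_coli: str
    ori_a_tumefaciens: str
    bacterial_selectable_marker: str
    mobilization: Optional[str] = None            # optional, per the text


@dataclass
class BinaryVector:
    name: str
    t_dna: TDNA
    backbone: Backbone
    # Additional virulence genes carried by the vector itself; the text
    # mentions virB, virG and virC on a 14.8-kb KpnI fragment from pTiBo542.
    extra_vir_genes: List[str] = field(default_factory=list)

    @property
    def is_superbinary(self) -> bool:
        """A superbinary vector is a binary vector with extra vir genes."""
        return len(self.extra_vir_genes) > 0


# Hypothetical example vectors (placeholder values, not real plasmids).
plain = BinaryVector(
    name="toy-binary",
    t_dna=TDNA("RB", "LB", plant_selectable_marker="nptII",
               genes_of_interest=["gene-of-interest"]),
    backbone=Backbone("ori-Ec", "ori-At", "kanamycin resistance"),
)
superbinary = BinaryVector(
    name="toy-superbinary",
    t_dna=plain.t_dna,
    backbone=plain.backbone,
    extra_vir_genes=["virB", "virG", "virC"],
)
print(plain.is_superbinary, superbinary.is_superbinary)
```

Keeping the T-DNA and the backbone as separate objects mirrors the distinction drawn in the text: only the T-DNA is meant to reach the plant genome, while the backbone and any extra virulence genes act in the bacteria.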
questions need to be asked about the size and nature of the DNA fragments, the strains of A. tumefaciens to be employed, the species of plants to be transformed and the purposes of the experiments. If the DNA fragments are larger than 15 kb, IncP, BIBAC and TAC vectors are recommended. Otherwise, high-copy-number plasmids are very convenient and a wide range of vectors varying in restriction sites, selectable markers and Gateway sites is available. A series of vectors designed for specific purposes, e.g. vectors for suppression of plant genes by RNA interference (RNAi) technology (Miki and Shimamoto, 2004), may also be chosen.
Newer generations of plant transformation vectors provide researchers with improved strategies for cloning and delivering their genes of interest into plant cells, typically using Agrobacterium as a vehicle for the transformation process. Some of these vectors were developed as families of plasmids and others represented single constructs designed for specific purposes. One can find a plasmid for every task, including such relatively unique applications as activation tagging (e.g. the pSKI015 and pSKI074 binary vectors; Weigel et al., 2000) or dexamethasone-inducible expression (e.g. the pOp/LhGR transcription activation system; Samalova et al., 2005). In addition, vectors have been constructed that allow us to take advantage of radically new cloning methodologies and utilize new gene expression technologies. New vector systems are also being produced to utilize transgenic technologies in an ever-expanding range of plant species, such as forest trees and transformation-recalcitrant crops (e.g. Meyer et al., 2004; Coutu et al., 2007). Furthermore, vectors for systemic gene expression without permanent genetic modification of the plant are being developed based on different plant viruses (e.g. Gleba et al., 2005; Marillonnet et al., 2005).

12.4 Selectable Marker Genes

The use of selectable marker gene systems facilitates the transformation process and allows the relatively straightforward recovery of transgenic crop plants (Ramessar et al., 2007). Without them, the few plant cells that take up and stably integrate the foreign DNA would simply be lost in an ocean of wild-type cells, which would certainly overgrow these transformed cells in the absence of effective selection against them. However, under certain conditions, selectable marker genes may not be necessary and it may be feasible to get transgenic plants without selection of a marker gene.

12.4.1 Functions of selectable marker genes

Once a plant cell has incorporated the introduced DNA in a stable manner (i.e. covalently integrated within the host plant's genome), the next step is to regenerate plants from the transformed cells. Position, frequency and scope of regeneration events are critical to the isolation of transgenic plants. Most often, the major limiting step in the isolation of transgenic plants is a lack of regeneration occurring from within the transformed cell populations. There is a large amount of variability in the frequency and scope of regeneration among different angiosperm species as well as among different cultivars of any one species.
A critical step in the regeneration of transgenic plants is the ability to distinguish between transformed plant cells with an integrated transgene and the bulk of non-transformed cells. The traditional way to achieve this goal is to use marker genes within the transgene and to select for their expression. Genes conferring resistance to various antibiotics or herbicides are commonly used in laboratory transformation research. Selectable marker genes act by expressing either an enzyme that inactivates the selective agent (detoxification) or a resistant variant of the selective agent's target enzyme (tolerance). For example, the aminoglycoside antibiotics, such as kanamycin, neomycin and G418, kill cells by inhibiting protein translation. The E. coli nptII gene, encoding neomycin phosphotransferase, inactivates these antibiotics by phosphorylation, thus allowing
preferential growth of plant cells transformed with this gene on media containing these selection agents. The herbicide phosphinothricin is an analogue of glutamine and acts by irreversibly inhibiting glutamine synthetase, a key enzyme for ammonium assimilation and the regulation of nitrogen assimilation in plants. The bar gene, cloned from the bacterium Streptomyces hygroscopicus, encodes phosphinothricin acetyltransferase, which converts phosphinothricin into the non-toxic acetylated form and allows growth of transformed plant cells in the presence of phosphinothricin, or commercial glufosinate ammonium-based herbicides.
All systems in general have low transformation efficiencies in the absence of selectable markers. However, in the presence of a selectable marker, in systems such as tobacco, rice and maize cells, transformation frequencies are extremely high. With high co-transformation frequencies, selectable markers facilitate the identification of plants containing co-transformed transgenes. The utility of the individual selectable marker genes is a function of both the properties of the respective resistance protein they encode and the relative sensitivity of the target tissue to their corresponding selective agent. The timing of selective agent application is critical to its successful utilization, as transformed cells need to recover and compete. The relative insensitivity of monocots to high levels of the antibiotic kanamycin (commonly used in dicot transformation) led to attempts to replace this antibiotic with other selective agents. The features of a particular transformation system (especially the nature of the material to be transformed and the route of transgenic plant regeneration) should be considered when choosing the resistance mechanism and the individual marker gene to be employed in any selection scheme. Patent and freedom to operate (FTO) issues often influence the choice of selectable marker gene.
Following the gene insertion process, plant tissues are transferred to a selective medium containing an antibiotic or herbicide, depending on which selectable marker was used. When grown on selective media, only plant tissues that have successfully integrated the transgene construct and express the selectable marker gene will survive. It is assumed that these plants will also possess the transgene of interest. Thus, subsequent steps in the process will only use these surviving plants.

12.4.2 Selectable marker genes for plants

There are two major classes of selectable marker genes, antibiotic and herbicide resistance genes. Antibiotic resistance genes are used in two important phases of transgenic plant production: (i) pre-plant transformation to select bacteria during routine molecular biology operations to manipulate transgenes and create expression vectors; and (ii) during the transformation process itself, to select cells and plants that have stably integrated introduced transgenes (selectable markers and gene(s) of interest) (Ramessar et al., 2007). There are two issues frequently raised with respect to antibiotic resistance genes: (i) effects on the therapeutic efficacy of clinically used antibiotics, i.e. concerns that antibiotic resistance gene products in transgenic crops or products might render clinically important therapeutic antibiotics ineffective; and (ii) potential for horizontal gene transfer, i.e. concerns about the potential transfer of the antibiotic resistance marker gene to intestinal and soil microorganisms. For herbicide resistance genes the issues are: (i) gene flow, by which new genes can spread by normal outcrossing to wild or weedy relatives of the engineered crops; (ii) weediness, the potential for a crop or its sexually compatible wild relatives to become established and to persist and spread into new habitats as a result of newly introduced genes; and (iii) toxicity and allergenicity, an issue associated with human health and the safety of novel foods and potential negative effects on non-target organisms.
Choice of selectable marker genes is a key factor in plant transformation. Genes that give resistance to antibiotics or herbicides,
such as kanamycin, hygromycin, phosphinothricin and glyphosate, are very popular. Kanamycin
resistance has been most frequently employed in the transformation of many dicotyledonous
plants. If the aim is to develop herbicide-resistant plants, the trait gene can also serve
as the selectable marker gene. Because of concerns over antibiotic resistance genes in
commercial transformants, genes that add metabolic capabilities have been drawing
considerable attention. For example, plant cells expressing a phosphomannose isomerase can
grow on media with mannose as the sole carbon source. Such markers are referred to as
positive selection markers (Joersbo et al., 1998). Table 12.2 provides a list of selectable
marker genes used in plant transformation. Although to date more than 20 selectable marker
genes have been reported in the transformation of higher plants, many of them were tested
only in a limited number of plant species on a limited scale. Therefore, further studies of
marker genes may contribute to improvement of the transformation of certain plant species
(Komori et al., 2007).

Table 12.2 (column headings only): Selectable marker gene | Gene product | Source | Selection

Selectable marker genes are driven by constitutive promoters. The promoters of the CaMV 35S
transcript (Odell et al., 1985) and the nopaline synthase gene of A. tumefaciens (Depicker
et al., 1982) are very popular in dicotyledons and the promoters of the ubiquitin gene of
maize (Christensen et al., 1992) and the actin gene of rice are popular in monocotyledons
(Zhang et al., 1991). Selectable marker genes are followed by a DNA fragment, the so-called
3' signal. The 3' regions of the CaMV 35S transcript and the nopaline synthase gene in the
wild-type T-DNA of A. tumefaciens are frequently used as a 3' signal.

Antibiotic resistance genes

Aminoglycoside antibiotics are bacterial inhibitors of prokaryotic, mitochondrial and
chloroplast protein synthesis. Kanamycin, gentamycin/geneticin (G418) and paromomycin bind
the 30S ribosomal subunit to inhibit translation initiation. Hygromycin interacts with the
elongation factor EF-2 to inhibit peptide chain elongation. Exposure of plants to these
antibiotics leads to an inhibition of chlorophyll biosynthesis and leaf bleaching. The most
widely used selectable markers in cereal transformation are the genes encoding neomycin
phosphotransferase (nptII), hygromycin phosphotransferase (hpt) and phosphinothricin
acetyltransferase (bar) (Cheng et al., 2004). These genes confer resistance to kanamycin
and some related aminoglycosides (such as G418 and paromomycin), hygromycin and PPT,
respectively. Transformed cells in these systems are able to survive and non-transformed
cells are killed by the selective agents. This type of selection is referred to as negative
selection. Cereals have proven to be insensitive to relatively high concentrations of
kanamycin. Paromomycin has been used for selection and regeneration of rice, maize, wheat,
oats and barley transformed with the nptII gene. Resistance to hygromycin is encoded by the
aphIV gene (commonly referred to as the hpt gene) of E. coli, which codes for hygromycin
phosphotransferase (HPT). Rice showed relatively high sensitivity to hygromycin. There has
been a move away from antibiotic marker genes in commercial cereal biotechnology because of
associated regulatory concerns, especially in Europe, and the difficulty of using them in
breeding programmes for transgene identification.

Herbicide tolerance genes

A number of herbicides have been used as selective agents in cereal transformation. Markers
have been developed by engineering tolerance to herbicides that inhibit amino acid
biosynthesis. Both herbicides and antibiotics can be used to select materials by addition
to the tissue culture media or by spraying the full-grown plants. They both can be readily
used in breeding programmes to select for the inheritance of linked transgenes. However,
herbicides have a more serious intellectual property problem than antibiotics. A problem
with herbicide resistance genes is that we end up with plants that are herbicide resistant
although this may not be a desired goal. Several strategies have been developed for
engineering herbicide tolerance in transgenic cereals: introducing a herbicide-tolerant
variant of an amino acid biosynthetic enzyme, e.g. a mutant als gene for sulfometuron
methyl (Oust) tolerance, and introducing an enzyme which inactivates the herbicide, e.g.
the bar gene for phosphinothricin (PPT, Liberty) tolerance. Resistance to PPT-based
herbicides using the bar gene from S. hygroscopicus has been used for the selection of
fertile transgenic cereals, e.g. rice, maize, wheat and barley, while Monsanto uses
5-enol-pyruvylshikimate-3-phosphate synthase (EPSPS) and DuPont uses imidazolinone,
chlorsulfuron or acetolactate synthase (ALS).

Engineering detoxification of herbicides that inhibit glutamine synthetase

The enzyme glutamine synthetase (GS) catalyses the synthesis of glutamine from glutamate
and free ammonium. PPT is a glutamate analogue that acts by inhibiting GS activity,
resulting in a cytotoxic accumulation of ammonium. Inactivation of PPT and PPT-containing
herbicides (Liberty) is conferred by the bar gene from S. hygroscopicus, which encodes a
phosphinothricin acetyltransferase.
Removal of marker genes and other unnecessary segments by recombination

Recombinases from phages and yeasts, such as cre, FLP and R, which recombine the specific
sites loxP, FRT and RS, respectively, are powerful tools to remove selectable marker genes
(Ow, 2001) and are effective in a few model systems. A DNA segment placed between two of
the specific recombination sites may be excised from the plant chromosome if the
corresponding recombinase is expressed in the plant cell. For example, transgenic lines
that contained the loxP sites were crossed with lines that expressed the cre recombinase
gene (Moore and Srivastava, 2006). Various sophisticated vector configurations and means to
express the recombinases were reported to exploit this system (Wang, Y. et al., 2005; Jia
et al., 2006). The recombinases may be able to cut out not only marker genes but also other
unnecessary DNA segments. For example, tandem integration of two or more copies of T-DNA in
a single locus has been observed quite frequently (Krizkova and Hrouda, 1998); it is a
troublesome phenomenon because clean, single-copy transformants are generally preferred. If
the T-DNA carries a recombination site, a segment between two of the sites in the tandem
T-DNA could be deleted so that a clean, single T-DNA integration pattern could be
generated.

The multi-auto transformation (MAT) vector system uses recombinase-based excision to enable
the production of marker-free transgenic plants (Sugita et al., 2000). An Agrobacterium
isopentenyltransferase (ipt) gene provides a positive visual selectable marker for
transformation by catalysing cytokinin synthesis and inducing a shooty phenotype on
hormone-free medium. After selection, subsequent excision via the R/RS system produces
marker-free transgenic plants with a normal phenotype, allowing ipt and MAT to be used
again for another round of transformation. Recent improvements to the method have increased
its efficiency and have allowed it to be applied to species that do not regenerate through
cytokinin-dependent organogenesis, but rather via somatic embryogenesis (Endo et al.,
2002b).

Verweire et al. (2007) presented a vector system to obtain homozygous marker-free
transgenic plants without the need for extra handling and within the same period as
transformation methods in which the marker is not removed. By introducing a
germline-specific auto-excision vector containing a cre recombinase gene under the control
of a germline-specific promoter, transgenic plants become genetically programmed to lose
the marker when its presence is no longer required (i.e. after the initial selection of
primary transformants). Using promoters with different germline functionality, two modules
of this genetic programme were developed. In the first module, the promoter, placed
upstream of the cre gene, confers CRE functionality in both the male and the female
germline or in the common germline (e.g. floral meristem cells). In the second module, a
promoter conferring single germline-specific CRE functionality was introduced upstream of
the cre gene.

Recently, Mlynarova et al. (2006) and Luo et al. (2007) showed that it was possible to
remove transgenes (selectable markers and others) efficiently by using an auto-excision
vector in which a promoter that was specifically functional during microsporogenesis, in
pollen or in seed, was placed upstream of a site-specific recombinase gene. More efficient
transmission of the recombined allele to the progeny was observed compared to previously
described auto-excision strategies that rely on chemical or physical induction of the
recombinase. The results presented by Verweire et al. (2007), together with the results
obtained by Mlynarova et al. (2006) and Luo et al. (2007), clearly indicate that
germline-specific auto-excision is an efficient, flexible and versatile system to remove
selectable markers from transgenic plants.

Use of transposons

The maize Ac/Ds transposable element system has been used to create novel T-DNA vectors for
separating genes that are linked together on the same T-DNA after insertion into plants.
The expression of the
Ac transposase from within the T-DNA can induce the transposition of the gene of interest
from the T-DNA to another chromosomal location (Shrawat and Lörz, 2006). This results in
the separation of the gene of interest from the T-DNA and selectable marker gene.

Use of homologous recombination

Homologous recombination between direct repeats provides a method for excising marker genes
after transgenic cells and shoots have been isolated. The strategy uses native plant
enzymes and is simple because it avoids the need for foreign site-specific DNA recombinases
(Corneille et al., 2001; Hajdukiewicz et al., 2001). Efficient implementation of the method
requires high rates of homologous recombination relative to illegitimate recombination
pathways. The procedure works well in plastids, where homologous recombination
predominates. Marker genes are flanked by engineered direct repeats. The number and length
of direct repeats flanking a marker gene influence the excision rate. Excision is automatic
and loss of the marker gene is controlled by selection alone. After transgenic cells have
been isolated, selection is removed, allowing loss of the marker genes. Excision is a
unidirectional process resulting in the rapid accumulation of high levels of marker-free
plastid genomes. Cytoplasmic sorting of marker-free plastids from marker-containing
plastids leads to the isolation of marker-free plants. Marker-free plants can be isolated
following vegetative propagation or among the progeny of sexual crosses.

Use of positive markers

Ebinuma et al. (2001) developed removal systems combined with a positive marker, which are
called MAT vectors. The MAT vector system is designed to use the oncogenes (ipt, iaaM/H,
rol) of Agrobacterium, which control the endogenous levels of plant hormones and the cell
response to plant growth regulators, to differentiate transgenic cells and to select
marker-free transgenic plants. The oncogenes are combined with the site-specific
recombination system (R/RS). At transformation, the oncogenes regenerate transgenic plants
and then are removed by the R/RS system to generate marker-free transgenic plants. The
choice of a promoter for the oncogenes and the recombinase (R) gene, the state of plant
materials and the tissue culture conditions greatly affect the efficiency of both the
regeneration of transgenic plants and the generation of marker-free plants (Ebinuma et al.,
2004). These conditions have been evaluated in several plant species to increase their
generation efficiency and the MAT system has been applied to tobacco and rice (Endo et al.,
2002a, b).

As discussed above, marker-free transgenic cereal plants can be generated at varying
efficiencies using different approaches and techniques, followed by segregation of the
genes in the subsequent sexual generation. However, there are limitations associated with
these techniques (Shrawat and Lörz, 2006). For example, co-transformation technology is not
suitable for all plant species and its efficiency is clearly dependent on a number of
variables, including the Agrobacterium strain and the plant tissue being transformed. In
addition, this technique is labour intensive, requiring the production of a large number of
transgenic plants to isolate the plant of interest. Although site-specific recombinases
hold the greatest promise for the excision of selectable marker genes, concerns also exist
about pleiotropic effects induced by the action of recombinase on cryptic excision sites in
the plant genomes. A transposon to separate the selectable marker gene and gene of interest
(Goldsbrough et al., 1993) is of limited use. Homologous recombination approaches, although
interesting from a scientific point of view, are only effective for a few model systems.

12.5 Transgene Integration, Expression and Localization

Once whole plants are generated and produce seeds, evaluation of the progeny
begins. The transgenic plants should be evaluated for transgene integration, expression and
localization.

12.5.1 Transgene integration

As a part of the regulatory process associated with commercial release of a transgenic
plant product, transgene integration events must be fully characterized. For transgene
technology to be useful, transgenes must have predictable and stable expression.
Technologies have been sought that would enhance our ability to create transgenic plants
with the desired expression characteristics. One of these technologies involves the use of
matrix attachment regions (MARs). MARs are DNA sequences that bind specifically to a
network of proteinaceous fibres, called the nuclear matrix, which permeates the nucleus.
These MAR-matrix interactions are thought to organize chromatin into a series of
independent loop domains. When MARs are positioned at the 5'- and 3'-ends of a transgene,
more predictable expression of the transgene results (Allen et al., 2000).

Transgenic plants often contain complex integration structures at an undetermined genomic
location, which may cause variations in gene expression. It has been demonstrated that the
precise integration of a transgene in a pre-determined genomic location can reduce the
variation in transgene expression (Day et al., 2000). The integration of transgenes in a
pre-determined genomic locus can be achieved by the use of site-specific recombinase
systems, such as cre/lox and FLP/frt (Ow, 2002). Integration by homologous recombination
would favour the establishment of a simple integration pattern and allow the insertion of a
transgene into a known and stable region of the genome.

Individual transgenic lines with complex integration patterns are generally considered
undesirable. There has been a drive to achieve cereal transformation using Agrobacterium
and other target recombination/integration systems. Agrobacterium-mediated DNA integration
is a defined process that generally results in low copy integration of defined T-DNAs,
often into transcriptionally active sites. Gene targeting has the potential to place
foreign gene sequences in predetermined regions of the genome, thus potentially overcoming
so-called position effects on transgene expression. Transposons can be used to deliver
recombination targets for subsequent site-specific integration.

12.5.2 Transgene expression

Transformation technologies can be used for characterizing expression elements using
reporter genes, utilizing transgene expression to modify endogenous metabolic activities,
introducing transgenes conferring novel phenotypic characteristics, inactivating genes
using anti-sense or co-suppression technologies and identifying genes by complementation.
The characterization of constitutive and non-constitutive promoter elements has advanced
the most in cereal transformation, but there are other non-promoter elements that regulate
and control gene expression in transgenic plants, which include transcript termination,
transcript stability, post-transcriptional modification, translation efficiency and protein
targeting.

Transgenes currently used in cereal transformation have a relatively simple structure. They
usually contain: (i) a promoter, usually of plant, bacterial or viral origin, which may be
constitutive (Act1), inducible (Hsp70) or tissue-specific (Amy1) and which may have been
modified for optimal activity; (ii) a coding sequence, which may have been modified for
optimal expression in transgenic plants, e.g. translation initiation site modification,
targeting information, glycosylation site modification and codon usage modification; and
(iii) a transcript termination sequence.

12.5.3 Confirmation of transgene and analysis of gene expression in transgenic plants

Commonly used methods to confirm the putative transgenic plants, as discussed in
gene expression in subsequent generations of primary transformants, can occur at the
transcriptional or post-transcriptional level and the phenomenon has often been associated
with a high transgene copy number (Matzke and Matzke, 1995; Matzke et al., 2000). Studies
have indicated that the problem of transgene silencing raises serious concerns regarding
the selection of transgenic lines for crop improvement with specific trait(s). Therefore,
it now appears imperative that transgenic lines carrying gene(s) of economic importance
need to be carefully tested for gene expression levels over many generations.

Particle bombardment has featured strongly in the burgeoning field of cereal functional
genomics, specifically through the development of transposon-tagged plant lines for the
systematic functional characterization of plant genes. For example, Kohli et al. (2001,
2004) produced a large population of transgenic rice plants tagged with the maize Ac
transposon. They found that this population was suitable for saturation mutagenesis and the
rapid PCR-based cloning of interrupted genes using unique barcode elements present in the
DNA cassette used for transformation (Kohli et al., 2001). Callus induced from specific
transposon-tagged rice plants was maintained in a dedifferentiated state prior to
regeneration into clonal transgenic lines, prolonging the developmental phase characterized
by hypomethylation of genomic DNA (Kohli et al., 2004). This resulted in a dramatically
increased frequency of secondary transposition events compared to seed-derived plants, thus
increasing the rate of genome saturation.

As detailed more fully in Chapter 6 of Cullis (2004), the use of tagged full-length cDNAs
in transgenic plants can be a first step in isolating and identifying the protein complexes
that exist in vivo. Genetic transformation can also be used to develop a protein atlas of
where in the cell each of the genes is expressed. A full-length cDNA can be tagged with a
dye and the tagged probe transformed back into the plant under the control of its native
promoter. The site of the fluorescence will indicate the organ or tissue where the gene is
expressed, as well as the cellular localization of the protein. The reintroduction of the
full-length cDNA into a plant can also result in either overexpression or silencing of that
gene. The subsequent phenotype that is observed provides clues as to the function of the
gene. In addition, overexpression of such a gene, for which a full-length cDNA is
available, can be accomplished in a heterologous system, such as yeast or E. coli, followed
by in vitro studies of the protein function.

Transformation of allelic series into isogenic backgrounds can confirm the function of
individual sequence motifs. However, current plant transformation protocols based on
non-homologous end joining result in random genomic integration of transgenic DNA, position
effects, multiple insertions of the transgene and transgene alterations (Xu, 1997; Hanin
and Paszkowski, 2003), obscuring quantitative phenotypic differences between alleles. This
can be circumvented using homologous recombination-based, locus-targeted integration of
alleles. Recently, 1% of insertion events in rice were found to result from homologous
recombination (Terada et al., 2002). If this finding can be confirmed, rice
genomics-genetics will be revolutionized. Further, if the method can be applied to other
species, a similar advance in genomics of all plants would occur.

Virus-based vectors can be efficiently used for high levels of transient expression of
foreign proteins in transfected plants and permit non-Agrobacterium bacterial species to be
employed for the production of transgenic plants (reviewed by Chung et al., 2006). Viral
vectors hold great promise as efficient tools for transient recombinant protein expression
in plant cells because of their ability to replicate in host cells autonomously
(Marillonnet et al., 2004, 2005). These viral vectors are built on the backbones of
plus-sense RNA viruses, such as tobacco mosaic virus (TMV) or potato virus and have been
used for the expression of foreign sequences in plants (Porta and Lomonossoff, 2002; Gleba
et al., 2004).

The recent development of reliable and efficient Agrobacterium-mediated transformation
technologies for cereals (for review,
see Shrawat and Lörz, 2006; Goedeke et al., 2007) has stimulated a variety of strategies
towards functional gene characterization, thereby paving the way for deeper understanding
of crop plant biology in cereals (Himmelbach et al., 2007). Comprehensive analyses of gene
function include stable transformation with sequences for overexpression or knock-out of
plant genes.

12.5.4 Reporter genes

Reporter genes, whose expression can be easily monitored, are useful in many ways in plant
transformation. Strength and temporal, spatial and other types of regulation of promoters
and other elements may be conveniently assayed by connecting these elements to the reporter
genes. Genes for GUS (Jefferson, 1987), luciferase (Ow et al., 1986) and GFP (Pang et al.,
1996) are popular examples. Gene fusions of the reporters and proteins of interest may be
employed to examine the subcellular localization of the proteins.

Reporter genes that are connected to constitutive promoters may be used to monitor the
process of transformation. The establishment of genetic transformation procedures has
relied on, among other factors, the use of efficient reporter genes, which easily allow the
detection of transgenic events after a transformation experiment, in either a transient or
stable expression assay. Expression of the reporter genes soon after the inoculation of
plant cells with A. tumefaciens is referred to as transient expression. Expression of the
reporter genes later in a cluster of cells growing on selection media is a piece of
evidence for integration of the T-DNA in plant chromosomes. A binary vector that carries a
constitutive selectable marker and a constitutive reporter is very useful as a control
vector both in transformation experiments and in assays of gene expression (Komori et al.,
2007).

It should also be mentioned that gene reporter systems have played a key role in many gene
expression and regulation studies, in which expression of a reporter gene under, for
instance, the direction of different promoters or the presence of different transcription
factors may be investigated. Reporter genes are used in cereal transformation for analysing
gene function, monitoring selection efficiency in both transformed tissue and transgenic
plants and following the inheritance of foreign genes in subsequent plant generations.

Transient expression assays using promoter-reporter fusion genes may be used to analyse
gene regulation and function. There can be incongruity between results obtained from
transient assays and those observed in stably transformed plants. The utility of different
reporter genes in cereal transformation is a function of the properties of the respective
protein products they encode. The properties a good reporter gene should have include:
(i) expression in plant cells; (ii) low background activity in transgenic cereals; (iii) no
detrimental effects on plant metabolism; (iv) only moderate stability in vivo so as to
detect down-regulation of gene expression as well as gene activation; and (v) an assay
system that is non-destructive, quantitative, sensitive, versatile, simple to carry out and
inexpensive. The coral-derived red fluorescent protein DsRed is one of the reporter systems
currently used in cereal transformation that have all these desired properties.

β-Glucuronidase

β-Glucuronidase (GUS) catalyses the hydrolysis and cleavage of a wide range of fluorometric
and histochemical β-glucuronide substrates. Since the GUS gene (gus, gusA, or uidA) was
first isolated from E. coli, many efforts have been made to develop the E. coli uidA gene
as a reporter system for plant transformation. Indeed, it has become the most widely used
marker system, mainly because of the enzyme stability and high sensitivity and amenability
of the assay to detection by fluorometric, spectrophotometric, or histochemical techniques.
In addition, there is little or no detectable GUS activity in almost all higher plant
tissues. The expression of gus gene fusions can be quantified by fluorometric assay.
Histochemical analysis can be used to localize gene activity in transgenic tissues.

There are a number of problems associated with the use of gus reporter genes. The
expression assays of the gus gene are destructive. The GUS protein shows high in vivo
stability, leading to problems when used to monitor gene inactivation. Histochemical
localization of GUS enzyme activity can be leaky. Dependence on the use of gus genes to
monitor the efficiency of cereal transformation protocols has often been misleading.

Luciferase

The product of the firefly (Photinus pyralis) luciferase gene (luc) catalyses the oxidation
of D(−)-luciferin in the presence of ATP to generate oxyluciferin and yellow-green light.
The activity of luciferase gene fusions can be assayed in transformed cereal tissue
non-destructively. There are a number of problems associated with the use of luc reporter
genes. First, penetration of the luciferin substrate can be limiting in whole plant
material. Secondly, detection equipment presently needed to monitor luciferase gene
expression is relatively expensive. luc genes are widely used as an internal standard with
gus fusions constructed to study gene expression in transient assays and in transgenic
plants.

Anthocyanin biosynthetic pathway genes

C1, B and R genes code for trans-acting factors that regulate the anthocyanin biosynthetic
pathway in maize seeds. Introduction of these regulatory genes, with constitutive
promoters, into cereal cells induces cell-autonomous pigmentation in non-seed tissues. This
reporter system does not require the application of external substrates for its detection.

Green fluorescent protein

cations. The GFP gene was isolated from a jellyfish (Aequorea victoria) in 1992 and has
since been modified for specific applications and transformed into many different
organisms. GFP monitoring has the potential to track transgenes over large spatial scales
utilizing visual or instrumental detection of the characteristic green fluorescence of
transgenic materials. There are other versions of GFP fluorescing at different wavelengths
that allow detection of multiple proteins. GFP expression in mammalian cells yields a green
fluorescence when excited by blue light; activity does not require additional gene products
or exogenous substrates, and detection is non-destructive.

GFP showed relatively weak activity in transformed plant cells and a number of
modifications have been made to increase GFP expression in plants. The modifications
include: (i) point mutations to increase signal intensity and shift the excitation peak;
(ii) mutations to alter codon usage for efficient translation and increased mRNA stability;
(iii) mutation to remove cryptic intron splice junctions to increase mRNA processing and
stability; (iv) subcellular localization, targeting to the endoplasmic reticulum, to reduce
mild phytotoxicity; and (v) mutation to inhibit thermosensitive protein misfolding. The
mgfp5-er variant gene has been shown to be a feasible transgene monitor in plants under
field conditions (Haseloff et al., 1997; Harper et al., 1999). GFP has also been shown to
be a feasible qualitative marker for the presence of a linked synthetic Bt cry1Ac endotoxin
transgene (Harper et al., 1999; Halfhill et al., 2001). With these beneficial
characteristics, the next step in the development of a GFP monitoring system is to better
describe the system and resolve weaknesses that could limit the utility of the monitoring
system (Halfhill et al., 2004b).
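
Modification (ii) in the list of GFP changes, altering codon usage while leaving the encoded
protein unchanged, can be illustrated with a short sketch. The Python below is only a
minimal illustration under stated assumptions, not the procedure used for any published GFP
variant: the PREFERRED table and the example sequence are hypothetical, chosen to show that
swapping synonymous codons changes the DNA but not the translated peptide. The codon table
itself is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preferred codons for a target host (illustrative only,
# not measured codon usage for any real plant species).
PREFERRED = {"S": "AGC", "K": "AAG", "G": "GGC", "E": "GAG", "L": "CTG"}


def translate(cds):
    """Translate a coding sequence (length divisible by 3) into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds, preferred):
    """Replace each codon by the preferred synonymous codon, if one is listed."""
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        # Codons whose amino acid has no listed preference are kept unchanged.
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


original = "ATGAGTAAAGGAGAAGAACTTTTC"  # hypothetical 8-codon fragment
recoded = recode(original, PREFERRED)

# The DNA sequence changes, the encoded peptide does not.
assert translate(recoded) == translate(original)
print(original, "->", recoded, "| peptide:", translate(recoded))
```

A real codon-optimization step would draw the preference table from measured codon usage in
the host species and would also check for unwanted features such as cryptic splice sites,
which this sketch does not attempt.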
crossing plants within the same species or with closely related species to bring different genes together. The growing interest in dissecting and analysing complex metabolic pathways and the need to exploit the full potential of multi-gene traits for plant biotechnology (for review, see Halpin and Boerjan, 2003; Tyo et al., 2007) provide a mandate for the development of new methods and tools for the integration of multiple transgenes into the plant genome (multi-transgene pyramiding or stacking) and coordinated expression of these transgenes in transformed plants.
Several approaches can be considered when using single-gene vectors for the delivery of multiple genes into plant cells (Halpin et al., 2001; Daniell and Dhingra, 2002; Halpin and Boerjan, 2003). Some of the approaches used for the production of transgenic plants carrying multiple new traits include: (i) re-transformation (Singla-Pareek et al., 2003; Seitz et al., 2007), the stacking of several transgenes by successive delivery of single genes into transgenic plants; (ii) co-transformation (Li, L. et al., 2003; Altpeter et al., 2005a), the combined delivery of several transgenes in a single transformation experiment; and (iii) sexual crosses (Ma et al., 1995; Zhao et al., 2003; Lucker et al., 2004) between transgenic plants carrying different transgenes.
In this section, several transgene-stacking/pyramiding methods are discussed, drawing mainly on two reviews by François et al. (2002a) and Dafny-Yelin and Tzfira (2007) and on a revision of the different multi-transgene-pyramiding methods. Table 12.3 summarizes the advantages, disadvantages and examples of the different multi-transgene-stacking methods in plants.

12.6.1 Sexual crosses

In a crossing experiment, two plants are crossed to obtain progeny that combine the traits of the two parents. In the case of transgenic plants, a first gene is introduced in one of the parents and a second gene in the other. Crossing both transgenic parental lines results in progeny of which 25% (if both parents were hemizygous for the transgenes) or all (if both parents were homozygous for the transgenes) contain the two transgenes.
The main advantage of the crossing-based method for transgene stacking is that the method is technically simple. It only involves transfer of pollen from one parent to the female reproductive organ of the other. Another advantage is that transgenic populations of each parent can be screened for optimal expression of each transgene, thus facilitating the combination of two optimally expressed transgenes. However, the procedure is relatively time-consuming, particularly if more than two transgenes need to be combined by sequential crossing. The two transgenes in the lines resulting from the cross will most probably reside at different chromosomal loci, which complicates further breeding through conventional methods. Furthermore, for some agronomically important crops like potato and cassava, the high level of heterozygosity in the species makes crossing approaches difficult and time-consuming. Crossing is very difficult to apply to plants that are vegetatively propagated (e.g. perennial fruit crops and many ornamentals) since the (desired) heterozygous nature of the genetic background will be altered due to recombination during meiosis (Gleave et al., 1999).
Sexual crosses among transgenic plants make it possible to exploit powerful super-traits that are not attainable through traditional methods. One example of a crop carrying such new characteristics is Monsanto's multi-stacked maize, which was produced via conventional crossing of three inbred transgenic maize lines: MON863, MON810 and NK603. The elements incorporated into this multi-stack include five loci, four of which carry a synthetic gene linked to combinations of strong regulatory elements from viruses, bacteria and unrelated plants. Expression of the first two synthetic genes produces an EPSPS that resembles the EPSPS from E. coli and is, unlike most plant versions, not inactivated by herbicides containing glyphosate. The third synthetic
Table 12.3. Summary of advantages, disadvantages and examples of multi-transgene-stacking methods in plants.
Method | Advantages | Disadvantages | Examples
Crossing | Technically simple; pre-selection of parents with optimal gene expression | Time-consuming; difficulties in further breeding; not applicable to vegetatively propagated plants | Mercury detoxification (Bizily et al., 2000); antibody engineering (Hiatt et al., 1989); antimicrobial resistance (Zhu et al., 1994)
Sequential transformation | Applicable to vegetatively propagated plants; allows maintenance of elite genotype | Time-consuming; necessity for different selection markers | Plant fertility restoration system (Hird et al., 2000); removal of selectable marker gene (Gleave et al., 1999)
Co-transformation, single plasmid | Linked integration (a); single transformation event | Technically demanding; linked integration (a) | Reporter gene expression (Christou and Swain, 1990)
Co-transformation, multiple plasmid | Technically simple; single transformation event | Dependence on co-transformation frequency | Production of vitamin A-enriched rice (Golden Rice) (Ye et al., 2000);
(a) Linked integration of the transgenes can be advantageous when the transgenic line is to be used in traditional breeding, whereas linked integration can be undesirable when one of the transgenes (e.g. the selectable marker gene) is to be removed via outcrossing.
(b) IRES, internal ribosome entry site.
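The segregation ratios quoted in Section 12.6.1 (25% of progeny carrying both transgenes when both parents are hemizygous, all progeny when both are homozygous) follow from independent Mendelian transmission. The following is a minimal sketch, assuming each parent carries a single-locus transgene that the other lacks and that the two loci are unlinked; the function names are illustrative and do not come from the text.

```python
from fractions import Fraction


def transmission(copies: int) -> Fraction:
    """Fraction of gametes carrying a single-locus transgene.

    copies = 0 (absent), 1 (hemizygous) or 2 (homozygous).
    """
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    return Fraction(copies, 2)


def fraction_with_both(copies_in_parent1: int, copies_in_parent2: int) -> Fraction:
    """Expected fraction of F1 progeny carrying both transgenes.

    Assumption: parent 1 carries only the first transgene and parent 2 only
    the second, so a seedling has both only if each gamete delivers its own.
    """
    return transmission(copies_in_parent1) * transmission(copies_in_parent2)


hemizygous_cross = fraction_with_both(1, 1)  # 1/2 * 1/2
homozygous_cross = fraction_with_both(2, 2)  # 1 * 1
print(f"both parents hemizygous: {float(hemizygous_cross):.0%}")
print(f"both parents homozygous: {float(homozygous_cross):.0%}")
```

Under the same assumptions, a cross between one hemizygous and one homozygous parent would be expected to give half of the progeny carrying both transgenes, which is why screening parents for zygosity before crossing reduces the population that has to be raised.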
gene encodes the insecticidal Cry3Bb1 protein with activity against specific Coleoptera, whereas the fourth gene product, Cry1Ab, provides tolerance against certain Lepidopteran insects. The fifth gene is a bacterial kanamycin resistance gene encoding neomycin phosphotransferase (nptII). The quintuple-stack maize currently occupies millions of hectares in the USA and supports a substantial reduction in pesticide usage.

12.6.2 Co-transformation via plasmids

Co-transformation is defined as the simultaneous introduction into a cell of multiple genes followed by the integration of the genes in the cell genome. The genes are either present on the same plasmid used in transformation (single-plasmid co-transformation) or on separate plasmids (multiple-plasmid co-transformation). The main advantage of co-transformation for transfer of multiple genes into a plant is that a single transformation event can result in the integration of multiple transgenes, as opposed to sequential transformation, which requires multiple, time-consuming transformation events.
Co-transformation does, however, have some technical limitations. For single-plasmid co-transformation, the main technical limitation is the difficulty of assembling complex plasmids with multiple gene cassettes (François et al., 2002a). Standard transformation vectors are poorly suited to such a task. A major problem is that their multiple cloning sites consist merely of hexa-nucleotide restriction sites, which are often present within one or more of the sequences that one wishes to insert in the vector. Insertion of more than one or two expression cassettes often requires inefficient partial digests, the use of linkers to convert one restriction site to another, or the use of inefficient blunt-end cloning. When plant transformation vectors with multiple expression cassettes are eventually finalized, it is often not possible to move or replace the cassettes in single cloning steps, due to the presence of restriction sites at undesired locations.
Co-transformation with multiple plasmids has the obvious advantage that assembly of the different expression cassettes is technically easier, as it is done independently on different plasmids (Komari et al., 1996). The success of this technique depends on the frequency with which two (or more) independent transgenes are both transferred to the plant cell and integrated into the cell genome (the co-transformation frequency). Agrawal et al. (2005) transformed rice simultaneously with five minimal cassettes, each containing a promoter, coding region and polyadenylation site but no vector backbone. They found that multi-transgene co-transformation was achieved with high efficiency using multiple cassettes, with all transgenic plants generated containing at least two transgenes and 16% containing all five. They concluded that gene transfer using minimal cassettes is an efficient and rapid method for the production of transgenic plants containing and stably expressing several different transgenes. Their results facilitate effective manipulation of multi-gene pathways in plants in a single transformation step.

12.6.3 Co-transformation via particle bombardment

Particle bombardment is the most convenient method for multiple gene transfer to plants since DNA mixtures comprising any number of different transformation constructs can be used, with no need for complex cloning strategies, multiple Agrobacterium strains or sequential crossing (Altpeter et al., 2005a). Many studies describe successful integration of two or three different transgenes, in addition to the selectable marker, into plants by particle bombardment.
Wu, L. et al. (2002) examined the co-transformation of rice with nine transgenes via particle bombardment and documented the levels of transgene expression. They found that non-selected transgenes were present along with the selectable marker in about 70% of the plants and that 56% carried seven or more genes. This was much higher than expected given the independent integration frequencies, agreeing with a model proposing that the integration of one gene into a specific locus in the rice genome could mediate the insertion of other genes into the same locus (Kohli et al., 1998). This phenomenon is important when large numbers of genes are considered, since a much larger transgenic population would be required if each integration event were independent. Wu, L. et al. (2002) also found that all nine transgenes were expressed and that the expression of each gene was independent of the others. These findings are very useful in designing multiple-plasmid transformation experiments such as those required for plant metabolic engineering.
One of the most interesting recent developments of particle bombardment is the combination of multiple gene transfer and clean DNA techniques, i.e. the simultaneous transfer of multiple gene cassettes into rice plants. Three coat protein genes from the same virus were introduced simultaneously to generate rice plants with pyramidal resistance against a single pathogen (Sivamani et al., 1999). Similarly, Maqbool et al. (2001) have shown how the same transformation strategy can provide pyramidal insect resistance in rice. Datta et al. (2003) have succeeded in the development of Golden indica rice lines containing four genes, i.e. those required to extend the existing carotenoid metabolic pathway (psy, crtI and lcy) in addition to the selectable marker gene, either phosphomannose isomerase (pmi) or hygromycin phosphotransferase (hpt). Romano and colleagues synthesized polyhydroxyalkanoates (PHAs) in transgenic potatoes by simultaneously introducing the phaG and phaC genes encoding acyl-CoA trans-acylase and PHA polymerase along with the neomycin-phosphotransferase selectable marker in three separate constructs (Romano et al., 2005).
In addition to applications in metabolic engineering and multi-gene resistance strategies, the direct transfer of multiple genes has also become a practical strategy for generating crops that produce multimeric proteins. For example, Nicholson et al. (2005) produced full-sized multimeric antibodies in transgenic plants. These proteins comprise at least two components, the heavy and light chains, but more complex antibody forms such as secretory antibodies (sIgA) also require a joining chain and a secretory component. Nicholson et al. (2005) simultaneously delivered all four genes, together with a fifth gene encoding a selectable marker, into rice by particle bombardment.
For many applications of transgenesis, production of different heterologous proteins, and hence introduction of multiple transgenes (multi-transgene-stacking), is highly desired. During the last decade, the number of approaches for multi-transgene-stacking in plants using transgenesis has significantly increased. For all the benefits and simplicity of combining co-transformation, retransformation and crosses while using single-gene vectors for the delivery of multiple genes into plant species, these methods suffer from several drawbacks. These include the undesirable incorporation of a complex T-DNA integration pattern, often observed during integration of T-DNA molecules from multiple sources (De Neve et al., 1997; De Buck et al., 1999), and the time needed for retransformation or crosses between transgenic plants. More importantly, transgenes derived from different sources typically integrate at different locations in the plant genome, which may lead to various expression patterns and possible segregation of the transgenes in the offspring.
Apart from those discussed above, other approaches for transgene stacking include vector assembly, internal ribosome entry site (IRES), transplastomic technology and the polyprotein approach. After evaluating the pros and cons of the different methods, one should now be able to select an appropriate approach for most purposes. Moreover, the potential of the different methods can be significantly increased by combining approaches. For example, for the delivery of different antimicrobial protein (AMP) genes, it has been possible to double the capacity of a modular plant transformation vector by combining it with a polyprotein strategy (Goderis et al., 2002). For this purpose, single transgene units of the original vector were replaced by
poly-AMP encoding expression cassettes and transformed into A. thaliana. Single biologically active AMPs could be demonstrated in the resulting transgenic plants.

12.7 Transgenic Crop Commercialization

Genetic transformation has the potential to address some of the most challenging biotic and abiotic constraints faced by farmers in non-industrialized agriculture, which are not easily addressed through conventional plant breeding alone. The major constraints include insect pests and viruses, as well as drought. A second advantage of genetic transformation is that it can add an economically valuable trait while maintaining other desirable characteristics of the host cultivar. For example, enhanced product quality or micronutrients can be added to a well-adapted cultivar that already yields well under local conditions. This feature is particularly attractive for semi-commercial, small-holder farmers in non-industrialized agriculture, who are more likely to consume as well as sell their farm products. The poor of the developing world should benefit from the deployment of desirable transgenic crops that follows scientifically sound biosafety and food safety standards and appropriate intellectual property management and stewardship (Ortiz and Smale, 2007).
Intrinsic to the production of transgenic plants is an extensive evaluation process to verify whether the inserted gene has been stably incorporated without detrimental effects on other plant functions, product quality or the intended agroecosystem. Initial evaluation includes attention to activity of the introduced gene, stable inheritance of the gene and unintended effects on plant growth, yield and quality.
If a plant passes these tests, it may not be used directly for crop production, but will instead be crossed with improved cultivars of the crop. This is because not all cultivars of a given crop can be efficiently transformed and those that can generally do not possess all the producer and consumer qualities required of modern cultivars. The initial cross to the improved cultivar must be followed by several cycles of repeated backcrosses to the improved parent. The goal is to recover as much of the improved parent's genome as possible, with the addition of the transgene from the transformed parent.
The next step in the process is multi-location and multi-year evaluation trials in greenhouse and field environments, as described in Chapter 10, to test the effects of the transgene and overall performance. This phase also includes evaluation of environmental effects and food safety.

12.7.1 Commercial targets

Commercialization of transgenic products is influenced by markets, i.e. consumer demand for improved processes and new products, and depends on technology arising from scientific discoveries in molecular genetics and biochemistry. Some examples of commercial targets include:
1. Hybrid seed systems for heterosis and intellectual property protection, such as nuclear male sterility systems for inbred line production.
2. Pest and disease tolerance genes: Bt genes, α-amylase inhibitors, viral coat proteins.
3. Stress tolerance genes: barley Hva1, maize ZmPLC1.
4. Herbicide resistant crops: mutation screens for resistance to sethoxydim (Poast), an acetyl-CoA carboxylase (ACCase) inhibitor; transgenic plants for glyphosate (Roundup) resistance.
5. Genes for commercially valuable oils, proteins and starches: fatty acid biosynthetic gene modification in high oil corn; modification of seed storage proteins; generation of transgenic corn with improved amino acid profiles; manipulation of carbon-partitioning genes for novel starch production.
6. Genes for improved plant performance: generation of dwarf cultivars of wheat and rice; PhyA expression for narrow-row crop production.
Most of these potential products generate revenue by lowering the costs (financial and/or environmental) of plant production,
e.g. reducing the level of chemical inputs such as insecticides against both pests and virus vectors. The following are examples of transgenes of agronomic importance that have been introduced into transgenic cereals:
Maize:
1. Insect resistance: synthetic truncated version of the CryIA(b) protein from Bacillus thuringiensis for tolerance to European corn borer.
2. Virus resistance: coat protein-mediated tolerance to maize dwarf mosaic virus.
3. Herbicide resistance: the bar gene for PPT (Liberty) tolerance; mutant EPSPS genes for glyphosate (Roundup) tolerance; and mutant als gene for sulfonylurea (Glean) tolerance.
Rice:
1. Resistance to fungal and bacterial pathogens: chitinase gene conferring enhanced tolerance to sheath blight; Xa-21 bacterial blight resistance gene.
2. Virus resistance: coat protein-mediated rice stripe virus tolerance; coat protein-mediated rice dwarf phytoreovirus tolerance.
3. Insect resistance: Bt CryIA(b) gene expression for leaf folder and stem borer tolerance.
4. Improved grain quality: bean β-phaseolin seed storage gene expression in endosperm for improved lysine and isoleucine levels.
Barley:
1. Virus resistance: coat protein-mediated barley yellow dwarf virus tolerance.
2. Improved malting/brewing characteristics: hybrid bacterial β-glucanase gene expression for enzyme thermotolerance.
Wheat:
1. Improved bread-making characteristics: chimeric Dy10-Dx5 high molecular weight gluten gene expression in endosperm.
2. Transgenes conferring herbicide resistance: the bar gene for PPT (Liberty) tolerance; mutant EPSPS genes for glyphosate (Roundup) tolerance.

12.7.2 Current status of transgenic crop commercialization

Commercial adoption by farmers of transgenic crops has been one of the most rapid cases of technology diffusion in the history of agriculture (Borlaug, 2000). Commercialization of transgenic crops started in 1996. Fig. 12.6 provides data on the global areas of biotech/GM-crops grown over the last 12 years (1996-2007) (James, 2008). As a result of consistent and substantial benefits during the first dozen years of commercialization, farmers have continued to plant more biotech/GM-crops every single year. In 2007, the global area of biotech/GM-crops reached 114.3 million ha, an unprecedented 67-fold increase between 1996 and 2007, making it the fastest adopted crop technology in recent history. The proportion of the global area of biotech/GM crops grown by developing countries has increased consistently and by 2007, 43% of the global biotech crop area, equivalent to 49.4 million ha, was grown in developing countries (Table 12.4). The USA, Argentina, Brazil, Canada, India and China are, in that order, the principal adopters of biotech/GM crops globally, with the USA retaining its top world ranking with 57.7 million ha (50% of global biotech area) (Table 12.4). Notably, 63% of biotech/GM-maize, 78% of biotech/GM cotton and 37% of all biotech/GM crops in the USA in 2007 were stacked products containing two or three traits that delivered multiple benefits.
[Figure: global area of biotech/GM crops in million ha (axis 0-100) by year, 1996-2007.]
Fig. 12.6. Global area of biotech/GM crops (1996-2007). From James (2008) with permission.
Table 12.4. Global area of biotech/GM-crops in 2007 by country (from James (2008) with permission).
Soybean is the principal biotech/GM-crop, occupying 58.6 million ha (51% of global biotech/GM area), followed by fast-growing maize (35.2 million ha at 31%), cotton (15.0 million ha at 13%) and canola (5.5 million ha at 5% of global biotech/GM-crop area) (Fig. 12.7A; James, 2008).
Since the genesis of commercialization in 1996, herbicide tolerance has consistently been the dominant trait (Fig. 12.7B). In 2007, herbicide tolerance, deployed in soybean, maize, canola, cotton and lucerne, occupied 63% or 72.2 million ha of the global biotech/GM-crops.
[Figure: two panels showing area in million ha by year, 1996-2007. Panel A series: soybean, maize, cotton, canola. Panel B series: herbicide tolerance, insect resistance, herbicide tolerance/insect resistance.]
Fig. 12.7. Global area of biotech/GM crops (1996-2007). (A) By crops. (B) By traits. From James (2008) with permission.
The most recent survey of the global impact of biotech/GM-crops for the period 1996-2006 estimates that the global net economic benefits to biotech/GM-crop farmers were US$7 billion in 2006 and US$34 billion (US$16.5 billion for developing countries and US$17.5 billion for industrial countries) for the accumulated benefits during the period 1996-2006; these estimates include the very important benefits associated with the double cropping of biotech/GM-soybean in Argentina (Brookes and Barfoot, 2008). The cumulative reduction in pesticides for the period 1996-2006 was estimated at 289,000 t of active ingredient, which is equivalent to a 15.5% reduction in the associated environmental impact of pesticide use on these crops, as measured by the Environmental Impact Quotient (EIQ), a composite measure based on the various factors contributing to the net environmental impact of an individual active ingredient.
While 23 countries planted commercialized biotech/GM-crops in 2007, an additional 29 countries, bringing the total to 52, have granted regulatory approvals for biotech/GM-crops for import for food and feed use and for release into the environment since 1996. A total of 615 approvals have been granted for 124 events for 23 crops. Thus, biotech/GM-crops have been accepted for import for food and feed use and for release into the environment in 29 countries, including major food importing countries like Japan, which do not plant biotech/GM-crops.
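The adoption figures quoted in Section 12.7.2 can be cross-checked against one another. The sketch below uses only numbers stated in the text (James, 2008; Brookes and Barfoot, 2008); the variable and function names are illustrative, and the implied 1996 area and the compound annual growth rate are derived here under the assumption of smooth growth, not reported by the source.

```python
# Areas quoted in the text for 2007, in million hectares.
GLOBAL_AREA_2007 = 114.3
CROP_AREA = {"soybean": 58.6, "maize": 35.2, "cotton": 15.0, "canola": 5.5}
QUOTED_SHARE = {"soybean": 51, "maize": 31, "cotton": 13, "canola": 5}


def share_percent(area: float, total: float = GLOBAL_AREA_2007) -> int:
    """Share of the global biotech/GM area, rounded to a whole percent."""
    return round(100 * area / total)


# Per-crop shares recomputed from the quoted areas.
for crop, area in CROP_AREA.items():
    print(f"{crop:8s} {area:5.1f} Mha  {share_percent(area)}% "
          f"(quoted {QUOTED_SHARE[crop]}%)")

# Other shares quoted in the text.
developing_share = share_percent(49.4)   # quoted as 43%
usa_share = share_percent(57.7)          # quoted as 50%
herbicide_share = share_percent(72.2)    # quoted as 63%

# A 67-fold increase implies a 1996 area of about 114.3 / 67 million ha.
implied_area_1996 = GLOBAL_AREA_2007 / 67

# Derived assumption: constant compound growth over the 11 yearly
# intervals between 1996 and 2007.
annual_growth = 67 ** (1 / 11) - 1

# Countries with approvals and accumulated benefits (US$ billion).
countries_total = 23 + 29
benefits_total = 16.5 + 17.5

print(f"implied 1996 area: {implied_area_1996:.2f} Mha")
print(f"implied annual growth: {annual_growth:.1%}")
print(f"countries: {countries_total}, benefits: US${benefits_total:.0f} billion")
```

The recomputed percentages agree with the rounded values in the text, and the four crop areas sum to the quoted global total, which suggests the quoted figures are internally consistent.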
The most important potential contribution of biotech/GM-crops will be to the humanitarian Millennium Development Goals (MDG) of reducing poverty and hunger by 50% by 2015. With a dozen years of accumulated knowledge and significant economic, environmental and socio-economic benefits, biotech crops are poised for even greater growth in coming years, particularly in developing countries that have the greatest need for this technology. The number of biotech/GM-crop countries, crops and traits and the hectarage are projected to double between 2006 and 2015, the second decade of commercialization (James, 2008).
Despite globally organized opposition, few innovations in agriculture have spread as rapidly as GM-crops. Still, much remains to be done, particularly the expansion of disease-resistant cultivars, increased yields, biofortification of food for poor consumers, substitution of plant-produced targeted endotoxins for broad-band pesticides and, perhaps most crucially, drought-tolerant and salt-tolerant cultivars (Herring, 2008). Disaggregating the concept of the genetically modified organism (GMO) is a necessary condition for confronting misconceptions that constrain the use of biotechnology in addressing imperatives of development and escalating challenges from nature. There are still several problems associated with commercialization. There is a high investment cost associated with the long lead time (8-10 years) for products to reach the marketplace; profitability of some technology-push products at the onset of product development is uncertain; intellectual property issues limit freedom to operate with key technologies; and uncertainty associated with regulatory and consumer acceptance issues inhibits trade and investment.

12.7.3 Regulating transgenic crops

There are many reasons why governments regulate and oversee processes and products for transgenic crops (Jaffe, 2004). One major concern about the use of GM-foods is that the molecular alterations designed to produce a beneficial trait could also result in unintentionally hazardous effects. Individual transgenic crops could potentially present risks to humans or the environment, although there is no evidence that this has happened over the 10 years of commercialization. In general, a strong, but not stifling, regulatory system needs to be established and properly implemented to ensure that crops are safe for humans and the environment. Other reasons include fraud avoidance and social, ethical and public concerns.

Risk assessment

All transgenic plants are required to undergo thorough and rigorous safety and risk assessments before commercialization. A risk assessment consists of hazard identification, hazard characterization, exposure assessment and risk characterization (Codex Alimentarius Commission, 2001; Craig et al., 2008; Nickson, 2008). Regulatory justifications for these assessments differ between countries. In most countries there are two kinds of regulations that govern research and development of transgenic plants: (i) contained-use rules governing genetic modification in the laboratory, concentrating mainly on worker health and safety issues; and (ii) field-release regulations focusing on environmental risk assessment appropriate to the nature and final use of the transgenic plant. Each release is considered case by case to build up experience with particular crop and transgene combinations.
The United States National Research Council identified four categories of potential environmental hazards from the release of a transgenic crop:
(i) hazards associated with the movement of the transgene itself with subsequent expression in a different organism or species, (ii) hazards associated directly or indirectly with the transgenic plant as a whole, (iii) non-target hazards associated with the transgene product outside the plant and (iv) resistance evolution in the targeted pest population.
(NRC, 2002)
The potential human hazards from transgenic crops are also generally recognized by
the scientific and regulatory communities (Jaffe, 2004). The potential risks generally relate to:
the possibility of introducing new allergens or toxins into food-plant varieties, the possibility of introducing new allergens into pollen, or the possibility that previously unknown protein combinations now being produced in food plants will have unforeseen secondary or pleiotropic effect.
(NRC, 2001)
However, not all tests currently being applied to assessing allergenicity have a sound scientific basis (Goodman et al., 2008). Therefore, factors to be borne in mind in the risk assessment should include, but are not limited to: (i) the function of the gene in the donor organism; (ii) the effect of the transgene on the phenotype of the transgenic plant; (iii) evidence of toxicity and/or allergenicity, e.g. Brazil nut seed storage protein; (iv) persistence in agricultural habitats (weediness); (v) invasiveness in natural habitats; (vi) impact on non-target organisms, e.g. Bt maize, for which regulation requires extensive analysis to identify any potential problem; and (vii) the likelihood/consequence of transgene movement to other plants by cross-pollination or to other (pathogenic) organisms by horizontal gene transfer, e.g. sexual compatibility between cultivated oats (Avena sativa) and wild oats (Avena fatua), viral host range extension by transgene-encoded coat protein transencapsidation or genetic recombination. More recent discussion on several specific issues can be found in Craig et al. (2008), Nickson (2008), Romeis et al. (2008) and Tabashnik et al. (2008).
The risk assessment process for transgenic plants consists of two steps: (i) a comparative analysis (substantial equivalence) to identify potential differences from their non-engineered counterpart(s); followed by (ii) an assessment of the environmental and food/feed safety or nutritional impact of any identified differences (Ramessar et al., 2007).

Regulatory systems

There are two kinds of regulation systems (the data assessment in both is similar): (i) horizontal, a process-based system that applies to all plants produced by transformation methods, e.g. Europe; and (ii) vertical, a product-based system that defines the characteristics of modified plants that require them to be regulated, e.g. the USA.
In the USA, transgenic plants are regulated by three federal agencies: the Department of Agriculture (USDA), the Environmental Protection Agency (EPA) and the Food and Drug Administration (FDA). The USDA controls permits for inter-state movement of transgenic materials, assesses the pest character of transgenic plants and determines when transgenic plants can be field-grown without notification or permits. The FDA determines whether a transgenic plant has been adequately evaluated in accordance with its biotechnology food and feed policy, e.g. for the safety of antibiotic selectable marker genes. The EPA regulates plants with pesticide properties, e.g. Bt plants, and registers herbicides to be used on herbicide-resistant plants.
In the European Union (EU), EU Directive 2001/18/EC (European Parliament, 2001) sets forth regulations governing the deliberate release into the environment of GMOs. The Directive put in place a step-by-step approval process based on a case-by-case assessment of the risks to human health and the environment before any GMO or products consisting of or containing GMOs can be released into the environment or placed on the market (European Parliament, 2003). There are no European Community (EC)-wide regulations governing novel feed and foods, and some countries have established their own national regulations.
In Japan, transgenic plants are regulated by the Ministry of Agriculture, Forestry and Fisheries and the Ministry of Health and Welfare. In Canada, transgenic plants are regulated by Agriculture and Agri-Food Canada and Health Canada. Novel products are regulated in the same way whether they are generated by mutation or transformation.
There is no international harmonization of regulations to ensure that transgenic plant cultivars released in one country will be accepted in another. Antibiotic resistance genes in food products might inhibit international trade in transgenic products.
12.7.4 Product release and marketing or beside fields containing three kinds of
strategies inherited herbicide resistance, dominant,
recessive, or maternal. Over the 6-year
In order to recover their substantial research study, in the absence of herbicide selection,
investment the developer of a potential the maternal chloroplast-inherited resist-
commercial product must either generate ance was observed at a 2 × 10⁻⁶ frequency in
and market the transgenic seed directly or the weed populations. Resistant weed plants
negotiate a royalty with a seed company/ were observed 60 times as often, at 1.2 × 10⁻⁴
companies, e.g. universities, government in the case of the nuclear recessive resistance
agencies, technology development compa- and 190 times as often, at 3.9 × 10⁻⁴ in the
nies, large agrochemical companies. The case of the dominant resistance. The results
marketing strategies for any one particular indicated that the hereditary mode of trans-
trait will be influenced by the nature of the mission of transgenes played a major role in
product. Herbicides, disease or stress toler- interspecific gene flow. More recently visual
ance traits which enhance yield or reduce markers such as GFP have been proposed for
inputs will help increase market shares use, using whole plant expression to moni-
and/or increase seed sale premiums, e.g. Bt tor gene flow under agricultural conditions.
maize. Herbicide tolerant crops will help This method has been used successfully to
benefit from increased chemical sales, e.g. assess outcrossing events in canola (Brassica
Roundup Ready maize. Improved grain napus) under field conditions (Halfhill et al.,
quality, for on-farm/downstream processing 2004a). A direct method could be the use of
uses, will influence a seed sale premium, GFP-tagged pollen to monitor pollen move-
e.g. high lysine maize. ment under field conditions. This system
would allow the quantification of pollen
flow directly from a group of individuals in
the field and would determine the distance
12.7.5 Monitoring transgenes and directional patterns of pollen dispersal
within a plant population. In Hudson et al.s
One of the principal concerns of GM-crops (2004) report, a pollen specific promoter
is the likelihood and possible consequence was used to express the GFP gene in tobacco
of the introduced transgenes being trans- (Nicotiana tabacum L.). GFP was visualized
ferred through pollen dispersal to wild rela- in pollen and growing pollen tubes using
tives or non-transgenic crops (Chandler and fluorescence microscopy. Furthermore, the
Dunwell, 2008). For pollen-mediated gene goal of the research was to compare the
flow to occur among plant populations, dis- dynamics of pollen movement with that
persal of pollen to a different population of gene flow by using another method of
must occur with successful fertilization of whole plant expression of GFP to estimate
an ovule. Although the movement of pollen the outcrossing rate by progeny analysis.
is a critical step in transgene escape, there Pollen movement and gene flow were quan-
are currently few systems for the direct tified under field conditions. Pollen was col-
monitoring of transgenic pollen movement lected in traps and screened for the presence
under field conditions. Previous attempts to of GFP-tagged pollen using fluorescence
measure gene flow have evolved around the microscopy. Progeny from wild-type plants
analyses of genetic markers (Slatkin, 1985) were screened with a hand-held ultraviolet
as discussed in Chapter 13. These systems light for detection of the GFP phenotype. It
have limitations because they are species- should be noted that the GFP gene is from an
specific, requiring the use of expensive animal or fly and thus it should be handled
assays that hardly yield results in real time very carefully. The examples given here are
or in the field. Shi et al. (2008) reported on only proposals by researchers and they are
the gene flow between foxtail millet (Setaria not used for commercial purposes.
italica), an autogamous crop and its weedy A built-in strategy was developed to
relative, Setaria viridis, growing within create selectively terminable transgenic rice,
more genes with well-characterized function and tools optimized for transgene expression.
Molecular markers can be used to facilitate the transformation process, to transfer the
transgenes to a different genetic background and to identify and select transgenic plants
as discussed in Chapter 13. Genetic transformation will also be increasingly combined with
conventional breeding approaches, which will contribute to improved breeding efficiency.

As technology develops in transgenic breeding, transformation technologies that limit many
steps in transgenic breeding will become less demanding compared to the discovery and
characterization of genes and commercialization of transgenic products. It can be expected
that transgenic breeding will become increasingly important by producing good-quality and
high-yielding agricultural products. All regulatory and biosafety issues, both of which are
man-made and currently slow or stop the adoption of transgenic crops by farmers in many
countries, will be brought under control at a reasonable level.
13
Intellectual Property Rights and Plant Variety Protection
2005), Chan (2006), Louwaars et al. (2006), a publication by The International Bank for
Reconstruction and Development and The World Bank that is cited as IBRD/World Bank (2006),
Tripp et al. (2006) and Henson-Apollonio (2007).

13.1 Intellectual Property and Plant Breeders' Rights

to realize that IPR are only enforceable within the national territory within which they
have been registered. Moreover, the level and nature of enforcement of IPR laws vary
considerably across geographical regions. These factors significantly and differentially
influence the product development and deployment strategies of commercial companies
operating in different countries.
competing commercial seed producers from multiplying and marketing the protected cultivar
without a licence. Many breeding companies would like to keep competing plant breeders from
using a protected cultivar or technology in the development of a new cultivar. In this
case, they must use the patent system as PBRs have been explicitly structured to encourage
rather than exclude this type of activity. The degree to which IPR and PVP systems in
developing countries are able to limit practices depends on economic, administrative and
political factors (Tripp et al., 2006). A general prohibition on saving seed of protected
cultivars is an unlikely strategy in most developing countries, although coordinated
systems of cleaning and dressing farmer-saved seed while collecting royalties have been
successful in some Organisation for Economic Co-operation and Development (OECD) countries.

Crop cultivars present several important challenges for an IPR system (IBRD/World Bank,
2006). First, they are biological products that are easily reproduced and whose very use
entails multiplication. Secondly, the users (and potential copiers) of the technology are
millions of individual farmers whose compliance with any protection regime is difficult and
expensive to monitor, particularly in developing countries. Thirdly, the agricultural
sector involves cultural values and food security issues that in many countries affect the
livelihoods and even potential survival of the rural poor, making the imposition of any
controls a sensitive political issue. Fourthly, the inherent diversity of crop cultivars
makes it difficult to apply the narrow technical criteria of novelty and reproducibility
used in the conventional patent system, whereas the use of standard breeding methodologies
may frustrate the application of the inventive step criterion. Fifthly, the development of
new crop cultivars has always relied to some extent on public research, partly in response
to the traditional public goods nature of crop-related biodiversity. Thus, the application
of IPR to the products of a publicly funded endeavour can be problematic. Sixthly, the
increasing use of biotechnologies has brought additional challenges for the application of
IPR in plant breeding.

Analysis of the historical seed-saving practices of soybean farmers in the USA indicates
that large US farms have consistently saved seed, as much as 60% in some years. However,
with the introduction of Roundup Ready soybeans the nature of seed saving was drastically
changed. The combination of an expanding array of IPR on technologies used in the
development of new breeding materials, new genetically modified (GM) technologies that in
some countries have led to whole plant patents and the increasing application of industrial
concepts to plant breeding has brought huge new private sector investments to plant
breeding that are dramatically changing the nature of the business worldwide (Mascarenhas
and Busch, 2006).

Not only do some countries allow the use of patents to protect plants, cultivars and genes,
but the majority of the tools and processes of molecular biology and genetic transformation
can be patented as well. Many of the biotechnology techniques, which are becoming
increasingly important in conventional plant breeding, are also protected, thereby raising
implications for the ownership of any cultivar resulting from their use. In addition,
because biotechnology allows a much more precise understanding of the genetic make-up of
any crop cultivar, it opens the door to sophisticated screening and reverse engineering
techniques, which in turn offer new possibilities for utilizing protected cultivars,
leading to pressure for more stringent protection.

Although a range of attempts have been made to provide a set of IPR for crop cultivars,
only within the last few decades has a mechanism for PVP firmly taken hold in
industrialized countries. International treaties such as UPOV and TRIPS are attempting to
establish common features of certain IPR although most developing country signatories are
slow to ratify and implement these agreements through their national laws. One of the most
controversial of these features is contained in Article 27.3(b) of the TRIPS Agreement,
which requires all WTO member states to
[Figure 13.1: flow diagram, not reproduced. Legible labels include: Discovery; Market
introduction; Stringent agronomic performance and efficacy criteria; Greater than 90% of
all events are eliminated; Decision and actions here can have long-term and late
consequences; Based in part on methods used to evaluate conventional varieties.]
Fig. 13.1. Steps involved in crop biotechnology product development using transformation or
marker-assisted selection (MAS). GH, greenhouse.
and thus the marginal cost to the farmer of shifting cultivar is low. The experience of
Argentina shows that the advent of protected cultivars did not increase the price of the
seed: breeders and seed producers, to remain competitive on the seed market with
unprotected cultivars, rationalized the production of and trade with their seed and the
royalty was taken from the savings. However, the situation is not always comparable in
developing countries where many farmers may be currently obtaining seed through informal
systems and are thus faced with significant initial investment requirements in order to
shift to a new cultivar.

Many countries have strong investments in public research of relevance to plant breeders.
But experience shows that this approach is not enough: public funds cannot adequately cover
the needs of every crop, every agroclimatic zone, every market preference, etc. In
addition, the interaction between strategic research and product development, between
public plant breeding and private-sector product deployment, is frequently deficient
(Heitz, 1998). The recognition of these difficulties is one of the main reasons for the
interest shown by many developing countries in PVP: PVP is therefore an indispensable
element of the new seed policies that give an important role to both public and
private-sector plant breeding.

The exercise of the breeder's rights

The way in which plant breeders choose to exercise their right depends upon many factors
and the scope of the protection conferred to them is only one of the options on hand. The
breeder of seed crops will seek to organize the production of commercial (certified) seed
in a fairly loose manner and seek to collect a royalty at each multiplication stage (to
spread the risks); he will apply a very open licence policy. The breeder of an ornamental
plant will seek to organize the production and sale of cut flowers and not just propagating
material.

Heitz (1998) gave two reasons why economic theories and constructions based upon the notion
of monopoly are totally
inappropriate in the case of crop cultivars: (i) breeders are bound to associate with
others (in effect, partners) to exploit their cultivars; the success of a particular
cultivar and of the commercial strategy of its breeder is the result of many individual
decisions; and (ii) the breeder of protected cultivars is almost always bound to compete
with other breeders and their cultivars. Another relevant factor is the existence and scope
of a farmers' privilege (the need to make commercial seed competitive with farm-saved
seed).

The derived benefits

There are three main reasons for national agricultural research institutes (NARIs) to
embrace IPR: recognition, technology access and transfer, and revenue (Louwaars et al.,
2006). In commercial breeding, the last reason prevails; IPR create additional value for
the crop cultivar by providing a legal basis for licence contracts between the breeder and
seed producers, which commonly include a royalty payment that serves as an important tool
to recoup the investment in research. In public research, however, cultivar development is
funded from public sources and research managers tend to put some emphasis on other
research objectives. IPR formally link the cultivar to the institute and individual
breeders. Furthermore, IPR may facilitate seed production when only an exclusive market
will entice an individual seed producer to take a new cultivar into its product range (to
facilitate technology transfer) and technology may be more easily acquired if patents can
be traded.

The direct effect of PVP is to promote plant breeding. All countries which have become a
member of UPOV and whose agricultural sector is of a size that justifies investments in
plant breeding have reported increases in the volume of plant breeding activities in
developing countries with direct effects on their agriculture. However, mergers and
acquisitions keep reducing the number of players in the seed sector in developed countries.
The others have reported increases in the assortment of cultivars made available by foreign
breeders.

Given the declining public funding of agricultural research in many countries, revenue
generation is an attractive option for many public institutions. Income from IPR can
support the institution to cover operational costs or hire additional staff and provides
managers with a financial tool to support particularly innovative researchers or research
groups. Public cultivars can generate a ready income, especially if cultivars bred in the
past can be protected.

13.2.2 Impacts of plant variety protection

Plant breeding

MAINTENANCE BREEDING. The PVP system is not only there to encourage creative plant breeding
activities. The full benefit of a new and improved cultivar can only be drawn if, first of
all, the cultivar is properly maintained and, secondly, if authentic propagating material
of the cultivar is made available to users. The PVP system ensures that the breeder has a
lasting interest in ensuring these activities (at least as long as the cultivar is
commercially successful).

GENETIC DIVERSITY. The increased number of breeding programmes which enter into
competition, if this happens, implies a diversification of the programmes that results in
an increasing probability of obtaining superior and genetically diverse cultivars. This
scenario provides a strong counterbalance to the trend for uniformity that may be generated
by the market demand in products (Heitz, 1998). However, the trend in the seed market for a
few to be private providers of bred-seed worldwide contributes to a few crop cultivars that
dominate the market, reducing genetic diversity. The pressure on the natural ecosystems can
be lessened through: (i) providing uniformity for use of a cultivar in single fields (to
allow rational and efficient production) but diversity for use across fields; and (ii)
contributing to the widespread use of a relatively narrow, but improved, gene pool (to
maximize
so that the resulting cultivars grow better in that environment. This is done both through
the organization of subsidiaries and through partnership or licensing agreements. In both
cases there is a flow of technology and know-how, in both directions, as subsidiaries have
to rely on the local seed trade.

Breeding strategies

Introducing the concept of revenue generation in public plant breeding is likely to have an
impact on the distribution of funds within the NARI and on the breeding strategies applied.
Louwaars et al. (2006) discussed the impact of IPR on plant breeding strategies, which is
summarized as follows.

First, IPR can be generated in plant breeding relatively easily compared with other
agricultural research undertakings. As a result, the pursuit of revenue could lead to
important disciplines such as soil science, socio-economics and plant pathology being
marginalized or downgraded to supporting only breeding efforts.

A second possible impact is that funds will be distributed more to crops with a high value
in seed production. These include, in general, crops that are produced for the market
(where investment in seed is common), which are difficult to reproduce on-farm (e.g.
cross-pollinated crops) and that have a low seed rate. In practical terms, this means that
maize-breeding programmes will get priority over those for open-pollinated small grains,
most pulses and root crops. The latter crops, however, may be important for the nutrition
security of most of the population.

The third level of impact is within breeding programmes themselves, where researchers have
to choose which ecological areas or client groups to target. Revenue generation will focus
breeding on commercial farmers and hybrids rather than on resource-poor farmers who cannot
afford to buy hybrid seed and have to use open-pollinated cultivars instead. In the latter
situation, the seed industry is unlikely to generate profits and pay royalties to the
breeder.

The shift to commercial crops and farmers may be consistent with recent changes in national
agricultural policy and trends of commercialization of public entities. In some countries,
however, the public task of a NARI is to support both equity and national agricultural
production. The trend towards crop diversification and breeding for low-input agriculture,
which means better yield stability, may be reversed when NARIs focus on using IPR for
revenue generation. Another strategy of a NARI may be to secure a choice of cultivars for
farmers in a market that may otherwise be dominated by large commercial companies owing to
IPR. However, this latter option also may shift research priorities away from smallholder
farmers' needs (Louwaars et al., 2006). Policy makers and research managers need to
carefully consider the impact of the use of IPR in public breeding before including
protection in their research strategies. If national public organizations are not supposed
to protect their inventions, governments will have to provide the necessary funds for their
research.

NARI organizations

Louwaars et al. (2006) discussed how PVP-related issues would greatly affect the NARI
organizations. When a NARI intends to commercialize its cultivars using IPR, it must
realize that the right holders are responsible for implementing their rights and that the
NARI needs capacities to design commercialization strategies and licence contracts, as well
as to follow up on these contracts. In addition, research managers have to be aware that
there are many costs involved in IP protection such as those for additional personnel, IPR
acquisition, implementation, application and maintenance fees. Commercial decisions have to
be made on which rights to apply for and when to surrender them. A significant cost can
arise when the rights have to be defended, especially against experienced negotiators of
commercial companies with significant resources.

While crop cultivars are in almost all cases freely used as parents in further
breeding, this is not the case in patented biotechnologies. Hence, NARIs will need to
develop ways and means to observe rights on technologies and materials that they use in
breeding. Most countries have a fairly liberal research exemption in their patent laws.
This situation commonly leads to a licence contract in which the patent holder can specify
the uses, the ways of commercialization and benefit sharing (royalty payment) (Louwaars et
al., 2006). A NARI needs therefore to identify possible risks associated with the use of
patented technologies. An IP plan needs to be developed for each project, in which it is
decided when and how contact will be established with the technology provider.

The introduction of IPR brings new tasks and responsibilities to the NARI. It requires not
just access to lawyers, IP specialists, negotiators and marketers, but more importantly, it
calls for a shift in culture among the researchers. All researchers will have to be aware
of the potential impact of IPR on their work, when they commonly prefer to concentrate on
their own science and not be bothered by administrative rules. Senior management will have
to lead the way in this gradual shift, assisted by well-designed capacity-building
initiatives and support systems.

International agricultural research

Another challenge IARCs face is to get access to protected technologies without diminishing
their primary task of poverty alleviation. Materials and tools may not be used in research
if the products cannot be made available to the target groups (i.e. the resource poor)
without restrictions. Humanitarian licences and cooperation agreements should at least
contain such provisions.

A less debated result of the spread of IPR on IARCs is the impact of the commercialization
of some NARIs on the capabilities of IARCs to reach the resource poor (Louwaars et al.,
2006). NARIs that concentrate their strategy on revenue generation through IPR, and thus
move away from producing solutions for resource-poor farmers in favour of commercial
production, may not always be suitable partners of IARCs for reaching the poor. The latter
may need to look for other ways, for example, through non-governmental organizations and in
some cases, direct contacts with seed producers. All the IARCs have IPR policies, although
most of these are still subject to adjustment and elaboration. The increased use of IPR has
caused IARCs to re-evaluate their modes of interaction with both NARI organizations and
seed companies. Various approaches have been taken to ensure that NARI germplasm reaches
the farmers for whom it is intended.
current flexibility to protect plant cultivars (Wall Tvet, 2005). These agreements that
affect plant breeding are shown in Fig. 13.2.

13.3.1 The UPOV Convention and UPOV

After decades of attempts to obtain patent protection for their achievements, plant
breeders, together with a segment of the IP specialists, requested that consideration be
given to a specially designed protection system. The request was taken up by the French
government, through the conferences and meetings it hosted between 1957 and 1961, leading
to the signing on 2 December 1961 of the International Convention for the Protection of New
Varieties of Plants (also known as the UPOV Convention).

The UPOV system, revised in 1972, 1978 and 1991, has gradually strengthened the rights of
plant breeders. The last revision, 30 years after the initial adoption, was substantial.
The revisions were made: (i) to clarify certain provisions in the light of the experience
of the UPOV member states in operating the Convention since 1961; (ii) to strengthen the
protection offered to plant breeders in certain specific ways; and (iii) to reflect
technological changes. The rights defined under UPOV are known as plant variety protection
(PVP). The UPOV system is considered as the most straightforward choice for countries
wishing to comply with the TRIPS Agreement. The UPOV Convention is the only model for a PVP
system. It is not only an IP treaty, but also an instrument in the field of agricultural
policies.

The breeder is defined in the 1991 Act of the UPOV Convention as the person who bred, or
discovered and developed, a variety (cultivar in this book). Protection has thus to be
afforded not only where a cultivar has originated from breeding in the somewhat restricted
sense of crossing parent plants and selecting from within the progeny, but also where a
person identifies a mutation or a variation, of known or
[Figure 13.2: diagram, not reproduced. It links international bodies to their agreements
and, in parentheses, the subject matter covered; all of them bear on breeders.]
Convention on Biological Diversity (CBD): Access and benefit sharing (genetic resources);
Cartagena Protocol (living modified organisms).
World Trade Organization (WTO): Trade-Related Aspects of Intellectual Property Rights
(TRIPS) (breeders' right, patents, trademarks, trade secrets).
World Intellectual Property Organization (WIPO): Patent Cooperation Treaty (PCT) and
Substantive Patent Law Treaty (SPLT) (harmonization of IPRs); Intergovernmental Committee
on Intellectual Property and Genetic Resources, Traditional Knowledge, and Folklore
(traditional knowledge, genetic resources, folklore).
Food and Agriculture Organization of the United Nations (FAO): International Treaty on
Plant Genetic Resources for Food and Agriculture (IT PGRFA) (facilitated access, farmers'
rights).
Fig. 13.2. International agreements that affect plant breeding. From IBRD/World Bank (2006)
© The World Bank 2006.
unknown origin, in existing plant material and ensures that the mutation or variation is
isolated and propagated as a new cultivar (Heitz, 1998).

The UPOV Convention has been very successful in impressing upon the crop cultivars and seed
sector, in particular in the UPOV member states, a notion of variety (cultivar) that, from
the technical point of view, is identical with protectable variety. According to Article
1(vi) of the 1991 Act of the UPOV Convention, a variety basically is a plant grouping that
meets the conditions of distinctness, uniformity and stability, but not necessarily to the
degree required for protection.

Distinctness, uniformity and stability (DUS)

There are five required conditions for protection.

1. Novelty: The cultivar to be protected must not have been the subject of commercial acts
before certain dates determined on the basis of the date of application.
2. Distinctness (Article 7): The variety shall be deemed to be distinct if it is clearly
distinguishable from any other variety whose existence is a matter of common knowledge at
the time of the filing of the application. Distinctness is established on the basis of
individual characteristics (descriptors in genetic resources parlance) that are botanical
in nature and are not necessarily related to the agricultural or technological properties
or value of the cultivar.
3. Uniformity (or homogeneity) (Article 8): The variety shall be deemed to be uniform if,
subject to the variation that may be expected from the particular features of its
propagation, it is sufficiently uniform in its relevant characteristics.
4. Stability (Article 9): The variety shall be deemed to be stable if its relevant
characteristics remain unchanged after repeated propagation or, in the case of a particular
cycle of propagation, at the end of each such cycle.
5. Denomination: The cultivar must be given a denomination under which it will be
commercialized.

UPOV provides protocols for assessing and describing the unique characteristics of a new
cultivar, ensuring that it is distinct, uniform and stable (DUS). These standards are
adapted to the mode of reproduction of the protected species: cross-fertilizing crops admit
a wider tolerance than the relatively strict requirements for uniformity in vegetatively
propagated crops. Any cultivar that fulfils the DUS criteria and that is new (in the
market) is eligible for protection and there is no need to demonstrate an inventive step or
industrial application, as required under a patent regime. A DUS examination involves
growing the candidate cultivar together with the most similar cultivars as per common
knowledge, usually for at least two seasons, and recording a comprehensive set of
morphological (and in some cases agronomic) descriptors (IBRD/World Bank, 2006).

Characteristics are used to assess DUS and include descriptors such as flower colour or
leaf shape. A characteristic must meet a number of basic requirements for it to be used for
DUS testing or for producing a cultivar description. The characteristic must:

1. Result from a given genotype or combination of genotypes.
2. Be sufficiently consistent and repeatable in a particular environment.
3. Exhibit sufficient variation between cultivars to be able to establish distinctness.
4. Be capable of precise definition and recognition.
5. Allow uniformity and stability requirements to be fulfilled.

Characteristics may have direct commercial relevance or no commercial relevance. For
example, using the criteria above may eliminate some commercially important traits, such as
yield. Chemical constituents may be acceptable characteristics, provided they meet the
criteria. It is important that characteristics based on chemical constituents be well
defined and supported by an appropriate method for examination.

UPOV test guidelines have been developed for individual species or cultivar groupings to
provide guidance related to
growing cycles, number of plants, material to be tested, or characteristics to be examined.
The DUS test may be undertaken directly by the authority of the UPOV member, by a party
designated by the authority (e.g. an institute, the breeder), or the authority may take
into account the results from previous tests or trials conducted by, for example, other
UPOV members. There can therefore be a high level of cooperation in DUS testing, including,
for example, the purchase of DUS test reports, bilateral arrangements to avoid duplication
of testing and centralized DUS testing at regional or global levels. Cooperation between
authorities can minimize the time for DUS testing, minimize costs and optimize examination
of characteristics in growing trials.

Under the 1978 Act of the UPOV Convention, any protected cultivar may be freely used as a
source of initial variation to develop further cultivars. Any such cultivar may itself be
protected and, what is more important, exploited without any obligation on the part of its
breeder and users towards the breeder of the cultivar that was used as a source of initial
variation. These rules have, with certain exceptions, worked well in practice and have been
reaffirmed in the 1991 Act.

However, the rules did not prevent a person finding a mutation within a crop cultivar (such
mutations are for a few traits in some species), or selecting some other minor variant from
within a cultivar, from exploiting the mutant or variant with no authorization from, or
recognition of the contribution of, the original breeder to the final result. The lack of
recognition of that contribution in such circumstances was generally considered to be
improper (Heitz, 1998). Modern biotechnology has greatly increased the likelihood of such
situations; it may take 12 years to develop a new cultivar but a mere 3 months to modify it
by adding a transgene or genes introduced through genetic engineering in the laboratory.

This situation indeed can be a disincentive to the continued pursuit of classical plant
breeding (and also of genetic engineering since it suffices to add yet another gene to
escape the protection of the cultivar taken as host for that gene). The concept of
essentially derived variety (EDV) embodied in Article 14(5) of the 1991 Act of the UPOV
Convention is designed to ensure that the Convention continues to provide an adequate
incentive for plant breeding. Under that Article, a cultivar that is essentially derived
from a protected cultivar may be the subject of protection (if it fulfils the normal
protection criteria of DUS and novelty), but cannot be exploited without the authorization
of the breeder of the protected cultivar. For practical purposes, cultivars will only be
essentially derived when they are developed in such a way that they retain virtually the
whole genetic structure of the earlier variety.

The most prominent issues in the sui generis systems involve the so-called farmers'
privilege and breeders' exemption. The traditional right of farmers to save seed from their
harvests to plant the following season is an important aspect of sui generis systems and is
one of the most contentious aspects of IPR in plant breeding. Although this practice is
often described as a farmers' right, it is referred to here by the UPOV term of farmers'
privilege to distinguish it from the broader concept of farmers' rights.

The 1978 UPOV Convention assumed that farmers were permitted to save and reuse seed of
protected cultivars as part of private and non-commercial use. However, Article 15(2) of
the 1991 UPOV Convention rules that on-farm seed saving is not permitted without the
consent of the breeder, although it allows member states to specify crops for which the use
of farm-saved seed is permitted, taking into account the legitimate interests of the
breeder. In the European Union (EU), this provision is interpreted as the right of
smallholder farmers to save seed for specific crops and the right of the breeder to collect
royalties on farm-saved seed used on larger farms. The 1991 Convention also prohibits any
transfer of seed of protected cultivars (through sale, barter or gift) between farmers.
Utility patents on plant cultivars are even more rigid and a
The 1991 Act establishes three compulsory exceptions to the breeder's right and one
optional exception. The three compulsory exceptions are: (i) acts done privately and for
non-commercial purposes (in particular the reproduction of a protected cultivar by a
subsistence farmer or by an amateur gardener); (ii) acts done for experimental purposes;
and (iii) acts done for the purpose of breeding other cultivars and (provided protection
has not been specifically extended to them, as for instance in the case of an EDV) for
the purpose of exploiting such other cultivars.

The optional exception relates to farm-saved seed. States that are party to the 1991 Act
of the UPOV Convention may exempt farm-saved seed from the breeder's right, within
reasonable limits and subject to safeguarding the legitimate interests of the breeder.
Each member state will exercise this option in the light of its own national conditions.
Some states have chosen to give farmers an unconditional right to replant seed from their
previous harvest while others have limited this right to certain crops or to small
farmers.

Breeders' exemption

As plant breeding is generally considered as incremental, breeders have built on existing
cultivars to develop improved ones. To make progress, contrary to the situation in
mechanics or chemistry, the description of the invention is not enough, as it is not
possible to rebuild a whole genome starting from nucleotides. That is why the UPOV
Convention included an exception to breeders' rights: 'The utilisation [by others] of the
[protected] new cultivar as an initial source of variation for the purpose of creating
other new cultivars and the marketing of such cultivars' (Art. 5.3 of the 1961 Act). This
exception, widely known as the breeders' exemption, has been one of the engines of the
breeding industry since the late 1960s. It stems from the traditionally unrestricted use
of seed by farmers and breeders. It provides that any person is allowed to use a
protected cultivar for further breeding without requiring the consent of the rights
holder.

It is seen as a way of promoting the development of the best cultivars for farmers,
limiting the development of long-term commercial advantages, improving opportunities for
smaller breeding companies and thus promoting competition in the sector. Unlike the
farmers' privilege, the breeders' exemption has not dramatically changed in later UPOV
Conventions, prompting some companies in the USA to look to the patent system for
protecting their germplasm. The only modification in the 1991 Convention is the
limitation on EDVs, which may fall under the rights of the original breeder.

13.3.2 The 1983 International Undertaking on Plant Genetic Resources

In 1983, the Food and Agriculture Organization of the United Nations (FAO) established a
Commission on Plant Genetic Resources (later renamed the Commission on Genetic
Resources), the first permanent intergovernmental forum devoted to germplasm
conservation and development. The Commission's first major action was to adopt a
non-binding resolution known as the International Undertaking on Plant Genetic Resources
(hereafter 'the Undertaking'), which is based on the principle that plant genetic
resources are a common heritage of mankind to be preserved and to be freely available for
use, for the benefit of present and future generations. The purpose of the Undertaking is
to ensure that genetic resources will be explored, preserved, evaluated and made
available for breeding and science. It is based on the following underlying principles:

• Genetic resources are a heritage of humanity and should be available without
  restriction.
• Establishes farmers' rights: farmers should be compensated for development and
  conservation of genetic resources.
• Sovereign rights of nations to preserve, protect and be compensated for innovative
  utilization of their native genetic resources.
The CBD marked the end of the 'common heritage of mankind' conception of genetic
resources. The CBD does not refer to a common heritage and its preamble states only that
conservation of biodiversity is a

With respect to the fair and equitable sharing of the benefits arising out of the
utilization of genetic resources, it should also be obvious that it implies, first, the
creation of benefits and, secondly, the identification
of a person who would be called upon to share the benefits which he and his partners
have created. All agreements that have been publicized so far and follow the pattern
created by the Merck-INBio agreement
(http://www.american.edu/projects/mandala/TED/MERCK.HTM) include as a major component
the sharing of royalties derived from patents.

13.3.4 The 1994 TRIPS Agreement

The Uruguay Round of multilateral trade negotiations held under the framework of the
General Agreement on Tariffs and Trade was concluded on 15 December 1993. The agreement
embodying the results of those negotiations, the Agreement Establishing the World Trade
Organization (WTO Agreement), was adopted on 15 April 1994, in Marrakech, Morocco.

The result of those negotiations, contained in an Annex to the WTO Agreement, was the
Agreement on Trade-Related Aspects of Intellectual Property Rights (the TRIPS
Agreement). The WTO Agreement, including the TRIPS Agreement (which is binding on all
WTO members), came into force on 1 January 1995. The former agreement established a new
organization, the World Trade Organization (WTO), which began its work on 1 January 1995.

The purpose and objective of the WTO Agreement is described in its preamble:

    Recognizing that their relations in the field of trade and economic endeavour should
    be conducted with a view to raising standards of living, ensuring full employment and
    a large and steadily growing volume of real income and effective demand and expanding
    the production of and trade in goods and services, while allowing for the optimal use
    of the world's resources in accordance with the objective of sustainable development,
    seeking both to protect and preserve the environment and to enhance the means for
    doing so in a manner consistent with their respective needs and concerns at different
    levels of economic development,

    Recognizing further that there is need for positive efforts designed to ensure that
    developing countries and especially the least developed among them, secure a share in
    the growth in international trade commensurate with the needs of their economic
    development,

    Being desirous of contributing to these objectives by entering into reciprocal and
    mutually advantageous arrangements directed to the substantial reduction of tariffs
    and other barriers to trade and to elimination of discriminatory treatment in
    international trade relations, [...]

Article 27.3 provides for an obligation to protect crop cultivars which became effective
for developed countries on 1 January 1996 and became effective for developing countries
on 1 January 2000 (1 January 2006 for least-developed countries):

    3. Members may also exclude from patentability:
    [...]
    (b) plants and animals other than micro-organisms and essentially biological
    processes for the production of plants or animals other than non-biological and
    microbiological processes. However, Members shall provide for the protection of plant
    cultivars either by patents or by an effective sui generis system or by any
    combination thereof. The provisions of this subparagraph shall be reviewed four years
    after the date of entry into force of the WTO Agreement.

It is clear that WTO members enforced this obligation through the adoption of a sui
generis protection system. At the Fourth Extraordinary Session of the FAO Commission on
Genetic Resources for Food and Agriculture (Rome, 1–5 December 1997), the FAO Legal
Adviser commented as follows:

    In fact, the concept of a sui generis system in the TRIPS Agreement is a very general
    concept that allows States to exercise ample discretion. The TRIPS Agreement does not
    give any direct indication on the elements or components that should be included in
    the sui generis system; nor does it require to follow the criteria of UPOV, which is
    already a sui generis system of plant cultivar protection although not the only
    possible one. Nevertheless, it is possible to infer, from the general context
countries also saw themselves as donors, not as recipients, of germplasm (Fowler and
Lower, 2005).

A tremendous asset associated with the Treaty is the genetic resources, mostly of the
world's major food crops, that are held at the centres of the CGIAR. Historically, these
have been considered as an international heritage and have been freely available to
everyone, most recently under the terms of a formal agreement between FAO and the centres
in which it is agreed that the centres are holding the materials 'in trust' for the
benefit of the international community. The agreements signed by the centres with FAO on
behalf of the Governing Body of the Treaty on 16 October 2006 oblige the centres to deal
differently with the plant genetic resources for food and agriculture (PGRFA) they hold
and have brought under the Treaty, depending on whether or not the PGRFA is listed in
Annex 1 of the Treaty. All transfers of PGRFA of crops listed in Annex 1 must be under
the SMTA. It is assumed that the SMTA's prohibition against recipients acquiring IPR on
the germplasm and related information refers to the access and use of the material with
few onerous restrictions. It therefore encourages use and development of the materials
while keeping it available for use in the future by others (Fowler et al., 2005).

13.4 Plant Variety Protection Strategies

A number of mechanisms are available to protect the interests of plant breeders and
contribute to the development of a competitive and dynamic national seed sector. In
addition to PVP (through the granting of plant breeders' rights) and patents, additional
options include biological processes (such as the hybrid cultivar system), national seed
laws, contract law, brand protection and other IPR (such as trademarks), as well as trade
secrets. As with patents and PVP, the effectiveness of these alternatives depends on the
local capacity for enforcement (IBRD/World Bank, 2006).

13.4.1 Plant variety protection or plant breeders' rights

UPOV is the most widely used system for PVP, currently with 63 member states
(http://www.upov.int/). Most countries of the OECD and some developing countries are
members of one of the UPOV conventions, although that is not the only sui generis option
under the WTO's 1994 TRIPS Agreement. Countries wishing to join UPOV must present
legislation compatible with the 1991 Convention. UPOV membership offers a number of
advantages, including a source of technical backstopping for cultivar testing and the
assurance of a PVP system recognized and respected by foreign investors. On the other
hand, the 1991 Convention imposes potential restrictions on farmer seed management
practices that may be politically unacceptable, a potential threat to food security and
impossible to enforce in some circumstances. For these reasons some developing countries
have declined to join UPOV. Only in specific cases where seed saving might threaten a
market (e.g. export markets for flowers) or seed exchange would reduce incentives for
plant breeding (e.g. informal seed sale by larger farmers or sales by grain merchants in
competition with the commercial seed sector) would restrictions be justified in most
developing countries (Tripp et al., 2006).

The UPOV mission statement is 'to provide and promote an effective system of PVP, with
the aim of encouraging the development of new cultivars of plants, for the benefit of
society'. PVP provides the opportunity for breeders to gain a return on the investment
made in breeding a new cultivar. For large-scale commercial farmers, market forces under
UPOV schemes will generally lead to largely positive scenarios. However, the situation is
very different and substantially more complex for farmers in developing countries.
Cultivar protection systems may not be inappropriate in developing countries as long as
resource-poor farmers continue to have choices through access to public cultivars or the
right to save seed for their own purposes from commercial
for genetically engineered biological materials and plant/plant cultivars. Only plant
cultivars invented or discovered in a cultivated area are eligible for patents, thus
limiting the possibility of patents on wild relatives. Under the 1978 UPOV Convention, a
cultivar could not be protected by both a patent and PVP, but the 1991 Convention allows
this double protection. Table 13.1 provides a comparison of three major IP systems for
plant cultivars. In Japan, the patent system is used only for plant cultivars that are
considered innovative and not merely a product of normal plant breeding.

Table 13.1. Comparison of major intellectual property systems for plant cultivars
(varieties) (from IBRD/World Bank (2006) © the World Bank).

Utility patents for plant cultivars are not considered a reasonable option for
developing country IPR systems (IBRD/World Bank, 2006). Nevertheless, aspects of patents
for plant cultivars become increasingly important because of the pressure from some
parts of the seed industry to move in this direction and because this option is included
in some of the bilateral trade negotiations between the USA and several Latin American
countries.

Patent protection first became available in 1985 and companies used both PVP and patent
systems for some years; recently a reliance on patents has dominated. There are reasons
for that choice, despite the higher cost of utility patents (Lesser, 2005). PVP allows
farmers' reuse of seed (although not an issue for F1 hybrids) as well as open breeding
access. Patents allow neither. Moreover, underfunding and resultant delays in issuing
certificates reduced the value of PVP for breeders.

13.4.3 Biological protection

The oldest mechanism for protecting a plant cultivar is hybridization. The discovery of
the phenomenon of hybrid vigour (heterosis) in the early 20th century opened new
possibilities for producing high-yielding and uniform cultivars of cross-fertilizing
crops and offered two distinct advantages for protecting the interests of commercial seed
provision. First, seed of hybrid origin will lose some yield potential and other valuable
characteristics (such as uniformity) in subsequent generations, which reduces farmers'
incentives for saving seed. Secondly, competing seed companies cannot duplicate a
particular hybrid cultivar if they do not have access to the inbred lines used to develop
the hybrid cultivar. If the inbreds can be physically protected, they have the character
of a trade secret. Hybrids from self-pollinated species including rice were first
commercialized in China in the 1970s using genetic male sterility and now over 50% of
rice land is planted with hybrid rice that has a huge seed market in China and South-east
Asia. The use of hybrids thus provides a steady demand for seed, overcoming much of the
uncertainty in the conventional seed market, where factors such as the weather determine
how much seed is saved on the farm and hence the demand for fresh seed. In China a
thriving and diverse commercial seed sector has existed for more than two decades
because of the development of hybrid rice.

A more recent example of biological protection mechanisms is the introduction of genetic
use restriction technologies, operating at the cultivar level (V-GURTS) (Louwaars et al.,
2002). In the absence of special treatments, plants containing these technologies
produce sterile seed, thereby ensuring that farmers cannot save commercial seed of
self-fertilizing crops (e.g. wheat and beans) for subsequent planting. The technologies
also make it difficult for other breeders to use the protected germplasm. Companies are
using the methods of genetic transformation to develop several such protection mechanisms
including the so-called 'terminator technology', a colloquial name given to proposed
methods for restricting the use of GM-plants by genetically switching off a plant's
ability to germinate a second time as next-generation seed. None is commercially viable
yet, but the possibility of this technology has led to widespread debate and concern in
the popular press (e.g. http://www.banterminator.org/; Guidetti, 1998) and has caused the
technology to be specifically banned in India's Protection of Plant Varieties and
Farmers' Rights Act (IBRD/World Bank, 2006).

13.4.4 Seed laws

Plant breeding and seed production are already subjected to a set of national regulations
on cultivar release and seed quality control. These regulations are related to seed
saving, seed exchange, the scope of protection, the breadth of coverage and the relation
of PVP and patents to the concerns of farmers' rights. They have played an important part
in determining the current evolution of seed systems. The following discussion on seed
laws is based on IBRD/World Bank (2006).

Conventional seed laws can provide opportunities for controlling access to plant
cultivars, even in the absence of IPR legislation. They determine what cultivars may be
produced and establish regulations for seed certification and quality control. They can
also limit the production and sale of seed by competitors and can perform some of the
functions expected of PVP. Seed laws usually specify the extent to which seed must be
certified and define the types of cultivar
that may be offered for sale. Where seed certification is compulsory, the breeder may
determine who is to produce seed by controlling access to breeder's (or pre-basic) seed.
Any unauthorized multiplication will not be acceptable to the certification agency. A
public or private breeder can establish an exclusive contract with a seed company for the
production of specified cultivars. When a cultivar is not protected by PVP, the
authorities can assign one or more maintainers to meet the continued demand for seed.
Seed certification requirements can also be used to limit informal seed sales, especially
when they occur on a large scale.

Where seed law specifies that a cultivar must be approved through a registration process
or on the basis of performance tests before entering commercial seed production, this
provision can also prohibit the sale of a released cultivar under a different name. In
this way, the law limits the extent to which a competing company can market seed of a
protected or an essentially derived version of a released cultivar, including the
unauthorized use of a transgene.

Commercial seed systems usually begin with products that are difficult for farmers to
save (hybrid cultivars or small seeded vegetables) and that generally require little IP
protection. As the seed industry matures and farmers recognize the value of commercial
seed, companies will offer a wider range of products, some of which may require attention
to IPR. Seed industry development usually parallels the growth of agribusiness and
markets for particular commodities may demand specific attention to IPR. Seed companies
can sell seed to farmers who recognize the quality and convenience of commercial seed, on
the basis of reputation and branding as is the case for small-size vegetable seeds in
various countries.

13.4.5 Contract law

A contract is a legally binding exchange of promises or agreement between parties that
the law will enforce. Breach of a contract is recognized by the law and remedies can be
provided. Contract law can be classified, as is habitual in civil law systems, as part of
a general law of obligations. Various types of contracts can be effective in providing
legally enforceable agreements that restrict the use of a breeder's cultivar and offer
complements or substitutes to IPR. Some contracts are aimed primarily at preventing seed
saving and multiplication, whereas others are aimed at protecting the germplasm from
being used in competitors' breeding programmes.

One type of contract that is increasingly prevalent in the US seed market is the grower
contract, or 'bag tag'. This simple (unsigned) agreement restricts the farmer from using
or disposing of any part of the harvest as seed. Farmers are considered to comply with
the provisions of such contracts when they open the seed bag. If controlling the market
for the harvested product becomes possible, another type of contract can be enforced. The
breeder can oblige a grower to use crop cultivars in certain ways and can impose
restrictions on the saving or multiplication of planting material. In the cut flower
industry, for example, the vast majority of the output is sold in a limited number of
wholesale markets. If a flower cultivar is protected in the country where a major
wholesale market is located, growers in other countries may have to sign contracts
limiting multiplication or unauthorized sale of that cultivar, or they risk being denied
further access to the major market.

Access to germplasm may also be controlled through material transfer agreements (MTA),
which may be seen as another form of contract regulating the use of plant germplasm. Such
an example involving MTA includes the Agreements signed by the CGIAR centres with the FAO
as discussed in Section 13.3.5. MTA and other contractual arrangements can be used by
private companies to control access to genes or transgenic cultivars that are protected
by IPR in one country, even if the recipient country does not recognize the particular
IPR. For example, when a national agricultural research organization contracts with a
major biotechnology
company to use particular proprietary transgenes, the contract may specify how the
national organization is to use the genes, the rights to any technologies that are
produced and the company's obligations (for example, to provide training or other
assistance). Access to various tools and processes of biotechnology, such as genetic
transformation techniques or diagnostic methods, is also usually subject to contracts
specifying limitations on their use and the rights of the provider in relation to
commercial products.

13.4.6 Brands and trademarks

As a symbol such as a name, logo, slogan and design scheme, which embodies all the
information connected to a company, product or service, brands and trademarks are part of
IP law, but their utility in the seed industry is often overlooked in the policy debate
about IPR (IBRD/World Bank, 2006). A minor point to remember is that terms such as AFLP
and Breeding by Design, both trademarks of Keygene, Inc., should carry the ® or ™
designation. Seed companies frequently register their brands and trademarks as a way of
distinguishing their products from those of their competitors and building up a loyal
customer base. In the absence of other IP instruments, the development of a strong brand
image and reputation can protect a company from some types of competition. While
trademarks can be effective in communication with customers (farmers), they do not
protect a breeder from competitors who steal the cultivar and include it in their own
(branded) product portfolio.

As there is usually a prohibition against using a cultivar name registered under PVP as a
trademark, it is much less common for crop cultivars to be trademarked. However, in some
cases a trademarked cultivar name may be very useful (IBRD/World Bank, 2006). For
example, flower breeders often register a cultivar through PVP under one name but market
it under a second, trademarked name, which can be used and protected long after the PVP
expires. Some countries prohibit the use of separate trade names and prescribe that the
name registered in the PVP or seed law lists is to be used in commerce.

13.4.7 Trade secrets

A trade secret can be considered a formula, practice, process, design, instrument,
pattern, or compilation of information used by a business to obtain an advantage over
competitors within the same industry or profession. In some instances, secrecy is an
effective way to protect certain technologies and the choice between patenting and
secrecy may depend on the type of technology and the size of the company. Trade secrets
may not be included in a separate body of law but come under standard trade law. In plant
breeding, the primary example of a trade secret is the protection of the inbred lines
used to produce a hybrid. The ability to exploit this type of secrecy depends to an
important extent on the degree of physical security that can be provided to plant
breeding facilities and seed multiplication plots. Registration requirements (under PVP
or seed law) may require the breeder to provide information on the pedigree (e.g. the
specific inbred lines) or even deposit samples of the different parent lines. This
requirement can nullify the trade secret unless the registration authority can keep the
information and materials confidential. Advances in biotechnology make this type of
secrecy more difficult to maintain, as reverse engineering of new cultivars becomes
easier. Even though such actions might be covered by the enforcement of provisions on
EDVs, they help to explain the pressure from some parts of the seed industry for further
limitations on the breeders' exemption (IBRD/World Bank, 2006). Trade secrets are also
useful for protecting certain aspects of plant biotechnology, particularly procedures or
techniques that cannot be detected in the final product, such as markers and
regeneration methods.
Dunwell (2005) reviewed the wide range of existing patents that cover all aspects of
transgenic technology, from selectable markers and novel promoters to methods of gene
introduction. Although few of the patents in this area have any real commercial value,
there are a small number of key patents that restrict the freedom to operate of new
companies seeking to exploit the methods. Since the late 1980s, these restrictions have
forced extensive cross-licensing between agricultural biotechnology companies and have
been one of the driving forces behind the consolidation of these companies.

During the period since the production of the first transgenic plants a wide diversity of
patents have been sought on all aspects of the process, ranging from the underlying
tissue culture methods through to the means of introducing the heterologous DNA and to
the composition of the DNA construct so introduced. The summary of granted US utility
patents in the category of genetic transformation from 1976 to 2000 is available at
http://www.ars.usda.gov/data/AgBiotechIP/. For detailed analysis of several of the key
areas under discussion, the reader is referred to detailed summaries published elsewhere,
for example in the series of comprehensive CAMBIA White Papers
(http://www.cambia.org/daisy/bios/home.html). Frequently, the main point of interest in
these discussions is the coverage of the patent(s) in question.

Transformation methods

As described in Chapter 12, there are several techniques for the introduction of
recombinant vectors containing heterologous genes of interest into plant cells and the
subsequent regeneration of plants from such cells. Some of the patents covering these
techniques are summarized in Table 13.2.

Table 13.2. Selection of patents/applications covering plant transformation methods
(from Dunwell (2005) with permission from Wiley-Blackwell).

Most of these methods involve a tissue culture step and many of these enabling protocols
are also the subject of patent claims. The most extensive publication in this area is the
360-page CAMBIA White Paper (Roa-Rodriguez and Nottenburg, 2003a) on
Agrobacterium-mediated transformation. This document focuses on the patents directed to
methods and materials used for transformation, mainly of plants, but also of other
organisms such as fungi.

Genes and DNA sequences

Much of the debate in this area concerns the ability to apply for patents on DNA
sequences of unproven function. There have been several attempts to do so and the
decisions on such applications have not been finalized. However, the fact remains that
there is much useful sequence information available in patent databases and much of it is
ignored by academic research scientists. Specifically, it is estimated that some 30–40%
of all DNA sequences are only available in patent databases, since there is of course no
obligation for commercial (or other) applicants to submit their sequences to public
databases. Possibly, the best way to access this information is via the GENESEQ system, a
commercial (Derwent) service.

As described in Chapter 12, transgenic crops are distinguished by the presence of several
types of foreign genetic material. These include: (i) functional genes (that is, genes
that code for insect resistance, herbicide tolerance, or other desired characteristics);
(ii) selectable marker genes (which have characteristics easily identifiable in the
laboratory and, when linked to a functional gene, facilitate the detection of transformed
cells); (iii) promoters (which regulate the timing and location of the expression of
functional or marker genes); and (iv) end sequences (portions of DNA that terminate
transcription). These different types of genes, sequences and techniques used in
developing transgenic crops, as well as the diagnostic tools and processes of
marker-assisted breeding used to produce conventional crop cultivars, are all candidates
for patent protection (IBRD/World Bank, 2006).

Almost all the significant components of the constructs used in plant transformation
have been the subject of patent coverage. These include the effect gene as well as its
associated regulatory sequences, the selectable or screenable marker and additional
sequences that might be required for the subsequent excision of the transgene.

It is important to recognize that patents on plant genes affect more than just the
production of transgenic cultivars. It is possible to identify and protect genes that are
used in more conventional breeding procedures. For instance, several herbicide-tolerant
crop cultivars commercially available in North America incorporate patented genes that
have been identified through techniques such as mutagenesis or whole cell selection and
then incorporated in new crop cultivars through conventional breeding. Another example is
imidazolinone-resistant maize, which is being tested in sub-Saharan Africa to control the
weed Striga. The key to patent protection in these cases is the definition of novelty:
that is, some countries prohibit patent protection on substances found in nature, which
are considered to be discoveries rather than innovations. In most cases, a discovery must
be further developed in order to be considered an innovation and eventually gain a patent
that may effectively include the discovery. However, genes discovered and developed in
the course of conventional breeding can be patented in several countries. IBRD/World Bank
(2006) provided an example of the resistance to aphid (Nasonovia ribisnigri) in lettuce,
patented by a Dutch breeding company in the USA and Europe. The European patent is,
however, under appeal from various sides, including some important vegetable seed
companies. So far, the US Patent and Trademark Office (USPTO) and the European Patent
Office (EPO) have treated isolated and purified nucleotide sequences as if they were the
same as man-made chemicals (Doll, 1998). Andrews (2002) argued that the useful properties
of a gene sequence (such as its ability to bind to a complementary strand of DNA for
diagnostic purposes) are not ones that scientists have invented, but instead, are natural,
inherent properties of the genes themselves. Moreover, gene patents do not meet the criteria of
non-obviousness, because, through in silico analysis, the function of genes can now be predicted on
the basis of their homology to other genes.

Although the possibility of patenting genes is controversial, the concept itself seems
straightforward. Even so, several issues contribute to making this area a particularly complex one
for patent law. One problem is related to broad patent claims, which may cut a swathe as wide as all
genetically engineered cotton plants. Although such comprehensive claims may be more difficult to
make now than in the early years of biotechnology, the issue of broad patents remains a concern for
many areas of research, including the plant breeding industry (Barton, 2000). Another issue that
affects gene patenting is the degree to which claims are allowed for genetic material whose
functions are incompletely understood. For instance, the Human Genome Project witnessed a rush
towards patents for a wide range of DNA sequences without any corresponding characterization and
although such practices are more prevalent in the pharmaceutical industry than in plant breeding,
they illustrate that there is not yet a widely accepted definition of how genetic material qualifies
for a patent. This issue is related to a third issue, which concerns the type of genes or DNA
sequences that might be patented. Claims have been made for protecting DNA that does not constitute
a complete gene, including promoters, nucleic acid probes (used to identify DNA sequences) and
polymorphisms. On the other hand, patents have been sought for collections of genes, from bacterial
cloning vectors to entire genomes (IBRD/World Bank, 2006). Both the EPO and USPTO now have stronger
guidelines concerning claims on genes: there must be a good knowledge and description of the gene's
function.

A fourth issue that complicates the granting and defence of gene patents is the variable nature of
the genes themselves. A good example of the difficulties in identifying what precisely is eligible
for protection is provided by the Bt genes that are used for insect resistance in cotton, maize and
other crops (IBRD/World Bank, 2006). The Bt bacterium produces certain insecticidal proteins and has
been used as a source of natural insecticide for many years. The techniques of biotechnology have
allowed the identification and transfer of the genes that code for these crystalline (Cry) proteins;
the nomenclature describes a series of different cry genes (found in different strains of the
bacterium), each coding for a distinct Cry protein that is effective against specific insects. Thus
the cry1Ac gene codes for the Cry1Ac protein that is effective against the cotton bollworm and is
the basis of most versions of Bt-cotton. Not only are there various claims on genes that code for
specific Cry proteins, but the cry genes that are used in transgenic plants are also synthetic and
significantly different from the original wild genes found in the bacterium. In most cases, the Bt
genes are codon modified because part of the code that functions in a bacterium must be changed to
be more effective in a plant. So although the insecticidal protein that is produced by the
transgenic plant may be essentially identical to that produced by the bacterium, the governing gene
may look somewhat different and patent claims can be made on the modified gene and the techniques
used for its modification. A cry gene may be further altered by eliminating certain portions to
produce a truncated form of the gene (which may prove more effective) and research has also created
fusion genes that code for novel proteins combining parts of two different Cry proteins. The various
types of cry genes must be linked with specific promoters as well. The potential patent claims on
various aspects of the process and disputes over definitions of novelty explain why Bt technology
causes considerable uncertainty among scientists in developing countries and it is the subject of
continuing legal disputes among the major biotechnology multinational corporations. Although the Bt
example is particularly complex, it illustrates that genetic modification is rarely a case of simply
identifying and moving a gene from one organism to another and it
demonstrates how patent claims on genes may cover a range of issues.

Selection and identification of transformants

The production of transgenic organisms, including plants, involves the delivery of a gene of
interest and the use of a selectable marker that enables the selection and recovery of transformed
cells. This is necessary because only a minor fraction of the treated cells become transgenic while
the majority remain untransformed. It has been estimated recently (Miki and McHugh, 2004) that
approximately 50 marker genes used for transgenic and transplastomic plant research or crop
development have been assessed for efficiency, biosafety, scientific applications and
commercialization.

Selectable marker genes (see Table 13.3 for selected patents) can be divided into several categories
depending on whether they confer positive or negative selection and whether selection is conditional
or non-conditional on the presence of external substrates. The most common strategy currently used
for selection is negative selection, the elimination of non-transformed cells in conditions where
the transformed cells are allowed to thrive. Elimination is often effected by treatment of cells
with chemicals (e.g. antibiotics or herbicides) in conjunction with a transgene that confers
resistance or tolerance to the chemical through detoxification or modification of the chemical. Much
of the original work was conducted using antibiotic resistance marker (ARM) genes, which confer
resistance to antibiotics such as neomycin, kanamycin and hygromycin. Roa-Rodriguez and Nottenburg
(2003b) provided a summary of the most important scientific aspects of such resistance genes,
together with an analysis of selected patents that relate to the most widely used ARM. Many of these
marker genes are covered by patents or patent applications (Table 13.3) with a thorough IP analysis
available on antibiotic markers and Basta resistance (Mayer et al., 2004). As an alternative, or
addition, to the use of selectable markers, transformants are often identified through the use of
reporter or visualization molecules.

Promoters and other regulatory elements

Regulatory elements are crucial to gene expression in all organisms. The patent landscape of
transcriptional regulators that are constitutively active, spatially active (e.g. tissue-specific)
and temporally active (e.g. induced or active in response to a certain chemical or physical
stimulus) has been well summarized (Roa-Rodriguez, 2003).
Table 13.3. Selection of patents covering selectable marker genes (Pardey et al., 2003).
Although the inventions protected by individual patents cannot be exactly the same, in certain
cases, there are patents that due to the breadth of their scope may encompass other protected
inventions or there may be patents which share common features. Where that is the case, Dunwell
(2006) pointed out the juxtaposition of the different inventions and the possible room left to
manoeuvre around the different entities in the field. It also needs to be taken into account that
there are patents that while not totally directed to promoters may have an effect on gene expression
control. This is the case for the restrictive reproductive technologies, for example, those termed
'terminator technologies', which may have a great impact on the use and development of methods to
regulate the expression of genes related to plant reproduction and seed generation.

Golden Rice as an example for freedom-to-operate

One of the issues of over-riding importance to all companies is whether or not they are free to
commercialize any particular product. Such freedom-to-operate is determined by the status of any IPR
that might cover the product in question and analysis of such IPR requires continuous (and therefore
expensive) surveillance.

A well known example that can be used to demonstrate the complexity of this issue is Golden Rice, a
transgenic line that is enhanced for β-carotene (pro-vitamin A) (Ye et al., 2000). Vitamin A
deficiency causes symptoms ranging from night blindness to those of xerophthalmia and keratomalacia,
leading to total blindness. In developing countries, 500,000 children year⁻¹ go blind and up to 600
day⁻¹ die from vitamin A malnutrition (Potrykus, 2005). As oral delivery of vitamin A is
problematic, mainly due to the lack of infrastructure, alternatives might be found in
supplementation of the major staple food with pro-vitamin A. As a table food for many countries,
rice is usually milled to remove the oil-rich aleurone layer that turns rancid upon storage. The
remaining edible part of rice grains, the endosperm, lacks several essential nutrients including
pro-vitamin A. Thus, predominant rice consumption promotes vitamin A deficiency. A combination of
transgenes enabled biosynthesis of pro-vitamin A in the endosperm (Ye et al., 2000). GM-rice that
produces β-carotene (pro-vitamin A) in the endosperm shows the yellow colour of the grain that is
visible after milling and polishing, from which the generic name Golden Rice is derived. Golden Rice
could be used in food-based approaches and complement others, in reducing the persistent problem of
vitamin A deficiency in rice-dependent populations. The Golden Rice technology was developed by
I. Potrykus and P. Beyer with their co-workers and was funded by the Rockefeller Foundation, the
Swiss Federal Institute of Technology, the EU and the Swiss Federal Office for Education and
Science.

Golden Rice and its use in grain production have been surrounded by controversy. It has been
suggested that extensive patenting has hampered delivery of this rice to those in need since about
40 organizations hold 72 patents on the technology underlying its production (Kryder et al., 2000).
The range of patents covering various components of the pBin 19hpc plasmid used in the production of
this rice includes ones on the phytoene trait genes, the promoter sequences, the selectable marker
and the transit peptide. Table 13.4 shows the product clearance profile detailing the possible
required licences and/or agreements for Golden Rice. Table 13.5 lists the tangible property received
by ETH-Zurich, including the apparatuses used in the transformation. Some components were obtained
under research-only licences or research-only MTA whereas others included use licences. The
challenges to freedom-to-operate for Golden Rice at national and international levels include:
(i) the technology is quite complex with many sophisticated components and processes; (ii) many
potential IP owners or assignees; (iii) the range of potential producers and consumers of Golden
Rice is wide; (iv) a rapidly evolving global
Table 13.4. Product clearance profile: possible required licences and/or agreements for Golden Rice
(from Kryder et al. (2000) with permission).
IP landscape; and (v) Golden Rice may have significant commercial values (Kryder et al., 2000). This
issue has been overcome by a coordinated international programme designed to streamline the
production and distribution of this material (http://www.goldenrice.org/). However, perceived
problems with access to Golden Rice and essential medicines have stimulated debate within the USA on
the obligations of US universities to facilitate the provision of goods for the public benefit
(Kowalski and Kryder, 2002; Phillips et al., 2004). The deal for Golden Rice has the following
clauses:

The inventors have reached an agreement with Greenovation and Zeneca (now Syngenta) to enable the
delivery of this technology free-of-charge for humanitarian purposes in the developing world.
Inventors (Beyer and Potrykus) assigned their rights exclusively to [Syngenta] for all uses;
[Syngenta] licensed inventors for humanitarian uses, with right to sublicense public research
institutes and poor farmers in developing countries; the technology is to be made freely available,
poor farmers can trade Golden Rice locally; [Syngenta] will support inventors in this task; and
[Syngenta] retains commercial rights. In the Golden Rice Deal, Syngenta's role is to
Table 13.5. Material transfer agreements (MTAs), licences, documents and agreements relevant to
Golden Rice (from Kryder et al. (2000) with permission).
Rice germplasm transformed with gene construct(s)                 Taipei 309, obtained from International Rice Research Institute (IRRI)
pGEM4                                                             Promega
pBluescriptKS                                                     Stratagene
pCIB900                                                           Ciba-Geigy Limited (now Novartis Seeds AG)
CaMV35S promoter (component of pCIB900)                           Monsanto
CaMV35S terminator (component of pCIB900)                         Monsanto
AphIV gene: hygromycin phosphotransferase (component of pCIB900)  Ciba-Geigy Limited (now Novartis Seeds AG)
pKSP-1                                                            Thomas Okita, Washington State University
GT1 promoter: glutelin storage protein (component of pKSP-1)      Thomas Okita, Washington State University
pUCET4                                                            N. Misawa, Kirin Brewery Co., Ltd
Pea Rubisco transit peptide (component of pUCET4)                 N. Misawa, Kirin Brewery Co., Ltd
CrtI gene: phytoene desaturase (component of pUCET4)              N. Misawa, Kirin Brewery Co., Ltd
pPZP100                                                           Pal Maliga, Rutgers University
pYPIET4                                                           Clontech, but now marketed by Life Technologies
Electroporation apparatus                                         Bio-Rad Corp., Gene Pulser II System
Microprojectile bombardment apparatus                             Bio-Rad Corp.
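
As a rough illustration of why freedom-to-operate analysis for Golden Rice involved so many parties, the entries of Table 13.5 can be tallied by source. The following is only a minimal sketch: the shortened source labels and the grouping rule (one potential agreement per source) are assumptions made for the example, not the method of Kryder et al. (2000), and a full analysis would also have to cover the patents summarized in Table 13.4.

```python
from collections import defaultdict

# (tangible property, source) pairs transcribed from Table 13.5.
# Source names are shortened here for readability (an assumption of this sketch).
TABLE_13_5 = [
    ("Rice germplasm (Taipei 309)", "IRRI"),
    ("pGEM4", "Promega"),
    ("pBluescriptKS", "Stratagene"),
    ("pCIB900", "Ciba-Geigy"),
    ("CaMV35S promoter", "Monsanto"),
    ("CaMV35S terminator", "Monsanto"),
    ("AphIV gene", "Ciba-Geigy"),
    ("pKSP-1", "Washington State University"),
    ("GT1 promoter", "Washington State University"),
    ("pUCET4", "Kirin Brewery"),
    ("Pea Rubisco transit peptide", "Kirin Brewery"),
    ("CrtI gene", "Kirin Brewery"),
    ("pPZP100", "Rutgers University"),
    ("pYPIET4", "Clontech"),
    ("Electroporation apparatus", "Bio-Rad"),
    ("Microprojectile bombardment apparatus", "Bio-Rad"),
]


def group_by_source(rows):
    """Collect the listed items under the party that supplied them."""
    grouped = defaultdict(list)
    for item, source in rows:
        grouped[source].append(item)
    return dict(grouped)


by_source = group_by_source(TABLE_13_5)

# List sources with the most items first.
for source, items in sorted(by_source.items(), key=lambda kv: -len(kv[1])):
    print(f"{source}: {len(items)} item(s)")
print(len(TABLE_13_5), "items from", len(by_source), "sources")
```

Even this simplified tally shows that a single construct draws on materials from many independent suppliers, each of which may attach its own licence or MTA conditions.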
help the inventors in the management of Golden Rice deployment for humanitarian purposes and, with
other companies and universities, obtain FTO for humanitarian use; provide biosafety expertise; and
share available regulatory data. Here 'Humanitarian Use' means (research leading to): developing
country use (FAO list); resource poor farmer use (<US$10,000 pa from farming); in public germplasm
(= seed); there must be no charge for technology (normal costs can be recovered; no premium); local
sales are allowed by such farmers (urban needs); and replanting is allowed. Other license terms
include regulatory requirements: national sovereignty (or international standards); no export of
grain allowed (or seed, except for research, to other licensees); liability, trade, biosafety
approvals; and obliged to fulfil all regulatory requirements.

Golden Rice 1 (SGR1) and Golden Rice 2 (SGR2), as the second generation of Golden Rice, were
developed by Syngenta as a part of their commercial pipe-line and their pro-vitamin A (β-carotenoid)
levels are up to 8.0 and 36.7 µg g⁻¹, respectively, compared to 1.2–1.8 µg g⁻¹ for Golden Rice
(Plate 4, colour photograph from Paine et al., 2005). Consistent with Syngenta's support of the
Humanitarian Project for Golden Rice, SGR2 transgenic events will be donated for further research
and development. The use of the SGR2 events will be governed by the strategic directions of the
Golden Rice Humanitarian Board and full regulatory compliance. It is expected that the third
generation of Golden Rice will be the rice with a high level of pro-vitamin A and the normal colour
of polished rice grain, which is more acceptable to most rice consumers. Looking into the future, an
interesting issue has been pointed out by Potrykus (2005), that is, experience with the Humanitarian
Golden Rice project has shown that extreme precautionary regulation, not IPR, prevents use of the
GMO potential to the benefit of the poor and that the public domain is incompetent and unwilling to
deliver products. But a decade after its invention, Golden Rice is still stuck in the laboratory.
Well-organized
[Figure: flow diagram with the steps DNA, Selection of SSR-primers, PCR and Analysis; numbers 1-15 mark the relevant patents; the analysis panel shows marker peaks M1, M2 and M3 on an axis running from 870 to 900.]
Fig. 13.3. An overview of a typical experiment involving SSR markers and some sample patents that
are relevant for the different steps as described in Chapters 2 and 3. Patents are indicated by numbers.
Starting with DNA isolation, the DNA is sometimes cut by restriction enzymes. After selection of specific
SSR primers, a PCR reaction is carried out. There are different possible methods for the analysis of
the PCR product including gel electrophoresis (the fluorescent label of the PCR product is indicated
by *), mass spectrometry, and microarray analysis. The result from molecular marker analysis can be
used in marker-assisted plant breeding, which involves several patents. From Jorasch (2004) with kind
permission of Springer Science and Business Media.
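
To make the analysis step in the figure more concrete: SSR alleles differ in the number of copies of a short repeat motif, so PCR products amplified with the same primers differ in length, and it is this length difference that gel electrophoresis or mass spectrometry detects. The sketch below is purely illustrative. The flanking sequences, the regular-expression repeat finder and the function names are hypothetical and are not taken from any of the patents cited.

```python
import re


def find_ssrs(seq, min_repeats=5):
    """Return (start, motif, copies) for perfect tandem repeats of a 2-4 bp motif."""
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:  # skip homopolymer runs such as AAAA...
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def amplicon_size(left_flank, motif, copies, right_flank):
    """Length of the PCR product spanning both flanks and the repeat."""
    return len(left_flank) + len(motif) * copies + len(right_flank)


# Hypothetical flanking sequences shared by both alleles (primer-binding regions).
LEFT, RIGHT = "GATTACAGGCTTAGC", "TTGACCGTAAGCTAG"
allele_a = LEFT + "CA" * 8 + RIGHT
allele_b = LEFT + "CA" * 11 + RIGHT

for name, allele in (("A", allele_a), ("B", allele_b)):
    start, motif, copies = find_ssrs(allele)[0]
    size = amplicon_size(LEFT, motif, copies, RIGHT)
    print(f"allele {name}: ({motif}){copies}, product size {size} bp")
```

In this toy case the two alleles differ by three copies of a dinucleotide motif, so their products differ by 6 bp, which is the kind of size difference the analysis methods in the figure are designed to resolve.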
products. The most common one, the analysis by gel electrophoresis, can also be claimed by patents,
if for example special fluorescent labels for detection are used. Such a method is claimed by patent
No. 9 (Shuber and Pierceall, 2002). The claimed method comprises the PCR with fluorescent primers,
the detection of the labelled extension products and the comparison of the PCR product sizes. A
second method of analysis, mass spectrometry, is claimed by patent No. 10 (Hillenkamp and Köster,
1999). This patent claims the analysis of nucleic acids by mass spectrometry in general. The
microarray technique that can be used for high-throughput analysis of probes is protected by a
patent of Affymetrix, patent No. 12 (Fodor et al., 1998). This specification not
only protects the detection of microsatellite markers by microarray analysis, but also the detection
of nucleic acid sequences in general, which include microsatellites. Another high-throughput
technique described in patent No. 11 (Olek, 1996) combines the methods of mass spectrometry and
microarray analysis of microsatellite markers. In addition, patent specifications No. 13 (Caskey and
Edwards, 1992), No. 14 (Perlin, 1995) and No. 15 (Saint-Louis and Paquin, 2003) summarize the
complete experimental process from DNA extraction to the analysis of PCR products, in which
different PCR methods are combined, e.g. use of certain labelled nucleotide triphosphates and
different analytical tools such as mass spectrometry or computer analytical tools.

Marker-assisted breeding methods

The most comprehensive patent specifications claim complete plant breeding methods in which
molecular marker analysis is used. Examples are patent specifications No. 16 (Byrum and Reiter,
1998), No. 17 (Beavis, 1999), No. 18 (Openshaw and Bruce, 2001) and No. 19 (Jansen and Beavis, 2001)
(Fig. 13.3). These comprise previously mentioned experimental steps in which they claim the
association of genotype with phenotypic traits of interest for molecular marker analysis. The
patents differ in the selection of plant populations that are the basis for the analysis,
statistical methods applied in the analysis and the integration of molecular biological techniques
such as expression profiling of genes.

This section just describes how biotechnology patents would affect marker-assisted plant breeding
using microsatellites as an example. As SNP markers and gene-based markers become increasingly
feasible, more claimed patents will be associated with their application in plant breeding. Although
MAS techniques are used in conventional plant breeding and no foreign DNA sequences become part of
any resulting cultivar, the use of patented diagnostic technology may have implications for a plant
breeder's ability to claim ownership of the final product. The exact situation will depend on the
conditions under which the technology was acquired and the wording of any contract with the
supplier. So-called 'reach-through' claims have not seemed to play a significant role in this area
to date. Patent offices have also become aware of the negative effect of these claims and are very
critical in granting wide claims.

National patent systems have been unable to keep pace with the rapid development of plant
biotechnology, leaving many areas of uncertainty and dispute. In developing countries, only a small
minority of patent offices have begun to consider applications related to plant biotechnology, while
in several industrialized countries a number of claims to basic technologies are still the subject
of complex court cases. It is therefore impossible to chart an unambiguous course for the
development of effective IPR regimes for plant breeding-related biotechnology, but it is important
to recognize the major parameters and to identify the issues that will affect IPR policy in the
coming years. Areas of particular concern include the protection of genes and other sequences, the
methods used for genetic transformation, information in bioinformatics databases and the diagnostic
techniques that biotechnology offers conventional plant breeding (IBRD/World Bank, 2006).

13.5.3 Product development and commercialization

There are multiple steps involved in the procedure of developing biotechnology and breeding
products, each of which might be associated with specific IPR issues. For example, the development
of Bt maize involves multiple steps that are associated with specific patents and IPR issues:
(i) gene ownership (Cry1F, PAT marker gene); (ii) enabling technologies (microprojectile
bombardment, herbicide selection, backcrossing, production of fertile transgenic plants);
(iii) enhanced expression (chimeric genes using viral promoters, enhanced expression, enhanced
transcription efficiency, selective gene expression); and (iv) developing elite maize
inbreds and hybrids (patented inbreds, hybrids and patents for associated traits and genes).

There are many IPR issues involved in delivering a transformation product from research to the
farmer's field. For example, in Bt-maize IPR issues would include: (i) research agreements among
major players allowing forward movement in plant biotechnology; and (ii) cross-licences for Roundup
Ready (RR) YieldGard. Monsanto licenses Herculex 1 whereas Pioneer licenses RR for maize, soybean
and canola, or Pioneer needs to deal on germplasm issues with Monsanto. Likewise, there was
competition, from developing basic technologies to making the most effective use of technologies to
develop improved products. In addition, payment for technology or germplasm research is ultimately
dependent on farmer purchases of seed.

It can be expected that a large number of new techniques will be developed and associated patents
will be claimed in the field of molecular breeding in the near future. The new patents that may add
to the current patent list and will affect molecular breeding include:

• High-throughput automated molecular marker profiling.
• High-throughput gene expression assays using DNA on silicon chips.
• High-throughput proteomics assays.
• High-throughput DNA sequencing facilities.
• The ability to DNA profile both the female and the male parents of hybrids without accessing
  either parent per se via use of maternally inherited tissue.
• The ability to conduct genome-wide gene–trait association studies involving hundreds or thousands
  of genotypes, including heterogeneous complexes such as landraces.
• The ability to conduct genome-wide scans comparing domesticated cultivars or landraces and to
  compare them with wild relatives to identify potentially useful loci and novel genetic diversity.

It would not be surprising if, several years from now, many other fields and associated techniques
and knowledge have to be added to, and so modify, the best list we can generate so far.

13.6 Use of Molecular Techniques in Plant Variety Protection

Molecular techniques, particularly molecular markers, have been widely used in all procedures
involved in plant breeding and some fields of PVP as well. For example, on 16 and 17 June 2005, the
Plant Production Division of the Canadian Food Inspection Agency (CFIA) and the National Forum on
Seed jointly held a seminar on UPOV plant variety protection and the use of molecular techniques.
The objectives of the seminar were: (i) to provide information to Canadian plant breeders and other
stakeholders on PVP and the use of molecular markers under the UPOV Convention; and (ii) to
facilitate discussion on the potential application of molecular techniques to PBR, cultivar
registration and seed certification in Canada. Information from this seminar is available at
http://www.inspection.gc.ca/english/plaveg/pbrpov/molece.shtml (CFIA/NFS, 2005).

There are a number of advantages to the use of DNA marker techniques in plant breeding as described
in Chapter 8, most of which are also applicable to PBR. Among all available molecular markers
discussed in Chapter 2, SNP is the most prolific and is very efficient and inexpensive to use once
developed. The technology holds enormous potential for multiplexing and high throughput. SNP
technology is being used to characterize germplasm and in breeding programmes. Broad adoption of
this technology would be useful to the plant protection regulatory systems, especially for cultivar
identification and protection purposes. While SSRs are now generally accepted in the courts, there
are some inherent limitations that would be overcome through the use of SNP.

Traditional methods based on morphological observations take time to complete and results are
influenced by the environment. Molecular tools can play a
between the parents and p, the parental genome contribution transmitted to the progeny. The treatise
provides estimates of p and the variances of p (σ²ᵢ; Wang and Bernardo, 2000). Morphological
distances based on 25 traits and midparent heterosis for 12 traits were observed for a total of 58
European maize inbred lines comprising 38 triplets. A triplet consisted of one homozygous line
derived from an F2, BC1 or BC2 population and both parental inbreds. All inbreds were genotyped with
100 uniformly distributed SSR markers and 20 AFLP primer combinations in companion studies for
calculation of genetic distances. Correlations between the co-ancestry coefficient, genetic
distances and morphological distances and midparent heterosis were significant and high for the
majority of traits. However, thresholds for EDVs to discriminate between F2- and BC1-derived, or
BC1- and BC2-derived progenies using only morphological distances or heterosis yielded a
considerably higher probability of error than observed with genetic distances based on SSRs and
AFLPs (Heckenberger et al., 2005b). Consequently, morphological traits and heterosis are less suited
for identification of EDVs in maize than molecular markers.

Heckenberger et al. (2006) observed considerable differences between AFLP- and SSR-based mean
genetic distance estimates for unrelated inbred lines. With each marker system, the genetic distance
between progeny lines and parents was little affected by the variation in genetic distance between
the parents. Substantial differences in Type I and Type II errors were detected between flint and
dent maize germplasm pools with different marker systems and when fixed EDV thresholds were
considered. It was suggested that threshold levels should be crop-specific. Within a crop,
thresholds should be germplasm pool-specific. In addition, thresholds should also be molecular
marker system-specific because marker systems vary in the way of generating polymorphism.
Heckenberger et al. (2005a) reported that correlation between true and estimated genetic distances
was considerably lower for randomly distributed markers, particularly with medium-to-low marker
densities.

13.6.3 Cultivar identification

The identification of cultivars is an important aspect of plant production systems and is central to
the protection of IPR through PBR. Preston et al. (1999) discussed the application of a range of
molecular marker technologies to three points in the PBR registration process: (i) the analysis of
the genetic distance between a candidate cultivar and the existing pool of cultivars in order to
define a set of comparison cultivars; (ii) the contribution to the generation of a description of
the cultivar for PBR registration; and (iii) the use of DNA markers to investigate and resolve the
identity for cultivars in cases where infringement of PBR is claimed. Molecular techniques may be
particularly useful in resolving the dispute relating to cultivar infringement (i.e. someone selling
another's cultivar) dealt with by breeders in the courts. For example, molecular techniques are used
in Canada by the Grain Research Laboratory (GRL) for cultivar identification testing of wheat and
barley. The GRL has used two methods: acidic polyacrylamide gel electrophoresis (acid PAGE) and high
performance liquid chromatography (HPLC). Protein fingerprinting works well; there are, however,
limitations. With acid PAGE, estimates of sample composition are based on single kernels and large
numbers of kernels can be necessary for statistically reliable estimates. HPLC can be used on ground
samples, but it is not suitable for complex mixtures. Both methods are limited by finite protein
diversity; there are a limited number of protein differences among cultivars and not all cultivars
are distinguishable.

Quantitative DNA methods are now being developed using SNP and insertion/deletion polymorphisms
(Indels). The goal is to be able to look at ground samples of grain to determine the cultivars
present in a mixture and their proportions. Key challenges ahead include the development of
accurate and sensitive quantification methods and, ultimately, the development of portable technologies that are capable of delivering rapid results.
Japanese barberry (Berberis thunbergii) is an ornamental shrub desired for its hardiness and attractiveness. However, because it is a host of black stem rust of wheat, it has been banned from importation. With permitted importation of 11 rust-resistant cultivars, molecular identification methods were used to assist CFIA inspectors with identification of the permitted cultivars, as their appearance is not always consistent with the morphological criteria, particularly for plants imported in the dormant state. AFLP test results identified 33 reference polymorphic bands. A sample is of the same cultivar if 31 or more polymorphic bands are shared, whereas if the number of shared bands is 28 or fewer, it is not considered to be of the same cultivar. Should results show 29 or 30 shared bands, the DNA is re-extracted and more primer sets are used to set the reference bands to 64 (CFIA/NFS, 2005).
While one gene or one trait may be sufficient for identifying a cultivar in one species, it may not be sufficient in all species. It may be appropriate to take a case-by-case and crop-by-crop approach.

13.6.4 Seed certification

The purpose of seed certification is to provide high quality seed to consumers by maintaining the cultivar identity and purity of seed and ensuring high standards of germination, seed health and mechanical purity. For example, in Canada for seed to be certified, it must be a recognized cultivar, multiplied according to strict rules that include process standards and cultivar purity standards established and monitored by the Canadian Seed Growers' Association (CSGA).
Common seed must meet germination, disease and mechanical purity standards, but there are no cultivar identity or purity guarantees associated with the purchase or use of the seed. For most crop kinds, common seed may not be sold by cultivar name. Common seed is less costly than certified seed and includes farm-saved seed.
Certified (pedigreed) seed is used by farmers who want additional assurance on seed quality, cultivar purity and performance. It is derived from a crop that has been issued a crop certificate from a seed association like CSGA indicating it has been granted Breeder, Select, Foundation, Registered or Certified status. Production of certified seed involves the planting of known seed stocks, previous land use restrictions, minimum isolation distances and field inspections.
CFIA seed laboratories use a number of test methods for the determination of cultivar purity and identity, depending on the crop kind and other factors. CFIA seed laboratories are International Organization for Standardization (ISO)-accredited and therefore carry responsibilities regarding the use of validated methods. The methods they use are classified as routine or non-routine and range from field, growth chamber and greenhouse grow outs to PCR.
Certified seed must be processed by an approved conditioner or by the grower of the seed and it must be sampled, tested and graded by accredited industry personnel. Molecular markers are not currently used in CFIA seed laboratories because seed certification has traditionally been based on phenotypic traits observable during crop inspection. However, molecular markers have the potential to be used as a control tool to ensure seed certification is working as it should. This could expand the level of public confidence regarding the purity level and security of certified seed or grain.

13.6.5 Seed purification

Breeder seed purification is a 3-year process. For example in wheat, the first year involves head selection from seed increases of material also being tested in first-year collaborative trials. Year 2 includes growing single-head-derived breeder lines in hill or short row plots with line discards based on visual and in some cases chemical phenotype. During the third year the
remaining breeder lines are grown as individual breeder long rows with further line discarding based on visual phenotype with all rows of the remaining single-head-derived lines bulked as the first breeder seed. In addition during year 3, the cultivar is visually described in order to be registered and for purposes of future pedigree seed production (CFIA/NFS, 2005). Purification for molecular characterization is the same, with line discards during year 3 also based on molecular characterization and purification, but the molecular characterization process occurs in the laboratory rather than in the field.
An important application of molecular techniques in seed purification is seed purity testing, particularly for hybrid crops. Using cucumber as an example, Staub (1999) illustrated the usefulness of genetic markers in hybrid seed production including purity testing. Xu, Y. (2003) discussed the use of molecular markers in seed quality assurance including identification of off-types and false hybrids in rice seed production. When a two-line hybrid system is involved, the false hybrids in rice usually come from the selfing of the female parents due to the sterility instability of environment genic male sterility lines caused by temperature fluctuation beyond their critical temperature for fertility conversion. False hybrids that co-exist with real hybrid seeds can also occur in other hybrid crops for various reasons.

13.7 Plant Variety Protection Practice

13.7.1 Plant variety protection in the EU

The European Community's plant variety rights system (CPVO) was established in 1994. The IPR granted under this system are valid throughout the 25 member states of the EU. Most of the members of the EU are members of UPOV. The system is in line with UPOV 1991. It provides a 'one application, one procedure, one examination, one decision' approach to the granting of rights. The system follows UPOV's DUS requirements. The cultivar must also be novel, i.e. commercialized for less than 1 year within the EU and commercialized for less than 5 years outside the EU (6 years for trees). Protection is granted for 25 years (30 years for trees, vines and potatoes) and provides that authorization of the right-holder is required for the multiplication, sale or international trade of the cultivar.
There is currently no use of molecular techniques in CPVO DUS testing protocols; however, CPVO funds research and development projects on the potential use of molecular techniques and supports ongoing discussions and consultations on implications and issues related to molecular techniques. CPVO received requests from breeders to add a genetic fingerprint to the official cultivar description to facilitate the enforcement of European Community plant variety rights.

13.7.2 Plant variety protection in the USA

In the USA, IPR for plants are provided through plant patents, PVP and utility patents. Plant patents provide protection for asexually reproduced (by vegetation) cultivars excluding tuber crops. PVP provides protection for sexually (by seed) reproduced cultivars including tuber crops, F1 hybrids and EDVs. Utility patents currently offer protection for any plant type or plant parts. A plant cultivar can also receive double protection under a utility patent and PVP.
The US Plant Variety Protection Office is responsible for administering the PVP Act, which provides plant cultivar owners with exclusive marketing rights within the USA. The requirements of protection are that the cultivar be new, uniform, stable and distinct from all other cultivars. The PVP Act states that a novel cultivar is distinct when it 'clearly differs by one or more identifiable morphological, physiological, or other characteristics . . . from all prior cultivars of public knowledge'. The meanings of 'characteristic' and 'identifiable'
are purposefully vague in this definition to allow for future advances in knowledge and methodology.
PVP Office protection applies to cultivars that are sexually (seed) reproduced or tuber propagated and F1 hybrids. Cultivars sold or used in the USA for longer than 1 year or more than 4 years in a foreign country are ineligible for protection. Fungi and bacteria are specifically excluded by the PVP Act. Asexually propagated crops fall under the purview of the US Patent Office.
A Certificate of Protection remains in effect for 20 years from the date of issue, or 25 years in the case of vines or trees. There are two exemptions to the rights granted. One exists to allow farmers to save seed for use on their own farm. Another exemption allows research to be conducted using the cultivar. This allows for the free exchange of germplasm within the research community.
Important events in the US history of IPR for plants and agriculture include: (i) hybrid cultivars could be protected through trade secrecy (1930s); (ii) the Plant Patent Act (1930), administered through the US Patent Office, provided protection for asexually propagated plants only (plants reproduced through buds or grafting) including horticultural crops and nursery stocks, with potato excluded; (iii) the Plant Variety Protection Act (1970), with the goal to promote commercial investments in plant breeding, provided patent-like protection for plants reproduced by seed; and (iv) the Utility Patent of living organisms (1980), as shown in the Diamond v. Chakrabarty Supreme Court decision in 1980, established that 'anything under the sun made by man' is patentable, broadened patent law to encompass living organisms and established ownership of plant cultivars, traits, parts and processes.

13.7.3 Plant variety protection in Canada

The Canadian Plant Breeders' Rights Act came into force on 1 August 1990. The legislation makes it possible for plant breeders to legally protect new cultivars of plants. Crop cultivars may be protected under the legislation for a period of up to 18 years. All plant species are eligible for protection.
Owners of new cultivars who receive a Grant of Rights will have exclusive rights over the use of the cultivar and will be able to protect their new cultivars from exploitation by others. To be protected, a cultivar must be new, distinct, uniform and stable.
The Plant Breeders' Rights Office, which is part of the CFIA, functions to secure the rights of plant breeders by granting protection for their new cultivars. It reviews and accepts applications, conducts site examinations, reviews data and comparative descriptions, publishes descriptions of cultivars and comparative photographs and grants rights.

13.7.4 Plant variety protection in developing countries

Systems for IPR have been recognized for more than a century, yet until recently IPR have not been an issue in the plant breeding and seed sector in most developing countries. Developing countries are being urged to strengthen IPR to foster innovation and expand trade. The field of agriculture is no exception and the TRIPS Agreement requires all WTO members to provide either patent or sui generis protection for plant cultivars. Developing countries will almost certainly look towards sui generis options for PVP to meet their TRIPS obligations (Tripp et al., 2006). IPR are being introduced or strengthened in developing countries as a result of the TRIPS Agreement of the WTO, bilateral trade negotiations and pressure from export-oriented sectors in agriculture.
Most developing countries are in the early stages of implementing and/or enforcing IPR related to plant cultivars. The use of IPR in plant breeding in developing countries raises a number of important issues, including smallholders' access to technology, the role of public agricultural research, the growth of the domestic private seed sector, the status of farmer-developed
cultivars and the growing north–south technology divide that restricts access to plant germplasm and research tools (IBRD/World Bank, 2006).
Relatively few developing countries have any significant experience with protecting cultivars. In systems where there is heavy emphasis on hybrid cultivars and considerable commercial competition, such as those in China and India, most interest centres on PVP for parent lines and hybrids, particularly in rice and maize. In countries where the production of ornamental plant materials is important, these materials dominate PVP applications.
The protection of transgenic crops has proven particularly difficult in developing countries. Most experience with transgenic crops revolves around Roundup Ready soybean and Bt cotton. The IBRD/World Bank (2006) report shows that the presence of IPR systems is not necessarily correlated with the effectiveness of controlling access to seed of transgenic cultivars.
Pressure to strengthen IPR in plant breeding in developing countries presents both immediate and long-term challenges to policy makers and development investors. The immediate challenges are related to framing and implementing appropriate legislation that is consistent with TRIPS and that supports national agricultural development goals. The long-term challenges are derived from the fact that an IPR regime, on its own, is not likely to provide the incentives that elicit the emergence of a robust plant breeding and seed sector; attention to other institutions and the provision of an enabling environment are also necessary (IBRD/World Bank, 2006). Collaboration and understanding between the south and the north should be strengthened for a better worldwide PVP.

13.7.5 Participatory plant breeding and plant variety protection

Participatory plant breeding (PPB) is the development of a plant breeding programme in collaboration between breeders and farmers, marketers, processors, consumers and policy makers (food security, health and nutrition, employment). In the context of plant breeding in the developing world, PPB is breeding that involves close farmer–researcher collaboration to bring about plant genetic improvement within a species.
PPB is seen as a way to overcome the limitations of conventional breeding by offering farmers the possibility to choose, in their own environment, which cultivars better suit their needs and conditions. PPB exploits the potential gains of breeding for specific adaptation through decentralized selection, defined as selection in the target environment, and is the ultimate conceptual consequence of a positive interpretation of genotype-by-environment interactions (Ceccarelli and Grando, 2007). As one of the models, selection is conducted jointly by breeders, farmers and extension specialists in a number of target environments and the best selections are used in further cycles of recombination and selection.
In developing countries, plant breeding in the public sector is seldom a profit-making activity. Public sector plant breeders rarely make financial gains from their released products. This is unlikely to change if plant breeders' rights are introduced. Hence the issue of how to reward farmers is not complicated by a need to divide profits. Farmers participating in breeding programmes benefit from early access to new material, gain recognition from the community and learn new techniques. In Nepal, farmers involved in PPB have gained all of these benefits and have sold seed of the new cultivar at a higher price than the local landrace (Witcombe, 1996).

13.8 Future Perspectives

13.8.1 Extension and enforcement

A PVP system will not meet its goals unless it is supported by the full range of stakeholders. Breeders, seed producers, traders and farmers need to understand the objectives of the system in order to comply with it. The development of a PVP system should thus include an extensive information campaign
involving all stakeholders, including the legal profession. One of the major challenges for a PVP system is providing effective enforcement. Establishing elaborate restrictions on seed use is counterproductive if there is no enforcement capacity. Private companies and public institutes that lobby for the establishment of PVP must be made aware that most enforcement responsibilities will fall on their shoulders. Likewise, identifying offenders is of little use if the court system is unable to understand or interpret PVP legislation. Developing judicial experience in PVP may take some time (Tripp et al., 2006).
The following next steps should be taken in moving forward as suggested for Canada by CFIA/NFS (2005), which should be applicable to other countries:
1. Develop standardized protocols.
2. Update marker systems.
3. Develop stakeholder agreement on threshold levels and techniques to be used.
4. Work towards harmonization of crop-specific protocols as they relate to PBR both nationally and internationally.
5. Develop a means for validation of tests and accrediting laboratories.
6. Review current national and international crop-specific projects to determine all possible available markers.
7. Initiate research projects for selected species.
8. Canada should establish and lead a BMT subgroup on barley and possibly one on peas; and should participate in existing soybean, wheat and canola BMT subgroups.
9. Improve Canadian involvement in and feedback to Canadian experts and stakeholders from UPOV BMT meetings.
10. Explore further collaboration with the National Forum on Seed.

13.8.2 Administrative challenges for implementing PVP

In addition to establishing a framework for PVP legislation, there are administrative challenges for implementing PVP, including decisions on where to house the new authority, how to establish eligibility of a new cultivar for protection, which crops to protect first, how to recruit personnel with requisite technical and legal capacities and how the authority can pursue cost recovery while ensuring that small players can afford to apply for protection (Tripp et al., 2006). Research managers and policy makers responsible for public research, who are commonly in favour of using IPR in public sector breeding, have to consider the potential impact on breeding strategies and on the costs and benefits before giving their unconditional support to IPR in plant breeding and their use in public agricultural research (Louwaars et al., 2006).
Developing molecular techniques for IPR in plant breeding requires greater attention to strengthening capacities in national patent offices. As new methods of cultivar identification become available, the PVP Office should consult with the plant breeding community and research experts to best use these procedures. On the other hand, new tools also raise some concerns, including legal considerations relating to conformity with the UPOV Convention and the potential impact on the strength of protection. For example, countries that use transgenic cultivars will need to ensure adequate protection, although in many cases credible enforcement of the right combination of biosafety regulations, seed laws and PVP may offer adequate protection for transgenic cultivars, at least in the early stages of their availability in developing countries (IBRD/World Bank, 2006).
Not all crops need to be covered by PVP initially and choices should be made about which crop-breeding efforts would benefit most from IPR. With respect to public plant-breeding efforts, policy makers must distinguish between situations in which PVP will help stimulate the deployment of crop cultivars developed by public institutes and those in which PVP may turn national research institutes away from their public mandate. A further decision involves the protection afforded to extant (usually public) cultivars. Given that the rationale for IPR is to provide incentives for future breeding, rather than to reward past achievement, it seems reasonable to limit the protection periods for extant cultivars (Tripp et al., 2006).
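As an illustration of what establishing eligibility and fixing protection periods can look like in practice, the novelty windows and protection terms quoted in Sections 13.7.1 and 13.7.2 can be written down as a small rule table. The following Python sketch is illustrative only: the class name, field names and crop-type labels are assumptions introduced here, the figures are those stated in the text rather than taken from the statutes, and the handling of cases that fall exactly on a limit follows the wording of the text ('less than' for the EU, 'longer than' for the USA).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PvpRegime:
    """Novelty windows and protection terms as quoted in Sections 13.7.1-13.7.2.

    Field names and crop-type labels are illustrative assumptions.
    """
    name: str
    home_limit_years: float     # prior commercialization allowed at home
    abroad_limit_years: float   # prior commercialization allowed abroad
    abroad_limit_by_type: dict  # crop types with a longer window abroad
    strict: bool                # True: must be "less than" the limit
    term_years: int             # standard protection term
    term_by_type: dict          # crop types with a longer term

    def is_novel(self, crop_type, years_home, years_abroad):
        """Return True if prior commercialization stays inside both windows."""
        abroad_limit = self.abroad_limit_by_type.get(
            crop_type, self.abroad_limit_years)
        if self.strict:
            return (years_home < self.home_limit_years
                    and years_abroad < abroad_limit)
        return (years_home <= self.home_limit_years
                and years_abroad <= abroad_limit)

    def term(self, crop_type):
        """Return the protection term in years for a crop type."""
        return self.term_by_type.get(crop_type, self.term_years)


# Figures are those stated in the text, not checked against the statutes.
EU = PvpRegime("EU (CPVO)", 1, 5, {"tree": 6}, True, 25,
               {"tree": 30, "vine": 30, "potato": 30})
USA = PvpRegime("USA (PVP Act)", 1, 4, {}, False, 20,
                {"tree": 25, "vine": 25})

if __name__ == "__main__":
    for regime in (EU, USA):
        print(regime.name,
              regime.is_novel("wheat", years_home=0.5, years_abroad=4.5),
              regime.term("vine"))
```

Under these figures, a wheat cultivar sold for 6 months at home and 4.5 years abroad would still fall inside the EU window as quoted but outside the US one. A new authority would substitute its own limits and crop categories.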
The general concept of a PVP-type system is appropriate and important to provide affordable IP for plant breeders while retaining the availability of germplasm as an initial source of variation in breeding. PVP remains especially important to provide IP for successful breeders who, either because of the incredible and still largely incomprehensible complex biology of their crop species or through lack of expensive technology, cannot describe an individual gene and its agronomic impact, but who, none the less, develop improved cultivars that are needed in agriculture, horticulture, or forestry. Other forms of IP (trade secrets, contracts, patents) are also important.
The use of molecular techniques for cultivar registration needs to be harmonized with its use for PBR. To do this, international agreement on methods and procedures needs to be established. There may be legal problems associated with the use of molecular markers that may require third party verification. Related government agencies should act as coordinators or verifiers of molecular markers and accredit or certify laboratories that wish to perform molecular techniques. The PBR Office may not be responsible for establishing standards or thresholds or for review of molecular markers. Instead, it would be handled in a similar manner as botanical descriptions.

13.8.3 The need to update UPOV

UPOV was updated once due to changes in technology. It is time to update the provisions once again to accommodate advances in technology that have occurred since 1991, in order to encourage continued infusions of new germplasm into breeding pools. As suggested by Donnenwirth et al. (2004), these UPOV updates should include:
• Providing compensation for and limits on saved seed in all countries.
• Making the EDV system more effectively defined to avoid technological loopholes.
• Revising the breeders' exemption to include a period of x years from the date of a PVP application during which the breeders' exemption would not be available for UPOV-protected material including commercialized cultivars.
• Requiring a seed deposit for all UPOV-related applications.
• Requiring the disclosure of all material deposited with PVP applications at the end of x years and making all material deposited available for research under the breeders' exemption at the end of x years unless the disclosure and availability would be in conflict with a utility patent on the same material.
• Placing all UPOV-related deposits (excepting parents and synthetics) into the public domain following expiration of UPOV protection.
• Creating a PCT (Patent Cooperation Treaty)-like system to facilitate filing of PVP applications on an international basis.
• Providing for and facilitating under UPOV global benefit sharing consistent with the International Treaty on Plant Genetic Resources for Food and Agriculture.
Janis and Smith (2007) made two novel and provocative claims in 'Obsolescence in intellectual property regimes'. They first argued that the legal regime for protecting new plant cultivars has become hopelessly outdated in light of recent changes in technology. They next asserted that the fate of the PVP system illustrates a broader and more disturbing phenomenon in IP law: the potential for sui generis, industry-specific IP regimes to become increasingly ineffective over time. Helfer (2006) believed that 'Obsolescence in intellectual property regimes' offered an insightful legal analysis of PVP, one of IP law's least understood sui generis regimes, and that the article also made a persuasive case that the lynchpin of the PVP system is outdated and needs to be replaced with more flexible unfair competition principles. International and domestic policy makers interested in advancing innovation in the plant breeding industry and legal scholars concerned with the ever-evolving relationship between law
and technological change would do well to consider the arguments from Janis and Smith (2007).

13.8.4 Collaboration in use of genetic resources

Historically, there has been excellent collaboration between the US Land Grant Institutions and publicly supported IARCs in crop improvement efforts. A hallmark of the collaboration has been the free exchange of plant germplasm and information. Now there are increasing restrictions to use and exchange of germplasm from the USA to the IARCs and from the private sector to the public sector, although the reverse cannot happen due to the international public good nature of the CGIAR centres as well as their agreement with the International Treaty on Plant Genetic Resources for Food and Agriculture. This situation results in the following consequences:
• restricted access and use of germplasm;
• legal costs and enforcement;
• restrictions on progeny and publications;
• joint ownership of progenies and discoveries;
• complication caused by biotech patents on single genes and processes;
• public programmes increasingly being unable to access and use technology; and
• companies becoming increasingly restrictive and demanding.
On the other hand, international collaboration in the use of genetic resources becomes increasingly important and IPR issues are worth more attention. The recent revolution in the field of biotechnology has triggered off another round of controversy between the developed countries of the north and the developing countries of the south concerning access to genetic resources and equitable sharing of their benefits. Developed countries, as the genetic resource poor, assert ownership claims on associated technologies, while developing countries, as the genetic resource rich, claim ownership of genetic resources. The heart of the matter, however, lies in the application of conflicting conventions and protocols in respect of genetic resources and biotechnology: genetic resources are treated as public goods, while biotechnology is treated as a private good (Adi, 2006). Developing countries that claim ownership to a large reserve of the Earth's pool of genetic resources feel that this exposes them to the exploitative tendencies of multinational corporations (MNC) that are mainly owned by developed countries of the north, considering that 74% of agbiotech patents are held by six 'gene giants' (Monsanto, Dupont, Syngenta, Dow, Aventis and Grupo Pulsar) (http://www.etcgroup.org/upload/publication/247/01/com_globilization.pdf). MNC are sometimes regarded as exploiting the advantages as well as the weaknesses in the various conventions increasingly to monopolize the seed and germplasm industry, without due consideration for farmers and developing countries (Adi, 2006). It will take a long time to introduce a more even playing field that is mutually favourable to both parties and to establish a better regime of benefit sharing that recognizes farmers' or indigenous rights alongside patents and plant breeders' rights.
Public–private partnerships will need to be established to manage IP issues related to the transfer of information, material or technologies from private companies to developing countries (Naylor et al., 2004). The African Agricultural Technology Foundation is one initiative that has been established to deal with such issues. Several private corporations with major investments in MAS in maize have agreed to provide access to germplasm and knowledge for African countries (Naylor et al., 2004; Delmer, 2005).

13.8.5 Technology and intellectual property interaction

Technology can be a two-edged sword with respect to the effective level of IPR and the utilization of genetic resources. While technology can facilitate the use of genetic
resources, it can also be used in a fashion that threatens to undermine existing levels of IPR. Donnenwirth et al. (2004) gave the following examples:
• Molecular marker technologies can be used to attack trade secrets by rapid identification of female parent inbred line contaminants in bags of hybrid seed. These inbred lines might then be used directly as parents of hybrids or as parents for further breeding.
• Molecular marker technology can be used to identify segregating molecular characteristics in an otherwise uniform cultivar and thus to select a distinct new cultivar from the segregating source without any breeding effort being expended.
• An existing cultivar could be transformed by genetic engineering and thus achieve cultivar status by virtue of its distinctness but without any effort expended to change the genetic base of the cultivar.
• An existing cultivar could be changed just sufficiently and even only cosmetically using marker-assisted breeding so that it retains the important agronomic attributes of the initial cultivar but would evade the dependency resulting from its status as an EDV through selection for a molecular marker profile that is sufficiently different from the initial cultivar.
• An existing cultivar could be changed dramatically in its overall DNA marker profile yet contain some or all of the key genetics impacting important agronomic traits due to targeted selection of its genetics using molecular marker or genomics data.
• An inbred containing the key genetics of the female parent of a hybrid can be rapidly recreated using one or a suite of technologies including di-haploidy, molecular markers, genomics, winter nurseries and high-throughput laboratory genetic profiling and screening. The inbred can then either be used as a parent of a hybrid or as a parent for further breeding.
• An inbred containing the key genetics of the male parent of a hybrid (hitherto essentially impossible to access via a hybrid) can similarly be recreated and used.

13.8.6 Seed saving and plant variety protection

Seed saving is a historical cultural phenomenon that dates back to the beginning of agriculture itself. It helps farmers control their enterprises and maintain their independence; it allows them to predict how well a crop will perform in the following season; it allows them to participate in maintaining the crop; it serves as insurance against inadequate supplies of seed; it helps to maintain food security; and it creates a viable market that ensures that seed prices remain affordable (Mascarenhas and Busch, 2006). Because a seed contains within itself the means for its own reproduction, seeds have offered a particularly large stumbling block to capital accumulation. In the USA, IPR legislation and Supreme Court decisions have played a profound role in overcoming these unique characteristics. According to the ETC Group (2005) the top ten seed companies including Monsanto, Dupont and Syngenta now account for an estimated market value of US$21 billion for commercial seed sales worldwide and about 50% of the global seed market. Mascarenhas and Busch (2006) argued that the combination of expanding IPR, new GM technology and the ideology of the technological treadmill have successfully overcome seeds' inherent obstacles to capitalist accumulation. As a result, US farmers are facing further loss of control of the farm production process.
For example, large US soybean farmers have consistently saved seed, as much as 60% in some years. However, with the introduction of Roundup Ready soybeans the nature of seed saving was drastically changed. Savings rates have ranged from a peak of 63% in 1960 to 33% in 1991. The decline in saved soybean
seed from 1955 to 1974 before GM soybean was approximately 1.4% year⁻¹. However, with the introduction in 1996 of Monsanto's Roundup Ready soybean, a genetically modified herbicide-tolerant cultivar, the rate of decline in soybean seed saving increased to 2.3% year⁻¹ from 1996 to 2002.

More remarkable perhaps has been the intensive adoption of Monsanto's Roundup Ready soybean since its introduction, e.g. Monsanto accounted for 91% of the worldwide GM-soybean area in 2004 (ETC Group, 2005). However, Ervin et al. (2000) suggest that when examined worldwide, all currently available transgenic crops account for a yield increase of no more than 2%. By contrast, government data sources reveal that in some areas seed saving has all but ceased (USDA, 2002a). In order to explain the apparent contradiction, Mascarenhas and Busch (2006) invoked the theory of the 'technological treadmill' (Cochrane, 1993) and they considered the rapid adoption of Roundup Ready soybeans a classic example of the technological treadmill. As the theory suggests, given the inability of farmers to affect the prices they receive for their commodity crops, farmers can only increase their profits by adopting new technologies that decrease their costs. However, only early adopters of new technology gain, because the efficiencies gained (usually in terms of increased profits) are eroded as widespread adoption itself pushes the prices received by all farmers downwards, thus abolishing any comparative advantage. When confronted with the rapidly expanding technologies of 'nature's production', farmers are left with few options: loyalty to the technological treadmill or exiting the industry altogether, the latter being an option few are willing to consider.

For as rapidly as seed saving has been declining, the cost of seed has been rising. For example, in 1975 a bushel of soybean seed cost US$7.34. Nearly 20 years later, in 1994, it was US$12.21. However, in 1997, 1 year after the introduction of Roundup Ready soybeans, the price of soybean seed jumped to US$17.40 and 6 years later, in 2003, it sold for US$24.20 per bushel. The decline in seed saving has shifted a significant portion of the value of bin-run seed from farmers to commercial seed retailers and their parent owners. The value of bin-run seed in 2000 was about US$170 million or approximately half its value before the introduction of Roundup Ready soybeans. This decline in bin-run seed amounted to approximately US$374 million in additional profits in 2001 to commercial seed retailers (Mascarenhas and Busch, 2006).

The above information is not to say that farmers who adopted Roundup Ready seed necessarily lost money. The major draw of Roundup Ready soybeans was that they required less farmer labour and management time. This point is significant, particularly when one recalls that the persistence of family farms has been through their ability to self-exploit farm labour. Furthermore, GM soybeans are relatively simple to use, and the increased flexibility in herbicide application (provided that one uses a glyphosate herbicide such as Roundup) allows spraying to occur throughout most of the crop cycle. This flexibility also fits well with conservation tillage and other production inputs currently in practice (USDA, 2002b).

The decline in seed saving that has happened in the USA, as shown by soybeans, and that would also be expected elsewhere in the world, brings up two important issues. First, to prepare for natural disasters (such as particularly severe weather and global climate change) or civil disturbance, it is essential to domestic food security that farmers: (i) already have some saved seed on hand; and (ii) have the requisite skills needed to properly save that seed, skills that are only maintained if seed can be regularly saved. The second issue is associated with genetic diversity that might be narrowed down because of the decline in seed saving. If this narrowing continues it will result in great homogeneity in domestic crops that are planted across expansive contiguous areas. And, as demonstrated in the case of the 1972–1973 southern corn leaf blight, this lack of planted biodiversity can prove to be very costly.
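The seed prices quoted above can be compared more easily as average annual growth rates. The following is a minimal sketch, not part of the original analysis: it takes only the four prices given in the text, and the function names and the choice of a compound annual rate are assumptions made here for illustration.

```python
# Soybean seed prices (US$ per bushel) as quoted in the text above.
SEED_PRICE = {1975: 7.34, 1994: 12.21, 1997: 17.40, 2003: 24.20}


def annual_growth_rate(start_price, end_price, years):
    """Return the compound average annual growth rate as a fraction."""
    return (end_price / start_price) ** (1.0 / years) - 1.0


def growth_by_period(prices):
    """Growth rate between each pair of consecutive years in `prices`."""
    years = sorted(prices)
    return {
        (a, b): annual_growth_rate(prices[a], prices[b], b - a)
        for a, b in zip(years, years[1:])
    }


if __name__ == "__main__":
    for (start, end), rate in growth_by_period(SEED_PRICE).items():
        print(f"{start}-{end}: {rate:.1%} per year")
```

On these figures the average rate is roughly 2.7% per year for 1975 to 1994, about 12.5% per year for 1994 to 1997 and about 5.7% per year for 1997 to 2003. The prices are nominal, so the comparison ignores inflation.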
IPR and PVP 549
In previous chapters, we have discussed genetic variation as it relates to plant breeding and the molecular tools used for the dissection, transfer and selection of novel traits and genes. Using these molecular techniques may result in the generation of a large amount of data. Extracting useful information from this ocean of data requires the integration of different sources of data and the ability to analyse and visualize the data in effective and efficient ways. The genomics projects of the last decade are most often carried out within universities, research institutes and companies, closely allied with laboratories producing large quantities of data. While bioinformatics has been involved with the primary data, it has yet to become focused significantly on applied areas such as plant breeding. However, this situation is beginning to change. Advances in plant breeding will depend heavily on how well we can manage and utilize all relevant information (Xu et al., 2009b). In this chapter, breeding-related informatics will be discussed, including information collection, storage, integration and mining.

14.1 Information-driven Plant Breeding

As in other disciplines, plant breeding, and in particular molecular plant breeding, has been driven by the availability and accessibility of various types of information. The first computer network, ARPAnet, was developed in the late 1960s as a product of the Cold War. By the 1980s, universities throughout North America and Western Europe were connected via countrywide networks such as the UK's Joint Academic Network (JANET). Molecular biologists were regularly logging in to central servers to run sequence analysis programs and transferring data from one machine to another. In the early 1990s, the World Wide Web (WWW) was invented and turned the Internet into the worldwide cultural phenomenon that it is today. The WWW has made the concept of a 'global village', developed by Marshall McLuhan decades ago, into a reality. In 1991, Tim Berners-Lee and Robert Cailliau, scientists working at CERN (the Organisation Européenne pour la Recherche Nucléaire: European Organization for Nuclear Research) in Geneva, developed the Hypertext Transfer Protocol (HTTP) as a way of linking and cross-referencing documents held on different computers. Many professionals, including plant breeders, now make regular use of the Internet as an integral part of their work.

Through these networks, information has been transferred at an increasing rate across the world and the quantity of information has been increasing exponentially.
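To make 'increasing exponentially' concrete, the sketch below shows how strongly the outcome depends on the doubling time. It is an illustration only: the function name and the three doubling times are assumptions chosen for the example, not measured values.

```python
def fold_increase(elapsed_months, doubling_months):
    """Factor by which a quantity grows if it doubles every `doubling_months`."""
    return 2.0 ** (elapsed_months / doubling_months)


if __name__ == "__main__":
    # Doubling times below are illustrative assumptions, not measured values.
    for doubling_months in (6, 12, 24):
        factor = fold_increase(36, doubling_months)
        print(f"doubling every {doubling_months} months: x{factor:.1f} in 3 years")
```

Over 3 years, a quantity that doubles every 6 months grows 64-fold, whereas one that doubles every 12 months grows only 8-fold, which is why data volume can quickly outpace the capacity to process it.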
Considering only DNA sequence data, the volume of biological information is doubling roughly every 6 months. This is faster than the exponential rate of increase in computing power, as suggested by Moore's law (an empirical observation made long ago that has broadly held until today: a doubling of processor power originally put at every 12 months and later revised to about every 2 years) (Sobral, 2002). With the development of high-throughput technologies, genotypic information, including genetic polymorphisms and gene expression profiling, will increase exponentially. Since the first molecular genetic maps based on restriction fragment length polymorphism (RFLP) markers were developed in the 1980s, a significant amount of molecular marker and genetic map information has been generated and become available for many plant species. Genotype information is now generated primarily using PCR-based markers such as simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) and high-throughput systems. These molecular polymorphisms can be accurately sized and readily compared across laboratories and experiments.

Historically, plant breeding has been driven by phenotypic information and a large amount of phenotype and pedigree data has been accumulating for many decades in plant breeding programmes. A typical example is the use of multi-environment trials (METs), which have been in general practice for most plant breeding programmes for years. Yield trial data in many private breeding companies can often be traced back to the very beginning of each specific breeding programme. Most breeding institutions/companies have extensive facilities and expertise in collecting phenotype data for various agronomic traits. Integrating this type of information with other sources of information from genetics and genomics will lead to more efficient use of both types of information for plant breeding.

14.1.1 Basics of informatics

Bioinformatics can be considered a combination of several scientific disciplines including biology, biochemistry, statistics and computer and information science. It involves the use of computer technologies and statistical methods to manage and analyse a huge volume of biological data. Bioinformatics provides a common conceptual framework for molecular biologists, biochemists, molecular evolutionists, statisticians, computer scientists, information technologists and many others to work together.

Databases allow people to organize and manipulate large amounts of data and to quickly translate and deliver that information in useful summaries and formats. A database can be defined as a structured collection of records or data which is stored on a computer. The database can be queried and the records retrieved can be used to make decisions. The computer software used to manage and query a database is known as a database management system (DBMS).

The structural description of a database is known as a schema. The schema describes the objects that are represented in the database and the relationships among them. There are several different database models (or data models) and the most common in use today is the relational model, which represents information in the form of data records in different sets of tables and the relationships between them.

There are four main components to any database application: (i) a method for entering or editing data, usually data entry screens or import functions; (ii) a data storage mechanism, a way of storing the data on the computer; (iii) a query mechanism to allow users to filter and summarize data in structured ways; and (iv) a report generator to extract and interpret information from the stored data.

The first basic concept to understand about databases is the difference between data and information. What we call data is really a collection of facts in a specified domain; the facts may be measured values, observations, responses or even pictures. Data by itself is meaningless, but once it is organized in useful ways, it becomes meaningful information. Therefore, essentially a database is nothing more than a tool to organize and access large amounts
552 Chapter 14
of data so that people can turn it into useful information.

The content of a database determines its type. The main types of databases include those listed below and a combination of any of them:

• Bibliographic: examples are library catalogues and an article index. A library catalogue is a database that describes what the library owns. Each item in the catalogue describes a book or other item in the library. An article index is a database that describes the contents of a particular set of journals, magazines, newspapers and/or other documents.
• Full text: a full-text database provides the full text of a publication. For instance, the research library in GALILEO (Georgia Library Learning Online) provides not only the citation to a journal article, but often the entire text of the article as well.
• Numeric databases: examples are Census Bureau databases and databases for stock market information, each containing primarily numeric data (statistics, census data, economic indicators, etc.).
• Image databases: these collect only image information (EBSCO host image collection, www.ebscohost.com).
• Audio databases: those containing MP3 or wav files, etc.

Meta-databases contain information about databases. They allow users to search for content that is indexed by other databases. For example, the Genomes Online Database (GOLD, http://www.genomesonline.org/) is an Internet resource for access to information regarding complete and ongoing genome projects around the world and JAKE (jointly administered knowledge environment, http://jake.openly.com) is a meta-database of bibliographic databases. If you find a citation for an article in one of the bibliographic databases and want to determine if the article is available in full text in another database, you could do a search for the journal in JAKE to get a list of all the databases that index that specific publication and whether those databases include it in full text.

14.1.2 Gaps between bioinformatics and plant breeding

There has been some delay in the uptake of bioinformatics within the plant breeding community. Most bioinformatics databases are lacking information on phenotypes, traits and other organism data, largely because bioinformatics grew out of the fields of molecular biology and biochemistry. When applied to plant breeding, bioinformatics data must be combined with other types of information, including plant phenotype and information on the environment where the phenotype is measured. Therefore, breeding informatics focuses on the development of breeding-centric databases and algorithms and statistical tools to analyse, interpret and mine these datasets (Xu et al., 2009b).

Although the recent explosion of genetic and genomic data for a wide range of plant species has led to a proliferation of publicly available plant databases, this wealth of knowledge has not yet found its way into mainstream plant breeding. There may be several explanations for this. First, it is not obvious to many plant breeders how or if much of the primary information generated in plant genomics can be applied to real-life breeding situations. Secondly, breeding requires the integration of information from different sources, usually stored in different databases and managed by different groups of scientists (for example, pedigree, genotype and phenotype). Thirdly, many of the publicly available tools and interfaces for bioinformatic data are oriented towards the cellular/molecular level, while most breeders are working and thinking at the organism level. Fourthly, until recently, much of the genomic research, and therefore the publicly available data, has concentrated on the comparison of genes between species rather than the gene diversity within species required for plant breeding. Therefore, there is a need to re-orient the tools and information so that crop researchers and biologists in general can query and use them properly. As in many informatics projects, an essential factor for success in plant bioinformatics will be the ability to integrate related information and to view and analyse
Breeding Informatics 553
it with tools that support decision-making functions. As the volume of information continues to increase, the need for such tools grows.

Bioinformatics data typically includes cDNA and genomic sequence data, genetic maps of mutants, DNA markers and maps, candidate genes and quantitative trait loci (QTL), physical maps based on chromosome breakpoints, gene expression data and libraries of large inserts of DNA such as bacterial artificial chromosomes and radiation hybrids. Information flow from molecular markers to genetic maps to sequences and to genes has been established. However, there is a gap between the sequence-based information and breeding-related information such as germplasm, pedigree and phenotype. We will depend on phenotyping as the basis for the functional analysis of about 40% of genes, even though a complete sequence is available. Therefore, the integration of breeding-related information with genomics databases is required for genomics-based breeding programmes.

Mayes et al. (2005) discussed how genetic information can be integrated into plant breeding programmes to produce cultivars from molecular variation using bioinformatics and what crop scientists might want from bioinformatics. They examined how bioinformatics tools might be used to track down the underlying genes controlling sustainability traits and how these may then be exploited in plant breeding programmes using marker-assisted selection (MAS).

14.1.3 A universal system for information management and data analysis

Modern plant breeding needs a standardized and widely accepted system for information acquisition, deposition, classification, integration, interpretation and utilization. Three major types of information (genotypic, phenotypic and environmental) should be brought under a single umbrella, with comprehensive tools for integrating, extracting and analysing useful information.

The success stories of the data-hosting institutions and of those managing large-scale genome sequencing programs suggest that a web-based information management system is a better bet for modern plant breeding programmes than local, stand-alone systems. The web-based information management systems, once fully developed as stand-alone systems, are anticipated to offer several advantages, including the following:

• They provide highly efficient cutting-edge technology solutions for breeding institutions/programmes looking to dramatically improve quality and reduce costs of information management.
• They provide a universal information management system that is suitable for all breeding programmes so that end-user application setup and maintenance is simplified in the Internet computing environment because there is nothing to install, configure, or maintain on the user's computer. Each institution has no need to maintain, manage, and integrate their data using their own facilities and personnel.
• They accelerate breeding procedures by providing a much more affordable information management system to breeders. The application needs only to be installed, configured and modified on the web server, reducing the risk of inconsistent configurations and incompatible versions of software between client and the server machines.
• They create a knowledge base for typical customer support questions; a single integrated source for all customer support inquiries; and ability to analyse information more effectively.
• They stimulate collaborations by providing more accessible approaches to sharing data and sharing intellectual property based on mutual interests.
• They provide effective, flexible and competitive approaches to converting data into knowledge that is critical to companies and institutions continuously seeking ways to improve their products and services.
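As a rough illustration of bringing genotypic, phenotypic and environmental information 'under a single umbrella' in a relational database, the sketch below builds a tiny in-memory schema with Python's standard sqlite3 module and runs one integrated query. It is a minimal sketch under stated assumptions: the table names, column names, marker name and all values are hypothetical and invented for the example, and they do not represent the schema of ICIS or of any other real system.

```python
import sqlite3

# In-memory database; every table, column and value here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE germplasm (
    germplasm_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE environment (
    env_id INTEGER PRIMARY KEY,
    site   TEXT NOT NULL,
    year   INTEGER NOT NULL
);
CREATE TABLE genotype (
    germplasm_id INTEGER REFERENCES germplasm,
    marker       TEXT NOT NULL,
    allele       TEXT NOT NULL
);
CREATE TABLE phenotype (
    germplasm_id INTEGER REFERENCES germplasm,
    env_id       INTEGER REFERENCES environment,
    trait        TEXT NOT NULL,
    value        REAL NOT NULL
);
""")

# Made-up records: three lines, two trial environments, one marker.
conn.executemany("INSERT INTO germplasm VALUES (?, ?)",
                 [(1, "Line A"), (2, "Line B"), (3, "Line C")])
conn.executemany("INSERT INTO environment VALUES (?, ?, ?)",
                 [(1, "Site 1", 2008), (2, "Site 2", 2008)])
conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)",
                 [(1, "M1", "A"), (2, "M1", "B"), (3, "M1", "A")])
conn.executemany("INSERT INTO phenotype VALUES (?, ?, ?, ?)",
                 [(1, 1, "yield", 5.0), (1, 2, "yield", 6.0),
                  (2, 1, "yield", 4.0), (2, 2, "yield", 4.5),
                  (3, 1, "yield", 5.5), (3, 2, "yield", 6.5)])

# Integrated query: mean yield across environments for every line
# that carries allele 'A' at marker 'M1'.
rows = conn.execute("""
    SELECT g.name, AVG(p.value)
    FROM germplasm g
    JOIN genotype  m ON m.germplasm_id = g.germplasm_id
    JOIN phenotype p ON p.germplasm_id = g.germplasm_id
    WHERE m.marker = 'M1' AND m.allele = 'A' AND p.trait = 'yield'
    GROUP BY g.name
    ORDER BY g.name
""").fetchall()

for name, mean_yield in rows:
    print(name, mean_yield)
```

Keeping each type of information in its own table, linked through a shared germplasm identifier, is what allows one query to cross the genotype, phenotype and environment records; a production system would need far richer tables and controlled vocabularies than this example suggests.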
With such a system, customers can use the power of data management and analysis, along with computational biology and comparative genomics, to create an intellectual property portfolio associated with their particular cultivars and hybrids. The International Crop Information System (ICIS), to be discussed later in this chapter, is under development towards providing such a universal system to worldwide plant breeding programmes, with great potential but a long way to go.

14.1.4 Transforming information to new cultivars

A challenge to modern plant breeding is how to best utilize all relevant information efficiently and comprehensively, harnessing the power of informatics to support molecular breeding. Integrated exploration of genotypic, phenotypic and environmental information is critical for more efficient and predictable plant breeding.

A database would organize genetic information and give breeders the opportunity to pose specific questions through a software interface, helping them make selections and identify desired parents and progeny. Breeders will be able to look for particular traits they want to breed for by going back through breeding history and pedigrees to see where traceable characteristics could come from.

As a result of years of research on genetic mapping, allele mining and molecular and functional diversity analysis of germplasm collections, we have amassed a large body of knowledge regarding the genomic location of factors/alleles that affect specific agronomic traits and the allelic variation available for utilization in plant breeding, but often it is not in an easy format for all researchers to use. For most of the traits, it is very difficult to detect the presence of particular alleles when the lines are only examined phenotypically in the field; however, by examining the lines at the DNA or sequence level, this becomes possible. Xu, Y. (2002) provided an example to illustrate the importance of this effort in information-driven plant breeding. A rice doubled haploid (DH) population derived from IR64/Azucena has been shared and used worldwide for the genetic mapping of many different traits with hundreds of genes/QTL identified, largely based on the first generation RFLP map. However, the original phenotypic data has never been shared. The phenotypic data collected across laboratories should have been analysed through a meta-analysis and fine mapped using the updated genetic map consisting of about a thousand SSR markers, rather than through individual efforts using the first version of the molecular map consisting of only 175 RFLP markers. The same story can be found in almost all well-studied crop plants. As both parental lines involved in the rice example have been used widely for breeding yield and adaptive traits, a collective effort bringing all related information onto one page and mining it through an integrated data analysis would help transform them into new cultivars.

14.2 Information Collection

14.2.1 Data collection procedures

Planning the research and developing data collection strategies is the first step in research management. Before data collection begins, the following questions should be clearly answered: What hypothesis is to be tested? What data needs to be collected? How will this data be collected? What equipment or supplies are needed for the data collection?

Plant breeding information comes from many different sources and in many different forms, including a description of the plant itself, its genotype and phenotype and a depiction of the environment (Fig. 14.1). What data should be in the repository depends on many factors; however, if human and computing resources are not limiting it is advisable to preserve all the historical data, so that it can be reanalysed for new hypotheses and guide new research. Whatever system will be built, it should be flexible, because there are clients
[Figure 14.1: diagram. Recoverable labels: Genetic analysis; Germplasm (inventory, genealogy); Genotype (genetic maps, physical maps, DNA sequence, DNA markers, functional annotation, molecular variation, natural or induced); Molecular expression (transcriptome, proteome, metabolome); Phenotype (anatomical, developmental, field performance, stress response); Physiology; relationship labels: has, determines, affects.]
Fig. 14.1. Crop biological concepts, relationships and breeding related information. Modified from Richard Bruskiewich (ICIS Workshop, 2005, http://www.icis.cgiar.org).
who require minimum data but there are others who require all. In general, data should include germplasm information (passport data, pedigree and genealogy, genetic stocks), genotypic information (DNA markers, sequences, and expression information), phenotypic information and environmental information.

A reliable data-collection technique will ensure that information is systematically collected in a manner compatible with other existing information and it should take into account the following considerations: controls or checks to be used in data collection, sampling method, sample size, testing sites, replications and previously used data-collection techniques. Bias during data collection could come from defective instruments, biased observation, sampling errors, etc. Quality control for data collection can be done by checking relevant data and comparing and contrasting the collected information with expectations, controls and hypotheses. In addition, before the data is entered into a database, some preliminary organization and analysis might be needed. As phenotyping procedures are affected by multiple factors including environmental and measurement errors, multiple replications are usually required for most quantitatively inherited traits. To check the data quality and phenotyping reliability, data collected from multiple replications within a trial can be analysed for between-replication correlation. When a relatively large genetic variation exists within the tested population or cultivars, correlation coefficients should be high, e.g. 0.6 or higher for traits with medium heritability and 0.8 or higher for traits with high heritability.

Often there is relevant information that has already been collected by others, although it may not necessarily have been analysed or published. Making the effort to locate and review this information is a good starting point and can help in planning a more efficient experiment.

14.2.2 Germplasm information

Information for a specific germplasm accession could include passport data, pedigree
Intergovernmental organizations
Commission on Genetic Resources for Food and Agriculture (FAO): http://www.fao.org/ag/cgrfa/
Consultative Group on International Agricultural Research (CGIAR): http://www.cgiar.org/
Convention on Biological Diversity Secretariat: http://www.biodiv.org/
FAO Plant Genetic Resources: http://www.fao.org/ag/cgrfa/PGR.htm
Bioversity International: www.bioversityinternational.org
CGIAR's System-wide Information Network for Genetic Resources (SINGER): http://singer.grinfo.net/
System-wide Information Network for Genetic Resources (SINGER): http://singer.cgiar.org/
National/regional activities
Asian Vegetable Research and Development Center: http://www.avrdc.org/
Information System Genetic Resources: http://www.genres.de/genres-e.htm
Centre for Genetic Resources, The Netherlands: http://www.cgn.wur.nl/UK/
UK Plant Genetic Resource Group: http://ukpgrg.org/
N.I. Vavilov Research Institute of Plant Industry, Russia: http://www.vir.nw.ru/
Southern African Development Community (SADC) Plant Genetic Resources Project: http://www.ngb.se/sadc/sadc.html
United States Department of Agriculture (USDA) Genetic Resources Information Network: http://www.ars-grin.gov/
Chinese Crop Germplasm Information System: http://icgr.caas.net.cn/cgris_english.html
Non-governmental organizations
Conservation International: http://www.conservation.org/
Global Biodiversity Forum: http://www.gbf.ch/
World Resources Institute: http://www.wri.org/
Genetic Resources Action International (GRAIN): http://www.grain.org/
By an accession number
externally given
locally given
Is a name for a commercial cultivar
heterozygous
propagated asexually
is a homogenous collection of a single heterozygous genotype
is a clonal traditional cultivar
propagated by crossing
of a small number of inbred lines
by farmer selection
by mass selection
of inbred lines
homozygous
derived from a single plant of a highly inbred population
is a traditional cultivar
is a deliberate mixture of fixed lines
Is the name of a (collection of) homozygous individual(s)
from a breeding population
collected
from a bulk of many plants
is a weed
not a weed
from a single plant
Is the name of a population (collection of heterogeneous /heterozygous individuals)
used for breeding
a recurrent selection cycle name
a recognized genetic stock
a deliberate mixture of populations
named after place of collection
collected off farm
Fig. 14.2. Genealogy ontology: how a germplasm accession is named. From the International Crop Information System (ICIS) Workshop (2005).
mutants, populations with segregating genotypes (recombinant inbred lines (RILs), DHs, introgression lines (ILs)), cytogenetic material (primary trisomics, translocation lines, etc.), cell culture lines and gene and DNA clones. Genetic stock availability may vary greatly from one crop species to another. For example, the National Institute of Genetics (Japan) provides information on about 11,000 genetic stocks developed in Japan. These resources include marker gene testers, mutant lines, isogenic lines, autotetraploid lines, primary trisomics, reciprocal translocation homozygote lines, cytoplasm substitution lines and cell cultured lines.

14.2.3 Genotypic information

It is genotypic information that fundamentally drives the breeding informatics and products. Genotypic scores are determined by the genotype of an individual or a pooled DNA sample of multiple individuals. A genotype is determined by DNA sequences, the genes encoded by the sequences and the gene products translated from the sequences. Therefore, genotypic information is based on underlying DNA polymorphisms, which can be detected with many different techniques (Fig. 14.1). The action of these genes is not always additive, thus epistasis, the particular combination of alleles and genotype-by-environment interaction can be of great importance.

Molecular markers

Many molecular breeding projects involve a large number of molecular markers (hundreds to thousands) that cover the whole plant genome. These markers are used to genotype, or fingerprint, a large number of accessions (an entire or core collection, individuals derived from a selected cross, a set of landraces or a population). This will create a valuable database of information that can be used to determine which crosses or individuals may be more valuable than others.

Information for DNA markers (e.g. primers for PCR-based markers) includes characteristics describing the markers and how to best apply them to plant breeding. As an example, information about a PCR-based marker is given in Table 14.2, which is what a molecular database could manage and provide.

There are some databases that have been developed with features for displaying marker-related information. Despite the large numbers involved with SNPs and polymorphism data, presenting SNPs on a genome browser has become fairly straightforward. For example, Ensembl can show SNP locations as a track in ContigView displays, with colour coding to highlight those located in coding, intronic or upstream areas of genes. Clicking on an SNP produces a SNPView page with further details including, where appropriate, primers, validation status, heterozygosity, strain differences and links to entries in variation databases such as dbSNP and HGVBase (Hammond and Birney, 2004) and Panzea for plants.

Table 14.2. Information associated with PCR-based DNA markers.
Marker per se
  Marker name and synonyms
  Repeat motif/enzyme and repeat length
  Primer sequence
  PCR protocol (i.e. annealing temperature and number of cycles)
  Expected allele size in a control cultivar or a group of cultivars
  Number of alleles
  Allele frequency (most/least frequent allele)
  Signal strength
  Allele size/range
  Polymorphic information content
  Chromosome location
  Linkage to other markers
  Images and gel pictures
  References
  Source (inventory)
  Patent information
  Historical data (e.g. associated with a trait)
  Project data (date, title, germplasm, reports, etc.)
Marker-derived
  Genetic maps (including haplotype block)
  Physical maps
  Consensus maps
  Comparative maps
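Table 14.2 lists allele frequency and polymorphic information content (PIC) among the attributes stored for a marker, but does not define PIC. The sketch below shows how PIC might be derived from stored allele frequencies, assuming the widely used definition of Botstein et al. (1980), PIC = 1 − Σ pᵢ² − Σ Σ 2pᵢ²pⱼ² (the double sum taken over all pairs i < j). The function name and the example frequencies are hypothetical.

```python
def polymorphic_information_content(allele_freqs):
    """PIC of one marker from its allele frequencies (assumed to sum to 1).

    Uses the Botstein et al. (1980) definition:
    PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2
    """
    if any(p < 0 for p in allele_freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")

    # Probability that two randomly drawn alleles are identical.
    homozygosity = sum(p * p for p in allele_freqs)

    # Correction term summed over every unordered pair of distinct alleles.
    pair_term = 0.0
    for i in range(len(allele_freqs)):
        for j in range(i + 1, len(allele_freqs)):
            pair_term += 2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2

    return 1.0 - homozygosity - pair_term


if __name__ == "__main__":
    # Hypothetical markers: one with two equally frequent alleles
    # (SNP-like) and one with four equally frequent alleles (SSR-like).
    print(polymorphic_information_content([0.5, 0.5]))
    print(polymorphic_information_content([0.25, 0.25, 0.25, 0.25]))
```

A marker with two equally frequent alleles gives a PIC of 0.375, while four equally frequent alleles give about 0.70, which reflects why multi-allelic markers such as SSRs tend to be more informative per locus than biallelic SNPs.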
data. These data are associated with different breeding procedures or stages. The most systematically collected information relating to plant breeding is probably yield trial data, which has been accumulating for many years and in many plants, since controlled plant breeding programmes started. As more and more breeding objectives are added and advanced instruments are developed, many additional phenotypic traits are now measured, such as nutritional characteristics, chemical responses, stress tolerance, etc.

Categories of phenotypes of interest to plant breeders include yield and yield components, product quality and biochemical characteristics, morphological characteristics (e.g. plant height), physiological characteristics (e.g. flowering time) and abiotic and biotic stress tolerance. A more comprehensive list can be derived from the breeding objectives as listed in Section 1.7.

In addition to phenotypic data generated by plant breeders, more phenotypes are being generated by physiologists, geneticists, pathologists and other biologists for both model and non-model organisms. New technologies such as RNA interference (RNAi) now make genome-wide knock-down studies feasible and have already been applied in a high-throughput manner for many novel characteristics. As microarray techniques become widely used in transcriptomics and metabolomics, molecular phenotypic data will keep widening the definition of phenotype.

We need efficient, precise and comprehensive large-scale phenotyping techniques. This presents a difficult challenge because phenotypes are numerous and diverse and they can be observed and annotated at the molecular, cellular and organism levels. Bochner (2003) described the efforts to develop new and efficient technologies for assessing cellular phenotypes for simple microbial-cell model organisms such as E. coli and Saccharomyces cerevisiae. Such a system could be exploited for the characterization of in vitro culture of any plant species. Phenotypic profiling through a whole plant procedure will facilitate phenotypic data collecting and processing.

14.2.5 Environmental information

Environmental informatics may be viewed as a merging of biodiversity and ecological informatics with geographic information systems (GIS) and other environmental data (Fig. 14.1). Environmental data include all the environmental factors that contribute to crop growth and development, including soil type as well as chemical, moisture and nutritional components in the soil, daily, monthly and annual temperature, humidity and precipitation profiles, day-length and even winds and other climatic factors as listed in Table 14.3, plus many environmental factors, such as drought and cold temperature, that cause stress to crop plants. GIS has proven to be of great utility to predict the environments in which wild

Table 14.3. Environmental factors affecting plants and plant breeding.
Soil
  Texture
  Water content
  Fertility
  Nutrient content
  Production index
Air
  Pollutants
  Emission of CO2
Light
  Light intensity
  Day length
Temperature
  Average, maximum, minimum daily temperature
  Effective temperature
  Length of available growing period
Water
  Humidity
  Precipitation
  Ground water
  Water quality
  Potential evapotranspiration
Cropping systems
  Intercropping
  Previous crop
Accompanying organisms
  Root microorganisms
  Weeds
  Pathogens
  Insects
value of raw data of all types and facilitate new biological discoveries. For a given gene, for example, a database could horizontally link sequence, structure, map position and associated germplasm accessions and could include related elements pertaining to the expression profile of the gene, its protein structure, example phenotypes and environmental factors that affect gene expression. All this information should be correlated with the genetic resources available for a given crop.
At the level of databases, there are three main ways to integrate information, referred to as link integration, view integration and data warehousing. For link integration, researchers begin their query with one data source and then follow links to related information in other data sources. View integration leaves the information in its source database, but builds an environment around the databases that makes them appear to be part of one large system. A data warehouse brings all the data together under one roof in a single database. Information integration will be promoted by standardized data collection, shared vocabularies/terms (ontology) and the development of database tools that help cross-database querying and parallel analysis of related data.

14.3.1 Data standardization

Genomics and plant breeding are generating a large and heterogeneous set of data. Efficient sharing, computational integration and accurate scientific interpretation of research outputs will require some agreement about the format and semantics of the basic data. A common set of biological domain models is essential to achieve this goal. A standardized nomenclature will facilitate database searches, comparisons and extrapolations throughout model biological systems from bacteria to Arabidopsis and rice.
To achieve some of the benefits possible from an integrative approach to biological questions and information, it is necessary to create standards for data interpretation and comparison and its transformation into information and knowledge (Sobral, 2002). Standardization of databases has been receiving more and more attention because it is required for integration across databases. Currently the emphasis is on functional genomics but is expanding to other fields including plant breeding. A number of initiatives have proposed standard reporting guidelines for functional genomics experiments. Associated with these are data models that may be used as the basis of the design of software tools that store and transmit experiment data in standard formats.
Data standards and the formal data descriptions that underlie them may yield a range of benefits including the following (Jenkins et al., 2005): (i) consideration and development of best practice and standard operating procedures, which, in turn, enable proper interpretation of experimental results, principled dataset comparison and experiment repetition; (ii) standardized reporting of experiments and deposition and archiving of data associated with publications or other standard pieces of work; and (iii) development of databases and verifiable transmission mechanisms for storage, collection and dissemination of results.
When phenotypes are characterized as a whole (the phenome), that is, as the characteristics of organisms that arise via the interaction of the genome with the environment, there is a need for phenotypic standardization, which has been recognized by breeding and stock centres. Several projects associated with handling genetic mutants have begun to develop a standardized approach to developing annotation and databases. Jenkins et al. (2005) described the collection of datasets that conform to the recently proposed data model for plant metabolomics known as ArMet (architecture for metabolomics) and illustrated a number of approaches to robust data collection that have been developed in collaboration between software engineers and biologists.
Database curators are working to convert raw datasets contributed by researchers from their original format, including structure, syntax, assumptions, naming rules and conventions, into a format compatible and contextually consistent with respective
genome databases, while maintaining accuracy of fact and interpretation. In addition, curators help users access and query the data and cooperate with other groups to improve the software and data distribution infrastructure.

14.3.2 Development of generic databases

The increasing number and types of databases and software applications make it more and more difficult for researchers to determine which databases to use for various types of information. A universal or updatable database system is required and so is an automated updating system. In addition, the different ways in which data are accessed and presented create an additional burden on researchers who seek to apply the available resources to their research. Using biodiversity as an example, the difficulties in finding, accessing and using biodiversity data include the long history of the bottom-up evolution of scientific biodiversity information, the mismatch between the distribution of biodiversity itself and the distribution of information describing it and, most importantly, the inherent complexity of biodiversity and ecological data. This stems from numerous data types, the non-existence of a common underlying language and the multiple perceptions of different researchers/data recorders across spatial or temporal distance or both (Lane et al., 2000).
Emerging technologies to solve these problems have been proposed, such as the BioMOBY initiative (Wilkinson et al., 2005). BioMOBY is an international research project involving biological data hosts, service providers and coders whose aim is to explore various methodologies for biological data representation, distribution and discovery. In addition, the National Human Genome Research Institute has funded a collaborative project called the Generic Model Organism Database (GMOD; http://www.gmod.org) to promote the development and sharing of software, schemas and standard operating procedures. The project's major aim is to build a generic organism database toolkit to allow researchers to set up a genome database off the shelf. Recent developments in the Ensembl system include access to inter-species sequence levels and improvements to the display of polymorphism data while users can display their own data in the context of other annotation (Hammond and Birney, 2004).

14.3.3 Use of controlled vocabularies and ontologies

The diverse databases reflect the expertise and interests of the groups that maintain them. A current limitation of complex annotation and integration is the lack of agreed-upon formats across databases. There are many integration challenges. One of the most difficult is the one that might seem the most minor: how do you assign and maintain the correct names of biological objects across databases?
A more subtle problem is the clash of concepts as users move from one database to another. An extreme example, first noted by Michael Ashburner, considers the use of the term pseudogene by different researchers and research communities. To some, a pseudogene is a gene-like structure that contains in-frame stop codons or evidence of reverse transcription. To others, the definition of a pseudogene is expanded to include gene structures that contain full open reading frames (ORFs) but are not transcribed. Some members of the Neisseria gonorrhoeae research community, meanwhile, use pseudogene to mean a transposable cassette that is rearranged in the course of antigenic variation (Stein, 2003).
There are also more subtle disagreements. The human genetics community uses the term allele to refer to any genomic variant, including silent nucleotide polymorphisms that lie outside of genes, whereas members of many model organism communities prefer to reserve the term allele to refer to variants that change genes. Even the concept of the gene itself can mean radically different things to different research communities because it has been refined as the field of genetics moves forward. Some researchers may treat the gene as the transcriptional unit itself, whereas others extend this definition to include up- and downstream regulatory elements and still others use the classical definition of cistron and genetic complementation. Plant breeders may consider a gene to be a manipulable unit during the breeding process, which can be as big as a gene complex that is transferred in conventional backcross breeding, or as small as a single nucleotide difference that can be detected in MAS.
It will be increasingly desirable for inter-database queries to be performed to exploit comparative genomic and phenomic strategies in order to elucidate functional aspects of plant biology and study synteny. However, terms used to describe comparative objects within and between databases are sometimes quite variable and limit the ability to accurately and successfully query information in and across different databases. To solve this problem, controlled vocabularies and ontologies become increasingly important. Unique identifiers that are associated with each concept in biological ontologies (bio-ontologies) can be used for linking and querying databases (Bard and Rhee, 2004). Natural language processing (NLP) techniques are increasingly being used to automate the capture of new biological discoveries described in text. A novel representational schema, PGschema, was developed that enables translation of phenotypic, genetic and other related information found in textual narratives to a well-defined data structure comprising phenotypic and genetic concepts taken from established ontologies along with modifiers and relationships (Friedman et al., 2006).
Shared ontologies can help bioinformaticians agree on how to describe biological objects, but they do not necessarily help them agree on how to name them. The same biological object might have multiple names and the same name might denote multiple objects. One approach is to establish globally unique identifiers to standardize the description. There are two main lines of thought among groups that are interested in globally unique identifiers. One line holds that object identifiers should point to the objects themselves and use a Uniform Resource Locator (URL) syntax. The other decouples the notion of the location of a resource from its authoritative source.
Dictionaries, encyclopedias and database schemas are examples of ontologies, as are many web-based entities, such as the search engines Yahoo! and Google. One approach to this problem is to have biologists describe and conceptualize common biological elements and produce a dynamic, controlled vocabulary that can be applied to certain types of organisms. An ontology is simply an organized set of concepts about a specified domain. It generally consists of two components: (i) an indexed controlled vocabulary of terms (the concepts); and (ii) information about semantic relationships between these terms. As sophisticated types of controlled vocabularies that attempt to capture the main concepts in knowledge domains, ontologies are important facilitators but they do not, by themselves, lead to the integration of biological databases. The existence of a shared ontology allows two databases to be merged with some guarantee that a term used in one database corresponds to the same term in the other.
There are many ontology projects that range from descriptions of mutant phenotypes in plants to anatomical structures in vertebrates (details can be found at the Global Open Biological Ontologies web site). Ontologies in biology, though imperfect, include the gene ontology of terms for protein and gene sequences, the minimum information about a microarray experiment (MIAME) (Brazma et al., 2001) and plant ontologies for broader plant-based information (Plant Ontology Consortium, 2002). Once an ontology is established, databases need to be annotated under the agreed terms. At present, only a few plant genome databases, such as Arabidopsis and rice, have primary gene ontology annotation.
In an effort to address the need for consistent descriptions of gene products in different databases, the Gene Ontology (GO) project began as a collaboration between three model organism databases (Gene Ontology
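The two components of an ontology described above (an indexed controlled vocabulary and the relationships between its terms) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Term` and `Ontology` classes, the `EX:` identifiers, the trait names and the two small "databases" are hypothetical and are not taken from the Gene Ontology or any other real resource.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str                                # globally unique identifier (hypothetical "EX:" prefix)
    name: str                                   # preferred name in the controlled vocabulary
    synonyms: set = field(default_factory=set)  # other names used for the same concept
    is_a: set = field(default_factory=set)      # identifiers of parent terms


class Ontology:
    """Indexed vocabulary of terms plus 'is_a' relationships between them."""

    def __init__(self):
        self.terms = {}      # term_id -> Term
        self._labels = {}    # lower-cased name or synonym -> term_id

    def add(self, term):
        self.terms[term.term_id] = term
        for label in {term.name, *term.synonyms}:
            self._labels[label.lower()] = term.term_id

    def resolve(self, label):
        """Map a free-text label to its term identifier, or None if unknown."""
        return self._labels.get(label.strip().lower())

    def ancestors(self, term_id):
        """Collect every term reachable by following 'is_a' links upwards."""
        seen, stack = set(), list(self.terms[term_id].is_a)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(self.terms[parent].is_a)
        return seen


onto = Ontology()
onto.add(Term("EX:0001", "plant trait"))
onto.add(Term("EX:0002", "morphological trait", is_a={"EX:0001"}))
onto.add(Term("EX:0003", "plant height",
              synonyms={"PH", "height of plant"}, is_a={"EX:0002"}))

# Two hypothetical databases that name the same trait differently.
db_a = {"line_17": "Plant height"}
db_b = {"line_17": "PH"}
ids_a = {key: onto.resolve(label) for key, label in db_a.items()}
ids_b = {key: onto.resolve(label) for key, label in db_b.items()}
print(ids_a == ids_b)
print(sorted(onto.ancestors("EX:0003")))
```

Resolving each label to an identifier before comparing records is what gives the "guarantee" mentioned above: the two databases can be merged on the identifier even though they use different names for the trait.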
Scientists have realized that there is a need to make existing data from different organisms simultaneously searchable, visible and, most importantly, comparable. To look for genes involved in a particular trait one has to search different databases and manually figure out the orthology relationships among the relevant genes. These species-specific databases are widely dispersed and tailored to different objectives and they store phenotypic data in different formats. Considerable handwork is therefore necessary to compare the phenotype of the same gene in different organisms. A simple meta-search engine or an interoperable query system for these databases alone does not resolve this kind of problem.
Productive utilization of databases requires interoperability: that is, the precise yet flexible interrelating of information from one database to another. There are, at present, two major impediments to achieving wide-scale interoperability: the state of database protection legislation and computer security issues (Greenbaum et al., 2005). While most non-commercial/academic databases may not be overly concerned with the protection of their intellectual property, they still put up barriers to entrance and consequently interoperability, due to concerns regarding the security of their computing infrastructure.
For rice and maize, cross-database querying and display of text objects can be implemented using a web-based object-oriented query system called the OPM (Object-Protocol Model) data management tools of Gene Logic Inc. These tools are unique in their capacity to impose a uniform object-oriented data model on an existing relational database framework where users can explore and assemble biological information from heterogeneous databases. This query system promotes direct analysis of collinearity at the nucleotide level in cereal species and may also be applied for exploring multiple crop databases. The further incorporation of datasets from different studies will close the loop and create the foundation for meta-integrated databases that facilitate queries across whole systems.
Just as redundant accessions are found in germplasm collections, redundant data are unavoidable because duplicate datasets using the same genotypes are generated by different research groups or for different purposes. The availability of complete genome sequences, as well as the flood of other sequence data, is leading to alternative views on how these data can be organized and interrogated. The high level of redundancy in gene discovery programmes is being condensed through reference to consensus or complete genome sequences. If a complete genome sequence is unavailable for a specific crop, closely related syntenic genomes can be used. The ever-increasing size of DNA sequence databases continues to push bioinformatic capabilities and there is a growing need to condense redundant data (Edwards and Batley, 2004). Information integration and redundant data condensing are often two procedures that can interact and support each other.

14.3.6 Database integration

Sequence databases that evolve from rigorous and systematic sequencing efforts should not merely function as warehouses for millions of bases or amino acids. Of particular importance is the ability to attach substantial genomic information to the sequences. Studies will follow on identifying genes and predicting the proteins they encode, determining when and where the proteins are expressed and how they interact and how these expression and interaction profiles are modified in response to environmental signals. Emphasis on the underlying value of genotypic and genomic elements must be balanced with a phenocentric approach. One way to address this need is to link the resources containing the various types of information, such as genomic data, phenotypic or expression data and genetic resources.
The different source databases all use different gene loci description systems (i.e. gene indices) and the orthology relationships are not always obvious, so many
information found in this way should be carefully evaluated. Be aware that anyone can publish almost anything on the Internet, so a key factor in assessing the validity of online information is the reliability of its source. It is important to assess what qualifies the individual or organization to publish the information and what their motivation for doing so might be. As with any literature research, the information found should be cross-checked and critically evaluated.
A search engine provided by Google (http://scholar.google.com/) has recently become popular for literature searches. Searching by authors, key words or author's affiliation will bring up all related publications (articles, books, etc.) with article title, author list, the number of citations, etc. A full article can be browsed from a provided link. All the articles that cite a specific article can be browsed. Therefore, a key-word search for a specific topic would provide a series of linked information.

14.4.2 Information mining

The first major goal for plant biologists in the post-genome era is to understand the function of every gene and how individual gene products interact and contribute to major plant processes. This new challenge for plant functional genomics is destined to become the most difficult hurdle in plant biology and requires the systematic application of global molecular approaches integrated through bioinformatics. Several tools are now required to decipher gene function including the traditional methods of random mutagenesis, gene knock-out and silencing, as well as the high-throughput 'omics' disciplines of transcriptomics, proteomics and metabolomics. Mining this genomics information and effectively applying it to plant breeding is a significant challenge indeed.

Data mining

Data mining, or knowledge discovery in databases (KDD), has been described as the nontrivial extraction of implicit, previously unknown and potentially useful information from data (Frawley et al., 1991). The majority of data mining exercises in bioinformatics at present are founded on the requirement to screen through large, usually sequence-based, datasets searching for homology. Bioinformatics has traditionally helped to identify the molecular constituents of the cell and their functions, often described in relation to a biochemical activity. This has included gene finding, motif recognition, similarity searches, multiple sequence alignment, protein structure prediction, phylogenetic analysis and other related methods.
Compared to sequence databases, extracting information related to plant breeding is not an easy task. It is like finding a tiny bit of gold in the voluminous portion of ore taken from a gold mine. Breeding-related data consist of many inter-related, complex data types and therefore require complex queries to search, retrieve and analyse them. Classical multivariate and discriminant statistics are relevant to many biological data mining exercises. Significant progress has yet to be made in carrying out systematic or integrated data mining for the disparate and complex information now available to plant scientists.
Plant breeders may want to mine for the following information: (i) germplasm information collected across worldwide institutions; (ii) marker-trait associations reported for traits of interest to specific breeding programmes; (iii) genes that are required for the improvement of traits of agronomic importance through transformation and introgression; and (iv) molecular markers and marker-related information for the development of MAS tools.

Comparative informatics

As comparative genetics is now viewed as a key component to expanding existing knowledge on plant genomes and genes, comparative bioinformatics remains an essential strategy of this pursuit. Comparative informatics facilitates linking the genomes of various crop species and will provide keys to understanding how genes and genomes are structured and how they evolve. Through the identification of synteny, it will be possible to isolate genes from crop plants with
large genomes using information about homologous genes in related crops with smaller genomes. Linkages and interactions should also be promoted between databases of plants and non-plant species.
Horan et al. (2005) clustered all protein sequences from Arabidopsis and rice into similarity groups, calculated their corresponding alignments, localized their conserved domains and generated distance trees. The resulting datasets provide comprehensive information about the similarities and dissimilarities between a monocotyledon and dicotyledon representative with regard to the size, quantity and composition of their family and singlet proteins. The provided datasets represent a foundation for future studies of orthologous and paralogous sequences of the two species. The user-friendly Genome Cluster Database (GCD; http://bioinfo.ucr.edu/projects/GCD) was designed to provide an efficient cluster mining tool for Arabidopsis and rice, to perform various intraspecific and interspecific comparisons and also to retrieve related sequences from other organisms.
There are four basic comparative bioinformatics analyses, including:
to identify different genes for the same or similar phenotype.

Sequence similarity analysis

DNA sequence similarity analysis can be used to trace allele, gene or chromosomal fragments, identify similarities between sequences or genes and align multiple sequences. Protein sequence analysis includes searching for protein similarity and looking at primary, secondary and tertiary structure.
BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is a protein or DNA sequence. The BLAST services available include: NUCLEOTIDE BLAST; PROTEIN BLAST; TRANSLATED BLAST; GENOMIC BLAST pages (human genome, eukaryotes, microbial genomes); and specialized BLAST pages (VECSCREEN, a BLAST-based detection of vector contamination; IGBLAST, for analysis of immunoglobulin sequences in GenBank; GEO BLAST, for gene expression data; and SNP BLAST). These services are produced and made available on the Internet by the US NCBI.
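BLAST itself relies on a fast heuristic, so the following is not its algorithm. As a minimal sketch of what a local similarity search computes, the Python function below uses the classical Smith-Waterman dynamic programming method instead. The function name and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; a cell never drops below zero, which is what
    # makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# The shared motif "ACG" is recovered from two otherwise unrelated sequences.
print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```

The exhaustive matrix fill costs time proportional to the product of the two sequence lengths, which is why database-scale tools such as BLAST replace it with seed-based heuristics.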
• There is a big concept gap between breeding and molecular biology in terms of what information is available and how it can be used.
• Many of the systems have been developed independently for phenotypic and genotypic data and used by two different groups of scientists including breeders/agronomists and geneticists/molecular biologists.
• Breeding information has been managed in most breeding programmes by using relatively simple tools such as MS ACCESS and AGROBASE (http://www.agronomix.mb.ca/), for which less training is required. However, these tools are not suitable for data management and statistical analysis when molecular data and multiple-resource data are incorporated.
• Insufficient communication between breeders/agronomists and geneticists/molecular biologists has contributed to the paucity of tools suitable for both groups. As a result, hardware, database and software support in most breeding institutions is very limited or very different from those established in the biotechnology and IT industries.
• Understanding and communication between IT scientists and breeders and between facility developers and breeders, is also lacking. This contributes to underdevelopment of information systems designed for plant breeding and the limited use of those currently available in genomics.
• Many breeding companies and institutions, especially in developing countries, are lacking the personnel and facilities for information management which are well developed in the biotechnology industry.

14.5.1 Laboratory information management systems

To handle the constant flow of data from the lab to the breeder and to integrate information from molecular markers, genetic mapping and phenotyping, many information management tools are needed. High-throughput laboratories, often required by molecular breeding programmes, make Laboratory Information Management Systems (LIMS) a necessity. LIMS manage data, samples, laboratory users, instruments, standards and support laboratory functions such as invoicing, plate management, sample tracking and work flow automation. Taking a typical genotyping project as an example, the LIMS may include tracking the samples from field to plate and to storage, managing the data flow from plate to genotyping facilities and to computers and organizing and optimizing experiments both internally and externally.
Today's trend is to move the whole process of information collection, management, analysis, decision making, review and release into the workplace. The goal of a LIMS is to create a seamless organization in which:
• Instruments are integrated in the lab network where they receive instructions and work lists from the LIMS and return finished results, including raw data, back to a central repository where the LIMS can update relevant information to external systems.
• Laboratory personnel perform calculations, review and document results using online information from connected instruments, reference databases and other resources using electronic lab notebooks connected to the LIMS.
• Management can supervise the lab process, react to bottlenecks in workflow and ensure regulatory requirements are met.
• Laboratory participants can place work requests and follow up on progress, review results and other documentation.
With several thousand data points flowing out of the laboratory every day, timely scoring and delivery of the results to breeders are basic requirements for an efficient breeding system. Well-trained assistants for genotyping and scoring, coupled with research scientists who can analyse data in meaningful ways, are the key components for a data management
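The sample-tracking role of a LIMS described above (from field to plate and on to storage) can be sketched as a small state machine in Python. This is only an illustration under stated assumptions: the `MiniLims` class, the four stage names, and the sample, plate and well identifiers are all hypothetical and do not describe any particular LIMS product.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed order of stages for one sample (an assumed, simplified workflow).
STAGES = ["field", "plate", "genotyped", "stored"]


@dataclass
class Sample:
    sample_id: str
    stage: str = "field"
    plate: Optional[str] = None
    well: Optional[str] = None
    history: List[str] = field(default_factory=list)


class MiniLims:
    def __init__(self):
        self.samples = {}    # sample_id -> Sample

    def register(self, sample_id):
        sample = Sample(sample_id, history=["field"])
        self.samples[sample_id] = sample
        return sample

    def advance(self, sample_id, plate=None, well=None):
        """Move a sample to the next stage; stages cannot be skipped or repeated."""
        sample = self.samples[sample_id]
        position = STAGES.index(sample.stage)
        if position == len(STAGES) - 1:
            raise ValueError(f"{sample_id} is already in its final stage")
        next_stage = STAGES[position + 1]
        if next_stage == "plate":
            if plate is None or well is None:
                raise ValueError("plate and well are required when plating")
            sample.plate, sample.well = plate, well
        sample.stage = next_stage
        sample.history.append(next_stage)
        return sample

    def work_list(self, stage):
        """Samples currently waiting at a stage, e.g. plated but not yet genotyped."""
        return sorted(s.sample_id for s in self.samples.values() if s.stage == stage)


lims = MiniLims()
for sid in ("S001", "S002", "S003"):
    lims.register(sid)
lims.advance("S001", plate="P01", well="A1")
lims.advance("S002", plate="P01", well="A2")
lims.advance("S001")    # S001 has now been genotyped
print(lims.work_list("plate"))
```

Keeping a per-sample history and deriving work lists from the current stage is one simple way to expose the bottlenecks that management is expected to react to; a production system would add users, instruments and persistent storage.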
information with these units in the short window of time available for most selection decisions. This is critical to practical implementation of any molecular-based breeding strategy. In addition, computational tools are required to translate and integrate research outputs into a usable form for plant breeding programmes (Dwivedi et al., 2007). The International Crop Information System (ICIS) is identified as the key component that can link the gene, phenotype and environment data with uniquely identified germplasm units used and manipulated in breeding programmes.
ICIS is a database system, prototyped since 1996 by a CGIAR multi-centre group of biologists and information scientists (www.icis.cgiar.org), to manage and integrate all research data on genetic resources, crop improvement and resource management and to link this information to global environmental and genomic data resources (McLaren et al., 2005). ICIS is attempting to level the information playing field between developed and developing nations and it addresses the CGIAR mandate to share research information as well as germplasm and technology. Modest resources with strong commitments have so far produced an innovative prototype for tracking and recording generically all the processes in germplasm collection, characterization, evaluation and development. The system has been used or evaluated for rice, wheat, maize, barley, cowpea and common bean and is used by private and public breeding programmes.
ICIS has a modular structure with a core consisting of the Genealogy Management System (GMS), which manages data on nomenclature, origin, development and deployment of germplasm, and the Data Management System (DMS), which manages and documents characterization and evaluation data. Specialized user interfaces deliver data views and decision support tools to crop scientists from different disciplines, who access the same data resources, leading to efficient use and re-use of research data. The development of distinct crop databases (separate ICIS implementations) is resulting in focused data and information management for each crop. Meanwhile, a common structure ensures that huge economies are gained by shared commitments to training in national agricultural systems and through collaboration in terms of intellectual development, programming, testing and maintenance.
Linkages between the GMS and DMS provide biological scientists with powerful querying functionality. The querying capabilities of ICIS will not place sensitive data at risk. To permit researchers to manage their own data in parallel with those from other sources, ICIS has a parallel structure of central and local versions. This structure provides local read/write capabilities, allowing data generated locally to be merged and harmonized with the central database at the local user's discretion.
ICIS must have seamless links to other information technologies used in agriculture. The System-wide Genetic Resources Program (SGRP) has endorsed ICIS as a critical initiative in germplasm information systems. A current project with SGRP ensures that ICIS and the System-wide Information Network for Genetic Resources (SINGER) exchange data smoothly; another project with the Collaborative Research Centre for Molecular Plant Breeding in Australia targets linkages between conventional evaluation data and molecular marker data within ICIS. In addition, ICIS is in many ways complementary to, and becoming increasingly dependent upon, the content and technologies of plant genome databases. As ICIS finds itself increasingly drawn towards the integration of breeding and field evaluations with associated molecular data, this requirement has inspired ongoing collaborations to integrate ICIS data with external genetic, genomic, transcriptomic and proteomic datasets, such as species-specific plant genomic databases, the United States Department of Agriculture (USDA) Gramene comparative genomics database, the European PlaNet group and others.
Although ICIS has created some fundamental components required for molecular breeding, there are several general needs for plant breeding, which still require a great deal of development: (i) databasing
576 Chapter 14
for all breeding-related information such as climate, soil and phenotype data for
selection and target environments; (ii) data mining for specific breeding purposes such as
environment classification, genotype-by-environment interaction and identification
of novel alleles and genetic variation; (iii) modelling breeding processes and selection
schemes using multiple sources of breeding information to eliminate some field and lab
tests required for making selection decisions, which may be critical for complex traits; and
(iv) extracting useful information by an integrated exploration of the information created
in a specific breeding programme with all related information from public databases.
Phenotypic and genetic data should be stored in a generic database such as ICIS or,
where necessary, in other databases compatible with a standard informatics platform.
For example, the CIMMYT MAIZE FIELDBOOK, currently used for capturing phenotypic
and trial data in maize, is being integrated with ICIS. The functionality developed in
the MAIZEFINDER software is being utilized as a data warehouse and an interface for
queries on data in ICIS. This integration will allow breeders to continue to use the
functions provided by both MAIZE FIELDBOOK and MAIZEFINDER, while also giving access
to the functionalities of ICIS, such as pedigree management and storage for
genomic data. ICIS also provides integration with the GCP informatics platform, called
Pantheon (http://pantheon.generationcp.org). This software includes a web-based
search engine, standalone Java graphical user interfaces and integration with other
third-party software such as ISYS (http://www.ncgr.org/cmtv), the Genomic Diversity and
Phenotype Connection (GDPC) (http://www.maizegenetics.net/gdpc/index.html)
and BioMOBY web service compliant tools (http://www.biomoby.org/). Through
GDPC the platform has access to the TASSEL software used for association analysis and,
through ISYS, access to visualization tools such as the Comparative Map and Trait
Viewer (http://www.ncgr.org/cmtv) that are useful for QTL mapping and MAS.
The data should be cross-linked with other publicly available data, including
possibly several of the following data types: (i) QTL and (comparative) genetic
mapping data from both specific projects and public sources of such information, as
in Gramene, GrainGenes and MaizeGDB; (ii) additional public sequence and annotation
data, as it becomes available, possibly including molecular marker data
of various types; (iii) crop mutant data; and (iv) information from other pertinent
international plant databases, in particular The Arabidopsis Information Resource
(TAIR) and equivalent model organism databases.
To help users choose the most appropriate experimental design and data analysis
methods and to provide them with a regularly updated selection of appropriate
options, the system under development by the GCP project aims to provide automatic
transition of data flow between all permutations and combinations of the software to
be used. This integrated decision support system for marker-assisted plant breeding
(analogous to AGROBASE), called iMAS, will facilitate an integrated, error-free and
appropriate data analysis from the beginning to the end of the molecular breeding
pathway. iMAS was developed to facilitate marker-assisted plant breeding seamlessly by
integrating freely available quality software involved in the journey from phenotyping
and genotyping of individuals to identification and application of trait-linked markers,
and by providing online decision guidelines that are simple to understand and use, so
that users can apply these software programs correctly and interpret and use their
output. Potentially useful software identified includes programs for the generation
of experimental designs, biometric analysis of phenotypic data, building a linkage map,
marker identification through QTL analysis, marker identification through association
analysis and determination of the sample sizes required for foreground and background
selection. ICIS should finally integrate with iMAS to make all the software available
under one single umbrella.
Other statistical tools and software should also be incorporated into the same
Breeding Informatics 577
platform through ICIS. One such tool is CROPSTAT, a computer program for data
management and basic statistical analysis of experimental data. CROPSTAT is freely
available from www.irri.cgiar.org and has been developed primarily for the analysis of data
from agricultural field trials, but many of the features can be used for analysis of data
from other sources. The main modules and facilities are:
data management with a spreadsheet;
text editor;
summary statistics and scatter plot graphics;
analysis of variance;
regression and correlation;
mixed model analysis;
single site analysis of plant breeding cultivar trials;
cross site and additive main effect and multiplicative interaction (AMMI) analysis;
pattern analysis of genotype-by-environment interaction;
generalized linear models;
log linear models;
QTL analysis;
randomization and layout of experimental designs;
display of linear forms for general factorial expected mean squares (EMS); and
generation of coefficients for orthogonal polynomials.
Although CROPSTAT is an easy-to-use software package, it is not suitable for analysing
large-scale data sets.

14.5.4 Other informatics tools

There are several informatics tools available from either private or public sectors. Some
of them have multiple functions relevant to plant breeding, while others only provide
specific applications in plant breeding. Only some representative tools will be
described here.
AGROBASE Generation II (http://www.agronomix.com/) is a comprehensive
database management and analysis system for agronomists, plant breeders and plant
researchers. Its Basic System offers data management, experiment management and
statistical analysis. The Varietal Comparisons Module compares relative performance
of cultivars or treatments within a trial or across all trials, locations and years and also
analyses genotype-by-environment interactions. The Advanced Statistics Module
supports the randomization and analysis of more advanced experimental designs,
spatial analyses of yield trials, multivariate analyses and other advanced statistical
analyses. The Pedigree Data Management Module supports the plant breeding needs
of many types of crops. The Image Display Module supports the display of images of
cultivars or treatments including the growth stages, flower colour or shape, plant
components and characteristics, or molecular markers, for any cultivar or genotype.
GERMINATE (Lee et al., 2005), developed by the Scottish Crop Research Institute
and the John Innes Centre (http://germinate.scri.sari.ac.uk/), is a generic plant data
management system designed to hold a diverse variety of data types, ranging from
molecular to phenotypic, and to allow querying between such data for any plant
species. Data are stored in GERMINATE in a technology-independent manner, such that
new technologies can be accommodated in the database as they emerge, without
modification of the underlying schema.
The Plabsoft database (Heckenberger et al., 2008) is a comprehensive database
management system (DBMS) for integrating phenotypic and genomic data in academic
and commercial plant breeding programmes. The database structure is capable of
managing the following types of data observed in breeding programmes of all major crops:
(i) germplasm data of any species including pedigree data; (ii) phenotypic data of any
traits and trait complexity; (iii) trial management data for any field and trial design;
(iv) molecular marker data for all common types of markers; and (v) project and study
management data. By implementing the database structure into the DBMS, functions
have been developed for data import, data retrieval and data transfer from and to
commonly used statistical analysis software.
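The five data categories listed for the Plabsoft database (germplasm, phenotypic, trial, marker and study data) can be pictured as linked record types. The following is only a minimal sketch in Python under stated assumptions: every class name, field name and the export_for_analysis helper is a hypothetical illustration and is not taken from Plabsoft, ICIS or any other system named in this chapter.

```python
import csv
import io
from dataclasses import dataclass, field


# Hypothetical record types, one per data category named in the text.
@dataclass
class Germplasm:
    gid: str                 # germplasm identifier
    species: str
    parents: tuple = ()      # pedigree: identifiers of the parents


@dataclass
class Trial:
    trial_id: str
    location: str
    design: str              # free-text label for the field design


@dataclass
class Phenotype:
    gid: str
    trial_id: str
    trait: str
    value: float


@dataclass
class MarkerCall:
    gid: str
    marker: str
    allele: str


@dataclass
class Study:
    study_id: str
    trials: list = field(default_factory=list)
    phenotypes: list = field(default_factory=list)
    marker_calls: list = field(default_factory=list)


def export_for_analysis(study, trait, marker):
    """Return CSV text joining one trait and one marker by germplasm ID."""
    alleles = {m.gid: m.allele for m in study.marker_calls if m.marker == marker}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gid", "trial", trait, marker])
    for p in study.phenotypes:
        if p.trait == trait:
            # Entries without a marker call are exported as "NA".
            writer.writerow([p.gid, p.trial_id, p.value, alleles.get(p.gid, "NA")])
    return out.getvalue()
```

Keeping phenotypes and marker calls keyed by the same germplasm identifier is the design choice that makes the join trivial; a flat CSV is used here simply because most statistical packages can read it.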
among the available data. Recent movements towards the creation of a scientific society
for database curators (http://www.biocurator.org) and projects that bring together the
efforts of different model organism databases (http://www.gmod.org) provide early hints
that bioinformatics is developing into a more coherent discipline of biology. With the
maturation of genomics has come the adoption of standard data formats and schemata for
crop genome information, and it is likely that future databases will be designed with
cross-connectivity capabilities as a priority. The availability of complete genome sequences
enables further mining for novel promoter sequences and other regulatory features such
as micro-RNA. This tertiary annotation provides links to both the phenotype and the
complex regulatory mechanisms that govern development and response to the
environment (Edwards and Batley, 2004).
One of the more significant changes to crop genome databases has been the move
towards graphical user interfaces that provide a more user-friendly search
environment. The Ensembl database schema, which has a strong emphasis on graphical
user interaction, is used in the cereal comparative genomic database Gramene
(Liang et al., 2008). No single database can attempt to store all of the possible
information about an organism. Therefore, a key role of genome browsers is to provide a rich
variety of links to external databases. As genome sequences become available from
more organisms, projects such as Ensembl are attempting to provide access to
genome-wide inter-species comparisons of genomic and protein sequences. The same strategy is
needed to develop inter-species crop informatics resources aimed at serving the plant
breeding community.
The rapid evolution of the field of phenomics (the genome-wide study of
gene dispensability by quantitative analysis of phenotype) has resulted in an increasing
demand for new data analysis and visualization tools. Most of the valuable phenotypic
data reside in the public literature and are not captured in databases. Effective text
mining is needed to gather these data as well. A prerequisite for text mining, however, is
the availability of specific thesauri to catalogue and validate terms.
To meet this demand, a public resource for mining, filtering and visualizing
phenotypic data, the PROPHECY database, was designed to allow easy and flexible
access to physiologically relevant quantitative data for the growth behaviour of mutant
strains in the yeast deletion collection during conditions of environmental challenges
(Fernandez-Ricaud et al., 2005). We would expect a similar effort in crop plants.
Some informatics tools are discussed in Chapter 15.

14.6 Plant Databases

A list of currently available molecular biology databases can be found at
http://www.oxfordjournals.org/nar/database/a/. The list is updated in the January issue of
Nucleic Acids Research each year. The 2008 update contains 1078 databases, which can be
classified into 14 categories (Table 14.4; Galperin, 2008). In addition, a comprehensive list of
databases is available at ExPASy Life Science Directory (http://expasy.ch/alinks.html).
Many attempts are being made to understand biological subjects at a systems
level. Major resources for these approaches are biological databases, which store large
volumes of information about DNA, RNA and protein sequences, including their
functional and structural motifs, molecular markers, mRNA expression levels, metabolite
concentrations, protein–protein interactions, phenotypic traits or taxonomic
relationships. As an example of a comprehensive resource, the NCBI provides analysis
and retrieval tools for the data in GenBank and other biological data made available
through NCBI's web site, in addition to maintaining the GenBank nucleic acid
sequence database (Wheeler et al., 2007). NCBI resources include Entrez, the Entrez
Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI
Taxonomy Browser, BLAST, BLAST LINK (BLINK), Electronic PCR, OrfFinder, Spidey, Splign,
RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes,
Table 14.4. Molecular biology databases: categories and numbers. Summarized from http://www.
oxfordjournals.org/nar/database/a/; the number in parentheses is the number of databases in the
category.
Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the
Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs),
Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction
Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian
Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA),
the Molecular Modelling Database (MMDB), the Conserved Domain Database (CDD), the
Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of
small molecule databases.
Information relevant to plant breeding may be housed in nucleotide, RNA and protein
sequence databases, structure databases, genomics databases, metabolic and signalling
pathways, microarray data and other gene expression databases, proteomics resources,
organelle databases and plant databases. Some of the databases to be discussed in this
section are not for plant species; however, they may be useful in comparative
genomics, genetics and phenomics.

14.6.1 Sequence databases

Nucleotide sequence databases

The most important DNA sequence databases are listed in Table 14.5 with their URLs
Table 14.5. DNA and protein sequence databases.
DDBJ http://www.ddbj.nig.ac.jp DNA Data Bank of Japan (DDBJ), one of the three major databases for the
International Nucleotide Sequence Database Collaboration
EMBL Nucleotide Sequence Database http://www.ebi.ac.uk/embl The EMBL Nucleotide Sequence Database is maintained at the European
Bioinformatics Institute (EBI) in an international collaboration with
DDBJ and GenBank at the NCBI (USA)
GenBank http://www.ncbi.nlm.nih.gov A comprehensive sequence database that contains publicly available DNA sequences
for more than 170,000 different organisms, obtained primarily through the submission
of sequence data from individual laboratories and batch submissions from
large-scale sequencing projects
EXProt http://www.cmbi.kun.nl/EXProt A non-redundant protein database containing a selection of entries from genome
annotation projects and public databases, aimed at including only proteins with an
experimentally verified function
MIPS http://mips.gsf.de Databases at Munich Information Center for Protein Sequences
NCBI Protein http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein The NCBI Entrez Protein database comprises sequences taken from a variety of
sources, including Swiss-PROT, the Protein Information Resource, the
Protein Research Foundation, the Protein Data Bank, and translations from
annotated coding regions in the GenBank and RefSeq databases
Patome http://www.patome.org Biological sequence data disclosed in patents and published applications, as well as
their analysis information
PIR-PSD http://pir.georgetown.edu The Protein Information Resource (PIR) is an integrated public bioinformatics resource
that supports genomic and proteomic research and scientific studies. PIR has
provided many protein databases and analysis tools to the scientific community,
including the PIR-International Protein Sequence Database (PSD) of functionally
annotated protein sequences
PRF http://www.prf.or.jp/en/index.shtml Protein Research Foundation database of peptides: sequences, literature and unnatural
amino acids
RefSeq http://www.ncbi.nlm.nih.gov/RefSeq The NCBI Reference Sequence (RefSeq) database provides curated non-redundant
sequence standards for genomic regions, transcripts (including splice variants),
and proteins
Swiss-PROT http://www.expasy.org/sprot The UniProt/Swiss-PROT Protein Knowledgebase is a curated protein sequence database
providing a high level of annotation (such as the description of protein function,
domain structure, post-translational modifications, variants, etc.), a minimal level of
redundancy and a high level of integration with other databases. It is part of the
Universal Protein Knowledgebase (UniProtKB)
Table 14.5. Continued.
TCDB http://www.tcdb.org The Transporter Classification Database (TCDB) is a curated, relational database
containing sequence, classification, structural, functional and evolutionary
information about transport systems from a variety of living organisms
UniProt http://www.uniprot.org UniProt (Universal Protein Resource) is the world's most comprehensive catalogue of
information on proteins. It is a central repository of protein sequences and functions
created by joining the information contained in Swiss-PROT, TrEMBL and PIR.
UniProt has three components, each optimized for different uses. The UniProt
Knowledgebase (UniProtKB) is the central access point for extensive curated protein
information, including function, classification and cross-reference. The UniProt
Reference Clusters (UniRef) databases combine closely related sequences into a
single record to speed searches. The UniProt Archive (UniParc) is a
comprehensive repository, reflecting the history of all protein sequences
and a brief description. The International Nucleotide Sequence Database
Collaboration is a joint effort of the European Bioinformatics Institute (EBI), the DNA
Data Bank of Japan (DDBJ) and the US NCBI. The nucleotide sequence databases
are data repositories, accepting nucleic acid sequence data from the community and
making it freely available.
Each entry in a database has a unique identifier, which is a string of letters and
numbers corresponding to that record. This unique identifier, known as the Accession
Number, can be quoted in the scientific literature. As the Accession Number is
permanent, another code is used to indicate the number of changes that a particular
sequence has undergone. This code is known as the Sequence Version and is composed
of the Accession Number followed by a period and a number indicating the
specific version.
Since their inception in the 1980s, the nucleic acid sequence databases have
experienced exponential growth, with archives doubling in size about every 18
months, reflecting advances in sequencing technologies.

Protein sequence databases

The protein sequence databases are the most comprehensive source of information
on proteins, some of which are listed in Table 14.5. They can be classified into
universal databases, covering proteins from all species, and specialized data collections,
storing information about specific families or groups of proteins, or about the proteins
of a specific organism. Two categories of universal protein sequence databases can be
discerned: (i) simple archives of sequence data; and (ii) annotated databases where
additional information has been added to the sequence record.
The Protein Information Resource (PIR) is the oldest protein sequence database.
It was established in 1984 by the National Biomedical Research Foundation (NBRF)
and has been maintained since 1988 by PIR. Swiss-PROT, established in 1986 as an
annotated universal protein sequence database and maintained collaboratively by the
Swiss Institute of Bioinformatics (SIB) and the EBI, provides a high level of annotation,
a minimal level of redundancy, a high level of integration with other biomolecular
databases and extensive external documentation. Each entry in Swiss-PROT is
thoroughly analysed and annotated to ensure a high standard of annotation and maintain
the quality of the database. In Swiss-PROT two classes of data can be distinguished: the
core data and the annotation. The core data consists of the sequence data, the citation
information (bibliographical references) and the taxonomic data (description of the
biological source of the protein). The annotation describes the function(s) of the protein,
post-translational modification (carbohydrates, phosphorylation, acetylation,
glycosylphosphatidylinositol-anchor, etc.), domains and sites (calcium binding regions,
ATP-binding sites, zinc fingers, homeobox, kringle, etc.), secondary structure,
quaternary structure (homodimer, heterotrimer, etc.), similarities to other proteins, disease(s)
associated with deficiencies in the protein and sequence conflicts, variants, etc.
TrEMBL (Translation of EMBL nucleotide sequence database), the supplement of
Swiss-PROT, was created in 1996 to make new sequences available as quickly as
possible, since maintaining the high quality of Swiss-PROT is a time-consuming process
that involves extensive sequence analysis and detailed curation by expert annotators.
TrEMBL consists of computer-annotated entries derived from the translation of all
coding sequences in the EMBL nucleotide sequence database, except for those already
included in Swiss-PROT.
Searches in protein databases have become a standard research tool in the life
sciences. To produce valuable results, the source databases should be comprehensive,
non-redundant, well-annotated and up-to-date. However, the lack of a single
protein sequence database satisfying all four criteria has forced users to search
multiple databases. By unifying the PIR, Swiss-PROT and TrEMBL database activities, PIR
International and its partners, EBI and SIB, have produced a single worldwide database
TIGR Gene Indices http://compbio.dfci.harvard.edu/tgi Databases to identify and classify transcribed sequences in eukaryotic species
using available EST and gene sequence data
GO http://www.geneontology.org Gene Ontology Consortium database
KEGG http://www.genome.ad.jp/kegg KEGG (Kyoto Encyclopedia of Genes and Genomes) is the primary database
resource of the Japanese GenomeNet service for understanding higher order
functional meanings and utilities of the cell or the organism from its genome
information
Swiss-2DPAGE http://www.expasy.org/ch2d Maintained collaboratively by the Central Clinical Chemistry Laboratory of the
Geneva University Hospital and the Swiss Institute of Bioinformatics (SIB), the
database contains data on proteins identified on various two-dimensional PAGE
and SDS-PAGE reference maps from human, mouse, Arabidopsis thaliana,
Dictyostelium discoideum, Escherichia coli, Saccharomyces cerevisiae and
Staphylococcus aureus
COGs http://www.ncbi.nlm.nih.gov/COG Clusters of Orthologous Groups of proteins (COGs) were delineated by
comparing protein sequences encoded in complete genomes, representing
major phylogenetic lineages. Each COG consists of individual proteins or
groups of paralogues from at least three lineages and thus corresponds to an
ancient conserved domain
ERGO http://www.ergo-light.com ERGO, formerly WIT database, provides links to information about the functional
role of enzymes (via links to data in KEGG); links to NCBI Medline entries for
each enzyme; and links to enzymes and metabolic pathways records for each
enzyme. The database also provides access to thoroughly annotated genomes
within a framework of metabolic reconstructions, connected to the sequence
data; protein alignments and phylogenetic trees, and data on gene clusters,
potential operons and functional domains
wwPDB http://www.wwpdb.org The Worldwide Protein Data Bank (wwPDB) consists of organizations that act as
deposition, data processing and distribution centres for PDB data. The mission
of the wwPDB is to maintain a single Protein Data Bank Archive of
macromolecular structural data that is freely and publicly available to the
global community
Genome Project Database http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=genomeprj The NCBI Entrez Genome Project Database is intended to be a searchable
collection of complete and incomplete (in-progress) large-scale sequencing,
assembly, annotation, and mapping projects for cellular organisms
Table 14.6. Continued.
Entrez Gene http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene Entrez Gene is NCBI's database for gene-specific information with focus on the
genomes that have been completely sequenced, that have an active research
community to contribute gene-specific information, or that are scheduled for
intense sequence analysis
Entrez Genomes http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome NCBI's collection of databases for the analysis of complete and unfinished viral,
pro- and eukaryotic genomes
ACeDB http://www.acedb.org Caenorhabditis elegans, Schizosaccharomyces pombe, and human sequences
and genomic information
FlyBase http://flybase.org An integrated resource for genetic, molecular and descriptive data concerning
the Drosophilidae, including interactive genomic maps, gene product
descriptions, mutant allele phenotypes, genetic interactions, expression
patterns, transgenic constructs and their insertions, anatomy and images, and
genetic stock collections
Table 14.7. General plant databases.
AgBase http://www.agbase.msstate.edu A curated, open-source, web-accessible resource for functional analysis of agricultural plant and
animal gene products
BarleyBase http://www.barleybase.org An online database for plant microarrays with integrated tools for data visualization and statistical
analysis
Cereal Small RNA Database http://sundarlab.ucdavis.edu/smrnas An integrated resource for small RNAs expressed in rice and maize that includes a genome
browser and a smRNA-target relational database as well as relevant bioinformatic tools
CR-EST Crop ESTs http://pgrc.ipk-gatersleben.de/cr-est A publicly available online resource providing access to sequence, classification, clustering, and
annotation data of crop EST projects at IPK Gatersleben, Germany
CropNet http://ukcrop.net The UK Crop Plant Bioinformatics Network (UK CropNet) was established to harness the extensive
work in genome mapping in crop plants in the UK. The resource facilitates the identification and
manipulation of agronomically important genes by laying a foundation for comparative analysis
among crop plants and model species. A number of software tools have been developed to
facilitate data visualization and analysis
FLAGdb++ http://urgv.evry.inra.fr/projects/FLAGdb++/HTML/index.shtml Dedicated to the integration and visualization of data for high-throughput functional analysis
of a fully sequenced genome, as illustrated for Arabidopsis
Génoplante-Info http://www.genoplante.com Integrates and makes publicly available the data (genomic sequence,
transcriptome, proteome, allelic variability, mapping and synteny, mutation data) and tools
(databases, interfaces, analysis software) generated through a collaboration between public French
institutes and private companies that aims at developing genome analysis programs for
crop species (maize, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis
thaliana and rice)
GeneFarm http://urgi.versailles.inra.fr/Genefarm Expert annotation of Arabidopsis gene and protein families
GrainGenes http://wheat.pw.usda.gov Molecular and phenotypic information on wheat, barley, rye, triticale and oats
Gramene http://www.gramene.org A comparative genome mapping database for grasses with both automatic and manual curation
performed to combine and interrelate information on genomic and EST sequences, genetic,
physical and sequence-based maps, proteins, molecular markers, mutant phenotypes and
QTL, and publications
MIPSPlantsDB http://mips.gsf.de/proj/plant/jsf The MIPS (Plant Genome Bioinformatics at the Institute for Bioinformatics) plant Genomics group
focuses on the bioinformatics of plant genomes. It developed from the Arabidopsis Genome
Annotation Group and currently provides the following databases: the MIPS Arabidopsis
thaliana genome database, the maize genome, the rice genome (MOsDB), the Medicago
Genome database, the Lotus Genome database, the Tomato Genome database,
Cis-Regulatory Element Detection Online (CREDO), mips Repeat Element database
(mips-REdat), mips Repeat Element catalogue (mips-REcat), and MotifDB
Table 14.7. Continued.
MPIM http://www.plantenergy.uwa.edu.au/applications/mpimp/index.html A database containing information on the mitochondrial protein import apparatus from a wide
range of organisms, including yeast, human, rat, mouse, Drosophila, Danio rerio, Caenorhabditis
elegans, Arabidopsis, rice and Plasmodium falciparum
PathoPlant http://www.pathoplant.de A database on plant–pathogen interactions and components of signal transduction pathways
related to plant pathogenesis
ICIS http://www.icis.cgiar.org The International Crop Information System (ICIS) is a database system for the management and
integration of global information on genetic resources and crop improvement for any crop
Phytome http://www.phytome.org A comparative genomics database designed to facilitate functional plant genomics, molecular
breeding, and evolutionary studies. It contains predicted protein sequences, protein family
assignments, multiple sequence alignments, phylogenies, and functional annotations for
proteins from a large, phylogenetically diverse set of plant taxa
PHYTOPROT http://urgi.versailles.inra.fr/phytoprot Clusters of predicted plant proteins
dbEST http://www.ncbi.nlm.nih.gov/dbEST A division of GenBank that contains sequence data and other information on single-pass cDNA
sequences, or ESTs, from a number of organisms
PLACE http://www.dna.affrc.go.jp/htdocs/PLACE A database containing cis-element motifs found in plant genes
Plant DNA C-values database http://www.kew.org/genomesize/homepage.html A one-stop, user-friendly database for plant genome sizes. The most recent release (release 4.0,
October 2005) contains genome size data for 5150 species comprising 4427 angiosperms,
207 gymnosperms, 87 monilophytes and lycopods, 176 bryophytes and 253 algal species
Plant Genome Central http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html Providing access to data from large-scale sequencing projects, genetic maps, and large-scale
EST sequencing projects
Plant MPSS http://mpss.udel.edu MPSS (Massively Parallel Signature Sequencing) is a sequencing-based technology that uses a
unique method to quantify gene expression level, generating millions of short sequence tags
per library. The Plant MPSS databases are the largest publicly available set of tag-based gene
expression data
Plant Ontology database http://www.plantontology.org The Plant Ontology (PO) is a collaborative effort among several plant databases and experts in
plant systematics, botany and genomics to develop simple yet robust and extensible controlled
vocabularies that accurately reflect the biology of plant structures (morphology and anatomy)
and developmental stages
Plant snoRNA DB http://bioinf.scri.sari.ac.uk/cgi-bin/plant_snorna/home Small nucleolar RNA (snoRNA) genes in plant species
PLANT-PIs http://bighost.area.ba.cnr.it/PLANT-PIs A database for facilitating retrieval of information on plant protease inhibitors (PIs) and their
genes
PlantGDB http://www.plantgdb.org A database for plant genomic sequences, in particular ESTs that correspond to fragments of
genes that are actively transcribed under particular conditions (currently with data for 48 plant
species)
PlantProm http://mendel.cs.rhul.ac.uk/mendel.php?topic=plantprom A database for plant promoter sequences
PlantsP/PlantsT http://plantsp.sdsc.edu PlantsP and PlantsT are plant-specific curated databases that combine sequence derived
information with experimental functional genomics data. PlantsP focuses on proteins involved
in the phosphorylation process (i.e. kinases and phosphatases), whereas PlantsT focuses on
membrane transport proteins
POGs/PlantRBP http://plantrbp.uoregon.edu A relational database that integrates data from rice, Arabidopsis, and maize by placing the
complete Arabidopsis and rice proteomes and available maize sequences into putative
Breeding Informatics
orthologous groups (POGs)
TAED http://www.bioinfo.no/tools/TAED TAED (The Adaptive Evolution Database) is a phylogeny-based tool for comparative genomics
TIGR plant repeat http://www.tigr.org/tdb/e2k1/ Classification of repetitive sequences in plant genomes
database plant.repeats
TIGR Plant http://plantta.tigr.org The database uses expressed sequences collected from the NCBI GenBank Nucleotide
Transcript database for the construction of transcript assemblies. The sequences collected include ESTs
Assembly and full-length and partial cDNAs, but exclude computationally predicted gene sequences
Database
TropGENE DB http://tropgenedb.cirad.fr A database that manages genetic and genomic information about tropical crops
PLEXdb http://www.plexdb.org PLEXdb (Plant Expression Database) is a unified public resource for gene expression for plants
and plant pathogens, serving as a bridge to integrate new and rapidly expanding gene
expression profile data sets with traditional structural genomics and phenotypic data
PLANTS Database http://plants.usda.gov The PLANTS Database provides standardized information about the vascular plants, mosses,
liverworts, hornworts, and lichens of the USA and its territories
UK CropNet http://ukcrop.net/db.html Contains six databases (Arabidopsis Genome Resource, BarleyDB, BrassicaDG, CropSeqDB,
Databases FoggDB, MilletGenes) and mirrors many other plant-related databases
for multiple functions. A typical example is ICIS, which has been described in the previous section.
As another example, Phytome is an online comparative genomics resource that is built upon publicly
available sequence and map information from a diverse set of plant species, with a focus on the
angiosperms, or flowering plants. It provides an interface to the results from a variety of
phylogenomic analyses. Phytome is designed to facilitate functional genomics, molecular breeding and
evolutionary studies in model and non-model plant species. Currently, Phytome contains phylogenetic
and functional information for predicted protein sequences (Unipeptides). Future development will
incorporate data and tools for analysis of sequence-based comparative maps.

There are some databases supporting functions for comparative biology. Gramene is one such database
for comparative genome mapping of grasses, with both automatic and manual curation performed to
combine and interrelate information on genomic and EST sequences, genetic, physical and
sequence-based maps, proteins, molecular markers, mutant phenotypes and QTL and publications. As an
information resource, Gramene's purpose is to provide added value to data sets available within the
public sector, which will facilitate researchers' ability to understand the rice genome and leverage
the rice genomic sequence for identifying and understanding corresponding genes, pathways and
phenotypes in other crop grasses. This is achieved by building automated and curated relationships
between rice and other cereals. The automated and curated relationships are queried and displayed
using controlled vocabularies and web-based displays. The controlled vocabularies (ontologies)
currently being utilized include Gene ontology, Plant Ontology, Trait ontology, Environment ontology
and Gramene Taxonomy ontology. The web-based displays for phenotypes include the Genes and
Quantitative Trait Loci (QTL) modules. Sequence based relationships are displayed in the Genomes
module using the genome browser adapted from Ensembl, in the Maps module using the comparative map
viewer (CMAP) from GMOD and in the Proteins module displays. BLAST is used to search for similar
sequences.

Some sites host their own databases and also mirror other related databases. The UK Crop Plant
Bioinformatics Network (UK CropNet), established to harness the extensive work in genome mapping in
crop plants in the UK, is one such example. The UK CropNet contains six databases (Arabidopsis
Genome Resource, BarleyDB, BrassicaDG, CropSeqDB, FoggDB, MilletGenes). It also mirrors many other
plant-related databases (Table 14.7).

14.6.4 Individual plant databases

Tables 14.8, 14.9 and 14.10 list databases for specific plants. As model plants, Arabidopsis and
rice databases are listed in separate tables. Although most plant databases differ considerably in
content, the general subject matter for a species-specific database may include:
• genetic and cytogenetic maps;
• genomic probes, nucleotide sequences;
• genes, alleles and gene products;
• phenotypes, quantitative traits and QTL;
• genotypes and pedigrees of cultivars, genetic stocks and other germplasm;
• pathologies and the corresponding pathogens, insects and abiotic stresses;
• taxonomy of the crops and related species;
• addresses and research interests of colleagues; and
• relevant bibliographic citations.

As a model crop, rice has highly diversified databases, which results in each database containing
specific information, such as mutants and T-DNA insertions, or serving a specific function, such as
annotation and proteomic analysis (Table 14.9). There are two annotation related databases, one
oriented around contigs for high-quality manual annotation (RAD) and the other providing a system
to integrate programs for prediction and analysis of protein-coding gene structure (RiceGAAS). The
evidence
Table 14.8. Arabidopsis thaliana databases.
Arabidopsis MPSS http://mpss.udel.edu/at Arabidopsis gene expression detected by massively parallel signature sequencing
Arabidopsis Nucleolar Protein Database http://bioinf.scri.sari.ac.uk/cgi-bin/atnopdb/proteome_comparison Comparative analysis of nucleolar proteomes of human and Arabidopsis
ARAMEMNON http://aramemnon.botanik.uni-koeln.de A curated database for A. thaliana transmembrane (TM) proteins and
transporters
ARTADE http://omicspace.riken.jp/ARTADE A database containing transcriptional structures elucidated by ARTADE which
estimates exon/intron structures of structurally unknown genes based on both tiling
array data and genomic sequence data
ASRP http://asrp.cgrb.oregonstate.edu A database for A. thaliana small RNA project
AtGDB http://www.plantgdb.org/AtGDB A part of PlantGDB Plant Genome Database and Analysis Tools to provide a
convenient sequence-centred genome view for A. thaliana, with a narrow
focus on gene structure annotation
AthaMap http://www.athamap.de Genome-wide map of putative transcription factor binding sites in A. thaliana
ATTED-II http://www.atted.bio.titech.ac.jp A database providing co-regulated gene relationships based on co-expressed genes
deduced from microarray data and the predicted cis elements
DATF http://datf.cbi.pku.edu.cn The database of Arabidopsis Transcription Factors (DATF) collects all Arabidopsis
transcription factors (total of 1922 loci and 2290 gene models) and classifies them
into 64 families
GABI-Kat http://www.gabi-kat.de A Flanking Sequence Tag (FST)-based database for T-DNA insertion mutants
generated by the GABI-Kat project
MAtDB http://mips.gsf.de/proj/thal/db MIPS (Plant Genome Bioinformatics at the Institute for Bioinformatics)
A. thaliana database
NASCarrays http://affymetrix.arabidopsis.info Nottingham Arabidopsis Stock Centre microarray database
PLprot http://www.pb.ipw.biol.ethz.ch/proteomics A. thaliana chloroplast protein database
RARGE http://rarge.gsc.riken.jp RIKEN Arabidopsis Genome Encyclopaedia (RARGE) contains Arabidopsis cDNAs,
mutants and microarray data
SeedGenes http://www.seedgenes.org Genes essential for Arabidopsis development
SUBA http://www.suba.bcs.uwa.edu.au The Arabidopsis Subcellular Database (SUBA) contains publicly available protein
subcellular localization data from a variety of sources from the model plant
Arabidopsis
TAIR http://www.arabidopsis.org The Arabidopsis Information Resource (TAIR) contains data for A. thaliana
genome
Table 14.9. Rice databases.
Oryza Tag Line http://urgi.versailles.inra.fr/OryzaTagLine A database to organize data resulting from the phenotypic characterization
of a library of T-DNA insertion lines of rice (Oryza sativa L. cv. Nipponbare)
BGI-RISe http://rise.genomics.org.cn Beijing Genomics Institute Rice Information System (BGI-RISe), containing
comprehensive data from O. sativa L. ssp. indica, genome information
from O. sativa L. ssp. japonica and EST sequences available from
other cereal crops. Sequence contigs of indica (93-11) have been further
assembled into Mbp-sized scaffolds and anchored on to the rice
chromosomes referenced to physical/genetic markers, cDNAs and
BAC-end sequences. The rice genomes have been annotated for gene
content, repetitive elements, gene duplications (tandem and segmental)
and SNPs between rice subspecies
WhoGA http://rgp.dna.affrc.go.jp/whoga WhoGA is a rice genome annotation viewer using the GBrowse web-server
application. In addition to predicted genes, WhoGA also includes gene
models for pseudogenes with or without EST/full-length cDNA support,
regions wherein genes could not be modelled although showing
significant homology to known genes, and ORFs predicted by a single
gene prediction program
IRIS http://www.iris.irri.org The International Rice Information System (IRIS) is the rice implementation
of the International Crop Information System (ICIS, www.cgiar.org/icis),
a database system for the management and integration of global
information on genetic resources and crop improvement for any crop
MOsDB http://mips.gsf.de/proj/plant/jsf/rice/index.jsp A resource for publicly available sequences of the rice (O. sativa L.)
genome to provide all available data about rice genes and genomics,
including mutant information and expression profiles
OryGenesDB http://orygenesdb.cirad.fr A database for rice genes, T-DNA and transposable elements flanking
sequence tags
Oryzabase http://www.shigen.nig.ac.jp/rice/oryzabase A comprehensive rice science database with the original aim to gather as
much knowledge as possible ranging from classical rice genetics to
recent genomics and from fundamental information to hot topics
RAP-DB http://rapdb.lab.nig.ac.jp Rice Annotation Project Database (RAP-DB) provides access to the
annotation data. By connecting the annotations to other rice genomics
data, such as full-length cDNAs and Tos17 mutant lines, the RAP-DB
serves as a hub for rice genomics
RetrOryza http://www.retroryza.org RetrOryza is a database that aims at providing the research community
with the most complete resource on long terminal repeat-retrotransposon
for rice
RAD http://golgi.gs.dna.affrc.go.jp/SY-1102 The Rice Annotation Database (RAD) is a contig-oriented database for
rad/index.html high-quality manual annotation of the Rice Genome Project, which can
present non-redundant contig analyses by merging the accumulated
PAC/BAC clones
RMD http://rmd.ncpgr.cn The Rice Mutant Database (RMD) contains the information of approximately
129,000 rice T-DNA insertion (enhancer trap) lines generated by an
enhancer trap system
Rice Pipeline http://cdna01.dna.affrc.go.jp/PIPE A unification tool which dynamically collects and compiles data from
scientific databases in National Institute of Agrobiological Sciences
(NIAS) to provide a unique scientific resource of rice that pools publicly
available data
Rice Proteome Database http://gene64.dna.affrc.go.jp/RPD/main_en.html Rice proteome database
RiceGAAS http://RiceGAAS.dna.affrc.go.jp Rice Genome Automated Annotation System (RiceGAAS)
RMD http://www.ricefgchina.org/mutant Rice mutant database
for database diversification within a single species is also apparent from the list of Arabidopsis
databases (Table 14.8).

For other plant databases (Table 14.10), two will be briefly described here. MaizeGDB, the Maize
Genetics and Genomics Database, provides a central repository for public maize information and
presents it in a way that creates intuitive biological connections for the researcher with minimal
effort. It also provides a series of computational tools that directly address the questions of the
biologist in an easy-to-use form. Its data centre contains the following information: data centres;
bacterial artificial chromosomes (BACs); ESTs; gene products; locus/loci; maps; metabolic pathways;
microarrays; overgos; people/organizations; phenotypes; probes; QTL; references; sequences; SSRs;
stocks; variations. At CIMMYT, two crop-specific databases for wheat and maize
(http://iwis.cimmyt.org/ICIS5/) have been developed.

Dendrome is a collection of forest tree genome databases and information resources for the
international forest genetics community. Dendrome is part of a larger collaborative effort to
construct genome databases for major crop and forest species. The primary genome database of
Dendrome is called TreeGenes. TreeGenes provides curated information about genetic maps, DNA
sequences, germplasm, markers, QTL and ESTs. The goal of this effort is to provide an improved
interface for comparison between maps and to integrate expression and EST data.

14.7 Future Prospects for Breeding Informatics

Plant breeding in the future will be largely driven by molecular biology and informatics. Breeding
efficiency will depend on how much information breeders can access and how wisely and effectively
they can use it in their breeding programmes.

Breeding related databases and information systems have to be improved so they are more
user-friendly to breeders. The lack of mutual understanding between breeders and scientists in other
disciplines will continue to be a major limiting factor, so information management systems and
tools should be enhanced so that they can be accessed and used by breeders more easily.

The great proliferation of relevant databases and informatics tools makes them less accessible to
most breeders. The use of these resources is often hampered by the fact that they are designed for
specific application areas and thus lack universality. As users, breeders have to visit many
different databases and use different tool packages for specific purposes, depending on which crop
species the breeder works with, the types of information the breeder wants to retrieve and the
different functions the breeder wants to perform. As a result, knowing how to access and use these
databases demands a significant investment of time and effort.

Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only
access. If researchers want to store their own data, they must either develop their own information
system for managing that data, which can be time-consuming and costly, or they must store their
data in existing systems, which is often restricted. Hence, an out-of-the-box information system for
managing breeding-related data is needed. As an example of such effort, Weise et al. (2006) designed
META-ALL, an information system that allows the management of metabolic pathways, including reaction
kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored
together with quality tags and in different parallel versions.

As many information systems and databases are developed through specially funded projects, which
generally only run for a specific period of time, they become outdated and poorly supported. They
may also be abandoned completely. Maintaining the databases and tools that have been developed
requires continuous funding and technical support, which is almost impossible if the number of
databases and informatics tools keeps growing at the present rate. One
Table 14.10. Databases for other plants excluding Arabidopsis and rice.
Brassica BASC http://bioinformatics.pbcbasc.latrobe.edu.au The BASC system provides tools for the integrated mining and browsing
of genetic, genomic and phenotypic data, hosting information on Brassica species
supporting the Multinational Brassica Genome Sequencing Project
Diatom EST Database http://www.biologie.ens.fr/diatomics/EST ESTs from two diatom algae, Thalassiosira pseudonana and
Phaeodactylum tricornutum
ForestTreeDB http://foresttree.org/ftdb A resource that centralizes large-scale EST sequencing results from several tree species
Legume Information System http://www.comparative-legumes.org The Legume Information System (LIS), formerly the Medicago Genome Initiative
(MGI), is an EST sequence database and analysis system that supports EST
sequencing at the Noble Foundation Center for Medicago Genome Research
MaizeGDB http://www.maizegdb.org The Maize Genetics and Genomics Database (MaizeGDB) is a central repository for
maize sequence, stock, phenotype, genotypic and karyotypic variation, and chromosomal
mapping data. In addition, MaizeGDB provides contact information for over 2400
maize cooperative researchers, facilitating interactions among members of the rapidly
expanding maize community
MtDB http://www.medicago.org/MtDB Medicago truncatula genome database
NRESTdb http://genome.ukm.my/nrestdb Natural Rubber EST Database (NRESTdb) serving as a molecular resource for
functional genomics of the rubber tree
Panzea http://www.panzea.org The Panzea Database contains the genotype, phenotype, and polymorphism data
produced by the Molecular and Functional Diversity in the Maize Genome project
PoMaMo https://gabi.rzpd.de/PoMaMo.html PoMaMo (Potato Maps and More), established within the German Plant Genome Project
GABI, harbours information on molecular maps of all 12 potato chromosomes with
about 1000 mapped elements, sequence data, putative gene functions, results from
BLAST analysis, SNP and Indel information from different diploid and tetraploid potato
genotypes, publication references, and links to other public databases like NCBI or
SGN (see below) for example
SGMD http://psi081.ba.ars.usda.gov/SGMD/default.htm Soybean genomics and microarray database
SoyGD http://soybeangenome.siu.edu The Soybean Genome Database (SoyGD) genome browser integrates the publicly
available physical map, BAC sequence database and genetic map-associated
genomic data
TED http://ted.bti.cornell.edu Tomato expression database
TIGR Maize database http://maize.tigr.org A repository of publicly available maize genomic sequences
TomatEST DB http://biosrv.cab.unina.it/tomatestdb/index2.php A secondary database integrating EST/cDNA sequence information from
different libraries of multiple tomato species collected from dbEST
Soybean Genome http://www.soybeangenome.org Dedicated to the sharing and dissemination of public information on all aspects of
soybean genomics and the application of genome information to soybean
BarleyBase http://www.plexdb.org/plex.php?database=Barley BarleyBase is a MIAME-compliant and Plant Ontology enhanced
expression database for plant microarray data
Dendrome http://dendrome.ucdavis.edu Dendrome is a collection of forest tree genome databases and other forest genetic
information resources for the international forest genetics community
TropGENE http://tropgenedb.cirad.fr A database that manages genetic and genomic information about tropical crops studied
by the Agricultural Research Centre for International Development (known by its
French acronym, CIRAD), including banana, cocoa, coconut, coffee, cotton, oil palm,
rice, rubber tree and sugarcane
Cotton http://www.cottondb.org A database that contains genomic, genetic and taxonomic information for cotton
(Gossypium spp.). It serves both as an archival database and as a dynamic database
which incorporates new data and user resources
CyanoBase http://bacteria.kazusa.or.jp/cyano CyanoBase provides an easy way of accessing the sequences and all-inclusive
annotation data on the structures of the cyanobacterial genomes
BeanGenes http://beangenes.cws.ndsu.nodak.edu A plant genome database currently containing information relevant to Phaseolus and
Vigna species
SGN http://sgn.cornell.edu The SOL Genomics Network (SGN) is a Clade Oriented Database (COD) containing
genomic, genetic and taxonomic information for species in the Euasterid clade,
including the families Solanaceae (e.g. tomato, potato, eggplant, pepper, petunia)
and Rubiaceae (coffee)
RAPESEED http://rapeseed.plantsignal.cn Shanghai RAPESEED database contains information collected on ESTs, full-length
cDNA, unique serial analysis of gene expression (SAGE) tags, and EMS mutants for
Brassica napus
ICIS http://www.icis.cgiar.org A database system that provides integrated management of global information on crop
improvement and management both for individual crops and for farming systems
approach is to develop databases and tools that need minimum maintenance or that can be upgraded or
updated automatically. Another way is to develop a universal database and informatics tool package
for information-driven plant breeding, which needs a worldwide collaboration through a global
scientific programme in a way similar to the human genome sequencing project.

Developing a universal database or a database of all databases would require a universal language
that can be shared across all plant species. The Gene Ontology and Plant Ontology projects represent
a good beginning for such an effort. Another universal language is also needed that can be used for
communications among breeders, database curators, bioinformaticians, molecular biologists and tool
developers. Breeders should be major players rather than observers in the development of such a
universal database or language.
15
Decision Support Tools
[Fig. 15.1: flowchart not reproduced. Recoverable box labels: Functional analysis; Differential
expressed genes between genetic stocks (chips and arrays); Reverse genetics; Annotation through
comparison with model genome; Forward genetics; Detailed expressional analysis (RT-PCR); Association
mapping; Linkage mapping; Candidate genes; Candidate gene-based genomics technologies.]
Fig. 15.1. Flowchart for molecular breeding approaches and outputs. Various molecular breeding
approaches discussed in this book are summarized, including forward and reverse genetics approaches
and their associated breeding outputs. Decision support tools may be needed in each step. G × E,
genotype-by-environment interaction.
linkage analysis with various software listed in alphabetic order
(http://linkage.rockefeller.edu/soft/list.html).

This chapter provides an overview of key decision support tools that are needed to support molecular
breeding programmes, including germplasm evaluation, breeding population management,
genotype-by-environment interaction (GEI), genetic map construction, marker–trait linkage and
association analysis, MAS and breeding system design and simulation. Plant variety protection and
breeding information management are discussed in Chapters 13 and 14, respectively.

15.1 Germplasm and Breeding Population Management and Evaluation

Germplasm collections and breeding populations are basic materials required in crop improvement.
Decision support tools are needed to manage and evaluate crop genetic resources and breeding
materials, including genetic diversity and variation analysis, population structure evaluation and,
for hybrid crops, use of the genetic diversity to define heterotic groups and predict hybrid
performance.

15.1.1 Germplasm management and evaluation

Genetic resources provide the foundation of any plant breeding programme. Efficient germplasm
utilization requires well-founded sampling strategies. Genetic diversity analysis and its
relationship with functional variation of the target trait is the fundamental basis of germplasm
evaluation (Chapter 5). Novel alleles and new genes from both cultivated and wild relatives provide
the engine
improvement has led to a number of bioinformatics projects developing new tools to improve the power
and scope of such analyses. Ambiguous germplasm identification, difficulty in tracing pedigree
information and lack of integration of databases across genetic resources, characterization,
evaluation and utilization have been identified as the major constraints to developing knowledge-led
germplasm enhancement programmes.

Visualization tools enable us to simultaneously view large quantities of these data and to identify
underlying patterns in our data sets. We also need analytical tools to help search for association
between the target trait and individual markers or marker haplotypes and to look for patterns of
genetic diversity within our germplasm collections. GENE-MINE (http://www.gene-mine.org; Davenport
et al., 2004) was developed to bring together experts in database development, data querying and
visualization, quantitative methods and computational methods, to develop novel tools for the
analysis of germplasm collections characterized by molecular markers. Within the GENE-MINE project,
a generic information system was developed for studying the relationships between large-scale
collections databases from genebank material such as molecular marker data, trait phenotype,
passport and environmental data. The system uses a generic model for associating properties, such
as traits, genetic data and molecular data, with accession data. Users are able to make queries
using terms from germplasm query language (GQL), an extension of structured query language. GQL
allows the definition of specialist query terms that are not held within the database. For germplasm
analysis, these may include pedigree terms such as grandparent or ancestor, geographical terms such
as neighbouring country and marker terms such as haplotype.

Graphical tools for germplasm analysis that are considered to be essential to the GENE-MINE and
other similar tools include: (i) a geographical tool that could show the origin of accessions and
the distribution of genetic diversity; (ii) a haplotype tool showing genotype information; (iii) a
graph drawing tool that could show both pedigrees and phylogenetic trees or networks (graphs
containing closed loops, which can be used to represent genetic exchange between organisms); (iv) a
map tool showing the distribution of genetic markers on the relevant linkage maps; and (v) a
plotting tool that could show scatter plots of, for example, diversity distances between pairs of
accessions and principal components (Davenport et al., 2004).

A high-throughput platform for identifying single feature polymorphisms (SFPs) in complex genomes
has been developed by Borevitz et al. (2003). This is based on hybridizing Arabidopsis genomic DNA
against an RNA expression GeneChip. Their informatics analysis involved development of analytical
tools to identify 4000 SFPs by comparing the reference ecotype Columbia against Landsberg erecta. A
linear clustering algorithm enabled identification of SFPs representing potential deletions in 111
transposons, disease resistance genes and genes involved in secondary metabolism, at 5% error rate.
In crop plants, a genome-wide rice DNA polymorphism database has been constructed based on the
genomic sequences from two subspecies, indica (93-11) and japonica (Nipponbare). This database
contains 1,703,176 single nucleotide polymorphisms (SNPs) and 479,406 insertion/deletions (Indels),
approximately one SNP every 268 bp and one Indel every 953 bp in the rice genome (Shen et al., 2004).

Several commercialized or freely available software packages, such as STATISTICA, JMP, SAS, NTSYS,
GENEFLOW, STRUCTURE and POWERMARKER, can be used for germplasm evaluation, including principal
component or coordinate analysis to identify distinct groups or populations, and cluster or
structure analysis to find population structure. STRUCTURE, developed by Pritchard et al. (2000a),
uses multi-locus genotype data to investigate population structure, which can be used to infer the
presence of distinct populations, assign individuals to populations, study hybrid zones, identify
migrants and admixed individuals and estimate population allele frequencies in situations where
many individuals are migrants
or admixed. It can be applied to most of the commonly used genetic markers.

POWERMARKER (http://www.powermarker.net), as a software package to perform statistical analysis of
marker data collected from a set of germplasm accessions, delivers a data-driven, integrated
analysis environment (IAE) for marker data. The IAE integrates the data management, analysis and
visualization in a user-friendly graphic interface. It accelerates the analysis process and enables
users to maintain data integrity throughout the process. POWERMARKER handles a variety of data from
most of the commonly used genetic markers including simple sequence repeat (SSR), SNP and
restriction fragment length polymorphism (RFLP). The results can be exported as frequency, distance
and tree.

Various data analyses can be performed by POWERMARKER. Its summary statistics include basic
statistics, allele and genotype frequencies, haplotype frequencies, Hardy–Weinberg disequilibrium,
two-locus linkage disequilibrium and multi-locus linkage disequilibrium. Structure analysis includes
population differentiation test, classic F-statistics, population-specific F-statistics and
co-ancestry matrix. In phylogenetic analysis, tree construction can be made after computing
frequencies and frequency-based distances with bootstrapping implemented. Association analysis can
be done through a single-locus case control test, single-locus F-test and haplotype trend
regression.

There are several software packages for treatment and analysis of data collected for germplasm
accessions. GRAPHICAL GENOTYPES (GGT) software, developed by van Berloo (1999), allows the user to
transform molecular marker data into simple colourful chromosome drawings. Besides graphical
representation, GGT can also be used for selection or filtering of marker data. POPDIST calculates a
number of different genetic identities, phylogeny reconstructing measures and distance
reconstructing measures (http://genetics.agrsci.dk/bg/popgen/). ADEGENET is a package dedicated to
the handling of molecular marker data for multivariate analysis
(http://pbil.univ-lyon1.fr/software/adegenet/). This package is related to ADE4, an R package for
multivariate analysis, graphics, phylogeny and spatial analysis.

The exploration of DNA sequence variation for making inferences on evolutionary processes in
populations has become increasingly important recently and requires the coordinated implementation
of a Suite of Nucleotide Analysis Programs (SNAP;
http://www.cals.ncsu.edu/plantpath/people/faculty/carbone/snap.html), each bound by specific
assumptions and limitations. A workbench tool was developed to make existing population genetic
software more accessible and to facilitate the integration of new tools for analysing patterns of
DNA sequence variation, within a phylogenetic context. Collectively, SNAP tools can serve as a
bridge between theoretical and applied population genetic analysis (Aylor et al., 2006).

15.1.2 Breeding population management

Decision support tools for the management of breeding populations are needed to assist in the
choice of parental lines, types of crosses and the nature of the breeding system. Computational
tools may also assist in the establishment and maintenance of heterotic groups, selection of lines
for creation of a synthetic cultivar, prediction of progeny and hybrid performance, and monitoring
of genomic profiles during population improvement.

Establishing heterotic patterns

Generating highly heterotic hybrids is highly dependent on having sufficient genetic diversity in
the germplasm pool of potential parents. However, it is still not possible in many crops to predict
the level of hybrid vigour from analysis of parental lines. For example, commercial maize hybrids
are typically generated from crosses between inbreds from complementary heterotic groups. Therefore,
construction or development of heterotic groups has been one of the key strategies in hybrid
breeding for many crops. However, moving to a more definitive
system to predict which genotypes in each heterotic group should be crossed to maximize
heterosis is still not possible in many crops.

Genotyping parental lines on a genome-wide scale, especially when gene-based markers are
available, may provide an opportunity for establishing parent-hybrid performance relationships
at the molecular level. Genome-wide heterozygosity and specific combinations of alleles
(linkats) may be useful determinants in some crops for maximizing heterosis and hybrid vigour.
Melchinger and Gumber (1998) suggested a multi-stage procedure to identify heterotic groups
(Chapter 9). Determining heterotic patterns is a continual process, each cycle of which consists
of three steps: (i) cluster analysis to identify broad heterotic groups; (ii) combining ability
and heterosis analysis to define the heterotic pattern; and (iii) updating and maintaining
heterotic groups. Tools for heterotic group identification are usually the same as those that
have been used in germplasm classification and grouping.

Predicting hybrid performance

A successful hybrid development process depends on a full understanding of the parental
genotypes and the consequences of their genetic combinations and interactions in the hybrid.
Hybrid breeding includes two major procedures: breeding parental lines and selection of the best
combinations of those parental lines for hybrid production. These procedures involve a large
amount of work for field evaluation, testcrossing and progeny tests. Breeders continually have
to decide which experimental single crosses to test, which advanced hybrids to recommend for
further testing or commercialization and which inbred parents to cross to form new base
populations for inbred/population development (Bernardo, 1999). As a result, large-scale
testcrossing is required for all hybrid-related inbred development. Testcrossing might be
carried out at many stages in the breeding process, often beginning from the very first
generations. This 'trying all to find the best' process consumes a large proportion of the
breeding effort but there is currently no alternative, as hybrid performance is highly
unpredictable in most crops. Therefore, predicting hybrid performance has always been a primary
objective in all hybrid-breeding programmes.

Methods for predicting the performance of single crosses would greatly enhance the efficiency
of hybrid breeding programmes. Development of a reliable method for predicting hybrid
performance or heterosis without generating and testing hundreds or thousands of single cross
combinations has been the goal of numerous studies using marker data and combinations of marker
and phenotypic data, particularly in maize and rice. Considering that hybrid performance must be
governed by many genes, genotyping parental lines on a genome-wide scale, especially when
gene-based markers are used, provides an opportunity for establishing parent-hybrid performance
relationships at the molecular level. Genome-wide heterozygosity and allele combination analysis
may provide some clues for breeding more heterotic and vigorous hybrids. Therefore, using
parental genotyping may reduce the required level of testcross-based phenotyping analysis.

The best linear unbiased prediction (BLUP) procedure has been used for decades for evaluating
the genetic merit of animals, especially dairy cattle. Intrapopulation, additive genetic models
have traditionally been used for BLUP in animal breeding (Henderson, 1975). Bernardo (1994,
1996) used BLUP in maize breeding with interpopulation genetic models that involve both general
combining ability and specific combining ability, and found that BLUP is useful for routine
prediction of single-cross performance. The predicted performance of single crosses may
subsequently be used to predict the performance of F2 tester combinations, three-way crosses, or
double crosses. Along with the pedigree relationship, the BLUP method can use trait data, or
both trait and marker data, for prediction.
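The BLUP idea described above can be illustrated with a deliberately small sketch. It is not
Bernardo's model: it assumes that lines within each heterotic group are unrelated (so no pedigree
or marker relationships are used), that the ratio of residual to general combining ability (GCA)
variance, here called lam, is known and the same for both groups, and that specific combining
ability is left in the residual. The function names, line names and yield values are hypothetical
and chosen only for illustration. Under these assumptions, Henderson's mixed model equations
reduce to a small linear system that is solved directly.

```python
# Toy BLUP of general combining ability (GCA) for single crosses.
# Assumptions (not from the text): lines within a group are unrelated,
# lam = sigma_e^2 / sigma_gca^2 is known and equal for both groups,
# and specific combining ability is absorbed into the residual.

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def blup_gca(records, lam):
    """Fit y = mu + gca1 + gca2 + e with the mixed model equations.

    records: iterable of (group1_line, group2_line, phenotype).
    Returns (mu, gca1, gca2); gca1 and gca2 map line names to effects.
    """
    records = list(records)
    keys = [("mu", None)]
    keys += sorted({("g1", r[0]) for r in records})
    keys += sorted({("g2", r[1]) for r in records})
    idx = {k: i for i, k in enumerate(keys)}
    n = len(keys)
    lhs = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for p1, p2, y in records:
        cols = [0, idx[("g1", p1)], idx[("g2", p2)]]
        for i in cols:
            rhs[i] += y
            for j in cols:
                lhs[i][j] += 1.0
    # Shrinkage: add lam to the diagonal of every random (GCA) effect.
    for i in range(1, n):
        lhs[i][i] += lam
    sol = solve(lhs, rhs)
    gca1 = {k[1]: sol[i] for k, i in idx.items() if k[0] == "g1"}
    gca2 = {k[1]: sol[i] for k, i in idx.items() if k[0] == "g2"}
    return sol[0], gca1, gca2


def predict_cross(mu, gca1, gca2, p1, p2):
    """Predicted performance of the (possibly untested) cross p1 x p2."""
    return mu + gca1[p1] + gca2[p2]


# Three tested single crosses; the cross A2 x B2 has never been grown.
tested = [("A1", "B1", 10.0), ("A1", "B2", 8.0), ("A2", "B1", 9.0)]
mu, gca1, gca2 = blup_gca(tested, lam=1.0)
untested = predict_cross(mu, gca1, gca2, "A2", "B2")
```

The untested cross is predicted from the shrunken GCA estimates of its two parents. In a
realistic application, the identity assumption would be replaced by relationships among lines
(pedigree-based, or pedigree plus markers, as the text notes), and the variance components would
be estimated rather than fixed.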
In some specific cases within a breeding programme, tools are needed for selective genotyping
and pooled DNA analysis as described in Chapter 7 and by Xu et al. (2008). GENEPOOL
(http://genepool.tgen.org/) is such a software package that provides analytical tools for the
detection of shifts in relative allele frequency between pooled genomic DNA from cases and
controls using SNP-based genotyping microarrays. GENEPOOL supports genotyping platforms from
Affymetrix and Illumina (Pearson et al., 2007). Another package is PDA, POOLED DNA ANALYSER, a
tool for analysis of pooled DNA data
(http://www.ibms.sinica.edu.tw/csjfann/first%20flow/programlist.htm; Yang, H.-C. et al., 2006b).

In addition to the tools for germplasm and breeding population management and evaluation
described above, decision support tools are needed for intellectual property rights and plant
variety protection. Chapter 13 provides a section on how molecular markers can be used for this
purpose.

15.2 Genetic Mapping and Marker-Trait Association Analysis

Construction of genetic maps using molecular markers (Chapter 2) and use of these maps in
marker-trait association analysis (Chapters 6 and 7) are two prerequisite steps required for
MAS. There is a large number of methodologies and tools currently available for various types
of populations and markers. Only some of these tools are discussed in this section, and more
are expected to be developed for new mapping strategies, new markers and new types of
populations.

15.2.1 Genetic map construction

Genetic maps can be constructed using segregating populations of different types for species
with different levels of ploidy as described in Chapter 2. The first and most frequently used
software for map construction is MAPMAKER/EXP, which was developed by the Whitehead Institute
(Lander et al., 1987). Almost all molecular maps based on the first generation of molecular
markers, RFLPs, were constructed using this software. As an alternative, MAP MANAGER CLASSIC is
a graphic, interactive program to map Mendelian loci using intercrosses with co-dominant
markers, backcrosses or recombinant inbred lines (RILs) in experimental plants or animals
(Manly, 1993; http://www.mapmanager.org/mapmgr.html).

Some special statistical modifications may be needed to construct a map using markers with
severe distortion of segregation. MAPDISTO (web/ftp: http://mapdisto.free.fr/) is such a program
for mapping genetic markers in case of segregation distortion using experimental segregating
populations such as backcross, doubled haploid (DH) and RIL populations. It can: (i) compute and
draw genetic maps through a graphical interface; and (ii) facilitate the analysis of marker data
showing segregation distortion due to differential viability of gametes or zygotes.

Maps or data from multiple populations derived from different crosses can be combined into
single or consensus maps through joint mapping. JOINMAP is a software package for construction
of genetic linkage maps for several types of mapping populations: BC1, F2, RIL, F1- and
F2-derived DH and out-breeder full-sib family (http://www.kyazma.nl/index.php/mc.JoinMap/). It
can combine (join) data derived from several sources into an integrated map, with several other
functions including linkage group determination, automatic phase determination for out-breeder
full-sib family, several diagnostics and map charts (van Ooijen and Voorrips, 2001).

A software package with comparative function is CMAP, which was developed as a web-based tool
to allow users to view comparisons of genetic and physical maps. The package also includes
tools for curating map data (http://www.gmod.org/cmap; Ware et al., 2002).

15.2.2 Linkage-based QTL mapping

Demonstrated linkages/associations between target traits/genes and molecular markers
are based on genetic linkage and linkage-disequilibrium (LD) mapping experiments (Chapters 6
and 7). Decision support tools required for genotype-phenotype association include: (i)
statistical methods and tools to establish, validate and compare genotype-phenotype
associations through linkage mapping, LD or association mapping and in silico mapping, using
single populations, multiple populations or all genetic resources with information available
from multiple trials across years, seasons and locations; (ii) statistical methods and tools
for identification of genetic background effects, quantitative trait loci (QTL) alleles at
multiple loci and multiple alleles at a locus; (iii) tools facilitating the process from linked
markers to functional markers and candidate genes; and (iv) tools facilitating management of
genetic populations, maps and related marker and phenotypic data.

There are many commercial or freely available software packages for establishing association
between marker genotypes and trait phenotypes. The most commonly used are QTL CARTOGRAPHER,
MAPQTL, PLABQTL and QGENE. All of these only handle bi-allelic populations, while MCQTL (Jourjon
et al., 2005) also performs QTL mapping in multi-allelic situations, including bi-parental
populations made from segregating parents, or sets of bi-parental, bi-allelic populations. The
most frequently used software during the 1980s and 1990s was MAPMAKER/QTL, which is a sister
software package to MAPMAKER/EXP, developed by Lander et al. (1987)
(http://www-genome.wi.mit.edu/genome_software). This software is based on maximum likelihood
estimation of linkage between marker and phenotype using interval mapping, which deals with
simple QTL and several standard populations. Another early software package, MAPL (MAPping and
QTL analysis; http://lbm.ab.a.u-tokyo.ac.jp/software.html; Ukai et al., 1995), allows a user to
obtain segregation ratios, linkage tests and recombination values, to group markers, to order
markers by metric multi-dimensional scaling, to draw a map and graphical genotypes, and to map
QTL through interval mapping and analysis of variance (ANOVA).

A QTL mapping package in wide current use is QTL CARTOGRAPHER
(http://statgen.ncsu.edu/qtlcart/cartographer.html), which implements several statistical
methods using multiple markers simultaneously including composite interval mapping and multiple
composite interval mapping. Interaction between identified QTL can also be estimated. PLABQTL
uses composite interval mapping with many functions similar to QTL CARTOGRAPHER. QTL can be
localized and characterized in populations derived from a biparental cross by selfing or
production of DHs. Simple and composite interval mapping are performed using a fast multiple
regression procedure. In addition to the functions found in many other software packages, it
can be used for QTL × environment interaction analysis (Utz and Melchinger, 1996).

QGENE (http://www.qgene.org/) is intended for doing comparative analyses of QTL mapping data
sets in computationally efficient ways that are of maximum use to analysts. It is also written
with a plug-in architecture for ready extensibility. QGENE was begun in about 1991 as a map and
population simulation program, to which QTL analyses were added on. Recently QGENE has been
rewritten in the Java language, allowing it to run on any computer operating system. It offers
most conventional QTL-mapping methods and allows their side-by-side comparison. Its interface
can be rendered in any human language desired; the conversion requires only that the interested
user writes a translation file. QGENE can also be used for trait and QTL analysis, permutation
testing and simulation of populations and traits.

Several software packages can be used for constructing linkage maps in out-crossing plant
species. ONEMAP provides such an environment using full-sib families derived from two outbred
(non-inbreeding) parent plants (http://www.ciagri.usp.br/aafgarci/OneMap/; Garcia et al., 2006).
Another is MAPQTL (software for the calculation of QTL positions on genetic maps,
http://www.mapqtl.nl), which can be used for several types of mapping populations including
BC1, F2, RILs, (doubled) haploids and full-sib family of out-breeders. It can
be used for QTL mapping through interval mapping, composite interval mapping and non-parametric
mapping, with functions for automatic cofactor selection and permutation test.

A few mapping software programs consider epistasis in QTL mapping. EPISTACY is an SAS program
designed to test all possible two-locus combinations for epistatic (interaction) effects on a
quantitative trait. The program is really an SAS program template that users must modify to
suit their own data sets. In the simplest cases, users will need only to change the names of
the files containing their data. However, the program uses least squares methods and does not
employ interval mapping methods (Holland, 1998).

Bayesian QTL mapping has received a lot of attention in recent years, with several software
packages developed. For example, BQTL, Bayesian Quantitative Trait Locus mapping, was developed
for the mapping of genetic traits from line crosses and RILs (http://hacuna.ucsd.edu/bqtl;
Borevitz et al., 2002). It performs: (i) maximum likelihood estimation of multi-gene models;
(ii) Bayesian estimation of multi-gene models via Laplace Approximations; and (iii) interval
mapping and composite interval mapping of genetic loci. BLADE, Bayesian LinkAge DisEquilibrium
mapping, was developed for Bayesian analysis of haplotypes for LD mapping
(http://www.people.fas.harvard.edu/junliu/TechRept/03folder/; Liu et al., 2001; Lu, X. et al.,
2003). MULTIMAPPER is a Bayesian QTL mapping software for analysing backcross, DH and F2 data
from designed crossing experiments of inbred lines (Martinez et al., 2005).
MULTIMAPPER/OUTBRED extended this to the populations derived from out-bred lines
(http://www.rni.helsinki.fi/mjs/).

Several mapping software packages were developed for QTL mapping in specific situations. MCQTL
was developed for simultaneous QTL mapping in multiple crosses and populations
(http://www.genoplante.com; Jourjon et al., 2005). It allows the analysis of the usual
populations derived from inbred lines and can link the families by assuming that the QTL
locations are the same in all of them. Moreover, a diallel modelling of the QTL effects is
allowed when using multiple related families. MAPPOP was developed for selective mapping and
bin mapping by choosing good samples from mapping populations and for locating new markers on
pre-existing maps (Vision et al., 2000). In addition, QTLNETWORK was developed for mapping and
visualizing the genetic architecture underlying complex traits for experimental populations
derived from a cross between two inbred lines (http://ibi.zju.edu.cn/software/qtlnetwork; Yang
et al., 2008).

As web-based tools become increasingly important, web-based QTL analytical tools are becoming
available. One example is WEBQTL, an interactive web site for exploring the genetic modulation
of thousands of phenotypes gathered over a 30-year period by hundreds of investigators using
reference panels of recombinant inbred strains of mice (http://www.webqtl.org/search.html).
WEBQTL includes dense error-checked genetic maps, as well as extensive gene expression data
sets (Affymetrix) acquired across more than 35 strains of mice. As a web-based user-friendly
package to map QTL in out-bred populations, QTL EXPRESS (http://qtl.cap.ed.ac.uk; Seaton et
al., 2002) was developed for line crosses, half-sib families, nuclear families and sib-pairs.
It provides two options for QTL significance tests: permutation tests to determine empirical
significance levels and bootstrapping to estimate empirical confidence intervals of QTL
locations. Fixed effects/covariates can be fitted and models may include single or multiple
QTL.

15.2.3 eQTL mapping

With the availability of whole genome sequences in many plant species, linkage analysis,
positional cloning and microarrays are gradually becoming powerful tools for revealing the links
between phenotype and genotype or genes. To display the myriad of relationships between
eTraits, markers and genes, we need a convenient bioinformatics
software/midas; Gaunt et al., 2006) and PEDGENIE
(http://bioinformatics.med.utah.edu/PedGenie/index.html; Allen-Brady et al., 2006), which was
developed as a general-purpose tool to analyse association and transmission disequilibrium
(TDT) between genetic markers and traits in families of arbitrary size and structure. With
PEDGENIE, any size pedigree may be incorporated into this tool, from independent individuals to
large genealogies. Independent individuals and families may be analysed together.

GENERECON (http://www.daimi.au.dk/mailund/GeneRecon/) is another software package for LD
mapping using coalescent theory. It is based on a Bayesian Markov-chain Monte Carlo method for
fine-scale LD mapping using high-density marker maps in animals. GENERECON explicitly models
the genealogy of a sample of the case chromosomes in the vicinity of a disease locus. Given
case and control data in the form of genotype or haplotype information, it estimates a number
of parameters, most importantly, the disease position (Mailund et al., 2006).

Genome-wide association mapping

Genome-wide association (GWA) studies are now being widely undertaken to find the link between
genetic variations and common diseases in humans and agronomic traits in plants. Ideally, a
well-powered GWA study will involve the measurement of hundreds of thousands of SNPs in
thousands of individuals. The sheer volume of data generated by these experiments creates very
high analytical demands. There are a number of important steps during the analysis of such
data, many of which may become bottlenecks. The data need to be imported and reviewed to
perform initial quality control before proceeding to LD testing. Evaluation of results may
involve further statistical analysis, such as permutation testing, or further quality control
of associated markers, for example, reviewing raw genotyping intensities. Finally, significant
associations need to be prioritized using functional and biological interpretation methods,
browsing available biological annotation, pathway information and patterns of LD (Pettersson et
al., 2008). GOLDSURFER2 (GS2), a comprehensive tool for the analysis and visualization of GWA
studies, was developed by Pettersson et al. (2008). GS2 is an interactive and user-friendly
graphical application that can be used in all steps in GWA projects from initial data quality
control and analysis to biological evaluation and validation of results. The program is
implemented in Java and can be used on all platforms. With GS2, very large data sets (e.g. 500K
markers and 5000 samples) can be quality assessed, rapidly analysed and integrated with genomic
sequence information. Candidate SNPs can be selected and functionally evaluated.

Other tools that are developed for GWA studies include GENOMIZER (a platform-independent Java
program for the analysis of GWA experiments; http://www.ikmb.uni-kiel.de/genomizer), PLINK (a
whole-genome LD analysis toolset; http://pngu.mgh.harvard.edu/purcell/plink; Purcell et al.,
2007), MAPBUILDER (for chromosome-wide LD mapping; http://bios.ugr.es/BMapBuilder; Abad-Grau et
al., 2006) and power Calculator for Association with Two Stage design (CATS), which calculates
the power and other useful quantities for two-stage GWA studies
(http://www.sph.umich.edu/csg/abecasis/CaTS) (Skol et al., 2006).

The results of large GWA studies are being deposited in public databases with increasing
frequency. But the currently available software to analyse and interpret GWA data sets can be
difficult to use (Buckingham, 2008). User-friendly software is urgently needed to provide new
ways of making GWA data sets easy to explore and share among researchers and to design analysis
packages that deal with the increasing computational demands posed by these data sets.

Integrated haplotype and LD analysis

The analysis of large amounts of SNP data creates difficulties for the analysis of haplotypes
and their association to traits of interest. Commonly, fairly simple methods, such as two- or
three-SNP sliding windows,
are used to create haplotypes across large regions, but these may be of limited value when
adjacent SNPs are in strong LD and provide redundant information. Genetic analysis of SNP data
and haplotypes has received more and more attention recently and various software packages have
been developed for haplotype analysis; these are sometimes integrated with LD analysis.

HAPLOBUILD (http://snp.bumc.bu.edu/modules.php?name=HaploBuild) was created for constructing
and testing haplotypes for SNPs in close physical proximity to one another but which are not
necessarily contiguous (Laramie et al., 2007). The number of SNPs contained in the haplotype is
not restricted, thereby permitting the evaluation of complex haplotype structures.

HAPLOVIEW (http://www.broad.mit.edu/personal/jcbarret/haploview) was designed to simplify and
expedite the process of haplotype analysis by providing a common interface to several tasks
relating to such analyses. HAPLOVIEW currently allows users to examine block structures,
generate haplotypes in these blocks, run association tests and save the data in a number of
formats. All functionalities are highly customizable (Barrett et al., 2005).

HAPSTAT (http://www.bios.unc.edu/lin/hapstat/) is a user-friendly software interface for the
statistical analysis of haplotype-disease association. HAPSTAT allows the user to estimate or
test haplotype effects and haplotype-environment interactions by maximizing the (observed-data)
likelihood that properly accounts for phase uncertainty and study design. The current version
considers cross-sectional, case-control and cohort studies.

Other related software packages include:

DPPH (Direct method for Perfect Phylogeny Haplotyping;
http://wwwc-sif.cs.ucdavis.edu/gusfield/dpph.html; Bafna et al., 2003);
EHAP (detecting association between haplotypes and phenotypes;
http://wpicr.wpic.pitt.edu/WPICCompGen/ehap__v1.htm);
HAPLOBLOCK, a software package which provides an integrated approach to haplotype block
identification, haplotype resolution and LD mapping, suitable for high-density phased or
unphased SNP data (http://bioinfo.cs.technion.ac.il/haploblock);
HAPLOT, a simple program for graphical presentation of haplotype block structures, tagSNP
selection and SNP variation (Gu et al., 2005);
HAPLOREC, population-based haplotyping software (Eronen et al., 2004); and
HAP, a haplotype analysis system which is aimed at helping to perform disease association
studies and a phasing method which is based on the assumption of imperfect phylogeny
(http://research.calit2.net/hap).

15.2.5 Genotype-by-environment interaction analysis

To better separate the genetic effects from the environmental effects and their interaction,
statistical methods are of paramount importance in a traditional as well as molecular breeding
programme (Chapter 10). These methods become even more essential when developing MAS systems
for abiotic stress tolerance where germplasm must be tested under drought or low nitrogen
conditions, for example. Under such stress conditions the soil where the plants are grown
becomes extremely variable and patchy so that the separation of genetic effects from
environmental effects is much more difficult than under normal conditions.

Various processes contribute to the characterization of a genotype-environment system (Cooper
et al., 1999). There is a great need for integrated decision support tools for
genotype-by-environment interaction (GEI) analysis: (i) developing field experimental designs;
defining the target population of environments (TPE) and genotypes; (ii) assessing GEI for
various field conditions and determining subsets of genotypes and sites with negligible
crossover interaction effects from which subgroups of sites and genotypes with similar response
can be identified in order to maximize responses
to selection; (iii) mapping QTL and QTL-by-environment interaction (QEI) of component traits
important for the target traits; (iv) developing a selection index for phenotypic as well as
molecular marker data in order to select the best genotypes to be used in the next cycle of
selection; (v) incorporating environmental and/or genotypic variables into statistical models
to explain the causes of GEI (physical and chemical soil conditions may be of importance under
drought and may be the main cause of GEI); (vi) studying genetic diversity of crop genotypes
associated with the target traits; (vii) performing LD mapping of those traits; and (viii)
studying expression of genes under target conditions from microarray experiments.

Decision support tools are required for classifying the most important testing environments
into mega-environments that will then define the appropriate TPE. Based on these environmental
classifications, breeding strategies can be developed and established for a more efficient and
rapid realization of genetic gains targeting those specific environments. Furthermore, the
incorporation of climatic variables (attributes of environments) and molecular markers
(attributes of genotypes such as QTL) into statistical models facilitates the identification of
the causes of GEI and therefore helps explain QEI. This allows interpreting, understanding and
exploiting GEI and QEI and it allows identification of the regions of the chromosomes affecting
a trait that are highly affected by external climatic conditions. This also facilitates
grouping of environments with negligible genetic crossover effects as well as clustering
genotypes with no genotypic crossover GEI.

Podlich and Cooper (1998) developed the QUGENE software for carrying out quantitative genetic
analyses of GEI in crop breeding and this has become an increasingly widely utilized decision
support tool in breeding programmes. More recently, a statistical model developed by Crossa et
al. (2006) incorporates pedigree information (through the coefficient of parentage) for test
genotypes when modelling GEI. This model can be used to perform more efficient LD mapping
studies as well as in silico QTL detection. Among various models, mixed linear models are
fundamental in the process of in silico QTL linkage and LD mapping. These decision support
tools are being further refined through the integration of whole-plant physiology models.

15.2.6 Comparative mapping and consensus maps

In the past few decades, a wealth of genomic data has been produced in a wide variety of
species using a diverse array of functional and molecular marker approaches. In order to unlock
the full potential of the information contained in these independent experiments, researchers
need efficient and intuitive means to identify common genomic regions and genes involved in the
expression of target phenotypic traits across diverse conditions. Experimenters who seek to
apply many diverse studies on QTL face complex problems in summarizing, interrelating and
integrating them. Tools for QTL consensus map building offer extensive analysis or
meta-analysis of data prior to assigning a consensus QTL location for a trait (Sawkins et al.,
2004; Arcade et al., 2004).

CMTV (Comparative Map and Trait Viewer; Sawkins et al., 2004) was developed as a software
component to help serve as an intuitive and extensible framework for the integration of various
kinds of genomic data. The software components use the ISYS (Integrated SYStem) integration
platform developed by the National Center for Genome Resources (Siepel et al., 2001) to access
and visualize map data and related information such as germplasm pedigree relationships. CMTV
is based on algorithmically determining correspondences between sets of objects on multiple
genomic maps, and can display syntenic regions across taxa, combine maps from separate
experiments into a consensus map, or project data from different maps into a common coordinate
framework. As such an example, Schaeffer et al. (2006) used a strategy for consensus QTL maps
that leverages the highly
et al., 2003b) was developed as a genetic data management and analysis system to advance whole
genome linkage, LD and eQTL studies. Designed for biologists, statistical geneticists and
investigators responsible for generating genotyping data, the Syllego system provides an
easy-to-use project workspace for organizing, analysing and sharing genotype and phenotype data
along with analysis results. It automates the data management and data formatting tasks so that
genetic analysis workflows can be streamlined using analysis methods of choice. The Syllego
system converts public and private genotype data sets and reference annotations, such as dbSNP
(http://www.ncbi.nlm.nih.gov/projects/SNP/) and HapMap (http://www.hapmap.org/), as well as
individual (sample) information into a single, consistent repository for fast, convenient
access.

15.3 Marker-assisted Selection

MAS is one of the major activities in molecular breeding (Chapters 8 and 9). It needs various
decision support tools including those for foreground and background selection and
identification of the recombinants with favourable alleles and allele combinations. However,
only a few tools are available so far for some procedures of MAS. Development of decision
support tools for fully functional MAS still faces a lot of challenges.

A huge amount of data will be generated with large-scale MAS and this set of data needs to be
analysed and also integrated with other types of data to make selection decisions in a short
time window, e.g. 4 weeks during vegetative to flowering stages, or harvest to planting the
next season. Thus, decision support tools are essential to accelerate this process while
maintaining accuracy and precision. Although many tools have been developed for assisting
germplasm evaluation, genetic mapping and MAS, they either work independently, depend on
different operating systems or require different data formats, which makes it impossible to
complete a comprehensive data analysis and make the results available to breeders for decision
making in such a short time window.

15.3.1 MAS methodologies and implementation

There are many factors that affect the efficiency of MAS. In theory, MAS is expected to be more
efficient than phenotypic selection when the heritability of a trait is low, where there is
tight linkage between the QTL and the DNA markers (Dudley, 1993; Knapp, 1998), with larger
population sizes (Moreau et al., 1998) and in earlier generations of selection before
recombinational erosion of marker-trait associations (Lee, 1995). Edwards and Page (1994)
proposed that the distance between the markers and the QTL was the single largest constraining
factor for gains from MAS. An example in Lande and Thompson (1990) demonstrated that on a single
trait the potential selection efficiency, using a combination of molecular and phenotypic
information, depends on the heritability of the trait, the proportion of additive genetic
variance associated with marker loci and the selection scheme. The relative efficiency of MAS
is greatest for traits with low heritability if a large fraction of the additive genetic
variance is associated with marker loci.

Decision support tools are required for the following procedures related to MAS: (i)
determining minimum sample size for foreground/background selection; (ii) estimation of genetic
gains (response to selection); (iii) construction of selection indices for multiple traits and
whole genome selection; (iv) estimation and graphical display of recipient genome content of
selected individuals at each generation of introgression; (v) identification of desirable
plants based on both phenotype and genotype; (vi) cost-benefit analysis; and (vii) software for
MAS and simulations (using all available information).
There has been much interest in the development of software that simulates
MAS using genetic models. Early efforts had somewhat limited value. For
example, GREGOR (Tinker and Mather, 1993) implements the basic principles, but
its interactive use and the fact that it simulates only some predefined
genetic linkage maps restrict its value for simulation of MAS in breeding
programmes.
Frisch et al. (2000) present PLABSIM, a tool for simulation of MAS programmes.
The software can be used to investigate the effect of varying population size,
marker density and positions and selection strategies on the genetic
composition of the breeding product and on the required number of marker data
points. It has the following features: (i) simulations can be made for any
diploid genome with an arbitrary number of loci at arbitrary positions on an
arbitrary number of chromosomes; (ii) the implemented reproduction schemes
include all common breeding methods; (iii) an arbitrary number of selection
steps can be combined with a selection strategy; (iv) selection can be carried
out for genotypes at defined loci, or for selection indices calculated from
allele frequencies at several loci; and (v) the simulated data can be analysed
for a broad range of genetic parameters including population size, marker
density and positions and selection strategies for the genetic composition of
the breeding product and on the required number of marker data points.
To integrate various tools into a common platform to assist their effective
deployment in crop improvement, iMAS (www.generationcp.org) is a preliminary
attempt to create a publicly available computational platform to assist the
development and application of marker-assisted breeding. iMAS currently
integrates freely available software for the journey from
phenotyping-and-genotyping of individuals to identification and application of
trait-linked markers. iMAS also provides simple-to-understand-and-use on-line
decision-support guidelines to help the user correctly operate the software
and correctly interpret the outputs.
Other MAS tools include: (i) POPMIN, a program for the numerical optimization
of population sizes in marker-assisted backcross programmes (Hospital and
Decoux, 2002); (ii) BCSIM, backcross simulation software for evaluation of
marker-assisted backcross programmes
(http://www.plantbreeding.wur.nl/UK/software_bcsim.html); and (iii) the GGT,
GRAPHICAL GENOTYPES software, allowing the user to transform molecular marker
data into simple colourful chromosome drawings (van Berloo, 1999).

15.3.2 Marker-assisted inbred and synthetic creation

For open-pollinated crops, a synthetic cultivar is developed by inter-crossing
selected clones or inbred lines, with seed production of the cultivar through
open pollination. For self-pollinated crops, a synthetic cultivar is a mix of
different inbred lines. The breeding procedures used to develop a synthetic
cultivar depend on the feasibility of developing superior inbred lines and
clones. For species such as maize, inbred lines for synthetic cultivars are
developed by the same procedures used for the development of hybrid cultivars.
For many forage crops, inbreeding depression is too severe to permit the
formation of inbred lines, but the parent can be maintained and reproduced
readily by cloning. The factors to consider in the development of a synthetic
cultivar include: (i) formation of a population; (ii) evaluation of individual
inbreds/clones per se; (iii) evaluation of the combining ability of the
inbreds/clones; (iv) evaluation of experimental synthetics; and (v)
preparation of seed for commercial use (Fehr, 1987).
Synthetic cultivars can be developed by mixing inbred lines that have been
bred by MAS or by mixing individual plants derived from any stage of MAS
(Dwivedi et al., 2007). With genotypic information available across the whole
genome for all the
selected individuals or inbred lines, support tools are needed to facilitate
developing synthetic cultivars to contain complementary genotypes, fixed
heterozygosity and the best combinations of genetic structure.

15.4 Simulation and Modelling

Along with the fast development in molecular biology and biotechnology, a
large amount of biological data becomes increasingly available for important
breeding traits, which in turn allows selection based on information from
multiple sources. As discussed in the previous sections, however, available
information has not been effectively used in crop improvement due to the lack
of appropriate tools. In this section, plant breeding through simulation and
modelling will be discussed, including utilizing the vast and diverse
information by incorporating simulation and modelling into breeding programmes
to develop and upgrade various decision support tools.

15.4.1 Importance of simulation and modelling

The accumulation of genomics information for breeding traits has made
simulation and modelling more and more practical and important, as computer
simulation can help to investigate many 'what if' crossing and selection
scenarios, allowing many scenarios to be tested in silico in a short period of
time, which in turn helps breeders make important decisions to minimize and
optimize highly resource-demanding field experiments. As the number of
published genes and QTL for various traits continues to increase, for example,
plant breeders face a challenge to determine how to best utilize this
multitude of information for crop improvement. Although quantitative genetics
provides much of the framework for the design and analysis of selection
methods used within breeding programmes (Falconer and Mackay, 1996; Lynch and
Walsh, 1998; Goldman, 2000), various assumptions are made in quantitative
genetics to render theories mathematically or statistically tractable. Some of
these assumptions can be easily tested or satisfied by certain experimental
designs; others, such as the assumptions of no linkage, no multiple alleles
and no GEI, can seldom, if ever, be met. Other assumptions, like the presence
or absence of epistasis and pleiotropy, are statistically difficult to define
and test. Computer simulation provides a tool to investigate the implications
of relaxing some of the assumptions and the effect it has on the
implementation of a breeding programme (Kempthorne, 1988). Computer simulation
provides an opportunity to lessen the impact of these assumptions by
accommodating these factors, thereby improving the validity of genetic models
for use in plant breeding. This approach would be very helpful when the
breeders want to compare breeding efficiencies from different selection
strategies, to predict the cross performance with known gene information and
to utilize efficiently identified major genes and QTL in breeding.
As agronomically important traits are significantly affected by the
environment, whole-plant physiology modelling is becoming increasingly
important for partitioning complex traits into their components and
understanding how those components interact with each other and contribute to
the overall trait expression in different environmental conditions. With a
commitment to genomic analysis of component traits, whole-plant physiology
modelling provides a critical link between molecular genetics and crop
improvement. Crop models with generic approaches to underlying physiological
processes (Wang et al., 2002) provide a means to link phenotype and genotype,
through simulation analysis, of an in silico or virtual plant (Tardieu, 2003).
In this way it is possible to dissect the physiological basis of adaptive
traits and determine their control at whole-plant level through modelling.
A plant requires information about its environment and its interaction with
that environment and uses that information to
dictate its adaptive responses that result in the plant phenotype. Significant
endeavours in the field of whole-plant modelling are now being directed at
understanding genetic regulation and aiding crop improvement (Cooper et al.,
2002a; Chapman et al., 2002, 2003; Hammer et al., 2002; Yin et al., 2003; Wang
et al., 2004; Yin, X. et al., 2004; Wang, J. et al., 2005). There are three
areas in which crop modelling could assist in assessing in silico the
multitude of options to improve the efficiency of plant breeding (Cooper
et al., 2002a): (i) characterizing environments to define the target
population of environments; (ii) assessing the value of specific putative
traits in improved plant types; and (iii) enhancing integration of molecular
genetic methodologies. Hence, plant breeders can pose questions that range
from how to better utilize field performance data to how knowledge of gene
action or function can be utilized for selection in a complex TPE.

15.4.2 Genetic models used in simulation

Multiple mathematical formalisms have been used to model genetic and, more
generally, metabolic networks. Examples include: (i) Boolean (ON/OFF)
networks; (ii) Petri (concurrent information flow) nets; (iii) S-systems
(continuous time models motivated by chemical kinetics); (iv) differential
equation models; (v) neural network models; and (vi) Bayesian networks (Welch
et al., 2004). Despite this extensive effort, little attention has focused on
predicting phenotypes of interest to plant breeders or on integrating the
effect of multiple environmental factors.
Simulation, using relatively simple genetic models, has been used for many
special studies in plant breeding (Casali and Tigchelaar, 1975; Reddy and
Comstock, 1976; van Oeveren and Stam, 1992; van Berloo and Stam, 1998; Frisch
and Melchinger, 2001). When it is used for genetic models with complex traits
involved, however, the result is uncertain. Coors (1999) summarized many of
the published recurrent selection studies for grain yield in maize. The
synopsis we can take from Coors's synthesis of published studies strongly
suggested that the realized progress from selection for this trait is
considerably lower than the predicted response. For most involved in applied
breeding this result is not surprising. However, this quantified observation
forces us to consider the possible reasons for the discrepancies between the
predictions made from classical quantitative genetic theory and the realized
responses from applied breeding.
A crop can be analysed for processes at various scales: community, population,
plant, organ, tissues, cell and downwards to molecular levels. White and
Hoogenboom (2003) identified six levels of genetic detail for simulation to
elucidate differences in plant growth and development among cultivars:

1. Generic model with no reference to species.
2. Species-specific model with no reference to genotypes.
3. Genetic differences represented by cultivar-specific parameters.
4. Genetic differences represented by specific alleles, with gene action and
gene effects presented through linear effects on model parameters.
5. Genetic differences represented by genotypes with gene action explicitly
simulated based on knowledge of regulation of gene expression and effects of
gene products.
6. Genetic differences represented by genotypes, with gene action simulated at
the level of interactions of regulators, gene products and other metabolites.

The first two levels are found in early models of crops and are still used for
models where only generic representations of species are required. Most
current crop models are at level 3. Level 4 corresponds to the approach used
in GeneGro Version 1 (White and Hoogenboom, 1996) and linear models of gene
effects, and level 5 is partially represented in the phenology routines of
GeneGro Version 2 (Hoogenboom and White, 2003) and based on knowledge of gene
action. The feasibility of level
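
As a rough illustration of level 4 in the six-level scheme of White and
Hoogenboom (2003), the sketch below computes one cultivar-specific model
parameter as a linear function of the alleles a genotype carries. The locus
names, effect sizes, baseline and the choice of parameter (days to flowering)
are hypothetical and chosen only for illustration; they are not taken from
GeneGro.

```python
# Hypothetical effects (in days) of carrying the dominant allele at each
# locus; the values are illustrative, not estimates from any real model.
BASELINE_DAYS_TO_FLOWER = 30.0
ALLELE_EFFECTS = {"locusA": 4.0, "locusB": -2.5, "locusC": 1.5}


def days_to_flower(genotype):
    """Level-4 style parameter: baseline plus a linear sum of allele effects.

    `genotype` maps each locus name to 1 (dominant allele present) or 0.
    """
    return BASELINE_DAYS_TO_FLOWER + sum(
        ALLELE_EFFECTS[locus] * genotype.get(locus, 0)
        for locus in ALLELE_EFFECTS
    )


cultivar_1 = {"locusA": 1, "locusB": 0, "locusC": 1}
cultivar_2 = {"locusA": 0, "locusB": 1, "locusC": 0}

print(days_to_flower(cultivar_1))  # 35.5
print(days_to_flower(cultivar_2))  # 27.5
```

A level 3 model would instead store the value 35.5 or 27.5 directly as a
cultivar-specific parameter, with no reference to the underlying loci.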
design, describes how different genotypes interact with environments to
produce different phenotypes (Cooper et al., 2005). Using information from
genes, core germplasm collections and cornerstone parents, when combined with
the biological characteristics and breeding objectives for the target
environments, breeding procedure and selection methods can be simulated and
optimized and desirable genotypes and the probability of breeding new
cultivars can be predicted.
Comparisons among genomes of different crop species reveal high levels of
similarity and it appears likely that models of gene action in one crop can be
extrapolated to other crops in the same botanical family (e.g. among legumes
or among cereals). However, Helentjaris and Briggs (1998) noted that efforts
to identify maize homologues for genes described in other species have proven
more difficult than originally anticipated. One problem is that a single
species may have multiple genes with similar sequences but different
functions.
In the future it will be possible to build more realistic genetic models if
advances in genomics improve our understanding of the GP relationship and GEIs
(Bernardo, 2002; Cooper et al., 2005). Conclusions on the relative merits of
breeding strategies based on simple GP models may have to be re-evaluated in
the context of an exponentially growing knowledge base. This information will
aid in determining gene number and gene effects on phenotype. In addition,
conventional plant breeding provides a wealth of information about trait
heritabilities and correlations. This information, once determined, will help
define errors, linkage and pleiotropic effects. In addition, crop
physiological models may also help fine-tune the genetic models for breeding
modelling (Reymond et al., 2003; Yin, X. et al., 2004; Hammer et al., 2005).
White and Hoogenboom (2003) discussed several practical issues in gene-based
modelling, including how to access genetic and molecular data, which species,
traits and what scale and level of detail to model, the relevance of results
from animal systems to plant biology and how to ensure effective collaboration
among crop modellers, geneticists and molecular biologists.

15.4.3 A simulation module for genetics and breeding: QULINE

Typically, breeding is done by crossing and selecting from progeny. With the
opportunity to make predictions of crop performance and to explicitly model
in silico the desired genotype × environment × breeding scheme combinations,
breeding shifts in its character. Breeders become model testers themselves
while model systems (or other information-rich systems) become useful tools
for model building. Once phenotype–genotype–environment models are verified
through explicit breeding experiments, the task is to move the models
themselves around through breeding programmes of different crop species. One
of the interesting efforts pursuing this type of paradigm shift is the
QUantitative GENEtics (QUGENE; www.pig.ag.uq.edu.au/qu-gene) system.
QUGENE is a simulation platform for quantitative analysis of genetic models,
which provides the opportunity to develop a general simulation program for
actual breeding programmes through its two-stage architecture (Podlich and
Cooper, 1998). The first stage is the engine, which has two roles: (i) to
define the genotype-by-environment (GE) system (i.e. all the genetic and
environmental information of the simulation experiment); and (ii) to generate
the starting population of individuals (base germplasm). The second stage
includes the application modules, whose role is to investigate, analyse, or
manipulate the starting population of individuals within the GE system defined
by the engine. The application module usually represents the operation of a
breeding programme. The core model within the engine can incorporate many of
the features for the architecture of traits that are revealed by the
characterization of the GE system. It includes multiple traits and QTL with
different effects, genome positional information such as that provided by
molecular
maps, epistasis within gene networks, differential gene expression, GEIs and
structure within the TPE. Cooper et al. (1999) provided an example of this
approach for comparisons between conventional phenotypic and MAS strategies.
Using QUGENE software, a breeding module was developed for sorghum by
incorporating physiological constraints and was implemented by linking QUGENE
to the Agricultural Production System Simulator (APSIM) cropping systems model
(Keating et al., 2003; http://www.apsru.gov.au). This module can be used to
simulate breeding line performance in a given environment and extrapolate the
effects of long-term selection over many breeding cycles and seasons. Another
project supported by the Generation Challenge Programme links QUGENE/APSIM
with QTL data on maize leaf growth under drought. These projects aim to
deliver modelling tools into the hands of molecular breeders and other
researchers to extend the scope and impact of their use, particularly with
respect to molecular breeding of complex traits such as drought tolerance
(Dwivedi et al., 2007).
As a QUGENE application module, QULINE was developed at the International
Maize and Wheat Improvement Center (CIMMYT) specifically for wheat-breeding
programme simulation. It is a computer tool capable of defining a range of
genetic models from simple to complex and simulating breeding processes for
developing final advanced lines. Simulation indicated that it can be used to
optimize breeding methodology and improve breeding efficiency. QULINE can be
used to integrate various genes with multiple alleles functioning within
epistatic networks and differentially interacting with the environment and
predict the outcomes from a specific cross following the application of a real
selection scheme (Wang et al., 2003, 2004). The breeding methods that can be
simulated by QULINE are mass selection, pedigree system (including single seed
descent), bulk population system, backcross breeding, top cross (or three-way
cross) breeding, DH breeding, MAS and many combinations and modifications of
these methods.
QULINE has the potential to provide a bridge between the vast amount of
biological data and breeders' queries on optimizing selection gain and
efficiency. It has been used to compare two selection strategies (Wang et al.,
2003), study the effects on selection of dominance and epistasis (Wang et al.,
2004), predict cross performance using known gene information (Wang, J.
et al., 2005) and optimize MAS to efficiently pyramid multiple genes (Kuchel
et al., 2005; Wang, J. et al., 2007).
By defining breeding strategy, QULINE translates the complicated breeding
process into a way that the computer can understand and simulate. QULINE
allows for several breeding strategies to be defined simultaneously. The
programme then starts with the same virtual crosses for all the defined
strategies at the first breeding cycle, including the same initial population,
crosses and genotype and environment systems, allowing appropriate
comparisons. A breeding strategy in QULINE is defined to include all
activities involved in an entire breeding cycle such as crossing, seed
propagation and selection (Wang and Pfeiffer, 2007). A breeding cycle begins
with crossing and ends at the generation when the selected advanced lines are
returned to the crossing block as new parents. The genotypic value of a
genotype is calculated based on the definition of gene actions. The phenotypic
value and family mean are derived from the genotypic value and its associated
error (environmental deviation). With all defined phenotypic and genotypic
values, QULINE then makes within-family selection from phenotypic values and
among-family selection from family means.
To simulate in QULINE, the seed propagation type must be defined to describe
how the selected plants in a retained family from the previous selection round
or generation are propagated to generate the seed for the current selection
round or generation. Wang and Pfeiffer (2007) defined nine options for seed
propagation, which can be presented in the order of increasing genetic
diversity (the F1 excluded) as: (i) clone (asexual reproduction); (ii) DH
(doubled haploid); (iii) self (self-pollination);
(iv) singlecross (single crosses between two parents); (v) backcross
(backcrossed to one of the two parents); (vi) topcross (crossed to a third
parent, also known as a three-way cross); (vii) doublecross (crossed between
two F1s); (viii) random (random mating among the selected plants in a family);
and (ix) noself (random mating but self-pollination is eliminated). The seed
for the F1 is derived from crossing among the parents in the initial
population (or crossing block). QULINE randomly determines the female and the
male parents for each cross from a defined initial population, or
alternatively, one may select some preferred parents from the crossing block.
The selection criteria used to identify such preferred parents can be defined
in terms of among-family and within-family selection descriptors within the
crossing block (referred to as the F0 generation). By using the parameter of
seed propagation type, most if not all methods of seed propagation in
self-pollinated crops can be simulated in QULINE.

15.4.4 The future of simulation and modelling

There are several practical implementation issues in simulation and modelling
to be solved, including: (i) communications and training required to combine
modelling and simulation with real breeding programmes through involvement of
other scientists including breeders, agronomists and geneticists; (ii)
standardization and documentation of data collection for phenotypic,
environmental and genomic information needs to be enforced through the
project; (iii) unexpected and great variation within selection and target
environments requires much more comprehensive data collection, compared to
other breeding environments with much less stressful factors; and (iv) when
more and more factors are involved in modelling and simulation, data
generation and collection should be done with more data dimensions including
more locations, samples and replications.
Practical applications often oblige crop modellers to emphasize simulation of
economic yield. A set of traits that are involved in stress response is also
worth considering. While allowing precise control of plant response and gene
expression, specific stress responses may largely be survival mechanisms.
Thus, whereas their study could improve the simulation of plant survival, the
results might prove harder to relate to the simulation of basic processes of
growth and partitioning (White and Hoogenboom, 2003). Innovative simulation
models will bridge the gap between molecular and conventional plant breeding
and will inform both strategic research and tactical breeding decisions
(www.generationcp.org/sccv10/sccv10_upload/modelling_links.pdf). Simulation
models integrate molecular information about interaction between genes and
simpler traits to allow realistic predictions for more complex traits such as
drought tolerance and yield.
Developing and implementing a design-led breeding system for complex traits
requires enhanced attention to precision phenotyping, eco-physiological
modelling and marker validation to ensure robustness and selective power.
These approaches require the iterative and systemic integration of a range of
scientific disciplines including modellers, physiologists, geneticists,
breeders and molecular biologists. Nevertheless, the first preliminary studies
reviewed in this section suggest that a new paradigm in knowledge-led,
design-driven plant breeding is a feasible option and that for the first time,
genomics may finally realize its potential impact on breeding complex traits
(Dwivedi et al., 2007).
Although many public databases on genes, alleles, gene and genomic sequences
and related information are maintained by geneticists and molecular
biologists, physiologists and modellers may find these databases less useful
than expected. The user interfaces assume familiarity with bioinformatics.
Databases of gene sequences and protein structure lack information on actual
gene function in most cases. The number of lines or cultivars characterized
for a given gene is usually limited to the
parents used in describing the gene and few data on field performance are
found (White and Hoogenboom, 2003). For example, the Arabidopsis Information
Resource (2000) purports to provide more phenotypic data for Arabidopsis as a
model plant, but still falls short of meeting the requirements of a
whole-plant model. The same is true for rice and its related databases.

15.5 Breeding by Design

The advances in applied genomics and the possibility of generating large-scale
marker data sets provide us with the tools to determine the genetic basis for
all traits of agronomic importance. In addition, methods for assessing the
allelic variation at these agronomically important loci are now available.
This combined knowledge will eventually allow the breeder to combine the most
favourable alleles at all these loci in a controlled manner to design superior
cultivars in silico. This concept is called 'breeding by design' (Peleman and
van der Voort, 2003) and has been generalized to breeding design using
genome-wide QTL–marker associations identified through all types of effort due
to the fast development in molecular marker technology (Bernardo, 2002;
Peleman and van der Voort, 2003). The goal can be reached following a
three-step approach: (i) mapping loci involved in all agronomically relevant
traits; (ii) assessing the allelic variation at those loci; and (iii)
following the breeding by design approach. Because the positions of all loci
of importance are mapped precisely, recombinant events can be accurately
selected using flanking markers to collate the different favourable alleles
next to each other. Software tools should enable us to determine the optimal
route for generation of those mosaic genotypes by crossing lines and using
markers to select for the specific combinations that will eventually combine
all those alleles. The prerequisites for this approach include extremely
saturated marker maps available to enable the generation of high-resolution
chromosome haplotypes, extensive phenotyping of all agronomic traits for both
the mapping populations and the inbred lines that are used for chromosome
haplotyping and allele assessment. Breeding by design involves the
integrative, complementary application of technological tools and the
materials currently available to develop superior cultivars. During this
process, an enormous resource of knowledge is generated and accumulated that
should enable breeders to deploy more rational and refined breeding strategies
in the future. The developments in high-throughput genotyping and genetic
mapping with associated statistical methodology have now brought this strategy
within reach. The optimal exploitation of the naturally available genetic
resources should create unsurpassed possibilities to generate new traits and
crop performance.

15.5.1 Parental selection

Selecting parents to make crosses is the first and essential step in plant
breeding (Fehr, 1987). Due to incomplete gene information (i.e. some
resistance genes and their effects on phenotype are known, while other genes
and most genes for other agronomic traits are unknown), many seemingly good
crosses are discarded during the segregating phase of a breeding programme.
Almost all agronomic traits including disease resistance, stress tolerance and
yield involve complex genetics. It makes sense to understand as much as we can
about the plant parents, including genotype, before we make decisions about
crossing one parent with another. In most plant breeding programmes, less than
1% of all the crosses made end up in a cultivar. To a layperson, that may seem
incredibly inefficient, but that's the nature of the beast. What is most
important in plant breeding is to pick the right parents so that breeders
would have fewer crosses to deal with and would be able to spend more time and
attention on the crosses that will result in superior material.
Generally speaking, the cross with the highest progeny mean and largest
genetic
variance has the most potential to produce get genotype and the probability of success-
the best lines (Bernardo, 2002). Under an fully generating new cultivars through the
additive genetic model, the mid-parent value is a good predictor of the progeny mean, but the
variance cannot be deduced from the performance of the parents alone. The best way to estimate
the progeny variance is to generate and test the progeny. Breeders normally use one of two types
of parental selection: one based on parental information, such as parental performance or the
genetic diversity among parents; the other based on parental and progeny information. In the
first case, previous studies found that both high × high and high × low crosses have the
potential to produce the best lines. In the second case, the progeny needs to be grown and
tested, which precludes parental selection. Because of complicated intra-genic and inter-genic
interactions and GEIs, no method has given a precise prediction of cross performance (Wang
et al., 2005).
Breeders are already aware of what parents are available, but often breeders' phenotypic and
field data come in spreadsheets with numerous columns and reams of data without much association
with other types of data generated in genetics and genomics. Once software becomes available to
show a full genome genotype of all possible parents, one can ask, for example, which parents
will provide high yield and resistance to a specific disease. The informatics tool will indicate
to the breeder what genes will be traceable in the progeny and which are the best sets of
molecular markers for tracking these genes.
15.5.2 Breeding product prediction
proposed breeding system.
Cross performance can be accurately predicted when information about the genes controlling the
traits of interest is known. If progeny arrays after selection in a breeding programme could be
predicted, then the efficiency of plant breeding would be greatly increased. Take wheat as an
example. For the majority of economically important traits in wheat breeding the genes
controlling their expression remain unknown. However, for certain aspects of wheat quality this
information is known, though incompletely (Eagles et al., 2002, 2004). Wang and Pfeiffer (2007)
demonstrated how cross performance, following selection, can be predicted in wheat quality
breeding by using QULINE, under the condition that all the gene information of key selection
traits is known.
Plant breeders have always been confronted with the problem of predicting the expected
phenotypic performance of new individuals with untested gene combinations (new genotypes) with
limited information on the GP architecture for traits. The success of molecular breeding relies
on an effective prediction of phenotypic variation based on allelic variation. There are
opportunities to apply molecular technologies to further refine the pedigree-based breeding
strategies used today. Ultimately it will not be sufficient to demonstrate that we can predict
phenotypic variation and the phenotypic changes that result from selection using genetic
information; it must also be shown that this knowledge allows us to improve on the outcomes
that are currently being achieved by conventional selection on phenotype alone.
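The contrast drawn above between a predictable progeny mean and an unpredictable progeny variance can be illustrated with a toy calculation. The sketch below is not the QULINE approach; it assumes a purely additive trait, unlinked loci, fully inbred parents and doubled-haploid progeny, and the function names and locus effects are hypothetical. It enumerates every equally likely progeny genotype of a cross and reports the mean and variance of the genotypic values.

```python
from itertools import product


def mid_parent(value_a, value_b):
    """Average of the two parental genotypic values."""
    return (value_a + value_b) / 2


def progeny_values(parent_a, parent_b):
    """Genotypic values of all equally likely doubled-haploid progeny.

    Each parent is a list of additive locus effects (hypothetical numbers).
    At every locus a progeny line carries the allele of parent A or of
    parent B with equal probability (loci assumed unlinked).
    """
    values = []
    for origin in product((0, 1), repeat=len(parent_a)):
        values.append(sum(b if from_b else a
                          for a, b, from_b in zip(parent_a, parent_b, origin)))
    return values


def mean_and_variance(values):
    """Population mean and variance of a list of values."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


# Two crosses whose parents all have the same performance (value 2).
complementary = ([1, 1, 0, 0], [0, 0, 1, 1])  # parents differ at every locus
identical = ([1, 1, 0, 0], [1, 1, 0, 0])      # parents carry the same alleles

for name, (pa, pb) in (("complementary", complementary), ("identical", identical)):
    mean, variance = mean_and_variance(progeny_values(pa, pb))
    print(name, mid_parent(sum(pa), sum(pb)), mean, variance)
```

Both crosses share the same mid-parent value, yet only the first one segregates. Telling them apart requires locus-level information, which is why knowing the genes behind a trait makes cross prediction feasible in this simplified setting.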
of different breeding methods. However, because of the time and effort spent in conducting field
experiments, the concept of modelling and prediction has always been of interest to plant
breeders.
Taking the bread wheat breeding at CIMMYT as an example, breeders spend great efforts in
choosing parents to make the targeted crosses and approximately 50–80% of crosses are discarded
in generations F1 to F8, following selection for agronomic traits (e.g. plant height, lodging
tolerance, tillering, appropriate heading date and balanced yield components), disease
resistance (e.g. stem rust, leaf rust and stripe rust) and end-use quality (e.g. dough strength
and extensibility, protein quantity and quality). Then, after two cycles of yield trials (i.e.
preliminary yield trial in F8 and replicated yield trial in F9), only 10% of the initial crosses
remain, among which 1–3% of the crosses originally made are released as cultivars from CIMMYT's
international nurseries (Wang et al., 2003, 2005). The same holds across plant breeding
programmes of different species, which calls for a more efficient breeding system.
Two selection methods are commonly used in CIMMYT's wheat breeding programmes. Pedigree
selection was used primarily from 1944 until 1985. From 1985 until the second half of the 1990s
the main selection method was a modified pedigree/bulk method (MODPED), which resulted in many
widely adapted wheat cultivars and was replaced in the late 1990s by the selected bulk method
(SELBLK) (van Ginkel et al., 2002). The MODPED method begins with pedigree selection of
individual plants in the F2 followed by three rounds of bulk selection from F3 to F5 and
pedigree selection in the F6; hence the name modified pedigree/bulk. In the SELBLK method,
spikes of selected F2 plants within a cross are harvested in bulk, resulting in one F3 seed lot
per cross. This process continues from F3 to F5, while pedigree selection is used only in the
F6. A major advantage of SELBLK compared with MODPED is that fewer seed lots need to be
harvested, threshed and visually selected for seed appearance, leading to significant savings
in time, labour and costs associated with nursery preparation, planting and plot labelling (van
Ginkel et al., 2002).
Before simulation, the breeders already knew that SELBLK can save costs compared with MODPED.
Some small-scale field experiments have been conducted comparing the efficiencies of MODPED and
SELBLK (Singh et al., 1998), but the relative efficiency of the two methods remains untested on
a larger scale. Wang and Pfeiffer (2007) illustrated the simulation principles by using the
QULINE module with CIMMYT's wheat breeding programme as an example. They developed the genetic
models accounting for epistasis, pleiotropy and GEI. For each selection method, the simulation
experiment comprised the same 1000 crosses derived from 200 parents with an assumption that a
total of 258 advanced lines remained following ten generations of selection. The tests for the
two methods were each repeated 500 times on 12 GE systems. The simulation not only provided a
clear answer that the adoption of SELBLK would not cause a yield-gain penalty, but also revealed
something that CIMMYT's breeders had not realized, i.e. SELBLK can retain more crosses in the
final selected population than MODPED.
15.6 Future Perspectives
The use of appropriate experimental design and data analysis is a critical component for
successful development and application of molecular breeding approaches, in particular,
marker-assisted breeding systems. Figure 15.2 shows an information flowchart from data to
outputs through use of various analytical tools. Making these choices correctly is a highly
specialized function. There is a lack of proper and simple-to-use guidelines for
non-specialists, which makes it difficult for them to confidently choose the appropriate design
and analysis methods offered by various types of software. Having a centralized and evolving
resource offering biometric inputs required for molecular breeding would be a tremendously
valuable asset to the research and breeding community.
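The seed-lot saving attributed to SELBLK over MODPED follows directly from how the F2 selections are harvested. The lines below are a minimal bookkeeping sketch, assuming every cross is carried forward and a fixed number of F2 plants is selected per cross. Both are simplifications. The function name and the figure of ten selected plants are hypothetical; only the 1000 crosses come from the simulation experiment described in the text.

```python
def seed_lots_per_generation(method, n_crosses, f2_plants_selected_per_cross):
    """Seed lots handled in each of F3, F4 and F5 under a selection method.

    MODPED: each selected F2 plant is harvested separately (pedigree
            selection), so every selected plant starts its own seed lot.
    SELBLK: selected F2 plants within a cross are bulked, giving one
            seed lot per cross.
    Discarding of crosses or families is ignored in this sketch.
    """
    if method == "MODPED":
        lots = n_crosses * f2_plants_selected_per_cross
    elif method == "SELBLK":
        lots = n_crosses
    else:
        raise ValueError(f"unknown selection method: {method}")
    return {generation: lots for generation in ("F3", "F4", "F5")}


n_crosses = 1000          # as in the simulation experiment described in the text
selected_per_cross = 10   # hypothetical value, for illustration only

for method in ("MODPED", "SELBLK"):
    print(method, seed_lots_per_generation(method, n_crosses, selected_per_cross))
```

Under these assumptions the number of lots to harvest, thresh and label from F3 to F5 scales with the number of selected F2 plants in MODPED but only with the number of crosses in SELBLK.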
Fig. 15.2. Analytical tools and outputs associated with procedures in plant breeding. Three types of data
from genotype (G), phenotype (P) and environment (E) are analysed using various tools, and outputs will
be delivered to breeders for decision making.
There is an urgent need for integrated molecular tools including those for facilitating
molecular breeding design, integrated mapping and MAS and communications between genomics
scientists, geneticists, bioinformaticians and breeders.
There is also a need to develop molecular breeding decision support tools that can use
modelling and simulation analysis of all pre-existing and project generated data. These tools
will help breeders design and implement the most efficient breeding schemes (including cost-
and time-related factors) based on the optimum combination of MAS (for both foreground and
background) and phenotypic selection. Other decision support tools that are needed in molecular
breeding include: (i) those for sample collecting, depositing, retrieving and tracking; (ii)
those for data acquiring, collecting, processing and mining; and (iii) databases.
Independent of the platform and the analysis methods used, the result of microarray experiments
is, in most cases, a list of differentially expressed genes. An automatic ontological analysis
approach using Gene Ontology has been proposed to help with the biological interpretation of
such results (Khatri et al., 2002). Currently this approach is the de facto standard for the
secondary analysis of high-throughput experiments and a large number of tools have been
developed for this purpose. Khatri and Draghici (2005) provided a detailed comparison of 14
such tools using the following criteria: scope of the analysis, visualization capabilities,
statistical model(s) used, correction for multiple comparisons, reference microarray available,
installation issues and sources of annotation data. This detailed analysis of the capabilities
of these tools will help researchers choose the most appropriate tool for a given type
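Ontological analysis of a list of differentially expressed genes is often framed as an over-representation test. The sketch below shows one common formulation, a one-sided hypergeometric test followed by a Bonferroni adjustment. It illustrates what a "statistical model" and a treatment of multiple comparisons can look like, and is not the method of any particular tool. The gene counts and GO term names are invented for the example.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, sample):
    """P(X >= k) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    return sum(comb(annotated, i) * comb(population - annotated, sample - i)
               for i in range(k, upper + 1)) / total


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    n = len(p_values)
    return {term: min(1.0, p * n) for term, p in p_values.items()}


# Invented counts: 10 genes on the array, 5 differentially expressed.
# term -> (genes annotated on the array, annotated genes in the list)
go_counts = {"GO:term_A": (5, 5), "GO:term_B": (4, 2)}

raw = {term: hypergeom_upper_tail(hits, 10, annotated, 5)
       for term, (annotated, hits) in go_counts.items()}
print(bonferroni(raw))
```

A term whose adjusted p-value stays small is over-represented in the gene list relative to the array as a whole; with many GO terms tested at once, the adjustment step matters as much as the test itself.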
References
Aastveit, H. and Martens, H. (1986) ANOVA interactions interpreted by partial least squares regression.
Biometrics 42, 829–844.
Abad-Grau, M.M., Montes, R. and Sebastiani, P. (2006) Building chromosome-wide LD maps. Bioinformatics
22, 1933–1934.
Able, J.A., Langridge, P. and Milligan, A.S. (2008) Capturing diversity in the cereals: many options but little
promiscuity. Trends in Plant Sciences 12, 71–79.
Abranches, R., Santos, A.P., Williams, S., Wegel, E., Castilho, A., Christou, P., Shaw, P. and Stoger, E.
(2000) Widely-separated multiple transgene integration sites in wheat chromosomes are brought
together at interphase. The Plant Journal 24, 713–723.
Acosta-Gallegos, J.A., Kelly, J.D. and Gepts, P. (2007) Prebreeding in common bean and use of genetic
diversity from wild germplasm. Crop Science 47(S3), S44–S59.
Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A.,
Olde, B., Moreno, R.E., Kerlavage, A.R., Combie, W.R. and Venter, J.C. (1991) Complementary DNA
sequencing: expressed sequence tags and human genome project. Science 252, 1651–1653.
Adams, R.P. (1997) Conservation of DNA: DNA banking. In: Callow, J.A., Ford-Lloyd, B.V. and Newbury,
H.J. (eds) Biotechnology and Plant Genetics Resources Conservation and Use. CAB International,
Wallingford, UK, pp. 163–174.
Adi, B. (2006) Intellectual property rights in biotechnology and the fate of poor farmers' agriculture. The
Journal of World Intellectual Property 9, 91–112.
Aebersold, R. and Goodlett, D.R. (2001) Mass spectrometry in proteomics. Chemical Reviews 101,
269–295.
Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature 422, 198–207.
Agrawal, P.K., Kohli, A., Twyman, R.M. and Christou, P. (2005) Transformation of plants with multiple
cassettes generates simple transgene integration patterns and high expression levels. Molecular
Breeding 16, 247–260.
Aguilar, G. (2001) Access to genetic resources and protection of traditional knowledge in the territories of
indigenous peoples. Environmental Science and Policy 4, 241–256.
Ahmadi, N., Albar, L., Pressoir, G., Pinel, A., Fargette, D. and Ghesquiere, A. (2001) Genetic basis and
mapping of the resistance to rice yellow mottle virus. III. Analysis of QTL efficiency in introgressed
progenies confirmed the hypothesis of complementary epistasis between two resistance QTL.
Theoretical and Applied Genetics 103, 1084–1092.
Ahmadian, A., Gharizadeh, B., Gustafsson, A.C., Sterky, F., Nyren, P., Uhlen, M. and Lundeberg, J. (2000)
Single nucleotide polymorphism analysis by pyrosequencing. Analytical Biochemistry 280, 103–110.
Ahn, S.N. and Tanksley, S.D. (1993) Comparative linkage maps of the rice and maize genomes. Proceedings
of the National Academy of Sciences of the United States of America 90, 7980–7984.
Ahn, S.N., Anderson, J.A., Sorrells, M.E. and Tanksley, S.D. (1993) Homoeologous relationships of rice,
wheat and maize chromosomes. Molecular and General Genetics 241, 483–490.
Ajmone Marsan, P., Castiglioni, P., Fusari, F., Kuiper, M. and Motto, M. (1998) Genetic diversity and its
relationship to hybrid performance in maize as revealed by RFLP and AFLP markers. Theoretical and
Applied Genetics 96, 219–227.
Akaike, H. (1969) Fitting autoregressive models for prediction. Annals of the Institute of Statistical
Mathematics 21, 243–247.
Alan, A.R., Mutschler, M.A., Brants, A., Cobb, E. and Earle, E.D. (2003) Production of gynogenic plants from
hybrids of Allium cepa L. and A. roylei Stearn. Plant Science 165, 1201–1211.
Allard, R.W. (1956) Formulas and tables to facilitate the calculation of recombination values in heredity.
Hilgardia 24, 235–278.
Allard, R.W. (1988) Genetic changes associated with the evolution of adaptedness in cultivated plants and
their progenitors. Journal of Heredity 79, 225–238.
Allard, R.W. (1999) Principles of Plant Breeding, 2nd edn. John Wiley & Sons, Inc., New York, 254 pp.
Allard, R.W. and Bradshaw, A.D. (1964) Implications of genotype–environmental interactions in applied
plant breeding. Crop Science 4, 503–507.
Allen, G.C., Spiker, S. and Thompson, W.F. (2000) Use of matrix attachment regions (MARs) to minimize
transgene silencing. Plant Molecular Biology 43, 361–376.
Allen-Brady, K., Wong, J. and Camp, N.J. (2006) PedGenie: an analysis approach for genetic association
testing in extended pedigrees and genealogies of arbitrary size. BMC Bioinformatics 7, 209.
Allison, D.B., Cui, X., Page, G.P. and Sabripour, M. (2006) Microarray data analysis: from disarray to
consolidation and consensus. Nature Reviews Genetics 7, 55–65.
Alonso, J.M. and Ecker, J.R. (2006) Moving forward in reverse: genetic technologies to enable
genome-wide phenomic screens in Arabidopsis. Nature Reviews Genetics 7, 524–536.
Alonso, J.M., Stepanova, A.N., Leisse, T.J., Kim, C.J., Chen, H., Shinn, P., Stevenson, D.K., Zimmerman, J.,
Barajas, P., Cheuk, R., Gadrinab, C., Heller, C., Jeske, A., Koesema, E., Meyers, C.C., Parker, H.,
Prednis, L., Ansari, Y., Choy, N., Deen, H., Geralt, M., Hazari, N., Hom, E., Karnes, M., Mulholland, C.,
Ndubaku, R., Schmidt, I., Guzman, P., Aguilar-Henonin, L., Schmid, M., Weigel, D., Carter, D.E.,
Marchand, T., Risseeuw, E., Brogden, D., Zeko, A., Crosby, W.L., Berry, C.C. and Ecker, J.R. (2003)
Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301, 653–657.
Alpert, K.B. and Tanksley, S.D. (1996) High-resolution mapping and isolation of a yeast artificial
chromosome contig containing fw2.2: a major fruit weight quantitative trait locus in tomato. Proceedings
of the National Academy of Sciences of the United States of America 93, 15503–15507.
Altpeter, F., Baisakh, N., Beachy, R., Bock, R., Capell, T., Christou, P., Daniell, H., Datta, K., Datta, S.,
Dix, P.J., Fauquet, C., Huang, N., Kohli, A., Mooibroek, H., Nicholson, L., Nguyen, T.H., Nugent, G.,
Raemakers, K., Romano, A., Somers, D.A., Stoger, E., Taylor, N. and Visser, R. (2005a) Particle
bombardment and the genetic enhancement of crops: myths and realities. Molecular Breeding 15,
305–327.
Altpeter, F., Varshney, A., Abderhalden, O., Douchkov, D., Sautter, C., Kumlehn, J., Dudler, R. and Schweizer,
P. (2005b) Stable expression of a defense-related gene in wheat epidermis under transcriptional
control of a novel promoter confers pathogen resistance. Plant Molecular Biology 57, 271–283.
Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. (1997) Gapped BLAST
and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25,
3389–3402.
Álvarez-Castro, J.M. and Carlborg, Ö. (2007) A unified model for functional and statistical epistasis and its
application in quantitative trait loci analysis. Genetics 176, 1151–1167.
Amaratunga, D. and Cabrera, J. (2004) Exploration and Analysis of DNA Microarray and Protein Array Data.
John Wiley & Sons, Inc., New York.
An, G., Watson, B.D., Stachel, S. and Gordon, M.P. (1985) New cloning vehicles for transformation of higher
plants. EMBO Journal 4, 277–284.
An, G., Jeong, D.-H., An, S., Kang, H.-G., Moon, S., Han, J., Park, S., Lee, H. S. and An, K. (2003) Activation
tagged mutants to discover novel rice genes. In: Mew, T.W., Brar, D.S., Peng, S., Dawe, D. and Hardy, B.
(eds) Rice Science: Innovations and Impact for Livelihood. Proceedings of the International Rice
Research Conference, 16–19 September 2002, Beijing, China, International Rice Research Institute,
Chinese Academy of Engineering and Chinese Academy of Agricultural Sciences, pp. 195–204.
Andersen, J.R. and Lübberstedt, T. (2003) Functional markers in plants. Trends in Plant Science 8,
554–560.
Anderson, J.A., Churchill, G.A., Autrique, J.E., Tanksley, S.D. and Sorrells, M.E. (1993) Optimizing parental
selection for genetic linkage maps. Genome 36, 181–186.
Andrews, L.B. (2002) Genes and patent policy: rethinking intellectual property rights. Nature Reviews
Genetics 3, 803–808.
Anido, F.L., Cravero, V., Asprelli, P., Firpo, T., García, S.M. and Cointry, E. (2004) Heterotic patterns in
hybrids involving cultivar-groups of summer squash, Cucurbita pepo L. Euphytica 135, 355–360.
Annicchiarico, P., Bellah, F. and Chiari, T. (2005) Defining subregions and estimating benefits for a
specific-adaptation strategy by breeding programs: a case study. Crop Science 45, 1741–1749.
Annicchiarico, P., Bellah, F. and Chiari, T. (2006) Repeatable genotype × location interaction and its
exploitation by conventional and GIS-based cultivar recommendation for durum wheat in Algeria.
European Journal of Agronomy 24, 70–81.
Antonio, B.A., Inoue, T., Kajiya, H., Nagamura, Y., Kurata, N., Minobe, Y., Yano, M., Nakagahra, M. and
Sasaki, T. (1996) Comparison of genetic distance and order of DNA markers in five populations of
rice. Genome 39, 946–956.
Arabidopsis Information Resource (2000) The Arabidopsis Information Resource (TAIR). TAIR, Stanford,
California. Available at: http://www.arabidopsis.org (accessed 17 November 2009).
Aranzana, M.J., Kim, S., Zhao, K., Bakker, E., Horton, M., Jakob, K., Lister, C., Molitor, J., Shindo, C., Tang, C.,
Toomajian, C., Traw, B., Zheng, H., Bergelson, J., Dean, C., Marjoram, P. and Nordborg, M.
(2005) Genome-wide association mapping in Arabidopsis identifies previously known flowering time
and pathogen resistance genes. PLoS Genetics 1, e60.
Arcade, A., Labourdette, A., Falque, M., Mangin, B., Chardon, F., Charcosset, A. and Joets, J. (2004) BioMercator:
integrating genetic maps and QTL towards discovery of candidate genes. Bioinformatics 20, 2324–2326.
Arcellana-Panlilio, M. (2005) Principles of application of DNA microarrays. In: Sensen, C.W. (ed.) Handbook
of Genome Research, Genomics, Proteomics, Metabolomics, Bioinformatics, Ethical and Legal
Issues. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany, pp. 239–260.
Arumuganathan, K. and Earle, E.D. (1991) Nuclear DNA content of some important plant species. Plant
Molecular Biology Reporter 9, 208–219.
Ashikari, M., Sakakibara, H., Lin, S., Yamamoto, T., Takashi, T., Nishimura, A., Angeles, R.E., Qian, Q., Kitano, H.
and Matsuoka, M. (2005) Cytokinin oxidase regulates rice grain production. Science 309, 741–745.
Ashman, K., Moran, M.F., Sicheri, F., Pawson, T. and Tyers, M. (2001) Cell signalling: the proteomics of it
all. Science's STKE. Available at: http://stke.sciencemag.org/cgi/content/full/sigtrans;2001/103/pe33
(accessed 17 November 2009).
Ashmore, S. (1997) Status Report on the Development and Application of in vitro Techniques for the
Conservation and Use of Plant Genetic Resources. Engelmann, F. (vol. ed.) International Plant
Genetic Resources Institute, Rome.
Auger, D.L., Gray, A.D., Ream, T.S., Kato, A., Coe, E.H., Jr and Birchler, J.A. (2005) Nonadditive gene
expression in diploid and triploid hybrids of maize. Genetics 169, 389–397.
Auzanneau, J., Huyghe, C., Julier, R. and Barre, P. (2007) Linkage disequilibrium in synthetic varieties of
perennial ryegrass. Theoretical and Applied Genetics 115, 837–847.
Avise, J.C. (1986) Mitochondrial DNA and the evolutionary genetics of higher animals. Philosophical
Transactions of the Royal Society of London B 312, 325–342.
Avise, J.C. (2004) Molecular Markers, Natural History and Evolution, 2nd edn. Sinauer Associates, Inc.,
Sunderland, Massachusetts.
Avraham, S., Tung, C.-W., Ilic, K., Jaiswal, P., Kellogg, E.A., McCouch, S., Pujar, A., Reiser, L.,
Rhee, S.Y., Sachs, M.M., Schaeffer, M., Stein, L., Stevens, P., Vincent, L., Zapata, F. and Ware, D.
(2008) The Plant Ontology Database: a community resource for plant structure and developmental
stages controlled vocabulary and annotations. Nucleic Acids Research 36, D449–D454.
Ayele, M., Haas, B.J., Kumar, N., Wu, H., Xiao, Y., Van Aken, S., Utterback, T.R., Wortman, J.R., White, O.R.
and Town, C.D. (2005) Whole genome shotgun sequencing of Brassica oleracea and its application to
gene discovery and annotation in Arabidopsis. Genome Research 15, 487–495.
Aylor, D.L., Price, E.W. and Carbone, I. (2006) SNAP: combine and map modules for multilocus population
genetic analysis. Bioinformatics 22, 1399–1401.
Ayoub, M., Armstrong, E., Bridger, G., Fortin, M.G. and Mather, D.E. (2003) Marker-based selection in
barley for a QTL region affecting α-amylase activity of malt. Crop Science 43, 556–561.
Ayres, N.M., McClung, A.M., Larkin, P.D., Bligh, H.F.J., Jones, C.A. and Park, W.D. (1997) Microsatellites and
a single-nucleotide polymorphism differentiate apparent amylose classes in an extended pedigree of
US rice germ plasm. Theoretical and Applied Genetics 94, 773–781.
Azpiroz-Leehan, R. and Feldmann, K.A. (1997) T-DNA insertion mutagenesis in Arabidopsis: going back
and forth. Trends in Genetics 13, 152–156.
Babar, M.A., Reynolds, M.P., van Ginkel, M., Klatt, A.R., Raun, W.R. and Stone, M.L. (2006) Spectral
reflectance to estimate genetic variation for in-season biomass, leaf chlorophyll and canopy
temperature in wheat. Crop Science 46, 1046–1057.
Babar, M.A., van Ginkel, M., Klatt, A.R., Prasad, B. and Reynolds, M.P. (2007) The potential of using
spectral reflectance indices to estimate yield in wheat grown under reduced irrigation. Euphytica 150,
155–172.
Babu, R., Nair, S.K., Prasanna, B.M. and Gupta, H.S. (2004) Integrating marker assisted selection in crop
breeding: prospects and challenges. Current Science 87, 607–619.
Babu, R., Nair, S.K., Kumar, A., Venkatesh, S., Sekhar, J.C., Singh, N.N., Srinivasan, G. and Gupta, H.S.
(2005) Two-generation marker-aided backcrossing for rapid conversion of normal maize lines to
Quality Protein Maize (QPM). Theoretical and Applied Genetics 111, 888–897.
Bachem, C.W.B., van der Hoeven, R.S., de Bruijn, S.M., Vreugdenhil, D., Zabeau, M. and Visser, G.R.F.
(1996) Visualization of differential gene expression using a novel method of RNA fingerprinting
based on AFLP: analysis of gene expression during potato tuber development. The Plant Journal 9,
745–753.
Bafna, V., Gusfield, D., Lancia, G. and Yooseph, S. (2003) Haplotyping as perfect phylogeny: a direct
approach. Journal of Computational Biology 10, 323–340.
Bagge, M. and Lübberstedt, T. (2008) Functional markers in wheat: technical and economic aspects.
Molecular Breeding 22, 319–328.
Bagge, M., Xia, X. and Lübberstedt, T. (2007) Functional markers in wheat. Current Opinion in Plant Biology
10, 211–216.
Baginsky, S. and Gruissem, W. (2004) Chloroplast proteomics: potentials and challenges. Journal of
Experimental Botany 55, 1213–1220.
Baginsky, S. and Gruissem, W. (2006) Arabidopsis thaliana proteomics: from proteome to genome. Journal
of Experimental Botany 57, 1485–1491.
Baierl, A., Bogdan, M., Frommlet, F. and Futschik, A. (2006) On locating multiple interacting quantitative
trait loci in intercross designs. Genetics 173, 1693–1703.
Baisakh, N., Datta, K., Oliva, N., Ona, I., Rao, G.J.N., Mew, T.W. and Datta, S.K. (2001) Rapid
development of homozygous transgenic rice using anther culture harboring rice chitinase gene for enhanced
sheath blight resistance. Plant Biotechnology 18, 101–108.
Baker, R.J. (1986) Selection Indices in Plant Breeding. CRC Press, New York.
Bal, U. and Abak, K. (2007) Haploidy in tomato (Lycopersicon esculentum Mill.): a critical review. Euphytica
158, 1–9.
Balint-Kurti, P.J., Zwonitzer, J.C., Wisser, R.J., Carson, M.L., Oropeza-Rosas, M.A., Holland, J.B. and
Szalma, S.J. (2007) Precise mapping of quantitative trait loci for resistance to southern leaf blight,
caused by Cochliobolus heterostrophus race O and flowering time using advanced intercross maize
lines. Genetics 176, 645–657.
Balzergue, S., Dubreucq, B., Chauvin, S., Le-Clainche, I., Le Boulaire, F., de Rose, R., Samson, F.,
Biaudet, V., Lecharny, A., Cruaud, C., Weissenbach, J., Caboche, M. and Lepiniec, L. (2001) Improved
PCR-walking for large-scale isolation of plant T-DNA borders. Biotechniques 30, 496–503.
Bänziger, M., Setimela, P.S., Hodson, D. and Vivek, B. (2004) Breeding for improved drought tolerance in
maize adapted to southern Africa. In: New Directions for a Diverse Planet, Proceedings of the 4th
International Crop Science Congress, 26 September–1 October 2004, Brisbane, Australia. Published
on CD-ROM. Available at: http://www.cropscience.org.au/icsc2004 (accessed 17 November 2009).
Bänziger, M., Setimela, P.S., Hodson, D. and Vivek, B. (2006) Breeding for improved abiotic stress
tolerance in maize adapted to southern Africa. Agricultural Water Management 80, 212–224.
Bao, J.B., Lee, S., Chen, C., Zhang, X.-Q., Zhang, Y., Liu, S.-Q., Clark, T., Wang, J., Cao, M.-L., Yang,
H.-M., Wang, S.M. and Yu, J. (2005) Serial analysis of gene expression study of a hybrid rice strain
(LYP9) and its parental cultivars. Plant Physiology 138, 1216–1231.
Barclay, I.R. (1975) High frequencies of haploid production in wheat (Triticum aestivum) by chromosome
elimination. Nature 256, 410–411.
Bard, J.B.L. and Rhee, S.Y. (2004) Ontologies in biology: design, applications and future challenges. Nature
Reviews Genetics 5, 213–222.
Bar-Hen, A., Charcosset, A., Bourgoin, M. and Guiard, J. (1995) Relationship between genetic markers
and morphological traits in a maize inbred lines collection. Euphytica 84, 145–154.
Barrett, J.C., Fry, B., Maller, J. and Daly, M.J. (2005) Haploview: analysis and visualization of LD and
haplotype maps. Bioinformatics 21, 263–265.
Barrett, S.C.H. and Kohn, J.R. (1991) Genetic and evolutionary consequences of small population size in
plants: implications for conservation. In: Falk, D.A. and Holsinger, K.E. (eds) Genetics and Conservation
of Rare Plants. Oxford University Press, Oxford, UK, pp. 3–30.
Barro, F., Cannell, M.E., Lazzeri, P.A. and Barcelo, P. (1998) The influence of auxins on transformation of
wheat and tritordeum and analysis of transgene integration patterns in transformants. Theoretical and
Applied Genetics 97, 684–695.
Bartlett, J.M.S. (2002) Approaches to the analysis of gene expression using mRNA: a technical overview.
Molecular Biotechnology 21, 149–160.
Barton, J. (2000) Reforming the patent system. Science 287, 1933–1934.
Barton, N.H. and Keightley, P.D. (2002) Understanding quantitative genetic variation. Nature Reviews
Genetics 3, 11–21.
Barua, U.M., Chalmers, K.J., Hackett, C.A., Thomas, W.T., Powell, W. and Waugh, R. (1993) Identification
of RAPD markers linked to a Rhynchosporium secalis resistance locus in barley using near-isogenic
lines and bulked segregant analysis. Heredity 71, 177–184.
Beaujean, A., Sangwan, R.S., Hodges, M. and Sangwan-Norreel, B.S. (1998) Effect of ploidy and
homozygosity on transgene expression in primary tobacco transformants and their androgenetic progenies.
Molecular and General Genetics 260, 362–371.
Beavis, W.D. (1994) The power and deceit of QTL experiments: lessons from comparative QTL studies. In:
49th Annual Corn and Sorghum Industry Research Conference. American Seed Trade Association,
Washington, DC, pp. 250–266.
Beavis, W.D. (1998) QTL analyses: power, precision and accuracy. In: Paterson, A.H. (ed.) Molecular
Dissection of Complex Traits. CRC Press, Boca Raton, Florida, pp. 145–162.
Beavis, W.D. (1999) QTL mapping in plant breeding populations. Patent EP 1042507.
Beavis, W.D. and Keim, P. (1996) Identification of QTL that are affected by environment. In: Kang, M.S. and
Gauch, H.G. (eds) Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida, pp. 123–149.
Beavis, W.D., Grant, D., Albertson, M. and Fincher, R. (1991) Quantitative trait loci for plant height in
four maize populations and their associations with qualitative genetic loci. Theoretical and Applied
Genetics 83, 141–145.
Beck von Bodman, S., Domier, L.L. and Farrand, S.K. (1995) Expression of multiple eukaryotic genes from
a single promotor in Nicotiana. BioTechnology 13, 587–591.
Beckert, M. (1994) Advantages and disadvantages of the use of in vitro/in situ produced DH maize plants.
In: Bajaj, Y.P.S. (ed.) Biotechnology in Agriculture and Forestry, Vol. 25. Springer-Verlag, Berlin, pp.
201–213.
Beckmann, J.S. and Soller, M. (1986a) Restriction fragment length polymorphisms in plant genetic
improvement. Oxford Surveys of Plant Molecular and Cell Biology 3, 196–250.
Beckmann, J.S. and Soller, M. (1986b) Restriction fragment length polymorphisms and genetic
improvement of agricultural species. Euphytica 35, 111–124.
Bedell, J.A., Budiman, M.A., Nunberg, A., Citek, R.W., Robbins, D., Jones, J., Flick, E., Rohlfing, T., Fries, J.,
Bradford, K., McMenamy, J., Smith, M., Holeman, H., Roe, B.A., Wiley, G., Korf, I.F., Rabinowicz, P.D.,
Lakey, N., McCombie, W.R., Jeddeloh, J.A. and Martienssen, R.A. (2005) Sorghum genome
sequencing by methylation filtration. PLoS Biology 3, 0103–0115.
Beer, S.C., Siripoonwiwat, W., O'Donoughue, L.S., Souza, E., Matthews, D. and Sorrells, M.E. (1997)
Associations between molecular markers and quantitative traits in a germplasm pool: can we infer
linkages? Journal of Agricultural Genomics 3. Available at: http://www.ncgr.org/research/jag/papers97/
paper197/indexp197.html (last accessed 31 December 2007).
Bekaert, S., Storozhenko, S., Mehrshahi, P., Bennett, M.J., Lambert, W., Gregory, J.F. III, Schubert, K.,
Hugenholtz, J., van der Straeten, D. and Hanson, A.D. (2008) Folate biofortification in food plants.
Trends in Plant Science 13, 28–35.
Benchimol, L.L., de Souza, C.L., Jr, Garcia, A.F.F., Kono, P.M.S., Mangolin, C.A., Barbosa, A.M.M., Coelho,
A.S.G. and de Souza, A.P. (2000) Genetic diversity in tropical maize inbred lines: heterotic group
assignment and hybrid performance determined by RFLP markers. Plant Breeding 119, 491–496.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful
approach to multiple testing. Journal of the Royal Statistical Society, Series B 57, 289–300.
Bennett, S.T., Barnes, C., Cox, A., Davies, L. and Brown, C. (2005) Toward the 1,000 dollar human genome.
Pharmacogenomics 6, 373–382.
Bennett, M.D., Finch, R.A. and Barclay, I.R. (1976) The time rate and mechanism of chromosome elimina-
tion in Hordeum hybrids. Chromosoma 54, 175200.
Bennetzen, J.L. (1996) The use of comparative genome mapping in the identification, cloning and manipu-
lation of important plant genes. In: Sobral, B.W.S. (ed.) The Impact of Plant Molecular Genetics.
Birkhuer, Boston, Massachusetts, pp. 7185.
Bennetzen, J.L. and Ma, J. (2003) The genetic colinearity of rice and other cereals on the basis of genomic
sequence analysis. Current Opinion in Plant Biology 6, 128133.
Bennetzen, J.L. and Ramakrishna, W. (2002) Numerous small rearrangements of gene content, order and
orientation differentiate grass genomes. Plant Molecular Biology 48, 821827.
Benson, E.E. (1990) Free Radical Damage in Stored Plant Germplasm. International Board for Plant
Genetic Resources (IBPGR), Rome.
Bent, A.F. (2000) Arabidopsis in planta transformation. Uses, mechanisms and prospects for transformation
of other species. Plant Physiology 124, 15401547.
Bernacchi, D., Beck-Bunn, T., Emmatty, D., Eshed, Y., Inai, S., Lopez, J., Petiard, V., Sayama, H., Uhlig, J.,
Zamir, D. and Tanksley, S.D. (1998a) Advanced backcross QTL analysis of tomato: II. Evaluation of
near-isogenic lines carrying single-donor introgressions for desirable wild QTL-alleles derived from
Lycopersicon hirsutum and L. pimpinellifolium. Theoretical and Applied Genetics 97, 170180.
Bernacchi, D., Beck-Bunn, T., Eshed, Y., Lopez, J., Petiard, V., Uhlig, J., Zamir, D. and Tanksley, S.D. (1998b)
Advanced backcross QTL analysis in tomato. I. Identification of QTLs for traits of agronomic import-
ance from Lycopersicon hirsutum. Theoretical and Applied Genetics 97, 381397.
Bernardo, R. (1991) Retrospective index weights used in multiple trait selection in a maize breeding pro-
gram. Crop Science 31, 11741179.
Bernardo, R. (1992) Relationship between single-cross performance and molecular marker heterozygosity.
Theoretical and Applied Genetics 83, 628634.
Bernardo, R. (1993) Estimation of coefficient of coancestry using molecular markers in maize. Theoretical
and Applied Genetics 85, 10551062.
Bernardo, R. (1994) Prediction of maize single-cross performance using RFLPs and information from
related hybrids. Crop Science 34, 2025.
Bernardo, R. (1996) Best linear unbiased prediction of maize single-cross performance. Crop Science 36,
5056.
Bernardo, R. (1999) Best linear unbiased predictor analysis. In: Coors, J.G. and Pandey, S. (eds) Genetics
and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison, Wisconsin, pp. 269276.
Bernardo, R. (2001) What if we knew all the genes for a quantitative trait in hybrid crops? Crop Science
41, 14.
Bernardo, R. (2002) Breeding for Quantitative Traits in Plants. Stemma Press, Woodbury, Minnesota,
369 pp.
Bernardo, R. (2004) What proportion of declared QTL in plants are false? Theoretical and Applied Genetics
109, 419–424.
Bernardo, R. (2008) Molecular markers and selection for complex traits in plants: learning from the last 20
years. Crop Science 48, 1649–1664.
Bernardo, R. and Yu, J. (2007) Prospects for genomewide selection for quantitative traits in maize. Crop
Science 47, 1082–1090.
Bernot, A. (2004) Genome, Transcriptome and Protein Analysis. John Wiley & Sons, Ltd, Chichester, UK.
Betrán, F.J., Ribaut, J.M., Beck, D. and Gonzalez de León, D. (2003) Genetic diversity, specific combining
ability and heterosis in tropical maize under stress and nonstress environments. Crop Science 43,
797–806.
Bevan, M. (1984) Binary Agrobacterium vectors for plant transformation. Nucleic Acids Research 12,
8711–8721.
Bhave, S.V., Hombaker, C., Phang, T.L., Saba, L., Lapadat, R., Kechris, K., Gaydos, J., McGoldrick, D.,
Dolbey, A., Leach, S., Soriano, B., Ellington, A., Ellington, E., Jones, K., Mangion, J., Belknap, J.K.,
Williams, R.W., Hunter, L.E., Hoffman, P.L. and Tabakoff, B. (2007) The PhenoGen informatics web-
site: tools for analyses of complex traits. BMC Genetics 8, 59.
Bhojwani, S.S. (ed.) (1990) Plant Tissue Culture: Applications and Limitations. Elsevier Science Publishers,
The Netherlands.
Biber-Klemm, S. and Cottier, T. (2006) (eds) Rights to Plant Genetic Resources and Traditional Knowledge:
Basic Issues and Perspectives. CAB International, Wallingford, UK, 448 pp.
Bidinger, F.R., Serraj, R., Rizvi, S.M.H., Howarth, C., Yadav, R.S. and Hash, C.T. (2005) Field evaluation of
drought tolerance QTL effects on phenotype and adaptation in pearl millet (Pennisetum glaucum (L.)
R. Br.) top cross hybrids. Field Crops Research 94, 14–32.
Bijlsma, R., Allard, R.W. and Kahler, A.L. (1986) Nonrandom mating in an open-pollinated maize
population. Genetics 112, 669–680.
Bingham, P.M., Levis, R. and Rubin, G.M. (1981) Cloning of DNA sequences from the white locus of
Drosophila melanogaster by a general and novel method. Cell 25, 693–704.
Bink, M.C.A.M. and Meuwissen, T. (2004) Fine mapping of quantitative trait loci using linkage disequilibrium
in inbred plant populations. Euphytica 137, 95–99.
Birchler, J.A., Auger, D.L. and Riddle, N.C. (2003) In search of the molecular basis of heterosis. The Plant
Cell 15, 2236–2239.
Birney, E., Thompson, J.D. and Gibson, T.J. (1996) PairWise and SearchWise: finding the optimal alignment
in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids
Research 24, 2730–2739.
Biswas, S., Storey, J.D. and Akey, J.M. (2008) Mapping gene expression quantitative trait loci by singular
value decomposition and independent component analysis. BMC Bioinformatics 9, 244.
Bizily, S.P., Rugh, C.L. and Meagher, R.B. (2000) Phytodetoxification of hazardous organomercurials by
genetically engineered plants. Nature Biotechnology 18, 213–217.
Blakeslee, A.F. and Avery, A.H. (1937) Methods of inducing chromosome doubling in plants. Journal of
Heredity 28, 393–411.
Blanc, G., Charcosset, A., Mangin, B., Gallais, A. and Moreau, L. (2006) Connected populations for detect-
ing quantitative trait loci and testing for epistasis: an application in maize. Theoretical and Applied
Genetics 113, 206–224.
Blanchard, J.L. (2004) Bioinformatics and systems biology, rapidly evolving tools for interpreting plant
response to global change. Field Crops Research 90, 117–131.
Blanco, A., Lotti, C., Simeone, R., Signorile, A., De-Santis, V., Pasqualone, A., Troccoli, A. and Di-Fonzo, N.
(2001) Detection of quantitative trait loci for grain yield and yield components across environments in
durum wheat. Cereal Research Communications 29, 237–244.
Bligh, H.F.J., Till, R.I. and Jones, C.A. (1995) A microsatellite sequence closely linked to the waxy gene of
Oryza sativa. Euphytica 86, 83–85.
Blow, N. (2008) Mass spectrometry and proteomics: hitting the mark. Nature Methods 5, 741–747.
Bochner, B.R. (1989) Sleuthing out bacterial identities. Nature 339, 157–158.
Bochner, B.R. (2003) New technologies to assess genotype–phenotype relationships. Nature Reviews
Genetics 4, 309–314.
Boer, M.P., ter Braak, C.J.F. and Jansen, R.C. (2002) A penalized likelihood method for mapping epistatic
quantitative trait loci with one-dimensional genome searches. Genetics 162, 951–960.
Bogyo, T.P., Lance, R.C.M., Chevalier, P. and Nilan, P.A. (1988) Genetic models for quantitatively inherited
endosperm characters. Heredity 60, 61–67.
Bohanec, B., Jakse, M. and Havey, M.J. (2003) Genetic analysis of gynogenetic haploid production in
onion. Journal of American Horticulture Science 128, 571–574.
Bollen, K.A. (1989) Structural Equations with Latent Variables. John Wiley & Sons, New York.
Bonnet, D.G., Rebetzke, G.J. and Spielmeyer, W. (2005) Strategies for efficient implementation of molecular
markers in wheat breeding. Molecular Breeding 15, 75–85.
Boppenmaier, J., Melchinger, A.E., Seitz, G., Geiger, H.H. and Herrmann, R.G. (1993) Genetic diversity
for RFLPs in European maize inbreds. III. Performance of crosses within versus between heterotic
groups for grain traits. Plant Breeding 111, 217–226.
Borevitz, J.O. and Ecker, J.R. (2004) Plant genomics: the third wave. Annual Review of Genomics and
Human Genetics 5, 443–477.
Borevitz, J.O., Maloof, J.N., Lutes, J., Dabi, T., Redfern, J.L., Trainer, G.T., Werner, J.D., Asami, T., Berry,
C.C., Weigel, D. and Chory, J. (2002) Quantitative trait loci controlling light and hormone response in
two accessions of Arabidopsis thaliana. Genetics 160, 683–696.
Borevitz, J.O., Liang, D., Plouffe, D., Chang, H.S., Zhu, T., Weigel, D., Berry, C.C., Winzeler, E. and Chory, J.
(2003) Large-scale identification of single-feature polymorphisms in complex genomes. Genome
Research 13, 513–523.
Borevitz, J.O., Hazen, S.P., Michael, T.P., Morris, G.P., Baxter, I.R., Hu, T.T., Chen, H., Werner, J.D.,
Nordborg, M., Salt, D.E., Kay, S.A., Chory, J., Weigel, D., Jones, J.D.G. and Ecker, J.R. (2007)
Brem, R.B., Yvert, G., Clinton, R. and Kruglyak, L. (2002) Genetic dissection of transcriptional regulation in
budding yeast. Science 296, 752–755.
Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D.H., Johnson, D., Luo, S., McCurdy, S., Foy, M.,
Ewan, M., Roth, R., George, D., Eletr, S., Albrecht, G., Vermaas, E., Williams, S.R., Moon, K.,
Burcham, T., Pallas, M., DuBridge, R.B., Kirchner, J., Fearon, K., Mao, J.-I. and Corcoran, K. (2000)
Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays.
Nature Biotechnology 18, 630–634.
Breseghello, F. and Sorrells, M.E. (2006a) Association mapping of kernel size and milling quality in wheat
(Triticum aestivum L.) cultivars. Genetics 172, 1165–1177.
Breseghello, F. and Sorrells, M.E. (2006b) Association analysis as a strategy for improvement of quantitative
traits in plants. Crop Science 46, 1323–1330.
Bretting, P. and Duvick, D. (1997) Dynamic conservation of plant genetic resources. Advances in Agronomy
61, 1–51.
Bretting, P.K. and Goodman, M.M. (1989) Genetic variation in crop plants and management of germ-
plasm collections. In: Stalker, H.T. and Chapman, C. (eds) Scientific Management of Germplasm:
Characterization, Evaluation and Enhancement. International Board for Plant Genetic Resources
(IBPGR) Training Courses: Lecture Series 2. Department of Crop Science, North Carolina State
University, Raleigh, North Carolina and IBPGR, Rome, pp. 41–54.
Bretting, P.K. and Widrlechner, M.P. (1995) Genetic markers and plant genetic resource management.
Plant Breeding Reviews 13, 11–86.
Brick, M.A., Byrne, P.F., Schwartz, H.F., Ogg, J.B., Otto, K., Fall, A.L. and Gilbert, J. (2006) Reaction to
three races of Fusarium wilt in the Phaseolus vulgaris core collection. Crop Science 46, 1245–1252.
Briggs, R.N. and Knowles, P.F. (1967) Introduction to Plant Breeding. Reinhold Books, New York.
Broman, K.W. (1997) Identifying quantitative trait loci in experimental crosses. PhD thesis, Department of
Statistics, University of California, Berkeley.
Broman, K.W. (2005) The genomes of recombinant inbred lines. Genetics 169, 1133–1146.
Broman, K.W., Churchill, G.A., Yandell, B.S. and Zeng, Z.B. (2003a) Statistical methods for mapping
quantitative trait loci in experimental crosses. Available at: http://www.stat.wisc.edu/yandell/statgen
(accessed 17 November 2009).
Broman, K.W., Wu, H., Sen, S. and Churchill, G.A. (2003b) R/qtl: QTL mapping in experimental crosses.
Bioinformatics 19, 889–890.
Brondani, C., Rangel, N., Brondani, V. and Ferreira, E. (2002) QTL mapping and introgression of yield-
related traits from Oryza glumaepatula to cultivated rice (Oryza sativa) using microsatellite markers.
Theoretical and Applied Genetics 104, 1192–1203.
Brookes, G. and Barfoot, P. (2008) GM Crops: Global Socio-economic and Environmental Impacts 1996–2006.
PG Economics, Dorchester, UK.
Broothaerts, W., Mitchell, H.J., Weir, B., Kaines, S., Smith, L.M.A., Yang, W., Mayer, J.E., Roa-
Rodriguez, C. and Jefferson, R.A. (2005) Gene transfer to plants by diverse species of bacteria.
Nature 433, 629–633.
Brown, A.H.D. (1989a) The case for core collections. In: Brown, A.H.D., Frankel, O.H., Marshall, R.D. and
Williams, J.T. (eds) The Use of Plant Genetic Resources. Cambridge University Press, Cambridge,
UK, pp. 136–156.
Brown, A.H.D. (1989b) Core collection: a practical approach to genetic resources management. Genome
31, 818–824.
Brown, A.H.D. and Brubaker, C.L. (2002) Indicators for sustainable management of plant genetic resources:
how well are we doing? In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T.
(eds) Managing Plant Genetic Diversity. International Plant Genetics Resources Institute (IPGRI),
Rome, pp. 249–262.
Brown, A.H.D. and Weir, B.S. (1983) Measuring genetic variability in plant populations. In: Tanksley, S.D. and
Orton, T.J. (eds) Isozymes in Plant Genetics and Breeding, Vol. 1A. Developments in Plant Genetics
and Breeding 1. Elsevier, Amsterdam, pp. 219–240.
Brown, G.G., Formanova, N., Jin, H., Wargachuk, R., Dondy, C., Patil, P., Laforest, M., Zhang, J., Cheung,
W.Y. and Landry, B.S. (2003) The radish Rfo restorer gene of Ogura cytoplasmic male sterility encodes
a protein with multiple pentatricopeptide repeats. The Plant Journal 35, 262–272.
Brown, P.J., Rooney, W.L., Franks, C. and Kresovich, S. (2008) Efficient mapping of plant height quantita-
tive trait loci in a sorghum association population with introgressed dwarfing genes. Genetics 180,
629–637.
Brown, S.D. and Peters, J. (1996) Combining mutagenesis and genomics in the mouse – closing the
phenotype gap. Trends in Genetics 12, 433–435.
Brown, S.M. and Kresovich, S. (1996) Molecular characterization for plant genetic resources conservation.
In: Paterson, A.H. (ed.) Genome Mapping in Plants. R.G. Landes Co., Austin, Texas, pp. 85–93.
Brown, T.A. (2002) Genomics, 2nd edn. Wiley-Liss, Wilmington, Delaware, pp. 125–159.
Brownstein, M.J., Carpten, J.D. and Smith, J.R. (1996) Modulation of non-templated nucleotide addition by
Taq DNA polymerase: primer modifications that facilitate genotyping. Biotechniques 20, 1004–1006.
Brueggeman, R., Rostoks, N., Kudrna, D., Kilian, A., Han, F., Chen, J., Druka, A., Steffenson, B. and
Kleinhofs, A. (2002) The barley stem rust-resistance gene Rpg1 is a novel disease-resistance gene
with homology to receptor kinases. Proceedings of the National Academy of Sciences of the United
States of America 99, 9328–9333.
Brummer, E.C. (1999) Capturing heterosis in forage crop cultivar development. Crop Science 39, 943–954.
Brummer, E.C. (2006) Breeding for cropping systems. In: Lamkey, K.R. and Lee, M. (eds) Plant Breeding:
the Arnel R. Hallauer International Symposium. Blackwell Publishing, Oxford, UK, pp. 97–106.
Brunner, S., Keller, B. and Feuillet, C. (2003) A large rearrangement involving genes and low copy DNA
interrupts the micro-collinearity between rice and barley at the Rph7 locus. Genetics 164, 673–683.
Bruskiewich, R., Senger, M., Davenport, G., Ruiz, M., Rouard, M., Hazekamp, T., Takeya, M., Doi, K.,
Satoh, K., Costa, M., Simon, R., Balaji, J., Akintunde, A., Mauleon, R., Wanchana, S., Shah, T.,
Anacleto, M., Portugal, A., Ulat, V.J., Thongjuea, S., Braak, K., Ritter, S., Dereeper, A., Skofic, M.,
Rojas, E., Martins, N., Pappas, G., Alamban, R., Almodiel, R., Barboza, L.H., Detras, J., Manansala,
K., Mendoza, M.J., Morales, J., Peralta, B., Valerio, R., Zhang, Y., Gregorio, S., Hermocilla, J.,
Echavez, M., Yap, J.M., Farmer, A., Schiltz, G., Lee, J., Casstevens, T., Jaiswal, P., Meintjes, A.,
Wilkinson, M., Good, B., Wagner, J., Morris, J., Marshall, D., Collins, A., Kikuchi, S., Metz, T., McLaren, G.
and van Hintum, T. (2008) The Generation Challenge Programme platform: semantic standards and
workbench for crop science. International Journal of Plant Genomics, Article ID 369601, 6 pages.
Available at: http://www.hindawi.com/journals/ijpg/2008/369601.html (accessed 17 November 2009).
Buchanan, B., Gruissem, W. and Jones, R.L. (eds) (2002) Biochemistry and Molecular Biology of Plants.
John Wiley & Sons Inc., Chichester, UK.
Buckingham, S.D. (2008) Scientific software: seeing the SNPs between us. Nature Methods 5, 903–908.
Buckler, E.S. IV and Thornsberry, J.M. (2002) Plant molecular diversity and applications to genomics.
Current Opinion in Plant Biology 5, 107–111.
Buckler, E.S., Holland, J.B., Bradbury, P.J., Acharya, C.B., Brown, P.J., Browne, C., Ersoz, E., Flint-Garcia,
S., Garcia, A., Glaubitz, J.C., Goodman, M.M., Harjes, C., Guill, K., Kroon, D.E., Larsson, S., Lepak,
N.K., Huihui Li, H., Mitchell, S.E., Pressoir, G., Pfeiffer, J.A., Oropeza Rosas, M., Rocheford, T.R.,
Cinta Romay, M., Romero, S., Salvo, S., Sanchez-Villeda, H., Sofia da Silva, H., Qi Sun, Q., Tian, F.,
Upadyayula, N., Ware, D., Yates, H., Yu, J., Zhang, Z., Kresovich, S. and McMullen, M.D. (2009) The
genetic architecture of maize flowering time. Science 325, 714–718.
Burgueño, J., Crossa, J., Cornelius, P.L. and Yang, R.-C. (2008) Using factor analytic models for joining environment
and genotypes without crossover genotype × environment interaction. Crop Science 48, 1291–1305.
Burns, J., Fraser, P.D. and Bramley, P.M. (2003) Identification and quantification of carotenoids, tocopherols
and chlorophylls in commonly consumed fruits and vegetables. Phytochemistry 62, 939–947.
Burr, B. and Burr, F.A. (1991) Recombinant inbreds for molecular mapping in maize: theoretical and practical
considerations. Trends in Genetics 7, 55–60.
Burr, B., Burr, F.A., Thompson, K.H., Albertson, M.C. and Stuber, C.W. (1988) Gene mapping with recombinant
inbreds in maize. Genetics 118, 519–526.
Burton, G.W. (1981) Meeting human needs through plant breeding: past progress and prospects for the
future. In: Frey, K.J. (ed.) Plant Breeding II. Iowa State University Press, Ames, Iowa, pp. 433–466.
Busch, W. and Lohmann, J.U. (2007) Profiling a plant: expression analysis in Arabidopsis. Current Opinion
in Plant Biology 10, 136–141.
Büschges, R., Hollricher, K., Panstruga, R., Simons, G., Wolter, M., Frijters, A., van Daelen, R., van der
Lee, T., Diergaarde, P., Groenendijk, J., Töpsch, S., Vos, P., Salamini, F. and Schulze-Lefert, P. (1997)
The barley Mlo gene: a novel control element of plant pathogen resistance. Cell 88, 695–705.
Busso, C.S., Liu, C.J., Hash, C.T., Witcombe, J.R., Devos, K.M., deWet, J.M.J. and Gale, M.D. (1995)
Analysis of recombination rate in female and male gametogenesis in pearl millet (Pennisetum glaucum)
using RFLP markers. Theoretical and Applied Genetics 90, 242–246.
Bustamam, M., Tabien, R.E., Suwarmo, A., Abalos, M.C., Kadir, T.S., Ona, I., Bernardo, M., VeraCruz, C.M.
and Leung, H. (2002) Asian rice biotechnology network: improving popular cultivars through marker-assisted
backcrossing by the NARES. Abstract of International Rice Congress, 16–22 September
2002, Beijing, China. Available at: http://www.irri.org/irc2002/index.htm (last accessed 31 December
2007).
Butlin, R.K. and Tregenza, T. (1998) Levels of genetic polymorphism: marker loci versus quantitative traits.
Philosophical Transactions of the Royal Society of London B 353, 1–12.
Byrum, J. and Reiter, R. (1998) A method for identifying genetic marker loci associated with trait loci. Patent
EP 0972076.
Caetano-Anollés, G., Bassam, B.J. and Gresshoff, P.M. (1991) DNA amplification fingerprinting using very
short arbitrary oligonucleotide primers. Bio/Technology 9, 553–557.
Caicedo, A.L. and Purugganan, M.D. (2005) Comparative plant genomics. Frontiers and prospects. Plant
Physiology 138, 545–547.
Caliński, T., Kaczmarek, Z., Krajewski, P., Frova, C. and Sari-Gorla, M. (2000) A multivariate approach to
the problem of QTL localization. Heredity 84, 303–310.
Campbell, B.T., Baenziger, P.S., Gill, K.S., Eskridge, K.M., Budak, H., Erayman, M., Dweikat, I. and Yen, Y.
(2003) Identification of QTLs and environmental interactions associated with agronomic traits on chromosome
3A of wheat. Crop Science 43, 1493–1505.
Campbell, B.T., Baenziger, P.S., Eskridge, K.M., Budak, H., Streck, N.A., Weiss, A., Gill, K.S. and Erayman,
M. (2004) Using environmental covariates to explain genotype × environment and QTL × environment
interactions for agronomic traits on chromosome 3A of wheat. Crop Science 44, 620–627.
Campbell, M.A., Zhu, W., Jiang, N., Lin, H., Ouyang, S., Childs, K.L., Haas, B.J., Hamilton, J.P. and Buell,
C.R. (2007) Identification and characterization of lineage-specific genes within the Poaceae. Plant
Physiology 145, 1311–1322.
Candela, H. and Hake, S. (2008) The art and design of genetic screens: maize. Nature Reviews Genetics
9, 192–203.
Cardon, L.R. and Bell, J.I. (2001) Association study designs for complex diseases. Nature Reviews Genetics
2, 91–98.
Carlborg, Ö. and Andersson, L. (2002) Use of randomization testing to detect multiple epistatic QTLs.
Genetical Research 79, 175–184.
Carlborg, Ö. and Haley, C.S. (2004) Epistasis: too often neglected in complex trait studies? Nature Reviews
Genetics 5, 618–625.
Carlborg, Ö., Andersson, L. and Kinghorn, B. (2000) The use of a genetic algorithm for simultaneous mapping
of multiple interacting quantitative trait loci. Genetics 155, 2003–2010.
Carlborg, Ö., Brockmann, G.A. and Haley, C.S. (2005) Simultaneous mapping of epistatic QTL in DU6i ×
DBA/2 mice. Mammalian Genome 16, 481–494.
Carninci, P. and Hayashizaki, Y. (1999) High-efficiency full-length cDNA cloning. Methods in Enzymology
303, 19–44.
Carpenter, A.E. and Sabatini, D.M. (2004) Systematic genome-wide screens of gene function. Nature
Reviews Genetics 5, 11–22.
Cartwright, D.A., Troggio, M., Velasco, R. and Gutin, A. (2007) Genetic mapping in the presence of genotyping
errors. Genetics 176, 2521–2537.
Casali, V.W.D. and Tigchelaar, E.C. (1975) Computer simulation studies comparing pedigree, bulk and sin-
gle seed descent selection in self-pollinated populations. Journal of American Society of Horticulture
Science 100, 364–367.
Casasoli, M., Derory, J., Morera-Dutrey, C., Brendel, O., Porth, I., Guehl, J.-M., Villani, F. and Kremer, A.
(2006) Comparison of quantitative trait loci for adaptive traits between oak and chestnut based on an
expressed sequence tag consensus map. Genetics 172, 533–546.
Caskey, T. and Edwards, A. (1992) DNA typing with short tandem repeat polymorphisms and identification
of polymorphic short tandem repeats. Patent EP 0639228.
Castle, W.E. (1921) On a method of estimating the number of genetic factors concerned in cases of blending
inheritance. Science 54, 93–96.
Causier, B., Graham, J. and Davis, B. (2005) Large-scale yeast two-hybrid analysis. In: Leister, D. (ed.)
Plant Functional Genomics. Food Products Press, New York, pp. 119–135.
Causse, M.A., Fulton, T.M., Cho, Y.G., Ahn, S.N., Chunwongse, J., Wu, K., Xiao, J., Yu, Z., Ronald, P.C.,
Harrington, S.E., Second, G., McCouch, S.R. and Tanksley, S.D. (1994) Saturated molecular map of
the rice genome based on an interspecific backcross population. Genetics 138, 1251–1274.
Cavalli-Sforza, L.L. and Edwards, A.W.F. (1967) Phylogenetic analysis: models and estimation procedures.
American Journal of Human Genetics 19, 233–257.
Ceccarelli, S. and Grando, S. (2007) Decentralized participatory plant breeding: an example of demand
driven research. Euphytica 155, 349–360.
Ceccarelli, S., Grando, S., Amri, A., Asaad, F.A., Benbelkacem, A., Harrabi, M., Maatougui, M., Mekni,
M.S., Mimoun, H., El-Einen, R.A., El-Felah, M., El-Sayed, A.F., Shreidi, A.S. and Yahyaoui, A. (2001)
Decentralized and participatory plant breeding for marginal environments. In: Cooper, H.D., Spillane,
C. and Hodgkins, T. (eds) Broadening the Genetic Bases of Crop Production. CAB International,
Wallingford, UK, pp. 115–135.
Cerna, F.J., Cianzio, S.R., Rafalski, A., Tingey, S. and Dyer, D. (1997) Relationship between seed yield heterosis
and molecular marker heterozygosity in soybean. Theoretical and Applied Genetics 95, 460–467.
CFIA/NFS (Canadian Food Inspection Agency/National Forum on Seed) (2005) Seminar on the use of
molecular techniques for plant variety protection. Available at: http://www.inspection.gc.ca/english/
plaveg/pbrpov/molece.shtml (last accessed 30 June 2008).
Chagné, D., Batley, J., Edwards, D. and Forster, J.W. (2007) Single nucleotide polymorphisms genotyping
in plants. In: Oraguzie, N.C., Rikkerink, E.H.A., Gardiner, S.E. and De Silva, H.N. (eds) Association
Mapping in Plants. Springer, Berlin, pp. 77–94.
Chahal, G.S. and Gosal, S.S. (2002) Principles and Procedures of Plant Breeding, Biotechnological and
Conventional Approaches. Alpha Science International Ltd, Pangbourne, UK.
Chaïb, J., Lecomte, L., Buret, M. and Causse, M. (2006) Stability over genetic backgrounds, generations
and years of quantitative trait loci (QTLs) for organoleptic quality in tomato. Theoretical and Applied
Genetics 112, 934–944.
Chan, E.K.F., Rowe, H.C. and Kliebenstein, D.J. (2009) Understanding the evolution of defense metabolites
in Arabidopsis thaliana using genome-wide association mapping. Genetics (in press).
Chan, H.P. (2006) International patent behaviour of nine major agricultural biotechnology firms. AgBioForum
9, 59–68.
Chandler, P.M., Marrion-Poll, A., Ellis, M. and Gubler, F. (2002) Mutants at the Slender1 locus of barley cv
Himalaya. Molecular and physical characterization. Plant Physiology 129, 181–190.
Chandler, S. and Dunwell, J.M. (2008) Gene flow, risk assessment and the environmental release of transgenic
plants. Critical Reviews in Plant Sciences 27, 25–49.
Chapman, S.C., Hammer, G.L., Podlich, D.W. and Cooper, M. (2002) Linking biophysical and genetic mod-
els to integrate physiology, molecular biology and plant breeding. In: Kang, M.S. (ed.) Quantitative
Genetics, Genomics and Plant Breeding. CAB International, Wallingford, UK, pp. 167–187.
Chapman, S., Cooper, M., Podlich, D.W. and Hammer, G.L. (2003) Evaluating plant breeding strategies by
simulating gene action and dryland environment effects. Agronomy Journal 95, 99–113.
Charcosset, A. and Essioux, L. (1994) The effect of population structure on the relationship between heterosis
and heterozygosity at marker loci. Theoretical and Applied Genetics 89, 336–343.
Charcosset, A. and Gallais, A. (1996) Estimation of the contribution of quantitative trait loci (QTL) to the
variance of a quantitative trait by means of genetic markers. Theoretical and Applied Genetics 93,
1193–1201.
Charcosset, A., Lefort-Buson, M. and Gallais, A. (1991) Relationship between heterosis and heterozygosity
at marker loci: a theoretical computation. Theoretical and Applied Genetics 81, 571–575.
Charcosset, A., Causse, M., Moreau, L. and Gallais, A. (1994) Investigation into the effect of genetic back-
ground on QTL expression using three recombinant inbred lines (RIL) populations. In: van Ooijen,
J.W. and Jansen, J. (eds) Biometrics in Plant Breeding: Applications of Molecular Markers. Centre for
Plant Breeding and Reproduction Research, Wageningen, The Netherlands, pp. 75–84.
Charcosset, A., Mangin, B., Moreau, L., Combes, L., Jourjon, M.F. and Gallais, A. (2000) Heterosis in maize
investigated using connected RIL populations. In: Quantitative Genetics and Breeding Methods: the
Way Ahead. Institut National de la Recherche Agronomique (INRA), Paris, pp. 89–98.
Chardon, F., Virlon, B., Moreau, L., Falque, M., Joets, J., Decousset, L., Murigneux, A. and Charcosset, A.
(2004) Genetic architecture of flowering time in maize as inferred from quantitative trait loci meta-analysis
and synteny conservation with the rice genome. Genetics 168, 2169–2185.
Charmet, G., Robert, N., Perretant, M.R., Gay, G., Sourdille, P., Groos, C., Bernard, S. and Bernard, M.
(1999) Marker-assisted recurrent selection for cumulating additive and interactive QTLs in recombinant
inbred lines. Theoretical and Applied Genetics 99, 1143–1148.
Chase, S.S. (1969) Monoploids and monoploid derivatives of maize (Zea mays L.). The Botanical Review
35, 117–167.
Chavarriaga-Aguirre, P., Maya, M.M., Tohme, J., Duque, M.C., Iglesias, C., Bonierbale, M.W., Kresovich, C.
and Kochert, G. (1999) Using microsatellites, isozymes and AFLPs to evaluate genetic diversity and
redundancy in the cassava core collection and to assess the usefulness of DNA-based markers to
maintain germplasm collections. Molecular Breeding 5, 263–273.
Chellappan, P., Masona, M.V., Vanitharani, R., Taylor, N.J. and Fauquet, C.M. (2004) Broad spectrum resist-
ance to ssDNA viruses associated with transgene-induced gene silencing in cassava. Plant Molecular
Biology 56, 601–611.
Chen, H., Wang, S., Xing, Y., Xu, C., Hayes, P.M. and Zhang, Q. (2003) Comparative analyses of genomic
locations and race specificities of loci for quantitative resistance to Pyricularia grisea in rice and barley.
Proceedings of the National Academy of Sciences of the United States of America 100, 2544–2549.
Chen, J., Griffey, C.A., Chappell, M., Shaw, J. and Pridgen, T. (1999) Haploid production in twelve wheat
F1 by wheat × maize hybridization method. In: Proceedings of National Fusarium Head Blight Forum,
December 1999, Sioux Falls, South Dakota, pp. 147–149.
Chen, J.Q., Zhou, H.M., Chen, J. and Wang, X.C. (2006) A GATEWAY-based platform for multiple plant
transformation. Plant Molecular Biology 62, 927–936.
Chen, L. and Storey, J.D. (2006) Relaxed significance criteria for linkage analysis. Genetics 173,
2371–2381.
Chen, M., Presting, G., Barbazuk, W.B., Goicoechea, J.L., Blackmon, B., Fang, G., Kim, H., Frisch, D., Yu,
Y., Sun, S., Higingbottom, S., Phimphilai, J., Phimphilai, D., Thurmond, S., Gaudette, B., Li, P., Liu, J.,
Hatfield, J., Main, D., Farrar, K., Henderson, C., Barnett, L., Costa, R., Williams, B., Walser, S., Atkins,
M., Hall, C., Budiman, M.A., Tomkins, J.P., Luo, M., Bancroft, I., Salse, J., Regad, F., Mohapatra, T.,
Singh, N.K., Tyagi, A.K., Soderlund, C., Dean, R.A. and Wing, R.A. (2002) An integrated physical and
genetic map of the rice genome. The Plant Cell 14, 537–545.
Chen, S., Lin, X.H., Xu, C.G. and Zhang, Q. (2000) Improvement of bacterial blight resistance of Minghui
63, an elite restorer line of hybrid rice, by molecular marker-assisted selection. Crop Science 40,
239–244.
Chen, T.M., Lu, C.C. and Li, W.H. (2005) Prediction of splice sites with dependency graphs and their
expanded Bayesian networks. Bioinformatics 21, 471–482.
Chen, X., Temnykh, S., Xu, Y., Cho, Y.G. and McCouch, S.R. (1997) Development of microsatellite frame-
work map providing genome-wide coverage in rice (Oryza sativa L.). Theoretical and Applied Genetics
95, 553–567.
Chen, Y., Lu, C., He, P., Shen, L., Xu, J., Xu, Y. and Zhu, L. (1997) Gametic selection in a doubled hap-
loid population derived from anther culture of indica/japonica cross of rice. Acta Genetica Sinica 24,
322–329.
Cheng, M., Fry, J.E., Pang, S., Zhou, H., Hironaka, C., Duncan, D.R., Conner, T.W. and Wan, Y. (1997)
Genetic transformation of wheat mediated by Agrobacterium tumefaciens. Plant Physiology 115,
971–980.
Cheng, M., Lowe, B.A., Spencer, T.M., Ye, X. and Armstrong, C.L. (2004) Factors influencing Agrobacterium-mediated
transformation of monocotyledonous species. In Vitro Cellular and Developmental Biology – Plant 40, 31–45.
Chiarrolla, C. (2006) Commodifying agricultural biodiversity and development-related issues. The Journal
of World Intellectual Property 9 (1), 25–60.
Chin, H.E. and Roberts, E.H. (eds) (1980) Recalcitrant Crop Seeds. Tropical Press Sdn. Bhd., Kuala
Lumpur, Malaysia.
Cho, Y.G., Ishii, T., Temnykh, S., Chen, X., Lipovich, L., McCouch, S.R., Park, W.D., Ayres, N. and
Cartinhour, S. (2000) Diversity of microsatellites derived from genomic libraries and GenBank
sequences in rice. Theoretical and Applied Genetics 100, 713–722.
Choisne, N., Samain, S., Demange, N., Orjeda, G., Michelet, L., Pelletier, E., Salanoubat, M., Weissenbach,
J. and Quetier, F. (2007) The sequencing of plant nuclear genomes. In: Morot-Gaudry, J.F., Lea, P. and
Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New Hampshire, pp. 23–51.
Choo, T.M., Reinbergs, E. and Park, S.J. (1982) Comparison of frequency distribution of doubled haploid
and single seed descent lines in barley. Theoretical and Applied Genetics 61, 215–218.
Choo, T.M., Reinbergs, E. and Kasha, K.J. (1985) Use of haploids in breeding barley. Plant Breeding
Reviews 3, 219–252.
Christensen, A.H., Sharrock, R.A. and Quail, P.H. (1992) Maize polyubiquitin genes: structure, thermal per-
turbation of expression and transcript splicing and promoter activity following transfer to protoplasts
by electroporation. Plant Molecular Biology 18, 675–689.
Christiansen, M.J., Anderson, S.B. and Ortiz, R. (2002) Diversity changes in an intensively bred wheat
germplasm during the 20th century. Molecular Breeding 9, 1–11.
Cone, K.C., McMullen, M.D., Bi, I.V., Davis, G.L., Yim, Y.-S., Gardiner, J.M., Polacco, M.L., Sanchez-Villeda, H.,
Fang, Z., Schroeder, S.G., Havermann, S.A., Bowers, J.E., Paterson, A.H., Soderlund, C.A., Engler,
F.W., Wing, R.A. and Coe, E.H. (2002) Genetic, physical and informatics resources for maize. On the
road to an integrated map. Plant Physiology 130, 1598–1605.
Conner, A.J., Barrell, P.J., Baldwin, S.J., Lokerse, A.S., Cooper, P.A., Erasmuson, A.K., Nap, J.P. and Jacobs,
J.M.E. (2007) Intragenic vectors for gene transfer without foreign DNA. Euphytica 154, 341–353.
Cooper, M. and Byth, D.E. (1996) Understanding plant adaptation to achieve systematic applied crop
improvement – a fundamental challenge. In: Cooper, M. and Hammer, G.L. (eds) Plant Adaptation
and Crop Improvement. CAB International, Wallingford, UK, pp. 5–23.
Cooper, M. and Hammer, G.L. (1996) Synthesis of strategies for crop improvement. In: Cooper, M. and
Hammer, G.L. (eds) Plant Adaptation and Crop Improvement. CAB International, Wallingford, UK,
pp. 591–623.
Cooper, M. and Podlich, D.W. (2002) The E(NK) model: extending the NK model to incorporate gene-by-
environment interactions and epistasis for diploid genomes. Complexity 7, 31–47.
Cooper, M., Podlich, D.W. and Chapman, S.C. (1999) Computer simulation linked to gene information data-
bases as a strategic research tool to evaluate molecular approaches for genetic improvement of crops.
Workshop on Molecular Approaches for the Genetic Improvement of Cereals for Stable Production in
Water-Limited Environments, Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Mexico,
21–25 June 1999. Available at: http://www.cimmyt.org/ABC/map/research_tools_results/wsmolecular/
workshopmolecular/WorkshopMolecularcontents.htm (accessed 30 June 2008).
Cooper, M., Chapman, S.C., Podlich, D.W. and Hammer, G.L. (2002a) The GP problem: quantifying gene-
to-phenotype relationships. In Silico Biology 2, 151–164.
Cooper, M., Podlich, D.W., Micallef, K.P., Smith, O.S., Jensen, N.M., Chapman, S.C. and Kruger, N.L.
(2002b) Complexity, quantitative traits and plant breeding: a role for simulation modelling in the genetic
improvement of crops. In: Kang, M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB
International, Wallingford, UK, pp. 143–166.
Cooper, M., Smith, O.S., Graham, G., Arthur, L., Feng, L. and Podlich, D.W. (2004) Genomics, genetics and
plant breeding: a private sector perspective. Crop Science 44, 1907–1914.
Cooper, M., Podlich, D.W. and Smith, O.S. (2005) Gene-to-phenotype and complex trait genetics. Australian
Journal of Agricultural Research 56, 895–918.
Cooper, M., Podlich, D.W. and Luo, L. (2007) Modelling QTL effects and MAS in plant breeding. In: Varshney,
R.K. and Tuberosa, R. (eds) Genomics-Assisted Crop Improvement. Volume 1. Genomics Approaches
and Platforms. Springer, Dordrecht, Netherlands, pp. 57–95.
Coors, J.G. (1999) Selection methodologies and heterosis. In: Coors, J.G. and Pandey, S. (eds) The Genetics
and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison, Wisconsin, pp. 225–245.
Coque, M. and Gallais, A. (2006) Genomic regions involved in response to grain yield selection at high and
low nitrogen fertilization in maize. Theoretical and Applied Genetics 112, 1205–1220.
Corneille, S., Lutz, K., Svab, Z. and Maliga, P. (2001) Efficient elimination of selectable marker genes
from the plastid genome by the CRE-lox site-specific recombination system. The Plant Journal 27,
171–178.
Cornelius, P.L. and Seyedsadr, M.S. (1997) Estimation of general linear–bilinear models for two-way tables.
Journal of Statistical Computation and Simulation 58, 287–322.
Cornelius, P.L., Seyedsadr, M. and Crossa, J. (1992) Using the shifted multiplicative model in search for
separability in corn cultivar trials. Theoretical and Applied Genetics 84, 161–172.
Cornelius, P.L., van Sanford, D.A. and Seyedsadr, M.S. (1993) Clustering cultivars into groups without
rank-change interactions. Crop Science 33, 1193–1200.
Cornelius, P.L., Crossa, J. and Seyedsadr, M.S. (1996) Statistical tests and estimates of multiplicative
models for G × E interaction. In: Kang, M.S. and Gauch, H.G., Jr (eds) Genotype-by-Environment Interaction.
CRC Press, Boca Raton, Florida, pp. 199–234.
Correns, C. (1901) Bastarde zwischen Maisrassen, mit besonderer Berücksichtigung der Xenien.
Bibliotheca Botanica 53, 1–161.
Cottage, A., Yang, A.P., Maunders, H., de Lacy, R.C. and Ramsay, N.A. (2001) Identification of DNA sequences
flanking T-DNA insertion by PCR walking. Plant Molecular Biology Reporter 19, 321–327.
Courtois, B. (1993) Comparison of single seed descent and anther culture-derived lines of three single
crosses of rice. Theoretical and Applied Genetics 85, 625–631.
Courtois, B., McLaren, G., Sinha, P.K., Prasad, K., Yadav, R. and Shen, L. (2000) Mapping QTL associated
with drought avoidance in upland rice. Molecular Breeding 6, 55–66.
Coutu, C., Brandle, J., Brown, D., Brown, K., Miki, B., Simmonds, J. and Hegedus, D.D. (2007) pORE:
a modular binary vector series suited for both monocot and dicot plant transformation. Transgenic
Research 16, 771–781.
Craig, W., Tepfer, M., Degrassi, G. and Ripandelli, D. (2008) An overview of general features of risk
assessments of genetically modified crops. Euphytica 164, 853–880.
Cravatt, B.F., Simon, G.M. and Yates, J.R. (2007) The biological impact of mass-spectrometry-based
proteomics. Nature 450, 991–1000.
Cregan, P.B., Shoemaker, R.C. and Specht, J.E. (1999) An integrated genetic linkage map of the soybean
genome. Crop Science 39, 1464–1490.
Gresham, D., Dunham, M.J. and Botstein, D. (2008) Comparing whole genomes using DNA microarrays.
Nature Reviews Genetics 9, 291–302.
Crosbie, T.M., Eathington, S.R., Johnson, G.R., Edwards, M., Reiter, R., Stark, S., Mohanty, R.G., Oyervides,
M., Buehler, R.E., Walker, A.K., Dobert, R., Delannay, X., Pershing, J.C., Hall, M.A. and Lamkey, K.R.
(2006) Plant breeding: past, present and future. In: Lamkey, K.R. and Lee, M. (eds) Plant Breeding: the
Arnel R. Hallauer International Symposium. Blackwell Publishing, Oxford, UK, pp. 3–50.
Croser, J.S., Lulsdorf, M.M., Davies, P.A., Clarke, H.J., Bayliss, K.L., Mallikarjuna, N. and Siddique, K.H.M.
(2006) Toward doubled haploid production in the Fabaceae: progress, constraints and opportunities.
Critical Reviews in Plant Sciences 25, 139–157.
Crossa, J. and Cornelius, P.L. (1997) Sites regression and shifted multiplicative model clustering of cultivar
trial sites under heterogeneity of error variances. Crop Science 37, 406–415.
Crossa, J. and Cornelius, P. (2002) Linear–bilinear models for the analysis of genotype-environment
interaction. In: Kang, M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB International,
Wallingford, UK, pp. 305–322.
Crossa, J. and Franco, J. (2004) Statistical methods for classifying genotypes. Euphytica 137, 19–37.
Crossa, J., Cornelius, P.L., Seyedsadr, M. and Byrne, P. (1993) A shifted multiplicative model cluster ana-
lysis for grouping environments without genotypic rank change. Theoretical and Applied Genetics 85,
577–586.
Crossa, J., Cornelius, P.L., Sayre, K. and Ortiz-Monasterio, R.J.I. (1995) A shifted multiplicative model
fusion method for grouping environments without cultivar rank change. Crop Science 35, 54–62.
Crossa, J., Cornelius, P.L. and Seyedsadr, M.S. (1996) Using the shifted multiplicative model cluster
methods for crossover G × E interaction. In: Kang, M.S. and Gauch, H.G., Jr (eds) Genotype-by-Environment
Interaction. CRC Press, Boca Raton, Florida, pp. 175–198.
Crossa, J., Vargas, M., van Eeuwijk, F.A., Jiang, C., Edmeades, G.O. and Hoisington, D. (1999) Interpreting
genotype × environment interaction in tropical maize using linked molecular markers and
environmental covariables. Theoretical and Applied Genetics 99, 611–625.
Crossa, J., Cornelius, P.L. and Yan, W. (2002) Biplots of linear–bilinear models for studying crossover
genotype × environment interaction. Crop Science 42, 619–633.
Crossa, J., Yang, R.-C. and Cornelius, P.L. (2004) Studying crossover genotype × environment interaction
using linear–bilinear models and mixed models. Journal of Agricultural, Biological and Environmental
Statistics 9, 362–380.
Crossa, J., Burgueño, J., Autran, D., Vielle-Calzada, J.-P., Cornelius, P.L., Garcia, N., Salamanca, F. and
Arenas, D. (2005) Using linear–bilinear models for studying gene-expression × treatment interaction
in microarray experiments. Journal of Agricultural, Biological and Environmental Statistics 10,
337–353.
Crossa, J., Burgueño, J., Cornelius, P.L., McLaren, G., Trethowan, R. and Krishnamachari, A. (2006)
Modeling genotype × environment interaction using additive genetic covariance of relatives for
predicting breeding values of wheat genotypes. Crop Science 46, 1722–1733.
Crossa, J., Burgueño, J., Dreisigacker, S., Vargas, M., Herrera-Foessel, S.A., Lillemo, M., Singh, R.P.,
Trethowan, R., Warburton, M., Franco, J., Reynolds, M., Crouch, J.H. and Ortiz, R. (2007) Association
analysis of historical bread wheat germplasm using additive genetic covariance of relatives and
population structure. Genetics 177, 1889–1913.
Crow, J.F. (1999) Dominance and overdominance. In: Coors, J.G. and Pandey, S. (eds) Genetics and
Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison, Wisconsin, pp. 49–58.
Crow, J.F. (2000) The rise and fall of overdominance. Plant Breeding Reviews 17, 225–257.
Cui, Y. and Wu, R. (2005) Statistical model for characterizing epistatic control of triploid endosperm
triggered by maternal and offspring QTLs. Genetical Research 86, 65–75.
Cullis, C.A. (2004) Plant Genomics and Proteomics. John Wiley & Sons, Inc., Chichester, UK.
Curtis, J.J., Brunson, A.M., Hubbard, J.E. and Earle, F.R. (1956) Effect of the parent on oil content of the
corn kernel. Agronomy Journal 48, 551–555.
Curtis, M.D. and Grossniklaus, U. (2003) A Gateway cloning vector set for high-throughput functional
analysis of genes in planta. Plant Physiology 133, 462–469.
Dafny-Yelin, M. and Tzfira, T. (2007) Delivery of multiple transgenes to plant cells. Plant Physiology 145,
1118–1128.
D'Amato, F. (1975) The problem of genetic stability in plant tissues and cell cultures. In: Frankel, O. and
Hawkes, J.G. (eds) Crop Genetic Resources for Today and Tomorrow. Cambridge University Press,
Cambridge, UK, pp. 333–348.
Damude, H.G. and Kinney, A.J. (2008) Enhancing plant seed oils for human nutrition. Plant Physiology
147, 962–968.
Daniell, H. and Dhingra, A. (2002) Multigene engineering: dawn of an exciting new era in biotechnology.
Current Opinion in Biotechnology 13, 136–141.
Dargie, J.D. (2007) Marker-assisted selection: policy considerations and options for developing countries.
In: Guimarães, E.P., Ruane, J., Scherf, B.D., Sonnino, A. and Dargie, J.D. (eds) Marker-Assisted
Selection, Current Status and Future Perspectives in Crops, Livestock, Forestry and Fish. Food and
Agriculture Organization of the United Nations, Rome, pp. 441–471.
Darrah, L.L. and Zuber, M.S. (1986) 1985 United States maize germplasm base and commercial breeding
strategy. Crop Science 26, 1109–1113.
Darvasi, A. and Soller, M. (1992) Selective genotyping for determination of linkage between a molecular
marker and a quantitative trait. Theoretical and Applied Genetics 85, 353–359.
Darvasi, A. and Soller, M. (1994) Selective DNA pooling for determination of linkage between a molecular
marker and a quantitative trait. Genetics 138, 1365–1373.
Darvasi, A. and Soller, M. (1995) Advanced intercross lines, an experimental population for fine genetic
mapping. Genetics 141, 1199–1207.
Darvasi, A. and Soller, M. (1997) A simple method to calculate resolving power and confidence interval of
QTL map location. Behavior Genetics 27, 125–132.
Darvasi, A., Weinreb, A., Minke, V., Weller, J.I. and Soller, M. (1993) Detecting marker-QTL linkage and
estimating QTL gene effect and map location using a saturated genetic map. Genetics 134, 943–951.
Datta, K., Vasquez, A., Tu, J., Torrizo, L., Alam, M.F., Oliva, N., Abrigo, E., Khush, G.S. and Datta, S.K.
(1998) Constitutive and tissue-specific differential expression of cryIA(b) gene in transgenic rice
plants conferring resistance to rice insect pest. Theoretical and Applied Genetics 97, 20–30.
Datta, K., Tu, J., Oliva, N., Ona, I., Velazhahan, R., Mew, T.W., Muthukrishnan, S. and Datta, S.K. (2001)
Enhanced resistance to sheath blight by constitutive expression of infection-related rice chitinase in
transgenic elite indica rice cultivars. Plant Science 160, 405–414.
Datta, K., Baisakh, N., Thet, K.M., Tu, J. and Datta, S.K. (2002) Pyramiding transgenes for multiple resist-
ance in rice against bacterial blight, yellow stem borer and sheath blight. Theoretical and Applied
Genetics 106, 1–8.
Datta, K., Baisakh, N., Oliva, N., Torrizo, L., Abrigo, E., Tan, J., Rai, M., Rehana, S., Al-Babili, S., Beyer,
P., Potrykus, I. and Datta, S.K. (2003) Bioengineered golden indica rice cultivars with beta-carotene
metabolism in the endosperm with hygromycin and mannose selection systems. Plant Biotechnology
Journal 1, 81–90.
Davenport, C.B. (1908) Degeneration, albinism and inbreeding. Science 28, 454–455.
Davenport, G., Ellis, N., Ambrose, M. and Dicks, J. (2004) Using bioinformatics to analyse germplasm
collections. Euphytica 137, 39–54.
Davuluri, R.V. and Zhang, M.Q. (2003) Computer software to find genes in plant genomic DNA. In:
Grotewold, E. (ed.) Methods in Molecular Biology, Vol. 236: Plant Functional Genomics: Methods and
Protocols. Humana Press, Inc., Totowa, New Jersey, pp. 87–107.
Day, C.D., Lee, E., Kobayashi, J., Holappa, L.D., Albert, H. and Ow, D.W. (2000) Transgene integration into
the same chromosome location can produce alleles that express at a predictable level, or alleles that
are differentially silenced. Genes and Development 14, 2869–2880.
Day Rubenstein, K., Heisey, P., Shoemaker, R., Sullivan, J. and Frisvold, G. (2005) Economic Information
Bulletin No. (EIB2), p. 47. Available at: http://www.ers.usda.gov/publications/eib2/ (accessed 17
November 2009).
De Buck, S., Jacobs, A., Van Montagu, M. and Depicker, A. (1999) The DNA sequences of T-DNA junctions
suggest that complex T-DNA loci are formed by a recombination process resembling T-DNA
integration. The Plant Journal 20, 295–304.
De Buck, S., De Wilde, C., Van Montagu, M. and Depicker, A. (2000) T-DNA vector backbone sequences
are frequently integrated into the genome of transgenic plants obtained by Agrobacterium-mediated
transformation. Molecular Breeding 6, 459–468.
De Cosa, B., Moar, W., Lee, S.B., Miller, M. and Daniell, H. (2001) Overexpression of the Bt cry2Aa2 operon
in chloroplasts leads to formation of insecticidal crystals. Nature Biotechnology 19, 71–74.
De Groote, H., Wangare, L., Kanampiu, F., Odendo, M., Diallo, A., Karaya, H. and Friesen, D. (2008) The
potential of a herbicide resistant maize technology for Striga control in Africa. Agricultural Systems
97, 83–94.
De Hoog, C.L. and Mann, M. (2004) Proteomics. Annual Review of Genomics and Human Genetics 5,
267–293.
de Koning, D.J. and Haley, C.S. (2005) Genetical genomics in humans and model organisms. Trends in
Genetics 21, 377–381.
De Neve, M., De Buck, S., Jacobs, A., Van Montagu, M. and Depicker, A. (1997) T-DNA integration patterns
in co-transformed plant cells suggest that T-DNA repeats originate from co-integration of separate
T-DNAs. The Plant Journal 11, 15–29.
De Silva, H.N. and Ball, R.D. (2007) Linkage disequilibrium mapping concepts. In: Oraguzie, N.C.,
Rikkerink, E.H.A., Gardiner, S.E. and De Silva, H.N. (eds) Association Mapping in Plants. Springer,
Berlin, pp. 103–132.
De Vicente, M.C. and Tanksley, S.D. (1991) Genome-wide reduction in recombination of backcross progeny
derived from male versus female gametes in an interspecific cross of tomato. Theoretical and Applied
Genetics 83, 173–178.
De Vicente, M.C. and Tanksley, S.D. (1993) QTL analysis of transgressive segregation in an interspecific
tomato cross. Genetics 134, 585–596.
Dean, R.E., Dahlberg, J.A., Hopkins, M.S. and Kresovich, S. (1999) Genetic redundancy and diversity
among Orange accessions in the U.S. national sorghum collection as assessed with simple sequence
repeat (SSR) markers. Crop Science 39, 1215–1221.
Deimling, S., Röber, F.K. and Geiger, H.H. (1997) Methodik und Genetik der in-vivo-Haploideninduktion bei
Mais. Vortr. Pflanzenzüchtg. 38, 203–224.
DeLacy, I.H. and Cooper, M. (1990) Pattern analysis for the analysis of regional variety trials. In: Kang,
M.S. (ed.) Genotype-by-Environment Interaction and Plant Breeding. Louisiana State University
Agricultural Center, Baton Rouge, Louisiana, pp. 301–334.
DeLacy, I.H., Cooper, M. and Basford, K.E. (1996) Relationships among analytical methods used to study
genotype-by-environment interactions and evaluation of their impact on response to selection. In:
Kang, M.S. and Gauch, H.G., Jr (eds) Genotype-by-Environment Interaction. CRC Press, Boca Raton,
Florida, pp. 51–84.
DellaPenna, D. and Last, R.L. (2008) Genome-enabled approaches shed new light on plant metabolism.
Science 320, 479–481.
Delmer, D.P. (2005) Agriculture in the developing world: connecting innovations in plant research to
downstream applications. Proceedings of the National Academy of Sciences of the United States of
America 102, 15739–15746.
Delseny, M. (2004) Re-evaluating the relevance of ancestral shared synteny as a tool for crop improvement.
Current Opinion in Plant Biology 7, 126–131.
Devlin, B. and Risch, N. (1995) A comparison of linkage disequilibrium measures for fine-scale mapping.
Genomics 29, 311–322.
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM
algorithm. Journal of the Royal Statistical Society Series B 39, 1–38.
Depicker, A., Stachel, S., Dhaese, P., Zambryski, P. and Goodman, H.M. (1982) Nopaline synthase:
transcript mapping and DNA sequence. Journal of Molecular and Applied Genetics 1, 561–573.
Dereuddre, J., Blandin, S. and Hassen, N. (1991) Resistance of alginate-coated somatic embryos of carrot
(Daucus carota L.) to desiccation and freezing in liquid nitrogen: 1. Effects of preculture. Cryo-Letters
12, 125–134.
Desloire, S., Gherbi, H., Laloui, W., Marhadour, S., Clouet, V., Cattolico, L., Falentin, C., Giancola, S.,
Renard, M., Budar, F., Small, I., Caboche, M., Delourme, R. and Bendahmane, A. (2003) Identification
of the fertility restoration locus, Rfo, in radish, as a member of the pentatricopeptide-repeat protein
family. EMBO Reports 4, 588–594.
Devaux, P. and Zivy, M. (1994) Protein markers for anther culturability in barley. Theoretical and Applied
Genetics 88, 701–706.
Devaux, P., Kilian, A. and Kleinhofs, A. (1995) Comparative mapping of the barley genome with male and female
recombination-derived, doubled haploid populations. Molecular and General Genetics 249, 600–608.
DeVerna, J.W., Chetelat, R.T., Rick, C.M. and Stevens, M.A. (1987) Introgression of Solanum lycoper-
sicoides germplasm. In: Nevins, D.J. and Jones, R.A. (eds) Tomato Biotechnology. Proc. Seminar,
University of California, Davis, California, 20–22 August 1986. Plant Biology Vol. 4, Alan R. Liss, New
York, pp. 27–36.
DeVerna, J.W., Rick, C.M., Chetelat, R.T., Lanini, B.J. and Alpert, K.B. (1990) Sexual hybridization of
Lycopersicon esculentum and Solanum rickii by means of a sesquidiploid bridging hybrid. Proceedings
of the National Academy of Sciences of the United States of America 87, 9486–9490.
Dhillon, B.S., Boppenmaier, J., Pollmer, W.G., Hermann, R.G. and Melchinger, A.E. (1993) Relationship of
restriction fragment length polymorphisms among European maize inbreds with ear dry matter yield
of their hybrids. Maydica 38, 245–248.
D'hoop, B.B., Paulo, M.J., Mank, R.A., van Eck, H.J. and van Eeuwijk, F.A. (2008) Association mapping of
quality traits in potato (Solanum tuberosum L.). Theoretical and Applied Genetics 161, 47–60.
Dhungana, P., Eskridge, K.M., Baenziger, P.S., Campbell, B.T., Gill, K.S. and Dweikat, I. (2007) Analysis
of genotype-by-environment interaction in wheat using a structural equation model and chromosome
substitution lines. Crop Science 47, 477–484.
Dias, A.P., Brown, J., Bonello, P. and Grotewold, E. (2003) Metabolite profiling as a functional genomics tool.
In: Grotewold, E. (ed.) Methods in Molecular Biology 236. Plant Functional Genomics: Methods and
Protocols. Humana Press, Totowa, New Jersey, pp. 415–425.
Diatchenko, L., Lau, Y.-F.C., Campbell, A.P., Chenchik, A., Moqadam, F., Huang, B., Lukyanov, S.,
Lukyanov, K., Gurskaya, N., Sverdlov, E.D. and Siebert, P.D. (1996) Suppression subtractive hybridi-
zation: a method for generating differentially regulated or tissue-specific cDNA probes and libraries.
Proceedings of the National Academy of Sciences of the United States of America 93, 6025–6030.
Dijkhuizen, A., Dudley, J.W., Rocheford, T.R., Haken, A.E. and Eckhoff, S.R. (1998) Comparative analysis
for kernel composition using near infrared reflectance and 100g Wetmill Analysis. Cereal Chemistry
75, 266–270.
Dilday, R.H. (1990) Contribution of ancestral lines in the development of new cultivars of rice. Crop Science
30, 905–911.
Dinka, S.J., Campbell, M.A., Demers, T. and Raizada, M.N. (2007) Predicting the size of the progeny
mapping population required to positionally clone a gene. Genetics 176, 2035–2054.
Diretto, G., Al-Babili, S., Tavazza, R., Papacchioli, V., Beyer, P. and Giuliano, G. (2008) Metabolic engineering
of potato carotenoid content through tuber-specific over-expression of a bacterial mini-pathway. PLoS
ONE 2(4), e350. doi:10.1371/journal.pone.0000350. Available at: http://www.plosone.org (accessed
17 November 2009).
Ditt, R.F., Nester, E.W. and Comai, L. (2001) Plant gene expression response to Agrobacterium
tumefaciens. Proceedings of the National Academy of Sciences of the United States of America 98,
10954–10959.
Dixon, A.L., Liang, L., Moffatt, M.F., Chen, W., Heath, S., Wong, K.C., Taylor, J., Burnett, E., Gut, I., Farrall,
M., Lathrop, G.M., Abecasis, G.R. and Cookson, W.O.C. (2007) A genome-wide association study of
global gene expression. Nature Genetics 39, 1202–1207.
Dodds, J.H. (1991) Introduction: conservation of plant genetic resources – the need for tissue culture. In:
Dodds, J.H. (ed.) In Vitro Methods for Conservation of Plant Genetic Resources. Chapman & Hall,
London, pp. 1–9.
Doebley, J. (1992) Molecular systematics and crop evolution. In: Soltis, D.E., Soltis, P.S. and Doyle, J.J.
(eds) Molecular Systematics of Plants. Chapman & Hall, New York, pp. 202–222.
Doebley, J., Stec, A. and Gustus, C. (1995) Teosinte branched1 and the origin of maize: evidence for
epistasis and the evolution of dominance. Genetics 141, 333–346.
Doerge, R.W. and Churchill, G.A. (1996) Permutation tests for multiple loci affecting a quantitative
character. Genetics 142, 285–294.
Doi, K., Izawa, T., Fuse, T., Yamanouchi, U., Kubo, T., Shimatani, Z., Yano, M. and Yoshimura, A. (2004)
Ehd1, a B-type response regulator in rice, confers short-day promotion of flowering and controls
FT-like gene expression independently of Hd1. Genes and Development 18, 926–936.
Doll, J. (1998) The patenting of DNA. Science 280, 689–690.
Dong, Y.S., Cao, Y.S., Zhang, X.Y., Liu, S.C., Wang, L.F., You, G.X., Pang, B.S., Li, L.H. and Jia, J.Z. (2003)
Establishment of candidate core collections in Chinese common wheat germplasm. Journal of Plant
Genetic Resources 4, 1–8.
Donnenwirth, J., Grace, J. and Smith, S. (2004) Intellectual property rights, patents, plant variety protection
and contracts: a perspective from the private sector. IP Strategy Today, No. 9.
Doumas, P., Al-Ghazi, Y., Rothan, C. and Robin, S. (2007) DNA microarrays in plants. In: Morot-Gaudry, J.F.,
Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New Hampshire,
pp. 165–190.
Dreher, K., Khairallah, M., Ribaut, J.M. and Morris, M. (2003) Money matters (I): cost of field and laboratory
procedures associated with conventional and marker-assisted maize breeding at CIMMYT. Molecular
Breeding 11, 221–234.
Dubcovsky, J. (2004) Marker-assisted selection in public breeding programs. The wheat experience. Crop
Science 44, 1895–1898.
Dubcovsky, J., Ramakrishna, W., SanMiguel, P.J., Busso, C.S., Yan, L., Shiloff, B.A. and Bennetzen, J.L.
(2001) Comparative sequence analysis of colinear barley and rice BACs. Plant Physiology 125,
1342–1353.
Dudley, J.W., Saghai Maroof, M.A. and Rufener, G.K. (1991) Molecular markers and grouping of parents in
a maize breeding program. Crop Science 31, 718–723.
Dudley, J.W. (1977) Seventy six generations of selection for oil and protein percentage in maize. In: Pollak,
E., Kempthorne, O. and Bailey, T.B. (eds) Proceedings of International Conference on Quantitative
Genetics. Iowa State University Press, Ames, Iowa, pp. 459–473.
Dudley, J.W. (1993) Molecular markers in plant improvement: manipulation of genes affecting quantitative
traits. Crop Science 33, 660–668.
Dudley, J.W. (1997) Quantitative genetics and plant breeding. Advances in Agronomy 59, 1–23.
Dudley, J.W. (2007) From means to QTL: the Illinois Long-Term Selection Experiment as a case study in
quantitative genetics. Crop Science 47(S3), S20–S31.
Dudley, J.W. (2008) Epistatic interactions in crosses of Illinois High Oil × Illinois Low Oil and of Illinois High
Protein × Illinois Low Protein corn strains. Crop Science 48, 59–68.
Dudley, J.W. and Lambert, R.J. (1992) Ninety generations of selection for oil and protein in maize. Maydica
37, 81–87.
Dudley, J.W. and Lambert, R.J. (2004) 100 generations of selection for oil and protein in corn. Plant Breeding
Reviews 24 (Part 1), 79–110.
Dudley, J.W., Lambert, R.J. and Alexander, D.E. (1974) Seventy generations of selection for oil and protein
concentration in the maize kernel. In: Dudley, J.W. (ed.) Seventy Generations of Selection for Oil and
Protein in Maize. Crop Science Society of America, Madison, Wisconsin, pp. 181–212.
Dudley, J.W., Lambert, R.J. and de la Roche, I.A. (1977) Genetic analysis of crosses among corn strains
divergently selected for percent oil and protein. Crop Science 17, 111–117.
Dunford, R.P., Yano, M., Kurata, N., Sasaki, T., Huestis, G., Rocheford, T. and Laurie, D.A. (2002)
Comparative mapping of the barley Ppd-H1 photoperiod response gene region, which lies close to a
junction between two rice linkage segments. Genetics 161, 825–834.
Dunn, G. and Everitt, B.S. (1982) An Introduction to Mathematical Taxonomy. Cambridge Studies in
Mathematical Biology. Vol. 5. Cambridge University Press, Cambridge, UK.
Dunning, A.M., Durocher, F., Healey, C.S., Teare, M.D., McBride, S.E., Carlomagno, F., Xu, C.-F., Dawson, E.,
Rhodes, S., Ueda, S., Lai, E., Luben, R.N., Van Rensburg, E.J., Mannermaa, A., Kataja, V., Rennart, G.,
Dunham, I., Purvis, I., Easton, D. and Ponder, B.A.J. (2000) The extent of linkage disequilibrium in four
populations with distinct demographic histories. American Journal of Human Genetics 67, 1544–1554.
Dunnington, E.A., Haberefeld, A., Stallard, L.G., Siegel, P.B. and Hillel, J. (1992) Deoxyribonucleic-acid
fingerprint bands linked to loci coding for quantitative traits in chicken. Poultry Science 71, 1251–1258.
Dunwell, J.M. (2005) Intellectual property aspects of plant transformation. Plant Biotechnology Journal 3,
371–384.
Dunwell, J.M. (2006) Patents and transgenic plants. In: Fári, M.G., Holb, I. and Bisztray, G.D. (eds) Proceedings
of Vth International Symposium on In Vitro Culture and Horticultural Breeding. International Society
for Horticultural Science. Acta Horticulturae 725, 719–732.
Dutfield, G. (2003) Protecting Traditional Knowledge and Folklore, Issue Paper 1. International Centre
on Trade and Sustainable Development and United Nations Conference on Trade and Development
Project on Intellectual Property Rights and Sustainable Development, Geneva.
Duvick, D.N. (1977) Major USA crops in 1976. Annals of the New York Academy of Sciences 287,
86–96.
Duvick, D.N. (1984) Genetic contribution to yield gains of U.S. hybrid maize, 1930–1980. In: Fehr, W.R.
(ed.) Genetic Contributions to Yield Gains of Five Major Crop Plants. Crop Science Society of America
(CSSA) Spec. Publ. 7. CSSA and American Society of Agronomy (ASA), Madison, Wisconsin, pp.
15–47.
Duvick, D.N. (1990) Genetic enhancement and plant breeding. In: Janick, J. and Simon, J.E. (eds) Advances
in New Crops. Proc. First National Symposium on New Crops: Research, Development, Economics.
Timber Press, Portland, Oregon, pp. 90–96.
Duvick, D.N. (1999) Heterosis: feeding people and protecting natural resources. In: Coors, J.G. and Pandey,
S. (eds) Genetics and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison, Wisconsin, pp.
19–29.
Duvick, D.N., Smith, J.S.C. and Cooper, M. (2004) Long-term selection in commercial hybrid maize
breeding programs. Plant Breeding Reviews 24 (Part 2), 109–151.
Dwivedi, S.L., Blair, M., Upadhyaya, H.D., Serraj, R., Balaji, J., Buhariwalla, H.K., Ortiz, R. and Crouch,
J.H. (2005) Using genomics to exploit grain legume biodiversity in crop improvement. Plant Breeding
Reviews 26, 176–357.
Dwivedi, S.L., Crouch, J.H., Mackill, D.J., Xu, Y., Blair, M.W., Ragot, M., Upadhyaya, H.D. and Ortiz, R.
(2007) The molecularization of public sector crop breeding: progress, problems and prospects.
Advances in Agronomy 95, 163–318.
Eagles, H.A., Bariana, H.S., Ogbonnaya, F.C., Rebetzke, G.J., Hollamby, G.J., Henry, R.J., Henschke, P.H.
and Carter, M. (2001) Implementation of markers in Australian wheat breeding. Australian Journal of
Agricultural Research 52, 1349–1356.
Eagles, H.A., Hollamby, G.J., Gororo, N.N. and Eastwood, R.F. (2002) Estimation and utilization of glutenin
gene effects from the analysis of unbalanced data from wheat breeding programs. Australian Journal
of Agricultural Research 53, 367–377.
Eagles, H.A., Eastwood, R.F., Hollamby, G.J., Martin, E.M. and Cornish, G.B. (2004) Revision of the
estimates of glutenin gene effects at the Glu-B1 locus from southern Australian wheat breeding
programs. Australian Journal of Agricultural Research 55, 1093–1096.
Eamens, A., Wang, M.-B., Smith, N.A. and Waterhouse, P.M. (2008) RNA silencing in plants: yesterday,
today and tomorrow. Plant Physiology 147, 456–468.
Earley, K.W., Haag, J.R., Pontes, O., Opper, K., Juehne, T., Song, K. and Pikaard, C.S. (2006)
GATEWAY-compatible vectors for plant functional genomics and proteomics. Plant Journal 45, 616–629.
East, E.M. (1908) Inbreeding in corn. Rep. Connecticut Expt. Stat. Years 1907–1908, pp. 419–428.
Eathington, S.R. (2005) Practical applications of molecular technology in the development of commercial
maize hybrids. In: Proceedings of the 60th Annual Corn and Sorghum Seed Research Conferences.
American Seed Trade Association, Washington, DC.
Eathington, S.R., Crosbie, T.M., Edwards, M.D., Reiter, R.S. and Bull, J.K. (2007) Molecular markers in a
commercial breeding program. Crop Science 47(S3), S154–S163.
Eberhart, S.A. and Russell, W.A. (1966) Stability parameters for comparing varieties. Crop Science 6,
36–40.
Ebinuma, H.K., Sugita, K., Matsunaga, E., Endo, S., Yamada, K. and Komamine, A. (2001) Systems for
removal of a selection marker and their combination with a positive marker. Plant Cell Reports 20,
383–392.
Ebinuma, H.K., Sugita, E., Endo, S., Matsunaga, E. and Yamada, K. (2004) Elimination of marker genes
from transgenic plants using MAT vector system. In: Peña, L. (ed.) Methods in Molecular Biology,
vol. 286: Transgenic Plants: Methods and Protocols. Humana Press Inc., Totowa, New Jersey, pp.
237–253.
Eder, J. and Chalyk, S. (2002) In vivo haploid induction in maize. Theoretical and Applied Genetics 104,
703–708.
Edmeades, G.O., Bänziger, M. and Ribaut, J.M. (2000) Maize improvement for drought-limited
environments. In: Otegui, M.E. and Slafer, G.A. (eds) Physiological Bases for Maize Improvement. Food
Products Press, New York, pp. 75–111.
Edwards, D. and Batley, J. (2004) Plant bioinformatics: from genome to phenome. Trends in Biotechnology
22, 232–237.
Edwards, D., Forster, J.W., Chagné, D. and Batley, J. (2007a) What are SNPs? In: Oraguzie, N.C., Rikkerink,
E.H.A., Gardiner, S.E. and De Silva, H.N. (eds) Association Mapping in Plants. Springer, Berlin, pp.
41–52.
Edwards, D., Forster, J.W., Cogan, N.O.I., Batley, J. and Chagné, D. (2007b) Single nucleotide
polymorphism discovery. In: Oraguzie, N.C., Rikkerink, E.H.A., Gardiner, S.E. and De Silva, H.N. (eds)
Association Mapping in Plants. Springer, Berlin, pp. 53–76.
Edwards, J.D., Janda, J., Sweeney, M.T., Gaikwad, A.B., Liu, B., Leung, H. and Galbraith, D.W. (2008)
Development and evaluation of a high-throughput, low-cost genotyping platform based on oligonucle-
otide microarrays in rice. Plant Methods 4, 13.
Edwards, M. and Johnson, L. (1994) RFLPs for rapid recurrent selection. In: Proceedings of Symposium
on Analysis of Molecular Marker Data. American Society of Horticultural Science and Crop Science
Society of America, Corvallis, Oregon, pp. 33–40.
Edwards, M.D. and Page, N.J. (1994) Evaluation of marker-assisted selection through computer simulation.
Theoretical and Applied Genetics 88, 376–382.
Edwards, M.D., Stuber, C.W. and Wendel, J.F. (1987) Molecular-marker-facilitated investigations of quan-
titative trait loci in maize. I. Numbers, genomic distribution and types of gene action. Genetics 116,
113–125.
Edwards, M.D., Helentjaris, T., Wright, S. and Stuber, C.W. (1992) Molecular-marker-facilitated inves-
tigations of quantitative trait loci in maize. 4. Analysis based on genome saturation with isozyme
and restriction fragment length polymorphism markers. Theoretical and Applied Genetics 83,
765–774.
Eisemann, R.L., Cooper, M. and Woodruff, D.R. (1990) Beyond the analytical methodology, better
interpretation and exploiting of G × E interaction in plant breeding. In: Kang, M.S. (ed.)
Genotype-by-Environment Interaction and Plant Breeding. Louisiana State University Agricultural Center,
Baton Rouge, Louisiana, pp. 108–117.
Eitan, Y. and Soller, M. (2004) Selection induced genetic variation. In: Wasser, S. (ed.) Evolutionary Theory
and Processes: Modern Horizon. Papers in honour of Eviatar Nevo. Kluwer Academic Publishers,
Dordrecht, Netherlands, pp. 154–176.
Elston, R.C. (1984) The genetic analysis of quantitative trait differences between two homozygous lines.
Genetics 108, 733744.
Emebiri, L.C. and Moody, D.B. (2006) Heritable basis for some genotype-environment stability statistics:
inference from QTL analysis of heading date in two-rowed barley. Field Crops Research 96, 243251.
Empig, L.T., Gardner, C.O. and Compton, W.A. (1972) Theoretical gains for different population improvement
procedures. Nebraska Agricultural Experiment Station Miscellaneous Publications 26 (revised).
Emrich, S., Li, L., Wen, T.J., Ashlock, D., Aluru, S. and Schnable, P. (2007b) Nearly identical paralogs:
implications for maize (Zea mays L.) genome evolution. Genetics 175, 429–439.
Endo, S., Kasahara, Y., Sugita, K. and Ebinuma, H. (2002a) A new GST-MAT vector containing both the ipt
gene and iaaM/H genes can produce marker-free transgenic plants with high frequency. Plant Cell
Reports 20, 923–928.
Endo, S., Sugita, K., Sakai, M., Tanaka, H. and Ebinuma, H. (2002b) Single-step transformation for generating
marker-free transgenic rice using the ipt-type MAT vector system. The Plant Journal 30, 115–122.
Engelmann, F. and Engels, J.M.M. (2002) Technologies and strategies for ex situ conservation. In: Engels,
J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing Plant Genetic Diversity.
International Plant Genetic Resources Institute, Rome, pp. 89–103.
Engels, J.M.M. and Visser, L. (2003) A guide to effective management of germplasm collections. IPGRI
Handbook for Genebanks No. 6. International Plant Genetic Resources Institute, Rome.
Enserink, M. (2008) Tough lessons from golden rice. Science 320, 468–471.
Eronen, L., Geerts, F. and Toivonen, H. (2004) A Markov chain approach to reconstruction of long haplotypes.
Pacific Symposium on Biocomputing 9, 104–115.
Ervin, D., Batie, S., Welsh, R., Carpentier, C.L., Fern, J.I., Richman, N.J. and Schulz, M.A. (2000) Transgenic
Crops: an Environmental Assessment. Henry A. Wallace Center for Agricultural and Environmental
Policy at Winrock International, Arlington, Virginia.
Erwin, T. (1991) An evolutionary basis for conservation strategies. Science 253, 750–752.
Eshed, Y. and Zamir, D. (1994) A genomic library of Lycopersicon pennellii in L. esculentum: a tool for fine
mapping of genes. Euphytica 79, 175–179.
Eshed, Y. and Zamir, D. (1995) An introgression line population of Lycopersicon pennellii in the cultivated
tomato enables the identification and fine mapping of yield associated QTL. Genetics 141, 1147–1162.
Eshed, Y. and Zamir, D. (1996) Less-than-additive epistatic interactions of quantitative trait loci in tomato.
Genetics 143, 1807–1817.
Esquinas-Alcázar, J.T. (1993) Plant genetic resources. In: Hayward, M.D., Bosemark, N.O. and Romagosa, I.
(eds) Plant Breeding: Principles and Prospects. Chapman & Hall, London, pp. 33–51.
Esquinas-Alcázar, J. (2005) Protecting crop genetic diversity for food security: political, ethical and technical
challenges. Nature Reviews Genetics 6, 946–953.
ETC Group (Action Group on Erosion, Technology and Concentration) (2005) Global seed industry concentration
2005. Communiqué September/October 2005, pp. 1–12.
Etzel, C. and Guerra, R. (2003) Meta-analysis of genetic-linkage of quantitative trait loci. American Journal
of Human Genetics 71, 56–65.
Eujayl, I., Sorrells, M.E., Baum, M., Wolters, P. and Powell, W. (2002) Isolation of EST-derived microsatellite
markers for genotyping the A and B genomes of wheat. Theoretical and Applied Genetics 104,
399–407.
European Parliament (2001) Directive 2001/18/EC of the European Parliament and of the Council of 12
March 2001 on the deliberate release into the environment of genetically modified organisms and
repealing Council Directive 90/220/EEC Commission Declaration. Official Journal of European
Community L 106, 1–39.
Evans, L.T. (1993) Crop Evolution, Adaptation and Yield. Cambridge University Press, New York.
Faham, M., Zheng, J., Moorhead, M., Fakhrai-Rad, H., Namsaraev, E., Wong, K., Wang, Z., Chow,
S.G., Lee, L., Suyenaga, K., Reichert, J., Boudreau, A., Eberle, J., Bruckner, C., Jain, M., Karlin-
Neumann, G., Jones, H.B., Willis, T.D., Buxbaum, J.D. and Davis, R.W. (2005) Multiplexed variation
scanning for 1,000 amplicons in hundreds of patients using mismatch repair detection (MRD) on
tag arrays. Proceedings of the National Academy of Sciences of the United States of America 102,
14717–14722.
Falconer, D.S. (1960) Introduction to Quantitative Genetics. Oliver & Boyd, Edinburgh, UK.
Falconer, D.S. (1981) Introduction to Quantitative Genetics, 2nd edn. Longman, London.
Falconer, D.S. (1989) Introduction to Quantitative Genetics, 3rd edn. Wiley, New York.
Falconer, D.S. and Mackay, T.F.C. (1996) Introduction to Quantitative Genetics, 4th edn. Longman Scientific
& Technical Ltd, Harlow, UK.
Faleiro, F.G., Ragagnin, V.A., Moreira, M.A. and de Barros, E.G. (2004) Use of molecular markers to accelerate
the breeding of common bean lines resistant to rust and anthracnose. Euphytica 138, 213–218.
Falque, M. and Santoni, S. (2007) Molecular markers and high-throughput genotyping analysis. In: Morot-
Gaudry, J.F., Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New
Hampshire, pp. 503–527.
Falque, M., Decousset, L., Dervins, D., Jacob, A.-M., Joets, J., Martinant, J.-P., Raffoux, X., Ribière, N.,
Ridel, C., Samson, D., Charcosset, A. and Murigneux, A. (2005) Linkage mapping of 1454 new maize
candidate gene loci. Genetics 170, 1957–1966.
Falush, D., Stephens, M. and Pritchard, J.K. (2003) Inference of population structure using multilocus genotype
data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587.
Fan, C., Xing, Y., Mao, H., Lu, T., Han, B., Xu, C., Li, X. and Zhang, Q. (2006) GS3, a major QTL for grain
length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane
protein. Theoretical and Applied Genetics 112, 1164–1171.
Fang, Y.-D., Akula, C. and Altpeter, F. (2002) Agrobacterium-mediated barley (Hordeum vulgare L.)
transformation using green fluorescent protein as a visual marker and sequence analysis of the
T-DNA:genomic DNA junctions. Journal of Plant Physiology 159, 1131–1138.
FAO (Food and Agriculture Organization of the United Nations) (1998) The State of the World's Plant
Genetic Resources for Food and Agriculture. FAO, Rome.
Faris, J.D., Laddomada, B. and Gill, B.S. (1998) Molecular mapping of segregation distortion loci in Aegilops
tauschii. Genetics 149, 319–327.
Faris, J.D., Fellers, J.P., Brooks, S.A. and Gill, B.S. (2003) A bacterial artificial chromosome contig spanning
the major domestication locus Q in wheat and identification of a candidate gene. Genetics 164,
311–321.
Fashena, S.J., Serebriiskii, I. and Golemis, E.A. (2000) The continued evolution of two-hybrid screening
approaches in yeast: how to outwit different preys with different baits. Gene 250, 1–14.
Fatokun, C.A., Menancio-Hautea, D.I., Danesh, D. and Young, N.D. (1992) Evidence for orthologous seed
weight genes in cowpea and mung bean based on RFLP mapping. Genetics 132, 841–846.
Fauquet, C.M. and Tohme, J. (2004) The global cassava partnership for genetic improvement. Plant
Molecular Biology 86, v–x (editorial).
Fehr, W.R. (1987) Principles of Cultivar Development. Vol. 1. Theory and Techniques. Macmillan Publishing
Company, London.
Feltus, F.A., Singh, H.P., Lohithaswa, H.C., Schulze, S.R., Silva, T.D. and Paterson, A.H. (2006) A comparative
genomic strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding
sequences in orphan crops. Plant Physiology 140, 1183–1191.
Fenn, J.B., Mann, M., Meng, C.K., Wong, S.F. and Whitehouse, C.M. (1989) Electrospray ionization for the
mass spectrometry of large biomolecules. Science 246, 64–71.
Fernandez-Ricaud, L., Warringer, J., Ericson, E., Pylvanainen, I., Kemp, G.J.L., Nerman, O. and Blomberg,
A. (2005) PROPHECY – a database for high-resolution phenomics. Nucleic Acids Research 33,
D369–D373.
Fernando, R.L. (2002) Methods to map QTL. Available at:
http://meishan.ansci.iastate.edu/rohan/notes-dir/QTL.pdf (accessed 31 December 2007).
Fernando, R.L., Nettleton, D., Southey, B.R., Dekkers, J.C.M., Rothschild, M.F. and Soller, M.
(2004) Controlling the proportion of false positives in multiple dependent tests. Genetics 166,
611–619.
Ferrie, A.M.R. (2007) Doubled haploid production in nutraceutical species: a review. Euphytica 158, 347–357.
Ferro, M., Salvi, D., Rivière-Polland, H., Vernat, T., Seigneurin-Berny, D., Grunwald, D., Garin, J., Joyard,
J. and Rolland, N. (2002) Integral membrane proteins of the chloroplast envelope: identification and
subcellular localization of new transporters. Proceedings of the National Academy of Sciences of the
United States of America 99, 11487–11492.
Fiehn, O. (2002) Metabolomics – the link between genotypes and phenotypes. Plant Molecular Biology
48, 155–171.
Fiehn, O., Wohlgemuth, G., Scholz, M., Kind, T., Lee, D.Y., Lu, Y., Moon, S. and Nikolau, B. (2008) Quality
control for plant metabolomics: reporting MSI-compliant studies. The Plant Journal 53, 691–704.
Fields, S. and Song, O. (1989) A novel genetic system to detect protein–protein interactions. Nature 340,
245–246.
Filipski, A. and Kumar, S. (2005) Comparative genomics in eukaryotes. In: Gregory, T.R. (ed.) The Evolution
of the Genome. Elsevier Inc., Amsterdam, pp. 521–583.
Finak, G., Hallett, M., Park, M. and Pepin, F. (2005) Bioinformatics tools for gene-expression studies.
In: Sensen, C.W. (ed.) Handbook of Genome Research. Genomics, Proteomics, Metabolomics,
Bioinformatics, Ethical and Legal Issues. WILEY-VCH, Weinheim, Germany, pp. 415–434.
Finlay, K.W. and Wilkinson, G.N. (1963) The analysis of adaptation in a plant-breeding programme.
Australian Journal of Agricultural Research 14, 742–754.
Fire, A., Xu, S., Montgomery, M., Kostas, S., Driver, S. and Mello, C. (1998) Potent and specific genetic
interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806–811.
Fisher, R.A. (1918) The correlation between relatives on the supposition of Mendelian inheritance.
Transactions of the Royal Society of Edinburgh, Earth Sciences 52, 399–433.
Fisher, R.A. (1935) The detection of linkage with dominant abnormalities. Annals of Eugenics 6, 187–201.
Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7,
179–188.
Fisk, H.J. and Dandekar, A.M. (2004) Electroporation. In: Peña, L. (ed.) Methods in Molecular Biology, Vol.
286. Transgenic Plants: Methods and Protocols. Humana Press Inc., Totowa, New Jersey, pp. 79–90.
Flint, J. and Mott, R. (2001) Finding the molecular basis of quantitative traits: successes and pitfalls. Nature
Reviews Genetics 2, 437–445.
Flint-Garcia, S.A., Thornsberry, J.M. and Buckler, E.S. (2003) Structure of linkage disequilibrium in plants.
Annual Review of Plant Biology 54, 357–374.
Florea, L., Hartzell, G., Zhang, Z., Rubin, G.G. and Miller, W. (1998) A computer program for aligning a
cDNA sequence with a genomic DNA sequence. Genome Research 8, 967–974.
Flores, F., Moreno, M.T. and Cubero, J.I. (1998) A comparison of univariate and multivariate methods to
analyze G × E interaction. Field Crops Research 56, 271–286.
Fodor, S., Dower, W. and Solas, D. (1998) Detection of nucleic acid sequences. Patent EP 0834576.
Fofana, I.B.F., Sangaré, A., Collier, R., Taylor, C. and Fauquet, C.M. (2004) A geminivirus-induced gene
silencing system for gene function validation in cassava. Plant Molecular Biology 56, 613–624.
Foolad, M.R. and Jones, R.A. (1992) Models to estimate maternally controlled genetic variation in quantitative
seed characters. Theoretical and Applied Genetics 83, 360–366.
Foolad, M.R. and Jones, R.A. (1993) Mapping salt-tolerance genes in tomato (Lycopersicon esculentum)
using trait-based marker analysis. Theoretical and Applied Genetics 87, 184–192.
Forster, B.P. and Thomas, W.T.B. (2004) Doubled haploids in genetics and plant breeding. Plant Breeding
Reviews 25, 57–88.
Forster, B.P., Ellis, R.P., Thomas, W.T.B., Newton, A.C., Tuberosa, R., This, D., El-Enein, R.A., Bahri, M.H.
and Ben Salem, M. (2000) The development and application of molecular markers for abiotic stress.
Journal of Experimental Botany 51, 19–27.
Forster, B.P., Heberle-Bors, E., Kasha, K.J. and Touraev, A. (2007) The resurgence of haploids in higher
plants. Trends in Plant Science 12, 368–375.
Foster, G.D. and Twell, D. (eds) (1996) Plant Gene Isolation: Principles and Practice. John Wiley & Sons,
Chichester, UK, 426 pp.
Fowler, C. and Hodgkin, T. (2004) Plant genetic resources for food and agriculture: assessing global availability.
Annual Review of Environment and Resources 29, 143–179.
Fowler, C. and Lower, R.L. (2005) Politics of plant breeding. Plant Breeding Reviews 25, 21–55.
Fowler, C., Hawtin, G., Ortiz, R., Iwanaga, M. and Engels, J. (2005) The questions and derivatives: promoting
use and ensuring availability of non-proprietary plant genetic resources. The Journal of World
Intellectual Property 7, 641–663.
Fox, P.N., Crossa, J. and Romagosa, I. (1997) Multi-environment testing and genotype × environment
interaction. In: Kempton, R.A. and Fox, P.N. (eds) Statistical Methods for Plant Variety Evaluation.
Chapman & Hall, London, pp. 117–138.
Fraley, R. (2006) Presentation at Monsanto European Investor Day, 10 November 2006. Available at:
http://www.monsanto.com (accessed 17 November 2009).
Fraley, R.T., Rogers, S.G. and Horsch, R.B. (1986) Genetic transformation in higher plants. Critical Reviews
in Plant Sciences 4, 1–46.
Francia, E., Tacconi, G., Crosatti, C., Barabaschi, D., Bulgarelli, D., Dall'Aglio, E. and Vale, G. (2005) Marker
assisted selection in crop plants. Plant Cell, Tissue and Organ Culture 82, 317–342.
Franco, J., Crossa, J., Taba, S. and Shands, H. (2005) A sampling strategy for conserving genetic diversity
when forming core subsets. Crop Science 45, 1035–1044.
Franco, J., Crossa, J., Warburton, M.L. and Taba, S. (2006) Sampling strategies for conserving maize diversity
when forming core subsets using genetic markers. Crop Science 46, 854–864.
François, I., Broekaert, W. and Cammue, B. (2002a) Different approaches for multi-transgene-stacking in
plants. Plant Science 163, 281–295.
François, I.E.J.A., De Bolle, M.F.C., Dwyer, G., Goderis, I.J.W.M., Wouters, P.F.J., Verhaert, P., Proost, P.,
Schaaper, W.M.M., Cammue, B.P.A. and Broekaert, W.F. (2002b) Transgenic expression in Arabidopsis
thaliana of a polyprotein construct leading to production of two different antimicrobial proteins. Plant
Physiology 128, 1346–1358.
François, I.E.J.A., Dwyer, G.I., De Bolle, M.F.C., Goderis, I.J.W.M., van Hemelrijck, W., Proost, P., Wouters,
P.F.J., Broekaert, W.F. and Cammue, B.P.A. (2002c) Processing in transgenic Arabidopsis thaliana
plants of polyproteins with linker peptide variants derived from the Impatiens balsamina antimicrobial
polyprotein precursor. Plant Physiology and Biochemistry 40, 871–879.
Frankel, O. (1984) Genetic perspectives of germplasm conservation. In: Arber, W., Illmensee, K., Peacock,
W.J. and Starlinger, P. (eds) Genetic Manipulation: Impact on Man and Society. Cambridge University
Press, Cambridge, UK, pp. 161–170.
Frankel, O.H. (1986) Genetic resources – museum or utility. In: Williams, T.A. and Wratt, G.S. (eds) Plant
Breeding Symposium, DSIR 1986. Agronomy Society of New Zealand, Christchurch, pp. 3–7.
Frankel, O.H. and Brown, A.H.D. (1984) Current plant genetic resources: a critical appraisal. In: Genetics:
New Frontiers (Vol. IV). Oxford & IBH, New Delhi.
Frankel, W.N. (1995) Taking stock of complex trait genetics in mice. Trends in Genetics 11, 471–477.
Frary, A., Nesbitt, T.C., Frary, A., Grandillo, S., van de Knaap, E., Cong, B., Liu, J., Meller, J., Elber, R.,
Alpert, K.B. and Tanksley, S.D. (2000) fw2.2: a quantitative trait locus key to the evolution of tomato
fruit size. Science 289, 85–88.
Frascaroli, E., Canè, M.A., Landi, P., Pea, G., Gianfranceschi, L., Villa, M., Morgante, M. and Pè, M.E.
(2007) Classical genetic and quantitative trait loci analyses of heterosis in a maize hybrid between
two elite inbred lines. Genetics 176, 625–644.
Frawley, W.J., Piatetsky-Shapiro, G. and Matheus, C.J. (1991) Knowledge discovery in databases: an overview.
In: Piatetsky-Shapiro, G. and Frawley, W.J. (eds) Knowledge Discovery in Databases. AAAI
Press, Menlo Park, California and MIT Press, Cambridge, Massachusetts, pp. 1–27.
Freeman, G.H. (1973) Statistical methods for the analysis of genotype–environment interactions. Heredity
31, 339–354.
Freudenreich, C.H., Stavenhagen, J.B. and Zakian, V.A. (1997) Stability of CTG:CAG trinucleotide repeat in
yeast is dependent on its orientation in the genome. Molecular and Cell Biology 4, 2090–2098.
Fridman, E., Pleban, T. and Zamir, D. (2000) A recombination hotspot delimits a wild-species quantitative
trait locus for tomato sugar content to 484 bp within an invertase gene. Proceedings of the National
Academy of Sciences of the United States of America 97, 4718–4723.
Fridman, E., Carrari, F., Liu, Y.S., Fernie, A.R. and Zamir, D. (2004) Zooming in on a quantitative trait for
tomato yield using interspecific introgressions. Science 305, 1786–1789.
Friedman, C., Borlawsky, T., Shagina, L., Xing, H.R. and Lussier, Y.A. (2006) Bio-ontology and text: bridging
the modelling gap. Bioinformatics 22, 2421–2429.
Frisch, M. (2004) Breeding strategies: optimum design of marker-assisted backcross programs. In: Lörz, H.
and Wenzel, G. (eds) Biotechnology in Agriculture and Forestry, Vol. 55. Molecular Marker Systems in
Plant Breeding and Crop Improvement. Springer-Verlag, Berlin, pp. 319–334.
Frisch, M. and Melchinger, A.E. (2001) Marker-assisted backcrossing for simultaneous introgression of two
genes. Crop Science 41, 1716–1725.
Frisch, M. and Melchinger, A.E. (2005) Selection theory for marker-assisted backcrossing. Genetics 170,
909–917.
Frisch, M. and Melchinger, A.E. (2008) Precision of recombination frequency estimates after random intermating
with finite population sizes. Genetics 178, 597–600.
Frisch, M., Bohn, M. and Melchinger, A.E. (1999a) Comparison of selection strategies for marker-assisted
backcrossing of a gene. Crop Science 39, 1295–1301.
Frisch, M., Bohn, M. and Melchinger, A.E. (1999b) Minimum sample size and optimal positioning of flanking
markers in marker-assisted backcrossing for transfer of a target gene. Crop Science 39, 967–975.
Frisch, M., Bohn, M. and Melchinger, A.E. (2000) PLABSIM: software for simulation of marker-assisted
backcrossing. Journal of Heredity 91, 86–87.
Fu, H. and Dooner, H.K. (2002) Intraspecific violation of genetic colinearity and its implications in maize.
Proceedings of the National Academy of Sciences of the United States of America 99, 9573–9578.
Fu, X.D., Duc, L.T., Fontana, S., Bong, B.B., Tinjuangjun, P., Sudhakar, D., Twyman, R.M., Christou, P. and
Kohli, A. (2000) Linear transgene constructs lacking vector backbone sequences generate low-copy
number transgenic plants with simple integration patterns. Transgenic Research 9, 11–19.
Fu, Y., Wen, T.J., Ronin, Y.I., Chen, H.D., Guo, L., Mester, D.I., Yang, Y., Lee, M., Korol, A.B., Ashlock, D.A.
and Schnable, P.S. (2006) Genetic dissection of intermated recombinant inbred lines using a new
genetic map of maize. Genetics 174, 1671–1683.
Fu, Y.B., Peterson, G.W., Williams, D., Richards, K.W. and Fetch, J.M. (2005) Patterns of AFLP variation
in a core subset of cultivated hexaploid oat germplasm. Theoretical and Applied Genetics 111,
530–539.
Fulton, T.M., Beck-Bunn, T., Emmatty, D., Eshed, Y., Lopez, J., Petiard, V., Uhlig, J., Zamir, D. and Tanksley,
S.D. (1997) QTL analysis of an advanced backcross of Lycopersicon peruvianum to the cultivated
tomato and comparisons with QTLs found in other wild species. Theoretical and Applied Genetics
95, 881–894.
Fulton, T.M., van der Hoeven, R., Eannetta, N.T. and Tanksley, S.D. (2002) Identification, analysis and utilization
of conserved ortholog set markers for comparative genomics in higher plants. The Plant Cell
14, 1457–1467.
Furtado, A. and Henry, R.J. (2005) The wheat Em promoter drives reporter gene expression in embryo and
aleurone tissue of transgenic barley and rice. Plant Biotechnology Journal 3, 421–434.
Gabriel, K.R. (1971) The biplot graphic display of matrices with application to principal component analysis.
Biometrika 58, 453–467.
Gabriel, K.R. (1978) Least squares approximation of matrices by additive and multiplicative models. Journal
of the Royal Statistical Society, Series B 40, 186–196.
Gale, M.D. (1975) High α-amylase – breeding and genetical aspects of the problem. Cereal Research
Communications 4, 231–243.
Gale, M.D. and Devos, K.M. (1998) Comparative genetics in the grasses. Proceedings of the National
Academy of Sciences of the United States of America 95, 1971–1974.
Galinat, W.C. (1977) The origin of corn. In: Sprague, G.F. (ed.) Corn and Corn Improvement, 2nd edn.
American Society of Agronomy, Madison, Wisconsin, pp. 1–48.
Gallais, A. and Bordes, J. (2007) The use of doubled haploids in recurrent selection and hybrid development
in maize. Crop Science 47(S3), S190–S201.
Gallais, A., Moreau, L. and Charcosset, A. (2007) Detection of marker–QTL associations by studying
change in marker frequencies with selection. Theoretical and Applied Genetics 114, 669–681.
Galperin, M.Y. (2008) The molecular biology database collection: 2008 update. Nucleic Acids Research
36, D2–D4.
Galperin, M.Y. and Koller, E. (2006) New metrics for comparative genomics. Current Opinion in Biotechnology
17, 440–447.
Gao, S., Martinez, C., Skinner, D.J., Krivanek, A.F., Crouch, J.H. and Xu, Y. (2008) Development of a
seed DNA-based genotyping system for marker-assisted selection in maize. Molecular Breeding 22,
477–494.
Gao, Z., Xie, X., Ling, Y., Muthukrishnan, S. and Liang, G.H. (2005) Agrobacterium tumefaciens-mediated
sorghum transformation using a mannose selection system. Plant Biotechnology Journal 3,
591–599.
Garcia, A.A., Kido, E.A., Meza, A.N., Souza, H.M., Pinto, L.R., Pastina, M.M., Leite, C.S., Silva, J.A., Ulian,
E.C., Figueira, A. and Souza, A.P. (2006) Development of an integrated genetic map of a sugarcane
(Saccharum spp.) commercial cross, based on a maximum-likelihood approach for estimation of linkage
and linkage phases. Theoretical and Applied Genetics 112, 298–314.
Gauch, H.G., Jr (1988) Model selection and validation for yield trials with interaction. Biometrics 44,
705–715.
Gauch, H.G. (2006) Statistical analysis of yield trials by AMMI and GGE. Crop Science 46, 1488–1500.
Gauch, H.G. and Zobel, R.W. (1988) Predictive and postdictive success of statistical analysis of yield trials.
Theoretical and Applied Genetics 76, 1–10.
Gauch, H.G. and Zobel, R.W. (1996) AMMI analysis of yield trials. In: Kang, M.S. and Gauch, H.G., Jr (eds)
Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida, pp. 85–122.
Gauch, H.G. and Zobel, R.W. (1997) Identifying mega-environments and targeting genotypes. Crop
Science 37, 311–326.
Gauch, H.G., Piepho, H.-P. and Annicchiarico, P. (2008) Statistical analysis of yield trials by AMMI and
GGE: further considerations. Crop Science 48, 866–889.
Gaunt, T.R., Rodriguez, S., Zapata, C. and Day, I.N.M. (2006) MIDAS: software for analysis and visualisation
of interallelic disequilibrium between multiallelic markers. BMC Bioinformatics 7, 227.
Gaut, B.S. and Ross-Ibarra, J. (2008) Selection on major components of angiosperm genomes. Science
320, 484–486.
Gayen, P., Madan, J.K., Kumar, R. and Sarkar, K.R. (1994) Chromosome doubling in haploids through
colchicine. Maize Genetics Cooperation Newsletter 68, 65.
Gebhardt, C., Ballvora, A., Walkemeier, B., Oberhagemann, P. and Schüler, K. (2004) Assessing genetic
potential in germplasm collections of crop plants by marker–trait association: a case study for potatoes
with quantitative variation of resistance to late blight and maturity type. Molecular Breeding 13,
93–102.
Gedil, M.A., Wye, C., Berry, S., Segers, B., Peleman, J., Jones, R., Leon, A., Slabaugh, M.B. and Knapp,
S.J. (2001) An integrated restriction fragment length polymorphism-amplified fragment length polymorphism
linkage map for cultivated sunflower. Genome 44, 213–221.
Geldermann, H. (1975) Investigations on inheritance of quantitative characters in animals by gene markers.
I. Methods. Theoretical and Applied Genetics 46, 319–330.
Geleta, L.F., Labuschagne, M.T. and Viljoen, C.D. (2004) Relationship between heterosis and genetic distance
based on morphological traits and AFLP markers in pepper. Plant Breeding 123, 467–473.
Gelfand, M.S., Mironov, A.A. and Pevzner, P.A. (1996) Gene recognition via spliced sequence alignment.
Proceedings of the National Academy of Sciences of the United States of America 93, 9061–9066.
Gene Ontology Consortium (2000) Gene ontology: tool for the unification of biology. Nature Genetics 25,
25–29.
George, E.I. and McCulloch, R.E. (1993) Variable selection via Gibbs sampling. Journal of The American
Statistical Association 91, 883–904.
Georgiady, M.S., Whitkus, R.W. and Lord, E.M. (2002) Genetic analysis of traits distinguishing outcrossing
and self-pollinating forms of currant tomato, Lycopersicon pimpinellifolium (Jusl.) Mill. Genetics 161,
333–344.
Gepts, P. (2006) Plant genetic resources conservation and utilization: the accomplishments and future of a
societal insurance policy. Crop Science 46, 2278–2292.
Gerdes, J.T. and Tracy, W.F. (1993) Pedigree diversity within the Lancaster Surecrop heterotic group of
maize. Crop Science 33, 334–337.
Gerdes, J.T., Behr, C.F., Coors, J.G. and Tracy, W.F. (1993) Compilation of North America Maize Breeding
Programs. Crop Science Society of America, Madison, Wisconsin.
Gernand, D., Rutten, T., Varshney, A., Rubtsova, M., Prodanovic, S., Br, C., Kumlehn, J., Matzk, F. and
Houben, A. (2005) Uniparental chromosome elimination at mitosis and interphase in wheat and pearl
millet crosses involves micronucleus formation, progressive heterochromatinization and DNA fragmentation.
The Plant Cell 17, 2431–2438.
Gerry, N.P., Witowski, N.E., Day, J., Hammer, R.P., Barany, G. and Barany, F. (1999) Universal DNA microarray
method for multiplex detection of low abundance point mutations. Journal of Molecular Biology
292, 251–262.
Gethi, J.G., Labate, J.A., Lamkey, K.R., Smith, M.E. and Kresovich, S. (2002) SSR variation in important
U.S. maize inbred lines. Crop Science 42, 951–957.
Gibbon, B.C. and Larkins, B.A. (2005) Molecular genetic approaches to developing quality protein maize.
Trends in Genetics 21, 227–233.
Gibrat, J.F. and Marin, A. (2007) Detecting protein function from genome sequences. In: Morot-Gaudry, J.F.,
Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New Hampshire,
pp. 87–106.
Gibson, G. and Weir, B. (2005) The quantitative genetics of transcription. Trends in Genetics 21, 616–623.
Gibson, S. and Somerville, C. (1993) Isolating plant genes. Trends in Biotechnology 11, 306–313.
Gill, B.S., Appels, R., Botha-Oberholster, A.-M., Buell, C.R., Bennetzen, J.L., Chalhoub, B., Chumley, F.,
Dvořák, J., Iwanaga, M., Keller, B., Li, W., McCombie, W.R., Ogihara, Y., Quetier, F. and Sasaki, T.
(2004) A workshop report on wheat genome sequencing: International Genome Research on Wheat
Consortium. Genetics 168, 1087–1096.
Gimelfarb, A. and Lande, R. (1994a) Simulation of marker-assisted selection in hybrid populations.
Genetical Research 63, 39–47.
Gimelfarb, A. and Lande, R. (1994b) Simulation of marker-assisted selection for non-additive traits.
Genetical Research 64, 127–136.
Gimelfarb, A. and Lande, R. (1995) Marker-assisted selection and marker-QTL associations in hybrid populations.
Theoretical and Applied Genetics 91, 522–528.
Giovannoni, J.J., Wing, R.A., Ganal, M.W. and Tanksley, S.D. (1991) Isolation of molecular markers from
specific chromosome intervals using DNA pools from existing populations. Nucleic Acids Research
19, 6553–6558.
Gish, W. and States, D.J. (1993) Identification of protein coding regions by database similarity search.
Nature Genetics 3, 266–272.
Gizlice, Z., Carter, T.E., Jr and Burton, J.W. (1993) Genetic diversity in North American soybean: II.
Prediction of heterosis in F2 populations of southern founding stock using genetic similarity measures.
Crop Science 33, 620–626.
Glass, G.V. (1976) Primary, secondary and meta-analysis of research. Educational Researcher 5, 3–8.
Glazier, A.M., Nadeau, J.H. and Aitman, T.J. (2002) Finding genes that underlie complex traits. Science
298, 2345–2349.
Gleave, A.P., Mitra, D.S., Mudge, S.R. and Morris, B.A.M. (1999) Selectable marker-free transgenic plants
without sexual crossing: transient expression of cre recombinase and use of a conditional lethal dominant
gene. Plant Molecular Biology 40, 223–235.
Gleba, Y., Marillonnet, S. and Klimyuk, V. (2004) Engineering viral expression vectors for plants: the full
virus and the deconstructed virus strategies. Current Opinion in Plant Biology 7, 182–188.
Gleba, Y., Klimyuk, V. and Marillonnet, S. (2005) Magnifection – a new platform for expressing recombinant
vaccines in plants. Vaccine 23, 2042–2048.
Goderis, I.J.W.M., De Bolle, M.F.C., François, I.E.J.A., Wouters, P.F.J., Broekaert, W.F. and Cammue, B.P.A.
(2002) A set of modular plant transformation vectors allowing flexible insertion of up to six expression
units. Plant Molecular Biology 50, 17–27.
Godshalk, E.B., Lee, M. and Lamkey, K.R. (1990) Relationship of restriction fragment length polymorphisms
to single-cross hybrid performance of maize. Theoretical and Applied Genetics 80,
273–280.
Goedeke, S., Hensel, G., Kapusi, E., Gahrtz, M. and Kumlehn, J. (2007) Transgenic barley in fundamental
research and biotechnology. Transgenic Plant Journal 1, 104–117.
Goff, S.A., Ricke, D., Lan, T.H., Presting, G., Wang, R., Dunn, M., Glazebrook, J., Sessions, A., Oeller,
P., Varma, H., Hadley, D., Hutchison, D., Martin, C., Katagiri, F., Lange, B.M., Moughamer, T., Xia,
Y., Budworth, P., Zhong, J., Miguel, T., Paszkowski, U., Zhang, S., Colbert, M., Sun, W.L., Chen, L.,
Cooper, B., Park, S., Wood, T.C., Mao, L., Quail, P., Wing, R., Dean, R., Yu, Y., Zharkikh, A., Shen, R.,
Sahasrabudhe, S., Thomas, A., Cannings, R., Gutin, A., Pruss, D., Reid, J., Tavtigian, S., Mitchell, J.,
Eldredge, G., Scholl, T., Miller, R.M., Bhatnagar, S., Adey, N., Rubano, T., Tusneem, N., Robinson,
R., Feldhaus, J., Macalma, T., Oliphant, A. and Briggs, S. (2002) A draft sequence of the rice genome
(Oryza sativa L. ssp. japonica). Science 296, 92–100.
Goffinet, B. and Gerber, S. (2000) Quantitative trait loci: a meta-analysis. Genetics 155, 463–473.
Goldman, I.L. (1999) Inbreeding and outbreeding in the development of a modern heterosis concept. In:
Coors, J.G. and Pandey, S. (eds) Genetics and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA,
Madison, Wisconsin, pp. 718.
Goldman, I.L. (2000) Prediction in plant breeding. Plant Breeding Reviews 19, 1540.
Goldman, I.L., Rocheford, T.R. and Dudley, J.W. (1993) Quantitative trait loci influencing protein and starch
concentration in the Illinois long term selection maize strains. Theoretical and Applied Genetics 87,
217224.
Goldman, I.L., Rocheford, T.R. and Dudley, J.W. (1994) Molecular markers associated with maize kernel oil
concentration in the Illinois High Protein Illinois Low Protein Cross. Crop Science 34, 908915.
Goldsbrough, A.P., Lastrella, C.N. and Yoder, J.I. (1993) Transposition mediated re-positioning and subse-
quent elimination of marker genes from transgenic tomato. Bio/Technology 11, 12861292.
Gollob, H.F. (1968) A statistical model which combines features of factor analytic and analysis of variance.
Psychometrika 33, 73115.
Goodin, M.M., Dietzgen, R.G., Schichnes, D., Ruzin, S. and Jackson, A.O. (2002) pGD vectors: versatile
tools for the expression of green and red fluorescent protein fusions in agroinfiltrated plant leaves. The
Plant Journal 31, 375–383.
Goodman, R.E., Vieths, S., Sampson, H.A., Hill, D., Ebisawa, M., Taylor, S.L. and van Ree, R. (2008)
Allergenicity assessment of genetically modified crops – what makes sense? Nature Biotechnology
26, 73–81.
Goodnight, C.J. (2004) Gene interaction and selection. Plant Breeding Reviews 24 (Part 2), 269–291.
Gorg, A., Obermaier, C., Boguth, G. and Weiss, W. (1999) Recent developments in two-dimensional gel
electrophoresis with immobilized pH gradients: wide pH gradients up to pH 12, longer separation
distances and simplified procedures. Electrophoresis 20, 712–717.
Grandillo, S. and Tanksley, S.D. (1996) QTL analysis of horticultural traits differentiating the cultivated tomato
from the closely related species Lycopersicon pimpinellifolium. Theoretical and Applied Genetics 92,
935–951.
Graner, A., Jahoor, A., Schondelmaier, J., Siedler, H., Pollen, K., Fischbeck, G., Wenzel, G. and
Herrmann, R.G. (1991) Construction of an RFLP map of barley. Theoretical and Applied Genetics
83, 250–256.
Grapes, L., Dekkers, J.C.M., Rothschild, M.F. and Fernando, R.L. (2004) Comparing linkage disequilibrium-based
methods for fine mapping quantitative trait loci. Genetics 166, 1561–1570.
Green, P.J. (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination.
Biometrika 82, 711–732.
Greenbaum, D., Smith, A. and Gerstein, M. (2005) Impediments to database interoperation: legal issues
and security concerns. Nucleic Acids Research 33, D3–D4.
Greene, S.L. and Guarino, L. (eds) (1999) Linking Genetic Resources and Geography: Emerging Strategies
for Conserving and Using Crop Biodiversity. American Society of Agronomy (ASA) and Crop Science
Society of America (CSSA), Madison, Wisconsin.
Gregory, B.D., Yazaki, J. and Ecker, J.R. (2008) Utilizing tiling microarrays for whole-genome analysis in
plants. The Plant Journal 53, 636–644.
Groos, C., Robert, N., Bervas, E. and Charmet, G. (2003) Genetic analysis of grain protein-content,
grain yield and thousand-kernel weight in bread wheat. Theoretical and Applied Genetics 106,
1032–1040.
Grosset, J., Alary, R., Gautier, M.F., Menossi, M., Martinez-Izquierdo, J.A. and Joudrier, P. (1997)
Characterization of a barley gene coding for an alpha-amylase inhibitor subunit (CMd protein) and
analysis of its promoter in transgenic tobacco plants and in maize kernels by microprojectile bombardment.
Plant Molecular Biology 34, 331–338.
Grupe, A., Germer, S., Usuka, J., Aud, D., Belknap, J.K., Klein, R.F., Ahluwalia, M.K., Higuchi, R. and Peltz,
G. (2001) In silico mapping of complex disease-related traits in mice. Science 292, 1915–1918.
Gu, S., Pakstis, A.J. and Kidd, K.K. (2005) HAPLOT: a graphical comparison of haplotype blocks, tagSNP
sets and SNP variation for multiple populations. Bioinformatics 21, 3938–3939.
Guidetti, G. (1998) Seed terminator and mega-merger threaten food and freedom. Available at: http://www.
sustainable-city.org/articles/terminat.htm (accessed 17 November 2009).
Guo, B., Sleper, D.A., Sun, J., Nguyen, H.T., Arelli, P.R. and Shannon, J.G. (2006) Pooled analysis of data from
multiple quantitative trait locus mapping populations. Theoretical and Applied Genetics 113, 39–48.
Guo, M., Rupe, M.A., Zinselmeier, C., Habben, J., Bowen, B.A. and Smith, O.S. (2004) Allelic variation of
gene expression in maize hybrids. The Plant Cell 16, 1707–1716.
Guo, M., Rupe, M.A., Yang, X., Crasta, O., Zinselmeier, C., Smith, O.S. and Bowen, B. (2006) Genome-wide
transcript analysis of maize hybrids: allelic additive gene expression and yield heterosis. Theoretical
and Applied Genetics 113, 831–845.
Gupta, P.K. and Rustgi, S. (2004) Molecular markers from the transcribed/expressed region of the genome
in higher plants. Functional and Integrative Genomics 4, 139–162.
Gur, A. and Zamir, D. (2004) Unused natural variation can lift yield barriers in plant breeding. PLoS Biology
2(10), e245.
Gurib-Fakim, A. (2006) Medicinal plants: traditions of yesterday and drugs of tomorrow. Molecular Aspects
of Medicine 27, 1–93.
Haanstra, J.P.W., Wye, C., Verbakel, H., Meijer-Dekens, F., Van den Berg, P., Odinot, P., van Heusden,
A.W., Tanksley, S., Lindhout, P. and Peleman, J. (1999) An integrated high-density RFLP-AFLP map of
tomato based on two Lycopersicon esculentum × L. pennellii F2 populations. Theoretical and Applied
Genetics 99, 254–271.
Haberer, G., Young, S., Bharati, A.K., Gundlach, H., Raymond, C., Fuks, G., Butler, E., Wing, R.A., Rounsley,
S., Birren, B., Nusbaum, C., Mayer, K.F.X. and Messing, J. (2005) Structure and architecture of the
maize genome. Plant Physiology 139, 1612–1624.
Hackett, C.A., Meyer, R.C. and Thomas, W.T.B. (2001) Multi-trait QTL mapping in barley using multivariate
regression. Genetical Research 77, 95–106.
Hagberg, A. and Hagberg, G. (1980) High frequency of spontaneous haploids in the progeny of an induced
mutation barley. Hereditas 93, 341–343.
Hahn, W.J. and Grifo, F.T. (1996) Molecular markers in plant conservation genetics. In: Sobral, B.W.S. (ed.)
The Impact of Plant Molecular Genetics. Birkhäuser, Boston, Massachusetts, pp. 113–136.
Hajdukiewicz, P., Svab, Z. and Maliga, P. (1994) The small, versatile pPZP family of Agrobacterium binary
vectors for plant transformation. Plant Molecular Biology 25, 989–994.
Hajdukiewicz, P.T.J., Gilbertson, L. and Staub, J.M. (2001) Multiple pathways for Cre/lox-mediated recombination
in plastids. The Plant Journal 27, 161–170.
Haldane, J.B.S. (1919) The combination of linkage values and the calculation of distance between the loci
of linkage factors. Journal of Genetics 8, 299–309.
Haldane, J.B.S. and Smith, C.A.B. (1947) A new estimate of the linkage between the genes for colour-blindness
and haemophilia in man. Annals of Eugenics 14, 10–31.
Haldane, J.B.S. and Waddington, C.H. (1931) Inbreeding and linkage. Genetics 16, 357–374.
Haldrup, A., Petersen, S.G. and Okkels, F.T. (1998a) Positive selection: a plant selection principle based on
xylose isomerase, an enzyme used in the food industry. Plant Cell Reports 18, 76–81.
Haldrup, A., Petersen, S.G. and Okkels, F.T. (1998b) The xylose isomerase gene from Thermoanaerobacterium
thermosulfurogenes allows effective selection of transgenic plant cells using D-xylose as the selection
agent. Plant Molecular Biology 37, 287–296.
Haley, C. (1999) Advances in quantitative trait locus mapping. In: Dekkers, J.C.M., Lamont, S.J. and
Rothschild, M.F. (eds) From Jay Lush to Genomics: Visions for Animal Breeding and Genetics. Animal
Breeding and Genetics Group, Department of Animal Science, Iowa State University, Ames, Iowa,
pp. 47–59.
Haley, C.S. and Knott, S.A. (1992) A simple regression method for mapping quantitative trait loci in line
crosses using flanking markers. Heredity 69, 315–324.
Haley, C.S., Knott, S.A. and Elsen, J.-M. (1994) Mapping quantitative trait loci in crosses between outbred
lines using least squares. Genetics 136, 1195–1207.
Halfhill, M.D., Richards, H.A., Mabon, S.A. and Stewart, C.N., Jr (2001) Expression of GFP and Bt trans-
genes in Brassica napus and hybridization and introgression with Brassica rapa. Theoretical and
Applied Genetics 103, 362–368.
Halfhill, M.D., Zhu, B., Warwick, S.I., Raymer, P.L., Millwood, R.J., Weissinger, A.K. and Stewart, C.N., Jr
(2004a) Hybridization and backcrossing between transgenic oilseed rape and two related weed species
under field conditions. Environmental Biosafety Research 3, 73–81.
Halfhill, M.D., Millwood, R.J. and Stewart, C.N., Jr (2004b) Green fluorescent protein quantification in whole
plants. In: Peña, L. (ed.) Methods in Molecular Biology, Vol. 286. Transgenic Plants: Methods and
Protocols. Humana Press Inc., Totowa, New Jersey, pp. 215–225.
Hall, J.G., Eis, P.S., Law, S.M., Reynaldo, L.P., Prudent, J.R., Marshall, D.J., Allawi, H.T., Mast, A.L.,
Dahlberg, J.E., Kwiatkowski, R.W., de Arruda, M., Neri, B.P. and Lyamichev, V.I. (2000) Sensitive
detection of DNA polymorphisms by the serial invasive signal amplification reaction. Proceedings of
the National Academy of Sciences of the United States of America 97, 8272–8277.
Hallauer, A.R. (1990) Methods used in developing maize inbreds. Maydica 35, 1–16.
Hallauer, A.R. (2007) History, contribution and future of quantitative genetics in plant breeding: lessons
from maize. Crop Science 47(S3), S4–S19.
Hallauer, A.R. and Miranda, J.B. (1988) Quantitative Genetics in Maize Breeding, 2nd edn. Iowa State
University Press, Ames, Iowa.
Hallauer, A.R., Russell, W.A. and Lamkey, K.R. (1988) Corn breeding. In: Sprague, G.F. and Dudley, J.W.
(eds) Corn and Corn Improvement, 3rd edn. ASA-CSSA-SSSA, Madison, Wisconsin, pp. 463–564.
Hallauer, A.R., Ross, A.J. and Lee, M. (2004) Long-term divergent selection for ear length in maize. Plant
Breeding Reviews 24 (Part 2), 153–168.
Halpin, C. and Boerjan, W. (2003) Stacking transgenes in forest trees. Trends in Plant Science 8,
363–365.
Halpin, C., Barakate, A., Askari, B.M., Abbott, J.C. and Ryan, M.D. (2001) Enabling technologies for manipulating
multiple genes on complex pathways. Plant Molecular Biology 47, 295–310.
Hamilton, C.M. (1997) A binary-BAC system for plant transformation with high-molecular-weight DNA.
Gene 200, 107–116.
Hamilton, C.M., Frary, A., Lewis, C. and Tanksley, S.D. (1996) Stable transfer of intact high molecular weight
DNA into plant chromosomes. Proceedings of the National Academy of Sciences of the United States
of America 93, 9975–9979.
Hammer, G.L., Kropff, M.J., Sinclair, T.R. and Porter, J.R. (2002) Future contribution of crop modeling:
from heuristics and supporting decision making to understanding genetic regulation and aiding crop
improvement. European Journal of Agronomy 18, 15–31.
Hammer, G.L., Chapman, S., van Oosterom, E. and Podlich, D.W. (2005) Trait physiology and crop mod-
eling as a framework to link phenotypic complexity to underlying genetic systems. Australian Journal
of Agricultural Research 56, 947–960.
Hammond, M.P. and Birney, E. (2004) Genome information resources – developments at Ensembl. Trends
in Genetics 20, 268–272.
Han, B. and Xue, Y. (2003) Genome-wide intraspecific DNA-sequence variations in rice. Current Opinion
in Plant Biology 6, 134–138.
Han, O.K., Kaga, A., Isemura, T., Wang, X.W., Tomooka, N. and Vaughan, D.A. (2005) A genetic linkage
map for azuki bean [Vigna angularis (Willd.) Ohwi & Ohashi]. Theoretical and Applied Genetics 111,
1278–1287.
Han, X., Aslanian, A. and Yates, J.R. III (2008) Mass spectrometry for proteomics. Current Opinion in
Chemical Biology 12, 483–490.
Hanash, S. (2003) Disease proteomics. Nature 422, 226–232.
Hanin, M. and Paszkowski, J. (2003) Plant genome modification by homologous recombination. Current
Opinion in Plant Biology 6, 157–162.
Hanocq, E., Laperche, A., Jaminon, O., Lainé, A.-L. and Le Gouis, J. (2007) Most significant genome regions
involved in the control of earliness traits in bread wheat, as revealed by QTL meta-analysis. Theoretical
and Applied Genetics 114, 569–584.
Hansen, B.G., Halkier, B.A. and Kliebenstein, D.J. (2008) Identifying the molecular basis of QTLs: eQTLs
add a new dimension. Trends in Plant Science 13, 72–77.
Hansen, M., Kraft, T., Ganestam, S., Säll, T. and Nilsson, N.-O. (2001) Linkage disequilibrium mapping of
the bolting gene in sea beet using AFLP markers. Genetical Research 77, 61–66.
Hanson, W.D. (1959) Early generation analysis of lengths of heterozygous chromosome segments around
a locus held heterozygous with backcrossing or selfing. Genetics 44, 833–837.
Harding, K. (2004) Genetic integrity of cryopreserved plant cells: a review. Cryo Letters 25, 3–22.
Harlan, H.V. and Pope, M.N. (1922) The use and value of back-crosses in small grain breeding. Journal of
Heredity 13, 319–322.
Harlan, H.V., Martini, M.L. and Stevens, H. (1940) A study of methods in barley breeding. USDA Technical
Bulletin 720.
Harlan, J. (1965) The possible role of weed races in the evolution of cultivated plants. Euphytica 14,
173–176.
Harlan, J.R. (1971) Agricultural origins: centers and noncenters. Science 174, 468–474.
Harlan, J. (1992) Crops and Man, 2nd edn. Crop Science Society of America, Madison, Wisconsin.
Harlan, J.R. (1987) Gene centers and gene utilization in American agriculture. In: Yeatman, C.W., Kafton, D.
and Wilkes, G. (eds) Plant Genetic Resources: a Conservation Imperative. Westview Press, Boulder,
Colorado, pp. 111–129.
Harlan, J.R. and de Wet, J.M.J. (1971) Towards a rational classification of cultivated plants. Taxon 20,
509–517.
Harper, B.K., Mabon, S.A., Leffel, S.M., Halfhill, M.D., Richards, H.A., Moyer, K.A. and Stewart, C.N., Jr
(1999) Green fluorescent protein as a marker for expression of a second gene in transgenic plants.
Nature Biotechnology 17, 1125–1129.
Harris, S.A. (1999) Molecular approaches to assessing plant diversity. In: Benson, E.E. (ed.) Plant
Conservation Biotechnology. Taylor & Francis Ltd, London, pp. 11–24.
Hart, G.E., Gale, M.D. and McIntosh, R.A. (1993) Linkage maps of Triticum aestivum (Hexaploid wheat, 2n
= 42, genome A, B and D) and T. tauschii (2n = 14, genome D). In: O'Brien, S.J. (ed.) Genetic Maps:
Locus Maps of Complex Genomes. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New
York, pp. 6.204–6.219.
Harushima, Y., Kurata, N., Yano, M., Nagamura, Y., Sasaki, T., Minobe, Y. and Nakagahra, M. (1996)
Detection of segregation distortions in an indica–japonica rice cross using a high-resolution molecular
map. Theoretical and Applied Genetics 92, 145–150.
Harushima, Y., Yano, M., Shomura, A., Sato, M., Shimano, T., Kuboki, Y., Yamamoto, T., Lin, S.Y., Antonio,
B.A., Parco, A., Kajiya, H., Huang, N., Yamamoto, K., Nagamura, Y., Kurata, N., Khush, G.S. and
Sasaki, T. (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population.
Genetics 148, 479–494.
Haseloff, J., Siemering, K.P., Prasher, D. and Hodge, S. (1997) Removal of a cryptic intron and subcel-
lular localization of green fluorescent protein are required to mark transgenic Arabidopsis plants
brightly. Proceedings of the National Academy of Sciences of the United States of America 94,
2122–2127.
Havey, M.J. (1998) Molecular analyses and heterosis in the vegetables: can we breed them like maize?
In: Lamkey, K.R. and Staub, J.E. (eds) Concepts and Breeding of Heterosis in Crop Plants. Crop Science
Society of America (CSSA), Madison, Wisconsin, pp. 109–116.
Hawtin, G. (1998) Conservation of agrobiodiversity for tropical agriculture. In: Chopra, V.L., Singh, R.B.
and Varma, A. (eds) Crop Productivity and Sustainability – Shaping the Future, Proceedings of the
2nd International Crop Science Congress. Oxford & IBH Publishing Co., New Delhi, pp. 917–925.
Hayes, B. and Goddard, M.E. (2001) The distribution of the effects of genes affecting quantitative traits in
livestock. Genetics Selection Evolution 33, 209–229.
Hazekamp, Th. (2002) The potential role of passport data in the conservation and use of plant genetic
resources. In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing
Plant Genetic Diversity. International Plant Genetic Resources Institute, Rome, pp. 185–194.
Hazekamp, Th., Serwinski, J. and Alercia, A. (1997) Multi-crop passport descriptors. In: Lipmann, E.,
Jongen, M.W.M., Hintum, Th.J.L. van, Gass, T. and Maggioni, L. (compilers) Central Crop Databases:
Tools for Plant Genetic Resources Management. Report of a Workshop, 13–16 October 1996,
Budapest, Hungary. International Plant Genetic Resources Institute, Rome, Italy/CGN, Wageningen,
Netherlands, pp. 35–39.
Hazen, S.P., Pathan, M.S., Sanchez, A., Baxter, I., Dunn, M., Estes, B., Chang, H.-S., Zhu, T., Kreps, J.A.
and Nguyen, H.T. (2005) Expression profiling of rice segregating for drought tolerance QTL using a
rice genome array. Functional and Integrative Genomics 5, 104–116.
He, P., Li, J.Z., Zheng, X.W., Shen, L.S., Lu, C.F., Chen, Y. and Zhu, L.H. (2001) Comparison of molecular
linkage maps and agronomic trait loci between DH and RIL populations derived from the same rice
cross. Crop Science 41, 1240–1246.
He, X.H. and Zhang, Y.M. (2008) Mapping epistatic quantitative trait loci underlying endosperm traits using
all markers on the entire genome in a random hybridization design. Heredity 101, 39–47.
He, Y., Chen, C., Tu, J., Zhou, P., Jiang, G., Tan, Y., Xu, C. and Zhang, Q. (2002) Improvement of an elite
rice hybrid, Shanyou 63, by transformation and marker-assisted selection. In: Abstracts of the Fourth
International Symposium on Hybrid Rice, 14–17 May 2002, Hanoi, Vietnam, p. 43.
He, Y., Li, X., Zhang, J., Jiang, G., Liu, S., Chen, S., Tu, J., Xu, C. and Zhang, Q. (2004) Gene pyramid-
ing to improve hybrid rice by molecular marker technique. 4th International Crop Science Congress.
Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17 November 2009).
He, Z., Fu, Y., Si, H., Hu, G., Zhang, S., Yu, Y. and Sun, Z. (2004) Phosphomannose-isomerase (pmi) gene
as a selectable marker for rice transformation via Agrobacterium. Plant Science 166, 17–22.
Heckenberger, M., Bohn, M., Maurer, H.P., Frisch, M. and Melchinger, A.E. (2005a) Identification of essen-
tially derived varieties with molecular markers: an approach based on statistical test theory and computer
simulations. Theoretical and Applied Genetics 111, 598–608.
Heckenberger, M., Bohn, M., Klein, D. and Melchinger, A.E. (2005b) Identification of essentially derived
varieties obtained from biparental crosses of homozygous lines: II. Morphological distances and het-
erosis in comparison with simple sequence repeat and amplified fragment length polymorphism data
in maize. Crop Science 45, 1132–1140.
Heckenberger, M., Muminovic, J., van der Voort, J.R., Peleman, J., Bohn, M. and Melchinger, A.E. (2006)
Identification of essentially derived varieties from biparental crosses of homozygous lines. III.
AFLP data from maize inbreds and comparison with SSR data. Molecular Breeding 17, 111–125.
Heckenberger, M., Maurer, H.P., Melchinger, A.E. and Frisch, M. (2008) The Plabsoft database: a compre-
hensive database management system for integrating phenotypic and genomic data in academic and
commercial plant breeding programs. Euphytica 161, 173–179.
Hedden, P. (2003) The genes of the green revolution. Trends in Genetics 19, 5–9.
Hedgecock, D., Lin, J.Z., DeCola, S., Haudenschild, C., Meyer, E., Manahan, D.T. and Bowen, B. (2002)
Analysis of gene expression in hybrid Pacific oysters by massively parallel signature sequencing.
Plant & Animal Genome X Conference Abstract. Available at: http://www.intl-pag.org/pag/10/abstracts/
PAGX_W15.html (accessed 30 June 2007).
Hedges, L.V. and Olkin, I. (1985) Statistical Methods for Meta-analysis. Academic Press, Orlando, Florida.
Heisey, P.W., King, J.L. and Rubenstein, K.D. (2005) Patterns of public sector and private-sector patenting
in agricultural biotechnology. AgBioForum 8, 73–82.
Heitz, A. (1998) Intellectual property rights and plant variety protection in relation to demands of the world
trade organization and farmers in sub-Saharan Africa. In: Proceedings of the Regional Technical
Meeting on Seed Policy and Programmes for Sub-Saharan Africa, Abidjan, Côte d'Ivoire, 23–27
November 1998. Available at: http://www.fao.org/ag/agp/AGPS/abidjan/tabcont.htm (accessed 17
November 2009).
Helentjaris, T. and Briggs, K. (1998) Are there too many genes in maize? Maize Genetics Cooperation
Newsletter 72, 39–40.
Helentjaris, T., Cushman, M.A.T. and Winkler, R. (1992) Developing a genetic understanding of agronomy
traits with complex inheritance. In: Dattée, Y., Dumas, C. and Gallais, A. (eds) Reproductive Biology
and Plant Breeding. Springer-Verlag, Berlin, pp. 397–406.
Helfer, L.R. (2006) The demise and rebirth of plant variety protection: a comment on obsolescence in intel-
lectual property. Regimes. Public Law and Legal Theory (Vanderbilt University Law School), Working
Paper Number 0628. Vanderbilt University, Nashville, Tennessee.
Hellens, R., Mullineaux, P. and Klee, H. (2000) Technical focus: a guide to Agrobacterium binary Ti vectors.
Trends in Plant Science 5, 446–451.
Hellens, R.P., Edwards, E.A., Leyland, N.R., Bean, S. and Mullineaux, P.M. (2000) pGreen, a versatile and
flexible binary Ti vector for Agrobacterium-mediated plant transformation. Plant Molecular Biology 42,
819–832.
Henderson, C.R. (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics
31, 423–447.
Henikoff, S. and Comai, L. (2003) Single-nucleotide mutations for plant functional genomics. Annual Review
of Plant Biology 54, 375–401.
Henry, Y., De Buyser, J., Agache, S., Parker, B.B. and Snape, J.W. (1988) Comparison of methods of
haploid production and performance of wheat lines produced by doubled haploidy and single seed
descent. In: Miller, T.E. and Koebner, R.M.D. (eds) Proceedings of 7th International Wheat Genetics
Symposium, Cambridge, 13–19 July 1988. Institute of Plant Science Research, Cambridge, UK, pp.
1087–1092.
Henson-Apollonio, V. (2007) Impacts of intellectual property rights on marker-assisted selection research
and application for agriculture in developing countries. In: Guimarães, E.P., Ruane, J., Scherf, B.D.,
Sonnino, A. and Dargie, J.D. (eds) Marker-Assisted Selection, Current Status and Future Perspectives
in Crops, Livestock, Forestry and Fish. Food and Agriculture Organization of the United Nations,
Rome, pp. 405–425.
Herring, R.J. (2008) Opposition to transgenic technologies: ideology, interests and collective action frames.
Nature Reviews Genetics 9, 458–463.
Heun, M., Kennedy, A.E., Anderson, J.A., Lapitan, N.L.V., Sorrells, M.E. and Tanksley, S.D. (1991)
Construction of a restriction fragment length polymorphism map for barley (Hordeum vulgare).
Genome 34, 437–447.
Hiatt, A.C., Cafferkey, R. and Bowdish, K. (1989) Production of antibodies in transgenic plants. Nature 342,
76–78.
Hiei, Y., Ohta, S., Komari, T. and Kumashiro, T. (1994) Efficient transformation of rice (Oryza sativa L.) medi-
ated by Agrobacterium and sequence analysis of the boundaries of the T-DNA. The Plant Journal 6,
271–282.
Hiei, Y., Komari, T. and Kubo, T. (1997) Transformation of rice mediated by Agrobacterium tumefaciens.
Plant Molecular Biology 35, 205–218.
Hijmans, R.J., Guarino, L., Cruz, M. and Rojas, E. (2001) Computer tools for spatial analysis of plant
genetic resources data. 1. DIVA-GIS. Plant Genetic Resources Newsletter 127, 15–19.
Hillel, D. and Rosenzweig, C. (2005) The role of biodiversity in agronomy. Advances in Agronomy 88,
1–34.
Hillel, J., Avner, R., Baxter-Jones, C., Dunnington, E.A., Cahaner, A. and Siegel, P.B. (1990) DNA fingerprints
from blood mixes in chickens and turkeys. Animal Biotechnology 2, 201–204.
Hillenkamp, F. and Köster, H. (1999) Infrared matrix-assisted laser desorption/ionization mass spectrometric
analysis of macro-molecules. Patent EP 1075545.
Himmelbach, A., Zierold, U., Hensel, G., Riechen, J., Douchkov, D., Schweizer, P. and Kumlehn, J. (2007) A
set of modular binary vectors for transformation of cereals. Plant Physiology 145, 1192–1200.
Hintum, Th.J.L. van (1999) The Core Selector, a system to generate representative selections of germplasm
accessions. Plant Genetic Resources Newsletter 118, 64–67.
Hird, D.L., Paul, W., Hollyoak, J.S. and Scott, R.J. (2000) The restoration of fertility in male sterile tobacco
demonstrates that transgene silencing can be mediated by T-DNA that has no DNA homology to the
silenced transgene. Transgenic Research 9, 91–102.
Hirochika, H. (2003) Insertional mutagenesis in rice using the endogenous retrotransposon. In: Mew, T.W.,
Brar, D.S., Peng, S., Dawe, D. and Hardy, B. (eds) Rice Science: Innovations and Impact for Livelihood,
Proceedings of the International Rice Research Conference, 16–19 September 2002, Beijing, China.
International Rice Research Institute, Chinese Academy of Engineering and Chinese Academy of
Agricultural Sciences, pp. 205–212.
Hirochika, H., Guiderdoni, E., An, G., Hsing, Y.I., Eun, M.Y., Han, C.D., Upadhyaya, N., Ramachandran,
S., Zhang, Q., Pereira, A., Sundaresan, V. and Leung, H. (2004) Rice mutant resources for gene
discovery. Plant Molecular Biology 54, 325–334.
Hittalmani, S., Parco, A., Mew, T.V., Zeigler, R.S. and Huang, N. (2000) Fine mapping and DNA marker-
assisted pyramiding of the three major genes for blast resistance in rice. Theoretical and Applied
Genetics 100, 1121–1128.
Hodgkin, T. and Ramanatha Rao, V. (2002) People, plant and DNA: technical aspects of conserving and
using plant genetic resources. In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson,
M.T. (eds) Managing Plant Genetic Diversity. International Plant Genetic Resources Institute, Rome,
pp. 469–480.
Hodson, D.P. and White, J.W. (2007) Use of spatial analyses for global characterization of wheat-based
production systems. Journal of Agricultural Science 145, 115–125.
Hodson, D.P., Martinez-Romero, E., White, J.W., Corbett, J.D. and Bänziger, M. (2002) Africa Maize
Research Atlas (v. 3.0), CD-ROM Publication. Centro Internacional de Mejoramiento de Maiz y Trigo
(CIMMYT), Mexico, DF.
Hoekema, A., Hirsch, P.R., Hooykaas, P.J.J. and Schilperoort, R.A. (1983) A binary plant vector strategy based
on separation of vir- and T-region of the Agrobacterium tumefaciens Ti-plasmid. Nature 303, 179–180.
Hoeschele, I. and VanRaden, P.M. (1993a) Bayesian analysis of linkage between genetic markers and
quantitative trait loci. I. Prior knowledge. Theoretical and Applied Genetics 85, 953–960.
Hoeschele, I. and VanRaden, P.M. (1993b) Bayesian analysis of linkage between genetic markers and
quantitative trait loci. II. Combining prior knowledge with experimental evidence. Theoretical and
Applied Genetics 85, 946–952.
Hofmann, K., Bucher, P., Falquet, L. and Bairoch, A. (1999) The Prosite database, its status in 1999.
Nucleic Acids Research 27, 215–219.
Hoheisel, J.D. (2006) Microarray technology: beyond transcript profiling and genotype analysis. Nature
Reviews Genetics 7, 200–210.
Hohn, B., Levy, A.A. and Puchta, H. (2001) Elimination of selection markers from transgenic plants. Current
Opinion in Biotechnology 12, 139–143.
Hoisington, D. and Ortiz, R. (2008) Research and field monitoring on transgenic crops by the Centro
Internacional de Mejoramiento de Maiz y Trigo (CIMMYT). Euphytica 164, 893–902.
Holland, J.B. (1998) EPISTACY: a SAS program for detecting two-locus epistasis interactions using genetic
marker information. Journal of Heredity 89, 374–375.
Holland, J.B. (2001) Epistasis and plant breeding. Plant Breeding Reviews 21, 29–32.
Holland, J.B. (2004) Implementation of molecular markers for quantitative traits in breeding programs –
challenges and opportunities. In: New Direction for a Diverse Planet, Proceedings of the 4th International
Crop Science Congress, 26 September–1 October 2004, Brisbane, Australia. Published on CD-ROM.
Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17 November 2009).
Hopkins, C.G. (1899) Improvement in the chemical composition of the corn kernel. Illinois Agricultural
Experiment Station Bulletin 55, 205–240.
Horan, K., Lauricha, J., Bailey-Serres, J., Raikhel, N. and Girke, T. (2005) Genome cluster database.
A sequence family analysis platform for Arabidopsis and rice. Plant Physiology 138, 47–54.
Hori, K., Kobayashi, T., Shimizu, A., Sato, K., Takeda, K. and Kawasaki, S. (2003) Efficient construction
of high-density linkage map and its application to QTL analysis in barley. Theoretical and Applied
Genetics 107, 806–813.
Hori, K., Sato, K. and Takeda, K. (2007) Detection of seed dormancy QTL in multiple mapping popula-
tions derived from crosses involving novel barley germplasm. Theoretical and Applied Genetics 115,
869–876.
Hormaza, J.I., Dollo, L. and Polito, V.S. (1994) Identification of a RAPD marker linked to sex determination
in Pistacia vera using bulked segregant analysis. Theoretical and Applied Genetics 89, 9–13.
Hospital, F. (2001) Size of donor chromosome segments around introgressed loci and reduction of linkage
drag in marker-assisted backcross programs. Genetics 158, 1363–1379.
Hospital, F. (2002) Marker-assisted backcross breeding: a case study in genotype building theory. In: Kang,
M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB International, Wallingford, UK,
pp. 135–141.
Hospital, F. and Charcosset, A. (1997) Marker-assisted introgression of quantitative trait loci. Genetics 147,
1469–1485.
Hospital, F. and Decoux, G. (2002) Popmin: a program for the numerical optimization of population sizes in
marker-assisted backcross breeding programs. Journal of Heredity 93, 383–384.
Hospital, F., Chevalet, C. and Mulsant, P. (1992) Using markers in gene introgression breeding programs.
Genetics 132, 1199–1210.
Hospital, F., Moreau, L., Lacoudre, F., Charcosset, A. and Gallais, A. (1997) More on the efficiency of
marker-assisted selection. Theoretical and Applied Genetics 95, 1181–1189.
Hospital, F., Goldringer, I. and Openshaw, S. (2000) Efficient marker-based recurrent selection for multiple
quantitative trait loci. Genetical Research 75, 1181–1189.
Hoti, F. and Sillanpää, M.J. (2006) Bayesian mapping of genotype × expression interaction in quantitative
and qualitative traits. Heredity 97, 4–18.
Howe, A.R., Gasser, C.S., Brown, S.M., Padgette, S.R., Hart, J., Parker, G.B., Fromm, M.E. and Armstrong,
C.L. (2002) Glyphosate as a selective agent for the production of fertile transgenic maize (Zea mays
L.) plants. Molecular Breeding 10, 153–164.
Howell, W.M., Jobs, M., Gyllensten, U. and Brooks, V. (1999) Dynamic allele-specific hybridization. A new
method for scoring single nucleotide polymorphisms. Nature Biotechnology 17, 87–88.
Hsing, Y.-I., Chern, C.-G., Fan, M.-J., Lu, P.-C., Chen, K.-T., Lo, S.-F., Sun, P.-K., Ho, S.-L., Lee, K.-W.,
Wang, Y.-C., Huang, W.-L., Ko, S.-S., Chen, S., Chen, J.-L., Chung, C.-I., Lin, Y.-C., Hour, A.-L., Wang,
Y.-W., Chang, Y.-C., Tsai, M.-W., Lin, Y.-S., Chen, Y.-C., Yen, H.-M., Li, C.-P., Wey, C.-K., Tseng, C.-S.,
Lai, M.-H., Huang, S.-C., Chen, L.-J. and Yu, S.-M. (2007) A rice gene activation/knockout mutant
resource for high throughput functional genomics. Plant Molecular Biology 63, 351–364.
Hu, J. and Vick, B.A. (2003) Target region amplification polymorphism: a novel marker technique for plant
genotyping. Plant Molecular Biology Reporter 21, 289–294.
Hua, J., Xing, Y., Wu, W., Xu, C., Sun, X., Yu, S. and Zhang, Q. (2003) Single-locus heterotic effects and
dominance by dominance interactions can adequately explain the genetic basis of heterosis in an
elite rice hybrid. Proceedings of the National Academy of Sciences of the United States of America 100,
2574–2579.
Hua, J.P., Xing, Y.Z., Xu, C.G., Sun, X.L., Yu, S.B. and Zhang, Q. (2002) Genetic dissection of an elite rice
hybrid revealed that heterozygotes are not always advantageous for performance. Genetics 162,
1885–1895.
Huamán, Z., Ortiz, R., Zhang, D. and Rodríguez, F. (2000) Isozyme analysis of entire and core collection of
Solanum tuberosum subsp. andigena potato cultivars. Crop Science 40, 273–276.
Huang, L., Brooks, S.H., Li, W., Fellers, J.P., Trick, H.N. and Gill, B.S. (2003) Map based cloning of leaf rust
resistance gene Lr21 from the large and polyploid genome in bread wheat. Genetics 164, 655–664.
Huang, N., Courtois, B., Khush, G.S., Lin, H., Wang, G., Wu, P. and Zheng, K. (1996) Association of quantitative
trait loci for plant height with major dwarfing genes in rice. Heredity 77, 130–137.
Huang, N., Angeles, E.R., Domingo, J., Magpantay, G., Singh, S., Zhang, G., Kumaravadivel, N., Bennet,
J. and Khush, G.S. (1997) Pyramiding of bacterial blight resistance genes in rice: marker-assisted
selection using RFLP and PCR. Theoretical and Applied Genetics 95, 313–320.
Huang, S., Gilbertson, L.A., Adams, T.H., Malloy, K.P., Reisenbigler, E.K., Birr, D.H., Snyder, M.W., Zhang,
Q. and Luethy, M.H. (2004) Generation of marker-free transgenic maize by regular two-border
Agrobacterium transformation vectors. Transgenic Research 13, 451–461.
Huang, X., Feng, Q., Qian, Q., Zhao, Q., Wang, L., Wang, A., Guan, J., Fan, D., Wang, Q., Huang, T.,
Dong, G., Sang, T. and Han, B. (2009) High-throughput genotyping by whole-genome resequencing.
Genome Research 19, 1068–1076.
Hudson, L.C., Halfhill, M.D. and Stewart, C.N., Jr (2004) Transgene dispersal through pollen. In: Peña, L.
(ed.) Methods in Molecular Biology, Vol. 286. Transgenic Plants: Methods and Protocols. Humana
Press Inc., Totowa, New Jersey, pp. 365–374.
Huelsenbeck, J.P., Ronquist, F., Nielsen, R. and Bollback, J.P. (2001) Bayesian inference of phylogeny and
its impact on evolutionary biology. Science 294, 2310–2314.
Hühn, M. (1996) Nonparametric analysis of genotype × environment interactions by ranks. In: Kang, M.S.
and Gauch, H.G., Jr (eds) Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida,
pp. 235–271.
Hulden, M. (1997) Standardization of central crop databases. In: Lipmann, E., Jongen, M.W.M., Hintum,
Th.J.L. van, Gass, T. and Maggioni, L. (compilers) Central Crop Databases: Tools for Plant Genetic
Resources Management. Report of a Workshop, 13–16 October 1996, Budapest, Hungary. International
Plant Genetic Resources Institute, Rome, Italy/CGN, Wageningen, Netherlands, pp. 26–34.
Hunt, M. (1997) How Science Takes Stock: the Story of Meta Analysis. Russell Sage Foundation,
New York.
Hussein, M.A., Bjornstad, A. and Aastveit, A.H. (2000) SASG × ESTAB: a SAS program for computing
genotype × environment stability statistics. Agronomy Journal 92, 454–459.
Hyne, V. and Kearsey, M.J. (1995) QTL analysis: further uses of marker regression. Theoretical and
Applied Genetics 91, 471–476.
Hyten, D.L., Song, Q., Choi, I.-Y., Yoon, M.-P., Specht, J.E., Matukumalli, L.K., Nelson, R.L., Shoemaker,
R.C., Young, N.D. and Cregan, P.B. (2008) High-throughput genotyping with the GoldenGate assay in
the complex genome of soybean. Theoretical and Applied Genetics 116, 945–952.
IBPGR (International Board for Plant Genetic Resources) (1986) Design, Planning and Operation of In
Vitro Genebanks: Reports of a Subcommittee of the IBPGR Advisory Committee on In Vitro Storage.
IBPGR, Rome.
IBRD/World Bank (The International Bank for Reconstruction and Development/The World Bank) (2006)
Intellectual Property Rights: Designing Regimes to Support Plant Breeding in Developing Countries.
The World Bank, Washington, DC.
Ideta, O., Yoshimura, A. and Iwata, N. (1996) An integrated linkage map of rice. Rice Genetics III. Proceedings
of the Third International Rice Genetics Symposium, 16–20 October 1995, Manila. International Rice
Research Institute (IRRI), Manila, Philippines.
Igartua, E., Casas, A.M., Ciudad, F., Montoya, L. and Romagosa, I. (1999) RFLP markers associated
with major genes controlling heading date evaluated in a barley germ plasm pool. Heredity 83,
551–559.
Igartua, E., Edney, M., Rossnagel, B.G., Spaner, D., Legge, W.G., Scoles, G.L., Ecksteins, P.E., Penner,
G.A., Tinker, N.A., Briggs, K.G., Falk, D.E. and Mather, D.E. (2000) Marker-assisted selection of QTL
affecting grain and malt quality in two-row barley. Crop Science 40, 1426–1433.
Ikeda, A., Ueguchi-Tanaka, M., Sonoda, Y., Kitano, H., Koshioka, M., Futsuhara, Y., Matsuoka, M. and
Yamaguchi, J. (2001) slender rice, a constitutive gibberellin response mutant, is caused by a null
mutation of the SLR1 gene, an ortholog of the height-regulating gene GAI/RGA/RHT/D8. The Plant
Cell 13, 999–1010.
Ilic, K., Kellogg, E.A., Jaiswal, P., Zapata, F., Stevens, P.F., Vincent, L.P., Avraham, S., Reiser, L., Pujar, A.,
Sachs, M.M., Whitman, N.T., McCouch, S.R., Schaeffer, M.L., Ware, D.H., Stein, L.D. and Rhee, S.Y.
(2007) The Plant Structure Ontology, a unified vocabulary of anatomy and morphology of a flowering
plant. Plant Physiology 143, 587–599.
International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human
genome. Nature 409, 860–921.
Ioannidis, J.P., Ntzani, E.E., Trikalinos, T.A. and Contopoulos-Ioannidis, D.G. (2001) Replication validity of
genetic association studies. Nature Genetics 29, 306–309.
IRGSP (International Rice Genome Sequencing Project) (2005) The map-based sequence of the rice
genome. Nature 436, 793–800.
ISF (International Seed Federation) (2004) Protection of Intellectual Property and Access to Plant Genetic
Resources. Proceedings of an International Seminar, 27–28 May 2004, Berlin, CD-ROM.
ISF (International Seed Federation) (2005) Essential derivation from a not-yet protected variety and
dependency. ISF Position Paper, June 2005. Available at: http://www.worldseed.org/Position_papers/
ED&Dependency.htm (accessed 30 June 2007).
Ishida, Y., Saito, H., Ohta, S., Hiei, Y., Komari, T. and Kumashiro, T. (1996) High efficiency transformation of
maize (Zea mays L.) mediated by Agrobacterium tumefaciens. Nature Biotechnology 14, 745–750.
Ishida, Y., Murai, N., Kuraya, Y., Ohta, S., Saito, H., Hiei, Y. and Komari, T. (2004) Improved co-transformation
of maize with vectors carrying two separate T-DNAs mediated by Agrobacterium tumefaciens. Plant
Biotechnology 21, 57–63.
Ishimaru, K. (2003) Identification of a locus increasing rice yield and physiological analysis of its function.
Plant Physiology 122, 1083–1090.
Ivandic, V., Hackett, C.A., Nevo, E., Keith, R., Thomas, W.T.B. and Forster, B.P. (2002) Analysis of simple
sequence repeats (SSRs) in wild barley from the Fertile Crescent: associations with ecology,
geography and flowering time. Plant Molecular Biology 48, 511–527.
Ivandic, V., Thomas, W.T.B., Nevo, E., Zhang, Z. and Forster, B.P. (2003) Association of SSRs with quan-
titative trait variation including biotic and abiotic stress tolerance in Hordeum spontaneum. Plant
Breeding 122, 300–304.
Iwata, H., Uga, Y., Yoshioka, Y., Ebana, K. and Hayashi, T. (2007) Bayesian association mapping of multiple
quantitative trait loci and its application to the analysis of genetic variation among Oryza sativa L.
germplasms. Theoretical and Applied Genetics 114, 1437–1449.
Izawa, T., Takahashi, Y. and Yano, M. (2003) Comparative biology comes into bloom: genomic and genetic
comparison of flowering pathways in rice and Arabidopsis. Current Opinion in Plant Biology 6,
113–120.
Jaccoud, D., Peng, K., Feinstein, D. and Kilian, A. (2001) Diversity arrays: a solid state technology for
sequence information independent genotyping. Nucleic Acids Research 29, e25.
Jack, T., Fox, G.L. and Meyerowitz, E.M. (1994) Arabidopsis homeotic gene APETALA3 ectopic expression:
transcriptional and posttranscriptional regulation determine floral organ identity. Cell 76, 703–716.
Jaffe, G. (2004) Regulating transgenic crops: a comparative analysis of different regulatory processes.
Transgenic Research 13, 5–19.
Jain, S.M., Sopory, S.K. and Veilleux, R.E. (1996–1997) In Vitro Haploid Production in Higher Plants. Kluwer
Academic Publishers, Dordrecht, Netherlands.
James, C. (2006) Global Status of Commercialized Biotech/GM Crops: 2006. ISAAA Briefs No. 35.
International Service for the Acquisition of Agri-biotech Applications (ISAAA), Ithaca, New York.
James, C. (2008) 2007 ISAAA Report on Global Status of Biotech/GM Crops. International Service for
the Acquisition of Agri-biotech Applications (ISAAA). Available at: http://www.isaaa.org (accessed 17
November 2009).
Jander, G., Norris, S.R., Rounsley, S.D., Bush, D.F., Levin, I.M. and Last, R.L. (2002) Arabidopsis
map-based cloning in the post-genome era. Plant Physiology 129, 440–450.
Janick, J. (1988) Horticulture, science and society. HortScience 23, 11–13.
Janick, J. (1998) Hybrids in horticulture crops. In: Lamkey, K.R. and Staub, J.E. (eds) Concepts and
Breeding of Heterosis in Crop Plants. Crop Science Society of America (CSSA), Madison, Wisconsin,
pp. 45–56.
Janis, M.D. and Kesan, J.P. (2002) U.S. plant variety protection: sound and fury...? Houston Law Review
39, 727–778.
Janis, M.D. and Smith, S. (2007) Obsolescence in intellectual property regimes. University of Iowa Legal
Studies Research Paper No. 05-48. Abstract available at: http://papers.ssrn.com/sol3/papers.
cfm?abstract_id=897728 (accessed 17 November 2009).
Jannink, J.L. (2005) Selective phenotyping to accurately map quantitative trait loci. Crop Science 45,
901–908.
Jannink, J.L. and Jansen, R.C. (2000) The diallel mating design for mapping interacting QTLs. In: Quantitative
Genetics and Breeding Methods: the Way Ahead. Institut National de la Recherche Agronomique
(INRA), Paris, pp. 81–88.
Jannink, J.L. and Jansen, R. (2001) Mapping epistatic quantitative trait loci with one-dimensional genome
searches. Genetics 157, 445–454.
Jannink, J.L. and Walsh, B. (2002) Association mapping in plant populations. In: Kang, M.S. (ed.) Quantitative
Genetics, Genomics and Plant Breeding. CAB International, Wallingford, UK, pp. 59–68.
Jannink, J.L., Bink, M. and Jansen, R.C. (2001) Using complex plant pedigrees to map valuable genes.
Trends in Plant Science 6, 337–342.
Jansen, C., Thomas, D.Y. and Pollock, S. (2005) Yeast two-hybrid technologies. In: Sensen, C.W. (ed.)
Handbook of Genome Research, Genomics, Metabolomics, Bioinformatics, Ethical and Legal Issues.
WILEY-VCH Verlag GmbH & Co., KGaA, Weinheim, Germany, pp. 261–272.
Jansen, J.P.A. (1996) Aphid resistance in composites. International application published under the patent
cooperation treaty (PCT) No. WO 97/46080.
Jansen, R.C. (1996) A general Monte Carlo method for mapping multiple quantitative trait loci. Genetics
142, 305–311.
Jansen, R.C. and Beavis, W.D. (2001) MQM mapping using haplotyped putative QTL-alleles: a simple
approach for mapping QTLs in plant breeding populations. Patent EP 1265476.
Jansen, R.C. and Nap, J.P. (2001) Genetical genomics: the added value from segregation. Trends in
Genetics 17, 388–391.
Jansen, R.C. and Stam, P. (1994) High resolution of quantitative traits into multiple loci via interval mapping.
Genetics 136, 1447–1455.
Jansen, R.C., Van-Ooijen, J.W., Stam, P., Lister, C. and Dean, C. (1995) Genotype-by-environment inter-
action in genetic mapping of multiple quantitative trait loci. Theoretical and Applied Genetics 91,
33–37.
Jansen, R.C., Jannink, J.-L. and Beavis, W.D. (2003) Mapping quantitative trait loci in plant breeding
populations: use of parental haplotype sharing. Crop Science 43, 829–834.
Jarvis, A., Yeaman, S., Guarino, L. and Tohme, J. (2005) The role of geographic analysis in locating, under-
standing and using plant genetic diversity. In: Zimmer, E. (ed.) Molecular Evolution: Producing the
Biochemical Data, Part B. Elsevier, New York, pp. 279–298.
Jarvis, D.I. and Hodgkin, T. (1999) Wild relatives and crop cultivars: detecting natural introgression and
farmer selection of new genetic combinations in agroecosystems. Molecular Ecology 8, S159–S173.
Jayasekara, N.E.M. and Jinks, J.L. (1976) Effect of gene dispersion on estimates of components of
generation means and variances. Heredity 36, 31–40.
Jefferson, R.A. (1987) Assaying chimeric genes in plants: the GUS gene fusion system. Plant Molecular
Biology Reporter 5, 387–405.
Jenkins, H., Johnson, H., Kular, B., Wang, T. and Hardy, N. (2005) Toward supportive data collection tools
for plant metabolomics. Plant Physiology 138, 67–77.
Jenkins, S. and Gibson, N. (2002) High-throughput SNP genotyping. Comparative and Functional Genomics
3, 57–66.
Jenks, M.A. and Feldmann, K. (1996) Cloning genes by insertion mutagenesis. In: Paterson, A.H. (ed.)
Genome Mapping in Plants. R.G. Landes Company, Austin, Texas, pp. 155–168.
Jensen, C.J. (1974) Chromosome doubling techniques in haploids. In: Kasha, K.J. (ed.) Haploids in Higher
Plants: Advances and Potentials. Guelph University Press, Guelph, Canada, pp. 153–190.
Jensen, L.J., Saric, J. and Bork, P. (2006) Literature mining for the biologist: from information retrieval to
biological discovery. Nature Reviews Genetics 7, 119–129.
Jeon, J.-S., Kang, H.-G. and An, G. (2004) Tools for gene tagging and mutagenesis. In: Christou, P. and Klee,
H. (eds) Handbook of Plant Biotechnology. John Wiley & Sons Ltd, Chichester, UK, pp. 103–125.
Jia, H., Pang, Y., Chen, X. and Fang, R. (2006) Removal of the selectable marker gene from transgenic
tobacco plants by expression of Cre recombinase from a tobacco mosaic virus vector through
agroinfection. Transgenic Research 15, 375–384.
Jiang, C. and Zeng, Z.B. (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics
140, 1111–1127.
Jiang, C., Pan, X. and Gu, M. (1994) The use of mixture models to detect effects of major genes on
quantitative characters in plant breeding experiment. Genetics 136, 383–394.
Jiang, C., Edmeades, G.O., Armstead, I., Lafitte, H.R., Hayward, M.D. and Hoisington, D. (1999) Genetic
analysis of adaptation differences between highland and lowland tropical maize using molecular
markers. Theoretical and Applied Genetics 99, 1106–1119.
Jiang, N., Bao, Z., Zhang, X., Hirochika, H., Eddy, S.R., McCouch, S.R. and Wessler, S.R. (2003) An active
DNA transposon family in rice. Nature 421, 163–167.
Jin, C., Lan, H., Attie, A.D., Churchill, G.A., Bulutuglo, D. and Yandell, B.Y. (2004) Selective phenotyping for
increased efficiency in genetic mapping study. Genetics 168, 2285–2293.
Jin, S., Komari, T., Gordon, M.P. and Nester, E.W. (1987) Genes responsible for the supervirulence
phenotype of Agrobacterium tumefaciens A281. Journal of Bacteriology 169, 4417–4425.
Jinks, J.L. and Perkins, J.M. (1969) The detection of linked epistatic genes for a metrical trait. Heredity 24,
465–475.
Jinks, J.L. and Perkins, J.M. (1972) Predicting the range of inbred lines. Heredity 28, 399–403.
Joen, J.-S., Lee, S., Jung, K.-H., Jun, S.-H., Joeng, D.-H., Lee, J., Kim, C., Jang, S., Yang, K., Nam, J., An, K.,
Han, M.J., Sung, R.-J., Choi, H.-S., Yu, J.-H., Choi, J.-H., Cho, S.-S., Cha, S.-S., Kim, S.-I. and An, G.
(2000) T-DNA insertional mutagenesis for functional genomics in rice. The Plant Journal 22, 561–571.
Joersbo, M. and Okkels, F.T. (1996) A novel principle for selection of transgenic plant cells: positive
selection. Plant Cell Reports 16, 219–221.
Joersbo, M., Donaldson, I., Kreiberg, J., Petersen, S.G., Brunstedt, J. and Okkels, F.T. (1998) Analysis of
mannose selection used for transformation of sugar beet. Molecular Breeding 4, 111–117.
Johannes, F. (2007) Mapping temporally varying quantitative trait loci in time-to-failure experiments.
Genetics 175, 855–865.
Johnson, B., Gardner, C.O. and Wrede, K.C. (1988) Application of an optimization model to multi-trait
selection programs. Crop Science 28, 723–728.
Johnson, G.R. (2004) Marker assisted selection. Plant Breeding Reviews 24, 293–310.
Johnson, H.E., Broadhurst, D., Goodacre, R. and Smith, A.R. (2003) Metabolic fingerprinting of
salt-stressed tomatoes. Phytochemistry 62, 919–928.
Johnson, H.W., Robinson, H.F. and Comstock, R.E. (1955) Estimates of genetic and environmental
variability in soybeans. Agronomy Journal 47, 314–318.
Johnson, R. (2001) Marker-assisted sweet corn breeding: a model for special crops. In: Proceedings of
56th Annual Corn and Sorghum Industry Research Conference, Chicago, Illinois, 5–7 December
2001. American Seed Trade Association, Washington, DC, pp. 25–30.
Jones, H. (ed.) (1995) Plant Gene Transfer and Expression Protocols. Humana Press, Totowa, New Jersey.
Jones, H.D., Doherty, A. and Wu, H. (2005) Review of methodologies and a protocol for the Agrobacterium-
mediated transformation of wheat. Plant Methods 2005, 15.
Jorasch, P. (2004) Intellectual property rights in the field of molecular marker analysis. In: Lörz, H. and
Wenzel, G. (eds) Biotechnology in Agriculture and Forestry, Vol. 55. Molecular Marker Systems.
Springer-Verlag, Berlin, pp. 433–471.
Jordaan, J.P., Engelbrecht, S.A., Malan, J.H. and Knobel, H.A. (1999) Wheat and heterosis. In: Coors, J.G.
and Pandey, S. (eds) Genetics and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison,
Wisconsin, pp. 411–421.
Jordan, D., Tao, Y., Godwin, I., Henzell, R., Cooper, M. and McIntyre, C. (2004) Prediction of hybrid
performance in grain sorghum using RFLP markers. Theoretical and Applied Genetics 106, 559–567.
Jorde, L.B. (2000) Linkage disequilibrium and the search for complex disease genes. Genome Research
10, 1435–1444.
Joseph, M., Gopalakrishnan, S., Sharma, R.K., Singh, V.P., Singh, A.K., Singh, N.K. and Mohapatra, T.
(2003) Combining bacterial blight resistance and Basmati quality characteristics by phenotypic and
molecular marker-assisted selection in rice. Molecular Breeding 13, 111.
Jourjon, M.F., Jasson, S., Marcel, J., Ngom, B. and Mangin, B. (2005) MCQTL: multi-allelic QTL mapping
in multi-cross design. Bioinformatics 21, 128–130.
Jung, K.-H., An, G. and Ronald, P.C. (2008) Towards a better bowl of rice: assigning function to tens of
thousands of rice genes. Nature Reviews Genetics 9, 91–101.
Kahler, A.L., Gardner, C.O. and Allard, R.W. (1984) Nonrandom mating in experimental populations of
maize. Crop Science 24, 350–354.
Kahraman, A., Avramov, A., Nashev, L.G., Popov, D., Ternes, R., Pohlenz, H.-D. and Weiss, B. (2005)
PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics.
Bioinformatics 21, 418–420.
Kahvejian, A., Quackenbush, J. and Thompson, J.F. (2008) What would you do if you could sequence
everything? Nature Biotechnology 26, 1125–1133.
Kamujima, O., Tanisaka, T. and Kinoshita, T. (1996) Gene symbols for dwarfness. Rice Genetics Newsletter
13, 19–25.
Kang, M.S. (1988) A rank-sum method for selecting high-yielding, stable corn genotypes. Cereal Research
Communications 16, 113–115.
Kang, M.S. (1990) Understanding and utilization of genotype–environment interaction in plant breeding.
In: Kang, M.S. (ed.) Genotype-By-Environment Interactions and Plant Breeding. Louisiana State
University Agriculture Center, Baton Rouge, Louisiana, pp. 52–68.
Kang, M.S. (1993) Simultaneous selection for yield and stability in crop performance trials: consequences
for growers. Agronomy Journal 85, 754–757.
Kang, M.S. (2002) Genotype–environment interaction: progress and prospects. In: Kang, M.S. (ed.)
Quantitative Genetics, Genomics and Plant Breeding. CAB International, Wallingford, UK,
pp. 221–243.
Kang, M.S. and Magari, R. (1996) New developments in selecting for phenotypic stability in crop breeding.
In: Kang, M.S. and Gauch, H.G., Jr (eds) Genotype-by-Environment Interaction. CRC Press, Boca
Raton, Florida, pp. 1–14.
Kantety, R.V., Rota, M.L., Mathews, D.E. and Sorrells, M.E. (2002) Data mining for simple-sequence repeats
in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Molecular Biology
48, 501–510.
Kao, C.H. (2004) Multiple-interval mapping for quantitative trait loci controlling endosperm traits. Genetics
167, 1987–2002.
Kao, C.H. (2006) Mapping quantitative trait loci using the experimental designs of recombinant inbred
populations. Genetics 174, 1373–1386.
Kao, C.H. and Zeng, Z.B. (1997) General formulas for obtaining the MLEs and the asymptotic
variance–covariance matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53,
653–665.
Kao, C.H., Zeng, Z.B. and Teasdale, R.D. (1999) Multiple interval mapping for quantitative trait loci. Genetics
152, 1203–1216.
Karas, M. and Hillenkamp, F. (1988) Laser desorption ionization of proteins with molecular mass exceeding
10000 daltons. Analytical Chemistry 60, 2299–2301.
Karimi, M., Bleys, A., Vanderhaeghen, R. and Hilson, P. (2007) Building blocks for plant gene assembly.
Plant Physiology 145, 1183–1191.
Karp, A. and Edwards, J. (1997) DNA markers: a global overview. In: Caetano-Anolles, G. and Gresshoff,
P.M. (eds) DNA Markers Protocols, Applications and Overviews. Wiley-Liss, Inc., New York,
pp. 1–13.
Kartal, M. (2007) Intellectual property protection in the natural product drug discovery, traditional herbal
medicine and herbal medicinal products. Phytotherapy Research 21, 113–119.
Kasha, K.J. (2005) Chromosome doubling and recovery of doubled haploid plants. In: Palmer, C.E., Keller,
W.A. and Kasha, K.J. (eds) Biotechnology in Agriculture and Forestry, Vol. 56. Haploids in Crop
Improvement II. Springer-Verlag, Berlin, pp. 123–152.
Katari, M.S., Balija, V., Wilson, R.K., Martienssen, R.A. and McCombie, W.R. (2005) Comparing low cover-
age random shotgun sequence data from Brassica oleracea and Oryza sativa genome sequence for
their ability to add to the annotation of Arabidopsis thaliana. Genome Research 15, 496–504.
Kato, A. (2002) Chromosome doubling of haploid maize seedlings using nitrous oxide gas at the flower
primordial stage. Plant Breeding 121, 370–377.
Kauffman, S.A. (1993) The Origins of Order: Self-Organization and Selection in Evolution. Oxford University
Press, Oxford, UK.
Kaushik, N., Sirohi, M. and Khanna, V.K. (2004) Influence of age of the embryo and method of hormone
application on haploid embryo formation in wheat x maize crosses. In: New Directions for a Diverse
Planet, Proceedings of the 4th International Crop Science Congress, 26 September–1 October 2004,
Brisbane, Australia. Published on CD-ROM. Available at: http://www.cropscience.org.au/icsc2004/
(accessed 17 November 2009).
Kearsey, M.J. and Farquhar, A.G.L. (1998) QTL analysis in plants: where are we now? Heredity 80,
137–142.
Kearsey, M.J. and Hyne, V. (1994) QTL analysis: a simple marker regression approach. Theoretical and
Applied Genetics 89, 698–702.
Kearsey, M.J. and Jinks, J.L. (1968) A general method of detecting additive, dominance and epistasis
variation for metrical traits. I. Theory. Heredity 23, 403–409.
Kearsey, M.J., Pooni, H.S. and Syed, N.H. (2003) Genetics of quantitative traits in Arabidopsis thaliana.
Heredity 91, 456–464.
Keating, B.A., Carberry, P.S., Hammer, G.L., Probert, M.E., Robertson, M.J., Holzworth, D., Huth, N.I.,
Hargreaves, J.N.G., Meinke, H., Hochman, Z., McLean, G., Verburg, K., Snow, V., Dimes, J.P., Silburn,
M., Wang, E., Brown, S., Bristow, K.L., Asseng, S., Chapman, S., McCown, R.L., Freebairn, D.M. and
Smith, C.J. (2003) An overview of APSIM, a model designed for farming system simulation. European
Journal of Agronomy 18, 267–288.
Keightley, P.D. (2004) Mutational variation and long-term selection response. Plant Breeding Reviews
24(1), 227–247.
Keightley, P.D. and Bulfield, G. (1993) Detection of quantitative trait loci from frequency changes of marker
alleles under selection. Genetical Research 62, 195–203.
Keller, E.R.J. and Korzun, L. (1996) Haploidy in onion (Allium cepa L.) and other Allium species. In: Jain,
S.M., Sopory, S.K. and Veilleux, R.E. (eds) In Vitro Haploid Production in Higher Plants. Vol. 3:
Important Selected Plants. Kluwer Academic Publishers, Dordrecht, Netherlands, pp. 51–75.
Kempthorne, O. (1957) An Introduction to Genetic Statistics. Wiley, New York.
Kempthorne, O. (1988) An overview of the field of quantitative genetics. In: Weir, B.S., Eisen, E.J., Goodman,
M.M. and Namkoong, G. (eds) Proceedings of the 2nd International Conference on Quantitative
Genetics. Sinauer Associates, Inc., Sunderland, Massachusetts, pp. 47–56.
Kennedy, B.G., Waters, D.L.E. and Henry, R.J. (2006) Screening for the rice blast resistance gene Pi-ta
using LNA displacement probes and real-time PCR. Molecular Breeding 18, 185–193.
Kermicle, J.L. (1969) Androgenesis conditioned by a mutation in maize. Science 166, 1422–1424.
Kerns, M.R., Dudley, J.W. and Rufener, G.K. (1999) QTL for resistance to common rust and smut in maize.
Maydica 44, 37–45.
Kersten, B., Berkle, L., Kuhn, E.J., Giavalisco, P., Konthur, Z., Lueking, A., Walter, G., Eickhoff, H. and
Schneider, U. (2002) Large-scale plant proteomics. Plant Molecular Biology 48, 133–141.
Keurentjes, J.J., Bentsink, L., Alonso-Blanco, C., Hanhart, C.J., Blankestijn-De Vries, H., Effgen, S., Vreugdenhil,
D. and Koornneef, M. (2007a) Development of a near-isogenic line population of Arabidopsis thaliana
and comparison of mapping power with a recombinant inbred line population. Genetics 175, 891–905.
Keurentjes, J.J.B., Jingyuan Fu, L., Terpstra, I.R., Garcia, J.M., Ackerveken, G., Snoek, L.B., Peeters,
A.J.M., Vreugdenhil, D., Koornneef, M. and Jansen, R.C. (2007b) Regulatory network construction in
Arabidopsis by using genome-wide gene expression quantitative trait loci. Proceedings of the National
Academy of Sciences of the United States of America 104, 1708–1713.
Keurentjes, J.J.B., Koornneef, M. and Vreugdenhil, D. (2008) Quantitative genetics in the age of omics.
Current Opinion in Plant Biology 11, 123–128.
Khatkar, M.S., Thomson, P.C., Tammen, I. and Raadsma, H.W. (2004) Quantitative trait loci mapping in
dairy cattle: review and meta-analysis. Genetics Selection Evolution 36, 163–190.
Khatri, P. and Draghici, S. (2005) Ontological analysis of gene expression data: current tools, limitations
and open problems. Bioinformatics 21, 3587–3595.
Khatri, P., Draghici, S., Ostermeier, G.C. and Krawetz, S.A. (2002) Profiling gene expression using
Onto-Express. Genomics 79, 266–270.
Khush, G.S. (1987) List of gene markers maintained in the Rice Genetic Stock Center, IRRI. Rice Genetics
Newsletter 4, 56–62.
Khush, G.S. (1999) Green revolution: preparing for the 21st century. Genome 42, 646–655.
Kiesselbach, T.A. (1926) The immediate effect of gametic relationship and of parental type upon the kernel
weight of corn. Nebraska Agricultural Experiment Station Bulletin 33, 1–69.
Kikuchi, K., Terauchi, K., Wada, M. and Hirano, Y. (2003) The plant MITE mPing is mobilized in anther
culture. Nature 421, 167–170.
Kilian, A., Chen, J., Han, F., Steffenson, B. and Kleinhofs, A. (1997) Towards map-based cloning of the
barley stem rust resistance gene Rpg1 and rpg4 using rice as an intergenomic cloning vehicle. Plant
Molecular Biology 35, 187–195.
Kilian, A., Kudrna, D. and Kleinhofs, A. (1999) Genetic and molecular characterization of barley
chromosome telomeres. Genome 42, 412–419.
Kilpikari, R. and Sillanpää, M.J. (2003) Bayesian analysis of multilocus association in quantitative and
qualitative traits. Genetic Epidemiology 25, 122–135.
Kim, K.-W., Chung, H.-K., Cho, G.-T., Ma, K.-H., Chandrabalan, D., Gwag, J.-G., Kim, T.-S., Cho, E.-G. and
Park, Y.-J. (2007) PowerCore: a program applying the advanced M strategy with a heuristic search for
establishing core mining sets. Bioinformatics 23, 2155–2162.
Kimmel, A. and Oliver, B. (eds) (2006a) DNA Microarrays Part A: Array Platforms and Wet-Bench Protocols.
Elsevier Inc., Amsterdam.
Kimmel, A. and Oliver, B. (eds) (2006b) DNA Microarrays Part B: Databases and Statistics. Elsevier Inc.,
Amsterdam.
Kimmel, B.E., Palazzolo, M.J., Martin, C.H., Boeke, J.D. and Devine, S.E. (1997) Transposon-mediated
DNA sequencing. In: Birren, B., Green, E.D., Klapholz, S., Myers, R.M. and Roskams, J. (eds) Genome
Analysis: a Laboratory Manual, Vol. 1. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New
York, pp. 455–532.
Kimura, M. (1969) The number of heterozygous nucleotide sites maintained in a finite population due to
steady flux of mutations. Genetics 61, 893–903.
King, G.L. (2004) Bioinformatics: harvesting information for plant and crop science. Seminars in Cell and
Developmental Biology 15, 721–731.
King, J., Armstead, I.P., Donnison, I.S., Thomas, H.M., Jones, R.N., Kearsey, M.J., Roberts, L.A., Thomas, A.,
Morgan, W.G. and King, I.P. (2002) Physical and genetic mapping in the grasses Lolium perenne and
Festuca pratensis. Genetics 161, 315–324.
Kinoshita, T. (1995) Report of Committee on Gene Symbolization, Nomenclature and Linkage Groups. Rice
Genetics Newsletter 12, 9–153.
Kinoshita, T. and Takahashi, M. (1991) The one hundredth report of genetical studies on rice plant: linkage
studies and future prospects. Journal of the Faculty of Agriculture, Hokkaido University 65, 1–61.
Kirst, M., Myburg, A.A., De León, J.P.G., Kirst, M.E., Scott, J. and Sederoff, R. (2004) Coordinated genetic
regulation of growth and lignin revealed by quantitative trait locus analysis of cDNA microarray data in
an interspecific backcross of eucalyptus. Plant Physiology 135, 2368–2378.
Kisana, N.S., Nkongolo, K.K., Quick, J.S. and Johnson, D.L. (1993) Production of doubled haploids by
anther culture and wheat × maize method in a wheat breeding programme. Plant Breeding 110,
96–102.
Kiviharju, E., Moisander, S. and Laurila, J. (2005) Improved green plant regeneration rates from oat anther
culture and the agronomic performance of some DH lines. Plant Cell, Tissue and Organ Culture 81, 1–9.
Kjemtrup, S., Boyes, D.C., Christensen, C., McCaskill, A.J., Hylton, M. and Davis, K. (2003) Growth
stage-based phenotypic profiling of plants. In: Grotewold, E. (ed.) Methods in Molecular Biology, Vol.
236. Plant Functional Genomics: Methods and Protocols. Humana Press, Totowa, New Jersey, pp.
427–441.
Klein, P.E., Klein, R.R., Cartinhour, S.W., Ulanch, P.E., Dong, J., Obert, J.A., Morishige, D.T., Schlueter,
S.D., Childs, K.L., Ale, M. and Mullet, J.E. (2000) A high-throughput AFLP-based method for con-
structing integrated genetic and physical maps: progress toward a sorghum genome map. Genome
Research 10, 789–807.
Klein, P.E., Klein, R.R., Vrebalov, J. and Mullet, J.E. (2003) Sequence-based alignment of sorghum chro-
mosome 3 and rice chromosome 1 reveals extensive conservation of gene order and one major
chromosomal rearrangement. The Plant Journal 34, 605–621.
Klose, J., Nock, C., Herrmann, M., Stühler, K., Marcus, K., Blüggel, M., Krause, E., Schalkwyk, L.C.,
Rastan, S., Brown, S.D.M., Büssow, K., Himmelbauer, H. and Lehrach, H. (2002) Genetic analysis of
mouse brain proteome. Nature Genetics 30, 385–393.
Knapp, S.J. (1991) Using molecular markers to map multiple quantitative trait loci: models for backcross,
recombinant inbred and doubled haploid progeny. Theoretical and Applied Genetics 81, 333–338.
Knapp, S.J. (1994) Mapping quantitative trait loci. In: Phillips, R.L. and Vasil, I.K. (eds) DNA-Based Markers in
Plants. Kluwer Academic Publishers, Dordrecht, Netherlands, pp. 58–96.
Knapp, S.J. (1998) Marker-assisted selection as a strategy for increasing the probability of selecting
superior genotypes. Crop Science 38, 1164–1174.
Knapp, S.J., Holloway, J.L., Bridges, W.C. and Liu, B.-H. (1995) Mapping dominant markers using F2
matings. Theoretical and Applied Genetics 91, 74–81.
Knoll, J. and Ejeta, G. (2008) Marker-assisted selection for early-season cold tolerance in sorghum: QTL
validation across populations and environments. Theoretical and Applied Genetics 116, 541–553.
Knox, M.R. and Ellis, T.H. (2001) Stability and inheritance of methylation states at PstI sites in Pisum.
Molecular Genetics and Genomics 265, 497–507.
Kobiljski, B., Quarrie, S., Dencic, S., Kirby, J. and Ivege, M. (2002) Genetic diversity of the Novi Sad wheat
core collection revealed by microsatellites. Cellular and Molecular Biology 7, 685–694.
Koebner, R.M. and Summers, R.W. (2003) 21st century wheat breeding: plot selection or plate detection?
Trends in Biotechnology 21, 59–63.
Koester, R.P., Sisco, P.H. and Stuber, C.W. (1993) Identification of quantitative trait loci controlling days to
flowering and plant height in two near isogenic lines of maize. Crop Science 33, 1209–1216.
Kohli, A., Leech, M., Vain, P., Laurie, D.A. and Christou, P. (1998) Transgene organization in rice engineered
through direct DNA transfer supports a two-phase integration mechanism mediated by the
establishment of integration hot spots. Proceedings of the National Academy of Sciences of the United States
of America 95, 7203–7208.
Kohli, A., Xiong, J., Greco, R., Christou, P. and Pereira, A. (2001) Tagged Transcriptome Display (TTD) in indica rice
using Ac transposition. Molecular Genetics and Genomics 266, 1–11.
Kohli, A., Twyman, R.M., Abranches, A., Wegel, E., Christou, P. and Stoger, E. (2003) Transgene
integration, organization and interaction in plants. Plant Molecular Biology 52, 247–258.
Kohli, A., Prynne, M.Q., Berta, M., Pereira, A., Cappell, T., Twyman, R.M. and Christou, P. (2004)
Dedifferentiation-mediated changes in transposition behavior make the Activator transposon an ideal
tool for functional genomics in rice. Molecular Breeding 13, 177–191.
Koizuka, N., Imai, R., Fujimoto, H., Hayakawa, T., Kimura, Y., Kohno-Murase, J., Sakai, T., Kawasaki, S. and
Imamura, J. (2003) Genetic characterization of a pentatricopeptide repeat protein gene, orf687, that
restores fertility in the cytoplasmic male-sterile Kosena radish. The Plant Journal 34, 407415.
Kojima, S., Takahashi, Y., Kobayashi, Y., Monna, L., Sasaki, T., Araki, T. and Yano, M. (2002) Hd3a, a rice
ortholog of the Arabidopsis FT gene, promotes transition to flowering downstream of Hd1 under short-day
conditions. Plant and Cell Physiology 43, 1096–1105.
Koller, A., Washburn, M.P., Lange, B.M., Andon, N.L., Deciu, C., Haynes, P.A., Hays, L., Schieltz, D., Ulaszek,
R., Wei, J., Wolters, D. and Yates, J.R. III (2002) Proteomic survey of metabolic pathways in rice.
Proceedings of the National Academy of Sciences of the United States of America 99, 11969–11974.
Komari, T. (1990) Transformation of cultured cells of Chenopodium quinoa by binary vectors that carry a
fragment of DNA from the virulence region of pTiBo542. Plant Cell Reports 9, 303–306.
Komari, T., Hiei, Y., Saito, Y., Murai, N. and Kumashiro, T. (1996) Vectors carrying two separate T-DNAs for
co-transformation of higher plants mediated by Agrobacterium tumefaciens and segregation of transformants
free from selection markers. The Plant Journal 10, 165–174.
Komari, T., Takakura, Y., Ueki, J., Kato, N., Ishida, Y. and Hiei, Y. (2006) Binary vectors and super-binary
vectors. In: Wang, K. (ed.) Methods in Molecular Biology 343: Agrobacterium Protocols, Vol. 1, 2nd
edn. Humana Press, Totowa, New Jersey, pp. 15–41.
Komori, T., Ohta, S., Murai, N., Takakura, Y., Kuraya, Y., Suzuki, S., Hiei, Y., Imaseki, H. and Nitta, N. (2004)
Map-based cloning of a fertility restorer gene, Rf-1, in rice (Oryza sativa L.). The Plant Journal 37,
315–325.
Komori, T., Imayama, T., Kato, N., Ishida, Y., Ueki, J. and Komari, T. (2007) Current status of binary vectors
and superbinary vectors. Plant Physiology 145, 1155–1160.
Koncz, C. and Schell, J. (1986) The promoter of TL-DNA gene 5 controls the tissue-specific expression
of chimaeric genes carried by a novel type of Agrobacterium binary vector. Molecular and General
Genetics 204, 383–396.
Konieczny, A. and Ausubel, F. (1993) A procedure for mapping Arabidopsis mutations using co-dominant
ecotype-specific PCR based markers. The Plant Journal 4, 403–410.
Konishi, T., Abe, K., Matsuura, S. and Yano, Y. (1990) Distorted segregation of the esterase isozyme genotypes
in barley Hordeum vulgare L. Japanese Journal of Genetics 65, 411–416.
Konishi, T., Yano, Y. and Abe, K. (1992) Geographic distribution of alleles at the ga2 locus for segregation
distortion in barley. Theoretical and Applied Genetics 85, 419–422.
Koonin, E.V. (2005) Orthologs, paralogs and evolutionary genomics. Annual Review of Genetics 39,
309–338.
Koornneef, M., Dellaert, L.W.M. and van der Veen, J.H. (1982) EMS- and radiation-induced mutation frequencies
at individual loci in Arabidopsis thaliana (L.) Heynh. Mutation Research 93, 109–123.
Korbel, J.O., Doerks, T., Jensen, L.J., Perez-Iratxeta, C., Kaczanowski, S., Hooper, S.D., Andrade, M.A.
and Bork, P. (2005) Systematic association of genes to phenotypes by genome and literature mining.
PLoS Biology 3, e134.
Korol, A.B., Ronin, Y.I., Nevo, E. and Hayes, P. (1998) Multi-interval mapping of correlated trait complexes:
simulation analysis and evidence from barley. Heredity 80, 273–284.
Korol, A.B., Ronin, Y.I., Itskovichi, A.M., Peng, J. and Nevo, E. (2001) Enhanced efficiency of quantitative
trait loci mapping analysis based on multivariate complexes of quantitative traits. Genetics 157,
1789–1803.
Kosambi, D.D. (1944) The estimation of map distances from recombination values. Annals of Eugenics 12,
172–175.
Kota, R., Rudd, S., Facius, A., Kolesov, G., Theil, T., Zhang, H., Stein, N., Mayer, K. and Graner, A. (2003)
Snipping polymorphisms from large EST collections in barley (Hordeum vulgare L.). Molecular
Genetics and Genomics 270, 24–33.
Kowalski, S.P. and Kryder, R.D. (2002) Golden rice: a case study in intellectual property management and
international capacity building. RISK: Health, Safety and Environment 13, 47–67.
Kraakman, A.T.W., Niks, R.E., van den Berg, P.M.M.M., Stam, P. and van Eeuwijk, F.A. (2004) Linkage
disequilibrium mapping of yield and yield stability in modern spring barley cultivars. Genetics 168,
435–446.
Kraft, T., Hansen, M. and Nilsson, N.-O. (2000) Linkage disequilibrium and fingerprinting in sugar beet.
Theoretical and Applied Genetics 101, 323–326.
Krapp, A., Morot-Gaudry, J.F., Boutet, S., Bergot, G., Lelarge, C., Prioul, J.L. and Noctor, G. (2007)
Metabolomics. In: Morot-Gaudry, J.F., Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science
Publishers, Enfield, New Hampshire, pp. 311–333.
Krattiger, A., Mahoney, R.T., Nelsen, L., Bennett, A.B., Graff, G.D., Fernandez, C. and Kowalski, S.P. (eds)
(2006) Intellectual Property Management in Health and Agricultural Innovation, a Handbook of Best
Practices. Centre for the Management of Intellectual Property in Health R&D, Oxford, UK and Public
Intellectual Property Resource for Agriculture, Davis, California.
Kresovich, S. and McFerson, J.R. (1992) Assessment and management of plant genetic diversity: consideration
of intra- and interspecific variation. Field Crops Research 29, 185–204.
Kresovich, S., Luongo, A.J. and Schloss, S.J. (2002) Mining the gold: finding allelic variants for improved
crop conservation and use. In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson,
M.T. (eds) Managing Plant Genetic Diversity. International Plant Genetic Resources Institute, Rome,
pp. 379–386.
Kriegner, A., Cervantes, J.C., Burg, K., Mwanga, R.O.M. and Zhang, D.P. (2003) A genetic linkage
map of sweetpotato (Ipomoea batatas (L) Lam) based on AFLP markers. Molecular Breeding 11,
169–185.
Krishnan, P., Kruger, N.J. and Ratcliffe, R.G. (2005) Metabolite fingerprinting and profiling in plants using
NMR. Journal of Experimental Botany 56, 255–265.
Krizkova, L. and Hrouda, M. (1998) Direct repeats of T-DNA integrated in tobacco chromosome: characterization
of junction regions. The Plant Journal 16, 673–680.
Kruglyak, L. (2008) The road to genome-wide association studies. Nature Reviews Genetics 9, 314–318.
Kryder, R.D., Kowalski, S.P. and Krattiger, A.F. (2000) The intellectual and technical property components
of pro-vitamin A rice (GoldenRice): a preliminary freedom-to-operate review. ISAAA Briefs No. 20.
International Service for the Acquisition of Agri-biotech Applications (ISAAA), Ithaca, New York, 56
pp.
Krysan, P.J., Young, J.C. and Sussman, M.R. (1999) T-DNA as an insertional mutagen in Arabidopsis.
The Plant Cell 11, 2283–2290.
Krysan, P.J., Young, J.C., Jester, P.J., Monson, S., Copenhaver, G., Preuss, D. and Sussman, M.R. (2002)
Characterization of T-DNA insertion sites in Arabidopsis thaliana and the implications for saturation
mutagenesis. OMICS 6, 163–174.
Kuchel, H., Ye, G., Fox, R. and Jefferies, S. (2005) Genetic and economic analysis of a targeted marker-assisted
wheat breeding strategy. Molecular Breeding 16, 67–78.
Kuchel, H., Fox, R., Reinheimer, J., Mosionek, L., Willey, N., Bariana, H. and Jefferies, S. (2008) The successful
application of a marker-assisted wheat breeding strategy. Molecular Breeding 20, 295–308.
Kuiper, H.A., Kok, E.J. and Engel, K.H. (2003) Exploitation of molecular profiling techniques for GM food
safety assessment. Current Opinion in Biotechnology 14, 238–243.
Kuiper, M., Zabeau, M. and Vos, P. (1997) Amplification of simple sequence repeats. Patent EP 0805875.
Kumar, I. and Khush, G.S. (1986) Genetics of amylose content in rice (Oryza sativa L.). Journal of Genetics
65, 1–11.
Kumar, P.V.S. (1993) Biotechnology and biodiversity – a dialectical relationship. Journal of Scientific and
Industrial Research 52, 523–532.
Kumpatla, S.P. and Mukhopadhyay, S. (2005) Mining and survey of simple sequence repeats in expressed
sequence tags of dicotyledonous species. Genome 48, 985–998.
Kurata, N., Moore, G., Nagamura, Y., Foote, T., Yano, M., Minobe, Y. and Gale, M. (1994) Conservation of
genome structure between rice and wheat. Nature Biotechnology 12, 276–278.
Kusterer, B., Piepho, H.P., Utz, H.F., Schön, C.C., Muminovic, J., Meyer, R.C., Altmann, T. and Melchinger,
A.E. (2007) Heterosis for biomass-related traits in Arabidopsis investigated by a novel QTL analysis of
the triple testcross design with recombinant inbred lines. Genetics 177, 1839–1850.
Lagercrantz, U. and Lydiate, D. (1995) RFLP mapping in Brassica nigra indicates different recombination
rates in male and female meiosis. Genome 38, 255–264.
Laird, N.M. and Lange, C. (2006) Family-based designs in the age of large-scale gene-association studies.
Nature Reviews Genetics 7, 385–394.
Lalonde, S., Ehrhardt, D.W., Loqué, D., Chen, J., Rhee, S.Y. and Frommer, W.B. (2008) Molecular and
cellular approaches for the detection of protein–protein interactions: latest techniques and current
limitations. The Plant Journal 53, 610–635.
Lamkey, K.R. and Edwards, J.W. (1999) Quantitative genetics of heterosis. In: Coors, J.G. and Pandey, S.
(eds) The Genetics and Exploitation of Heterosis in Crops. American Society of Agronomy (ASA) and
Crop Science Society of America (CSSA), Madison, Wisconsin, pp. 31–48.
Lamkey, K.R., Schnicker, B.J. and Melchinger, A.E. (1995) Epistasis in an elite maize hybrid and choice of
generation for inbred development. Crop Science 35, 1272–1281.
Lan, H., Chen, M., Flowers, J.B., Yandell, B.S., Stapleton, D.S., Mata, C.M., Mui, E.T.-K., Flowers, M.T.,
Schueler, K.L., Manly, K.F., Williams, R.W., Kendziorski, C. and Attie, A.D. (2006) Combined expres-
sion trait correlations and expression quantitative trait locus mapping. PLoS Genetics 2(1), e6.
Lande, R. and Thompson, R. (1990) Efficiency of marker-assisted selection in the improvement of quantitative
traits. Genetics 124, 743–756.
Landegren, U., Kaiser, R., Sanders, J. and Hood, L. (1988) A ligase-mediated gene detection technique.
Science 241, 1077–1080.
Lander, E. and Kruglyak, L. (1995) Genetic dissection of complex traits: guidelines for interpreting and
reporting linkage results. Nature Genetics 11, 241–247.
Lander, E.S. and Botstein, D. (1989) Mapping Mendelian factors underlying quantitative traits using RFLP
linkage maps. Genetics 121, 185–199.
Lander, E.S. and Green, P. (1987) Construction of multilocus genetic linkage maps in humans. Proceedings
of the National Academy of Sciences of the United States of America 84, 2363–2367.
Lander, E.S., Green, P., Abrahamson, J., Barlow, A., Daly, M.J., Lincoln, S.E. and Newburg, L. (1987) MAPMAKER:
an interactive computer package for constructing primary genetic linkage maps of experimental
and natural populations. Genomics 1, 174–181.
Landy, A. (1989) Dynamic, structural and regulatory aspects of lambda-site-specific recombination. Annual
Review of Biochemistry 58, 913–949.
Lane, M.A., Edwards, J.L. and Nielsen, E.S. (2000) Biodiversity informatics: the challenges of rapid development,
large databases and complex data. In: Proceedings of the 26th International Conference on
Very Large Databases, 10–14 September 2000, Cairo, Egypt. Very Large Data Base Endowment,
Inc., USA.
Lang, N.T., Subudhi, P.K., Virmani, S.S., Brar, D.S., Khush, G.S., Li, Z. and Huang, N. (1999) Development
of PCR-based markers for thermosensitive genetic male sterility gene tms3(t) in rice (Oryza sativa
L.). Hereditas 131, 121–127.
Laperche, A., Brancourt-Hulmel, M., Heumez, E., Gardet, O., Hanocq, E., Devienne-Barret, F. and Le
Gouis, J. (2007) Using genotype × nitrogen interaction variables to evaluate the QTL involved in wheat
tolerance to nitrogen constraints. Theoretical and Applied Genetics 115, 399–415.
Laramie, J.M., Wilk, J.B., DeStefano, A.L. and Myers, R.H. (2007) HaploBuild: an algorithm to construct
non-contiguous associated haplotypes in family based genetic studies. Bioinformatics 23,
2190–2192.
Larkin, P.J. and Scowcroft, W.R. (1981) Somaclonal variation – a novel source of variability from cell cultures
for plant improvement. Theoretical and Applied Genetics 60, 197–214.
Lashermes, P. and Beckert, M. (1988) Genetic control of maternal haploidy in maize (Zea mays L.) and
selection of haploid inducing lines. Theoretical and Applied Genetics 76, 405–410.
Lassner, M.W. and Orton, T.J. (1983) Detection of somatic variation. In: Tanksley, S.D. and Orton, T.J. (eds)
Isozymes in Plant Genetics and Breeding. Vol. 1A. Developments in Plant Genetics and Breeding, 1.
Elsevier, Amsterdam, Netherlands, pp. 209–217.
Laurie, C.C., Chasalow, S.D., LeDeaux, J.R., McCarroll, R., Bush, D., Hauge, B., Lai, C., Clark, D.,
Rocheford, T.R. and Dudley, J.W. (2004) The genetic architecture of response to long-term artificial
selection for oil concentration in the maize kernel. Genetics 168, 2141–2155.
Laurie, D.A. and Bennett, M.D. (1986) Wheat and maize hybridization. Canadian Journal of Genetics and
Cytology 28, 313–316.
Laurie, D.A. and Reymondie, S. (1991) High frequencies of fertilization and haploid seedling production in
crosses between commercial hexaploid wheat varieties and maize. Plant Breeding 106, 182–189.
Laurie, D.A., Pratchett, N., Bezant, J.H. and Snape, J.W. (1994) Genetic analysis of a photoperiod response
gene on the short arm of chromosome 2(2H) on Hordeum vulgare (barley). Heredity 72, 619–627.
Lebowitz, R.L., Soller, M. and Beckmann, J.S. (1987) Trait-based analysis for the detection of linkage
between marker loci and quantitative trait loci in cross between inbred lines. Theoretical and Applied
Genetics 73, 556–562.
Lee, E.A., Ash, M.J. and Good, B. (2007) Re-examining the relationship between degree of relatedness,
genetic effects and heterosis in maize. Crop Science 47, 629–635.
Lee, J.M., Davenport, G.F., Marshall, D., Noel Ellis, T.H., Ambrose, M.J., Dicks, J., van Hintum, T.J.L. and
Flavell, A.J. (2005) GERMINATE: a generic database for integrating genotypic and phenotypic information
for plant genetic resource collections. Plant Physiology 139, 619–631.
Lee, L.-Y., Kononov, M.E., Bassuner, B., Frame, B.R., Wang, K. and Gelvin, S.B. (2007) Novel plant transformation
vectors containing the superpromoter. Plant Physiology 145, 1294–1300.
Lee, M. (1995) DNA markers and plant breeding programs. Advances in Agronomy 55, 265–344.
Lee, M., Godshalk, E.B., Lamkey, K.R. and Woodman, W.L. (1989) Association of restriction length polymorphism
among maize inbreds with agronomic performance of their crosses. Crop Science 29,
1067–1071.
Lee, M., Sharopova, N., Beavis, W.D., Grant, D., Katt, M., Blair, D. and Hallauer, A. (2002) Expanding the
genetic map of maize with the intermated B73 × Mo17 (IBM) population. Plant Molecular Biology 48,
453–461.
Leflon, M., Lecomte, C., Barbottin, A., Jeuffroy, M.-H., Robert, N. and Brancourt-Hulmel, M. (2005)
Characterization of environments and genotypes for analyzing genotype × environment interaction.
Some recent advances in winter wheat and prospects for QTL detection. Journal of Crop Improvement
14, 249–298.
Leister, D.M., Kurth, J., Laurie, D.A., Yano, M., Sasaki, T., Devos, K., Graner, A. and Schulze-Lefert, P.
(1998) Rapid re-organization of resistance gene homologues in cereal genomes. Proceedings of the
National Academy of Sciences of the United States of America 95, 370–375.
Leng, E.R. (1962) Results of long-term selection for chemical composition in maize and their significance
in evaluating breeding systems. Zeitschrift für Pflanzenzüchtung 47, 67–91.
Lerner, I.M. (1950) Population Genetics and Animal Improvement. Cambridge University Press,
Cambridge.
Lerner, I.M. (1954) Genetic Homeostasis. Oliver and Boyd, London.
Lesser, W. (2005) Intellectual property rights in a changing political environment: perspectives on the types
and administration of protection. AgBioForum 8, 64–72.
Lesser, W. and Mutschler, M.A. (2004) Balancing investment incentives and social benefits when
protecting plant varieties: implementing initial systems. Crop Science 44, 1113–1120.
Leung, H., Wu, C., Baraoidan, M., Bordeos, A., Ramos, M., Madamba, S., Cabauatan, P., Vera Cruz, C.,
Portugal, A., Reyes, G., Bruskiewich, R., McLaren, G., Lafitte, R., Gregorio, G., Bennett, J., Brar, D.,
Khush, G., Schnable, P., Wang, G. and Leach, J. (2001) Deletion mutants for functional genomics:
progress in phenotyping, sequence assignment and database development. In: Khush, G.S., Brar,
D.S. and Hardy, B. (eds) Rice Genetics IV. Proceedings of the Fourth International Rice Genetics
Symposium, 22–27 October 2000, Los Baños, Philippines. Science Publishers, Inc., New Delhi and
International Rice Research Institute, Los Baños, Philippines, pp. 239–251.
Levinson, G. and Gutman, G.A. (1987) Slipped-strand mispairing: a major mechanism for DNA sequence
evolution. Molecular Biology and Evolution 4, 203–221.
Lewin, B. (2007) Genes IX. Jones & Bartlett, Sudbury, Massachusetts, 892 pp.
Lewington, A. (2003) Plants for People. Eden Project Books, London.
Lewontin, R.C. (1964) The interaction of selection and linkage. I. General considerations; heterotic models.
Genetics 49, 49–67.
Lewontin, R.C. and Berlan, J.P. (1990) The political economy of agricultural research: the case of hybrid
corn. In: Carroll, C.R., Vandermeer, J.H. and Rosset, P. (eds) Agroecology. McGraw Hill, New York,
pp. 613–628.
Li, C.C. (1955) Population Genetics. University of Chicago Press, Chicago, Illinois.
Li, H., Ye, G. and Wang, J. (2007) A modified algorithm for the improvement of composite interval mapping.
Genetics 175, 361–374.
Li, H., Ribaut, J.M., Li, Z. and Wang, J. (2008) Inclusive composite interval mapping (ICIM) for digenic epistasis
of quantitative traits in biparental populations. Theoretical and Applied Genetics 116, 243–260.
Li, L., Zhou, Y., Cheng, X., Sun, J., Marita, J.M., Ralph, J. and Chiang, V.L. (2003) Combinatorial modification
of multiple lignin traits in trees through multigene cotransformation. Proceedings of the National
Academy of Sciences of the United States of America 100, 4939–4944.
Li, R., Lyons, M.A., Wittenburg, H., Paigen, B. and Churchill, G.A. (2005) Combining data from multiple
inbred line crosses improves the power and resolution of quantitative trait loci mapping. Genetics 169,
1699–1709.
Li, R., Tsaih, S.W., Shockley, K., Stylianou, I.M., Wergedal, J., Paigen, B. and Churchill, G.A. (2006)
Structural model analysis of multiple quantitative traits. PLoS Genetics 2(7), e114.
Li, X. and Zhang, Y. (2002) Reverse genetics by fast neutron mutagenesis in higher plants. Functional and
Integrative Genomics 2, 254–258.
Li, X., Song, Y., Century, K., Straight, S., Ronald, P.C., Dong, X., Lasser, M. and Zhang, Y. (2001)
Deleagene: a fast neutron mutagenesis-based reverse genetics system for plants. The Plant Journal
27, 235–242.
Li, Y., Shi, Y., Cao, Y. and Wang, T. (2004) Establishment of a core collection for maize germplasm preserved
in Chinese national gene bank using geographic distribution and characterization data.
Genetic Resources and Crop Evolution 51, 845–852.
Li, Z.K., Pinson, S.R., Stansel, J.W. and Park, W.D. (1995) Identification of quantitative trait loci (QTL) for
heading date and plant height in cultivated rice (Oryza sativa L.). Theoretical and Applied Genetics
91, 374–381.
Li, Z.K., Luo, L.J., Mei, H.W., Wang, D.L., Shu, Q.Y., Tabien, R., Zhong, D.B., Ying, C.S., Stansel,
J.W., Khush, G.S. and Paterson, A.H. (2001) Overdominance epistatic loci are the primary genetic
basis of inbreeding depression and heterosis in rice: I. Biomass and grain yield. Genetics 158,
1737–1753.
Li, Z.K., Fu, B.-Y., Gao, Y.-M., Xu, J.-L., Ali, J., Lafitte, H.R., Jiang, Y.-Z., Rey, J.D., Vijayakumar, C.H.M.,
Maghirang, R., Zheng, T.-Q. and Zhu, L.-H. (2005) Genome-wide introgression lines and their use in
genetic and molecular dissection of complex phenotypes in rice (Oryza sativa L.). Plant Molecular
Biology 59, 33–52.
Liang, C., Jaiswal, P., Hebbard, C., Avraham, S., Buckler, E.S., Casstevens, T., Hurwitz, B., McCouch, S.,
Ni, J., Pujar, A., Ravenscroft, D., Ren, L., Spooner, W., Tecle, I., Thomason, J., Tung, C.-W., Wei, X.,
Yap, I., Youens-Clark, K., Ware, D. and Stein, L. (2008) Gramene: a growing plant comparative genomics
resource. Nucleic Acids Research 36, D947–D953.
Liang, F., Deng, Q., Wang, Y., Xiong, Y., Jin, D., Li, J. and Wang, B. (2004) Molecular marker-assisted
selection for yield-enhancing genes in the progeny of 9311 × O. rufipogon using SSR. Euphytica 139,
159–165.
Liang, G.H. and Skinner, D.Z. (eds) (2004) Genetically Modified Crops: Their Development, Uses and
Risks. Food Products Press, Binghamton, New York.
Lillemo, M., van Ginkel, M., Trethowan, R.M., Hernández, E. and Rajaram, S. (2004) Associations among
international CIMMYT bread wheat yield testing locations in high rainfall areas and their implications
for wheat breeding. Crop Science 44, 1163–1169.
Lilley, J.M., Ludlow, M.M., McCouch, S.R. and O'Toole, J.C. (1996) Locating QTL for osmotic adjustment
and dehydration tolerance in rice. Journal of Experimental Botany 47, 1427–1436.
Lin, C., Fang, J., Xu, X., Zhao, T., Cheng, J., Tu, J., Ye, G. and Shen, Z. (2008) A built-in strategy for con-
tainment of transgenic plants: creation of selectively terminable transgenic rice. PLoS ONE 3, e1818.
Available at: http://www.plosone.org (accessed 17 November 2009).
Lin, C.S. and Binns, M.R. (1988) A method of analyzing cultivar × location × year experiments: a new stability
parameter. Theoretical and Applied Genetics 76, 425–430.
Lin, C.S., Binns, M.R. and Lefkovitch, L.P. (1986) Stability analysis: where do we stand? Crop Science 26,
894–900.
Lin, H.X., Yamamoto, T., Sasaki, T. and Yano, M. (2000) Characterization and detection of epistatic interactions
of 3 QTLs, Hd1, Hd2 and Hd3, controlling heading date in rice using nearly isogenic lines.
Theoretical and Applied Genetics 101, 1021–1028.
Lin, Y.R., Schertz, K.F. and Paterson, A.H. (1995) Comparative analysis of QTLs affecting plant height
and maturity across the Poaceae, in reference to an interspecific sorghum population. Genetics 140,
391–411.
Lippman, Z.B. and Zamir, D. (2007) Heterosis: revisiting the magic. Trends in Genetics 23, 60–66.
Liu, B., Zhang, S., Zhu, X., Yang, Q., Wu, S., Mei, M., Mauleon, R., Leach, J., Mew, T. and Leung, H.
(2004) Candidate defense genes as predictors of quantitative blast resistance in rice. Molecular
Plant–Microbe Interactions 17, 1146–1152.
Liu, B.H. (1998) Statistical Genomics: Linkage, Mapping and QTL Analysis. CRC Press, Boca Raton,
Florida, 611 pp.
Liu, G., Zhang, Z., Zhu, H., Zhao, F., Ding, X., Zeng, R., Li, W. and Zhang, G. (2008) Detection of
QTLs with additive effects and additive-by-environment interaction effects on panicle number in
rice (Oryza sativa L.) with single-segment substitution lines. Theoretical and Applied Genetics 116,
923–931.
Liu, J.H., Xu, X.Y. and Deng, X.X. (2005) Intergeneric somatic hybridization and its application to crop
genetic improvement. Plant Cell, Tissue and Organ Culture 82, 19–44.
Liu, J.S., Sabatti, C., Teng, J., Keats, B.J.B. and Risch, K. (2001) Bayesian analysis of haplotypes for linkage
disequilibrium mapping. Genome Research 11, 1716–1724.
Liu, K., Goodman, M., Muse, S., Smith, J.S., Buckler, E.D. and Doebley, J. (2003) Genetic structure
and diversity among maize inbred lines as inferred from DNA microsatellites. Genetics 165,
2117–2128.
Liu, S., Zhou, R., Dong, Y., Li, P. and Jia, J. (2006) Development, utilization of introgression lines using a
synthetic wheat as donor. Theoretical and Applied Genetics 112, 1360–1373.
Liu, S.C., Kowalski, S.P., Lan, T.H., Feldmann, K.A. and Paterson, A.H. (1996) Genome-wide high-resolution
mapping by recurrent intermating using Arabidopsis thaliana as a model. Genetics 142,
247–258.
Liu, X.C. and Wu, J.L. (1998) SSR heterotic patterns of parents for making and predicting heterosis in rice
breeding. Molecular Breeding 4, 263–268.
Liu, X.Q., Wang, L., Chen, S., Lin, F. and Pan, Q.H. (2005) Genetics and physical mapping of Pi36(t), a
novel rice blast resistance gene located on rice chromosome 8. Molecular Genetics and Genomics
274, 394–401.
Liu, X.Z., Peng, Z.B., Fu, J.H., Li, L.C. and Huang, C.L. (1997) Application of RAPD in group classification
studies. Scientia Agricultura Sinica 30, 44–51.
Liu, Y. and Zeng, Z.B. (2000) A general mixture model approach for mapping quantitative trait loci from
diverse cross designs involving multiple inbred lines. Genetical Research 75, 345–355.
Liu, Y.G. and Whittier, R. (1995) Thermal asymmetric interlaced PCR: automatable amplification and
sequencing of insert and fragments from P1 and YAC clones for chromosome walking. Genomics 25,
674–681.
Liu, Y.G., Mitsukawa, N., Oosumi, T. and Whittier, R. (1995) Efficient isolation and mapping of Arabidopsis
thaliana T-DNA insert junctions by thermal asymmetric interlaced PCR. The Plant Journal 8,
457–463.
Liu, Y.-G., Shirano, Y., Fukaki, H., Yanai, Y., Tasaka, M., Tabata, S. and Shibata, D. (1999) Complementation
of plant mutants with large genomic DNA fragments by a transformation-competent artificial chromosome
vector accelerates positional cloning. Proceedings of the National Academy of Sciences of the
United States of America 96, 6535–6540.
Lloyd, A., Plaisier, C.L., Carroll, D. and Drews, G.N. (2005) Targeted mutagenesis using zinc-finger nucleases
in Arabidopsis. Proceedings of the National Academy of Sciences of the United States of
America 102, 2232–2237.
Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C.,
Kobayashi, M., Norton, H. and Brown, E.L. (1996) Expression monitoring by hybridization to high-density
oligonucleotide arrays. Nature Biotechnology 14, 1675–1680.
Löffler, C.M., Wei, J., Fast, T., Gogerty, J., Langton, S., Bergman, M., Merrill, B. and Cooper, M. (2005)
Classification of maize environments using crop simulation and geographic information systems. Crop
Science 45, 1708–1716.
Lolle, S.J., Victoria, J.L., Young, J.M. and Pruitt, R.E. (2005) Genome-wide non-Mendelian inheritance of
extra-genomic information in Arabidopsis. Nature 434, 505–509.
Long, A.D., Mullaney, S.L., Reid, L.A., Fry, J.D., Langley, C.H. and Mackay, T.F. (1995) High resolution mapping
of genetic factors affecting abdominal bristle number in Drosophila melanogaster. Genetics 139,
1273–1291.
Longin, C.F.H., Utz, H.F., Reif, J.C., Schipprack, W. and Melchinger, A.E. (2006) Hybrid maize breeding with
doubled haploids: I. One stage versus two-stage selection for testcross performance. Theoretical and
Applied Genetics 112, 903–912.
Lonnstedt, I. and Speed, T.P. (2002) Replicated microarray data. Statistica Sinica 12, 31–46.
Lonosky, P.M., Zhang, X., Honavar, V.G., Dobbs, D.L., Fu, A. and Rodermel, S.R. (2004) A proteomic analysis
of maize chloroplast biogenesis. Plant Physiology 134, 560–574.
Lörz, H. and Wenzel, G. (eds) (2005) Molecular Marker Systems in Plant Breeding and Crop Improvement.
Biotechnology in Agriculture and Forestry, Vol. 55. Springer-Verlag, Berlin.
Louwaars, N.P., Visser, B., Eaton, D., Beekwilder, J. and van der Meer, I. (2002) Policy response to technological
developments: the case of GURTs. In: Louwaars, N.P. (ed.) Seed Policy, Legislation and Law:
Widening a Narrow Focus. Food Products Press, Binghamton, New York, pp. 89–102.
Louwaars, N.P., Tripp, R. and Eaton, D. (2006) Public research in plant breeding and intellectual property
rights: a call for new institutional policies. Agricultural and Rural Development Notes Issue 13, p. 4.
World Bank, Washington, DC.
Lu, C., Shen, L., Tan, Z., Xu, Y., He, P., Chen, Y. and Zhu, L. (1996) Comparative mapping of QTL for agronomic
traits of rice across environments using a double haploid population. Theoretical and Applied
Genetics 93, 1211–1217.
Lu, H., Romero-Severson, J. and Bernardo, R. (2003) Genetic basis of heterosis explored by simple
sequence repeat markers in a random-mated maize population. Theoretical and Applied Genetics
107, 494–502.
Lu, H., Redus, M.A., Coburn, J.R., Rutger, J.N., McCouch, S.R. and Tai, T.H. (2005) Population structure
and breeding patterns of 145 U.S. rice cultivars based on SSR marker analysis. Crop Science 45,
66–76.
Lu, L., Romero-Severson, J. and Bernardo, R. (2002) Chromosomal regions associated with segregation
distortion in maize. Theoretical and Applied Genetics 105, 622–628.
Lu, X., Niu, T. and Liu, J.S. (2003) Haplotype information and linkage disequilibrium mapping for single
nucleotide polymorphisms. Genome Research 13, 2112–2117.
Lu, X.G., Gu, M.H. and Li, C.Q. (eds) (2001) Theory and Technology of Two-line Hybrid Rice. China Science
Press, Beijing.
Lu, X.G., Mou, T.M., Hoan, N.T. and Virmani, S.S. (2004) Two-line hybrid rice breeding in and outside
of China. In: Virmani, S.S., Mao, C.X. and Hardy, B. (eds) Hybrid Rice for Food Security, Poverty
Alleviation and Environmental Protection. International Rice Research Institute, Manila, Philippines.
Lu, Y., Yan, J., Guimarães, C.T., Taba, S., Hao, Z., Gao, S., Chen, S., Li, J., Zhang, S., Vivek, B.S.,
Magorokosho, C., Mugo, S., Makumbi, D., Parentoni, S.N., Shah, T., Rong, T., Crouch, J.H. and Xu, Y.
(2009) Molecular characterization of global maize breeding germplasm based on genome-wide single
nucleotide polymorphisms. Theoretical and Applied Genetics 120, 93–115.
Lübberstedt, T., Klein, D. and Melchinger, A.E. (1998a) Comparative QTL mapping of resistance to
Ustilago maydis across four populations of European flint-maize. Theoretical and Applied Genetics
97, 1321–1330.
Lübberstedt, T., Melchinger, A.E., Führ, S., Klein, D., Dally, A. and Westhoff, P. (1998b) QTL mapping in
test crosses of flint lines of maize: III. Comparison across populations for forage traits. Crop Science
38, 1278–1289.
Lucca, P., Ye, X.D. and Potrykus, I. (2001) Effective selection and regeneration of transgenic rice plants with
mannose as selective agent. Molecular Breeding 7, 43–49.
Lucken, K.A. (1986) The breeding and production of hybrid wheat. In: Smith, E.L. (ed.) Genetic Improvement
in Yield of Wheat. American Society of Agronomy (ASA) and Crop Science Society of America (CSSA),
Madison, Wisconsin, pp. 87–107.
Lucken, K.A. and Johnson, K.D. (1988) Hybrid wheat status and outlook. In: International Rice Research
Institute (IRRI) (ed.) Hybrid Rice. IRRI, Manila, Philippines, pp. 243–255.
Lucker, J., Schwab, W., van Hautum, B., Blaas, J., van der Plas, L.H., Bouwmeester, H.J. and Verhoeven,
H.A. (2004) Increased and altered fragrance of tobacco plants after metabolic engineering using three
monoterpene synthases from lemon. Plant Physiology 134, 510–519.
Luo, K., Duan, H., Zhao, D., Zheng, X., Deng, W., Chen, Y., Stewart, C.N., Jr, McAvoy, R., Jiang, X., Wu, Y.,
He, A., Pei, Y. and Li, Y. (2007) GM-Gene-deletor: fused loxP-FRT recognition sequences dramatically
improve the efficiency of FLP or CRE recombinase on transgene excision from pollen and seed
of tobacco plants. Plant Biotechnology Journal 5, 263–274.
Luo, L.J., Li, Z.K., Mei, H.W., Shu, Q.Y., Tabien, R., Zhong, D.B., Ying, C.S., Stansel, J.W., Khush, G.S. and
Paterson, A.H. (2001) Overdominant epistatic loci are the primary genetic basis of inbreeding depression
and heterosis in rice. II. Grain yield components. Genetics 158, 1755–1771.
Lupas, A. (1996) Prediction and analysis of coiled coil structures. Methods in Enzymology 266, 513–523.
Lush, J.L. (1937) Animal Breeding Plans. Iowa State College Press, Ames, Iowa.
Lush, J.L. (1945) Animal Breeding Plans, 3rd edn. Iowa State College Press, Ames, Iowa.
Lussier, Y.A. and Li, J. (2004) Terminological mapping for high throughput comparative biology of phenotypes.
Proceedings of the Pacific Symposium on Biocomputing, 6–10 January 2004, Hawaii. PSB,
Stanford, California, pp. 202–213.
Lutz, K.A., Azhagiri, A.K., Tungsuchat-Huang, T. and Maliga, P. (2007) A guide to choosing vectors for
transformation of the plastid genome of higher plants. Plant Physiology 145, 12011210.
Lyamichev, V., Mast, A.L., Hall, J.G., Prudent, J.R., Kaiser, M.W., Takova, T., Kwiatkowski, R.W., Sander,
T.J., de Arruda, M., Arco, D.A., Neri, B.P. and Brow, M.A.D. (1999) Polymorphism identification and
quantitative detection of genomic DNA by invasive cleavage of oligonucleotide probes. Nature
Biotechnology 17, 292–296.
Lyman, J.M. (1984) Progress and planning for germplasm conservation of major food crops. Plant Genetic
Resources Newsletter 60, 3–21.
Lynch, M. and Walsh, B. (1998) Genetics and Analysis of Quantitative Traits. Sinauer Associates,
Sunderland, Massachusetts, 980 pp.
Ma, C.X., Casella, G. and Wu, R.L. (2002) Functional mapping of quantitative trait loci underlying the character
process: a theoretical framework. Genetics 161, 1751–1762.
Ma, J.K., Hiatt, A., Hein, M., Vine, N.D., Wang, F., Stabila, P., van Dolleweerd, C., Mostov, K. and Lehner, T.
(1995) Generation and assembly of secretory antibodies in plants. Science 268, 716–719.
Ma, J.K.-C., Chikwamba, R., Sparrow, P., Fischer, R., Mahoney, R. and Twyman, R.M. (2005) Plant-derived
pharmaceuticals – the road forward. Trends in Plant Science 10, 580–585.
MacBeath, G. and Schreiber, S.L. (2000) Printing proteins as microarrays for high-throughput function
determination. Science 289, 1760–1763.
MacCoss, M.J., McDonald, W.H., Saraf, A., Sadygov, R., Clark, J.M., Tasto, J.J., Gould, K.L., Wolters, D.,
Washburn, M., Weiss, A., Clark, J.I. and Yates III, J.R. (2002) Shotgun identification of protein modifi-
cations from protein complexes and lens tissue. Proceedings of the National Academy of Sciences of
the United States of America 99, 7900–7905.
MacDonald, J.A., Mackey, A.J., Pearson, W.R. and Haystead, T.A. (2002) A strategy for the rapid identification
of phosphorylation sites in the phosphoproteome. Molecular and Cellular Proteomics 1, 314–322.
Mackay, I. and Powell, W. (2007) Methods for linkage disequilibrium mapping in crops. Trends in Plant
Science 12, 57–63.
Mackay, T.F.C. (1995) The genetic basis of quantitative variation: number of sensory bristles of Drosophila
melanogaster as a model system. Trends in Genetics 11, 464–470.
Mackay, T.F.C., Stone, E.A. and Ayroles, J.F. (2009) The genetics of quantitative traits: challenges and
prospects. Nature Reviews Genetics 10, 565–577.
Mackill, D.J. and McNally, K.L. (2004) A model crop species: molecular markers in rice. In: Lörz, H. and
Wenzel, G. (eds) Molecular Marker Systems in Plant Breeding and Crop Improvement. Springer
Verlag, Heidelberg, pp. 39–54.
Mackill, D.J., Salam, M.A., Wang, Z.Y. and Tanksley, S.D. (1993) A major photoperiod-sensitivity gene
tagged with RFLP and isozyme markers in rice. Theoretical and Applied Genetics 85, 536–540.
Mackill, D.J., Zhang, Z., Redoña, E.D. and Colowit, P.M. (1996) Level of polymorphism and genetic mapping
of AFLP markers in rice. Genome 39, 969–977.
Macomber, R.S. (1998) A Complete Introduction to Modern NMR Spectroscopy. John Wiley & Sons,
Chichester, UK.
Magnuson, V.L., Ally, D.S., Nylund, S.J., Karanjawala, Z.E., Rayman, J.B., Knapp, J.I., Lowe, A.L., Ghosh, S.
and Collins, F.S. (1996) Substrate nucleotide-determined non-templated addition of adenine by Taq
DNA polymerase: implications for PCR-based genotyping and cloning. Biotechniques 21, 700–709.
Maheswaran, M., Huang, N., Sreerangasamy, S.R. and McCouch, S.R. (2000) Mapping quantitative trait
loci associated with days to flowering and photoperiod sensitivity in rice (Oryza sativa L.). Molecular
Breeding 6, 145–155.
Mailund, T., Schierup, M.H., Pedersen, C.N.S., Madsen, J.N., Hein, J. and Schauser, L. (2006) GeneRecon –
a coalescent based tool for fine-scale association mapping. Bioinformatics 22, 2317–2318.
Malakoff, D. (1999) Bayes offers a new way to make sense of numbers. Science 286, 1460–1464.
Malmberg, R.L. and Mauricio, R. (2005) QTL-based evidence for the role of epistasis in evolution. Genetical
Research 86, 89–95.
Malmberg, R.L., Held, S., Waits, A. and Mauricio, R. (2005) Epistasis for fitness-related quantitative traits in
Arabidopsis thaliana grown in the field and in the greenhouse. Genetics 171, 2013–2027.
Malosetti, M., Voltas, J., Romagosa, I., Ullrich, S.E. and van Eeuwijk, F.A. (2004) Mixed models including
environmental variables for studying QTL by environment interaction. Euphytica 137, 139–145.
Malosetti, M., van der Linden, C.G., Vosman, B. and van Eeuwijk, A. (2007) A mixed-model approach to
association mapping using pedigree information with an illustration of resistance to Phytophthora
infestans in potato. Genetics 175, 879–889.
Maluszynski, M., Kasha, K.J., Forster, B.P. and Szarejko, I. (eds) (2003) Doubled Haploid Production in
Crop Plants – a Manual. Kluwer Academic Publishers, Dordrecht, Netherlands.
Mandel, J. (1969) The partitioning of interaction in analysis of variance. Journal of Research of the National
Bureau of Standards, Series B 73, 309–328.
Mandel, J. (1971) A new analysis of variance model for nonadditive data. Technometrics 13, 1–18.
Manenti, G., Galvan, A., Pettinicchio, A., Trincucci, G., Spada, E., Zolin, A., Milani, S., Gonzalez-Neira, A.
and Dragani, T.A. (2009) Mouse genome-wide association mapping needs linkage analysis to avoid
false-positive loci. PLoS Genetics 5(1), e1000331.
Mangelsdorf, P.C. (1974) Corn: Its Origin, Evolution and Improvement. Harvard University Press, Cambridge,
Massachusetts.
Manly, K.F. (1993) A Macintosh program for storage and analysis of experimental genetic mapping data.
Mammalian Genome 4, 303–313.
Mannschreck, S. (2004) Optimierung der Methode zur Chromosomalen Aufdopplung von in-vivo induzierten
Haploiden bei Mais (Zea mays L.). MSc thesis, Universität Hohenheim, Germany.
Maqbool, S.B., Riazuddin, S., Loc, N.T., Gatehouse, A.M.R., Gatehouse, J.A. and Christou, P. (2001)
Expression of multiple insecticidal genes confers broad resistance against a range of different rice
pests. Molecular Breeding 7, 85–93.
Marchini, J., Donnelly, P. and Cardon, L.R. (2005) Genome-wide strategies for detecting multiple loci that
influence complex diseases. Nature Genetics 37, 413–417.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S.,
Chen, Y.-J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen,
S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L.I., Jarvie, T.P., Jirage, K.B., Kim, J.-B., Knight,
J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B.,
McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T.,
Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt,
K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F. and Rothberg, J.M. (2005)
Genome sequencing in open microfabricated high density picoliter reactors. Nature 437, 376–380.
Marillonnet, S., Giritch, A., Gils, M., Kandzia, R., Klimyuk, V. and Gleba, Y. (2004) In planta engineering of
viral RNA replicons: efficient assembly by recombination of DNA modules delivered by Agrobacterium.
Proceedings of the National Academy of Sciences of the United States of America 101, 6852–6857.
Marillonnet, S., Thoeringer, C., Kandzia, R., Klimyuk, V. and Gleba, Y. (2005) Systematic Agrobacterium
tumefaciens-mediated transfection of viral replicons for efficient transient expression in plants. Nature
Biotechnology 23, 718–723.
Martienssen, R.A., Rabinowicz, P.D., O'Shaughnessy, A. and McCombie, W.R. (2004) Sequencing the
maize genome. Current Opinion in Plant Biology 7, 102–107.
Martin, G.B., Williams, J.G.K. and Tanksley, S.D. (1991) Rapid identification of markers linked to a
Pseudomonas resistance gene in tomato by using random primers and near-isogenic lines.
Proceedings of the National Academy of Sciences of the United States of America 88, 2336–2340.
Martin, G.B., Brommonschenkel, S.H., Chunwongse, J., Frary, A., Ganal, M.W., Spivey, R., Wu, T., Earle,
E.D. and Tanksley, S.D. (1993) Map-based cloning of a protein kinase gene conferring disease resistance
in tomato. Science 262, 1432–1436.
Martin, J.M., Talbert, L.E., Lanning, S.P. and Blake, N.K. (1995) Hybrid performance in wheat as related to
parental diversity. Crop Science 35, 104–108.
Martin, O.C. and Hospital, F. (2006) Two- and three-locus tests for linkage analysis using recombinant
inbred lines. Genetics 173, 451–459.
Martinez, L. (2003) In vitro gynogenesis induction and doubled haploid production in onion (Allium cepa L.).
In: Doubled Haploid Production in Crop Plants. Kluwer Academic Publisher, Dordrecht, Netherlands,
pp. 275–281.
Martinez, O. and Curnow, R.N. (1992) Estimating the locations and the sizes of the effects of quantitative
trait loci using flanking markers. Theoretical and Applied Genetics 85, 480–488.
Martinez, V., Thorgaard, G., Robison, B. and Sillanpää, M.J. (2005) An application of Bayesian QTL mapping
to early development in double haploid lines of rainbow trout including environmental effects.
Genetical Research 86, 209–221.
Mascarenhas, M. and Busch, L. (2006) Seeds of change: intellectual property rights, genetically modified
soybeans and seed saving in the United States. Sociologia Ruralis 46, 122–138.
Mather, K. (1949) Biometrical Genetics. Chapman & Hall, London.
Mather, K. and Jinks, J.L. (1982) Biometrical Genetics. Chapman & Hall, London.
Mathesius, U., Imin, N., Natera, S.H.A. and Rolfe, B.G. (2003) Proteomics as a functional genomics tool. In:
Grotewold, E. (ed.) Methods in Molecular Biology, Vol. 236. Plant Functional Genomics: Methods and
Protocols. Humana Press, Totowa, New Jersey, pp. 395–413.
Matsumura, H., Ito, A., Saitoh, H., Winter, P., Kahl, G., Reuter, M., Kruger, D.H. and Terauchi, R. (2005)
SuperSAGE. Cellular Microbiology 7, 11–18.
Matthews, P.R., Wang, M.B., Waterhouse, P.M., Thornton, S., Fieg, S.J., Gubler, F. and Jacobsen, J.V.
(2001) Marker gene elimination from transgenic barley, using co-transformation with adjacent twin
T-DNA on a standard Agrobacterium transformation vector. Molecular Breeding 7, 195–202.
Matus, I., Corey, A., Filichkin, T., Hayes, P.M., Vales, M.I., Kling, J., Riera-Lizarazu, O., Sato, K., Powell, W.
and Waugh, R. (2003) Development and characterization of recombinant chromosome substitution
lines (RCSLs) using Hordeum vulgare subsp. spontaneum as a source of donor alleles in a Hordeum
vulgare subsp. vulgare background. Genome 46, 1010–1023.
Matzke, M.A. and Matzke, A.J.M. (1995) How and why do plants inactivate homologous (Trans) genes?
Plant Physiology 107, 679–685.
Matzke, M.A., Mette, M.F. and Matzke, A.J.M. (2000) Transgene silencing by the host genome defense:
implications for the evolution of epigenetic control mechanisms in plants and vertebrates. Plant
Molecular Biology 43, 401–415.
Maxted, N., Ford-Lloyd, B.V. and Hawkes, J.G. (1997) Complementary conservation strategies. In: Maxted,
N., Ford-Lloyd, B.V. and Hawkes, J.G. (eds) Plant Genetic Resources Conservation. Chapman & Hall,
London, pp. 15–39.
Mayer, J., Sharples, J. and Nottenburg, C. (2004) Resistance to Phosphinothricin. CAMBIA Intellectual
Property, Canberra.
Mayer, J.E., Pfeiffer, W.H. and Beyer, P. (2008) Biofortified crops to alleviate micronutrient malnutrition.
Current Opinion in Plant Biology 11, 166–177.
Mayes, S., Parsley, K., Sylvester-Bradley, R., May, S. and Foulkes, J. (2005) Integrating genetic informa-
tion into plant breeding programmes: how will we produce varieties from molecular variation, using
bioinformatics? Annals of Applied Biology 146, 223–237.
McCallum, C.M., Comai, L., Greene, E.A. and Henikoff, S. (2000) Targeting Induced Local Lesions IN
Genomes (TILLING) for plant functional genomics. Plant Physiology 123, 439–442.
McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P.A. and Hirschhorn,
J.N. (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges.
Nature Reviews Genetics 9, 356–369.
McCouch, S.R., Teytelman, L., Xu, Y., Lobos, K.B., Clare, K., Walton, M., Fu, B., Maghirang, R., Li, Z., Xing, Y.,
Zhang, Q., Kono, I., Yano, M., Fjellstrom, R., DeClerck, G., Schneider, D., Cartinhour, S., Ware, D. and
Stein, L. (2002) Development and mapping of 2240 new SSR markers for rice (Oryza sativa L.). DNA
Research 9, 199–207.
McCouch, S.R., Sweeney, M., Li, J., Jiang, H., Thomson, M., Septiningsih, E., Edwards, J., Moncada, P.,
Xiao, J., Garris, A., Tai, T., Martinez, C., Tohme, J., Sugiono, M., McClung, A., Yuan, L.P. and Ahn, S.N.
(2007) Through the genetic bottleneck: O. rufipogon as a source of trait-enhancing alleles for O. sativa.
Euphytica 154, 317–339.
McElroy, D. (1996) The industrialization of plant transformation. Nature Biotechnology 14, 715–716.
McElroy, D. and Brettell, R.I.S. (1994) Foreign gene expression in transgenic cereals. Trends in Biotechnology
12, 62–68.
McElroy, D., Zhang, W.G., Cao, J. and Wu, R. (1990) Isolation of an efficient actin promoter for use in rice
transformation. The Plant Cell 2, 163–171.
McLaren, C.G., Bruskiewich, R.M., Portugal, A.M. and Cosico, A.B. (2005) The International Rice Information
System. A platform for meta-analysis of rice crop data. Plant Physiology 139, 637–642.
McMullen, M.M., Kresovich, S., Villeda, H.S., Bradbury, P., Li, H., Sun, Q., Flint-Garcia, S., Thornsberry, J.,
Acharya, C., Bottoms, C., Brown, P., Browne, C., Eller, M., Guill, K., Harjes, C., Kroon, D., Lepak, N.,
Mitchell, S.E., Peterson, B., Pressoir, G., Romero, S., Rosas, M.O., Salvo, S., Yates, H., Hanson, M.,
Jones, E., Smith, S., Glaubitz, J.C., Goodman, M., Ware, D., Holland, J.B. and Buckler, E.S. (2009)
Genetic properties of the maize nested association mapping population. Science 325, 737–740.
McNally, K.L., Bruskiewich, R., Mackill, D., Buell, C.R., Leach, J.E. and Leung, H. (2006) Sequencing multi-
ple and diverse rice varieties. Connecting whole-genome variation with phenotypes. Plant Physiology
141, 26–31.
Meaburn, E., Butcher, L.M., Schalkwyk, L.C. and Plomin, R. (2006) Genotyping pooled DNA using 100K
SNP microarrays: a step towards genomewide association scans. Nucleic Acids Research 34, e28.
Meghi, M.R., Dudley, J.W., Lamkey, R.J. and Sprague, G.F. (1984) Inbreeding depression, inbred and
hybrid grain yields and other traits of maize genotypes representing three eras. Crop Science 24,
545–549.
Melchinger, A.E. (1993) Use of RFLP markers for analyses of genetic relationships among breeding mate-
rials and prediction of hybrid performance. In: Buxton, D.R., Shibles, R., Forsberg, R.A., Blad, B.L.,
Asay, K.H., Paulson, G.M. and Wilson, R.F. (eds) International Crop Science I. Crop Science Society
of America (CSSA), Madison, Wisconsin, pp. 621–628.
Melchinger, A.E. (1999) Genetic diversity and heterosis. In: Coors, J.G. and Pandey, S. (eds) The Genetics
and Exploitation of Heterosis in Crops. Crop Science Society of America (CSSA), Madison, Wisconsin,
p. 54 (abstract).
Melchinger, A.E. and Gumber, R.K. (1998) Overview of heterosis and heterotic groups in agronomic crops.
In: Lamkey, K.R. and Staub, J.E. (eds) Concepts and Breeding of Heterosis in Crop Plants. Crop
Science Society of America (CSSA), Madison, Wisconsin, pp. 29–44.
Melchinger, A.E., Geiger, H.H. and Schnell, F.W. (1986) Epistasis in maize (Zea mays L.) I. Comparison of
single and three-way cross hybrids among early flint and dent inbred lines. Maydica 31, 179–192.
Melchinger, A.E., Lee, M., Lamkey, K.R. and Woodman, W.L. (1990) Genetic diversity for restriction frag-
ment length polymorphisms: relation to estimated genetic effects in maize inbreds. Crop Science 30,
1033–1040.
Melchinger, A.E., Messmer, M.M., Lee, M., Woodman, W.L. and Lamkey, K.R. (1991) Diversity and relation-
ships among U.S. maize inbreds revealed by restriction fragment length polymorphism. Crop Science
31, 669–678.
Melchinger, A.E., Boppenmaier, J., Dhillon, B.S., Pollmer, W.G. and Herrmann, R.G. (1992) Genetic diver-
sity for RFLPs in European maize inbreds. II. Relation to performance of hybrids within versus between
heterotic groups for forage traits. Theoretical and Applied Genetics 84, 627–681.
Melchinger, A.E., Graner, A., Singh, M. and Messmer, M.M. (1994) Relationships among European germ-
plasm: I. Genetic diversity among winter and spring cultivars revealed by RFLPs. Crop Science 34,
1191–1199.
Melchinger, A.E., Utz, H.F. and Schön, C.C. (1998) Quantitative trait locus (QTL) mapping using different
testers and independent populations samples in maize reveals low power of QTL detection and large
bias in estimates of QTL effects. Genetics 149, 383–403.
Melchinger, A.E., Utz, H.F. and Schön, C.C. (2000) From Mendel to Fisher. The power and limits of QTL
mapping for quantitative traits. Vortr Pflanzenzüchtg 48, 132–142.
Melchinger, A.E., Utz, H.F. and Schön, C.C. (2004) QTL analyses of complex traits with cross validation,
bootstrapping and other biometric methods. Euphytica 137, 1–11.
Melchinger, A.E., Longin, C.F., Utz, H.F. and Reif, J.C. (2005) Hybrid maize breeding with doubled haploid
lines: quantitative genetic and selection theory for optimum allocation of resources. In: Proceedings
of the Forty First Annual Illinois Corn Breeders School, 7–8 March 2005, Urbana-Champaign, Illinois.
University of Illinois at Urbana-Champaign, pp. 8–21.
Melchinger, A.E., Utz, H.F., Piepho, H.P., Zeng, Z.-B. and Schön, C.C. (2007) The role of epistasis in the
manifestation of heterosis – a system-oriented approach. Genetics 177, 1815–1825.
Melchinger, A.E., Utz, H.F. and Schön, C.C. (2008) Genetic expectations of quantitative trait loci main and
interaction effects obtained with the triple testcross design and their relevance for the analysis of
heterosis. Genetics 178, 2265–2274.
Menkir, A., Melake-Berhan, A., The, C., Ingelbrecht, I. and Adepoju, A. (2004) Grouping of tropical mid-
altitude maize inbred lines on the basis of yield data and molecular markers. Theoretical and Applied
Genetics 108, 1582–1590.
Menz, M.A., Klein, R.R., Mullet, J.E., Obert, J.A., Unruh, N.C. and Klein, P.E. (2002) A high-density genetic
map of Sorghum bicolor (L.) Moench based on 2926 AFLP, RFLP and SSR markers. Plant Molecular
Biology 48, 483–499.
Mertz, E.T., Bates, L.S. and Nelson, O.E. (1964) Mutant gene that changes protein composition and
increases lysine content of maize endosperm. Science 145, 279–280.
Messina, C.D., Jones, J.W., Boote, K.J. and Vallejos, C.E. (2006) A gene-based model to simulate soybean
development and yield response to environment. Crop Science 46, 456–466.
Messmer, M.M., Melchinger, A.E., Herrmann, R.G. and Boppenmaier, J. (1993) Relationships among early
European maize inbreds. II. Comparisons of pedigree and RFLP data. Crop Science 33, 944–950.
Meudt, H.M. and Clarke, A.C. (2007) Almost forgotten or latest practice? AFLP applications, analyses and
advances. Trends in Plant Science 12, 106–117.
Meuwissen, T.H.E., Hayes, B.J. and Goddard, M.E. (2001) Prediction of total genetic value using genome-wide
dense marker maps. Genetics 157, 1819–1829.
Meyer, K., Benning, G. and Grill, E. (1996) Cloning of plant genes based on genetic map location. In:
Paterson, A.H. (ed.) Genome Mapping in Plants. R.G. Landes Company, Austin, Texas, pp. 137–154.
Meyer, S., Nowak, K., Sharma, V.K., Schulze, J., Mendel, R.R. and Hänsch, R. (2004) Vectors for RNAi
technology in poplar. Plant Biology 6, 100–103.
Meyers, B.C., Scalabrin, S. and Morgante, M. (2004) Mapping and sequencing complex genomes: let's get
physical! Nature Reviews Genetics 5, 578–588.
Michelmore, R.W. and Shaw, D.V. (1988) Character dissection. Nature 335, 672–673.
Michelmore, R.W., Paran, I. and Kesseli, R.V. (1991) Identification of markers linked to disease resistance
genes by bulked segregant analysis: a rapid method to detect markers in specific genome regions
using segregating populations. Proceedings of the National Academy of Sciences of the United States
of America 88, 9828–9832.
Miernyk, J.A. and Thelen, J.J. (2008) Biochemical approaches for discovering protein–protein interactions.
The Plant Journal 53, 597–609.
Miki, B. and McHugh, S. (2004) Selectable marker genes in transgenic plants: applications, alternatives
and biosafety. Journal of Biotechnology 107, 193–232.
Miki, D. and Shimamoto, K. (2004) Simple RNAi vectors for stable and transient suppression of gene function
in rice. Plant and Cell Physiology 45, 490–495.
Mikkilineni, V. and Rocheford, T.R. (2004) RFLP variant frequency differences among Illinois long-term
selection protein strains. Plant Breeding Reviews 24(1), 111–131.
Miklas, P.N., Kelly, J.D. and Singh, S.P. (2003) Registration of anthracnose-resistant pinto bean germplasm
line USPT-ANT-1. Crop Science 43, 1889–1890.
Miles, J.S. and Guest, J.R. (1984) Nucleotide sequence and transcriptional start point of the phosphomannose
isomerase gene (manA) of Escherichia coli. Gene 32, 41–48.
Miller, W., Makova, K.D., Nekrutenko, A. and Hardison, R.C. (2004) Comparative genomics. Annual Review
of Genomics and Human Genetics 5, 15–56.
Mitchell, A.A. and Chakravarti, A. (2003) Undetected genotyping errors cause apparent overtransmission
of common alleles in the transmission/disequilibrium test. American Journal of Human Genetics 72,
598–610.
Miyahara, K. (1999) Analysis of LGC1, low glutelin mutant of rice. Gamma Field Symposia 38, 43–52.
Miyao, A., Tanaka, K., Murata, K., Sawaki, H., Takeda, S., Abe, K., Shinozuka, Y., Onosato, K. and
Hirochika, H. (2003) Target site specificity of the Tos17 retrotransposon shows a preference for inser-
tion within genes and against insertion in retrotransposon-rich regions of the genome. The Plant Cell
15, 1771–1780.
Miyata, M., Yamamoto, T., Komori, T. and Nitta, N. (2007) Marker-assisted selection and evaluation of the
QTL for stigma exsertion under japonica rice genetic background. Theoretical and Applied Genetics
114, 539–548.
Mlynarova, L., Conner, A.J. and Nap, J.P. (2006) Directed microspore-specific recombination of transgenic
alleles to prevent pollen-mediated transmission. Plant Biotechnology Journal 4, 445–452.
Mo, H. (1988) Genetic expression for endosperm traits. In: Weir, B.S., Eisen, E.J., Goodman, M.M. and
Namkoong, G. (eds) Proceedings of the 2nd International Conference of Quantitative Genetics.
Sinauer Associates, Sunderland, Massachusetts, pp. 478–487.
Mo, H. (1993a) Genetic analysis for qualitative–quantitative traits. I. The genetic constitution of generations
and identification of major gene genotypes. Acta Agronomica Sinica 19, 1–6 (in Chinese with English
abstract).
Mo, H. (1993b) Genetic analysis for qualitative–quantitative traits. II. Generation means and genetic variances.
Acta Agronomica Sinica 19, 193–200 (in Chinese with English abstract).
Mockler, T.C. and Ecker, J.R. (2004) Application of DNA tiling arrays for whole-genome analysis. Genomics
85, 1–15.
Mohler, V. and Singrün, C. (2004) General considerations: marker-assisted selection. In: Lörz, H. and
Wenzel, G. (eds) Biotechnology in Agriculture and Forestry, Vol. 55. Molecular Marker Systems in
Plant Breeding and Crop Improvement. Springer-Verlag, Berlin, pp. 305–317.
Mohler, V. and Schwartz, G. (2005) Genotyping tools in plant breeding: from restriction fragment length
polymorphisms to single nucleotide polymorphisms. In: Lörz, H. and Wenzel, G. (eds) Molecular
Marker Systems in Plant Breeding and Crop Improvement. Biotechnology in Agriculture and Forestry,
Vol. 55. Springer-Verlag, Berlin, pp. 23–38.
Moing, A., Deborde, C. and Rolin, D. (2007) Metabolic fingerprinting and profiling by proton NMR. In: Morot-
Gaudry, J.F., Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New
Hampshire, pp. 335–344.
Molloy, M.P. and Witzmann, F.A. (2002) Proteomics: technologies and applications. Briefings in Functional
Genomics and Proteomics 1, 23–29.
Moncada, P., Martinez, C.P., Borrero, J., Chatel, M., Gauch, H., Guimaraes, E., Tohme, J. and McCouch,
S.R. (2001) Quantitative trait loci for yield and yield components in an Oryza sativa × Oryza rufipogon
BC2F2 population evaluated in an upland environment. Theoretical and Applied Genetics 102, 41–52.
Monna, L., Lin, H.X., Kojima, S., Sasaki, T. and Yano, M. (2002) Genetic dissection of a genomic region for a
quantitative trait locus, Hd3, into two loci, Hd3a and Hd3b, controlling heading date in rice. Theoretical
and Applied Genetics 104, 772–778.
Mooers, C.A. (1921) The agronomic placement of varieties. Journal of American Society of Agronomy 13,
337–352.
Moore, S.K. and Srivastava, V. (2006) Efficient deletion of transgenic DNA from complex integration locus
of rice mediated by Cre/lox recombination system. Crop Science 46, 700–705.
Moreau, L., Charcosset, A., Hospital, F. and Gallais, A. (1998) Marker-assisted selection efficiency in populations
of finite size. Genetics 148, 1353–1365.
Moreau, L., Lemarie, S., Charcosset, A. and Gallais, A. (2000) Economic efficiency of one cycle of marker-assisted
selection. Crop Science 40, 329–337.
Moreau, L., Charcosset, A. and Gallais, A. (2004) Experimental evaluation of several cycles of marker-assisted
selection in maize. Euphytica 137, 111–118.
Moreno-Gonzalez, J., Dudley, J.W. and Lambert, R.J. (1975) A design II study of linkage disequilibrium for
percent oil in maize. Crop Science 15, 840–843.
Morgante, M. and Vogel, J. (1997) Compound microsatellite primers for the detection of genetic polymor-
phisms. Patent EP 0804618.
Morris, M., Dreher, K., Ribaut, J.M. and Khairallah, M. (2003) Money matters (II): cost of maize inbred
line conversion schemes at CIMMYT using conventional and marker-assisted selection. Molecular
Breeding 11, 235–247.
Morton, N.E. (1955) Sequential test for the detection of linkage. American Journal of Human Genetics 7,
277–318.
Moser, H. and Lee, M. (1994) RFLP variation and genealogical distance, multivariate distance, heterosis
and genetic variance in oats. Theoretical and Applied Genetics 87, 947–956.
Mu, J., Zhou, H., Zhao, S., Xu, C., Yu, S. and Zhang, Q. (2004) Development of contiguous introgres-
sion lines covering entire genome of the sequenced japonica rice. In: New Directions for a Diverse
Planet: Proceedings of the 4th International Crop Science Congress, 26 September–1 October 2004,
Brisbane, Australia. Published on CD-ROM. Available at: http://www.cropscience.org.au/icsc2004/
(accessed 17 November 2009).
Muehlbauer, G.J., Specht, J.E., Thomas-Compton, M.A., Staswick, P.E. and Bernard, R.L. (1988) Near-isogenic
lines – a potential resource in the integration of conventional and molecular marker linkage
maps. Crop Science 28, 729–735.
Mueller, M., Goel, A., Thimma, M., Dickens, N.J., Aitman, T.J. and Mangion, J. (2006) eQTL Explorer: integrated
mining of combined genetic linkage and expression experiments. Bioinformatics 22, 509–511.
Mukhambetzhanov, S.K. (1997) Culture of nonfertilized female gametophytes in vitro. Plant Cell, Tissue
and Organ Culture 48, 111–119.
Mullis, K. (1992) Process for amplifying nucleic acid sequences. Patent EP 0201184B1.
Mumm, R.H. and Dudley, J.W. (1994) Classification of 148 U.S. maize inbreds. I. Cluster analysis based on
RFLPs. Crop Science 34, 842–851.
Munafò, M.R. and Flint, J. (2004) Meta-analysis of genetic association studies. Trends in Genetics 20,
439–444.
Muranty, H. (1996) Power of tests for quantitative trait loci detection using full-sib families in different
schemes. Heredity 76, 156–165.
Murigneux, A., Baud, S. and Beckert, M. (1993) Molecular and morphological evaluation of doubled-hap-
loid lines in maize: 2. Comparison with single-seed-descent lines. Theoretical and Applied Genetics
87, 278–287.
Myles, S., Peiffer, J., Brown, P.J., Ersoz, E.S., Zhang, Z., Costich, D.E. and Buckler, E.S. (2009) Association
mapping: critical considerations shift from genotyping to experimental design. The Plant Cell 21,
2194–2202.
Nagaraju, J. (2003) Novel FISSR-PCR primers and method of identifying genotyping diverse genomes of
plant and animal systems including rice varieties, a kit thereof. Patent WO 03085133.
Nakagahra, M. (1972) Genetic mechanism on the distorted segregation of marker gene belonging to the
eleventh linkage group in cultivated rice. Japanese Journal of Breeding 22, 232–238.
Nakazaki, T., Okumoto, Y., Horibata, A., Yamahira, S., Teraishi, M., Nishida, H., Inoue, H. and Tanisaka, T.
(2003) Mobilization of a transposon in the rice genome. Nature 421, 170–172.
Naqvi, S., Zhu, C., Farre, G., Ramessar, K., Bassie, L., Breitenbach, J., Conesa, D.P., Ros, G.,
Sandmann, G., Capell, T. and Christou, P. (2009) Transgenic multivitamin corn through biofortification
of endosperm with three vitamins representing three distinct metabolic pathways. Proceedings of the
National Academy of Sciences of the United States of America 106, 7762–7767.
Narayanan, N.N., Baisakh, N., Oliva, N.P., Vera Cruz, C.M., Gnanamanickam, S.S., Datta, K. and Datta, S.K.
(2004) Molecular breeding: marker-assisted selection combined with biolistic transformation for blast
and bacterial blight resistance in indica rice (cv. CO39). Molecular Breeding 14, 61–71.
Naseem, A., Oehmke, J.F. and Schimmelpfennig, D.E. (2005) Does plant variety intellectual property
protection improve farm productivity? Evidence from cotton varieties. AgBioForum 8, 100–107.
Navarro, R.L., Warrier, G.S. and Maslog, C.C. (2006) Genes Are Gems: Reporting Agri-Biotechnology.
A Sourcebook for Journalists. International Crops Research Institute for the Semi-Arid Tropics,
Andhra Pradesh, India, 136 pp.
Naylor, R.L., Falcon, W.P., Goodman, R.M., Jahn, M.M., Sengooba, T., Tefera, H. and Nelson, R.J. (2004)
Biotechnology in the developing world: a case for increased investments in orphan crops. Food Policy
29, 15–44.
Negrotto, D., Jolley, M., Beer, S., Wenck, A.R. and Hansen, G. (2000) The use of phosphomannose-isomerase
as a selectable marker to recover transgenic maize plants (Zea mays L.) via Agrobacterium
transformation. Plant Cell Reports 19, 798–803.
Nei, M. (1972) Genetic distance between populations. The American Naturalist 106, 283–292.
Nei, M. (1973) Analysis of gene diversity in subdivided populations. Proceedings of the National Academy
of Sciences of the United States of America 70, 3321–3323.
Nei, M., Tajima, F. and Tateno, Y. (1983) Accuracy of estimated phylogenetic trees from molecular data. II.
Gene frequency data. Journal of Molecular Evolution 19, 153–170.
Nelson, O.E. (2001) Maize: the long trail to QPM. In: Reeve, E.C.R. and Black, I. (eds) Encyclopedia of
Genetics. Fitzroy Dearborn, London, pp. 657–660.
Neuffer, M.G., Coe, E.H. and Wessler, S. (1997) Mutants of Maize. Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, New York.
Ngetich, K.A. (2005) Indigenous Knowledge, Alternative Medicine and Intellectual Property Rights
Concerns in Kenya. 11th General Assembly, 6–10 December 2005, Maputo, Mozambique. Egerton
University, Njoro, Kenya.
Nguyen, B.D., Brar, D.S., Bui, B.C., Nguyen, T.V., Pham, L.N. and Nguyen, H.T. (2003) Identification and
mapping of the QTL for aluminum tolerance introgressed from the new source Oryza rufipogon Griff.
into indica rice (Oryza sativa L.). Theoretical and Applied Genetics 106, 583–593.
Nguyen, H.T., Chandra Babu, R. and Blum, A. (1997) Breeding for drought tolerance in rice: physiology and
molecular genetics considerations. Crop Science 37, 1426–1434.
Nguyen, T.T.T., Klueva, N., Chamareck, V., Aarti, A., Magpantay, G., Millena, A.C.M., Pathan, M.S. and
Nguyen, H.T. (2004) Saturation mapping of QTL regions and identification of putative candidate genes
for drought tolerance in rice. Molecular Genetics and Genomics 272, 35–46.
Ni, J.J., Wu, P., Senadhira, D. and Huang, N. (1998) Mapping QTLs for phosphorus deficiency tolerance in
rice (Oryza sativa L.). Theoretical and Applied Genetics 97, 1361–1369.
Ni, Z.F., Sun, Q.X., Liu, Z.Y. and Huang, T.C. (1997) Studies on heterotic grouping in wheat: II. Genetic
diversity among common wheat, Tibet semi-wild wheat and spelt wheat. Journal of Agricultural
Biotechnology (China) 5, 103–111.
Nicholas, F.W. (2006) Discovery, validation and delivery of DNA markers. Australian Journal of Experimental
Agriculture 46, 155–158.
Nicholson, L., Gonzalez-Melendi, P., van Dolleweerd, C., Tuck, H., Perrin, Y., Ma, J.K.-C., Fischer, R.,
Christou, P. and Stoger, E. (2005) A recombinant multimeric immunoglobulin expressed in rice
shows assembly dependent subcellular localization in endosperm cells. Plant Biotechnology 3,
115–127.
Nickson, T.E. (2008) Planning environmental risk assessment for genetically modified crops: problem
formulation for stress-tolerant crops. Plant Physiology 147, 494–502.
Nicolas, P. and Chiapello, H. (2007) Gene prediction. In: Morot-Gaudry, J.F., Lea, P. and Briat, J.F. (eds)
Functional Plant Genomics. Science Publishers, Enfield, New Hampshire, pp. 71–85.
Niebur, W.S., Rafalski, J.A., Smith, O.S. and Cooper, M. (2004) Applications of genomics technologies to
enhance rate of genetic progress for yield of maize within a commercial breeding program. In: Fischer,
T. (ed.) New Directions for a Diverse Planet. Proceedings of the 4th International Crop Science
Congress, Brisbane, Australia. Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17
November 2009).
Nilsson, M., Malmgren, H., Samiotaki, M., Kwiatkowski, M., Chowdhary, B.P. and Landegren, U.
(1994) Padlock probes: circularization oligonucleotides for localized DNA detection. Science 265,
2085–2088.
Nobécourt, P. (1939) Sur la pérennité et l'augmentation de volume des cultures de tissus végétaux.
Comptes Rendus des Séances-Société Biologie 130, 1270–1271.
Noirot, M., Anthony, F., Dussert, S. and Hamon, S. (2003) A method for building core collections. In: Hamon,
P., Seguin, M., Perrier, X. and Glaszmann, J.C. (eds) Genetic Diversity of Cultivated Tropical Plants.
Science Publishers, Enfield, New Hampshire and CIRAD, Paris, pp. 65–75.
Nordborg, M. (2000) Linkage disequilibrium, gene trees and selfing: an ancestral recombination graph with
partial self-fertilization. Genetics 154, 923–929.
Nordborg, M., Borevitz, J.O., Bergelson, J., Berry, C.C., Chory, J., Hagenblad, J., Kreitman, M., Maloof,
J.N., Noyes, T., Oefner, P.J., Stahl, E.A. and Weigel, D. (2002) The extent of linkage disequilibrium in
Arabidopsis thaliana. Nature Genetics 30, 190–193.
NRC (National Research Council) (2001) Genetically Modified Pest-Protected Plants: Science and
Regulation. National Academy Press, Washington, DC.
NRC (National Research Council) (2002) Environmental Effects of Transgenic Plants: the Scope and
Adequacy of Regulation. National Academy Press, Washington, DC.
Nunberg, A.N., Li, Z. and Thomas, T.L. (1996) Analysis of gene expression and gene isolation by high-
throughput sequencing of plant cDNAs. In: Paterson, A.H. (ed.) Genome Mapping in Plants. R.G.
Landes Company, Austin, Texas, pp. 169–177.
Nyquist, W.E. (1991) Estimation of heritability and prediction of selection response in plant populations.
Critical Review of Plant Science 10, 235–322.
O'Brien, S.J. and Mayr, E. (1991) Bureaucratic mischief: recognizing endangered species and subspecies.
Science 251, 1187–1188.
O'Flanagan, R.A., Paillard, G., Lavery, R. and Sengupta, A.M. (2005) Non-additivity in protein–DNA
binding. Bioinformatics 21, 2254–2263.
Odell, J.T., Nagy, F. and Chua, N.H. (1985) Identification of DNA sequences required for activity of the
cauliflower mosaic virus 35S promoter. Nature 313, 810–812.
Ogawa, Y., Dansako, T., Yano, K., Sakurai, N., Suzuki, H., Aoki, K., Noji, M., Saito, K. and Shibata, D.
(2008) Efficient and high-throughput vector construction and Agrobacterium-mediated transformation
of Arabidopsis thaliana suspension-cultured cells for functional genomics. Plant and Cell Physiology
49, 242–250.
Oka, H.I. (1988) Origin of Cultivated Rice. Japan Scientific Societies Press, Tokyo.
Okkels, T.F. and Whenham, R.J. (1994) Method for the selection of genetically transformed cells and
compound for use in the method. Patent EP 0601092B1.
Olek, A. (1996) Amplification of simple sequence repeats. Patent EP 0870062.
Oleykowski, C.A., Bronson Mullins, C.R., Godwin, A.K. and Yeung, A.T. (1998) Mutation detection using a
novel plant endonuclease. Nucleic Acids Research 26, 4597–4602.
Oliver, S.G., Winson, M.K., Kell, D.B. and Baganz, F. (1998) Systematic functional analysis of the yeast
genome. Trends in Biotechnology 16, 373–378.
Olufowote, J.O., Xu, Y., Chen, X., Park, W.D., Beachell, H.M., Dilday, R.H., Goto, M. and McCouch, S.R.
(1997) Comparative evaluation of within-cultivar variation of rice (Oryza sativa L.) using microsatellite
and RFLP markers. Genome 40, 370–378.
Openshaw, S. and Bruce, W.B. (2001) Marker-assisted identification of a gene associated with a pheno-
typic trait. Patent EP 1230385.
Openshaw, S.J. and Frascaroli, E. (1997) QTL detection and marker-assisted selection for complex traits in
maize. Proceedings of Corn and Sorghum Industrial Research Conference 52, 44–53.
Oraguzie, N.C., Wilcox, P.L., Rikkerink, E.H.A. and De Silva, H.N. (2007) Linkage disequilibrium. In:
Oraguzie, N.C., Rikkerink, E.H.A., Gardiner, S.E. and De Silva, H.N. (eds) Association Mapping in
Plants. Springer, Berlin, pp. 11–39.
Orf, J.H., Chase, K., Jarvik, T., Mansur, L.M., Cregan, P.B., Adler, F.R. and Lark, K.G. (1999) Genetics of
agronomic traits: I. Comparison of three related recombinant inbred populations. Crop Science 39,
1642–1651.
Ortiz, R. and Smale, M. (2007) Transgenic technology: pro-poor or pro-rich? Chronica Horticulturae 47,
9–12.
Ossowski, S., Schwab, R. and Weigel, D. (2008) Gene silencing in plants using artificial microRNAs and
other small RNAs. The Plant Journal 53, 674–690.
Ouyang, Z., Mowers, R.P., Jensen, A., Wang, S. and Zeng, S. (1995) Cluster analysis for genotype ×
environment interaction with unbalanced data. Crop Science 33, 1300–1305.
Ow, D.W. (2001) The right chemistry for marker gene removal? Nature Biotechnology 19, 115–116.
Ow, D.W. (2002) Recombinase-directed plant transformation for the post-genomic era. Plant Molecular
Biology 48, 183–200.
Ow, D.W., Wood, K.V., DeLuca, M., de Wet, J.R., Helinski, D.R. and Howell, S.H. (1986) Transient and stable
expression of the firefly luciferase gene in plant cells and transgenic plants. Science 234, 856–859.
Owen, H.R. (1996) Plant germplasm. In: Hunter-Cevera, J.C. and Belt, A. (eds) Maintaining Cultures for
Biotechnology and Industry. Academic Press, Inc., London, pp. 197–228.
Paine, J.A., Shipton, C.A., Chaggar, S., Howells, R.M., Kennedy, M.J., Vernon, G., Wright, S.Y., Hinchliffe,
E., Adams, J.L., Silverstone, A.L. and Drake, R. (2005) Improving the nutritional value of Golden Rice
through increased pro-vitamin A content. Nature Biotechnology 23, 482–487.
Palmer, C.E. and Keller, W.A. (2005) Overview of haploidy. In: Palmer, C.E., Keller, W.A. and Kasha, K.J.
(eds) Biotechnology in Agriculture and Forestry, Vol. 56. Haploids in Crop Improvement II.
Springer-Verlag, Berlin, pp. 3–9.
Palmer, C.E., Keller, W.A. and Kasha, K.J. (eds) (2005) Biotechnology in Agriculture and Forestry, Vol. 56.
Haploids in Crop Improvement II. Springer-Verlag, Berlin.
Palmer, L.E., Rabinowicz, P.D., O'Shaughnessy, A.L., Balija, V.S., Nascimento, L.U., Dike, S., de la Bastide,
M., Martienssen, R.A. and McCombie, W.R. (2003) Maize genome sequencing by methylation
filtration. Science 302, 2115–2117.
Palmer, R.G. and Shoemaker, R.C. (1998) Soybean genetics. In: Hrustic, M., Vidic, M. and Jackovic, D.
(eds) Soybean Institute of Field and Vegetative Crops. Novi Sad, Yugoslavia, pp. 45–82.
Palmiter, R.D., Norstedt, G., Gelinas, R.E., Hammer, R.E. and Brinster, R.L. (1983) Metallothionein–human
GH fusion genes stimulate growth of mice. Science 222, 809–814.
Pan, Q.L., Liu, Y.S., Budai-Hadrian, O., Sela, M., Carmel-Goren, L., Zamir, D. and Fluhr, R. (2000)
Comparative genetics of nucleotide binding site-leucine-rich repeat resistance gene homologues in
the genomes of two dicotyledons: tomato and Arabidopsis. Genetics 155, 309–322.
Panaud, O., Chen, X. and McCouch, S.R. (1996) Development of microsatellite markers and characteriza-
tion of simple sequence length polymorphism (SSLP) in rice (Oryza sativa L.). Molecular and General
Genetics 252, 597–607.
Pang, S.-Z., DeBoer, D.L., Wan, Y., Ye, G., Layton, J.G., Neher, M.K., Armstrong, C.L., Fry, J.E., Hinchee,
M.A.W. and Fromm, M.E. (1996) An improved green fluorescent protein gene as a vital marker in
plants. Plant Physiology 112, 893–900.
Papa, R., Acosta, J., Delgado-Salinas, A. and Gepts, P. (2005) A genome-wide analysis of differentiation
between wild and domesticated Phaseolus vulgaris from Mesoamerica. Theoretical and Applied
Genetics 111, 1147–1158.
Paran, I. and Michelmore, R.W. (1993) Development of reliable PCR-based markers linked to downy
mildew resistance genes in lettuce. Theoretical and Applied Genetics 85, 985–993.
Paran, I., Kesseli, R.V. and Michelmore, R.W. (1991) Identification of RFLP and RAPD markers linked to
downy mildew resistance genes in lettuce using near-isogenic lines. Genome 34, 1021–1027.
Pardey, P.G., Wright, B.D., Nottenburg, C., Binenbaum, E. and Zambrano, P. (2003) Intellectual prop-
erty and developing countries: freedom to operate in agricultural biotechnology. Biotechnology
and Genetic Resource Policies Brief 3. International Food Policy Research Institute (IFPRI),
Washington, DC.
Parekh, S.R. (ed.) (2004) The GMO Handbook: Genetically Modified Animals, Microbes and Plants in
Biotechnology. Humana Press, Totowa, New Jersey.
Parinov, S. and Sundaresan, V. (2000) Functional genomics in Arabidopsis: large scale insertional
mutagenesis complements the genome sequencing project. Current Opinion in Biotechnology 11, 157–161.
Parisseaux, B. and Bernardo, R. (2004) In silico mapping of quantitative trait loci in maize. Theoretical and
Applied Genetics 109, 508–514.
Park, S.J., Walsh, E.J., Reinbergs, E., Song, L.S.P. and Kasha, K. (1976) Field performance of doubled
haploid barley lines in comparison with lines developed by the pedigree and single seed descent
methods. Canadian Journal of Plant Science 56, 467–474.
Parkin, I.A.P., Gulden, S.M., Sharp, A.G., Lukens, L., Trick, M., Osborn, T.C. and Lydiate, D.J. (2005)
Segmental structure of the Brassica napus genome based on comparative analysis with Arabidopsis
thaliana. Genetics 171, 765–781.
Paterson, A.H. (1996a) Mapping genes responsible for differences in phenotype. In: Paterson, A.H. (ed.)
Genome Mapping in Plants. R.G. Landes Company, Austin, Texas, pp. 41–54.
Paterson, A.H. (1996b) Physical mapping and map-based cloning: bridging the gap between DNA markers
and genes. In: Paterson, A.H. (ed.) Genome Mapping in Plants. R.G. Landes Company, Austin, Texas,
pp. 55–62.
Paterson, A.H. (ed.) (1998) Molecular Dissection of Complex Traits. CRC Press, Boca Raton, Florida,
305 pp.
Paterson, A.H., Lander, E.S., Hewitt, J.D., Peterson, S., Lincoln, S.E. and Tanksley, S.D. (1988) Resolution
of quantitative traits into Mendelian factors, using a complete linkage map of restriction fragment
length polymorphisms. Nature 335, 721–726.
Paterson, A.H., Deverna, J.W., Lanini, B. and Tanksley, S.D. (1990) Fine mapping of quantitative trait loci
using selected overlapping recombinant chromosomes, in an interspecific cross of tomato. Genetics
124, 735–742.
Paterson, A.H., Damon, S., Hewitt, J.D., Zamir, D., Rabinowitch, H.D., Lincoln, S.E., Lander, E.S. and
Tanksley, S.D. (1991) Mendelian factors underlying quantitative traits in tomato: comparison across
species, generation and environments. Genetics 127, 181–197.
Paterson, A.H., Lin, Y.R., Li, Z., Schertz, K.F., Doebley, J.F., Pinson, S.R.M., Liu, S.-C., Stansel, J.W. and
Irvine, J.E. (1995) Convergent domestications of cereal crops by independent mutations at
corresponding genetic loci. Science 269, 1714–1718.
Paterson, A.H., Saranga, Y., Menz, M., Jiang, C.X. and Wright, R.J. (2003) QTL analysis of genotype ×
environment interactions affecting cotton fiber quality. Theoretical and Applied Genetics 106, 384–396.
Patwardhan, B. (2005) Ethnopharmacology and drug discovery. Journal of Ethnopharmacology 100,
50–52.
Peacock, J. and Chaudhury, A. (2002) The impact of gene technologies on the use of genetic resources. In:
Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing Plant Genetic
Diversity. International Plant Genetic Resources Institute, Rome, pp. 33–42.
Peakall, R., Gilmore, S., Keys, W., Morgante, M. and Rafalski, A. (1998) Cross-species amplification of
soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera:
implications for the transferability of SSRs in plants. Molecular Biology and Evolution 15, 1275–1287.
Pearson, J.V., Huentelman, M.J., Halperin, R.F., Tembe, W.D., Melquist, S., Homer, N., Brun, M., Szelinger,
S., Coon, K.D., Zismann, V.L., Webster, J.A., Beach, T., Sando, S.B., Aasly, J.O., Heun, R., Jessen, F.,
Kölsch, H., Tsolaki, M., Daniilidou, M., Reiman, E.M., Papassotiropoulos, A.P., Hutton, M.L., Stephan,
D.A. and Craig, D.W. (2007) Identification of the genetic basis for complex disorders by use of pooling-
based genomewide single-nucleotide-polymorphism association studies. American Journal of Human
Genetics 80, 126–139.
Pearson, W.R., Wood, T., Zhang, Z. and Miller, W. (1997) Comparison of DNA sequences with protein
sequences. Genomics 15, 24–36.
Peleg, Z., Saranga, Y., Suprunova, T., Ronin, Y., Röder, M.S., Kilian, A., Korol, A.B. and Fahima, T. (2008)
High-density genetic map of durum wheat × wild emmer wheat based on SSR and DArT markers.
Theoretical and Applied Genetics 117, 103–115.
Peleman, J.D. and van der Voort, J.R. (2003) Breeding by design. Trends in Plant Science 8, 330–334.
Peleman, J.D., Wye, C., Zethof, J., Sorensen, A.P., Verbakel, H., van Oeveren, J., Gerats, T. and van der
Voort, J.R. (2005) Quantitative trait locus (QTL) isogenic recombinant analysis: a method for
high-resolution mapping of QTL within a single population. Genetics 171, 1341–1352.
Peña, L. (ed.) (2004) Methods in Molecular Biology, Vol. 286: Transgenic Plants: Methods and Protocols.
Humana Press Inc., Totowa, New Jersey.
Peng, J., Richards, D.E., Hartley, N.M., Murphy, G.P., Devos, K.M., Flintham, J.E., Beales, J., Fish, L.J.,
Worland, A.J., Pelica, F., Sudhakar, D., Christou, P., Snape, J.W., Gale, M.D. and Harberd, N.P. (1999)
Green revolution genes encode mutant gibberellin response modulators. Nature 400, 256–261.
Peng, Z.B., Liu, X.Z., Fu, J.H., Li, L.C. and Huang, C.L. (1998) Preliminary studies on the superior inbred
groups and construction of heterosis mode. Acta Agronomica Sinica 24, 711–717.
Pereira, M.G., Lee, M.M. and Rayapati, P.J. (1994) Comparative RFLP and QTL mapping in sorghum and
maize. In: Second International Conference on the Plant Genome. Scherago Int., New York, Poster 169.
Pérez, T., Albornoz, J. and Dominguez, A. (1998) An evaluation of RAPD fragment reproducibility and
nature. Molecular Evolution 7, 1347–1358.
Pérez-Enciso, M. (2004) In silico study of transcriptome genetic variation in outbred populations. Genetics
166, 547–554.
Perkins, J.M. and Jinks, J.L. (1973) The assessment and specificity of environmental and
genotype–environmental components of variability. Heredity 30, 111–126.
Perlin, M. (1995) Method and system for genotyping. Patent EP 0714537.
Perumal, R., Krishnaramanujam, R., Menz, M.A., Katil, S., Dahlberg, J., Magill, C.W. and Rooney, W.L.
(2007) Genetic diversity among sorghum races and working groups based on AFLPs and SSRs. Crop
Science 47, 1375–1383.
Pesek, J. and Baker, R.J. (1969) Desired improvement in relation to selection indices. Canadian Journal of
Plant Science 49, 803–804.
Peters, J.L., Cnudde, F. and Gerats, T. (2003) Forward genetics and map-based cloning approaches. Trends
in Plant Science 8, 484–491.
Peterson, D.G., Schulze, S.R., Sciara, E.B., Lee, S.A., Bowers, J.E., Nagel, A., Jiang, N., Tibbitts, D.C.,
Wessler, S.R. and Paterson, A.H. (2002) Integration of Cot analysis, DNA cloning and high throughput
sequencing facilitate genome characterization and gene discovery. Genome Research 12, 795–807.
Peterson, P.A. (1992) Quantitative inheritance in the era of molecular biology. Maydica 37, 7–18.
Pettersson, F., Morris, A.P., Barnes, M.R. and Cardon, L.R. (2008) Goldsurfer2 (Gs2): a comprehensive tool
for the analysis and visualization of genome wide association studies. BMC Bioinformatics 9, 138.
Phillips, R.L. (2006) Genetic tools from nature and the nature of genetic tools. Crop Science 46,
2245–2252.
Phillips, R.L. (2008) Can genome sequencing of model plants be helpful for crop improvement? Proceedings
of 5th International Crop Science Congress, 13–18 April 2008, Jeju, Korea. International Crop Science
Society, Madison, Wisconsin.
Phillips, R.L., Chen, J., Okediji, R. and Burk, D. (2004) Intellectual property rights and the public good. The
Scientist 18, 8.
Phizicky, E., Bastiaens, P.I.H., Zhu, H., Snyder, M. and Fields, S. (2003) Protein analysis on a proteomic
scale. Nature 422, 208–215.
Pickering, R.A. and Devaux, P. (1992) Haploid production: approaches and use in plant breeding. In: Shewry,
P.R. (ed.) Barley: Genetics, Molecular Biology and Biotechnology. CAB International, Wallingford, UK,
pp. 511–539.
Picoult-Newberg, L., Ideker, T.E., Pohl, M.G., Taylor, S.L., Donaldson, M.A., Nickerson, D.A. and
Boyce-Jacino, M. (1999) Mining SNPs from EST databases. Genome Research 9, 167–174.
Piepho, H.P. (2000) A mixed model approach to mapping quantitative trait loci in barley on the basis of
multiple environment data. Genetics 156, 2043–2050.
Pillen, K., Pineda, O., Lewis, C.B. and Tanksley, S.D. (1996) Status of genome mapping tools in the taxon
Solanaceae. In: Paterson, A.H. (ed.) Genome Mapping in Plants. R.G. Landes Company, Austin, Texas,
pp. 281–308.
Pineda, O., Bonierbale, M.W., Plaisted, R.L., Brodie, B.B. and Tanksley, S.D. (1993) Identification of RFLP
markers linked to the H1 gene conferring resistance to the potato cyst nematode Globodera
rostochiensis. Genome 36, 152–156.
Plant Ontology Consortium (2002) The Plant Ontology Consortium and plant ontologies. Comparative
and Functional Genomics 3, 137–142.
Plomion, C., Durel, C.-E. and O'Malley, D.M. (1996) Genetic dissection of height in maritime pine seedlings
raised under accelerated growth conditions. Theoretical and Applied Genetics 93, 849–858.
Plotsky, Y., Cahaner, A., Haberfeld, A., Lavi, U., Lamont, S.J. and Hillel, J. (1993) DNA fingerprint bands
applied to linkage analysis with quantitative trait loci in chickens. Animal Genetics 24, 105–110.
Podlich, D.W. and Cooper, M. (1998) QU-GENE: a platform for quantitative analysis of genetic models.
Bioinformatics 14, 632–653.
Podlich, D.W., Cooper, M.E. and Basford, K.E. (1999) Computer simulation of a selection strategy to
accommodate genotype–environment interactions in a wheat recurrent selection programme. Plant
Breeding 118, 17–28.
Podlich, D.W., Winkler, C.R. and Cooper, M. (2004) Mapping as you go: an effective approach for
marker-assisted selection of complex traits. Crop Science 44, 1560–1571.
Poehlman, J.M. and Quick, J.S. (1983) Crop breeding in a hungry world. In: Wood, D.R., Rawal, K.M.
and Wood, M.N. (eds) Crop Breeding. American Society of Agronomy and Crop Science Society of
America, Madison, Wisconsin, pp. 1–19.
Pollak, L.M., Gardner, C.O., Kahler, A.L. and Thomas-Compton, M. (1984) Further analysis of the mating
system in two mass selected populations of maize. Crop Science 24, 793–796.
Pooni, H.S., Kumar, I. and Khush, G.S. (1992) A comprehensive model for disomically inherited metrical
traits expressed in triploid tissues. Heredity 69, 166–174.
Pooni, H.S., Kumar, I. and Khush, G.S. (1993) Genetical control of amylose content in selected crosses of
indica rice. Heredity 70, 269–280.
Popelka, J.C. and Altpeter, F. (2003) Agrobacterium tumefaciens-mediated genetic transformation of rye
(Secale cereale L.). Molecular Breeding 11, 203–211.
Popelka, J.C., Xu, J. and Altpeter, F. (2003) Generation of rye plants with low copy number after biolis-
tic gene transfer and production of instantly marker-free transgenic rye. Transgenic Research 12,
587–596.
Porceddu, A., Albertini, E., Barcaccia, G., Marconi, G., Bertoli, F. and Veronesi, F. (2002) Development of
S-SAP markers based on an LTR-like sequence from Medicago sativa L. Molecular Genetics and
Genomics 267, 107–114.
Porta, C. and Lomonossoff, G.P. (2002) Viruses as vectors for the expression of foreign sequences in
plants. Biotechnology and Genetic Engineering Reviews 19, 245–291.
Portyanko, V.A., Hoffman, D.L., Lee, M. and Holland, J.B. (2001) A linkage map of hexaploid oat based on
grass anchor DNA clones and its relationship to other oat maps. Genome 44, 249–265.
Potrykus, I. (2005) Golden Rice, vitamin A and blindness – public responsibility and failure. Available at:
http://www.goldenrice.org/PDFs/Potrykus_Zurich_2005.pdf (accessed 17 November 2009).
Prasanna, B.M., Vasal, S.K., Kassahun, B. and Singh, N.N. (2001) Quality protein maize. Current Science
81, 1308–1319.
Preston, L.R., Harker, N., Holton, T. and Morell, M.K. (1999) Plant cultivar identification using DNA analysis.
Plant Varieties and Seeds 12, 191–205.
Price, A.H. and Tomos, A.D. (1997) Genetic dissection of root growth in rice (Oryza sativa L.): II. Mapping
quantitative trait loci using molecular markers. Theoretical and Applied Genetics 95, 143–152.
Primmer, C.R., Ellegren, H., Saino, N. and Moller, A.P. (1996) Directional evolution in germline
microsatellite mutations. Nature Genetics 13, 391–393.
Primrose, S.B. (1995) Principles of Genome Analysis. Blackwell Science, Oxford, UK, pp. 14–37.
Pritchard, J.K. and Rosenberg, N.A. (1999) Use of unlinked genetic markers to detect population
stratification in association studies. American Journal of Human Genetics 65, 220–228.
Pritchard, J.K., Stephens, M. and Donnelly, P. (2000a) Inference of population structure using multilocus
genotype data. Genetics 155, 945–959.
Pritchard, J.K., Stephens, M., Rosenberg, N.A. and Donnelly, P. (2000b) Association mapping in structured
populations. American Journal of Human Genetics 67, 170–181.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., de Bakker,
P.I.W., Daly, M.J. and Sham, P.C. (2007) PLINK: a toolset for whole-genome association and
population-based linkage analysis. American Journal of Human Genetics 81, 559–575.
Qi, X., Stam, P. and Lindhout, P. (1998) Use of locus-specific AFLP markers to construct a high-density
molecular map in barley. Theoretical and Applied Genetics 96, 376–384.
Qi, X., Pittaway, T.S., Lindup, S., Liu, H., Waterman, E., Padi, F.K., Hash, C.T., Zhu, J., Gale, M.D. and
Devos, K.M. (2004) An integrated genetic map and a new set of simple sequence repeat markers for
pearl millet, Pennisetum glaucum. Theoretical and Applied Genetics 109, 1485–1493.
Qian, W., Sass, O., Meng, J., Li, M., Frauen, M. and Jung, C. (2007) Heterotic patterns in rapeseed (Brassica
napus L.): I. Crosses between spring and Chinese semi-winter lines. Theoretical and Applied Genetics
115, 27–34.
Quarrie, S.A., Lazic-Jancic, V., Kovacevic, D., Steed, A. and Pekic, S. (1999) Bulk segregant analysis with
molecular markers and its use for improving drought resistance in maize. Journal of Experimental
Botany 50, 1299–1306.
Rabinowicz, P.D., Schulz, K., Dedhia, N., Yordan, C., Parnell, L.D., Stein, L., McCombie, R.
and Martienssen, R.A. (1999) Differential methylation of genes and retrotransposons facilitates
shotgun sequencing of maize genome. Nature Genetics 23, 305–308.
Raboin, L.-M., Pauquet, J., Butterfield, M., D'Hont, A. and Glaszmann, J.-C. (2008) Analysis of genome-wide
linkage disequilibrium in the high polyploidy sugarcane. Theoretical and Applied Genetics 116, 701–714.
Rae, S.J., Macaulay, M., Ramsay, L., Leigh, F., Mathews, D., O'Sullivan, D.M., Donini, P., Morris, P.C.,
Powell, W., Marshall, D.F., Waugh, R. and Thomas, W.T.B. (2007) Molecular barley breeding. Euphytica
158, 295–303.
Rafalski, A. (2002) Applications of single nucleotide polymorphisms in crop genetics. Current Opinion in
Plant Biology 5, 94–100.
Ragavan, S. (2006) Of plant variety protection, agricultural subsidies and the WTO. Available at:
http://www.law.ou.edu/faculty/facfiles/OfPlantVarietyProtection.pdf (accessed 17 November 2009).
Ragot, M. and Lee, M. (2007) Marker-assisted selection in maize: current status, potential, limitations
and perspectives from the private and public sectors. In: Guimarães, E.P., Ruane, J., Scherf, B.D.,
Sonnino, A. and Dargie, J.D. (eds) Marker-Assisted Selection, Current Status and Future Perspectives
in Crops, Livestock, Forestry and Fish. Food and Agriculture Organization of the United Nations,
Rome, pp. 117–150.
Ragot, M., Biasiolli, M., Delbut, M.F., Dell'Orco, A., Malgarini, L., Thevenin, P., Vernoy, J., Vivant, J.,
Zimmermann, R. and Gay, G. (1995) Marker-assisted backcrossing: a practical example. In: Bervillé,
A. and Tersac, M. (eds) Les Colloques, No. 72. Techniques et Utilisations des Marqueurs Moléculaires.
Institute National de la Recherche Agronomique (INRA), Paris, pp. 45–56.
Ragot, M., Gay, G., Muller, J.P. and Durovray, J. (2000) Efficient selection for the adaptation to the envi-
ronment through QTL mapping and manipulation in maize. In: Ribaut, J.-M. and Poland, D. (eds)
Molecular Approaches for the Genetic Improvement of Cereals for Stable Production in Water-limited
Environments. Centro Internacional de Mejoramiento de Maiz y Trigo (CIMMYT), México, DF, pp.
128–130.
Rajaram, S., van Ginkel, M. and Fischer, R.A. (1994) CIMMYT's wheat breeding mega-environments (ME).
In: Proceedings of the 8th International Wheat Genetics Symposium, 20–25 July 1993, Beijing, China.
Agricultural Scientech Press, Beijing, pp. 1101–1106.
Ramachandran, S. and Sundaresan, V. (2001) Transposons as tools for functional genomics. Plant
Physiology and Biochemistry 39, 243–252.
Ramage, R.T. (1983) Heterosis and hybrid seed production in barley. In: Frankel, R. (ed.) Monographs on
Theoretical and Applied Genetics, Vol. 6. Heterosis. Springer-Verlag, Berlin, pp. 71–93.
Ramakrishna, W. and Bennetzen, J.L. (2003) Genomic colinearity as a tool for plant gene isolation. In:
Grotewold, E. (ed.) Methods in Molecular Biology, Vol. 236. Plant Functional Genomics: Methods and
Protocols. Humana Press, Inc., Totowa, New Jersey, pp. 109–121.
Ramakrishna, W., Dubcovsky, J., Park, Y.-J., Busso, C., Emberton, J., SanMiguel, P. and Bennetzen, J.L.
(2002) Different types and rates of genome evolution detected by comparative sequence analysis of
orthologous segments from four cereal genomes. Genetics 162, 1389–1400.
Ramessar, K., Peremarti, A., Gómez-Galera, S., Naqvi, S., Moralejo, M., Muñoz, P., Capell, T. and Christou,
P. (2007) Biosafety and risk assessment framework for selectable marker genes in transgenic crop
plants: a case of the science not supporting the politics. Transgenic Research 16, 261–280.
Ramlingam, J., Basharat, H.S. and Zhang, G. (2002) STS and microsatellite marker-assisted selection for
bacterial blight resistance and waxy gene in rice, Oryza sativa L. Euphytica 127, 255–260.
Rao, K.E.P. and Rao, V.R. (1995) The use of characterization data in developing a core collection of sor-
ghum. In: Hodgkin, T., Brown, A.H.D., van Hintum, Th.J.L. and Morales, E.A.V. (eds) Core Collections
of Plant Genetic Resources. Wiley–Sayce, Chichester, UK, pp. 109–115.
Rappsilber, J., Siniossoglou, S., Hurt, E.C. and Mann, M. (2000) A generic strategy to analyze the spatial
organization of multi-protein complexes by cross-linking and mass spectrometry. Analytical Chemistry
72, 267–275.
Rebai, A. and Goffinet, B. (1993) Power of test for QTL detection using replicated progenies derived from
a diallel cross. Theoretical and Applied Genetics 86, 1014–1022.
Rebai, A., Goffinet, B., Mangin, B. and Perret, D. (1994) QTL detection with diallel schemes. In: van Ooijen,
J.W. and Jansen, J. (eds) Biometrics in Plant Breeding: Applications of Molecular Markers. Centre for
Plant Breeding and Reproduction Research, Wageningen, Netherlands, pp. 170–177.
Reddy, B.V.S. and Comstock, R.E. (1976) Simulation of the backcross breeding method. I. Effect of
heritability and gene number on fixation of desired alleles. Crop Science 16, 825–830.
Reed, J., Privalle, L., Powell, M.L., Meghji, M., Dawson, J., Dunder, E., Suttie, J., Wenck, A., Launis, K.,
Kramer, C., Chang, Y.-F., Hansen, G. and Wright, M. (2001) Phosphomannose isomerase: an efficient
selectable marker for plant transformation. In Vitro Cellular and Developmental Biology – Plant 37,
127–132.
Reeves, T., Pinstrup-Andersen, P. and Pandya-Lorch, R. (1999) Food security and role of agricultural
research. In: Coors, J.G. and Pandey, S. (eds) The Genetics and Exploitation of Heterosis in Crops.
ASA-CSSA-SSSA, Madison, Wisconsin, pp. 1–5.
Reif, J.C., Melchinger, A.E., Xia, X.C., Warburton, M.L., Hoisington, D.A., Vasal, S.K., Srinivasan, G., Bohn,
M. and Frisch, M. (2003) Genetic distance based on simple sequence repeats and heterosis in
tropical maize populations. Crop Science 43, 1275–1282.
Reif, J.C., Xia, X.C., Melchinger, A.E., Warburton, M.L., Hoisington, D.A., Beck, D., Bohn, M. and Frisch,
M. (2004) Genetic diversity determined within and among CIMMYT maize populations of tropical,
subtropical and temperate germplasm by SSR markers. Crop Science 44, 326–334.
Reif, J.C., Melchinger, A.E. and Frisch, M. (2005) Genetical and mathematical properties of similarity and
dissimilarity coefficients applied in plant breeding and seed bank management. Crop Science 45, 1–7.
Reiter, R. (2001) PCR-based marker systems. In: Phillips, R.L. and Vasil, I.K. (eds) DNA-Based Markers in
Plants. Kluwer Academic Publishers, Dordrecht, Netherlands, pp. 9–29.
Remington, D.L., Thornsberry, J.M., Matsuoka, Y., Wilson, L.M., Whitt, S.R., Doebley, J., Kresovich, S.,
Goodman, M.M. and Buckler IV, E.S. (2001) Structure of linkage disequilibrium and phenotypic asso-
ciations in the maize genome. Proceedings of the National Academy of Sciences of the United States
of America 98, 11479–11484.
Repellin, A., Båga, M., Jauhar, P.P. and Chibbar, R.N. (2001) Genetic enrichment of cereal crops via alien
gene transfer: new challenges. Plant Cell, Tissue and Organ Culture 64, 159–183.
Reymond, M., Muller, B., Leonardi, A., Charcosset, A. and Tardieu, F. (2003) Combining quantitative trait
loci analysis and an ecophysiological model to analyze the genetic variability of the responses of
maize leaf growth to temperature and water deficit. Plant Physiology 131, 664–675.
Reyna, N. and Sneller, C.H. (2001) Evaluation of marker-assisted introgression of yield QTL alleles into
adapted soybean. Crop Science 41, 1317–1321.
Reynolds, J., Weir, B.S. and Cockerham, C.C. (1983) Estimation of the coancestry coefficient: basis for a
short-term genetic distance. Genetics 105, 767–769.
Rhee, S.Y. (2005) Bioinformatics: current limitations and insights for the future. Plant Physiology 138,
569–570.
Ribaut, J.-M. and Betrán, J. (1999) Single large-scale marker-assisted selection (SLS-MAS). Molecular
Breeding 5, 531–541.
Ribaut, J.-M. and Ragot, M. (2007) Marker-assisted selection to improve drought adaptations in maize: the
backcross approach, perspectives, limitations and alternatives. Journal of Experimental Botany 58,
351–360.
Ribaut, J.-M., Hoisington, D.A., Deutsch, J.A., Jiang, C. and González-de-León, D. (1996) Identification
of quantitative trait loci under drought conditions in tropical maize. I. Flowering parameters and the
anthesis-silking interval. Theoretical and Applied Genetics 92, 905–914.
Ribaut, J.-M., Huu, X., Hoisington, D. and Gonzales de Leon, D. (1997) Use of STSs and SSRs as rapid and
reliable preselection tools in marker-assisted selection backcross scheme. Plant Molecular Biology
Reporter 15, 156–164.
Ribaut, J.-M., Edmeades, G., Perotti, E. and Hoisington, D. (2000) QTL analyses, MAS results and
perspectives for drought-tolerance improvement in tropical maize. In: Ribaut, J.-M. and Poland, D. (eds)
Molecular Approaches for the Genetic Improvement of Cereals for Stable Production in Water-limited
Environments. Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), México, DF, pp.
131-136.
Ribaut, J.-M., Jiang, C. and Hoisington, D. (2002a) Simulation experiments on efficiencies of gene
introgression by backcrossing. Crop Science 42, 557-565.
Ribaut, J.-M., Bänziger, M., Betrán, J., Jiang, C., Edmeades, G.O., Dreher, K. and Hoisington, D. (2002b)
Use of molecular markers in plant breeding: drought tolerance improvement in tropical maize. In:
Kang, M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB International, Wallingford,
UK, pp. 85-99.
Richardson, K.L., Vales, M.I., Kling, J.G., Mundt, C.C. and Hayes, P.M. (2006) Pyramiding and dissecting
disease resistance QTL to barley stripe rust. Theoretical and Applied Genetics 113, 485-495.
Rick, C.M. (1974) High soluble-solids content in large-fruited tomato lines derived from a wild green-fruited
species. Hilgardia 42, 493-510.
Rick, C.M. (1988) Tomato-like nightshades: affinities, autoecology and breeders' opportunities. Economic
Botany 42, 145-154.
Rickert, A.M., Premstaller, A., Gebhardt, C. and Oefner, P.J. (2002) Genotyping of SNPs in a polyploid
genome by pyrosequencing. Biotechniques 32, 592-603.
Roa-Rodriguez, C. (2003) Promoters Used to Regulate Gene Expression. CAMBIA Intellectual Property,
Canberra.
Roa-Rodriguez, C. and Nottenburg, C. (2003a) Agrobacterium-mediated Transformation of Plants. CAMBIA
Intellectual Property, Canberra.
Roa-Rodriguez, C. and Nottenburg, C. (2003b) Antibiotic Resistance Genes and Their Uses in Genetic
Transformation, Especially in Plants. CAMBIA Intellectual Property, Canberra.
Röber, F.K. (1999) Fortpflanzungsbiologische und genetische Untersuchungen mit RFLP-Markern zur
in-vivo-Haploideninduktion bei Mais. Dissertation, University of Hohenheim. Grauer Verlag, Stuttgart.
Röber, F.K., Gordillo, G.A. and Geiger, H.H. (2005) In vivo haploid induction in maize - performance of new
inducers and significance of doubled haploid lines in hybrid breeding. Maydica 50, 275-284.
Robert, V.J.M., West, M.A.L., Inai, S., Caines, A., Arntzen, L., Smith, J.K. and St-Clair, D.A. (2001) Marker-
assisted introgression of blackmold resistance QTL alleles from wild Lycopersicon cheesmanii to
cultivated tomato (L. esculentum) and evaluation of QTL phenotypic effects. Molecular Breeding 8,
217-233.
Roberts, E.H. (1973) Predicting the viability of seeds. Seed Science and Technology 1, 499-514.
Roberts, J.K. (2002) Proteomics and a future generation of plant molecular biologists. Plant Molecular
Biology 48, 143-154.
Robertson, D.S. (1985) A possible technique for isolating genomic DNA for quantitative traits in plants.
Journal of Theoretical Biology 117, 1-10.
Robertson, D.S. (1989) Understanding the relationship between qualitative and quantitative genetics. In:
Helentjaris, T. and Burr, B. (eds) Development and Application of Molecular Markers to Problems in
Plant Genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, pp. 81-87.
Rockman, M.V. and Kruglyak, L. (2006) Genetics of global gene expression. Nature Reviews Genetics 7,
862-872.
Rockman, M.V. and Kruglyak, L. (2008) Breeding designs for recombinant inbred advanced intercross
lines. Genetics 179, 1069-1078.
Rockman, M.V. and Wray, G.A. (2002) Abundant raw material for cis-regulatory evolution in humans.
Molecular Biology and Evolution 19, 1991-2004.
Röder, M., Plaschke, J. and Ganal, M. (1997) Microsatellite markers for plants of the species Triticum
aestivum and tribe Triticeae and the use of said markers. Patent EP 0835324B1.
Rogers, J.S. (1972) Measures of genetic similarity and genetic distance. In: Studies in Genetics VII, Publ.
7213. University of Texas, Austin, Texas, pp. 145-153.
Romagosa, I. and Fox, P.N. (2003) Genotype × environment interaction and adaptation. In: Hayward, M.D.,
Bosemark, N.O. and Romagosa, I. (eds) Plant Breeding, Principles and Prospects. Chapman & Hall,
London, pp. 373-390.
Romagosa, I., Ullrich, S.E., Han, F. and Hayes, P.M. (1996) Use of the additive main effects and
multiplicative interaction model in QTL mapping for adaptation in barley. Theoretical and Applied Genetics 93,
30-37.
Romano, A., van der Plas, L.H.W., Witholt, B., Eggink, G. and Mooibroek, H. (2005) Expression of poly-3-
(R)-hydroxyalkanoate (PHA) polymerase and acyl-CoA-transacylase in plastids of transgenic potato
leads to the synthesis of a hydrophobic polymer, presumably medium-chain-length PHAs. Planta 220,
455-464.
Romeis, J., Bartsch, D., Bigler, F., Candolfi, M.P., Gielkens, M.M.C., Hartley, S.E., Hellmich, R.L., Huesing,
J.E., Jepson, P.C., Layton, R., Quemada, H., Raybould, A., Rose, R.I., Schiemann, J., Sears, M.K.,
Shelton, A.M., Sweet, J., Vaituzis, Z. and Wolt, J.D. (2008) Assessment of risk of insect-resistant
transgenic crops to nontarget arthropods. Nature Biotechnology 26, 203-208.
Rommens, C.M., Haring, M.A., Swords, K., Davies, H.V. and Belknap, W.R. (2007) The intragenic approach
as a new extension to traditional plant breeding. Trends in Plant Science 12, 397-403.
Ron Parra, J. and Hallauer, A.R. (1997) Utilization of exotic maize germplasm. Plant Breeding Reviews 14,
165-187.
Rong, J., Feltus, F.A., Waghmare, V.N., Pierce, G.J., Chee, P.W., Draye, X., Saranga, Y., Wright, R.J.,
Wilkins, T.A., May, O.L., Smith, C.W., Gannaway, J.R., Wendel, J.F. and Paterson, A.H. (2007) Meta-
analysis of polyploid cotton QTL shows unequal contributions of subgenomes to a complex network
of genes and gene clusters implicated in lint fiber development. Genetics 176, 2577-2588.
Roos, E.E. (1984) Genetic shifts in mixed bean populations. I. Storage effects. Crop Science 24, 240-244.
Roos, E.E. (1988) Genetic changes in a collection over time. HortScience 23, 86-90.
Rostoks, N., Mudie, S., Cardle, L., Russell, J., Ramsay, L., Booth, A., Svensson, J.T., Wanamaker, S.I.,
Walia, H., Rodriguez, E.M., Hedley, P.E., Liu, H., Morris, J., Close, T.J., Marshall, D.F. and Waugh, R.
(2005) Genome-wide SNP discovery and linkage analysis in barley based on genes responsive to
abiotic stress. Molecular Genetics and Genomics 274, 515-527.
Rudd, S., Schoof, H. and Mayer, K. (2005) PlantMarkers: a database of predicted molecular markers from
plants. Nucleic Acids Research 33, D628-D632.
Ruf, S., Karcher, D. and Bock, R. (2007) Determining the transgene containment level provided by
chloroplast transformation. Proceedings of the National Academy of Sciences of the United States of
America 104, 6998-7002.
Sackville Hamilton, N.R. and Chorlton, K.H. (1997) Regeneration of accessions in seed collections: a
decision guide. Handbook for Genebanks No. 5. International Plant Genetic Resources Institute, Rome.
Saghai Maroof, M.A., Yang, G.P., Zhang, Q. and Gravois, K.A. (1997) Correlation between molecular marker
distance and hybrid performance in US southern long grain rice. Crop Science 37, 145-150.
Saha, S., Sparks, A.B., Rago, C., Akmaev, V., Wang, C.J., Vogelstein, B., Kinzler, K.W. and Velculescu, V.E.
(2002) Using the transcriptome to annotate the genome. Nature Biotechnology 20, 508-512.
Saint-Louis, D. and Paquin, B. (2003) Method for genotyping microsatellite DNA markers by mass
spectrometry. Patent WO 03035906.
Sakamoto, T. and Matsuoka, M. (2008) Identifying and exploiting grain yield genes in rice. Current Opinion
in Plant Biology 11, 209-214.
Salathia, N., Lee, H.N., Sangster, T.A., Morneau, K., Landry, C.R., Schellenberg, K., Behere, A.S.,
Gunderson, K.L., Cavalieri, D., Jander, G. and Queitsch, C. (2007) Indel arrays: an affordable
alternative for genotyping. The Plant Journal 51, 727-737.
Salse, J., Piegu, B., Cooke, R. and Delseny, M. (2004) New in silico insight into the synteny between rice
(Oryza sativa L.) and maize (Zea mays L.) highlights reshuffling and identifies new duplications in
the rice genome. The Plant Journal 38, 396-409.
Salvi, S. and Tuberosa, R. (2005) To clone or not to clone plant QTLs: present and future challenges. Trends
in Plant Science 10, 297-304.
Samalova, M., Brzobohaty, B. and Moore, I. (2005) pOp6/LhGR: a stringently regulated and highly
responsive dexamethasone-inducible gene expression system for tobacco. The Plant Journal 41, 919-935.
San Noeum, L.H. (1976) Haploids of Hordeum vulgare L. from in vitro culture of unfertilized ovaries. Annales
de l'Amélioration des Plantes 26, 751-754.
Sánchez-Monge, E. (1993) Introduction. In: Hayward, M.D., Bosemark, N.O. and Romagosa, I. (eds) Plant
Breeding, Principles and Prospects. Chapman & Hall, London, pp. 3-5.
Sanda, S.L. and Amasino, R.M. (1996) Ecotype-specific expression of a flowering mutant phenotype in
Arabidopsis thaliana. Plant Physiology 111, 641-644.
Sano, Y. (1990) The genic nature of gamete eliminator in rice. Genetics 125, 183-191.
Sant, V.J., Patankar, A.G., Sarode, N.D., Mhase, L.B., Sainani, M.N., Deshmukh, R.B., Ranjekar, P.K. and
Gupta, V.S. (1999) Potential of DNA markers in detecting divergence and in analyzing heterosis in
Indian elite chickpea cultivars. Theoretical and Applied Genetics 98, 1217-1225.
Saravanan, R.S., Bashir, S. and Rose, J.K.C. (2004) Plant proteomics. In: Christou, P. and Klee, H. (eds)
Handbook of Plant Biotechnology. John Wiley & Sons Ltd, Chichester, UK, pp. 183-199.
Sari-Gorla, M., Calinski, T., Kaczmarek, Z. and Krajewski, P. (1997) Detection of QTL × environment
interaction in maize by a least squares interval mapping method. Heredity 78, 146-157.
Sarkar, K.R., Pandey, A., Gayen, P., Mandan, J.K., Kumar, R. and Sachan, J.K.S. (1994) Stabilization of
high haploid inducer lines. Maize Genetics Cooperation Newsletter 68, 64-65.
Satagopan, J.M., Yandell, B.S., Newton, M.A. and Osborn, T.G. (1996) A Bayesian approach to detect
quantitative trait loci using Markov chain Monte Carlo. Genetics 144, 805-816.
Sauer, S., Gelfand, D.H., Boussicault, F., Bauer, K., Reichert, F. and Gut, I.G. (2002) Facile method for
automated genotyping of single nucleotide polymorphisms by mass spectrometry. Nucleic Acids Research
30, e22.
Sawkins, M.C., Farmer, A.D., Hoisington, D., Sullivan, J., Tolopko, A., Jiang, Z. and Ribaut, J.M. (2004)
Comparative Map and Trait Viewer (CMTV): an integrated bioinformatic tool to construct
consensus maps and compare QTL and functional genomics data across genomes and experiments. Plant
Molecular Biology 56, 465-480.
Sax, K. (1923) The association of size differences with seed coat pattern and pigmentation in Phaseolus
vulgaris. Genetics 8, 552-560.
Scarascia-Mugnozza, G.T. and Perrino, P. (2002) The history of ex situ conservation and use of plant genetic
resources. In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing
Plant Genetic Diversity. International Plant Genetic Resources Institute (IPGRI), Rome, pp. 1-22.
Schadt, E.E., Monks, S.A., Drake, T.A., Lusis, A.J., Che, N., Colinayo, V., Ruff, T.G., Milligan, S.B., Lamb,
J.R., Cavet, G., Linsley, P.S., Mao, M., Stoughton, R.B. and Friend, S.H. (2003) Genetics of gene
expression surveyed in maize, mouse and man. Nature 422, 297-302.
Schaeffer, M., Byrne, P. and Coe, E.H., Jr (2006) Consensus quantitative trait maps in maize: a database
strategy. Maydica 51, 357-367.
Schauer, N. and Fernie, A.R. (2006) Plant metabolomics: towards biological function and mechanism.
Trends in Plant Science 11, 508-516.
Scheuring, C., Barthelson, R., Galbraith, D., Betrán, J., Cothren, J.T., Zeng, Z.-B. and Zhang, H.-B. (2006)
Preliminary analysis of differential gene expression between a maize superior hybrid and its parents
using the 57K maize gene-specific long-oligonucleotide microarray. In: 48th Annual Maize Genetic
Conference, 9-12 March 2006, Pacific Grove, California, 132 pp.
Schmid, K.J., Rosleff Sørensen, T., Stracke, R., Törjék, O., Altmann, T., Mitchell-Olds, T. and Weisshaar, B.
(2003) Large-scale identification and analysis of genome wide single nucleotide polymorphisms for
mapping in Arabidopsis thaliana. Genome Research 13, 1250-1257.
Schmidt, R. (2002) Plant genome evolution: lessons from comparative genomics at the DNA level. Plant
Molecular Biology 48, 21-37.
Schmierer, D.A., Kandemir, N., Kudrna, D.A., Jones, B.L., Ullrich, S.E. and Kleinhofs, A. (2004) Molecular
marker-assisted selection for enhanced yield in malting barley. Molecular Breeding 14, 463-473.
Schön, C.C., Utz, H.F., Groh, S., Truberg, B., Openshaw, S. and Melchinger, A.E. (2004) Quantitative trait
locus mapping based on resampling in a vast maize testcrosses experiment and its relevance to
quantitative genetics for complex traits. Genetics 167, 485-498.
Schranz, M.E., Song, B.-H., Windsor, A.J. and Mitchell-Olds, T. (2007) Comparative genomics in the
Brassicaceae: a family-wide perspective. Current Opinion in Plant Biology 10, 168-175.
Schüller, C., Backes, G., Fischbeck, G. and Jahoor, A. (1992) RFLP markers to identify the alleles on the Mla
locus conferring powdery mildew resistance in barley. Theoretical and Applied Genetics 84, 330-338.
Schuster, S.C. (2008) Next-generation sequencing transforms today's biology. Nature Methods 5, 16-18.
Schwarz, G., Herz, M., Huang, X.Q., Michalek, W., Jahoor, A., Wenzel, G. and Mohler, V. (2000) Application
of fluorescence-based semi-automated AFLP analysis in barley and wheat. Theoretical and Applied
Genetics 100, 545-551.
Scott, K.D. (2001) Microsatellites derived from ESTs and their comparison with those derived by other
methods. In: Henry, R.J. (ed.) Plant Genotyping: the DNA Fingerprinting of Plants. CAB International,
Wallingford, UK, pp. 225-237.
Searle, S.R. (1987) Linear Models for Unbalanced Data. John Wiley & Sons, New York.
Seaton, G., Haley, C.S., Knott, S.A., Kearsey, M. and Visscher, P.M. (2002) QTL Express: mapping
quantitative trait loci in simple and complex pedigrees. Bioinformatics 18, 339-340.
Seitz, C., Vitten, M., Steinbach, P., Hartl, S., Hirsche, J., Rathje, W., Treutter, D. and Forkmann, G. (2007)
Redirection of anthocyanin synthesis in Osteospermum hybrida by a two-enzyme manipulation
strategy. Phytochemistry 68, 824-833.
Seitz, G. (2005) The use of doubled haploids in corn breeding. In: Proceedings of the Forty First Annual
Illinois Corn Breeders' School, 7-8 March 2005, Urbana-Champaign, Illinois. University of Illinois at
Urbana-Champaign, pp. 1-8.
Seki, M., Narusaka, M., Satou, M., Fujita, M., Sakurai, T., Oono, Y., Akiyama, T., Yamaguchi-Shinozaki,
K., Iida, K., Carninci, P., Ishisa, J., Kawai, J., Nakajima, M., Hayashizaki, Y., Enju, A. and Shinozaki,
K. (2005) Full-length cDNAs for the discovery and annotation of genes in Arabidopsis thaliana. In:
Leister, D. (ed.) Plant Functional Genomics. Food Products Press, Binghamton, New York, pp. 3-22.
Semagn, K., Bjørnstad, Å., Skinnes, H., Marøy, A.G., Tarkegne, Y. and William, M. (2006) Distribution of
DArT, AFLP and SSR markers in a genetic linkage map of a doubled-haploid hexaploid wheat
population. Genome 49, 545-555.
Sen, S. and Churchill, G.A. (2001) A statistical framework for quantitative trait mapping. Genetics 159,
371-387.
Septiningsih, E.M., Prasetiyono, J., Lubis, E., Tai, T.H., Tjubaryat, T., Moeljopawiro, S. and McCouch, S.R.
(2003) Identification of quantitative trait loci for yield and yield components in an advanced backcross
population derived from the Oryza sativa variety IR64 and the wild relative O. rufipogon. Theoretical
and Applied Genetics 107, 1419-1432.
Service, R.F. (2006) Gene sequencing. The race for the $1000 genome. Science 311, 1544-1546.
Servin, B., Martin, O.C., Mézard, M. and Hospital, F. (2004) Toward a theory of marker-assisted gene
pyramiding. Genetics 168, 513-523.
Sessions, A., Burke, E., Presting, G., Aux, G., McElver, J., Patton, D., Dietrich, B., Ho, P., Bacwaden, J.,
Ko, C., Clarke, J.D., Cotton, D., Bullis, D., Snell, J., Miguel, T., Hutchison, D., Kimmerly, B., Mitzel,
T., Katagiri, F., Glazebrook, J., Law, M. and Goff, S.A. (2002) A high-throughput Arabidopsis reverse
genetics system. The Plant Cell 14, 2985-2994.
Setimela, P., Chitalu, Z., Jonazi, J., Mambo, A., Hodson, D. and Bänziger, M. (2005) Environmental
classification of maize-testing sites in the SADC region and its implication for collaborative maize breeding
strategies in the subcontinent. Euphytica 145, 123-132.
Sham, P., Bader, J.S., Craig, I., O'Donovan, M. and Owen, M. (2002) DNA pooling: a tool for large-scale
association studies. Nature Reviews Genetics 3, 862-871.
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B. and
Ideker, T. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction
networks. Genome Research 13, 2498-2504.
Sharopova, N., McMullen, M.D., Schultz, L., Schroeder, S., Sanchez-Villeda, H., Gardiner, J., Bergstrom,
D., Houchins, K., Melia-Hancock, S., Musket, T., Duru, N., Polacco, M., Edwards, K., Ruff, T., Register,
J.C., Brouwer, C., Thompson, R., Velasco, R., Chin, E., Lee, M., Woodman-Clikeman, W., Long, M.J.,
Liscum, E., Cone, K., Davis, G. and Coe, E.H., Jr (2002) Development and mapping of SSR markers
for maize. Plant Molecular Biology 48, 463-481.
Shatskaya, O.A., Zabirova, E.R., Shcherbak, V.S. and Chumak, M.V. (1994) Mass induction of maternal
haploids in corn. Maize Genetics Cooperation Newsletter 68, 51.
Shen, J.H., Li, M.F., Chen, Y.Q. and Zhang, Z.H. (1982) Breeding by anther culture in rice improvement.
Scientia Agricultura Sinica 2, 15-19.
Shen, L., Courtois, B., McNally, K.L., Robin, S. and Li, Z. (2001) Evaluation of near-isogenic lines of rice
introgressed with QTLs for root depth through marker-aided selection. Theoretical and Applied
Genetics 103, 75-83.
Shen, Y.-J., Jiang, H., Jin, J.-P., Zhang, Z.-B., Xi, B., He, Y.-Y., Wang, G., Wang, C., Qian, L., Li, X., Yu, Q.-B., Liu,
H.-J., Chen, D.-H., Gao, J.-H., Huang, H., Shi, T.-L. and Yang, Z.-N. (2004) Development of genome-wide
DNA polymorphism database for map-based cloning of rice genes. Plant Physiology 135, 1198-1205.
Shendure, J. and Ji, H. (2008) Next-generation DNA sequencing. Nature Biotechnology 26, 1135-1145.
Shi, Y., Wang, T., Li, Y. and Darmency, H. (2008) Impact of transgene inheritance on the mitigation of gene
flow between crops and their wild relatives: the example of foxtail millet. Genetics 180, 969-975.
Shibata, D. and Liu, Y.G. (2000) Agrobacterium-mediated plant transformation with large DNA fragments.
Trends in Plant Science 5, 354-357.
Shimamoto, K. and Kyozuka, J. (2002) Rice as a model for comparative genomics of plants. Annual Review
of Plant Biology 53, 399-419.
Shin, B.K., Wang, H., Yim, A.M., Naour, F.L., Brichory, F., Jang, J.H., Zhao, R., Puravs, E., Tra, J., Michael,
C.W., Misek, D.E. and Hanash, S.M. (2003) Global profiling of the cell surface proteome of cancer
cells uncovers an abundance of proteins with chaperone function. Journal of Biological Chemistry
278, 7607-7616.
Shizuya, H., Birren, B., Kim, U., Mancino, V., Slepak, T., Tachiiri, Y. and Simon, M. (1992) Cloning and stable
maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-
based vector. Proceedings of the National Academy of Sciences of the United States of America 89,
8794-8797.
Shoemaker, J.S., Painter, I.S. and Weir, B.S. (1999) Bayesian statistics in genetics. A guide for the
uninitiated. Trends in Genetics 15, 354-358.
Shrawat, A.K. and Lörz, H. (2006) Agrobacterium-mediated transformation of cereals: a promising approach
crossing barriers. Plant Biotechnology Journal 4, 575-603.
Shuber, A. and Pierceall, W. (2002) Methods for detecting nucleotide insertion or deletion using primer
extension. Patent EP 1203100.
Shull, G.H. (1908) The composition of a field of maize. American Breeders' Association Report 4, 296-301.
Siepel, A., Farmer, A., Tolopko, A., Zhuang, M., Mendes, P., Beavis, W. and Sobral, B. (2001) ISYS: a
decentralized, component-based approach to the integration of heterogeneous bioinformatic
resources. Bioinformatics 17, 83-94.
Sillanpää, M.J. and Arjas, E. (1998) Bayesian mapping of multiple quantitative trait loci from incomplete
inbred line cross data. Genetics 148, 1373-1388.
Sillanpää, M.J. and Arjas, E. (1999) Bayesian mapping of multiple quantitative trait loci from incomplete
outbred offspring data. Genetics 151, 1605-1619.
Sillanpää, M.J. and Bhattacharjee, M. (2005) Bayesian association-based fine mapping in small
chromosomal segments. Genetics 169, 427-439.
Sillanpää, M.J. and Corander, J. (2002) Model choice in gene mapping: what and why. Trends in Genetics
18, 301-307.
Silver, J. (1985) Confidence limits for estimates of gene linkage based on analysis of recombinant inbred
strains. Journal of Heredity 76, 436-440.
Simko, I., Costanzo, S., Haynes, K.G., Christ, B.J. and Jones, R.W. (2004a) Linkage disequilibrium
mapping of a Verticillium dahliae resistance quantitative trait locus in tetraploid potato (Solanum
tuberosum) through a candidate gene approach. Theoretical and Applied Genetics 108, 217-224.
Simko, I., Haynes, K.G., Ewing, E.E., Costanzo, S., Christ, B.J. and Jones, R.W. (2004b) Mapping
genes for resistance to Verticillium albo-atrum in tetraploid and diploid potato populations using
haplotype association tests and genetic linkage analysis. Molecular Genetics and Genomics 271,
522-531.
Simmonds, N.W. (1979) Principles of Crop Improvement. Longman, London.
Simmonds, N.W. (1982) The context of the workshop. In: Withers, L.A. and Williams, J.T. (eds) Crop Genetic
Resources - the Conservation of Difficult Material. IUBS Series B42, International Union of Biological
Sciences/International Board for Plant Genetic Resources/International Genetic Federation, Paris,
pp. 1-3.
Singh, M., Ceccarelli, S. and Grando, S. (1999) Genotype × environment interaction of crossover type:
detecting its presence and estimating the crossover point. Theoretical and Applied Genetics 99,
988-995.
Singh, R.P., Rajaram, S., Miranda, A., Huerta-Espino, J. and Autrique, E. (1998) Comparison of two
crossing and four selection schemes for yield, yield traits and slow rusting resistance to leaf rust in wheat.
Euphytica 100, 35-43.
Singla-Pareek, S.L., Reddy, M.K. and Sopory, S.K. (2003) Genetic engineering of the glyoxalase pathway
in tobacco leads to enhanced salinity tolerance. Proceedings of the National Academy of Sciences of
the United States of America 100, 14672-14677.
Sinha, S.K. and Swaminathan, M.S. (1984) New parameters and selection criteria in plant breeding. In:
Vose, P.B. and Blixt, S.G. (eds) Crop Breeding, a Contemporary Basis. Pergamon Press, Oxford, UK.
Siripoonwiwat, W. (1995) Application of restriction fragment length polymorphism (RFLP) markers in the
analysis of chromosomal regions associated with some quantitative traits for hexaploid oat
improvement. MS thesis, Cornell University, Ithaca, New York.
Sivamani, E., Huet, H., Shen, P., Ong, C.A., DeKochko, A., Fauquet, C.M. and Beachy, R.N. (1999) Rice
plants (Oryza sativa L.) containing three rice tungro spherical virus (RTSV) coat protein transgenes
are resistant to virus infection. Molecular Breeding 5, 177-185.
Skinner, D.Z., Muthukrishnan, S. and Liang, G.H. (2004) Transformation: a powerful tool for crop
improvement. In: Liang, G.H. and Skinner, D.Z. (eds) Genetically Modified Crops: Their Development, Uses
and Risks. Food Products Press, Binghamton, New York, pp. 1-16.
Skol, A.D., Scott, L.J., Abecasis, G.R. and Boehnke, M. (2006) Joint analysis is more efficient than
replication-based analysis for two-stage genome-wide association studies. Nature Genetics 38, 209-213.
Slater, S., Mitsky, T.A., Houmiel, K.L., Hao, M., Reiser, S.E., Taylor, N.B., Tran, M., Valentin, H.E., Rodriguez,
D.J., Stone, D.A., Padgette, S.R., Kishore, G. and Gruys, K.J. (1999) Metabolic engineering of
Arabidopsis and Brassica for poly(3-hydroxybutyrate-co-3-hydroxyvalerate) copolymer production.
Nature Biotechnology 17, 1011-1016.
Slatkin, M. (1985) Gene flow in natural populations. Annual Review of Ecology and Systematics 16,
393-430.
Smith, D., Yanai, Y., Lui, Y.-G., Ishiguro, S., Okada, K., Shibata, D., Whitter, R.F. and Fedoroff, N.V. (1996)
Characterization and mapping of Ds-GUS-T-DNA lines for targeted insertional mutagenesis. The
Plant Journal 10, 721-732.
Smith, G.D. and Egger, M. (1998) Meta-analysis: bias in location and selection of studies. BMJ 317,
625-629.
Smith, H.F. (1936) A discriminant function for plant selection. Annals of Eugenics 7, 240-250.
Smith, J.S.C. (1986) Genetic diversity within the corn belt dent racial complex of maize (Zea mays L.).
Maydica 21, 349-367.
Smith, J.S.C. and Smith, O.S. (1992) Fingerprinting crop varieties. Advances in Agronomy 47, 85-140.
Smith, M.E., Coffman, W.R. and Barker, T.C. (1990) Environmental effects on selection under high and
low input conditions. In: Kang, M.S. (ed.) Genotype-By-Environment Interactions and Plant Breeding.
Louisiana State University Agriculture Center, Baton Rouge, Louisiana, pp. 261-272.
Smith, O.S., Smith, J.S.C., Bowen, S.L., Tenborg, R.A. and Wall, S.J. (1990) Similarities among a group
of elite maize inbreds as measured by pedigree, F1 grain yield, grain yield heterosis and RFLPs.
Theoretical and Applied Genetics 80, 833-840.
Smith, O.S., Smith, J.S.C., Bowen, S.L. and Tenborg, R.A. (1991) Numbers of RFLP probes necessary to
show associations between lines. Maize Genetics Newsletter 65, 66.
Smith, O.S., Hoard, K., Shaw, F. and Shaw, R. (1999) Prediction of single-cross performance. In: Coors,
J.G. and Pandey, S. (eds) The Genetics and Exploitation of Heterosis in Crops. American Society of
Agronomy (ASA) and Crop Science Society of America (CSSA), Madison, Wisconsin, pp. 277-285.
Smith, S. and Beavis, W. (1996) Molecular marker assisted breeding in a company environment. In: Sobral,
B.W.S. (ed.) The Impact of Plant Molecular Genetics. Birkhäuser, Boston, Massachusetts, pp. 259-272.
Smith, S. and Helentjaris, T. (1996) DNA fingerprinting and plant variety protection. In: Paterson, A.H. (ed.)
Genome Mapping in Plants. R.G. Landes Company, Austin, Texas, pp. 95-110.
Sneath, P. and Sokal, R.R. (1973) Numerical Taxonomy, 2nd edn. W.H. Freeman, San Francisco,
California.
Sobral, B.W.S. (2002) The role of bioinformatics in germplasm conservation and use. In: Engels, J.M.M.,
Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing Plant Genetic Diversity.
International Plant Genetic Resources Institute (IPGRI), Rome, pp. 171-178.
Sobral, B.W.S., Waugh, M. and Beavis, W. (2001) Information systems approaches to support discovery
in agricultural genomics. In: Phillips, R.L. and Vasil, I.K. (eds) DNA-based Markers in Plants. Kluwer
Academic Publishers, Dordrecht, Netherlands.
Sobrino, B., Brión, M. and Carracedo, A. (2005) SNPs in forensic genetics: a review on SNP typing
methodologies. Forensic Science International 154, 181-194.
Sobrizal, K., Ikeda, K., Sanchez, P.L., Doi, K., Angeles, E.R., Khush, G.S. and Yoshimura, A. (1999)
Development of Oryza glumaepatula introgression lines in rice, O. sativa L. Rice Genetics Newsletter
16, 107.
Sokal, R.R. (1986) Phenetic taxonomy: theory and methods. Annual Review of Ecology and Systematics 17,
423-442.
Soller, M. and Beckmann, J.S. (1990) Marker-based mapping of quantitative trait loci using replicated
progenies. Theoretical and Applied Genetics 80, 205-208.
Somers, D.J., Isaac, P. and Edwards, K. (2004) High-density microsatellite consensus map for bread wheat
(Triticum aestivum L.). Theoretical and Applied Genetics 109, 1105-1114.
Song, J., Bradeen, J.M., Naess, S.K., Raasch, J.A., Wielgus, S.M., Haberlach, G.T., Liu, J., Austin-Phillips,
S., Buell, C.R., Helgeson, J.P. and Jiang, J. (2003) Gene RB cloned from Solanum bulbocastanum
confers broad spectrum resistance to potato late blight. Proceedings of the National Academy of
Sciences of the United States of America 100, 9128-9133.
Song, R. and Messing, J. (2003) Gene expression of a gene family in maize based on noncollinear
haplotypes. Proceedings of the National Academy of Sciences of the United States of America 100,
9055-9060.
Song, R., Llaca, V. and Messing, J. (2002) Mosaic organization of orthologous sequences in grass genome.
Genome Research 12, 1549-1555.
Sopory, S. and Munshi, M. (1996) Anther culture. In: Jain, S.M., Sopory, S.K. and Veilleux, R.E. (eds) In
Vitro Haploid Production in Higher Plants, Vol. 1. Kluwer Academic Publishers, Dordrecht, Netherlands,
pp. 145-176.
Sorensen, D. and Gianola, D. (2002) Likelihood, Bayesian and MCMC Methods in Quantitative Genetics.
Springer-Verlag Inc., New York.
Sorrells, M.E. and Wilson, W.A. (1997) Direct classification and selection of superior alleles for crop
improvement. Crop Science 37, 691-697.
Sorrells, M.E., La Rota, M., Bermudez-Kandianis, C.E., Greene, R.A., Kentety, R., Munkvold, J.D.,
Miftahudin, Mahmoud, A., Ma, X.F., Gustafson, P.J., Qi, L.L., Echalier, B., Gill, B.S., Matthews, D.E.,
Lazo, G.R., Chao, S., Anderson, O.D., Edwards, H., Linkiewicz, A.M., Dubcovsky, J., Akhunov, E.D.,
Dvorak, J., Zhang, D., Nguyen, H.T., Peng, J., Lapitan, N.L.V., Gonzalez-Hernandez, J.L., Anderson,
J.A., Hossain, K., Kalavacharla, V., Kianian, S.F., Choi, D.-W., Close, T.J., Dilbirligi, M., Gill, K.S.,
Steber, C., Walker-Simmons, M.K., McGuire, P.E. and Qualset, C.Q. (2003) Comparative DNA
sequence analysis of wheat and rice genomes. Genome Research 13, 1818-1827.
Sourdille, P., Singh, S., Cadalen, T., Brown-Guedira, G.L., Gay, G., Qi, L., Gill, B.S., Dufour, P., Murigneux,
A. and Bernard, M. (2004) Microsatellite-based deletion bin system for the establishment of genetic-
physical map relationships in wheat (Triticum aestivum L.). Functional and Integrative Genomics 4,
12-25.
Southern, E.M. (1975) Detection of specific sequences among DNA fragments separated by gel
electrophoresis. Journal of Molecular Biology 98, 503-517.
Spielman, D., Cohen, J. and Zambrano, P. (2006) Will agbiotech applications reach marginalized farmers?
Evidence from developing countries. AgBioForum 9, 23-30.
Spielman, R.S., McGinnis, R.E. and Ewens, W.J. (1993) Transmission test for linkage disequilibrium: the
insulin gene region and insulin-dependent diabetes mellitus (IDDM). American Journal of Human
Genetics 52, 506-516.
Spooner, D., van Treuren, R. and de Vicente, M.C. (2005) Molecular markers for genebank management.
IPGRI Technical Bulletin No. 10. Available at: http://www.ipgri.cgiar.org/publications/pdf/1082.pdf
(accessed 30 June 2007).
Sprague, G.F. and Tatum, L.A. (1942) General vs. specific combining ability in single crosses of corn.
Journal of the American Society of Agronomy 34, 923-932.
Sprague, G.F., Russell, W.A., Penny, L.H. and Horner, T.W. (1962) Effects of epistasis on grain yield of
maize. Crop Science 2, 205-208.
Springer, P.S. (2000) Gene traps: tools for plant development and genomics. The Plant Cell 12,
1007-1020.
Stadler, L.J. (1928) Mutations in barley induced by X-rays and radium. Science 68, 186-187.
Stam, P. (1991) Some aspects of QTL analysis. Proceedings of the Eighth Meeting of the Eucarpia Section
Biometrics on Plant Breeding, 1-6 July 1991, Brno, Czechoslovakia, pp. 24-32.
Stam, P. (1993) Construction of integrated genetic linkage maps by means of a new computer package:
JoinMap. The Plant Journal 3, 739-744.
Stam, P. (1995) Marker-assisted breeding. In: Van Ooijen, J.W. and Jansen, J. (eds) Biometrics in Plant
Breeding: Applications of Molecular Markers. Proceedings of the 9th Meeting of EUCARPIA Section
on Biometrics in Plant Breeding (1994). Centre for Plant Breeding and Reproduction Research,
Wageningen, Netherlands, pp. 32-44.
Stam, P. (2003) Marker-assisted introgression: speed at any cost? In: van Hintum, Th.J.L., Lebeda, A.,
Pink, D. and Schut, J.W. (eds) Proceedings of the Eucarpia Meeting on Leafy Vegetables Genetics
and Breeding, 19-21 March 2003, Noordwijkerhout, Netherlands. Centre for Genetic Resources
(CGN), Wageningen, Netherlands, pp. 117-124.
Stam, P. and Zeven, A.C. (1981) The theoretical proportion of the donor genome in near-isogenic lines of
self-fertilizers bred by backcrossing. Euphytica 30, 227238.
Stamatoyannopoulos, J.A. (2004) The genomics of gene expression. Genomics 84, 449457.
Stanford, J.C. (2000) The development of the biolistic process. In Vitro Cellular and Developmental
Biology – Plant 36, 303–308.
Stanford, J.C., Klein, T.M., Wolf, E.D. and Allen, N. (1987) Delivery of substances into cells and tissues
using a particle bombardment process. Particulate Science and Technology 5, 27–37.
Staub, J.E. (1999) Intellectual property rights, genetic markers and the hybrid seed production. Journal of
New Seeds 1, 39–64.
Stebbins, G.L. (1957) Self fertilization and population variability in the higher plants. American Naturalist 91,
337–354.
Stebbins, G.L. (1970) Adaptive radiation of reproductive characteristics in angiosperms: I. Pollination
mechanisms. Annual Review of Ecology and Systematics 1, 307–326.
Steele, K.A., Price, A.H., Shashidhar, H.E. and Witcombe, J.R. (2006) Marker-assisted selection to intro-
gress rice QTL controlling root traits into an Indian upland rice variety. Theoretical and Applied
Genetics 112, 208–221.
Stein, L. (2001) Genome annotation: from sequence to biology. Nature Reviews Genetics 2, 493–503.
Stein, L.D. (2002) Creating a bioinformatics nation. Nature 417, 119–120.
Stein, L.D. (2003) Integrating biological databases. Nature Reviews Genetics 4, 337–345.
Stein, N., Perovic, D., Kumlehn, J., Pellio, B., Stracke, S., Streng, S., Ordon, E. and Graner, A. (2005)
The eukaryotic translation initiation factor 4E confers multiallelic recessive Bymovirus resistance in
Hordeum vulgare (L.). The Plant Journal 42, 912–922.
Stelly, D.M., Lee, J.A. and Rooney, W.L. (1988) Proposed schemes for mass-extraction of doubled haploids
of cotton. Crop Science 28, 885–890.
Sterling, T.D. (1959) Publication decision and their possible effects on inferences drawn from tests of significance
or vice versa. Journal of the American Statistical Association 54, 30–34.
Stich, B. and Melchinger, A.E. (2009) Comparison of mixed-model approaches for association mapping in
rapeseed, potato, sugar beet, maize, and Arabidopsis. BMC Genomics 10, 94.
Stich, B., Melchinger, A.E., Piepho, H.-P., Heckenberger, M., Maurer, H.P. and Reif, J.C. (2006) A new test
for family-based association mapping using inbred lines from plant breeding programs. Theoretical
and Applied Genetics 113, 1121–1130.
Stich, B., Yu, J., Melchinger, A.E., Piepho, H.P., Utz, H.F., Maurer, H.P. and Buckler, E.S. (2007) Power
to detect higher-order epistatic interactions in a metabolic pathway using a new mapping strategy.
Genetics 176, 563–570.
Stich, B., Möhring, J., Piepho, H.-P., Heckenberger, M., Buckler, E.S. and Melchinger, A.E. (2008)
Comparison of mixed-model approaches for association mapping. Genetics 178, 1745–1754.
Stitt, M. and Fernie, A.R. (2003) From measurements of metabolites to metabolomics: an on the fly perspective
illustrated by recent studies of carbon–nitrogen interactions. Current Opinion in Biotechnology
14, 136–144.
Stoyanova, S.D. (1991) Genetic shifts and variations of gliadins induced by seed aging. Seed Science and
Technology 19, 363–371.
Stratton, D.A. (1998) Reaction norm functions and QTL-environments for flowering time in Arabidopsis
thaliana. Heredity 81, 144–155.
Stuber, C.W. (1992) Biochemical and molecular markers in plant breeding. Plant Breeding Reviews 9,
37–61.
Stuber, C.W. (1994a) Breeding multigenic traits. In: Phillips, R.L. and Vasil, I.K. (eds) DNA Based Markers
in Plants. Kluwer Academic Publishers, Dordrecht, Netherlands, pp. 97–115.
Stuber, C.W. (1994b) Heterosis in plant breeding. Plant Breeding Reviews 12, 227–251.
Stuber, C.W. (1995) Mapping and manipulating quantitative traits in maize. Trends in Genetics 11,
477–481.
Stuber, C.W. (1999) Biochemistry, molecular biology and physiology of heterosis. In: Coors, J.G. and
Pandey, S. (eds) The Genetics and Exploitation of Heterosis in Crops. American Society of Agronomy
(ASA) and Crop Science Society of America (CSSA), Madison, Wisconsin, pp. 173–184.
Stuber, C.W. and Moll, R.H. (1972) Frequency changes of isozyme alleles in a selection experiment for
grain yield in maize (Zea mays L.). Crop Science 12, 337–340.
Stuber, C.W. and Sisco, P.H. (1991) Marker-facilitated transfer of QTL alleles between elite inbred lines and
responses of hybrids. Proceedings of 46th Annual Corn and Sorghum Industry Research Conference
46, 104–113.
Stuber, C.W., Moll, R.H., Goodman, M.M., Schaffer, H.E. and Weir, B.S. (1980) Allozyme frequency
changes associated with selection for increased grain yield in maize (Zea mays). Genetics 95,
225336.
Stuber, C.W., Goodman, M.M. and Moll, R.H. (1982) Improvement of yield and ear number resulting from
selection at allozyme loci in a maize population. Crop Science 22, 737–740.
Stuber, C.W., Lincoln, S.E., Wolff, D.W., Helentjaris, T. and Lander, E.S. (1992) Identification of genetic fac-
tors contributing to heterosis in a hybrid from two elite maize inbred lines using molecular markers.
Genetics 132, 823–839.
Stuber, C.W., Polacco, M. and Senior, M.L. (1999) Synergy of empirical breeding, marker-assisted selection
and genomics to increase crop yield potential. Crop Science 39, 1571–1583.
Stupar, R.M. and Springer, N.M. (2006) Cis-transcriptional variation in maize inbred lines B73 and Mo17
lead to additive expression patterns in the F1 hybrid. Genetics 173, 2199–2210.
Subrahmanyam, N.C. and Kasha, K.J. (1975) Chromosome doubling of barley haploids by nitrous oxide
and colchicine treatment. Canadian Journal of Genetics and Cytology 17, 573–583.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A.,
Pomeroy, S.L., Golub, T.R., Lander, E.S. and Mesirov, J.P. (2005) Gene set enrichment analysis: a
knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the
National Academy of Sciences of the United States of America 102, 15545–15550.
Sughroue, J.R. and Rocheford, T.R. (1994) Restriction fragment length polymorphism differences among the
Illinois long-term selection oil strains. Theoretical and Applied Genetics 87, 916–924.
Sugita, K., Kasahara, T., Matsunaga, E. and Ebinuma, H. (2000) A transformation vector for the produc-
tion of marker-free transgenic plants containing a single copy transgene at high frequency. The Plant
Journal 22, 461–469.
Sullivan, S.N. (2004) Plant genetic resources and the law: past, present and future. Crop Science 135, 1015.
Sumner, L.W., Mendes, P. and Dixon, R.A. (2003) Plant metabolomics: large-scale phytochemistry in the
functional genomics era. Phytochemistry 62, 817–836.
Sun, D.J., He, Z.H., Xia, X.C., Zhang, L.P., Morris, C., Appels, R., Ma, W. and Wang, H. (2005) A novel STS
marker for polyphenol oxidase activities in bread wheat. Molecular Breeding 16, 209–218.
Sun, Q.X., Huang, T.C., Ni, Z.F. and Procunier, D.J. (1996) Studies on heterotic grouping in wheat: I.
Genetic diversity between varieties revealed by RAPD. Journal of Agricultural Biotechnology (China)
4, 103–110.
Sun, Q.X., Wu, L.M., Ni, Z.F., Meng, F.R., Wang, Z.K. and Lin, Z. (2004) Differential gene expression pat-
terns in leaves between hybrids and their parental inbreds are correlated with heterosis in a diallelic
cross. Plant Science 166, 651–657.
Sun, Y., Wang, J., Crouch, J.H. and Xu, Y. (2009) Efficiency of selective genotyping for complex traits and
its innovative use in genetics and plant breeding. Molecular Breeding (in press)
Sundaresan, V., Springer, P., Volpe, T., Haward, S., Jones, J.D., Dean, C., Ma, H. and Martienssen, R.
(1995) Patterns of gene action in plant development revealed by enhancer trap and gene trap transposable
elements. Genes and Development 9, 1797–1810.
Suter, B., Kittanakom, S. and Stagljar, I. (2008) Two-hybrid technologies in proteomics research. Current
Opinion in Biotechnology 19, 316–323.
Suzuki, Y., Uemura, S., Saito, Y., Murofushi, N., Schmitz, G., Theres, K. and Yamaguchi, I. (2001) A novel
transposon tagging element for obtaining gain-of-function mutants based on a self-stabilizing Ac derivative.
Plant Molecular Biology 45, 123–131.
Swaminathan, M.S. (2006) An evergreen revolution. Crop Science 46, 2293–2303.
Swaminathan, M.S. (2007) Can science and technology feed the world in 2025? Field Crops Research
104, 3–9.
Swaminathan, M.S. and Singh, M.P. (1958) X-ray induced somatic haploidy in watermelon. Current Science
27, 63–64.
Swanson-Wagner, R.A., Jia, Y., DeCook, R., Borsuk, L.A., Nettleton, D. and Schnable, P.S. (2006) All pos-
sible modes of gene action are observed in a global comparison of gene expression in a maize F1
hybrid and its inbred parents. Proceedings of the National Academy of Sciences of the United States
of America 103, 6805–6810.
Syvänen, A.-C. (1999) From gels to chips: minisequencing primer extension for analysis of point mutations
and single nucleotide polymorphisms. Human Mutation 13, 1–10.
Syvänen, A.-C. (2001) Accessing genetic variation: genotyping single nucleotide polymorphisms. Nature
Reviews Genetics 2, 930–942.
Syvänen, A.-C. (2005) Toward genome-wide SNP genotyping. Nature Genetics 37, S5–S10.
Syvänen, A.-C., Aalto-Setala, K., Harju, L., Kontula, K. and Soderlund, H. (1990) A primer-guided nucleotide
incorporation assay in the genotyping of apolipoprotein E. Genomics 8, 684–692.
Szalma, S.J., Hostert, B.M., LeDeaux, J.R., Stuber, C.W. and Holland, J.B. (2007) QTL mapping with near-isogenic
lines in maize. Theoretical and Applied Genetics 114, 1211–1228.
Szarejko, I. and Forster, B.P. (2007) Doubled haploidy and induced mutation. Euphytica 158, 359–370.
Tabashnik, B.E., Gassmann, A.J., Crowder, D.W. and Carrière, Y. (2008) Insect resistance to Bt crops:
evidence versus theory. Nature Biotechnology 26, 199–202.
Taberner, A., Dopazo, J. and Castañera, P. (1997) Genetic characterization of populations of a de novo
arisen sugar beet pest, Aubeonymus mariaefranciscae (Coleoptera, Curculionidae), by RAPD analysis.
Journal of Molecular Evolution 45, 24–31.
Tai, G.C.C. (1971) Genotypic stability analysis and its application to potato regional trials. Crop Science
11, 184–190.
Taji, A., Kumar, P.P. and Lakshmann, P. (2002) In Vitro Plant Breeding. Food Products Press, Binghamton,
New York, 167 pp.
Takahashi, Y., Shomura, A., Sasaki, T. and Yano, M. (2001) Hd6, a rice quantitative trait locus involved in
photoperiod sensitivity, encodes the alpha subunit of protein kinase CK2. Proceedings of the National
Academy of Sciences of the United States of America 98, 7922–7927.
Talbot, C.J., Nicod, A., Cherny, S.S., Fulker, D.W., Collins, A.C. and Flint, J. (1999) High-resolution mapping
of quantitative trait loci in outbred mice. Nature Genetics 21, 305–308.
Tan, Y.F., Li, J.X., Yu, S.B., Xing, Y.Z., Xu, C.G. and Zhang, Q. (1999) The three important traits for cooking
and eating quality of rice grain are controlled by a single locus in an elite rice hybrid, Shanyou 63.
Theoretical and Applied Genetics 99, 642–648.
Tang, G.L., Reinhart, B.J., Bartel, D.P. and Zamore, P.D. (2003) A biochemical framework for RNA silencing
in plants. Genes and Development 17, 49–63.
Tang, H., Bowers, J.E., Wang, X., Ming, R., Alam, M. and Paterson, A.H. (2008) Synteny and collinearity in
plant genomes. Science 320, 486–488.
Tanksley, S.D. (1983) Molecular markers in plant breeding. Plant Molecular Biology Reporter 1, 13.
Tanksley, S.D. (1993) Mapping polygenes. Annual Review of Genetics 27, 205–233.
Tanksley, S.D. and McCouch, S.R. (1997) Seed banks and molecular maps: unlocking genetic potential
from the wild. Science 277, 1063–1066.
Tanksley, S.D. and Nelson, J.C. (1996) Advanced backcross QTL analysis: a method for the simultaneous
discovery and transfer of valuable QTLs from unadapted germplasm into elite breeding. Theoretical
and Applied Genetics 92, 191–203.
Tanksley, S.D. and Rick, C.M. (1980) Isozyme gene linkage map of the tomato: applications in genetics and
breeding. Theoretical and Applied Genetics 57, 161–170.
Tanksley, S.D., Miller, J., Paterson, A. and Bernatzky, R. (1988) Molecular mapping of plant chromosomes. In:
Gustafson, J.P. and Appels, R. (eds) Chromosome Structure and Function: Impact of New Concepts.
Proceedings of the 18th Stadler Genetics Symposium. Plenum Press, New York, pp. 157–173.
Tanksley, S.D., Young, N.D., Paterson, A.H. and Bonierbale, M.W. (1989) RFLP mapping in plant breeding:
new tools for an old science. Bio/Technology 7, 257–263.
Tanksley, S.D., Ganal, M.W. and Martin, G.B. (1995) Chromosome landing: a paradigm for map based gene
cloning in plants with large genomes. Trends in Genetics 11, 63–68.
Tanksley, S.D., Grandillo, S., Fulton, T.M., Zamir, D., Eshed, Y., Petiard, V., Lopez, J. and Beck-Bunn, T.
(1996) Advanced backcross QTL analysis in a cross between an elite processing line of tomato and
its wild relative L. pimpinellifolium. Theoretical and Applied Genetics 92, 213–224.
Tao, Q. and Zhang, H.-B. (1998) Cloning and stable maintenance of DNA fragments over 300 kb in
Escherichia coli with conventional plasmid-based vectors. Nucleic Acids Research 26, 4901–4909.
Tarchini, R., Biddle, P., Wineland, R., Tingey, S. and Rafalski, A. (2000) The complete sequence of 340 kb
of DNA around the rice Adh1-Adh2 region reveals interrupted colinearity with maize chromosome 4.
The Plant Cell 12, 381–391.
Tardieu, F. (2003) Virtual plants: modeling as a tool for the genomics of tolerance to water deficit. Trends in
Plant Science 8, 9–14.
Tautz, D. and Renz, M. (1984) Simple sequences are ubiquitous repetitive components of eukaryotic
genomes. Nucleic Acids Research 12, 4127–4138.
Taylor, B.A. (1978) Recombinant inbred strains: use in gene mapping. In: Morse, H.C. (ed.) Origin of Inbred
Mice. Academic Press, New York, pp. 423–438.
Tekeoglu, M., Rajesh, P.N. and Muehlbauer, F.J. (2002) Integration of sequence tagged microsatellites to
the chickpea genetic map. Theoretical and Applied Genetics 105, 847–854.
Temnykh, S., Park, W.D., Ayres, N., Cartinhour, S., Hauck, N., Lipovich, L., Cho, Y.G., Ishii, T. and McCouch,
S.R. (2000) Mapping and genome organization of microsatellite sequences in rice (Oryza sativa L.).
Theoretical and Applied Genetics 100, 697–712.
Tenaillon, M.I., Sawkins, M.C., Long, A.D., Gaut, R.L., Doebley, J.F. and Gaut, B.S. (2001) Patterns of DNA
sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proceedings of the
National Academy of Sciences of the United States of America 98, 9161–9166.
Tenhola-Roininen, T., Immonen, S. and Tanhuanpää, P. (2006) Rye doubled haploids as a research and
breeding tool – a practical point of view. Plant Breeding 125, 584–590.
Terada, R., Urawa, H., Inagaki, Y., Tsugane, K. and Iida, S. (2002) Efficient gene targeting by homologous
recombination in rice. Nature Biotechnology 20, 1030–1034.
Tessier, D.C., Arbour, M., Benoit, F., Hogues, H. and Rigby, T. (2005) A DNA microarray fabrication strat-
egy for research laboratories. In: Sensen, C.W. (ed.) Handbook of Genome Research. Genomics,
Proteomics, Metabolomics, Bioinformatics, Ethical and Legal Issues. WILEY-VCH, Weinheim,
Germany, pp. 223–238.
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant
Arabidopsis thaliana. Nature 408, 796–815.
Therneau, T.M. and Grambsch, P.M. (2000) Modeling Survival Data: Extending the Cox Model. Springer,
New York.
Thiel, T., Michalek, W., Varshney, R.K. and Graner, A. (2003) Exploiting EST data bases for the develop-
ment and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theoretical
and Applied Genetics 106, 411–422.
Tyo, K.E., Alper, H.S. and Stephanopoulos, G.N. (2007) Expanding the metabolic engineering toolbox:
more options to engineer cells. Trends in Biotechnology 25, 132–137.
Tzfira, T. and Citovsky, V. (2006) Agrobacterium-mediated genetic transformation of plants: biology and
biotechnology. Current Opinion in Biotechnology 17, 147–154.
Tzfira, T., Tian, G.W., Lacroix, B., Vyas, S., Li, J., Leitner-Dagan, Y., Krichevsky, A., Taylor, T., Vainstein, A.
and Citovsky, V. (2005) pSAT vectors: a modular series of plasmids for autofluorescent protein tagging
and expression of multiple genes in plants. Plant Molecular Biology 57, 503–516.
Tzfira, T., Kozlovsky, S.V. and Citovsky, V. (2007) Advanced expression vector systems: new weapons
for plant research and biotechnology. Plant Physiology 145, 1087–1089.
Ufaz, S. and Galili, G. (2008) Improving the content of essential amino acids in crop plants: goals and
opportunities. Plant Physiology 147, 954–961.
Uga, Y., Fukuta, Y., Cai, H.W., Iwata, H., Ohsawa, R., Morishima, H. and Fujimura, T. (2003) Mapping QTLs
influencing rice floral morphology using recombinant inbred lines derived from a cross between Oryza
sativa L. and Oryza rufipogon Griff. Theoretical and Applied Genetics 107, 218–226.
Uga, Y., Nonoue, Y., Liang, Z.W., Lin, H.X., Yamamoto, S., Yamanouchi, U. and Yano, M. (2007) Accumulation
of additive effects generates a strong photoperiod sensitivity in the extremely late-heading rice cultivar
Nona Bokra. Theoretical and Applied Genetics 114, 1457–1466.
Ukai, Y., Osawa, R., Saito, A. and Hayashi, T. (1995) MAPL: a package of computer programs for construc-
tion of DNA polymorphism linkage maps and analysis of QTL (in Japanese). Breeding Science 45,
139–142.
Ulloa, M., Saha, S., Jenkins, J.N., Meredith, W.R., Jr, McCarty, J.C., Jr and Stelly, D.M. (2005) Chromosomal
assignment of RFLP linkage groups harboring important QTLs on an intraspecific cotton (Gossypium
hirsutum L.) joinmap. Journal of Heredity 96, 132–144.
Ungerer, M.C., Halldorsdottir, S.S., Purugganan, M.D. and Mackay, T.F.C. (2003) Genotype–environment
interactions at quantitative trait loci affecting inflorescence development in Arabidopsis thaliana.
Genetics 165, 353–365.
Ünlü, M., Morgan, M.E. and Minden, J.S. (1997) Difference gel electrophoresis: a single gel method for
detecting changes in protein extracts. Electrophoresis 18, 2071–2077.
Upadhyaya, H.D. and Ortiz, R. (2001) A mini core subset for capturing diversity and promoting utiliza-
tion of chickpea genetic resources in crop improvement. Theoretical and Applied Genetics 102,
1292–1298.
Upadhyaya, H.D., Bramel, P.J., Ortiz, R. and Singh, S. (2002) Developing a mini core of peanut for utilization
of genetic resources. Crop Science 42, 2150–2156.
Upadhyaya, H.D., Gowda, C.L.L., Pundir, R.P.S., Reddy, V.G. and Singh, S. (2006a) Development of core
subset of finger millet germplasm using geographical origin and data on 14 quantitative traits. Genetic
Resources and Crop Evolution 53, 679–685.
Upadhyaya, H.D., Reddy, L.J., Gowda, C.L.L., Reddy, K.N. and Singh, S. (2006b) Development of a mini
core subset for enhanced and diversified utilization of pigeonpea germplasm resources. Crop Science
46, 2127–2132.
UPOV (The International Union for the Protection of New Varieties of Plants) (1991) The 1991 Act of
the UPOV Convention. Available at: http://www.upov.int/en/publications/conventions/1991/content.
htm (accessed 17 November 2009).
UPOV (The International Union for the Protection of New Varieties of Plants) (2005) UPOV Report on the Impact
of Plant Variety Protection. UPOV Publication No. 353 (E), UPOV, Geneva, December 2005, 98 pp.
Urwin, P.E., McPheron, M.J. and Atkinson, H.J. (1998) Enhanced transgenic plant resistance to nematodes
by dual proteinase inhibitor constructs. Planta 204, 472–479.
Urwin, P., Yi, L., Martin, H., Atkinson, H. and Gilmartin, P.M. (2000) Functional characterization of the
EMCV IRES in plants. Plant Journal 24, 583–589.
USDA (United States Department of Agriculture) (2002a) Statistical indicators. Agricultural Outlook,
January–February 2002, Economic Research Service, USDA, Washington, DC, pp. 30–59.
USDA (United States Department of Agriculture) (2002b) Genetically engineered crops: US adoption and
impacts. Agricultural Outlook, September 2002, Economic Research Service, USDA, Washington,
DC, pp. 24–27.
Usuka, J., Zhu, W. and Brendel, V. (2000) Optimal spliced alignment of homologous cDNA to a genomic DNA
template. Bioinformatics 16, 203–211.
Utz, H.F. and Melchinger, A.E. (1994) Comparison of different approaches to interval mapping of quantita-
tive trait loci. In: van Ooijen, J.W. and Jansen, J. (eds) Biometrics in Plant Breeding: Applications of
Molecular Markers. Proceedings of the Ninth Meeting of the EUCARPIA Section Biometrics in Plant
Breeding, 6–8 July 1994, Wageningen, Netherlands, pp. 195–204.
Utz, H.F. and Melchinger, A.E. (1996) PLABQTL: a program for composite interval mapping of QTL. Journal
of Agricultural Genomics. Available at: http://www.cabi-publishing.org/jag/papers96/paper196/
indexp196.html (accessed 30 June 2007).
Utz, H.F., Melchinger, A.E. and Schön, C.C. (2000) Bias and sampling error of the estimated proportion of
genotypic variance explained by quantitative loci determined from experimental data in maize using
cross validation and validation with independent samples. Genetics 154, 1839–1849.
Vain, P., Afolabi, A.S., Worland, B. and Snape, J.W. (2003) Transgene behaviour in populations of rice
plants transformed using a new dual binary vector system: pGreen/pSoup. Theoretical and Applied
Genetics 107, 210–217.
Vallejos, C.E. and Chase, C.D. (1991) Linkage between isozyme markers and a locus affecting seed size
in Phaseolus vulgaris L. Theoretical and Applied Genetics 81, 413–419.
van Berloo, R. (1999) GGT: software for the display of graphical genotypes. Journal of Heredity 90,
328–329.
van Berloo, R. and Stam, P. (1998) Marker-assisted selection in autogamous RIL populations: a simulation
study. Theoretical and Applied Genetics 96, 147–154.
van Berloo, R. and Stam, P. (1999) Comparison between marker-assisted selection and phenotypical
selection in a set of Arabidopsis thaliana recombinant inbred lines. Theoretical and Applied Genetics
98, 113–118.
van Berloo, R. and Stam, P. (2001) Simultaneous marker-assisted selection for multiple traits in autogamous
crops. Theoretical and Applied Genetics 102, 1107–1112.
van Berloo, R., Aalbers, H., Werkman, A. and Niks, R.E. (2001) Resistance QTL confirmed through development
of QTL-NILs for barley leaf rust resistance. Molecular Breeding 8, 187–195.
van der Fits, L., Hilliou, F. and Memelink, J. (2001) T-DNA activation tagging as a tool to isolate regula-
tors of a metabolic pathway from a generally non-tractable plant species. Transgenic Research 10,
513–521.
van der Wurff, A.W., Chan, Y.L., Van Straalen, N.M. and Schouten, J. (2000) TE-AFLP: combining rapidity
and robustness in DNA fingerprinting. Nucleic Acids Research 28, e105.
van Deynze, A.E., Nelson, J.C., O'Donoughue, L.S., Ahn, S.N., Siripoonwiwat, W., Harrington, S.E.,
Yglesias, E.S., Braga, D.P., McCouch, S.R. and Sorrells, M.E. (1995a) Comparative mapping in
grasses. Oat relationships. Molecular and General Genetics 249, 349–356.
van Deynze, A.E., Nelson, J.C., O'Donoughue, L.S., Ahn, S.N., Siripoonwiwat, W., Harrington, S.E.,
Yglesias, E.S., Braga, D.P., McCouch, S.R. and Sorrells, M.E. (1995b) Comparative mapping in
grasses. Wheat relationships. Molecular and General Genetics 248, 744–754.
van Eeuwijk, F.A., Denis, J.-B. and Kang, M.S. (1996) Incorporating additional information on genotype and
environments in models for two-way genotype by environment tables. In: Kang, M.S. and Gauch, H.G.
(eds) Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida, pp. 15–50.
van Eeuwijk, F.A., Crossa, J., Vargas, M. and Ribaut, J.M. (2001) Variants of factorial regression for ana-
lysing QTL by environment interaction. In: Gallais, A., Dillmann, C. and Goldringer, I. (eds) Eucarpia,
Quantitative Genetics and Breeding Methods: the Way Ahead. Institut National de la Recherche
Agronomique (INRA) Editions, Versailles. Les colloques 96, 107–116.
van Eeuwijk, F.A., Crossa, J., Vargas, M. and Ribaut, J.-M. (2002) Analysing QTL by environment inter-
action by factorial regression, with an application to the CIMMYT drought and low nitrogen stress
programme in maize. In: Kang, M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB
International, Wallingford, UK, pp. 245–256.
van Eeuwijk, F.A., Malosetti, M., Yin, X., Struik, P.C. and Stam, P. (2004) Modeling differential pheno-
typic expression. In: New Directions for a Diverse Planet: Proceedings 4th International Crop Science
Congress (ICSC), 26 September–1 October 2004, Brisbane, Australia. ICSC, Brisbane, Australia.
Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17 November 2009).
van Eeuwijk, F.A., Malosetti, M., Yin, X., Struik, P.C. and Stam, P. (2005) Statistical models for genotype
by environment data: from conventional ANOVA models to eco-physiological QTL models. Australian
Journal of Agricultural Research 56, 883–894.
van Eijk, M., Peleman, J. and de Ruiter-Bleeker, M. (2001) Microsatellite-AFLP. Patent EP 1282729.
van Ginkel, M., Trethowan, R., Ammar, K., Wang, J. and Lillemo, M. (2002) Guide to bread wheat breeding
at CIMMYT (rev). Wheat special report No. 5. Centro Internacional de Mejoramiento de Maiz y Trigo
(CIMMYT), Mexico, DF.
van Oeveren, A.J. and Stam, P. (1992) Comparative simulation studies on the effects of selection for quantitative
traits in autogamous crops: early selection versus single seed descent. Heredity 69, 342–351.
van Ooijen, A.J. and Voorrips, R.E. (2001) JoinMap (tm) 3.0: Software for the Calculation of Genetic Linkage
Maps. Plant Research International, Wageningen, Netherlands.
van Ooijen, J.W. (1992) Accuracy of mapping quantitative trait loci in autogamous species. Theoretical and
Applied Genetics 84, 803–811.
van Os, H., Andrzejewski, S., Bakker, E., Barrena, I., Bryan, G.J., Caromel, B., Ghareeb, B., Ishidore,
E., de Jong, W., van Koert, P., Lefebvre, V., Milbourne, D., Ritter, E., van der Voort, J.N.A.M.,
Rousselle-Bourgeois, E., van Vliet, J., Waugh, R., Visser, R.G.F., Bakker, J. and van Eck, H.J.
(2006) Construction of a 10,000-marker ultradense genetic recombination map of potato: provid-
ing a framework for accelerated gene isolation and a genomewide physical map. Genetics 173,
1075–1087.
van Treuren, R. (2001) Efficiency of reduced primer selectivity and bulked DNA analysis for the rapid detection
of AFLP polymorphisms in a range of crop species. Euphytica 117, 27–37.
van Wijk, K.J. (2001) Challenges and prospects of plant proteomics. Plant Physiology 126, 301–308.
Vandepoele, K. and Van de Peer, Y. (2005) Exploring the plant transcriptome through phylogenetic profiling.
Plant Physiology 137, 31–42.
Vaneck, J.M., Blowers, A.D. and Earle, E.D. (1995) Stable transformation of tomato cell-cultures after bombardment
with plasmid and YAC DNA. Plant Cell Reports 14, 299–304.
Vane-Wright, R.I., Humphries, D.J. and Williams, P.H. (1991) What to protect? Systematics and the agony
of choice. Biological Conservation 55, 235–254.
Varela, M., Crossa, J., Rane, J., Joshi, A. and Trethowan, R. (2006) Analysis of a three-way interaction
including multi-attributes. Australian Journal of Agricultural Research 57, 1185–1193.
Vargas, M., van Eeuwijk, F.A., Crossa, J. and Ribaut, J.-M. (2006) Mapping QTL and QTL × environment
interaction for CIMMYT maize drought stress program using factorial regression and partial least
squares methods. Theoretical and Applied Genetics 112, 1009–1023.
Varshney, R.K., Graner, A. and Sorrells, M.E. (2005a) Genic microsatellite markers in plants: features and
applications. Trends in Biotechnology 23, 48–55.
Varshney, R.K., Graner, A. and Sorrells, M.E. (2005b) Genomics-assisted breeding for crop improvement.
Trends in Plant Science 10, 621–630.
Varshney, R.K., Nayak, S.N., May, G.D. and Jackson, S.A. (2009) Next-generation sequencing technologies
and their implications for crop genetics and breeding. Trends in Biotechnology 27, 522–530.
Vavilov, N.I. (1926) Studies on the origin of cultivated plants. Bulletin of Applied Botany, Genetics and Plant
Breeding 16, 1–248.
Veena, J.H., Doerge, R.W. and Gelvin, S. (2003) Transfer of T-DNA and Vir proteins to plant cells by
Agrobacterium tumefaciens induces expression of host genes involved in mediating transformation
and suppresses host defense gene expression. The Plant Journal 35, 219–236.
Velculescu, V.E., Zhang, L., Vogelstein, B. and Kinzler, K.W. (1995) Serial analysis of gene expression.
Science 270, 484–487.
Veldboom, L.R., Lee, M. and Woodman, W.L. (1994) Molecular-marker-facilitated studies in an elite maize
population: I. Linkage analysis and determination of QTL for morphological traits. Theoretical and
Applied Genetics 88, 7–16.
Veldboom, L.R., Lee, M. and Woodman, W.L. (1996) Molecular-marker-facilitated studies in an elite maize
population: I. Linkage analysis and determination of QTL for morphological traits. Theoretical and
Applied Genetics 88, 7–16.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J. et al. (2001) The sequence of the human
genome. Science 291, 1304–1351.
Verbyla, A.P., Eckermann, P.J., Thompson, R. and Cullis, B.R. (2003) The analysis of quantitative trait
loci in multi-environment trials using a multiplicative mixed model. Australian Journal of Agricultural
Research 54, 1395–1408.
Verdonk, J.C., De Vos, C.H.R., Verhoeven, H.A., Harina, M.A., van Tunen, A.J. and Schuurink, R.C. (2003)
Regulation of floral scent production in petunia revealed by targeted metabolomics. Phytochemistry
62, 997–1008.
Verhaegen, D., Plomion, C., Gion, J.-M., Poitel, M., Costa, P. and Kremer, A. (1997) Quantitative trait
dissection analysis in Eucalyptus using RAPD markers: 1. Detection of QTL in interspecific hybrid
progeny, stability of QTL expression across different ages. Theoretical and Applied Genetics 95,
597–608.
Verweire, D., Verleyen, K., Buck, S.D., Claeys, M. and Angenon, G. (2007) Marker-free transgenic plants
through genetically programmed auto-excision. Plant Physiology 145, 1220–1231.
Veyrieras, J.-B., Goffinet, B. and Charcosset, A. (2007) MetaQTL: a package of new computational
methods for the meta-analysis of QTL mapping experiments. BMC Bioinformatics 8, 49.
Vickers, C., Xue, G. and Gresshoff, P.M. (2006) A novel cis-acting element, ESP, contributes to high-level
endosperm-specific expression in an oat globulin promoter. Plant Molecular Biology 62, 195–214.
Vigouroux, Y., Mitchell, S., Matsuoka, Y., Hamblin, M., Kresovich, S., Smith, J.S.C., Jaqueth, J., Smith, O.S.
and Doebley, J. (2005) An analysis of genetic diversity across the maize genome using microsatellites.
Genetics 169, 1617–1630.
Villar, M., Lefevre, F., Bradshaw, H.D., Jr and du-Cros, E.T. (1996) Molecular genetics of rust resistance in
poplars (Melampsora larici-populina Kleb/Populus sp.) by bulked segregant analysis in a 2 × 2 factorial
mating design. Genetics 143, 531–536.
Virk, P.S., Ford-Lloyd, B.V., Jackson, M.T., Pooni, H.S., Clemeno, T.P. and Newbury, H.J. (1996) Predicting
quantitative variation within rice germplasm using molecular markers. Heredity 76, 296–304.
Vision, T.J., Brown, D.G., Shmoys, D.B., Durrett, R.T. and Tanksley, S.D. (2000) Selective mapping: a strategy
for optimizing the construction of high-density linkage maps. Genetics 155, 407–420.
Visscher, P.M. and Goddard, M.E. (2004) Prediction of the confidence interval of quantitative trait loci location.
Behavior Genetics 34, 477–482.
Visscher, P.M., Thompson, R. and Haley, C.S. (1996) Confidence intervals in QTL mapping by bootstrapping.
Genetics 143, 1013–1020.
Visscher, P.M., Hill, W.G. and Wray, N.R. (2008) Heritability in the genomics era – concepts and misconceptions.
Nature Reviews Genetics 9, 255–266.
Vogl, C. and Xu, S. (2000) Multipoint mapping of viability and segregation distorting loci using molecular
markers. Genetics 155, 1439–1447.
Vos, P., Hogers, R., Bleeker, M., Reijans, M., van de Lee, T., Hornes, M., Frijters, A., Pot, J., Peleman,
H., Kuiper, M. and Zabeau, M. (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids
Research 23, 4407–4414.
Vuylsteke, M., Kuiper, M. and Stam, P. (2000) Chromosomal regions involved in hybrid performance and
heterosis: their AFLP-based identification and practical uses in prediction models. Heredity 85,
208–218.
Walden, I. (1998) Preserving diversity: the role of property rights. In: Swanson, T.M. (ed.) Intellectual
Property Rights and Biodiversity Conservation. Cambridge University Press, Cambridge, UK,
pp. 176–197.
Walker, D., Boerma, H.R., All, J. and Parrott, W. (2002) Combining cry1Ac with QTL alleles from PI 229358
to improve soybean resistance to lepidopteran pests. Molecular Breeding 9, 43–51.
Walker, D.R., Narvel, J.M., Boerma, H.R., All, J.N. and Parrott, W.A. (2004) A QTL that enhances and
broadens Bt insect resistance in soybean. Theoretical and Applied Genetics 109, 1051–1057.
Wallace, D.H. (1985) Physiological genetics of plant maturity, adaptation and yield. Plant Breeding Reviews
3, 21–158.
Wallace, R.B., Shaffer, J., Murphy, R.F., Bonner, J., Hirose, T. and Itakura, K. (1979) Hybridization of
synthetic oligodeoxyribonucleotide to φX174 DNA: the effect of single base pair mismatch. Nucleic Acids
Research 6, 3543–3557.
Walling, G.A., Visscher, P.M., Andersson, L., Rothschild, M.F., Wang, L., Moser, G., Groenen, A.M.,
Bidanel, J.P., Cepica, S., Archibald, A.L., Geldermann, H., Koning, D.J., Milan, D. and Haley, C.S.
(2000) Combined analysis of data from quantitative trait loci mapping studies: chromosome 4 effects
on porcine growth and fatness. Genetics 155, 1369–1378.
Walløe Tvedt, M.W. (2005) How will a Substantive Patent Law Treaty affect the public domain for genetic
resources and biological material? Journal of World Intellectual Property 8, 311–344.
Walsh, B. (2001) Quantitative genetics in the age of genomics. Theoretical Population Biology 59,
175–184.
Walsh, B. (2004) Population- and quantitative-genetic models of selection limits. Plant Breeding Reviews
24 (Part 1), 177–225.
Wan, S., Wu, J., Zhang, Z., Sun, X., Lv, Y., Gao, C., Ning, Y., Ma, J., Guo, Y., Zhang, Q., Zheng, X., Zhang,
C., Ma, Z. and Lu, T. (2008) Activation tagging, an efficient tool for functional analysis of the rice
genome. Plant Molecular Biology 69, 69–80.
Wang, D.L., Zhu, J., Li, Z.K. and Paterson, A.H. (1999) Mapping QTLs with epistatic effects and QTL ×
environment interactions by mixed linear model approaches. Theoretical and Applied Genetics 99,
1255–1264.
Wang, E., Robertson, M.J., Hammer, G.L., Carberry, P.S., Holzworth, D., Meinke, H., Chapman, S.C.,
Hargreaves, J.N.G., Huth, N.I. and McLean, G. (2002) Development of a generic crop model template
in the cropping system model APSIM. European Journal of Agronomy 18, 121–140.
Wang, G.-L., Mackill, D.J., Bonman, J.M., McCouch, S.R., Champoux, M.C. and Nelson, R.J. (1994) RFLP
mapping of genes conferring complete and partial resistance to blast in a durably resistant rice
cultivar. Genetics 136, 1421–1434.
Wang, G.W., He, Y.Q., Xu, C.G. and Zhang, Q. (2005) Identification and confirmation of three neutral alleles
conferring wide compatibility in inter-subspecific hybrids of rice (Oryza sativa L.) using near-isogenic
lines. Theoretical and Applied Genetics 111, 702–710.
Wang, G.W., He, Y.Q., Xu, C.G. and Zhang, Q. (2006) Fine mapping of f5-Du, a gene conferring wide-
compatibility for pollen fertility in inter-subspecific hybrids of rice (Oryza sativa L.). Theoretical and
Applied Genetics 112, 382–387.
Wang, H., Zhang, Y.M., Li, X., Masinde, G.L., Mohan, S., Baylink, D.J. and Xu, S. (2005) Bayesian
shrinkage estimation of quantitative trait loci parameters. Genetics 170, 465–480.
Wang, J., van Ginkel, M., Podlich, D., Ye, G., Trethowan, R., Pfeiffer, W., DeLacy, I.H., Cooper, M. and Rajaram,
S. (2003) Comparison of two breeding strategies by computer simulation. Crop Science 43, 1764–1773.
Wang, J., van Ginkel, M., Trethowan, R., Ye, G., DeLacy, I., Podlich, D. and Cooper, M. (2004) Simulating the
effects of dominance and epistasis on selection response in the CIMMYT Wheat Breeding Program
using QuCim. Crop Science 44, 2006–2018.
Wang, J., Eagles, H.A., Trethowan, R. and van Ginkel, M. (2005) Using computer simulation of the selec-
tion process and known gene information to assist in parental selection in wheat quality breeding.
Australian Journal of Agricultural Research 56, 465–473.
Wang, J., Chapman, S.C., Bonnett, D.G., Rebetzke, G.J. and Crouch, J. (2007) Application of population
genetic theory and simulation models to efficiently pyramid multiple genes via marker-assisted
selection. Crop Science 47, 582–588.
Wang, J.K. and Bernardo, R. (2000) Variance and marker estimates of parental contribution to F2 and
BC1-derived inbreds. Crop Science 40, 659–665.
Wang, J.K. and Pfeiffer, W.H. (2007) Simulation modeling in plant breeding: principles and applications.
Agricultural Sciences in China 6, 908–921.
Wang, X., Rea, T., Bian, J., Gray, S. and Sun, Y. (1999) Identification of the gene responsive to
etoposide-induced apoptosis: application of DNA chip technology. FEBS Letters 445, 269–273.
Wang, X., Hu, Z., Wang, W., Li, Y., Zhang, Y.M. and Xu, C. (2007) A mixture model approach to the mapping
of QTL controlling endosperm traits with bulked samples. Genetica 132, 59–70.
Wang, X.Y., Chen, P.D. and Zhang, S.Z. (2001) Pyramiding and marker-assisted selection for powdery
mildew resistance genes in common wheat. Acta Genetica Sinica 28, 640–646 (in Chinese; summary
in English).
Wang, Y., Chen, B., Hu, Y., Li, J. and Lin, Z. (2005) Inducible excision of selectable marker gene from
transgenic plants by the Cre/lox site-specific recombination system. Transgenic Research 14, 605–614.
Wang, Y.H., Liu, S.J., Ji, S.L., Zhang, W.W., Wang, C.M., Jiang, L. and Wan, J.M. (2005) Fine mapping and
marker-assisted selection (MAS) of a low glutelin content gene in rice. Cell Research 15, 622–630.
Wang, Z., Zou, Y., Li, X., Zhang, Q., Chen, L., Wu, H., Su, D., Chen, Y., Guo, J., Luo, D., Long, Y., Zhong, Y.
and Liu, Y.G. (2006) Cytoplasmic male sterility of rice with Boro II cytoplasm is caused by a cytotoxic
peptide and is restored by two related PPR motif genes via distinct modes of mRNA silencing. The
Plant Cell 18, 676–687.
Ware, D. and Stein, L. (2003) Comparison of genes among cereals. Current Opinion in Plant Biology 6,
121–127.
Ware, D.H., Jaiswal, P., Ni, J., Yap, I.V., Pan, X., Clark, K.Y., Teytelman, L., Schmidt, S.C., Zhao, W., Chang,
K., Cartinhour, S., Stein, L.D. and McCouch, S.R. (2002) Gramene, a tool for grass genomics. Plant
Physiology 130, 1606–1613.
Warthmann, N., Chen, H., Ossowski, S., Weigel, D. and Hervé, P. (2008) Highly specific gene silencing by
artificial miRNAs in rice. PLoS ONE 3(3), e1829.
Wassom, J.J., Wong, J.C., Martinez, E., King, J.J., DeBaene, J., Hotchkiss, J.R., Mikkilineni, V., Bohn, M.O.
and Rocheford, T.R. (2008) QTL associated with maize kernel oil, protein and starch concentrations;
kernel mass; and grain yield in Illinois High Oil B73 backcross-derived lines. Crop Science 48,
243–252.
Waugh, R., Mclean, K., Flavell, A.J., Pearce, S.R., Kumar, A. and Thomas, B.B.T. (1997) Genetic distribu-
tion of Bare-1 like retrotransposable elements in the barley genome revealed by sequence-specific
amplification polymorphism (S-SAP). Molecular and General Genetics 253, 687–694.
Wayne, M.L. and McIntyre, L.M. (2002) Combining mapping and arraying: an approach to candidate gene
identification. Proceedings of the National Academy of Sciences of the United States of America 99,
14903–14906.
Weber, A.L., Briggs, W.H., Rucker, J., Baltazar, B.M., Sánchez-Gonzalez, J.D.J., Feng, P., Buckler, E.S. and
Doebley, J. (2008) The genetic architecture of complex traits in teosinte (Zea mays ssp. parviglumis):
new evidence from association mapping. Genetics 180, 1221–1232.
Weckwerth, W. (2003) Metabolomics in systems biology. Annual Review of Plant Biology 54, 669–689.
Weckwerth, W., Wenzel, K. and Fiehn, O. (2004) Process for the integrated extraction, identification and
quantification of metabolites, proteins and RNA to reveal their co-regulation in biochemical networks.
Proteomics 4, 78–83.
Wehrhahn, C. and Allard, R.W. (1965) The detection and measurement of the effects of individual genes
involved in the inheritance of a quantitative character in wheat. Genetics 51, 109–119.
Weigel, D. and Nordborg, M. (2005) Natural variation in Arabidopsis. How do we find the causal genes?
Plant Physiology 138, 567–568.
Weigel, D., Ahn, J.H., Blazquez, M.A., Borevitz, J.O., Christensen, S.K., Fankhauser, C., Ferrandiz, C.,
Kardailsky, I., Malancharuvil, E.J., Neff, M.M., Nguyen, J.T., Sato, S., Wang, Z., Xia, Y., Dixon, R.A.,
Harrison, M.J., Lamb, C.J., Yanofsky, M.F. and Chory, J. (2000) Activation tagging in Arabidopsis. Plant
Physiology 122, 1003–1013.
Weir, B.S. (1990) Genetic Data Analysis, Methods for Discrete Population Genetic Data. Sinauer Associates,
Inc., Sunderland, Massachusetts, pp. 222–260.
Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates, Inc., Sunderland, Massachusetts, 376 pp.
Weise, S., Grosse, I., Klukas, C., Koschützki, D., Scholz, U., Schreiber, F. and Junker, B.H. (2006) Meta-All:
a system for managing metabolic pathway information. BMC Bioinformatics 7, 465.
Welch, R.M. and Graham, R.D. (2004) Breeding for micronutrients in staple food crops from a human
nutrition perspective. Journal of Experimental Botany 55, 353–364.
Welch, S.M., Dong, Z. and Roe, J.L. (2004) Modeling gene networks controlling transition to flowering in
Arabidopsis. In: New Directions for a Diverse Planet: Proceedings 4th International Crop Science
Congress (ICSC), 26 September–1 October 2004, Brisbane, Australia. ICSC, Brisbane, Australia.
Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17 November 2009).
Welsh, J. and McClelland, M. (1990) Fingerprinting genomes using PCR with arbitrary primers. Nucleic
Acids Research 18, 7231–7238.
Wenck, A. and Hansen, G. (2004) Positive selection. In: Peña, L. (ed.) Methods in Molecular Biology, Vol. 286.
Transgenic Plants: Methods and Protocols. Humana Press Inc., Totowa, New Jersey, pp. 227–235.
Wenzel, W.G. and Pretorius, A.J. (2000) Heterosis and xenia in sorghum malt quality. South-African Journal
of Plant Soil 17, 66–69.
Wenzl, P., Carling, J., Kudrna, D., Jaccoud, D., Huttner, E., Kleinhofs, A. and Kilian, A. (2004) Diversity
array technology (DArT) for whole-genome profiling of barley. Proceedings of the National Academy
of Sciences of the United States of America 101, 9915–9920.
Wenzl, P., Li, H., Carling, J., Zhou, M., Raman, H., Paul, E., Hearnden, P., Maier, C., Xia, L., Caig, V.,
Ovesná, J., Cakir, M., Poulsen, D., Wang, J., Raman, R., Smith, K.P., Muehlbauer, G.J., Chalmers,
K.J., Kleinhofs, A., Huttner, E. and Kilian, A. (2006) A high-density consensus map of barley linking
DArT markers to SSR, RFLP and STS loci and agricultural traits. BMC Genomics 7, 206.
Werner, K., Friedt, W. and Ordon, F. (2005) Strategies for pyramiding resistance genes against the barley
yellow mosaic virus complex (BaMMV, BaYMV, BaYMV-2). Molecular Breeding 16, 45–55.
Wesley, S.V., Helliwell, C.A., Smith, N.A., Wang, M.B., Rouse, D.T., Liu, Q., Gooding, P.S., Singh, S.P.,
Abbott, D., Stoutjesdijk, P.A., Robinson, S.P., Gleave, A.P., Green, A.G. and Waterhouse, P.M. (2001)
Construct design for efficient, effective and high-throughput gene silencing in plants. The Plant Journal
27, 581–590.
Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio,
M., Edgar, R., Federhen, S., Feolo, M., Geer, L.Y., Helmberg, W., Kapustin, Y., Khovayko, O.,
Landsman, D., Lipman, D.J., Madden, T.L., Maglott, V., Miller, D.R., Ostell, J., Pruitt, K.D., Schuler,
G.D., Shumway, M., Sequeira, E., Sherry, S.T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov,
R.L., Tatusova, T.A., Wagner, L. and Yaschenko, E. (2007) Database resources of the National Center
for Biotechnology Information. Nucleic Acids Research 36, D13–D21.
White, J.W. and Hoogenboom, G. (1996) Integrating effects of genes for physiological traits into crop growth
models. Agronomy Journal 88, 416–422.
White, J.W. and Hoogenboom, G. (2003) Gene-based approaches to crop simulation: past experiences
and future opportunities. Agronomy Journal 95, 52–64.
White, P.J. and Broadley, M.R. (2005) Biofortifying crops with essential mineral elements. Trends in Plant
Science 10, 586–593.
White, P.R. (1934) Potentially unlimited growth of excised tomato root tips in a liquid medium. Plant
Physiology 9, 585–600.
Whitelaw, C.A., Barbazuk, W.B., Pertea, G., Chan, A.P., Cheung, F., Lee, Y., Zheng, L., van Heeringen,
S., Karamycheva, S., Bennetzen, J.L., SanMiguel, P., Lakey, N., Bedell, J., Yuan, Y., Budiman, M.A.,
Resnick, A., van Aken, S., Utterback, T., Riedmuller, S., Williams, M., Feldblyum, T., Schubert, K.,
Beachy, R., Fraser, C.M. and Quackenbush, J. (2003) Enrichment of gene-coding sequences in maize
by genome filtration. Science 302, 2118–2120.
Whitelegge, J.P. (2002) Plant proteomics: BLASTing out of a MudPIT. Proceedings of the National Academy
of Sciences of the United States of America 99, 11564–11566.
Whitesides, G.M. (2006) The origins and the future of microfluidics. Nature 442, 368–373.
Whittaker, J.C., Haley, C.S. and Thompson, R. (1997) Optimal weighting of information in marker-assisted
selection. Genetical Research 69, 137–144.
Wiemann, S., Weil, B., Wellenreuther, R., Gassenhuber, J., Glassl, S., Ansorge, W., Bocher, M., Blocker,
H., Bauersachs, S., Blum, H., Lauber, J., Düsterhöft, A., Beyer, A., Köhrer, K., Strack, N., Mewes,
H.-W., Ottenwälder, B., Obermaier, B., Tampe, J., Heubner, D., Wambutt, R., Korn, B., Klein, M. and
Poustka, A. (2001) Toward a catalog of human genes and proteins: sequencing and analysis of 500
novel complete protein coding human cDNAs. Genome Research 11, 422–435.
Wilkes, G. (1993) Germplasm collections: their use, potential, social responsibility and genetic vulnerability.
In: Buxton, D.R., Shibles, R., Forsberg, R.A., Blad, B.L., Asay, K.H., Paulsen, G.M. and Wilson,
R.F. (eds) International Crop Science I. Crop Science Society of America, Madison, Wisconsin,
pp. 445–450.
Wilkinson, M., Schoof, H., Ernst, R. and Haase, D. (2005) BioMOBY successfully integrates
distributed heterogeneous bioinformatics web services. The PlaNet Exemplar case. Plant Physiology
138, 5–7.
Wilkins-Stevens, P., Hall, J.G., Lyamichev, V., Neri, B.P., Lu, M., Wang, L., Smith, L.M. and Kelso, D.M.
(2001) Analysis of single nucleotide polymorphisms with solid phase invasive cleavage reactions.
Nucleic Acids Research 29, e77.
William, H.M., Morris, M., Warburton, M. and Hoisington, D.A. (2007a) Technical, economic and policy
considerations on marker-assisted selection in crops: lessons from the experience at an international
agricultural research center. In: Guimarães, E.P., Ruane, J., Scherf, B.D., Sonnino, A. and Dargie,
J.D. (eds) Marker-Assisted Selection, Current Status and Future Perspectives in Crops, Livestock,
Forestry and Fish. Food and Agriculture Organization of the United Nations, Rome, pp. 381–404.
William, H.M., Trethowan, R. and Crosby-Galvan, E.M. (2007b) Wheat breeding assisted by markers:
CIMMYT's experience. Euphytica 157, 307–319.
Williams, C.E. and St Clair, D.A. (1993) Phenetic relationships and levels of variability detected by restric-
tion fragment length polymorphism and random amplified polymorphic DNA analysis of cultivated and
wild accessions of Lycopersicon esculentum. Genome 36, 619–630.
Williams, E.J. (1952) The interpretation of interactions in factorial experiments. Biometrika 39, 65–81.
Williams, J.G.K., Kubelik, A.R., Livak, K.J., Rafalski, J.A. and Tingey, S.V. (1990) DNA polymorphisms
amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Research 18, 6531–6535.
Williams, J.S. (1962) The evaluation of a selection index. Biometrics 18, 375–393.
Wilson, J.A. (1968) Problems in hybrid wheat breeding. Euphytica 17 (Suppl. 1), 13–33.
Wilson, L.M., Whitt, S.R., Ibanez, A.M., Rocheford, T.R., Goodman, M.M. and Buckler IV, E.S. (2004)
Dissection of maize kernel composition and starch production by candidate gene association. The
Plant Cell 16, 2719–2733.
Wilson, P. and Driscoll, C.J. (1983) Hybrid wheat. In: Frankel, R. (ed.) Monographs on Theoretical and
Applied Genetics, Vol. 6. Heterosis. Springer-Verlag, Berlin, pp. 94–123.
Wilson, W.A., Harrington, S.E., Woodman, W.L., Lee, M., Sorrells, M.E. and McCouch, S.R. (1999)
Inferences on the genome structure of progenitor maize through comparative analysis of rice, maize
and the domesticated panicoids. Genetics 153, 453–473.
Windsor, A.J. and Mitchell-Olds, T. (2006) Comparative genomics as a tool for gene discovery. Current
Opinion in Biotechnology 17, 1–7.
Wingbermuehle, W.J., Gustus, C. and Smith, K.P. (2004) Exploiting selective genotyping to study genetic
diversity of resistance to Fusarium head blight in barley. Theoretical and Applied Genetics 109,
1160–1168.
Wink, M. (1988) Plant breeding: importance of plant secondary metabolites for protection against
pathogens and herbivores. Theoretical and Applied Genetics 75, 225–233.
Winkler, R.G. and Feldman, K.A. (1998) PCR-based identification of T-DNA insertion mutants. Methods in
Molecular Biology 82, 129–136.
Winzeler, E.A., Richards, D.R., Conway, A.R., Goldstein, A.L., Kalman, S., McCullough, M.J., McCusker,
J.H., Stevens, D.A., Wodicka, L., Lockhart, D.J. and Davis, R.W. (1998) Direct allelic variation
scanning of the yeast genome. Science 281, 1194–1197.
Wishart, D.S., Tzur, D., Knox, C., Eisner, R., Guo, A.C., Young, N., Cheng, D., Jewell, K., Arndt, D., Sawhney,
S., Fung, C., Nikolai, L., Lewis, M., Coutouly, M.-A., Forsythe, I., Tang, P., Shrivastava, S., Jeroncic, K.,
Stothard, P., Amegbey, G., Block, D., Hau, D.D., Wagner, J., Miniaci, J., Clements, M., Gebremedhin,
M., Guo, N., Zhang, Y., Duggan, G.E., Macinnis, G.D., Weljie, A.M., Dowlatabadi, R., Bamforth, F.,
Clive, D., Greiner, R., Li, L., Marrie, T., Sykes, B.D., Vogel, H.J. and Querengesser, L. (2007) HMDB:
The Human Metabolome Database. Nucleic Acids Research 35(Database issue), D521–526.
Witcombe, J.R. (1996) Participatory approaches to plant breeding and selection. Biotechnology and
Development Monitor 29, 2–6.
Witcombe, J.R. and Hash, C.T. (2000) Resistance gene deployment strategies in cereal hybrids using
marker-assisted selection: gene pyramiding, three-way hybrids and synthetic parent populations.
Euphytica 112, 175–186.
Withers, L.A. (1993) New technologies for the conservation of plant genetic resources. In: Buxton, D.R.,
Shibles, R., Forsberg, R.A., Blad, B.L., Asay, K.H., Paulsen, G.M. and Wilson, R.F. (eds) International
Crop Science I. Crop Science Society of America, Madison, Wisconsin, pp. 429–435.
Withers, L.A. (1995) Collecting in vitro for genetic resources conservation. In: Guarino, L., Ramanatha
Rao, V. and Reid, R. (eds) Collecting Plant Genetic Diversity. CAB International, Wallingford, UK, pp.
511–515.
Wold, B. and Myers, R.M. (2008) Sequence census methods for functional genomics. Nature Methods 5,
19–21.
Wolf, Y.I., Rogozin, I.B., Grishin, N.V. and Koonin, E.V. (2003) Genome-scale phylogenetic trees. In: Frontiers
in Computational Genomics. Caister Academic Press, Wymondham, UK, pp. 241–260.
Wollenweber, B., Porter, J.R. and Lübberstedt, T. (2005) Need for multidisciplinary research towards a
second green revolution. Commentary. Current Opinion in Plant Biology 8, 337–341.
Wong, D.W.S. (1997) The ABCs of Gene Cloning. Chapman & Hall, New York.
Worland, A.J. and Law, C.N. (1986) Genetic analysis of chromosome 2D of wheat. I. The location of genes
affecting height, daylength insensitivity, hybrid dwarfism and yellow-rust resistance. Zeitschrift für
Pflanzenzüchtung 96, 331–345.
Wouters, F.S., Verveer, P.J. and Bastiaens, P.I.H. (2001) Imaging biochemistry inside cells. Trends in Cell
Biology 11, 203–221.
Wright, A.J. and Mowers, R.P. (1994) Multiple regression for molecular-marker, quantitative trait data from
large F2 populations. Theoretical and Applied Genetics 89, 305–312.
Wright, S. (1921a) Correlation and causation. Journal of Agricultural Research 20, 557–585.
Wright, S. (1921b) Systems of mating I. The biometric relations between parent and offspring. Genetics 6,
111–123.
Wright, S. (1978) Evolution and Genetics of Populations, Vol. IV. The University of Chicago Press, Chicago,
Illinois.
Wright, S.I., Bi, I.V., Schroeder, S.G., Yamasaki, M., Doebley, J.F., McMullen, M.D. and Gaut, B.S. (2005)
The effects of artificial selection on the maize genome. Science 308, 1310–1314.
Wu, C., Li, X.J., Yuan, W.Y., Chen, G.X., Kilian, A., Li, J., Xu, C., Li, X.H., Zhou, D.-X., Wang, S. and Zhang,
Q. (2003) Development of enhancer trap lines for functional analysis of the rice genome. The Plant
Journal 35, 418–427.
Wu, H., Sparks, C., Amoah, B. and Jones, H.D. (2003) Factors influencing successful
Agrobacterium-mediated genetic transformation of wheat. Plant Cell Reports 21, 659–668.
Wu, H., Sparks, C. and Jones, H.D. (2006) Characterization of T-DNA loci and vector backbone sequences
in transgenic wheat produced by Agrobacterium-mediated transformation. Molecular Breeding 18,
195–208.
Wu, L., Nandi, S., Chen, L., Rodriguez, R.L. and Huang, N. (2002) Expression and inheritance of nine
transgenes in rice. Transgenic Research 11, 533–541.
Wu, M.S., Wang, S.C. and Dai, J.R. (2000) Application of AFLP markers to heterotic grouping of elite maize
inbred lines. Acta Agronomica Sinica 26, 9–13.
Wu, R., Lou, X.Y., Ma, C.X., Wang, X., Larkins, B.A. and Casella, G. (2002a) An improved genetic model
generates high-resolution mapping of QTL for protein quality in maize endosperm. Proceedings of the
National Academy of Sciences of the United States of America 99, 11281–11286.
Wu, R., Ma, C.-S. and Casella, G. (2002b) Joint linkage and linkage disequilibrium mapping of qualitative
trait loci in natural mapping populations. Genetics 160, 779–792.
Wu, R., Ma, C.X., Gallo-Meagher, M., Littell, R.C. and Casella, G. (2002c) Statistical methods for dis-
secting triploid endosperm traits using molecular markers: an autogamous model. Genetics 162,
875–892.
Wu, R., Ma, C.-X., Lin, M., Wang, Z. and Casella, G. (2004) Functional mapping of quantitative trait loci
underlying growth trajectories using the transform-both-sides of the logistic model. Biometrics 60,
729–738.
Wu, R., Ma, C. and Casella, G. (2007) Statistical Genetics of Quantitative Traits: Linkage, Maps and QTL
(Statistics for Biology and Health). Springer, Berlin.
Wu, R.L. and Lin, M. (2006) Functional mapping: how to map and study the genetic architecture of dynamic
complex traits. Nature Reviews Genetics 7, 229–237.
Wu, R.L. and Zeng, Z.B. (2001) Joint linkage and linkage disequilibrium mapping in natural populations.
Genetics 157, 899–909.
Wu, W., Zhou, Y., Li, W., Mao, D. and Chen, Q. (2002) Mapping of quantitative trait loci based on growth
models. Theoretical and Applied Genetics 105, 1043–1049.
Wu, W.R. and Li, W.M. (1994) A new approach for mapping quantitative trait loci using complete genetic
marker linkage maps. Theoretical and Applied Genetics 89, 535–539.
Wu, W.R. and Li, W.M. (1996) Model fitting and model testing in the method of joint mapping of quantitative
trait loci. Theoretical and Applied Genetics 92, 477–482.
Wu, W.-R., Li, W.-M., Tang, D.-Z., Lu, H.-R. and Worland, A.J. (1999) Time-related mapping of quantitative
trait loci underlying tiller number in rice. Genetics 151, 297–303.
Xi, Z.Y., He, F.H., Zeng, R.Z., Zhang, Z.M., Ding, X.H., Li, W.T. and Zhang, G.Q. (2006) Development of
a wide population of chromosome single segment substitution lines in the genetic background of an
elite cultivar of rice (Oryza sativa L.). Genome 49, 476–484.
Xia, L., Peng, K., Yang, S., Wenzl, P., de Vincente, M.C., Fregene, M. and Kilian, A. (2005) DArT for high-
throughput genotyping of cassava (Manihot esculenta) and its wild relatives. Theoretical and Applied
Genetics 110, 1092–1098.
Xia, X.C., Reif, J.C., Melchinger, A.E., Frisch, M., Hoisington, D.A., Beck, D., Pixley, K. and Warburton,
M.L. (2005) Genetic diversity among CIMMYT maize inbred lines investigated with SSR markers: II.
Subtropical, tropical mid-altitude and highland maize inbred lines and their relationships with elite U.S.
and European maize. Crop Science 45, 2573–2582.
Xiang, C., Han, P., Lutziger, I., Wang, K. and Oliver, D.J. (1999) A mini binary vector series for plant
transformation. Plant Molecular Biology 40, 711–717.
Xiao, J., Li, J., Yuan, L. and Tanksley, S.D. (1995) Dominance is the major genetic basis of heterosis in rice
as revealed by QTL analysis using molecular markers. Genetics 140, 745–754.
Xiao, J., Grandillo, S., Ahn, S.N., McCouch, S.R., Tanksley, S.D., Li, J. and Yuan, L. (1996a) Genes from
wild rice improve yield. Nature 384, 223–224.
Xiao, J., Li, J., Yuan, L., McCouch, S.R. and Tanksley, S.D. (1996b) Genetic diversity and its relationship to
hybrid performance and heterosis in rice as revealed by PCR-based markers. Theoretical and Applied
Genetics 92, 637–643.
Xiao, J., Li, J., Yuan, L. and Tanksley, S.D. (1996c) Identification of QTLs affecting traits of agronomic
importance in a recombinant inbred population derived from subspecific rice cross. Theoretical and
Applied Genetics 92, 230–244.
Xiao, J., Li, L., Grandillo, S., Yuan, L., Tanksley, S.D. and McCouch, S.R. (1998) Identification of trait-improving
quantitative trait loci alleles from a wild rice relative, Oryza rufipogon. Genetics 150, 899–909.
Xiong, Q., Qiu, Y. and Gu, W. (2008) PGMapper: a web-based tool linking phenotype to genes. Bioinformatics
24, 1011–1013.
Xu, C., He, X. and Xu, S. (2003) Mapping quantitative trait loci underlying triploid endosperm traits. Heredity
90, 228–235.
Xu, S. (1996) Mapping quantitative trait loci using four-way crosses. Genetical Research 68, 175–181.
Xu, S. (1998) Mapping quantitative trait loci using multiple families of line crosses. Genetics 148,
517–524.
Xu, S. (2002) QTL analysis in plants. In: Camp, N.J. and Cox, A. (eds) Methods in Molecular Biology,
Vol. 195. Quantitative Trait Loci: Methods and Protocols. Humana Press, Totowa, New Jersey, pp.
283–310.
Xu, S. (2003) Estimating polygenic effects using markers of the entire genome. Genetics 163, 789–801.
Xu, S. (2007) An empirical Bayes method for estimating epistatic effects of quantitative trait-loci. Biometrics
63, 513–521.
Xu, S. and Jia, Z. (2007) Genome-wide analysis of epistatic effects for quantitative traits in barley. Genetics
175, 1955–1963.
Xu, Y. (1994) Application of molecular markers in genetic improvement of quantitative traits in plants. In:
Proceedings of the Third Young Scientists Symposium on Crop Genetics and Breeding. Publishing
House of Agricultural Science and Technology of China, Beijing, pp. 38–49.
Xu, Y. (1997) Quantitative trait loci: separating, pyramiding and cloning. Plant Breeding Reviews 15,
85–139.
Xu, Y. (2002) Global view of QTL: rice as a model. In: Kang, M.S. (ed.) Quantitative Genetics, Genomics
and Plant Breeding. CAB International, Wallingford, UK, pp. 109–134.
Xu, Y. (2003) Developing marker-assisted selection strategies for breeding hybrid rice. Plant Breeding
Reviews 23, 73–174.
Xu, Y. and Crouch, J.H. (2008) Marker-assisted selection in plant breeding: from publications to practice.
Crop Science 48, 391–407.
Xu, Y. and Luo, L. (2002) Biotechnology and germplasm resource management in rice. In: Luo, L., Ying, C.
and Tang, S. (eds) Rice Germplasm Resources. Hubei Science and Technology Publisher, Wuhan,
China, pp. 229–250.
Xu, Y.B. and Shen, Z.T. (1991) Diallel analysis of tiller number at different growth stages in rice (Oryza
sativa L.). Theoretical and Applied Genetics 83, 243–249.
Xu, Y. and Shen, Z. (1992a) Detection and genetic analyses of the gene dispersed crosses: some
theoretical considerations. Acta Agricultura Zhejiangensis 18, 109–117 (in English with Chinese abstract).
Xu, Y. and Shen, Z. (1992b) Detection and genetic analyses of the gene dispersed cross for tiller angle in
rice (Oryza sativa L.). Acta Agricultura Zhejiangensis 4, 54–60.
Xu, Y. and Shen, Z. (1992c) Accumulation of the alleles with similar effects at four loci controlling tiller angle
from gene dispersed crosses in rice (Oryza sativa L.). Journal of Biomathematics (Beijing) 7, 1–10.
Xu, Y.B. and Shen, Z.T. (1992d) Distorted segregation of waxy gene and its characterization in indica ×
japonica hybrids. Chinese Journal of Rice Science 6, 89–92 (in Chinese).
Xu, Y. and Zhu, L. (1994) Molecular Quantitative Genetics (in Chinese). China Agriculture Press, Beijing,
China, 291 pp.
Xu, Y., Shen, Z., Chen, Y. and Zhu, L. (1995) A statistical technique and generalized computer software
for interval mapping of quantitative trait loci and its application. Acta Agronomica Sinica 21, 1–8 (in
Chinese with English abstract).
Xu, Y., Zhu, L., Xiao, J., Huang, N. and McCouch, S.R. (1997) Chromosomal regions associated with seg-
regation distortion of molecular markers in F2, backcross, doubled haploid and recombinant inbred
populations of rice (Oryza sativa L.). Molecular and General Genetics 253, 535–545.
Xu, Y., McCouch, S.R. and Shen, Z. (1998) Transgressive segregation of tiller angle in rice caused by
complementary action of genes. Crop Science 38, 12–19.
Xu, Y., Lobos, K.B. and Clare, K.M. (2002) Development of SSR markers for rice molecular breeding. In:
Proceedings of Twenty-Ninth Rice Technical Working Group Meeting, 24–27 February 2002, Little
Rock, Arkansas. Rice Technical Working Group, Little Rock, Arkansas, p. 49.
Xu, Y., Ishii, T. and McCouch, S.R. (2003) Marker-assisted evaluation of germplasm resources for plant
breeding. In: Mew, T.W., Brar, D.S., Peng, S. and Hardy, B. (eds) Rice Science: Innovations and Impact
for Livelihood. Proceedings of the 24th International Rice Research Conference, 16–19 September
2002, Beijing. International Rice Research Institute, Chinese Academy of Engineering and Chinese
Academy of Agricultural Sciences, Beijing, pp. 213–229.
Xu, Y., Beachell, H. and McCouch, S.R. (2004) A marker-based approach to broadening the genetic base
of rice (Oryza sativa L.) in the US. Crop Science 44, 1947–1959.
Xu, Y., McCouch, S.R. and Zhang, Q. (2005) How can we use genomics to improve cereals with rice as a
reference genome? Plant Molecular Biology 59, 7–26.
Xu, Y., Wang, J. and Crouch, J.C. (2008) Selective genotyping and pooled DNA analysis: an innovative
use of an old concept. In: Proceedings of the 5th International Crop Science Congress, 13–18 April
2008, Jeju, Korea. Published on CD-ROM. Available at: http://www.cropscience2008.com (accessed
30 June 2008).
Xu, Y., Babu, R., Skinner, D.J., Vivek, B.S. and Crouch, J.H. (2009a) Maize mutant Opaque2 and the
improvement of protein quality through conventional and molecular approaches. In: Shu, Q.Y. (ed.) Induced
Plant Mutations in the Genomics Era. Food and Agriculture Organization of the United Nations, Rome,
pp. 191–196.
Xu, Y., Lu, Y., Yan, J., Babu, R., Hao, Z., Gao, S., Zhang, S., Li, J., Vivek, B.S., Magorokosho, C., Mugo,
S., Makumbi, D., Taba, S., Palacios, N., Guimarães, C.T., Araus, J.-L., Wang, J., Davenport, G.F.,
Crossa, J. and Crouch, J.H. (2009b) SNP-chip based genomewide scan for germplasm evaluation and
marker–trait association analysis and development of a molecular breeding platform. Proceedings of
14th Australasian Plant Breeding & 11th Society for the Advancement in Breeding Research in Asia &
Oceania Conference, 10–14 August 2009, Cairns, Tropical North Queensland, Australia. Distributed
by CD-ROM.
Xu, Y., Skinner, D.J., Wu, H., Palacios-Rojas, N., Araus, J.L., Yan, J., Gao, S., Warburton, M.L. and Crouch,
J.H. (2009c) Advances in maize genomics and their value for enhancing genetic gains from breed-
ing. International Journal of Plant Genomics Volume 2009, Article ID 957602, 30 pages. Available at:
http://www.hindawi.com/journals/ijpg/2009/957602.html (accessed 21 December 2009).
Xu, Y., This, D., Pausch, R.C., Vonhof, W.M., Coburn, J.R., Comstock, J.P. and McCouch, S.R. (2009d) Water
use efficiency determined by carbon isotope discrimination in rice: genetic variation associated with
population structure and QTL mapping. Theoretical and Applied Genetics 118, 1065–1081.
Xue, W., Xing, Y., Weng, X., Zhao, Y., Tang, W., Wang, L., Zhou, H., Yu, S., Xu, C., Li, X. and Zhang, Q.
(2008) Natural variation in Gdh7 is an important regulator of heading date and yield potential in rice.
Nature Genetics 40, 761767.
Xue, Y. and Xu, Z. (2002) An introduction to the China Rice Functional Genomics Program. Comparative
and Functional Genomics 3, 161163.
Yadav, N.S., Vanderleyden, J., Bennett, D.R., Barnes, W.M. and Chilton, M.D. (1982) Short direct repeats
flank the T-DNA on a nopaline Ti plasmid. Proceedings of the National Academy of Sciences of the
United States of America 79, 63226326.
Yadav, R.S., Hash, C.T., Bidinger, F.R., Cavan, G.P. and Howarth, C.J. (2002) Quantitative trait loci associ-
ated with traits determining grain and stover yield in pearlmillet under terminal drought stress condi-
tions. Theoretical and Applied Genetics 104, 6783.
Yamada, K., Lim, J., Dale, J.M., Chen, H., Shinn, P., Palm, C.J., Southwick, A.M., Wu, H.C., Kim, C., Nguyen,
M., Pham, P., Cheuk, R., Karlin-Newmann, G., Liu, S.X., Lam, B., Sakano, H., Wu, T., Yu, G., Miranda,
M., Quach, H.L., Tripp, M., Chang, C.H., Lee, J.M., Toriumi, M., Chan, M.M.H., Tang, C.C., Onodera,
C.S., Deng, J.M., Akiyama, K., Ansari, Y., Arakawa, T., Banh, J., Banno, F., Bowser, L., Brooks, S.,
Carninci, P., Chao, Q., Choy, N., Enju, A., Goldsmith, A.D., Gurjal, M., Hansen, N.F., Hayashizaki, Y.,
Johnson-Hopson, C., Hsuan, V.W., Iida, K., Karnes, M., Khan, S., Koesema, E., Ishida, J., Jiang, P.X.,
Jones, T., Kawai, J., Kamiya, A., Meyers, C., Nakajima, M., Narusaka, M., Seki, M., Sakurai, T., Satou,
M., Tamse, R., Vaysberg, M., Wallender, E.K., Wong, C., Yamamura, Y., Yuan, S., Shinozaki, K., Davis,
R.W., Theologis, A. and Ecker, J.R. (2003) Empirical analysis of transcriptional activity in
the Arabidopsis genome. Science 302, 842–846.
Yamagishi, M., Yano, M., Fukuta, Y., Fukui, K., Otani, M. and Shimada, T. (1996) Distorted segregation
of RFLP markers in regenerated plants derived from anther culture of an F1 hybrid of rice. Genes &
Genetic Systems 71, 37–41.
Yamamoto, T., Takemori, N., Sue, N. and Nitta, N. (2003) QTL analysis of stigma exsertion in rice. Rice
Genetics Newsletter 20, 33–34.
Yamazaki, M., Tsugawa, H., Miyao, A., Yano, M., Wu, J., Yamamoto, S., Matsumoto, T., Sasaki, T. and
Hirochika, H. (2001) The rice retrotransposon Tos17 prefers low-copy-number sequences as
integration targets. Molecular and General Genetics 265, 336–344.
Yan, J., Zhu, J., He, C., Benmoussa, M. and Wu, P. (1998a) Molecular dissection of developmental behavior
of plant height in rice (Oryza sativa L.). Genetics 150, 1257–1265.
Yan, J., Zhu, J., He, C., Benmoussa, M. and Wu, P. (1998b) Quantitative trait loci analysis for the
developmental behavior of tiller number in rice (Oryza sativa L.). Theoretical and Applied Genetics 97, 267–274.
Yan, J., Zhu, J., He, C., Benmoussa, M. and Wu, P. (1999) Molecular marker-assisted dissection of genotype
× environment interaction for plant type traits in rice (Oryza sativa L.). Crop Science 39, 538–544.
Yan, J., Yang, X., Shah, T., Sánchez-Villeda, H., Li, J., Warburton, M., Zhou, Y., Crouch, J.H. and Xu, Y.
(2009) High-throughput SNP genotyping with the GoldenGate assay in maize. Molecular Breeding
(in press).
Yan, W. and Kang, M.S. (2003) GGE Biplot Analysis: a Graphical Tool for Breeders, Geneticists and
Agronomists. CRC Press, Boca Raton, Florida.
Yan, W. and Rajcan, I. (2002) Biplot evaluation of test sites and trait relations of soybean in Ontario. Crop
Science 42, 11–20.
Yan, W. and Tinker, N.A. (2006) Biplot analysis of multi-environment trial data: principles and applications.
Canadian Journal of Plant Science 86, 623–645.
Yan, W., Hunt, L.A., Sheng, Q. and Szlavnics, Z. (2000) Cultivar evaluation and mega-environment
investigation based on GGE biplot. Crop Science 40, 596–605.
Yan, W., Rutger, J.N., Bockelman, H.E. and Tai, T. (2004) Development of a core collection from the USDA
rice germplasm collection. In: Norman, R.J., Meullenet, J.-F. and Moldenhauer, K.A.K. (eds) B. R.
Wells Rice Research Studies 2003. Arkansas Agricultural Experiment Research Station Series No.
517, pp. 88–96. Available at: http://www.uark.edu/depts/agripub/publications/research (accessed 31
December 2007).
Yan, W., Kang, M.S., Ma, B., Woods, S. and Cornelius, P.L. (2007) GGE biplot vs. AMMI analysis of
genotype-by-environment data. Crop Science 47, 643–655.
Yang, H., You, A., Yang, Z., Zhang, F., He, R., Zhu, L. and He, G. (2004) High-resolution genetic mapping
at the Bph15 locus for brown planthopper resistance in rice (Oryza sativa L.). Theoretical and Applied
Genetics 110, 182–191.
Yang, H.-C., Liang, Y.-J., Huang, M.-C., Li, L.-H., Lin, C.H., Wu, J.-Y., Chen, Y.-T. and Fann, C.S.J. (2006a)
A genome-wide study of preferential amplification/hybridization in microarray-based pooled DNA
experiments. Nucleic Acids Research 34, e106.
Yang, H.-C., Pan, C.-C., Lin, C.-Y. and Fann, C.S.J. (2006b) PDA: pooled DNA analyzer. BMC Bioinformatics
7, 233.
Yang, J., Hu, C., Hu, H., Yu, R., Xia, Z., Ye, X. and Zhu, J. (2008) QTLNetwork: mapping and visualizing
genetic architecture of complex traits in experimental populations. Bioinformatics 24, 721–723.
Yang, R. and Xu, S. (2007) Bayesian shrinkage analysis of quantitative trait loci for dynamic traits. Genetics
176, 1169–1185.
Yang, R.-C. (2004) Epistasis of quantitative trait loci under different gene action models. Genetics 167,
1493–1505.
Yang, R.-C. (2007) Mixed model analysis of crossover genotype-environment interactions. Crop Science
47, 1051–1062.
Yang, R.Q., Tan, Q. and Xu, S.Z. (2006) Mapping quantitative trait loci for longitudinal traits in line crosses.
Genetics 173, 2339–2356.
Yang, X., Rupe, M., Bickel, D., Arthur, L., Smith, O. and Guo, M. (2006) Effects of cis–trans-regulation on
allele-specific transcript expression in the meristems of maize hybrids. In: 48th Annual Maize Genetic
Conference, 9–12 March 2006, Pacific Grove, California, 132 pp.
Yang, X.R., Wang, J.R., Li, H.L. and Li, Y.F. (1983) Studies on the general medium for anther culture of
cereals and increasing of the frequency of green pollen-plantlets-induction of Oryza sativa subsp.
hseni. In: Shen, J.H., Zhang, Z.H. and Shi, S.D. (eds) Studies on Anther-Cultured Breeding in Rice.
Agriculture Press, Beijing, pp. 61–69.
Yano, M., Harushima, Y., Nagamura, Y., Kurata, N., Minobe, Y. and Sasaki, T. (1997) Identification of
quantitative trait loci controlling heading date in rice using a high-density linkage map. Theoretical and
Applied Genetics 95, 1025–1032.
Yano, M., Katayose, Y., Ashikari, M., Yamanouchi, U., Monna, L., Fuse, T., Baba, T., Yamamoto, K., Umehara,
Y., Nagamura, Y. and Sasaki, T. (2000) Hd1, a major photoperiod sensitivity quantitative trait locus
in rice, is closely related to the Arabidopsis flowering time gene CONSTANS. The Plant Cell 12,
2473–2484.
Yano, M., Kojima, S., Takahashi, Y., Lin, H.X. and Sasaki, T. (2001) Genetic control of flowering time in rice,
a short-day plant. Plant Physiology 127, 1425–1429.
Yao, Y., Ni, Z., Zhang, Y., Chen, Y., Ding, Y., Han, Z., Liu, Z. and Sun, Q. (2005) Identification of differentially
expressed genes in leaf and root between wheat hybrid and its parental inbreds using PCR-based
cDNA subtraction. Plant Molecular Biology 58, 367–384.
Yates, F. and Cochran, W.G. (1938) The analysis of groups of experiments. Journal of Agricultural Science
28, 556–580.
Ye, X., Al-Babili, S., Klöti, A., Zhang, J., Lucca, P., Beyer, P. and Potrykus, I. (2000) Engineering the
provitamin A (β-carotene) biosynthetic pathway into (carotenoid-free) rice endosperm. Science 287,
303–305.
Yi, N. (2004) A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci.
Genetics 167, 967–975.
Yi, N. and Shriner, D. (2008) Advances in Bayesian multiple quantitative trait loci mapping in experimental
crosses. Heredity 100, 240–252.
Yi, N. and Xu, S. (2002) Mapping quantitative trait loci with epistatic effects. Genetical Research 79,
185–198.
Yi, N., George, V. and Allison, D.B. (2003) Stochastic search variable selection for identifying multiple
quantitative trait loci. Genetics 164, 1129–1138.
Yi, N., Yandell, B.S., Churchill, G.A., Allison, D.B., Eisen, E.J. and Pomp, D. (2005) Bayesian model
selection for genome-wide epistatic quantitative trait loci analysis. Genetics 170, 1333–1344.
Yi, N., Zinniel, D.K., Kim, K., Eisen, E.J., Bartolucci, A., Allison, D.B. and Pomp, D. (2006) Bayesian analyses
of multiple epistasis QTL models for body weight and body composition in mice. Genetical Research
87, 45–60.
Yi, N., Banerjee, S., Pomp, D. and Yandell, B.S. (2007) Bayesian mapping of genomewide interacting
quantitative trait loci for ordinal traits. Genetics 176, 1855–1864.
Yin, T.M., DiFazio, S.P., Gunter, L.E., Riemenschneider, D. and Tuskan, G.A. (2004) Large-scale
heterospecific segregation distortion in Populus revealed by a dense genetic map. Theoretical and Applied
Genetics 109, 451–463.
Yin, X., Kropff, M.J. and Stam, P. (1999) The role of ecophysiological models in QTL analysis: the example
of specific leaf area in barley. Heredity 82, 415–421.
Yin, X., Stam, P., Kropff, M.J. and Schapendonk, A.H.C.M. (2003) Crop modeling, QTL mapping and their
complementary role in plant breeding. Agronomy Journal 95, 90–98.
Yin, X., Struik, P.C. and Kropff, M.J. (2004) Role of crop physiology in predicting gene-to-phenotype
relationships. Trends in Plant Science 9, 426–432.
Yin, X., Struik, P.C., Tang, J., Qi, C. and Liu, T. (2005) Model analysis of flowering phenology in recombinant
inbred lines of barley. Journal of Experimental Botany 56, 959–965.
Yoo, B.H. (1980) Long-term selection for a quantitative character in large replicate populations of Drosophila
melanogaster. I. Response to selection. Genetical Research 35, 1–17.
Yoon, D.-B., Kang, K.-H., Kim, H.-J., Ju, H.-G., Kwon, S.-J., Suh, J.-P., Jeong, O.-Y. and Ahn, S.-N. (2006)
Mapping quantitative trait loci for yield components and morphological traits in an advanced backcross
population between Oryza grandiglumis and the O. japonica cultivar Hwaseongbyeo. Theoretical and
Applied Genetics 112, 1052–1062.
Young, N.D. (1999) A cautiously optimistic vision for marker assisted breeding. Molecular Breeding 5,
505–510.
Young, N.D. and Tanksley, S.D. (1989a) Restriction fragment length polymorphism maps and the concept
of graphical genotypes. Theoretical and Applied Genetics 77, 95–101.
Young, N.D. and Tanksley, S.D. (1989b) RFLP analysis of the size of chromosomal segments retained
around the Tm-2 locus of tomato during backcross breeding. Theoretical and Applied Genetics 77,
353–359.
Young, N.D., Zamir, D., Ganal, M. and Tanksley, S.D. (1988) Use of isogenic lines and simultaneous probing
to identify DNA markers tightly linked to the Tm-2a gene in tomato. Genetics 120, 579–585.
Yousef, G.G. and Juvik, J.A. (2001a) Comparison of phenotypic and marker-assisted selection for
quantitative traits in sweet corn. Crop Science 41, 645–655.
Yousef, G.G. and Juvik, J.A. (2001b) Evaluation of breeding utility of a chromosomal segment from
Lycopersicon chmielewskii that enhances cultivated tomato soluble solids. Theoretical and Applied
Genetics 103, 1022–1027.
Yu, G.-X. and Wise, R.P. (2000) An anchored AFLP- and retrotransposon-based map of diploid Avena.
Genome 43, 736–749.
Yu, J., Hu, S., Wang, J., Wong, G.K.S., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., Cao, M., Liu, J.,
Sun, J., Tang, J., Chen, Y., Huang, X., Lin, W., Ye, C., Tong, W., Cong, L., Geng, J., Han, Y., Li, L., Li,
W., Hu, G., Huang, X., Li, W., Li, J., Liu, Z., Li, L., Liu, J., Qi, Q., Liu, J., Li, L., Li, T., Wang, X., Lu, H.,
Wu, T., Zhu, M., Ni, P., Han, H., Dong, W., Ren, X., Feng, X., Cui, P., Li, X., Wang, H., Xu, X., Zhai, W.,
Xu, Z., Zhang, J., He, S., Zhang, J., Xu, J., Zhang, K., Zheng, X., Dong, J., Zeng, W., Tao, L., Ye, J.,
Tan, J., Ren, X., Chen, X., He, J., Liu, D., Tian, W., Tian, C., Xia, H., Bao, Q., Li, G., Gao, H., Cao, T.,
Wang, J., Zhao, W., Li, P., Chen, W., Wang, X., Zhang, Y., Hu, J., Wang, J., Liu, S., Yang, J., Zhang, G.,
Xiong, Y., Li, Z., Mao, L., Zhou, C., Zhu, Z., Chen, R., Hao, B., Zheng, W., Chen, S., Guo, W., Li, G.,
Liu, S., Tao, M., Wang, J., Zhu, L., Yuan, L. and Yang, H. (2002) A draft sequence of the rice genome
(Oryza sativa L. ssp. indica). Science 296, 79–92.
Yu, J., Arbelbide, M. and Bernardo, R. (2005a) Power of in silico QTL mapping from phenotypic,
pedigree and marker data in a hybrid breeding program. Theoretical and Applied Genetics 110,
1061–1067.
Yu, J., Wang, J., Lin, W., Li, S., Li, H., Zhou, J., Ni, P., Dong, W., Hu, S., Zeng, C., Zhang, J., Zhang, Y.,
Li, R., Xu, Z., Li, S., Li, X., Zheng, H., Cong, L., Lin, L., Yin, J., Geng, J., Li, G., Shi, J., Liu, J., Lv,
H., Li, J., Wang, J., Deng, Y., Ran, L., Shi, X., Wang, X., Wu, Q., Li, C., Ren, X., Wang, J., Wang, X.,
Li, D., Liu, D., Zhang, X., Ji, Z., Zhao, W., Sun, Y., Zhang, Z., Bao, J., Han, Y., Dong, L., Ji, J., Chen,
P., Wu, S., Liu, J., Xiao, Y., Bu, D., Tan, J., Yang, L., Ye, C., Zhang, J., Xu, J., Zhou, Y., Yu, Y., Zhang,
B., Zhuang, S., Wei, H., Liu, B., Lei, M., Yu, H., Li, Y., Xu, H., Wei, S., He, X., Fang, L., Zhang, Z.,
Zhang, Y., Huang, X., Su, Z., Tong, W., Li, J., Tong, Z., Li, S., Ye, J., Wang, L., Fang, L., Lei, T., Chen,
C., Chen, H., Xu, Z., Li, H., Huang, H., Zhang, F., Xu, H., Li, N., Zhao, C., Li, S., Dong, L., Huang,
Y., Li, L., Xi, Y., Qi, Q., Li, W., Zhang, B., Hu, W., Zhang, Y., Tian, X., Jiao, Y., Liang, X., Jin, J., Gao,
L., Zheng, W., Hao, B., Liu, S., Wang, W., Yuan, L., Cao, M., McDermott, J., Samudrala, R., Wang,
J., Wong, G.K.-S. and Yang, H. (2005b) The genome of Oryza sativa: a history of duplications. PLoS
Biology 3, E38.
Yu, J., Pressoir, G., Briggs, W., Bi, I.V., Yamasaki, M., Doebley, J.F., McMullen, M.D., Gaut, B.S.,
Nielsen, D.M., Holland, J.B., Kresovich, S. and Buckler, E.S. (2006) A unified mixed-model
method for association mapping that accounts for multiple levels of relatedness. Nature Genetics
38, 203–207.
Yu, J., Holland, J.B., McMullen, M.D. and Buckler, E.S. (2008) Genetic design and statistical power of nested
association mapping in maize. Genetics 178, 539–551.
Yu, J., Zhang, Z., Zhu, C., Tabanao, D.A., Pressoir, G., Tuinstra, M.R., Kresovich, S., Todhunter, R.J. and
Buckler, E.S. (2009) Simulation appraisal of the adequacy of number of background markers for
relationship estimation in association mapping. The Plant Genome 2, 63–77.
Yu, J.K., La Rota, M., Kantety, R.V. and Sorrells, M.E. (2004) EST-derived SSR markers for comparative
mapping in wheat and rice. Molecular Genetics and Genomics 271, 742–751.
Yu, K., Park, S.J. and Poysa, V. (2000) Marker-assisted selection of common beans for resistance to
common bacterial blight: efficacy and economics. Plant Breeding 119, 411–415.
Yu, S.B., Li, J.X., Xu, C.G., Tan, Y.F., Gao, Y.J., Li, X.H., Zhang, Q.F. and Saghai Maroof, M.A. (1997)
Importance of epistasis as the genetic basis of heterosis in an elite rice hybrid. Proceedings of the
National Academy of Sciences of the United States of America 94, 9226–9231.
Yu, W., Andersson, B., Worley, K.C., Muzny, D.M., Ding, Y., Liu, W., Ricafrente, J.Y., Wentland, M.A.,
Lennon, G. and Gibbs, R.A. (1997) Large-scale concatenation cDNA sequencing. Genome Research
7, 353–358.
Yu, W., Han, F., Gao, Z., Vega, J.M. and Birchler, J. (2007) Construction and behavior of engineered
minichromosomes in maize. Proceedings of the National Academy of Sciences of the United States of
America 104, 8924–8929.
Yuan, L.P. (1992) Development and prospects of hybrid rice breeding. In: You, C.B. and Chen, Z.L. (eds)
Agricultural Biotechnology. Proceedings of the Asian Pacific Conference on Agricultural Biotechnology.
China Agricultural Press, Beijing, pp. 97–105.
Yuan, L.P. (2002) Future outlook on hybrid rice research and development. In: Abstracts of the Fourth
International Symposium on Hybrid Rice, 14–17 May 2002, Hanoi, Vietnam. International Rice Research
Institute (IRRI), Manila, Philippines, p. 3.
Yuan, L.P. and Chen, H.X. (eds) (1988) Breeding and Cultivation of Hybrid Rice. Hunan Science and
Technology Press, Changsha, China.
Yuan, Y., SanMiguel, P.J. and Bennetzen, J.L. (2003) High Cot sequence analysis of the maize genome.
The Plant Journal 34, 249–255 (erratum: The Plant Journal 36, 430).
Zabeau, M. and Vos, P. (1993) Selective restriction fragment amplification: a general method for DNA
fingerprinting. European Patent Application. 92402629.7 (Publ. Number 0 534 858 A1).
Zale, J.M., Clancy, J.A., Ullrich, S.E., Jones, B.L., Hays, P.M. and the North American Barley Genome
Mapping Project (2000) Summary of barley malting QTL mapped in various mapping populations.
Barley Genetics Newsletter 30, 1–4.
Zamir, D. (2001) Improving plant breeding with exotic genetic libraries. Nature Reviews Genetics 2,
983–989.
Zeng, R., Zhang, Z. and Zhang, G. (2000) Identification of multiple alleles at the Wx locus in rice using
microsatellite class and GT polymorphism. In: Liu, X. (ed.) Theory and Application of Crop Research.
China Science and Technology Press, Beijing, pp. 202–205.
Zeng, Z.-B. (1993) Theoretical basis of separation of multiple linked gene effects on mapping
quantitative trait loci. Proceedings of the National Academy of Sciences of the United States of America 90,
10972–10976.
Zeng, Z.-B. (1994) Precision mapping of quantitative trait loci. Genetics 136, 1457–1468.
Zeng, Z.-B. (1998) Mapping quantitative trait loci: interval mapping, composite interval mapping and mul-
tiple interval mapping. Summer Institute for Statistical Genetics, Module 7. Department of Statistics,
North Carolina State University, Raleigh, North Carolina.
Zenkteler, M. and Nitzsche, W. (1984) Wide hybridization experiments in cereals. Theoretical and Applied
Genetics 68, 311–315.
Zhang, H.B. and Wing, R.A. (1997) Physical mapping of the rice genome with BACs. Plant Molecular
Biology 35, 115–127.
Zhang, J., Chandra Babu, R., Pantuwan, G., Kamoshita, A., Blum, A., Wade, L., Sarkarung, S., O'Toole, J.C.
and Nguyen, N.T. (1999) Molecular dissection of drought tolerance in rice: from physio-morphological
traits to field performance. In: Ito, O., O'Toole, J. and Hardy, B. (eds) Genetic Improvement of Rice
for Water-limited Environments. International Rice Research Institute (IRRI), Manila, Philippines,
pp. 331–343.
Zhang, J., Zheng, H.G., Aarti, A., Pantuwan, G., Nguyen, T.T., Tripathi, J.N., Sarial, A.K., Robin, S., Babu,
R.C., Nguyen, B.D., Sarkarung, S., Blum, A. and Nguyen, H.T. (2001) Locating genomic regions
associated with components of drought resistance in rice: comparative mapping within and across species.
Theoretical and Applied Genetics 103, 19–29.
Zhang, J., Xu, Y., Wu, X. and Zhu, L. (2002) A bentazon and sulfonylurea sensitive mutant: breeding,
genetics and potential application in seed production of hybrid rice. Theoretical and Applied Genetics 105,
16–22.
Zhang, J., Li, X., Jiang, G., Xu, Y. and He, Y. (2006) Pyramiding of Xa7 and Xa21 for the improvement of
disease resistance to bacterial blight in hybrid rice. Plant Breeding 125, 600–605.
Zhang, J.F. and Stewart, J.McD. (2004) Semigamy gene is associated with chlorophyll reduction in cotton.
Crop Science 44, 2054–2062.
Zhang, L.P., Lin, G.Y., Niño-Liu, D. and Foolad, M.R. (2003) Mapping QTLs conferring early blight (Alternaria
solani) resistance in a Lycopersicon esculentum × L. hirsutum cross by selective genotyping. Molecular
Breeding 12, 3–19.
Zhang, N., Xu, Y., Akash, M., McCouch, S. and Oard, J.H. (2005) Identification of candidate markers
associated with agronomic traits in rice using discriminant analysis. Theoretical and Applied Genetics 110,
727–729.
Zhang, Q. (2007) Strategies for developing green super rice. Proceedings of the National Academy of
Sciences of the United States of America 104, 16402–16409.
Zhang, Q. and Huang, N. (1998) Mapping and molecular marker-based genetic analysis for efficient hybrid
rice breeding. In: Virmani, S.S., Siddiq, E.A. and Muralidharan, K. (eds) Advances in Hybrid Rice
Technology. Proceedings of the Third International Symposium on Hybrid Rice, 14–16 November 1996,
Hyderabad, India. International Rice Research Institute (IRRI), Manila, Philippines, pp. 243–256.
Zhang, Q., Gao, Y.J., Yang, S.H., Ragab, R.A., Saghai Maroof, M.A. and Li, Z.B. (1994) A diallel analysis of
heterosis in elite hybrid rice based on RFLPs and microsatellites. Theoretical and Applied Genetics 89, 185–192.
Zhang, Q., Gao, Y.J., Saghai Maroof, M.A., Yang, S.H. and Li, J.X. (1995) Molecular divergence and hybrid
performance in rice. Molecular Breeding 1, 133–142.
Zhang, S., Raina, S., Li, H., Li, J., Dec, E., Ma, H., Huang, H. and Fedoroff, N.V. (2003) Resources for
targeted insertional and deletional mutagenesis in Arabidopsis. Plant Molecular Biology 53, 133–150.
Zhang, W., McElroy, D. and Wu, R. (1991) Analysis of rice Act1 5' region activity in transgenic rice plants.
The Plant Cell 3, 1155–1165.
Zhang, X., Yazaki, J., Sundaresan, A., Cokus, S., Chan, S.W.-L., Chen, H., Henderson, I.R., Shinn, P.,
Pellegrini, M., Jacobsen, S.E. and Ecker, J.R. (2006) Genome-wide high-resolution mapping and
functional analysis of DNA methylation in Arabidopsis. Cell 126, 1189–1201.
Zhang, Y.M. and Xu, S. (2004) Mapping quantitative trait loci in F2 incorporating phenotypes of F3 progeny.
Genetics 166, 1981–1993.
Zhang, Y.M. and Xu, S. (2005) A penalized maximum likelihood method for estimating epistatic effects of
QTL. Heredity 95, 96–104.
Zhang, Z., Bradbury, P.J., Kroon, D.E., Casstevens, T.M. and Buckler, E.S. (2006) TASSEL 2.0: a software
package for association and diversity analyses in plants and animals. Poster presented at Plant and
Animal Genomes XIV Conference, 14–18 January 2006, San Diego, California.
Zhao, J.Z., Cao, J., Li, Y., Collins, H.L., Roush, R.T., Earle, E.D. and Shelton, A.M. (2003) Transgenic plants
expressing two Bacillus thuringiensis toxins delay insect resistance evolution. Nature Biotechnology
21, 1493–1497.
Zhao, M.F., Li, X.H., Yang, J.B., Xu, C.G., Hu, R.Y., Liu, D.J. and Zhang, Q. (1999) Relationships between
molecular marker heterozygosity and hybrid performance in intra- and inter-subspecific crosses of
rice. Plant Breeding 118, 139–144.
Zhao, S. and Bruce, W.B. (2003) Expression profiling using cDNA microarray. In: Grotewold, E. (ed.)
Methods in Molecular Biology, Vol. 236. Plant Functional Genomics: Methods and Protocols. Humana
Press, Totowa, New Jersey, pp. 365–380.
Zhao, W., Li, H., Hou, W. and Wu, R. (2007) Wavelet-based parametric functional mapping of
developmental trajectories with high-dimensional data. Genetics 176, 1879–1892.
Zhao, Z., Wang, C., Jiang, L., Zhu, S., Ikehashi, H. and Wan, J. (2006) Identification of a new hybrid sterility
gene in rice (Oryza sativa L.). Euphytica 151, 331–337.
Zheng, K., Qian, H., Shen, B., Zhuang, J., Liu, H. and Lu, J. (1994) RFLP-based phylogenetic analysis of
wide compatibility varieties in Oryza sativa L. Theoretical and Applied Genetics 88, 65–69.
Zheng, X., Wu, E.J.G., Lou, X.Y., Xu, H.M. and Shi, C.H. (2008) The QTL analysis on maternal and
endosperm genome and their environmental interactions for characters of cooking quality in rice
(Oryza sativa L.). Theoretical and Applied Genetics 116, 335–342.
Zhou, P.H., Tan, Y.F., He, Y.A., Xu, C.G. and Zhang, A. (2003) Simultaneous improvement of four quality traits
of Zhenshan 97, an elite parent of hybrid rice, by molecular marker-assisted selection. Theoretical
and Applied Genetics 106, 326–331.
Zhu, C., Gore, M., Buckler, E.S. and Yu, J. (2008) Status and prospects of association mapping in plants.
The Plant Genome 1, 5–20.
Zhu, H., Bilgin, M. and Snyder, M. (2003) Proteomics. Annual Review of Biochemistry 72, 783–812.
Zhu, J. and Weir, B.S. (1994) Analysis of cytoplasmic and maternal effects. II. Genetic models for triploid
endosperm. Theoretical and Applied Genetics 89, 160–166.
Zhu, L., Xu, J., Chen, Y., Ling, Z., Lu, C. and Xu, Y. (1994) Location of unknown resistance gene to rice blast
using molecular markers (in Chinese). Science in China (Ser. B) 24, 1048–1052.
Zhu, Q., Maher, A., Masoud, S., Dixon, R.A. and Lamb, C.J. (1994) Enhanced protection against fungal
attack by constitutive co-expression of chitinase and glucanase genes in transgenic tobacco. Bio/
Technology 12, 807–812.
Zhu, Y., Nomura, T., Xu, Y., Zhang, Y., Peng, Y., Mao, B., Hanada, A., Zhou, H., Wang, R., Li, P., Zhu, X.,
Mander, L.N., Kamiya, Y., Yamaguchi, S. and He, Z. (2006) ELONGATED UPPERMOST INTERNODE
encodes a cytochrome P450 monooxygenase that epoxidizes gibberellins in a novel deactivation
reaction in rice. The Plant Cell 18, 442–456.
Zhu, Z.F., Sun, C.Q., Jiang, T.B., Fu, Q. and Wang, X.K. (2001) The comparison of genetic divergences
and its relationships to heterosis revealed by SSR and RFLP markers in rice (Oryza sativa L.). Acta
Genetica Sinica 28, 738–745.
Zhuang, J.Y., Lin, H.X., Lu, J., Qian, H.R., Hittalmani, S., Huang, N. and Zheng, K.L. (1997) Analysis of
QTL × environment interaction for yield components and plant height in rice. Theoretical and Applied
Genetics 95, 799–808.
Zimmerli, L. and Somerville, S. (2005) Transcriptomics in plants: from expression to gene function. In: Leister,
D. (ed.) Plant Functional Genomics. Food Products Press, Binghamton, New York, pp. 55–84.
Zivy, M., Joyard, J. and Rossignol, M. (2007) Proteomics. In: Morot-Gaudry, J.F., Lea, P. and Briat, J.F. (eds)
Functional Plant Genomics. Science Publishers, Enfield, New Hampshire, pp. 217–244.
Zobel, R.W., Wright, M.J. and Gauch, H.G., Jr (1988) Statistical analysis of a yield trial. Agronomy Journal
80, 388–393.
Zou, F., Yandell, B.S. and Fine, J.P. (2001) Statistical issues in the analysis of quantitative traits in combined
crosses. Genetics 158, 1339–1346.
Zou, W. and Zeng, Z.-B. (2008) Statistical methods for mapping multiple QTL. International Journal of Plant
Genomics 2008, Article ID 286561. Available at:
http://www.hindawi.com/journals/ijpg/2008/286561.html (accessed 17 November 2009).
Zou, W., Aylor, D.L. and Zeng, Z.-B. (2007) eQTL Viewer: visualizing how sequence variation affects
genome-wide transcription. BMC Bioinformatics 8, 7.