
Molecular Plant Breeding

In Memoriam
Norman Ernest Borlaug
(25 March 1914 – 12 September 2009)

Norman Borlaug was one of the greatest men of our times: a steadfast champion and
spokesman against hunger and poverty. He dedicated his 95 richly lived years to filling the
bellies of others, and is credited by the United Nations World Food Program with saving
more lives than any other man in history.
An American plant pathologist who spent most of his years in Mexico, it was Dr
Borlaug's high-yielding dwarf wheat varieties that prevented widespread famine in South
Asia, specifically India and Pakistan, and also in Turkey. Known as the Green Revolution,
this feat earned him the Nobel Peace Prize in 1970. He was instrumental in establishing
the International Maize and Wheat Improvement Center, known by its Spanish acronym
CIMMYT, and later the Consultative Group on International Agricultural Research (CGIAR),
a network of 15 agricultural research centres.
Dr Borlaug spent time as a microbiologist with DuPont before moving to Mexico in
1944 as a geneticist and plant pathologist to develop stem rust resistant wheat cultivars. In
1966 he became the director of CIMMYT's Wheat Program, seconded from the Rockefeller
Foundation. His full-time employment with the Center ended in 1979, although he remained
a part-time consultant until his death. In 1984 he began a new career as a university pro-
fessor and went on to establish the World Food Prize, which honours the achievements of
individuals who have advanced human development by improving the quality, quantity or
availability of food in the world. In 1986, he joined forces with former US President Jimmy
Carter and the Nippon Foundation of Japan, under the chairmanship of Ryoichi Sasakawa,
to establish the Sasakawa Africa Association (SAA) to address Africa's food problems. Since
then, more than 1 million small-scale African farmers in 15 countries have been trained by
SAA in improved farming techniques.
Dr Borlaug influenced the thinking of thousands of agricultural scientists. He was a pathbreaking
wheat breeder and, equally important, his stature enabled him to influence politicians
and leaders around the world. His legacy and his work ethic, to get things done and not
mind getting your hands dirty, influenced us all and remain CIMMYT guiding principles.
We will honor Dr Borlaug's memory by carrying forward his mission and spirit of innovation:
applying agricultural science to help smallholder farmers produce more and better-quality
food using fewer resources. At stake is no less than the future of humanity, for, as
Borlaug said: 'The destiny of world civilization depends upon providing a decent standard
of living for all.' His presence will never really leave CIMMYT; it is embedded in our soul.

Thomas A. Lumpkin
Director General, CIMMYT
Marianne Bänziger
Deputy Director General for Research and Partnerships, CIMMYT
Hans-Joachim Braun
Director for Global Wheat Program, CIMMYT
Molecular Plant Breeding

Yunbi Xu

International Maize and Wheat Improvement Center (CIMMYT)


Apdo Postal 6-641
06600 Mexico, DF
Mexico
CABI is a trading name of CAB International

CABI Head Office
Nosworthy Way
Wallingford
Oxfordshire OX10 8DE
UK
Tel: +44 (0)1491 832111
Fax: +44 (0)1491 833508
E-mail: cabi@cabi.org

CABI North American Office
875 Massachusetts Avenue
7th Floor
Cambridge, MA 02139
USA
Tel: +1 617 395 4056
Fax: +1 617 354 6875
E-mail: cabi-nao@cabi.org

Web site: www.cabi.org
© CAB International 2010. All rights reserved. No part of this publication
may be reproduced in any form or by any means, electronically,
mechanically, by photocopying, recording or otherwise, without the
prior permission of the copyright owners.
A catalogue record for this book is available from the British Library,
London, UK.
Library of Congress Cataloging-in-Publication Data
Xu, Yunbi.
Molecular plant breeding / Yunbi Xu.
p. cm.
ISBN 978-1-84593-392-0 (alk. paper)
1. Crop improvement. 2. Plant breeding. 3. Crops--Molecular genet-
ics. 4. Crops--Genetics. I. Title.
SB106.147X8 2010
631.5233--dc22
20009033246

ISBN: 978 1 84593 392 0

Typeset by SPi, Pondicherry, India.


Printed and bound in the UK by MPG Books Group.
Contents

Preface
Foreword by Dr Norman E. Borlaug
Foreword by Dr Ronald L. Phillips

1 Introduction
1.1 Domestication of Crop Plants
1.2 Early Efforts at Plant Breeding
1.3 Major Developments in the History of Plant Breeding
1.4 Genetic Variation
1.5 Quantitative Traits: Variance, Heritability and Selection Index
1.6 The Green Revolution and the Challenges Ahead
1.7 Objectives of Plant Breeding
1.8 Molecular Breeding

2 Molecular Breeding Tools: Markers and Maps
2.1 Genetic Markers
2.2 Molecular Maps

3 Molecular Breeding Tools: Omics and Arrays
3.1 Molecular Techniques in Omics
3.2 Structural Genomics
3.3 Functional Genomics
3.4 Phenomics
3.5 Comparative Genomics
3.6 Array Technologies in Omics

4 Populations in Genetics and Breeding
4.1 Properties and Classification of Populations
4.2 Doubled Haploids (DHs)
4.3 Recombinant Inbred Lines (RILs)
4.4 Near-isogenic Lines (NILs)
4.5 Cross-population Comparison: Recombination Frequency and Selection

5 Plant Genetic Resources: Management, Evaluation and Enhancement
5.1 Genetic Erosion and Potential Vulnerability
5.2 The Concept of Germplasm
5.3 Collection/Acquisition
5.4 Maintenance, Rejuvenation and Multiplication
5.5 Evaluation
5.6 Germplasm Enhancement
5.7 Information Management
5.8 Future Prospects

6 Molecular Dissection of Complex Traits: Theory
6.1 Single Marker-based Approaches
6.2 Interval Mapping
6.3 Composite Interval Mapping
6.4 Multiple Interval Mapping
6.5 Multiple Populations/Crosses
6.6 Multiple QTL
6.7 Bayesian Mapping
6.8 Linkage Disequilibrium Mapping
6.9 Meta-analysis
6.10 In Silico Mapping
6.11 Sample Size, Power and Thresholds
6.12 Summary and Prospects

7 Molecular Dissection of Complex Traits: Practice
7.1 QTL Separating
7.2 QTL for Complicated Traits
7.3 QTL Mapping across Species
7.4 QTL across Genetic Backgrounds
7.5 QTL across Growth and Developmental Stages
7.6 Multiple Traits and Gene Expression
7.7 Selective Genotyping and Pooled DNA Analysis

8 Marker-assisted Selection: Theory
8.1 Components of Marker-assisted Selection
8.2 Marker-assisted Gene Introgression
8.3 Marker-assisted Gene Pyramiding
8.4 Selection for Quantitative Traits
8.5 Long-term Selection

9 Marker-assisted Selection: Practice
9.1 Selection Schemes for Marker-assisted Selection
9.2 Bottlenecks in Application of Marker-assisted Selection
9.3 Reducing Costs and Increasing Scale and Efficiency
9.4 Traits Most Suitable for MAS
9.5 Marker-assisted Gene Introgression
9.6 Marker-assisted Gene Pyramiding
9.7 Marker-assisted Hybrid Prediction
9.8 Opportunities and Challenges

10 Genotype-by-environment Interaction
10.1 Multi-environment Trials
10.2 Environmental Characterization
10.3 Stability of Genotype Performance
10.4 Molecular Dissection of GEI
10.5 Breeding for GEI
10.6 Future Perspectives

11 Isolation and Functional Analysis of Genes
11.1 In Silico Prediction
11.2 Comparative Approaches for Gene Isolation
11.3 Cloning Based on cDNA Sequencing
11.4 Positional Cloning
11.5 Identification of Genes by Mutagenesis
11.6 Other Approaches for Gene Isolation

12 Gene Transfer and Genetically Modified Plants
12.1 Plant Tissue Culture and Genetic Transformation
12.2 Transformation Approaches
12.3 Expression Vectors
12.4 Selectable Marker Genes
12.5 Transgene Integration, Expression and Localization
12.6 Transgene Stacking
12.7 Transgenic Crop Commercialization
12.8 Perspectives

13 Intellectual Property Rights and Plant Variety Protection
13.1 Intellectual Property and Plant Breeders' Rights
13.2 Plant Variety Protection: Needs and Impacts
13.3 International Agreements Affecting Plant Breeding
13.4 Plant Variety Protection Strategies
13.5 Intellectual Property Rights Affecting Molecular Breeding
13.6 Use of Molecular Techniques in Plant Variety Protection
13.7 Plant Variety Protection Practice
13.8 Future Perspectives

14 Breeding Informatics
14.1 Information-driven Plant Breeding
14.2 Information Collection
14.3 Information Integration
14.4 Information Retrieval and Mining
14.5 Information Management Systems
14.6 Plant Databases
14.7 Future Prospects for Breeding Informatics

15 Decision Support Tools
15.1 Germplasm and Breeding Population Management and Evaluation
15.2 Genetic Mapping and Marker–Trait Association Analysis
15.3 Marker-assisted Selection
15.4 Simulation and Modelling
15.5 Breeding by Design
15.6 Future Perspectives

References
Index
The colour plate section can be found following p. 270
Preface

The genomics revolution of the past decade has greatly enhanced our understanding of
the genetic composition of living organisms including many plant species of economic
importance. Complete genomic sequences of Arabidopsis and several major crops, together
with high-throughput technologies for analyses of transcripts, proteins and mutants, pro-
vide the basis for understanding the relationship between genes, proteins and phenotypes.
Sequences and genes have been used to develop functional and biallelic markers, such as
single nucleotide polymorphism (SNP), that are powerful tools for genetic mapping, germ-
plasm evaluation and marker-assisted selection.
The road from basic genomics research to impacts on routine breeding programmes has
been long, windy and bumpy, not to mention scattered with wrong turns and unexpected
blockades. As a result, genomics can be applied to plant breeding only when an integrated
package becomes available that combines multiple components such as high-throughput
techniques, cost-effective protocols, global integration of genetic and environmental fac-
tors and precise knowledge of quantitative trait inheritance. More recently, the end of the
tunnel has come in sight, and the multinational corporations have ramped up their invest-
ments in and expectations from these technologies. The challenge now is to translate and
integrate the new knowledge from genomics and molecular biology into appropriate tools
and methodologies for public-sector plant breeding programmes, particularly those in low-
income countries. It is expected that harnessing the outputs of genomics research will be
an important component in successfully addressing the challenge of doubling world food
production by 2050.

What does Molecular Plant Breeding include?

The term 'molecular plant breeding' has been much used and abused in the literature, and
thus loved or maligned in equal measure by the readership. In the context of this book, the
term is used to provide a simple umbrella for the multidisciplinary field of modern plant
breeding that combines molecular tools and methodologies with conventional approaches
for improvement of crop plants. This book is intended to provide comprehensive coverage
of the components that should be integrated within plant breeding programmes to develop
crop products in a more efficient and targeted way.

The first chapter introduces some basic concepts that are required for understanding
fundamentally important issues described in subsequent chapters. The concepts include
crop domestication, critical events in the history of plant breeding, basics of quantitative
genetics (variance, heritability and selection index), plant breeding objectives and molecu-
lar breeding goals. Chapters 2 and 3 introduce the key genomics tools that are used in
molecular breeding programmes, including molecular markers, maps, omics technolo-
gies and arrays. Different types of molecular markers are compared and construction of
molecular maps is discussed. Chapter 4 describes common types of populations that have
been used in genetics and plant breeding, with a focus on recombinant inbred lines, dou-
bled haploids and near-isogenic lines. Chapter 5 provides an overview of marker-assisted
germplasm evaluation, management and enhancement. Chapters 6 and 7 discuss the theory
and practice, respectively, of using molecular markers to dissect complex traits and locate
quantitative trait loci (QTL). Chapters 8 and 9 cover the theory and practice, respectively,
of marker-assisted selection. Genotype-by-environment interaction (GEI) is discussed in
Chapter 10, including multi-environment trials, stability of genotype performance, molecu-
lar dissection of GEI and breeding for optimum GEI. Chapter 11 provides a summary of
gene isolation and functional analysis approaches, including in silico prediction of genes,
comparative approaches for gene isolation, gene cloning based on cDNA sequencing, posi-
tional cloning and identification of genes by mutagenesis. Chapter 12 describes the use of
isolated and characterized genes for gene transfer and the generation of genetically modi-
fied plants, focusing on the vital elements of expression vectors, selectable marker genes,
transgene integration, expression and localization, transgene stacking and transgenic crop
commercialization. Chapter 13 is devoted to intellectual property rights and plant variety
protection, including plant breeders' rights, international agreements affecting plant
breeding, plant variety protection strategies, intellectual property rights affecting molec-
ular breeding and the use of molecular techniques in plant variety protection. The last
two chapters (14 and 15) discuss supporting tools that are required in molecular breeding
for information management and decision making, including data collection, integration,
retrieval and mining and information management systems. Decision support tools are
described for germplasm and breeding population management and evaluation, genetic
mapping and marker-trait association analysis, marker-assisted selection, simulation and
modelling, and breeding by design.

Intended audience and guidance for reading and using this book

This book is intended to provide a handbook for biologists, geneticists and breeders, as well
as a textbook for final year undergraduates and graduate students specializing in agronomy,
genetics, genomics and plant breeding. Although the book has attempted to cover all rel-
evant areas of molecular breeding in plants, many examples have been drawn from the
genomics research and molecular breeding of major cereal crops. It is hoped that the book
can also serve as a resource for training courses as described below. As each chapter covers
a complete story on a special topic, readers can choose to read chapters in any order.
Advanced Course on Quantitative Genetics: Chapters 1, 2, 4, 6, 7, 10 and 14, which
cover all molecular marker-based QTL mapping, including markers, maps, populations,
statistics and genotype-by-environment interaction.
Comprehensive Course on Marker-assisted Plant Breeding: Chapters 1, 2, 3, 4, 5, 8, 9,
10, 13, 14 and 15, which cover basic theories, tools, methodologies about markers, maps,
omics, arrays, informatics and support tools for marker-assisted selection.
Short Course on Genetic Transformation: Chapters 1, 11, 12 and 13, which provide
a brief introduction to gene isolation, transformation techniques, genetic-transformation-
related intellectual property and genetically modified organism (GMO) issues.
Introductory Course on Breeding Informatics: Chapters 1, 2, 3, 4, 5, 10, 14 and 15,
which cover bioinformatics, focusing on plant breeding-related applications, including
basic concepts in plant breeding, markers, maps, omics, arrays, population and germplasm
management, environment and geographic information system (GIS) information, data col-
lection, integration and mining, and bioinformatics tools required to support molecular
breeding. Additional introductory information can be found in other chapters.

History of writing this book

This book has been almost a decade in preparation. In fact, the initial idea for the book
was stimulated by the impact from my previous book Molecular Quantitative Genetics
published by China Agriculture Press (Xu and Zhu, 1994), which was well received by
colleagues and students in China and used as a textbook in many universities. Preliminary
ideas related to the book were developed in a review article on QTL separation, pyramiding
and cloning in Plant Breeding Reviews (Xu, 1997). Much of the hopeful thinking described
in this paper has fortunately come true during the following 10 years, and the manipulation
of QTL has been revolutionized and become mainstream. Complete sequences for
several plant genomes have become available, with more anticipated; numerous genes
and QTL have been separated and cloned individually, and some of them have been
pyramided for plant breeding through genetic transformation or marker-assisted
selection.
I started making tangible progress on this book while working as a molecular breeder
for hybrid rice at RiceTec, Inc., Texas (1998–2003). This experience shaped my thinking
about how an applied breeding programme could be integrated with molecular approaches.
With numerous QTL accumulating for a model crop, taking all the QTL into consideration
becomes necessary. Initial thoughts on this were described in 'Global view of QTL . . .', published
in the proceedings on quantitative genetics and plant breeding, which considered
various genetic background effects and genotype-by-environment interaction (Xu, Y., 2002).
Hybrid rice breeding, which involves a three-line system, requires a large number of test-
crosses in order to identify traits that perform well in seed and grain production. My expe-
rience in development of marker-assisted selection strategies for breeding hybrid rice was
then summarized in a review article in Plant Breeding Reviews (Xu, Y., 2003), which also
covered general strategies for other crops using hybrids.
Moving on to research at Cornell University with Dr Susan McCouch helped me to bet-
ter understand how molecular techniques could facilitate breeding of complex traits such
as water-use efficiency, which is a difficult trait to measure and requires strong collabora-
tion among researchers across many disciplines. In addition, this experience with rice as a
model crop raised the issue of how we can use rice as a reference genome for improvement
of other crops, which was discussed in an article published in a special rice issue of Plant
Molecular Biology (Xu et al., 2005).
With over 20 years' experience in rice, I decided to shift to another major crop by working
for the International Maize and Wheat Improvement Center (CIMMYT) as the principal
maize molecular breeder. CIMMYT has given me exposure to an interface connecting basic
research with applied breeding for developing countries and the resource-poor. Comparing
public- and private-sector breeding programmes has given me an intense understanding
of the importance of making the type of breeding systems that have been working well
for the private sector a practical reality for the public sector, particularly in developing
countries. This has been addressed in a recent review paper published in Crop Science
(Xu and Crouch, 2008), which discussed the critical issues for achieving this translation.
My most recent research has focused on the development of various molecular breeding
platforms that can be used to facilitate breeding procedures through seed DNA-based genotyping,
selective and pooled DNA analysis, and chip-based large-scale germplasm evaluation,
marker–trait association and marker-assisted selection (see Xu et al., 2009b for further
details). Thus, my career has evolved alongside the transition from molecular biology
research to routine molecular plant breeding applications and I strongly believe that now
is the right time for a mainstream publication providing comprehensive coverage of all
fields relevant for a new generation of molecular breeders.

Acknowledgements

Assistance and professional support

The dream of writing this book could not have become reality without the wonderful sup-
port of Dr Susan McCouch at Cornell University and Dr Jinhua Xiao, now at Monsanto, who
have both fully supported my proposal since 2002. Their support and consistent encour-
agement has greatly motivated me throughout the process. While working with Susan,
she allowed me so much flexibility in my research projects and working hours so that I
could continue to make progress on the writing of this book. At the same time the Cornell
libraries were an indispensable source of the major references cited throughout the book.
Susan's encouragement provided the impetus to keep working on the book through a very
difficult time in my life. I also extend my appreciation to Dr Jonathan Crouch, the Director
of the Germplasm Resources Program at CIMMYT, where I received his full understanding
and support so that I could complete the second half of the book. Jonathan's guidance and
contribution to my research projects and publications while at CIMMYT has significantly
impacted the preparation of the book.
I would also like to thank the chief editors of the three journals for which I have served
on the editorial boards during the preparation of this book: Dr Paul Christou for Molecular
Breeding, Dr Albrecht Melchinger for Theoretical and Applied Genetics, and Dr Hongbin
Zhang for International Journal of Plant Genomics. I thank them for their patience, support
and flexibility with my editorial responsibilities during the preparation of the book. In addi-
tion, Drs Christou and Melchinger also reviewed several chapters in their respective fields.
My appreciation also goes to Yanli Lu (a graduate student from Sichuan Agricultural
University of China) and Dr Zhuanfang Hao (a visiting scientist from the Chinese Academy
of Agricultural Sciences) who helped prepare some figures and tables during their work
in my lab at CIMMYT, Mexico. I would like to give special thanks to Dr Rodomiro Ortiz
at CIMMYT for his consistent information sharing and stimulating discussions during our
years together at CIMMYT. Finally, I would like to thank my colleagues at CIMMYT, par-
ticularly Drs Kevin Pixley, Manilal William, Jose Crossa and Guy Davenport, who provided
useful discussions on various molecular breeding-related issues.

Forewords

I am greatly indebted to Dr Norman E. Borlaug, visioned plant breeder and Nobel laure-
ate for his role in the Green Revolution, and Dr Ronald L. Phillips, Regents Professor and
McKnight Presidential Chair in Genomics, University of Minnesota, who each contributed
a foreword for the book. Their contributions emphasized the importance of molecular
breeding in crop improvement and the role that this book will play in molecular breeding
education and practice.

Reviewers

Each chapter of the book has undergone comprehensive peer review and revision before
finalization. The constructive comments and critical advice of these reviewers have greatly
improved this book. The reviewers were selected for their active expertise in the field of the
respective chapter. Reviewers come from almost all continents and work in various fields
including plant breeding, quantitative genetics, genetic transformation, intellectual prop-
erty protection, bioinformatics and molecular biology, many of whom are CIMMYT scien-
tists and managers. Considering that each chapter is relatively large in content, reviewers
had to contribute a lot of time and effort to complete their reviews. Although these inputs
were indispensable, any remaining errors remain my sole responsibility. The names and
affiliations of the reviewers (alphabetically) are:

Raman Babu (Chapters 7 and 9), CIMMYT, Mexico
Paul Christou (Chapter 12), Lleida, Spain
Jose Crossa (Chapter 10), CIMMYT, Mexico
Jonathan H. Crouch (Chapters 13 and 15), CIMMYT, Mexico
Jedidah Danson (Chapters 7 and 9), African Center for Crop Improvement, South Africa
Guy Davenport (Chapter 14), CIMMYT, Mexico
Yuqing He (Chapter 8), Huazhong Agricultural University, Wuhan, China
Gurdev S. Khush (Chapter 1), IRRI, Philippines
Alan F. Krivanek (Chapter 4), Monsanto, Illinois, USA
Huihui Li (Chapter 6), Chinese Academy of Agricultural Sciences, China
George H. Liang (Chapter 12), San Diego, California, USA
Christopher Graham McLaren (Chapter 14), GCP/CIMMYT, Mexico
Kenneth L. McNally (Chapter 5), IRRI, Philippines
Albrecht E. Melchinger (Chapter 8), University of Hohenheim, Germany
Rodomiro Ortiz (Chapters 12, 13 and 15), CIMMYT, Mexico
Edie Paul (Chapter 14), GeneFlow, Inc., Virginia, USA
Kevin V. Pixley (Chapters 1, 4 and 5), CIMMYT, Mexico
Trushar Shah (Chapter 14), CIMMYT, Mexico
Daniel Z. Skinner (Chapter 12), Washington State University, USA
Debra Skinner (Chapter 11), University of Illinois, USA
Michael J. Thomson (Chapters 2 and 3), IRRI, Philippines
Bruce Walsh (Chapters 1, 6 and 8), University of Arizona, USA
Marilyn L. Warburton (Chapter 5), USDA/Mississippi State University, USA
Huixia Wu (Chapter 12), CIMMYT, Mexico
Rongling Wu (Chapter 1), University of Florida, Gainesville, USA
Weikai Yan (Chapter 10), Agriculture and Agri-Food Canada, Ottawa, Canada
Qifa Zhang (Chapters 8 and 12), Huazhong Agricultural University, Wuhan, China
Wanggen Zhang (Chapter 12), Syngenta, Beijing, China
Yuhua Zhang (Chapter 12), Rothamsted Research, UK

Publishers and development editors

Several editors at CABI have been working with me over the years: Tim Hardwick (2002–2006),
Sarah Hulbert (2006–2007), Stefanie Gehrig (2007–2008), Claire Parfitt (2008–2009),
Meredith Caroll (2009) and Tracy Head (2009). These editors and their associates have
done a superb job of converting a series of manuscripts into a useable and coherent book.
I thank them for their effort, consideration and cooperation.

Research grants

During the preparation of the book, my research on genomic analysis of plant water-use
efficiency at Cornell University was supported by the National Science Foundation (Plant
Genome Research Project Grant DBI-0110069). My molecular breeding research at CIMMYT
has been supported by the Rockefeller Foundation, the Generation Challenge Programme
(GCP), Bill and Melinda Gates Foundation and the European Community, and through
other attributed or unrestricted funds provided by the members of the Consultative Group
on International Agricultural Research (CGIAR) and national governments of the USA,
Japan and the UK.

Family

It is difficult to imagine writing a book without the full support and understanding of one's
family. My greatest thanks go to my wife, Yu Wang, who has given me her wholehearted and
unwavering support, and to my sons, Sheng, Benjamin and Lawrence, who have retained
great patience during this long adventure. And finally to my parents, for their love, encour-
agement and vision that unveiled in me from my earliest years the desire to thrive on the
challenge of always striving to reach the highest mountain in everything I do.
Foreword
DR NORMAN E. BORLAUG

The past 50 years have been the most productive period in world agricultural history.
Innovations in agricultural science and technology enabled the Green Revolution, which
is reputed to have spared one billion people the pains of hunger and even starvation.
Although we have seen the greatest reductions in hunger in history, it has not been enough.
There are still one billion people who suffer chronic hunger, with more than half being
small-scale farmers who cultivate environmentally sensitive marginal lands in developing
countries.
Within the next 50 years, the world population is likely to increase by 60–80%, requiring
global food production to nearly double. We will have to achieve this feat on a shrinking
agricultural land base, and most of the increased production must occur in those countries
that will consume it. Unless global grain supplies are expanded at an accelerated rate, food
prices will remain high, or be driven up even further.
Spectacular economic growth in many newly industrializing developing countries,
especially in Asia, has spurred rapid growth in global cereal demand, as more people eat
better, especially through more protein-heavy diets. More recently, the subsidized conver-
sion of grains into biofuels in the USA and Europe has accelerated demand even faster.
On the supply side, a slowing in research investment in the developing world and more
frequent climatic shocks (droughts, floods) have led to greater volatility in production.
Higher food prices affect everyone, but especially the poor, who spend most of their
disposable income on food. Increasing supply, primarily through the generation and diffu-
sion of productivity-enhancing new technologies, is the best way to bring food prices down
and secure minimum nutritional standards for the poor.
Today's agricultural development challenges are centred on marginal lands and in
regions that have been bypassed during the Green Revolution, such as Africa and resource-
poor parts of Asia, and are experiencing the ripple effects of food insecurity through hun-
ger, malnutrition and poverty.
Despite these serious and daunting challenges, there is cause for hope. New science
and technology, including biotechnology, have the potential to help the world's poor and
food insecure. Biotechnologies have developed invaluable new scientific methodologies
and products for more productive agriculture and added-value food. This journey deeper
into the genome to the molecular level is the consequence of our progressive understanding
of the workings of nature. Genomics-based methods have enabled breeders greater precision
in selecting and transferring genes, which has not only reduced the time needed to
eliminate undesirable genes, but has also allowed breeders to access useful genes from
distant species.
Bringing the power of science and technology to bear on the challenges of these riskier
environments is one of the great challenges of the 21st century. With the new tools of bio-
technology, we are poised for another explosion in agricultural innovation. New science
has the power to increase yields, address agroclimatic extremes and mitigate a range of
environmental and biological challenges.
Molecular Plant Breeding, authored by my CIMMYT colleague Yunbi Xu, is an out-
standing review and synthesis of the theory and practice of genetics and genomics that
can drive progress in modern plant breeding. Dr Xu has done a masterful job in integrating
information about traditional and molecular plant breeding approaches. This encyclope-
dic handbook is poised to become a standard reference for experienced breeders and stu-
dents alike. I commend him for this prodigious new contribution to the body of scientific
literature.
Foreword
DR RONALD L. PHILLIPS

The New Plant Breeding Roadmap

The road is long from basic research findings to final destinations reflecting important
applications but it is a road that can ultimately save time and money. There may be obsta-
cles along the way that delay building that road but they are generally overcome by careful
thought and timely considerations. A new road may involve the former road but with some
widening and the filling in of certain potholes. We seldom look back and think that the
improvements were not useful.
The road to improved varieties by traditional plant breeding has and continues to serve
society well. That approach has been based on careful observation, evaluation of multi-
ple genotypes (parents and progenies), selection at various generational levels, extensive
testing and the sophisticated utilization of statistical analyses and quantitative genetics.
About 50% of the increased productivity of new varieties is generally attributed to genetic
improvements, with the remaining 50% due to many other factors such as time of planting,
irrigation, fertilizer, pesticide applications and planting densities.
The statistical genetics associated with traditional plant breeding can now be supple-
mented by extensive genomic information, gene sequences, regulatory factors and linked
genetic markers. We can now draw on a broader genetic base, the identification of major
loci controlling various traits and expression analyses across the entire genome under vari-
ous biotic and abiotic conditions. One can anticipate a future when the networking of
genes, genotype-by-environment (G × E) interactions, and even hybrid vigour will be better
understood and lead to new breeding approaches. The importance of de novo variation may
modify much of our current interpretation of breeding behaviour; de novo variation such as
mutation, intragenic recombination, methylation, transposable elements, unequal crossing
over, generation of genomic changes due to recombination among dispersed repeated ele-
ments, gene amplification and other mechanisms will need to be incorporated into plant
breeding theory.
This book calls for an integration of approaches, traditional and molecular, and
represents a theoretical/practical handbook reflecting modern plant breeding at its finest.
I believe the reader will be surprised to find that this single-authored book is so
full of information that is useful in plant genetics and plant breeding. Students as well
as established researchers wanting to learn more about molecular plant breeding will be
well-served by reading this book. The information is up-to-date with many current references.
Even many of the tables are packed with information and references. A good representation
of international and domestic breeding is reflected through many examples.
The importance of G × E interactions is clearly demonstrated. Various statistical models
are provided as appropriate. The importance of defining mega-environments for varietal
development is made clear. The role of core germplasm collections, appropriate population
sizes, major databases and data management issues are all integrated with various plant
breeding approaches. Marker-assisted selection receives considerable attention, includ-
ing its requirements and advantages, along with the multitude of quantitative trait locus
(QTL) analysis methods. Transformation technologies leading to the extensive use of trans-
genic crops are reviewed along with the increased use of trait stacking. The procurement of
intellectual property that, in part, is driving the application of molecular genetics in plant
breeding provides the reader with an understanding of why private industry is now more
involved and why some common crops represent new business opportunities.
Molecular Plant Breeding is not like other plant breeding books. The interconnecting
road that it depicts is one where you can look at the beautiful new scenery and appreciate
the current view, yet see the horizon down the road.
1 Introduction

Several definitions of plant breeding have been put forward, such as 'the art and science of
improving the heredity of plants for the benefit of humankind' (J.M. Poehlman), or 'evolution
directed by the will of man' (N.I. Vavilov). Bernardo (2002), however, offers the most
universal description: 'Plant breeding is the science, art, and business of improving plants
for human benefit.'

Plants are employed in the manufacture of a multitude of products for domestic (cosmetics,
medicines and clothing), industrial (manufacture of rubber, cork and engine fuel) and
recreational uses (paper, art supplies, sports equipment and musical instruments) and plant
breeders have therefore been driven by the challenges of meeting the ever increasing demands
of the manufacturers of these products. Lewington has described the diverse uses of plants
in his book Plants for People (2003).

Plant breeding began by the domestication of crop plants and has become ever more
sophisticated. New developments in molecular biology have now led to an increasing number of
methods which can be used to enhance breeding effectiveness and efficiency. This chapter
includes a brief history of plant breeding together with breeding objectives and some
background information relevant to the theories and technologies discussed in the following
chapters of this book.

1.1 Domestication of Crop Plants

The earliest records indicate that agriculture developed some 11,000 years ago in the
so-called Fertile Crescent, a hilly region in south-western Asia. Agriculture developed later
in other regions. Archaeologists suggest that plant domestication began because of the
increasing size of populations and changes in the exploitation of local resources (see
http://www.ngdc.noaa.gov/paleo/ctl/10k.html for further details). Domestication is a
selection process carried out by man to adapt plants and animals to their own needs, whether
as farmers or consumers. Successive selection of desirable plants changed the genetic
composition of early crops. Primitive farmers, knowing little or nothing about genetics or
plant breeding, accomplished much in a short time. They did so by unconsciously altering the
natural process of evolution. Indeed, domestication is nothing more than directed evolution;
as a result, the process of evolution is accelerated. The key to domestication is the
selective advantage of rare mutant alleles, which are desirable for successful cultivation,

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 1


but unnecessary for survival in the wild. The process of selection continues until the
desired mutant phenotype dominates the population. There are three important steps in the
domestication process. Man not only planted seeds, but also: (i) moved seeds from their
native habitat and planted them in areas to which they were perhaps not as well adapted;
(ii) removed certain natural selection pressures by growing the plants in a cultivated
field; and (iii) applied artificial selection pressures by choosing characteristics that
would not necessarily have been beneficial for the plants under natural conditions.
Cultivation also creates selection pressure, resulting in changes in allele frequency,
gradations within and between species, fixation of major genes, and improvement of
quantitative traits. By the end of the 18th century, the informal processes of selection
practised by farmers everywhere led to the worldwide creation of thousands of different
cultivars or landraces for each major crop species.

More than 1000 species of plants have been domesticated at one time or another, of which
about 100–200 are now major components of the human diet. The 15 most important examples can
be divided into the following four groups:

1. Cereals: rice, wheat, maize, sorghum, barley.
2. Roots and stems: sugarbeet, sugarcane, potato, yam, cassava.
3. Legumes: bean, soybean, groundnut.
4. Fruits: coconut, banana.

Certain characteristics may have been selected deliberately or unwittingly. When farmers set
aside a portion of their harvest for planting in the next season, they were selecting seeds
with specific characteristics. This selection has resulted in profound differences between
crop plants and their progenitors. For example, many wild plants have a seed dispersal
mechanism that ensures that seeds will be separated from the plants and distributed over as
large an area as possible, while modern crops have been modified by selection against seed
dispersal. The absence of seed dormancy mechanisms in some domesticated plants is another
example. For further information see http://oregonstate.edu/instruct/css/330/index.htm and
Swaminathan (2006).

It is generally believed that domestication of crop plants was undertaken in several regions
of the world independently. The Russian geneticist and plant geographer N.I. Vavilov
collected plants from all over the world and identified regions where crop species and their
wild relatives showed great genetic diversity. In 1926 he published Studies on the origin of
cultivated plants in which he described his theories regarding the origins of crops. Vavilov
concluded that each crop had a characteristic primary centre of diversity which was also its
centre of origin. He identified eight areas and hypothesized that these were the centres from
which all our modern major crops originated. Later, he modified his theory to include
secondary centres of diversity for some crops. These centres of origin included China, India,
Central Asia, the Near East, the Mediterranean, East Africa, Mesoamerica, and South America.
From these foci, agriculture was progressively disseminated to other regions such as Europe
and North America. Subsequently, others, including the American geographer Jack Harlan,
challenged Vavilov's hypothesis because many cultivated plants did not fit Vavilov's pattern,
and appeared to have been domesticated over a broad geographical area for a long period of
time.

In recent years, variation in DNA fractions and other approaches have been used to study the
diversity of crop species. In general, these studies have not confirmed Vavilov's theory that
the centres of origin are the areas of greatest diversity, because while centres of diversity
have been identified, these are often not the centres of origin. For some crops there is
little connection between the source of their wild ancestors, areas of domestication, and the
areas of evolutionary diversification. Species may have originated in one geographic area,
but been domesticated in a different region, and some crops do not appear to have centres of
diversity; thus a continuum of evolutionary activity is perceived rather than discrete
centres.

In 1971, Jack Harlan described his own views on the origins of agriculture. He proposed three
independent systems, each with a centre and a 'noncentre' (larger, diffuse areas where
domestication is thought to have occurred): Near East + Africa, China + South-east Asia, and
Mesoamerica + South America.

Evidence gathered since that time suggests that these centres are also more diffuse than he
had envisioned. After the initial phases of evolution, species spread out over large,
ill-defined areas. This is probably due to the dispersal and evolution of crops associated
with itinerant populations. Regional and/or multiple areas of origin may prove to be more
accurate than the hypothesis of a unique, localized origin for many crops. However, the
probable geographic origin of many crops is listed in Table 1.1.

Table 1.1. Probable geographic origins for crops.

Region                          Crops
Near East (Fertile Crescent)    Wheat and barley, flax, lentils, chickpea, figs, dates,
                                grapes, olives, lettuce, onions, cabbage, carrots,
                                cucumbers, melons; fruits and nuts
Africa                          Pearl millet, Guinea millet, African rice, sorghum, cowpea,
                                groundnut, yam, oil palm, watermelon, okra
China                           Japanese millet, rice, buckwheat, soybean
South-east Asia                 Wet- and dryland rice, pigeon pea, mung bean, citrus fruits,
                                coconut, taro, yams, banana, breadfruit, sugarcane
Mesoamerica and North America   Maize, squash, common bean, lima bean, peppers, amaranth,
                                sweet potato, sunflower
South America                   Lowlands: cassava; mid-altitudes and uplands (Peru): potato,
                                groundnut, cotton, maize

See http://agronomy.ucdavis.edu/gepts/pb143/lec10/pb143l10.htm for a thorough presentation on
the geographic origins of crops.

1.2 Early Efforts at Plant Breeding

For thousands of years selective breeding has been employed to re-engineer plants to produce
traits or qualities that were considered to be desirable to consumers. Selective breeding
began with the early farmers, ranchers and vintners who selected the best plants to provide
seed for their next crop. When they found particular plants that fared well even in bad
weather, were especially prolific, or resisted disease that had destroyed neighbouring crops,
they naturally tried to capture these desirable traits by crossbreeding them into other
plants. In this way, they selected and bred plants to improve their crop for commercial
purposes. Although unbeknown to them, farmers have been utilizing genetics for centuries to
modify the food we eat by selecting and growing seeds which produce a healthier crop that has
a better flavour, richer colour and stronger resistance to certain plant diseases.

Modern plant breeding started with sedentary agriculture and the domestication of the first
agricultural plants, cereals. This led to the rapid elimination of undesirable characters
such as seed-shattering and dormancy, and we can only speculate on how much foresight or what
kind of planning based on experience was used by the first selectors of non-shattering wheat
and rice, compact-headed sorghum, or soft-shelled gourds. For 10,000 years man has
consciously been moulding the phenotype (and so the genotype) of hundreds of plant species as
one of the many routine activities in the normal course of making a living (Harlan, 1992).
Over long periods of time there was a transition from the collection of wild plants for food
to the selection of those to be cultivated, which began to guide the evolutionary process.
Now plant breeders accelerate the evolution of major crop species through skilful
manipulation of breeding procedures. High-input agriculture emerged as a result of voyages of
discovery and modern science.

Many traits important to early agriculturists were heritable and, therefore, could be
reliably selected. However, this phase of breeding was empirical and generally not considered
scientific in the modern sense because changes in these plant and animal populations were not
analysed in an attempt to explain biological phenomena. At this stage of agriculture, the
focus was on the practical goal of producing food rather than finding rational explanations
for nature (Harlan, 1992). Ideas about heredity during the period when many early crops were
domesticated ranged from mythological interpretations to near-scientific notions of trait
transmission. In his Presidential Address to the American Society for Horticultural Science
in 1987, Janick (1988) stated:

    The origin of new information in horticulture derives from two traditions: empirical and
    experimental. The roots of empiricism stem from efforts of prehistoric farmers, Hellenic
    root diggers, medieval peasants, and gardeners everywhere to obtain practical solutions
    to problems of plant growing. The accumulated successes and improvements passed orally
    from parent to child, from artisan to apprentice, have become embodied in human
    consciousness via legend, craft secrets, and folk wisdom. This information is now stored
    in tales, almanacs, herbals, and histories and has become part of our common culture.
    More than practices and skills were involved as improved germplasm was selected and
    preserved via seed and graft from harvest to harvest and generation to generation. The
    sum total of these technologies makes up the traditional lore of horticulture. It
    represents a monumental achievement of our forbears, unknown and unsung.

Large-scale breeding activities began very early in Europe, often under the auspices of
commercial seed production enterprises. Besides selecting plants with useful characteristics,
breeders also arrange marriages between plants with different traits in the hope of producing
fertile offspring carrying both traits. The use of artificial crosses in pre-Mendelian
breeding is exemplified by the case of Fragaria × ananassa, developed in the botanical garden
of Paris by Duchesne in the 18th century by crossing Fragaria chiloensis with Fragaria
virginiana. In England, at about the same time, new cultivars of fruits, wheat and peas were
being obtained by artificial hybridization (Sánchez-Monge, 1993).

Hybridization combined with selection was adopted by Patrick Shirreff in 1819 in wheat and
oats, where the new selections were grown along with cultivars for comparative purposes. He
considered introduction and hybridization to be the important sources of new cultivars and
stressed crossing of carefully selected parents to meet the aims of new cultivars. Although
the essential elements of plant breeding were known by this time, there was still a lack of
knowledge regarding the scientific basis of variation among plants. For example, the first
generation of crossed materials was mistakenly expected to inevitably produce new cultivars
but instead took several generations to stabilize. Many historical examples of successful
plant breeding can be found in the literature, although there were still many important
discoveries to be made before it could be called a technology (Chahal and Gosal, 2002).

1.3 Major Developments in the History of Plant Breeding

Plant breeders of today use various methods to accelerate the evolutionary process in order
to increase the usefulness of plants by exploiting genetic differences within a species. This
has been made possible by the determination of the genetic basis for developing crop breeding
procedures and this in turn has a long history.
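
The chapter repeatedly describes selection as changing allele frequency until a favoured type dominates the population. As a rough illustration only, the sketch below uses a textbook one-locus haploid selection model that is not taken from this book: carriers of the favoured allele are assumed to leave (1 + s) offspring for every 1 left by non-carriers. The function names, the selection coefficient s and the starting frequency are hypothetical choices made for the example.

```python
def next_frequency(p: float, s: float) -> float:
    """Frequency of the favoured allele after one generation of selection.

    Assumed model (not from the source text): haploid selection in which
    carriers leave (1 + s) offspring for every 1 left by non-carriers.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if s <= -1.0:
        raise ValueError("selection coefficient must be greater than -1")
    return p * (1.0 + s) / (1.0 + p * s)


def generations_to_reach(p0: float, s: float, target: float = 0.99,
                         max_generations: int = 100_000) -> int:
    """Count generations until the favoured allele reaches `target` frequency."""
    p = p0
    for generation in range(max_generations + 1):
        if p >= target:
            return generation
        p = next_frequency(p, s)
    raise RuntimeError("target frequency not reached within max_generations")


if __name__ == "__main__":
    # Hypothetical values: a rare allele (0.1%) under weak to strong selection.
    for s in (0.01, 0.1, 0.5):
        gens = generations_to_reach(0.001, s)
        print(f"s = {s}: {gens} generations to reach 99%")
```

Under these assumptions, stronger selection shortens the time a rare allele needs to dominate, which is one simple way to read the statement that domestication and breeding accelerate evolution. Real crops are usually diploid or polyploid, with dominance, drift and linkage, so the numbers are indicative at best.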

1.3.1 Breeding and hybridization

The role of reproduction in plants was first reported in 1694 by Camerarius who noticed the
difference between male and female reproductive organs in maize and produced the first
artificial hybrid plant. He established that seed could not be produced without the
participation of pollen produced in male reproductive organs of plants. The first
hybridization experiment was carried out on wheat by Fairchild in 1719 and the current
technique of hybridization is largely based on the work of Kölreuter (1733–1806), a German
researcher who carried out his experiments in the 1760s. Hybridization freed the breeder from
the severe constraints of working within a limited population, enabled him to bring together
useful traits from two or more sources, and allowed specific genes to be introduced.

By understanding the reproductive capacities of plants, plant breeders can manipulate these
crosses to produce fertile offspring which carry traits from both parents. Crossing has been
very valuable to plant breeders, because it allows some measure of control over the phenotype
of a plant. Nearly all modern plant breeding involves some use of hybridization.

1.3.2 Mendelian genetics

It was Gregor Johann Mendel, a Moravian monk, who in 1865 discovered the basic rules that
govern heredity as a result of a series of experiments in which he crossed two cultivars of
pea plants. By studying the inheritance of all-or-none variation in peas, Mendel discovered
that inherited traits are determined by units of material that are transferred from one
generation to another. Mendel was probably ahead of his time as other biologists of that era
took 35 years to appreciate his work and plant breeding remained deprived of the deliberate
application of the laws of genetics until 1900 when Hugo de Vries, Carl Correns and Erich von
Tschermak-Seysenegg rediscovered Mendel's work.

1.3.3 Selection

In 1859 Darwin proposed in The Origin of Species that natural selection is the mechanism of
evolution. Darwin's thesis was that the adaptation of populations to their environments
resulted from natural selection and that if this process continued for long enough, it would
ultimately lead to the origin of new species. Darwin's Theory of Evolution through Natural
Selection hypothesized that plants change gradually by natural selection operating on
variable populations and was the outstanding discovery of the 19th century with direct
relevance to plant breeding.

1.3.4 Breeding types and polyploidy

Other historical developments in plant breeding include pedigree breeding, backcross breeding
(Harlan and Pope, 1922) and mutation breeding (Stadler, 1928). Natural and artificial
polyploids also offered new possibilities for plant breeding. Blakeslee and Avery (1937)
demonstrated the usefulness of colchicine in the induction of chromosome doubling and
polyploidy, enabling plant breeders to combine entire chromosome sets of two or more species
to evolve new crop plants.

1.3.5 Genetic diversity and germplasm conservation

The importance of genetic diversity in plant breeding was recognized by the 1960s and Sir
Otto Frankel coined the term 'genetic resources' in 1967 to highlight the relevance and need
to consider germplasm as a natural resource for the long-term improvement of crop plants. The
potentially harmful effects of genetic uniformity became apparent with the epidemic of
southern corn leaf blight in the USA in 1970 which destroyed about 15% of US maize in just
1 year. The National Academy of Sciences, USA, released the results of its study Genetic
Vulnerability
in Major Crops that brought into focus the causes and levels of genetic uniformity and its
consequences. It was a turning point in the history of germplasm resources and the
International Board for Plant Genetic Resources (IBPGR) was established in 1974, and was
later renamed the International Plant Genetic Resources Institute (IPGRI) and now Bioversity
International, to collect, evaluate and conserve plant germplasm for future use.

1.3.6 Quantitative genetics and genotype-by-environment interaction

Quantitative genetics is the study of the genetic control of those traits which show
continuous variation. It is concerned with the level of inheritance of these differences
between individuals rather than the type of differences, that is quantitative rather than
qualitative (Falconer, 1989). Several important books have been published which document the
major developments in quantitative genetics and these include Animal Breeding Plans (Lush,
1937), Population Genetics and Animal Improvement (Lerner, 1950), Biometrical Genetics
(Mather, 1949), Population Genetics (Li, 1955), An Introduction to Genetic Statistics
(Kempthorne, 1957) and Introduction to Quantitative Genetics (Falconer, 1960).

Many of the misconceptions regarding the inheritance of quantitative traits, which include
most of the economically important characters, were corrected by the classical work of Fisher
(1918) who successfully applied Mendelian principles to explain the genetic control of
continuous variation. He divided the phenotypic variance observed into three variance
components: additive, dominance and epistatic effects. This approach has been substantially
refined and applied to the improvement of the efficiency of plant breeding. Fisher also laid
the foundations for scientific crop experimentation by developing the theory of experimental
designs that is an essential part of any plant breeding programme. Quantitative genetics has
however evolved considerably in the past two decades because of the development of plant
genomics, particularly molecular markers, and other molecular tools that can be used to
dissect complex traits into single Mendelian factors (Xu and Zhu, 1994; Buckler et al., 2009;
Chapters 6 and 7).

Genotype-by-environment interaction (GEI) and its importance to plant breeding were first
recognized by Mooers (1921) and Yates and Cochran (1938). Since then, various statistical
methods have been developed for the evaluation of GEI using joint linear regression,
heterogeneity of variance and lack of correlation, ordination, clustering, and pattern
analysis. As an important field in quantitative genetics, GEI has been receiving more
attention in recent years and is covered in Chapter 10 along with molecular methods for GEI
analysis.

1.3.7 Heterosis and hybrid breeding

Although early botanists had observed increased growth when unrelated plants of the same
species were crossed, it was Charles Darwin who carried out the first seminal experiments. In
1877, he showed that crosses of related strains did not exhibit the vigour of hybrids. He
observed heterosis, i.e. the tendency of cross-bred individuals to show qualities superior to
those of both parents, in crops like maize and concluded that cross-fertilization was
generally beneficial and self-fertilization injurious. In 1879, William Beal demonstrated
hybrid vigour in maize by using two unrelated cultivars. The best combinations yielded 50%
more than the mean of the parents. Reports by Sanborn in 1890 and McClure in 1892 confirmed
Beal's earlier reports and extended the generality of the superiority of hybrids over the
average of the parental forms.

1.3.8 Refinement of populations

Several different population breeding methods can be used: (i) bulk; (ii) mass selection; and
(iii) recurrent selection. One
of the methods used for managing large populations of segregates was the bulk method proposed
by Harlan et al. (1940) for multi-parent crosses. This concept changed the breeding
methodologies for self-pollinated species. Mass selection is a system of breeding in which
seeds from individuals selected on the basis of phenotype are bulked and used to grow the
next generation. Mass selection is the oldest breeding method for plant improvement and was
employed by early farmers for the development of cultivated species from their ancestral
forms.

The enhancement of open-pollinated populations of crops such as rye, maize and sugarbeet,
herbage grasses, legumes, and tropical trees such as cacao, coconut, oil palm, and some
rubber, depends essentially on changing the gene frequencies so that the favourable alleles
are fixed, while maintaining a high (but far from maximal) degree of heterozygosity.
Recurrent selection is a method of plant breeding associated with quantitatively inherited
traits by which the frequencies of favourable genes are increased in populations of plants.
The methodology is cyclical with each cycle encompassing two phases: (i) selection of
genotypes that possess the favourable or required genes; and (ii) crossing among the selected
genotypes. This leads to a gradual increase in the frequencies of the desired alleles. While
recurrent selection is often successful it also has potential limitations in closed
populations and this has led to numerous modifications and alternative schemes (see Hallauer
and Miranda, 1988). Recurrent selection breeding methods have been applied to a wide range of
plant species, including self-pollinated crops.

1.3.9 Cell totipotency, tissue culture and somaclonal variation

The discovery of auxins, by Went and colleagues, and cytokinins, by Skoog and colleagues,
preceded the first success of in vitro culture of plant tissues (White, 1934; Nobécourt,
1939).

All the genes necessary to make an entire organism can be induced to function in the correct
sequence from a living cell isolated from a mature tissue (called totipotency). Regeneration
of whole plants from single cells is an important new source of genetic variability for
refining the properties of plants because when somatic embryos derived from single cells are
grown into plants, the plants' characteristics vary somewhat. Larkin and Scowcroft (1981)
coined the term 'somaclonal variation' to describe this observed phenotypic variation among
plants derived from micro-propagation experiments. When it was recognized as a genuine
phenomenon, somaclonal variation was considered to be a potential tool for the introduction
of new variants of perennial crops that can be asexually propagated (e.g. banana). Somaclonal
variation has also been exploited by plant breeders as a new source of genetic variation for
annual crops.

1.3.10 Genetic engineering and gene transfer

The discovery of the structure of DNA by Watson and Crick has enhanced traditional breeding
techniques by allowing breeders to pinpoint the particular gene responsible for a particular
trait and to follow its transmission to subsequent generations. Enzymes that cut and rejoin
DNA molecules allow scientists to manipulate genes in the laboratory. In 1973 Stanley Cohen
and Herbert Boyer spliced the gene from one organism into the DNA of another to produce
recombinant DNA which was then expressed normally and this formed the basis of genetic
engineering. The goal of plant genetic engineers is to isolate one or more specific genes and
introduce these into plants. Improvement in a crop plant can often be achieved by introducing
a single gene, and genes can now be transferred to plants using the natural gene transfer
system of a promiscuous pathogenic soil bacterium, Agrobacterium tumefaciens. DNA can also be
introduced into cells by bombardment with DNA-coated particles or by electroporation.
Transgenic breeding
has the potential to decrease or increase 1.3.12 Breeding efforts in the public
the environmental impact of agricultural and private sectors
practices.
The initial successes in plant genetic Agricultural research has mainly been the
engineering marked a significant turning responsibility of a national and/or state gov-
point in crop research. In the 1990s in par- ernment department. To accelerate progress
ticular, there was an upsurge of private sec- in food production especially in developing
tor investment in agricultural biotechnology. countries, international agricultural research
Some of the first products were plant strains centres were established with major empha-
capable of synthesizing an insecticidal pro- sis on the development of high yielding culti-
tein encoded by a gene isolated from the vars. Two centres, International Rice Research
bacterium Bacillus thuringiensis (Bt). Bt cot- Institute (IRRI), Philippines, and Centro Inter-
ton, maize, and other crops are now grown nacional de Mejoramiento de Maiz y Trigo
commercially. There are also crop cultivars which are tolerant to or capable of degrading herbicides. Proponents stress the value of these crops in conserving tillage soil, reducing the use of harmful chemicals and reducing the labour and costs involved in crop production.

1.3.11 DNA markers and genomics

During the 1980s and 1990s, various types of molecular markers such as restriction fragment length polymorphism (RFLP) (Botstein et al., 1980), randomly amplified polymorphic DNA (RAPD) (Williams et al., 1990; Welsh and McClelland, 1990), microsatellites and single nucleotide polymorphism (SNP) were developed. Because of their abundance and importance in the plant genome, molecular markers have been widely used in the fields of germplasm evaluation, genetic mapping, map-based gene discovery and marker-assisted plant breeding. Molecular marker technology has become a powerful tool in the genetic manipulation of agronomic traits.

Initiated by the complete sequencing of the Arabidopsis genome in 2000 (The Arabidopsis Genome Initiative, 2000) and the rice genome in 2002 (Goff et al., 2002; Yu et al., 2002), the genomes of an increasing number of plants have been or are being sequenced. Technological developments in bioinformatics, genomics and various omics fields are creating substantial data on which future revolutions in plant breeding can be based.

(CIMMYT), Mexico, established in the 1960s, made phenomenal contributions to food production by developing shorter and higher-yielding rice, wheat and maize cultivars. Encouraged by the astonishing success of these centres and two others which were established later, the Consultative Group on International Agricultural Research (CGIAR) was established in 1971. The CGIAR now has 15 international agricultural research centres, of which eight concentrate on specific crop plants and one on genetic resources with a mission to contribute towards sustainable agriculture for food security especially in developing countries. The breeding materials developed at these centres are distributed to public and private sector research programmes for utilization in the development of locally adapted cultivars. Through National Agricultural Research Systems (NARS), these centres work in close coordination with public and private breeding programmes in each country and share their breeding technologies and stocks of germplasm.

In the USA, crop breeding, with the exception of cotton, began largely as a tax-supported endeavour with breeding programmes taking place in most State Agricultural Experimental Stations and in the United States Department of Agriculture (USDA). This pattern changed with the advent of hybrid maize when inbred lines were initially developed by public institutions and utilized to produce hybrids by private companies. With the implementation of a Plant Variety Protection Act in the USA in 1974, private breeding was expanded to
include forages, cereals, soybean, and other crops. The activities of private companies contributed to the total crop breeding effort and offered a large number of cultivar options for farmers and consumers. In the USA and other industrialized countries today, the new life-science companies, notably the big multinationals such as Dow, DuPont and Monsanto, dominate the application of biotechnology to agriculture, and have developed many proprietary products.

1.4 Genetic Variation

The creation of new alleles and the mixing of alleles through recombination give rise to genetic variation, which is one of the forces behind evolution. Natural selection favours one phenotype over another and these phenotypes are conditioned by one or more alleles. Genetic variation is fundamental for selection, by which progress in plant breeding can be made. There are various sources of genetic variation and those described in this section are largely based on the information provided at the following web sites: http://www.ndsu.nodak.edu/instruct/mcclean/plsc431/mutation and http://evolution.berkeley.edu/evosite/evo101/IIICGeneticvariation.shtml.

1.4.1 Crossover, genetic drift and gene flow

Chromosomal crossover takes place during meiosis and results in a chromosome with a completely different chemical composition from the two parent chromosomes. During the process, two chromosomes intertwine and exchange one end of the chromosome with the other. The mechanism of crossing over is the cytogenetic base for recombination.

Gene flow refers to the passage of traits or genes between populations to prevent the occurrence of large numbers of mutations and genetic drift. In genetic drift, random variation occurs in small populations leading to the proliferation of specific traits within that population. The degree of gene flow varies widely and is dependent on the type of organism and population structure. For example, genes in a mobile population are likely to be more widely distributed than those in a sedentary population, resulting in high and low rates of gene flow, respectively.

1.4.2 Mutation

A mutation is any change in the sequence of the DNA encoding a gene which leads to a change in the hereditary material when an organism undergoes DNA replication. During the process of replication, the nucleotides of a chromosome are altered, so rather than creating an identical copy of DNA strands, there are chemical variations in the replicated strands. The alteration in the chemical composition of DNA triggers a chain reaction in the genetic information of an individual. The effect of a mutation depends on its size, location (intron or exon etc.), and the type of cell in which the mutation occurs. Large changes involve the loss, addition, duplication or rearrangement of whole chromosomes or chromosome segments. Most DNA polymerases have the ability to proofread their work to ensure that the unaltered genetic material is transferred to the next generation. There are many types of mutation and the most common are listed below.

1. Point mutations represent the smallest changes where only a single base is altered. For example, a single nucleotide change may result in the change of an amino acid (aa) codon into a stop codon and thus produce a change in the phenotype. Point mutations do not usually benefit the organism as most occur in recessive genes and are not usually expressed unless two mutations occur at the same locus.
2. In synonymous or silent substitutions the aa sequence of the protein is not changed because several codons can code for the same aa, and in non-synonymous
substitutions changes in the aa sequence may not affect the function of the protein. However, there have been many cases where a change in a single nucleotide can create a serious problem, e.g. in sickle cell anaemia.
3. Wild-type alleles typically encode a product necessary for a specific biological function and if a mutation occurs in that allele, the function for which it encodes is also lost. The general term for these mutations is loss-of-function mutations and they are typically recessive. The degree to which the function is lost can vary. If the function is entirely lost, the mutation is called a null mutation. It is also possible that some function may remain, but not at the level of the wild-type allele; these are known as leaky mutations.
4. A small number of mutations are actually beneficial to an organism, providing new or improved gene activity. In these cases, the mutation creates a new allele that is associated with a new function. Any heterozygote containing the new allele along with the original wild-type allele will express the new allele. Genetically this will define the mutation as dominant. This class of mutation is known as gain-of-function mutations.
5. A substitution is a mutation in which one base is exchanged for another. Such a substitution could change: (i) a codon to one that encodes a different aa thus causing a small change in the protein produced; (ii) a codon to one that encodes the same aa resulting in no change in the protein produced; or (iii) an aa-coding codon to a single stop codon resulting in an incomplete protein (this can have serious effects since the incomplete protein will probably not be functional).
6. Insertions/deletions (indels) produce changes by deleting or inserting sections of DNA into the parental DNA sequence. Because it is usually impossible to say whether a sequence has been deleted from one plant or inserted into another, these differences are called indels. Obviously the deletion of part of a gene can seriously affect the phenotype of organisms. Insertions can be disruptive if they insert themselves into the middle of genes or regulatory regions.
7. A mutation in which a nucleotide is inserted or deleted, causing all the codons to its right to be altered, is known as a frame-shift mutation. Since protein-coding DNA is divided into codons three bases long, insertions and deletions of a single base can alter a gene so that its message is no longer correctly parsed. As a result, a single base change can have a dramatic effect on a polypeptide sequence.
8. Mutations which occur in germ line cells, including both the gametes and the cells from which they are formed, are known as germinal mutations. A single germ line mutation can have a range of effects: (i) no phenotypic change; mutations in junk DNA are passed on to the offspring but have no obvious effect on the phenotype; (ii) small (or quantitative) phenotypic changes; and (iii) significant phenotypic change.
9. Mutations in somatic cells, which give rise to all non-germ line tissues, only affect the original individual and cannot be passed on to the progeny. To maintain this somatic mutation, the individual containing the mutation must be cloned.

In general, the appearance of a new mutation is a rare event. Most mutations that were originally studied occurred spontaneously. Such spontaneous mutations represent only a small number of all possible mutations. To genetically dissect a biological system further, induced mutations can be created by treating an organism with a mutagenic agent.

1.5 Quantitative Traits: Variance, Heritability and Selection Index

Recent advances in high-throughput technologies for the quantification of biological molecules have shifted the focus in quantitative genetics from single traits to comprehensive large-scale analyses. So-called omic technologies have now enabled geneticists to determine how genetic information is translated into biological function (Keurentjes et al., 2008; Mackay et al., 2009). The ultimate goal of quantitative genetics in the era of omics is to link genetic variation
to phenotypic variation and to identify the molecular pathway from gene to function. The recent progress made in humans by combining linkage disequilibrium mapping (Chapter 6) and transcriptomics (Chapter 3) holds great promise for high-resolution association mapping and identification of regulatory genetic factors (Dixon et al., 2007). Information from omics research will be integrated with our current knowledge at the phenotypic level to increase the effectiveness and efficiency of plant breeding.

1.5.1 Qualitative and quantitative traits

In general, qualitative traits are genetically controlled by one or a few major genes, each of which has a relatively large effect on the phenotype but is relatively insensitive to environmental influences. Trait distribution in a typical segregating population such as an F2 shows a multi-peak distribution, although individuals within a category show continuous variation. Each individual in the population can be classified unambiguously into distinct categories that correspond to different genotypes so that they can be studied using Mendelian methods.

Quantitative traits are genetically controlled by many genes, each of which has a relatively small effect on the phenotype, but is largely influenced by environmental factors (Buckler et al., 2009). Trait distribution in an F2 population usually shows a normal or bell-shaped distribution and as a result, individuals cannot be classified into phenotypic categories that correspond to different genotypes, thus making the effects of individual genes indistinguishable. Quantitative genetics is traditionally described as the study of all these genes as a whole and the total variation observed in a population results from the combined effects of genetic (polygenes as a whole) and environmental factors. However, quantitative variation is not due solely to minor allelic variation in structural genes as regulatory genes no doubt also contribute to this variation. We expect polygenes to show all the typical properties of chromosomal genes both in terms of action and in transmission through meiosis.

1.5.2 The concept of allelic and genotypic frequencies

A biological population is defined genetically as a group of individuals that exist together in time and space and that can mate or be crossed to each other to produce fertile progeny. Statistically, this group is called a population. Breeding populations are created by breeders to serve as a source of cultivars that meet specific breeding objectives.

At the population level, genetics can be characterized by allelic and genotypic frequencies. The allele frequencies refer to the proportion of each allele in the population, while the genotypic frequencies refer to the proportion of individuals (plants) in the population that have a particular genotype. A gene may have many allelic states. Some of the alleles of a given gene may have such marked effects as to be clearly recognized as a classical major mutant. Other alleles, though potentially separable at the DNA level, may well cause only minor differences at the level of the external phenotype. For example, one allele at a locus involved with growth hormone production could be inactive and result in a dwarf plant, while others may simply reduce or increase height by a few percent.

Allele and genotypic frequencies can be calculated by simple counting in the population. For a gene with n alleles, there are n(n + 1)/2 possible genotypes. The relationship between allele frequency and genotypic frequency for a single gene at the population level can be used to infer the genetic status of the gene in that population, relative to the expected equilibrium under some assumed mating system. Allele frequencies are generally not an issue in breeding populations created from non-inbred parents or from three or more inbred parents. But breeding populations in both self-pollinated and cross-pollinated crops are often created by crossing two inbred individuals.
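
The counting of allele and genotypic frequencies described in Section 1.5.2 can be sketched in a few lines of Python. This is a minimal illustration rather than a method taken from the text: the function names, the allele labels A1 and A2 and the sample of ten plants are all hypothetical, and random mating is assumed as the mating system when the expected genotypic proportions (p^2, 2pq, q^2) are computed.

```python
from collections import Counter


def n_genotypes(n_alleles):
    """Number of distinct diploid genotypes for a gene with n alleles: n(n + 1)/2."""
    return n_alleles * (n_alleles + 1) // 2


def count_frequencies(plants):
    """Return (allele_freq, genotype_freq) obtained by simple counting.

    `plants` is a list of 2-tuples of allele labels, one tuple per diploid plant.
    """
    n = len(plants)
    # Sort each pair so that ("A2", "A1") and ("A1", "A2") count as one genotype.
    genotype_counts = Counter(tuple(sorted(p)) for p in plants)
    allele_counts = Counter(allele for p in plants for allele in p)
    genotype_freq = {g: c / n for g, c in genotype_counts.items()}
    allele_freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    return allele_freq, genotype_freq


def expected_under_random_mating(p):
    """Expected genotypic proportions for two alleles, ASSUMING random mating."""
    q = 1.0 - p
    return {("A1", "A1"): p * p, ("A1", "A2"): 2 * p * q, ("A2", "A2"): q * q}


# Hypothetical sample: 10 plants scored at one locus with alleles A1 and A2.
sample = [("A1", "A1")] * 4 + [("A1", "A2")] * 4 + [("A2", "A2")] * 2
allele_freq, genotype_freq = count_frequencies(sample)
expected = expected_under_random_mating(allele_freq["A1"])

print(n_genotypes(2), n_genotypes(3))   # 3 and 6 possible genotypes
print(allele_freq)                      # observed allele frequencies
print(genotype_freq)                    # observed genotypic frequencies
print(expected)                         # proportions expected under random mating
```

In this hypothetical sample the allele frequencies are 0.6 and 0.4, and the observed heterozygote frequency (0.4) is lower than the 0.48 expected under random mating, which is the kind of comparison the paragraph above refers to.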
1.5.3 Hardy-Weinberg equilibrium (HWE)

A population is in equilibrium if the allele and genotypic frequencies are constant from generation to generation. A collection of pure selfers is also at equilibrium if all are completely selfed, with $P_{A_1A_1} = p$ and $P_{A_2A_2} = q$. For a random-mating population, equilibrium implies that the allele frequencies ($p$ for $A_1$ and $q$ for $A_2$) and the genotypic frequencies share a simple relationship:

$$P_{A_1A_1} = p^2, \qquad P_{A_1A_2} = 2pq, \qquad P_{A_2A_2} = q^2$$

or

$$(p + q)^2 = p^2 + 2pq + q^2$$

With one generation of random mating, i.e. when each individual in the population is equally likely to mate with any other individual, the above simple relationship will be obeyed. However, HWE represents idealized populations and breeders routinely use procedures that cause deviations from HWE. These procedures include the lack of random mating, the use of small population sizes, assortative mating, selection, and inbreeding during the development of progenies. Some of these procedures, such as inbreeding and the use of small population sizes, affect all loci in the population while others affect only certain loci. Suppose that two traits are controlled by different sets of loci, and a change in one trait does not affect the other. If selection occurs only for the first trait, the loci affecting that trait may deviate from HWE, but the loci for the other trait will remain in equilibrium. In large natural populations, migration, mutation, and selection are the forces that can change allelic frequencies from generation to generation.

1.5.4 Population means and variances

Theoretically, a population can be described by its parameters such as the mean and variance which depend on the probability distribution of the population. The arithmetic mean, $\mu$, also known as the first moment about the origin, is a parameter used to measure the central location of a frequency distribution. The population variance, $\sigma^2$, also known as the second moment about the mean, provides measures of the dispersion of the distribution. If the yield trait for a cultivar that is genetically homogeneous is taken as an example, the genetic effect for this cultivar population is a constant. The yield for all individuals should also be a constant, equal to the population mean, provided that environmental factors do not affect the yield. However, the yield for each individual is affected not only by its genotype but also by environmental factors such as temperature, sunlight, water, and various nutrients. As a result, individuals may have different phenotypic values, in this case yield, resulting in continuous variation among individuals. Therefore, the individual yield measures vary either positively or negatively around the population mean so that they are either higher or lower than the population mean by a certain number which is determined by its variance.

1.5.5 Heritability

The response of traits to selection depends on the relative importance of the genetic and non-genetic factors which contribute to phenotypic variation among genotypes in a population, a concept referred to as heritability. The heritability of a trait has a major impact on the methods chosen for population improvement, inbreeding, and selection. Selection for single plants is more efficient when the heritability is high. The extent to which replicated testing is required for selection depends on the heritability of the trait.

The question of whether a trait variation is a result of genetic or environmental variation is meaningless in practice. Genes cannot cause a trait to develop unless the organism is growing in an appropriate environment, and, conversely, no amount of manipulation will cause a phenotype to
develop unless the necessary gene or genes are present. Nevertheless, the variability observed in some traits might result primarily from differences in the numbers and the magnitude of the effect of different genes, whereas the variability in other traits might stem primarily from the differences in the environments to which various individuals have been exposed. It is therefore essential to identify reliable measures to determine the relative importance of not only the numbers and magnitude of the effects of the genes involved, but also of the effects of different environments on the expression of phenotypic traits (Allard, 1999).

Heritability is defined as the ratio of genetic variance to phenotypic variance:

$$h^2 = \frac{\sigma_G^2}{\sigma_P^2} = \frac{\sigma_G^2}{\sigma_G^2 + \sigma_E^2}$$

where $\sigma_P^2$ is phenotypic variance, which has two components, genetic variance $\sigma_G^2$ and environment variance $\sigma_E^2$. $\sigma_E^2$ can be estimated by the phenotypic variances of non-segregating populations such as inbred lines and F1s because individuals in such a population have the same genotypes and thus, phenotypic variation in these populations can be attributed to environmental factors. $\sigma_G^2$ can be estimated using segregating populations such as F2 and backcrosses where variance components can be obtained theoretically.

1.5.6 Response to selection

Genetic variation forms the basis for selection in plant breeding. Selection results in the differential reproduction of genotypes in a population so that gene frequencies change and, with them, genotypic and phenotypic values (mean and variance) of the trait being selected. Response to selection, or advance in one generation of selection, is measured by the difference between the mean of the offspring of the selected individuals and the mean of the parental population, and is denoted as R. Response to selection has been referred to by several different names, including genetic progress, genetic advance, genetic gain, and predicted progress or gain, and has been denoted as R, GS, G and G.

Starting with a parental population of mean, $\mu$, a subset of individuals is selected. The selected individuals have a mean $\bar{x}$, while the offspring of the selected population has a mean $\bar{y}$. The difference between the selected population and the original population is defined as the selection differential, and denoted by S, i.e.

$$S = \bar{x} - \mu$$

The response to selection, R, can be written as

$$R = \bar{y} - \mu$$

The relationship between S and R is determined by heritability,

$$R = h^2 S$$

How much of the selection differential is realized in the offspring population depends on the heritability of the trait. The heritability, $h^2$, in the formula can be either $h_N^2$ or $h_B^2$ (depending on whether the offspring are produced by sexual or asexual reproduction, respectively). From the above formula, $\bar{y} = \mu + h^2 S$.

The population mean of the offspring derived from the selected individuals is equal to the parental population mean plus the response to selection (Fig. 1.1). When $h^2 = 1$, the selection differential will be fully realized in the offspring population so that its mean will deviate from the parental population by S. When $h^2 = 0$, the selection differential cannot be realized so the offspring population mean will regress to its parental population. When $0 < h^2 < 1$, the selection differential is partially realized so that the mean of the offspring population will deviate from the parental population by $h^2 S$. It is very useful to predict the response before selection is undertaken and details of the mathematical derivation of these predictions together with the various complications encountered can be found in Empig et al. (1972), Hallauer and Miranda (1988) and Nyquist (1991).
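
As a worked illustration of Sections 1.5.5 and 1.5.6, the sketch below estimates heritability by subtracting the environmental variance (from a non-segregating population) from the phenotypic variance (from a segregating population), and then predicts the response to selection from the relationship between S and R. The data values and function names are hypothetical. Because the genetic variance is taken as a whole, the estimate corresponds to broad-sense heritability.

```python
from statistics import mean, variance


def estimate_heritability(segregating, non_segregating):
    """h2 = var_G / var_P, with var_G obtained as var_P - var_E.

    var_P comes from a segregating population (e.g. an F2) and var_E from a
    non-segregating population (e.g. an inbred line or an F1).
    """
    var_p = variance(segregating)
    var_e = variance(non_segregating)
    var_g = max(var_p - var_e, 0.0)   # a negative estimate is truncated at zero
    return var_g / var_p


def predict_response(mu, selected_mean, h2):
    """Return (S, R, expected offspring mean) from S = x_bar - mu and R = h2 * S."""
    s = selected_mean - mu
    r = h2 * s
    return s, r, mu + r


# Hypothetical phenotypic values (arbitrary units).
f2 = [6.0, 8.0, 10.0, 12.0, 14.0]   # segregating population
inbred = [9.0, 10.0, 11.0]          # non-segregating population

h2 = estimate_heritability(f2, inbred)
mu = mean(f2)
x_bar = mean([12.0, 14.0])          # the two best plants are selected
S, R, y_bar = predict_response(mu, x_bar, h2)
print(h2, S, R, y_bar)
```

With these numbers the phenotypic variance is 10 and the environmental variance is 1, so the heritability estimate is 0.9. Selecting plants with a mean of 13 from a population with a mean of 10 gives S = 3, a predicted response of about 2.7 and an expected offspring mean of about 12.7.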

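[Figure 1.1 (graphic not reproduced). Upper panel: parental population, with the selected individuals (k = 5%), their mean $\bar{x}$ and the selection differential $S = \bar{x} - \mu$. Lower panel: progeny population, with the progeny mean $\bar{y}$ shown for $h^2 = 0$, $h^2 = 0.5$ and $h^2 = 1.0$, and the response $R = \bar{y} - \mu$.]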

Fig. 1.1. Distribution of parental and progeny populations with a selection intensity of 5%. Because the
phenotypic values of the selected plants include both a genetic and an environmental component, the
progeny means depend on the heritability of the trait selected.
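
The pattern in Fig. 1.1 can be reproduced with a small simulation. The sketch below rests on simplifying assumptions that are not taken from the text: phenotypes are the sum of independent, normally distributed genetic and environmental deviations, the best 5% of parents are selected on phenotype, and each progeny inherits its parent's genetic value unchanged (as under asexual reproduction) while receiving a new environmental deviation. All names and parameter values are hypothetical.

```python
import random


def simulate_selection(h2, n=100_000, k=0.05, mu=100.0, var_p=100.0, seed=1):
    """Return (S, R) for truncation selection of the top fraction k."""
    rng = random.Random(seed)
    sd_g = (h2 * var_p) ** 0.5            # genetic standard deviation
    sd_e = ((1.0 - h2) * var_p) ** 0.5    # environmental standard deviation

    # Each parent is stored as (genetic deviation, phenotypic value).
    parents = []
    for _ in range(n):
        g = rng.gauss(0.0, sd_g)
        parents.append((g, mu + g + rng.gauss(0.0, sd_e)))

    parent_mean = sum(p for _, p in parents) / n
    parents.sort(key=lambda gp: gp[1], reverse=True)
    selected = parents[: int(k * n)]

    x_bar = sum(p for _, p in selected) / len(selected)
    # Progeny keep the parental genetic deviation but get a new environment.
    y_bar = sum(mu + g + rng.gauss(0.0, sd_e) for g, _ in selected) / len(selected)
    return x_bar - parent_mean, y_bar - parent_mean


results = {h2: simulate_selection(h2) for h2 in (0.0, 0.5, 1.0)}
for h2, (S, R) in results.items():
    print(f"h2 = {h2:.1f}: S = {S:.2f}, R = {R:.2f}, R/S = {R / S:.2f}")
```

The ratio R/S should come out close to the heritability used in each run (about 0, 0.5 and 1), which is the relationship R = h^2 S; the exact values of S and R depend on the random seed.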

1.5.7 Selection index and selection for multiple traits

In most plant breeding programmes, there is a need to improve more than one trait at a time. For example, a high-yielding cultivar susceptible to a prevalent disease would be of little use to a grower. Recognition that improvement of one trait may cause improvement or deterioration in associated traits serves to emphasize the need for the simultaneous consideration of all traits which are important in a crop species. Three selection methods, which are recognized as appropriate for the simultaneous improvement of two or more traits in a breeding programme, are index selection, independent culling, and tandem selection. Independent culling requires the establishment of minimum levels of merit for each trait. An individual with a phenotype value below the critical culling level for any trait will be removed from the population. That is, only individuals meeting requirements for all traits will be selected. With tandem selection, one trait is selected until it is improved to a satisfactory level or a critical phenotypic value. Then, in the next generation or programme, selection for a second trait is carried out within the population selected for the first trait, and so on for the third and subsequent traits. A selection index is a single score which reflects the merits and demerits of all target traits. Selection among individuals is based on the relative values of the index scores.

Selection indices provide one method for improving multiple traits in a breeding programme. The use of a selection index in plant breeding was originally proposed by Smith (1936) who acknowledged critical input from Fisher (1936). Subsequently, methods of developing selection indices were modified, subjected to critical evaluation, and compared to other methods of multiple trait selection.

It is generally recognized that a selection index is a linear function of observable phenotypic values of different traits. There are a number of forms of the equations available from index selection for multiple traits in grain. To construct a selection index, the observed value of each trait is weighted by an index coefficient,

$$I = b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$

where $I$ is an index of merit of an individual, $x_i$ represents the observed phenotypic value of the $i$th trait, and $b_1, \ldots, b_n$ are weights assigned to phenotypic trait measurements represented as $x_1, \ldots, x_n$. The $b$ values are the products of the inverse of the phenotypic variance-covariance matrix, genotypic variance-covariance matrix, and a vector of economic weights. A number of variations of this index, most changing the manner of computing the $b$ values, have been developed. These include the base index of Williams (1962), the desired gain index of Pesek and Baker (1969), and retrospective indexes proposed by Johnson et al. (1988) and Bernardo (1991). The emphasis in the retrospective index developments is on quantifying the knowledge experienced breeders have obtained. Baker (1986) summarized all selection indices in plant breeding developed before that time.

1.5.8 Combining ability

Combining ability is a very important concept in plant breeding and it can be used to compare and investigate how two inbred lines can be combined together to produce a productive hybrid or to breed new inbred lines. Selection and development of parental lines or inbreds with strong combining ability is one of the most important breeding objectives, no matter whether the goal is to create a hybrid with strong vigour or develop a pure-line cultivar with improved characteristics compared to their parental lines. In maize breeding, Sprague and Tatum (1942) partitioned the genetic variability among crosses into effects due to primarily either additive or non-additive effects, which correspond to two categories of combining ability, general combining ability (GCA) and specific combining ability (SCA). The relative importance of GCA and SCA depends on the extent of previous testing of the parents included in the crosses. Although these concepts were developed for breeding maize, an open-pollinated crop, they are generally applicable to self-pollinated crops.

The GCA for an inbred line or a cultivar can be evaluated by the average performance of yield or other economic traits in a set of hybrid combinations. The SCA for a cross combination can be evaluated by the deviation in its performance from the value expected from the GCA of its two parental lines. If the crosses among a set of inbred lines are made in such a way that each line is crossed with several other lines in a systematic manner, the total variation among crosses can be partitioned into two components ascribable to GCA and SCA. The mean performance of a cross ($\bar{x}_{AB}$) between two inbred lines A and B can be represented as

$$\bar{x}_{AB} = \mathrm{GCA}_A + \mathrm{GCA}_B + \mathrm{SCA}_{AB}$$

The $\mathrm{GCA}_A$ and $\mathrm{GCA}_B$ are the GCA of the parents A and B, respectively, and the cross of A × B is expected to have a performance equal to the sum ($\mathrm{GCA}_A + \mathrm{GCA}_B$) of the GCA of their parents. The actual performance of the cross, however, may be different from the expectation by an amount equivalent to the SCA. Sprague and Tatum (1942) interpreted these combining abilities in terms of type of gene action. The differences due to GCA of lines are the results of additive genetic variance and additive by additive interaction whereas SCA is a reflection of non-additive genetic variances.

1.5.9 Recurrent selection

Recurrent selection can be broadly defined as the systematic selection of desirable individuals from a population followed by recombination of the selected individuals to form a new population. The basic feature of recurrent selection methods is that they are procedures conducted in a repetitive manner, or recycling, including development of a base population with which to begin selection, evaluation of individuals from
the population, and selection of superior individuals as parents that can be crossed to produce a new population for the next cycle of selection, as shown below:

Develop a population -> Evaluate individuals in the population -> Select superior individuals as parents -> Develop a population (next cycle)

A cycle of selection is completed each time a new population is formed. The initial population that is developed for a recurrent selection programme is referred to as the base, or cycle 0, population. The population formed after one cycle of selection is called the cycle 1 population; the cycle 2 population is developed from the second cycle of selection, and so on.

Recurrent selection procedures are conducted for primarily quantitatively inherited traits. The objective of recurrent selection is to improve the mean performance of a population of plants by increasing the frequency of favourable alleles in a consistent manner in order to enhance the value of the population and to maintain the genetic variability present in the population as effectively as possible. In addition, separation of the genetic and environmental effects is an important facet of effective recurrent selection methods. The improved populations can be used as a cultivar per se, as parents of a cultivar-cross hybrid and as a source of superior individuals that can be used as inbred lines, pure-line cultivars, clonal cultivars, or parents of a synthetic line. Successful recurrent selection results in an improved population that is superior to the original population in mean performance and in the performance of the best individuals within it. Ideally, the population will be improved without its genetic variability being significantly reduced so that additional selection and improvement can occur in the future. Recurrent selection is complementary to inbred development procedures; in fact the concept of recurrent selection was developed, particularly for outcrossing crops, to rectify limitations in inbred development by continuous selfing that rapidly leads to inbreeding and allele fixation and thus inadequate opportunity for selection. There are two ways by which recurrent selection addresses this limitation in inbred development (Bernardo, 2002). First, recurrent selection increases the frequency of favourable alleles in the population by repeated cycles of selection. Secondly, recurrent selection maintains the degree of genetic variation in the population to allow sustained progress from subsequent cycles of selection. Genetic variation is maintained by recombining a sufficiently large number of individuals to reduce random fluctuations in allele frequencies, i.e. genetic drift.

Since the late 1950s, extensive research has been conducted to determine the relative importance of different genetic effects on the inheritance of quantitative traits for most cultivated plant species. As indicated by Hallauer (2007), quantitative genetic research has provided extensive information to assist plant breeders in developing breeding and selection strategies. Directly and/or indirectly, the principles for the inheritance of quantitative traits are pervasive in developing superior cultivars to meet the worldwide food, feed, fuel and fibre demands. The principles of quantitative genetics will have continued importance in the future.

1.6 The Green Revolution and the Challenges Ahead

The application of science and technology to crop production in the second half of the 20th century resulted in significant yield improvements for rice, wheat and maize in the developed countries, and the final result of these efforts was the Green Revolution which led to a new type of agriculture (high-input or chemical-genetic agriculture) which replaced the more traditional system. Countries involved in the Green Revolution, a term coined by Borlaug (1972), included Japan, Mexico, India and China among others.

By production and acreage yardsticks, agriculture has been very successful. The application of scientific knowledge to agriculture has resulted in greatly increased yields per unit land area for many of our important crops as exemplified by the 92% increase in cereal production in the developing world between 1961 and 1990. The sharp increase in human populations has been paralleled by the increase in food supply. However, yield growth rates are stagnating in some areas and, in a few cases, falling. A slowdown in the rate of yield increase of major cereals raises concern because increased yields are expected to be the source of increased food production in the future (Reeves et al., 1999). On the other hand, increased national wealth resulting from economic development is not necessarily correlated with a decrease in the rate of population growth. Widespread hunger persists in a world that produces enough food.

There are many reasons for being concerned about meeting future food demands (Khush, 1999; Swaminathan, 2007). Expansion of the planet's population creates an increased demand for food and income. Other issues such as the cost of food, which may represent 60% of income in the developing world, the 800 million people who are food insecure, the 200 million children who are malnourished, and the continuing decline of available land for farming and water to irrigate crops, all indicate the need to use all the technologies available to increase productivity, assuming they can be employed in harmony with the environment. Plant breeding has generally accounted for one-half of the increases in productivity of the major crops and the future will continue to depend on advances in plant breeding. The increase in productivity has meant that large areas of land can be saved as wildlife habitats or used for purposes other than agriculture. As the availability of land and water is decreasing and populations are increasing in size, the 50% increase in food production predicted to be required over the next 25 years poses an obvious challenge.

The danger of population growth overtaking food supplies was predicted by Malthus in 1817. The dire predictions of Malthus were forestalled, at least temporarily, by the extensive cultivation of new land and by the development of a modern agricultural science which enabled food crops to be produced at far higher yields than Malthus could have ever anticipated. However, the production of food has still not been optimized in all areas of the world.

Weather and climate profoundly affect crop production and natural events can disrupt normal climate cycles and affect agriculture. In addition, human-induced climatic change is set to accelerate during this century and this will also impact on crop production. Much of the arable land has been used for industrial purposes and land-use patterns indicate an increase in intensive farming which, however, must be sustainable.

Agricultural products are affected by abiotic and biotic stresses and one of the major challenges to the future of plant breeding is the development of cultivars and hybrids with multiple resistances or tolerances to these stresses.

The security of the food supply for an increasing world population largely depends on the availability of water for agriculture. Increasing the efficiency of water use for our major crop species is an important target in agricultural research, particularly in light of the increasing competition for limited supplies of fresh water in many parts of the world.

There are four prerequisites for greater productivity (Poehlman and Quick, 1983): (i) an improved farming system; (ii) instruction of farmers; (iii) optimization of the supply of water and fertilizers; and (iv) availability of markets. To increase crop productivity, planting high-yielding cultivars must be combined with improved practices of irrigation, fertilization and pest control. Maximum crop yield will only be achieved if the improved crop cultivar receives and responds to the optimum combination of water, fertilizer and cultural practices.

1.7 Objectives of Plant Breeding

The aim of plant breeders is to reassemble desirable inherited traits to produce crops

with improved characteristics. Thus far, plant breeders have mainly been concerned with bringing about a continuous improvement in the productivity of that part of the plant which is of economic importance, the stability of production through in-built resistance to pests and diseases and nutritive and organoleptic or other desired quality characters.

Many parameters and selection criteria should be included as breeding objectives. According to Sinha and Swaminathan (1984) and other sources, the major objectives of plant breeders can be summarized by the following list:

1. High primary productivity and efficient final production for each unit of cultivation and solar energy invested: to ensure that all the light that falls on a field is intercepted by leaves and that photosynthesis itself is as efficient as possible. Greater efficiency in photosynthesis could perhaps be achieved by reducing photorespiration.
2. High crop yield: plants must be selected which invest a large proportion of their total primary productivity into those areas which are commercially desirable, e.g. seeds, roots, leaves or stems.
3. Desirable nutritional value, organoleptic properties and processing qualities: the proportion of essential amino acids and the total protein in cereal grains, for example, should be increased to improve their nutritional quality.
4. Biofortifying crops with essential mineral elements that are frequently lacking in the human diet such as Fe and Zn, vitamins and amino acids (Welch and Graham, 2004; White and Broadley, 2005; Bekaert et al., 2008; Mayer et al., 2008; Ufaz and Galili, 2008; Naqvi et al., 2009; Xu et al., 2009a).
5. Modifying crop plants to generate plant-derived pharmaceuticals to supply low-cost drugs and vaccines to the developing world (Ma et al., 2005).
6. Adaptation to cropping systems: including breeding for contrasting cropping, intercropping, and sustainable cropping systems (Brummer, 2006).
7. More extensive and efficient nitrogen fixation: breeding cereals that encourage the growth of increased numbers of nitrogen-fixing microorganisms around their roots to reduce the need for nitrogen fertilizer.
8. More efficient use of water whether there is a plentiful supply or dearth of water.
9. Stability of crop production by resilience to weather fluctuations, resistance to the multiple alliance of weeds, pests and pathogens, and tolerance to various abiotic stresses such as heat, cold, drought, wind, and soil salinity, acidity or aluminium toxicity.
10. Insensitivity to photoperiod and temperature: selection of crop cultivars that are insensitive to photoperiod or temperature and characterized by a high per-day biomass production would allow the development of contingency cropping patterns to suit different weather probabilities.
11. Plant architecture and adaptability to mechanized farming: the number and positioning of the leaves, branching pattern of the stem, the height of the plant, and the positioning of the organs to be harvested are all important to crop production and often determine how well plants can be harvested mechanically.
12. Elimination of toxic compounds.
13. Identification and improvement of hardy plants suitable for sources of biomass and renewable energy.
14. Multiple uses of a single crop.
15. Environmentally-friendly and stable across environments.

In conclusion, plant breeding has many breeding objectives and each of the objectives can be addressed in a specific breeding programme. A successful breeding programme consists of a series of activities as Burton (1981) summarized in six words: variate, isolate, evaluate, intermate, multiply and disseminate.

1.8 Molecular Breeding

By 2025, the global population will exceed seven billion. In the interim per-capita availability of arable land and irrigation water will decrease from year to year as biotic and abiotic stresses increase. Food security, best defined as economic, physical

and social access to a balanced diet and safe drinking water, will be threatened, with a holistic approach to nutritional and non-nutritional factors needed to achieve success in the eradication of hunger. Science and technology can play a very important role in stimulating and sustaining an Evergreen Revolution leading to long-term increases in productivity without any associated ecological harm (Borlaug, 2001; Swaminathan, 2007). The objectives of the plant breeder can be realized through conventional breeding integrated with various biotechnology developments (e.g. Damude and Kinney, 2008; Xu et al., 2009c).

Plant breeding can be defined as an evolving science and technology (Fig. 1.2). It has gradually been evolving from art to science over the last 10,000 years, starting as an ancient art to the present molecular design-based science. With the development of molecular tools which will be discussed further in Chapters 2 and 3, plant breeding is becoming quicker, easier, more effective and more efficient (Phillips, 2006). Plant breeders will be well equipped with innovative approaches to identify and/or create genetic variation, to define the genetic feature of the genes related to the variation (position, function and relationship with other genes and environments), to understand the structure of breeding populations, to recombine novel alleles or allele combinations into specific cultivars or hybrids, and to select the best individuals with desirable genetic features which enable them to adapt to a wide range of environments.

Fig. 1.2. The steps of evolution of plant breeding. With the availability of more sophisticated tools, the art of plant breeding became science-based technology, molecular plant breeding.
    Art-based Plant Breeding
    Collection of wild plants for food; selection of wild plants for cultivation (starting from 10,000 years ago)
    Large-scale breeding activities supported by commercial seed production enterprises; hybridization combined with selection; evolution through natural selection (1700s–1800s)
    Mendelian genetics; quantitative genetics; mutation; polyploidy; tissue culture (1900s)
    Gene cloning and direct transfer; genomics-assisted breeding (2000s and beyond)
    Molecular Plant Breeding

Sequencing data for many plants is now readily available and the GenBank database is doubling every 15 months. Over 20 plant species including many important crops are in the process of being sequenced (Phillips, 2008). The next challenge is to determine the function of every gene and eventually how genes interact to form the basis of complex traits. Fortunately, DNA chips and other technologies are being developed to study the expression of multiple or even all genes simultaneously. High throughput robotics and bioinformatics tools will play an essential role in this endeavour.

New information about our crop species is expanding our capabilities to use molecular genetics. For example, we did not previously realize how similar broadly related species are in terms of their gene content and gene order. Since these species cannot usually be crossed, there was no means of assessing their relatedness. With the advent of DNA-based molecular markers, the extensive genetic mapping of chromosomes became readily possible for a variety of species. We learned that the genomes were highly similar and that this similarity allowed the prediction of gene locations among species. For example, rice has become the model or reference species for the cereals as many of the gene sequences on the rice chromosomes are shared with other cereals such as maize, sorghum, sugarcane, millet, oats, wheat and barley (Xu et al., 2005). Knowing the complete DNA sequence of a model or reference genome allows genes/traits from this

model to be tracked to other genomes. We have come to realize that the differences between species of plants are not due to novel genes, but to novel allelic specifications and interactions.

Since many fundamental aspects of current plant breeding procedures are not well understood, further data relating to the genetics of crop species may help to shed light on the genetic gains obtained from plant breeding. For example, in successful plant breeding programmes, the genetic base often becomes narrower rather than broader. Elite by elite crosses may be the rule in these programmes. Molecular genetic markers have been widely employed to identify cryptic and novel genetic variation among cultivars and related species and used to increase the efficiency of selection for agronomic traits and the pyramiding of genes from different genetic backgrounds.

Long-term selection programmes would be expected to lead to genetic fixation; however, this has not been found to be the case so far and variation is still observed. Several mechanisms for de novo variation have been described, including intragenic recombination, unequal crossing over among repeated elements, transposon activity, DNA methylation, and paramutation. Another important feature in plant breeding whose molecular basis is not understood is heterosis, although it is used as the basis for many seed-producing industries. Genomics and particularly transcriptomics are now being used to identify the heterotic genes responsible for increasing crop yields. Comprehensive quantitative trait locus-based phenotyping (phenomics) combined with genome-wide expression analysis should help to identify the loci controlling heterotic phenotypes and thus improve the understanding of the role of heterosis in evolution and the domestication of crop plants (Lippman and Zamir, 2007), and finally to make it possible to predict hybrid performance.

Messenger RNA transcript profiling is an obvious candidate for functional genomic application to plant breeding. Although direct selection at the gene transcript level using microarray or real-time PCR may be a long-term goal, other genomic tools can be used to achieve shorter term goals with more practical applications (Crosbie et al., 2006). Genetic modification of crops today involves the interfacing of molecular biology, cell and tissue culture, and genetics/breeding. The transfer of genes by cellular and molecular means will increase the available gene pool and lead to second generation biotechnology plant products such as those with a modified oil, protein, vitamin, or micronutrient content or those that have been engineered to produce compounds that can be used as vaccines or anti-carcinogens.

While all these new innovations have been useful, practical plant breeding continues to be based on hybridization and selection with little change in the basic procedures. A more complete understanding of the mechanisms by which genetic and environmental variation modify yield and composition is needed so that specific quantitative and qualitative targets can be identified. To achieve this aim, the expertise of plant genomics (including various 'omics'), physiology and agronomy, as well as plant modelling techniques, must be combined (Wollenweber et al., 2005) and many logistic and genetic constraints also need to be resolved (Xu and Crouch, 2008).
2
Molecular Breeding Tools:
Markers and Maps

2.1 Genetic Markers

In conventional plant breeding, genetic variation is usually identified by visual selection. However, with the development of molecular biology, it can now be identified at the molecular level based on changes in the DNA and their effect on the phenotype. Molecular changes can be identified by the many techniques that have been used to label and amplify DNA and to highlight the DNA variation among individuals. Once the DNA has been extracted from plants or their seeds, variation in samples can be identified using a polymerase chain reaction (PCR) and/or hybridization process followed by polyacrylamide gel electrophoresis (PAGE) or capillary electrophoresis (CE) to identify distinct molecules based on their sizes, chemical compositions and charges. Genetic markers are used to tag and track genetic variation in DNA samples.

Genetic markers are biological features that are determined by allelic forms and can be used as experimental probes or tags to keep track of an individual, a tissue, cell, nucleus, chromosome or gene. In classical genetics, genetic polymorphism represents allelic variation. In modern genetics, genetic polymorphism is the relative difference at any genetic locus across a genome. Genetic markers can be used to facilitate studies of inheritance and variation.

Desirable genetic markers should meet the following criteria: (i) high level of genetic polymorphism; (ii) co-dominance (so that heterozygotes can be distinguished from homozygotes); (iii) clear distinct allele features (so that different alleles can be identified easily); (iv) even distribution on the entire genome; (v) neutral selection (without pleiotropic effect); (vi) easy detection (so that the whole process can be automated); (vii) low cost of marker development and genotyping; and (viii) high duplicability (so that the data can be accumulated and shared between laboratories).

Most molecular markers belong to the so-called anonymous DNA marker type and generally measure apparently neutral DNA variation. Suitable DNA markers should represent genetic polymorphism at the DNA level and should be expressed consistently across tissues, organs, developmental stages and environments; their number should be almost unlimited; there should be a high level of natural polymorphism; and they should be neutral with no effect on the expression of the target trait. Finally, most DNA markers are co-dominant or can be converted into co-dominant markers.
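
Criterion (ii) in the list of desirable marker properties can be illustrated with a small Python sketch. It is not taken from the text: the function names, the allele symbols A and a, and the idealized 1:2:1 F2 sample are all assumptions made for illustration. The sketch scores the same genotypes once as a co-dominant marker and once as a dominant marker, to show why only the former separates heterozygotes from homozygotes.

```python
from collections import Counter


def score_codominant(genotype):
    """Band pattern for a co-dominant marker: every allele present is visible."""
    return "/".join(sorted(set(genotype)))


def score_dominant(genotype, band_allele="A"):
    """Band pattern for a dominant marker: only presence of one allele is visible."""
    return "band" if band_allele in genotype else "no band"


# Hypothetical, idealized F2 sample in the Mendelian 1 AA : 2 Aa : 1 aa ratio.
f2_genotypes = ["AA", "Aa", "Aa", "aa"]

codominant_classes = Counter(score_codominant(g) for g in f2_genotypes)
dominant_classes = Counter(score_dominant(g) for g in f2_genotypes)

print(codominant_classes)  # three classes: homozygote A, heterozygote A/a, homozygote a
print(dominant_classes)    # two classes: AA and Aa are pooled as "band"
```

With the co-dominant scoring all three genotype classes remain distinct, whereas the dominant scoring pools AA and Aa, so part of the segregation information is lost.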

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 21



Table 2.1 lists the major molecular marker technologies that are currently available. Only a selection of widely-used representative types of markers will be discussed in this section. Figure 2.1 shows the molecular mechanism of several major DNA markers and the genetic polymorphisms that can be generated by restriction site or PCR priming site mutation, insertion, deletion or by changing the number of repeat units between two restriction or PCR priming sites and nucleotide mutation resulting in a single nucleotide polymorphism (SNP). There are several comprehensive reviews that cover all the important DNA markers, e.g. Reiter (2001), Avise (2004), Mohler and Schwarz (2005) and Falque and Santoni (2007). Further information regarding the application of DNA markers in genetics and breeding can be found in Lörz and Wenzel (2005). After a brief review of the classical markers, DNA markers will be discussed in more detail in this section.

2.1.1 Classical markers

Morphological markers

In the late 1800s, following his studies on the garden pea (Pisum sativum), G.J. Mendel proposed two basic rules of genetics,

Table 2.1. DNA markers and related major molecular techniques.

Southern blot-based markers
    Restriction fragment length polymorphism (RFLP)
    Single strand conformation polymorphic RFLP (SSCP-RFLP)
    Denaturing gradient gel electrophoresis RFLP (DGGE-RFLP)
PCR-based markers
    Randomly amplified polymorphic DNA (RAPD)
    Sequence tagged site (STS)
    Sequence characterized amplified region (SCAR)
    Random primer-PCR (RP-PCR)
    Arbitrary primer-PCR (AP-PCR)
    Oligo primer-PCR (OP-PCR)
    Single strand conformation polymorphism-PCR (SSCP-PCR)
    Small oligo DNA analysis (SODA)
    DNA amplification fingerprinting (DAF)
    Amplified fragment length polymorphism (AFLP)
    Sequence-related amplified polymorphism (SRAP)
    Target region amplified polymorphism (TRAP)
    Insertion/deletion polymorphism (Indel)
Repeat sequence-based markers
    Satellite DNA (repeat unit containing several hundred to thousand base pairs (bp))
    Microsatellite DNA (repeat unit containing 2–5 bp)
    Minisatellite DNA (repeat unit containing more than 5 bp)
    Simple sequence repeat (SSR) or simple sequence length polymorphism (SSLP)
    Short repeat sequence (SRS)
    Tandem repeat sequence (TRS)
mRNA-based markers
    Differential display (DD)
    Reverse transcription PCR (RT-PCR)
    Differential display reverse transcription PCR (DDRT-PCR)
    Representational difference analysis (RDA)
    Expression sequence tags (EST)
    Sequence target sites (STS)
    Serial analysis of gene expression (SAGE)
Single nucleotide polymorphism-based markers
    Single nucleotide polymorphism (SNP)

Fig. 2.1. Molecular basis of major DNA markers. Parts A–E show different ways in which DNA markers (listed for each part) can be generated. The cross in part A indicates that mutation has eliminated the priming site. Abbreviations: as defined in Table 2.1; VNTR, variable number of tandem repeat; CAPS, a DNA marker generated by specific primer PCR combined with RFLP; ISSR, inter simple sequence repeat.
    A. Mutation at enzyme restriction or PCR priming site: RFLP, AFLP, CAPS; RAPD, AP-PCR, DAF, ISSR
    B. Insertion between enzyme restriction or PCR priming sites: RFLP, AFLP, CAPS, RAPD, AP-PCR, DAF, ISSR
    C. Deletion between enzyme restriction or PCR banding sites: RFLP, AFLP, CAPS, RAPD, AP-PCR, DAF, ISSR
    D. Change of tandem repeat units between enzyme restriction or PCR banding sites: SSR, VNTR, ISSR
    E. Single nucleotide mutation: SNP (top strands shown: GGACTACGT C GTATCATCGTACCG versus GGACTACGT A GTATCATCGTACCG)
    Symbols in the diagram: enzyme restriction site; PCR primer; tandem repeat sequence.
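
Part E of Fig. 2.1 can be made concrete with a short Python sketch that compares the two top-strand sequences printed in the figure and reports where they differ. The function name find_snps is an illustrative assumption, and the sketch assumes the two sequences are already aligned and of equal length; real SNP discovery works on aligned reads or assemblies and has to handle indels and sequencing error.

```python
def find_snps(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch.

    Assumes both sequences are pre-aligned and equally long (no indels).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, base_a, base_b)
        for index, (base_a, base_b) in enumerate(zip(seq_a, seq_b))
        if base_a != base_b
    ]


# Top strands of the two alleles shown in Fig. 2.1, part E.
allele_1 = "GGACTACGT" + "C" + "GTATCATCGTACCG"
allele_2 = "GGACTACGT" + "A" + "GTATCATCGTACCG"

snps = find_snps(allele_1, allele_2)
print(snps)  # one C-to-A difference at position 10
```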

which were later known as the Mendelian laws of equal segregation and independent assortment. Mendel selected individuals which differed in a particular trait and used them as the parental lines in a cross breeding experiment to determine the phenotype of the offspring with regard to the selected trait. The term phenotype (derived from Greek) literally means 'the form that is shown' and is used by both geneticists and breeders. The seven pairs of contrasting phenotypes studied by Mendel included round versus wrinkled seeds, yellow versus green seeds, purple versus white petals, inflated versus pinched pods, green versus yellow pods, axial versus terminal flowers and long versus short stems. The plants in the segregated populations of the pea, such as F2 and backcross, were classified into two distinct groups depending on their phenotypes. These contrasting morphological phenotypes are the starting point for any genetic analysis and can be mapped to particular chromosomes using the Mendelian laws of inheritance and can thus be used as morphological markers of the genome and the particular trait.

Morphological markers therefore generally represent genetic polymorphisms which are visible as differences in appearance, such as the relative difference in plant height and colour, distinct differences in response to abiotic and biotic stresses, and the presence/absence of other specific morphological characteristics. A large number of variants showing particular morphological or physiological phenotypes have been generated by tissue culture and mutation breeding. Using selection techniques these variants can be genetically stabilized and then used as morphological markers.

Some genetic stocks contain more than one morphological marker, for example there are a total of over 300 morphological markers available for genetic studies in rice (Khush, 1987) and more are being created for functional genomics. Many morphological marker stocks are also available for tomato (http://www.plantpath.wisc.edu/GeminivirusResistantTomatoes/MERC/Tomato/Tomato.html), maize (Neuffer et al., 1997) and soybean (Palmer and Shoemaker, 1998). Many of these markers have been linked with other agronomic traits.

Morphological markers are usually mapped by classical two- or three-point linkage tests. The linkage groups are established and the order of the markers and the relative distance between any two are determined by their recombinant frequencies. Relatively complete linkage maps have been constructed in many crop species using morphological markers and these maps provide the fundamental information for the genetic mapping of many physiological and biochemical traits.

However, it is difficult to construct a relatively saturated genetic map because of the limitation in the number of morphological markers with distinguishable polymorphisms. In addition, many morphological markers have deleterious effects on phenotypes and some are significantly affected by other factors such as environments or maturity which results in potential problems when these markers are used for genetics and plant breeding.

Cytological markers

By studying the morphology, number and structure of chromosomes from different species, particular cytogenetic features can be found, such as various types of aneuploidy, variants of chromosome structure and abnormal chromosomes. These can be used as genetic markers to locate other genes on to chromosomes and determine their relative positions, or used for genetic mapping via chromosome manipulations such as chromosome substitution.

The structural features of chromosomes can be shown by chromosome karyotype and bands. The banding patterns are indicated by colour, width, order and position, revealing the difference in distributions of euchromatin and heterochromatin. There are Q bands (produced by quinacrine hydrochloride), G bands (produced by Giemsa stain) and R bands (reversed Giemsa). These chromosome landmarks are not only useful for characterizing normal chromosomes but also for detecting chromosome mutation.
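
The two-point linkage test mentioned for morphological markers rests on a simple calculation, sketched here in Python. The progeny counts are invented for illustration, the function name is an assumption, and the map distance is taken as the plain recombination percentage without a mapping function, which is reasonable only for closely linked markers.

```python
def recombination_frequency(parental_count, recombinant_count):
    """Fraction of progeny that carry a recombinant marker combination."""
    total = parental_count + recombinant_count
    if total == 0:
        raise ValueError("no progeny scored")
    return recombinant_count / total


# Hypothetical backcross progeny scored for two morphological markers.
progeny = {
    "parental_type_1": 410,
    "parental_type_2": 390,
    "recombinant_type_1": 105,
    "recombinant_type_2": 95,
}

parental = progeny["parental_type_1"] + progeny["parental_type_2"]
recombinant = progeny["recombinant_type_1"] + progeny["recombinant_type_2"]

rf = recombination_frequency(parental, recombinant)
distance_cm = 100 * recombinant / (parental + recombinant)  # simple percentage

print(f"recombination frequency = {rf:.2f}")
print(f"approximate map distance = {distance_cm:.1f} cM")
```

A recombination frequency well below 0.5, the value expected under independent assortment, suggests that the two markers belong to the same linkage group; repeating the calculation for each pair of markers gives the relative distances used to order them.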

Cytological markers have been widely used to identify linkage groups within specific chromosomes and have been widely applied in physical mapping. However, because of the limited number and resolution, they have limited applications in genetic diversity analysis, genetic mapping and marker-assisted selection (MAS).

Protein markers

Isozymes are structural variants of an enzyme and while they differ from the original enzyme in molecular weight and mobility in an electric field, they have the same catalytic activity. The difference in enzyme mobility is caused by point mutations resulting from amino acid substitution such that isozymes reflect the products of different alleles rather than different genes. Therefore, isozymes can be genetically mapped on to chromosomes and then used as genetic markers for mapping other genes. Isozyme markers are based on their biochemistry and thus are also known as biochemical or protein markers.

However, their use as markers is limited. For example, a total of 57 isozymes representing about 100 loci have been identified in plants (Vallegos and Chase, 1991) but for specific species only 10–20 isozymes are available so that they cannot be used to construct a complete genetic map. Each isozyme can only be identified with a specific stain which also limits their use in practice.

2.1.2 DNA markers

RFLP

Botstein et al. (1980) first used DNA restriction fragment length polymorphism (RFLP) in human linkage mapping and this pioneered the utilization of DNA polymorphisms as genetic markers. It is known that the genomes of all organisms show many sites of neutral variation at the DNA level. These neutral variant sites do not have any effect on the phenotype. In some cases a neutral site is nothing more than a single nucleotide difference within a gene or between genes; and in others it represents the site of a variable number of tandem repeats of 'junk' DNA present between genes. The development of RFLP markers has accelerated the construction of molecular linkage maps for many organisms, improved the accuracy of gene location, and reduced the time required to establish a complete linkage map.

The digestion of purified DNA using restriction enzymes, which cut the DNA strand wherever there is a recognition site sequence (usually four to eight base pairs), leads to the formation of RFLPs which yield a molecular fingerprint that may be unique to a particular individual. If the bases are positioned at random in the genome, an enzyme having a recognition site with six bases will cleave the DNA once every 4096 bases on average (4^6). A genome of 10^9 bases could thus produce around 250,000 restriction fragments of variable length. Gel electrophoresis on such a large number of genomic DNA digestion products produces a continuous smear image. Particular fragments that are homologous between several individuals, and possibly allelic, can be separated only by means of molecular probes using the Southern technique (Southern, 1975). RFLP analysis includes the following steps (Fig. 2.2):

1. DNA isolation: a significant amount of DNA must be isolated from multiple individuals from target genotypes (parents and segregating populations, germplasm survey, garden blot, etc.) and purified to a fairly stringent degree as contaminants can often interfere with the restriction enzyme and inhibit its ability to digest the DNA.
2. Restriction digestion: restriction enzyme is added to purified genomic DNA under buffered conditions. The enzyme cuts at recognition sites throughout the genome and leaves behind hundreds of thousands of fragments.
3. Gel electrophoresis: digested products (restriction fragments) are electrophoresed on agarose gel and when visualized appear as smears because of the large number of fragments.
4. The DNA in the agarose gel is denatured using NaOH solution and then neutralized.
5. The DNA fragments are transferred to a nitrocellulose membrane using Southern blotting.
6. Probe visualization: the membrane-bound genomic DNA is probed by hybridization using a cloned fragment of the genome of interest or a genome from a relatively close species as the probe.
7. The membrane is washed to remove non-specifically hybridized DNA.
8. In most cases the sizes of the fragments are determined by radioactive methods. The probe-restriction enzyme combinations may identify two or more differently sized fragments. Polymorphism is revealed whenever the recognized fragments are of non-identical lengths.

Fig. 2.2. RFLP workflow from DNA extraction to radio-autograph. Modified from Xu and Zhu (1994). Panels, shown for two alleles A1 and A2: DNA extraction; restriction fragments; agarose gel electrophoresis; DNA denaturing; Southern blotting; hybridization; wash; radioactive autograph.

Differences in size of restriction fragments are due to: (i) base pair changes that result in gain and loss of restriction sites; and (ii) insertions/deletions at the restriction sites within the restriction fragments on which the probe sequence is located.

Molecular probes are DNA fragments isolated and individualized by cloning or PCR amplification. They may originate from fragmented total genomic DNA and thus contain coding or non-coding sequences, unique or repeated, of nuclear or cytoplasmic origin. They may also be complementary DNA (cDNA). The standard procedure for developing genomic DNA probes is to digest total DNA with a methylation-sensitive enzyme (e.g. PstI), thereby enriching the library for single-copy sequences (Burr et al., 1988). Typically, the digested DNA is size fractionated on a preparative agarose gel. DNA fragments ranging from 500 to 2000 bp are excised and eluted for cloning into a plasmid vector (e.g. pUC18). Digests of the plasmids are screened for inserts and their lengths can be estimated. Southern blots of the inserts can be probed with total sheared genomic DNA to select clones that hybridize to single- and low-copy sequences and to eliminate clones that hybridize to medium- and high-copy repeated sequences. Single- and low-copy probes are screened for RFLPs among a sample of genotypes using genomic DNAs digested with restriction endonucleases (one per assay). Typically, in species with moderate to high polymorphism rates, two to four restriction endonucleases with hexanucleotide recognition sites are tested. EcoRI, EcoRV and HindIII are widely used. In species with low polymorphism rates, additional restriction endonucleases can be tested to increase the chance of finding a polymorphism. Both the theory and the techniques for RFLP analysis in plant genome mapping have been intensively reviewed (Botstein et al., 1980; Tanksley et al., 1988).
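
Two points from the RFLP discussion can be checked with a short Python sketch: the expected cutting frequency of a six-base recognition site, and the way a single base change in a restriction site alters fragment lengths. The sketch is illustrative only. The two allele sequences are invented, the function names are assumptions, and the digest is a naive string search on one strand of a linear molecule; the only enzyme fact used is that EcoRI recognizes GAATTC and cuts after the first G.

```python
def expected_fragment_count(genome_size, site_length):
    """Expected fragment number if the four bases occur at random with equal frequency."""
    return genome_size / 4 ** site_length


def digest(sequence, site, cut_offset):
    """Cut a linear sequence at each occurrence of site; return fragment lengths in order."""
    fragment_lengths = []
    fragment_start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragment_lengths.append(cut - fragment_start)
        fragment_start = cut
        position = sequence.find(site, position + 1)
    fragment_lengths.append(len(sequence) - fragment_start)
    return fragment_lengths


# Six-base site: one cut per 4**6 = 4096 bases on average.
print(4 ** 6)
print(round(expected_fragment_count(10 ** 9, 6)))  # of the order of 250,000

# Hypothetical alleles: allele B carries a point mutation in the second EcoRI site.
left, middle, right = "ATGC" * 5, "TTAGC" * 4, "CCGTA" * 3
allele_a = left + "GAATTC" + middle + "GAATTC" + right
allele_b = left + "GAATTC" + middle + "GAATAC" + right

fragments_a = digest(allele_a, "GAATTC", cut_offset=1)
fragments_b = digest(allele_b, "GAATTC", cut_offset=1)
print(fragments_a)  # three fragments
print(fragments_b)  # two fragments: the lost site merges the last two
```

A probe that hybridizes to the middle region would detect a shorter fragment in allele A than in allele B, which is the length polymorphism that an RFLP assay scores.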

Most RFLP markers are co-dominant and is used to amplify random sequences from
locus specific. RFLP genotyping is highly a complex DNA template that is comple-
reproducible and the methodology is sim- mentary to it (or includes a limited number
ple and requires no special instrumenta- of mismatches). This means that the ampli-
tion. High-throughput markers (e.g. cleaved fied fragments generated by PCR depend
amplified polymorphic sequence (CAPS) on the length and size of both the primer
or insertion/deletion (indel) markers) can and the target genome. Ten-base oligomers
be developed from RFLP probe sequences. of varying GC content (ranging from 40 to
The CAPS technique, also known as PCR- 100%) are usually used. If two hybridiza-
RFLP, consists of digesting a PCR-amplified tion sites are close to one another (at most
fragment with one or several restriction 3000 bp apart) and in opposite directions, that is,
enzymes, and detecting the polymorphism in a configuration that will allow the PCR,
by the presence/absence of restriction sites amplification will take place. The amplified
(Konieczny and Ausubel, 1993). products (of up to 3.0 kb) are usually sepa-
RFLP markers are powerful tools rated on agarose gels and visualized using
for comparative and synteny mapping. ethidium bromide staining. The use of a
However, RFLP analysis requires large single 10-mer oligonucleotide promotes the
amounts of high quality DNA and has low generation of several discrete DNA products
genotyping throughput and is very diffi- and these are considered to originate from
cult to automate. Most genotyping involves different genetic loci. Polymorphisms result
radioactive methods so its use is limited to from mutations or rearrangements either at
specific laboratories. RFLP probes must be or between the primer binding sites and are
physically maintained and it is therefore visible in conventional agarose gel electro-
difficult to share them between laboratories. phoresis as the presence or absence of a par-
In addition, the level of RFLP is relatively ticular RAPD band. RAPDs predominantly
low and selection for polymorphic parental provide dominant markers but homologous
lines is a limiting step in the development allele combinations can sometimes be iden-
of a complete RFLP map. tified with the help of detailed pedigree
information.
RAPD RAPDs have several advantages and for
this reason they are widely used (Karp and
Williams et al. (1990) and Welsh and Edwards, 1997). (i) Neither DNA probes nor
McClelland (1990) independently described sequence information is required for the
the utilization of a single, random-sequence design of specific primers. (ii) The proce-
oligonucleotide primer in a low stringency dure does not involve blotting or hybridiza-
PCR (3545C) for the simultaneous ampli- tion steps thus making the technique quick,
fication of several discrete DNA fragments simple and efficient. (iii) RAPDs require rel-
referred to as random amplified polymor- atively small amounts of DNA (about 10 ng
phic DNA (RAPD) and arbitrary primed PCR per reaction) and the procedure can be auto-
(AP-PCR), respectively. Another related mated; they are also capable of detecting
technique is DNA amplification fingerprint- higher levels of polymorphism than RFLPs.
ing (DAF) (Caetano-Anolls et al., 1991). (iv) Development of markers is not required
These methods differ from one another in and the technology can be applied to vir-
primer length, the stringency of the con- tually any organism with minimal initial
ditions and the method of separation and development. (v) The primers can be uni-
detection of the fragments. They all can be versal and one set of primers can be used for
used to identify RAPD. any species. In addition, RAPD products of
The principle of RAPD consists of a interest can be cloned, sequenced and con-
PCR on the DNA of the individual under verted into other types of PCR-based mark-
study using a short primer, usually ten ers such as sequence tagged sites (STS),
nucleotides, of arbitrary sequence. The sequenced characterized amplified regions
primer which binds to many different loci (SCAR), etc.
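
The RAPD principle described above (one arbitrary primer, two priming sites in opposite orientation and close enough to allow PCR) can be illustrated with a small in silico sketch. The primer, the templates and the function names below are hypothetical. Only exact matches are considered, which is a simplification, since real RAPD reactions at low stringency tolerate a limited number of mismatches.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_all(seq, sub):
    """Start positions of every (possibly overlapping) occurrence of sub."""
    hits = []
    pos = seq.find(sub)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(sub, pos + 1)
    return hits


def rapd_products(template, primer, max_len=3000):
    """Predicted product sizes for a single arbitrary primer.

    A product is predicted when the primer sequence occurs on the top
    strand at i and its reverse complement occurs downstream at j, so
    the two priming events face each other, and the product length
    (j + len(primer) - i) does not exceed max_len.
    """
    k = len(primer)
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, revcomp(primer))
    sizes = []
    for i in forward_sites:
        for j in reverse_sites:
            size = j + k - i
            if j >= i + k and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


# Hypothetical 10-mer and templates. template_b carries a point mutation
# in the forward priming site, so the band is lost (dominant marker).
primer = "GTGATCGCAG"
template_a = "TTTT" + primer + "A" * 50 + revcomp(primer) + "TTTT"
template_b = template_a.replace(primer, "GTGATCGTAG")

print(rapd_products(template_a, primer))  # [70]
print(rapd_products(template_b, primer))  # []
```

The presence of a 70-bp product in one genotype and its absence in the other corresponds to the presence or absence of a RAPD band on the gel.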
28 Chapter 2

Reproducibility affects the way in which RAPD bands can be standardized for comparison across
laboratories, samples and trials and whether RAPD marker information can be accumulated or
shared. Due to frequently observed problems with reproducibility of overall RAPD profiles
and specific bands, this marker class is often treated with reserve. In replication studies
by Pérez et al. (1998), mispriming error amounted to 60%. Several factors have been shown to
affect the number, size and intensity of bands. These include PCR buffers, deoxynucleotide
triphosphates (dNTPs), Mg2+ concentration, cycling parameters, source of Taq polymerase,
condition and concentration of template DNA and primer concentration. Results obtained by
RAPDs are highly prone to user error and bands obtained can vary considerably between
different runs of the same sample. To correct the problems that may be encountered when
carrying out RAPD-PCR, it is important to bear in mind the following: (i) the concentration
of DNA can alter the number of bands; (ii) RAPD profiles vary depending on the Mg2+
concentration and the PCR buffer provided by Taq polymerase suppliers may or may not contain
Mg2+ ions; (iii) there are different sources of Taq polymerase and there is great variation
between profiles produced using Taq polymerase obtained from different companies; (iv) there
are a large number of alternative cycling times and temperatures which are equally important
and depend on the type of machine used and even the wall thickness of the PCR tubes.

Generally if a PCR does not work there is likely to be something wrong with the template
DNA, primers, Taq polymerase or choice of conditions. Initially it is important to try and
repeat the PCR under the same conditions to ensure that there was not a simple error that
resulted in the failure. In addition it is recommended that both positive and negative
controls are included. A positive control with a template known to amplify well will ensure
that all reagents have been added and that they are all functioning. A negative control
without template DNA will reveal any contamination. In most cases if the PCR does not work
and it is not clear what might be causing the problem, it is worth starting from the
beginning by disposing of all the reagents used and preparing fresh ones. A careful
experiment revealed that reproducibility could be improved and Taberner et al. (1997)
reported that 3396 out of 3422 bands (99.2%) were reproducible.

On the other hand, low reproducibility is a major limitation of RAPD markers, particularly
in ongoing genetic and plant breeding programmes in which accumulated information, markers
and marker data are shared between laboratories and experiments. RAPD markers may still find
their applications in independent genetic diversity and phylogenetic studies that do not
depend on data sharing or accumulation. As RAPD markers can be converted into other types of
markers, they have a unique role in the development of target markers for crop species that
have limited molecular markers available to cover the whole genome.

To overcome the problem associated with RAPD analysis, Paran and Michelmore (1993) converted
RAPD fragments into simple and robust PCR markers known as SCARs. This procedure increases
the reproducibility of RAPD markers and also avoids the occurrence of non-homologous markers
of equal molecular weight. These specific markers are obtained by cloning individual
polymorphic RAPD bands, which are then sequenced; specific primers are designed, usually by
extending the original decamer primer sequence by 10–15 bases, so that only the band of
interest is amplified. In general, DNA can be isolated from agarose gels, cloned and
sequenced to produce the starting DNA template for the development of a variety of PCR-based
markers. The cloned and sequenced DNA fragments can then be used for the development of
CAPS, single strand conformation polymorphism (SSCP) or SNP markers.

AFLP

Amplified fragment length polymorphism (AFLP; Zabeau and Voss, 1993; Vos et al., 1995) is
based on the selective PCR amplification of restriction fragments from a total
double-digest of genomic DNA under high stringency conditions, that is, the combination of
polymorphism at restriction sites and hybridization of arbitrary primers, and because of
this AFLP is also called selective restriction fragment amplification (SRFA). It was
perfected by the company Keygene in the Netherlands for initial use in plant improvement and
has been patented. The AFLP technique combines the power of RFLP with the flexibility of
PCR-based markers and provides a universal, multi-locus marker technique that can be applied
to complex genomes from any source. The method identifies AFLPs by selective PCR
amplification of digested and ligated genomic or cDNA templates, which are then separated on
a polyacrylamide gel; it comprises restriction–ligation, pre-amplification and selective
amplification (Fig. 2.3). The purified genomic DNA is first cleaved with one or more
restriction endonucleases, e.g. a 6-cutter (EcoRI, PstI or HindIII) and a 4-cutter (MseI or
TaqI). Adaptors of 18–20 bp and of known sequence, matched to the sticky ends of the
restriction sites, are then added to the ends of DNA fragments by a ligation reaction using
T4 DNA ligase. DNA amplification is carried out using primers with the sequence specificity
of the adaptor to generate a subset of fragments of different sizes (up to 1 kb). The
primers also contain one or more bases at their 3' ends that provide amplification
selectivity by limiting the number of perfect sequence matches between the primer and the
pool of available adaptor/DNA templates. The resulting amplification products (50–400 bp
size range) are typically observed by radio-labelling one of the primers followed by
fragment separation on acrylamide gels to identify polymorphisms (changes in restriction
fragment sizes).

[Fig. 2.3 flowchart: whole genome DNA; restriction with EcoRI (GAATTC) and MseI (TTAA);
ligation of the EcoRI adaptor and the MseI adaptor; pre-amplification with an EcoRI primer
(selective base A) and an MseI primer (selective base C); selective amplification with a
tagged (*) EcoRI primer (selective bases GA) and an MseI primer (selective bases CA);
electrophoresis.]

Fig. 2.3. AFLP flowchart. Adaptor DNA = short double strand DNA molecules, 18–20 bp in
length, representing a mixture of two types of molecules. Each type is compatible with one
restriction enzyme generated DNA end. Pre-amplification uses selective primers, which
contain an adaptor DNA sequence plus one or two random bases at the 3' end for reading into
the genomic fragments. Primers for re-amplification have the pre-amplification primer
sequence plus one or two additional bases at the 3' end. A tag (*) is attached at the 5' end
of one of the re-amplification primers for detecting amplified molecules.

An AFLP primer is composed of a synthetic adaptor sequence, the restriction endonuclease
recognition sequence and an arbitrary, non-degenerate selective sequence (typically one, two
or three nucleotides). In the first step, 500 ng of genomic DNA will be completely digested
with two restriction enzymes, one frequent cutter (4-bp recognition site) and one rare
cutter (6-bp recognition site). Oligonucleotide adaptors are ligated to the ends of each
restriction fragment, where they serve together with restriction site sequences as target
sites for primer annealing, one end
with a complementary sequence for the rare cutter and the other with the complementary
sequence for the frequent cutter. In this way only fragments which have been cut by the
frequent cutter and rare cutter will be amplified. Primers are designed from the known
sequence of the adaptor, plus one to three selective nucleotides which extend into the
fragment sequence. Sequences not matching these selective nucleotides in the primer will not
be amplified so that the specific amplification of only those fragments matching the primers
is achieved. The option to permutate the order of the selective bases and to recombine the
primers with each other will theoretically lead to the gradual collection of all restriction
fragments from a particular enzyme combination that are of a suitable size for DNA fragment
analysis from a genotype. The multiplex ratio of an AFLP assay is a function of the number
of selective nucleotides in the AFLP primer combination, the selective nucleotide motif, GC
content and physical genome size and complexity. Typically, two selective nucleotides are
used for species with small genomes (1 × 10^8–5 × 10^8 bp), e.g. Arabidopsis thaliana L.
(1 × 10^8 bp) and rice (Oryza sativa L.) (4 × 10^8 bp), and three selective nucleotides are
used for species with large genomes (5 × 10^8–6 × 10^9 bp), e.g. maize, soybean, sunflower
and many others. It is theoretically possible to use several tens of combinations of
restriction enzymes at sites of four to six bases and a large number of combinations of
selective bases on the amplification primers. Thus, as indicated by Falque and Santoni
(2007), the restriction–amplification combinations are nearly infinite.

AFLP products can be separated in high-resolution electrophoresis systems. The number of
bands produced can be manipulated by the number of selective nucleotides and the nucleotide
motifs used. A well-balanced number of amplified restriction fragments ranges from 50 to
150. A major improvement has been made by switching from radioactive to fluorescent
dye-labelled primers for the detection of fragments in gel-based or capillary DNA sequencers
in which fluorescently labelled fragments pass the detector near the bottom of the gel/end
of the capillary, resulting in a linear spacing of DNA fragments and therefore increasing
the resolution over the whole size range (Schwarz et al., 2000).

In general, AFLP assays can be carried out using relatively small DNA samples (typically
1–100 ng per individual). AFLP has a very high multiplex ratio and genotyping throughput and
is relatively reproducible across laboratories. Simple off-the-shelf technology can be
applied to virtually any organism with no formal marker development required and, in
addition, a set of primers can be used for different species. However, there are limitations
to the AFLP assay. (i) The maximum polymorphic information content for any bi-allelic marker
is 0.5. (ii) High quality DNA is needed to ensure complete restriction enzyme digestion.
Rapid methods for isolating DNA may not produce sufficiently clean template DNA for AFLP
analysis. (iii) Proprietary technology is needed to score heterozygotes and ++ homozygotes,
otherwise AFLPs must be dominantly scored. (iv) AFLP markers often cluster densely in
centromeric regions in species with large genomes, e.g. barley (Qi et al., 1998) and
sunflower (Gedil et al., 2001). (v) Developing locus-specific markers from individual
fragments can be difficult. (vi) AFLP primer screening is often necessary to identify
optimal primer specificities and combinations; otherwise the assays can be carried out using
off-the-shelf technology. (vii) There are relatively high technical demands in AFLP analysis
including radio-labelling and skilled manpower. (viii) Marker development is complicated and
not cost-effective. (ix) Reproducibility is better than that of RAPD markers but relatively
low compared to RFLP and simple sequence repeat (SSR) markers, as AFLP reveals large numbers
of bands and not all the bands will be comparable across laboratories or trials due to
potential false positives, false negatives and complicated gel backgrounds.

The AFLP technique can be modified so that one primer is obtained from a known multi-copy
sequence to detect sequence-specific amplification polymorphisms. This approach was used
successfully to generate
genome-wide Bare-1 retrotransposon-like markers in barley (Waugh et al., 1997) and diploid
Avena (Yu and Wise, 2000) as well as in lucerne by making use of consensus sequences from
long terminal repeats (LTRs) of Tms1 retrotransposon (Porceddu et al., 2002). The cDNA-AFLP
technique (Bachem et al., 1996), which applies the standard AFLP protocol to a cDNA
template, was used to display transcripts whose expression was rapidly altered during
race-specific resistance reactions, for the isolation of differentially expressed genes from
a specific chromosome region using aneuploids and for the construction of genome-wide
transcription maps (as reviewed by Mohler and Schwarz, 2005). In addition, there are several
modified AFLP techniques based on the use of endonucleases such as single endonuclease
(MspI) AFLP (Boumedine and Rodolakis, 1998), three endonuclease-AFLP (van der Wurff et al.,
2000), and second digestion AFLP (Knox and Ellis, 2001). Developments in the detection of
AFLP include the replacement of radioactive detection with silver staining, fluorescent AFLP
or agarose gels for single endonuclease AFLP. Recent studies have addressed specific areas
of the AFLP technique including comparison with other genotyping methods, assessment of
errors, homoplasy, phylogenetic signal and appropriate analysis techniques. The study by
Meudt and Clarke (2007) provides a synthesis of these areas and explores new directions for
the AFLP technique in the genomic era.

SSR

Microsatellites, also known as SSRs, short tandem repeats (STRs) or sequence-tagged
microsatellite sites (STMS), are tandemly repeated units of short nucleotide motifs that are
1–6 bp long. Di-, tri- and tetranucleotide repeats such as (CA)n, (AAT)n and (GATA)n are
widely distributed throughout the genomes of plants and animals (Tautz and Renz, 1984). One
of the most important attributes of microsatellite loci is their high level of allelic
variation, making them valuable as genetic markers. The unique sequences bordering the SSR
motifs provide templates for specific primers to amplify the SSR alleles via PCR. Referred
to as simple sequence length polymorphisms (SSLPs), they pertain to the number of repeat
units that constitute the microsatellite sequence. The rates of mutation of SSR are about
4 × 10^-4–5 × 10^-6 per allele and per generation (Primmer et al., 1996). The predominant
mutation mechanism in microsatellite tracts is slipped-strand mispairing (Levinson and
Gutman, 1987). When slipped-strand mispairing occurs within a microsatellite array during
DNA synthesis, it can result in the gain or loss of one or more repeat units depending on
whether the newly synthesized DNA chain or the template chain loops out. The relative
propensity for either chain to loop out seems to depend in part on the sequences making up
the array and in part on whether the event occurs on the leading (continuous DNA synthesis)
or lagging (discontinuous DNA synthesis) strand (Freudenreich et al., 1997). SSR loci are
individually amplified by PCR using pairs of oligonucleotide primers specific to unique DNA
sequences flanking the SSR sequence.

Microsatellites may be obtained by screening sequences in databases or by screening
libraries of clones. If no sequence is available, microsatellite markers can be developed in
the following steps: construct an enriched or unenriched small-insert clone library; screen
it by hybridizing a labelled oligo (with the SSR motif of interest); sequence positive
clones; design primers in single copy regions flanking SSR repeats such that the amplified
fragments will be > 50 bp and < 350 bp; and identify size polymorphism on PAGE gels. For
multiplexing, design primers with similar melting temperature (Tm) and a range of expected
amplicon sizes to have non-overlapping groups of markers on a gel. In rice, both an
enzyme-digested (Chen, X. et al., 1997) and a physically-sheared library (Panaud et al.,
1996) were constructed from cultivar IR36 based on size-selected DNA in the 300–800-bp
range. These libraries were screened for the presence of (GA)n microsatellites by plaque and
colony hybridization. A pre-sequencing screening step was used
to eliminate clones where the microsatellite repeat was too near one of the cloning sites to
permit accurate design of primers and to determine which end should be sequenced with
priority. The basic steps include:

• PCR amplification of clone inserts and determination of their lengths before sequencing.
  Short and long insert clones are usually discarded.
• Selected clones are sequenced and searched for SSRs.
• Sequences within motif classes are grouped and aligned using sequence alignment software
  to identify redundant sequences.
• Oligonucleotide primers are designed for unique DNA sequences flanking non-redundant SSRs.
• Primers are tested and genotypes are screened for SSR length polymorphisms.

An alternative source of SSRs is to utilize expressed sequence tag (EST) and other sequence
databases (e.g. Kantety et al., 2002). SSRs can be identified computationally, using a BLAST
query (see Simple Sequence Repeat Identification Tool available at www.gramene.org) and
available genomic or EST sequences. Using this method, a total of 2414 new di-, tri- and
tetra-nucleotide non-redundant SSR primer pairs, representing 2240 unique marker loci, were
developed and experimentally validated in rice (McCouch et al., 2002). SSR-containing
sequences that consisted of perfect repeat motifs (> 24 bp in length) flanked by 100 bp of
unique sequence on either side of the SSR were chosen from GenBank. Primer pairs containing
18–24 nucleotides devoid of secondary structure or consecutive tracts of a single
nucleotide, with a GC content of around 50% (Tm approximately 60°C) and preferably G- or
C-rich at the 3' end were automatically designed. Using electronic PCR (e-PCR) to align
these designed primer pairs against 3284 publicly sequenced rice BAC and PAC clones
(representing about 83% of the total rice genome), 65% of the SSR markers hit a BAC or PAC
clone containing at least one genetically mapped marker and could be mapped by proxy.
Additional information based on genetic mapping and nearest marker information provided the
basis for locating a total of 1825 designed markers along rice chromosomes.

Compared with library-derived SSRs, EST-derived SSRs are expected to display slightly fewer
polymorphisms as there is pressure for sequence conservation in the coding regions (Scott,
2001). However, the availability of SSR markers from the expressed portion of the genome
might facilitate their transferability across genera compared to the low efficiency of SSR
markers that have been retrieved from gene-poor areas (Peakall et al., 1998). This approach
could be used in plant species with minimal resources and research expenditure.

Once a plant species has been completely sequenced, the entire set of available SSRs in the
genome can be easily accessed through online databases. For example, the International Rice
Genome Sequencing Project identified 18,828 di-, tri- and tetra-nucleotide SSRs that were
over 20 bp in length and developed flanking primers for use as SSR markers (IRGSP, 2005).
The locations of these SSRs on the physical map of rice in relation to other genetic markers
can be found using the online Gramene Genome Browser
(http://www.gramene.org/Oryza_sativa_japonica/index.html).

The usual method of SSR genotyping is to separate radio-labelled or silver-stained PCR
products by denaturing or non-denaturing PAGE, or to use ethidium bromide or SYBR staining,
although distinguishing SSRs on agarose gels is sometimes possible (Fig. 2.4). These assays
can usually distinguish alleles which differ by 2–4 bp or more.

Semi-automated SSR genotyping can be carried out by assaying fluorescently labelled PCR
products for length variants on an automated DNA sequencer (e.g. Applied Biosystems and
Li-Cor) (Fig. 2.4). One drawback of fluorescent SSR genotyping is the cost of end-labelling
primers with the necessary fluorophores, e.g. 6-carboxy-fluorescein (FAM),
hexachloro-6-carboxy-fluorescein (HEX) or tetrachloro-6-carboxy-fluorescein (TET). SSR length
polymorphisms can also be assayed using non-denaturing high pressure liquid chromatography
(HPLC). SSR alleles differing by several repeat units can often be distinguished on agarose
gels (Fig. 2.4). SSRs assayed on polyacrylamide gels typically show characteristic
stuttering. Stutter bands are artefacts produced by DNA polymerase slippage. Typically, the
most prominent stutter bands are +1 and −1 repeats (e.g. +2 or −2 bp for a di-nucleotide
repeat), and, if visible, the next most prominent stutter bands are +2 and −2 repeats.
Stuttering reduces the resolution between alleles such that 2- or possibly 4-bp differences
between alleles cannot be sharply or unequivocally distinguished on polyacrylamide gels.
Figure 2.4 shows examples of different genotyping systems used for SSR analysis including
multiplexing and stutter bands.

[Fig. 2.4 panels: agarose gel-based SSR genotyping; PAGE gel-based SSR genotyping;
semi-automated SSR genotyping; automated SSR genotyping using fluorescent labelling (peaks
labelled 156 and 158); stutter bands and multiple alleles.]

Fig. 2.4. Examples of genotyping systems used for SSR analysis.

Another source of noise is the incomplete addition of non-templated adenine to PCR products,
thereby producing adenylated (+A) and non-adenylated (−A) DNA fragments (Magnuson et al.,
1996). Adding a pigtail sequence (e.g. GTTTCTT) to the 5' end of the reverse primer promotes
the adenylation of the 3' end of the forward strand (Brownstein et al., 1996), thereby
virtually eliminating the −A products and producing a more homogeneous set of fragments.

SSR markers are characterized by their hypervariability, reproducibility, co-dominant
nature, locus specificity and random dispersion throughout most genomes. In addition, SSRs
are reported to be more variable than RFLPs or RAPDs. The advantages of SSRs are that they
can be readily analysed by PCR and are easily detected on polyacrylamide gels. SSLPs with
large size differences can also be detected on agarose gels. SSR markers can be multiplexed,
either functionally by pooling independent PCR products or by true multiplex-PCR. Their
genotyping throughput is high and can be automated. In addition, start-up costs are low for
manual assay methods (once the markers have been developed) and SSR assays require only very
small DNA samples (100 ng per individual).

The disadvantages of SSRs are the labour-intensive development process particularly when
this involves screening genomic DNA
libraries enriched for one or more repeat motifs (although SSR-enriched libraries can be
commercially purchased) and the high start-up costs for automated methods.

SNP

A single nucleotide polymorphism or SNP (pronounced snip) is an individual nucleotide base
difference between two DNA sequences. SNPs can be categorized according to nucleotide
substitution as either transitions (C/T or G/A) or transversions (C/G, A/T, C/A or T/G). For
example, sequenced DNA fragments from two different individuals, AAGCCTA to AAGCTTA, contain
a single nucleotide difference. In this case there are two alleles: C and T. C/T transitions
constitute 67% of the SNPs observed in humans, and about the same rate was also found in
plants (Edwards et al., 2007a). In practice, single base variants in cDNA (mRNA) are
considered to be SNPs as are single base insertions and deletions (indels) in the genome. As
a nucleotide base is the smallest unit of inheritance, SNPs provide the ultimate form of
molecular marker.

For a variation to be considered a SNP, it must occur in at least 1% of the population. SNPs
make up about 90% of all human genetic variation and occur every 100–300 bases. Two of every
three SNPs involve the replacement of cytosine (C) with thymine (T). This is supported by a
genome-wide analysis in rice. A polymorphism database constructed to define polymorphisms
between cultivars Nipponbare (from subspecies japonica) and 93-11 (from subspecies indica)
contains 1,703,176 SNPs and 479,406 indels (Shen et al., 2004), which equates to
approximately 1 SNP/268 bp in the rice genome. Using alignments of the improved whole-genome
shotgun sequences for japonica and indica rice, SNP frequencies varied from 3 SNPs/kb in
coding sequences to 27.6 SNPs/kb in the transposable elements with a genome-wide measure of
15 SNPs/kb or 1 SNP/66 bp (Yu et al., 2005). Based on partial genomic sequence information,
SNP frequencies have been revealed in many crops, including barley, soybean, sugarbeet,
maize, cassava and potato; typical SNP frequencies are also in the range of one SNP every
100–300 bp in plants (see Edwards et al., 2007a for a review).

SNPs may fall within coding sequences of genes, non-coding regions of genes or in the
intergenic regions between genes at different frequencies in different chromosome regions.
In Arabidopsis the distribution of SNPs was found to be even across the five chromosomes
with the exception of centromeric regions which contain few transcribed genes (Schmid et
al., 2003). SNPs within a coding sequence will not necessarily change the amino acid
sequence of the protein that is produced due to redundancy in the genetic code. A SNP in
which both forms lead to the same polypeptide sequence is termed synonymous, while if a
different polypeptide sequence is produced they are non-synonymous. SNPs that are not in
protein coding regions may still have consequences for gene splicing, transcription factor
binding or the sequence of non-coding RNA. Of the 3–17 million SNPs found in the human
genome, 5% are expected to occur within genes. Therefore, each gene may be expected to
contain 6 SNPs.

A variety of approaches have been adopted for discovery of novel SNPs in a wide range of
organisms including plants. These fall into three general categories (Edwards et al.,
2007b): (i) in vitro discovery, where new sequence data is generated; (ii) in silico methods
that rely on the analysis of available sequence data; and (iii) indirect discovery, where
the base sequence of the polymorphism remains unknown. On the other hand, a large number of
different SNP genotyping methods and chemistries have been developed based on various
methods of allelic discrimination and detection platforms. A convenient method for detecting
SNPs is RFLP (SNP-RFLP) or by using the CAPS marker technique. If one allele contains a
recognition site for a restriction enzyme while the other does not, digestion of the two
alleles will give rise to fragments of different length. A simple procedure is to analyse
the sequence data stored in the
major databases and identify SNPs. Four alleles can be identified when the complete base
sequence of a segment of DNA is considered and these are represented by A, T, G and C at
each SNP locus in that segment.

Sobrino et al. (2005) assigned the majority of SNP genotyping assays to one of four groups
based on the molecular mechanisms: allele-specific hybridization, primer extension,
oligonucleotide ligation and invasive cleavage. These four are described below. Chagné et
al. (2007) added three methods to this list (sequencing, allele-specific PCR amplification
and DNA conformation methods) and also generalized the enzymatic cleavage method to include
the Invader assay as well as dCAPS and targeting induced local lesions in genomes (TILLING).

1. Allele-specific hybridization (ASH), also known as allelic-specific oligonucleotide
hybridization, is based on distinguishing by hybridization between two DNA targets differing
at one nucleotide position (Wallace et al., 1979). Allelic discrimination can be achieved
using two allele-specific probes labelled with a probe-specific fluorescent dye and a
generic quencher that reduces fluorescence in the intact probe. During amplification of the
sequence surrounding the SNP, probes complementary to the DNA target are cleaved by the 5'
exonuclease activity of Taq polymerase. Spatial separation of the dye and quencher results
in an increase in probe-specific fluorescence which can be detected with a plate reader.

Under optimized assay conditions, the SNP can be detected by the difference in Tm of the two
probe–template hybrids as only the perfectly matched probe–target hybrids are stable and
those with one-base mismatch are unstable. To increase the reliability of SNP genotyping the
probes should be as short as possible. Originally, ASH used the dot blot format in which
probes are hybridized to membrane-bound genomic DNA or PCR fragments. However, the more
advanced PCR-based dynamic allele-specific hybridization (DASH) method uses a microtitre
plate format (Howell et al., 1999). Since one of the PCR primers is bioti-

be bound to streptavidin-coated wells and denatured under alkaline conditions. An
oligonucleotide probe complementary to one allele is added to the single-strand target DNA
molecules. The differences in melting curves are measured by slowly heating and observing
the changes in fluorescence of a double-strand-specific, intercalating dye. The 5' nuclease
or TaqMan assay, molecular beacon and the scorpion assays are all examples of ASH SNP
genotyping technologies. Large-scale scanning of SNPs in a vast number of loci using
allele-specific hybridization can be carried out on high-density oligonucleotide chips.

2. The Invader assay, also known as flap endonuclease discrimination, is based on the
specificity of recognition and cleavage, by a flap endonuclease, of the three-dimensional
structure which is formed when two overlapping oligonucleotides hybridize perfectly to a
target DNA (Lyamichev et al., 1999). The cleaved fragment may be labelled with a
probe-specific fluorescent dye which fluoresces following probe cleavage due to spatial
separation from the quencher. Alternatively, the flap may act as the invader probe in a
secondary reaction to amplify the fluorescent signal (Invader squared) (Hall et al., 2000).
Third Wave Technologies Inc. (http://www.twt.com) has manufactured an Invader assay for flap
endonuclease discrimination which can be carried out in solid phase using
oligonucleotide-bound streptavidin-coated particles (Wilkins-Stevens et al., 2001).

3. Primer extension is a term used to describe mini-sequencing, single-base extension or the
GOOD assay (Sauer et al., 2002). A popular method which was designed specifically for
genotyping SNPs is the mini-sequencing technique (Syvänen, 1999; Syvänen et al., 1990). The
method forms the basis of a number of methods for allelic discrimination. The robust
detection of known mutations employs oligonucleotides which anneal immediately upstream of
the query SNP and are then extended by a single dideoxynucleotide triphosphate (ddNTP) in
cycle sequencing reactions. The fidelity of thermostable proof-reading DNA polymerases
guarantees that only the com-
nylated at the 5' end, the PCR products can plementary ddNTP is incorporated. Several
detection methods have been described for the discrimination of primer extension (PEX)
products. Most popular is the use of ddNTP terminators that are labelled with different
fluorescent dyes. The differentially dye-labelled PEX products can readily be detected on
charge coupled device camera-based DNA sequencing instruments.
In the case of a single base extension (SBE), a primer is annealed adjacent to a SNP and
extended to incorporate a ddNTP at the polymorphic site. SNaPshot (Applied Biosystems) uses
differential fluorescent labelling of the four ddNTPs in an SBE reaction, allowing
fluorescent detection of the incorporated nucleotide. SNP-IT (Orchid Biosciences) is also
based on fluorescent SBE and uses solid phase capture and detection of extension products.
The GOOD assay involves extension of a primer modified near the 3' end with a charged tag to
increase sensitivity to mass spectrometry detection.
Alternatives to SBE include pyrosequencing, allele-specific primer extension and the
amplification refractory mutation system. Real-time monitoring of PEX relies on the
bioluminometric detection of inorganic pyrophosphate released upon incorporation of dNTP
(Ahmadian et al., 2000).

4. The oligonucleotide ligation assay (OLA) for SNP typing is based on the ability of ligase
to covalently join two oligonucleotides when they hybridize next to one another on a DNA
template (Landegren et al., 1988). Both primers must have perfect base pair complementarity
at the ligation site, which makes it possible to discriminate two alleles at a SNP site. The
OLA has been modified to exploit a thermostable DNA ligase, interrogate PCR templates and
utilize a dual-colour detection system. OLA also gave rise to another technique, Padlock
probes (Nilsson et al., 1994), which uses oligonucleotide probes that ligate into circles
upon target recognition and isothermal rolling-circle amplification. As reviewed by Chagné
et al. (2007), there are several applications which have been developed to detect SNP
variation using OLA, including colorimetric assays in ELISA plates, separation of the ligated
oligonucleotides that have been labelled with a fluorescent dye on an automated sequencer and
rolling-circle amplification with one of the ligation probes bound to a microarray surface.

DETECTION SYSTEMS. There are several detection methods for analysing the products of each
type of allelic discrimination reaction: gel electrophoresis, fluorescence resonance energy
transfer (FRET), fluorescence polarization, arrays or chips, luminescence, mass spectrometry,
chromatography, etc. Fig. 2.5 summarizes the enzyme chemistry, demultiplexing and detection
options in SNP genotyping.
Fluorescence is the most widely applied detection method currently employed for
high-throughput genotyping in general. The use of fluorescence has been teamed with a number
of different detection systems including plate readers, capillary electrophoresis and DNA
arrays. In addition to fluorescence detection, mass spectrometry and light detection
represent novel applications of established technology for high-throughput genotyping of
SNPs.

PLATE READERS. There are many fluorescent plate readers capable of detecting fluorescence in
a 96- or 384-well format (Jenkins and Gibson, 2002). Most models use a light source and
narrow band-pass filters to select the excitation and emission wavelengths and enable
semi-quantitative steady state fluorescence intensity readings to be made. This technology
has been applied to genotyping with TaqMan, Invader and rolling-circle amplification.
Fluorescence plate readers are also available which allow measurement of additional
fluorescence parameters including polarization, lifetime and time-resolved fluorescence and
FRET.

DNA ARRAY. Oligonucleotide arrays bound to a solid support have been proposed as the future
detection platform for high-throughput genotyping. Two distinct approaches have been
adopted: ASH, whereby the oligonucleotide directly probes the target, and tag arrays that
capture solution phase reaction products via hybridization to their anti-tag sequences.
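
The tag-array principle, in which solution-phase products are captured by hybridization to their anti-tag sequences, can be illustrated with a minimal Python sketch. The tag sequences, locus names and function names are hypothetical and chosen only for illustration; real tag sets are designed to minimize cross-hybridization, which this sketch does not model.

```python
# Minimal sketch of tag/anti-tag sorting (hypothetical names and sequences).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_array(tags_by_locus):
    """Map each anti-tag (the sequence fixed on the array) to its locus.

    The anti-tag is taken to be the reverse complement of the tag that is
    carried by the solution-phase reaction product.
    """
    return {reverse_complement(tag): locus
            for locus, tag in tags_by_locus.items()}


def sort_products(array, products):
    """Assign each (tag, result) product to the locus whose anti-tag pairs
    with its tag; products carrying an unknown tag are ignored."""
    calls = {}
    for tag, result in products:
        locus = array.get(reverse_complement(tag))
        if locus is not None:
            calls[locus] = result
    return calls


if __name__ == "__main__":
    array = build_array({"snp1": "ACCTGAAG", "snp2": "TTGCACGA"})
    print(sort_products(array, [("TTGCACGA", "A/G"), ("ACCTGAAG", "C/C")]))
```

Because the same generic array can sort the products of any assay whose primers carry these tags, only the tagged primers, not the array, need to change between marker sets.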

[Figure 2.5: diagram linking four columns. Enzyme chemistry: allele-specific extend and
ligate; oligonucleotide ligation assay; single nucleotide primer extension; allele-specific
hybridization; allele-specific PCR. Demultiplexing: semi-homogeneous; solid phase
microspheres; homogeneous; capillary electrophoresis; solid phase microarray. Detection
method: mass spectrometry; fluorescence; fluorescence resonance energy transfer (FRET);
fluorescence polarization. Platform/company: Illumina BeadArray; Luminex 100 Flow Cytometry;
Sequenom iPlex Mass Spec.; ABI SNPlex; Microarray minisequencing; ABI TaqMan 5'-Nuclease;
ABI SNaPshot; DASH, Amplicon Tm; Perkin-Elmer FP-TDI.]

Fig. 2.5. Chemistry, demultiplexing, detection options in SNP genotyping. From Syvänen (2001)
reprinted by permission from Macmillan Publishers Ltd.

The Affymetrix Genome-Wide Human SNP Array 6.0 features more than 1.8 million markers for
genetic variation, including more than 906,600 SNPs and more than 946,000 probes for the
detection of copy number variation. The SNP Array 6.0 enables high-performance, high-powered
and low-cost genotyping (http://www.affymetrix.com). Luminex has developed a panel of 100
bead sets with unique fluorescent labels, identifiable by flow analyser. The bead sets can be
derivatized with allele-specific oligonucleotides to create a bead-based array for multiplex
genotyping by ASH.
Tag arrays are generic assemblies of oligonucleotides that are used to sort or deconvolute
mixtures of oligos by hybridization to the anti-tag sequences. The current Affymetrix
GeneChip Universal Tag Arrays are available in 3, 5, 10 or 25 K configurations and contain
novel, bioinformatically designed tag sequences that result in minimal potential for
cross-hybridization.

MASS SPECTROMETRY. Many genotyping techniques involve the allele-specific incorporation of
two alternative nucleotides into an oligonucleotide probe. Due to the inherent molecular
weight difference of DNA bases, mass spectrometry can be used to determine which variant
nucleotide has been incorporated by measuring the mass of the extended primers and this
approach has been applied primarily to genotyping by primer extension using the MALDI-TOF
(matrix assisted laser desorption/ionization-time of flight) mass spectrometry approach. The
MALDI-TOF method is particularly advantageous for detection of PEX products in multiplex.
The polyanionic nature of oligonucleotides results in low signal to noise ratios,
particularly for longer (> 40 mer) fragments. This has been addressed by specifically
cleaving long probes by acidolysis of P3'-N5' phosphoramidate bonds and by a combined
approach whereby the probe is digested to a very short fragment which has been derivatized to
lower its charge to a single positive or negative charge.
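
As a rough illustration of how the mass difference between bases identifies the incorporated nucleotide, the following Python sketch calls a genotype from the peak masses of single-base extension products. The mass increments are approximate average masses for unmodified dideoxynucleotide residues, and the primer mass, tolerance and function names are assumptions made for this example rather than values from any particular instrument or kit.

```python
# Approximate average mass (Da) added to a primer when one unmodified
# dideoxynucleotide is incorporated (ddNTP minus pyrophosphate).
# These values are assumptions for illustration only.
DDNMP_MASS = {"A": 297.21, "C": 273.18, "G": 313.21, "T": 288.20}


def expected_masses(primer_mass, alleles):
    """Expected mass of the extended primer for each candidate allele."""
    return {allele: primer_mass + DDNMP_MASS[allele] for allele in alleles}


def call_genotype(peaks, primer_mass, alleles, tolerance=2.0):
    """Return the alleles whose expected product mass matches an observed
    peak within `tolerance` Da, e.g. "C", "T" or "CT"; None if no match."""
    expected = expected_masses(primer_mass, alleles)
    found = sorted(
        allele for allele, mass in expected.items()
        if any(abs(peak - mass) <= tolerance for peak in peaks)
    )
    return "".join(found) if found else None


if __name__ == "__main__":
    # Hypothetical 6000 Da primer interrogating a C/T SNP.
    print(call_genotype([6273.0, 6288.4], 6000.0, "CT"))
```

With these increments the smallest difference between two alleles (A versus T) is about 9 Da, which is why the mass resolution of the instrument limits how long the extended primers can usefully be.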

LIGHT DETECTION. Pyrosequencing involves hybridization of a sequencing primer to a single
stranded template and sequential addition of individual dNTPs. Incorporation of a dNTP into a
primer releases pyrophosphate which triggers a luciferase-catalysed reaction. The genotype of
a SNP is determined by the sequential addition (and degradation) of nucleotides. The light
produced is detected by a charge coupled device camera and each light signal is proportional
to the number of nucleotides incorporated (http://www.pyrosequencing.com), for which reason
pyrosequencing is suitable for the quantitative estimation of allele frequencies in pooled
DNA samples. Furthermore, pyrosequencing proved to be an appropriate method for genotyping
SNPs in polyploid plant genomes such as potato because all possible allelic states of a
binary SNP could be accurately distinguished (Rickert et al., 2002).

There are various SNP detection systems which differ in their chemistry, detection platform,
multiplex level and application; some of these will be discussed below. The reader is also
referred to Bagge and Lübberstedt (2008) for further information.

The TaqMan SNP Genotyping Assay (Applied Biosystems, Foster City, USA) is a single-tube PCR
assay that exploits the 5' exonuclease activity of AmpliTaq Gold DNA polymerase. The assay
kit includes two locus-specific PCR primers that flank the SNP of interest and two
allele-specific oligonucleotide TaqMan probes. These probes have a fluorescent reporter dye
at the 5' end and a non-fluorescent quencher with a minor groove binder at the 3' end. Upon
cleavage by the 5' exonuclease activity of Taq polymerase during PCR, the reporter dye will
fluoresce as it is no longer quenched and the intensity of the emitted light can be measured.
Modified probes such as locked nucleic acids, a modified nucleic acid analogue, showed better
hybridization properties than standard TaqMan probes (Kennedy et al., 2006). TaqMan is a
simple assay, since all the reagents are added to the microtitre well at the same time in a
96- or 384-well format. Although the assay can be carried out at the monoplex or duplex
level, the multiple steps can be assembled and automated so that one laboratory technician
can produce 10,000 data points per day. The TaqMan platform is highly suitable for
genetically modified organism tests and MAS using a few markers for a large number of
samples.

The SNaPshot Multiplex Assay (Applied Biosystems, Foster City, USA) is based on
mini-sequencing, i.e. a single-base extension using fluorescently labelled ddNTPs. The
system's multiplex-ready reaction mix enables robust multiplex SNP interrogation of
PCR-generated templates. Multiplexing can be accomplished by representing multiple SNP
products spatially. This is achieved by tailing the 5' end of the unlabelled SNaPshot primers
with different lengths of non-complementary oligonucleotide sequences that serve as mobility
modifiers. The reactions may be carried out in 5- to 10-plex using capillary electrophoresis
for data detection in a 96-well format so that one individual can generate over 10,000 data
points per day. SNaPshot is suitable for MAS of several traits simultaneously and if multiple
sets of 10-plex are combined, it can be used for rough mapping and marker-assisted
backcrossing with several hundreds of samples and markers involved.

The SNPlex Genotyping System (Applied Biosystems, Foster City, USA) uses OLA/PCR technology
for allelic discrimination and ligation product amplification. Genotype information is then
encoded into a universal set of dye-labelled, mobility modified fragments known as Zipchute
Mobility Modifiers, for rapid detection by capillary electrophoresis. The same set of
Zipchute Mobility Modifiers can be used for every SNPlex pool regardless of which SNPs are
chosen. The SNPlex System allows for multiplexed genotyping of up to 48 SNPs simultaneously
against a single sample with the ability to detect up to 4500 SNPs in parallel in 15 min.
This integrated system delivers cost-efficient, medium- to high-throughput genotyping and is
suitable for various genetic and breeding applications including fingerprinting, gene mapping
and MAS for both foreground and
background. Both SNaPshot and SNPlex can be used with capillary electrophoresis systems as
the genotyping platform, which can also be used for SSR genotyping.

MassARRAY iPLEX Gold (SEQUENOM, San Diego, USA) combines the benefits of the simple and
robust single-base primer extension biochemistry with the sensitivity and accuracy of
MALDI-TOF mass spectrometry detection (see Chapter 3). It uses a single termination mix and
universal reaction conditions for all SNPs. The primer is extended, dependent upon the
template sequence, resulting in an allele-specific difference in mass between extension
products. The assays can be multiplexed up to 40 SNPs in a 384-well format allowing for
throughput levels of up to 150,000 genotypes per instrument per day. MassARRAY is flexible
and suitable for generating both small and large marker numbers for each sample so that it
can be used for a variety of genetic and breeding purposes.

There are two major chip-based high-throughput genotyping systems, DNA microarrays developed
by Affymetrix (Santa Clara, USA) and a high-density biochip assay by Illumina Inc. (San
Diego, USA), both of which offer different levels of multiplexing, up to several
thousand-plex or more (Yan et al., 2009). As an increasing number of sets of these chips
become available, outsourced genotyping through companies or service centres becomes one of
the options for genotyping large numbers of samples using the same set of markers (e.g.
fingerprinting) to achieve high efficiency and low cost per data point.

THE FUTURE OF SNP TECHNOLOGY. A key technical obstacle in the development of microarray-based
methods for genome-wide SNP genotyping is the PCR amplification step which is required to
reduce the complexity and improve the sensitivity of genotyping SNPs in large, diploid
genomes. The level of complexity that can be achieved in PCR does not match that of current
microarray-based methods thus making PCR the limiting step in these assays (Syvänen, 2005).
Highly multiplexed microarray systems have recently been developed by combining well-known
reaction principles for DNA amplification and SNP genotyping.
Identification of a specific single-base change among up to billions of bases that constitute
a plant species is a challenging task. PCR offers a means of reducing the complexity of a
genome and increasing the copy number of the DNA templates to the levels required for the
specific and sensitive detection of single-base changes. However, the design of robust PCR
assays with multiplexing levels exceeding 10–20 amplicons has proven to be more difficult
than initially anticipated because in multiplex PCR the number of undesired interactions
between the PCR primers increases exponentially as the number of primers included in the
reaction mixture increases. This interaction usually results in preferential amplification of
unwanted primer–dimer artefacts instead of the intended DNA templates (amplicons). Another
problem in multiplex PCR is the sequence-dependent differences in PCR efficiency between the
amplicons. The problems of multiplexing can be reduced to some extent by using PCR primers
that are as similar to one another as possible. The multiplexing level that can be readily
achieved in standard PCRs is less than that offered by current technology for producing
high-density DNA microarrays. Simultaneous analysis of a reasonable amount of genomic DNA
with the current detection sensitivity of microarray scanners requires an amplification step.
The PCR step complicates the molecular reactions underlying the assays and introduces
multiple laboratory steps into the procedures and is therefore the chief obstacle to highly
multiplexed SNP genotyping.

Diversity array technology

Diversity array technology (DArT) is a novel type of DNA marker which employs a microarray
hybridization-based technique developed by CAMBIA (http://www.diversityarrays.com) that
enables the simultaneous genotyping of several hundred polymorphic loci spread over the
genome (Jaccoud et al., 2001; Wenzl et al., 2004). DArT can be
used to construct medium-density genetic linkage maps in species of various genome sizes. Two
steps are involved: generating the array and genotyping the sample. For each sample,
representative genomic DNA is prepared by restriction enzyme digestion followed by ligation
of the restriction fragments to adaptors. The genome complexity is then reduced by PCR using
primers with complementary sequences to the adaptor and selective overhangs. Restriction
generated fragments representing the diversity of the gene pool are cloned. The outcome is
known as a representation (typically 0.1–10% of the genome). Polymorphic clones in the
library are identified by arraying inserts from a random set of clones. Cloned inserts are
amplified using vector-specific primers, purified and arrayed on to a solid support
(microarray) (Fig. 2.6A). To genotype a sample, the representation (DNA) of the sample is
fluorescently labelled and hybridized against the discovery array. The array is then scanned
and the hybridization signal is measured for each array spot. By using multiple labels, a
representation from one sample is contrasted with that from another or with a control probe
(Jaccoud et al., 2001; http://www.cambia.org; http://www.diversityarrays.com). Polymorphic
clones (DArT markers) show variable hybridization signal intensities for different
individuals. These clones are subsequently assembled into a genotyping array for routine
genotyping (Fig. 2.6B).

[Figure 2.6: flow diagrams. (A) DNAs of interest (Gx, Gy, ... Gn): pool genomes; use a
complexity reduction method, e.g. RE digest, adaptor ligation, PCR amplification; clone
fragments from the representation into a library; pick individual clones and PCR amplify;
purified PCR products are arrayed. (B) Choose two genomes to analyse (Gx, Gy); apply the same
complexity reduction as used to make the diversity panel (cut, ligate adaptors and PCR
amplify); label each genomic subset (one red, one green); hybridize to chip.]

Fig. 2.6. Procedure of diversity array technology (DArT). (A) Preparing the array. RE,
restriction enzyme. (B) Genotyping a sample.
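
How variable hybridization signals might be turned into present/absent scores for one clone can be sketched in Python as follows. This is a simplified illustration and not the algorithm used by DArTsoft: the log-ratio inputs, the `min_gap` threshold and the function name are assumptions introduced here.

```python
def score_clone(log_ratios, min_gap=1.0):
    """Score one array clone across individuals as present (1) or absent (0).

    `log_ratios` maps each individual to a log2(target/reference) signal.
    The values are split at the widest gap between sorted signals; if that
    gap is smaller than `min_gap` the clone is treated as non-polymorphic
    and None is returned.
    """
    ordered = sorted(log_ratios.values())
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    if not gaps:
        return None
    widest, index = max(gaps)
    if widest < min_gap:
        return None
    cutoff = (ordered[index] + ordered[index + 1]) / 2
    return {name: int(value > cutoff) for name, value in log_ratios.items()}


if __name__ == "__main__":
    # Hypothetical signals for three genotypes hybridized to the same clone.
    print(score_clone({"Gx": 2.1, "Gy": -1.8, "Gz": 1.9}))
```

Clones that return None under this rule would simply not be carried forward to the genotyping array, mirroring the selection of polymorphic clones described above.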

DArT markers are biallelic and behave in a dominant (present versus absent) or co-dominant
(two doses versus one dose versus absent) manner. DArT detects single-base changes as well as
indels. It is a good alternative to currently used techniques including RFLP, AFLP, SSR and
SNP in terms of cost and speed of marker discovery and analysis for whole-genome
fingerprinting. It is a cost-effective, sequence-independent, non-gel-based technology that
is amenable to high-throughput automation and the discovery of hundreds of high quality
markers in a single assay. An open source software package, DArTsoft, is available for
automatic data extraction and analysis. The weaknesses of this technology include marker
dominance and its technically demanding nature. Also there is some concern as to whether DArT
markers are randomly distributed across the whole genome, as DArT markers in barley appear to
have a moderate tendency to be located in hypomethylated, gene-rich regions in distal
chromosome areas (Wenzl et al., 2006).
DArT technology has been successfully developed for Arabidopsis, cassava, barley, rice,
wheat, sorghum, ryegrass, tomato and pigeon pea, while work is in progress to establish DArT
in chickpea, sugarcane, lupins, quinoa, banana and coconut (http://www.diversityarrays.com).
For example, a genetic map with 385 unique DArT markers spanning the 1137 cM barley genome
(Wenzl et al., 2004) was constructed; DArT markers along with AFLP and SSR markers were
mapped on the wheat genome (Semagn et al., 2006); and a cassava DArT genotyping array
containing approximately 1000 polymorphic clones (Xia, L. et al., 2005) is now available.

Genic and functional markers

DNA markers can be classified into random markers (RMs) (also known as anonymous or neutral
markers), gene targeted markers (GTMs) (also known as candidate gene markers) and functional
markers (FMs) (Andersen and Lübberstedt, 2003). RMs are derived at random from polymorphic
sites across the genome whereas GTMs are derived from polymorphisms within genes. FMs are
derived from polymorphic sites within genes that are causally associated with phenotypic
trait variation and are superior to RMs as a result of their complete linkage with trait
locus alleles and functional motifs (Andersen and Lübberstedt, 2003). The major drawback of
the RMs is that their predictive value depends on the known linkage phase between marker and
target locus alleles (Lübberstedt et al., 1998b).
Genetic diversity at or below the species level has mostly been characterized by molecular
markers that more or less randomly sampled genetic variation in the genome. RM is a very
effective tool among others for the establishment of a breeding system, the study of gene
flow among natural populations, and the determination of the genetic structure of genebank
collections (Chapter 5; Xu et al., 2005). RM systems are still the systems of choice for
marker-assisted breeding (Xu, Y., 2003). However, users of biodiversity are often not
interested in random variation but rather in variation that might affect the evolutionary
potential of a species or the performance of an individual genotype. Such functional
variation can be tagged with neutral molecular markers using quantitative trait loci (QTL)
and linkage disequilibrium mapping approaches. Alternatively, DNA-profiling techniques may be
used that specifically target genetic variation in functional parts of the genome.

GENIC MARKERS. A wealth of DNA sequence information from many fully characterized genes and
full-length cDNA clones has been generated and deposited in online databases for an
increasing number of plant species and the sequence data for ESTs, genes and cDNA clones can
be downloaded from GenBank and scanned for identification of SSRs. Subsequently,
locus-specific primers flanking EST- or genic SSRs can be designed to amplify the
microsatellite loci present in the genes. In maize, for example, SSR markers that have been
developed from genes, together with their primer sequences, are available at
www.maizeGDB.org. Genic SSRs have some intrinsic advantages over
genomic SSRs because they can be obtained quickly by electronic sorting, are present in
expressed regions of the genome and are expected to be transferable across species (when the
primers are designed from more conserved coding regions; Varshney et al., 2005a). The
potential use of EST-SSRs developed for barley and wheat has been demonstrated for
comparative mapping in wheat, rye and rice (Yu et al., 2004; Varshney et al., 2005a). These
studies suggested that EST-SSR markers could be used in related species for which little
information is available on SSRs or ESTs. In addition, the genic SSRs are good candidates for
the development of conserved orthologous markers for the genetics and breeding of different
species. For example, a set of 12 barley EST-SSRs was identified that showed significant
homology with the ESTs of four monocotyledonous species (wheat, maize, sorghum and rice) and
two dicotyledonous species (Arabidopsis and Medicago) which could potentially be used across
these species (Varshney et al., 2005a).
Kumpatla and Mukhopadhyay (2005) examined the abundance of SSRs in more than 1.54 million
ESTs belonging to 55 dicotyledonous species. They found that the frequency of ESTs containing
SSRs among species ranged from 2.65 to 16.82%, with dinucleotide repeats being most abundant
followed by tri- or mononucleotide repeats, thus demonstrating the potential of in silico
mining of ESTs for the rapid development of SSR markers for genetic analysis and application
to dicotyledonous crops. EST-SSRs produce high quality markers, but these are often less
polymorphic than genomic SSRs (Cho et al., 2000; Eujayl et al., 2002; Thiel et al., 2003).
EST resources are also being used to mine SNPs (Picoult-Newberg et al., 1999; Kota et al.,
2003). ESTs provide a quantitative method of measuring specific transcripts within a cDNA
library and represent a powerful tool for gene discovery, gene expression, gene mapping and
the generation of gene profiles. The National Center for Biotechnology Information (NCBI)
database, dbEST 0900409 (http://www.ncbi.nlm.nih.gov/dbEST_summary.html) contains the largest
collection of ESTs in rice, wheat, barley, maize, soybean, sorghum and potato.

Novel markers can be developed from the transcriptome and specific genes. As summarized by
Gupta and Rustgi (2004), these include EST polymorphisms (developed using EST databases);
conserved orthologue set markers (developed by comparing the sequences of target genomes with
sequences of the closely related species); amplified consensus genetic markers (based on the
known genes from model species); gene-specific tags (with primers designed using gene
sequences); resistance gene analogues (with primers designed to identify consensus domains
conferring resistance); exon–retrotransposon amplification polymorphism (with primers
designed to combine with a long terminal repeat retrotransposon-specific primer or a randomly
selected microsatellite-containing oligonucleotide); and PCR-based markers targeting exons,
introns and promoter regions of known genes with high specificity.
Target region amplification polymorphism (TRAP) markers are derived from a rapid and
efficient PCR-based technique which uses bioinformatics tools and EST database information to
generate polymorphic markers around targeted candidate gene sequences (Hu and Vick, 2003).
This TRAP technique uses two primers of 18 nucleotides to generate markers. TRAP markers are
amplified by one fixed primer designed from a target EST sequence in the database and a
second primer of arbitrary sequence except for AT- or GC-rich cores that anneal with introns
and exons, respectively. The TRAP technique should be useful in genotyping germplasm
collections and in tagging genes with beneficial traits in crop plants.

FUNCTIONAL MARKERS. Functional markers (FMs) are derived from polymorphic sites within genes
causally affecting phenotypic variation. The development of FMs requires allele-specific
sequences of functionally characterized genes from which polymorphic, functional motifs
affecting plant phenotype can be identified. Some theoretical and application issues relevant
to functional markers in wheat have been addressed (Bagge et al., 2007; Bagge and
Lübberstedt, 2008).
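
Returning briefly to the in silico mining of ESTs for SSRs described under genic markers, the idea can be sketched in a few lines of Python. The minimum repeat numbers, the restriction to mono-, di- and trinucleotide motifs and the function names are assumptions chosen for illustration; published surveys use their own thresholds and dedicated tools.

```python
import re

# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeat_count, start) for each perfect SSR in `seq`."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if unit_len > 1 and len(set(motif)) == 1:
                continue  # e.g. "AA" is a mononucleotide run, not a dinucleotide SSR
            hits.append((motif, len(match.group(0)) // unit_len, match.start()))
    return hits


def ssr_frequency(ests):
    """Percentage of sequences that contain at least one SSR."""
    if not ests:
        return 0.0
    return 100.0 * sum(1 for est in ests if find_ssrs(est)) / len(ests)


if __name__ == "__main__":
    est = "GGCT" + "AG" * 8 + "TTC" + "CAG" * 5 + "GT"  # hypothetical EST
    print(find_ssrs(est))
    print(ssr_frequency([est, "ACGTACGTACGT"]))
```

Once SSR positions are known, locus-specific primers can be designed in the flanking sequence, which is the step that turns a mined repeat into a usable marker.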

FM development requires allele sequences of functionally characterized genes from which
polymorphic, functional motifs affecting plant phenotype can be identified. In contrast to
RMs, FMs can be used as markers in populations without prior mapping, in mapped populations
without risk of information loss owing to recombination and to better represent the genetic
variation in natural or breeding populations. Once genetic effects have been assigned to
functional sequence motifs, FMs derived from such motifs can be used to fix gene alleles
(defined by one or several FM alleles) in several genetic backgrounds without additional
calibration. This would be a major advance in the application of markers, particularly in
plant breeding, for the selection of parental materials to produce segregating populations
for example, as well as the subsequent selection of inbred lines (Andersen and Lübberstedt,
2003). Depending on the mode of FM characterization, they can also be used for the
combination of target alleles in hybrid and synthetic breeding and cultivar testing based on
the presence or absence of specific alleles at morphological trait loci. In population
breeding and recurrent selection programmes, FMs can be used to avoid genetic drift at
characterized loci.
A typical example is Dwarf8 in maize which encodes a gibberellin response modulator from
which FMs can be developed for plant height and flowering time. For example, nine sequence
motifs in the Dwarf8 gene of maize were shown to be associated with variation in flowering
time and one particular 6-bp deletion accounted for 7–11 days difference in flowering time
between inbreds (Thornsberry et al., 2001). Since Dwarf8 is a pleiotropic gene (also
affecting plant height), FMs from additional flowering time genes should also be identified
in addition to using the Dwarf8-derived FM. Orthologues to Dwarf8 have been identified in
wheat (Rht1) (Peng et al., 1999), rice (SLR1) (Ikeda et al., 2001), and barley (sln1)
(Chandler et al., 2002), and such genes have been bred into the high-yielding wheat and rice
cultivars of the Green Revolution (Hedden, 2003). Altered function of alleles in these
orthologous genes can reduce the response to gibberellin and consequently lead to decreased
plant height. Thus, biallelic (gibberellin sensitive and insensitive) FMs can be derived for
targeted and rapid cultivar breeding aimed at increasing lodging tolerance.

In this section, several widely used DNA markers have been discussed along with an overview
of classical genetic markers. DNA markers have gained wide acceptance because of their
genome-wide coverage and increasingly simple and easy genotyping. It can be expected that SNP
markers, as the ultimate form of genetic polymorphism, will largely replace other types of
markers when whole DNA sequences become available for an increasing number of plant species
(e.g. Lu et al., 2009; Xu et al., 2009b). However, the choice of DNA markers in genetics and
breeding is still highly dependent on the access that geneticists and breeders have to
various genetic resources, including the availability of DNA markers and the time and cost
involved. Table 2.2 compares the five most widely used DNA markers.

2.2 Molecular Maps

The order and relative distance of genetic features that are associated with genetic
variation or polymorphisms can be determined by genetic mapping. Genetic maps constructed
using molecular markers can also be used to locate major genes which can then also be used as
genetic markers.

2.2.1 Chromosome theory and linkage

During meiosis the parental diploid (2n) cell divides to produce four haploid (n) gametes.
During the first meiotic division, the homologous chromosomes align and stick together in a
process called synapsis. This allows spindle fibres to attach to the synapsed homologues
(tetrads) and to move them as a group to the equator of the cell. As anaphase begins, the
homologues can then be oriented such that they are pulled apart to opposite poles of the
cell. Following
Table 2.2. Comparison of the five widely used DNA markers in plants.

| Feature | RFLP | RAPD | AFLP | SSR | SNP |
|---|---|---|---|---|---|
| Genomic coverage | Low copy coding region | Whole genome | Whole genome | Whole genome | Whole genome |
| Amount of DNA required | 50–10 µg | 1–100 ng | 1–100 ng | 50–120 ng | 50 ng |
| Quality of DNA required | High | Low | High | Medium high | High |
| Type of polymorphism | Single base changes, indels | Single base changes, indels | Single base changes, indels | Changes in length of repeats | Single base changes, indels |
| Level of polymorphism | Medium | High | High | High | High |
| Effective multiplex ratio | Low | Medium | High | High | Medium to high |
| Inheritance | Co-dominant | Dominant | Dominant/co-dominant | Co-dominant | Co-dominant |
| Type of probes/primers | Low copy DNA or cDNA clones | Usually 10 bp random nucleotides | Specific sequence | Specific sequence | Allele-specific PCR primers |
| Technically demanding | High | Low | Medium | Low | High |
| Radioactive detection | Usually yes | No | Usually yes | Usually no | No |
| Reproducibility | High | Low to medium | High | High | High |
| Time demanding | High | Low | Medium | Low | Low |
| Automation | Low | Medium | High | High | High |
| Development/start-up cost | High | Low | Medium | High | High |
| Proprietary rights required | No | Yes and licensed | Yes and licensed | Yes and some licensed | Yes and some licensed |
| Suitable utility in diversity, genetics and breeding | Genetics | Diversity | Diversity and genetics | All purposes | All purposes |

telophase and cytokinesis, two new daughter cells are formed. Each of these daughter cells has half the chromosomes (n) of the parental cell (2n). The second meiotic division closely resembles mitosis with each of the nuclei generated during the first meiotic division splitting to form two more nuclei. Thus, four haploid gametes are produced.
Crossing over is the process by which homologous chromosomes exchange portions of their chromatids during meiosis, resulting in new combinations of genetic information and thus affecting inheritance and increasing genetic diversity. Genes that are present together on the same chromosome tend to be inherited together and are referred to as linked. Genes that are normally linked may be inherited independently during crossing over.
The proportion of recombinant gametes depends on the rate of crossover during meiosis and is known as the recombination frequency (r). The maximum proportion of recombinant gametes is 50% and in this case crossover between two genetic loci has occurred in all the cells. This is equivalent to the case of non-linked genes, i.e. the two loci are inherited independently.
The recombination frequency depends on the rate of crossovers which in turn depends on the linear distance between two genetic loci. Recombination frequencies range from 0 (complete linkage) to 0.5 (complete independent inheritance).

2.2.2 Genetic linkage mapping

In order to utilize the genetic information provided by molecular markers more efficiently, it is important to know the locations and relative positions of molecular markers on chromosomes. The construction of genetic linkage maps using molecular markers is based on the same principles as those used in the preparation of classical genetic maps: selection of molecular markers and genotyping system; selection of parental lines from the germplasm collection that are highly polymorphic at marker loci; development of a population or its derived lines with an increased number of molecular markers in the segregated population; genotyping each individual/line using molecular markers; and constructing linkage maps from the marker data.
The recombination frequency between two linked genetic markers can be defined in units of genetic distance known as centiMorgans (cM) or map units. If two markers are found to be separated in one of 100 progeny, those two markers are 1 cM apart. However, 1 cM does not always correspond to the same length of physical distance or the same amount of DNA. The amount of DNA per cM is referred to as the physical to genetic distance. Areas in the genome where recombination is frequent are known as recombination hot spots; there is relatively little DNA per cM in these hot spots and it can be as low as 200 kb/cM. In other areas recombination may be suppressed and 1 cM will represent more DNA and in some regions the physical to genetic distance can be up to 1500 kb/cM.

Developing mapping populations

In population development, several factors should be taken into consideration including the selection of parental lines and population types and the determination of population size.

CHOICE OF PARENTAL LINES. Four factors should be considered in selecting appropriate parental lines (Xu and Zhu, 1994):
1. DNA polymorphism: genetic polymorphism between parental lines usually depends on how closely related they are, which can be determined by criteria such as geographical distribution and morphological and isozyme polymorphisms. In general, DNA polymorphism is greater in open-pollinated species than in self-pollinated species. For example, RFLP polymorphism is very high among maize lines so that a population derived from any two inbred lines would be desirable for RFLP mapping. Genetic polymorphism is very low in tomato so that only interspecific populations are sufficiently polymorphic to allow for RFLP

mapping. The level of polymorphism in rice is intermediate. In plant breeding, many novel traits have been transferred from wild species to cultivated species and such wild-cultivated crosses usually show high levels of DNA polymorphism. Several mapping populations may be needed because genetic polymorphisms that cannot be found in one population may be identified in another.
2. Purity: in the case of self-pollinated plants, the parental lines to be used for development of mapping populations should be true-breeding, i.e. homozygous at almost all of the genetic loci. Purification through further inbreeding may be needed before hybridization is carried out. True-breeding inbred lines can be used as parents in cross-pollinated plants. For plants for which true-breeding is not possible, genetic mapping can be based on the populations derived from two heterogeneous parental lines.
3. Fertility: hybrid fertility determines whether a large segregating population can be obtained. Distant crosses are usually accompanied by abnormal chromosome pairing and recombination, segregation distortion and reduced recombination frequencies. Some distant hybrids may be partially or completely sterile so that it becomes difficult to obtain a segregating population. In this case, backcrossing populations can be used for mapping as partially sterile hybrids can be rescued by backcrossing to one of the parents.
4. Cytological features: cytological examination may be necessary in order to exclude individuals containing translocations and polyploid species containing monosomes or partial chromosomes from being used as mapping parents.

CHOICE OF POPULATION TYPES. There are many types of populations that can be used for genetic mapping. Figure 2.7 shows the relationship between populations derived from two or multiple parental lines. Some of these populations are discussed in detail in Chapter 4. Their use in genetic mapping is discussed below.

[Figure 2.7: flow diagram relating the parents (P1, P2, P3, P4, ...) to the populations F1, F2, F3, BC1, TC, TTC, DH, DH-TC, IM, F2-IM, BIL/NIL, RIL and RIL-TC.]
Fig. 2.7. Examples of mapping populations and their relationship. Modified from Xu and Zhu (1994). AC, anther culture; BC, backcross population; BIL, backcross inbred line; DH, doubled haploid; IM, intermating; NIL, near-isogenic line; RIL, recombinant inbred line; TC, testcross; TTC, triple testcross.

F2 populations are used most frequently in linkage mapping because they are easy to develop. At each marker locus, however, 50% of the individuals in an F2 population are heterozygous. For dominant markers, dominant homozygotes cannot be distinguished from the heterozygotes and the accuracy of mapping is therefore reduced. In order to improve the accuracy, more F2 individuals will be needed unless co-dominant markers can be used. Another disadvantage of F2 populations is that their genetic constitution will change during sexual reproduction so that their genetic structure is difficult to maintain. Vegetative reproduction is one method of prolonging the life of a population as exemplified by ratooning in some grass species. Tissue culture (see Chapters 4 and 12) is another method that can be used to regenerate a population without changing its constitution. Using bulked DNA from F3 families, which are derived from F2 individuals, is an alternative approach to prolonging the life of a population because in some crops such as rice and maize, one F2 plant produces a large number of seeds which are sufficient for multiple plantings. By random mating within each F3 family, an F3 population can also be maintained.
Backcross populations (e.g. BC1) are also frequently used in genetic mapping. BC1 populations have only two genotypes at each marker locus which represent the corresponding gametes produced in the F1 hybrid, an advantage over F2 populations.

If reciprocal BC1 populations, A × (A × B) and (A × B) × A, are obtained from a cross by using the F1 hybrid as male and female parents, respectively, the recombination frequencies of male and female gametes can be compared: the former indicates the recombination frequency of male gametes while the latter indicates that for female gametes. Like F2 populations, the genetic constitution of BC populations will change with selfing and they need to be conserved in the same way as F2 populations. For many crop species, false hybrids may pose a problem which contributes to the inaccuracy of genetic mapping. When distant crosses are used, however, backcross populations are the only populations that can be developed because of high sterility among the F1 hybrids.
Permanent populations such as doubled haploids (DHs), recombinant inbred lines (RILs) and backcross inbred lines (BILs), which are fully discussed in Xu and Zhu (1994) and Chapter 4, provide a continuous supply of genetic material leading to the accumulation of genetic information produced in different laboratories and experiments. For major crops, there are many permanent populations available that are shared internationally with the continuous accumulation of genetic marker and phenotypic data. During population development careful attention should be paid to selection factors that could affect the segregation patterns (Xu et al., 1997; Chapter 4). In some cases, distorted segregation could become very severe if selection pressure is high.

POPULATION SIZE. Achieving the maximum resolution and accuracy from genetic maps largely depends on the size of the mapping population: the larger the mapping population, the greater the accuracy of the genetic map. The research objectives dictate the size of the population. For example, the construction of marker maps requires a much smaller population than the fine mapping of QTL (Chapters 6 and 7). Construction of a high density marker map can be achieved with as few as 200 plants but fine mapping a population in order to clone a gene usually requires over 1000 plants. One alternative is to produce a relatively large population containing about 500 or more individuals from which a subset (n ≈ 150) can then be used for the construction of a framework map as the initial step in genetic mapping. When fine mapping of a specific chromosome region is required, all the individuals in the population can be used.
With regard to the mapping power, the population size required depends on the maximum map distance that can be distinguished from random assortment and the minimum map distance at which recombination can be detected between two genetic markers (Fig. 2.8). Using a large mapping population, it is possible to map very small genetic distances between markers and also to identify weak genetic linkages. For example, one recombinant represents a 1% recombinant frequency (≈ 1 cM) for a population of 100 individuals, a 2% recombinant frequency (≈ 2 cM) for a population of 50 individuals but only a 0.1% recombination frequency (≈ 0.1 cM) for a population of 1000 individuals.
The maximum map distance that can be distinguished, max, can be determined as follows

$$\text{max} = r + t_{0.01,\,n-2}\, SE_r < 0.50\ \text{cM}$$

where n is the population size, t is Student's t parameter for a significance probability of 0.01, n − 2 is the degrees of freedom, $SE_r$ is the standard error of r and r is a point estimate of recombination frequency.
The population size required also depends on the type of population. For example, more individuals from F2 populations are required compared with BC or DH populations, because the F2 population contains more marker genotypes and to guarantee detection of each genotype, a greater number of individuals is required. In general, F2 population size should be doubled compared to BC in order to obtain the same mapping accuracy. Therefore, BC or DH populations are more suitable than F2 for genetic marker mapping. The mapping power of RIL populations is in between that of F2 and BC (DH) populations. Maximum detectable map distance and minimum resolvable map distance for F2 and BC populations are shown in Fig. 2.9.
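The population-size arithmetic in the example above (one recombinant among n individuals) can be sketched in a few lines. This is only an illustration: the function name `min_resolvable_percent` is hypothetical, and the sketch assumes one informative gamete per individual, so that a single recombinant among n individuals corresponds to a recombination frequency of 1/n (roughly that many cM at short distances).

```python
def min_resolvable_percent(n_individuals: int) -> float:
    """Recombination frequency (in %) represented by a single recombinant
    observed among n_individuals; at short distances 1% is about 1 cM."""
    if n_individuals <= 0:
        raise ValueError("population size must be positive")
    return 100.0 / n_individuals


# Population sizes used in the text's example.
for n in (50, 100, 1000):
    print(f"n = {n:4d}: one recombinant = {min_resolvable_percent(n):.1f}%")
```

Larger populations therefore resolve smaller distances, which is why fine mapping calls for many more plants than framework mapping.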

[Figure 2.8: maximum and average distance between markers (cM, 0–50) plotted against the number of markers (0–2400).]
Fig. 2.8. Average and maximum distance expected between markers on a linkage map depending on number of random markers mapped for a genome with 1200 cM, e.g. 12 chromosomes of 100 cM each. The maximum distance curve is for 95% confidence level. From Tanksley et al. (1988) with kind permission of Springer Science and Business Media.

[Figure 2.9: map distance (cM, 0–50) plotted against population size (0–140), with maximum detectable and minimum resolvable curves for F2 and BC populations.]
Fig. 2.9. Maximum detectable and minimum resolvable map distances between markers utilizing backcross (BC) and F2 populations. Curves are for 99% confidence level. From Tanksley et al. (1988) with kind permission of Springer Science and Business Media.

Interference and mapping functions

As the genetic distance between two markers increases, the chance of double crossing over within a marker interval increases. For three linked genes, A, B and C, with $r_1$ and $r_2$ as single crossover frequencies between A-B and B-C, the double crossover frequency between A and C can be estimated as $r_1 \times r_2$ if the two single crossovers occur independently. However, the observed frequency of double crossovers is usually lower than that expected by calculating $r_1 \times r_2$, which means that a single crossover occurring in a particular chromosome region will reduce the probability of a second single crossover occurring in its flanking regions. This phenomenon is called crossover interference. The degree of interference can be measured by the coefficient of coincidence (C),

$$C = \frac{\text{Observed double crossover}}{\text{Expected double crossover}} = \frac{\text{Observed double crossover}}{r_1 r_2 n}$$

where n is the total number of individuals observed (including both recombinants and non-recombinants). When C = 0, there is complete interference and no double crossovers; this usually means that the involved chromosome region is very short. When C = 1, there is no interference, indicating that the involved chromosome region is long so that the single crossovers can occur independently.
The genetic distance estimated from the recombinant frequency will be smaller than the real distance by $2C r_1 r_2$ if the double

crossover is not taken into account. When the genetic distance between two markers is relatively large, the adverse effect of double or multiple crossovers on the estimation of recombination frequency should be corrected. The correction can help establish a reliable function between genetic distance and recombinant frequency and this correction function is known as a mapping function.
The number of crossovers (k) in an interval defined by two genetic markers has a Poisson distribution with mean θ, and recombination is observed when k is odd:

$$\Pr(\text{recombination}) = \sum_{k\ \text{odd}} \frac{\theta^{k} e^{-\theta}}{k!} = e^{-\theta}\left(\frac{\theta}{1!} + \frac{\theta^{3}}{3!} + \dots + \frac{\theta^{k}}{k!}\right) = \frac{e^{-\theta}\left(e^{\theta} - e^{-\theta}\right)}{2} = \frac{1}{2}\left(1 - e^{-2\theta}\right)$$

This probability is represented by r and has the following limits: 0 ≤ r ≤ 1/2. θ is the number of map units (M) between two markers. Assuming that C = 1, Haldane (1919) derived the relationship between the map distance (cM) and recombinant frequency r by solving the equation for θ:

$$\theta = -\frac{1}{2}\ln(1 - 2r)$$

which is known as Haldane's map function. When r = 0, θ = 0 (complete linkage). When r = 1/2, θ = ∞ (markers are unlinked), suggesting that the markers are either on the same chromosome but distant from one another or are located on different chromosomes. When r = 22%, θ = 29 cM.
Kosambi (1944) derived a mapping function that takes the crossover interference into account:

$$\theta = \frac{1}{4}\ln\frac{1 + 2r}{1 - 2r}$$

When r = 0.22, θ = 23.6 cM. As two loci become further apart, the amount of interference allowed by the Kosambi map function decreases. For very small values of recombination (r), both Haldane and Kosambi map functions give θ ≈ r.

Segregation and linkage tests

With co-dominance and complete dominance models, populations F2, BC and DH (RIL) have the segregation ratios at locus M with two alleles M1 and M2 shown in the table below.
Assuming two genetic loci M and N each with two alleles, M1, M2 and N1, N2 and a recombinant frequency r, genotypes and frequencies in an F2 population derived from two parental lines, P1 (M1M1N1N1) and P2 (M2M2N2N2), will be as shown in Fig. 2.10.
There are three types of locus combinations between two loci M and N, depending on the dominance: (1:2:1)-(1:2:1), (3:1)-(1:2:1) and (3:1)-(3:1).
By combining the genotypes listed in Fig. 2.10, nine genotypes and their frequencies can be obtained. Similarly, we can obtain genotypes and their frequencies for (3:1)-(1:2:1) and (3:1)-(3:1) linkage combinations (Fig. 2.11).
Linkage can be determined by comparing the observed frequency for each genotype with the theoretical frequency expected from Mendelian ratios. If there are n individuals, the genotypes/phenotypes listed in Fig. 2.11 can then be identified from top to bottom as n1 to n9 for (1:2:1)-(1:2:1), n1 to n6 for (3:1)-(1:2:1) and n1 to n4 for (3:1)-(3:1); linkage can be determined from these observations.
Linkage detection depends on the normal segregation of the genetic loci involved, thus each locus should be tested to ensure

| Population | F2 | BC | DH (RIL) |
|---|---|---|---|
| Co-dominance | 1 M1M1 : 2 M1M2 : 1 M2M2 | 1 M1M2 : 1 M2M2 | 1 M1M1 : 1 M2M2 |
| M1 is dominant | 3 M1_ : 1 M2M2 | 1 M1M2 : 1 M2M2 | 1 M1M1 : 1 M2M2 |
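The Haldane and Kosambi mapping functions introduced earlier in this section can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, distances are returned in cM (the formulas give Morgans, so results are multiplied by 100), and r is restricted to 0 ≤ r < 0.5 because both functions diverge at r = 0.5.

```python
import math


def haldane_cM(r: float) -> float:
    """Haldane map distance in cM (no interference, C = 1)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)  # 100 * (-1/2) * ln(1 - 2r)


def kosambi_cM(r: float) -> float:
    """Kosambi map distance in cM (allows for crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))  # 100 * 1/4 * ln(...)


def haldane_r(distance_cM: float) -> float:
    """Inverse of Haldane's function: recombination frequency from cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cM / 100.0))


r = 0.22
print(f"Haldane: {haldane_cM(r):.1f} cM, Kosambi: {kosambi_cM(r):.1f} cM")
```

For r = 0.22 this gives about 29 cM (Haldane) and 23.6 cM (Kosambi), the values quoted in the text; for small r both functions return approximately 100r cM.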

| F2 gamete (frequency) | M1N1, (1 − r)/2 | M1N2, r/2 | M2N1, r/2 | M2N2, (1 − r)/2 |
|---|---|---|---|---|
| M1N1, (1 − r)/2 | M1M1N1N1, (1 − r)²/4 | M1M1N1N2, r(1 − r)/4 | M1M2N1N1, r(1 − r)/4 | M1M2N1N2, (1 − r)²/4 |
| M1N2, r/2 | M1M1N1N2, r(1 − r)/4 | M1M1N2N2, r²/4 | M1M2N1N2, r²/4 | M1M2N2N2, r(1 − r)/4 |
| M2N1, r/2 | M1M2N1N1, r(1 − r)/4 | M1M2N1N2, r²/4 | M2M2N1N1, r²/4 | M2M2N1N2, r(1 − r)/4 |
| M2N2, (1 − r)/2 | M1M2N1N2, (1 − r)²/4 | M1M2N2N2, r(1 − r)/4 | M2M2N1N2, r(1 − r)/4 | M2M2N2N2, (1 − r)²/4 |

Fig. 2.10. Theoretical ratios in an F2 population derived from two parents M1M1N1N1 and M2M2N2N2 with recombinant frequency r.

(1:2:1)-(1:2:1)

| Genotype | Frequency |
|---|---|
| M1M1N1N1 | (1 − r)² |
| M1M1N1N2 | 2r(1 − r) |
| M1M1N2N2 | r² |
| M1M2N1N1 | 2r(1 − r) |
| M1M2N1N2 | 2(1 − 2r + 2r²) |
| M1M2N2N2 | 2r(1 − r) |
| M2M2N1N1 | r² |
| M2M2N1N2 | 2r(1 − r) |
| M2M2N2N2 | (1 − r)² |

(1:2:1)-(3:1)

| Genotype | Frequency |
|---|---|
| M1M1N1_ | 1 − r² |
| M1M1N2N2 | r² |
| M1M2N1_ | 2(1 − r + r²) |
| M1M2N2N2 | 2r(1 − r) |
| M2M2N1_ | 2r − r² |
| M2M2N2N2 | (1 − r)² |

(3:1)-(3:1)

| Genotype | Frequency |
|---|---|
| M1_N1_ | 3 − 2r + r² |
| M1_N2N2 | 2r − r² |
| M2M2N1_ | 2r − r² |
| M2M2N2N2 | 1 − 2r + r² |

Fig. 2.11. Genotypes and their frequencies for three linkage combinations at two loci in F2 populations (each frequency divided by 4).
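The frequencies in Figs 2.10 and 2.11 follow from combining the four F1 gamete types at random. A minimal sketch of that bookkeeping is given below; the function name `f2_genotype_freqs` and the string keys such as "M1M2N1N2" are illustrative choices, and the sketch assumes coupling-phase parents M1M1N1N1 and M2M2N2N2 with the same recombination frequency in male and female gametes.

```python
from collections import defaultdict
from itertools import product


def f2_genotype_freqs(r: float) -> dict:
    """Expected two-locus F2 genotype frequencies for recombination frequency r."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie between 0 and 0.5")
    # F1 gametes: parental types (1 - r)/2 each, recombinant types r/2 each.
    gametes = {
        ("M1", "N1"): (1 - r) / 2,
        ("M1", "N2"): r / 2,
        ("M2", "N1"): r / 2,
        ("M2", "N2"): (1 - r) / 2,
    }
    freqs = defaultdict(float)
    # Combine every female gamete with every male gamete (the 4 x 4 grid of Fig. 2.10).
    for (g1, p1), (g2, p2) in product(gametes.items(), repeat=2):
        m_locus = "".join(sorted((g1[0], g2[0])))  # e.g. "M1M2"
        n_locus = "".join(sorted((g1[1], g2[1])))  # e.g. "N1N2"
        freqs[m_locus + n_locus] += p1 * p2
    return dict(freqs)


freqs = f2_genotype_freqs(0.1)
for genotype in sorted(freqs):
    print(genotype, round(freqs[genotype], 4))
```

Summing the cells that share a phenotype (for example all genotypes carrying at least one M1 and one N1) gives the dominant-marker frequencies in the right-hand columns of Fig. 2.11.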

that it fits Mendelian segregation. For each of the three linkage combinations listed above, four χ² tests can be constructed:

$\chi^2_T$: general test
$\chi^2_M$: test to determine whether the segregation of M1 and M2 fits the Mendelian ratio
$\chi^2_N$: test to determine whether the segregation of N1 and N2 fits the Mendelian ratio
$\chi^2_L$: test to determine whether M and N loci are linked

Therefore

$$\chi^2_T = \chi^2_M + \chi^2_N + \chi^2_L$$

For linkage combination (1:2:1)-(1:2:1)

$$\chi^2_T = \frac{4}{n}\left\{n_5^2 + 2\left(n_2^2 + n_4^2 + n_6^2 + n_8^2\right) + 4\left(n_1^2 + n_3^2 + n_7^2 + n_9^2\right)\right\} - n \qquad df_T = 8$$

$$\chi^2_M = \frac{2}{n}\left\{2(n_1 + n_2 + n_3)^2 + (n_4 + n_5 + n_6)^2 + 2(n_7 + n_8 + n_9)^2\right\} - n \qquad df_M = 2$$

$$\chi^2_N = \frac{2}{n}\left\{2(n_1 + n_4 + n_7)^2 + (n_2 + n_5 + n_8)^2 + 2(n_3 + n_6 + n_9)^2\right\} - n \qquad df_N = 2$$

$$\chi^2_L = \chi^2_T - \chi^2_M - \chi^2_N \qquad df_L = 4$$

For linkage combination (3:1)-(1:2:1)

$$\chi^2_T = \frac{8}{3n}\left(2n_1^2 + n_3^2 + 2n_5^2 + 6n_2^2 + 3n_4^2 + 6n_6^2\right) - n \qquad df_T = 5$$

$$\chi^2_M = \frac{4}{3n}\left((n_1 + n_3 + n_5)^2 + 3(n_2 + n_4 + n_6)^2\right) - n \qquad df_M = 1$$

$$\chi^2_N = \frac{2}{n}\left(2(n_1 + n_2)^2 + (n_3 + n_4)^2 + 2(n_5 + n_6)^2\right) - n \qquad df_N = 2$$

$$\chi^2_L = \chi^2_T - \chi^2_M - \chi^2_N \qquad df_L = 2$$

For linkage combination (3:1)-(3:1)

$$\chi^2_M = \frac{1}{3n}\left(n_1 + n_2 - 3n_3 - 3n_4\right)^2 \qquad df_M = 1$$

$$\chi^2_N = \frac{1}{3n}\left(n_1 - 3n_2 + n_3 - 3n_4\right)^2 \qquad df_N = 1$$

$$\chi^2_L = \frac{1}{9n}\left(n_1 - 3n_2 - 3n_3 + 9n_4\right)^2 \qquad df_L = 1$$

$$\chi^2_T = \chi^2_M + \chi^2_N + \chi^2_L \qquad df_T = 3$$

Similarly, three linkage combinations for BC or DH (RIL) populations can be constructed.
For example, linkage for (1:2:1)-(1:2:1) in an F2 population as shown in Fig. 2.12 can be tested as follows

$$\chi^2_T = \frac{4}{132}\left\{56^2 + 2\left(6^2 + 5^2 + 4^2 + 3^2\right) + 4\left(27^2 + 1^2 + 0^2 + 30^2\right)\right\} - 132 = 165.818$$

$$\chi^2_M = \frac{2}{132}\left\{2(27 + 6 + 1)^2 + (5 + 56 + 4)^2 + 2(0 + 3 + 30)^2\right\} - 132 = 0.045$$

$$\chi^2_N = \frac{2}{132}\left\{2(27 + 5 + 0)^2 + (6 + 56 + 3)^2 + 2(1 + 4 + 30)^2\right\} - 132 = 0.167$$

$$\chi^2_L = 165.818 - 0.045 - 0.167 = 165.606$$

| | M1M1 | M1M2 | M2M2 | Subtotal |
|---|---|---|---|---|
| N1N1 | 27 | 5 | 0 | 32 |
| N1N2 | 6 | 56 | 3 | 65 |
| N2N2 | 1 | 4 | 30 | 35 |
| Subtotal | 34 | 65 | 33 | 132 |

Fig. 2.12. Data example used for test of linkage for (1:2:1)-(1:2:1) in an F2 population.

We have

$$\chi^2_T > \chi^2_{0.05(8)} = 15.5$$
$$\chi^2_M < \chi^2_{0.05(2)} = 5.99$$
$$\chi^2_N < \chi^2_{0.05(2)} = 5.99$$
$$\chi^2_L > \chi^2_{0.05(4)} = 9.49$$

which indicates that both loci M and N show normal Mendelian segregation and are linked.

Maximum likelihood estimation (MLE) of recombinant frequency

To simplify, we take the linkage combination (3:1)-(3:1) (one of the alleles at each locus shows complete dominance) as an example to show how to obtain the MLE for recombination frequency. From Fig. 2.11, there are four types of phenotypes, M1_N1_, M1_N2N2, M2M2N1_ and M2M2N2N2, with theoretical frequencies $p_i$ (i = 1, 2, 3, 4). $p_i$ is a function of r, a parameter to be estimated, and f is a function of frequency:

$$p_i = f(r)$$

We have $p_1$ (M1_N1_) = $(3 - 2r + r^2)/4$, $p_2$ (M1_N2N2) = $p_3$ (M2M2N1_) = $(2r - r^2)/4$, $p_4$ (M2M2N2N2) = $(1 - 2r + r^2)/4$, and $\sum p_i = 1$.
Considering the number of individuals observed for each category, $n_1$, $n_2$, $n_3$ and $n_4$, with $\sum n_i = n$, they have a probability distribution of $(p_1 + p_2 + p_3 + p_4)^n$. For a specific set of observations ($n_1$, $n_2$, $n_3$ and $n_4$), the likelihood function is:

$$L(r) = \frac{n!}{n_1!\,n_2!\,n_3!\,n_4!}\,(p_1)^{n_1}(p_2)^{n_2}(p_3)^{n_3}(p_4)^{n_4} = \frac{n!}{n_1!\,n_2!\,n_3!\,n_4!}\,(1/4)^n\,(3 - 2r + r^2)^{n_1}(2r - r^2)^{n_2 + n_3}(1 - 2r + r^2)^{n_4}$$

The MLE of r is the value that maximizes L(r), which can be obtained by setting the derivative to zero and solving the resulting equation

$$\frac{dL(r)}{dr} = 0$$

The natural logarithm of L(r) is called support or log-likelihood. Here we have

$$\ln L(r) = C + n_1 \ln(3 - 2r + r^2) + (n_2 + n_3)\ln(2r - r^2) + n_4 \ln(1 - 2r + r^2)$$

where

$$C = \ln\frac{n!}{n_1!\,n_2!\,n_3!\,n_4!} + n \ln(1/4)$$

is a constant.
The first derivative is the slope of a function, and the slope is zero at a maximum (global or local) or a minimum. The derivative with respect to r is therefore set to zero:

$$\frac{d \ln L(r)}{dr} = 0$$

The derivative of ln L(r) is usually denoted as the score, S:

$$S = -n_1\frac{2(1 - r)}{3 - 2r + r^2} + (n_2 + n_3)\frac{2(1 - r)}{2r - r^2} - n_4\frac{2(1 - r)}{1 - 2r + r^2} = 0$$

That is

$$-\frac{n_1}{3 - 2r + r^2} + \frac{n_2 + n_3}{2r - r^2} - \frac{n_4}{1 - 2r + r^2} = 0$$

$$-\frac{n_1}{2 + (1 - r)^2} + \frac{n_2 + n_3}{1 - (1 - r)^2} - \frac{n_4}{(1 - r)^2} = 0$$

If $(1 - r)^2 = k$, then

$$-\frac{n_1}{2 + k} + \frac{n_2 + n_3}{1 - k} - \frac{n_4}{k} = 0$$

therefore

$$nk^2 + (2n - 3n_1 - n_4)k - 2n_4 = 0 \qquad (n = n_1 + n_2 + n_3 + n_4)$$

$$k = \frac{-(2n - 3n_1 - n_4) + \sqrt{(2n - 3n_1 - n_4)^2 + 8nn_4}}{2n}$$

and the MLE is

$$\hat{r} = 1 - \sqrt{k}$$

According to the Cramér-Rao inequality, the sampling variance of $\hat{r}$, $V_r$, is given by

$$\frac{1}{V_r} = -E\left\{\frac{d^2[\ln L(r)]}{dr^2}\right\} = I$$

where $\frac{d^2[\ln L(r)]}{dr^2}$ is the second derivative of ln L(r) with respect to r, E is expectation, and

$$\frac{d^2[\ln L(r)]}{dr^2} = -\sum_{i}^{k}\frac{n_i}{p_i^2}\left(\frac{dp_i}{dr}\right)^2 + \sum_{i}^{k}\frac{n_i}{p_i}\frac{d^2 p_i}{dr^2}$$

$$E\left\{\frac{d^2[\ln L(r)]}{dr^2}\right\} = -n\sum_{i}^{k}\frac{1}{p_i}\left(\frac{dp_i}{dr}\right)^2 + n\sum_{i}^{k}\frac{d^2 p_i}{dr^2} = -n\sum_{i}^{k}\frac{1}{p_i}\left(\frac{dp_i}{dr}\right)^2$$

Because

$$\sum_{i}^{k}\frac{d^2 p_i}{dr^2} = \frac{d^2}{dr^2}\sum_{i}^{k} p_i = 0,$$

$$\frac{1}{V_r} = n\sum_{i}^{k}\frac{1}{p_i}\left(\frac{dp_i}{dr}\right)^2 = n\sum_{i}^{k} i_i = I$$

where I is the total information content and $i = \sum_i i_i = I/n$ is the information derived from a single observation.
From the above formula, the variance of r can be calculated using the information provided in Table 2.3.
To estimate k, the values of $n_i$ listed in the table are used in the formula:

$$k = \frac{1927 + \sqrt{1927^2 + 8 \times 6952 \times 1338}}{2 \times 6952} = 0.7743$$

$$\hat{r} = 1 - \sqrt{0.7743} = 0.1201$$

$$V_r = 1.76702 \times 10^{-5}$$

Thus,

$$\hat{r} = 0.1201 \pm \sqrt{1.76702 \times 10^{-5}} = 12.01\% \pm 0.42\%$$
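The closed-form solution above can be evaluated directly. The following sketch is illustrative rather than part of the original derivation: the function name `mle_r_dominant_f2` is hypothetical, and it assumes the four phenotype counts are ordered as M1_N1_, M1_N2N2, M2M2N1_, M2M2N2N2 for a coupling-phase F2 with complete dominance at both loci.

```python
import math


def mle_r_dominant_f2(n1: int, n2: int, n3: int, n4: int):
    """Return (r_hat, k) for the (3:1)-(3:1) case, where k = (1 - r)^2 is the
    positive root of n*k^2 + (2n - 3*n1 - n4)*k - 2*n4 = 0."""
    n = n1 + n2 + n3 + n4
    b = 2 * n - 3 * n1 - n4
    k = (-b + math.sqrt(b * b + 8 * n * n4)) / (2 * n)
    return 1.0 - math.sqrt(k), k


# Counts used in the worked example (n = 6952).
r_hat, k = mle_r_dominant_f2(4831, 390, 393, 1338)
print(f"k = {k:.4f}, r = {r_hat:.4f}")
```

With the example counts this returns k of about 0.7743 and a recombination frequency of about 0.12, in line with the values derived in the text.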

Table 2.3. Calculation of the variance of recombinant frequency for two linked loci each with complete dominance.

| Group | n_i | p_i | dp_i/dr | i_i = (1/p_i)(dp_i/dr)² |
|---|---|---|---|---|
| M1_N1_ | 4831 | (3 − 2r + r²)/4 | −2(1 − r)/4 | i_1 = (1 − r)²/(3 − 2r + r²) |
| M1_N2N2 | 390 | (2r − r²)/4 | 2(1 − r)/4 | i_2 = (1 − r)²/(2r − r²) |
| M2M2N1_ | 393 | (2r − r²)/4 | 2(1 − r)/4 | i_3 = (1 − r)²/(2r − r²) |
| M2M2N2N2 | 1338 | (1 − r)²/4 | −2(1 − r)/4 | i_4 = 4(1 − r)²/[4(1 − r)²] = 1 |
| Total | 6952 = n | 1 | 0 | Σ i_i = (1 − r)²/(3 − 2r + r²) + 2(1 − r)²/(2r − r²) + 1 |
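The information column of Table 2.3 can be turned into a numerical variance. The sketch below is a hedged illustration: `variance_r` is a hypothetical helper that sums the four per-observation information terms from the table and inverts the total information, and small differences from the printed value are expected because the result depends on how many decimals of r are carried.

```python
import math


def variance_r(r: float, n: int) -> float:
    """Sampling variance of the estimated recombination frequency for the
    (3:1)-(3:1) F2 case: V_r = 1 / (n * sum of per-observation information)."""
    k = (1.0 - r) ** 2
    info_per_obs = k / (2.0 + k) + 2.0 * k / (1.0 - k) + 1.0  # i1 + (i2 + i3) + i4
    return 1.0 / (n * info_per_obs)


v = variance_r(0.1201, 6952)
print(f"V_r = {v:.3e}, SE = {100 * math.sqrt(v):.2f}%")
```

For r = 0.1201 and n = 6952 the variance is close to 1.77 × 10⁻⁵ and the standard error is about 0.42%.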

This is an example of (3:1)-(3:1) linkage combination. Allard (1956) derived formulas for r and $V_r$ for almost all possible linkage combinations and for different populations.

Likelihood ratio and linkage test

In human genetics the linkage phase (repulsion or coupling) is usually unknown, thus making it impossible to calculate recombinant frequency based on the observable recombinants. As a result, likelihood ratios or odds ratios (Fisher, 1935; Haldane and Smith, 1947; Morton, 1955) have been used for linkage testing. The method is based on comparing the probability of the observed data under one hypothesis, for example two linked loci, with that under the alternative hypothesis, two independent loci. The ratio of the two probabilities L(r)/L(1/2) is tested as follows: r = 1/2 is entered into the likelihood function (equation (a)).

$$L(1/2) = \frac{n!}{n_1!\,n_2!\,n_3!\,n_4!}\,(1/4)^n\,(2.25)^{n_1}(0.75)^{n_2 + n_3}(0.25)^{n_4} \qquad \text{(a)}$$

To simplify the calculation, the log base 10 of the ratio L(r)/L(1/2), known as LOD, is used

$$\text{LOD} = \log_{10}\frac{L(r)}{L(1/2)}$$

With n = 6952, n1 = 4831, n2 = 390, n3 = 393 and n4 = 1338, logarithm of odds (LOD) scores can be calculated for different r values as shown in (b).

| r | 0.05 | 0.10 | 0.12 | 0.15 | 0.20 | 0.25 | 0.30 |
|---|---|---|---|---|---|---|---|
| LOD | 586.42 | 682.51 | 688.04 | 678.52 | 632.01 | 560.79 | 472.54 |

(b)

The result indicates that LOD scores vary with r and reach the maximum when r = 0.12.
If M and N are linked, L(r)/L(1/2) > 1, and thus LOD is positive. When L(r)/L(1/2) < 1, LOD is negative.
In human genetics the likelihood ratio should be greater than 1000:1, i.e. LOD > 3, in order to establish linkage unequivocally. The concept of the likelihood ratio is now widely used in genetic mapping of other organisms including plant species to judge the reliability of linkage estimation and to verify its existence.
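The LOD values in (b) can be reproduced from the likelihood ratio, since the multinomial coefficient and the (1/4)^n factor cancel between L(r) and L(1/2). The sketch below is illustrative; the function name `lod_dominant_f2` and the grid of r values are assumptions made for the example.

```python
import math


def lod_dominant_f2(r: float, n1: int, n2: int, n3: int, n4: int) -> float:
    """LOD = log10[L(r) / L(1/2)] for the (3:1)-(3:1) F2 case."""
    return (
        n1 * math.log10((3 - 2 * r + r * r) / 2.25)
        + (n2 + n3) * math.log10((2 * r - r * r) / 0.75)
        + n4 * math.log10((1 - r) ** 2 / 0.25)
    )


counts = (4831, 390, 393, 1338)
for r in (0.05, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30):
    print(f"r = {r:.2f}  LOD = {lod_dominant_f2(r, *counts):.2f}")

# Scan a grid of r values (in steps of 0.01) for the highest LOD.
best = max(range(1, 50), key=lambda i: lod_dominant_f2(i / 100, *counts))
print("LOD is highest near r =", best / 100)
```

On this grid the highest LOD is found at r = 0.12, matching the maximum likelihood estimate obtained earlier.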

Multi-point analysis and ordering a set of markers

The methods discussed above are all based on two-point analysis using two markers at a time. However, when more than two markers from one chromosome are considered, they can theoretically be arranged in many different orders but only one particular order will match the genetic order on the chromosome and this particular order can be determined by multi-point analysis.
Consider m genetic markers, M1, M2, . . . , Mm, indexed by their real locations on a chromosome. For m genetic markers, there are a total of m!/2 possible orders. Assume the recombinant frequency between two flanking markers, $M_i$ and $M_{i+1}$, is $r_i$. The objective is to find $r_1, r_2, \dots, r_{m-1}$ to maximize the likelihood L(r),

$$L(r) \propto p_1(r_1, r_2, \dots, r_{m-1})^{n_1}\, p_2(r_1, r_2, \dots, r_{m-1})^{n_2} \cdots p_m(r_1, r_2, \dots, r_{m-1})^{n_m}$$

Using the natural logarithm, the partial derivative is then set with respect to $r_1, r_2, \dots, r_{m-1}$. The EM algorithm (Dempster et al., 1977) can be used to obtain the MLE for $r_1, r_2, \dots, r_{m-1}$, which involves multiple iteration steps of Expectation (E) and Maximization (M). The multiple steps include: (i) providing an initial set of estimates, $r^{old} = (r_1, r_2, \dots, r_{m-1})$; (ii) using the initial estimates as the estimates of recombinant frequencies to obtain the E, i.e. the expected numbers of recombinants and non-recombinants in each marker interval; (iii) using these expected values as true values to obtain the MLE for $r^{new} = (r_1, r_2, \dots, r_{m-1})$; (iv) repeating steps (ii) and (iii) until the MLE has converged to its maximum.
Lander and Green (1987) provided an example of the EM method for multi-point linkage analysis. Using 15 marker intervals on human chromosome 7 determined by 16 markers and initial recombinant frequencies of $r_i$ = 0.05, the log-likelihood was found to be −351.45. To reduce the difference of log-likelihoods between two consecutive iterations to less than a given critical value (tolerance value, T = 0.01), 12 iterations were needed, which resulted in convergence at log-likelihood −303.28. The probability of the observed data at the converged iteration is $10^{-303.28 - (-351.45)} \approx 10^{48}$ times higher than that for the initial $r_i$ = 0.05.

Linkage mapping in the presence of genotyping errors

As generating marker data is time consuming and expensive, maximum use should be made of the information generated. Without accounting for genotyping errors, each error in a non-terminal marker causes two apparent recombinations in the dataset. Thus every 1% error rate in a marker adds 2 cM of inflated distance to the map. If there is an average of one marker every 2 cM, then an average of a 1% error rate will double the size of the map. There will be large distances between adjacent markers with very high error rates. These cases can be detected, either manually or automatically, and the markers removed. Such genotyping errors can be identified by simply sorting the marker data by a given linkage order to determine whether there are a large number of crossovers involved.
For the markers with low error levels that cannot be detected easily, the best strategy is to integrate error detection with the map-building procedure. Cartwright et al. (2007) extended the traditional likelihood model used for genetic mapping to include the possibility of genotyping errors. Each individual marker is assigned an error rate which is inferred from the data, as are the genetic distances. A software package, TMAP, was developed to use this model to identify maximum-likelihood maps for phase-known pedigrees. The methods were tested using a data set in Vitis and a simulated data set, which confirmed that the method dramatically reduced the inflationary effect caused by increasing the number of markers and resulted in more accurate orders.

Molecular maps in plants

Table 2.4 lists some representative molecular maps that have been developed for major crop plants including legumes, cereals and clonal crops, which vary in marker density,
Table 2.4. Representative genetic maps in plants.

| Crop | Marker and mapping population | Map information | Reference |
|---|---|---|---|
| Azuki bean | SSR, RFLP, AFLP; 187 BC1F1 (JP81481 × Vigna nepalensis) | 486 markers mapped into 11 linkage groups spanning 832.1 cM with an average marker distance of 1.85 cM, 95% genome coverage | Han et al. (2005) |
| Barley | AFLP, SSR, STS and vrs1; 95 RILs (Russia 6 × H.E.S. 4) | 1172 markers with a total distance of 1595.7 cM, and average marker density of 1.4 cM per locus | Hori et al. (2003) |
| Barley | SNP, SSR, RFLP, AFLP; three DH populations | 1237 markers, based on three mapping populations consisted of 1237 loci, with a total map length of 1211 cM and an average marker density of 1 locus per cM | Rostoks et al. (2005) |
| Lettuce | AFLP, RFLP, SSR, RAPD; seven inter- and intraspecific populations | 2744 markers assigned to nine linkage groups that spanned a total of 1505 cM. The mean interval between markers is 0.7 cM | Truco et al. (2007) |
| Maize | SSR markers; one intermated RIL (IBM) and two immortalized F2s | The IBM map: 748 SSR and 184 RFLP markers with a total map length of 4906 cM; two immortalized F2 maps: 457 and 288 SSR markers with total map length of 1830 and 1716, respectively | Sharopova et al. (2002) |
| Maize | cDNA probes; two RIL populations: IBM (B73 × Mo17) and LHRF (F2 × F252) | Framework maps: 237 and 271 loci in IBM and LHRF populations, that both maps contain 1454 loci (1056 on IBM_Gnp2004 and 398 on LHRF-Gnp2004) corresponding to 954 cDNA probes | Falque et al. (2005) |
| Oat | RFLP, AFLP, RAPD, STS, SSR, isozyme, morphological; 136 F6:7 RIL (Ogle × TAM O-301) | 426 loci (with 243 loci each) spanning 2049 cM of the oat genome | Portyanko et al. (2001) |
| Pearl millet | RFLP and SSR; four populations | A consensus genetic map: 353 RFLP and 65 SSR markers, marker density in four maps ranged from 1.49 cM to 5.8 cM | Qi et al. (2004) |
| Potato | AFLP markers; heterozygous diploid potato | > 10,000 AFLP loci, with marker density proportional to physical distance and independent of recombination frequency | van Os et al. (2006) |
| Rice | 726 markers; 113 BC1 (BS125 × WL02) × BS125 | 726 markers with a total distance of 1491 cM and average marker density of 4.0 cM on the framework map, and 2.0 cM overall | Causse et al. (1994) |
| Rice | 2275 markers; 186 (Nipponbare × Kasalath) F2 | 2275 markers with a total distance of 1521.6 cM, and average marker density of 0.67 cM per locus | Harushima et al. (1998) |
| Sorghum | 2590 PCR-based markers and 137 RIL (BTx623 × IS3620C) | The 1713 cM map encompassed 2926 loci | Menz et al. (2002) |
| Sorghum | RFLP probes; 65 F2 (Sorghum bicolor × Sorghum propinquum) | The S. bicolor × S. propinquum map is composed of 2512 loci, spanning 1059.2 cM, a marker per 0.4 cM | Bowers et al. (2003a) |
Sweet potato AFLP; (Tanzania Bikilamaliya) 632 (Tanzania) and 435 (Bikilamaliya) AFLP markers, with Kriegner et al. (2003)
F2 population a total of 3655.6 cM and 3011.5 cM, and a marker per 5.8 cM
and 6.9 cM, respectively
Wheat SSR and DArT markers; 152 RILs from a 14 linkage groups, 690 loci (197 SSR and 493 DArT markers), Peleg et al. (2008)
cross between durum wheat and wild spanning 2317 cM, a marker per 7.5 cM
emmer wheat
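The marker densities quoted in Table 2.4 can be checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes density is computed as total map length divided by the number of loci. Some published figures are instead computed per marker interval or per framework marker, so not every row of the table will reproduce exactly.

```python
def average_spacing_cm(map_length_cm: float, n_loci: int) -> float:
    """Average map distance per locus (cM per locus).

    Assumption for illustration: density = total map length / number of loci.
    """
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return map_length_cm / n_loci


# (total map length in cM, number of loci) as reported in Table 2.4
maps = {
    "Barley (Hori et al., 2003)": (1595.7, 1172),
    "Rice (Harushima et al., 1998)": (1521.6, 2275),
    "Sorghum (Bowers et al., 2003a)": (1059.2, 2512),
}

for name, (length_cm, loci) in maps.items():
    print(f"{name}: {average_spacing_cm(length_cm, loci):.2f} cM per locus")
```

For these three rows the computed values agree with the rounded densities listed in the table (about 1.4, 0.67 and 0.4 cM per locus).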
and genomic coverage. For example, crops such as barley, maize, potato, rice, sorghum and wheat have high-density genetic maps while cassava, Musa, oat, pearl millet, sweet potato and yam have less saturated maps. The large variation in map length results from differences in the number of chromosomes and total size of the genomes as well as from the use of different numbers of markers (increasing the number of markers will generally give a larger total map length up to a certain threshold), the inclusion of skewed markers (that tend to exaggerate map distances) and the use of different mapping software (which varies in its estimates of genetic distances). In addition, many published maps report more linkage groups than the basic chromosome number of that species. This is frequently the result of insufficient marker density as most saturated maps can be directly aligned with the basic chromosome complement (Tekeoglu et al., 2002).
The sophistication of molecular map construction has developed from the RFLP maps of the 1980s to PCR-based markers of the 1990s to more integrated maps, as a result of the use of different types of molecular markers including genic markers, over the past decade. Linkage maps have been used in gene mapping for major genes and QTL (Chapters 6 and 7), MAS (Chapters 8 and 9) and map-based gene cloning (Chapter 11).

2.2.3 Integration of genetic maps

Integration of conventional and molecular maps

During the period 1980 to 1990 molecular maps were developed for many plant species. The first generation of molecular maps has been integrated with conventional genetic maps constructed using morphological and isozyme markers through cytological markers and markers shared by different maps. The 12 molecular linkage groups in rice (McCouch et al., 1988) were assigned to classical linkage groups using trisomics for each of the 12 rice chromosomes. Shared markers and those which segregate in the population can be integrated with the molecular linkage map by using the same population for both conventional and molecular markers. As only very few morphological markers can segregate simultaneously in one population, integration of many of these markers requires multiple populations each with an available preliminary molecular map. If a complete linkage map for morphological markers is available, the positions of these markers relative to molecular markers can be inferred from the linkage relationship revealed by both morphological and molecular markers. In addition, morphological markers, including some traits of agronomic importance, can be mapped much more precisely if they are integrated with a dense molecular map and this has now become an integral step in trait and gene mapping. Integration of conventional and molecular maps has been very successful for crop plants for which relatively complete genetic linkage maps are available as a result of the use of morphological markers.
Some representative examples of such maps include rice, maize, tomato and soybean. In rice, 39 morphological markers and 82 RFLP markers were mapped together based on the segregation analysis of 19 F2 populations derived from the crosses between indica cultivar IR24 and japonica lines with different morphological markers (Ideta et al., 1996). In tomato, a number of morphological and isozyme markers were mapped with respect to RFLP markers by orienting the molecular linkage map to both morphological and cytological maps. An integrated high-density RFLP-AFLP map of tomato based on two independent Lycopersicon esculentum × Lycopersicon pennellii F2 populations was constructed (Haanstra et al., 1999), which spanned 1482 cM and contained 67 RFLP and 1175 AFLP markers. Integrated maps were also developed for maize (Neuffer et al., 1997; Lee et al., 2002) and soybean (Cregan et al., 1999).

Integration of multiple molecular maps

For many crop plants, several molecular maps have been constructed using different populations. These populations are of
variable size and structure and maps have been created using different numbers and types of markers. To build an integrated reference or consensus map, the order and genetic distance between specific markers is compared across populations and maps. Stam (1993) developed a computer program, JOINMAP, for the construction of genetic linkage maps for several types of mapping populations: BC1, F2, RILs, DHs and outbreeder full-sib family. JOINMAP can be used to combine (join) data derived from several sources into an integrated map.
For each crop all the molecular maps developed from different populations will finally be integrated into a consensus map. This process has been very successful for several major crops and it can be expected that it will be extended to all crops when sufficient maps become available. In wheat, an SSR consensus map was constructed by fusing several genetic maps to maximize the integration of genetic mapping information from different sources (Somers et al., 2004). In cotton, chromosome identities were assigned to 15 linkage groups in the RFLP joinmap developed from four intraspecific cotton (Gossypium hirsutum L.) populations with different genetic backgrounds (Ulloa et al., 2005). In maize, two populations of intermated RILs (IRILs) were used to build a consensus map; the first panel (IBM) was derived from B73 × Mo17 and the second panel (LHRF) from F2 × F252. Framework maps of 237 loci were built from the IBM panel and 271 loci from the LHRF panel. Both maps were used to locate 1454 loci (1056 on map IBM_Gnp2004 and 398 on map LHRF_Gnp2004) that corresponded to 954 previously unmapped cDNA probes (Falque et al., 2005). In barley, Wenzl et al. (2006) built a high-density consensus linkage map from the combined data sets of ten populations, most of which were simultaneously typed with DArT and SSR, RFLP and/or STS markers. The map comprised 2935 loci (2085 DArT, 850 other loci), spanned 1161 cM and contained a total of 1629 bins (unique loci). The arrangement of loci was very similar to, and almost as optimal as, the arrangement of loci in component maps created for individual populations.

Integration of genetic and physical maps

Integrated genetic and physical genome maps are extremely valuable for map-based gene isolation, comparative genome analysis and as sources of sequence-ready clones for genome sequencing projects. A well-defined correlation between the physical and genetic maps will greatly facilitate molecular breeding efforts through associating candidate genes with important biological or agronomic traits, positional cloning and comparative analysis across populations and species, and whole genome sequences, which will in turn facilitate the development of various molecular breeding tools.
Various methods have been developed for assembling physical maps of complex genomes and integrating them with genetic maps. To create an integrated genetic and physical map resource for maize, a comprehensive approach was used that included three core components (Cone et al., 2002). The first was a high-resolution genetic map that provided essential genetic anchor points for ordering the physical map and for utilizing comparative information from other smaller genome plants. The second, the physical map component, consisted of contigs (sets of overlapping fingerprint clones) assembled from clones from three deep-coverage genomic libraries. The third core component was a set of informatics tools designed to analyse, search and display the mapping data. In rice, most of the genome (90.6%) was anchored genetically by overgo hybridization, DNA gel blot hybridization and in silico anchoring (Chen et al., 2002). In wheat, the genetic-physical map relationship of microsatellite markers was established using the deletion bin system (Sourdille et al., 2004). In sorghum, Klein et al. (2000) developed a high-throughput PCR-based method for building bacterial artificial chromosome (BAC) contigs and locating BAC clones on the genetic map in order to construct an integrated genetic and physical map. It was found that 30% of the overlapping BACs aligned by AFLP analysis provided information for merging contigs and singletons that could not
be joined using fingerprint data alone. In the grasses Lolium perenne and Festuca pratensis, the physical map was integrated, using genomic in situ hybridization, with a genetic map composed of 104 F. pratensis-specific AFLPs. The integrated map demonstrated the large-scale analysis of the physical distribution of AFLPs and variation in the relationship between genetic and physical distance from one part of the F. pratensis chromosome to another (King et al., 2002).
An integrated genetic and physical mapping tool has been developed by the Maize Mapping Project, Columbia, Missouri, USA (http://www.maizemap.org/iMapDB/iMap.html). Contigs that were assembled by fingerprinting and the automated matching of BACs were then anchored on to IBM2 and IBM2 neighbour maps. In the Gramene database, a web-based tool, CMAP, was developed to allow users to view comparisons of genetic and physical maps (Ware et al., 2002). In addition, an integrated bioinformatic tool, the Comparative Map and Trait Viewer (CMTV), was developed to construct consensus maps and compare QTL and functional genomics data across genomes and experiments (Sawkins et al., 2004). All these tools can be used to build integrated maps based on shared markers and a reference map to initiate the process. The integration of genetic, cytological and physical maps is illustrated in the example shown in Fig. 3.6.
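The idea of building an integrated map from shared markers and a reference map can be illustrated with a minimal sketch. It is not the algorithm used by JOINMAP, CMAP or CMTV; it simply assumes that markers common to both maps act as anchors and that positions of markers unique to the second map can be placed on the reference scale by linear interpolation between flanking anchors. The function name and the toy marker positions are hypothetical.

```python
from bisect import bisect_right


def project_onto_reference(reference, other):
    """Place markers unique to `other` on the scale of `reference`.

    Both arguments map marker name -> position (cM) on one linkage group.
    Shared markers serve as anchors; positions between anchors are linearly
    interpolated, and positions beyond the outermost anchors are extrapolated
    from the nearest anchor interval.
    """
    anchors = sorted((other[m], reference[m]) for m in other if m in reference)
    if len(anchors) < 2:
        raise ValueError("need at least two shared markers as anchors")
    xs = [x for x, _ in anchors]  # anchor positions on the other map
    ys = [y for _, y in anchors]  # anchor positions on the reference map
    if any(b <= a for a, b in zip(xs, xs[1:])) or any(b < a for a, b in zip(ys, ys[1:])):
        raise ValueError("shared markers are not in the same order on both maps")

    projected = {}
    for marker, pos in other.items():
        if marker in reference:
            continue
        # Index of the anchor interval containing pos, clamped to the ends.
        i = min(max(bisect_right(xs, pos) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        projected[marker] = ys[i] + slope * (pos - xs[i])
    return projected


# Toy example: A, B and C are shared; X and Y occur only on the second map.
reference_map = {"A": 0.0, "B": 10.0, "C": 30.0}
second_map = {"A": 0.0, "X": 4.0, "B": 8.0, "Y": 14.0, "C": 20.0}
print(project_onto_reference(reference_map, second_map))
```

A real consensus map would also have to resolve conflicting marker orders and weight populations by size; the sketch simply rejects maps whose shared markers are not collinear.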
3
Molecular Breeding Tools:
Omics and Arrays

The success of molecular breeding depends upon the various tools that can be used for the efficient manipulation of genetic variation. All kinds of omics, arrays and high-throughput technologies make it possible to carry out more large-scale genetic analyses and breeding experiments than ever before. These technologies have been incorporated into many novel genetic and breeding processes, some of which were described in Chapter 2. In this chapter, microarrays, high-throughput technologies and several aspects of genomics will be briefly discussed to provide some of the fundamental knowledge required for molecular breeding.

3.1 Molecular Techniques in Omics

Developments in molecular techniques have contributed to the various fields of omics, which include genomics, transcriptomics, proteomics, metabolomics and phenomics. These underlying developments include advanced gel, hybridization and expression systems, cell imaging by light and electron microscopy, high density microarrays and array experiments, and genetic readout experiments.
Using proteomics as an example, classical techniques used in proteomics involve the use of two-dimensional gel electrophoresis (2DE). The proteins can be identified by excising the spot from the gel, digesting the polypeptide into smaller peptide fragments with specific proteases, and sequencing the peptides directly or analysing them by mass spectrometry (MS). Although this method is still useful and widely used, it is limited in sensitivity, resolution, and the range of abundance of the different proteins in the sample (Zhu et al., 2003; Baginsky and Gruissem, 2004). For example, abundant proteins in the sample dominate the gel whereas less abundant proteins might not be visible. New approaches involve both improved separation methods and advanced detection equipment, and several other new technologies are available for use in proteomic research (Kersten et al., 2002; Zhu et al., 2003; De Hoog and Mann, 2004). New detection methods and proteomic technologies are also being developed in an array format, which is increasingly being focused on protein-protein interactions, post-translational modification, and elucidation of three-dimensional protein structure.

3.1.1 2-Dimensional gel electrophoresis

2DE is a form of gel electrophoresis commonly used to analyse proteins. Mixtures of

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 59


proteins are separated by two properties in two dimensions in 2DE. During the early years of proteomics and until recently, profiling of protein expression relied primarily on the use of two-dimensional polyacrylamide gel electrophoresis (2D PAGE), which was later combined with MS. The basic procedure is to solubilize the protein contents of an entire cell population, tissue or biological fluid, followed by separation of the protein components in the lysate using 2DE and visualization of the separated proteins with silver staining. This approach allows only a limited display of the total protein content and can identify only the relatively abundant proteins.
2DE begins with one-dimensional electrophoresis and then separates the molecules by a second property in a direction at 90° to the first. In this technique proteins are separated in one dimension by isoelectric point and in the second dimension by mass. In one-dimensional electrophoresis, proteins (or other molecules) are separated in one dimension, so that all the proteins/molecules in one lane will be separated from one another according to the differences in a particular property (e.g. isoelectric point) between each component. The result is a gel with proteins separated out on its surface (Fig. 3.1a). The proteins can then be visualized by a variety of staining methods; the most commonly used stains are silver nitrate and Coomassie blue. By combining electrophoresis with MS, individual proteins can be profiled (Fig. 3.1b, c) and theoretical and acquired MS profiles can be matched by a database search.
An important development in 2D PAGE is the use of immobilized pH gradients

[Fig. 3.1 image: (a) two-dimensional gel, pI (10 to 3) against molecular weight, followed by trypsin digestion into peptides and separation of peptides over time; (b) peptide chromatography and ESI with quadrupoles q1 and q2; (c) MS spectrum, intensity (arbitrary units) against m/z, with the ion at 516.27 (2+); (d) MS/MS spectrum of peptide LLEAAAQSTK, 516.27 (2+), with labelled a, b and y fragment ions.]

Fig. 3.1. Standard protein analysis by two-dimensional electrophoresis followed by mass spectrometry proteomics. (a) Protein is separated by two-dimensional electrophoresis: in one dimension by isoelectric point (pI) and in the second dimension by mass (molecular weight). Individual peptides are obtained using trypsin to cleave peptide chains. (b) Peptides are separated by chromatography and then ionized using electrospray ionization (ESI): they pass through the first quadrupole (q1) and collision chamber (q2). (c) Individual ions are separated based on their mass-to-charge ratio (m/z) by a mass analyser. (d) From the MS spectrum, an individual peptide ion (516.27 (2+)) is selected for MS/MS analysis to produce peptide ion fragmentation patterns. Letters S, Q, A, A, E, L and L represent amino acids in the selected peptide and a2, b2, y3, etc. represent different ions.
(IPGs) in which a pH gradient is fixed within the acrylamide matrix (Gorg et al., 1999). Because a wide or narrow pH range can be fixed within the gel, IPGs can be used to detect thousands of spots on a single gel with high reproducibility. A variation on this theme is the use of so-called zoom gels in which the protein content of an individual sample is first fractionated into narrow pH ranges under low resolution and then each fraction is subjected to high-resolution separation by 2D PAGE. Another innovation in 2DE is differential in-gel electrophoresis (DIGE; Ünlü et al., 1997) in which two pools of proteins are labelled with different fluorescent dyes. The labelled proteins are mixed and separated in the same 2DE.
Some of the main challenges facing expression proteomics, be it using 2D PAGE or any other approach, include the great dynamic range of protein abundance and a wide range of protein properties including mass, isoelectric point, extent of hydrophobicity and post-translational modifications (Hanash, 2003). Reducing sample complexity prior to analysis, for example by analysing protein subsets and subcellular organelles separately, improves the reach of 2DE or other separation techniques for the quantitative analysis of low-abundance proteins. The isolation of sub-proteome components may be combined with protein tagging to further enhance sensitivity. For example, protein tagging technologies have been implemented for the comprehensive analysis of the cell-surface proteome (Shin et al., 2003).
Even with all the improvements that could be introduced, 2DE will probably remain a rather low-throughput approach that requires a relatively large amount of sample for analysis. The latter is particularly problematic when the samples to be analysed are of limited availability (Hanash, 2003). In particular, the use of laser-capture microdissection, which allows defined cell types to be isolated from tissues, yields a very small amount of protein that is difficult to reconcile with the large amounts needed for 2DE.

3.1.2 Mass spectrometry

MS is an analytical technique used to determine the composition of a physical sample by measuring the mass-to-charge ratio of the ions. It has become the method of choice for analysis of complex protein samples (Han et al., 2008). MS-based proteomics has established itself as an indispensable technology for interpreting the information encoded in genomes; this has been made possible by technical and conceptual advances in many areas, most notably the discovery and development of protein ionization methods as recognized by the award of the Nobel Prize for Chemistry to John B. Fenn and Koichi Tanaka in 2002. Mass spectrometry instrumentation has made strides in recent years in terms of dynamic range and sensitivity (Blow, 2008).
Mass spectrometric measurements are carried out in the gas phase on ionized analytes. Mass spectrometers consist of three essential parts: the first, an ionization source, converts molecules into gas-phase ions. Once ions are created, individual ions are separated based on their mass-to-charge ratio (m/z) by a second device, a mass analyser, and transferred by magnetic or electric fields to the third, an ion detector (Fig. 3.1b, c and d). The mass analyser is central to the technology. It uses a physical property to separate ions of a particular m/z value that then strike the ion detector. The magnitude of the current that is produced at the detector as a function of time (i.e. the physical field in the mass analyser is changed as a function of time) is used to determine the m/z value of the ion. In the context of proteomics, its key parameters are sensitivity, resolution, mass accuracy and the ability to generate information-rich ion mass spectra from peptide fragments. The technique has several applications, including identifying unknown compounds by the mass of the compound molecules or their fragments, determining the isotopic composition of an element and its structure by observing the fragmentation, quantifying the amount of a compound in a sample using carefully designed methods and studying the fundamentals of gas phase ion chemistry.
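To make the mass-to-charge ratio concrete, the following sketch computes the theoretical m/z of a protonated peptide and of its singly charged b and y fragment ions, using the peptide labelled in Fig. 3.1 (LLEAAAQSTK) as input. It is an illustration under stated assumptions, not a description of any instrument software: the function names are hypothetical, only standard monoisotopic residue masses for the amino acids in this peptide are tabulated, and no modifications are considered. Measured values in a real spectrum will differ slightly from the theoretical ones.

```python
# Monoisotopic residue masses (Da) for the amino acids in the example peptide.
RESIDUE_MASS = {
    "A": 71.03711, "E": 129.04259, "K": 128.09496, "L": 113.08406,
    "Q": 128.05858, "S": 87.03203, "T": 101.04768,
}
WATER = 18.01056   # H2O added to the residue sum of an intact peptide
PROTON = 1.00728   # mass of the charge-carrying proton


def neutral_mass(peptide):
    """Monoisotopic mass of the uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(mass, charge):
    """m/z of an ion carrying `charge` protons: (M + z * proton) / z."""
    return (mass + charge * PROTON) / charge


def fragment_ions(peptide):
    """Singly charged b ions (N-terminal) and y ions (C-terminal)."""
    n = len(peptide)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        ions[f"y{i}"] = sum(RESIDUE_MASS[aa] for aa in peptide[n - i:]) + WATER + PROTON
    return ions


peptide = "LLEAAAQSTK"
mass = neutral_mass(peptide)
print(f"neutral mass: {mass:.4f} Da")
for z in (1, 2, 3):
    print(f"[M+{z}H]{z}+  m/z = {mz(mass, z):.4f}")
for name, value in fragment_ions(peptide).items():
    print(f"{name}: {value:.4f}")
```

The same neutral mass appears at different m/z values depending on the charge state, which is why the charge must be known, or inferred, before a measured peak can be compared with theoretical peptide masses.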
There are many types of mass analysers which use static or dynamic fields and magnetic or electric fields. Each analyser type has its strengths and weaknesses. Four basic types of mass analyser used in proteomic research are: ion trap, time-of-flight (TOF), quadrupole and Fourier transform mass spectrometry (FT-MS) analyser. In ion-trap analysers, the ions are first captured or trapped for a certain time interval and are then subjected to MS or tandem MS (MS/MS) analysis. Ion traps are robust, sensitive and relatively inexpensive. A disadvantage is their relatively low mass accuracy, due in part to the limited number of ions that can be accumulated at their point-like centre before space-charging distorts their distribution and thus the accuracy of the mass measurement. The linear or two-dimensional ion trap is a recent development where ions are stored in a cylindrical volume that is considerably larger than that of the traditional, three-dimensional ion traps, allowing increased sensitivity, resolution and mass accuracy. The FT-MS instrument is also a trapping mass spectrometer, although it captures the ions under high vacuum in a high magnetic field. It measures mass by detecting the image current produced by ions cyclotroning in the presence of a magnetic field. Its strengths are high sensitivity, mass accuracy, resolution and dynamic range. In spite of the enormous potential, the expense, operational complexity and low peptide-fragmentation efficiency of FT-MS instruments have limited their routine use in proteomic research (Aebersold and Mann, 2003). The TOF analyser uses an electric field to accelerate the ions through the same potential and then measures the time they take to reach the detector.
Techniques for ionization have been key to determining what types of samples can be analysed by MS. Electrospray ionization (ESI; Fenn et al., 1989) and matrix-assisted laser desorption/ionization (MALDI; Karas and Hillenkamp, 1988) are the two techniques most commonly used to volatilize and ionize proteins or peptides for MS analysis while inductively coupled plasma sources are used primarily for metal analysis on a wide array of sample types. MALDI is usually coupled to TOF analysers that measure the mass of intact peptides, whereas ESI has mostly been coupled to ion traps and triple quadrupole instruments and used to generate fragment ion spectra (collision-induced spectra) of selected precursor ions (Aebersold and Goodlett, 2001). ESI creates ions by application of a potential to a flowing liquid causing the liquid to charge and subsequently spray. The electrospray creates very small droplets of solvent-containing analyte. Solvent is removed by heat or some other form of energy (e.g. energetic collisions with a gas) as the droplets enter the mass spectrometer and multiply-charged ions are formed in the process. ESI ionizes the analytes out of a solution and is therefore readily coupled to liquid-based (for example, chromatographic and electrophoretic) separation tools (Fig. 3.1). MALDI creates ions by excitation of molecules that are isolated from the energy of the laser by an energy-absorbing matrix. The laser energy strikes the crystalline matrix to cause rapid excitation of the matrix and subsequent ejection of matrix and analyte ions into the gas phase. MALDI-MS is normally used to analyse relatively simple peptide mixtures, whereas integrated liquid-chromatography ESI-MS systems (LC-MS) are preferred for the analysis of complex samples.
Key developments leading to improved detection of proteins include TOF MS and relatively non-destructive methods for converting proteins into volatile ions (Zhu et al., 2003). MALDI and ESI have made it possible to analyse large molecules such as peptides and proteins. Although MALDI-TOF MS is a relatively high-throughput method compared with ESI, the latter is more easily coupled with separation techniques such as LC or high pressure LC (HPLC) (Zhu et al., 2003). This has provided an attractive alternative to 2DE, because even low-abundance proteins and insoluble transmembrane proteins can be detected (Ferro et al., 2002; Koller et al., 2002). Other MS techniques include gas chromatography-mass spectrometry (GC-MS), and ion mobility spectrometry/mass spectrometry (IMS/MS). All MS-based techniques require a substantial and searchable database of predicted proteins, ideally
representing the entire genome. Protein identification is possible by comparing the deduced masses of the resolved peptide fragments with the theoretical masses of predicted peptides in the database.
Mass spectrometers are restricted in the number of ions that can be detected at any point in time. Pre-fractionation of proteins on the basis of isolation of specific cell types or subcellular organelles is often necessary to reduce the complexity (Lonosky et al., 2004). Another method of fractionating a complex sample is to introduce a chromatographic technique before MS analysis. This method, referred to as multidimensional protein identification technology (MudPIT) (Whitelegge, 2002), has been used to conduct a shotgun survey of metabolic pathways in the leaves, roots and developing seeds of rice (Koller et al., 2002). Compared with 2DE-MS, each method identifies unique proteins, supporting the complementary nature of the different proteomic technologies.

3.1.3 Yeast two-hybrid system

The yeast two-hybrid assay (Fields and Song, 1989) provides a genetic approach to the identification and analysis of protein-protein interactions. Yeast two-hybrid (Y2H) systems detect not only members of known complexes but also weak or transient interactions (Jansen et al., 2005). The Y2H assay makes use of the molecular organization found in many transcription factors that have a DNA-binding domain and activation domains that can function independently, but when these domains are fused to two proteins that interact, the ability of the domains to control transcriptional activity is reconstituted. In this assay hybrid proteins are generated that fuse a protein X to the DNA-binding domain and protein Y to the activation domain of a transcription factor (Fig. 3.2a). Interaction between X and Y reconstitutes the activity of the transcription factor and leads to expression of a reporter gene with a recognition site for the DNA-binding domain. In the typical practice of this method, a protein of interest fused to the DNA-binding domain (the so-called bait) is screened against a library of activation-domain hybrids (prey) to select interaction partners (Phizicky et al., 2003).
The key advantages of the Y2H assay are its sensitivity and flexibility (Phizicky et al., 2003). The sensitivity derives in part from overproduction of the proteins in vivo, their designed direction to the nuclear compartment where interactions are monitored, the large number of variable inserts of the interacting proteins that can be examined at once, and the potency of the genetic selections. This sensitivity leads to the detection of interactions with dissociation constants around 10⁻⁷ M, which is in the range of most weak protein interactions found in the cell and is more sensitive than co-purification. It also allows detection of certain transient interactions that might affect only a subpopulation of the hybrid proteins. Flexibility of the assay is provided by calibration to detect interactions of varying affinity by altering the expression levels of the hybrid proteins, the number and nature of the DNA-binding sites and the composition of the selection media.
Some disadvantages of the Y2H assay include the unavoidable occurrence of false negatives and false positives (Phizicky et al., 2003). False negatives include proteins such as membrane proteins and secretory proteins that are not usually amenable to nuclear-based detection systems, proteins that failed to fold correctly and interactions dependent on domains occluded in the fusions or on post-translational modifications. False positives include colonies not resulting from a bona fide protein interaction, as well as colonies resulting from a protein interaction not indicative of an association that occurs in vivo.
There are several variations of the Y2H system. In the reverse Y2H system, induced URA3 expression leads to 5-FOA being converted into the toxic substance 5-fluorouracil by Ura3p, leading to growth inhibition. Mutated or fragmented genes are created and then subjected to analysis and only loss-of-interaction mutants are able to grow in the presence of 5-FOA. In the one-hybrid system, the bait is a target DNA fragment fused to a reporter gene. Preys that are able to bind to the DNA fragment-reporter fusion will
[Fig. 3.2 image: panels (a) to (e); hybrid proteins X and Y, X screened against Y1 to Yn, and pools X1 to X96 screened against Y1 to Y96.]

Fig. 3.2. Yeast two-hybrid approaches. (a) The yeast two-hybrid system. DNA binding and activation
domains (circles) are fused to two proteins X and Y, the interaction of X and Y leads to reporter gene
expression (arrow). (b) A standard two-hybrid search. Protein X, present as a DNA binding domain hybrid,
is screened against a complex library of random inserts in the activation domain vector (shown in square
brackets). (c) A two-hybrid array approach. Protein X is screened against a complete set of full length open
reading frames (ORFs) present as activation domain hybrids (shown as yeast transformant spotted on to
microtitre plates). (d) A two-hybrid search using a library of full length ORFs. The set of ORFs as activation-
domain hybrids (microtitre plates in square brackets) is combined to form a low-complexity library.
(e) A two-hybrid pooling strategy. Pools of ORFs as both DNA-binding domain and activation domain
hybrids (in square brackets) are screened against each other. From Phizicky et al. (2003) reprinted by
permission from Macmillan Publishers Ltd.

lead to activation of the reporter genes (lacZ, HIS3 and URA3). In the repressed transactivator system, the interaction of bait-DNA-binding domain fusion proteins and the prey-repressor domain fusion proteins can be detected by repression of the reporter URA3. The interaction of bait and prey enables cells to grow in the presence of 5-FOA, whereas non-interactors are sensitive to 5-FOA as a result of Ura3p production. In the three-hybrid system, the interaction of bait and prey proteins requires the presence of a third interacting molecule to form a complex. The third interacting molecule can be a protein used with a nuclear localization acting as a bridge between bait and prey to cause transcriptional activation.
Different genome-wide two-hybrid strategies have been used to analyse protein interactions in Saccharomyces cerevisiae. One approach involved screening a large number of individual proteins against a

comprehensive library of randomly generated fragments (Fig. 3.2b). A second approach used systematic one-by-one testing of every possible protein combination using a mating assay with a comprehensive array of strains (Fig. 3.2c). A third approach used a one-by-many mating strategy in which each member of a nearly complete set of strains expressing yeast open reading frames (ORFs) as DNA-binding domain hybrids was mated to a library of strains containing activation-domain fusions of full-length yeast ORFs (Fig. 3.2d). A fourth variation involved mating of defined pools of strain arrays (Fig. 3.2e). Suter et al. (2008) reviewed the current applications of Y2H and variant technologies in yeast and mammalian systems. Y2H methods will continue to play a dominant role in the assessment of protein interactomes.

3.1.4 Serial analysis of gene expression

Serial analysis of gene expression (SAGE) is a method for the comprehensive analysis of gene expression patterns. SAGE is used to produce a snapshot of the mRNA population in a sample of interest (Velculescu et al., 1995). Several variants have since been developed, most notably a more robust version, LongSAGE (Saha et al., 2002), and the more recent SuperSAGE (Matsumura et al., 2005), which enables very precise annotation of existing genes and discovery of new genes within genomes because of an increased tag length of 25–27 bp. Three principles underlie the SAGE methodology: (i) a short sequence tag (originally 10–14 bp) contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript; (ii) sequence tags can be linked together to form long serial molecules that can be cloned and sequenced; and (iii) quantitation of the number of times a particular tag is observed provides the expression level of the corresponding transcript.
The principle of the technique is shown in Fig. 3.3: mRNAs are isolated from a tissue, and double-stranded cDNAs are synthesized from biotinylated oligo-dT primers. The DNA is cut with a frequent-cutting restriction enzyme (NlaIII), and the 3' extremities of the double-stranded DNAs are isolated using streptavidin (which binds biotin). The double-stranded DNA is divided into two groups, the 5' extremities of which are ligated to primers A or B. These primers contain a restriction site recognized by the enzyme BsmFI, which cuts 20 nucleotides away from its recognition site. The two populations are then combined, ligated, amplified and sequenced. The four-nucleotide sequence CATG (recognized by NlaIII) allows the identification of each amplified region. Although each sequence obtained is very short (of the order of a dozen nucleotides), it is sufficient to identify the specific gene from which it derives by comparison with sequence databases.
SAGE can be used to identify the collection of genes transcribed in a given tissue or developmental stage. It also provides an estimate of the frequency of transcription of each identified gene because it is proportional to the frequency of the sequence in the total collection of sequences obtained. The study by Velculescu et al. (1995) indicated: (i) that just nine base pairs of DNA sequence are sufficient to distinguish 262,144 ($4^9$) genes if the sequence is from a defined position in the gene; (ii) if the 9-bp sequences are placed end-to-end (concatenated) and separated by punctuation then they can be sequenced serially (analogous to the mechanism by which a computer transmits data); and (iii) a single sequencing reaction can yield information on 10–50 genes.

3.1.5 Quantitative real-time PCR

Real-time reverse-transcriptase PCR (RT-PCR), also known as quantitative real-time PCR (QRT-PCR), measures PCR amplification in real time (via fluorescence) during amplification. It enables both detection and quantification (as absolute number of copies or relative amount when normalized to DNA input or additional normalizing genes) of a
[Figure 3.3 schematic. Steps shown, starting from poly(A) mRNA paired with oligo-dT primed cDNA: cleave with anchoring enzyme (AE); bind to streptavidin beads; divide in half and ligate to linkers (A + B); cleave with tagging enzyme (TE) and blunt end; ligate and amplify with primers A and B to give ditags; cleave with anchoring enzyme, isolate ditags, concatenate and clone. In the concatenated product each AE site (CATG) separates successive ditags (tag 1, tag 2, tag 3, tag 4). Lower panel: SAGE profile of wild-type Arabidopsis and the Arabidopsis-Pti4 line; y-axis, number of tags (0–70); x-axis, genes (Ca/b, Pti4, PDF1.2, Di19, Lhcb5, TIP, catalase, oxygen-evolving protein, Germin1, TF MYB60, BAC clone T18N14, ATPase, Chrom. 5 clone and peroxidase).]

Fig. 3.3. Serial analysis of gene expression (SAGE).
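The tag arithmetic and tag counting that underlie SAGE can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the 9-bp tag length default, the ditag layout (second tag read as the reverse complement of the last bases) and the tag sequences are all hypothetical and are not taken from Velculescu et al. (1995).

```python
from collections import Counter

# Complement table used to read the second tag of a ditag on the opposite strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def distinct_tags(tag_length: int) -> int:
    """Number of different sequences a tag of this length can encode (4 bases per position)."""
    return 4 ** tag_length


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_ditag(ditag: str, tag_length: int = 9) -> tuple[str, str]:
    """Split a ditag into two tags (assumed layout: tag 1 as given, tag 2 as the
    reverse complement of the last tag_length bases)."""
    return ditag[:tag_length], reverse_complement(ditag[-tag_length:])


def tag_frequencies(tags):
    """Relative frequency of each tag, used as an estimate of relative transcript abundance."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


# Hypothetical tag observations for three transcripts.
observed = ["GATTACAGG"] * 6 + ["CCGTAAGTC"] * 3 + ["TTGACCGTA"] * 1
frequencies = tag_frequencies(observed)

print(distinct_tags(9))                      # capacity of a 9-bp tag
print(split_ditag("GATTACAGGTACGGTCAA"))     # two tags recovered from one ditag
print(frequencies)
```

In this toy data set the most frequent tag accounts for six of ten observations, so its transcript would be estimated as the most abundant of the three.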



specific sequence in a DNA sample. The procedure follows the general principle of PCR; its key feature is that the amplified DNA is quantified as it accumulates in the reaction in real time after each amplification cycle. Two common methods of quantification are the use of fluorescent dyes that intercalate with double-stranded DNA, and modified DNA oligonucleotide probes that fluoresce when hybridized with a complementary DNA (cDNA).
Real-time RT-PCR uses fluorophores in order to detect levels of gene expression. As mRNA is translated at the ribosome to produce functional proteins, mRNA levels tend to correlate roughly with protein expression. In order to adapt PCR to the measurement of RNA, the RNA sample first needs to be reverse transcribed to cDNA by an enzyme known as a reverse transcriptase. The original RT-PCR technique required extensive optimization of the number of PCR cycles, so as to obtain results during logarithmic DNA amplification, before it starts to plateau. The development of PCR technology that uses fluorophores to measure DNA amplification in real time allows researchers to bypass the extensive optimization associated with normal RT-PCR.
In real-time RT-PCR, the amplified product is measured at the end of each cycle. These data can be analysed by computer software to calculate relative gene expression between several samples or mRNA copy number based on a standard curve (Fig. 3.4).
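Both calculations mentioned for real-time data can be sketched in a few lines of Python: relative fold difference from a difference in cycle number, and copy number read from a standard curve. This is a minimal sketch, not the method of any particular instrument's software. The function names, the `efficiency` parameter (1.0 means perfect doubling per cycle) and the standard-curve readings are assumptions introduced here for illustration.

```python
import math


def fold_difference(cycles_x: float, cycles_y: float, efficiency: float = 1.0) -> float:
    """Fold difference in starting template implied by a difference in cycle number.

    With efficiency = 1.0 the product doubles each cycle, giving 2 ** |difference|.
    The sample with the lower cycle number is the more abundant one.
    """
    return (1.0 + efficiency) ** abs(cycles_x - cycles_y)


def fit_standard_curve(copies, cycles):
    """Least-squares fit of cycle = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cycles) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cycles))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_cycle(cycle: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate starting copy number."""
    return 10 ** ((cycle - intercept) / slope)


# Cycle difference of 10 between two samples, assuming equal and perfect efficiency.
print(fold_difference(37, 27))

# Hypothetical dilution series (copies) and the cycle at which each reached threshold.
standard_copies = [1e4, 1e3, 1e2, 1e1]
standard_cycles = [20.0, 23.32, 26.64, 29.97]
slope, intercept = fit_standard_curve(standard_copies, standard_cycles)
estimate = copies_from_cycle(23.32, slope, intercept)
print(slope, intercept, estimate)
```

A slope near -3.32 cycles per tenfold dilution corresponds to perfect doubling; shallower or steeper slopes would indicate that the equal-efficiency assumption does not hold.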
By comparing cycles of linear amplification among target cDNAs/genes, the relative fold difference in expression can be measured as $2^{\text{cycles}_x - \text{cycles}_y}$. For example, comparing sample x linear at 37 cycles and sample y linear at 27 cycles, we have $2^{37-27} = 2^{10}$, which means a 1024-fold difference in mRNA accumulation between x and y (the sample that reaches the linear phase at the earlier cycle, here y, contains more template), assuming that the sequences amplify with equal efficiency (Fig. 3.4).

Fig. 3.4. Quantitative real-time PCR. (A) Agarose gel to show the amplification results in conventional PCR of different cycles (above) and amplification curve obtained with the LightCycler to show relative gene expression (below). (B) Relative gene expression for different mRNA copies, by which a relative fold difference in gene expression can be measured.

3.1.6 Suppression subtractive hybridization

Suppression subtractive hybridization (SSH) (Diatchenko et al., 1996) is a technique that uses PCR to quickly compare the expression of mRNA from different samples and show the relative difference in the concentration of these molecules. It can be used to enrich for differentially expressed genes. Subtracted cDNA libraries are hybridization and PCR based and result in normalization of the sample. They can be combined with full-length cDNA libraries.
SSH includes the following procedures: (i) prepare cDNAs from two stages/conditions; (ii) separately digest tester (from

the same source as the sample to be tested) and driver cDNA (from a normal sample) to obtain shorter fragments; (iii) divide tester cDNA into two portions and ligate each to a different adaptor, while driver cDNA has no adaptors; (iv) hybridization kinetics lead to equalization and enrichment of differentially expressed sequences among single-stranded tester molecules; and (v) ultimately generate templates for PCR amplification from differentially expressed sequences. As a result, only differentially expressed sequences are amplified exponentially.

3.1.7 In situ hybridization

In situ hybridization (ISH) is a type of hybridization that uses a labelled cDNA or RNA strand (i.e. probe) to localize a specific DNA or RNA sequence in a portion or section of tissue (in situ) or in the entire tissue. DNA ISH can be used to determine the structure of chromosomes. Fluorescent DNA ISH (FISH) can be used to assess chromosomal integrity. RNA ISH (hybridization histochemistry) is used to measure and localize mRNAs and other transcripts within tissue sections or whole mounts.
For hybridization histochemistry, sample cells and tissues are usually treated to fix the target transcripts in place and to increase access of the probe. The probe is either a labelled cDNA or, more commonly, a cRNA (riboprobe). The probe hybridizes to the target sequence at elevated temperature and the excess probe is then washed away (after prior hydrolysis using RNase in the case of unhybridized, excess RNA probe). Solution parameters such as temperature, salt and/or detergent concentration can be manipulated to remove any non-identical interactions (i.e. only exact sequence matches will remain bound). Then, the probe that was labelled with either radio-, fluorescent- or antigen-labelled bases (e.g. digoxigenin) is localized and quantitated in the tissue using autoradiography, fluorescence microscopy or immunohistochemistry. ISH can also use two or more probes labelled with radioactivity or the other non-radioactive labels to simultaneously detect two or more transcripts.
Transcriptional analysis may also be carried out by inserting a reporter gene such as lacZ or GFP (green fluorescent protein) downstream from the promoter under study. lacZ encodes β-galactosidase and its expression is detected by the blue colour obtained in the presence of X-Gal. GFP is a protein containing a chromophore which fluoresces green when excited by ultraviolet to blue light (major excitation peak at about 395 nm). These reporters are used to evaluate the expression levels and identify the tissues in which the normal gene is expressed under the chosen promoter.

3.2 Structural Genomics

Genomics is a term coined by Thomas Roderick in 1986 and refers to a new scientific discipline of mapping, sequencing and analysing genomes. Genomics is now, however, undergoing a transition or expansion from the mapping and sequencing of genomes to an emphasis on genome function. To reflect this shift, genome analysis may now be divided into structural genomics and functional genomics. Structural genomics represents an initial phase of genome analysis and has a clear end point: the construction of high-resolution genetic, physical and transcript maps of an organism. The ultimate physical map of an organism is its complete DNA sequence.
There is an increasing number of terms ending in -ome and -omics. Some examples include cytomics, epigenomics, genomics, immunomics, interactome, metabolomics, ORFeome, phenomics, proteomics, secretome, transcriptomics and transgenomics. Genome organization, physical mapping and sequencing will be discussed in this section. For further details, readers are referred to Primrose (1995), Borevitz and Ecker (2004), Choisne et al. (2007) and Lewin (2007).

3.2.1 Genome organization

Major differences among various genomes

Eukaryotes have large genomes, linear chromosomes with centromeres and telomeres, low gene density disrupted by introns and

highly repetitive sequences, while prokaryotes have small genomes, single and circular chromosomes (few linear) with no centromere or telomere, high gene density without introns and very few or no repetitive sequences. The genome size refers to the haploid genome since different cells within a single organism can be of different ploidy. Germ cells are usually haploid and somatic cells diploid. The size of the genome is known as the C-value and is measured by re-association kinetics. After denaturation, the rate of re-association is dependent on genome size. The larger the genome (the higher the C-value), the more repeated DNA sequences it tends to contain and the longer it takes to re-anneal. $C_0t_{1/2}$ is the product of the DNA concentration and the time required to proceed half way through re-association. It is directly related to the amount of DNA in the genome.
The DNA content of haploid genomes ranges from $5 \times 10^{3}$ bp for viruses to $10^{11}$ bp for flowering plants. Within mammals, there is only a twofold difference between the largest and smallest C-value. However, there is up to a 100-fold variation in size within flowering plants. The minimum genome size found in each phylum increases from prokaryotes to mammals (Fig. 3.5).
Among the most important food crops, rice has the smallest genome (389 Mb) (IRGSP, 2005) and wheat the largest (15,966 Mb). According to Arumuganathan

[Figure 3.5 chart. Ranges of haploid DNA content (bp, logarithmic axis from $10^{3}$ to $10^{11}$) for, from top to bottom: flowering plants, birds, mammals, reptiles, amphibians, bony fish, cartilaginous fish, echinoderms, crustaceans, insects, molluscs, worms, fungi, algae, bacteria, mycoplasmas and viruses. Individual plant genomes are marked above the chart: Arabidopsis thaliana (145 Mb), Oryza sativa (389 Mb), Phaseolus vulgaris (673 Mb), Sorghum bicolor (760 Mb), Musa sp. (873 Mb), Lycopersicon esculentum (953 Mb), Glycine max (1,115 Mb), Zea mays (2,504 Mb), Capsicum annuum (2,702 Mb), Hordeum vulgare (4,873 Mb), Avena sativa (11,315 Mb), Allium cepa (15,290 Mb) and Triticum aestivum (15,966 Mb).]

Fig. 3.5. DNA contents of organisms. Modified from Primrose (1995) and Arumuganathan and Earle (1991).
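The definition of $C_0t_{1/2}$ used for re-association kinetics can be written out as a short worked equation. The sketch assumes ideal second-order kinetics; the rate constant $k$ and the concentration of single-stranded DNA $C$ at time $t$ are symbols introduced here for illustration and do not appear in the text.

```latex
\frac{dC}{dt} = -kC^{2}
\quad\Longrightarrow\quad
\frac{C}{C_0} = \frac{1}{1 + k\,C_0 t},
\qquad
\frac{C}{C_0} = \frac{1}{2}
\quad\Longrightarrow\quad
C_0 t_{1/2} = \frac{1}{k}
```

Under this assumption, a genome with more unique sequence re-associates with a smaller $k$ and therefore shows a larger $C_0t_{1/2}$, which is why the value tracks the amount of DNA in the genome.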

and Earle (1991), other crops can be grouped into seven classes: Musa, cowpea and yam (873 Mb); sorghum, bean, chickpea and pigeonpea (673–818 Mb); soybean (1115 Mb); potato and sweet potato (1597–1862 Mb); maize, pearl millet and groundnut (2352–2813 Mb); pea and barley (4397–5361 Mb); and oat (11,315 Mb).
Genome size is often correlated with plant growth and ecology and extremely large genomes may be limited both ecologically and evolutionarily. The manifold cellular and physiological effects of large genomes may be a function of selection of the major components that contribute to genome size such as transposable elements and gene duplication (Gaut and Ross-Ibarra, 2008).

Sequence complexity

Within a phylum, the number of genes in each organism is quite similar although genome size can differ 100-fold. It is estimated that the number of genes in flowering plants is 30,000–50,000 but the genome size varies about 100-fold (Arabidopsis versus wheat). This is because some large genomes contain a high percentage of repetitive DNA.
The proportions of different sequence components in representative genomes differ greatly. For example, the Escherichia coli genome consists of 100% non-repetitive sequences while tobacco contains 65% moderately repetitive and 7% highly repetitive sequences. Repetitive DNA is of two types: tandem repeats (those that are found adjacent to one another) and dispersed repeats (those that recur in unlinked genomic locations). For example, two classes of dispersed and highly repetitive DNA are SINES (short interspersed elements), i.e. shorter than 500 bases and present in $10^{5}$–$10^{6}$ copies, and LINES (long interspersed elements), i.e. longer than 5 kb and present in at least $10^{4}$ copies per genome.
Repeated sequence families can sometimes function as regulators of gene expression. On the other hand, they can be non-functional entities (such as the so-called selfish DNA). Some of these sequences, such as Alu, are found to cause insertional or deletion mutations.

3.2.2 Physical mapping

Physical mapping entails constructing a physical map which consists of continuous overlapping fragments of cloned DNA that have the same linear order as found in the chromosomes from which they are derived. A series of overlapping clones or sequences that collectively span a particular chromosomal region and form a contiguous segment is called a contig. Recommended references for physical mapping include Zhang and Wing (1997), Brown (2002), Meyers et al. (2004) and Lolle et al. (2005).

DNA libraries

Large-insert DNA libraries are one of the key components in genome research. They are especially useful for genome studies in large and complex genomes. These libraries can be used in a variety of research projects such as physical mapping of chromosomes, map-based cloning of important genes, genome organization and evolution, comparative genomics and molecular breeding programmes.
A gene or DNA library is a collection of all the genes for an organism so that there is a high probability of finding any particular segment of the source DNA in the collection. To contain a colony of bacteria for every gene, a library will consist of tens of thousands of colonies or clones. The collection is represented in the form of recombinants between DNA fragments from the organism and the vector. The library has to be ordered so that each clone has been placed in a precise physical location relative to others (such as in wells of microtitre plates).
Various highly efficient cloning vectors have been used to construct DNA libraries. The most frequently used vectors are λ phages, cosmids, P1 phages and artificial chromosomes. There are various types of

artificial chromosomes including yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC), binary BAC (BIBAC), P1-derived artificial chromosome (PAC), transformation-competent artificial chromosome (TAC), mammalian artificial chromosome (MAC), human artificial chromosome (HAC) and plant artificial chromosome. When the DNA is simply ligated to the vector and packaged in the phage particles, the library is said to be unamplified. In an amplified library, the original DNA has been subsequently increased by replication in bacteria.
Which DNA is cloned in libraries depends on the purpose of the research. Genomic libraries are constructed from the total nuclear DNA of an organism. In making these libraries, the DNA must be cut into clonable-size pieces as randomly as possible. Shearing or partial digestion with a frequently cutting restriction endonuclease is often used. Chromosome-specific libraries are made from the DNA of purified isolated chromosomes. A cDNA library contains a collection of cDNA clones transcribed from mRNAs collected from a specific tissue or organ at a specific growth or developmental stage under a specific environment. Therefore, a cDNA library only contains the genes that are expressed in the specific conditions. Furthermore, cDNAs do not contain introns or promoters.
Functionally, gene libraries can be classified into cloning and expression libraries. Cloning libraries are constructed with cloning vectors which contain replicons, multiple cloning sites and selection markers. Clones can be multiplied by bacterial culture. Expression libraries are constructed with expression vectors which contain, in addition to the elements of cloning vectors, specific sequences that control gene expression such as promoters, Shine-Dalgarno sequences, and ATG and stop codons. The coding products of clones can be expressed in host cells.
cDNA libraries are often expression libraries in which clone construction is such that part or all of the encoded protein is expressed in bacteria harbouring the cloned DNA. Such expression is needed in screening libraries using antibodies or enzyme activities.
In order to be confident that virtually all regions of the genome are represented at least once in a library, considerable redundancy of cloned DNA must be included in the library. The number of DNA clones (n) needed for a certain probability (P) of finding a target clone is calculated by the formula:

$$n = \frac{\ln(1 - P)}{\ln\left(1 - \dfrac{k}{m}\right)}$$

where k is the DNA insert size in kb and m is the haploid genome size in kb. As a rule of thumb, a library containing DNA inserts which collectively add up to three times the amount of DNA in a single gamete of the organism will provide about 95% confidence that any DNA element in the genome is represented at least once in the library. A library that has five genome-equivalent coverage (rather than three) will provide about 99% confidence of including the target element. For example, the number of BACs of an average size of 150 kb required for about 5× coverage of Arabidopsis (m = 125,000 kb) at P = 0.99 is 3835. When DNA fragments are randomly distributed, the probability of obtaining any DNA sequence from this library is no lower than 0.99.

Construction of large insert genomic libraries

Construction of large insert genomic libraries includes three steps: (i) development of the cloning vector; (ii) isolation of high molecular weight DNA; and (iii) preparation of insert DNA.

DEVELOPMENT OF LARGE-INSERT CLONING VECTORS. Developing a vector which can accommodate a large DNA fragment has been a difficult task. Ten kb is the maximum insert size of most plasmid vectors. As the insert size increases, the ligation and transformation efficiency decreases significantly.
The first such vector was the bacteriophage λ vector in which the size of the largest DNA insert is about 25 kb. This is

because the fixed capacity of the phage head prevents genomes that are too long being packaged into progeny particles. Cosmids are one type of hybrid vector that replicate like a plasmid but can be packaged in vitro into λ phage coats. The vector can accommodate DNA inserts as large as 45 kb.
The YAC vector was then developed, in which an insert of up to 1000 kb can be maintained. The YAC cloning system includes Tel (yeast telomeres), ARS1 (autonomously replicating sequence), CEN4 (centromere from yeast chromosome 4), URA3 (uracil) and TRP1 (tryptophan) yeast selection marker genes, Amp (ampicillin-resistance gene) and Ori (origin of replication of pBR322). Although YAC clones played a major role in several genome projects and map-based cloning of many genes in the early 1990s, the following four problems have prevented their further use in genome studies: (i) high percentage of chimaeric clones; (ii) difficulty in DNA preparation and storage; (iii) low transformation efficiency; and (iv) instability of some inserts in yeast. In the rice cultivar Nipponbare, for example, 40% of the clones in the YAC library were chimaeric, thus limiting its use for genome sequencing or map-based cloning.
The BAC cloning system is based on the E. coli single-copy F factor (Shizuya et al., 1992). It is easy to manipulate, screen and maintain the cloned DNA. It is non-chimaeric, and has high transformation efficiency.
To facilitate gene identification in plant species, second-generation BAC vectors such as BIBAC were constructed (Hamilton et al., 1996). A 150-kb human DNA fragment in the BIBAC vector was transferred into the tobacco genome by Agrobacterium-mediated transformation. A similar vector, called TAC, was developed and used to complement a mutant phenotype in Arabidopsis (Liu et al., 1999). Table 3.1 provides characteristics of several artificial chromosome vectors.

ISOLATION OF HIGH MOLECULAR WEIGHT DNA. Preparing quality high molecular weight (HMW) DNA (most of the DNA > 1 Mb) suitable for large insert library construction can be one of the most difficult steps in constructing a large-insert plant genomic library. There are four predominant problems involved in isolating plant nuclear DNA: (i) plant cell walls must be physically broken or enzymatically digested without damaging nuclei; (ii) chloroplasts must be separated from nuclei and/or preferentially destroyed, an important process since copies of the chloroplast genome may comprise the majority of the DNA within a plant cell; (iii) volatile secondary compounds such as polyphenols must be prevented from interacting with the nuclear DNA; and (iv) carbohydrate matrices that often form after tissue homogenization must be prevented from trapping nuclei.
Several different isolation methods have been developed. The first method was to isolate protoplasts from leaf tissue and then embed them in low-melting-point agarose in the form of a plug or bead. This method is expensive and time consuming. In addition, chloroplast DNA is not separated. The development of methods to isolate nuclei from leaf tissue has dramatically improved the procedure and quality of the HMW DNA for library construction.

Table 3.1. Characteristics of artificial chromosome vectors.

Vector      Host                           Maximum size (kb)   Stability   Chimerism   DNA preparation   Plant transformation
YAC         Yeast                          1000                −           +           Difficult         No
P1          E. coli                        100                 +           −           Easy              No
BAC         E. coli                        300                 +           −           Easy              No
PAC         E. coli                        300                 +           −           Easy              No
BIBAC/TAC   E. coli and A. tumefaciens     300                 +           −           Easy              Yes
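The clone-number formula used for library construction, n = ln(1 − P)/ln(1 − k/m), and the rule of thumb relating genome-equivalent coverage to confidence can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the coverage approximation P ≈ 1 − e^(−coverage) is an assumption that holds when the insert is much smaller than the genome.

```python
import math


def clones_needed(probability: float, insert_kb: float, genome_kb: float) -> float:
    """Clones required so that a given sequence is present with the stated probability.

    Implements n = ln(1 - P) / ln(1 - k/m), with k the insert size and m the
    haploid genome size, both in kb.
    """
    return math.log(1.0 - probability) / math.log(1.0 - insert_kb / genome_kb)


def coverage_probability(coverage: float) -> float:
    """Approximate probability of representation for a library holding
    `coverage` genome equivalents (valid when the insert is small relative to the genome)."""
    return 1.0 - math.exp(-coverage)


# Worked example from the text: 150-kb BAC inserts, Arabidopsis genome of 125,000 kb.
n = clones_needed(0.99, 150, 125_000)
print(round(n, 1))

# Rule of thumb: three and five genome equivalents.
print(round(coverage_probability(3), 3), round(coverage_probability(5), 3))

# The same target probability with a smaller, cosmid-sized insert (45 kb) needs more clones.
print(round(clones_needed(0.99, 45, 125_000)))
```

The first result agrees with the figure of about 3835 BAC clones quoted for Arabidopsis, and the coverage values reproduce the approximate 95% and 99% confidence levels for three- and fivefold coverage.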

PREPARATION OF INSERT DNA FOR LIGATION. The average size of DNA fragments produced by complete digestion with restriction enzymes with four- or six-base recognition sequences is too small for large insert library construction. To obtain relatively HMW restriction fragments (100–300 kb), the usual method is to partially digest the target DNA with a four-base-cut enzyme. Partial DNA digestion not only yields fragments of the desired size but also fragments the genome randomly without exclusion of any sequence.
To determine the conditions that yield a maximum percentage of fragments between 100 and 300 kb, a series of partial digestions is carried out by using different amounts of restriction enzyme for a specific digestion period. Once the optimal conditions for producing fragments between 100 and 300 kb are determined, a mass digestion using several plugs is carried out to obtain sufficient DNA for size selection. Partially digested HMW DNA is then subjected to pulsed field gel analysis.
If there is no size selection of partially digested DNA, a random library will have a preponderance of small inserts since small fragments ligate more efficiently and clones with small inserts transform with higher efficiency. Contour-clamped homogeneous electrical field (CHEF) electrophoresis is the most common method for separating large DNA molecules. It uses a hexagonal array of fixed electrodes to generate a homogeneous electrical field that enhances DNA resolution. After two rounds of size selection using the CHEF Mapper, the HMW restriction fragments must be removed from the surrounding agarose before they can be used in ligation reactions. After developing the large-insert library, a number of random clones can be selected to confirm the successful cloning of the inserts and to determine the average insert size. The average insert size will then determine how many clones are needed to achieve the desired amount of genome coverage.

Physical mapping

There are five physical mapping methods: optical mapping; restriction fragment fingerprinting; chromosome walking; sequence-tagged site (STS) mapping; and fluorescent in situ hybridization (FISH). In restriction fragment fingerprinting, individual clones are first digested with different restriction enzymes. The digested DNA is then labelled with radioactive or fluorescent dye and run on a sequencing gel. The fingerprint data are collected and analysed for contig assembly. During the procedure, markers with known map position are used as probes to screen the large insert library. Clones that hybridize with the same single-copy marker are considered to be overlapping. PCR amplification of DNA pools using primers derived from DNA markers with known position can also be used for physical map construction. The disadvantages of this method are that it is labour intensive and filling the gaps is difficult.
STS mapping uses a sequence-tagged site (STS), which is a short region of DNA about 200–300 bases long whose exact sequence is found nowhere else in the genome. Two or more clones containing the same STS must overlap and the overlap must include the STS. There are two disadvantages to this method: it is still very labour intensive and the primer synthesis is expensive.
FISH uses synthetic polynucleotide strands that bear sequences known to be complementary to specific target sequences at specific chromosomal locations. The polynucleotides are bound via a series of linked molecules to a fluorescent dye that can be detected with a fluorescence microscope.
In addition, physical mapping can be achieved by a combination of fingerprinting, molecular linkage mapping, STS mapping, end sequencing and FISH mapping. A by-product of physical mapping is the integration of genetic, physical and sequence maps as shown in Fig. 3.6.

3.2.3 Genome sequencing

The sequencing of DNA in laboratories began in 1978. The first genome of a multicellular eukaryote, Caenorhabditis elegans, was published in 1998. The rationale behind genome sequencing includes

[Figure 3.6 schematic of human chromosome 16, from top to bottom: cytogenetic map (fragile sites FRA16B and FRA16D; site of hybridization with labelled probe); somatic cell hybridization map from cultured human–mouse hybrid cells (breakpoints CY2, CY4, CY7, CY8, CY11, CY12, CY13, CY14, CY15, CY19, CY165, CY180 and 23HA), with the region of interest between breakpoints CY8 and CY7; genetic linkage map (markers D16S40, D16S48, D16S60, D16S85, 16AC6.5, D16S144, D16S149, D16S150, D16S159 and D16S160), with the region of interest between genetic markers 16AC6.5 and D16S150; YAC clone insert containing the region of interest; BAC and/or PAC contigs, with the BAC or PAC clones containing the region of interest; and sequence-tagged sites (STSs). Notes on the figure: the region of interest can be localized either on the physical map (somatic cell hybrid map) or on the genetic map; each STS is a unique DNA sequence that can be amplified by PCR, and the presence of an STS in a clone indicates where the insert originated from in the chromosome. Example STS sequences shown: GATCAAGGCGTTACATGA and AGTCAAACGTTTCCGGCCTA.]

Fig. 3.6. Example of physical mapping and integration of genetic, cytological and physical maps.
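The STS overlap rule used in physical mapping (two clones that contain the same STS must overlap) lends itself to a small illustration in Python. The sketch groups clones into contigs by shared STSs using a union-find structure; it does not order clones within a contig. The clone names, STS names and the function `build_contigs` are hypothetical and serve only as an example.

```python
def build_contigs(clone_sts):
    """Group clones into contigs: clones sharing at least one STS are taken to overlap.

    clone_sts maps a clone name to the set of STS markers detected in it.
    Returns a sorted list of contigs, each a sorted list of clone names.
    """
    parent = {clone: clone for clone in clone_sts}

    def find(clone):
        # Follow parent links to the representative clone, compressing the path.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    first_clone_with = {}
    for clone, sts_set in clone_sts.items():
        for sts in sts_set:
            if sts in first_clone_with:
                # Shared STS: merge this clone's group with the earlier clone's group.
                parent[find(clone)] = find(first_clone_with[sts])
            else:
                first_clone_with[sts] = clone

    contigs = {}
    for clone in clone_sts:
        contigs.setdefault(find(clone), set()).add(clone)
    return sorted(sorted(members) for members in contigs.values())


# Hypothetical PCR screening results: which STSs amplify from which BAC clone.
clones = {
    "BAC1": {"sts_a", "sts_b"},
    "BAC2": {"sts_b", "sts_c"},
    "BAC3": {"sts_c"},
    "BAC4": {"sts_x"},
    "BAC5": {"sts_x", "sts_y"},
}
contigs = build_contigs(clones)
print(contigs)
```

In this example BAC1 and BAC3 share no STS directly but fall in the same contig because each overlaps BAC2, while BAC4 and BAC5 form a second contig; the gap between the two contigs would have to be closed by other means.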

identification of all the genes in the sequenced genome, elucidation of the functions and the interactions of genes in the genome, functional analysis of orthologues in related complex genomes, evolutionary analysis of genes or genomes, and product development and commercial application. As next-generation sequencing technologies continue to facilitate genome sequencing, new applications and new assay concepts (e.g. Huang et al., 2009) have emerged that are vastly increasing our ability to understand genome function, including sequence census methods for functional genomics (Wold and Myers, 2008; Varshney et al., 2009).

Technical developments in DNA sequencing

There are three major milestones in DNA sequencing: (i) the invention of sequencing reactions; (ii) automated fluorescent DNA sequencers; and (iii) PCR. Until the late 1970s, obtaining the DNA sequence of even five to ten nucleotides was difficult and very laborious. The development of two new methods in 1977, that of Maxam and Gilbert (chemical sequencing) and that of Sanger and Coulson (enzymatic sequencing), made it possible to sequence large DNA molecules. Later refinements of Sanger's chain termination method made it the preferred procedure since it has proven to be technically simpler.
The modified Sanger sequencing method or chain terminator procedure capitalizes on two properties of DNA polymerases: (i) their ability to synthesize faithfully a complementary copy of a single-stranded DNA template; and (ii) their ability to use 2',3'-dideoxynucleotides as substrates. Once the analogue is incorporated at the growing
point of the DNA chain, the 3' end lacks a hydroxyl group and is no longer a substrate for chain
elongation. Thus, the dideoxynucleotides act as chain terminators.
The development of labelling and detection techniques has contributed to an acceleration of
sequencing procedures. These techniques include 32P labelled primer (1970s); 33P or 35S labelled
primer with sharper image and lower radiation (early 1980s); and fluorescently labelled primers
and dyes in four different reactions (1986). DNA sequencing became automated in the late 1980s
when the primer used for each reaction was labelled with a differently coloured fluorescent tag.
This technology allowed thousands of nucleotides to be sequenced in a few hours and the
sequencing of large genomes then became a reality. With ABI PRISM technology, up to four
different dyes can be used to label DNA, each of which can be differentiated when run together
in the same lane of a gel or injected into a capillary. For DNA sequencing, this means that the
four different dyes representing each of the DNA bases (A, C, G and T) can be electrophoresed
together.
The improvement of polyacrylamide gel electrophoresis (in the late 1980s and early 1990s) led
to high resolution, thinner gels and a sharper image. Capillary electrophoresis (CE) (1998)
offers a number of performance advantages such as faster runs, small sample volumes and the
ability to eliminate manual gel pouring and sample loading tasks. Walk-away automation reduces
instrument-associated labour time by more than 80% over slab-gel systems. The introduction of
CE resulted in the availability of automated electrophoresis instruments with much lower cost
per sample (Amersham's MegaBACE and Applied Biosystems' ABI3700, 3730, etc.). High-throughput
sequencing can also incorporate full automation in colony picking, 96-well plasmid isolation and
purification, PCR reactions, sample loading and sequence data analysis.
The new generation of high-throughput sequencing technologies promises to transform the
scientific enterprise, potentially supplanting array-based technologies and opening up many new
possibilities (Kahvejian et al., 2008; Shendure and Ji, 2008). There are three commercial
next-generation DNA sequencing systems available (Schuster, 2008) which promise vastly more
sequencing capability (> 1 Gb of sequence per run) than standard capillary-based technology can
produce. A high-throughput DNA sequencing technique using a novel massively parallel
sequencing-by-synthesis approach called pyrosequencing was developed more recently by 454 Life
Sciences (Margulies et al., 2005; www.454.com). 454 Sequencing employs clonal DNA fragment
amplification on beads in droplets of an aqueous–oil emulsion, followed by loading the beads
into nanoscale (~44 µm) wells of a PicoTiterPlate, which is a fibre optic chip. In each reaction
cycle, one of the four deoxynucleotide triphosphates (dNTPs) is delivered to the reactor along
with DNA polymerase, ATP sulfurylase and luciferase. Incorporation, which is accompanied by a
chemiluminescent signal, is detected by a high-resolution charge-coupled device (CCD) sensor.
454 Sequencing is capable of sequencing roughly 100 Mb of raw DNA sequence per 7-h run with
their 2007 sequencing machine, the GS FLX Genome Analyzer.
454 Sequencing allows large amounts of DNA to be sequenced at low cost compared to the Sanger
chain-termination methods; G-C rich content is not as much of a problem, and the lack of
reliance on cloning means that unclonable segments are not skipped; it is also capable of
detecting mutations in an amplicon pool at a low sensitivity level. However, each read of the
2005 sequencing machine GS20 is only 100 bp long, resulting in some problems when dealing with
highly repetitive genomes, as repetitive regions of over 100 bp cannot be bridged and thus must
be left as separate contigs. Also, the nature of the technology lends itself to problems with
long homopolymer runs. As one of the projects using 454 sequencing, Project Jim determined the
first sequence of an individual, the complete genome sequence of James Dewey Watson, in May 2007.
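The flow-by-flow logic of pyrosequencing described above can be illustrated with a minimal, idealized sketch in Python. It is not the 454 base-calling software: the function names, the flow order TACG and the example read are assumptions made for illustration, and the signal is taken to be exactly equal to the number of bases incorporated in a flow. In practice the light signal is noisy, so the length of a long homopolymer run is harder to call, which is one way to picture the homopolymer problem noted above.

```python
def flowgram(read, flow_order="TACG"):
    """Return an idealized flowgram for `read` (the newly synthesized strand).

    Each entry is (nucleotide, signal), where signal is the number of
    consecutive bases incorporated when that nucleotide is flowed.
    Assumption: noise-free signal, strictly proportional to run length.
    """
    if any(base not in flow_order for base in read):
        raise ValueError("read contains a base that is never flowed")
    flows = []
    pos = 0
    while pos < len(read):
        for nt in flow_order:          # one cycle = one flow of each nucleotide
            run = 0
            while pos < len(read) and read[pos] == nt:
                run += 1               # homopolymer: several bases in one flow
                pos += 1
            flows.append((nt, run))
    return flows


def call_bases(flows):
    """Rebuild the read by repeating each nucleotide `signal` times."""
    return "".join(nt * signal for nt, signal in flows)


example_flows = flowgram("TTACGGGA")   # hypothetical read
print(example_flows)
print(call_bases(example_flows))
```

In this example the run of three G residues is recorded as a single flow with signal 3 rather than as three separate events, so any error in estimating that intensity changes the called length of the run.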
The second high-throughput sequencing technique is Solexa (Illumina, Inc.;
http://www.illumina.com) which depends on sequencing by synthesis. Diluted DNA templates are
attached to a solid planar surface and then amplified clonally. Sequencing is performed by
delivering a mixture of four differentially labelled reversible chain terminators along with DNA
polymerase. The resulting signal is detected at each cycle and a new cycle can be initiated
after terminator removal (Bennet et al., 2005). Current average read lengths are about 30–40
bases with 1 Gb per run.
The third high-throughput sequencing technique is the SOLiD System, which enables massively
parallel sequencing of clonally-amplified DNA fragments linked to beads. The SOLiD sequencing
methodology is based on sequential ligation with dye-labelled oligonucleotides. The SOLiD
technology provides unmatched accuracy, ultra-high throughput and application flexibility. It
delivers advancements in throughput approaching 20 Gb per run. The flexibility of two
independent flow cells, each capable of running 1, 4 or 8 samples, allows multiple experiments
to be conducted in a single run. With unparalleled throughput and greater than 99.9% overall
accuracy, the SOLiD System enables large-scale sequencing and tag-based experiments to be
completed more cost effectively than previously possible.
There are several emerging sequencing methods: sequencing by hybridization; mass spectrometric
techniques; direct visualization of single DNA molecules by atomic force microscopy; and
single-molecule sequencing strategies. The intense drive towards developing technology that can
sequence a complete human genome for under US$1000 will ensure that the speed and cost of
sequencing will continue to improve rapidly (Schuster, 2008). For example, a nanopore-based
device provides single-molecule detection and analytical capabilities that are achieved by
electrophoretically driving molecules in solution through a nano-scale pore. Further research
and development to overcome current challenges to nanopore identification of each successive
nucleotide in a DNA strand offers the prospect of third generation instruments that will
sequence a diploid mammalian genome for US$1000 in 24 h (Branton et al., 2008).

Sequencing strategies

There are two general genome sequencing strategies: (i) clone-by-clone or hierarchical
sequencing (International Human Genome Sequencing Consortium, 2001); and (ii) whole-genome
shotgun sequencing (Venter et al., 2001). After constructing the complete physical map,
clone-by-clone sequencing can be started in any specific region. The clone-by-clone or
hierarchical sequencing strategy has the following advantages: (i) the ability to fill gaps and
re-sequence the uncertain regions; (ii) the ability to distribute the clones to other
laboratories; and (iii) the ability to check the produced sequence by restriction enzymes. The
main disadvantages are that it is expensive and time consuming for the construction of a
physical map and experienced personnel are required.
The shotgun sequencing strategy consists of making small insert libraries (1–10 kb) from the
genomic DNA of an organism, sequencing a large number of clones (six to eight times redundancy)
and assembling contigs using bioinformatics software. It has no physical map construction and
less risk of recombinant clones. It is cost effective and fast and ideal for small genome
sequencing. However, it is difficult to fill gaps and re-track all the sequenced plasmids and
the resulting data is less useful for positional cloning. Figure 3.7 compares the two sequencing
methods.

COMBINING CLONE-BY-CLONE AND SHOTGUN SEQUENCING STRATEGIES. In 1997 The Institute for Genomic
Research (TIGR) launched the initiative of a whole-genome shotgun approach for the human genome.
But BACs, BAC end sequences and STS markers were used extensively in assembling the sequencing
data from shotgun clones. The first draft of the human genome was completed within 3 years
compared with the 12 years taken by the Human Genome Project, which was funded by government
agencies.
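The six- to eightfold redundancy quoted for shotgun sequencing can be related to expected genome coverage with the classical Lander–Waterman (Poisson) approximation. The sketch below is an illustration under idealized assumptions that are not part of the text above: reads are sampled uniformly at random, all reads have the same length, there are no repeats or cloning bias, and any overlap between two reads can be detected. The genome size and read length in the example are hypothetical.

```python
import math


def redundancy(n_reads, read_len, genome_size):
    """Average depth of coverage c = N * L / G."""
    return n_reads * read_len / genome_size


def fraction_covered(c):
    """Expected fraction of the genome covered by at least one read: 1 - e^(-c)."""
    return 1.0 - math.exp(-c)


def expected_contigs(n_reads, c):
    """Expected number of contigs, N * e^(-c), ignoring a minimum overlap length."""
    return n_reads * math.exp(-c)


genome_size = 5_000_000   # hypothetical 5-Mb genome
read_len = 500            # hypothetical read length in bp
for c in (6, 8):
    n_reads = c * genome_size // read_len
    print(c, n_reads,
          round(fraction_covered(c), 5),
          round(expected_contigs(n_reads, c), 1))
```

Under these assumptions even eightfold redundancy leaves a small uncovered fraction and more than one contig, which is consistent with the remark that gaps remain to be filled after shotgun assembly.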

[Figure 3.7: schematic comparing hierarchical sequencing (1. construct large BAC or P1 clones;
2. align; 3. take subset of clones, fragment and sequence) with shotgun sequencing (fragment
chromosomal DNA and sequence whole genome). Both routes lead to "Assemble contigs and
bioinformatics analysis". The lower panel shows a scaffold with the labels U-unitigs, rock,
stones, 50 kb mates and gap, followed by the step "Link mapped scaffold to existing map" via
STSs.]

Fig. 3.7. Comparison of two sequencing strategies: assembly of a mapped scaffold. U-unitigs are
assembled into scaffolds using mate-pair information to bridge gaps between two U-unitigs, and by
linking unitigs to rock, which are less-well supported unitigs that nevertheless fit in place according
to at least two independent large insert mate pairs. Stones are single short contigs whose position is
supported by only a single read. Gaps are filled in the finishing stage by further site-directed sequencing.
Scaffolds are placed against existing genetic and physical maps by sequence tagged site (STS) matches
and against the cytological map by fluorescent in situ hybridization (FISH).
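
To make the contig assembly step concrete, the following toy Python sketch merges overlapping reads greedily by their longest exact suffix-prefix overlap. It is a teaching illustration only, not the algorithm used by any assembler named in this chapter: it assumes error-free reads from one strand, ignores repeats and mate-pair information, and the example sequence and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and sequences wholly contained in another sequence."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = drop_contained(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:                # no overlaps left: remaining gaps stay open
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


genome = "ATGGCGTACGTTAGCCTAGGATCCA"   # hypothetical 25-base target
reads = [genome[0:10], genome[6:16], genome[12:22], genome[18:25]]
print(greedy_assemble(reads))
```

For these four reads, each overlapping its neighbour by four bases, the sketch reconstructs the original sequence as a single contig. Reads that share no overlap of at least `min_len` bases would be returned as separate contigs, the toy analogue of a gap.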

Genome filtering strategies


The extremely large size of many crop genomes makes it difficult to decode them using the
standard methods of genome sequencing such as clone-by-clone and whole-genome shotgun.
Determining their complete sequences is daunting and costly. In recent years two genome
filtration strategies, methylation filtration (MF) (Rabinowicz et al., 1999) and C0t-based
cloning and sequencing (CBCS; Peterson et al., 2002) or high C0t (HC; Yuan et al., 2003) have
been suggested for selectively sequencing the gene space of large genomes. MF is based on the
characteristics of plant genomes in which genes are largely hypomethylated but repeated
sequences are highly methylated. Methylated DNA is cleaved when transferred into an Mcr+
E. coli strain and only hypomethylated DNA is recovered. CBCS/HC separates single- and low-copy
sequences, including most genes, from the repeated sequences on the basis of their differential
renaturation characteristics. Using the MF strategy, Bedell et al. (2005) sequenced 96% of the
genes in sorghum with an average coverage of 65% across their length. This strategy filtered out
repetitive elements during the sequencing of the genome of sorghum, which reduced the amount of
sorghum DNA to be sequenced by two-thirds, from 735 Mb to approximately 250 Mb. Both MF and HC
have been used for efficient characterization of maize gene space (Palmer et al., 2003;
Whitelaw et al., 2003). Using
high C0t and MF, Martienssen et al. (2004) generated up to twofold coverage of the gene space
with less than one million sequencing reads and simulations using sequenced BAC clones predicted
that 5× coverage of gene-rich regions, accompanied by less than 1× coverage of subclones from
BAC contigs, will generate a high quality mapped sequence that meets the needs of geneticists
while accommodating unusually high levels of structural polymorphism. Haberer et al. (2005)
selected 100 random regions averaging 144 kb in size, representing about 0.6% of the genome, to
define their content of genes and repeats for characterizing the structure and architecture of
the maize genome. Combining CBCS with genome filtration can greatly reduce the cost while
retaining the high coverage of genic regions. An alternative approach is the identification of
gene-rich regions on a detailed physical map and sequencing large-insert clones from these
regions.

Plant genomic sequences

The first complete plant genome to be sequenced was that of Arabidopsis. The sequenced regions
cover 115.4 Mb of the 125-Mb genome and extend into centromeric regions. The evolution of
Arabidopsis involved a whole genome duplication followed by subsequent gene loss and extensive
local gene duplications. The genome contains 25,498 genes encoding proteins from 11,000 families
(The Arabidopsis Genome Initiative, 2000). Arabidopsis contains many families of new proteins
but also lacks several common protein families. The proportion of predicted Arabidopsis genes in
different functional categories is provided in Fig. 3.8. The complete genome sequence provides
the foundation for more comprehensive comparison of conserved processes in all eukaryotes,
identifying a wide range of plant-specific gene functions and establishing rapid systematic
methods of identifying genes for crop improvement (Varshney et al., 2009).
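As a quick arithmetic check on figures quoted in this section (the sorghum methylation filtration result and the Arabidopsis sequence statistics), the short Python sketch below recomputes the proportions. The variable and function names are illustrative only; the numbers are those given in the text.

```python
def fraction(part, whole):
    """Simple proportion helper."""
    return part / whole


# Sorghum: methylation filtration reduced the DNA to be sequenced
# from 735 Mb to approximately 250 Mb.
sorghum_reduction = 1 - fraction(250, 735)

# Arabidopsis: 115.4 Mb sequenced out of a 125-Mb genome,
# with 25,498 genes in 11,000 protein families.
arabidopsis_sequenced = fraction(115.4, 125)
genes_per_family = fraction(25498, 11000)

print(f"sorghum reduction: {sorghum_reduction:.1%}")
print(f"Arabidopsis sequenced: {arabidopsis_sequenced:.1%}")
print(f"mean genes per family: {genes_per_family:.1f}")
```

The sorghum reduction works out at roughly 66%, in line with the "two-thirds" quoted, and the sequenced Arabidopsis regions correspond to a little over 92% of the genome.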

[Figure 3.8: pie chart. Unclassified, 10%; metabolism, 11%; not yet clear-cut, 5%; energy, 7%;
cell defence, 3%; cell growth, 2%; elicitors, 4%; transcription, 3%; signal transduction, 4%;
cellular organization, 5%; intracellular traffic, 3%; transport facilitators, 4%; protein
synthesis, 27%; protein destination, 12%.]

Fig. 3.8. Proportion of predicted Arabidopsis genes in different functional categories.



Rice was the first crop to be fully sequenced because of its importance as one of the major
cereals and also because of its small genome size, small number of chromosomes (n = 12), well
characterized genetic and genomic resources and availability of a large number of DNA markers
and a high density genetic linkage map. Two draft sequences were completed in 2002 (Goff et al.,
2002; Yu et al., 2002) and a complete sequence was published in 2005 (IRGSP, 2005) which is
available in the National Center for Biotechnology Information (NCBI) database.
Many sequencing projects for important crop species are currently ongoing. The US Department of
Energy's Joint Genome Institute (JGI) is providing funding and technical assistance to decode
the genomes of several major plants, including cassava (Manihot esculenta), cotton (Gossypium),
foxtail millet (Setaria italica), sorghum, soybean and sweet orange (Citrus sinensis L.)
(http://www.jgi.doe.gov/sequencing/). Other plants for which there are ongoing genome sequencing
projects include Medicago truncatula (http://www.medicago.org/genome), Lotus japonicus
(http://www.kazusa.or.jp), poplar, tomato (http://www.sgn.cornell.edu) and grapevine.
The International Wheat Genome Sequencing Consortium (IWGSC) has been formed to advance
agricultural research for wheat production and utilization by developing DNA-based tools and
resources that result from the complete sequencing of the expressed genome of common (hexaploid)
bread wheat and to ensure that these tools and the sequences are available for all to use
without restriction and without cost (Gill et al., 2004; http://www.wheatgenome.org/). A Global
Musa Genomics Consortium (GMGC) is decoding the Musa genome
(http://www.newscientist.com/article.ns?id-dn1037). A Global Cassava Partnership, an alliance of
the world's leading cassava researchers and developers, has proposed that sequencing the cassava
genome should be a priority (Fauquet and Tohme, 2004).
To sequence the maize genome, two consortia in the USA began a pilot study: one with Jo Messing
(Rutgers University), Rod Wing (Arizona University), Ed Coe (University of Missouri), Mark
Vaudin (Monsanto) and Steve Rousley (Cereon); the other included Jeff Bennetzen (Purdue
University), Karel Schubert and Roger Beachy (Danforth Center), Cathy Whitelaw and John
Quackenbush (TIGR) and Nathan Lakey (Orion). These two pioneer programmes have been extended by
a massive US programme from the National Science Foundation (NSF), USDA and the Department of
Energy (DOE) led by Rick Wilson (Washington University). The sequencing strategy is a hybrid
between a BAC-by-BAC approach and a whole-genome shotgun.

3.2.4 cDNA sequencing

Why cDNA sequencing

Large-scale DNA sequencing can be carried out on genomic DNA or cDNAs. There are four advantages
to performing cDNA sequencing. First is the cost of sequencing a whole genome. Although DNA
sequencing costs have fallen more than 50-fold over the past decade, it still costs around US$10
million to sequence three billion base pairs. It will take years to realize the goal to lower
the cost of sequencing a mammalian-sized genome to US$100,000 and ultimately to cut the cost of
whole-genome sequencing to US$1000 or less.
Secondly, the interpretation of the genomic sequence of eukaryotes is not straightforward in
contrast to prokaryotes: coding regions are separated by non-coding regions; introns and
alternative splicing occur; one gene can lead to multiple mRNAs and gene products; a significant
fraction of genomic DNA does not code for proteins (non-coding sequences).
Thirdly, cDNA sequencing helps in annotation and identification of exons and introns. Estimates
of the number of human genes vary from 30,000 to 80,000. The accuracy of the Arabidopsis genome
annotation varied from 50 to 70% in the first draft. Many Arabidopsis genes are still not
accurately annotated.
Fourthly, sequencing cDNAs helps gain information about the transcriptome.
mRNA populations are variable among cells. The transcriptome is dynamic and constantly changing.
Cells adapt to environmental, developmental and other signals by modulating their transcriptome.
mRNA populations form an important level of regulation between signal perception and response.
Genetically identical cells can exhibit distinct phenotypes. cDNA sequencing allows direct
insight into mRNA populations and allows the dissection of the transcriptome which genomic
sequencing alone does not provide. Sequencing of random cDNA clones prepared from different
tissues also allows analyses of mRNA abundance.

cDNA libraries

When constructing a representative cDNA library, the source of the mRNA for the cDNA library is
critical and will vary depending on the goal of the study. To estimate the diversity of mRNAs
expressed in a given plant, the mRNA should represent most plant tissues and organs. On the
other hand, to define the diversity of mRNAs represented in a specific tissue, organ or
developmental stage, the library should be prepared from the most highly defined source
feasible. As indicated by Nunberg et al. (1996), it is better to invest the time to harvest
sufficient quantities of scarce tissue for a library rather than using materials which will
contain a significant proportion of extraneous messages.
If large quantities of RNA are available, it is possible to create a plasmid library directly.
This is particularly feasible since electroporation transformation efficiencies are so high.
Plasmid libraries may or may not be directional and are easily arranged in an ordered array.
Constructing plasmid libraries directly avoids any sequence bias, including internal deletion
and trans recombination that may occur during the excision process.
The frequency of full-length cDNAs depends on the length of transcript (the longer the
transcript the lower the frequency of obtaining full-length cDNAs). Carninci and Hayashizaki
(1999) discussed the high efficiency of full-length cDNA cloning using a cap trapper method
(biotinylated cap) and thermoactivation of reverse transcriptase (cDNA synthesis at 60°C: RNA
secondary structures are melted). Some normalization and subtraction methods also allow
enrichment of full-length cDNAs.
For a given mRNA, multiple expressed sequence tags (ESTs) can be obtained. Depending on the
extent of sampling, ESTs may or may not overlap. EST processing is needed to remove vector
sequences and linker sequences, check the quality using a sequence quality filter, clean up the
contaminants and chimaeric sequences and store the results in databases. To construct EST
contigs, there are two commonly used programs: Phrap/consed and TIGR assembler. These programs
generate a unigene set (contigs or Tentative Consensus): a consensus sequence for all
overlapping ESTs that (supposedly) correspond to a single mRNA.
Several factors affect the quality of EST contigs: contaminating sequences, bad quality
sequences, non-overlapping ESTs from the same mRNA, alternative splicing resulting in one gene
with multiple mRNAs and closely related genes (chimaeric contigs). EST annotation can be carried
out using similarity searches against GenBank and other databases, e.g. protein motif databases,
to assign a putative function or identify functional categories. This process can be automated
or manual (usually a combination of the two).
Non-random (normalized or subtracted) cDNA libraries are needed in order to overcome some of the
problems with redundant ESTs in order to saturate EST databases when budget is limited or when
there is a specific interest in a particular stage. Hybridization-based methods are most
commonly used to decrease redundancy (reduce representation of abundant cDNAs and increase rare
cDNAs). Normalized cDNA libraries are used when gene discovery is the main objective of the EST
project.

cDNA sequencing

Strategies for cDNA sequencing include single-pass cDNA sequencing (ESTs),
normalized cDNA libraries, subtracted cDNAs and high-throughput full-length cDNA sequencing.
Single-pass cDNA sequencing can be achieved using the following steps: (i) construct cDNA
libraries; (ii) randomly pick clones for sequencing (from the 5' or 3' end using vector
primers); (iii) process sequences (vector/linker removal, quality control, contaminants, empty,
chimaeric); (iv) construct contigs (sequences from same transcript); (v) create a unigene set;
and (vi) annotate sequences. The objective of high-throughput cDNA sequencing is to obtain the
full finished sequence of as many cDNAs as possible. This is necessary for complex eukaryotic
genomes (human, mouse, plants). Full-length cDNA sequencing is discussed in Chapter 11 along
with its use in gene cloning.
Major limitations of the cDNA sequencing approach include: (i) high redundancy of some genes in
cDNA libraries; (ii) difficulty in isolating rare transcripts or developmentally-regulated
genes; and (iii) the fact that some genes are not stable in E. coli.

3.3 Functional Genomics

The use of whole genome information and high-throughput tools has opened up a new field of
research called functional genomics. Among its subdisciplines, transcriptomics (the complete set
of transcripts produced in a cell) (Zimmerli and Somerville, 2005), proteomics (the complete set
of proteins produced in a cell) (Roberts, 2002) and metabolomics (the complete set of
metabolites expressed in a cell) (Stitt and Fernie, 2003) have been used by the plant science
community. Functional genomics refers to the development and application of global (genome-wide
or system-wide) experimental approaches to assess gene function by making use of the information
and reagents provided by structural genomics. It is characterized by high-throughput or
large-scale experimental methodologies combined with statistical and computational analysis
(bioinformatics) of the results. The new information provided by all the omics disciplines will
lead the plant science community to in silico simulations of plant growth, development and
response to environmental change.

3.3.1 Transcriptomics

The transcriptome is the set of all the mRNA molecules or transcripts, produced in one cell or a
population of cells. The term can be applied to the total set of transcripts in a given organism
or to the specific subset of transcripts present in a particular cell type. Unlike the genome,
which is roughly fixed for a given cell line (excluding mutations), the transcriptome can vary
with external environmental conditions. Because it includes all mRNA transcripts in the cell,
the transcriptome reflects the genes that are being actively expressed at any given time with
the exception of mRNA degradation phenomena such as transcriptional attenuation. Transcriptomics
is based on the idea that a catalogue of all the transcripts associated with a specific
treatment or developmental stage provides a reasonable overview of the underlying biological
processes at work. As we moved from northern blots to tiling arrays, we have advanced from a
gene-by-gene world to a full genome universe. The study of transcriptomics often uses
high-throughput techniques based on DNA microarray or chip technology. Suggested references for
this section include Bernot (2004), Bourgault et al. (2005) and Busch and Lohmann (2007).
Gene expression profiling technologies provide a tool for analysing global gene expression by
viewing activity of all or (more typically) a substantial part of the genome at a specific time
of interest. There are open and closed architecture systems for gene expression profiling. In
the open architecture, all genes expressed in a tissue have the possibility of being detected
(e.g. cDNA-AFLP, differential display (dd) PCR, SAGE, cDNA subtraction). Advantages include the
potential discovery of previously unknown genes, comprehensive coverage and the low requirements
by way of equipment. Disadvantages include retrieving only a small part of the gene (since it can
be laborious to clone full-length cDNA) and simple gene identification that is limited by
sequences that are already in a database (otherwise the corresponding gene must be cloned).
Several alternative technologies have emerged for measuring transcript abundance in a parallel
fashion. Essentially, these methods can be divided into three categories according to their
underlying principle, namely PCR-, sequencing- or hybridization-based technologies. Therefore,
strategies that are currently available for analysis of transcriptomes include RT-PCR
(qualitative and quantitative), hybridization methods (northern blots, macroarrays, DNA
microarrays, oligonucleotide microarrays), cDNA fingerprinting (differential display,
cDNA-AFLP), cDNA sequencing (full-length cDNAs, subtracted cDNAs, normalized cDNA libraries,
SAGE, massively parallel signature sequencing (MPSS)) and combinations of the above techniques.
The most straightforward and unbiased method of analysing an RNA population is the sequencing of
cDNA libraries and quantitative analysis of the resulting ESTs. Traditionally, ESTs with read
lengths of about 200–900 nucleotides have been produced by Sanger sequencing but the associated
costs have severely limited the resolution of this approach (Busch and Lohmann, 2007). Deep
sequencing has become a viable alternative for unbiased large-scale expression profiling because
of the development of new protocols and entirely new sequencing techniques. Non-gel-based
sequencing techniques promise to deliver greatly increased throughput and a considerable cost
reduction. MPSS combines in vitro cloning of millions of template tags on separate microbeads
with ligation-mediated sequence detection. In each reaction cycle, a four-base overhang is
produced on every tag to which a fluorescently labelled adaptor of defined sequence is ligated.
The position and fluorescence of every microbead is monitored by a high resolution camera in
each of the reaction cycles, allowing the sequences of the 17-nucleotide tags to be
reconstructed (Brenner et al., 2000). As indicated by Busch and Lohmann (2007), the limited
length of the sequenced tags precludes the use of MPSS for de novo sequencing but makes it a
very powerful tool for expression profiling of organisms with pre-existing sequence information.
By contrast, two other high-throughput sequencing techniques as described previously, 454 and
Solexa, are ideally suited for expression-profiling purposes. Short tags are sufficient to
identify a transcript unambiguously and therefore problems arising from assembling short tags
into larger contigs can be ignored.
PCR product-based arrays were heavily used in the early days of global transcriptome analysis.
However, the low level of standardization among laboratories, high levels of noise and
experimental variation and cross-hybridization between homologous transcripts have eroded the
attractiveness of these arrays. Oligonucleotide-based microarrays are now becoming the most
popular technology for large-scale expression profiling because they allow the simultaneous
detection of tens of thousands of transcripts at a reasonable cost. The expression level of any
gene represented on the array can be deduced from the fluorescence intensity of the
corresponding probe. However, microarrays only offer linear expression measurements over a range
of three orders of magnitude compared to quantitative RT-PCR which has a dynamic range of five
orders of magnitude. Microarrays perform with less precision and sensitivity than other
techniques when used for measuring low abundance transcripts in particular and this is
manifested in their greater inter-assay variability (Busch and Lohmann, 2007). Another major
limitation of microarrays designed for expression analysis is that they rely on current genome
annotations, which precludes the identification of novel or very small transcription units.
Microarrays and quantitative RT-PCR have dominated expression profiling to date but deep
sequencing and whole-genome tiling arrays will become increasingly important because these
techniques are not limited to the detection of known transcripts. Tiling arrays, on which the
entire
genome is represented by evenly spaced probes, provide a novel means of transcript identification. In Arabidopsis, tiling arrays have been used to map transcriptionally active regions by profiling four different tissues (Yamada et al., 2003).

The interaction transcriptome is the sum of all microbe and host transcripts that are produced during the interaction. The challenges in studying interaction transcriptomes include discriminating pathogen from host ESTs, for which similarity searches to genome/cDNA sequences, GC analyses and determination of hexamer frequency (windows of 6 bp) can be used. Systems genomics/transcriptomics can be used to analyse complex transcriptomes, for example the mixtures of mRNAs from different species (e.g. infected tissue, environmental samples such as soil or seawater, etc.). One challenge is to identify the species of origin in the mixtures.

3.3.2 Proteomics

Proteomics is the study of the identification, function and regulation of complete sets of proteins in a tissue, cell or subcellular compartment. Such information is crucial to understanding how complex biological processes occur at a molecular level and how they differ in various cell types, stages of development or environmental conditions (Bourgault et al., 2005). Proteomics is important as proteins are active agents in cells and they execute the biological functions encoded by genes. Sequences of genes (or genomes) and transcriptome analyses are not sufficient to elucidate biological functions. Proteomics complements transcriptomics by providing information about the time and place of protein synthesis and accumulation, as well as identifying those proteins and their post-translational modifications. Gene expression does not necessarily indicate whether a protein is synthesized, how fast it is turned over or which possible protein isoforms are synthesized (Mathesius et al., 2003). In some cases, the correlation between gene expression and protein presence is as low as 0.4. First, the level of transcription of a gene gives only a rough estimate of its level of expression into a protein. An mRNA produced in abundance may be degraded rapidly or translated inefficiently, resulting in a small amount of protein. Secondly, many proteins experience post-translational modifications that profoundly affect their activities; for example, some proteins are not active until they become phosphorylated. Methods such as phosphoproteomics and glycoproteomics are used to study post-translational modifications. Thirdly, many transcripts give rise to more than one protein through alternative splicing or post-translational modifications. It is generally supposed that if genomes contain tens of thousands of gene sequences, the proteome comprises several hundred thousand proteins as a result of alternative splicing and post-translational modifications. Finally, many proteins form complexes with other proteins or RNA molecules and only function in the presence of these molecules.

Proteomics has become an important approach for investigating cellular processes and network functions. Significant improvements have been made in technologies for high-throughput proteomics, both at the level of data analysis software and mass spectrometry (MS) hardware (Baginsky and Gruissem, 2006). In this section, proteomics will be briefly discussed. For further details, readers are referred to the following review articles: van Wijk (2001), Molloy and Witzmann (2002), de Hoog and Mann (2004), Saravanan et al. (2004), Baginsky and Gruissem (2006), Cravatt et al. (2007) and Zivy et al. (2007).

Protein extraction

Obtaining high quality protein is the first step in proteomic research. Extracting protein from plant tissue requires tissue disruption by grinding and sonication, separation of proteins from unwanted cell materials (cell wall, water, salt, phenolics, nucleic acids) by centrifugation after precipitation of proteins with acetone-trichloroacetic acid, resolubilizing protein in a solution that dissolves the maximum number of different proteins and inactivation of protease
by acetone-trichloroacetic acid treatment or specific protease inhibitors. Pre-fractionation of tissue is optional for the analysis of proteins from different organelles or microsomal fractions. Solubilization requires urea or, for more hydrophobic proteins, thiourea, as a chaotrope which solubilizes, denatures and unfolds most proteins. Non-ionic or zwitterionic detergents, e.g. 3-[3-cholamidopropyl-dimethyl-ammonio]-1-propane sulfonate (CHAPS), Triton-X, or amidosulfobetaines are used to solubilize and separate proteins in a mixture. Sodium dodecyl sulphate (SDS) is also a strong detergent and is used to solubilize membrane proteins. However, it confers a negative charge on proteins and, therefore, interferes with isoelectric focusing (Mathesius et al., 2003). Reducing agents (usually dithiothreitol [DTT], 2-mercaptoethanol or tributyl phosphine) are needed to disrupt disulfide bonds.

Protein identification and quantification

N- or C-terminal sequencing has made protein identification possible on a small scale although with limitations. Improvements in MS have made it possible to identify proteins faster, on a larger scale, using smaller amounts of protein. In addition, post-translational modifications can be determined by MS/MS analysis and proteins can be identified even when bound to other proteins in complexes. A standard technique for protein identification with MALDI-TOF MS is peptide mass fingerprinting. Protein spots in a gel can be visualized using a variety of chemical stains or fluorescent markers. Proteins can often be quantified by the intensity with which they stain. Once proteins have been separated and quantified, they can be identified. Individual spots are cut out of the gel and cleaved into peptides with proteolytic enzymes. These peptides can then be identified by MS, specifically MALDI-TOF MS. The MALDI-TOF analysis will measure very precisely (< 0.1 Da) the mass of peptides formed by this digestion. Since the cleavage sites are known, the digestion can be simulated by informatics, that is, the masses of all the peptides produced by this digestion can be calculated for all proteins of known sequence in a given organism (Zivy et al., 2007). These masses will depend on the length of peptides and their composition since most amino acids have different masses. Thus, masses predicted from sequences stored in databases can simply be compared with masses actually measured by the MALDI-TOF equipment. The greater the number of positive mass matches, the more likely it is that the peptides originate from the same protein, thus facilitating the rapid identification of proteins.

Protein profiling

Protein mixtures of considerable complexity can now be routinely characterized in some detail. One measure of technical progress is the number of proteins identified in each study. Such numbers can now reach the thousands for suitably complex samples. Large-scale proteomic studies are needed to solve three types of biological problem (Aebersold and Mann, 2003): (i) the generation of protein-protein linkage maps; (ii) the use of protein identification technology to annotate and, if necessary, correct genomic DNA sequences; and (iii) the use of quantitative methods to analyse protein expression profiles as a function of the cellular state as an aid to inferring cellular function.

The sequences of many mature proteins in higher eukaryotes after processing and splicing are often not directly apparent from their cognate DNA sequences. Peptide sequence data of sufficient quality provide unambiguous evidence of translation of a particular gene and can, in principle, differentiate between alternatively spliced or translated forms of a protein (Aebersold and Mann, 2003). Thus, it might be tempting to systematically analyse the proteins expressed by a cell or tissue, that is, to generate comprehensive proteome maps.

The more common and versatile use of large-scale MS-based proteomics has been to document the expression of proteins as a function of cell or tissue state. Aebersold and Mann (2003) argued that to be meaningful, such data must be at least
semi-quantitative and that a simple list of proteins detected in the different states is insufficient. This is because analyses of complex mixtures are often not comprehensive and therefore the non-appearance of a particular sequence in the list of identified peptides does not indicate that the peptide or protein was not originally present in the sample. Additionally, it is often impossible to prepare a certain cell type, cell fraction or tissue in completely pure form without trace contamination from other fractions. Furthermore, because the ion current of a peptide is dependent on a multitude of variables that are difficult to control, this measure is not a good indicator of peptide abundance. If stable-isotope dilution has not been used, a rough relative estimate of the quantity of a protein can be obtained by integrating the ion current of its peptide-mass peaks over their elution time and comparing these extracted ion currents between states, provided that highly accurate and reproducible methods are used. Increasingly, stable-isotope dilution and LC-MS/MS are used to accurately detect changes in quantitative protein profiles and to infer biological function from the observed patterns (Aebersold and Mann, 2003).

Protein-protein interactions

Protein-protein interactions occur among most proteins and there are six types of interfaces found in protein-protein interactions: domain-domain, intra-domain, hetero-oligomer, hetero-complex, homo-oligomer, and homo-complex. The analysis of protein-protein interactions can be either qualitative or quantitative. Traditional biochemical methods such as co-purification and co-immunoprecipitation have been used to identify the members of protein complexes. Proteomics-based strategies have been used to determine the composition of complexes and to establish interaction networks. The systematic, large-scale, high-throughput approaches now being taken to build maps of the interactions between proteins predicted by genome sequence information have become known as interactomics (Causier et al., 2005).

There are many important characteristics of a protein-protein interaction. Obviously, it is important to know which proteins are interacting. In many experiments and computational studies, the focus is on interactions between two different proteins. However, one protein can interact with other copies of itself (oligomerization) or with three or more different proteins. The stoichiometry of the interaction is also important, that is, how many of each protein involved are present in a given reaction. Some protein interactions are stronger than others because they bind together more tightly. The strength of binding is known as affinity. Proteins will only bind to each other spontaneously if it is energetically favourable. Energy changes during binding are another important aspect of protein interactions. Many of the computational tools that predict interactions are based on the energy of interactions.

Protein interaction maps represent essential components of the post-genomic tool kits needed for understanding biological processes at a systems level. Over the past decade, a wide variety of methods have been developed to detect, analyse and quantify protein interactions, including surface plasmon resonance spectroscopy, nuclear magnetic resonance (NMR), Y2H screens, peptide tagging combined with MS and fluorescence-based technologies. Lalonde et al. (2008) and Miernyk and Thelen (2008) reviewed the latest techniques and current limitations of biochemical, molecular and cellular approaches for the detection of protein-protein interactions. In vitro biochemical strategies for identifying and characterizing interacting proteins include co-immunoprecipitation, blue native gel electrophoresis, in vitro binding assays, protein cross-linking and rate-zonal centrifugation. Fluorescence techniques range from co-localization of tags, which may be limited by the optical resolution of the microscope, to fluorescence resonance energy transfer (FRET)-based methods that have molecular resolution and can also report on the dynamics and localization of the interactions within a cell. Proteins interact via highly evolved complementary surfaces with affinities that
can vary over many orders of magnitude. Some of the techniques such as surface plasmon resonance provide detailed information regarding the physical properties of these interactions. To analyse protein complexes systematically at a sub- or full-genome level, several methods have been adapted for high-throughput screens using robotics: (i) Y2H systems; (ii) the mating-based split-ubiquitin system (mbSUS); and (iii) affinity purification of protein complexes followed by identification of proteins by MS (AP-MS).

One of the first questions usually asked about a new protein, apart from where it is expressed, is to which proteins it binds. To study this question by MS, the protein itself is used as an affinity reagent to isolate its binding partners. Compared with two-hybrid and array-based approaches, this strategy has the advantages that the fully processed and modified protein can serve as the bait, that the interactions take place in the native environment and cellular location and that multi-component complexes can be isolated and analysed in a single operation (Ashman et al., 2001). However, because many biologically relevant interactions are of low affinity, transient and generally dependent on the specific cellular environment in which they occur, MS-based methods in a straightforward affinity experiment will detect only a subset of the protein interactions that actually occur (Aebersold and Mann, 2003). Bioinformatics methods, correlation of MS data with those obtained by other methods or iterative MS measurements, possibly in conjunction with chemical crosslinking (Rappsilber et al., 2000), can often help to further elucidate direct interactions and overall topology of multi-protein complexes.

The ability of quantitative MS to detect specific complex components within a background of non-specifically associated proteins increases the tolerance for high background and allows for fewer purification steps and less stringent washing conditions, thus increasing the chance of finding transient and weak interactions. The same methods can be used to study the interaction of proteins with nucleic acids, small molecules and in fact with any other substrate. For example, drugs can be used as affinity baits in the same way as proteins to define their cellular targets and small molecules such as cofactors can be used to isolate interesting sub-proteomes (MacDonald et al., 2002).

The Y2H system has become one of the standard laboratory techniques for the detection and characterization of protein-protein interactions. It can be used to map individual amino acid residues involved in a specific protein-protein interaction. It can also be used to identify novel interactions from complex libraries of expressed proteins. The Y2H system has been widely used for determination of protein interaction networks within different organisms. In plants, the Y2H system has been successfully applied to detect interactions with phytochromes, cryptochromes, transcription factors, proteins involved in self-incompatibility mechanisms, the circadian clock and plant disease resistance (Causier et al., 2005). Taken together with the recent progress made in the development of large-scale Y2H screening procedures, the time is now ripe for large-scale Y2H screens to be applied to organisms such as Arabidopsis and rice.

Another potential method to detect protein-protein interactions involves the use of FRET between fluorescent tags on interacting proteins. FRET is a non-radiative process whereby energy from an excited donor fluorophore is transferred to an acceptor fluorophore that is within 60 Å of the excited fluorophore (Wouters et al., 2001). After excitation of the first fluorophore, FRET is detected either by emission from the second fluorophore using appropriate filters or by alteration of the fluorescence lifetime of the donor. Two fluorophores that are commonly used are variants of GFP: cyan fluorescent protein (CFP) and yellow fluorescent protein (YFP) (Tsien, 1998). The potential of FRET is considerable, for two reasons (Phizicky et al., 2003). First, it can be used to make measurements in living cells, which allows the detection of protein interactions at the location in the cell where they normally occur in the presence of the normal cellular
environment. Secondly, transient interactions can be followed with high temporal resolution in single cells. Protein interactions within the proteome might be mapped by performing FRET screens on cell arrays that are co-transfected with cDNAs bearing CFP and YFP fusion proteins.

In recent years there has been a strong focus on predicting protein interactions computationally. Predicting the interactions can help scientists predict pathways in the cell, potential drugs and antibiotics and protein functions. Proteins are large molecules and binding between them often involves many atoms and a variety of interaction types including hydrogen bonds, hydrophobic interactions, salt bridges and more. Proteins are also dynamic, with many of their bonds able to stretch and rotate. Therefore, predicting protein-protein interactions requires a good knowledge of the chemistry and physics involved in the interactions.

The principle of using hybrid proteins to analyse interactions has been extended to examine DNA-protein interactions, RNA-protein interactions, small molecule-protein interactions and interactions dependent on bridging proteins or post-translational modifications. In addition, the reconstitution of proteins other than transcription factors, such as ubiquitin, has been used to establish reporter systems to detect interactions (Fashena et al., 2000) and these may enable the analysis of proteins not generally suitable for the traditional two-hybrid arrays, such as membrane proteins.

In the future, quantitative methods based on stable-isotope labelling are likely to revolutionize the study of stable or transient interactions and interactions dependent on post-translational modifications. In such experiments, accurate quantification by means of stable-isotope labelling is not used for protein quantification per se; instead the stable-isotope ratios distinguish between the protein composition of two or more protein complexes (Aebersold and Mann, 2003). In the case of a sample containing a complex and a control sample containing only contaminating proteins (for example, immunoprecipitation with an irrelevant antibody or isolate from a cell devoid of affinity-tagged protein), the method can distinguish between true complex components and non-specifically associated proteins.

Post-translational modifications

Proteins are converted to their mature form via a complicated sequence of post-translational protein processing and decoration events. Detection of post-translational modifications is necessary, especially for phosphorylation or ubiquitinylation because they affect protein function. Phosphorylation can be detected by the use of antiphosphotyrosine antibodies on blots of 2DE gels or by radiolabelling proteins and detecting the labelled proteins. Glycosylation of proteins can easily be detected on gels using the periodic acid-Schiff reaction. In addition, specific enzymes can be used for selective cleavage of several common post-translational modifications (Mathesius et al., 2003).

Many post-translational modifications are regulatory and reversible, and they affect biological function through a multitude of mechanisms. MS methods to determine the type and site of such modifications on single, purified proteins have been undergoing refinements since the late 1980s. In this case, peptide mapping with different enzymes is usually used to cover as much of the protein sequence as possible. Protein modifications are then determined by examining the measured mass and fragmentation spectra via manual or computer-assisted interpretation. For the analysis of some types of post-translational modifications (PTMs), specific MS techniques have been developed that scan the peptides derived from a protein for the presence of a particular modification. The analysis of regulatory modifications, in particular protein phosphorylation, is complicated by the frequently low stoichiometry, the size and ionizability of peptides bearing the modifications and their fragmentation behaviour in the mass spectrometer (Aebersold and Mann, 2003). Given the difficulties of identifying all modifications even in a single protein, it is clear that at present, scanning for proteome-wide modifications is not
comprehensive. One of the strategies used is essentially an extension of the approach used to analyse protein mixtures. Instead of searching a database only for non-modified peptides, the database search algorithm is instructed to also match potentially modified peptides. To avoid a combinatorial explosion resulting from the need to consider all possible modifications for all peptides in the database, the experiment is usually divided into identification of a set of proteins on the basis of non-modified peptides followed by searching only these proteins for modified peptides (MacCoss et al., 2002). A more functionally oriented strategy focuses on the search for one type of modification on all the proteins present in a sample. Such techniques are usually based on some form of affinity selection that is specific for the modification of interest and which is used to purify the sub-proteome bearing this modification.

Many challenges remain in the large-scale mapping of post-translational modifications but it is clear that MS-based proteomics can make a unique contribution in this area. For example, systematic quantitative measurements of post-translational modifications by stable-isotope labelling would be of considerable biological interest. One of the future challenges in proteomics is to increase sensitivity to visualize low abundance proteins (e.g. regulatory proteins) as only 10% of proteins can now be visualized by 2DE. Proteomics also needs a high quality database for matching sequences to MS data (or the use of MS/MS). Technical developments are needed for understanding post-translational modifications, protein complexes, protein localization and the interface with transcriptomics and metabolomics.

3.3.3 Metabolomics

Plants contain a wide diversity of low-molecular-weight chemical constituents. More than 100,000 secondary metabolites have been identified in plants and this probably represents less than 10% of nature's total (Wink, 1988). Estimates of how many metabolites occur in an individual species vary within the range of 5000 to 25,000 (Trethewey, 2005). Metabolites are the products of interrelated biochemical pathways and changes in metabolic profiles can be regarded as the ultimate response of biological systems to genetic or environmental changes (Fiehn, 2002). Plant metabolism research has experienced a second golden age resulting from synergies between genome-enabled technologies and classical biochemistry. The rapid rate at which genomics data are being accumulated creates an increased need for robust metabolomic technologies and rapid and accurate methods for identifying the activities of enzymes (DellaPenna and Last, 2008).

The metabolome refers to the complete set of small-molecule metabolites (such as metabolic intermediates, hormones and other signalling molecules and secondary metabolites) that can be found within a biological sample such as a single organism. Metabolomics is defined as the systematic survey of all the metabolites present in a plant tissue, cell or cellular compartment under defined conditions (Bourgault et al., 2005). The name metabolomics was coined in the 1990s (Oliver et al., 1998). The foundations of metabolomics lie in the description of biological pathways and current metabolomic databases, such as KEGG, are frequently based on well-characterized biochemical pathways. Metabolomics might be considered to be the key to integrated systems biology because it is frequently a direct gauge of a desired phenotype (Fiehn, 2002), measuring quantitative and qualitative traits such as starches in cereal grains or oils in oilseeds. Moreover, metabolomics can be correlated with genetics through genomes, transcriptomes and proteomes and therefore bypass the more traditional quantitative trait loci (QTL) approach applied to molecular plant breeding. Major recommended references for this section include Fiehn (2002), Sumner et al. (2003), Weckwerth (2003), Bourgault et al. (2005), Breitling et al. (2006), Schauer and Fernie (2006) and Krapp et al. (2007).

Targeted metabolomics involves examination of the effects of a genetic alteration or
change in environmental conditions on particular metabolites (Verdonk et al., 2003). Sample preparation is focused on isolating and concentrating the compound of interest to minimize detection interference from other components in the original extract. Metabolite profiling refers to a qualitative and quantitative evaluation of metabolite collections, for example those found in a particular pathway, tissue or cellular compartment (Burns et al., 2003). Finally, metabolic fingerprinting focuses on collecting and analysing data from crude extracts to classify whole samples rather than separating individual metabolites (Johnson et al., 2003; Weckwerth, 2003).

In stark contrast to transcriptomics and proteomics, metabolomics is mainly species-independent, which means that it can be applied to widely diverse species with relatively little time required for re-optimizing protocols for a new species. Metabolite profiling can monitor variation in the accumulation of metabolites in plant cells in culture which are ectopically expressing transcription factors, as a hypothesis-generating tool to establish the possible pathways regulated by particular regulatory proteins. The first step consists of generating a transgenic cell line expressing the regulator from a constitutive or inducible promoter. The second step is to subject extracts from transformed and control cells to various metabolic profiling approaches to determine the qualitative and quantitative differences in metabolite accumulation. A more practical approach than monitoring and purifying individual metabolites is to profile hundreds or thousands of small molecules biochemically and to screen for changes in the relative levels of these compounds. By comparing two conditions, a profile of the differences can be obtained that is then used as a blueprint to identify the individual compounds affected (Dias et al., 2003). The immense chemical diversity of small biomolecules makes comprehensive metabolome screens difficult. The lack of unifying principles such as genetic codes that would assist molecule identification, comparison and causal connection is another important challenge (Breitling et al., 2006).

The global study of the structure and dynamics of metabolic networks has been hindered by a lack of techniques that identify metabolites and their biochemical relationships in complex mixtures. Recent advances in ultra-high mass accuracy MS provide two advantages that can enable ab initio determination of metabolic networks: (i) the ability to identify molecular formulae based on exact masses; and (ii) the inference of biosynthetic relationships between masses directly from the mass spectrum. Mass spectrometers with the necessary performance parameters (mass accuracy around 1 ppm and resolution above 100,000 m/Δm) are now within the reach of many researchers and will change the way we think about metabolomics (Breitling et al., 2006). The recent application of Fourier transform ion cyclotron resonance MS (FTICR-MS) to metabolomic analysis suggests a way to tackle the problem. A lower-cost alternative to high-field FTICR-MS, the Orbitrap mass analyser, promises accelerated activity in this area. These two analysers are able to achieve high resolution and mass accuracy in the 1-ppm range for biomolecular samples. In both instruments, the ionized metabolite mixture is trapped in an orbital trajectory. The frequency of the ions' orbit depends on their mass-to-charge ratio and can be measured precisely, which is the basis of the exceptional accuracy. In FTICR-MS, trapping is achieved in a strong magnetic field which exerts a force on the charged particles that is perpendicular to their direction of motion and thus confines them to a circular path. The Orbitrap traps ions without a magnetic field, in a radial electric field between a central and an outer cylindrical electrode. Theoretically, the resolving power of the FTICR-MS and Orbitrap is sufficiently high to resolve even the most complex metabolite mixtures using direct infusion.

Gas chromatography (GC)-MS or LC-MS is the tool of choice for generating high-throughput data for identification and quantification of small-molecular-weight metabolites (Weckwerth, 2003). Capillary electrophoresis (CE) is an alternative method which separates particular types of
compound more efficiently and can be cou- centration in a single NMR experiment with
pled with MS or other types of detectors. excellent reproducibility.
NMR, infrared (IR), ultraviolet (UV) and HPLC and GC are the most widely used
fluorescence spectroscopy can be used as analytical techniques for the separation of
alternative means of detection, often in par- small metabolites. GC is used to separate
allel with MS (Weckwerth, 2003). TOF MS compounds on the basis of their relative
technology has also been used in metabo- vapour pressure and affinities for the sta-
lite analysis and provides a means of high tionary phase in the chromatographic col-
sample-throughput. In the end, a combina- umn. It offers very high chromatographic
tion of methods enables analysis of a broad resolution but requires chemical derivatiza-
range of metabolites. tion for many biomolecules: only volatile
NMR is a spectroscopic technique that chemicals can be analysed without deriva-
exploits the magnetic properties of the tization. Some large and polar metabolites
atomic nucleus (Macomber, 1998). In NMR, cannot be analysed by GC. GC tends to give
the sample is immersed in a strong external much greater chromatographic resolution
magnetic field and transitions between the than HPLC but has the disadvantages of
nuclear magnetic energy levels are induced being limited to compounds that are vola-
by a suitably oriented radiofrequency field. tile and heat stable. A big advantage of GC
In theory, any molecule containing one is that it can be easily combined with MS,
atom with a non-zero nuclear spin (I) is which greatly increase its utility for multi-
potentially visible by NMR. Considering component profiling because of its inherent
the isotopes with a non-zero nuclear spin high specificity, high sensitivity and positive
such as 1H, 13C, 14N, 15N and 31P, all biologi- peak confirmation (Dias et al., 2003). HPLC
cal molecules have at least one NMR signal. is a form of column chromatography used
There is wide variation in the sensitiv- frequently in biochemistry and analytical
ity of the experiment for different nuclei, chemistry. It is used to separate components
hence 1H NMR remains the best choice for in a mixture by using a variety of chemical
metabolite profiling by NMR mainly due to interactions between the substance being
its natural abundance (99.8%) and sensitiv- analysed (analyte) and the chromatography
ity (Moing et al., 2007). The NMR spectrum column. Compared to GC, HPLC has lower
generally consists of a series of discrete chromatographic resolution but it does have
lines (resonances) which are character- the advantage that a much wider range of
ized not only by the familiar spectroscopic analytes can potentially be measured.
quantities of frequency (chemical shift), The generation of reproducible and
intensity and line shape, but also by relaxa- meaningful metabolomic data requires great
tion times. Although less sensitive than GC care in the acquisition, storage, extraction
or LC-MS, proton NMR spectroscopy is a and preparation of samples (Fiehn, 2002).
powerful complementary technique for the The true metabolic state of samples must be
identification and quantitative analysis of maintained and additional metabolic activ-
plant metabolites either in vivo or in tissue ity or chemical modification after collec-
extracts (Krishnan et al., 2005). Typically, tion must be prevented. Depending on the
2040 metabolites have been identified in type of sample and the analysis performed,
metabolite profiling of plant extracts and this can be achieved in various ways. The
the number of metabolites quantified can be most common strategies are freezing in liq-
increased with higher field strength (increas- uid nitrogen, freeze-drying, and heat dena-
ing spectral resolution) and by using micro- turation to halt enzymatic activity (Fiehn,
probes for small quantity samples together 2002). Metabolomic experiments are typi-
with cryogenic probe heads (increasing cally conducted by comparing experimen-
sensitivity). One of the main advantages of tal plants possessing an expected metabolic
1H-NMR is that structural and quantitative modification (i.e. because of the introduc-
information can be obtained on numerous tion of a transgene or exposure to a particu-
chemical species with a wide range of con- lar treatment) to control plants. Statistically
significant changes in metabolite levels attributable to perturbations affecting the experimental plants are identified. Natural variability in metabolite levels occurs as part of normal homeostasis in plants; thus a high number of replicates is typically necessary to establish a statistically significant difference between experimental and control plants, especially if the differences between metabolite levels are subtle (Johnson et al., 2003). In order to validate metabolomic studies and to facilitate data exchange, the Metabolomics Standards Initiative (MSI) has released documents describing minimum parameters for reporting metabolomic experiments. The reporting parameters encompassed by MSI include the biological study design, sample preparation, data acquisition, data processing, data analysis and interpretation relative to the biological hypotheses being evaluated. Fiehn et al. (2008) exemplified how such metadata could be reported by using a small case study: the metabolite profiling by GC-TOF mass spectrometry of Arabidopsis thaliana leaves from a knock-out allele of the gene At1g08510 in the Wassilewskija ecotype.

The large data sets and multitude of metabolites require computer-based applications to analyse complex metabolomic experiments. Ideally, such systems compile and compare data from a variety of separation and detection systems (Sumner et al., 2003). Ultimately, gene functions can be predicted or global metabolic profiles associated with particular biological responses can be defined. Multivariate data analysis techniques that reduce the complexity of data sets and enable simplified visualization of metabolomic results are currently available. These include principal component analysis (PCA), hierarchical clustering analysis (HCA), K-means clustering and self-organizing maps (Sumner et al., 2003).

Considering the natural variability in transcript, protein and metabolite levels in plants of the same genotype, correlations within complex fluctuating biochemical networks can be revealed using PCA and HCA (Weckwerth, 2003; Weckwerth et al., 2004). Metabolic networks were integrated with gene expression and protein levels using a novel extraction method whereby RNA, proteins and metabolites were all extracted from a single sample (Weckwerth et al., 2004).

Parallel to the development of the technologies of metabolite profiling, there has been a bewildering proliferation in the nomenclature associated with this field. At the root of the problem is that some groups have chosen to use the term 'metabolomics' while others have opted for 'metabonomics'. Metabolomics will be used in this book as it is derived from metabolic profiling or fingerprinting and should be a parallel terminology to transcriptomics and proteomics (Trethewey, 2005). The Human Metabolome Project led by Dr David Wishart of the University of Alberta, Canada, completed the first draft of the human metabolome consisting of 2500 metabolites, 1200 drugs and 3500 food components (Wishart et al., 2007).

Schauer and Fernie (2006) assessed the contribution of metabolite profiling to several fields of plant metabolomics. As a fast-growing technology, metabolite profiling is useful for phenotyping and diagnostic analyses of plants. It is also rapidly becoming a key tool in functional annotation of genes and in the comprehensive understanding of the cellular response to biological conditions such as various stresses of biotic and abiotic origin. Metabolomics approaches have recently been used to assess the natural variation in metabolite content between individual plants, an approach with great potential for the improvement of the compositional quality of crops.

3.4 Phenomics

Phenomics is a field of study concerned with the characterization of phenotypes, which are characteristics of organisms that arise via the interaction of the genome with the environment. Genomics has spawned a plethora of related 'omics' terms that frequently relate to established fields of research. Of these terms, phenomics, the high-throughput analysis of phenotypes, has the greatest application in plant breeding.
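High-throughput phenotype data, like the metabolite profiles discussed in the preceding section, form a samples-by-variables matrix, so the multivariate tools named there (PCA, HCA) apply to both. The following is a minimal sketch of the PCA step only, under stated assumptions: the profile matrix is made up for illustration, the function names are hypothetical, and the first principal component is found by simple power iteration rather than by any particular published pipeline. Real analyses would normally also scale the variables and use an established numerical library.

```python
import math

# Made-up profile matrix (an assumption for illustration only):
# rows are plants, columns are three measured metabolites or traits.
# The first three rows stand for control plants, the last three for
# experimental plants in which the second variable is elevated.
PROFILES = [
    [10.0, 5.0, 1.00],
    [11.0, 5.2, 0.90],
    [9.0, 4.8, 1.10],
    [10.2, 9.0, 1.00],
    [9.8, 9.5, 1.05],
    [10.1, 8.7, 0.95],
]


def centre(rows):
    """Subtract each column mean so every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centred):
    """Sample covariance matrix of mean-centred data."""
    n = len(centred)
    cols = list(zip(*centred))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def first_component(cov, iterations=200):
    """Leading eigenvector of cov by power iteration (unit length)."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pc1_scores(rows):
    """Project each mean-centred sample on to the first principal component."""
    centred = centre(rows)
    axis = first_component(covariance(centred))
    return [sum(x * a for x, a in zip(row, axis)) for row in centred]


if __name__ == "__main__":
    # 'C' marks control rows and 'E' experimental rows of PROFILES.
    for label, score in zip("CCCEEE", pc1_scores(PROFILES)):
        print(label, round(score, 2))
```

In this toy data set the two groups fall on opposite sides of zero along the first component, which is the kind of simplified visualization that PCA is used for; with real data the separation is rarely so clean, which is why adequate replication matters.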
3.4.1 Importance of phenotypes in genomics

For all sequenced organisms from the most thoroughly studied and simple bacterial cells to humans, only about two-thirds of all genes have an assigned biochemical function and only a fraction of those are associated with a phenotype. Even when phenotypes are assigned, they might represent only a partial understanding of the role of the gene. The function of a gene cannot be fully understood until it is possible to predict, describe and explain all the phenotypes that result from the wild-type and mutant forms of that gene (Bochner, 2003).

Phenotypes often cannot be predicted on the basis of the biochemical function of a gene alone because it is not clear how catalytic or regulatory activity will affect the biology of the cell or the whole organism. However, if a gene has a biological function then, for every identified gene it should be possible to define at least one phenotype. A second layer of genomic annotation could then follow in which every gene is described biologically by the phenotypes that it produces. The first step is to construct a so-called phenomic map and in diploid and higher plants this will be complicated by the fact that several genes can affect gene expression and the resulting phenotypes of each other, leading to epistasis, complex traits and multifactorial stress responses (Bochner, 2003).

Advances in genetic and genomic analysis are being hindered by the slow pace at which biological (that is, phenotypic) information is being obtained, which is not keeping pace with genomic information. Bochner (1989) predicted that global phenotypic analysis would soon be needed to complement the massive amounts of genetic data being obtained and Brown and Peters (1996) called attention to the 'phenotype gap' in mouse research. The Nobel laureate Sydney Brenner, in a keynote address (at a joint Cold Spring Harbor Laboratory/Wellcome Trust Genome Informatics Conference held at Hinxton in the UK on 9 September 2002) emphasized that approaches that relied heavily on genome sequences and bioinformatics extrapolation were associated with too much noise and were becoming non-productive. Instead, he called for a renewed focus on cellular studies and the creation of function-based cell maps in a variety of cell types by the year 2020.

However, generating phenotypic maps will not be easy. Scientists generally test and measure phenotypes one at a time, which is too slow. Almost every model system in which the genome has been sequenced has used functional genomics projects to associate the genome with the biology and this typically includes some efforts that involve phenomics. Many large-scale projects are being carried out by generally using and adapting diverse existing phenotypic technologies that range from animal autopsies to mass spectrometer analysis of cellular metabolites. A phenotype microarray technology was devised that had several attributes (Bochner, 2003): (i) it could assay about 2000 distinct culture traits; (ii) it could be used with a wide range of microbial species and cell types; (iii) it would be amenable to high-throughput studies and automation; (iv) it would allow phenotypes to be recorded quantitatively to facilitate comparisons over time; (v) it would give a comprehensive scan of the physiology of the cell; and (vi) by providing global cellular analysis, it would provide a complement to genomic and proteomic studies.

3.4.2 Phenomics in plants

The great plasticity of plant genomes in producing various phenotypes from a small amount of genetic variation has provided both challenges and opportunities for crop improvement. Detailed and systematic analysis of phenotype requires both a data repository and a means of structure interrogation. The field of phenomics developed from the phenotypic characterization of mutant plants, the descriptions of which have been published in volumes that frequently use structured ontological terms. The storage of these data in searchable databases together with the application of
phenomics to high-throughput analysis, plant development and natural variation, creates the final link in the chain from the genetics of crop development to crop production (Edwards and Batley, 2004).

There is an additional need to make phenotypic data from different organisms simultaneously searchable, visible and, most importantly, comparable (Lussier and Li, 2004). As an example of attempts in this field, PHENOMICDB has been created as a multi-species genotype/phenotype database by merging public genotype/phenotype data from a wide range of model organisms and Homo sapiens (Kahraman et al., 2005). To provide systematic descriptions of phenotypic characteristics of gene deletion mutants on a genome-wide scale, a public resource for mining, filtering and visualizing phenotypic data, the PROPHECY database, was established. PROPHECY is designed to allow easy and flexible access to physiologically relevant quantitative data for the growth behaviour of mutant strains in the yeast deletion collection during conditions of environmental challenges.

In plant biology, comparison of data collected by laboratories in which plants are grown under slightly different conditions can be problematic. This is especially true if the data are collected solely with reference to chronological age. Kjemtrup et al. (2003) described the development of a plant phenotyping platform based on a growth stage scale that will aid in the generation of coherent data. While their emphasis is on Arabidopsis, the principles they describe can also be applied to other plant systems. They adapted a modified version of the BBCH scale, which is named after the consortium of agricultural companies that developed it (BASF, Bayer, Ciba-Geigy and Hoechst), for high-throughput phenotyping of Arabidopsis to collect data for both quantitative and qualitative traits spread over the developmental timeline of the plant. In the first phase of the method, data are collected enabling a series of landmark growth stages to be defined. The second phase involves the collection of detailed data for additional traits that are of particular interest at any one of these given stages. The growth stages are described as germination and sprouting, leaf development (main shoot), formation of side shoot to tillering, stem elongation or rosette growth (main shoot, shoot development), development of harvestable vegetative plant parts, inflorescence emergence (main shoot) and ear or panicle emergence, flowering on main shoot, development of fruit, ripening or maturity of fruit and seed, and senescence and the beginning of dormancy.

Mutant analysis provides an alternative and typically more reliable means to assign gene function. However, this phenotype-centric process, classically known as 'forward genetics', typically is not suitable for systematic genome-wide gene analysis, primarily due to the enormous effort required to identify each gene responsible for a particular phenotype. In spite of improvements in the cloning of genes on the basis of phenotype (such as availability of whole-genome sequences, large numbers of mapped polymorphisms and faster and cheaper genotyping technologies), it can often take over a year for a skilled scientist to move from a mutant to the affected gene. As indicated by Alonso and Ecker (2006), the combination of classical forward genetics with recently developed genome-wide, gene-indexed mutant collections is beginning to revolutionize the way in which gene functions are studied in plants. High-throughput screens using these mutant populations should provide a means to analyse plant gene functions, the phenome, on a genomic scale.

3.5 Comparative Genomics

Comparative genomics has been used to address four major research areas (Schranz et al., 2007). First, all comparative analyses are based on phylogenetic hypotheses. In turn, genomics data can be used to construct more robust phylogenies. Secondly, comparative genome sequencing has been crucial in identifying changes in genome structure that are due to rearrangements, segmental duplications and polyploidy.
The alignment of multiple genomes can also be used to reconstruct an ancestral genome. Thirdly, comparative genomics data have been used to annotate homologous genes and subsequently to identify conserved cis-regulatory motifs. Having multiple genomes of varying phylogenetic depths has proven very useful for detecting conserved non-coding sequences. Fourthly, comparative genomics is used to understand the evolution of novel traits.

Comparative genomics provides the potential for trait extrapolation from a species where the genetic control is well understood and for which there are molecular markers to a species about which there is a limited amount of information. For example, rice is regarded as a model for cereal genomics because of its small genome. The similarity of cereal genomes in general means that the genetic and physical maps of rice can be used as reference points for exploration of the much larger and more difficult genomes of the other major and minor cereal crops (Wilson et al., 1999). Conversely, decades of breeding work and molecular analysis of maize, wheat and barley can now find direct application in the improvement of rice. Comparative genomics can also be used to locate desirable alleles in gene pools close to the target crop so that transfer can be achieved by conventional methods (Kresovich et al., 2002).

Across plant species, genome size does not correlate with number of genes or biological complexity. Physical size of genomes across plant species varies greatly, while genetic size of genomes is roughly equivalent. Large genomes usually have large physical:genetic distance ratios. Also, the relationship between genes and the number of gene families is not clear. In this section, comparative maps and collinearity among related species and their implications will be discussed. Key references recommended for an overview of comparative genomics include Shimamoto and Kyozuka (2002), Ware and Stein (2003), Miller et al. (2004), Caicedo and Purugganan (2005), Filipski and Kumar (2005), Koonin (2005), Xu et al. (2005), Schranz et al. (2007) and Tang et al. (2008).

3.5.1 Comparative maps

A comparative map aligns two or more species-specific maps using common sets of markers or sequences. It requires identification of regions of sequence similarity in the genomes of different species or genera (i.e. typically, genes). Sequence similarity can be identified due to common evolutionary origins. Gene repertoire and gene order may be found conserved over larger chromosomal segments between closely related species. The long-term goals of comparative genomics are to establish relationships between map, sequence and functional genomic information across all plant species and to facilitate taxonomic and phylogenetic studies in higher plants.

Importance of comparative maps

The objective of the development of a comparative map is to identify subsets of genes that have remained relatively stable in both sequence and copy number since the radiation of flowering plants from their last common ancestor. Why are comparative maps so important? First, eukaryotic genomes are organized into chromosomes and maps summarize genetic information using chromosomes as the organizational principle. Secondly, conservation of gene identity and gene order along the chromosomes determines potential for sexual reproduction; disruption leads to speciation and major evolutionary change. Thirdly, species maps provide the context for the study of inheritance and chart the history of genetic change. Fourthly, comparative maps are the major tools for ferrying genetic information back and forth across species and genera in a systematic fashion.

Once chromosomal duplications are identified in a genome and the timing of a duplication/polyploidization event has been determined relative to angiosperm divergence nodes, ancestral gene order within the duplicated segments can be inferred. Map comparisons across divergent genera show greater conservation of ancestral gene order and gene repertoire once genome-wide duplication/gene loss within each genome
is accounted for. Map comparisons between closely related species are largely unaffected because most duplications pre-date them. Comparative maps lay the groundwork for asking questions about whether specific linkage blocks or gene arrangements are statistically associated with increased fitness or whether there is a relationship between polyploidy and plant adaptation. For example, comparative linkage mapping and chromosome painting in the close relatives of Arabidopsis have inferred an ancestral karyotype of these species. In addition, comparative mapping to Brassica has identified genomic blocks that have been maintained since the divergence of the Arabidopsis and Brassica lineages (Schranz et al., 2007).

An example: Arabidopsis–tomato comparative map

DEVELOPMENT OF ARABIDOPSIS–TOMATO COMPARATIVE MAP TO DETECT MACROSYNTENY. Fulton et al. (2002) identified over 1000 conserved orthologous sequences (COS) between tomato and Arabidopsis by comparison of Arabidopsis genomic sequence with 130,000 tomato ESTs (representing 27,000 unigenes or approximately 50% of the tomato gene content). For 1025 COS markers developed, 927 were screened against tomato DNA using Southern analysis to classify them as single, low or multiple copy, among which 85% were considered to be single or low copy (> 95% hybridization signal assigned to three or fewer restriction fragments) and 50% matched a gene of unknown function (Gene Ontology classification). A total of 550 COS markers was mapped on to the tomato genome. The size of conserved segments was generally smaller than 10 cM. Results indicated that multiple polyploidization events punctuate the evolution of Arabidopsis and tomato. Distinguishing orthologues from paralogues is difficult due to reciprocal loss of genes and chromosome segments following polyploidization events.

PHYLOGENETIC ANALYSIS OF CHROMOSOMAL DUPLICATION EVENTS TO DETECT MICROSYNTENY. The Arabidopsis genome sequence was used to analyse internal duplication events based on inferred protein matches between 26,028 genes. A total of 34 non-overlapping chromosomal segment pairs were identified consisting of 23,177 (89%) Arabidopsis genes (Bowers et al., 2003b). To relate this alpha duplication to the angiosperm family tree, all duplicated syntenic Arabidopsis gene pairs were compared to individual genes from pine, rice, tomato, Medicago, cotton and Brassica. It was determined whether inferred protein sequences were from duplicated syntenic gene pairs. Arabidopsis genes were more similar to one another than to the heterologous protein in another species.

RELATIVE AGE OF CHROMOSOMAL DUPLICATION EVENTS. It was concluded that the alpha duplication event pre-dated divergence from Brassica about 14.5–20.4 million years ago but post-dated divergence from cotton about 83–86 million years ago.

About 50% (49–64%) of Brassica sequences were more similar to one duplicated Arabidopsis sequence than was the other Arabidopsis sequence to its paralogue. Only 6–19% of cotton, rice, pine, etc. sequences clustered internally to the Arabidopsis syntenic duplicates (Bowers et al., 2003b).

POLYPLOID ANCESTRY OF MOST PLANT SPECIES. As more data accumulate, the history of angiosperms emerges as a history of genome-wide duplication followed by massive gene loss (and return to diploidy). Only 30% of Arabidopsis genes have retained syntenic copies in less than 86 million years since the alpha duplication. In contrast, mammals appear to harbour fewer polyploidization events and less cycling of duplicated genes; 70% of human and mouse proteins show conserved synteny after 100 million years of evolution.

3.5.2 Collinearity

Orthology and paralogy

Figure 3.9 shows the concepts of orthology and paralogy. Orthologues and paralogues are two types of homologous sequence.
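One common way to make the distinction operational is to look at a gene tree and ask what kind of event sits at the most recent common ancestor of two genes: a speciation event makes them orthologues, a duplication event makes them paralogues. The sketch below applies that rule to the globin example of Fig. 3.9. The tree encoding, the gene labels and the function names are assumptions made for illustration, and the common-ancestor rule is a widely used convention rather than something stated in this chapter; note that under it, an alpha gene in one species and a beta gene in another also count as paralogues.

```python
# Hypothetical encoding of the globin gene tree of Fig. 3.9.
# Internal nodes are (event, [children]); leaves are gene labels.
# '_a' and '_b' stand for the alpha-chain and beta-chain genes.
GLOBIN_TREE = ("duplication", [
    ("speciation", ["frog_a", ("speciation", ["chick_a", "mouse_a"])]),
    ("speciation", ["frog_b", ("speciation", ["chick_b", "mouse_b"])]),
])


def leaves(node):
    """Return the set of gene labels found below a node."""
    if isinstance(node, str):
        return {node}
    _, children = node
    found = set()
    for child in children:
        found |= leaves(child)
    return found


def relationship(tree, gene1, gene2):
    """Classify two genes by the event at their most recent common ancestor."""
    if gene1 == gene2 or not {gene1, gene2} <= leaves(tree):
        raise ValueError("need two different genes that are both in the tree")
    event, children = tree
    for child in children:
        # If one subtree holds both genes, their common ancestor lies deeper.
        if {gene1, gene2} <= leaves(child):
            return relationship(child, gene1, gene2)
    # Otherwise this node is the most recent common ancestor.
    return "orthologues" if event == "speciation" else "paralogues"


if __name__ == "__main__":
    for pair in [("frog_a", "mouse_a"), ("mouse_a", "mouse_b"),
                 ("frog_a", "chick_b")]:
        print(pair, relationship(GLOBIN_TREE, *pair))
```

In practice the tree itself has to be inferred, and reciprocal gene loss after polyploidization can leave a paralogue looking like the only available match, which is one reason the assignment is difficult in plants.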
[Figure 3.9: gene tree in which an early globin gene undergoes gene duplication to give the α-chain gene and the β-chain gene. Frog α, chick α and mouse α are orthologues, as are mouse β, chick β and frog β; the α and β genes are paralogues; all are homologues.]

Fig. 3.9. The concepts of orthology and paralogy (from http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html).

Orthology describes genes in different species that derive from a common ancestor. Orthologous genes may or may not have the same function. Paralogy describes genes that have duplicated (tandemly or moved to a new location) within a genome since they descended from a common ancestral gene. The word synteny (from the Greek syn, together, and taenie, ribbon) refers to linkage of genes along a chromosome; it is currently used to indicate conservation of gene order across species. From this definition, macrosynteny means conservation of gene order across species detected at low resolution (i.e. genetic maps) while microsynteny means conservation of gene order across species analysed by high resolution (i.e. physical or sequence-based maps).

Macrocollinearity

Significant genomic collinearity in plants has been shown by comparative genetic mapping and genome sequencing, although plant genomes vary greatly in genome size and chromosome number and morphology. Comparative mapping of cereal genomes using low copy number, cross-hybridizing genetic markers has provided compelling evidence for a high level of conservation of gene order across regions spanning many megabases (i.e. macrocollinearity). Initial studies of the organization of grass genomes indicated that individual rice chromosomes were highly collinear with those of several other grass species and extensive work has shown a remarkable conservation of large segments of linkage groups within rice, maize, sorghum, barley, wheat, rye, sugarcane and other agriculturally important grasses (e.g. Ahn and Tanksley, 1993; Kurata et al., 1994; van Deynze et al., 1995a; Wilson et al., 1999). These studies led to the prediction that grasses could be studied as a single syntenic genome. The macrocollinearity was summarized by Gale and Devos (1998) for rice and seven other cereals using what is now known as the 'circle diagram' (Plate 1). Further studies identified QTL controlling important agronomic traits which showed similarities in locations for the same or similar traits (as reviewed by Xu, 1997). Shattering and plant height are examples that were also mapped to collinear regions among grass genomes (Paterson et al., 1995; Peng et al., 1999). More recently, Chen et al. (2003) identified four QTL for quantitative resistance to rice blast that showed corresponding map positions between rice and barley, two of which had completely conserved isolate specificity and the other two had partially conserved isolate specificity. Such corresponding locations and conserved specificity suggested a common origin and conserved functionality of the genes underlying the QTL for quantitative resistance, which may be used to discover genes, understand the function of the genomes and identify the evolutionary forces that structured the organization of the grass genomes. Such findings reinforce the notion of collinearity among the cereal genomes.

This unified grass genome model has had a substantial impact upon plant biology but has not yet lived up to its potential. There are some difficulties in evaluating synteny between genomes at the macro-level (Xu et al., 2005). First, the genomic marker data are very incomplete and genomic sequence data are largely lacking for many grass species. Secondly, the data are sometimes biased because the homologous DNA probes used in comparative mapping are selected for simple cross-hybridization patterns. Thirdly, many genes are members of
gene families and, accordingly, it is often difficult to determine if a gene mapped in the second species is orthologous or paralogous to that in the first species. Fourthly, the collinearity of gene order and content observed at the recombinational map level is often not observed at the level of local genome structure (Bennetzen and Ramakrishna, 2002). Finally, in most early studies, no statistical analysis was used to evaluate whether the presence of a few markers in the same order on two chromosomal segments in two species occurs by chance or is truly significant.

The genome collinearity of several Camelineae and Brassicaceae species has been recently compared to that of A. thaliana by comparative genetic linkage mapping and comparative chromosome painting (Schranz et al., 2007). A comprehensive study identified 21 syntenic blocks that are shared by Brassica napus and A. thaliana genomes, corresponding to 90% of the B. napus genome (Parkin et al., 2005).

Microcollinearity

Comparing molecular marker information from other cereals against the rice genome sequence as the reference indicated many more rearrangements than had been expected from Gale and Devos's (1998) concentric circles model. One such comparison involved more than 2600 mapped sequenced markers in maize among which only 656 putative orthologous genes could be identified (Salse et al., 2004). The comparison of the wheat genetic map with the rice sequence also suggests numerous rearrangements between the two genomes with a high frequency of breakdowns in collinearity (Sorrells et al., 2003). Extensive comparisons have also been made between sorghum and rice (Klein et al., 2003; The Rice Chromosome 10 Sequencing Consortium, 2003). To align the sorghum physical map with the rice map, sorghum BAC clones were selected from the minimum tiling path of chromosome 3. Unique partial sequences were obtained from each BAC clone and could be directly compared with the rice sequence. This approach revealed excellent conservation between the overall structure and gene order of sorghum chromosome 3 and rice chromosome 1 but also indicated several rearrangements. Together, these studies indicate a general conservation of large syntenic blocks within cereals but with many more rearrangements and synteny breakdowns than originally anticipated.

This trend is even more obvious when synteny is analysed at the sequence level. Rearrangements may occur that involve regions smaller than a few centimorgans and would be missed by most recombinational mapping studies. Comparative sequence analysis involving large genomic segments can detect these rearrangements. Such analyses reveal the composition, organization and functional components of genomes and provide insight into regional differences in composition between related species. Recently, the sequencing of genomic segments in the cereals has enabled microcollinearity across genes or gene clusters to be investigated. Sequencing of the domestication locus Q in Triticum monococcum revealed excellent collinearity with the bread wheat genetic map (Faris et al., 2003). Following the sequencing of the leaf-rust-resistance locus Rph7 from barley, it was observed that this locus is flanked by two HGA genes. The orthologous locus in rice chromosome 1 consists of five HGA genes. In barley, only four of the five HGA genes are present, one is duplicated as a pseudogene and six additional genes have been inserted in between the HGA genes. These six genes have homologues on eight different rice chromosomes (Brunner et al., 2003). The most striking rearrangement was revealed by the comparison of 100 kb around the Bronze locus of two maize lines. Not only does the retrotransposon distribution differ between the two lines but the genes themselves could also be different (Fu and Dooner, 2002). Comparison of the low molecular weight glutenin locus between T. monococcum and Triticum durum also revealed dramatic rearrangements: more than 90% of the sequence diverged because of retro-element insertions and because different genes are present at this locus (Wicker
et al., 2003). Therefore collinearity can be lost very rapidly within two genomes from the same species.

With the sequencing of long regions, several studies in cereals have demonstrated incomplete microcollinearity at the sequence level. Song et al. (2002) identified orthologous regions from maize, sorghum and two subspecies of rice. It was found that gross macrocollinearity is maintained but microcollinearity is incomplete among these cereals. Deviations from gene collinearity are attributable to micro-rearrangement or small-scale genomic changes such as gene insertions, deletions, duplications or inversions. In the region under study, the orthologous region was found to contain six genes in rice, 15 in sorghum and 13 in maize. In maize and sorghum, gene amplification caused a local expansion of conserved genes but did not disrupt their order or orientation. As indicated by Bennetzen and Ma (2003), numerous local rearrangements differentiate the structures of different cereal genomes. On average, any comparison of a ten-gene segment between rice and a distant grass relative such as barley, maize, sorghum or wheat shows one or two rearrangements that involve genes. A simple extrapolation to the rice genome of about 40,000 genes (Goff et al., 2002), that is, some 4000 ten-gene segments with one or two rearrangements each, suggests that about 6000 genic rearrangements occurred which differentiate rice from any of the other cereals. Most of these rearrangements appear to be tiny and thus would not interfere with the macrocollinearity observed by recombinational mapping. There are exceptions, however, which include chromosomal arm translocations and movements of single genes to different chromosomes (Bennetzen and Ma, 2003).

As expected, there is a high degree of gene conservation between the two shotgun-sequenced subspecies of rice, japonica and indica, which diverged more than 1 million years ago. On careful inspection, however, narrow regions of divergence can be found in these genomes (Song et al., 2002). These regions correspond to areas of increased divergence among rice, sorghum and maize, suggesting that the alignment of the two rice subspecies might be useful for identifying regions of cereal genomes that are prone to rapid evolution. Similar comparative analyses of Arabidopsis accessions have shown that both the relocation of genes and the sequence polymorphisms between accessions (in both coding and non-coding regions) are common in the Arabidopsis genome (The Arabidopsis Genome Initiative, 2000). Intraspecific violation of collinearity has also been identified in maize (Fu and Dooner, 2002). Han and Xue (2003) also discovered significant numbers of rearrangements and polymorphisms when comparing indica and japonica genomes in rice. The deviations from collinearity are frequently due to insertions or deletions. Intraspecific sequence polymorphisms commonly occur in both coding and non-coding regions. These variations often affect gene structures and may contribute to intraspecific phenotypic adaptations.

Implications of genome collinearity

Genomics would be much simpler if the order of genes were common (syntenic) across the major groups of plants. The usefulness of the collinearity between the genomes of model plants and important crops can be assessed by the number of failures or successes in its exploitation. For example, the analysis of the Arabidopsis sequence provides information that will facilitate the annotation of the rice sequence and likewise sequencing Medicago provides a resource for research on important crop legumes. Furthermore, the effort put into sequencing and annotating the rice genome has also been rewarded, as this annotation will be transferred to related sequences and used repeatedly in the future. The synteny between the monocots will help decipher the structure and function of the more complex genomes. A fully assembled rice sequence allows more accurate assessment of the macro- and microsynteny of rice with other cereals (Xu et al., 2005).

The advent of technologies for mapping genomes directly at the DNA level has made comparative genetic mapping among sexually incompatible species possible. Extensive comparative maps for marker
Omics and Arrays 99

genes have been constructed for a number of plant taxa, including species in the Poaceae (rice, maize, sorghum, barley and wheat), Solanaceae (tomato, potato and pepper) and Brassicaceae (Arabidopsis, cabbages, mustard, turnip and rape). As a result, the concept of a single genetic or ancestral map for all grasses, with species-specific modifications, is emerging (Moore et al., 1995). The extensive collinearity of wheat, rye, barley, rice and maize suggests that it may be possible to reconstruct a map of the ancestral cereal genome. These conserved gene orders and the possibility of sharing DNA probes and PCR primers across species will greatly extend the power of mapping analysis by facilitating the molecular analysis of the corresponding chromosomal regions in different species and allowing information, and perhaps DNA sequences and genes, to be transferred quickly and efficiently between different species.

The challenge of finding ways in which map, sequence and eventually functional genomic information from one species can be accessed, compared and exploited across all plant species will require the identification of a subset of plant genes that have remained relatively stable in both sequence and copy number since the radiation of flowering plants from their last common ancestor. Identification of such a set of genes would also facilitate taxonomic and phylogenetic studies in higher plants that are presently based on a very small set of highly conserved sequences, such as those of chloroplast and mitochondrial genes. The conserved orthologue set of markers, identified computationally and experimentally, may further studies on comparative genomes and phylogenetics and elucidate the nature of genes conserved throughout plant evolution.

Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. For example, Feltus et al. (2006) designed 384 PCR primers to conserved exonic regions flanking introns using sorghum and millet EST alignments to the rice genome. These conserved-intron scanning primers (CISP) amplified single-copy loci with 37–80% success rates; i.e. sampling most of the approximately 50 million years of divergence among grass species. When evaluating 124 CISPs across rice, sorghum, millet, Bermuda grass, teff, maize, wheat and barley, about 18.5% of them seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Likewise, about 487 conserved non-coding sequence motifs were identified in 129 CISP loci. As pointed out by Feltus et al. (2006), CISP provides the means to effectively explore poorly characterized genomes for both polymorphism and non-coding sequence conservation on a genome-wide or candidate gene basis and also provides anchor points for comparative genomics across a diverse range of species. After the whole genomes of the major food crops have been sequenced, plant breeders will be able to access new gene tools that will facilitate the selection of outstanding individuals characterized by resistance to biotic and abiotic stresses and good seed quality, thus enabling breeders to produce new cultivars in addition to those currently available.

As a fundamental tool in biology, comparative analysis has been extended from being focused on a specific field to biology as a whole. With the growing availability of phenotypic and functional genomic data, comparative paradigms are now also being extended to the study of other functional attributes, most notably gene expression. Microarray techniques present an alternative method of studying differences between closely related genomes. Advances in microarray-based approaches (see Section 3.6) have enabled the main forms of genomic variation (amplifications, deletions, insertions, rearrangements and base-pair changes) to be detected using techniques that can easily be undertaken in individual laboratories using simple experimental approaches (Gresham et al., 2008).

Tirosh et al. (2007) reviewed recent studies in which comparative analysis was applied to large-scale gene expression databases and discussed the central principles and challenges of such approaches. As different functional properties often co-evolve and complement one another, their combined analysis reveals additional insights. Unlike sequence-based genetic map information
100 Chapter 3

however, most functional properties are ogy. Depending on the type of molecules that
condition-dependent, a property that needs are arrayed, microarrays can also be based on
to be accounted for during interspecies com- proteins, tissues or carbohydrates.
parisons. Furthermore, functional proper- An array is an orderly arrangement of
ties often reflect the integrated function of samples. It provides a medium for match-
multiple genes, calling for novel methods ing known and unknown molecular samples
that allow network-centred rather than gene- based on base-pairing (i.e. A-T and G-C for
centred comparisons. Finally, one of the DNA; A-U and G-C for RNA) or hybridiza-
main challenges in comparative analysis is tion and automating the process of identify-
the integration of different data types which ing the unknowns. From its origin as a new
is becoming particularly important as addi- technique for large-scale DNA mapping and
tional data types are being accumulated. The sequencing and initial success as a tool for
lack of appropriate descriptors and metrics transcript-level analyses, microarray technol-
that succinctly represent the new informa- ogy has spread into many areas by adapting
tion originating from genomic data is one of the basic concept and combining it with other
the roadblocks on this path. Galperin and techniques. Microarray-based processes,
Koller (2006) outlined recent trends in com- either mature or under development, include
parative genomic analysis and discussed transcriptional profiling, genotyping, splice-
some new metrics that have been used. This variant analysis, identification of unknown
issue is related to the ontology concept and is exons, DNA structure analysis, chromatin
discussed in detail in Chapter 14. immunoprecipitation (ChIP)-on-chip, protein
 binding, protein–RNA interaction, chip-based
 comparative genomic hybridization, epige-
3.6 Array Technologies in Omics netic studies, DNA mapping, re-sequencing,
 large-scale sequencing, gene/genome syn-
It is widely believed that thousands of genes thesis, RNA/RNAi synthesis, protein–DNA
and their products (i.e. RNA and proteins) in interaction, on-chip translation and universal
any given living organism function in a com- microarrays (Hoheisel, 2006).
plicated and orchestrated manner. However, In this section, the basic procedures
traditional methods in molecular biology of arraying will be introduced and several
generally work on a one gene in one experi- major microarray technologies and plat-
ment basis which means that the through- forms will be briefly described. The two
put is very limited and the whole picture volumes of DNA Microarrays (Kimmel and
of gene function is difficult to obtain. In the Oliver, 2006a, b) provide a comprehensive
late 1990s, a new technology known as a coverage of all the related fields from tech-
biochip or DNA microarray, attracted great nologies and platforms to data analysis.
interest among biologists. This technology The reader is also referred to Zhao and Bruce
promised to monitor the whole genome on a (2003), Amaratunga and Cabrera (2004),
single array so that researchers would have Mockler and Ecker (2004), Subramanian
a better picture of the interactions among et al. (2005), Allison et al. (2006), Hoheisel
thousands of genes at the same time. (2006) and Doumas et al. (2007).
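As a rough illustration of the base-pairing principle on which arrays rely (A-T and G-C for DNA), the following Python sketch identifies which arrayed probes could pair with a sample sequence. The probe names and sequences are invented for illustration, and requiring perfect complementarity is a simplifying assumption; real hybridization tolerates some mismatches depending on stringency.

```python
# Watson-Crick pairing for DNA as stated in the text: A-T and G-C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that would pair with `seq` in an antiparallel duplex."""
    return "".join(DNA_PAIRS[base] for base in reversed(seq.upper()))


def identify_targets(probes, sample):
    """Return the names of probes whose complementary strand occurs in `sample`.

    `probes` maps a (hypothetical) probe name to its arrayed sequence.
    Only perfect matches are reported, which is an assumption of this sketch.
    """
    sample = sample.upper()
    return [name for name, seq in probes.items()
            if reverse_complement(seq) in sample]


# Hypothetical probes and sample, for illustration only.
probes = {"geneA": "ATGGCCT", "geneB": "TTTACGG"}
sample = "CCAGGCCATGG"
print(identify_targets(probes, sample))
```

Each probe acts as an independent test for one known sequence, which is why the known molecules are fixed on the array and the unknowns are supplied in the sample.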
Various terminologies have been used in
the literature to describe this technology; for
DNA microarrays these include, but are not 3.6.1 Production of arrays
limited to, biochip, DNA chip, DNA micro-
array and gene array. Affymetrix, Inc. owns Complementary strands of DNA and nucleic
a registered trademark, GeneChip, which acids in general can pair in a duplex via non-
refers to its high density, oligonucleotide- covalent binding. This fundamental charac-
based DNA arrays. However, in some articles teristic is used in all DNA array techniques.
appearing in professional journals, popular Amaratunga and Cabrera (2004), Arcellana-
magazines and on the Internet, the term gene Panilio (2005) and Doumas et al. (2007)
chip(s) has been used as a general terminol- describe the principles of DNA microarray
ogy that refers to DNA microarray technol- technology and how they are prepared and

used. First, two terms related to microarrays, probe and target, should be introduced. The gene-specific DNA spotted on to the array is referred to as the probe and the sample to be tested that will hybridize with the probe is referred to as the target. The same probe spotted on to the array can be repeatedly hybridized with many different targets (samples). An experiment using a single DNA chip can provide researchers with information on thousands of genes simultaneously, resulting in a dramatic increase in throughput. In GeneChips (http://www.affymetrix.com/) the probe array was designed using an optimal set of oligonucleotides selected using computer algorithms and manufactured using Affymetrix light-directed chemical synthesis. Fluorescent labels were used for hybridization and detection and the Affymetrix software suite was used for data analysis and database management. Figure 3.10 illustrates a flowchart showing

[Figure 3.10 (flowchart). Probe side: EST database or cDNA library → PCR inserts from EST clones → multi-well plates → spotting. Target side: Treatment 1 and Treatment 2 → RNA 1 and RNA 2 → Cy-labelled cDNA 1 and cDNA 2 → hybridization → wash, dry → laser scanning. Output panels: log-scale scatter plots of Treatment 1 against Treatment 2 signal (1 to 10,000) and time-course plots (0, 0.5, 1, 4, 8, 24 and 168 h) of mean fold-increase and mean fold-decrease compared to 0 h.]

Fig. 3.10. A flowchart for a general microarray process.



a general microarray process. As DNA microarrays for whole genome expression profiling are the most mature and widely used technology, they will be used in this chapter as an example to describe the basic procedures of microarrays.

Types of arrays

An array experiment can be carried out using common assay systems such as microplates or standard blotting membranes; the arrays can be created by hand or by using robotics to deposit the sample. In general, arrays are described as macroarrays or microarrays, the difference being the size of the sample spots. Macroarrays contain sample spot sizes of about 300 µm or larger and can be easily imaged with existing gel and blot scanners. The sample spot sizes in microarrays are typically less than 200 µm in diameter and these arrays usually contain thousands of spots. Microarrays also require specialized robotics and imaging equipment.

There are two main types of arrays, nylon and glass. Nylon arrays can contain up to about 1000 probes per filter. The target can be labelled using radioactive chemicals and detection of hybridization can be achieved using a phosphorimager or X-ray film. Glass arrays can hold up to about 40,000 spots per slide or 10,000 per 2 cm² area (limited by the capabilities of the arrayer). The target sample is labelled with fluorescent dyes and detection of hybridization requires specialized scanners.

There are two variants of the DNA microarray technology in terms of the properties of the arrayed DNA sequence of known identity.

Format I: the probe cDNA (500–5000 bases long) is immobilized on a solid surface such as glass using robot spotting and exposed to a set of targets either separately or in a mixture. The development of this method, known traditionally as a DNA microarray, is widely attributed to Stanford University.

Format II: an array of oligonucleotide (20–80-mer oligos) or peptide nucleic acid (PNA) probes is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array is exposed to the labelled DNA sample and hybridized, and the identity/abundance of the complementary sequences is then determined. This technology, historically known as DNA chips, was developed at Affymetrix, Inc. which sells its photolithographically fabricated products under the GeneChip trademark. Many companies are now manufacturing oligonucleotide-based microarrays using alternative in-situ synthesis or depositioning technologies.

Source of arrays

A collection of purified single-stranded DNA is the initial requirement. A drop of each type of DNA in solution is placed on to a specially prepared glass microscope slide by a robotic machine known as an arrayer. This process is called arraying or spotting and consists of binding a library of synthetic DNA on to a minimum surface area in a dense and homogeneous fashion. The major difference between various types of DNA arrays lies in the density of the bound probes and the manner in which these probes have been synthesized. The arrayer can quickly produce a regular grid of thousands of spots in an area the size of a dime (about 1 cm²), small enough to fit under the coverslip of a standard slide. The DNA in the spot is bound to the glass to prevent it from being dislodged during the hybridization reaction and subsequent wash.

The DNA spotted on to the microarray may be either cDNA (in which case the microarray is called a cDNA microarray), oligonucleotides (in which case it is called an oligonucleotide array), subgenomic regions of specific chromosomes or even the entire set of genes. The DNA spotted on to cDNA microarrays are cloned copies of cDNA that have been amplified by PCR and which correspond to the whole or part of a fully sequenced gene or putative ORF; ESTs are commonly arrayed. The selection of DNA probes to be spotted on to the microarray depends on which genes are to be studied. For plants whose genomes have been completely sequenced, it is possible to array genomic DNA from every known gene or putative ORF. To obtain sufficient DNA for arraying, each gene or putative ORF from the total genomic DNA can be amplified by PCR or each cDNA can be cloned and large

numbers of identical DNA copies can be ing oligos close to the 3' end might also boost
generated by growing them in bacteria. signal intensity.
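The probe-design considerations discussed under "Array content" (short oligonucleotides of 24–30 nt, long oligonucleotides of 50–70 nt, and a preference for positions near the 3' end of the transcript) can be sketched in Python as follows. This is a minimal illustration only: the function names, the window size and the example transcript are assumptions, and real probe design also screens candidates for specificity against the rest of the genome.

```python
# Length classes taken from the text; everything else here is illustrative.
SHORT_NT = range(24, 31)   # 24-30 nt: greater discrimination
LONG_NT = range(50, 71)    # 50-70 nt: compromise between signal and specificity


def probe_class(oligo):
    """Classify an oligonucleotide by length as 'short', 'long' or 'other'."""
    n = len(oligo)
    if n in SHORT_NT:
        return "short"
    if n in LONG_NT:
        return "long"
    return "other"


def three_prime_candidates(transcript, length, window):
    """List (start, oligo) pairs of the given length taken from the last
    `window` bases of the transcript, i.e. close to its 3' end.

    `start` is the 0-based position of the oligo within the full transcript.
    """
    region = transcript[-window:]
    offset = len(transcript) - len(region)
    return [(offset + i, region[i:i + length])
            for i in range(len(region) - length + 1)]


# Hypothetical 200-nt transcript; 60-mers from its last 100 bases.
transcript = "ACGT" * 50
candidates = three_prime_candidates(transcript, length=60, window=100)
print(len(candidates), probe_class(candidates[0][1]))
```

Restricting candidates to a 3' window is only one way to encode the preference described in the text; the resulting candidates would still need to be ranked for specificity before any were chosen for arraying.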
The DNA spots on a microarray are
produced either by synthesis in situ or by Slide substrates
deposition of the pre-synthesized product.
DNA synthesis in situ methods have largely Glass microscope slides are the solid sup-
been within the purview of commercial port of choice and they should be coated
companies. In this method, 20–25-bp long with a substrate that favours binding of the
gene-specific oligonucleotides are gener- DNA. Development of substrates on atomi-
ated in situ on a silicon surface by combin- cally flat slide surfaces and minimum back-
ing a standard DNA synthesis protocol with ground for higher signal-to-noise ratios has
phosphoramidite reagents modified with contributed to the improvement of data
photolabile 5'-protecting groups (Doumas quality (Arcellana-Panilio, 2005). Different
et al., 2007). The activation for oligonucle- versions of silane, amine, epoxy and alde-
otide elongation is achieved using a mask hyde substrates which attach DNA by either
(Affymetrix; http://www.affymetrix.com) ionic interaction or covalent bond forma-
or maskless (NimbleGen; www.nimblegen. tion are commercially available.
com) method. Alternatively, the reagents
can be delivered to each spot using ink-jet Arrays and spotting pins
technology (Agilent; http://www.agilent.
com). Ongoing research and development The physical process of delivering the DNA
efforts ensure the optimum design of the to pre-determined coordinates on the array,
DNA content and continued technologi- involves spotting pens or pins carried on a
cal advancements enable the production of print head that is controlled in three dimen-
increasingly higher-density arrays. sions by gantry robots with sub-micron pre-
 cision. A total of 30,000 features of 90-µm
 diameter can easily be spotted on to a 25 × 75-
Array content mm slide with a maximum spotting density
of over 100,000 features per slide. There are
The choice of DNA type to print is funda- several DNA arraying technologies, includ-
mental. The sequence of the cDNA could be ing high speed robotic printing of DNA
several hundred to a few thousand base pairs fragments on glass (usually PCR amplified
long. The DNA spotted on oligonucleotide cDNAs), high speed robotic printing of long
arrays consist of synthesized chains of oligo- oligonucleotides (70-mers; Agilent technol-
nucleotides corresponding to part of a known ogy and many academic facilities), synthesis
gene or putative ORF; each oligonucleotide is of oligonucleotides (25-mers) on micro-chips
usually about 25–70 bp long. In an oligonu- using photolithographic masks (Affymetrix
cleotide array, a gene is generally represented GeneChips) and synthesis of oligonucle-
by several different oligonucleotides and otides (25–70-mers) on microchips using
they are carefully chosen for maximal specif- maskless aluminium mirrors (NimbleGen
icity. Longer stretches of DNA such as those GeneChips). Improvements in arraying sys-
obtained from PCR of cDNA clones produce tems have included shorter printing times
robust hybridization signals but less specifi- and longer periods of walk-away operation.
city. Short oligonucleotides (24–30 nt) have Arrayers are invariably installed within
greater discrimination and are also suitable controlled-humidity cabinets to maintain an
for assessing single-nucleotide changes. Long optimum environment for printing.
oligonucleotides (50–70 nt) afford an excel-
lent compromise between signal strength and
specificity and their use has increased among
academic core facilities (Arcellana-Panilio, 3.6.2 Experimental design
2005). Choosing oligos corresponding to the
3' untranslated regions (3'UTR) increases the Careful experimental design is required
likelihood of their being specific and design- to determine the type of array to run; how

many replicates to use; and which samples to 10–25 µg total RNA for cDNA spots and
will be hybridized to obtain meaningful long oligonucleotide arrays. In some cir-
data amenable to statistical analysis, upon cumstances it becomes necessary to amplify
which sound conclusions can be drawn. the RNA in the sample to obtain adequate
A biological question must first be framed amounts for labelling and hybridization to
and a microarray platform then chosen, fol- an array.
lowed by a decision on biological and tech- To prepare the labelled sample, the
nical replicates and the design of a series of first step is to purify mRNA from total cel-
hybridizations. lular contents. There are several challenges
Microarray experimental design is usu- involved: (i) mRNA accounts for only a
ally governed by the aim of the experiment. small fraction (less than 3% of all RNA in a
An important aspect of experimental design cell) so isolating mRNA in sufficient quan-
is deciding how to minimize variation which tity for an experiment (1–2 µg) can be a chal-
can be thought of as occurring in three lay- lenge. Common mRNA isolation methods
ers: biological variation, technical variation take advantage of the fact that most mRNAs
and measurement error. Replication is the have a poly-adenine, poly(A), tail. These
easy answer to dealing with variation. To poly(A) mRNAs can be purified by captur-
make the best use of available resources, it ing them using complementary oligodeoxy-
is important to know what to replicate and thymidine (oligo(dT) ) molecules bound to
how many replicates to apply. Hybridization a solid support such as a chromatographic
of two samples to the same slide is made column or a collection of magnetic beads.
possible by labelling each sample with (ii) The more heterogeneous the cells, the
chemically distinct fluorescent tags. This more difficult it is to isolate mRNA specific
also provides the opportunity to make direct to the study. (iii) Captured mRNA degrades
comparisons between samples of primary very quickly and the mRNA has to be imme-
interest (Arcellana-Panilio, 2005). Using a diately reverse-transcribed into more stable
common reference becomes more efficient cDNA (for cDNA microarrays). The reverse
when a large number of samples need to transcription reaction usually starts from
be compared. When an experiment is test- the poly(A) tail of the mRNA and moves
ing the effect(s) of multiple factors, a well- toward its head; such a reaction is described
thought-out design is extremely critical so as oligo(dT)-primed.
that resources are not wasted on eventually
useless comparisons.
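The remark in Section 3.6.2 that a common reference becomes more efficient as the number of samples grows can be illustrated with a small count of two-colour hybridizations. The sketch below assumes that the alternative to a common reference is to compare every pair of samples directly on its own slide; the function names are illustrative, and designs such as loops or dye swaps are not considered.

```python
from math import comb


def slides_all_pairs(n_samples, replicates=1):
    """Slides needed if every pair of samples is co-hybridized directly."""
    return comb(n_samples, 2) * replicates


def slides_common_reference(n_samples, replicates=1):
    """Slides needed if each sample is co-hybridized with one shared reference."""
    return n_samples * replicates


for n in (3, 5, 10, 20):
    print(n, slides_all_pairs(n), slides_common_reference(n))
```

Under these assumptions the two designs need the same number of slides for three samples, after which the all-pairs count grows quadratically while the common-reference count grows linearly. Direct comparison remains attractive for the few contrasts of primary interest.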
3.6.4 Labelling

3.6.3 Sample preparation Before hybridization to DNA arrays or chips,


the target (sample) has to be labelled to allow
Preparation of DNA samples for hybridiza- its subsequent detection. There are several
tion can follow general DNA extraction pro- methods that have been developed histori-
tocols. So here we will focus on RNA sample cally to detect or identify hybrid DNA mol-
preparation as described by Arcellana- ecules including the use of hydroxylapatite,
Panilio (2005). The sources of RNA for the radioactive labelling, enzyme-linked detec-
samples that will be hybridized to a micro- tion and fluorescent labelling depending on
array may be obtained from different types the nature of the chip, whether it is glass or
of cells or tissues. Obtaining pure, intact nylon. In order to be able to detect which
RNA, free from DNA or protein contamina- cDNAs are bound to the microarray, the sam-
tion, is important, while the homogeneity of ple is labelled with a reporter molecule that
the RNA source itself as defined by the bio- flags their presence. The reporters currently
logical question being asked must be con- used in microarray experiments are fluores-
sidered. The amount of RNA required for cent dyes known as fluors or fluorophores,
hybridization ranges from as little as 2–5 µg chemicals that fluoresce when exposed to a
total RNA for short oligonucleotide arrays specific wavelength of light. A differently

coloured fluor is used for each sample so that the two samples can be differentiated on the array.

The cDNA or mRNA can be labelled either directly or indirectly. In the direct labelling procedure, fluorescently labelled nucleotide is incorporated into the cDNA products as they are being synthesized. With this method, a difference in the steric hindrance conferred by different label moieties causes some labelled nucleotides to be more efficiently used than others, producing a dye bias in which one sample is labelled at a higher level overall than the other. Cyanine 3 (Cy3) and cyanine 5 (Cy5) are large molecules that reduce reverse transcriptase efficiency of long transcripts and certain sequences. Cy3-nucleotide tends to be incorporated at a higher frequency than Cy5 although this does not necessarily translate into a better labelled target. To prevent the dye bias, the indirect labelling approach was developed where RNA is reverse transcribed in the presence of an amino allyl-modified nucleotide that enables the chemical coupling of fluorescent labels after the cDNA is synthesized. If the coupling reaction goes to completion, the frequency of labelling becomes independent of the fluorophore (Arcellana-Panilio, 2005).

The labelled sample is the target for the experiment. The number of fluor molecules that label each cDNA depends on its length and also possibly its sequence composition. For an RNA sample, either total RNA or mRNA is typically isolated and labelled using a first strand cDNA synthesis step either by direct incorporation of a fluorescent dye or by coupling the dyes to a modified nucleotide. For non-expression-based experiments, DNA rather than RNA can be labelled and hybridized to the array.

3.6.5 Hybridization and post-hybridization washes

The array holds hundreds or thousands of spots, each of which contains a different DNA sequence. If a sample contains a cDNA whose sequence is complementary to the DNA on a given spot, that cDNA will hybridize to the spot where it will be detectable by its fluorescence. In this way, every spot on an array is an independent assay for the presence of a different cDNA. Hybridization is achieved by pouring the labelled sample on to the array and allowing it to diffuse uniformly. It is then sealed in a hybridization chamber and incubated at a specific temperature for a period of time sufficient to allow hybridization reactions to complete. The experimental conditions should ensure that all areas of the array are exposed to a uniform amount of labelled sample throughout the hybridization.

Hybridizations are processed directly on the slides after target synthesis. The hybridization step is literally where everything comes together, i.e. the labelled molecules find their complementary sequences on the array and form double stranded hybrids which are strong enough to withstand stringent washes. As in the hybridization of classical Southern and Northern blots, the objective is to favour the formation of hybrids and the retention of those which are specific. Hybridization conditions depend on the length of probes arrayed on the slide and need to be extensively tested before analysis. As an example, probe melting temperatures range from 42 to 70°C depending on the nature of the buffer: the presence of formamide exerts a positive effect on buffer stringency in Denhardt-type buffers which are used at 42°C, whereas Sarkosyl-based buffers are commonly used around 70°C. Exogenous DNA (e.g. salmon sperm and Cot-1 DNA) reduces background by blocking areas of the slide with a general affinity for nucleic acid or by titrating out labelled sequences that are non-specific. Denhardt's reagent (containing equal parts of Ficoll, polyvinylpyrrolidone and bovine serum albumin) is also used as a blocking agent. Detergents such as SDS reduce surface tension and improve mixing while helping to lower background at the same time. Temperature is an important factor that can be manipulated during the hybridization and post-hybridization washes of microarrays and here much can be learned from

what has already been established for Northern or Southern blots. For microarrays to be useful as a means of quantifying expression the target has to be present in limiting concentrations and the probe must be present in sufficient excess so as to remain virtually unchanged even after hybridization (Arcellana-Panilio, 2005). One important feature of fluorescence detection is that it allows the simultaneous hybridization of two to several targets that have been differently labelled.

The quality of the hybridization can be assessed by spotting the sample with a set of hybridization control genes, spiking the labelled sample with a known amount of these controls prior to exposure to the array and verifying that these control genes are indeed showing up as having been hybridized.

3.6.6 Data acquisition and quantification

Once the wet phase (e.g. slide hybridization and washing off any excess labelled sample) is completed, signal detection of each of the hybridization targets can be captured, that is, the array must be scanned to determine how much of each labelled sample is bound to each spot. The signal is acquired using array scanners, either a charge-coupled device (CCD) or a confocal microscope, typically equipped with lasers to excite the fluorophores at a specific wavelength and photo-multiplier tubes to detect the emitted light. Spots with more bound sample will have more reporters and will therefore fluoresce more intensely. Whatever the scanner resolution, the microarray spot diameter needs to be five to ten times larger than the scanner resolution which can be as little as 5 µm for the most recent models. The end-product of a microarray experiment is a scanned grey scale image whose intensity measurements range from 0 to 2¹⁶. The image is usually stored in a 16-bit tagged image file format (TIFF). The most basic scanner models offer excitation and detection of the two most commonly used fluorophores (Cy3 and Cy5) whereas higher-end models enable excitation at several wavelengths and offer dynamic focus, linear dynamic range over several orders of magnitude and options for high-throughput scanning. The objective of the scanning procedure is to obtain the best image, where the best is not necessarily the brightest (to avoid over-saturation beyond the signal range) but is the most faithful representation of the data on the slide.

Although it is only supposed to pick up the light emitted by the target cDNAs bound to their complementary spots, the scanner will inevitably pick up light from various other sources, including the labelled sample hybridizing non-specifically to the glass slide, residual (unwanted) labelled probe adhering to the slide, various chemicals used in processing the slide and even the slide itself. This extra light creates background signals. Once signal and background values are clearly defined, which is specific to each experiment, data can be extracted from the image by counting the pixels within each probe and background area and recording this in a computer readable format.

Data extraction from the image involves several steps (Arcellana-Panilio, 2005): (i) gridding or locating the spots on the array; (ii) segmentation or assignment of pixels either to foreground (true signal) or background; and (iii) intensity extraction to obtain a new value for foreground and background associated with each spot. Subtracting the background intensity from the foreground yields the true spot intensity which can be used as an approximation of relative gene expression.

3.6.7 Statistical analysis and data mining

Huge data sets are generated by microarray experiments. For example, 20 hybridization experiments with the Arabidopsis GeneChip generate a set of 2,624,000 data points (8200 genes × 16 oligonucleotides × 20 hybridizations). Such a massive amount of data prohibits any manual treatment. Also experimental variability is generally significant and has to be managed in order

to exploit the data properly. Allison et al. (2006) examined five key
components of microarray analysis: (i) design (the development of an
experimental plan to maximize the quality and quantity of information
obtained); (ii) pre-processing (processing of the microarray image and
normalization of the data to remove systematic variation; other potential
pre-processing steps include transformation of data, data filtering and, in the
case of two-colour arrays, background subtraction); (iii) inference (testing
statistical hypotheses, e.g. which genes are differentially expressed);
(iv) classification (analytical approaches that attempt to divide data into
classes with no prior information or into predefined classes); and
(v) validation (the process of confirming the veracity of the inferences and
conclusions drawn in the study).

Reproducible and reliable microarray results can only be achieved through
quality control starting with data generation. Good laboratory proficiency and
appropriate data analysis practices are essential (Shi et al., 2008). Numerous
software packages, both free and commercial, are available for quantifying
microarray data. Typically, the interpreted array data will highlight a
relatively small number of spots that deserve further investigation.
Alternatively, the overall pattern of profiling can be used as a fingerprint to
characterize specific phenotypes.

The quantified data from the images are typically obtained as tab-delimited
text files. First, dust artefacts, comet tails and other spot anomalies should
be identified and flagged so that they will not enter the analysis.
Pre-processing the quantified data before formal analysis includes the flagging
of ambiguous spots with intensities lower than a threshold defined by the mean
intensity plus two standard deviations of supposedly negative spots (no DNA,
buffer and/or non-homologous DNA controls).

Interpreting the data from a microarray experiment can be challenging.
Quantification of the intensities of each spot is subject to noise from
irregular spots, dust on the slide and non-specific hybridization. Deciding the
intensity threshold between spots and background can be difficult, especially
when the spots fade gradually around their edges. Detection efficiency might
not be uniform across the slide, leading to excessive red intensity on one side
of the array and excessive green on the other.

Data normalization addresses systematic errors that can skew the search for
biological effects. One of the most common sources of systematic error is the
dye bias introduced by the use of different fluorophores to label the target.
Print-tip differences can also lead to sub-grid biases within the same array
while scanner anomalies can cause one side of an array to seem brighter than
the other. Normalization across multiple slides to remove bias can be
accomplished by scaling the within-slide normalized data. In practice,
examining the box plots of the normalized data of individual arrays for
consistency of width can usually indicate whether normalization across arrays
is required.

Spatial plots can locate background problems and extreme values. The shape and
spread of scatter plots and the height and width of box plots give an overall
view of data quality that can give clues about the effects of filtering and
different normalization strategies. Gene expression profiling will be taken as
an example for the rest of this section. Clustering algorithms are means of
organizing microarray data according to similarities in expression patterns. In
this case, co-expressed genes must be co-regulated, and a logical follow-up to
this analysis is the search for regulatory motifs and the common upstream or
downstream factors that may tie these co-expressed genes together. Treatments
can be clustered based on similarity in gene expression profiles. Genes can be
clustered based on similarity in expression patterns across profiles. Two
mathematical approaches are often used, hierarchical or k-means clustering
(Stanford) and self-organizing maps (SOMs) (Whitehead Institute).

A strategy for identifying differentially expressed genes is to compute the
t-statistic and correct for multiple testing using adjusted P-values. The
B-statistic, derived using an empirical Bayes approach, has
been shown in simulations to be far superior to either log ratios or the
t-statistic for ranking differentially regulated genes (Lonnstedt and Speed,
2002). The twofold change continues to be a benchmark for researchers perusing
lists of microarray data in order to validate the data by PCR, which can
provide independent confirmation of the expression patterns of specific genes.
However, fold change has become more of a secondary criterion for the selection
of candidates for follow-up from a list of genes ranked according to more
reliable measures of differential expression (Arcellana-Panilio, 2005). After
preliminary data mining and statistical analysis, validation and follow-up
experiments can be designed.

There are many examples of the array technologies described in this section. In
yeast, 260,000 oligonucleotides corresponding to all the genes in yeast have
been synthesized on to a 1.28 cm² chip. These chips have allowed the
identification of genes expressed in various mutants under different culture
conditions or at different stages of growth. Numerous genes of unknown function
have thus been recognized, regulated in a manner similar to or opposite to that
of genes of known function; transcription of the genome is thus incorporated
into a vast combinatorial network. In plants, Affymetrix has commercialized
microchips to evaluate the expression of Arabidopsis genes, allowing the
identification of genes that are active during pathogen infection or during
treatment with herbicides, fungicides or insecticides. This also facilitates
the determination of which genes are transcribed in which tissues under which
conditions or during which stages of development. Commercial microarrays are
also available from Affymetrix for several other crop plants such as maize and
tomato.

3.6.8 Protein microarrays and others

A protein chip or microarray is a piece of glass on which different molecules
of protein have been affixed at separate locations in an ordered manner to form
a microscopic array. Compared with DNA microarrays, the development of
protein-based approaches poses technical problems for several reasons (Bernot,
2004): (i) proteins consist of 20 distinct amino acids while there are only
four bases in DNA; (ii) depending on their amino-acid composition, proteins may
be hydrophilic, hydrophobic, acidic or basic (while DNA is always hydrophilic
and negatively charged); and (iii) proteins are often post-translationally
modified (by glycosylation, phosphorylation, etc.).

Although detection of protein microarrays can be carried out using general
detection methods as described above, the problem is that protein
concentrations in a biological sample may differ by many orders of magnitude
from those of mRNAs. Therefore, protein array detection methods must have a
much larger range of detection. The preferred method of detection is currently
fluorescence detection. Fluorescent detection is safe, sensitive and can
produce high resolution. The fluorescent detection method is compatible with
standard microarray scanners but some minor alterations to software may need to
be made.

Protein microarrays have been made in the following manner (MacBeath and
Schreiber, 2000; Bernot, 2004). Proteins are deposited on to a support and
subsequently fixed to it. Thus 1600 distinct proteins may be arranged per cm².
These arrays are ordered so that it is known which protein is represented by
any given spot. The microarrays are then incubated with other ligands
(fluorescently labelled) and the result of the hybridization is analysed by
confocal microscopy (it is also possible to employ radioactively labelled
ligands). The protein recognized may be identified using the signal
localization data obtained. The intensity of the signal obtained is
proportional to the level of ligand–protein interaction.

Apart from the most frequently used DNA and protein microarrays discussed
above, other microarrays include those built using tissues (cells) and
carbohydrates. Similar to other microarrays, a tissue chip or microarray is a
piece of glass on which different tissues have been affixed, while
sugar or carbohydrate microarrays include oligosaccharides,
polysaccharides/glycans and glycoconjugates fixed on an array. Carbohydrates
are very different from proteins in the following aspects: (i) carbohydrates
are highly heterogeneous as they have a large number of different molecules
determined by over 500,000 different oligosaccharide units; (ii) their
synthesis is very complicated and involves a larger number of enzymes; and
(iii) biological information that is stored in the various types of
carbohydrates is less well understood. For these reasons, carbohydrate
microarrays will be a useful tool for glycomics.

A new technology that is related to microarrays and should be mentioned is
microfluidics. Microfluidics is the science and technology of systems that
process or manipulate small (10^-9 to 10^-18 l) amounts of fluids using
channels with dimensions of tens to hundreds of micrometres (Whitesides, 2006).
The first applications have a number of useful characteristics: (i) the ability
to use very small quantities of samples and reagents and to carry out
separations and detections with high resolution and sensitivity; (ii) low cost;
(iii) short analysis times; and (iv) small footprints for the analytical
devices. Microfluidics offers fundamentally new capabilities in the control of
concentrations of molecules in space and time. In the areas of microanalysis,
microfluidics offers approaches for biological analyses that require much
greater throughput and higher sensitivity and resolution than were previously
required. It has great potential to improve the analytical processes involved
in proteomics, DNA isolation, PCR and DNA sequencing.

3.6.9 Universal chip or microarray

Most microarray platforms are designed to address a specific set of questions
in a specific organism. This means that a specific microarray platform needs to
be established and produced for each application. Moreover, many assays that
are carried out on microarrays would work even better in a homogeneous solution
rather than on a solid support (Hoheisel, 2006). The establishment of zip-code
arrays can address these problems by separating the actual assay from the
microarray hybridization (Gerry et al., 1999). Such microarrays contain a set
of unique and distinct oligonucleotides that are immobilized at known
locations. Because they should not be complementary to any sequence in any
organism and are made solely to identify the address of a particular location
on the microarray, they are called zip-code sequences (Fig. 3.11). The
oligonucleotides are designed to have similar thermodynamic properties and thus
hybridization can be carried out at one temperature and under defined
stringency conditions. Instead of having to produce many different microarrays,
a single design can be used for various assays.

For example, Hoheisel (2006) described a universal microarray option that
involves using the L-DNA enantiomer, the mirror image form of normal D-form
DNA, for the zip-code oligomer (Fig. 3.11). Because L-DNA forms a left-helical
duplex, there is no cross-hybridization between L-DNA and D-DNA. However,
chimaeric molecules that consist of L-form and D-form stretches can be produced
by standard chemistry. Therefore, D-DNA primers are produced with an L-DNA
zip-code tag that binds to the L-DNA complementary oligonucleotide on the
microarray. L-DNA microarrays are stable because L-DNA is resistant to nuclease
activities. At the same time, only the zip-code part of the molecules used in
homogeneous solution is able to hybridize to the array. Neither the D-form
primer portion nor the analyte (for example, genomic DNA or RNA preparations)
will cross-hybridize with the array.

3.6.10 Whole-genome analysis using tiling microarrays

The recent explosion in available genome sequence data has made it realistic to
undertake microarray analysis at the whole-genome level. Interestingly, these
sequence
[Figure 3.11 (schematic). Top: base discrimination by primer extension in
solution (ddATP, ddCTP, ddGTP, ddTTP; a D-form gene-specific primer with a 5'
L-form zip-code annealed to genomic DNA), followed by molecule separation on
the zip-code array. Centre: applications of the universal array (genotyping,
protein–dsDNA interactions, protein–ssDNA interactions, epigenetic studies,
CGH, protein selection or attachment by aptamers, splice variant studies,
transcriptional profiling). Bottom: for Sample 1 and Sample 2, primer extension
and labelling with a D-form gene-specific primer carrying a 5' L-form zip-code,
followed by hybridization to the zip-code array.]

Fig. 3.11. The concept of universal microarray. dsDNA, double-stranded DNA; ssDNA, single-stranded
DNA; CGH, comparative genomic hybridization.
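
The design constraint stated for zip-code oligonucleotides in Section 3.6.9
(similar thermodynamic properties, so that hybridization can be carried out at
one temperature) can be illustrated with a minimal Python sketch. The text does
not say how the thermodynamic properties are matched; the Wallace rule used
here, Tm = 2(A+T) + 4(G+C) in °C, is only a rough approximation for short
oligonucleotides and is an assumption of this example, as are the function
names, the tolerance and the candidate sequences. The complementarity check
covers only the candidate set itself, not sequences from any organism.

```python
def wallace_tm(seq):
    """Approximate melting temperature (deg C) by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def select_zip_codes(candidates, target_tm, tolerance=2):
    """Keep candidates with Tm close to target_tm that cannot pair with
    themselves or with a zip-code that has already been accepted."""
    accepted = []
    for seq in candidates:
        seq = seq.upper()
        if abs(wallace_tm(seq) - target_tm) > tolerance:
            continue  # would need a different hybridization temperature
        rc = reverse_complement(seq)
        if rc == seq or rc in accepted or seq in accepted:
            continue  # self-complementary, cross-hybridizing or duplicate
        accepted.append(seq)
    return accepted


# Hypothetical 20-mer candidates, for illustration only.
first = "ACGTTGCAAGCTTAGCCATG"
candidates = [
    first,
    reverse_complement(first),   # would hybridize to the first zip-code
    "GGGCCCGGGCCCGGGCCCGG",      # GC-rich, Tm far above the target
    "ATCGATCGTAGCTAGCATGC",
]
print(select_zip_codes(candidates, target_tm=60))
```

With these inputs only the first and last candidates are retained. A real
design would use a nearest-neighbour thermodynamic model and screen the
zip-codes against the genomes of interest.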

data have led to the advent of high-density DNA oligonucleotide-based
whole-genome tiling microarrays (WGAs) which can be employed to interrogate a
full genome's worth of sequence data in a single experiment. This technology
allows a more complete understanding of an organism's genomic content and
should provide a dramatic improvement in the understanding of numerous
biological processes. WGAs comprise relatively short (< 100-mer)
oligonucleotide features. Furthermore, they can be created with > 6,000,000
discrete features, each comprising millions of copies
of a distinct DNA sequence. For instance, the Affymetrix GeneChip Arabidopsis
tiling 1.0R array (http://www.affymetrix.com) is a single array comprising over
3.2 million perfect match and mismatch probe pairs (approximately 6.4 million
probes in total) tiled with 35-bp spacing throughout the complete
non-repetitive A. thaliana genome (Zhang, X. et al., 2006).

WGAs can be employed for a myriad of purposes in plants including empirical
annotation and characterization of the transcriptome, mapping of regulatory DNA
motifs using ChIP-on-chip, novel gene discovery, analysis of alternative RNA
splicing, characterization of the methylation state of cytosine bases
throughout a genome (methylome) and the identification of sequence polymorphism
(Gregory et al., 2008). Overall, implementing standardized protocols for RNA
labelling, hybridization, microarray processing, data acquisition and data
normalization within the plant community will minimize sources of error and
data variability between laboratories and across microarray platforms. In this
way WGA analysis among a diverse set of groups will result in high-quality,
easily reproducible data that will aid the research of the entire plant
community.

3.6.11 Array-based genotyping

Array techniques have become increasingly popular as a tool for genome-wide
genotyping since they offer an assay that is highly multiplexed at a low cost
per data point. One of the earliest reports of microarray-based genotyping
employed high-density WGAs produced by photolithographic synthesis (Affymetrix)
for the simultaneous discovery and assay of DNA polymorphisms in yeast. In
genotyping assays based on microarrays, allelic variation is detected as the
differential hybridization of labelled genomic DNA to individual probes or sets
of probes covering identifiable genomic locations. The polymorphism of the two
sequences, originating from two different cultivars or genotypes, results in
differential hybridization intensity, and this property, which is associated
with sequence characteristics, functions as a molecular marker popularly known
as a single feature polymorphism (SFP). Using this approach a large number of
SFPs were identified between two laboratory strains of yeast (Winzeler et al.,
1998).

For the larger and more complex Arabidopsis genome, tiling arrays were not
available and hence the first experiments involved hybridization of labelled
genomic DNA using Affymetrix AtGenome1 GeneChips based on available,
expression-based annotation for ORFs. Despite this ORF-based focus, nearly 4000
SFPs were identified between the Columbia and Landsberg erecta accessions
(Borevitz et al., 2003). In order to determine genome-wide patterns of SFP,
hybridization to the ATH1 gene expression array was used to interrogate genomic
DNA diversity in 23 wild strains (accessions), in comparison with the reference
strain Columbia. At < 1% false discovery rate, 77,420 SFPs with distinct
patterns of variation were detected across the genome. Total and pair-wise
diversity was higher near the centromeres and the heterochromatic knob region
(Borevitz et al., 2007). By high-density array re-sequencing of 20 diverse
strains (accessions), more than 1 million non-redundant SNPs were identified
(Clark et al., 2007). Salathia et al. (2007) developed a microarray-based
method that assesses 240 unique indel markers in a single hybridization
experiment at a cost of less than US$50 in materials per line. The genotyping
array was built with 70-mer oligonucleotide elements representing indel
polymorphisms between Columbia and Landsberg erecta. Multi-well chips allow
groups of 16 lines to be genotyped in a single experiment.

Microarray-based genotyping has recently been further developed in several crop
plants. Using a high-density microarray technology pioneered at Perlegen
Sciences (http://www.perlegen.com), the International Rice Functional Genomics
Consortium initiated a project to identify a large fraction of the SNPs present
in cultivated rice through whole-genome comparisons of 21 rice genomes,
including cultivars, germplasm lines and landraces (McNally et al.,
2006). Perlegen designed SNP-discovery arrays to include all possible SNP
variations with multiple levels of redundancy.

Edwards et al. (2008) developed a microarray platform for rapid and
cost-effective genetic mapping using rice as a model. In contrast to methods
employing genome tiling microarrays for genotyping, the method is based on
low-cost spotted microarray production, focusing only on known polymorphic
features. A genotyping microarray was produced comprising 880 SFP elements
derived from indels identified by aligning genomic sequences of the japonica
cultivar Nipponbare and the indica cultivar 93-11. The SFPs were experimentally
verified by hybridization with labelled genomic DNA prepared from the two
cultivars. Using the genotyping microarrays, high levels of polymorphism were
found across diverse rice accessions.

In soybean, the GoldenGate assay, which is capable of multiplexing from 96 to
1536 SNPs in a single reaction, has been tested to determine the success rate
of converting verified SNPs into working arrays (Hyten et al., 2008). Allelic
data were successfully generated for 89% of the 384 SNP loci when it was used
in three recombinant inbred line (RIL) mapping populations. Using the same
system, two panels of 1536 SNP markers have been developed in maize through
collaboration between Cornell, CIMMYT and Illumina, one with SNPs developed
from candidate genes relevant to drought tolerance and the other with SNPs
randomly distributed on the maize genome (Yan et al., 2009).
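
As a rough illustration of the principle used throughout this section, that
allelic variation shows up as differential hybridization of labelled genomic
DNA to a probe, the following Python sketch flags candidate SFPs from replicate
probe intensities of two genotypes. It is a simplified sketch and not the
procedure of any study cited above: the thresholds (a twofold difference and an
absolute Welch t-statistic of at least 4), the probe names and the intensity
values are invented for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups of replicate log2 intensities."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    diff = mean(a) - mean(b)
    if se2 == 0:
        # No replicate noise at all: the statistic is unbounded unless the
        # two group means are identical.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / math.sqrt(se2)


def call_sfps(probes, min_log2_diff=1.0, min_abs_t=4.0):
    """Return IDs of probes that hybridize differentially between genotypes.

    probes maps a probe ID to a pair of lists holding the raw replicate
    intensities (all > 0, at least two replicates) of genotype 1 and 2.
    """
    calls = []
    for probe_id, (geno1, geno2) in probes.items():
        logs1 = [math.log2(x) for x in geno1]
        logs2 = [math.log2(x) for x in geno2]
        diff = mean(logs1) - mean(logs2)
        if abs(diff) >= min_log2_diff and abs(welch_t(logs1, logs2)) >= min_abs_t:
            calls.append(probe_id)
    return calls


# Hypothetical replicate intensities for two probes.
example = {
    "probe_A": ([2100, 1950, 2050], [480, 520, 500]),     # weaker in genotype 2
    "probe_B": ([1500, 1400, 1600], [1450, 1550, 1500]),  # similar in both
}
print(call_sfps(example))
```

Only probe_A is reported here. In a genome-wide analysis the statistics would
be converted into P-values and the false discovery rate controlled across all
probes, as in the < 1% false discovery rate quoted for Arabidopsis.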
4 Populations in Genetics and Breeding

Many types of populations are currently being used in genetic studies and plant
breeding. The properties of a population depend on how it is developed and
which parents are involved. Doubled haploids (DHs), recombinant inbred lines
(RILs) and near-isogenic lines (NILs) are three important types of populations
that have a long history of application in plant breeding and have been widely
used in genetic mapping, gene discovery and genomics-assisted breeding since
the discovery of DNA-based markers. This chapter describes in general the
structure, development and utilization of these important genetic populations,
based on a comprehensive discussion by Xu and Zhu (1994). More details on
applications of these populations will be covered in other chapters.

4.1 Properties and Classification of Populations

Populations that are currently used in genetics and plant breeding can be
classified and their properties can be described based on their genetic
constitution, maintenance, genetic background and origin.

4.1.1 Genetic constitution-based classification

For two alleles, A1 and A2, at a specific genetic locus, A, there are three
different possible genotypes, A1A1, A2A2 and A1A2. If a population consists of
individuals that have an identical genotype (no matter whether they are
homozygous, A1A1 or A2A2, or heterozygous, A1A2, for locus A) it is said to be
homogeneous. However, if a population consists of individuals that have
different genotypes (for example, some with A1A1 or A2A2, and others with A1A2)
it is said to be heterogeneous.

Based on the above definitions, there are four types of populations:
1. Homogeneous populations with homozygous individuals: such as individuals
from a cultivar of a self-pollinated species or from an inbred derived from an
open-pollinated species.
2. Homogeneous populations with heterozygous individuals: such as F1 plants
derived from two homogeneous and homozygous cultivars of self-pollinated
species or between two inbreds derived from an open-pollinated species.
3. Heterogeneous populations with homozygous individuals: such as pure-breeding

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 113


individuals derived from continuous selfing of a hybrid of two inbred lines or
cultivars, such as RILs, where each individual is homozygous, either A1A1 or
A2A2, while different individuals have different genotypes.
4. Heterogeneous populations with heterozygous individuals: such as individuals
in early generations such as F2 and F3 derived from two inbred lines or
homozygous cultivars. A set of open-pollinated cultivars of an open-pollinated
species is a heterogeneous population containing heterozygous individuals.

4.1.2 Genetic maintenance-based classification

Based on whether a population can maintain its genetic constitution through
selfing from one generation to another, populations can be classified into two
types:
1. Tentative or temporary populations: individuals in a population, such as F2,
F3, BC1, BC2, etc., have different genotypes and their genetic constitution
will change with recombination resulting from selfing or inbreeding. These
types of population are difficult to maintain and in most cases, the same
genetic constitution can only be used once.
2. Permanent or immortalized populations: this type of population consists of a
set of pure-breeding lines derived from two parents or a common set of parents.
Individuals within a line have identical genotypes, while individuals from
different lines have different genotypes. Each line can serve as a segregation
unit from the parental population and population structure and genetic
constitution can be maintained consistently generation after generation through
selfing or inbreeding processes.

4.1.3 Genetic background-based classification

Populations can be classified into those within which individuals have a nearly
isogenic background and those that have a heterogeneous background. A
population that consists of lines with nearly identical genetic backgrounds can
be derived from genetic processes such as continuous backcrossing of a hybrid
to one of its parental lines so that lines only differ for a specific target
trait or locus. All other types of populations, including F2, backcross (BC),
RILs and DHs have heterogeneous backgrounds, i.e. individuals within these
populations have heterogeneous backgrounds and differ not only in the target
traits but also in the remainder of the traits.

4.1.4 Origin-based classification

Populations can be classified into two basic categories based on the origin of
the individuals they contain: populations of natural cultivars and populations
formed by planned matings among selected parents or genetic mating populations.

Populations of natural cultivars

These populations consist of a group or subset of cultivars which are selected
from a large number of cultivars for specific target traits or are based on
specific pedigree relationships. The variation for the target trait among
groups of cultivars can be investigated and the relationship between the target
trait and other traits or molecular markers can be established. For example,
the genetic effect of plant height can be studied by comparing tall cultivars
with short ones.

Populations formed by planned matings

Mating populations are specifically designed for genetic studies and derived
from a specific genetic mating design using selected genetic stocks. There are
several genetic mating designs that are widely used in genetics and breeding.

DIALLEL CROSSES. A total of n cultivars or inbred lines are selected as male
and female parents to produce crosses of all possible combinations. The F1s or
F2s derived from
these crosses are then genetically analysed. The mating design is as follows:

Parent    P1    P2    P3    ...    Pn
P1
P2
P3
...
Pn

A full diallel analysis will include all one-way hybrids and parents while a
partial or incomplete diallel analysis may contain just half the diallel
without reciprocals or parents. Diallel crosses are usually used to estimate
general combining ability for the parents and specific combining ability for
specific crosses, providing information for producing hybrids.

NORTH CAROLINA DESIGNS. There are three North Carolina designs, denoted by NCI,
NCII and NCIII. These designs are most often used in cross-pollinated crops and
to study broad-based populations. Their use in self-pollinated crops usually
involves many inbred lines that can reasonably be considered to represent a
large, reference population, e.g. late maturing soybean adapted to a
geographical belt of the USA. To simplify the description, however, inbreds are
taken as an example.

NCI: two inbred lines are crossed to produce F2, and then some individuals are
randomly selected from the F2 population as males to intermate with other
randomly selected females. The offspring derived from this intermating will be
used in genetic analysis. The design can be described as below:

Males           1            2             3
Females      4  5  6      7  8  9      10  11  12
Offspring

NCII: n parental lines are divided into two groups, one as males and the other
as females, to produce crosses of all possible combinations.

Cultivar    1    2    3    ...    n1
n1+1
n1+2
n1+3
...
n1+n2

NCIII: n individuals are selected from an F2 population to backcross with two
parents, P1 and P2:

F2 individual    1    2    3    ...    n
P1
P2

TRIPLE TESTCROSS (TTC) AND SIMPLIFIED TTC (sTTC). TTC is an extension of NCIII,
where n individuals (n > 20) are selected from an F2 population to backcross
with both parental lines, P1 and P2, and the F1 (P1 × P2):

F2 individual    1    2    3    ...    n
P1
P2
F1

In sTTC, n cultivars or strains (n > 20) are selected from the germplasm pool
to cross with two cultivars or strains, PH and PL, which show extreme
phenotypes (with the highest and lowest phenotypic values, respectively).

Strain    1    2    3    ...    n
PH
PL

The populations derived from the above genetic mating designs have been widely
used in conventional quantitative genetics to study and subsequently exploit
modes of gene action determining the inheritance and expression of the target
traits. The reader is referred to Hallauer and Miranda (1988) and
Mather and Jinks (1982) or sections discuss- et al. (2007) reviewed various approaches
ing quantitative genetics in plant breeding for haploid production in plants. Forster
texts for details regarding the genetic infor- and Thomas (2004) and Szarejko and
mation that can be derived from the study Forster (2007) reviewed the use of DHs in
of hybrids or families formed using each of genetic studies and plant breeding. Recent
these mating designs. Some of these designs reviews on specific crop species are avail-
have also been used in genetic mapping of able for tomato (Bal and Abak, 2007) and
quantitative traits. nutraceutical species (Ferrie, 2007).

Inbreeding populations
4.2.1 Haploid production
This type of population includes segregat-
ing populations such as F2 and F3 popula- There are several approaches to haploid
tions which are derived from selfing or production. Naturally occurring haploids
sibmating an F1 hybrid, BC populations that have been reported in a number of species
are derived from backcrossing the F1 to one including tobacco, rice and maize. In bar-
of the parents or advanced BC populations ley, the hap initiator gene was reported to
derived by multiple backcrossings of the F1 control haploidy and spontaneous haploids
to one of the parents. were recovered at high frequency (Hagberg
Populations used in genetic studies and and Hagberg, 1980), with up to 8% haploid
plant breeding can be derived from any of the offspring being recovered when a cultivar
mating designs discussed above. For breed- that was homozygous for the hap allele was
ing purposes, the sizes of populations that used as the female parent to cross with other
will be maintained can be much smaller than cultivars, but none were produced from the
those used in genetic studies because breed- reciprocal cross. In maize, the indeterminate
ers only need to retain the populations with gametophyte gene (ig) results in a monoploid
desirable traits. For genetic studies, however, embryo either from the sperm cell or the egg
geneticists need to maintain as large a popu- cell (Kermicle, 1969). Although DHs can be
lation as possible and all types of segregates recovered from such spontaneous haploids,
including those with undesirable traits. their frequencies are usually too low for
genetics and breeding purposes.
With the recognition of the importance of
4.2 Doubled Haploids (DHs) DHs in plant breeding, extensive efforts have
been made to induce haploid embryogenesis
Cells or plants that contain a single com- and increase the frequency at which DHs
plete set of chromosomes are called hap- can be recovered. The benefits of DHs have
loid. Haploids derived from diploids are already been demonstrated in many research
called monoploid, while haploids derived and breeding programmes. This progress has
from polyploids are called poly-haploid. led to DH cultivars for commercial produc-
Diploids produced from chromosome dou- tion and DH populations for genetics and
bling of haploids are called doubled or breeding studies. In barley, over 100 culti-
double haploid (DH). The DH approach vars have been released and similar numbers
has several advantages that make it useful of rice and rapeseed DH cultivars have been
in genetics and plant breeding. DHs can be listed (Forster and Thomas, 2004). DHs have
produced via in vivo and in vitro systems. also been used successfully in recalcitrant
Haploid embryos are produced in vivo by species such as oat (Kiviharju et al., 2005)
parthenogenesis, pseudogamy or chromo- and rye (Tenhola-Roininen et al., 2006).
some elimination after extensive crossing. Maluszynski et al. (2003) edited a
The haploid embryo is rescued, cultured manual presenting a set of protocols for the
and chromosome-doubling produces DHs. production of DH in 22 major crop plant spe-
The in vitro methods include gynegenesis cies including four tree species. The manual
(ovary and flower culture) and androgene- contains various protocols and approaches
sis (anther and microspore culture). Forster to DH production that have been success-
Populations in Genetics and Breeding 117

fully used for different germplasm resources in each species. The protocols describe in detail all the steps in DH production, from donor plant growth conditions, through in vitro procedures, media composition and preparation to regeneration of haploid plants and methods for chromosome doubling. The manual enables the researcher to choose the most suitable method for production of DH for their particular laboratory conditions and plant materials, e.g. microspore versus anther culture, wide hybridization or gynogenesis. The manual also contains information on the organization of a DH laboratory, basic DH media and associated simple cytogenetic methods for ploidy level analysis. An excellent overview of haploid induction and the application of doubled haploids is provided for Brassicaceae, Poaceae and Solanaceae in Haploids in Crop Improvement II (Biotechnology in Agriculture and Forestry) edited by Palmer et al. (2005).

There are now five methods generally applicable to the production of haploids in plants with frequencies that are useful for genetics and breeding programmes (Palmer and Keller, 2005):

• Wide hybridization crosses followed by chromosome elimination from one parent of a cross, usually the pollinating parent.
• Gynogenesis: cultured unfertilized isolated ovules and ovaries of flower buds develop embryos from cells of the embryo sac.
• Androgenesis: cultured anthers or isolated microspores undergo embryogenesis or organogenesis directly or through intermediate callus.
• Parthenogenesis: development of an embryo by pseudogamy, semigamy or apogamy.
• Inducer-based approach: haploid-inducing lines are used to produce haploids.

Chromosome or genome elimination

Haploid embryos can be produced in plants after pollination by distantly related species. In most cases, normal double fertilization takes place to form a hybrid zygote and endosperm. Preferential or uniparental elimination of chromosomes or genomes arises as a result of certain crosses; fertilization occurs but soon afterwards the genome of one parent is preferentially eliminated. Haploids can be produced by interspecific hybridization followed by chromosome elimination. In barley, this wide hybridization method consists of crossing cultivated barley, Hordeum vulgare (2n = 2x = 14) with the wild, diploid cross-pollinated perennial Hordeum bulbosum (2n = 2x = 14). Most progeny (95%) are barley haploids, while the remainder is made up of diploid hybrids. This technique, called the bulbosum method, has been extensively utilized for the production of haploids in barley. Haploids can also be produced in hexaploid wheat (var. Chinese Spring) by chromosome elimination following hybridization of wheat with H. bulbosum (both 2x and 4x). Grain set frequencies of 13.7% with 2x bulbosum and 43.7% with 4x bulbosum were obtained (Barclay, 1975). During formation of the embryo the chromosomes of H. bulbosum are eliminated. The immature embryos are cultured in vitro and plantlets from these monoploid embryos can be induced via an efficient chromosome doubling technique to produce fertile flowers bearing homozygous hexaploid seeds.

The production of embryos as a result of wheat × maize crosses was first reported by Zenkteler and Nitzsche (1984). Laurie and Bennett (1986) cytologically examined embryos produced via this system and found maize chromosomes to be preferentially eliminated during the first three cell divisions, leaving a haploid complement of wheat chromosomes. This method was used in wheat haploid production and applied with some success in generating genetic and mapping populations (Laurie and Reymondie, 1991). Mean frequencies of fertilization, embryo formation, embryo germination and haploid regeneration of 83, 20, 45 and 8%, respectively, have been reported (Chen et al., 1999). Significant differences in the percentage of embryo germination and haploid regeneration were observed among crosses, suggesting that the efficacy of haploid production could be improved by
118 Chapter 4

selection of more responsive parents. Eighty per cent of haploid plants were doubled and had a normal seed set; however, only 6% produced viable progeny. Ultimately two DH green plants per pollinated head were obtained on average. The frequency of haploid regeneration was increased from 35 to 50% in the winter 2000 study using a pre-cold treatment of embryos.

Factors that have been reported to affect the production of haploids by the chromosome elimination approach include genotype, temperature during growth (higher temperature resulting in a higher rate of elimination), genome ratio of parental lines and others. Factors affecting the efficacy of DH production in the wheat × maize system include: (i) expertise and consistency in protocol implementation; (ii) control of temperature and light regimes for optimal plant growth and reproduction in both wheat and maize; (iii) wheat F1 genotype differences; (iv) timing of 2,4-dichlorophenoxyacetic acid (2,4-D) treatment; and (v) growth stage at which colchicine is applied.

Compared to anther culture, the wheat × maize system (sometimes called the maize pollen method) has three advantages: a less genotype-dependent response, greater efficacy and a shorter time requirement. Based on Kisana et al. (1993), the maize pollen method is about two to three times more efficient than anther culture. In the study by Chen et al. (1999), twice as many green plants were regenerated (mean = 7.54%) using the maize pollen method as with anther culture. Kisana et al. (1993) reported that aneuploids or gross chromosomal abnormalities were not observed and confirmed that chromosome variations were not common in wheat × maize-derived plants. They also concluded that this technique could save 4–6 weeks in obtaining haploid green plants of the same age.

The cross incompatibility barrier in wheat has been successfully overcome by using maize pollen. The wheat × maize technique is currently being used as an alternative to the bulbosum technique and anther culture for wheat haploid production. In order to use the wheat × maize system in practical breeding programmes, further enhancement of embryo formation, germination and green-plant regeneration and doubling is needed. Some green plants will die during colchicine-induced chromosome doubling and during transplantation of the colchicine-treated seedlings to the field; therefore, the final population size may be too small to represent a sufficient number of possible genotypes to make selection effective. In addition, application of 2,4-D is crucial and without it there may be no seed set or embryo formation. Of the various methods tested, the use of spikelet culture offers a practical and versatile alternative for the production of wheat polyhaploids using wheat × maize sexual crossing (Kaushik et al., 2004).

SOMATIC REDUCTION AND CHROMOSOME ELIMINATION. Cases are known where, either spontaneously or due to specific treatments, the chromosome number was reduced to half in the somatic tissues, a phenomenon described as somatic reduction or reductional mitosis. Early studies include that of Swaminathan and Singh (1958), who induced a haploid branch on a watermelon by irradiation of the seed used. This must have occurred by the reduction of chromosome number in the somatic tissue through an unknown mechanism (perhaps due to spindle organizer abnormalities). Similarly, in Sorghum vulgare, somatic tetraploid (2n = 4x) cells responded to colchicine treatment and gave rise to diploid cells which took over the growing point completely, thus giving rise to diploid individuals. There are also a number of other chemicals such as chloramphenicol and para-fluorophenylalanine (an amino acid analogue) which have sometimes been successfully used for production of haploids in a number of materials. Elimination of parental chromosomes has also been observed in somatically-produced wide hybrids. In these cases, the elimination tends to be irregular and incomplete, leading to asymmetric hybrids or cybrids (Liu, J.H. et al., 2005).

MECHANISM OF CHROMOSOME ELIMINATION. Several hypotheses have been presented to explain uniparental chromosome elimination during
hybrid embryo development in plants: for example, differences in timing of essential mitotic processes due to asynchronous cell cycles or asynchrony in nucleoprotein synthesis leading to a loss of the most retarded chromosomes. Other hypotheses propose the formation of multipolar spindles, spatial separation of genomes during interphase and metaphase, parent-specific inactivation of centromeres and, by analogy with the host-restriction and modification systems of bacteria, degradation of alien chromosomes by host-specific nuclease activity. Gernand et al. (2005) provide evidence for a novel chromosome elimination pathway in wheat × pearl millet hybrids that involves the formation of nuclear extrusions during interphase in addition to post-mitotically formed micronuclei. They found that the chromatin structure of nuclei and micronuclei was different and that heterochromatinization and DNA fragmentation of micronucleated pearl millet chromatin was the final step during haploidization.

The mechanism of chromosome elimination in Hordeum hybrids was studied by Subrahmanyam and Kasha (1975) and Bennett et al. (1976) and the following conclusions were drawn: (i) normal double fertilization occurs in interspecific crosses as confirmed by cytological study; and (ii) after fertilization there is a gradual and selective elimination of H. bulbosum chromosomes from nuclei of endosperm as well as embryo cells so that eventually haploid embryos are produced. A sudden shortage of proteins in the developing embryo and endosperm, and the better ability of vulgare chromosomes to form spindle attachments relative to bulbosum chromosomes, may be responsible for elimination of the bulbosum chromosomes. Other possible causes such as differences in mitotic cycle, congression during mitosis, etc. were ruled out by the authors.

It has also been demonstrated that the elimination of bulbosum chromosomes is under genetic control (Subrahmanyam and Kasha, 1975). The above-mentioned authors used primary trisomics and monotelotrisomics in crosses with tetraploid H. bulbosum and concluded that both arms of chromosome 2 and the short arm of chromosome 3 of H. vulgare are responsible for chromosome elimination, although their effect may be neutralized or offset if a sufficient dose of bulbosum chromosomes is available.

Ovary culture or gynogenesis

Ovary culture involves production of a haploid individual by culture of unfertilized ovaries to obtain haploid plants from egg cells or other haploid cells of the embryo sac; the process is known as gynogenesis. Under the appropriate culture conditions the unfertilized cell of the embryo sac develops into an embryo by as yet unknown mechanisms. Haploid plants generally originate from egg cells in most species (in vitro parthenogenesis) but in some species, e.g. rice, they arise chiefly from the synergids; in at least Allium tuberosum even antipodal cells produce haploid plants (in vitro apogamy) (Mukhambetzhanov, 1997).

Gynogenesis may occur either via embryogenesis or plantlet regeneration from callus. In rice 2-methyl-4-chlorophenoxyacetic acid (MCPA) generally leads to a small amount of protocorm-like callus formation from which shoots and roots regenerate, while picloram promotes embryo regeneration. In contrast, sugarbeet usually shows embryo development while in sunflower embryos regenerate following a callus phase. In general, regeneration from a callus phase appears, at least for the present, to be easier than direct embryogenesis.

Generally, gynogenesis has two or more stages and each stage may have distinct requirements. In rice, two stages, i.e. induction and regeneration, are recognized. During induction, ovaries are floated on a liquid medium containing low auxin levels and kept in the dark, while for regeneration they are transferred on to an agar medium containing a higher auxin concentration and incubated in the light.

Depending on the species, unfertilized ovules, ovaries or flower buds can be cultured. In some members of the Chenopodiaceae, Liliaceae and Cucurbitaceae, gynogenesis is the main route to DH production (Palmer and Keller, 2005). Even where anther or microspore culture is successful, gynogenetic
haploids have been produced, e.g. in barley, maize, rice and wheat.

San Noeum (1976) was the first to demonstrate that gynogenesis can be induced under in vitro conditions. She obtained gynogenic haploids using an ovary culture of H. vulgare. Subsequently, success has been obtained with many species, e.g. wheat, rice, maize, tobacco, petunia, gerbera, sunflower, sugarbeets, onions, rubber, etc. About 0.2–6% of the cultured ovaries show gynogenesis and one or two plantlets, rarely up to eight, originate from each ovary.

Embryogenic frequency is low in many cases, but relatively high frequencies have been reported in some cases (Alan et al., 2003; Martinez, 2003). The rate of success varies considerably with species and is markedly influenced by explant genotype so that some cultivars do not respond at all. In rice, japonica genotypes are far more responsive than indica cultivars. In most cases, the optimum stage for ovule culture is the nearly mature embryo sac, but in rice ovaries at the free nuclear embryo sac stage are the most responsive.

The culture response is still genotype dependent (Alan et al., 2003; Bohanec et al., 2003). Generally, cultures of whole flowers, ovaries, and ovules attached to the placenta respond better, but in gerbera and sunflower isolated ovules give a better response. Cold pretreatment (24–48 h at 4°C in sunflower and 24 h at 7°C in rice) of the inflorescence before ovary culture enhances gynogenesis.

The composition of the culture medium and stage of embryo sac development are important considerations for successful culture (Keller and Korzun, 1996). Growth regulators are crucial in gynogenesis and at higher levels they may induce callusing of somatic tissues and even suppress gynogenesis. Growth regulator requirements seem to depend on species. For example, in sunflower growth regulator-free medium is the best and even a low level of MCPA induces somatic calli and somatic embryos. But in rice, 0.125–0.5 mg l⁻¹ MCPA is optimal for gynogenesis. The sucrose level also appears to be critical; in sunflower 12% sucrose leads to gynogenic embryo production while at lower levels somatic calli and somatic embryos were also produced. Ovaries are generally cultured in the light but in some species at least, e.g. sunflower and rice, incubation in the dark favours gynogenesis and minimizes somatic callusing; in rice light may lead to the degeneration of gynogenic pro-embryos.

Ovary culture has two main limitations: (i) it is not successful in all species; and (ii) the frequency of responding ovaries and the number of plantlets per ovary is usually low. Therefore, anther culture is preferred over ovary culture; only in those cases where anther culture fails, e.g. sugarbeet and male sterile lines, does ovary culture assume significance.

Anther culture or androgenesis

Anther culture or androgenesis is a process by which a haploid individual develops from a pollen grain. Anther culture is often the method of choice for DH production in crop plants (Sopory and Munshi, 1996). Good aseptic techniques are required but the methods are generally simple and applicable to a wide range of crops (Maluszynski et al., 2003). In general, haploid plants are generated in vitro from the microspores contained in the anther and require chromosome doubling treatments. The number of chromosomes in haploid plants can be doubled either naturally or by colchicine treatment.

The process involved in anther culture is poorly understood. Investigations have been hampered by the presence of the sporophytic anther wall that prevents direct access to the microspores contained within. This has become an important issue because although many species respond to anther culture, the availability of responsive genotypes can be a limiting factor, thus making it necessary to study, understand and manipulate microspore embryogenesis in order to develop genotype-independent methods (Forster et al., 2007). Many factors influence the production of anther-culture-derived plants including the physiological status of the donor plants, pre-treatment of anthers, developmental stage of the pollen,
components in the medium and culture conditions such as light, temperature and humidity. The constraints associated with this approach are the selective response of genotypes to the anther culture process or medium, the high rate of albino formation and somaclonal variation. These factors have been discussed by Taji et al. (2002) and are summarized in the following discussion.

The genotype of the donor plant plays a significant role in determining the frequency of pollen plant production. There are genotypes extremely recalcitrant to anther culture. In rice, for example, japonica cultivars are much easier to culture from anthers than indica cultivars. Genotype dependency is a major constraint that limits the wide application of anther culture.

The culture medium plays a vital role since the requirements vary with the genotype and probably the age of the anther as well as the conditions under which donor plants are grown. The medium should contain the correct amounts and proportions of inorganic nutrients to satisfy the nutritional as well as physiological needs of the many plant cells in culture. Sucrose is considered to be the most effective carbohydrate source and cannot be substituted by other disaccharides. The concentration of sucrose also plays an important role in the induction of pollen plants. Activated charcoal is also added to the culture medium.

In addition to basal salts and vitamins, hormones in the medium are critical factors for embryo or callus formation. Cytokinins (e.g. kinetin) are necessary for induction of pollen embryos in many species of Solanaceae, except tobacco. Auxins, in particular 2,4-D, greatly promote the formation of pollen callus in cereals. For regeneration of plants from pollen calli, a cytokinin and a lower concentration of auxin are often necessary.

Certain organic supplements added to the culture medium often enhance the growth of anther cultures. Some of these include the hydrolysed products of proteins such as casein (found in milk), nucleic acids and others. Coconut milk obtained from tender coconuts is often added to tissue culture media as it contains a complex mixture of nucleic acids, sugars, growth hormones and some vitamins.

The physiological state of the parent plant plays a role in haploid production. In various plant species it has been shown that the frequency of androgenesis is higher in anthers harvested at the beginning of the flowering period and declines with plant age. This may be due to deterioration in the general condition of the plants, especially during seed set. The lower frequency of induction of haploids in anthers taken from older plants may also be associated with a decline in pollen viability. Seasonal variations, physical treatment and application of hormones and salts to the plant also alter its physiological status, which is reflected in changes in the anther response to culture.

Temperature and light are two physical factors which play an important role in the culture of anthers. Higher temperatures (30°C) yield better results. Temperature shocks also enhance the induction frequency of microspore androgenesis. Frequency of haploid formation and growth of plantlets are generally better in the light. Certain physical and chemical treatments given to flower buds or anthers prior to culture can be highly conducive to the development of pollen into plants. The most significant is cold treatment.

The developmental stage of pollen greatly influences the fate of the microspore. Androgenesis occurs when a microspore or pollen is induced to shift from a gametophytic pathway to a sporophytic pathway of embryo formation. Anthers of some species (Datura, tobacco) give the best response if pollen is cultured at first mitosis or later stages (postmitotic), whereas in most others (barley, wheat, rice) anthers are most productive when cultured at the uninucleate microspore stage (premitotic). Anthers at a very young stage (containing microspore mother cells or tetrads) or a late stage (containing binucleate, starch-filled pollen) of development are generally ineffective, although some exceptions are known.

Barley and rice are considered to be model cereal crops for androgenesis. The application of barley anther culture protocols
to other cereals such as wheat yielded a low frequency of green plants. Although a high frequency of green plants is produced for most barley crosses, androgenesis still poses some problems that need to be addressed. There are barley genotypes which are extremely recalcitrant to microspore division and/or show a high rate of albinism. The rate of embryogenesis is still low and poorly-developed embryos are formed very frequently. New methods are needed that reduce the cost of DH production and are effective for all genotypes.

Future objectives in plant androgenesis include the development of efficient androgenesis protocols for a wide range of genotypes, a better understanding of the biological processes involved in the stress pre-treatment, the study of the influence of different micronutrients on the induction of gametic embryogenesis and possible gametophytic selection. Identification of genetic loci associated with the anther culture response process will facilitate the understanding of the mechanisms underlying androgenesis. Identification and localization of molecular markers linked to the yield of green plants per anther and the evaluation of their potential use for the prediction of the anther culture response of genotypes will also help to optimize the production of DHs.

Semigamy

Semigamy is a form of parthenogenesis and occurs when the nucleus of the egg cell and the generative nucleus of the germinated pollen grain divide independently, resulting in a haploid chimaera (a plant whose tissues are of two different genotypes). Semigamy is a type of facultative apomixis in which the male sperm nucleus does not fuse with the egg nucleus after penetrating the egg in the embryo sac. Subsequent development can give rise to an embryo containing haploid chimaeral tissues of paternal and maternal origins. In cotton, the semigametic phenomenon was first reported by Turcotte and Feaster (1963), who developed the Pima line 57-4 that produced haploid seeds at a high frequency. Currently semigamy is the only feasible means for production of haploids in cotton (Zhang and Stewart, 2004).

There are many examples of DH lines developed from cultivars and intra- and interspecific hybrids between upland cotton (Gossypium hirsutum L.) and American Pima cotton (Gossypium barbadense L.) using semigamy. The semigametic trait has also been transferred into different cotton cytoplasms to facilitate rapid replacement of nuclei. Stelly et al. (1988) proposed a scheme called hybrid elimination and haploid production system using a cotton strain with semigamy (Se), lethal gene (Le2dav), virescent (v7) and male sterility or glandless (gl2gl3).

Semigametic lines can produce 30–60% haploids when self-pollinated and about 0.7–1.0% androgenic haploids when used as female parents in crosses with normal non-semigametic cottons (Turcotte and Feaster, 1967). A unique feature of semigamy is that the inheritance of the gene is conveyed by both male and female gametes but expression of the trait in terms of haploid production occurs only in the female parent. As a consequence, for example, in reciprocal crosses between SeSe and sese parents, haploids will be produced only when SeSe or Sese is the female parent.

The results reported by Zhang and Stewart (2004) verified that semigamy in cotton is controlled by one gene, previously designated Se. The gene functions sporophytically and gametophytically, resulting in an incomplete dominance mode of action. Consistent with the difference between the two parental isogenic lines, semigametic F2.3 lines had significantly lower chlorophyll content than non-semigametic F2.3 lines, an observation that was confirmed by a significant association between haploid production and chlorophyll content. The Se gene and the gene for reduced chlorophyll content could be either the same or closely linked.

Inducer-based approach

Haploid inducing lines have been used in maize to produce haploids by development of the unfertilized egg cells (Eder and
Chalyk, 2002). A haploid induction rate of up to 2.3% was detected by Coe (1959) in crosses with the inbred line Stock 6. A higher rate (about 6%) was later obtained by Sarker et al. (1994) and Shatskaya et al. (1994) in progenies of crosses between Stock 6 and Indian and Russian germplasm, respectively. Inducer lines are now available with haploid seed induction rates of 8–12% in temperate maize germplasm (Melchinger et al., 2005; Röber et al., 2005).

Segregation studies (Lashermes and Beckert, 1988; Deimling et al., 1997) and quantitative trait loci (QTL) analysis (Röber, 1999) demonstrated that in vivo haploid induction in maize is a quantitative trait under the control of a large but unknown number of loci. Individual QTL explained only small parts of the genetic variation.

Compared with other methods of DH production such as anther culture, the inducer-based approach is rather efficient, less dependent on the genotype and can be practised in almost every maize breeding programme without access to expensive laboratory facilities (Röber et al., 2005; http://www.uni-hohenheim.de/ipspwww/350a/linien/indexl.html).

Requirements for in vivo DH production in practical breeding include: (i) availability of inducer genetic stocks; (ii) high induction rate; (iii) the inducer is a good pollinator; (iv) reproducibility with reasonable seed quantities; (v) availability of a marker system that is independent of the genetic background of the female and of environmental effects and can be used for effective and unambiguous identification of haploid kernels; and (vi) availability of an artificial chromosome doubling system with high doubling rates that is safe, simple and cost-effective.

Since the late 1990s, these requirements have been partially met in maize with: (i) inducer lines (e.g. RWS and UH400 developed at the University of Hohenheim) with improved induction rates of 10% or higher; (ii) a combination of two dominant markers (anthocyanin colour of endosperm and embryo for identification of haploids and anthocyanin coloration of stalk for identification of false positives in the field); and (iii) improved chromosome doubling systems using colchicine that gave a doubling rate of greater than 10%.

A scheme for in vivo haploid induction includes the following steps:

• Creating new variation by intercrossing with selected lines.
• In vivo haploid induction in generation F1.
• Chromosome doubling of haploid seedlings:
    ◦ selection of haploid kernels;
    ◦ germination of kernels;
    ◦ cutting of coleoptile;
    ◦ doubling procedure: treatment of seedlings with colchicine;
    ◦ planting of treated seedlings in greenhouse;
    ◦ transplanting DH plants at the three-leaf stage to the field and selfing (generation D0); and
    ◦ formation of testcross hybrids.
• Evaluation of testcrosses in multi-environment yield trials (two stages).

4.2.2 Diploidization of haploid plants

As described above, haploids can be produced through various approaches. Haploid plants may grow normally under in vitro or greenhouse conditions up to the flowering stage, but viable gametes are not formed due to the absence of one set of homologous chromosomes and consequently, there is no seed set.

The only mechanism for perpetuating the haploids is by duplicating the chromosome complement in order to obtain homozygous diploids. In pollen-derived plants duplication of chromosomes may occur spontaneously in cultures. However, the spontaneous chromosome doubling rate of haploids is usually low. In maize, for example, the rate ranges from 0 to 10% (Chase, 1969; Beckert, 1994; Deimling et al., 1997; Kato, 2002). Therefore, it is necessary to diploidize the haploids by chemical means. Thus, artificial chromosome doubling (diploidization) is necessary for the efficient large-scale use of haploid plants.
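
As a rough planning aid, the induction and doubling rates quoted in this section can be chained to estimate how many induction-cross kernels are needed to obtain a target number of DH lines. The sketch below is only an illustration and not a published protocol: it assumes that induction and doubling are independent steps with fixed success rates, that every haploid kernel is correctly identified and survives to treatment, and that the rates do not vary among genotypes. The function names are hypothetical, and the example rates (10% induction, 10% doubling) are simply the round figures cited in the text.

```python
import math


def _check_rate(rate: float) -> None:
    """Reject rates that are not fractions between 0 and 1."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rates must be fractions between 0 and 1")


def expected_dh_plants(kernels: int, induction_rate: float, doubling_rate: float) -> float:
    """Expected number of fertile DH plants from a number of induction-cross kernels.

    Simplifying assumption (not from the source): the two steps are independent,
    and no haploids are lost to misclassification, germination failure or
    transplanting.
    """
    _check_rate(induction_rate)
    _check_rate(doubling_rate)
    haploid_kernels = kernels * induction_rate   # kernels expected to be haploid
    return haploid_kernels * doubling_rate       # haploids expected to be doubled


def kernels_needed(target_dh: int, induction_rate: float, doubling_rate: float) -> int:
    """Smallest number of kernels whose expected DH yield reaches target_dh."""
    _check_rate(induction_rate)
    _check_rate(doubling_rate)
    combined = induction_rate * doubling_rate
    if combined == 0:
        raise ValueError("both rates must be greater than zero")
    # A tiny tolerance keeps floating-point noise from adding a spurious kernel.
    return math.ceil(target_dh / combined - 1e-9)


if __name__ == "__main__":
    print(expected_dh_plants(1000, 0.10, 0.10))
    print(kernels_needed(100, 0.10, 0.10))
```

Under these assumptions, a 10% induction rate combined with a 10% doubling rate gives an expected 10 DH plants per 1000 kernels, so about 10,000 kernels would be needed for 100 DH lines. Real programmes would need to add margins for misclassified kernels and seedling losses.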

Chromosome doubling is thought to occur by one or more of four mechanisms, namely endomitosis, endoreduplication, C-mitosis or nuclear fusion (Jensen, 1974; Kasha, 2005). Endomitosis is described as chromosome multiplication and separation but failure of the spindle leads to one restitution nucleus with the chromosome number doubled. It has also been called nuclear restitution. Endoreduplication is duplication of the chromatids without their separation and leads to diplo-chromosomes or to polytene chromosomes if many replications occur. Endoreduplication is a common feature of specialized plant cells, occurring where cells become differentiated or enlarged or are very active in metabolite production. C-mitosis is a specific form of endomitosis where, under the influence of colchicine, the centromeres do not initially separate during metaphase while chromosome arms or chromatids do separate. Nuclear fusion occurs when two or more nuclei divide synchronously and develop a common spindle. Thus, two or more nuclei could result with doubled, polyploid or aneuploid chromosome numbers.

A simple procedure designed to achieve diploidization involves immersion of very young haploids in a filter-sterilized solution of colchicine (0.4%) for 2–4 days, followed by their transfer to the culture medium for further growth. In maize, the highest doubling rates are achievable by immersing 2–3-day-old seedlings in a colchicine solution as suggested by Gayen et al. (1994). Using an improved version of this method, Deimling et al. (1997) obtained doubling rates of up to 63%. The studies of Eder and Chalyk (2002) using genetically broader materials yielded an average doubling rate of 27%. Optimized methods for colchicine treatment of haploid seedlings yield average success rates of about 10% fertile diploid plants with satisfactory seed set (Mannschreck, 2004). In this procedure chromosome or gene instabilities are minimal compared to other methods of colchicine or chemical treatment.

4.2.3 Evaluation of DH lines

Randomness

Systems used to produce DH lines should show no preference for specific gametes, which means each gamete should have the same probability of developing into a haploid. Chromosome elimination using the bulbosum approach in barley is usually a random process and there is no significant segregation distortion associated with it. Park et al. (1976) and Choo et al. (1982) did not find any gamete preference associated with this approach by comparing the DH and single seed descent (SSD) populations. In rice, however, especially for the DH populations derived from anther culture of distant crosses, distorted segregations were found for isozymes, restriction fragment length polymorphisms (RFLPs) and morphological traits. As a result, the segregation of two types of homozygotes deviated from the 1:1 ratio for many single gene loci (Chen, Y. et al., 1997).

Stability

Theoretically, DH lines have two properties: complete homogeneity and homozygosity within lines. Except for the variation that might be produced during anther culture or other generation processes, DH lines should be genetically stable and the mutation rate that can occur in DH lines should be in the same range as that of other true-breeding cultivars.

There are some reports that identified somaclonal variation associated with anther culture-derived DH lines (Chen, Y. et al., 1997) and theories that account for the origin of somaclonal variation (Taji et al., 2002). The variation within a DH line can be divided into two categories: (i) variation originating from the genetic heterogeneity of somatic cells of the source (haploid) plant; (ii) variation due to structural alterations of DNA and chromosomes caused by tissue culture.

Somaclonal variation is an important cause of instability of DH lines and is not restricted to, but is particularly common in, plants regenerated from callus. The variations can be genotypic or phenotypic which
Populations in Genetics and Breeding 125

in the latter case can be either genetic or epigenetic in origin. Typical genetic
alterations are: changes in chromosome numbers (polyploidy and aneuploidy), chromosome
structure (translocations, deletions and duplications) and DNA sequence (base mutations).

GENETIC VARIATION ARISING FROM SOURCE PLANTS. The source plants used to initiate cultures
are likely to be heterogeneous with respect to the state of differentiation, ploidy level
and age. These explant-related factors will affect the genetic make-up of the cells
produced in the culture and thus the callus arising from such a group of cells with diverse
genetic make-up will inevitably lead to a mixed population of cells. Depending on the cell
types from which the plants originate, those regenerated from such a genetically mosaic
callus will undoubtedly be of different genetic make-up. Taji et al. (2002) indicated that
such genetic mosaicness seems to occur commonly in polyploid plants rather than in diploids
or haploids.

GENETIC VARIATION ARISING DURING CULTURE. Although a significant degree of genetic
variability can be traced to the genetically heterogeneous cell types of the explant, at
least in polyploid species, there is substantial evidence to indicate that much of the
variability observed in regenerated plants stems from the culture process itself.
Aneuploids, polyploids or cells with structurally altered chromosomes may arise in culture.
Many differentiated cells, when induced to divide in culture, undergo endoreduplication of
chromosomes resulting in the production of tetraploid or octaploid cells with distinct
phenotypes.
Various phenomena have been observed in tissue culture of various plant species which
explain the production of cells with unusual ploidy levels (Bhojwani, 1990). Occurrence of
multi-polar spindles due to failure of spindle formation during cell division is one of the
contributing factors. Absence of spindle formation during mitosis results in the appearance
of cells with doubled chromosome number while the formation of multi-polar spindles or
chromosomes lagging at anaphase cause the development of cell lines with haploid, triploid
or other uneven ploidy status.
Many studies have indicated that cryptic structural modification of individual chromosomes
is more likely to cause somaclonal variation than modification induced by ploidy changes in
many tissue-cultured plants. Chromosomal changes occurring during tissue culture include
transposition of mobile genetic elements (transposons), chromosome breakage and
repositioning of chromosome segments.
As summarized by Taji et al. (2002), several mechanisms have been proposed to explain the
genetic variability that occurs in tissue culture. The most likely causes are:
1. Reduced regulatory control of mitotic events in culture: the ploidy status of plants
generated from callus, cell suspension or protoplast cultures of certain species differs
significantly despite the fact that the cultures originate from a highly homogeneous
genetic background. This indicates a lack of tight regulation of cell-cycle-related
controls during cell proliferation in culture.
2. Use of growth regulators: plant growth regulators, particularly synthetic auxins such as
2,4-D, are considered to be the major cause of genetic variability in culture. For example,
cytokinins at low concentrations have been shown to reduce the range of ploidy in culture
while low levels of both auxins and cytokinins appear to preferentially activate the
division of cytologically stable meristematic cells enabling the regeneration of
genetically uniform plantlets.
3. Medium components: some of the mineral nutrients influence the establishment of genetic
variability in culture. For instance, by altering the levels of phosphate and nitrogen as
well as the form of nitrogen in the medium, the genetic composition (ploidy level) of the
cultured cells can be controlled to a considerable extent. A marked increase in chromosome
breakage has been observed in plant cell cultures grown with different levels of magnesium
or manganese.
126 Chapter 4

4. Culture conditions: some culture conditions, such as incubation temperatures above 35°C
and long duration of culture, have been implicated in inducing genetic variability in
regenerated plants.
5. Inherited genomic instability: molecular studies indicate the existence of certain
regions of the genome that are more susceptible to tissue-culture-induced structural
alterations, although the reason for the increased susceptibility of these genomic loci,
known as hot spots, is not fully understood.

CAUSES OF EPIGENETIC VARIATION IN TISSUE CULTURE. Any culture-induced changes which are
stable but not heritable have frequently been considered as epigenetic variation. However,
a greater understanding of genetic and epigenetic alterations in tissue culture in the
recent past has led to a clear distinction between these two types of variation. For
instance, genetic mutations occur randomly and at a much lower rate than epigenetic
variations. Genetic changes are usually stable and heritable. Epigenetic variation may also
lead to stable traits; however, reversal can occur at high rates under non-selective
conditions. Epigenetic traits are often transmitted through mitosis in a stable manner but
rarely through meiosis and the level of induction of epigenetic traits is directly related
to the selection pressure experienced by the cells. Epigenetic changes are generally
assumed to reflect alteration in expression rather than in the information content of
genes.
As Taji et al. (2002) summarized, the epigenetic variation observed in cultured cells or
regenerated plants is mainly due to three cellular events: (i) gene amplification; (ii) DNA
methylation; and (iii) increased activity of transposable elements. In plants, nearly 25%
of the genome can be methylated at cytosine residues but the significance of this cytosine
methylation is not apparent. It has been suggested that methylation (and demethylation) of
DNA is one of the ways of controlling transcriptional activity and that this process can be
affected by the tissue culture process. The non-heritable genetic variability observed in
many tissue culture systems could thus be attributed to tissue-culture-induced methylation
or demethylation of DNA. The activity of transposons and retrotransposons induced by tissue
culture could also be responsible for some of the genetic and epigenetic variability
observed in culture.

4.2.4 Quantitative genetics of DHs

DH lines that are derived randomly from an array of gametes produced by F1 plants are very
useful in quantitative genetics. Compared with diploid genetic models for populations such
as F2, F3 or BC, there are no dominance or dominance-related epistasis effects involved in
the genetic model of DH populations. As a result, additive, additive-related epistasis and
linkage effects can be investigated properly. As a permanent population, DH lines can be
replicated as many times as desired across different environments, seasons and
laboratories, providing endless genetic material for phenotyping and genotyping,
particularly for understanding the genotype-by-environment interaction. In DH populations,
the additive component of genetic variance is larger than that of diploid populations such
as F2 and BC. Choo et al. (1985) discussed in detail the quantitative genetics associated
with DH populations, including detection of epistasis, estimation of genetic variance
components, linkage test, estimation of gene numbers, genetic mapping of polygenes and
tests of genetic models and hypotheses. Röber et al. (2005) compared the expected gain from
selection for DH lines and other populations and the implications of epistatic effects,
which are briefly described here.

Expected gain from selection

As is well known from quantitative genetics (see e.g. Falconer and Mackay (1996) and also
Chapter 1), the expected gain from selection can be described by
$\Delta G = i\,h_x\,r_G\,\sigma_y$, where $i$ is the selection intensity, $h_x$ the
square root of the heritability of the selection

criterion, $r_G$ the genetic correlation between selection criterion and gain criterion and
$\sigma_y$ the standard deviation of the gain criterion. In long-term breeding programmes,
the decisive gain criterion for evaluating selection progress in hybrid breeding is the
general combining ability (GCA) of the improved lines. At the beginning of a breeding cycle
the test units are the DH lines per se and later on in the cycle their testcrosses.
Strong selection (large $i$) leads to a small effective population size and consequently to
a loss of genetic variance due to random drift. To keep this loss within certain limits, a
minimum number of lines should be recombined after each breeding cycle. This number depends
on the inbreeding coefficient (F) of the candidate lines. The number should be (2F) times
larger for inbred lines than for non-inbred genotypes. Assuming that S2 lines (F = 0.75)
are recombined in conventional breeding, the number of DH lines (F = 1) would have to be
increased 1:0.75 = 1.33-fold to preserve an equivalent level of genetic variation. This
means that the selection intensity must be reduced accordingly when using DH lines.
In contrast to the selection intensity, $h_x$ and $r_G$ increase when using DH lines. This
increase is particularly large in the first testcross stage. Neglecting epistasis, the GCA
variance of inbred lines is equal to $\frac{1}{2} F \sigma_A^2$ (Falconer and Mackay, 1996),
where $\sigma_A^2$ is the additive variance of the base population. Thus the GCA variance of
DH lines is 1:0.75 = 1.33 times larger than that of S2 lines. This leads to better
differentiation among the testcrosses and consequently to higher heritability. Seitz (2005)
compared three sets of S2 and S3 lines each with DH lines derived from the same crosses and
evaluated with the same testers in the same environments. On average, the estimated genetic
testcross variances for grain yield (bu. acre$^{-1}$) amounted to 50, 94 and 124 for S2, S3
and DH lines, respectively.
The genetic correlation between selection and gain criterion ($r_G$) also increases with
the degree to which the tested lines have been inbred. For example, the correlation between
$S_t$ lines and their homozygous progenies for GCA is equal to $\sqrt{F_t}$ whereas for DH
lines this correlation is 1. Thus compared with S2 lines, the correlation of DH lines is
$1:\sqrt{0.75} = 1.15$ times stronger.

Implications of epistatic effects

Epistatic gene action may positively or negatively affect hybrid performance (Lamkey and
Edwards, 1999). In most cases, epistatic effects have been reported to cause a decrease in
the testcross performance of segregating generations (Lamkey et al., 1995) or to penalize
three-way and double crosses compared to their non-parental single crosses (Sprague et al.,
1962; Melchinger et al., 1986). These effects are commonly referred to as recombinational
loss and may be explained by a disruption during meiosis of co-adapted gene arrangements
assorted by prior natural and artificial selection. Marker-based analyses of QTL partially
corroborate this hypothesis (Stuber, 1999). To avoid recombinational loss and still offer a
chance to select for new positive interactions, a balance between recombination and
fixation of gene arrangements is needed. The DH-line approach might offer the method for
achieving this goal as homozygosity can be reached in one cycle of recombination when F1 is
used for DH development or in different cycles when segregating populations of different
generations are used.

4.2.5 Applications of DH populations in genomics

In genetics, DHs may serve to recover recessives. Using DHs, linkage data can be obtained
directly by sampling gametes as monoploids. DHs are ideal for the study of mutation
frequencies and spectra. As DHs represent homozygous, immortal and true breeding lines,
they can be repeatedly phenotyped and genotyped so phenotypic and genotypic information can
be accumulated over years and across laboratories. In genomics, DHs are therefore ideal for
studying complex traits that are quantitatively inherited, which may require replicated
trials over many years and locations for accurate phenotyping.

DH populations are desirable genetic materials for genetic mapping including the
construction of genetic linkage maps and gene tagging using genetic markers. They can be
produced relatively rapidly, requiring 1–1.5 years to become established after the initial
cross and they provide an ongoing population that can be used indefinitely for mapping. QTL
analysis is facilitated by using DH mapping populations and the homozygosity of DHs enables
accurate phenotyping by replicate trials at multiple sites (Forster and Thomas, 2004). In
addition, in DH populations, dominant markers are as efficient as co-dominant markers
because linkage statistics are estimated with equal efficiency (Knapp et al., 1995). DHs
can also be used to increase the expression level of a transgene (Beaujean et al., 1998).
Only the application of DHs for construction of genetic linkage maps will be discussed
here. Assuming that the two parental lines used for production of DH populations have the
genotypes P1 (AABB) and P2 (aabb), their F1 will produce four types of gametes, AB, Ab, aB
and ab. As a result of single-sex production, these gametes produce four types of haploids
and by chromosome doubling will produce four types of DHs: AABB, AAbb, aaBB and aabb. When
A-a and B-b are independent (not linked), the four types of DH lines are present in
identical proportions: 25%. The segregation of two loci in a DH population is shown in
Fig. 4.1, among which AB and ab are parental gametes and Ab and aB are recombinant gametes
while AABB and aabb are genotypes for parental lines and AAbb and aaBB are genotypes for
recombinants. It is expected that for each molecular marker there are two parental
genotypes in DH populations and in any DH line only one of the parental bands revealed by
markers will show up.
A general step before map construction and gene mapping is to evaluate the DH population.
In rice, 66 DH lines were derived from the F1 between indica Apura and upland japonica
Irat 177 by anther culture. Heterozygosity was found for some loci with two parental bands
while non-parental alleles (or new alleles) were found for other loci. The limitations of
using this DH population in genetic linkage mapping do not result from the partial
heterozygosity or new alleles but from the low RFLP polymorphism identified between the
parents (S.R. McCouch, Cornell University, personal communication). Only 40% of the 100
tested RFLP markers detected polymorphism. Of the markers that had been mapped on to an F2
population, IR34583/Bulu, only 55% were polymorphic between Apura and Irat 177. However, a
relatively saturated molecular map can be established if other types of molecular markers
such as simple sequence repeats (SSRs) or single nucleotide polymorphisms (SNPs) are used.
In barley, a DH population consisting of 113 lines was derived by anther culture from the
F1 hybrid between two spring barley cultivars, Proctor and Nudinka (Heun et al.,

Parents           P1 (AB/AB)  x  P2 (ab/ab)
F1                AB/ab

Gamete            AB         Ab      aB      ab
Haploid           AB         Ab      aB      ab
Doubled haploid   AB/AB      Ab/Ab   aB/aB   ab/ab

Ratio
  Independent     25%        25%     25%     25%
  Linkage         (1 - r)/2  r/2     r/2     (1 - r)/2

Fig. 4.1. Segregation of two genetic loci in a DH population.
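The ratios in Fig. 4.1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming a coupling-phase F1 (AB/ab) as in the figure and no segregation distortion; the function names and the example counts are hypothetical and serve only to illustrate the calculation. Because each DH line represents one sampled gamete, the recombination frequency is estimated here simply as the proportion of recombinant lines.

```python
def dh_genotype_frequencies(r: float) -> dict:
    """Expected frequencies of the four DH genotypes from an AB/ab F1.

    r is the recombination frequency between the two loci (0 <= r <= 0.5).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie between 0 and 0.5")
    parental = (1.0 - r) / 2.0
    recombinant = r / 2.0
    return {"AABB": parental, "AAbb": recombinant,
            "aaBB": recombinant, "aabb": parental}


def estimate_recombination(counts: dict) -> float:
    """Proportion of recombinant DH lines (AAbb and aaBB) among all lines."""
    total = sum(counts.values())
    return (counts["AAbb"] + counts["aaBB"]) / total


# Independent loci (r = 0.5) give 25% for each genotype, as in Fig. 4.1.
print(dh_genotype_frequencies(0.5))

# Hypothetical counts for 100 DH lines, used only to show the estimator.
example_counts = {"AABB": 42, "AAbb": 9, "aaBB": 11, "aabb": 38}
print(estimate_recombination(example_counts))
```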



1991). A genetic map was constructed using 55 RFLP markers and two known genes and is the
first complete molecular map to be constructed using DH populations in crops. Since then,
many DH populations have been developed using the different approaches described above and
have been used for map construction and genetic mapping.

4.2.6 Application of DHs in plant breeding

The benefits of DHs in plant breeding have been widely reviewed; readers should refer to
Forster and Thomas (2004), Forster et al. (2007), and the five volumes on In Vitro Haploid
Production in Higher Plants edited by Jain et al. (1996–1997).
Application of DHs in plant breeding can be described by comparison of the time required to
obtain fixed inbreds relative to inbreeding, starting from a heterozygote:

           Selfing of a heterozygote    Haploids of a heterozygote
Gametes:   1/2 A + 1/2 a                1/2 A + 1/2 a
F2         1/4 AA, 1/2 Aa, 1/4 aa       chromosome doubling
F3         1/4 Aa                       1/2 AA + 1/2 aa
F4         1/8 Aa
F5         1/16 Aa
F6         1/32 Aa
           1/2 AA + 1/2 aa

Evidently, the DH approach saves three to four generations compared to inbreeding-based
breeding. The DH approach features many logistical advantages simplifying breeding to a
large extent and enabling evaluation of genetically fixed hybrid components from the very
beginning of the selection process. Depending on the material, the costs and the breeding
scheme adopted, the DH approach can reduce the time for development and commercialization
of new inbred lines and lead to a higher expected genetic gain per unit of time.
As outlined above, DH lines extracted from a heterozygote or a segregating population
represent immortalized, reproducible gametes that can be immediately evaluated to assess
their true breeding potential for target traits. They have the following advantages and
clear beneficial applications (Melchinger et al., 2005; Röber et al., 2005; Longin et al.,
2006; W. Schipprack, University of Hohenheim, personal communication):
• providing the quickest possible route to complete homozygosity;
• giving an immediate product of stable recombinants from species crosses;
• no masking effects because of the high homogeneity attained in the first generation of DH
  populations;
• increased performance per se due to selection pressure in the haploid phase and/or during
  the first generation of DHs;
• complete genetic variance accessible from the very beginning of the selection process;
• easy integration of line/hybrid development with recurrent selection;
• reduced efforts in the nursery after the first multiplication of DH lines compared to a
  conventional breeding nursery;
• maximum genetic variance in line per se and testcross trials;
• high reproducibility of early-selection results;
• high efficiency in stacking specific targeted genes in homozygous lines; and
• simplified logistics for seed exchange between main and off-season programmes since each
  line is fixed and can be represented by a single plant.
DHs have been used in plant breeding programmes to produce homozygous genotypes in a number
of important species, e.g. tobacco (Nicotiana tabacum L.), wheat, barley, canola (Brassica
napus L.), rice and maize (Maluszynski et al., 2003), but only rarely in triticale, oat,
rye and others. Research in crops such as rice, wheat and maize has shown that significant
progress in haploid technology is attainable given an intensive research effort.
Well-established methods in these crops have allowed major parts or whole breeding
programmes to be based on DH production. Oat, triticale, wild barley, potato and cabbage
are examples of crops where DH technologies are less advanced but in which hundreds of

DHs may still be obtained (Tuvesson et al., 2007). In other crops, including some vegetable
species and forage and turf grass species, DH methods are being developed, but applications
in crop improvement are rare. The DH approach has yet to be exploited in leguminous
species, predominantly due to their cultivation in developing countries and consequent
paucity of research funding. Difficulties have also been posed by the small anther size and
relatively low number of microspores per anther in legume crops (Croser et al., 2006).
The DH technique offers an efficient tool for extracting individual gametes from
heterozygous materials and transforming them into homozygous lines that can be reproduced
ad libitum by selfing. DHs extracted from a heterogeneous population, e.g. landraces,
represent immortalized, reproducible gametes that can be immediately evaluated to assess
their true breeding potential for target traits. They can also serve as source material for
breeding programmes of hybrids and synthetics. Furthermore, DH lines may be used for
long-term conservation of heterogeneous germplasm resources such as landraces without the
risk of genetic drift and other changes in gene frequencies, as well as for in-depth
characterization of the breeding potential of each heterogeneous germplasm collection
because each of the extracted DH lines can be evaluated in replicated trials in diverse
environments.
With some DH methods, only a tiny fraction of the haploid seedlings will germinate and
survive to the adult stage due to the uncovered genetic load and the stress in plant
development exerted by colchicine treatment for chromosome doubling. Nevertheless, because
the DH technique is rather simple, it is feasible to generate and identify large numbers of
haploid seeds, treat them with colchicine and transplant them to the field. Hence, by
starting with a sufficiently large number of haploid seeds it is possible to generate
hundreds of viable DH lines with acceptable agronomic performance.
DHs are particularly important in the evaluation of diversity, because they fix rare
alleles and aid the efficient selection for quantitative traits in breeding. In outcrossing
species, DHs enable undesirable recessive genes to be eliminated from lines at any breeding
stage (Forster and Thomas, 2004).
Development of DHs through anther culture has been very successful with many cultivars
released in barley breeding worldwide and in rice breeding in China since the 1970s. The
production of DHs has become the preferred tool in many advanced plant breeding institutes
and commercial companies for breeding many crop species. Due to the obvious advantages of
DH lines and the enhancements made in in vivo haploid induction in recent years, many
commercial breeding companies such as AgReliant, Monsanto and Pioneer are presently
adopting or are already routinely using this technology in their maize breeding programmes
(Seitz, 2005). Recurrent selection for testcross performance using DHs has reduced the
cycle length and improved genetic advance (Gallais and Bordes, 2007). In some companies in
vivo haploid induction has more or less replaced conventional line development with up to
15,000 DH lines per year per breeding programme and over 100,000 DH lines per year across
all programmes at costs of US$10 or less per DH line. The first maize hybrids produced
using DH lines have been commercialized in the USA and Europe (W. Schipprack, University of
Hohenheim, personal communication). However, the development of new, more efficient and
cheaper large-scale production protocols has meant that DHs have also recently been applied
in less advanced breeding programmes.

4.2.7 Limitations and future prospects

Genetics and breeding in DHs have not given the desired and expected dividends, despite the
substantial investments made in haploid research since the late 1980s. Some of the widely
recognized limitations of DH breeding are as follows: (i) haploids cannot be obtained in
the high frequency

required for selection in most important crop species; (ii) the cost–benefit ratio in DH
breeding is often not favourable, thus discouraging its use despite the obvious advantages;
(iii) haploids and DHs will express recessive deleterious traits and deleterious mutations
may arise during the DH development process including anther culture, particularly for
open-pollinated species; (iv) different ploidy levels may be present so that haploid status
may need to be confirmed cytologically; alternatively, pollen culture may be necessary,
which is expensive, has a relatively low success rate and is also genotype-dependent in
many species; (v) doubled haploidy may also decrease genetic diversity, which is better
maintained in heterozygous lines; (vi) the success of the DH method is highly genotype
dependent, so it is not yet suitable for all breeding programmes; (vii) some techniques,
e.g. inducers in maize (especially the good ones), are proprietary and not available to all
interested breeders; and (viii) there are health and legal concerns related to handling the
chemical doubling agents.
The Third International Conference on Haploids in Higher Plants (12–15 February 2006,
Vienna, Austria) highlighted the following issues that are important to future studies on
DHs: (i) new methods of haploid and DH plant formation; (ii) mechanism of initiation of
haploids; (iii) application of haploid cells, gametes, haploid and DH plants in fundamental
and applied science; (iv) genes controlling haploid formation from female and male gametes;
and (v) methods of diploidization of haploids.

4.3 Recombinant Inbred Lines (RILs)

Recombinant inbred lines or random inbred lines (RILs) are usually a part of the ultimate
products of many breeding programmes and are also used as genetic materials. They can be
produced by various inbreeding procedures. To help understand the whole process of
development and applications of RILs, the inbreeding procedure and its effects will be
discussed first.

4.3.1 Inbreeding and its genetic effects

RILs result from continuous inbreeding such as selfing or sibmating starting from an F2
population until homozygosity is reached. There are two genetic responses to inbreeding,
gene recombination and genotype homogenization. Starting from a heterozygote at a locus
A-a, for example, selfing will produce three genotypes, AA, Aa and aa. With continuous
selfing, the two homozygotes, AA and aa, will not segregate, while the heterozygote Aa will
continue to segregate, producing the three genotypes. However, the proportion of
heterozygotes in the population will decrease with continuous selfing and will approach
zero. This process can be described as below.
Consider one locus with two alleles, A and a, undergoing continuous selfing. With each
generation of selfing the proportion of heterozygotes is halved and the proportion of
homozygotes increases by the same amount. At generation $t$, the proportion of heterozygotes
in the population will be $(1/2)^t$, while the proportion of homozygotes will be
$1 - (1/2)^t$; the homozygotes AA and aa each account for
$[1 - (1/2)^t]/2 = (2^t - 1)/2^{t+1}$ (Table 4.1).
When two or more loci, for example $k$ loci, are involved, successive selfing from F1
hybrids will produce a proportion $(1/2)^{tk}$ of individuals heterozygous at all $k$ loci
and $[1 - (1/2)^t]^k = [(2^t - 1)/2^t]^k$ of individuals homozygous at all $k$ loci at
generation $t$. The more loci are involved, the longer it takes to reach homozygosity
(Fig. 4.2). In the seventh generation of selfing starting from a heterozygous hybrid, for
example, the proportion of homozygotes will be 99% for the population with one heterozygous
locus involved, 96% for the population with five heterozygous loci involved, 89% for 15
loci, 79% for 30 loci and 46% for 100 loci.
If heterozygous loci are linked, successive inbreeding can still produce a homozygous
population. However, the rate of approach to homozygosity depends on the recombination
frequencies between the linked loci. The lower the recombination frequency, the higher the
proportion of homozygotes in the population and the more rapidly the population becomes
homogenized. If the recombination frequency, r,

Table 4.1. Genotypes derived from a single-locus heterozygote and their frequencies in selfing generations.

             Genotype
                                                   Frequency of     Frequency of
Generation   AA          Aa         aa             heterozygotes    homozygotes (%)

0            -           1          -              1                0
1            1/4         2/4        1/4            1/2              50.0
2            3/8         2/8        3/8            1/4              75.0
3            7/16        2/16       7/16           1/8              87.5
4            15/32       2/32       15/32          1/16             93.8
5            31/64       2/64       31/64          1/32             96.9
10           1023/2048   2/2048     1023/2048      1/1024           99.9

General term for generation $t$: AA = $(2^t - 1)/2^{t+1}$; Aa = $2/2^{t+1} = 1/2^t$;
aa = $(2^t - 1)/2^{t+1}$; heterozygotes = $1/2^t$; homozygotes = $1 - 1/2^t$.
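The rows of Table 4.1 follow directly from halving the heterozygote frequency in each selfing generation. The short Python sketch below regenerates them with exact fractions; the function name is an illustrative assumption, and generation 0 is taken to be the fully heterozygous F1, as in the table.

```python
from fractions import Fraction


def selfing_frequencies(t: int):
    """Return the frequencies (AA, Aa, aa) after t generations of selfing,
    starting from a population consisting only of Aa heterozygotes."""
    het = Fraction(1, 2 ** t)        # heterozygotes are halved each generation
    hom_each = (1 - het) / 2         # the remainder is split equally
    return hom_each, het, hom_each


for t in (0, 1, 2, 3, 4, 5, 10):
    aa_dom, het, aa_rec = selfing_frequencies(t)
    homozygotes_pct = float(aa_dom + aa_rec) * 100
    print(f"{t:>2}  AA={aa_dom}  Aa={het}  aa={aa_rec}  "
          f"homozygotes={homozygotes_pct:.1f}%")
```

Fractions are printed in lowest terms, so the heterozygote frequency at generation 10 appears as 1/1024, which is the same value as the 2/2048 shown in the Aa column.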

[Figure: percentage of homozygotes (y-axis, 0 to 100%) plotted against generations of
selfing (x-axis, 1 to 12), with one curve each for 1, 5, 10, 20, 40 and 100 loci.]

Fig. 4.2. Effects of generations and genetic loci on the proportion of homozygotes in self-pollinated
populations (numbers of loci are 1, 5, 10, 20, 40, 100).
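The curves in Fig. 4.2 can be regenerated from the multi-locus expression for unlinked loci, in which the proportion of individuals homozygous at all k loci after t generations of selfing is [1 - (1/2)^t]^k. The following Python sketch evaluates it; the function names and the 99% threshold in the second function are illustrative assumptions, not values taken from the text.

```python
def proportion_fully_homozygous(t: int, k: int) -> float:
    """Proportion of individuals homozygous at all k unlinked loci after
    t generations of selfing from an F1 heterozygous at every locus."""
    return (1.0 - 0.5 ** t) ** k


def generations_to_reach(k: int, target: float = 0.99) -> int:
    """Smallest number of selfing generations at which the proportion of
    fully homozygous individuals reaches the target (hypothetical threshold)."""
    t = 1
    while proportion_fully_homozygous(t, k) < target:
        t += 1
    return t


# Seventh generation of selfing, for the locus numbers quoted in the text.
for k in (1, 5, 15, 30, 100):
    pct = 100 * proportion_fully_homozygous(7, k)
    print(f"{k:>3} loci: {pct:.0f}% homozygous")

# Locus numbers used for the curves in Fig. 4.2.
for k in (1, 5, 10, 20, 40, 100):
    print(f"{k:>3} loci: {generations_to_reach(k)} generations to reach 99%")
```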

is close to zero or the two loci are completely linked, the rate of homogenization will be
close to or equal to the rate for the population with one heterozygous locus. If r is about
50%, the rate of homogenization will be about the same as that for the population with two
heterozygous loci. It can be estimated that for two linked loci and after one generation of
selfing, the proportion of homozygotes will be 41% for r = 10%, 34% for r = 20%, 26% for
r = 40% and 25.25% for r = 45%.
Continuous inbreeding (e.g. selfing) results in the fixation of segregation so that the
genetic combinations of two parental genomes represented in individual F2 plants are each
represented by an RIL (Fig. 4.3). The genetic combinations of two parental genomes are
fixed in a group of RILs.
For quantitative traits that are controlled by polygenes or multiple QTL, the mean value of
the population will return to the average value of the parental lines because dominance and
dominance-related epistasis will dissipate with increasing homogenization. The variance
will also change with increasing homogenization but the direction of change will depend

on the effect of related genes and their interactions. Figure 4.4 shows the changes in mean
value and variance in RIL populations derived by SSD under different genetic models.

[Figure: P1 x P2 -> F1 -> F2 -> (successive selfing) -> RIL]

Fig. 4.3. Production of RILs by successive selfings. Two parental lines, P1 and P2, are
crossed to produce F1. The F1 is then selfed to produce F2. The selfing process continues
until a certain level of homozygosity is reached. The end product consists of a set of
RILs, each of which is a fixed recombinant of the parental lines.

In animals, RILs are usually stable for up to 20 generations of sibmating. Such long-term
continuous sibmating results in such a low viability that it is very hard to maintain the
population. The mouse was the first animal used for genetic mapping with RILs and its RIL
population is relatively small. However, the problems associated with small population size
can be ameliorated by using combined information from multiple sets of RILs. In plant
species by contrast, it takes about half of the time required in animals to obtain stable
RILs through inbreeding. Also, maintaining RIL populations in plants can be achieved at
much lower costs, making it possible to manipulate large-sized populations. For some plant
species, such as tobacco and Brassica, however, self-incompatibility prohibits the
production of RILs through inbreeding.

4.3.2 Development of RILs

RILs are the products of successive inbreeding. Based on reproduction systems and the
degree of inbreeding, there are several types of procedures for developing RILs.
Full-sib mating: for outbreeding organisms, the most severe inbreeding is full-sib mating,
i.e. mating between the offspring of the same parents. Because outbreeding organisms are
highly heterozygous, they have to be inbred for several generations to approach
homozygosity. The inbreeding parents can then be used to produce progenies that will be
intermated to produce the next generation of progenies. This process will continue until
the progenies are highly homozygous.
Selfing: for self-pollinated plants, cultivars are genetically homozygous so they can be
used to produce hybrids directly, followed by successive selfing. There are two different
procedures for the management of the progenies, bulking and SSD. In the bulking method,
hybrids are bulk planted and harvested until F5 to F8 before they are planted by families.

Single seed descent (SSD) method

The SSD method was proposed by the Canadian scientist Goulden in 1941. Starting from F2,
one or several seeds are harvested from each plant and planted to produce the next
generation until F5 to F8. When most plants are homozygous, all the seeds from each plant
are harvested to produce RILs. Plant breeders use three procedures to implement the concept
of SSD (Fehr, 1987).

SINGLE-SEED PROCEDURE. When the single-seed procedure is used, the size of the populations
will decrease in each generation
134 Chapter 4

[Figure 4.4: two line plots. Panel A shows the mean (parents P1 and P2, then F1 to F6) and panel B the variance (F2 to F6), each plotted against generation, with one curve for each of the genetic models I to IV.]

Fig. 4.4. Change of mean (A) and variance (B) in RIL populations derived by SSD. (I) Additive increasing alleles are completely dominant. (II) Additive without dominance effect. (III) Additive increasing alleles are completely dominant with complementary interaction. (IV) Additive increasing alleles are completely dominant with duplicate interaction.

because of lack of seed germination or failure of plant establishment to produce seed. It is necessary to decide on the number of inbred plants that are desired in the last generation and begin with an appropriate population size in the F2 generation. The single-seed procedure ensures that each individual in the final population traces to a different F2 individual. However, the procedure cannot ensure that a particular F2 will be represented in the final population because failure of any seed to germinate or generate a productive plant automatically eliminates that seed's F2 family.

SINGLE-HILL PROCEDURE. The single-hill procedure can be used to ensure that each F2 plant will have progeny represented in each generation of inbreeding. Progeny from individual plants are maintained as separate lines during each generation of inbreeding by planting a few seeds in a hill or row, harvesting self-pollinated seeds from the hill and planting them in another hill the following generation. An individual plant is harvested from each line when the population has reached the desired level of homozygosity.

With the single-hill procedure the identity of each F2 plant and its progeny can be maintained during self-pollination. When the identity of an F2 is maintained, the seed packet and hill must be properly identified with a line designation for planting and harvest.

MULTIPLE-SEED PROCEDURE. Use of the single-seed procedure requires that the size of the populations in F2 be larger than in later generations, due to lack of seed germination or plant establishment for seed set. Usually, two samples are harvested, one for planting in the next generation and one for a reserve. Researchers sometimes bulk two or three seeds from each plant during harvest. Part of the sample is planted and the remainder is reserved. The procedure is referred to as modified SSD. The number of seeds planted
Populations in Genetics and Breeding 135

and harvested each season depends on the number of lines desired from the population and the anticipated percentages of seed germination, seedling establishment and seed set.

Advantages and disadvantages of SSD procedures

Fehr (1987) summarized the merits of the SSD procedures and indicated the following advantages:

• They are an easy way to maintain populations during inbreeding.
• Natural selection does not influence the population unless genotypes differ in their ability to produce at least one viable seed each generation.
• The procedures are well suited to greenhouse and off-season nurseries where the performance of genotypes may not be representative of their performance in the area in which they are normally grown.

The disadvantages are: (i) artificial selection is based on the phenotype of individual plants, not on progeny performance, when SSD is used for cultivar development rather than genetic population development; and (ii) natural selection cannot influence the populations in a positive manner unless undesirable genotypes do not germinate or set any seed.

4.3.3 Map distance and recombinant fraction in RIL populations

Theoretically and in practice, no matter how many cycles of inbreeding are completed, some degree of heterozygosity will always exist in the RIL population. From the above discussion, we can estimate the remaining heterozygosity for each generation of inbreeding. In genetic mapping, nearly completely homozygous RILs are used. RILs have undergone several cycles of meiosis before fixation, which differs from F2 or BC populations where only one cycle of meiosis occurs. As a result, linked genes have more opportunities to recombine in RIL populations. This property was discovered by Haldane and Waddington (1931) by studying inbreeding populations. For tightly linked loci, the number of recombinants observed in RILs is twice that observed in the populations with only one cycle of meiosis. At the beginning stage of genetic mapping, this multiple recombination in RILs makes it difficult to detect linkage. Once linkage relationships are roughly established among loci, the greater frequency of recombination makes it easy to detect non-allelism among loci. It also makes the estimation of genetic distances more accurate because the confidence interval for an estimated genetic distance is a function of recombination frequency. With the increased number of meiotic events, there are more opportunities to find recombinants between two tightly linked loci (Fig. 4.5).

[Figure 4.5: recombinant frequency r (%) plotted against map distance R (cM), both axes running from 0 to 50; the solid curve is labelled r = 2R/(1 + 2R).]

Fig. 4.5. Relationship between map distance (R) and recombinant frequency (r) for RIL populations derived by continuous selfing (solid line) and for populations that have undergone only one cycle of meiosis, e.g. F2 or BC (dashed line).

In populations that have undergone only one cycle of meiosis, recombinant frequency r (%) is linearly related to map distance R (cM), as indicated by the dashed line in Fig. 4.5. In RIL populations derived from selfing, r is almost equal to 2R when the map distance is small, which is indicated by the solid line and formula R = r/(2 - 2r) (Fig. 4.5). For the RIL populations derived

from sibmating, the skew becomes more significant with r nearly equal to 4R when the map distance is small.

4.3.4 Construction of genetic maps using RILs

As each RIL is inbred, like a DH line, and thus can be propagated indefinitely, a panel of RILs has a number of advantages for genomic studies: (i) each line needs to be genotyped only once; (ii) multiple individuals can be phenotyped from each line to reduce individual, environmental and measurement variability; (iii) multiple invasive (destructive) phenotypes can be obtained on the same set of genomes; and (iv) as recombinations are more frequent in RILs than in populations with only one meiosis, greater resolution can be achieved in genetic mapping.

In genetic mapping with RIL populations, recombinant frequency should be converted into map distance using the formula R = r/(2 - 2r) proposed by Haldane and Waddington (1931). There are no mapping functions available for RIL populations to adjust for double crossover events as there are for populations with one cycle of meiosis as discussed in Chapter 2. When the map distance is within the range that allows confidence about linkage detection, recombinant frequency has a linear relationship with map distance (Fig. 4.5; Silver, 1985).

Non-linked loci may appear to be linked simply due to chance. These false linkages can often be identified by checking whether a linkage detected with one marker is also judged to be linked by other markers in the same linkage group and whether the suspected linkage found in one population can also be detected in other RIL populations. Mouse geneticists discussed the case when a linkage could not be certain because of small population sizes, and Silver (1985) provided a table for the 95 and 99% confidence intervals for estimated map distances based on RILs derived from sibmating. At low rates of recombination, these intervals are relatively small when compared with those obtained from the binomial distribution for F2 and BC1 populations of comparable population size. According to Taylor (1978), RILs derived from sibmating were more powerful in the estimation of map distances than populations undergoing single meiosis when R < 12.5 cM. Based on Taylor's method, it can be inferred that RILs derived from self-pollination are more powerful in the estimation of map distance when R < 23 cM.

Because of the advantages of RILs, they have been receiving great attention in genomics research. Numerous RIL populations have been developed in plant species, especially in maize and rice. Burr et al. (1988) reported RFLP maps constructed using two maize RIL populations, T232 × CM37 and CO159 × Tx303. Among 334 mapped genetic loci, 220 were polymorphic in both populations. By comparing the map distances obtained from these two populations with each other and with publicly accepted map distances, they found that the differences could be twice as large in some cases. Although these differences were still within the range of confidence intervals, they might be due to the genetic difference in recombinant frequencies at specific chromosomal regions. In maize there is no significant polymorphism caused by chromosome rearrangement, except for chromosome 10. Therefore, it is not surprising that there was no significant difference in map distance between the two maize RIL populations. Table 4.2 provides some examples of RIL populations developed in maize (Burr et al., 1988) and in rice (Xu, Y., 2002) that have been widely used for linkage mapping and gene tagging.

4.3.5 Intermated RILs and nested RIL populations

Intermated RILs

The production of RILs allows for the accumulation of recombination breakpoints during the inbreeding phase. However, the accumulation in RILs is limited by the fact that each generation of inbreeding makes the recombining chromosomes more similar to one another so that meiosis ceases to

Table 4.2. Some examples of RIL populations developed in maize (Burr et al., 1988) and rice (Xu, 2002).

Species  Population                 Population size  No. markers(a)
Maize    T232 × CM37                 48
         CO159 × Tx303              160
         Mo17 × B73                  44
         PA326 × ND300               74
         CK52 × A671                162
         CG16 × A671                172
         Ch593-9 × CH606-11         101
         CO220 × N28                173
Rice     9024 × LH422               194              141
         CO39 × Moroberekan         281              127
         Lemont × Teqing            315              217
         IR58821 × IR52561          166              399
         IR74 × Jalmagna            165              144
         Zhenshan 97 × Minghui 63   238              171
         Asominori × IR24            65              289
         Acc8558 × H359             131              225
         IR1552 × Azucena           150              207
         IR74 × FR13A                74              202
         IR20 × IR55178-3B-9-3       84              217

(a) Numbers of markers are shown for the first generation of genetic maps; more markers have been added to many of these maps since then.

generate new recombinant haplotypes. As an alternative to RILs, Darvasi and Soller (1995) proposed randomly mating the F2 progeny of a cross between inbred founders and using successive generations of random mating to promote the accumulation of recombination breakpoints in the resulting advanced intercross lines or intermated recombinant inbred lines (IRILs). The IRIL design has obvious appeal in its union of the advantages of both advanced intercrosses and RILs and has been employed in the production of mapping populations in several species. As a result, IRILs have become increasingly popular for QTL mapping.

Breeding designs for IRILs have been investigated by Rockman and Kruglyak (2008). Their results indicated that the simplest design, random pair mating with each pair contributing exactly two offspring to the next generation, performed as well as the most extreme inbreeding avoidance schemes in expanding the genetic map, increasing fine-mapping resolution and controlling genetic drift. Circular mating designs offered negligible advantages for controlling drift and gave greatly reduced map expansion. Random-mating designs with variance in offspring number are also poor at increasing mapping resolution. It is suggested that the most effective designs for IRIL construction are inbreeding avoidance and random mating with equal contributions from each parent to the next generation.

Multi-way or nested RIL populations

Using two or more RIL populations in genetic mapping provides several advantages: (i) polymorphisms that are not detected in one population may be detected in another; (ii) weak linkage identified in one population can be confirmed or excluded by using other populations; (iii) multiple populations with shared genetic data can be combined and considered as a single population to provide more reliable results; and (iv) multiple populations provide a wide spectrum of target loci across the genome since for quantitative traits the related loci with genetic differences between the parents are almost always different from population to population.
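Confirming or combining linkage estimates across RIL populations depends on converting recombinant frequencies observed in RILs into map distances. The sketch below is a minimal illustration of the selfing conversion quoted in Sections 4.3.3 and 4.3.4, R = r/(2 - 2r), and its inverse r = 2R/(1 + 2R). The function names are hypothetical, both quantities are treated as fractions rather than as percentages or cM, and the sib-mating form R = r/(4 - 6r) is assumed from the standard Haldane and Waddington result; the text itself only states its small-distance limit, r nearly equal to 4R.

```python
def ril_selfing_r(R):
    """Recombinant fraction r expected in selfed RILs, given the
    single-meiosis recombination fraction R (0 <= R <= 0.5)."""
    return 2.0 * R / (1.0 + 2.0 * R)


def ril_selfing_R(r):
    """Inverse conversion quoted in the text: R = r / (2 - 2r)."""
    return r / (2.0 - 2.0 * r)


def ril_sibmating_r(R):
    """Sib-mating counterpart (assumed form, not given in the text)."""
    return 4.0 * R / (1.0 + 6.0 * R)


def ril_sibmating_R(r):
    """Inverse of the assumed sib-mating form: R = r / (4 - 6r)."""
    return r / (4.0 - 6.0 * r)


if __name__ == "__main__":
    # Compare the two inbreeding schemes over a range of distances.
    for R in (0.01, 0.05, 0.10, 0.25, 0.50):
        print(f"R={R:.2f}  selfing r={ril_selfing_r(R):.4f}  "
              f"sib-mating r={ril_sibmating_r(R):.4f}")
```

For small R the printed values approach 2R and 4R, and at R = 0.5 both functions return 0.5, as expected for unlinked loci.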

The Complex Trait Consortium for mouse proposed the development of a large panel of eight-way RILs (Complex Trait Consortium, 2004). An eight-way RIL, also known as Collaborative Cross, is formed by intermating eight parental inbred strains followed by repeated sibling mating to produce a new set of inbred lines whose genome is a mosaic of the eight parental strains (Broman, 2005). Such a panel of RILs would serve as a valuable resource for mapping the loci that contribute to complex phenotypes in mouse and would support studies that incorporate multiple genetic, environmental and developmental variables into comprehensive statistical models of complex traits (Complex Trait Consortium, 2004). The genomes of eight founder strains are rapidly combined and are then inbred to produce finished RIL strains. Eight-way RIL strains achieve 99% inbreeding by generation 23. Each strain captures 135 unique recombinant events. With genetic contributions from multiple parental strains including several wild derivatives, the eight-way RILs will capture an abundance of genetic diversity and will retain segregating polymorphisms every 100–200 bp. This level of genetic diversity will be sufficient to drive phenotypic diversity in almost any trait of interest. An estimated 1000 strains will be required to guarantee high mapping resolution and detect extended networks of epistatic and gene–environment interactions. This estimate is based on the statistical power necessary to detect biologically relevant correlations among thousands of measured traits. A set of 1000 strains containing 135,000 recombinant events is a far more powerful and flexible research tool than a set of 100 strains containing the same total number of recombinant events. Mounting evidence suggests that gene–gene interactions (epistasis) are crucial in many complex disease aetiologies. A set of 1000 strains will readily support simultaneous mapping of many two-way and three-way epistatic interactions.

Similar resources have been developed in maize, Arabidopsis and Drosophila. To map much of the segregating variation in maize, 25 RIL mapping populations were created. Twenty-five diverse lines were selected to capture 80% of the nucleotide polymorphism in maize. In order to provide a uniform evaluation background, each line was crossed to a common parent, B73 (the standard US inbred), to form 25 RIL populations. Each of these RIL populations has at least 200 RILs, each descended from a unique F2 plant, resulting in a total of 5000 RILs. Using SSD and low density planting, 88% success in advancing lines per generation was achieved. This has developed as an integrated mapping approach, called Nested Association Mapping (NAM), which exploits simultaneously the advantages of linkage analysis and association (or linkage disequilibrium, LD) mapping as discussed in Chapter 6. The power of NAM for genome-wide QTL mapping has been demonstrated by computer simulation with varied numbers of QTL and trait heritabilities (Yu et al., 2008). With a dense coverage (2.6 cM) of common-parent-specific (CPS) markers, the genome information for 5000 RILs can be inferred based on the parental genome information. Essentially, the linkage information captured by the CPS markers and the LD information among loci residing between CPS markers was then projected to the RILs based on parental information, ultimately allowing for genome-wide high-resolution mapping. The power of NAM with 5000 RILs allowed 30–79% of the simulated QTL to be precisely identified. In the ongoing genome sequencing projects, NAM would greatly facilitate complex trait dissection in many species in which a similar strategy can be readily applied.

4.4 Near-isogenic Lines (NILs)

Near-isogenic lines (NILs) derived from inbreeding are in most cases the product of successive backcrossing. The methods for obtaining NILs, including the genetic effects of backcrossing, will be described first, followed by their applications in genomics and plant breeding.
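The backcrossing arithmetic developed in Section 4.4.1 can be previewed numerically. The sketch below is a minimal illustration with hypothetical function names; it simply evaluates three expressions as they are stated in the text: the recurrent-parent proportion 1 - (1/2)^t after t backcrosses, the fraction of progeny homozygous for RP alleles at k unlinked loci, [1 - (1/2)^t]^k, and the probability 1 - (1 - r)^t that, without selection, a DP allele at recombination fraction r from the target locus has been replaced within t backcrosses.

```python
def rp_proportion(t):
    """Proportion of recurrent-parent alleles after t backcrosses,
    using the text's expression 1 - (1/2)^t."""
    return 1.0 - 0.5 ** t


def rp_homozygous_fraction(t, k):
    """Fraction of progeny homozygous for RP alleles at k unlinked loci."""
    return rp_proportion(t) ** k


def recombination_probability(r, t):
    """Probability of recovering a recombinant near the target locus
    within t backcrosses, given recombination fraction r."""
    return 1.0 - (1.0 - r) ** t


if __name__ == "__main__":
    # Worked case from Section 4.4.1: fifth backcross, k = 10 loci.
    print(f"{rp_homozygous_fraction(5, 10):.3f}")
    # Effect of linkage tightness on recovering a recombinant in 5 backcrosses.
    for r in (0.01, 0.05, 0.20):
        print(f"r={r:.2f}  P(recombinant)={recombination_probability(r, 5):.3f}")
```

The first printed value reproduces the 72.8% quoted in Section 4.4.1; the loop shows how slowly donor segments tightly linked to the target are replaced.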

4.4.1 Backcrossing and its genetic effects

Backcrossing is a hybridization method by which the hybrid is crossed back to one of its parental lines. The hybrid could be of any generation, although usually it starts with the F1. Backcrossing has been widely used in plant breeding to improve one or a few major traits (target traits) that are agronomically or genetically important but are lacking in current commercial or elite cultivars. In this case, the commercial cultivar is crossed with the germplasm (the donor) that provides the target trait and then a backcross programme starts by backcrossing the hybrid to the commercial cultivar. This process continues by selecting the progeny produced in the previous backcross that has the target trait to backcross again to the commercial cultivar, until the progeny very nearly resembles the commercial cultivar except for the target trait transferred from the donor.

The commercial cultivar used recurrently in this process is called the recurrent parent (RP), while the germplasm as the donor of the target trait is called the donor parent (DP). The final product of continuous backcrossing will be a backcross inbred line (BIL) with an almost identical genome to the RP except for the target trait/locus. The final BILs are produced by one or more generations of selfing following the final backcross. A set of BILs can be produced simultaneously by producing BILs for different target traits/genes/chromosome regions using the same recurrent parent. In the extreme case, BILs can be produced for any of the genetic loci distributed on the whole genome by marker-assisted selection (MAS), which will be discussed later.

With continued backcrossing, the proportion of the recurrent genome in the backcross progeny increases while that of the donor genome decreases. When one locus is involved in controlling the target trait, the proportion of the allele from the non-recurrent parent or DP in the backcross progeny will be (1/2)^t, while the proportion of the allele from the RP will be 1 - (1/2)^t, at the tth generation of backcrossing. The same is true for the whole genome (Fig. 4.6).

[Figure 4.6: crossing scheme in which the donor parent (DP) and recurrent parent (RP) are crossed to give the F1, followed by backcross generations BC1 to BC4.]

Fig. 4.6. Genetic effect of backcrossing (with perfect Mendelian segregation).

If there is no linkage among k loci that differ between the two parents, the progeny that are homozygous for the recurrent parent alleles will account for [1 - (1/2)^t]^k at the tth generation of backcrossing. That is, for the fifth backcross and k = 10 loci, 72.8% of the backcross progeny will be homozygous and have the same genotype as the recurrent parent at these loci. Backcross-derived homogenization is different from that resulting from self-pollination: the former is homogenized towards the RP while the latter is homogenized towards the parental genotypes (both parents) and their recombinants.

Because of genetic linkage, the donor genome will be replaced by the recurrent genome at a lower rate in the genomic region around the target gene and the genes nearby than in other regions. The low rate depends on the tightness of linkage, that is, how far the target gene is from the linked

genes. Under conditions of no selection, the probability of obtaining recombinants, that is, the DP alleles at non-target loci that are replaced by the RP alleles, will be 1 - (1 - r)^t. Therefore, the more backcrosses, the greater the chance of recombination.

When the allele for the target trait is dominant, the candidate progeny are selected for further backcrossing only when they are heterozygous at the target locus and bear a high level of phenotypic resemblance to the recurrent parent. In this way, the recovery of the recurrent parent genotype will be accelerated.

When the allele for the target trait is recessive, the plants used for backcrossing can also be selfed. The same plants are selfed and backcrossed and their progeny can be planted side by side and compared to find which of the plants used for backcrossing contain the target allele based on the phenotype of the selfed progeny. If the selfed progeny is segregating for the target trait, the backcross progeny from the same plant can be used for further backcrossing. If the selfed progeny is not segregating for the target trait, the corresponding progeny from the backcross should be excluded from further backcrossing. Using linked markers to select the target trait (Chapter 8) will make it possible to continue backcrossing without the selfing otherwise required to select the plants containing recessive genes.

According to the rate of recovery of the RP genome determined by [1 - (1/2)^t]^k, an infinite number of backcrosses are theoretically needed to eliminate the complete genome from the DP. The currently available BILs are mostly derived from fewer than ten (usually five to six) backcrosses. As a result, specific BILs may still contain alleles from the donor at a large number of loci in addition to the target locus. This is the reason why BILs derived from repeated backcrosses are frequently called NILs.

4.4.2 Other methods for production of NILs

In addition to the repeated backcrosses, there are several other approaches to production of NILs as discussed by Xu, Y. (2002):

Selfing-derived NILs. Pairwise NILs can be developed through continuous selfing while keeping the target trait locus heterozygous. Once other genetic loci are almost all fixed, an additional generation of selfing will result in a pair of NILs that differ only at the target locus (Xu and Zhu, 1994). Selfing-derived NILs can be any combination of parental genotypes and each pair of NILs has an identical genetic constitution (except for the target locus), whereas the backcross-derived NILs have the same genetic constitution as the RP.

Whole genome selection of permanent populations. With the accumulation of permanent mapping populations such as RILs and DHs described in the previous sections, it is possible to find two lines which are almost genotypically identical for the whole genome except for one or a few marker loci.

Mutation. Creation of a collection of single-locus mutants from a donor inbred line is a quick approach to producing a large number of NILs. For most mutants, mutation only occurs at one or a few genetic loci. These mutants can be considered near-isogenic to their wild-type donor and are thus known as isomutagenic lines (IMLs). IMLs have been widely used in functional genomics research for gene cloning (see Chapter 11 for details).

Chromosome substitution. Using chromosome engineering and/or MAS, whole or partial chromosome substitution lines can be created, so that each line has one chromosome or partial chromosome replaced. A set of chromosome substitution lines, also known as introgression lines, can be produced to cover the whole genome so that each represents a piece of chromosome from the donor genome.

4.4.3 Introgression line libraries

Genetic stocks of genome-wide coverage are the platform required for large-scale gene discovery (Chapter 11) and efficient

marker–trait association (Chapter 6). Extensive research has been carried out worldwide to develop genome-wide genetic stocks for functional genomics research. Eshed and Zamir (1994) proposed to exploit introgression lines (ILs), also known as chromosome segment substitution lines (CSSLs) or IL libraries, which could be generated by repeated backcrossing and MAS with the whole genome covered by the contigs of the introgressed segments from the DP. ILs have a high percentage of the RP genome and a low percentage of the DP genome. They offer several advantages over conventional populations: (i) they provide useful stocks for highly efficient QTL or gene identification and fine mapping; (ii) they can contribute to the detection of epistatic interactions between QTL; and (iii) they can be used to map new region-specific DNA markers (Eshed and Zamir, 1995; Fridman et al., 2004). Several sets of ILs are now available in barley, maize, rice, soybean and wheat that contain beneficial alleles from wild relatives, thus enriching the genetic diversity in the primary gene pools of these crops. These ILs, when crossed with cultivars, produce progenies with enhanced trait values as demonstrated by the increased yield in tomato and wheat (Gur and Zamir, 2004; Liu et al., 2006). ILs are of particular value for assessing the phenotypic consequences associated with diverse donor alleles in particular genetic backgrounds and subsequently for map-based cloning of genes (Zamir, 2001). IL libraries will facilitate large-scale QTL cloning and functional genomics research into complex phenotypes by resolving four major technical difficulties (Li, Z.K. et al., 2005). They allow: (i) efficient identification of QTL with a large effect on specific target phenotypes; (ii) efficient fine mapping of target QTL and determination of candidate genes underlying QTL; (iii) efficient determination and verification of functions of QTL candidate genes; and (iv) the dissection of gene networks and metabolic pathways underlying the complex phenotypes.

There are many reports of the development of ILs in various crops, but most of them involve relatively small population sizes (Dwivedi et al., 2007) that are insufficient to cover a significant part of the genetic variation. In barley, a set of 146 ILs was derived from BC2F6 of the cross between Harrington and Caesarea (Hordeum vulgare ssp. spontaneum), covering an average of 12.5% of the H. spontaneum genome (Matus et al., 2003). In rice, there are several reports involving development of over 100 ILs. For example, 147 ILs were developed from Oryza sativa (Taichung 65) and Oryza glumaepatula reciprocal crosses containing O. glumaepatula or Taichung 65 cytoplasm but with entire chromosome segments from O. glumaepatula (Sobrizal et al., 1999); 140 ILs were derived from a cross between japonica cv. Nipponbare and an elite indica line Zhenshan 97B (Mu et al., 2004); 159 ILs carrying variant introgressed segments from Oryza rufipogon Griff. in the background of the indica cultivar Guichao were developed, representing 67.5% of the O. rufipogon genome and a 92.4–99.9% (97.4% average) representation of the RP genome (Tian et al., 2006). As additions to this list, three examples from rice and tomato will be discussed below in detail.

Rice ILs

To facilitate research into the functional genomics of complex traits in rice, Z.K. Li et al. (2005) developed over 20,000 ILs in three elite rice genetic backgrounds (two high-yielding indica cultivars, IR64 and Teqing, and a new plant type tropical japonica, NPT (IR68552-55-3-2)) by marker-assisted selective introgression for a wide range of complex traits, including resistance/tolerance to many biotic and abiotic stresses, morpho-agronomic traits, physiological traits, etc. A total of 195 accessions from 34 countries were used as donors in the backcrossing programme, representing different subspecies, ecotypes and gene pools. Twenty-five randomly chosen plants from each BC1F1 population were backcrossed with each RP to produce 25 BC2F1 lines. From each cross, 25 BC2F1 lines were planted in the following season and

seed from individual plants of 25 BC2F1 lines from each cross were bulk-harvested to form a single bulk BC2F2 population. In addition, 30–50 superior high-yielding BC2F1 plants from each cross were further backcrossed with the RPs to produce BC3F1 lines and likewise BC4F1 lines. Similarly, BC3F2 and BC4F2 bulks were generated by bulk-harvesting seeds of all BC3F1 and BC4F1 lines from each cross. The BC bulks were then screened for their resistance or tolerance to different abiotic and biotic stresses, including drought, salinity, submergence, anaerobic germination, zinc deficiency, brown planthopper, etc. In all cases the severity of the stresses was strong enough to kill the RPs and only the surviving BC progeny were selected. Selection for many agronomic traits, such as flowering time, plant height, plant type related traits (leaf and culm angles), grain quality parameters, yield related traits, etc., was also undertaken based on visual observation of BC bulks. Each of the selected BC progeny was tested for selected phenotypes and selfed for two or more generations to form a homozygous IL. ILs within each genetic background are phenotypically similar to their RP but each carries one or a few traits introgressed from a known donor. Together, these ILs contain a significant portion of the alleles affecting the selected complex phenotypes in which allelic diversity exists in the primary gene pools of rice. A forward genetics strategy was proposed and demonstrated with examples for using these ILs for large-scale functional genomics research (Chapter 11). Complementary to the genome-wide insertional mutants, these ILs provide new methods for highly efficient gene discovery, candidate gene identification and cloning of important QTL for specific phenotypes based on the convergent evidence from QTL position, expression profiling and functional and molecular diversity analysis of candidate genes.

In another example in rice, a single-segment substitution line (SSSL) library was developed using Hua-Jing-Xian, an elite indica cultivar from South China, as the recipient and 24 cultivars including 14 indica and ten japonica cultivars from worldwide sources as donors (Xi et al., 2006). The current library consists of 1529 SSSLs with an average substitution segment length of 18.8 cM. The library covers 28705.9 cM of genome in total, which is equal to 18.8 genome-equivalents. This library has been used for QTL mapping of many traits (Xi et al., 2006; Liu et al., 2008).

The Lycopersicon pennellii ILs

Using whole genome marker analysis, Eshed and Zamir (1994) developed a permanent mapping population designed for QTL analysis. This resource is composed of a tomato cultivar (Lycopersicon esculentum cv. M82) which includes single introgressed genomic regions from the wild green-fruited species L. pennellii. This congenic resource, composed of 76 ILs, provides nearly complete coverage of the wild species' genome. The IL map is connected to the high-resolution F2 map, which is composed of 1500 markers (IL chromosome maps), with seed of the ILs available through the C.M. Rick Tomato Genetics Resource Center, University of California Davis.

Applications of the ILs developed in rice and tomato will be further discussed in later chapters.

4.4.4 Gene tagging strategy using NILs

Due to genetic linkage, the chromosome fragments around the target locus will be dragged into backcross progeny and may be retained in the subsequent progeny. This phenomenon is called linkage drag. The basic idea behind gene tagging using NILs is to use the opportunity provided by linkage drag to identify the molecular markers located on the chromosome segments around the target locus. This can be accomplished by comparing the marker genotypes among RP, DP and NILs. When the genotype at a particular marker locus is the same for

the NIL and DP but different for the RP, possible linkage between this marker and the
target locus can be determined (Fig. 4.7).

Success in gene tagging using NILs relies on the assumption that there is genetic
difference between DP and RP in the chromosome region flanking the target locus.
Apparently, the likelihood of detecting this difference depends on the length of the
chromosome regions in the DP which have been retained during the backcross; this
parameter decreases with the increasing number of backcrosses. Detecting these
differences also depends on the molecular polymorphism in this region in DP and RP.
When DP and RP are from different species such as cultivated and wild species, there
is a high probability of finding polymorphisms between them. Conversely, the
probability will be low if DP and RP are genetically closely related to each other.
The likelihood of detecting molecular polymorphisms between DP and RP at the target
region can be increased by using large numbers of markers and/or different types of
markers.

With the increasing availability of complete sets of BILs or NILs that cover the
whole genome, NIL mapping strategies provide a convenient approach to tagging and
isolating numerous genes. In the following section, the discussion will be focused on
major gene-related issues while QTL mapping using NILs is described in Chapter 7.

Fig. 4.7. Gene tagging using NILs. (A) Possible linkage, NIL has the same allele as DP
at Marker 1 locus. (B) No linkage, NIL has different alleles from DP at both marker
loci (1 and 2). NIL, near-isogenic line; DP, donor parent; RP, recurrent parent.

4.4.5 Theoretical considerations in genetic mapping using NILs

One source of errors in NIL-based mapping is the possibility that the RP not only
differs from the DP in the regions around the target locus but also differs at other
loci distributed all over the genome. This is because for a limited number of
backcrosses, t, the DP genome retained in the backcross progeny is $1/2^{t}$, which
creates the possibility that the polymorphic markers in the retained regions could be
falsely identified as being located in the target region. These false positive markers
are not actually linked with the target region. As calculated theoretically by
Muehlbauer et al. (1988), for a genome containing 20 chromosomes each of 50 cM,
progeny derived from five backcrosses will retain DP alleles at four out of 100
randomly selected marker loci. Among these four retained DP loci, it is estimated that
only one or two will not be linked to the target gene. This estimation is based on the
assumption that there is no selection for the RP phenotype, that is, individuals that
are heterozygous at the target marker loci are selected randomly to be backcrossed
with RP.

Backcross introgression without selection for the RP phenotype

Assume that a plant species has n chromosomes, each of length L M (Morgans), and the
objective is to transfer the target gene (classic marker) located in the middle of one
of the chromosomes from DP to RP by t backcrosses (b = t + 1). Suppose there are 100
polymorphic markers between DP and RP and these markers are randomly distributed over
the genome. In the final backcross progeny, the proportion of the marked chromosome
(M, where the target gene is located) from the DP will be

$$U_{Mb} = \left[ 2\left(1 - e^{-bL/2}\right)/b \right] / L$$

with variance

$$V_{Mb} = \frac{2}{L^{2}} \left\{ \frac{2 - (bL + 2)\,e^{-bL/2}}{b^{2}} - \left[ \frac{1}{b}\left(1 - e^{-bL/2}\right) \right]^{2} \right\}$$
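
The expressions above can be checked numerically. The following is a minimal Python sketch, not code from Muehlbauer et al. (1988): the function names are hypothetical, L is taken in Morgans, and the example values (20 chromosomes of 50 cM, five backcrosses) are those of the scenario quoted earlier in this section. The non-target mean $(1/2)^b$ and the length-weighted genome mean are the expressions used in the numerical example of this section; the whole-genome variance is deliberately left out.

```python
import math


def marked_mean(b, L):
    """Expected DP proportion on the marked chromosome (L in Morgans)."""
    return (2.0 * (1.0 - math.exp(-b * L / 2.0)) / b) / L


def marked_var(b, L):
    """Variance of the DP proportion on the marked chromosome."""
    e = math.exp(-b * L / 2.0)
    first = (2.0 - (b * L + 2.0) * e) / b ** 2
    second = ((1.0 / b) * (1.0 - e)) ** 2
    return (2.0 / L ** 2) * (first - second)


def nontarget_mean(b):
    """Expected DP proportion on a chromosome without the target gene."""
    return 0.5 ** b


def nontarget_var(b, L):
    """Variance of the DP proportion on a non-target chromosome."""
    e = math.exp(-b * L / 2.0)
    return 0.5 ** b * (2.0 / L) * ((1.0 / b) * (1.0 - e)) - (0.5 ** b) ** 2


def genome_mean(n, b, L):
    """Length-weighted mean over one marked and n - 1 non-target chromosomes."""
    return (L * marked_mean(b, L) + (n - 1) * L * nontarget_mean(b)) / (n * L)


if __name__ == "__main__":
    n, L, t = 20, 0.5, 5   # assumed example: 20 chromosomes of 50 cM, 5 backcrosses
    b = t + 1
    cm = 100.0 * L         # chromosome length in cM
    print("marked:     %.4f +/- %.4f" % (marked_mean(b, L), math.sqrt(marked_var(b, L))))
    print("marked cM:  %.2f +/- %.2f" % (cm * marked_mean(b, L), cm * math.sqrt(marked_var(b, L))))
    print("non-target: %.6f +/- %.4f" % (nontarget_mean(b), math.sqrt(nontarget_var(b, L))))
    print("genome:     %.5f" % genome_mean(n, b, L))
```

Keeping each quantity in its own function makes it easy to vary the number of backcrosses t and see how quickly the retained DP segment around the target shrinks relative to the rest of the genome.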

With 20 chromosomes (n = 20) each 50 cM long (L = 0.5) and five backcrosses (t = 5,
and b = t + 1 = 6), the proportion of the DP genome on the chromosome containing the
target gene is

$$U_{Mb} = 0.5179 \pm 0.2498 = 51.79\% \pm 24.98\%$$

It is expected that 51.79% of the target chromosome will come from DP. As the
chromosome is 50 cM in length, this proportion can be converted into genetic map
distance by the following calculation

$$0.5179 \times 50\ \text{cM} = 25.90 \pm 12.49\ \text{cM}$$

In the backcross progeny, the proportion of the DP genome in the non-target
chromosomes (N) is

$$U_{Nb} = \left(\tfrac{1}{2}\right)^{b} = 0.015625 = 1.56\%$$

with variance

$$V_{Nb} = \left(\tfrac{1}{2}\right)^{b} \frac{2}{L} \left[ \frac{1}{b}\left(1 - e^{-bL/2}\right) \right] - \left[ \left(\tfrac{1}{2}\right)^{b} \right]^{2}$$

This can be converted into genetic map distance as follows:

$$0.015625 \times 50\ \text{cM} = 0.78 \pm 4.43\ \text{cM}.$$

In the backcross progeny therefore, the proportion of the DP genome (target and
non-target chromosomes) in the total genome (T) will be

$$U_{Tb} = \frac{L\,U_{Mb} + (n - 1)\,L\,U_{Nb}}{nL}$$

with variance

$$V_{Tb} = \frac{L\,V_{Mb} + (n - 1)\,(L\,V_{Nb})}{(nL)^{2}}$$

Therefore

$$U_{Tb} = \frac{(50 \times 0.5179) + (20 - 1) \times 50 \times 0.015625}{20 \times 50} = 0.04074 \pm 0.1028$$

If NILs are used, interaction between the target QTL and other major genes/QTL can be
eliminated and only epistasis between multiple target QTL needs to be considered. With
removal of noise from heterogeneous backgrounds, the proportion of variance explained
by the target QTL will increase and minor QTL can be identified. Without disturbance
from the background effect, multiple QTL can be easily separated. Since all genotypic
variation comes from the target loci, environmental effects can be estimated. In QTL
cloning, NILs have been used to map the target QTL precisely by using all of these
advantages.

Backcross introgression with selection for RP phenotype

The above discussion applies to backcrossing without selection for RP phenotype. In
reality, however, the individuals most like the RP are always selected for
backcrossing in order to obtain the introgressed lines as rapidly as possible. The
effectiveness of phenotype-selected backcrossing depends on how the following
requirements are met: (i) allelic differences exist not only at the target locus but
also at many other loci across the genome between RP and DP; (ii) these differences
control the phenotype in backcross progeny, i.e. are highly heritable; (iii) each
backcross produces enough progeny to allow individuals of different phenotypes to be
identified. If these requirements are met or at least partly met, selection for the RP
phenotype will help to improve the rate at which backcrossing progeny recover the RP
alleles not only at the target loci but also around the target region.

Selection for the RP phenotype accelerates the replacement of the DP genome and at
completion of backcrossing the average proportion of the DP genome retained in the
backcross progeny will be reduced. Because the replacement of the DP genome is
proportional to the number of backcrosses, the effect of selection for the RP
phenotype on the replacement and retention of the DP genome will decrease as
backcrossing continues.

To derive the same formulas for backcross introgression with selection for RP
phenotype as those discussed for the case without RP phenotype selection, the three
factors (requirements) affecting phenotypic selection mentioned above need to

be determined. However, these factors vary depending on specific DP × RP crosses so
that it is impossible to develop general formulas that are applicable to different
crosses. Instead, Muehlbauer et al. (1988) provided two examples to explain the effect
of selection for the RP phenotype on the retention of the DP genome for marked and
unmarked chromosomes. In each case, selection for the RP phenotype significantly
reduced the DP genome retained in backcross progeny.

4.4.6 Application of NILs in gene tagging

NILs have been successfully used in gene tagging for almost all crop species with
available molecular marker systems and NILs. Some pioneering examples in this field
include identifying molecular markers for the Tm-2a gene controlling resistance to
tobacco mosaic virus in tomato (Young et al., 1988), the Dm1, Dm3 and Dm11 genes for
downy mildew resistance in lettuce (Paran et al., 1991) and the Pto gene for bacterial
speck (Pseudomonas) resistance in tomato (Martin et al., 1991). Since then, numerous
studies using NILs for gene tagging and map-based gene cloning have been reported.
From these studies, some general conclusions can be drawn:

• Backcrossing significantly reduces the linkage drag around the target region. The
  more backcrosses that are carried out, the smaller the linkage drag fragment in
  backcross progeny and the less frequently will false positives be found between the
  NILs.
• Molecular markers can be used to improve the efficiency of backcrossing by
  significantly reducing the linkage drag and increasing the RP genome ratio.
• Linkage drag has significant influence on the recovery of the RP genome around the
  target region, indicating its important impact on backcross breeding programmes.
• Using multiple NILs significantly reduces the probability of reporting false
  positives for linked loci.
• Selection for the RP phenotype reduces the DP genome ratio during the backcross
  process compared to cases where the RP phenotype has not been selected.

4.5 Cross-population Comparison: Recombination Frequency and Selection

4.5.1 Recombination frequencies across populations

The recombination frequencies in DH and RIL populations have been compared in maize
(Murigneux et al., 1993), wheat (Henry et al., 1988) and rice (Courtois, 1993; Antonio
et al., 1996) using populations derived from different crosses.

DH lines are used extensively in barley breeding programmes to reduce the time
required to obtain pure lines and to increase breeding efficiency. The availability of
high-density linkage maps of barley makes it possible to compare the recombination
frequencies in H. bulbosum (Hb) and anther culture-derived DH lines (see Section
4.2.1) across most of the barley genome. These methods differ in three major aspects:
(i) the Hb and anther culture-derived DH lines arise from female and male recombinant
products, respectively; (ii) the optimal donor plant growth conditions differ
(Pickering and Devaux, 1992); and (iii) the in vitro culture phases are distinct:
microspores evolve into embryoids that give rise to plantlets, while in the Hb method,
plantlets develop from zygotic embryos. Recombination frequency is likely to be
affected by the first two features. Devaux et al. (1995) reported the results of an
experiment comparing map distances observed in Hb and anther culture-derived DH lines
obtained from an F1 (Steptoe × Morex) hybrid. Male (anther culture-derived) and female
(Hb-derived) DH populations were used to map the barley genome and thus determine the
different recombination rates occurring during meiosis in the F1 hybrid donor plants.
The anther culture-derived (male

recombination) population showed an 18% greater recombination rate than the
Hb-derived population. This increased recombination rate was observed for every
chromosome and most of the chromosome arms. Examination of linkage distances between
individual markers revealed eight segments with significantly higher rate of
recombination in the AC-derived population and one in the Hb-derived population.
Although three out of eight segments that appeared significantly longer in the AC
population are non-telomeric, the most significant increases were noted for the
telomeres of the long arms of chromosomes 2 and 5.

Significant excess male recombination was also found at the most distal regions on
chromosome 9 in tomato (de Vicente and Tanksley, 1991), proterminal regions of several
linkage groups in Brassica nigra (Lagercrantz and Lydiate, 1995) and for several
terminal intervals in Pennisetum glaucum (Busso et al., 1995). In barley, most of the
chromosome termini have been tagged with telomeric markers that were mapped in the
Steptoe × Morex Hb population (Kilian et al., 1999). These telomeric markers in the AC
population will need to be mapped in order to test whether an increased frequency of
recombination at the telomeric regions is a general phenomenon in barley or is limited
to particular chromosomal arms.

With more complete genetic maps now available for between-sex comparison both in
plants and animals, the tendency towards increased recombination frequency at the
telomeres in male meiosis seems to be emerging as a general phenomenon. Since the
differences in chiasma distribution show a similar trend, increased recombination
frequency at the telomeres may be a biologically significant phenomenon. There is some
evidence that sub-telomeric regions of human and cereal genomes are much more
gene-rich than proximal regions. If this proves to be true for other species, it could
be speculated that increased male recombination at the telomeric regions is
selectively advantageous and therefore evolutionarily conserved since it results in an
increased number of gene assortments in the more abundant male gametes. Male gametes
are not only more abundant, but also energetically less costly and subject to
selection at the pollination and/or fertilization stages.

4.5.2 Unintentional selection during the process of population development

The process by which genetic and plant breeding populations are developed may include
various unintended selection pressures which result in the deviation of genotypic and
allelic frequencies from Mendelian expectation. Segregation distortion has been
documented in a wide range of organisms including plants. Distorted segregation can be
detected with almost any type of genetic marker, including morphological mutants,
isozymes and DNA markers (Table 3 in Xu et al., 1997). The same allele at a given
locus can be distorted in either of two directions, with an allele frequency higher or
lower than expected. For example, waxy and non-waxy rice kernels on a segregating
plant could show ratios equal to, larger than or smaller than the expected ratio
(3:1), with the proportion of waxy rice kernels ranging from 8.9 to 95.6% depending on
the crosses (Xu and Shen, 1992d).

As reviewed by Xu et al. (1997), aberrant segregation ratios in plants may arise from
a variety of physiological or genetic causes and may be manifested as differential
transmission in either the male or female germ line or may result from post-zygotic
selection prior to genotypic evaluation. Most commonly, however, skewed segregation
appears to arise from male gametophytic selection, through the selective influences of
the gynoecium, including genetic incompatibility, environmental effects and the
differential competitive ability of genetically variable pollen.

Selection pressure associated with DH development

The representativeness of DH lines can be severely affected by the process involved in
DH development. For the DH populations

derived from anther culture (male gametophytes), the observed distortion in
segregation can be attributed to differential viability or lethality of pollen or to
selective regeneration in in vitro culture and clearly not to the selective influences
of the gynoecium or the differential competitive ability of pollen. The distortion in
three chromosomal regions (two on chromosome 2 and one on chromosome 10) detected in
the DH populations by Xu et al. (1997) indicated an overrepresentation of alleles from
the japonica parent that has been proven to be easily regenerated by anther culture.
The other parent is indica which belongs to a subspecies that is more recalcitrant to
anther culture (Shen et al., 1982; Yang et al., 1983). It has been suggested that
these regions may be associated with the preferential regeneration of japonica
genotypes during anther culture. Yamagishi et al. (1996) also identified markers in
several chromosomal regions that showed aberrant segregation ratios favouring japonica
alleles in a DH population, although these markers segregated normally in the
corresponding F2 population. They concluded that these regions contained genetic
factors which conferred a selective advantage on the japonica genotypes during anther
culture.

Selective regeneration of genotypes has also been reported in other plants. Very
strong distortions of single locus segregations were observed in an anther
culture-derived barley population (Devaux et al., 1995). Devaux and Zivy (1994)
demonstrated that some markers showing distorted segregation are linked to genes
involved in the anther culture response. In another barley DH population, a
significant proportion (44%) of the mapped markers showed distorted segregation which
was caused mainly by the prevalence of alleles from the parent that responded better
to in vitro culture (Graner et al., 1991). Although segregation distortion may arise
from genetic, physiological and/or environmental causes and the relative contribution
of each of these factors may differ in specific populations, much of the reported
segregation distortion in anther culture-derived populations is likely to be the
result of using parental genotypes that differ in their response to anther culture.

Selection pressure involved in RIL development

Deviation from randomness due to selection pressure in the production of RILs is a
potential problem that needs more attention. In contrast to the populations derived
from a one-step homogenization, RILs are produced by many generations of inbreeding
during which plants are subjected to selection pressures generated by various
environmental disturbances and competition among plants that may well occur for many
years and seasons and in many locations. The distortions resulting from selection
pressures involved in RIL development can be understood by comparison of multiple
populations of different genetic structures derived from the same crosses and by
comparison of populations produced by different approaches.

He et al. (2001) compared molecular marker segregations between DH and RIL populations
derived by anther culture and SSD respectively from the same rice cross, ZYQ8 (indica)
× JX17 (japonica). In the RIL population, 27.3% of the markers showed distorted
segregation at the P < 0.01 level, of which 90% of the markers favoured indica alleles
while in the DH population, 18.2% of the markers were skewed almost equally towards
indica and japonica alleles. This might reflect the different types of selection
pressures to which the DH and RIL populations were subjected. Eight commonly distorted
regions on chromosomes 1, 3, 4, 7, 8, 10, 11 and 12 were detected in both RIL and DH
populations of which seven skewed towards indica alleles and one towards a japonica
allele. Five of them were located near gametophytic gene loci (ga) and/or sterility
gene loci (S).

To compare the frequency and location of loci showing distorted allele frequencies
between different population types (F2, DHs and RILs), information from 53 populations
with a known number of distorted markers was summarized and analysed (Xu et al.,
1997). In summary, RIL populations had significantly higher frequencies of distorted
markers (39.4 ± 2.5%) than other population structures (DH: 29.4 ± 3.5%; BC: 28.6 ±
2.8%;

F2: 19.3 ± 11.2%), which may indicate the cumulative effects of selection pressures
during the process of RIL development. Distorted segregation in RIL populations
derived via SSD represents the cumulative effect of both genetic (G) and environmental
(E) factors on multiple generations and the G × E interaction becomes more pronounced
with the progress of selfing. Thus, it is difficult to distinguish genetic from
environmental causes of distortion in RIL populations. However, an
over-representation of indica alleles in two chromosomal regions on chromosomes 3 and
6 was specific to one RIL population. These chromosomal regions may be associated with
a selective advantage in the indica growth environment in which the RIL population was
developed.

In contrast to DH and RIL populations where genotypic frequencies are a perfect
reflection of the allele frequencies due to lack of heterozygotes, F2 populations
offer the potential to detect an advantage or disadvantage associated with the
heterozygote class at specific loci, even when the parental allele frequency is
normal.

Expression of distorters with low heritability will be influenced by the environment
and therefore these will be detected only in experiments carried out under
well-controlled conditions. Because the segregation distortion occurs either during,
or just before or after meiosis, the experimental environment must be controlled
during the reproductive phase of the parental lines, although the effect will only be
detected in the offspring.

Genetics of selection associated segregation distortion

The genetic control of distorted segregation has been studied in rice (as summarized
by Xu et al., 1997) and barley (Konishi et al., 1990, 1992) using morphological and
isozyme markers. The genetic basis of segregation distortion may be the abortion of
male or female gametes or the selective fertilization of particular gametic genotypes.
Distortion at a marker locus in rice may be caused by linkage between the marker and
the gene conferring lower pollinating ability, the gametophyte gene (ga) (Nakagahra,
1972) also referred to as a gamete eliminator or pollen killer, causing abortion of
gametes (Sano, 1990). A large number of ga loci and sterility gene loci (S) have been
identified using morphological markers.

If segregation distorters have high heritability, they will be detected in almost any
population if the parents differ at the genetic locus in question and in almost any
environment in which the population is grown. For a specific chromosomal region, the
probability of a distortion locus being falsely assigned will decrease with the number
of populations sharing the same distortion and with the number of markers in a cluster
of distorted markers. Use of multiple populations developed in multiple environments
would facilitate the detection of highly heritable genetic segregation distortion
factors. Chromosomal regions associated with marker segregation distortion in rice
were compared using six molecular linkage maps (Xu et al., 1997). Mapping populations
were derived from one interspecific backcross and five inter-subspecific
(indica/japonica) crosses including two F2 populations, two DH populations and one RIL
population. Marker loci associated with skewed allele frequencies were distributed on
all 12 chromosomes. Distortion in eight chromosomal regions showed the grouping of
previously identified gametophyte (ga) or sterility genes (S). Three additional
clusters of skewed markers were observed in more than one population in regions where
no gametophytic or sterility loci had been reported previously. A total of 17
segregation distortion loci were postulated and their locations in the rice genome
were estimated. Using a single F2 cross, Harushima et al. (1996) identified 11 major
segregation distortions at ten positions on chromosomes 1, 3, 6, 8, 9 and 10 and at
least two of these segregation distortion regions (on chromosomes 1 and 3) were also
detected by Xu et al. (1997).

A similar comparison was undertaken among four maize mapping populations using 1820
co-dominant markers (Lu et al., 2002). On a given chromosome nearly all of the markers
showing segregation distortion favoured the allele from the same parent.

A total of 18 chromosomal regions on the ten maize chromosomes were associated with
segregation distortion. The consistent location of these chromosomal regions in four
populations suggested the presence of segregation distortion regions. Three known
gametophytic factors are possible genetic causes for the presence of these regions.

In Populus, most markers exhibiting segregation distortion generally occurred in large
contiguous blocks on two linkage groups and it has been hypothesized that divergent
selection had occurred on the chromosomal scale among the parental species (Yin, T.M.
et al., 2004).

Segregation distortion loci were mapped to chromosomal regions including three regions
on chromosome 5D in Aegilops tauschii using 194 molecular markers for an F2 population
(Faris et al., 1998). Two sets of reciprocal BC populations were used to further
analyse the effect of sex and cytoplasm on segregation distortion. Extreme distortion
of marker segregation ratios in the chromosome 5D regions was observed in populations
in which the F1 was used as the male parent and ratios were skewed in favour of one
parent. There was some evidence of differential transmission caused by
nucleo-cytoplasmic interactions. This result, along with other studies, indicated that
loci affecting gametophyte competition in male gametes are located on 5DL.

To map segregation-distorting loci using molecular markers, both a maximum likelihood
(ML) method and a Bayesian method were developed (Vogl and Xu, 2000). ML mapping was
implemented by use of an expectation-maximization algorithm and the Bayesian method
was developed using the Markov chain Monte Carlo (MCMC) approach. Bayesian mapping is
computationally more intensive than ML mapping but can handle more complicated models
such as multiple segregation-distorting loci.

Implications for genetics and plant breeding

The phenomenon of segregation distortion is intimately linked to the probability of
producing specific recombinants of interest in genetics and breeding populations. In
genetics, construction of genetic maps and identification of linkage among markers and
between markers and genes depends on the segregation patterns of all the markers and
genes involved. In breeding, the success of obtaining specific genes, genotypes and
gene combinations depends on the probability of the target genes and gene combinations
occurring at a ratio expected by Mendelian segregation. To broaden the genetic base of
cultivated species, breeders often undertake wide hybridization but they frequently
fail to recover recombinants of interest, in part as a result of non-random survival
or generation of offspring. On the other hand however, phenotypic selection during
inbreeding including the backcrossing process, which breeders of course utilize, can
significantly improve the probability of recovering the desired alleles.

Identification of genetic factors associated with segregation distortion will
contribute to our understanding of where these genetic factors are located and how
they might be managed in a breeding programme. If a target locus is known to be linked
to a segregation distortion locus and is underrepresented in a desired population, the
frequency of the favourable allele can be increased by using molecular markers to
select for recombinants in the region of interest. To reduce the negative influence of
segregation distortion in plant breeding, it is reasonable to decrease the number of
generations required for stabilizing breeding lines. The production of DH populations
from F1 hybrids minimizes the number of generations required to reach homozygosity and
therefore maximizes the chance of retaining desirable alleles in a population unless
they are linked to segregation distortion factors that affect DH production. In wide
crosses where wild alleles tend to be disproportionately lost, the frequency of rare
alleles can be enhanced by adjusting the type of selection and population structure
used in accordance with genetic information relating to segregation distortion, thus
providing further opportunities for favourable recombination in later generations (Xu
et al., 1997). To understand the underlying mechanism(s) that

is responsible for segregation distortion, it would be useful to develop NILs
containing individual segregation distortion loci so that the effect of these factors
could be evaluated systematically in different genetic backgrounds and environments.
NILs would also provide material for cloning these genetic factors to permit a more
in-depth characterization of their molecular structure and function.
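
Section 4.5.2 repeatedly describes markers as distorted when their genotype counts depart from Mendelian expectation. The following is a minimal Python sketch of how such a marker could be flagged, assuming a chi-square goodness-of-fit test against the expected ratio (1:1 for DH or RIL lines, 1:2:1 for a co-dominant marker in an F2). The cited studies may have used other tests; the function names and the marker counts are hypothetical, and only the P < 0.01 threshold is taken from the text.

```python
import math


def chi_square(observed, ratio):
    """Goodness-of-fit statistic for genotype counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value(chi2, df):
    """Upper-tail chi-square probability; closed forms exist for df = 1 and 2."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def is_distorted(observed, ratio, alpha=0.01):
    """True when the counts deviate from the expected ratio at level alpha."""
    df = len(observed) - 1
    return p_value(chi_square(observed, ratio), df) < alpha


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    dh_marker = [120, 80]        # two homozygous classes in a DH population, 1:1
    f2_marker = [30, 50, 20]     # co-dominant marker in an F2 population, 1:2:1
    print(chi_square(dh_marker, [1, 1]), is_distorted(dh_marker, [1, 1]))
    print(chi_square(f2_marker, [1, 2, 1]), is_distorted(f2_marker, [1, 2, 1]))
```

A single-marker test of this kind says nothing about the cause of distortion. As the text notes, confidence in a distortion locus grows when neighbouring markers and independent populations show the same skew.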
5
Plant Genetic Resources: Management,
Evaluation and Enhancement

Plant genetic resources are one of the most important tools in agricultural research
and are used for the improvement of productivity and sustainability of production
systems, both in the developed and in the developing world. The beginnings of the
contemporary international system for germplasm conservation can be traced back to the
brilliant and pioneering work of the Russian botanist, Nikolai Vavilov. He was the
first, in the 1920s, to realize the importance and potential benefits to be derived
from gathering plant genetic resources from around the world and organizing them into
a collection. He noted that the task was not only to gather plants for the immediate
breeding needs of Soviet agriculture, but also to save seeds from extinction. He also
recognized that modern cultivars were replacing the local landraces and that plant
research was destroying the very foundation of its own existence, thereby threatening
global food security. Since the late 1950s, there has been an increasing awareness and
documentation of the benefits of biodiversity and the risks associated with genetic
erosion. Different methods have been developed to conserve plant genetic resources and
make them available to breeders. Analytical methods including molecular markers and
geographic information systems (GIS) have been developed to characterize genetic
diversity and make its management, evaluation and enhancement more efficient. In this
chapter, most fields related to plant genetic resources will be covered, including
germplasm collection, maintenance, evaluation, enhancement, utilization and
documentation. As an introduction, biodiversity and genetic diversity will be
discussed first.

Biodiversity is of ecological, economic and cultural importance. Diversity within an
ecosystem allows it to survive and be productive while providing an enormous range of
products and services for exploitation by man. Agrobiodiversity, as a component of the
total biodiversity, is important to agriculture. It helps ensure sustainability,
stability and productivity (Hawtin, 1998). As recognized by the Convention on
Biological Diversity, there are three interdependent levels of biodiversity: ecosystem
level, species level and genetic level, each of which is influenced by, and influences
the other.

As described by Hawtin (1998), ecosystem diversity can be defined as the variability
between interdependent communities of species and the physical environment in which
they live. Diverse agroecosystems can lead to a wide variety of enterprises on a
national, regional or community scale which in turn contributes to maximizing food
security, helps to increase employment opportunities and increase local or national
self-reliance. Species diversity relates to the cultivars of species within an area. A
diversity

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 151



of crop and animal species on the community, farm or field levels can help add stability by reducing reliance on a single enterprise. Such diversity can also lead to a more efficient use of resources, for example by providing increased opportunities for nutrient recycling. Species diversity at the field level, e.g. by planting crop mixtures, can help to provide a buffer against adverse conditions, pests and diseases. In addition, diversity in agricultural enterprises allows for a more efficient use of inputs such as labour. Genetic diversity refers to the variation within a species which represents the total genetic information contained in individual plants of a species, each consisting of a unique assembly of genes constituting its evolutionary heritage. This diversity begins at the molecular level, is carried as sequences of instructions on chromosomes and provides the foundation for environmental adaptation and ultimately for the evolution of species. Genetic diversity enables species to adapt to new ecosystems and environments or changes in the current environment, by natural and/or human selection. Diversity within a crop species helps diminish the risk of losses through diseases or pests and provides opportunities for the exploitation of different features of the microenvironment through, for example, the presence of diverse growth habits and rooting patterns. Such factors can contribute both to greater stability and, in many circumstances, greater productivity. At the same time, genetic diversity within crops helps to provide a reservoir of genes for future crop improvement by farmers and professional plant breeders. Figure 5.1 provides an example of genetic diversity in maize kernel phenotypes which exists in the maize germplasm, although only a few of these phenotypes now exist in cultivated maize.

Of the three levels which comprise global biological diversity, genetic diversity has received the greatest attention within the agricultural community. As the raw material of future elite cultivars and an indicator of sustainability of agricultural production, the status of genetic diversity is of utmost concern for agricultural production and particularly for societies in general. This chapter will focus on genetic diversity within crops, i.e. the genetic resources that lie at the heart of sustainable agricultural development and provide the basis for the continued evolution and adaptation of crops.

5.1 Genetic Erosion and Potential Vulnerability

5.1.1 Genetic erosion

For centuries human intervention has altered the dynamic relationships among the various ecosystems used for food, feed, fuel, fibre, shelter and medicines. However, one of the most profound and irreplaceable changes that humans have wrought is the acceleration of the rates of extinction of species caused by human colonization, extension and intensification of agriculture and industrialization. For example, deforestation of tropical rainforests at the current rate may result in the elimination of somewhere between 5 and 15% of the world's species between 1990 and 2020. Given the current estimate of about 10 million species in the world, these rates would translate into a loss of 15,000–50,000 species per year or 50–150 species per day. Thomas et al. (2004) estimated the total extinction of plant species in Amazonia following the maximum expected climate change leading to habitat destruction or climatic unsuitability to be 69% for species with seed dispersal and 87% for those without seed dispersal. The clearing of forests and the spread of urban areas are also resulting in the disappearance of the wild relatives of crop plants. On the other hand, as indicated by Brown and Brubaker (2002), several hundred crop and wild plant species previously used by humans are now classified as underutilized or neglected.

The erosion in biodiversity is caused by a multiplicity of factors, including loss, fragmentation and degradation of habitats; introduction of alien species into ecological niches; overgrazing; excessive harvesting beyond the levels of natural regeneration

Fig. 5.1. Maize kernel phenotype. After Neuffer et al. (1997; the original plate is from Correns, 1901).

as in the case of trees harvested for timber; pollution of various media that sustain the biological nutrient cycles in ecosystems; deforestation and land clearance which is cited as being the most frequent cause of genetic erosion in Africa; adverse environmental conditions such as drought and flooding; introduction of new pests and diseases; population pressure and urbanization; war and civil strife; technological advances in agriculture, particularly the Green Revolution which resulted in the abandonment of traditional crops in favour of new ones, the settlement of new lands, changes in cultivation methods and changes in agricultural systems. Apart from the narrowing of genetic diversity by extensive mono-cropping, the Green Revolution has contributed indirectly to the loss of biodiversity by soil mining. As a result of the heavy use of fertilizers, chemical inputs and irrigation, agricultural crops which rely on Green Revolution technology have rendered the soil sub-fertile and in some cases inhospitable to other species. In addition, the phenomena of genetic drift and selection pressure produce cumulative

genetic erosion that may sometimes exceed the genetic erosion actually taking place in the field (Esquinas-Alcázar, 1993). Concerns about the genetic erosion of plant genetic resources were first articulated by scientists in the mid-20th century and have since become an important part of national policies and international treaties.

Genetic erosion, or the reduction in genetic diversity in crop plants, takes on various shapes depending on one's viewpoint, including the reduction in the number of different crop species being grown and the decrease in genetic diversity (number of unrelated cultivars being grown within crop species). In addition, other organisms, both within and across agroecosystems, are increasingly taken into account when assessing biodiversity as it relates to agriculture (Collins and Qualset, 1999; Hillel and Rosenzweig, 2005). As one of the major factors for genetic erosion, the transition from primitive to advanced cultivars as a result of plant breeding is worthy of further discussion. This has occurred by two distinct pathways: (i) selection for relative uniformity, resulting in pure lines, multi-lines, single or double hybrids, etc.; and (ii) selection for closely defined objectives. Both processes have resulted in a marked reduction in genetic variation. At the same time there has been a tendency to restrict the gene pool from which parental material has been drawn. This is a result of the high level of productivity achieved when breeding within a restricted but well-adapted gene pool and breeding methods that have made it possible to introduce specifically desired improvements, such as disease resistance and quality characteristics, into breeding stocks with a minimum of disturbance to the genotypic structure by backcrossing or transgenic approaches.

In the process of modern plant improvement, the traditional cultivars (landraces) of farmers have been replaced by modern cultivars. In the 1990s, only about 15% of the global area devoted to rice and 10% of the developing world's wheat area were planted to landraces (Day Rubenstein et al., 2005). Other examples include the following: (i) of the nearly 8000 cultivars of apple that grew in the USA at the turn of the 20th century, more than 95% no longer exist; (ii) only 20% of the maize types recorded in 1930 in Mexico can now be found; and (iii) only 10% of the 10,000 wheat cultivars grown in China in 1949 remain in use (Day Rubenstein et al., 2005; Gepts, 2006). The process is under way in all countries, both developed and developing, and unfortunately includes some of the richest primary and secondary gene centres of several important food crops (Dodds, 1991). As the demand for uniform performance and grain quality has increased, new cultivars including hybrids are increasingly derived from adapted, genetically related and elite modern cultivars. The more genetically variable but less productive primitive ancestors have been almost excluded from most breeding programmes. In a study of pedigree relationships among 140 US rice accessions, Dilday (1990) concluded that all parental germplasm in public cultivars used in the southern USA could be traced back to 22 plant introductions in the early 1900s and those used in California could be traced back to 23 introductions. The same situation is true for soybean and wheat. Virtually all modern US soybean cultivars can be traced back to a dozen strains from a small area in north-eastern China and the majority of hard red winter wheat cultivars in the USA originated from just two lines imported from Poland and Russia (Duvick, 1977; Harlan, 1987).

5.1.2 Genetic vulnerability

Genetic vulnerability is the potentially dangerous condition which results from a narrow genetic base. One of the most tragic cases on record caused by genetic vulnerability is the Irish potato famine of the 1840s in which more than one million Irish starved to death as a consequence of a massive attack of late blight (Phytophthora infestans) that destroyed the Irish potato crop. The potato had been the main staple of the Irish diet for the preceding centuries. The underlying cause of the catastrophe

was the narrow genetic base of the potato plants in that country; all had originated from a small quantity of uniform materials brought from Latin America in the 16th century. Other famous examples include the coffee rust epidemic in Ceylon (1868) and the Southern corn leaf blight epidemic in the USA (1970).

Genetic vulnerability stems from genetic uniformity, examples of which are homozygosity (often recessive) as a result of clonal reproduction and the formation of F1 hybrids from inbred parents (e.g. hybrid maize). The types of uniformity desired in a crop are: (i) rapid and uniform germination of seeds; (ii) nearly simultaneous flowering and maturation; (iii) stature that promotes mechanical harvest; (iv) product uniformity for taste, flavour and chemical composition; and (v) year-to-year stability of yield (Wilkes, 1993). With the substitution and consequent loss of a primitive cultivar, the genetic diversity contained in it is eliminated. To prevent such losses, samples of the replaced landraces should be adequately conserved for possible future use. The tendency to eliminate the genetic diversity contained in primitive landraces of plants jeopardizes the possible development of future cultivars adapted to tomorrow's unforeseeable needs.

As a few elite cultivars have come to dominate the major crops worldwide, genetic vulnerability has increased. As Wilkes (1993) indicated, we are now promoting a carpet of closely related dwarf-stature cultivars across the grain belts of the world. The magnitude of this potential is made clear by the fact that most of the hybrid rice planted in Asia now shares the same maternal cytoplasm and most of the high-yielding bread wheat cultivars are presently based on only three types of cytoplasm. There are many more such examples in other important crops. The burden of genetic vulnerability has been placed primarily on the shoulders of plant breeders because elements in the technology of plant breeding can be designed to minimize its impact, for example by developing synthetic or composite cultivars and multi-lines. Biotechnology, including genomics, promises the potential to both enhance and further endanger diversity. As a double-edged sword, biotechnology can enhance or jeopardize the greater utilization of genetic resources.

5.2 The Concept of Germplasm

5.2.1 A generalized concept of germplasm

Germplasm can be defined as the genetic materials that represent an organism. The expression 'plant genetic resources' usually refers to the sum total of genes, gene combinations or genotypes embodied as cultivars that are available for the genetic improvement of crop plants. Following the proposal of Harlan and de Wet (1971), plant genetic resources were classified into three gene pools that reflected the increasing difficulties in carrying out sexual crosses and obtaining viable and fertile progenies. Gene Pool I includes the crop species itself and its wild progenitor. Crosses within Gene Pool I can generally be made easily and the resulting progeny is viable and fertile. This gene pool corresponds closely to the biological species concept. Gene Pools II and III include other species that are less related to the crop species of interest. Crosses between Gene Pools I and II are possible but are usually more difficult to achieve. The progeny shows reduced viability and fertility. Finally, crosses between Gene Pools I and III are the most difficult. Special techniques such as tissue culture and embryo rescue must be used to obtain progeny from these crosses. The progeny often show a severe reduction in viability and fertility. The operational definition of Harlan and de Wet (1971) has been very useful because it reflects the realities of the breeding process, particularly the introduction of new genetic diversity into the populations of a breeding programme by sexual hybridization (Gepts, 2006). However, it could be argued that this definition may need to be expanded to include a Gene Pool IV based on the advances in scientific technology

and increased awareness of the benefits of biodiversity in general. Availability of plant transformation techniques (as discussed in Chapter 12) has extended the reach of plant breeding beyond the limitations imposed by sexual cross-compatibility and, as a result, Gene Pool IV should include all organisms as a potential source of genetic diversity.

Classical germplasm can be defined according to reproduction systems to include seeds from sexual plants and all types of tissue such as roots, stems and other organs that can be used for reproduction in asexual plants. Therefore, germplasm is traditionally defined as a morphologically distinct biological object. Different plant species or cultivars from the same species can be distinguished from each other morphologically based on size, colour and shape. In the case of sexual plants, seeds are the major carrier of germplasm and for most plant species germplasm can be maintained and reproduced by the collection and regeneration of seeds. Seeds are of major importance in the process of germplasm management and are collected, maintained and reproduced. Evaluation and utilization of germplasm is dependent on the seeds that can be used to generate plants and on other useful organs such as roots, leaves, stems or even seeds themselves.

With the development of molecular biology, the concept of germplasm has been generalized and broadened. As a carrier of genetic material, germplasm can be anything that carries genetic information required for controlling and rebuilding an organism, which includes genes and their clones, chromosome segments and even pieces of functional DNA sequences. The generalization of the concept of germplasm depends on two major developments: cell totipotency (the potential for regenerating a whole plant from a single cell) and the development of the gene concept (genetic material can be traceable to a small piece of DNA that controls a biological trait and codes for a specific protein) (Xu and Luo, 2002; Fig. 5.2).

DNA as a type of genetic resource is rapidly increasing in importance. DNA from the nucleus, mitochondrion and chloroplasts is now routinely extracted and immobilized on to nitrocellulose sheets where the DNA can be probed with numerous cloned genes. With the development of PCR, specific fragments or entire genes from a mixture of genomic DNA can now be routinely amplified (Engelmann and Engels, 2002). Genetic information can be synthesized and living variants can be created or rebuilt by using DNA sequence information. These advances have led to the formation of an international network of DNA repositories for the storage of genomic DNA (Adams, 1997). The advantage of this technique is that it is efficient and simple and overcomes physical limitations or constraints. The disadvantage lies in problems with subsequent

[Figure 5.2: diagram linking Seed, Plant, Cell or tissue, DNA and Transgenic plants through growth and development, sexual reproduction or propagation, tissue culture, synthetic seed or regeneration, DNA extraction, protoplast fusion, and gene isolation and transfer.]

Fig. 5.2. Germplasm carriers and their conversion through biological, molecular and biotechnological
approaches. Modified from Xu and Luo (2002).

gene isolation, cloning and transfer (Maxted et al., 1997). A tool of potential importance is the development of collections of DNA samples in glacie from wild species.

Genome sequencing and the understanding of the function of all plant genes will have significant impacts on the conservation of plant genetic resources. The role that the genetic resources community might undertake in respect to conserving molecular genetic products has yet to be defined. As such, genebanks are facing new demands from the user community. There are an increasing number of resources being generated by the molecular community that are relevant to conservation work. Some genebanks might wish to store primers, probes and DNA libraries to facilitate their work, in addition to populations generated for gene isolation and plant improvement. In the future, users may want to receive functional DNA sequences, genes, clones or markers instead of seeds or other traditional means of transporting DNA. They may want a series of specific alleles rather than accessions where alleles are segregating. As Kresovich et al. (2002) indicated, a future-oriented analysis of these possible trends and their implications is very important in order to predict and thus prepare for the changing role of genebanks and curators.

On the other hand, the concept of germplasm is no longer defined for each species or crop plant and its relatives. With the development of technologies for gene cloning and transfer as discussed in Chapters 11 and 12, the genetic barriers that used to exist among different species or genera no longer exist and genes can be exchanged freely between different families and genera or even between plants, animals and bacteria. Useful genes identified in one species can be used to modify another species. Taxa that are evolutionarily related (e.g. grasses, legumes and members of the Solanaceae) have strikingly similar genome organizations as discussed in Chapter 2. At the fundamental level of the gene, many sequences are highly conserved across families. Therefore, users of genetic resources will acquire useful genes from repositories independent of their source (Kresovich et al., 2002). As a carrier for genetic materials, therefore, germplasm is no longer limited to a specific species. Germplasm can be managed based on the classification of genes or properties across species rather than the terminology and classification of the plant. With the new technologies that have been developed in molecular biology and genomics, the gene pool for any given species has expanded well beyond the tertiary gene pool and can be taken to include any gene from any source, perhaps in the future even new synthetic or shuffled sequences.

As germplasm collection turns increasingly to tissues, cells and DNA, methods for collecting and preserving these materials may need to be modified or revolutionized by using diversified techniques for preservation and reproduction. For example, preservation of tissues and cells can be achieved by subculture or regeneration instead of the storage of seeds. Using cloning and transformation, DNA can be transferred into other plants to obtain transgenic plants (Fig. 5.2). Germplasm management in the future will become an integrated science that is closely linked with biotechnology including tissue culture, gene cloning and transformation, molecular marker technology and synthetic seed technology. As summarized by Taji et al. (2002), there are five main areas of biotechnology that can directly assist plant conservation programmes: (i) molecular biology, particularly molecular markers: assisting in germplasm collection, aiding genebank design and accession structure and assessing genome stability, genetic diversity, population structure and distribution patterns; (ii) molecular diagnostics: assessing phytosanitary status; (iii) in vitro culture: micropropagation, slow growth and embryo rescue; (iv) cryopreservation: long-term conservation of seed-recalcitrant species, vegetatively propagated species and biotechnological products; and (v) information technology: documentation, training, transfer technology, germplasm exchange, DNA databases, genome maps, genebank inventories and international networking. These areas will be discussed in various sections of this chapter.
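
The idea that germplasm can be managed by genes or properties across species, rather than by plant classification alone, can be pictured with a small data-structure sketch. Everything in it is hypothetical and chosen only for illustration: the `Accession` record, the carrier labels, the accession identifiers and the trait names are assumptions, not a real genebank schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Accession:
    """Hypothetical genebank record; field names are illustrative only."""
    accession_id: str
    species: str
    carrier: str                 # e.g. "seed", "tissue" or "DNA"
    annotated_genes: tuple = ()  # gene or trait labels attached to the record


def index_by_gene(accessions):
    """Group accessions by annotated gene, ignoring species boundaries."""
    index = defaultdict(list)
    for acc in accessions:
        for gene in acc.annotated_genes:
            index[gene].append(acc)
    return dict(index)


# Made-up example collection spanning several species and carrier types.
collection = [
    Accession("ACC-001", "Oryza sativa", "seed", ("dwarfing", "blast_resistance")),
    Accession("ACC-002", "Triticum aestivum", "seed", ("dwarfing",)),
    Accession("ACC-003", "Solanum tuberosum", "tissue", ("late_blight_resistance",)),
    Accession("ACC-004", "Zea mays", "DNA", ("dwarfing",)),
]

by_gene = index_by_gene(collection)
species_with_dwarfing = sorted({acc.species for acc in by_gene["dwarfing"]})
print(species_with_dwarfing)
```

A query by trait label then returns material from several species and carrier types at once, which is the cross-species view of a collection that the text describes.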

5.2.2 Classical germplasm

Classical germplasm can be identified by their position in an agricultural ecosystem:

1. Commercial cultivars (cultivars in current use): these are the standardized and commercialized cultivars that have in general been created by professional plant breeders. Many of them are characterized by high productivity when subjected to intensive cultivation systems requiring heavy investment (fertilizers, irrigation, pesticides, etc.) and most by uniformity which may lead to a high degree of genetic vulnerability.
2. Advanced breeding lines: these are the materials obtained by plant breeders as intermediate products. These lines usually have a narrow genetic base because in general they have originated from a small number of cultivars or populations.
3. Landraces or traditional/primitive cultivars: these are primitive cultivars that have evolved over centuries or even millennia, have been influenced decisively by migrations and have been subjected to both natural and artificial selection. There is a large diversity between and within these cultivars that are adapted to survive in often unfavourable conditions, have low but stable levels of production and are therefore characteristic of subsistence agriculture. These primitive cultivars have been frequently used as parental lines to breed new cultivars.
4. Wild and weedy species related to cultivated species: these are either the ancestors of a cultivated crop or species that are genetically close to the crop so that genes can flow between them without much difficulty or where the genetic barriers between them can be removed by certain methods such as embryo rescue of hybrids. These germplasm resources are becoming increasingly important as genetic engineering has facilitated the transfer of genes between different plant species, families and genera.
5. Special genetic stocks: this category includes other genetic combinations such as genetic, chromosomal and genomic mutants which have been naturally or artificially produced and conserved in the collections of either geneticists or plant breeders. The material can be of value in itself or used as a tool for research.
6. Co-adapted or symbiotic organisms: in which two forms of a crop, two distinct crops or a crop and its symbiont (unrelated weeds or a legume and a nodule-forming bacterium) are grown together.

5.2.3 Artificial or synthetic germplasm

Man-made germplasm is described as artificial or synthetic and can be of the following types:

1. Organisms possessing exotic genetic materials including transgenic plants and engineered plants.
2. Organisms containing genetic modifications including variants induced by physical and chemical agents, somaclonal variants derived from tissue culture and mutants occurring naturally during the process of maintenance and production of germplasm.
3. Synthetic cultivars and species where the former includes novel cultivars derived from distant crossing with relatives such as cultivars of sorghum with maize chromosomal segments or genes and the latter includes a man-made cereal, octaploid triticale, which is derived from hybridization of wheat (Triticum aestivum) and rye (Secale cereale).
4. Variants with chromosome changes in structure and number including polyploids produced by chromosome doubling, somatic allopolyploids from cell hybridization and aneuploids which contain an abnormal chromosome number due to a missing or extra chromosome.
5. A set of individuals with a specific genetic structure as discussed in Chapter 4, including nearly isogenic lines that differ at specific genetic loci, recombinant inbred lines that are derived from continuous inbreeding of an F1 hybrid and doubled haploid lines derived from female or male gametes by gynogenesis or androgenesis using techniques such as anther culture and chromosome doubling.
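
Item 5 describes recombinant inbred lines as the product of continuous inbreeding of an F1 hybrid. The reason repeated inbreeding yields near-homozygous lines can be shown with a short calculation. This is a minimal sketch under stated assumptions: inbreeding is by strict self-fertilization, there is no selection, and the locus considered is one at which the two parents differ, so the F1 is fully heterozygous. Under those assumptions the expected heterozygosity halves with every selfing generation. The function name and the generations printed are illustrative.

```python
def expected_heterozygosity(selfing_generations: int) -> float:
    """Expected heterozygous fraction at a locus that was heterozygous in the F1,
    after the given number of selfing generations (assumes no selection)."""
    if selfing_generations < 0:
        raise ValueError("number of selfing generations cannot be negative")
    return 0.5 ** selfing_generations


# The F1 corresponds to zero selfing generations, the F2 to one, and so on.
for n in (0, 1, 3, 5, 7):
    print(f"F{n + 1}: expected heterozygosity = {expected_heterozygosity(n):.4f}")
```

Under these assumptions fewer than 1% of such loci are expected to remain heterozygous by the F8, which is why lines carried through many selfing generations can be treated as effectively fixed.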

These types of germplasm resources have become very important materials in genetics and plant breeding.

5.2.4 In situ and ex situ conservation

As discussed above, the prospect of species extinction is ever present, with or without human intervention. Therefore, the best preparation for future uncertainties is to conserve as many gene pools, species and ecosystems as possible, whether they have actual or potential utility to humankind. Use for the benefit of humankind is the strongest justification for the conservation of plant genetic resources. During recent decades awareness has been raised of the importance of conserving crop gene pools to ensure that the breeder has adequate raw materials. This process is dynamic since the breeder is continually seeking new alleles and allelic combinations to improve the performance of a crop species in its target environments.

Conservation of germplasm resources goes far beyond the preservation of a species. The objectives must be to conserve sufficient diversity within each species to ensure that its genetic potential will be fully available in the future. Conservation of germplasm resources has been generalized to include all activities relating to germplasm management such as collection, maintenance, rejuvenation and multiplication, evaluation, exchange and documentation. In this section the discussion will focus on the methods for germplasm conservation.

Conservation and control are two issues broadly related to biodiversity that have a bearing on the role of biotechnology. Conservation refers to the maintenance or enhancement of biodiversity, particularly the plant species, and control refers to accessing this diversity. Two basic conservation strategies, in situ and ex situ conservation, each composed of various techniques, are employed to conserve genetic diversity for various research and development programmes including plant breeding. In situ conservation refers to the preservation of genetic resources within the evolutionary dynamic ecosystems of their original or natural environment including conservation in nature reserves and on the farm. This type of conservation control is most suited to wild related species. Ex situ conservation entails removing germplasm resources (seed, pollen, sperm and individual organisms) from their original habitats and preserving them in botanical gardens or gene/seed banks. Examples of different methods of ex situ conservation include field genebanks, seed storage, pollen storage, in vitro conservation and DNA storage.

There is an obvious fundamental difference between these two strategies: ex situ conservation involves the sampling, transfer and storage of target taxa remotely from the collection areas whereas in situ conservation involves the designation, management and monitoring of target taxa where they are encountered (Maxted et al., 1997). Another difference lies with the more dynamic nature of in situ conservation as opposed to the more static nature of ex situ conservation.

Each technique has its own advantages and limitations. The major drawback of ex situ conservation is that the evolution of species would be frozen since no further adaptation to environmental or biotic stresses indigenous to their origin can take place and that the processes of selection and continuous adaptation to those local habitats are halted. Other disadvantages are that long-term integrity of the germplasm remains in question and that high rates of mutation exist among the ex situ stored plants. Further drawbacks are the occurrence of genetic drift (random loss of diversity due to the fact that the samples collected and multiplied are necessarily very small) and selection pressure (the materials are usually multiplied in ecogeographical areas which differ from those where they were originally collected).

In situ techniques allow the conservation of greater inter- and intraspecific genetic diversity than is possible in ex situ facilities. They also permit continued evolution and adaptation to take place, whether in the wild or on the farm where selection by man also plays a critical role. For some species such as many tropical trees, it is the only feasible

method of conservation. The main drawback orthodox seed for several reasons. First,
is the difficulty in characterizing, evaluating some species do not produce seeds at all
and assessing genetic resources and suscep- and consequently are propagated vegeta-
tibility to hazards such as extreme weather tively; these include banana and plantain
conditions, pests and diseases. In addition, (Musa spp.). Secondly, some species such
the monetary expense may be quite high, as potato, other root and tuber crops such
especially where there is pressure for alterna- as yams (Dioscorea spp.), cassava (Manibot
tive uses of the land. The method selected for esculenta), sweet potato (Ipomoea batatas)
in situ conservation depends on the nature of and sugarcane (Saccharum spp.) either
the species. Traditional crop cultivars may be have some sterile genotypes and/or do not
conserved on the farm while undomesticated produce orthodox seed. However, if they
relatives of food crops may require land to be are capable of seed production, these seeds
set aside as reserves (Hawtin, 1998). are highly heterozygous and are therefore
In situ conservation is especially appropriate for wild species and for landrace
materials on the farm while ex situ conservation techniques are particularly appropriate
for the conservation of crops and their wild relatives (Engelmann and Engels, 2002).
In situ conservation of biodiversity enables the preservation of the knowledge of
farming systems, including biological and social knowledge associated with them. Ex situ
conservation, on the other hand, divorces the biological from the social context.

In situ and ex situ systems for conservation of germplasm resources should be
considered as complementary and not antagonistic. The current approach is to combine
both methods of conservation depending on such factors as reproductive biology, nature
of the storage organs and propagules and availability of human, financial and
institutional resources (Bretting and Duvick, 1997). Many major food plants produce seeds
that undergo maturation drying or can be dried to low water content due to their
tolerance to extensive desiccation and can therefore be stored dry at low temperatures.
Seeds of this type are known as orthodox (Roberts, 1973). Storage of such orthodox seeds
is the most widely practised method for ex situ conservation of plant genetic resources
and about 90% of the 6.1 million accessions stored in genebanks are maintained as
orthodox seed (Engelmann and Engels, 2002). For most species, stored seed is the most
genetically relevant, i.e. it is the raw material with which the breeder works and seed
propagation is an integral part of the growth cycle of the crop.

There are a significant number of crops which fall outside the category of

of limited utility for the conservation of particular genotypes. These crops are
usually propagated vegetatively to maintain the genotypes as clones (Simmonds, 1982).
Thirdly, a considerable number of species, predominantly tropical or subtropical in
origin such as coconut, cacao and many forest and fruit tree species, produce seeds which
do not undergo maturation drying and are shed at relatively high moisture content.
Such seeds are unable to withstand desiccation and are often sensitive to chilling.
Seeds of this type are called recalcitrant and need to be kept in moist, relatively warm
conditions to maintain viability (Roberts, 1973; Chin and Roberts, 1980). Even when
recalcitrant seeds are stored in an optimal manner, their lifespan is limited to weeks
or occasionally months.

Other conservation methods are needed for these recalcitrant species. These include
conservation as living collections in field genebanks as described above for in situ
conservation or in vitro conservation either as living plantlets, plant tissue on
appropriate media often under conditions of slow growth or by cryopreservation at very
low temperatures, generally using liquid nitrogen. For those problem species whose seeds
do not survive under conventional storage conditions largely because they cannot
tolerate desiccation and die when exposed to low temperatures, the field genebank is the
conventional approach to their conservation. However, there are many drawbacks
to this, not least being that field genebanks cannot provide secure, long-term
conservation as compared to the safety and low input requirements of a seed genebank.
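The choice of ex situ method described above can be summarized as a small lookup. The following is only an illustrative sketch: the names `SeedBehaviour` and `conservation_options` are hypothetical, and treating vegetatively propagated crops in the same way as recalcitrant-seeded species is an assumption made here for simplicity rather than a statement from the text.

```python
from enum import Enum


class SeedBehaviour(Enum):
    """Storage behaviour categories used in the text (names are illustrative)."""
    ORTHODOX = "orthodox"
    RECALCITRANT = "recalcitrant"
    CLONAL = "vegetatively propagated"


def conservation_options(behaviour: SeedBehaviour) -> list[str]:
    """Return candidate ex situ methods for a given storage behaviour."""
    if behaviour is SeedBehaviour.ORTHODOX:
        # Orthodox seeds tolerate desiccation and can be stored dry and cold.
        return ["seed genebank (stored dry at low temperature)"]
    # Assumption: clonal crops are handled like recalcitrant-seeded species.
    return [
        "field genebank",
        "in vitro culture under slow growth",
        "cryopreservation in liquid nitrogen",
    ]


for behaviour in SeedBehaviour:
    print(behaviour.value, "->", conservation_options(behaviour))
```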

5.3 Collection/Acquisition

For most species the material to be collected consists of seeds, although in
some cases it may be bulbs, tubers, cuttings, whole plants, pollen grains or even
tissue samples for in vitro culture depending on the characteristics of the species
and the manner in which the material is to be conserved. Much work has been
carried out on the collection and acquisition of germplasm resources worldwide.

The centres of the Consultative Group on International Agricultural Research
(CGIAR) have the responsibility of collecting, preserving, characterizing, evaluating
and documenting the genetic resources of the cultivated and wild relatives of the
cereals (barley, maize, millets, oat, rice, sorghum and wheat), legumes (Bambara
groundnuts, chickpea, common bean, cowpea, faba bean, grasspea, lentil, pea,
groundnut, pigeonpea and soybean), roots and tubers (Andean root and tuber crops,
cassava, potato, sweet potato and yam) and Musa (both banana and plantain).
Based on the most recently available data, over 6 million accessions are stored ex
situ throughout the world; of these, some 600,000 are maintained within the CGIAR
system and the remaining 5.4 million accessions are stored in national or regional
genebanks. Nearly 39% are cereals, 15% food legumes, 8% vegetables, 7% forages,
5% fruits, 2% roots and tubers and c.2% oil crops (Scarascia-Mugnozza and
Perrino, 2002). Approximately 527,000 accessions are stored worldwide in field
(in situ) genebanks, of which 284,000 are in Europe, 10,000 in the Near East, 84,000
in Asia and the Pacific, 16,000 in Africa and 117,000 in the Americas (FAO, 1998).

There are 1500 botanical gardens (11% private) worldwide which maintain living
collections of plants. About 10% of these also have seed banks and 2% in
vitro collections. Vegetatively propagated species, forest trees, medicinal and
ornamental species, and plant genetic resources for food and agriculture which
are of local significance are usually well represented.

5.3.1 Several issues on germplasm collections

How representative a collection is compared to the entire species is a major concern of
germplasm collections. A breeder will usually look for useful agronomic characteristics
(selective sampling), whereas the population geneticist may try to collect
randomly (random sampling). It should be noted that the concept of usefulness is
relative and may vary according to the objectives and information available to the
collectors. Collections can be made more representative by analysing patterns of
ecogeographic differentiation to identify related species that comprise crop gene pools,
ensuring that 90% of the input is not being targeted to save only 10% of the known
diversity, and planning for additional exploration and collection to amplify the
collections while avoiding any duplication of effort. Since genetic erosion will not wait
for approval of pending international agreements or networking arrangements, plans for
the collection of germplasm should take into account the numbers of samples estimated to
be required by the World Resources Institute for crop gene pools, forest species,
medicinal plants, ecosystem rehabilitation and traditional underexploited plants.
Molecular markers such as randomly amplified polymorphic DNA (RAPD), restriction
fragment length polymorphism (RFLP), simple sequence repeats (SSRs) and single
nucleotide polymorphisms (SNPs) have contributed to a better understanding of the genetic
structure of gene pools and, together with techniques such as GIS, offer new potential
for mapping diversity which would help to establish representative germplasm
collections more efficiently.

Optimal sampling methodology during the collection of field germplasm requires a
clear understanding of the genetic structure of the crop species in question.
Biotechnology can help to reduce the practical impediments to efficient collecting in at
least two ways. First, biochemical and molecular characterization techniques can be used
to provide information about the availability of genetic diversity in a given collecting
area, thereby

facilitating more rational and effective sampling. Molecular markers can be used to
measure the degree of divergence within species, analyse inter- and intrapopulational
diversity and monitor genetic erosion within genebank collections. Secondly, in vitro
propagation methods can be modified for application in the field to provide new ways
of collecting problem materials.

For clonally propagated and recalcitrant seed-producing species, the materials
collected are often bulky and heavy. Furthermore, they are often soil bearing,
thereby introducing a plant health hazard. Recalcitrant seeds and vegetative explants
such as shoots, suckers or tubers have a limited lifespan and may be prone to
decomposition through microbial attack. In some cases, suitable materials for collection
may not even be available and seed may be immature or absent as a result of grazing.
However, new in vitro collecting techniques involve the principles of in vitro
inoculation and culture without the cumbersome and complex conditions that normally
pertain to the laboratory. This was originally explored for cacao buds and the coconut
embryo and was also successfully adapted for several other materials (Withers, 1993).

The observance of adequate quarantine, disease indexing and disease eradication
procedures is essential for the safe movement of germplasm from its origin to
genebanks and among genebanks and users. Clonally-propagated crops present particular
problems in that they are commonly collected in the form of vegetative propagules
that carry a relatively high risk of disease transmission. They may accumulate
systemic pathogens since they lack the pathogen filter that the seed production stage can
offer. The potential for eliminating pathogens via meristem-tip culture, sometimes
linked to other therapeutic processes such as thermotherapy, is now an important
component of the process of introducing many clonally-propagated crops into conservation
collections. The introduction of the enzyme-linked immunosorbent assay (ELISA) and
other methods based on nucleic acid, biochemical and molecular technology provides
new methods for detecting pathogens.

Wild relatives of our present crop plants, although agronomically undesirable, may
also have acquired many desirable stress-resistant characteristics as a result of
their long exposure to nature's pressures. Many recent studies using wild relatives in
genetic mapping have identified cryptic alleles that do not exist in cultivated plants
(for details see Chapter 7) which make conservation of wild species a more important
component in germplasm resources than ever before. Requirements for the development
of collection strategies suitable for wild relatives have been increasing and
genomic tools including molecular markers can help to identify the genetic diversity
and merits that exist in the wild relatives by the methods discussed in Section 5.5.

5.3.2 Core collections

As germplasm collections of major crop plants continue to grow in number and
size around the world, better access to and use of the genetic resources in collections
have become important issues. Potential users require either populations
representative of the diversity or accessions that describe particular agronomic
characters (e.g. disease resistance, drought tolerance). In either case the managers of
collections may find it difficult to meet such needs. The very size and heterogeneous
structure of many collections have hindered efforts to increase the use of genebank
materials in plant breeding. Recognizing this, Frankel (1984) proposed that a collection
could be represented by what he termed a core collection, which would represent, with
minimum repetitiveness, the genetic diversity of a crop species and its relatives. The
accessions excluded from the core collection would be retained as the reserve
collection. Construction of a core collection involves selecting approximately 10% of
the germplasm accessions to represent at least 70% of the genetic variation (e.g. Brown,
1989a, b) unless the entire germplasm collection is very large, in which case less than
10% would be necessary. This proposal was further developed by Frankel and

Brown (1984) and Brown (1989a), who outlined how to achieve core coverage of the
collection by using information regarding the origin and characteristics of the
accessions. In terms of practical use, the three major objectives of the core collection
are to set up as wide a representation as possible of the genetic diversity, to be able
to conduct intensive studies on a reduced set of genotypes, and to attempt to extrapolate
the results thus obtained to facilitate research on appropriate genotypes in the base
collection (Noirot et al., 2003).

The core proposal was a radical departure in thinking regarding genetic resources
(Frankel, 1986). Until then, the main emphasis had been on the open-ended task
of collecting as many samples as possible and securing their survival in storage,
irrespective of continuing cost and use. Frankel and Brown (1984) introduced the notion
of adequacy of sampling of the species range. Analysis of climatic, ecological and
geographical information on the species range could be used to suggest where distinctly
different environments or separated localities occurred for that species. This analysis
could be checked with the available collections and used to identify places or
habitats where collections had been excessive and others where further collection is
warranted. In this way, a complete collection can be built up, from which a core
collection can be extracted.

Using all the available data, core collections are arranged to make their entries
representative of genetic diversity. The basic procedure is to recognize groups of
related or similar accessions within the collection and sample from each group.
Presently, in the constitution of a core collection, most researchers agree on the need
for stratification prior to the sampling. In other words, the organization of the
variability in groups and subgroups should be taken into account. There are clear
benefits to the greater use of these more precise measures of genetic variation. Equally
clearly, it is costly in human and financial resources to generate these measures
so they can only be employed in a limited number of collections. Therefore, the
selection of which species and which samples to include is crucial. Since the aim is
to obtain the maximum amount of useful information from a limited sample, the use
of core collections is an obvious approach.

A general procedure for the selection of a core collection can be divided into four
steps:

• Definition of the domain: the first step in creating a core collection is defining
  the material that should be represented, i.e. the domain of the core collection.
• Division into groups: the second step is dividing the domain into groups which
  should be as genetically distinct as possible.
• Allocation of entries: the size of the core collection should be determined
  and the choice of number of entries per group should be made.
• Choice of accessions: the last step is the choice of accessions from each group
  that are to be included in the core.

Several different methods have been used to construct core collections and these
aim to represent most of the genetic diversity with the fewest accessions
possible (see for example Noirot et al., 2003). Many reports have been published
on the formation of core subsets. Hintum (1999) described one such system, the Core
Selector, to generate representative selections of germplasm accessions. Upadhyaya
and Ortiz (2001) developed a two-stage strategy for developing a mini-core
collection, again based on selecting 10% of the accessions from the core collection
representing 90% of the variability of the entire collection. In this process, a
representative core collection is first developed using all the available information
on geographic origin, characterization and evaluation data. In the second stage, the
core collection is evaluated for various morphological, agronomic and quality traits to
select a subset of 10% of the accessions from this core subset (or 1% of
the entire collection) that captures a large proportion (i.e. more than 80% of the entire
collection) of the useful variation. At both

stages in selection of core and mini-core collections, standard clustering procedures
are used to separate groups of similar accessions combined with various statistical tests
to identify the best representatives.

Molecular markers have been used to construct core subsets which preserve
as much of the diversity present in the original collection as possible (Franco et
al., 2005, 2006). Genetic markers on three maize data sets and 24 stratified sampling
strategies were used to investigate which strategy conserved the most diversity in the
core subset as compared with the original sample (Franco et al., 2006). The strategies
were formed by combining three factors: (i) two clustering methods (unweighted
pair-group method with arithmetic mean (UPGMA) and Ward); based on (ii) two initial
genetic distance measures; and using (iii) six allocation criteria (two based on the
size of the cluster and four based on maximizing distances in the core (the D method)
used with four diversity indices). The success of each strategy was measured on the
basis of maximizing genetic distances (Modified Rogers and Cavalli-Sforza and Edwards
distances) and genetic diversity indices (Shannon index, proportion of heterozygous loci
and number of effective alleles) in each core. For the three data sets, the UPGMA with D
allocation methods produced core subsets with significantly more diversity than the
other methods and were better than the M strategy implemented in the MSTRAT
algorithm for maximizing genetic distance.

Using the advanced M strategy with a heuristic search for establishing core sets,
a program known as POWERCORE has been developed (Kim et al., 2007). The program
supports development of core sets by reducing the redundancy of useful alleles and thus
enhancing their richness. The output of POWERCORE has been validated using some
case studies and the program effectively simplifies the generation of core sets
while significantly cutting down the number of core entries, maintaining 100% of the
diversity. POWERCORE is applicable to various types of genomic data including SNPs.

Based on phenotypic evaluation of economically important traits and the use of
DNA markers, studies of genetic diversity aimed at developing core collections have
been reported for several plant species. Crops with cores established at the early
stage include lucerne, barley, chickpea, clover, lentil, medic, groundnut, bean, pea,
safflower and wheat (Clark et al., 1997). Mini-core collections are reported for
crops such as chickpea (Upadhyaya and Ortiz, 2001), groundnut (Upadhyaya et al.,
2002), pigeonpea (Upadhyaya et al., 2006b) and rice (1536 accessions, D.J. Mackill,
International Rice Research Institute (IRRI), personal communication). Such efforts have
led to the identification of diverse germplasm with beneficial traits of significant
economic value being found in barley and many legume crops (Dwivedi et al., 2005,
2007; Brick et al., 2006). Table 5.1 provides examples of core collections that have been
established with a relatively large number of germplasm accessions included. Several
types of data were used for each crop, with geographic origin usually being one of the
first criteria used for selection.

In rice, methods for selecting accessions to construct a core collection were
investigated based on shared allele frequencies (SAFs) and the frequency of unique RFLP
and SSR alleles (Xu et al., 2004; Fig. 5.3). Subsets of various sizes were selected
(representing 5–50% of the US and world collections) using random selection as a control.
For each sample size, 200 replications were analysed using a re-sampling technique and
the number of alleles in each subgroup was compared with the total number of alleles
identified in the larger collection from which the subsets were sampled. A cultivar subset
(13% of the entire collection) selected on the basis of both SAFs and number of unique
alleles detected represented 94.9% of the RFLP alleles but only 74.4% of the SSR alleles.
It can be expected that selection criteria based on additional sources of information will
further improve the value and representativeness of core collections. This resource may
serve as a source of novel alleles for genetic studies and for broadening the genetic base
of US rice cultivars. In addition, the following conclusions were drawn (Xu et al., 2004):
(i) more samples were needed to represent

Table 5.1. Description of core collections in barley, cassava, finger millet, maize, pearl millet, potato, rice,
sorghum and wheat (modified from Dwivedi et al., 2007).

Crop           Description(a)                         Number of    Reference
                                                      accessions

Barley         USDA-ARS barley core collection         2,303       Bowman et al. (2001)
               Core collection                           670       Fu et al. (2005)
Cassava        Core collection                           630       Chavarriaga-Aguirre et al. (1999)
Finger millet  Core collection                           622       Upadhyaya et al. (2006a)
Maize          Chinese maize core collection           1,193       Li et al. (2004)
Pearl millet   Core collection                         1,600       http://icrtest:8080/Pearlmillet/Pearlmillet/coreMillet.html
Potato         Core collection                           306       Huamán et al. (2000)
Rice           USDA core collection                    1,801       Yan et al. (2004)
               IRRI core collection                   11,200       Mackill and McNally (2004)
Sorghum        Core collection                         3,475       Rao and Rao (1995)
Wheat          Novi Sad Core collection                  710       Kobiljski et al. (2002)
               Chinese common wheat core collection      340       Dong et al. (2003)

(a) Abbreviations: IRRI, International Rice Research Institute; USDA-ARS, United States
Department of Agriculture-Agricultural Research Service.
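The group-then-sample procedure used to assemble core collections of the kind listed in Table 5.1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any study cited here: the function name `select_core`, the proportional allocation rule and the random choice of entries within each group are all assumptions (the text notes that clustering and statistical tests are normally used to pick the best representatives).

```python
import random
from collections import defaultdict


def select_core(accessions, group_of, fraction=0.10, seed=0):
    """Stratified sampling sketch: pick about `fraction` of each group.

    accessions: list of accession identifiers (the domain)
    group_of:   dict mapping accession identifier -> group label
    """
    rng = random.Random(seed)
    # Division into groups
    groups = defaultdict(list)
    for acc in accessions:
        groups[group_of[acc]].append(acc)
    core = []
    for label in sorted(groups):
        members = groups[label]
        # Allocation of entries: proportional to group size, at least one
        n_entries = max(1, round(fraction * len(members)))
        # Choice of accessions: random here, purely for illustration
        core.extend(rng.sample(members, n_entries))
    return core


# Hypothetical collection of 1000 accessions in three groups
accessions = [f"ACC{i:04d}" for i in range(1000)]
group_of = {
    acc: "G1" if i < 500 else "G2" if i < 800 else "G3"
    for i, acc in enumerate(accessions)
}
core = select_core(accessions, group_of)   # about 10% of the collection
mini_core = select_core(core, group_of)    # about 10% of the core, i.e. 1%
print(len(core), len(mini_core))
```

Applying the same 10% rule twice reproduces the arithmetic of the two-stage mini-core strategy: 10% of 10% is 1% of the entire collection.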

[Figure 5.3: two line plots, panels A and B. x-axis: Varieties selected (%), from 5 to 50;
y-axis: Alleles detected (%), from 0 to 100; series: USA-SAF, USA-RS, World-SAF, World-RS.]

Fig. 5.3. Comparison of selection methods based on shared allele frequency (SAF) or random
selection (RS) for identifying members of a core collection in rice. Proportion of RFLP (A)
and SSR (B) alleles detected in US and World collections based on SAF or RS. Modified from
Xu et al. (2004).
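The kind of comparison summarized in Fig. 5.3 can be imitated on toy data: random subsets are resampled many times and the share of alleles they capture is compared with that of a deliberately chosen subset. The sketch below is not the SAF procedure of Xu et al. (2004); the greedy rule that repeatedly adds the accession contributing the most new alleles is a simplified stand-in, and the data set and all function names are hypothetical.

```python
import random


def allele_coverage(subset, alleles_of):
    """Fraction of all alleles in the collection found in the subset."""
    total = set().union(*alleles_of.values())
    found = set().union(*(alleles_of[a] for a in subset))
    return len(found) / len(total)


def random_coverage(alleles_of, fraction, replications=200, seed=0):
    """Mean coverage of random subsets over repeated resampling."""
    rng = random.Random(seed)
    ids = sorted(alleles_of)
    k = max(1, round(fraction * len(ids)))
    runs = [allele_coverage(rng.sample(ids, k), alleles_of)
            for _ in range(replications)]
    return sum(runs) / len(runs)


def greedy_subset(alleles_of, fraction):
    """Repeatedly add the accession that contributes the most new alleles."""
    remaining = sorted(alleles_of)
    k = max(1, round(fraction * len(remaining)))
    chosen, covered = [], set()
    while len(chosen) < k:
        best = max(remaining, key=lambda a: len(alleles_of[a] - covered))
        chosen.append(best)
        covered |= alleles_of[best]
        remaining.remove(best)
    return chosen


# Toy collection: 16 pedigree-related accessions sharing the same alleles,
# plus 4 accessions that each carry one unique allele.
alleles_of = {f"acc{i:02d}": {"x1", "x2"} for i in range(16)}
for i in range(16, 20):
    alleles_of[f"acc{i:02d}"] = {"x1", f"u{i}"}

print(allele_coverage(greedy_subset(alleles_of, 0.25), alleles_of))
print(random_coverage(alleles_of, 0.25))
```

In this toy case the chosen subset captures every allele with a quarter of the accessions, whereas random subsets of the same size capture fewer on average because most draws are redundant, related accessions.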

the world collection, which was more diverse than the US collection, which contained more
pedigree-related cultivars; (ii) combining the use of SAF and unique alleles improved
the representativeness of the core collection; (iii) core collections selected by SAF
required fewer samples than random selection for the same level of representativeness; and
(iv) more samples were needed to adequately represent genetic diversity if highly
polymorphic markers were used (e.g. SSRs versus RFLPs).

The core collection concept has aroused considerable worldwide interest and debate
within the plant germplasm resources community. It has been welcomed as a way of

making existing collections more accessible through the development of a small group
of accessions that would be the focus of evaluation and use and provide an entry
point to the large collections that it aims to represent. However, a concern that still
remains is that the available knowledge base regarding genetic diversity in any crop is
insufficient to enable a meaningful core to be developed and that the most useful
characters often occur at such a low frequency that they would be omitted from any small
core collection. Other concerns regarding core collections include rendering the
reserve collection more vulnerable to loss, the lack of representation of rare, endemic
alleles and a poor relationship with the specific needs of users (Gepts, 2006).

When molecular markers are developed from DNA sequences with unknown or no
function, identical marker alleles among collections may not necessarily mean that
these collections share identical functional alleles linked to the marker locus. Genetic
variation for important phenotypic traits could be lost if core collections are based
solely on the use of such anonymous DNA markers. As the genome sequence is
deciphered and the function of many genes is determined, gene-specific markers with
identified functional nucleotide polymorphisms (FNPs) will become available for
many genes. Core collections of germplasm constructed using FNPs could be assembled
to represent a core collection of genes.

As gene structure–function relationships are clarified with greater precision, it will
be possible to focus attention on genetic diversity within the active sites of a
structural gene or within key promoter regions. This will make it productive to screen
large germplasm collections for FNPs, targeting the search for alleles that are likely to
be phenotypically relevant at specific loci. From a primary collection, a user who had
identified an accession or accessions of interest would move to the next level of
information where clusters of germplasm known to represent a broader spectrum of
diversity within a specific gene pool, or a specific trait, could be defined. The second
level of investigation could be conducted using carefully designed sets of molecular
markers known to target specific traits or regions of the genome. The construction
of core collections using these approaches may help establish heterotic groups from
which parents can be chosen to establish populations for breeding hybrid crops.

5.4 Maintenance, Rejuvenation and Multiplication

The main task of a germplasm bank is to conserve germplasm in a state in which it
can be indefinitely propagated without loss of genetic diversity or integrity. In general,
the term base collection is applied to collections stored under long-term conditions,
whereas the term active collection is used for collections stored under medium-term
conditions and working collection refers to breeders' collections usually stored under
short-term conditions. Monitoring the health of collections, particularly of field
genebanks, assessment of accession viability and rejuvenation and multiplication of
collections are essential housekeeping functions. For most crops with seed as the
germplasm carrier, maintenance, rejuvenation and multiplication processes have been well
established. This section will focus on problem crops and the methodologies that will
become increasingly important in the field.

5.4.1 In vitro storage techniques

In the late 1970s and early 1980s, tissue culture or in vitro culture techniques had begun
to make an impact in plant physiological studies, vegetative propagation, disease
eradication and genetic manipulation. In vitro storage techniques were then recognized as
a way of conserving the genetic resources of problem crops and also of providing a
conservation method for the emerging field of plant biotechnology. Figure 5.4 provides a
flowchart for plant tissue culture from various tissues to generate plantlets. A common
factor involved in plant tissue culture is the

[Figure 5.4: flowchart leading from the stock plant to plantlets by the following routes.
 Meristem: meristem culture (meristem or shoot apex growth).
 Shoot apex: shoot tip culture (axillary branching, multiple branching).
 Nodal sections: node culture (nodal growth).
 Explants from various tissues, e.g. leaves, roots, anthers:
   direct morphogenesis, giving adventitious shoots (direct shoot formation) or
   somatic embryos (direct embryogenesis);
   indirect morphogenesis via callus growth on the explant, giving adventitious shoots
   on callus (indirect shoot formation);
   indirect somatic embryogenesis, giving somatic embryos from callus or, via
   suspension culture, somatic embryos from single cells.]

Fig. 5.4. The principal methods of micropropagation.

growth of microbe-free plant material in an aseptic environment such as on sterilized
nutrient medium in a test tube.

Along with other new technologies, in vitro techniques are now increasing the
efficiency and security of conservation for not only problem crops but many others as well.
In vitro storage techniques have the following advantages (Dodds, 1991): (i) most in
vitro systems possess the potential for very high clonal multiplication rates under
controlled environmental conditions; (ii) by generating plants through meristem culture
in combination with thermotherapy, the culture systems are aseptic, can easily be kept
free from fungi, bacteria, viruses and insect parasites and are isolated from many
external threats, thus ensuring the production of disease-free stocks and simplifying
quarantine procedures for international exchange of germplasm; (iii) due to the
miniaturization of explants they require less storage space with continuous availability
and ease of shipment; (iv) in an ideal tissue-culture storage system, genetic erosion is
reduced to zero; (v) by means of pollen and anther culture, haploid plants may be produced
that are of use in both genetics and breeding programmes (for details see Chapter 4);
(vi) they are useful in plant breeding programmes as a means of rescuing and subsequently
culturing zygotic embryos from incompatible crosses that normally result in embryo
abscission; and (vii) reduced expenses, both in labour and financial terms, as
compared to maintaining large field collections, are a further factor contributing to
the use of in vitro collections. Other advantages of in vitro techniques in germplasm
conservation include early maturation, e.g. in forest tree spp.; culturing and fusion of
inter- or intraspecific protoplasts; transformation of nuclear and cytoplasmic structures
of plant cells through insertion of foreign DNA, etc.; and production and transformation
of useful natural compounds in fermenters (biosyntheses/biotransformations) (Kumar,
1993; Ashmore, 1997).

There are some examples of the application of in vitro storage techniques.
Techniques have been developed for the collection of species that produce
recalcitrant seeds and for vegetatively propagated material, which enable a collector to
introduce the material in vitro, under aseptic conditions, directly in the field (Withers,
1995). This approach will allow germplasm collection to be made in remote areas (e.g. in
the case of highly recalcitrant cacao seeds) or when the transport of the collected fruits
would become prohibitively expensive (e.g. collecting coconut germplasm). Also in
cases where the target species does not have seeds or other storage organs to be collected
or when budwood would quickly lose viability or is contaminated, establishment
of aseptic culture in the field will facilitate collection and improve its efficiency
(Engelmann and Engels, 2002).

The disadvantages of in vitro maintenance are relatively high inputs of time
and labour for culture establishment and maintenance, potential losses due to
contamination or mislabelling, risk of microbial infection at each subculture, the
cumulative risk of somaclonal variation with time and accidental loss through equipment
failure. With certain tissue cultures, the morphogenetic potential of the cultures may
decline after growth under in vitro conditions for an extended period of time. Prolonged
maintenance of de-differentiated plant cells and tissues in vitro by repeated
subculture is expensive, time consuming, labour intensive and often results in a reduction
in morphogenic or biosynthetic capacity and changes in genetic, chromosomal or
genomic composition, such as mutations, aneuploidy and polyploidy. Somaclonal
variation resulting from subculture may manifest itself at the molecular, biochemical
or phenotypic level. However, its extent can be limited by controlling environmental
factors such as medium formulation and subculture interval, but cannot be
eliminated with certainty unless metabolism is suspended. The type of explant and the
preservation method can have a significant impact on survival and the extent of
somaclonal variation. In general, organized cultures (meristems, shoot tips and embryos)
are more stable than non-organized cultures (protoplasts, suspensions and calli). Thus,
organized cultures have a better likelihood of retaining their genetic integrity during
prolonged culture in vitro. There is a need to develop and utilize storage methods that
reduce the maintenance requirements of plant cultures, while maintaining genetic,
biochemical and phenotypic stability. For short- to medium-term maintenance, cell
and tissue cultures may benefit from some form of growth reduction, but for long-term
maintenance, growth suspension must be recommended.

In vitro culture requires strict control of environmental conditions such as medium
constituents and often such conditions are not immediately applicable to a wide
range of species or even to every selection within a species. Thus, culture conditions
often need to be developed for each particular species, subspecies or even culture of
interest. The International Board for Plant Genetic Resources (IBPGR, now known as
Bioversity International) published general recommendations for in vitro storage,
including recommendations on the design and operation of culture facilities (IBPGR,
1986). Extensive research has been carried out to reduce growth rates by reducing
the components in the culture medium and modifying the physical environment.
Modification of the gaseous environment by mineral oil overlay or control of the gas
balance can retard growth. However, the most practical and effective slow growth
methods to date involve reducing the culture temperature and/or adding osmotic
retardants to the culture medium. Cultures of many species can be maintained in this
way for 6 months to 2 years without the need for subculturing.

5.4.2 Cryopreservation

Efforts have been made to reduce or eliminate the risks discussed above by the
development of slow growth methods for medium-term storage and cryopreservation

for long-term storage (Withers, 1993; servation has concentrated on organized


Harding, 2004) which is considered to be cultures such as shoot tips, shoots and
the most effective way of preserving in vitro zygotic embryos. Some plant materials have
cultures for extended periods. Cryogenic survived well under conventional cryop-
storage is achieved at below −130°C, where reservation while others have remained
liquid water is absent and molecular kinet- completely unresponsive. There is a large
ics and diffusion rates are extremely low. In group of materials that survive to some
practice, this refers to storage in or over liq- degree at the cell and tissue level, but with
uid nitrogen (196C liquid phase, 150C some physical damage. In these cases cal-
vapour phase). Under conditions of cryop- lusing, which has the attendant risks of
reservation, metabolism ceases and there- somaclonal variation, is employed instead.
fore, time effectively stops. Advantages of Recent efforts have increased the number of
cryopreservation include indefinite storage species that can be cryopreserved as shoots
without subculturing, frequent viability and have improved the quality of cryopre-
testing or plant generation and maintenance served specimens. However, progress has
of the biosynthetic and regeneration capac- been slow and has involved a relatively nar-
ity of cultures over time. The only threats to row range of species. Recalcitrant seeds
the survival and integrity of material once present a dual problem in cryopreservation:
conditions have been determined for safely they are often large and therefore prone to
conveying it to and from the storage tem- structural injury and they are very sensitive
perature are accidental thawing under sub- to dehydration. Perhaps the most promising
optimal conditions and free radical damage and intriguing of the new developments in
induced by background irradiation (Benson, cryopreservation involves artificial seed
1990). The former risk can be minimized by technology. Dereuddre et al. (1991) pio-
appropriate equipment, backup systems neered a technique involving encapsula-
and laboratory procedures. The latter can tion, dehydration and cryopreservation of
be minimized by screening and application shoot tips or somatic embryos. These are
of free radical-scavenging cryoprotectants. encapsulated in an alginate gel, dehydrated
The success of cryopreservation depends in air or by incubation in a hypertonic
on many factors such as the starting mate- sucrose solution and cooled rapidly. This
rial and its precondition, the cryoprotect- can give much higher survival levels and
ant treatment and the freezing and thawing result in less structural damage than con-
rates. Evidence suggests that genetic stabil- ventional approaches.
ity is maintained in cryopreserved materi-
als and any genetic damage probably occurs
during the actual freezing and thawing
phases, rather than during storage. 5.4.3 Synthetic seeds and storage of DNA
The first successes in plant cell cryo-
preservation were achieved in the early Two other techniques may be useful for con-
1970s. Cell suspension cultures have been serving genetic materials that are difficult
among the most amenable materials to this to conserve with conventional techniques.
method of preservation. The procedure for These are the production and preservation
these materials involves the following of synthetic seeds and the preservation of
stages: pre-growth, cryoprotection, cooling nucleic acids (DNA). Storage of DNA is in
(protective dehydration), storage, increas- principle, simple to carry out and widely
ing the temperature, post-thaw treatment applicable; however, it in no way replaces
and recovery growth. Cryopreserved embry- or solves the problem of the storage of germ-
ogenic suspension cultures from which plasm since at this point whole organisms
plants can be regenerated are a potentially cannot be regenerated from DNA. It may
valuable tool for conservation. Because of be viewed instead as complementary. As
concerns regarding genetic stability, most techniques in biotechnological approaches
cryopreservation research for genetic con- to breeding progress however, it is possible

to envisage a role for the discrete collection of genes governing particular traits. Furthermore, as the barriers between gene pools are reduced through biotechnology, thereby facilitating the selective transfer of genes, relevant materials stored in the form of DNA may not necessarily originate from the target crop's own gene pool.

Plant preservation initiatives have, by necessity, focused on the conservation of species and landraces of international agricultural importance and, to a lesser extent, endangered and threatened species. Conversely, little effort has been made to systematically collect and preserve the increasing number of genotypes being developed with biotechnological applications (Owen, 1996). Since these elite cultures are being maintained by individual researchers or laboratories, they are in danger of being lost. Owen (1996) highlighted several methods for the maintenance and storage of plant germplasm, with particular emphasis on those techniques most applicable to the preservation of elite cultures used in biotechnology and the plants derived from them.

Somatic and zygotic embryos have been suggested as useful propagules for preservation. Somatic embryogenesis involves adventitious propagation and produces cultures that are convenient to handle and amenable to some technologies such as artificial seed production, which can be linked to storage by cryopreservation. Preservation of some recalcitrant species has been made possible by the observation that excised embryos behave in an orthodox manner and can be cryopreserved. Research has also been conducted to determine the feasibility of using desiccated somatic embryos or encapsulated somatic embryos. Preservation of these synthetic seeds would be especially useful for the preservation of clonal lines; however, more research is needed to elucidate how to increase viability after drying and to inhibit precocious germination.

The total genetic information of a plant can be readily isolated and DNA segments can be stored in lyophilized form. Thus, it would be a useful method for the storage of genes of interest to gene transfer technologies, similar to stored DNA. Pollen storage would also be a useful adjunct for germplasm preservation of lines developed for breeding programmes; however, cytoplasmic genes may not be conserved. Pollen storage can preserve genes, but may not preserve desirable gene combinations.

5.4.4 Rejuvenation and multiplication

The loss of the germination capacity of stored seeds necessitates their periodic rejuvenation. As seed ages, before losing germination capacity, mutations increase and, if rejuvenation of the material does not take place within a particular period of time, the genetic structure of the population can vary. The multiplication site should have ecological characteristics similar to those where the material was collected in order to prevent selection that can change the allelic frequencies, even eliminating those alleles most sensitive to certain soil–climate factors.

Tissue culture can be applied to mass produce carbon copies of a selected (elite) plant whose agronomic characteristics are known (Fig. 5.4). It allows propagation of plant material with high multiplication rates in an aseptic environment. Since the 1970s, in vitro propagation techniques, mainly based on micropropagation and somatic embryogenesis, have been extensively developed and applied to thousands of different species.

As indicated previously, in vitro propagation techniques can be used for rapid clonal multiplication of germplasm in the vegetative form and also for other materials such as recalcitrant seeds, which are often available in relatively small numbers. Preference is generally given to propagation methods that confer the lowest risk of somaclonal variation in culture, such as shoot or meristem cultures reproduced by non-adventitious means. However, other factors must be included in the total equation; these include availability of the preferred propagation technique, ease and rate of multiplication and amenability to storage.

5.5 Evaluation

Just having thousands of germplasm accessions available is not helpful if they are not appropriately utilized. The evaluation of germplasm resources is a prerequisite for their utilization in crop improvement. Vast genetic resources are available for crop plants, but, to date, few of them have been well characterized, either phenotypically or genotypically. Of those that have been characterized, it is usually only for a small number of traits and usually only on a phenotypic level; on a genotypic level the available information on useful traits is even less abundant. However, as the need for readily available genetic and genomic information to use in targeted applications has grown, so has interest in and support for molecular evaluation of genebank materials. The evaluation of a population or germplasm accession starts at the moment of collection and should never actually end. The term 'descriptor' is used increasingly often in referring to each of those characters considered important and/or useful in the description of a population or accession. Species descriptors differ according to whether they have been selected by plant breeders, botanists, geneticists or experts in other disciplines.

Genetic evaluation and utilization of seed-based germplasm has focused on the characteristics of the seed itself and on whole plants such as morphological and physiological traits, stress tolerance and quality. Most traits scored first are easy to score, often by eye. Genetic evaluation of germplasm resources has been broadened with the development of plant breeding programmes incorporating molecular biology. Morphological evaluation can be extended from several traits of economic importance to almost all traits that differ among germplasm accessions. This paradigm for germplasm characterization is based on the evaluation of phenotypic variation of entries from a genebank for a clearly defined characteristic that is recognizable in the whole plant. This approach works well when the phenotype is controlled by major genes. For traits such as yield, which is genetically controlled by many genes, it is more difficult to distinguish accessions by phenotypic evaluation alone because the same phenotype may be controlled by different genes and vice versa. As a result, exotic germplasm which is perceived to be a poor bet for the improvement of most traits based on phenotypic examination, may contain some superior genes (alleles) for the improvement of most traits, but they lie buried amid the thousands of accessions maintained in genebanks (e.g. de Vicente and Tanksley, 1993; Xiao et al., 1998). Visual evaluation is not efficient enough to identify all of these features. For example, evaluation of anatomic and quality characteristics can be upgraded by using new techniques to reveal the novel differences that cannot be identified by eye.

As types of conserved germplasm extend to include tissues, cells and DNA, additional evaluation criteria and methods are needed. For example, specificity of tissues or cells can be determined by their responses to specific culture media. DNA samples can be evaluated by their physical and chemical properties such as absorbance spectra, electrophoretic separation and staining reactions. Even for seed-based germplasm, molecular biology provides many novel methods for evaluating germplasm at the cellular, chromosomal and molecular levels; for example, identification of chromosome abnormalities and differences at the DNA level. As a result, germplasm will be evaluated not only at morphological and physiological levels but also at a multidisciplinary level that includes molecular biology. Among all the available new techniques, the most feasible is molecular marker technology, which is based on DNA differences and/or its integration with quantitative trait loci (QTL) analysis. As described in Chapters 6 and 7, this technique allows a more precise identification and definition of genes, alleles and the useful traits they underlie and can be used to identify and extract superior genes (alleles) from inferior germplasm, thus allowing germplasm evaluation to move from morphological and physiological levels to biochemical and molecular levels.

5.5.1 Marker-assisted germplasm evaluation

Marker-assisted germplasm evaluation (MAGE) aims to complement phenotypic evaluation by helping to define the genetic architecture of germplasm resources and by identifying and managing germplasm that contains alleles associated with traits of economic importance (Xu, Y. et al., 2003). Molecular markers may allow for characterization based on genes, genotypes and genomes, which provides more precise information than classical phenotypic or passport data. Molecular marker data can be used to answer questions of identity, duplication, genetic diversity, contamination and integrity of regeneration. In addition, molecular markers are extremely powerful for identifying zygosity at important loci in species which are vegetatively propagated such as potato, sugarcane, taro and sweet potato. Many features revealed by molecular markers, such as unique alleles, allele frequency and heterozygosity, mirror the genetic structure of germplasm resources at the molecular level (Lu et al., 2009). On a more fundamental level, molecular marker information can lead to the identification of useful genes contained in collections and aid in the transfer of these genes into well-adapted cultivars. MAGE can play an important role in the procedures related to the acquisition/distribution, maintenance and use of germplasm (Bretting and Widrlechner, 1995; Xu, Y. et al., 2003). As summarized by Xu, Y. et al. (2003), molecular markers can be used for: (i) differentiating cultivars and constructing heterotic groups; (ii) identifying germplasm redundancy, underrepresented alleles and genetic gaps in current collections; (iii) monitoring genetic shifts that occur during germplasm storage, regeneration, domestication and breeding; (iv) screening germplasm for novel genes or superior alleles; and (v) constructing a representative subset or core collection.

The realization of the importance of MAGE led to the formation of the Generation Challenge Programme (GCP) (http://www.generationcp.org) (Dwivedi et al., 2007). The GCP aims to utilize molecular tools and comparative biology to explore and exploit the genetic diversity housed in existing germplasm collections with a particular focus on improving the drought tolerance of various cereals, legumes and clonal food crops. One of the primary goals of the GCP is the extensive genomic characterization of global crop-related genetic resources (composite collections), initially using SSR markers to determine population structure and now moving on to whole genome scans (including SNP and diversity array technology (DArT) arrays) and functional genomics analysis of subsets of germplasm (mini-composite collections). Thus, the GCP has created composite collections covering global diversity for most of the 20 CGIAR mandated crops. These consist of up to 3000 accessions or no more than 10% of the total number of available accessions for inbreeding crops and 1500 accessions for outcrossing species (where each accession must be treated as a population). It is expected that this analysis will also lead to the development of genetically broad-based mapping and breeding populations. The results from these GCP-supported projects are already starting to be made available for the benefit of the scientific community. Furthermore, the GCP is supporting a project on allele diversity at orthologous candidate (ADOC) genes that will produce and deliver a public data set of allelic diversity at orthologous candidate genes across eight important GCP crops and assess whole sequence polymorphism in a DNA bank of 300 reference accessions for each crop. This reference germplasm, which has already undergone one level of genome scan, will be evaluated for traits associated with drought tolerance to test for associations between observed polymorphisms and trait variability (http://www.intl-pag.org/14/abstracts/PAG14_W264.html).

Molecular markers can be used for germplasm management in different ways. Markers with known functional alleles or associated with agronomic traits can be used to trace, select and manage these alleles or traits. Genetic markers that reveal multiple bands or represent multiple loci such as RAPD or amplified fragment length

polymorphism (AFLP), are usually difficult to trace back to specific alleles/loci, so they need to be converted into markers that are locus-specific such as sequence tagged sites (STSs), SSRs or SNPs. Neutral markers or markers in unknown chromosomal regions can be used for fingerprinting and background examination. In this case any type of marker that detects a high rate of polymorphism is useful as long as it is able to reveal genome-wide polymorphism. Spooner et al. (2005) provided some examples of the use of molecular markers in genebank management, including assessing the level of redundancy within and between collections and the genetic integrity of accessions during the course of genebank operations such as regeneration as well as the presence and magnitude of gene flow. As discussed in Chapter 9 and by Xu, Y. (2003), an efficient MAGE system consists of several key components.

MAGE largely depends on multivariate analysis of DNA genotypes. There are several questions that need to be answered for each experiment: Which entities should be sampled? What is the nature of the genetic material to be sampled? How should heterogeneous, segregating populations be sampled? What types of variables should be measured? How many variables (e.g. markers) should be measured? Should analyses be carried out on raw multivariate data or derived genetic similarities? Among these questions, the most frequent might be how many markers are enough for a genome-wide MAGE; the answer, however, depends on the questions that MAGE is expected to answer, and it also depends on the type of marker being used. Smith et al. (1991) used 200 RFLP markers dispersed across the maize genome to fingerprint 11 inbred lines (the genetic distance matrix comprised 55 elements). They estimated distance matrices by sampling 5–200 RFLP markers in increments of five (e.g. from 5, 10, 15, up to 200). They concluded that accuracy was sufficient with 100 or more markers. Bernardo (1993) concluded that 250 or more marker loci were needed to produce precise estimates of coefficients of co-ancestry. As a rough estimation, the number of markers that are needed to detect linkage disequilibrium between any two markers in the genome can be used to judge how many markers are needed for a genome-wide MAGE, which is apparently crop-dependent and also higher than in most MAGE projects that have been reported if genome-wide germplasm is evaluated within closely related germplasm or populations at gene or sequence level. With the development of SNP markers covering whole genomes and high-throughput array-based genotyping systems (Chapter 3; Lu et al., 2009; Yan et al., 2009), using markers developed from all candidate genes for all available germplasm collections can be a realistic target in the near future.

5.5.2 In vitro evaluation

The basis for obtaining unique plants from cell culture resulted from the observation that plant cells in culture are genetically variable. Cell culture was envisioned as a means of preferentially selecting cell lines with a mutation to cope with a specific selective agent. Plants generated from such cell lines in many cases expressed the new traits at the whole-plant level, giving enhanced agronomic value for that specific trait. There was a correlation between cell-level response to a selective agent and that of the entire plant. By using a specific selective agent to preferentially screen for a specific cell line, the number of plants involved in whole-plant level screening can be reduced. Techniques useful in the characterization and evaluation of cultures for genetic stability include nuclear cytology, isozyme analysis, DNA marker-based analysis, as well as other molecular and biochemical methods. As large populations of cells can be screened in vitro in a small space as compared to traditional screening of large plant populations in the greenhouse or field, this could save a vast amount of time, space, labour and money and allow for screening all year round in a controlled laboratory. Also, in vitro screening could reduce some of the problems conventional screening may encounter due to environmental variation and poor uniformity within a field.
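The marker-number question raised for MAGE in Section 5.5.1 can be explored by resampling, in the spirit of the Smith et al. (1991) study: random subsets of markers of increasing size are drawn, the pairwise distance matrix among lines is recomputed for each subset and the result is compared with the matrix obtained from the full marker set. The following is a minimal Python sketch under stated assumptions. The genotype scores are simulated binary (band present/absent) data rather than real RFLP data, the distance is a simple mismatch proportion, and the function names (pairwise_distances, subsample_correlation) are hypothetical.

```python
import itertools
import random


def pairwise_distances(genotypes, marker_idx):
    """Mismatch proportion for every pair of lines over the chosen markers."""
    dists = []
    for a, b in itertools.combinations(range(len(genotypes)), 2):
        mismatches = sum(genotypes[a][k] != genotypes[b][k] for k in marker_idx)
        dists.append(mismatches / len(marker_idx))
    return dists


def pearson(x, y):
    """Pearson correlation between two equal-length lists (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def subsample_correlation(genotypes, sizes, reps=20, seed=1):
    """Mean correlation between subset-based and full-set distance vectors."""
    rng = random.Random(seed)
    n_markers = len(genotypes[0])
    full = pairwise_distances(genotypes, range(n_markers))
    result = {}
    for size in sizes:
        cors = []
        for _ in range(reps):
            subset = rng.sample(range(n_markers), size)
            cors.append(pearson(pairwise_distances(genotypes, subset), full))
        result[size] = sum(cors) / reps
    return result


# Simulated data (an assumption): 11 lines scored at 200 binary markers.
data_rng = random.Random(0)
lines = [[data_rng.randint(0, 1) for _ in range(200)] for _ in range(11)]

# 11 lines give 11 * 10 / 2 = 55 pairwise distances, as in the text.
n_pairs = len(pairwise_distances(lines, range(200)))

# Marker subsets of 5, 10, 15, ..., 200, mirroring the increments of five.
curve = subsample_correlation(lines, sizes=range(5, 201, 5))
```

Plotting the values in curve against subset size shows where additional markers stop changing the distance estimates appreciably. With real data the shape of that curve, and hence the number of markers judged sufficient, will depend on the marker system and the germplasm sampled.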

Plants lose many of their distinguishing phenotypic characteristics when transferred to in vitro culture. Therefore, accurate records and vigilance are essential to ensure that genetic integrity is preserved. The risk of somaclonal variation can be reduced by monitoring, but frequent regeneration of plants from stored cultures and monitoring them in the field is costly and inefficient. Techniques are needed that can be applied easily and economically to cultured material. Visual monitoring in vitro will only detect the most gross of variants, e.g. variegated leaves or extreme dwarfism. More accurate, wide-ranging and reliable monitoring may be achieved by biochemical methods and molecular marker techniques. Minute changes at the DNA level can be detected with molecular markers that provide an ideal means of determining genetic integrity. It may become possible to detect undesirable variants in an overnight procedure at the culture stage, thereby eliminating the need for establishing in vitro propagated plantlets in the field for monitoring.

5.5.3 Genetic diversity

New genetic technologies, especially large-scale DNA sequencing (Chapter 3), have led to the development of molecular systematics and new methods of measuring genetic similarity and divergence in plant species and populations. It is now possible to compare organisms from the genome level (using, for example, fluorescent in situ hybridization or FISH) down to the level of single nucleotides (DNA sequencing and SNPs). Molecular markers have been used for genetic diversity studies for the following purposes: (i) examination of genotype frequencies for deviations at individual loci and characterization of molecular variation within or between populations; (ii) construction of phylogenetic trees or classification of germplasm accessions based on genetic distance and determination of heterotic groups for hybrid crops; (iii) analysis of the correlation between the genetic distance and hybrid performance, heterosis and specific combining ability; and (iv) comparison of genetic diversity among different groups of maize germplasm. Taking maize as an example, some applications in these areas can be found in Melchinger (1999), Warburton et al. (2002), Betrán et al. (2003), Reif et al. (2004), Xia, X.C. et al. (2005) and Lu et al. (2009). Such studies have provided useful information for genebank curation, gene identification and breeding.

Understanding the range of diversity and the genetic structure of gene pools is critical for the effective management and use of germplasm resources. The first question to ask might be about the distinctiveness of the concerned entities since the issue of what level of diversity we should actually try to maintain is still under debate. Some have argued that highly unique entities should be given preference over equally rare taxa with close relatives of abundant distribution (Vane-Wright et al., 1991) while others argue that evolutionary potential is highest in species-rich groups since the ability to adapt is seemingly greater (Erwin, 1991). On the other hand, the importance of species versus subspecies, hybrids and populations has generated considerable debate about the scientific legitimacy of legal conservation units (O'Brien and Mayr, 1991). Therefore, as indicated by Hahn and Grifo (1996), the first measures to be taken with molecular methods are taxon-specific markers and estimation of the degree of differentiation between units.

Diversity studies are generally undertaken using molecular markers that are assumed to be neutral, that is, not within expressed regions of the DNA. The correlation between molecular variation and quantitative variation in expressed traits has rarely been studied in detail but is an issue that must be addressed if studies in genetic diversity are to be used more effectively in biodiversity assessment and conservation (Butlin and Tregenta, 1998). Across a large genome, such as that of maize, diversity can accumulate so that 150 million sites are commonly polymorphic. A small but important proportion of these polymorphisms is responsible for the complex variation in phenotypic traits. Molecular markers have increased our understanding of the spatial and temporal patterns of genetic variation and of the evolutionary mechanisms that generate and maintain variation. However, the direct benefit of these data to either practical biodiversity conservation or germplasm collection management is equivocal (Harris, 1999).

Several past studies have highlighted the decline of genetic diversity in modern cultivars compared to landraces or wild relatives. In maize, for example, Liu et al. (2003) evaluated the genetic diversity among 260 diverse maize inbred lines with 94 SSR markers and found that tropical and subtropical inbreds contain a greater number of alleles and gene diversity than temperate inbreds. It was also found that maize inbreds capture less than 80% of the alleles seen in the landraces, suggesting that landraces can provide substantial additional genetic diversity for maize breeding. After analysing over 100 maize inbred lines and teosinte accessions with 462 SSRs, Vigouroux et al. (2005) concluded that many alleles in the progenitor species of maize (teosinte) are not present in maize. Wright et al. (2005) compared SNP diversity between maize and teosinte in 774 genes and concluded that maize accessions had much less genetic diversity, consistent with products of artificial selection and crop improvement. These reports in maize, along with genetic mapping studies involving wild relatives in other crops, support earlier conclusions that non-adapted and wild related species contain untapped sources of new alleles for future crop breeding improvement (Tanksley and McCouch, 1997).

Factors impacting genetic diversity

The extent of polymorphism differs substantially between species and sampled loci. In a comprehensive study of variation within a maize chromosome, the diversity at 21 loci varied by 16-fold (Tenaillon et al., 2001). The variation between loci may partly reflect sampling effects but selection and other factors play a more important role (Table 5.2). Although many factors influence diversity, the neutral theory of evolution suggests that the level of polymorphism ($\theta$) should be the product of the effective population size ($N_e$) and the mutation rate ($\mu$) with $\theta = 4N_e\mu$ (Kimura, 1969). Unfortunately, there is little empirical proof of this in plants. Background selection is likely to be one of the major factors determining nucleotide diversity and it suggests that diversity should be shaped by recombination at the intragenomic scale and by the outcrossing rate at the species level. Strong selection pressure is important in decreasing the nucleotide diversity of some plant species. During the selection of advantageous phenotypes, some crops appear to have passed through bottlenecks that substantially reduced diversity

Table 5.2. Factors that impact nucleotide diversity (reprinted from Buckler and Thornsberry (2002) with permission from Elsevier).

Factor                     Correlation with diversity   Scope
Mutation rate              Positive                     Often whole genome
Population size            Positive                     Whole genome
Outcrossing                Positive                     Whole genome
Recombination              Positive                     Whole genome
Positive-trait selection   Negative                     Individual genes
Line selection             Positive                     Whole genome
Diversifying selection     Positive                     Individual genes
Balancing selection        Positive                     Individual genes
Background selection       Negative                     Individual genes or whole genome
Population structure       Mixed                        Whole genome
Sequencing errors          Positive                     Individual genes
PCR problems               Negative                     Individual genes

(Doebley, 1992). Balancing selection and/ information content of the sample accession
or frequency-dependent selection may also (Brown and Brubaker, 2002).
play an important role in increasing diver- When genetic marker data can be inter-
sity at specific loci within a genome. In preted by a locus/allele model, allelic diver-
these selection regimes, selection favours sity can be described by: (i) the percentage
the maintenance of multiple alleles with of polymorphic loci, calculated by divid-
different effects over evolutionary time. ing the number of polymorphic loci by the
total number of loci assayed; (ii) the mean
Measurement of diversity number of alleles per locus, calculated by
dividing the total number of alleles detected
The estimation of genetic similarity is vital by the number of loci assayed; (iii) total
to the formulation of optimal germplasm gene diversity or average expected hetero-
management strategies and lies at the core of zygosity (Nei, 1973; Brown and Weir, 1983),
modern plant systematics and evolutionary calculated by
biology. Plant systematicists and evolution-
ary geneticists have developed techniques m

for analysing genetic similarity that may be


ideally suited for addressing certain germ-
H = 1 P /m
i j =1
2
ij (5.1)

plasm management issues. Kresovich and


McFerson (1992) highlighted the important and (iv) polymorphic information content
role of genetic diversity assessment in plant genetic resource management. One simple
estimate of genetic diversity in a given taxon, germplasm collection or geographic region
is the number of taxa included in the larger unit (e.g. the number of subspecies found in a
species in a given region). Yet, the number of recognized subordinate taxa may vary
substantially among taxonomic treatments as may the actual level of genetic differentiation
among such taxa (Bretting and Goodman, 1989). Accordingly, diversity estimates derived from
genetic marker data may be more valuable than counts of taxa for most germplasm management
applications, since such estimates can be more easily compared across taxa and the focus
may be on conserving genes rather than taxa. Because the genome is assayed directly,
DNA-based technologies circumvent the often poor correspondence between morphological and
genetic diversity in crop species. With STSs developed from expressed sequence tags (ESTs),
it is even possible to use expressed genes specific to life history stages, rather than
anonymous sequence differences, to assay genetic differences among accessions. Because
database comparisons can often identify the functional product of an EST, the genebank
manager obtains not only an indicator of genetic diversity and the relationships among
accessions but also an increase in the

(PIC), which was described by Botstein et al. (1980) to refer to the relative value of each
marker with respect to the amount of polymorphism exhibited and is estimated by

$$\mathrm{PIC}_i = 1 - \sum_{j=1}^{m} P_{ij}^{2} \qquad (5.2)$$

In both Eqns 5.1 and 5.2, $P_{ij}$ is the frequency of the jth allele at the ith of m loci.
The variances of all these estimates are affected by the number of loci and by sample size:
the number of progeny assayed per plant, plants assayed per population or number of
populations assayed per taxon (Brown and Weir, 1983; Weir, 1990). Various theoretical and
empirical studies suggest that for precise estimates, the number of loci assayed may be
more critical than the sample size but that the latter should be as large as practical.

In various applications of molecular marker data, a proper choice of a similarity
coefficient s or dissimilarity coefficient (d = 1 − s) is important and depends on factors
such as: (i) the properties of the marker system employed; (ii) the genealogy of the
germplasm; (iii) the operational taxonomic unit (OTU) under consideration (e.g. lines,
populations); (iv) the objectives of the study; and (v) the necessary preconditions for
subsequent multivariate analyses.
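
As a rough illustration of Eqn 5.2, the sketch below computes PIC for a single locus as one minus the sum of squared allele frequencies. The function names and the allele calls are hypothetical and chosen only for this example; the code follows the simplified form given in the text rather than any particular software package.

```python
from collections import Counter


def allele_frequencies(calls):
    """Return the frequency of each distinct allele among the scored calls."""
    counts = Counter(calls)
    total = len(calls)
    return [count / total for count in counts.values()]


def pic(freqs):
    """PIC for one locus in the form of Eqn 5.2: 1 - sum of squared frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies at a locus must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical SSR locus scored in ten accessions (allele sizes in bp, made up).
calls = [156, 156, 160, 160, 160, 164, 156, 160, 164, 160]
freqs = allele_frequencies(calls)  # 156 bp: 0.3, 160 bp: 0.5, 164 bp: 0.2
print(round(pic(freqs), 3))        # 1 - (0.09 + 0.25 + 0.04) = 0.62
```

A locus with a single fixed allele gives a value of 0, and the value rises as more alleles occur at more even frequencies.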

A wide variety of pairwise genetic similarity measures is available but only a few have
been widely applied. Reif et al. (2005) examined ten dissimilarity coefficients widely used
in germplasm surveys (Table 5.3) with special focus on applications in plant breeding and
seed banks, by investigating their genetic and mathematical properties, examining the
consequences of these properties for different areas of application in plant breeding and
seed banks and determining the relationships between these ten coefficients. A Procrustes
analysis of a published data set consisting of seven International Maize and Wheat
Improvement Center (CIMMYT) maize populations demonstrated close affinity between Euclidean,
Rogers', modified Rogers' (Rogers, 1972; Wright, 1978) and Cavalli-Sforza and Edwards
distance on one hand, and between Nei's standard and Reynolds's dissimilarity on the other.
This study also showed that the genetic and mathematical properties of dissimilarity
measures are of crucial importance when choosing a genetic dissimilarity coefficient for
analysing molecular data.

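Table 5.3. Dissimilarity coefficients d for allelic informative marker data. $p_{ij}$ and
$q_{ij}$ are allelic frequencies of the jth allele at the ith locus in the two operational
taxonomic units under consideration, $n_i$ is the number of alleles at the ith locus, and m
refers to the number of loci.

Variable | Dissimilarity coefficient | Range

$d_E = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(p_{ij}-q_{ij})^2}$ | Euclidean | $0, \sqrt{2m}$

$d_R = \frac{1}{m}\sum_{i=1}^{m}\sqrt{\frac{1}{2}\sum_{j=1}^{n_i}(p_{ij}-q_{ij})^2}$ | Rogers (1972) | 0, 1

$d_W = \frac{1}{\sqrt{2m}}\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(p_{ij}-q_{ij})^2}$ | Modified Rogers | 0, 1

$d_{CE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(1-\sum_{j=1}^{n_i}\sqrt{p_{ij}q_{ij}}\right)}$ | Cavalli-Sforza and Edwards (1967) | 0, 1

$d_{RE} = -\ln(1-\theta)$ | Reynolds et al. (1983) | $0, \infty$

$d_{N72} = -\ln\left(\frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}p_{ij}q_{ij}}{\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n_i}p_{ij}^2\,\sum_{i=1}^{m}\sum_{j=1}^{n_i}q_{ij}^2}}\right)$ | Nei (1972) | $0, \infty$

$d_{N83} = \frac{1}{m}\sum_{i=1}^{m}\left(1-\sum_{j=1}^{n_i}\sqrt{p_{ij}q_{ij}}\right)$ | Nei et al. (1983) | 0, 1

where

$$\theta = \frac{\sum_{i=1}^{m}\left[\frac{1}{2}\sum_{j=1}^{n_i}(p_{ij}-q_{ij})^2-\frac{1}{2(2n-1)}\left(2-\sum_{j=1}^{n_i}(p_{ij}^2+q_{ij}^2)\right)\right]}{\sum_{i=1}^{m}\left(1-\sum_{j=1}^{n_i}p_{ij}q_{ij}\right)}$$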
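
To make the notation of Table 5.3 more concrete, the following sketch evaluates three of the listed coefficients (Rogers, modified Rogers and Nei's 1972 standard distance) for two OTUs. It assumes that each OTU is stored as a list of loci, each locus being a list of allele frequencies in the same allele order for both OTUs; the function names and the frequencies are hypothetical and serve only as an illustration.

```python
import math


def rogers(p, q):
    """d_R: mean over loci of sqrt(0.5 * sum_j (p_ij - q_ij)^2)."""
    per_locus = [
        math.sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(pi, qi)))
        for pi, qi in zip(p, q)
    ]
    return sum(per_locus) / len(p)


def modified_rogers(p, q):
    """d_W: sqrt of the summed squared differences, scaled by 1/sqrt(2m)."""
    ss = sum((a - b) ** 2 for pi, qi in zip(p, q) for a, b in zip(pi, qi))
    return math.sqrt(ss / (2 * len(p)))


def nei_1972(p, q):
    """d_N72: -ln of the normalized sum of allele frequency products."""
    num = sum(a * b for pi, qi in zip(p, q) for a, b in zip(pi, qi))
    den = math.sqrt(
        sum(a * a for pi in p for a in pi) * sum(b * b for qi in q for b in qi)
    )
    if num == 0:
        return math.inf  # no shared alleles: the distance is unbounded
    return -math.log(num / den)


# Two hypothetical OTUs scored at m = 2 loci with two alleles each.
otu_a = [[1.0, 0.0], [0.5, 0.5]]
otu_b = [[0.0, 1.0], [0.5, 0.5]]

print(rogers(otu_a, otu_b))           # (1 + 0) / 2 = 0.5
print(modified_rogers(otu_a, otu_b))  # sqrt(2 / 4), about 0.707
print(nei_1972(otu_a, otu_b))         # -ln(0.5 / 1.5) = ln 3, about 1.099
```

In this toy case the two OTUs are fixed for different alleles at the first locus and identical at the second, so the bounded coefficients fall between 0 and 1 while Nei's distance exceeds 1, which reflects the different ranges listed in the table.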

Germplasm classification

Germplasm can be classified on the basis of morphological traits, geographic distribution,
evolutionary and breeding history, pedigree and/or genotypic diversity at the molecular
level. Both categorical and quantitative data have been used for phenotype-based
classification. A broad-based approach to germplasm classification will contribute to our
understanding of the genetic structure of subpopulations within a species, how to identify
useful gene donors and the rationale for constructing heterotic groups for hybrid breeding.
A classification technique may be considered optimal if it has these characteristics
(Crossa and Franco, 2004): (i) produces clusters that respond to the optimization of a
target function; (ii) is linked to a technique for defining the optimum number of groups,
preferably in the form of a statistical hypothesis test; (iii) helps to calculate a measure
of the quality of the clusters; (iv) assigns observations to the groups, based on the
probability of each observation belonging to each group; (v) uses the information available
in categorical variables as well as in continuous variables; and (vi) may be extended to
the problem of classification when the variables are measured in different environments.
The best numerical classification strategy is the one that produces the most compact and
well-separated groups, that is, minimum variability within each group and maximum
variability among groups. Crossa and Franco (2004) reviewed geometric classification
techniques as well as statistical models based on mixed distribution models. The two-stage
sequential clustering strategy, which uses all variables, continuous and categorical, tends
to form more homogeneous groups of individuals than other clustering strategies. The
sequential clustering strategies can be applied to three-way data comprising genotype ×
environment × attributes. This approach groups genotypes with consistent responses for most
of the continuous and categorical traits across environments.

Patterns of genetic similarity among taxa or germplasm collections can be visualized by
cluster analysis and ordination. Ideally, these two multivariate techniques are deployed
together because their strengths are complementary (Sneath and Sokal, 1973; Dunn and
Everitt, 1982; Sokal, 1986). In cluster analysis, taxa, germplasm collections or genetic
markers are arranged in a hierarchy (called a phenogram or dendrogram) by an agglomerative
algorithm according to patterns occurring in a matrix of pairwise genetic similarities as
described above. The hierarchies obtained from cluster analyses are highly dependent on
both the similarity measure and the clustering algorithm used. The most frequently used
clustering methods involve arithmetic means (either UPGMA or the weighted pair group method
with arithmetic means (WPGMA)) (Sneath and Sokal, 1973). One of the commercial packages
which implements these and other methods is NTSYS
(http://www.exetersoftware.com/cat/ntsyspc/ntsyspc.html). More recently, POWERMARKER, a
comprehensive set of statistical methods for genetic marker data analysis designed
especially for SSR/SNP data, has been widely used for cluster analysis (e.g. Lu et al.,
2009). POWERMARKER has options for selecting different distances and clustering methods and
is free for download at http://statgen.ncsu.edu/powermarker/.

With ordination, the multidimensional variability in a pairwise, intertaxa or intermarker
similarity matrix can be portrayed in one or several dimensions through eigenstructure
analysis. Ordination is best suited to revealing interactions and associations among taxa
or germplasm accessions described by traits that vary continuously and quantitatively.
Principal component, principal coordinate and linear discriminant analyses are the
ordination techniques most relevant for potential germplasm management applications.

There are numerous reports on germplasm classification using molecular markers. Only a few
examples will be discussed here. In sorghum, 46 converted exotic lines representing all
five races and nine intermediate races of sorghum were fingerprinted using AFLP and SSR
markers. A total of 453 scored marker loci were used to calculate genetic similarities
between the lines. The dendrogram constructed using UPGMA
grouped 31 lines into three major clusters with Jaccard coefficients greater than 0.75. The
remaining 15 lines were grouped into four small sub-clusters each with two lines and seven
single accession nodes (Perumal et al., 2007). RFLP marker-based analysis of 236 rice
cultivars identified two major groups which corresponded to the two major rice types,
indica and japonica. By comparison of allele frequencies between indica and japonica
cultivars, several subspecies-specific alleles were identified, with one allele existing in
more than 99% of indica cultivars and another in more than 99% of japonica cultivars (Xu,
Y. et al., 2003).

Figure 5.5 provides an example of clustering analysis using 169 SSR markers to classify 18
US rice cultivars collected or selected before 1930 (Lu et al., 2005). These cultivars were
classified into three groups, which corresponded to three types of cultivars with different
grain sizes, i.e. short grain cultivars in the western US rice belt (California) and medium
and long grain cultivars in the southern US rice belt. These three groups of cultivars
(Fig. 5.5) formed the foundation of germplasm resources for breeding short-grain temperate
japonica and medium- and long-grain tropical japonica cultivars, respectively, in the USA.

[Figure 5.5: UPGMA dendrogram with a genetic distance axis (scale bar 0.1).
Group 1: WC-6, Colusa, Early Wataribune, Caloro, Chinese, Shoemed.
Group 2: Delitus, Carolina Gold, Blue Rose, Improved Blue Rose, Supreme Blue Rose, Lady
Wright, Edith, Honduras.
Group 3: Nira, Sinampaga Select, Fortuna, Rexoro.]

Fig. 5.5. Groups of 18 US rice cultivars collected or selected before 1930 based on 169 SSR
markers using UPGMA methods and Nei's (1972) genetic distance (Lu et al., 2005). Three
groups (1, 2, 3) can be identified, consisting of 6, 8, and 4 cultivars, and representing
short-, medium-, and long-grain US rice cultivars, respectively. From Lu et al. (2005) with
permission.

Germplasm classification can be used to construct heterotic groups so that cultivars within
each group have a high level of similarity in their genetic backgrounds. As a result,
intergroup hybrids show a higher level of heterosis than within-group hybrids. Commercial
maize hybrids are typically created between inbreds from opposite, complementary heterotic
groups. Heterotic patterns in many crop species have been established based solely on large
numbers of testcrosses and extensive breeding experience. For inbreeding species for which
subspecies or subpopulation differences may be older or more pronounced than in
cross-pollinating species, DNA-based markers can be used to classify germplasm accessions
into different heterotic groups, each with a high level of similarity. Research results
from rice, Brassica napus, barley and wheat indicate that DNA markers are very useful tools
for the construction of heterotic groups (Xu, Y., 2003). Divergence at molecular marker
loci has also been useful in assigning maize inbreds to known heterotic groups previously
established in breeding programmes, and the molecular information agreed with pedigree
information (Lee et al., 1989; Melchinger et al., 1991; Messmer et al., 1993).

Two areas need further development in germplasm classification: methods of data analysis
and the understanding of molecular diversity in relation to quantitative variation. Methods
for the analysis of molecular data have not kept up with the sophistication of the methods
of data generation (Harris, 1999). Thus, it is common to find sophisticated molecular data
(e.g. AFLP)
being analysed using similarity measures derived decades ago. Similarity measures and
classification methods are needed specifically for handling molecular marker data from
polyploid species.

Phylogenetics

One of the most important roles of genetic markers in plant germplasm management is in the
elucidation of the systematic relationships within genera, tribes and families and
obtaining characteristic genetic profiles of germplasm. Using the similarity measures and
classification methods described above, genetic markers of all types have been instrumental
in characterizing systematic and evolutionary genetic relationships and in establishing a
germplasm's taxonomic identity, which will probably change how the germplasm accessions are
managed and utilized. As indicated by Bretting and Widrlechner (1995), clarifying
evolutionary relationships among intermediate taxa may challenge the germplasm manager's
judgement and acuity. Molecular taxonomy will substantially improve our knowledge of the
primary, secondary and tertiary gene pools of many crops and evolutionary studies will help
identify crop ancestors, past genetic bottlenecks and opportunities for introducing useful
variation. It is particularly vital for germplasm management purposes to discriminate
recently synthesized, naturally occurring F1 hybrids and/or hybrid derivatives from
taxonomically intermediate taxa originating from convergent-parallel evolution, clonal
variation, recombinational speciation and/or the retention of intermediate ancestral traits
(where the latter includes the phenomenon known as lineage sorting; Avise, 1986).

Supraspecific systematic relationships are best elucidated by phylogenetic methods. These
methods can sometimes help estimate phylogenetic relationships among crops and related taxa
and accordingly, may help determine whether a weedy crop relative is a crop progenitor or a
feral crop derivative. As the exact systematic relationships among a crop and its relatives
are better understood, germplasm conservation and utilization strategies may change
tangibly. For example, since at least the mid-1980s, maize evolutionists in general have
accepted the tripartite hypothesis of Mangelsdorf (originating in the 1930s and reviewed in
Mangelsdorf (1974)). This hypothesis postulated that maize evolved directly from an
undiscovered wild maize and that teosinte was derived from a hybrid between maize and
Tripsacum species. During that period, substantial resources (relative to those devoted to
similar programmes with teosinte) were allocated to improving maize with introgressed
Tripsacum germplasm (Galinat, 1977). To summarize, a clear understanding of the systematic
relationships among a crop and its wild relatives is vital for sound genetic resource
management and for crop improvement as a whole.

Taxonomic relationships have been re-evaluated for many crop plants by using molecular
markers and genomic sequences that cover some part of the genome for specific traits,
attempting to replace the classical morphological survey with a point survey using data
obtained from one or more marker or sequence loci. For example, studies of the genetic
architecture of key yield-related components (e.g. flower and seed production, maturity and
photoperiod response) will enable us to focus on areas of the genome where diversity is
particularly important for these traits (Hodgkin and Ramanatha Rao, 2002). Phylogenetic
studies provide a fundamental gain in genetic knowledge, not only showing that two
individuals or gene copies differ but also placing them in a hierarchy of relationships
based on the timing of a shared ancestor.

Phylogenetic diversity can also be estimated by whole genome analysis and genome-scale
phylogenetic trees can be created. Such genome-trees can be built based on gene content,
gene order, evolutionary distances between orthologues and concatenated alignments of
orthologous protein sequences (see Wolf et al. (2003) for a review). Both the initial
results and the general notion that using genome-wide information helps enhance the
phylogenetic signal suggest that the future belongs to these approaches.
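
The distance-based hierarchies referred to in this section, such as the UPGMA dendrogram of Fig. 5.5, can be sketched in a few lines. The code below is a minimal, illustrative UPGMA implementation using only the Python standard library; the function name, the accession labels A to D and the pairwise distances are hypothetical, and dedicated packages such as those named earlier would normally be used in practice.

```python
from itertools import combinations


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering.

    labels: list of unique accession names.
    dist:   dict mapping frozenset({a, b}) to the pairwise distance.
    Returns a nested tuple (left, right, height); leaves are the labels.
    """
    sizes = {label: 1 for label in labels}  # active clusters and their leaf counts
    d = dict(dist)                          # working copy, extended as clusters merge
    while len(sizes) > 1:
        # Merge the two closest active clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        node = (a, b, d[frozenset((a, b))] / 2)  # node height is half the distance
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted mean of the two old distances, i.e. the plain mean
        # over all pairs of leaves.
        for c in sizes:
            d[frozenset((node, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[node] = na + nb
    return next(iter(sizes))


# Hypothetical pairwise genetic distances among four accessions.
pairs = {
    ("A", "B"): 0.10, ("A", "C"): 0.40, ("A", "D"): 0.50,
    ("B", "C"): 0.40, ("B", "D"): 0.50, ("C", "D"): 0.20,
}
dist = {frozenset(pair): value for pair, value in pairs.items()}
tree = upgma(["A", "B", "C", "D"], dist)
print(tree)  # A and B join first, then C and D, then the two pairs
```

Because the result depends on both the distance measure and the linkage rule, repeating the analysis with another coefficient from Table 5.3 is a simple way to check how stable the groups are.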

5.5.4 Collection redundancies and gaps

As a large number of germplasm accessions are available for each cultivated plant, many
likely represent duplicate or nearly identical samples of the same cultivar while others
embody those with rare alleles or highly unusual allele combinations or those where many of
their genes or alleles are underrepresented in current collections. Molecular technology
will help us to understand the genetic structure of existing collections and to design
appropriate acquisition strategies. In particular, genetic distance can be calculated as
described previously to identify particularly divergent subpopulations that might harbour
valuable genetic variation complementary to that in current holdings.

Germplasm redundancy exists in many germplasm collections due to the different names given
to the same cultivars or duplicate samplings of the same accessions. Duplication of
germplasm among collections is substantial. Eliminating this type of duplication has often
been suggested as a way of reducing the costs associated with the operation of genebanks.
Lyman (1984) estimated that at least 50% of the germplasm held consists of duplicated
accessions. The Food and Agriculture Organization (FAO) (1998) estimates that of the 6
million accessions stored worldwide, only between 1 and 2 million are unique. On the other
hand, it has been recognized that all germplasm should be backed up in at least two
different sites to avoid complete loss of the collections at any given site.
Pedigree-related cultivars, sibling lines and early isogenic lines may represent another
type of redundancy because they are genotypically duplicated at most genetic loci. For
example, US rice cultivars M5, M301, M103, S201, Calrose, Calrose 76, CS-M3 and
Calmochi-202 shared the same panel of alleles at all of the 100 RFLP loci surveyed. Each of
these cultivars can be traced back to a common ancestor, Caloro. In addition, no genetic
polymorphism could be detected at another 60 loci between Calrose and Calrose 76 when a
more polymorphic marker type, SSR, was used (Xu, Y. et al., 2004). This is probably due to
the fact that they are isolines, with Calrose 76 representing a variant derived from
Calrose via chemical mutagenesis. Using 15 SSR markers, Dean et al. (1999) assayed 19
sorghum (Sorghum bicolor (L.) Moench) accessions identified as 'Orange' currently
maintained by the US National Plant Germplasm System (NPGS). They found that most
accessions were genetically distinct, but two redundant groups were found. The variance
analysis also indicated that it should be possible to reduce the number of 'Orange'
accessions held by NPGS by almost half without seriously jeopardizing the overall genetic
variation contained in these holdings.

Germplasm collections can be compared for the frequencies of alleles at all genetic loci so
that distinctive alleles, allele combinations and allele frequency patterns can be
identified for a given population. Chromosomal regions containing loci that show the
greatest changes in allele frequency between the collections can be located. The rationale
for this analysis is to define genomic regions where selection gave rise to allele
combinations or allele frequency patterns that distinguish a group of accessions with less
diversity from those in a more diverse group. Alleles originally found in ancestral
cultivars or the wild relatives may be gradually lost through domestication and breeding.
Modern breeding programmes generally rely on a small number of superior accessions which
results in genetic uniformity and loss of diverse alleles that could be important to future
breeding programmes. Valuable lost genes or alleles can be recovered by going back to the
ancestors or wild relatives of our crop species. Using 47 SSR markers, Christiansen et al.
(2002) determined the variation of genetic diversity in 75 Nordic spring wheat cultivars
bred during the 20th century. They found that some alleles were lost during the first
quarter of the century whereas several new alleles were introduced in the Nordic spring
wheat material during the second quarter of the century.

The allele frequencies at 100 RFLP and 60 SSR loci in rice were compared between the US and
world collections and between the two major types, indica and japonica, within the world
collection (Xu et al.,
2004). Among 34 alleles at 20 RFLP and 14 SSR loci that were found most frequently (allele
frequencies are 20.4–59.5%) in the world collection, three of them were completely lost and
31 of them were underrepresented (less than 5%) in the US collection (japonica) while some
of them were also lost or underrepresented in indica types. As examples, the lost alleles
and the underrepresented alleles with frequencies of less than 2% are listed in Table 5.4.
Selection against these alleles is clear and stems from the fact that modern US rice
cultivars have been developed from a small set of germplasm introductions.

Table 5.4. SSR and RFLP alleles lost or underrepresented in the US collection but most
frequent in the world collection (the markers designated with the RM prefix are SSRs and
others are RFLPs) (selected from Xu et al., 2004).

                                    Allele frequency (%)
Chromosome  Marker    Allele    World   USA   World japonica   World indica
1           RM259     156 bp    20.4    0     0                35
1           CDO118    17 kb     49.5    1.6   0                85.7
2           RM207     131 bp    21.7    0     4.7              35
3           RM7       181 bp    31.5    1.6   2.3              54.1
5           RM233B    138 bp    41.9    1.7   2.5              74.5
7           RM11      143 bp    29.6    1.6   4.7              48.4
9           RM219     216 bp    21      0.9   0                35.6
9           RM257     149 bp    21.5    0     0                36.5
9           RM205     123 bp    46.4    1.6   4.5              77.8
9           CDO1058   4.1 kb    55.9    1.6   4.5              93.8
12          RG901X    4.6 kb    43.6    1.6   9.1              69.8

5.5.5 Genetic drifts/shifts and gene flow

To generate stocks for distribution or to maintain the seed viability, a variable accession
needs to be regenerated regularly. During this process there is a risk that the genetic
integrity of the accession will be compromised by genetic drift, selection or gene flow
(Sackville Hamilton and Chorlton, 1997). Genetic drift is the stochastic deviation of
allele frequencies in the offspring from those of the parental population, which may result
in random fluctuations in allele frequencies from generation to generation or the eventual
loss of alleles from the population. The change in allele frequency in one generation for
diploid organisms can be quantified by its variance, $q(1 - q)/2N_e$, where q represents
the frequency of the allele and $N_e$ the effective population size (Falconer, 1981). Thus,
the effective population size determines the extent of genetic drift. Effective population
sizes are generally smaller than actual population sizes because of unequal numbers of
females and males, overlapping generations, non-random mating, differential fertility and
fluctuations in population size (Falconer, 1981; Barrett and Kohn, 1991). Genetic drift may
be controlled by adjusting the size of the regenerating population or developing improved
regeneration methods (Engels and Visser, 2003). Genetic drift can be measured using
molecular markers that are neutral and co-dominant because of the random nature of the
process.

The genetic composition of populations may also change during regeneration due to
selection. Selection is different from genetic drift because it does not affect all loci
simultaneously and usually occurs towards particular genotypes or loci. Selection during
regeneration can be inferred from strong shifts in marker allele frequencies for certain
loci between parental and offspring populations (Spooner et al., 2005). Maintaining genetic
diversity and preventing genetic shifts are important objectives for germplasm
conservation. In open-pollinated species, deviations from random mating, primarily in the
form of assortative or consanguineous matings, need to be monitored during germplasm
regeneration. In maize, deviations from random mating have been widely studied, with
emphasis on detailed multi-locus isozyme analyses of one or two synthetic or
open-pollinated maize cultivars (Kahler et al., 1984; Pollak et al., 1984; Bijlsma et al.,
1986). In general, levels of selfing did not exceed those expected under random-mating
models, but significant deviations were caused by temporal variation in the pollen pool or
by gametophytic selection.

The genetic profiles of germplasm can change during the course of medium- or long-term
storage. Storage effects fall into three broad categories: (i) the occurrence of mutations;
(ii) the occurrence of chromosomal aberrations; and (iii) shifts in gene frequencies
resulting from differential genotypic viability in heterogeneous populations. After a
comprehensive review of storage effects on seeds, Roos (1988) found little evidence for
heritable changes in germplasm attributable to storage-induced chromosomal aberrations and
noted little need for concern about mutation as a significant factor in altering the
composition of germplasm collections. As indicated by Bretting and Widrlechner (1995),
however, differential seed longevity can markedly reduce genetic variability over time.
This is well documented by experiments involving mixtures of eight bean lines (Roos, 1984)
and four seed storage protein genotypes within a cultivar of wheat (Stoyanova, 1991).

Genetic shifts can also be caused by in vitro culture. The genetic stability of germplasm
maintained in tissue culture (in vitro) has historically been monitored with karyotypic
markers such as chromosome number and morphology (D'Amato, 1975) because cytological
variability has been considered a primary cause of somaclonal variation. Lassner and Orton
(1983) reported that in vitro cultures of celery with identical isozyme profiles were
markedly variable cytologically. More recently, in vitro culture has been shown to induce
changes because of the mobilization of transposable elements (Jiang et al., 2003; Kikuchi
et al., 2003; Nakazaki et al., 2003). These findings should reinforce the concept that the
genetic stability of in vitro cultures should be monitored with a battery of different
genetic markers, particularly transposon-based DNA markers that collectively span the whole
genome.

Germplasm accessions that are mainly self-pollinated may contain a certain level of
heterogeneity providing a buffer for maintaining genetic diversity and preventing genetic
shifts. Monitoring heterogeneous accessions will help develop strategies for regeneration
of germplasm samples without loss of the allelic diversity provided by heterogeneity. In
general, a higher level of heterozygosity was found in traditional cultivars as reported in
rice by Olufowote et al. (1997). Genetic diversity resulting from heterozygosity or
heterogeneity was also found within inbred lines from different sources in rice (Olufowote
et al., 1997) and maize (Gethi et al., 2002). In another rice example (Xu et al., 2004), a
total of 120 (50.8%) of the 236 rice accessions were found to be heterozygous/heterogeneous
at one or more RFLP or SSR loci and the number of heterozygous loci detected in a single
rice accession ranged from 0 to 39 (25.3% of the 160 loci). These heterozygous allele
patterns may indicate either seed mixtures or true heterozygosity remaining in these
cultivars although all accessions had been purified before genotyping and no apparent
phenotypic variation was detected.

In plant breeding, specific accessions are selected as parental lines from which to develop
new cultivars based on availability of target traits or their overall performance. Some
accessions have been used more frequently than others and as a result, considering
cultivars that have been bred and used as a whole, genotypic selection and the change of
allele frequencies at various genetic loci have occurred resulting in the loss or
underrepresentation of specific genotypes, alleles and allele combinations.

Gene flow plays an important role in generating troublesome weeds that are difficult to
control. Gene flow can also lead to either crop diversification (Harlan, 1965; Jarvis and
Hodgkin, 1999) or genetic
assimilation (Para et al., 2005) or a combination of both at different times and in
different locations. Molecular marker analysis can be used to monitor gene flow among
cultivars developed in a long history of plant breeding (vertical flow), among
pedigree-related cultivars developed in a relatively short period (horizontal flow) and
among cultivated species and weeds. Tracing specific alleles or genes has become an
important objective in parentage control and cultivar identification which has been used to
protect the rights of breeders. Further discussion on gene flow between transgenic plants
and weeds can be found in Chapter 12.

5.5.6 Unique germplasm

To broaden the genetic base of specific cultivated species, the genetic diversity within
collections must be assessed in the context of the total available genetic diversity for
each species. With the use of DNA profiles, the genetic uniqueness of each accession in a
germplasm collection or in a population can be determined and the identity and frequency of
individual alleles can be clearly described and characterized (Brown and Kresovich, 1996;
Smith and Helentjaris, 1996; Lu et al., 2009). A DNA bank can be developed to undertake
allele mining for identifying unique germplasm containing novel alleles and allele
combinations.

The sampling of exotic germplasm should emphasize the genetic composition rather than the
appearance of something very different. Accessions with DNA profiles most distinct from
that of modern germplasm are likely to contain the greatest number of novel alleles
(different from those already present in the elite gene pool). Marker analysis could be
used to identify accessions harbouring rare or novel alleles so that the functional
significance of the resident genes can be determined using both traditional crossing and
sequence-based genomics approaches. Considering the allele frequency profiles across all
cultivars and germplasm accessions will give us some idea of which germplasm may retain or
contain the rare genes/alleles and further phenotypic characterization may determine
whether these alleles will be important to our future breeding programmes. The germplasm
that holds unique alleles may contain unique genetic variation required for trait
improvement. For example, 15 (6.4%) of the 236 rice accessions examined by Xu et al. (2004)
contained unique alleles (those present in only one of the cultivars) for at least one RFLP
locus and 81 (34.3%) rice accessions had unique SSR alleles. The germplasm accessions
identified as having unique alleles also had unusual geographic origins with high genetic
diversity and could have potential use in the exploitation of heterosis and novel alleles
for agronomic traits.

The degree of genetic similarity between any two cultivars can be calculated as the
proportion of shared alleles. The most similar accessions share alleles at almost all
marker loci while the least similar accessions have few or no alleles in common. When
evaluating genetic similarity, shared allele frequencies (SAFs) can be averaged over all
possible pairs of cultivars in a sample. A smaller average similarity indicates a greater
genetic difference with respect to the rest of the cultivars in the collection. Based on
the averaged SAF, the most diverse accessions can be selected to represent cultivars that
host the least-frequent alleles and are genotypically most different from other accessions.
From 236 rice cultivars, Xu et al. (2004) selected the 16 most diverse accessions (with SAF
< 50%) based on RFLP markers and 49 accessions based on SSR markers. Most of these
selections, such as Caloro, Cina, Badkalamkati, DGWG and TN1, were ancestral cultivars that
had been used as parents in breeding programmes more than 40 years previously; none of the
selections includes lines from the US collection which has a much narrower genetic basis.

Genetic mapping studies involving interspecific crosses have identified novel/superior
alleles originating from phenotypically unfavorable distant relatives that enhance the
performance of modern cultivars (Xiao et al., 1998; Moncada et al., 2001; Brondani et al.,
2002; Nguyen et al., 2003;
Thomson et al., 2003). These novel alleles are present in germplasm collections but have
not been previously identified because they are hidden in the inferior phenotype. The
valuable alleles identified from Oryza rufipogon that increase yield in commercial
cultivars have been used to improve the best hybrids that have been commercialized in
China since before 1989 (Xiao et al., 1998) and new hybrids containing these O. rufipogon
introgressions demonstrate more than a 30% yield advantage over previous Chinese hybrids
(Yuan, 2002). The novel genes/alleles identified from a germplasm collection can also be
utilized in transformation experiments by one of the methods described in Chapter 12 if
sexual transfer is impossible or too slow, especially during the phase of testing new
genes and alleles in a common genetic background. Utilization of genetic resources in
plant breeding is the major task of plant breeders, a topic which is discussed in detail
in Chapters 7 and 9.

5.5.7 Allele mining

There are several options for identifying or capturing diversity that might not exist in
the germplasm pool of existing breeding lines: allele mining, transformation, mutation
breeding, use of landraces or synthetic polyploids and wide crossing (Able et al., 2008).
Allele mining, which is important for utilizing novel alleles hidden in genetic
diversity, will be discussed here.
Molecular and functional diversity of crop genomes can be characterized by allele mining,
identification of distinct haplotypes for different inbred lines, single feature
polymorphism (SFP) analysis, discovery of nearly identical paralogues (NIPs; Emrich
et al., 2007) and determination of their evolutionary implications. In general, there are
two approaches that have been elaborated for allele mining: re-sequencing (e.g. Huang
et al., 2009) and EcoTILLING (discussed in Chapter 11) (Comai et al., 2004). Whole genome
genotyping using gene-based markers can be used as the foundation of the re-sequencing
method. Allele mining from germplasm collections is in its infancy, currently facing the
fundamental challenge to establish which of the various alleles present is functionally
different from the wild type and where possible to identify which new alleles
beneficially influence the target trait. Methods to ascertain allele function include
marker-assisted backcrossing (MABC), transformation, transient expression assays and
association analysis using a set of germplasm for association mapping that is independent
of the set used to identify the original allele. As more of these studies are carried
out, it is hoped that the growing database of comparisons between sequence variation and
phenotype will allow bioinformaticians to identify patterns that can form the basis of
future predictive methods. The current rate-limiting factor for the effective use of
outputs from allele mining in breeding programmes is that there is insufficient
information on the relationship between SNP variation and changes in phenotypes that may
be useful for breeders. However, the resources and tools necessary to perform in silico
trait-targeted selection of the outputs from allele mining are becoming available. Thus,
proof-of-concept projects are now being carried out in model organisms in order to study
the relationships between SNP haplotypes and changes in phenotypes. This has already led
to the development of predictive tools that can identify those SNPs with a high
probability of conferring deleterious phenotypes. However, the next big step in this area
is the development of bioinformatics tools to compare sequence variation with protein and
functional domain variation or with public databases including associated phenotype data,
in order to predict which sub-selections of SNP haplotype variants have the maximum
likelihood of providing beneficial phenotypic variation in the target trait. It is likely
that SNPs in promoter and non-coding regions will also be important for predictive
phenotype analysis.
The same methodology used in association mapping may also be used for allele mining of
the diverse core subsets of germplasm being created from breeders' lines, genebank
accessions and wild relatives. Once a gene of interest is positively
identified (via association mapping or any other technique) and the sequence determined,
the same gene can be re-sequenced (entirely or in part) in all the individuals in the
subset (Huang et al., 2009; Vashney et al., 2009). Changes in the DNA sequence
corresponding to new alleles of this locus will be identified in this manner and
individuals carrying the new alleles can be evaluated for the target trait to determine
the associated change in phenotype and value for subsequent use in breeding programmes.
These alleles may not ever have been found via simple phenotypic screens, either because
it is not possible to grow and measure every plant in a large germplasm collection under
all possible environmental conditions, because their effect may be masked in an
unsuitable genetic background or because their effect may be so small that it will not be
found unless specifically sought in carefully controlled phenotypic screens (not
generally possible on a very large scale).

5.6 Germplasm Enhancement

Although evaluation data are vital to the active use of conserved germplasm, this
information does not automatically guarantee active use. Most genebank accessions are
landraces that have been grown by traditional farmers for centuries. They have undergone
countless cycles of selection for adaptation to biotic and abiotic stresses. Therefore,
they may have specific value for modern crop improvement. Unadapted, wild germplasm is
valuable and often offers great potential for improvement of important characteristics;
however, breeders have usually been unwilling to access this valuable resource because of
the detrimental effect of other genes carried along with the selected gene by linkage
drag.
The term pre-breeding is often used to designate the phase between evaluation and
breeding. Many programmes that aim to facilitate the utilization of plant germplasm
include the process of pre-breeding, also known as development breeding or germplasm
enhancement. Duvick (1990) defined pre-breeding as making particular genes more
accessible and usable to breeders by adapting exotic germplasm to a local environment
without losing its essential genetic profile and/or introgressing high-value traits from
exotic germplasm into adapted cultivars. Although the end-products of pre-breeding are
usually deficient in certain desirable characters, they are attractive to plant breeders
due to their greater potential for direct utilization in a breeding programme when
compared to the original source(s). Germplasm enhancement or converting unadapted
germplasm to a usable form is a key to modern crop improvement. Thus, germplasm curators
should not expect large germplasm requests from breeders who are searching for enhanced
or improved germplasm with specific traits rather than the raw, often unadapted germplasm
that is the principal component of major genebanks. Effective alliance with germplasm
enhancement specialists and experimental biologists (molecular geneticists,
physiologists, biochemists, pathologists and entomologists) would enhance the use of
unadapted germplasm.
Take the common bean (Phaseolus vulgaris L.) as an example of crop pre-breeding.
Evaluation of wild common bean accessions has shown resistance to insects and diseases
and higher N, Fe and Ca content in the seeds, which will ultimately contribute to
improvements in nutritional quality and yield (Acosta-Gallegos et al., 2007). In this
situation, the pre-breeding efforts will be enhanced by: (i) information on gene pool
origins, classification of syndrome traits, molecular diversity and mapping data of the
wild forms; (ii) indirect screening for biotic and abiotic stresses; and (iii)
marker-assisted selection.

5.6.1 Purification of germplasm collections

Off-types in germplasm collections are a potential quality problem. Traditionally,
off-types are defined as individual plants that are phenotypically different from the
typical plants or the plants developed by breeders. They may result from mechanical
mixtures, outcrossing, mutation or residual genetic variation. From the point of view of
germplasm conservation, the off-types could be mixed with the real type and when the
proportion of the off-types is sufficiently high within accessions, they can dominate the
collection and cannot be differentiated from the typical plants. From the point of view
of germplasm utilization, the presence of off-type plants will reduce the uniformity of
the crop and thus reduce its productivity and quality. Phenotypic off-types can be easily
rogued if there are not too many and if they can be distinguished from the real type by
phenotype. In addition to the off-types that are visible phenotypically, many off-types
are genetically different from the typical plants but are difficult to distinguish
visually. The presence of genotypic off-types may impose a more severe effect on
germplasm and could be one of the reasons for genetic drift and cryptic loss of germplasm
accessions. Both genotypic and phenotypic off-types may be exaggerated by multiplication.
Molecular technology provides a powerful mechanism for distinguishing both the phenotypic
and the genotypic off-types from the typical plants. Marker–trait associations and
high-resolution molecular markers such as SSRs and SNPs could be used to distinguish two
plants with very similar genetic backgrounds. With ten or more co-dominant molecular
markers, for example, breeders can identify distinct off-types from their breeding
populations and hybrid seed bulks and obtain detailed genotypic information such as the
source of the off-type genotypes and proportion of the off-types to typical plants. A
selection and purification decision can then be made to refine the germplasm collection
and breeding materials.
Heterogeneity existing in a germplasm collection reduces the potential of utilization and
reduces the interest of plant breeders. The first step in genetic enhancement for this
kind of germplasm is purification by selecting typical plants to obtain true-breeding
genotypes. This is extremely important for wild relatives of self-pollinated crops.

5.6.2 Tissue culture and transformation in germplasm enhancement

A new level of enthusiasm and activity in plant cell culture research developed in the
early 1970s with reports of plant cell lines resistant to amino acid analogues,
nucleotide analogues, antibiotics and plant pathogen toxins. The real excitement of these
reports for plant breeding was the potential to generate crop plant germplasm that
expressed new sources of resistance to herbicides, plant pathogens and mineral and salt
stresses that could not be obtained through conventional breeding methodologies. The
manipulation of plant cells, tissues and organs in vitro is producing an increasing
number of unique clones of industrial, biochemical, genotypic and agronomic importance.
Examples include regenerable genotypes, transformants, haploids, polyploids, mutants,
isogenic lines, somaclonal variants, somatic hybrids and secondary product-producing
cultures. In addition, a wide array of industrial chemicals is derived from plants,
including flavours, pigments, gums, resins, waxes, dyes, essential oils, edible oils,
agrochemicals, enzymes, anaesthetics, analgesics, stimulants, sedatives, narcotics and
anticancer agents. The ultimate strategy was to put the most advanced breeding germplasm
into cell culture and obtain, either by selection or somaclonal variation, a derived line
improved by the addition of one new trait.
Advances in tissue culture and molecular biology have opened new avenues for the precise
transfer of novel genes into crop plants from diverse biological systems (plants, animals
and microorganisms), which was previously not feasible. The development of efficient
procedures for culture of somatic cells, pollen and protoplasts and for plant
regeneration from a large number of plant species, combined with a broad suite of tools,
including improved DNA vector systems based on Ti and Ri plasmids of Agrobacterium,
direct DNA transfer methods, transposable elements, series of promoters, marker genes and
a large number of cloned genes, have made gene transfer more precise and directed
(for details see Chapter 12). As a result of these developments, transgenic plants have
been produced in many plant species with foreign genes inserted for a wide range of
traits. These developments have resulted in a large number of germplasm/cultivars with
enhanced agronomic traits.

5.6.3 Gene introgression in germplasm enhancement

Up until now the primitive cultivars and related wild populations have been a fruitful,
and sometimes the sole, source of genes for pest and disease resistance, adaptations to
difficult environments and other agricultural traits. The proportions of unadapted and
adapted genomes and/or genotypes persisting in enhanced germplasm will differ according
to the particular goals of the enhancement programme. Incorporation programmes seek to
increase genetic diversity by maximizing the proportion of unadapted genome/genotypes
that is retained. In contrast, when introgressing adapted germplasm with high-value
traits, only the requisite high-value genes should be transferred. Finally, yield
enhancement efforts identify whichever proportion of the unadapted and adapted
genome/genotypes optimizes the yield of the desired end product.
Gene introgression from wild species through molecular marker-assisted selection will be
discussed in detail in Chapters 8 and 9. Only two examples that relate to germplasm
enhancement will be given here. Isozyme, RFLP and morphological markers diagnostic for
chromosomes of one of the wild-weedy relatives of tomato facilitated efforts to
introgress wild genomic segments into elite tomato-breeding germplasm (DeVerna et al.,
1987, 1990). As a result, the elite germplasm received wild genomic segments that
improved horticulturally valuable traits to increase the yield (Rick, 1988). Notably, as
a result of this programme, modern tomato cultivars may be more genetically diverse than
heirloom, vintage cultivars (Williams and St Clair, 1993).
Stuber and Sisco (1991) and collaborators introgressed into a maize inbred line adapted
to Iowa, yield-enhancing genomic segments from an inbred line adapted to Texas by: (i)
identifying the favourable segments through yield trials coordinated with molecular (RFLP
and isozyme) marker genotyping; and (ii) transferring into the Iowa line, with the help
of molecular marker genotyping, only the favourable segments from the Texas line.
Although favourable segments were identified in field trials conducted in the diverse
environments of North Carolina, Iowa and Illinois, both the recipient and the donor lines
could be considered somewhat alien to the primary breeding site for this programme in
North Carolina. Nevertheless, these two examples do represent cases in which genetic
markers apparently facilitated yield enhancement successfully. With numerous genes
identified from wild relatives of crop species by molecular markers, marker-assisted gene
introgression will be used increasingly in germplasm enhancement.
The importance of the various base-broadening procedures that are being explored at
present should be emphasized. Genetic resources workers need to collaborate with plant
breeders in the development of procedures that allow effective testing of new materials
and their introduction into improvement programmes in a systematic manner. As indicated
by Hodgkin and Ramanatha Rao (2002), both introgression and incorporation programmes will
be needed.

5.7 Information Management

Information management has become increasingly important in plant breeding and germplasm
conservation as a large amount of data is accumulating. Since breeding-related
information is described in detail in Chapter 14, only issues related to germplasm
management will be discussed in this section.

5.7.1 Information system

There are two areas in which the rapid developments of the past few years have
had a major effect on plant genetic resources work: molecular genetics as discussed in
Chapters 2 and 3 and information technology. Information generated throughout germplasm
conservation activities must be stored in an easily accessible form. Dissatisfaction with
the quantity, quality and availability of information at the accession level is the most
frequent concern expressed by genebank clients (Fowler and Hodgkin, 2004). The
information situation has improved in the last few years. The CGIAR System-wide
Information Network for Genetic Resources (SINGER) provides access to data on the plant
collections held in trust by the Future Harvest Centres while the US Genetic Resources
Information Network (GRIN) DA system and the European EURISCO system provide access to
data on collections held in the USA and Europe, respectively. The information revolution
enables us to manage and process the very large amounts of data generated in various
areas and the Internet potentially provides global access to that data. Information
management has always had a central place in plant genetic resources conservation. The
need to identify, record and communicate information about accessions has led to the
development of a substantial infrastructure with relatively highly developed database
structures and information management systems. The information revolution will profoundly
affect our understanding of the organisms we conserve. All scientists involved in
germplasm collection, conservation and utilization would benefit from professionally
built, deployed and maintained information resources for their favourite organisms.
Application of GIS technologies to the management of information on global plant
diversity is one of the greatest achievements made in plant genetic resources
conservation since the late 1950s (Kresovich et al., 2002). A GIS is a database
management system that can simultaneously handle digital spatial data and associated
non-spatial attribute data. Spatial or location data are acquired via global positioning
system (GPS) devices, which are now quite inexpensive and have become part of the
obligatory equipment for field explorations. In addition, an increasing body of
geo-referenced data has become available, i.e. data associated with coordinate and
altitude information. These geo-referenced data include both biological (e.g. landcover,
cattle density) and non-biological (e.g. climate, topography, soil and human activity)
data. The non-spatial attributes are any biological, including genetic, data associated
with the individual accessions collected. Thus, GIS is a tool designed to visualize and
analyse spatial patterns in genetic data in relation to ecological data; it is also a
hypothesis-generating tool for investigating processes that shape genomes. With a free
mapping program, DIVA-GIS, we can create grid maps of the distribution of biological
diversity to identify hotspots and areas that have complementary levels of diversity
(Hijmans et al., 2001; http://www.diva-gis.org/). Furthermore, information generated by
GIS analysis can help in conserving and using genetic diversity as effectively and
efficiently as possible (Greene and Guarino, 1999; Jarvis et al., 2005).
Examples of GIS applications, as summarized by Gepts (2006), include: (i) study of
isolation by distance and its effect on the genetic structure of gene pools by comparing
genetic and geographic distances; (ii) linking diversity and environmental heterogeneity;
(iii) determination of species distribution and areas of greatest diversity; (iv)
identification of germplasm with specific adaptation; (v) predicting the distribution of
species of interest and identifying new areas for germplasm exploration; (vi) planning
germplasm exploration trips by identifying highly diverse areas, ecologically dissimilar
areas, under-conserved areas and areas containing threatened species, timing of the
exploration and additions to passport data; (vii) designing zoning plans for in situ
conservation integrated with socio-economic and indigenous knowledge data; and (viii)
establishment of core collections (e.g. those based, in part, on environmental variables
such as length of the growing season, photoperiod, soil types and moisture regimes).
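As an illustration of application (i), comparing genetic and geographic distances, the following is a minimal Python sketch rather than a description of any particular GIS package. It assumes hypothetical accession records with latitude, longitude and one allele call per marker; the accession names, coordinates and allele strings are invented for illustration. A simple proportion of mismatching alleles stands in for whatever genetic distance a real study would use, and the permutation (Mantel-type) test that a formal analysis would normally add is omitted.

```python
import math
from itertools import combinations

# Hypothetical accessions: coordinates in decimal degrees, one allele per marker.
ACCESSIONS = [
    {"name": "acc1", "lat": 0.0, "lon": 0.0, "alleles": "AAAA"},
    {"name": "acc2", "lat": 0.0, "lon": 1.0, "alleles": "AAAT"},
    {"name": "acc3", "lat": 0.0, "lon": 10.0, "alleles": "ATTT"},
    {"name": "acc4", "lat": 0.0, "lon": 11.0, "alleles": "TTTT"},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points on a spherical Earth."""
    radius_km = 6371.0  # mean Earth radius (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def allele_mismatch(g1, g2):
    """Proportion of markers at which two accessions carry different alleles."""
    return sum(a != b for a, b in zip(g1, g2)) / len(g1)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def isolation_by_distance(accessions):
    """Correlate pairwise geographic and genetic distances over all pairs."""
    geo, gen = [], []
    for a, b in combinations(accessions, 2):
        geo.append(haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]))
        gen.append(allele_mismatch(a["alleles"], b["alleles"]))
    return pearson(geo, gen)


if __name__ == "__main__":
    print(f"distance correlation: {isolation_by_distance(ACCESSIONS):.3f}")
```

A strongly positive correlation in such a comparison would be consistent with isolation by distance, although pairwise distances are not independent, which is why significance is usually assessed by permutation rather than read directly from the coefficient.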
5.7.2 Standardization of data collection

Issues related to data standardization become more prominent with the accumulation of
data from different collections. Genetic resources documentation systems have been
developed in many different places and using very different approaches. With the
increasing international collaboration among genebanks, the different approaches used by
individual genebanks have made it increasingly difficult to exchange information and the
need to formulate international formats for the documentation of crop germplasm became
apparent (Hazekamp, 2002). To assist in this task, the International Plant Genetic
Resources Institute (IPGRI) started the production of crop descriptor lists in 1979 and
more than 80 descriptor lists have been produced. In 1996, IPGRI went one step further
and defined a set of multi-crop passport descriptors (Hazekamp et al., 1997). These
descriptors aim to provide consistent coding schemes for a set of common passport
descriptors for all crops and will therefore facilitate the accumulation of passport data
into multi-crop information systems.
Another issue will be the need to ensure that there are common vocabularies, also called
ontologies, where different databases are being linked and that there are reasonably
straightforward ways of linking them (Sobral, 2002). Several genebanks mentioned data
standardization as an obstacle to fuller use of passport data. The lack of
standardization in taxonomic treatments seriously hampers the exchange of basic
biological data. Not only are genebanks reliant on a wealth of basic biological data to
fulfil their collection acquisition and maintenance tasks, but access of users to
genebank collections is equally hampered by inconsistent taxonomies. Scientific names are
notoriously inconsistent among the collection databases of different organizations
(Hulden, 1997). Not only do taxonomic treatments frequently change as a result of new
research, but also conflicting treatments coexist for some species. This creates major
problems when trying to exchange basic biological information (Hazekamp, 2002).
The Species 2000 and Integrated Taxonomic Information System Catalogue of Life
(http://www.sp2000.org/) aims to create a comprehensive catalogue of all known species of
organisms on Earth by 2011. Rapid progress has been made recently and the 2006 Annual
Checklist contains 884,552 species, approximately half of all known organisms. This
database is a valuable asset and will provide a unified structure through which a wealth
of basic biological information can be linked. It is essential for genebanks to adhere to
such taxonomic standards not only to facilitate data exchange between genebanks but also
to ensure effective linkage with related biological disciplines. While initiatives such
as the Species 2000 project will go a long way towards rationalizing the use of different
taxonomic treatments, there is still a need for adequate taxonomic capacity within
national programmes to apply such taxonomic standards correctly (Hazekamp, 2002).

5.7.3 Information integration and utilization

Multiple conservation approaches rely on good documentation systems integrated across
programmes and based on individual commodities. For instance, samples can be sorted when
it is known who stores what and redundancies can be identified. Site collection data and
characterization data enable unique materials to be identified, provide knowledge of
where unique landraces or primitive cultivars originated so that they can be conserved
and multiplied in areas with similar environments and facilitate decisions on what should
justifiably be conserved by whom. It is necessary to provide a framework for a more
integrated approach to biology where information from widely different sources can be
brought together to help understand crop plant performance and diversity and the forces
that are responsible for the patterns we observe. To integrate information generated in
different areas, it is important for all researchers to follow general rules for
reporting their
genotyping and phenotyping results. One direction for the use of a germplasm database is
to combine information collected from global research efforts. Facilitating cross talk
among currently existing genome databases specializing in sequence information or
expression data with germplasm databases documenting phenotypic and genotypic variation
will add value to all sources of information.
Integration of molecular information is important but it is even more important (and
challenging) to integrate it with phenotypic information obtained at the level of the
whole organism, the latter typically provided through germplasm repositories and mutant
stock centres. This need is clear because organisms are more than the sum of their parts.
In addition, it is within this organism context that data can be transformed into useful
information and, perhaps, knowledge. Databases from genebanks will be linked with those
from botanical gardens and protected areas to support conservation planning. Sequence
information from molecular databases will be compared with expressed sequence tags (ESTs)
obtained from diversity studies to target markers associated with potentially useful
traits (Hodgkin and Ramanatha Rao, 2002). Once integration of molecular information with
organism (phenotypic) information is achieved in a robust manner then it is very likely
that the resulting system will modify the way we think about germplasm conservation and
enhancement.
Comprehensive DNA fingerprinting of crop gene pools, including as many cultivars, hybrid
parents and progenies as possible, is the first step for using molecular marker
technology in germplasm enhancement. DNA fingerprinting data may be stored as alleles and
as scanned images of gels and autoradiographs. These data must be integrated with both
phenotypic information and passport and pedigree information. A database of DNA marker
alleles for the elite gene pool of a crop provides information on specific DNA
polymorphisms that is needed to design, execute and analyse genetic mapping experiments
targeted at specific traits or specific crosses. The same database serves as a
classification tool describing the overall levels and patterns of variation within the
crop gene pool and illustrating subdivisions within a gene pool such as heterotic groups.
Such information is useful in making predictions about the performance of new cultivars
and hybrids or selecting parents for crosses that are likely to yield new gene
combinations or afford an optimal degree of performance. Plant geneticists and breeders
can use the data from a germplasm evaluation project as a guide in choosing the most
efficient crosses for genetic studies and breeding. For example, a genotyping project of
n accessions theoretically provides polymorphism surveys for n(n − 1)/2 possible cross
combinations. With an increasing number of markers surveyed on a variety of germplasm
accessions and as more data flow into the database from multiple sources, it will be
increasingly possible to determine the genetic constitution and genetic relationships
among a wide range of parental lines, cultivars and wild relatives. This also provides
the foundation for developing hypotheses based on association genetics and haplotypic
patterns to relate agronomically important phenotypes to the presence or absence of
specific molecular alleles. The most ambitious project would be to genotype all available
germplasm accessions using markers from all candidate genes of a genome. As a result, all
germplasm-related efforts could then be based on the whole genome.
An efficient approach to the screening of germplasm involves the ability to rapidly
create a nested series of core collections based on information about geographic,
phenotypic and genotypic diversity stored in a database. The construction of such a
system would require a large-scale effort to provide genotypic information using a
standard set of markers that could serve as a reference point. As new markers and marker
systems were developed, they could be overlaid on to the essential framework of diversity
established previously. An increasingly powerful information system could be developed if
data models were made explicit and the data structures were modular so that new
types of genetic information could be readily incorporated as they became available
(Chapter 14). By accumulating historical information in a systematic manner, germplasm
collections would rapidly gain value because they could be screened computationally for
essential molecular and phenotypic characteristics of interest.
Databases for whole genomic sequences for several important species, both dicots and
monocots, are available, allowing directed discovery of genes in higher plants and
classification of alleles present in a wide range of breeding germplasm. As indicated by
Sorrells and Wilson (1997), the identification of the genes controlling a trait and
knowledge of their DNA sequence would facilitate the classification of variation in a
germplasm pool based on gene fingerprinting or characterization of variation in key DNA
sequences. Classification of functional sequence variants within genes such as FNP at a
large number of targeted loci would substantially reduce the amount of work required to
determine their relative breeding value and lead to the identification of superior
alleles.

5.8 Future Prospects

Germplasm collection, management and evaluation are complex and endless endeavours.
Protecting plant genetic diversity poses political, ethical and technical challenges
(Esquinas-Alcázar, 2005). Germplasm curators face serious problems, including
insufficient operational funds, needs for new germplasm acquisitions, genetic erosion
during genebank management, lack of research opportunities and a high turnover of
personnel. On the other hand, there are also several major constraints to the efficient
use of conserved germplasm, which include: (i) insufficient stock of viable seed for
distribution; (ii) high cost of seed production and distribution; (iii) lack of relevant
characterization and evaluation data; (iv) reluctance of breeders to use unadapted
germplasm; (v) plant quarantine regulations; and (vi) obstacles posed by legislation on
plant breeders' rights and intellectual property rights.
Developments in recent years have heightened the public nature of the debate over fair
access and benefit sharing, and have strengthened the role of global institutes and
instruments in protecting these public goods as well as ensuring access to them and the
information associated with them. The Global Crop Diversity Trust and the Challenge
Programme on Unlocking Genetic Diversity (now known as the Generation Challenge
Programme) are two of these developments (Thompson et al., 2004). The Global Crop
Diversity Trust has been established as an international fund whose goal is to support
the conservation of crop diversity over the long term. The establishment of the Trust
involves a partnership between the FAO and the CGIAR. The Trust aims to match the
long-term nature of conservation needs with long-term secure and sustainable funding by
creating an endowment that will provide a permanent source of funding for crop diversity
collections around the world. The CGIAR has taken the initiative for a large research
programme that aims to use molecular tools to unlock the genetic diversity in genebank
collections for transfer to breeding programmes. This Challenge Programme brings together
advanced research institutes, national programmes from developing countries and many of
the CGIAR institutes. CIMMYT, IRRI and IPGRI were the founders of this Challenge
Programme. Apart from advancing state-of-the-art techniques for molecular
characterization of germplasm, one of the main thrusts of the programme is developing
molecular toolkits and information systems for crops other than the big five model plant
species (rice, Arabidopsis, Medicago, wheat and maize), including crops such as beans,
cassava, banana and millets.
Although molecular markers have been recognized as the most useful tools among molecular
techniques, the high cost per data point is a bottleneck that is currently limiting the
extensive use of molecular markers in germplasm management. The cost of molecular
genotyping depends on marker type and its capacity in high-throughput analysis. For
example, the cost
Plant Genetic Resources 193

of SNP analysis is now about US$0.20–0.30 per genotype, with a cost of only a few cents per genotype expected in the coming years (Jenkins and Gibson, 2002). With well-established marker systems and sequencing facilities, genotyping with SSR markers costs about US$0.30–0.80 per data point, depending on marker multiplexing and the number of markers genotyped for each sample (Xu et al., 2002). There are several ways to reduce the cost of genotyping. First, increasing the throughput using automated genotyping and data-scoring systems can help increase the daily data output (Coburn et al., 2002). Secondly, the optimization of marker systems, including facilities and personnel, will result in less cost per data point. Now there are powerful new techniques for screening thousands of plants for sequence variation in any particular gene which is known to be of importance in a breeding programme. Detection of SNPs can reduce a collection of many thousands of accessions to some tens of plants with sequence changes in the gene of interest. These can then be screened for their phenotypic characteristics and used where appropriate (Peacock and Chaudhury, 2002).

Genomic research has helped establish an information flow from molecular markers to genetic maps to sequences to genes and to functional alleles. Apparently, however, there is still a gap between sequence-based information targeting genes and alleles and breeding-related information targeting germplasm, pedigrees and phenotypes. Phenotypic evaluation still provides the foundation for the functional analysis of many genes even when a complete genomic sequence becomes available. Integration of breeding-related phenotypic evaluation with the high-throughput evaluation of mutants in a genomics context will hasten progress towards understanding the functions of all plant genes in the years ahead. This information will enhance the efforts to use MAGE effectively to achieve a substantial increase in the efficiency of germplasm management in the years to come.

With new genotyping methods such as the locus-specific microarray and resequencing, it is now possible to scan variation at a locus from many accessions on a single chip as described in Chapter 3. Consequently, the breadth of genetic information from thousands of DNA polymorphisms and the depth of phenotypic measurements hold promise for identifying marker–trait correlations through linkage disequilibrium-based association genetics. The current QTL cloning procedures are time consuming; for example, in species that have two growing seasons per year, it may take 5 years to produce the population needed for fine-scale mapping. With thousands of genes evaluated for QTL effects, a more efficient approach is needed to complement map-based cloning. This role may be fulfilled by the application of association tests to naturally occurring populations (Buckler and Thornsberry, 2002). This process, which can be called association mapping or linkage disequilibrium mapping within a sample of known pedigree (described in detail in Chapter 6), exploits related individuals that differ for a particular trait to establish which region of the genome is associated with the phenotype among the population members. In order to apply this method to mapping genes using a plant genetic resources collection, the following prerequisite resources will be required: (i) a dense set of molecular markers; (ii) passport and phenotypic data; (iii) information on population structure; and (iv) a sample with contrasting genotypes for the trait of interest (Kresovich et al., 2002). For several reasons, there is great enthusiasm at present about the promise of linkage disequilibrium-based association studies for uncovering the genetic components of complex traits in humans: dense SNP maps across the genome, elegant high-throughput genotyping techniques, simultaneous comparison of groups of loci, statistical measures for assessing genome-wide significance and phenotypic insights as the basis for comparative genomic studies among different human groups are all available. These conditions have already been or will soon be satisfied in some plant species as well, and association studies have been reported in many plant species including maize, rice, barley and Arabidopsis. These studies bring together the power of genomics with the
194 Chapter 5

richness of crop germplasm collections and promise to provide new insights into the genetic bases of domestication and productivity of our major food crops.

In the age of genomics, new attention is being focused on the value of germplasm resources, including whole plants, seeds, plant parts, tissues and clones from distinct species and synthetic germplasm and all types of mutants. The ultimate goal of germplasm conservation is to maintain diversity of the genes and gene combinations (Xu and Luo, 2002). Information regarding germplasm resources is being increasingly extracted from studies involving the relationship between genome sequence and its biological and evolutionary significance in the context of genetic resources. This information can be translated across species in a comparative context and thus the effective management of germplasm resources today involves both the practical management of seeds, tissues, clones, cells and mutant stocks and the effective management of large reservoirs of electronic information that helps us decipher the value and meaning of the genetic information contained within each germplasm accession. An important question will be whether the increased use of molecular tools in genebanks will make genebanks into true banks of genes and whether, for example, associated sequence data will be freely available.

A user who has identified an accession or accessions of interest from a primary collection would then move to the next level of information, where clusters of germplasm known to represent a broader spectrum of diversity within a specific gene pool (a subspecies or ecotype within a species) or a specific trait (resistance to diseases and pests) could be defined. The second level of investigation could be conducted using carefully designed sets of molecular markers known to target specific traits or designed to provide haplotype data for specific regions of the genome. With the availability of the genome sequences from rice and other plants, accelerated efforts are underway to determine the function of all genes in a plant. As gene structure–function relationships are clarified with greater precision, it will be possible to focus attention on genetic diversity within the active sites of a structural gene or within key promoter regions. This will make it productive to screen large germplasm collections for FNPs, targeting the search for alleles that are phenotypically relevant and have high breeding value.

Locating useful genes in collections will require an integrated approach that brings together information from molecular studies and other areas. This might include using an extensive set of molecular markers for diversity studies, analysis of the extent of linkage disequilibrium and identification of areas of the genome where important genes may occur combined with more conventional approaches using passport data, GIS and collections (Kresovich et al., 2002). These techniques could together provide an optimum set of candidate accessions for phenotypic and genetic analyses. In all cases, efficient characterization methods will remain an essential component of any plant genetic resources studies.

Finally, the domestication stories of maize and tomato should provide a warning to all curators and users of genetic resources that major phenotypic differences between accessions do not always mean that there are equally extensive genetic differences. In addition, significant contributions to agronomically desirable traits may result from the regulation, both spatial and temporal, of gene expression rather than from differences in amino acid sequences or protein structures. The challenge for curators now is to interpret how this knowledge affects not only genetic resource conservation in general, but where and how to look for alleles that will be useful for genetic diversity characterization and plant breeding.
6
Molecular Dissection of Complex Traits: Theory

As discussed in Chapter 1, quantitative variation in phenotype can be explained by the combined action of many discrete genetic factors or polygenes, each having a rather small effect on the overall phenotype and being influenced by the environment. The contribution of each quantitative locus at a phenotypic level is expressed as an increase or decrease in trait value and it is not possible to distinguish the effect of various loci acting in this manner from one another based on phenotypic variation alone. Furthermore, the effect of particular environmental variables is also expressed as a quantitative increase or decrease in the final trait value. The same amount of total genetic variation can be produced by allelic variation at many loci, each having a small effect on the trait or at a few loci having a larger effect. As both genetic and environmental factors contribute in the same positive or negative manner to trait value, it is generally not possible, from the phenotypic distribution of the trait alone, to distinguish the effect of genetic factors from those of environmental factors as sources of variation in traits. Therefore, breeding for quantitative traits tends to be a less efficient and more time-consuming process.

Tools for directed genetic manipulation of quantitative traits have undergone a crucial revolution since the late 1980s with the development of molecular markers (Chapter 2). As a result, interchange between molecular biology and quantitative genetics, which have developed independently for many years, has become apparent since the 1990s (Peterson, 1992; Xu and Zhu, 1994). Since then, high-density molecular maps have been constructed in many crops and genome-wide mapping and marker-based manipulation of genes affecting quantitative traits has become possible. Traits which have been improved largely by conventional breeding and genetically analysed by biometrical methods in the past can be manipulated now using molecular markers. Location and effect of the genes controlling a quantitative trait can be determined by marker-based genetic analysis. A chromosomal region linked to or associated with a marker which affects a quantitative trait was defined as a quantitative trait locus (QTL) (Geldermann, 1975). A QTL that has a large effect and can explain a major part of total variation can be analysed genetically as a major gene in most cases. In this chapter, our discussion is devoted to the genes with relatively small effects.

Classical quantitative genetics is based on the statistical analysis of the mean and variance of the trait of interest (Chapter 1). It would be preferable to move away from statistical considerations of variances to directly examine individual QTL. Although it is possible to examine candidate loci, it

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 195


196 Chapter 6

is more likely that we can work indirectly with QTL via linked marker loci. Early QTL studies were based on manipulations of whole chromosomes, including substitution of one chromosome from one inbred line into another. The approach was refined to apply to small segments of chromosomes, delineated first by morphological markers (Thoday, 1961).

As large numbers of molecular markers became available and whole-genome mapping of quantitative traits thus became feasible, Xu (1997) proposed the concept of separating, pyramiding and cloning QTL, describing how multiple quantitative trait loci (QTL), either clustered together or dispersed in different chromosomes, can first be separated or dissected by molecular marker-assisted QTL mapping and selection and then pyramided into one genetic background, either by marker-assisted selection (MAS) or transformation of cloned multiple QTL, to create transgressive progeny in plant breeding. At about the same time, Molecular Dissection of Complex Traits became an attractive title for a multi-author book (Paterson, 1998).

The basic method used in Mendelian genetics to find linkage relationships is to classify individuals based on phenotypes and then compare the proportions of these groups with the theoretical ratio expected from independent loci and estimate the recombination fraction. QTL mapping establishes linkage between marker loci and QTL. The fundamental principle is the same, i.e. classifying individuals. Depending on what criterion is used for classification of individuals, there are two major types of QTL mapping methods: marker-based analysis and trait-based analysis.

1. Marker-based analysis: methods for locating chromosomal regions or loci affecting quantitative traits (QTL) based on their linkage relationships to Mendelian marker loci were first presented by Thoday (1961) and applied in experimental and agricultural species. These studies are all based on differences generated by QTL linked to the marker locus in the mean value of a quantitative trait between marker genotypes in a segregating population such as an F2, BC or DH population derived from two inbred lines (Chapter 4). If a marker is linked to a QTL, marker and QTL alleles co-segregate to some extent in the progeny. As a result, the frequencies of QTL genotypes will be different among marker genotypes (Fig. 6.1) and hence the distribution (e.g. the mean and variance) of the quantitative trait will vary over the marker genotypes. Marker-based analysis of linkage proceeds by testing for phenotypic differences among marker genotypes (Soller and Beckmann, 1990).

[Figure 6.1: crossing scheme. P1 (M1Q1/M1Q1) × P2 (M2Q2/M2Q2) gives the F1, from which DH lines M1Q1/M1Q1, M1Q2/M1Q2, M2Q1/M2Q1 and M2Q2/M2Q2 arise with frequencies (1 − r)/2, r/2, r/2 and (1 − r)/2. The trait mean is (1 − r)μQ1Q1 + rμQ2Q2 for marker class M1M1 and rμQ1Q1 + (1 − r)μQ2Q2 for marker class M2M2.]

Fig. 6.1. The frequency distribution of QTL genotypes, Q1Q1 and Q2Q2, within two marker genotypes, M1M1 and M2M2, in a double haploid (DH) population. r is the recombination frequency between the marker and QTL loci. When r = 0.5 (there is no linkage between them), the frequencies of Q1Q1 and Q2Q2 are the same between the two marker genotypes, which means that there is no phenotypic difference between the two marker genotypes.

Before the discovery of molecular markers, marker-based analysis utilized the data from single markers (e.g. Sax, 1923). Here, only one or a few markers could be analysed in an experiment because the number of markers available at that time was limited. Further, most were morphological or biochemical markers, making it impossible to construct a complete linkage map by using single or even multiple populations. With the development of high-density molecular maps, it became apparent that simple (one-locus) marker-based
Molecular Dissection of Traits: Theory 197

analysis alone could not fully utilize the genetic information harboured in complete linkage maps for QTL mapping. To fully exploit the potential of complete linkage maps to locate QTL more efficiently and accurately, many QTL mapping approaches have been developed using multiple markers simultaneously.

2. Trait-based analysis: an alternative approach to marker-based analysis is to examine marker allele frequencies in the lines originating from a segregating population but selected for specific phenotypes (Stuber et al., 1980, 1982). In such populations, selection, especially for the phenotypic extremes, would be expected to change the allelic frequencies of segregating plus or minus alleles at QTL affecting the trait in question. Although variation in quantitative traits is continuous in a population, two extremes of phenotypes could be distinguishable if the intermediate phenotypes are excluded. The frequency of plus alleles increases in the high extreme and the frequency of minus alleles increases in the low extreme (see Fig. 7.6B). Hitchhiking effects between such QTL alleles and nearby marker alleles would be expected to generate corresponding changes in the allelic frequencies of the coupled marker alleles. Consequently, marker loci at which allele frequencies differed significantly in the high and low extremes would be considered to be in linkage with QTL having an effect on the trait under selection. In this way, the number of segregating QTL affecting the trait under selection and their general map locations could be determined. The deviation of allele frequency from the Mendelian ratio in each of the two extremes can be tested by the $\chi^2$ statistic. This method is called trait-based analysis (Keightley and Bulfield, 1993). Because trait-based analysis raises fewer statistical and more practical issues, its use is discussed in Chapter 7 along with selective genotyping and pooled DNA analysis.

This chapter discusses theoretical or statistical aspects of QTL mapping. To understand better the story behind QTL mapping, the reader should have some basic knowledge of statistics, experimental design, linear modelling and the theory of probability. Although readers who have little background in these fields are advised to focus on Chapter 7 for more practical concepts of molecular dissection of complex traits, they will still benefit by scanning through each section of this chapter. On the other hand, it is a challenge to have all the statistical issues in this field fully described in a single chapter. The following references are highly recommended for a full coverage of QTL mapping statistics: Xu and Zhu (1994), Lynch and Walsh (1997), Liu (1998), Sorensen and Gianola (2002) and Wu et al. (2007). Furthermore, there are several websites which provide free access to statistical genomics and QTL mapping courses (e.g. http://www.stat.wisc.edu/yandell/statgen/course/).

6.1 Single Marker-based Approaches

Even though trait variation is often known to be genetic, the number and location of the genes controlling this variation are generally unknown. On the other hand, marker genotypes can be scored precisely. If there is an association between marker type and trait value, it is likely that a trait locus is close to that marker locus. Therefore, the simplest analyses consider each marker locus in turn.

6.1.1 Assumptions

Assume two inbred parental lines, P1 and P2, and their F1; a marker locus M with two alleles, $M_1$ and $M_2$, linked with a QTL (Q) with two alleles, $Q_1$ and $Q_2$; a recombination fraction r between M and Q; and normally distributed trait values. We then have

$$P_1\ (Q_1Q_1M_1M_1):\quad Y \sim N(\mu_{Q_1Q_1}, \sigma^2)$$

$$P_2\ (Q_2Q_2M_2M_2):\quad Y \sim N(\mu_{Q_2Q_2}, \sigma^2)$$

$$F_1\ (Q_1Q_2M_1M_2):\quad Y \sim N(\mu_{Q_1Q_2}, \sigma^2)$$

For the populations derived from this cross, there are three marker types in F2 (M1M1,

M1M2 and M2M2), two marker types in the double haploid (DH) (M1M1 and M2M2) or in the backcross (BC) (M1M1 and M1M2 for BC derived from backcrossing F1 to P1 and M1M2 and M2M2 for BC derived from backcrossing F1 to P2). Similarly, QTL genotypes for each population can be obtained. Tables 6.1, 6.2 and 6.3 provide the genotypic frequencies, means and variances for each marker type and QTL genotype in populations F2, DH and BC.

In most cases, we assume that trait variances are homogeneous over the different QTL genotypes, so that $\sigma^2_{Q_1Q_1} = \sigma^2_{Q_1Q_2} = \sigma^2_{Q_2Q_2} = \sigma^2$ for F2 and we also make this same assumption for other populations such as BC, DH and recombinant inbred lines (RILs).

For the convenience of discussion, frequencies for genotypes in F2 and DH (BC) populations are expressed by a matrix:

$$p_{ij} = \begin{bmatrix} (1-r)^2 & 2r(1-r) & r^2 \\ r(1-r) & 1-2r+2r^2 & r(1-r) \\ r^2 & 2r(1-r) & (1-r)^2 \end{bmatrix} \qquad (i, j = 1, 2, 3 \text{ for } F_2)$$

$$p_{ij} = \begin{bmatrix} 1-r & r \\ r & 1-r \end{bmatrix} \qquad (i, j = 1, 2 \text{ for DH or BC})$$

Table 6.1. Genotypic frequencies, means and variances in marker and QTL genotypes in the F2 population.

                   Genotypic frequency
Marker genotype    $Q_1Q_1$             $Q_1Q_2$             $Q_2Q_2$             Sample mean       Sample variance        Sample size
$M_1M_1$           $(1-r)^2$            $2r(1-r)$            $r^2$                $\mu_{M_1M_1}$    $\sigma^2_{M_1M_1}$    $n_{M_1M_1}$
$M_1M_2$           $r(1-r)$             $1-2r+2r^2$          $r(1-r)$             $\mu_{M_1M_2}$    $\sigma^2_{M_1M_2}$    $n_{M_1M_2}$
$M_2M_2$           $r^2$                $2r(1-r)$            $(1-r)^2$            $\mu_{M_2M_2}$    $\sigma^2_{M_2M_2}$    $n_{M_2M_2}$
Mean               $\mu_{Q_1Q_1}$       $\mu_{Q_1Q_2}$       $\mu_{Q_2Q_2}$
Variance           $\sigma^2_{Q_1Q_1}$  $\sigma^2_{Q_1Q_2}$  $\sigma^2_{Q_2Q_2}$

Table 6.2. Genotypic frequencies, means and variances in marker and QTL genotypes in the DH population.

                   Genotypic frequency
Marker genotype    $Q_1Q_1$             $Q_2Q_2$             Sample mean       Sample variance        Sample size
$M_1M_1$           $1-r$                $r$                  $\mu_{M_1M_1}$    $\sigma^2_{M_1M_1}$    $n_{M_1M_1}$
$M_2M_2$           $r$                  $1-r$                $\mu_{M_2M_2}$    $\sigma^2_{M_2M_2}$    $n_{M_2M_2}$
Mean               $\mu_{Q_1Q_1}$       $\mu_{Q_2Q_2}$
Variance           $\sigma^2_{Q_1Q_1}$  $\sigma^2_{Q_2Q_2}$

Table 6.3. Genotypic frequencies, means and variances in marker and QTL genotypes in the BC population derived by backcrossing to P1.

                   Genotypic frequency
Marker genotype    $Q_1Q_1$             $Q_1Q_2$             Sample mean       Sample variance        Sample size
$M_1M_1$           $1-r$                $r$                  $\mu_{M_1M_1}$    $\sigma^2_{M_1M_1}$    $n_{M_1M_1}$
$M_1M_2$           $r$                  $1-r$                $\mu_{M_1M_2}$    $\sigma^2_{M_1M_2}$    $n_{M_1M_2}$
Mean               $\mu_{Q_1Q_1}$       $\mu_{Q_1Q_2}$
Variance           $\sigma^2_{Q_1Q_1}$  $\sigma^2_{Q_1Q_2}$
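
The frequencies in Tables 6.1 to 6.3 are the probabilities of each QTL genotype conditional on the marker genotype. As a minimal sketch (the function names `f2_matrix`, `dh_bc_matrix` and `class_means` and the numerical QTL means are illustrative assumptions, not part of the text), they can be tabulated for any recombination fraction r and used to obtain the expected trait mean of each marker class:

```python
def f2_matrix(r):
    """Rows: marker genotypes M1M1, M1M2, M2M2; columns: Q1Q1, Q1Q2, Q2Q2."""
    s = 1.0 - r
    return [
        [s * s, 2 * r * s, r * r],
        [r * s, 1 - 2 * r + 2 * r * r, r * s],
        [r * r, 2 * r * s, s * s],
    ]


def dh_bc_matrix(r):
    """Rows: the two marker classes; columns: the two QTL genotypes (DH or BC)."""
    return [[1.0 - r, r], [r, 1.0 - r]]


def class_means(matrix, qtl_means):
    """Expected trait mean of each marker class: a frequency-weighted
    average of the QTL genotype means."""
    return [sum(p * m for p, m in zip(row, qtl_means)) for row in matrix]


# Hypothetical QTL genotype means for Q1Q1, Q1Q2 and Q2Q2.
qtl_means = [12.0, 10.0, 8.0]
for r in (0.0, 0.1, 0.5):
    rows = f2_matrix(r)
    # Every row is a conditional distribution, so it must sum to one.
    assert all(abs(sum(row) - 1.0) < 1e-12 for row in rows)
    print(r, class_means(rows, qtl_means))
```

With r = 0.5 all rows coincide, so the marker classes share the same expected mean; this is the no-linkage case.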

6.1.2 Comparison of marker means

Backcross design

Now we take the BC population as an example to show how to detect marker–trait association through comparison of means for the different marker classes.

The genotypic array for the BC population derived from backcrossing F1 to P1 is

$$\frac{1-r}{2}\,Q_1Q_1M_1M_1 + \frac{r}{2}\,Q_1Q_2M_1M_1 + \frac{r}{2}\,Q_1Q_1M_1M_2 + \frac{1-r}{2}\,Q_1Q_2M_1M_2$$

For a BC population derived from backcrossing F1 to P2, the two marker genotypes are $M_1M_2$ and $M_2M_2$ and the two QTL genotypes are $Q_1Q_2$ and $Q_2Q_2$, with other items unchanged.

Only the marker genotypes can be directly observed and BC individuals have two classes of segregates: marker types $M_1M_1$ and $M_1M_2$. The trait distributions in these two classes are

$$M_1M_1:\quad Y \sim (1-r)\,N(\mu_{Q_1Q_1}, \sigma^2) + r\,N(\mu_{Q_1Q_2}, \sigma^2)$$

$$M_1M_2:\quad Y \sim r\,N(\mu_{Q_1Q_1}, \sigma^2) + (1-r)\,N(\mu_{Q_1Q_2}, \sigma^2)$$

The means and variances of the two mixture distributions are

$$\mu_{M_1M_1} = (1-r)\mu_{Q_1Q_1} + r\mu_{Q_1Q_2}$$

$$\mu_{M_1M_2} = r\mu_{Q_1Q_1} + (1-r)\mu_{Q_1Q_2}$$

$$\sigma^2_{M_1M_1} = \sigma^2_{M_1M_2} = \sigma^2 + r(1-r)(\mu_{Q_1Q_1} - \mu_{Q_1Q_2})^2$$

The expected difference in average trait values is

$$\mu_{M_1M_1} - \mu_{M_1M_2} = (1-2r)(\mu_{Q_1Q_1} - \mu_{Q_1Q_2})$$

Therefore, only when r = 0.5, i.e. there is no linkage between M and Q,

$$\mu_{M_1M_1} - \mu_{M_1M_2} = 0$$

If r < 0.5, there is linkage and $\mu_{M_1M_1} \neq \mu_{M_1M_2}$. The smaller r (and hence the tighter the linkage), the bigger the difference between $\mu_{M_1M_1}$ and $\mu_{M_1M_2}$. The difference reaches the maximum, $\mu_{M_1M_1} - \mu_{M_1M_2} = \mu_{Q_1Q_1} - \mu_{Q_1Q_2}$, when r = 0, i.e. M and Q are completely linked. In this case, all differences between marker genotypes can be attributed to the effect of the putative QTL.

The difference between marker genotypes can be tested by the t-test statistic

$$t = \frac{\mu_{M_1M_1} - \mu_{M_1M_2}}{\sqrt{s^2\left(\dfrac{1}{n_{M_1M_1}} + \dfrac{1}{n_{M_1M_2}}\right)}}$$

The quantity $s^2$ is a pooled estimate of the variance within each marker class of BC individuals. The higher the t value, the more significant the difference and, for a QTL of given effect, the closer the linkage is between M and Q.

The above discussion can be extended to DH populations where $\mu_{M_1M_2}$ is replaced by $\mu_{M_2M_2}$ and $\mu_{Q_1Q_2}$ by $\mu_{Q_2Q_2}$.

F2 design

For an F2 population, there are ten trait–marker genotypes with three distinguishable marker classes. The trait distributions are

$$M_1M_1:\quad (1-r)^2 N(\mu_{Q_1Q_1}, \sigma^2) + 2r(1-r) N(\mu_{Q_1Q_2}, \sigma^2) + r^2 N(\mu_{Q_2Q_2}, \sigma^2)$$

$$M_1M_2:\quad r(1-r) N(\mu_{Q_1Q_1}, \sigma^2) + [r^2 + (1-r)^2] N(\mu_{Q_1Q_2}, \sigma^2) + r(1-r) N(\mu_{Q_2Q_2}, \sigma^2)$$

$$M_2M_2:\quad r^2 N(\mu_{Q_1Q_1}, \sigma^2) + 2r(1-r) N(\mu_{Q_1Q_2}, \sigma^2) + (1-r)^2 N(\mu_{Q_2Q_2}, \sigma^2)$$

The trait means of these three marker classes are

$$\mu_{M_1M_1} = (1-r)^2\mu_{Q_1Q_1} + 2r(1-r)\mu_{Q_1Q_2} + r^2\mu_{Q_2Q_2}$$

$$\mu_{M_1M_2} = r(1-r)\mu_{Q_1Q_1} + [r^2 + (1-r)^2]\mu_{Q_1Q_2} + r(1-r)\mu_{Q_2Q_2}$$

$$\mu_{M_2M_2} = r^2\mu_{Q_1Q_1} + 2r(1-r)\mu_{Q_1Q_2} + (1-r)^2\mu_{Q_2Q_2}$$
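
The backcross comparison of marker means can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the function names `bc_expected_difference` and `pooled_t` and the trait values are hypothetical, and the t statistic is the usual pooled two-sample form described for the backcross design.

```python
import math
from statistics import mean, variance


def bc_expected_difference(mu_q1q1, mu_q1q2, r):
    """Expected mu_M1M1 - mu_M1M2 in a backcross to P1,
    which equals (1 - 2r)(mu_Q1Q1 - mu_Q1Q2)."""
    mu_m1m1 = (1 - r) * mu_q1q1 + r * mu_q1q2
    mu_m1m2 = r * mu_q1q1 + (1 - r) * mu_q1q2
    return mu_m1m1 - mu_m1m2


def pooled_t(class_a, class_b):
    """Two-sample t statistic using a pooled within-class variance s^2."""
    n_a, n_b = len(class_a), len(class_b)
    s2 = ((n_a - 1) * variance(class_a) + (n_b - 1) * variance(class_b)) / (n_a + n_b - 2)
    return (mean(class_a) - mean(class_b)) / math.sqrt(s2 * (1 / n_a + 1 / n_b))


# Hypothetical trait values for the two BC marker classes.
m1m1 = [10.2, 11.1, 9.8, 10.9, 10.5]
m1m2 = [9.1, 9.9, 8.7, 9.6, 9.2]

# The expected difference shrinks from the full QTL effect (r = 0) to zero (r = 0.5).
for r in (0.0, 0.1, 0.3, 0.5):
    print(r, bc_expected_difference(12.0, 10.0, r))
print("t =", pooled_t(m1m1, m1m2))
```

The loop makes the confounding visible: a small difference between marker classes may reflect either loose linkage or a small QTL effect.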

The trait variances of these three classes are

$$\sigma^2_{M_1M_1} = \sigma^2 + 2r(1-r)\left[(\mu_{Q_1Q_1} - \mu_{Q_1Q_2}) - r(\mu_{Q_1Q_1} + \mu_{Q_2Q_2} - 2\mu_{Q_1Q_2})\right]^2 + r^2(1-r)^2(\mu_{Q_1Q_1} + \mu_{Q_2Q_2} - 2\mu_{Q_1Q_2})^2$$

$$\sigma^2_{M_1M_2} = \sigma^2 + r(1-r)\left[(\mu_{Q_1Q_1} - \mu_{Q_1Q_2})^2 + (\mu_{Q_2Q_2} - \mu_{Q_1Q_2})^2 - r(1-r)(\mu_{Q_1Q_1} + \mu_{Q_2Q_2} - 2\mu_{Q_1Q_2})^2\right]$$

$$\sigma^2_{M_2M_2} = \sigma^2 + 2r(1-r)\left[(\mu_{Q_2Q_2} - \mu_{Q_1Q_2}) - r(\mu_{Q_1Q_1} + \mu_{Q_2Q_2} - 2\mu_{Q_1Q_2})\right]^2 + r^2(1-r)^2(\mu_{Q_1Q_1} + \mu_{Q_2Q_2} - 2\mu_{Q_1Q_2})^2$$

For an additive trait, i.e. $\mu_{Q_1Q_1} + \mu_{Q_2Q_2} - 2\mu_{Q_1Q_2} = 0$, the three F2 variances are equal. Then

$$\sigma^2_{M_1M_1} = \sigma^2_{M_1M_2} = \sigma^2_{M_2M_2} = \sigma^2 + 2r(1-r)\delta^2$$

where $\delta^2 = (\mu_{Q_1Q_1} - \mu_{Q_1Q_2})^2 = (\mu_{Q_2Q_2} - \mu_{Q_1Q_2})^2$.

The hypothesis of no linkage between marker and trait loci can be tested by comparing the three marker class means. Under this hypothesis, the three marker means and variances will be equal regardless of the degree of dominance.

For an F2 population we can construct two t-tests to test marker additive and dominance effects separately. Let $\tilde{\mu}_{M_1M_1}$, $\tilde{\mu}_{M_1M_2}$ and $\tilde{\mu}_{M_2M_2}$ be the observed trait means of the groups of individuals with marker genotypes $M_1M_1$, $M_1M_2$ and $M_2M_2$ for a marker in an F2 population with corresponding sample sizes $n_{M_1M_1}$, $n_{M_1M_2}$ and $n_{M_2M_2}$ and variances $\sigma^2_{M_1M_1}$, $\sigma^2_{M_1M_2}$ and $\sigma^2_{M_2M_2}$. Recall the additive and dominance effects as defined in Chapter 1. To test marker additive effect, the test statistic is

$$t_1 = \frac{\tilde{\mu}_{M_1M_1} - \tilde{\mu}_{M_2M_2}}{\sqrt{S^2\left(\dfrac{1}{n_{M_1M_1}} + \dfrac{1}{n_{M_2M_2}}\right)}} \qquad (6.1)$$

with

$$S^2 = \frac{(n_{M_1M_1} - 1)\sigma^2_{M_1M_1} + (n_{M_2M_2} - 1)\sigma^2_{M_2M_2}}{n_{M_1M_1} + n_{M_2M_2} - 2}$$

To test marker dominance effect, the test statistic is

$$t_2 = \frac{\tilde{\mu}_{M_1M_2} - (\tilde{\mu}_{M_1M_1} + \tilde{\mu}_{M_2M_2})/2}{\sqrt{S^2\left(\dfrac{1}{n_{M_1M_2}} + \dfrac{1}{4n_{M_1M_1}} + \dfrac{1}{4n_{M_2M_2}}\right)}} \qquad (6.2)$$

with the variance pooled over all three marker classes,

$$S^2 = \frac{(n_{M_1M_1} - 1)\sigma^2_{M_1M_1} + (n_{M_1M_2} - 1)\sigma^2_{M_1M_2} + (n_{M_2M_2} - 1)\sigma^2_{M_2M_2}}{n_{M_1M_1} + n_{M_1M_2} + n_{M_2M_2} - 3}$$

6.1.3 ANOVA

The test of marker–trait association can also be carried out by analysis of variance (ANOVA) between marker classes for each marker. If there is no linkage between marker locus M and QTL, no marker–trait association will be found. As a result, the means of marker genotypes are equal for F2, i.e.

$$\mu_{M_1M_1} = \mu_{M_1M_2} = \mu_{M_2M_2}$$

If the individual groups of marker genotypes are considered as independent samples, phenotypic differences among marker genotypes can be tested by one-way ANOVA with unequal sample sizes. The ANOVA model is

$$y_{ij} = \mu + t_i + e_{ij}$$

where $y_{ij}$ is the phenotypic value for the jth individual within the ith marker genotype and $\mu$ is the phenotypic mean of the mapping population. We therefore have

$$t_i = \mu_i - \mu$$

$$e_{ij} = y_{ij} - \mu_i$$

with $i = 1, \ldots, k$ (F2: k = 3; BC (DH): k = 2) and $j = 1, \ldots, n_i$. We have an ANOVA table

as shown in Table 6.4. A significant treatment effect implies linkage to a segregating QTL.

6.1.4 Regression approach

Instead of a t-test or ANOVA, the trait value can be regressed on marker genotype. Using BC populations as an example, for the jth individual,

$$Y_j = b_0 + b_{YX}X_j + e_j$$

where the indicator variable $X_j$ takes the values 1 or 0 according to whether the individual has marker genotype $M_1M_1$ or $M_1M_2$. Writing $\delta = \mu_{Q_1Q_1} - \mu_{Q_1Q_2}$, the means, variances, covariance and correlation of the variables Y and X are:

$$\mu_X = \frac{1}{2}, \qquad \mu_Y = \frac{1}{2}(\mu_{Q_1Q_1} + \mu_{Q_1Q_2})$$

$$\sigma_X^2 = \frac{1}{4}, \qquad \sigma_Y^2 = \sigma^2 + \frac{1}{4}\delta^2$$

$$\sigma_{XY} = \frac{1}{4}(1-2r)\delta$$

$$\rho_{XY} = \frac{1-2r}{\sqrt{1 + 4\sigma^2/\delta^2}}$$

So, the regression coefficient for Y on X is

$$b_{YX} = \frac{\sigma_{XY}}{\sigma_X^2} = (1-2r)\delta$$

This method uses one marker at a time to test whether this marker is significantly associated with the quantitative trait under investigation, using the statistic

$$t = \frac{b_{YX} - 0}{s_{b_{YX}}}$$

From this statistic, we know that the environmental error will affect $s_{b_{YX}}$ but not $b_{YX}$, so that reducing the environmental error by controlling environmental factors will improve the power of QTL detection. The major drawback of such linear model-based (e.g. ANOVA, regression) single marker approaches is that they do not indicate which side of the marker the QTL is located nor how far it is from the marker.

6.1.5 Likelihood approach

For a normal variable, $Y \sim N(\mu, \sigma^2)$, the likelihood for the parameters (see Chapter 2

Table 6.4. One-way ANOVA for quantitative trait values among marker genotypes. df, degree of freedom; SS, sum of squares; MS, mean square; EMS, expected mean square.

Source              df        SS        MS        EMS
Between genotype    $df_t$    $SS_t$    $MS_t$    $\sigma_e^2 + n_0\sigma_t^2$
Error               $df_e$    $SS_e$    $MS_e$    $\sigma_e^2$
Total               $df_T$

$$df_T = \sum_i n_i - 1, \qquad df_t = k - 1, \qquad df_e = df_T - df_t$$

$$SS_T = \sum_i \sum_j y_{ij}^2 - \frac{\left(\sum_i \sum_j y_{ij}\right)^2}{\sum_i n_i}$$

$$SS_t = \sum_i \frac{\left(\sum_j y_{ij}\right)^2}{n_i} - \frac{\left(\sum_i \sum_j y_{ij}\right)^2}{\sum_i n_i}$$

$$SS_e = SS_T - SS_t$$

$$n_0 = \frac{\left(\sum_i n_i\right)^2 - \sum_i n_i^2}{\left(\sum_i n_i\right)(k - 1)}$$
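
The quantities of Table 6.4 can be computed directly from the trait values grouped by marker genotype. The following is a minimal sketch under stated assumptions: the function name `one_way_anova`, the dictionary keys and the example data are hypothetical, and the F ratio still has to be compared with the F distribution on (df_t, df_e) degrees of freedom, which is not done here.

```python
def one_way_anova(groups):
    """One-way ANOVA with unequal class sizes.

    groups: list of lists of trait values, one list per marker genotype.
    Returns the degrees of freedom, sums of squares, mean squares,
    the F ratio and the coefficient n0 of the expected mean square.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_sum = sum(sum(g) for g in groups)
    correction = grand_sum ** 2 / n_total          # (sum of all y)^2 / sum of n_i

    ss_total = sum(y * y for g in groups for y in g) - correction
    ss_t = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_e = ss_total - ss_t

    df_t = k - 1
    df_e = (n_total - 1) - df_t
    ms_t, ms_e = ss_t / df_t, ss_e / df_e
    n0 = (n_total ** 2 - sum(n * n for n in sizes)) / (n_total * (k - 1))
    return {"df_t": df_t, "df_e": df_e, "SS_t": ss_t, "SS_e": ss_e,
            "MS_t": ms_t, "MS_e": ms_e, "F": ms_t / ms_e, "n0": n0}


# Hypothetical F2 trait values for marker classes M1M1, M1M2 and M2M2.
f2_classes = [[10.4, 11.0, 10.1], [9.6, 9.9, 10.2, 9.4, 9.8], [8.9, 9.1, 8.5]]
print(one_way_anova(f2_classes))
```

When all classes have the same size n, n0 reduces to n.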

for a basic description of the likelihood method) is

$$L(\mu, \sigma^2) = \frac{e^{-(Y-\mu)^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}$$

If $Y_{1i}$ and $Y_{2i}$ are the trait values for the ith individuals in BC marker classes $M_1M_1$ and $M_1M_2$, then the likelihood from all $n_{M_1M_1}$ and $n_{M_1M_2}$ backcross individuals is

$$L = \prod_{i=1}^{n_{M_1M_1}} \left[\frac{1-r}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_{1i} - \mu_{Q_1Q_1})^2}{2\sigma^2}\right) + \frac{r}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_{1i} - \mu_{Q_1Q_2})^2}{2\sigma^2}\right)\right] \times \prod_{i=1}^{n_{M_1M_2}} \left[\frac{r}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_{2i} - \mu_{Q_1Q_1})^2}{2\sigma^2}\right) + \frac{1-r}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_{2i} - \mu_{Q_1Q_2})^2}{2\sigma^2}\right)\right]$$

The hypothesis of no linkage can be tested by the likelihood ratio statistic

$$\lambda = \frac{L(\hat{\mu}_{Q_1Q_1}, \hat{\mu}_{Q_1Q_2}, \hat{\sigma}^2, r = 0.5)}{L(\hat{\mu}_{Q_1Q_1}, \hat{\mu}_{Q_1Q_2}, \hat{\sigma}^2, \hat{r})}$$

The estimates of $\mu_{Q_1Q_1}$, $\mu_{Q_1Q_2}$ and $\sigma^2$ will differ according to whether r is estimated or set to 0.5.

6.2 Interval Mapping

An indication that the recombination fraction r is less than 0.5 from single-marker analyses is confounded by the size of the effects of locus Q, since it is actually the product $(1-2r)\delta$ that is being tested for departure from zero. A marker close to a QTL of small effect will give the same signal as a marker some distance from a QTL of large effect. Lander and Botstein (1989) developed a QTL mapping method known as interval mapping, by using two flanking markers. van Ooijen (1992) described this method in more detail to make it more understandable, while Xu et al. (1995) extended the statistical issues to DH populations.

6.2.1 Assumptions

Suppose the two parental lines used to produce a mapping population (F2, BC or DH) have genotypes $M_1M_1Q_1Q_1N_1N_1$ and $M_2M_2Q_2Q_2N_2N_2$, respectively, at two marker loci M (with alleles $M_1$ and $M_2$) and N (with alleles $N_1$ and $N_2$) and a QTL, each with two alleles. The QTL is located between the two marker loci. Its genetic distances to M and N are $\theta_{MQ}$ and $\theta_{QN}$, respectively. The genetic distance between markers M and N is $\theta$. $\theta$ can be converted into recombinant frequency, r, by Kosambi's mapping function, with $\theta$ expressed in Morgans (1 Morgan = 100 cM):

$$r = \frac{1}{2}\tanh(2\theta) = \frac{1}{2}\,\frac{e^{2\theta} - e^{-2\theta}}{e^{2\theta} + e^{-2\theta}}$$

Theoretically we have $0 < r_{MQ} < r$. When there is no crossover interference, $r = r_{MQ} + r_{QN} - 2r_{MQ}r_{QN}$.

Hypotheses include

H0: $r_{MQ} = r_{QN} = 0.5$ (QTL unlinked to markers)
H1: $\min(r_{MQ}, r_{QN}) < 0.5$ (QTL linked to markers)

or

H0: $\min(r_{MQ}, r_{QN}) > r_{MN}$ (QTL exterior to interval)
H1: $\min(r_{MQ}, r_{QN}) < r_{MN}$ (QTL interior to interval)

Under the assumption of no interference, when Q is interior to MN, the event of no recombination between M and N is equivalent to no recombination in both intervals MQ and QN or recombination in both intervals:

$$(1 - r_{MN}) = (1 - r_{MQ})(1 - r_{QN}) + r_{MQ}r_{QN}$$

$$r_{MN} = r_{MQ} + r_{QN} - 2r_{MQ}r_{QN}$$

$$(1 - 2r_{MN}) = (1 - 2r_{MQ})(1 - 2r_{QN})$$

Because $r_{MN}$ is known, there is only one independent unknown recombination fraction.

Assume the phenotypic values of n measured individuals or families/lines in the mapping population are $y = \{y_1, y_2, \ldots, y_n\}$ and the genetic effects of the three QTL genotypes, $Q_1Q_1$, $Q_1Q_2$ and $Q_2Q_2$, follow the normal distributions $N(\mu_{Q_1Q_1}, \sigma^2)$, $N(\mu_{Q_1Q_2}, \sigma^2)$ and $N(\mu_{Q_2Q_2}, \sigma^2)$. Therefore, the effect of QTL on the quantitative trait can be described by the mixture of these three normal distributions, with proportions $P_{m1}$, $P_{m2}$ and $P_{m3}$, respectively, for the marker locus M. $P_{m1}$ is zero for the BC population derived from backcrossing F1 to P2, $P_{m2}$ is zero for DH populations and $P_{m3}$ is zero for the BC population derived from backcrossing F1 to P1. The probability density function for the phenotypic value of an individual or line is

$$f(y_i \mid m_i; r_M) = \sum_{q=1}^{3} P_{mq} f_q(y_i)$$

where m is the marker genotype; $r_M$ is the recombinant frequency between the marker M and QTL and $P_{mq}$ is the probability of QTL genotype q ($q \in \{1, 2, 3\}$ for an F2 population), which depends on the marker genotype m and $r_M$; and

$$f_q(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \mu_q)^2}{2\sigma^2}\right)$$

which is the probability density function for a normal distribution with mean $\mu_q$ and variance $\sigma^2$, with $\mu_1 = \mu_{Q_1Q_1}$, $\mu_2 = \mu_{Q_1Q_2}$ and $\mu_3 = \mu_{Q_2Q_2}$ for an F2 population.

6.2.2 Likelihood approach

For two specific markers M and N, the F2 and BC (DH) populations have nine and four marker genotypes, respectively, and any individual in the population must have one of these genotypes. For the individuals or lines with a specific marker genotype, the sum of the probabilities for three QTL genotypes, $Q_1Q_1$, $Q_1Q_2$ and $Q_2Q_2$, is $P_{m1} + P_{m2} + P_{m3} = 1$. From the genotypic frequencies provided in Tables 6.5 and 6.6, probabilities for three QTL genotypes can be obtained. The combined probability density function or likelihood function for all individuals/lines in a population can be expressed as:

$$L = L(\mu_q, m_j, \sigma^2, y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i \mid m_i; r_M) = \prod_{i=1}^{n} \sum_{q=1}^{3} P_{mq} f_q(y_i) = \prod_{i=1}^{n} \sum_{q=1}^{3} P_{mq} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mu_q)^2}{2\sigma^2}\right)$$

Table 6.5. The expected genotypic frequencies in the F2 population (each frequency is to be divided by 4).

Genotype    Q1Q1                     Q1Q2                                   Q2Q2
M1M1N1N1    rM²rN²                   2rM(1 − rM)rN(1 − rN)                  (1 − rM)²(1 − rN)²
M1M1N1N2    2rM²rN(1 − rN)           2rM(1 − rM)[rN² + (1 − rN)²]           2(1 − rM)²rN(1 − rN)
M1M1N2N2    rM²(1 − rN)²             2rM(1 − rM)rN(1 − rN)                  (1 − rM)²rN²
M1M2N1N1    2rM(1 − rM)rN²           2[rM² + (1 − rM)²]rN(1 − rN)           2rM(1 − rM)(1 − rN)²
M1M2N1N2    4rM(1 − rM)rN(1 − rN)    2[rM² + (1 − rM)²][rN² + (1 − rN)²]    4rM(1 − rM)rN(1 − rN)
M1M2N2N2    2rM(1 − rM)(1 − rN)²     2[rM² + (1 − rM)²]rN(1 − rN)           2rM(1 − rM)rN²
M2M2N1N1    (1 − rM)²rN²             2rM(1 − rM)rN(1 − rN)                  rM²(1 − rN)²
M2M2N1N2    2(1 − rM)²rN(1 − rN)     2rM(1 − rM)[rN² + (1 − rN)²]           2rM²rN(1 − rN)
M2M2N2N2    (1 − rM)²(1 − rN)²       2rM(1 − rM)rN(1 − rN)                  rM²rN²

Table 6.6. The expected genotypic frequencies in the DH (BC) population (each frequency is to be divided by 2).

Genotype                Q1Q1 (Q1Q2)         Q2Q2
M1M1N1N1 (M1M2N1N1)     rMrN                (1 − rM)(1 − rN)
M1M1N2N2 (M1M2N2N2)     rM(1 − rN)          (1 − rM)rN
M2M2N1N1 (M2M2N1N2)     (1 − rM)rN          rM(1 − rN)
M2M2N2N2 (M2M2N2N2)     (1 − rM)(1 − rN)    rMrN
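As an illustration of how the tabulated frequencies enter the likelihood, the following minimal Python sketch normalizes a row of Table 6.6 (entries taken as printed, for a DH population) into the conditional probabilities $P_{mq}$ and evaluates the logarithm of the mixture likelihood. The function names, recombination fractions, genotype means and phenotypes are hypothetical and chosen only for illustration; this is a sketch under those assumptions, not the implementation used by any mapping program.

```python
import math

def dh_joint_frequencies(r_m, r_n):
    """Rows of Table 6.6 as printed (before dividing by 2).

    Each marker class maps to (first QTL column, second QTL column).
    """
    return {
        "M1M1N1N1": (r_m * r_n, (1 - r_m) * (1 - r_n)),
        "M1M1N2N2": (r_m * (1 - r_n), (1 - r_m) * r_n),
        "M2M2N1N1": ((1 - r_m) * r_n, r_m * (1 - r_n)),
        "M2M2N2N2": ((1 - r_m) * (1 - r_n), r_m * r_n),
    }

def conditional_probs(joint_row):
    """Turn one row of joint frequencies into P_mq (the row sums to one)."""
    total = sum(joint_row)
    return tuple(f / total for f in joint_row)

def normal_pdf(y, mu, sigma2):
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def log_likelihood(ys, marker_classes, r_m, r_n, mus, sigma2):
    """ln L = sum_i ln sum_q P_mq f_q(y_i) for a DH population."""
    table = dh_joint_frequencies(r_m, r_n)
    total = 0.0
    for y, m in zip(ys, marker_classes):
        probs = conditional_probs(table[m])
        total += math.log(sum(p * normal_pdf(y, mu, sigma2)
                              for p, mu in zip(probs, mus)))
    return total

# Hypothetical example: two lines, recombination fractions 0.1 and 0.2.
probs = conditional_probs(dh_joint_frequencies(0.1, 0.2)["M1M1N1N1"])
ll = log_likelihood([10.0, 12.0], ["M1M1N1N1", "M2M2N2N2"],
                    0.1, 0.2, mus=(10.0, 12.0), sigma2=1.0)
print(probs, ll)
```

Scanning an interval then amounts to repeating this calculation for each pair (rM, rN) consistent with the known recombination fraction between the two markers.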
204 Chapter 6

where $m_i$ is the marker genotype for the ith individual/line, with a total of n individuals/lines in the population.

Maximum likelihood estimates (MLEs) of the parameters $\mu_j$ and $\sigma^2$ are the values that maximize the above likelihood function. In order to maximize the function, we take the logarithm of the likelihood,

$$
\ln L = \sum_{i=1}^{n} \ln f(y_i \mid m_i; r_M) = n \ln \frac{1}{\sqrt{2\pi\sigma^2}} + \sum_{i=1}^{n} \ln \left[ \sum_{q=1}^{3} P_{mq} \exp\left( -\frac{(y_i - \mu_q)^2}{2\sigma^2} \right) \right] \qquad (6.3)
$$

If we define

$$
W_q(y_i \mid m_i; r_M) = \frac{P_{mq} f_q(y_i)}{f(y_i \mid m_i; r_M)}
$$

then $W_q$ gives the probability that the QTL genotype is q when an individual or line has phenotype y and marker genotype m.

Setting the derivative of Eqn 6.3 to zero and solving the equation gives

$$
\hat{\mu}_q = \frac{\sum_{i=1}^{n} \left[ W_q(y_i \mid m_i; r_M)\, y_i \right]}{\sum_{i=1}^{n} W_q(y_i \mid m_i; r_M)} \qquad (6.4a)
$$

$$
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{q=1}^{3} \left[ W_q(y_i \mid m_i; r_M)\, (y_i - \hat{\mu}_q)^2 \right] \qquad (6.4b)
$$

When the QTL is located between the two marker loci, Eqn 6.4 has no explicit solution. However, it can be solved using the EM iteration method (Dempster et al., 1977). The E (expectation) step in the EM method obtains the expectations for the unknown missing data by using the known data (y and m) and initial approximations of the parameters, for example $\lambda = (\mu_1, \mu_2, \mu_3, \sigma^2)$ for an F2 population (using the average phenotypes of the quantitative trait, x1, x2 and x3, for the individual/line groups of marker genotypes and the sample variance of the population, $s^2$, as initial values). The M (maximization) step maximizes the likelihood function (Eqn 6.3) to obtain a new cycle of $\lambda$ values, by using the initial values of $\lambda$ and the expectation obtained for the missing data. The E and M steps are carried out alternately, with the new $\lambda$ replacing the old $\lambda$, until the likelihood function (Eqn 6.3) no longer increases (the difference between two successive iterations is less than a predetermined critical value).

Under the null hypothesis H0: $\mu_i = \mu_L$ ($i \neq L$) (there is no linked QTL), the likelihood function becomes

$$
L_0 = L(\mu_P, \sigma_P^2, y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i) \qquad (6.5)
$$

where

$\mu_P = \frac{1}{n} \sum_{i=1}^{n} y_i$ is the mean of the mapping population;

$\sigma_P^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \mu_P)^2$ is the variance of the mapping population; and

$f(y_i) = \frac{1}{\sqrt{2\pi\sigma_P^2}} \exp\left( -\frac{(y_i - \mu_P)^2}{2\sigma_P^2} \right)$ is the normal density function with mean $\mu_P$ and variance $\sigma_P^2$.

The statistic for the likelihood ratio test of the alternative hypothesis (at least one QTL exists at this location) can be converted into a logarithm of odds (LOD) score,

$$
\mathrm{LOD} = \log_{10} \frac{L(\mu_j, \sigma^2, y_1, y_2, \ldots, y_n)}{L_0(\mu_P, \sigma_P^2, y_1, y_2, \ldots, y_n)}
$$

For an interval bracketed by the two markers M and N, a LOD score is calculated for every scanning position. The LOD scores obtained for all marker intervals located on the same chromosome form a likelihood profile that shows the possible position(s) of QTL associated with the quantitative trait. This method uses two flanking markers at a time to test whether there is any QTL located in the interval bracketed by the two markers. For a specific interval, the test is carried out at successive points by moving in steps from one marker to the other. After completion of the test for the interval, the test moves to the next two flanking markers. The LOD score
does not provide a test for the presence of a QTL between the two markers and so is not a formal test of a QTL within the interval. Instead the LOD compares the likelihood of the QTL being at the position characterized by recombination fractions rMQ, rQN against the likelihood that it is at some position unlinked to the interval.

The amount of support for a QTL at a particular map position is often displayed graphically through use of likelihood (or profile) maps (Fig. 6.2), which plot the likelihood-ratio statistic (or a closely related quantity) as a function of the map position of the putative QTL. Lander and Botstein (1989) plotted the LOD scores defined by Morton (1955).

Empirically, a QTL is claimed when the LOD is larger than a critical value that is either predetermined (for example, 2 or 3) or generated by permutation. The location of the QTL should be the chromosome region that corresponds to the highest point of the likelihood map if LOD scores in several flanking marker intervals are larger than the critical values. The two-LOD support interval, determined by the range over which the LOD stays within two LOD units of its highest value, provides an empirical confidence interval for the range of QTL location (Fig. 6.2). Simulation studies show that a two-LOD interval is close to the 95% interval.

6.3 Composite Interval Mapping

Most of the single-QTL methods can be extended to multiple QTL by conditioning on additional marker loci and using conditional probabilities for multi-locus genotypes. This approach has been used to develop explicit models for two or three linked QTL (e.g. Knapp, 1991; Haley and Knott, 1992; Martinez and Curnow, 1992; Jansen, 1996; Satagopan et al., 1996). Kearsey and Hyne (1994), Hyne and Kearsey (1995) and Wu and Li (1994, 1996) also proposed a very simple

[Figure 6.2: plot of LOD score (vertical axis, 0 to 8) against position on the linkage map (horizontal axis, 0 to 60 cM), marking the estimated map location at the LOD peak, the critical value (3.5), the 2-LOD decrease and the resulting 2-LOD support interval.]

Fig. 6.2. Hypothetical likelihood map for the marker–QTL association on a linkage map in interval marker analysis. A QTL is indicated if any part of the likelihood map exceeds a critical value. In such cases, the estimated QTL location is the position in centimorgans giving the highest likelihood. Approximate confidence intervals for QTL position (two-LOD support intervals) are often constructed by including the set of all centimorgan values giving likelihoods within two LOD units of the maximum value.
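The interval mapping calculations of Section 6.2.2 can be sketched in a few lines of Python: the EM iteration of Eqns 6.4a and 6.4b at one scanning position, the LOD score against the null likelihood of Eqn 6.5, and the two-LOD support interval read from a LOD profile such as Fig. 6.2. This is a minimal sketch under stated assumptions, not the code of any published program: the function names are hypothetical, the starting values are a simple spread around the overall mean rather than the marker-class averages described in the text, and the support interval is taken as the contiguous run of scanned positions around the peak.

```python
import math

def normal_pdf(y, mu, sigma2):
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def mixture_loglik(ys, priors, mus, sigma2):
    """ln L of Eqn 6.3; priors[i][q] is P_mq for individual i."""
    return sum(
        math.log(sum(p * normal_pdf(y, mu, sigma2) for p, mu in zip(pr, mus)))
        for y, pr in zip(ys, priors)
    )

def em_at_position(ys, priors, n_iter=500, tol=1e-10):
    """EM estimates of the genotype means and variance at one scanning position."""
    n, k = len(ys), len(priors[0])
    mean = sum(ys) / n
    sigma2 = sum((y - mean) ** 2 for y in ys) / n
    # Assumed starting values: means spread around the overall mean.
    mus = [mean + (q - (k - 1) / 2) * math.sqrt(sigma2) for q in range(k)]
    loglik = mixture_loglik(ys, priors, mus, sigma2)
    for _ in range(n_iter):
        # E-step: posterior weights W_q for every individual.
        w = []
        for y, pr in zip(ys, priors):
            dens = [p * normal_pdf(y, mu, sigma2) for p, mu in zip(pr, mus)]
            total = sum(dens)
            w.append([d / total for d in dens])
        # M-step: Eqns 6.4a and 6.4b.
        mus = [sum(w[i][q] * ys[i] for i in range(n)) /
               sum(w[i][q] for i in range(n)) for q in range(k)]
        sigma2 = sum(w[i][q] * (ys[i] - mus[q]) ** 2
                     for i in range(n) for q in range(k)) / n
        new_loglik = mixture_loglik(ys, priors, mus, sigma2)
        converged = new_loglik - loglik < tol
        loglik = new_loglik
        if converged:
            break
    return mus, sigma2, loglik

def lod_score(ys, loglik_qtl):
    """LOD against the null likelihood of Eqn 6.5 (single normal distribution)."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    loglik_null = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return (loglik_qtl - loglik_null) / math.log(10)

def support_interval(positions, lods, drop=2.0):
    """Peak position and the contiguous region with LOD >= max LOD - drop."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    lo = hi = peak
    while lo > 0 and lods[lo - 1] >= lods[peak] - drop:
        lo -= 1
    while hi < len(lods) - 1 and lods[hi + 1] >= lods[peak] - drop:
        hi += 1
    return positions[peak], positions[lo], positions[hi]
```

In a full scan, `em_at_position` would be called once per scanning position with the corresponding conditional probabilities, and the resulting LOD values passed to `support_interval`.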
regression-based method that simultaneously considers all the markers on a single chromosome for locating multiple linked QTL. Wright and Mowers (1994) and Whittaker et al. (1997) showed how positional information for linked QTL can be extracted from the regression coefficients of a standard multiple regression incorporating several linked markers (see Lynch and Walsh (1998) for a detailed discussion of these topics). Here the composite interval mapping (CIM) method developed by Zeng (1993, 1994) and Jansen and Stam (1994) is described following Zeng (1998).

6.3.1 Basis

The interval mapping methods described above and other methods for single QTL are not suitable when there are multiple QTL closely linked on one chromosome. When two QTL are closely linked in coupling phase, i.e. two increasing (or decreasing) alleles are linked together, the marker between the two QTL will show the highest t- or F-value for single marker analysis because of the additivity of two linked QTL. As a result, a false QTL will be declared to be located between the two real QTL, which is called a ghost QTL. The same is true for interval mapping. Interval mapping gives results that can be confounded by the presence of additional QTL outside the interval being considered.

Ideally, when we test an interval for a QTL, we would like to have our test statistic independent of the effects of possible QTL at other regions of the chromosome to avoid ghost QTL. If such a test can be formulated, we can simplify mapping for multiple QTL from a multi-dimensional search problem to a one-dimensional search problem, as the test for each interval is independent and for each marker interval we can effectively consider the possibility of the presence of only a single QTL. This test can be constructed by using a combination of interval mapping with multiple regression. Largely because of the linear structure of the locations of genes on chromosomes, multiple regression analysis has a very important property in that the partial regression coefficient of a trait on a marker is expected to depend only on those QTL which are located in the interval bracketed by the two neighbouring markers and to be independent of any other QTL if there is no crossing-over interference and no epistasis (Stam, 1991; Zeng, 1993).

6.3.2 Model

CIM is an extension of interval mapping with some selected markers also fitted in the model as cofactors to control the genetic variation of other possibly linked or unlinked QTL. Using the appropriate unlinked markers can partly account for the segregation variance generated by unlinked QTL, while the effects of linked QTL can be reduced by including markers linked to the interval of interest. To test for a QTL on an interval between adjacent markers $M_i$ and $M_{i+1}$ in particular, we extend the model

$$
y_j = \mu + b^* x_j^* + e_j
$$

to

$$
y_j = \mu + b^* x_j^* + \sum_k b_k x_{jk} + e_j
$$

where $y_j$ is the trait value of the jth individual in a population, $b^*$ is the effect of the putative QTL, $x_j^*$ refers to the putative QTL and $x_{jk}$ refers to those markers selected for genetic background control.

6.3.3 Likelihood analysis

The likelihood function is specified as

$$
L(b^*, B, \sigma^2) = \prod_{j=1}^{n} \left[ p_{1j}\, \phi\left( \frac{y_j - X_j B - b^*}{\sigma} \right) + p_{0j}\, \phi\left( \frac{y_j - X_j B}{\sigma} \right) \right]
$$
where $X_j B = \mu + \sum_k b_k x_{jk}$. The MLEs of the various parameters can be found in a similar manner as for interval mapping. For $b^*$, see Eqn a below. Setting this derivative to zero gives

$$
\sum_{j=1}^{n} P_j (y_j - X_j B - b^*) = 0
$$

where $P_j$ is calculated by Eqn b (see below). This leads to the solution given by Zeng (1994) as

$$
\hat{b}^* = \sum_{j=1}^{n} (y_j - X_j \hat{B}) \hat{P}_j \Big/ \sum_{j=1}^{n} \hat{P}_j = (Y - X\hat{B})' \hat{P} / \hat{c}
$$

where $c = \sum_{j=1}^{n} P_j$, $Y = \{y_j\}_{n \times 1}$, $P = \{P_j\}_{n \times 1}$, and a prime denotes transposition.

Differentiating the log-likelihood with respect to B

$$
\frac{\partial \ln L}{\partial B} = \sum_{j=1}^{n} \left[ P_j X_j' (y_j - X_j B - b^*) + (1 - P_j) X_j' (y_j - X_j B) \right] / \sigma^2
$$

Expressed in matrix notation, the equation $\partial \ln L / \partial B = 0$ becomes

$$
X'(Y - X\hat{B}) = X'\hat{P}\hat{b}^*
$$

$$
\hat{B} = (X'X)^{-1} X'(Y - \hat{P}\hat{b}^*)
$$

Differentiating the log-likelihood with respect to $\sigma^2$:

$$
\frac{\partial \ln L}{\partial \sigma^2} = \sum_{j=1}^{n} \left[ P_j (y_j - X_j B - b^*)^2 + (1 - P_j)(y_j - X_j B)^2 \right] / (2\sigma^4) - n / (2\sigma^2)
$$

Setting this derivative to zero leads to the solution

$$
n\hat{\sigma}^2 = (Y - X\hat{B})'(Y - X\hat{B}) - \hat{b}^{*2} \hat{c}
$$

6.3.4 Hypothesis test

The hypotheses to be tested are H0: $b^* = 0$ and H1: $b^* \neq 0$. The likelihood function under the null hypothesis is

$$
L(b^* = 0, B, \sigma^2) = \prod_{j=1}^{n} \phi\left( \frac{y_j - X_j B}{\sigma} \right)
$$

with the MLEs

$$
\hat{\hat{B}} = (X'X)^{-1} X'Y
$$

$$
\hat{\hat{\sigma}}^2 = (Y - X\hat{\hat{B}})'(Y - X\hat{\hat{B}}) / n
$$

The likelihood ratio (LR) test statistic is

$$
\mathrm{LR} = -2 \ln \frac{L(b^* = 0, \hat{\hat{B}}, \hat{\hat{\sigma}}^2)}{L(\hat{b}^*, \hat{B}, \hat{\sigma}^2)}
\quad \text{or} \quad
\mathrm{LOD} = -\log_{10} \frac{L(b^* = 0, \hat{\hat{B}}, \hat{\hat{\sigma}}^2)}{L(\hat{b}^*, \hat{B}, \hat{\sigma}^2)}
$$

Like Lander and Botstein's interval mapping, this test can be performed at any position in a genome. Thus it gives a systematic strategy to search for QTL in a genome. As the test statistic is almost independent for each interval, a test on each interval is more likely to test for a single QTL only.

6.3.5 Selection of markers as cofactors

Which markers should be added as cofactors has no single solution as the question

$$
\frac{\partial \ln L}{\partial b^*} = \sum_{j=1}^{n} \frac{p_{1j}\, \phi([y_j - X_j B - b^*] / \sigma)}{p_{1j}\, \phi([y_j - X_j B - b^*] / \sigma) + p_{0j}\, \phi([y_j - X_j B] / \sigma)} \cdot \frac{y_j - X_j B - b^*}{\sigma^2} \qquad \text{(a)}
$$

$$
P_j = \frac{p_{1j}\, \phi([y_j - X_j B - b^*] / \sigma)}{p_{1j}\, \phi([y_j - X_j B - b^*] / \sigma) + p_{0j}\, \phi([y_j - X_j B] / \sigma)} \qquad \text{(b)}
$$
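To make the likelihood analysis of Section 6.3.3 concrete, the sketch below iterates Eqn b together with the solutions for $b^*$, B and $\sigma^2$ at a single test position. It is an illustrative sketch only, not the algorithm of QTL CARTOGRAPHER or any other program: the function names are hypothetical, two QTL genotypes are assumed so that $p_{0j} = 1 - p_{1j}$, the design matrix X is assumed to hold an intercept column followed by the cofactor markers, and $\sigma^2$ is updated from the zero-derivative condition directly, which coincides with the closed form given above once the iteration has converged.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def cim_ecm(y, X, p1, n_iter=200):
    """Estimate (b*, B, sigma^2) at one test position.

    y  : trait values
    X  : rows [1, x_j1, x_j2, ...] (intercept plus cofactor markers; assumed layout)
    p1 : prior probability, given the flanking markers, that individual j
         carries the QTL genotype with effect b*
    """
    n, k = len(y), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]

    def regress(target):
        xty = [sum(X[j][a] * target[j] for j in range(n)) for a in range(k)]
        return solve(xtx, xty)

    def residuals(B):
        return [y[j] - sum(X[j][a] * B[a] for a in range(k)) for j in range(n)]

    B = regress(y)                 # start from the null-model regression
    e = residuals(B)
    sigma2 = sum(v * v for v in e) / n
    b_star = 0.0
    for _ in range(n_iter):
        s = math.sqrt(sigma2)
        # E-step: posterior probabilities P_j (Eqn b).
        P = []
        for j in range(n):
            d1 = p1[j] * phi((e[j] - b_star) / s)
            d0 = (1.0 - p1[j]) * phi(e[j] / s)
            P.append(d1 / (d1 + d0))
        # Conditional maximization: b*, then B, then sigma^2.
        b_star = sum(e[j] * P[j] for j in range(n)) / sum(P)
        B = regress([y[j] - P[j] * b_star for j in range(n)])
        e = residuals(B)
        sigma2 = sum(P[j] * (e[j] - b_star) ** 2 + (1.0 - P[j]) * e[j] ** 2
                     for j in range(n)) / n
    return b_star, B, sigma2
```

Refitting the model with $b^*$ fixed at zero gives the null likelihood needed for the LR or LOD statistic of Section 6.3.4.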
depends on the number and positions of underlying QTL, information that is not available a priori. Suppose the interval of interest is delimited by markers i and i + 1. Additional markers i − 1 and i + 2 as cofactors account for all linked QTL to the left of marker i − 1 and to the right of marker i + 2. Thus, while these cofactors do not account for the effect of linked QTL in the intervals immediately adjacent to the one of interest, they do account for all other linked QTL.

The number of cofactors should not exceed $2\sqrt{n}$, where n is the number of individuals in the analysis (Jansen and Stam, 1994), or alternatively it can be determined automatically by an F-to-enter or F-to-drop criterion in forward or backward stepwise regression analysis. A first approach would be to include all unlinked markers showing a significant marker–trait association (detected, for example, by standard single-marker regression). If several linked markers from a single chromosome all show significant effects, one might just use the marker having the largest effect. A related strategy is to first perform a multiple regression using all markers unlinked to the region of interest and then eliminate those that are not significant.

In the computer program designed for CIM, QTL CARTOGRAPHER, a two-step procedure for practical data analysis was implemented. In the first step, $n_p$ markers that are significantly associated with the trait are selected by (forward or backward) stepwise regression. In the second step (mapping step), for each testing interval, in addition to the markers for the putative QTL, two markers that are at least $W_s$ cM away from the test interval (one in each direction) are first picked to fit in the model, defining a testing window that blocks other possible linked QTL effects on the test. Then, those of the selected $n_p$ markers that are outside the testing window are also fitted into the model to reduce the residual variance.

The accuracy of locating QTL provided by CIM comes at the cost of reduced statistical power, because markers selected as cofactors around the test interval will pick up some of the effect of the QTL located in the test interval. Therefore, markers that are close to the interval under test are not suitable as cofactors. To solve this problem, only markers that are some distance away from the test interval can be selected. Because the size of this test window depends on the test interval, different sizes should be tested to find a suitable window size for each test interval.

6.3.6 Inclusive composite interval mapping

In Zeng's (1993, 1994) algorithm, the QTL effect at the current testing position and the regression coefficients of the marker variables used to control genetic background are estimated simultaneously in an expectation and conditional maximization (ECM) algorithm. Thus, the same marker variable may have different coefficient estimates as the testing position changes along the chromosomes. The algorithm used in CIM cannot completely ensure that the effect of the QTL at the current testing interval is not absorbed by the background marker variables, which may result in biased estimation of the QTL effect.

A modified algorithm called inclusive composite interval mapping (ICIM) was proposed by Li et al. (2007). In ICIM, marker selection is conducted only once through stepwise regression by considering all marker information simultaneously, and the phenotypic values are then adjusted by all markers retained in the regression equation except the two markers flanking the current mapping interval. The adjusted phenotypic values are finally used in interval mapping. The modified algorithm has a simpler form than that used in CIM and converges faster. ICIM retains all the advantages of CIM over interval mapping and avoids the possible increase of sampling variance and the complicated background marker selection process. Extensive simulations using two genomes and various genetic models indicated that ICIM has increased detection power, a reduced false detection rate and less biased estimates of QTL effects. ICIM has been extended to map digenic interacting
QTL (Li et al., 2008). Windows-supported software, ICIMAPPING, was developed for using the ICIM mapping approach and is available at http://www.isbreeding.net.

6.4 Multiple Interval Mapping

Genetic mapping approaches involving multiple QTL have been developed. In general, there are three different approaches: (i) maximum likelihood using EM, which includes multiple interval mapping (MIM) (Kao and Zeng, 1997) and uses sequential testing to search model space; (ii) multiple imputation (Sen and Churchill, 2001), which uses Bayesian log posterior odds with sequential testing and pairwise plots to search; and (iii) Markov chain Monte Carlo (MCMC) (Satagopan et al., 1996), which employs Markov chain sampling to search model space. In this section, we will focus on MIM based on Kao and Zeng (1997) and Kao et al. (1999).

MIM is a multiple-QTL oriented method combining QTL mapping analysis with the analysis of the genetic architecture of quantitative traits through a search algorithm to search for the number, positions, effects and interactions of significant QTL. Using markers for simultaneous multiple QTL analysis was suggested first by Lander and Botstein (1989), although the idea was pursued only with a very limited scope. Bayesian statistics via MCMC for mapping QTL is also based on multiple QTL, particularly when it is combined with a reversible-jump process, which will be discussed in Section 6.7.

MIM consists of four components:

• an evaluation procedure to analyse the likelihood of the data given a genetic model (number, position and epistasis of QTL);
• a search strategy to select the best genetic model (among those sampled) in the parameter space;
• an estimation procedure to estimate all parameters of interest in the genetic architecture of quantitative traits (number, positions, effects and epistasis of QTL; genetic variances and covariances explained by QTL effects) given the selected genetic model; and
• a prediction procedure to estimate or predict the genotypic values of individuals based on the selected genetic model and estimated genetic parameter values for MAS.

6.4.1 Multiple interval mapping model and likelihood analysis

For m putative QTL, the model of MIM is specified as

$$
y_i = \mu + \sum_{r=1}^{m} \alpha_r x^*_{ir} + \sum_{r \neq s \subset (1, \ldots, m)}^{t} \beta_{rs} (x^*_{ir} x^*_{is}) + e_i \qquad (6.6)
$$

where:

• $y_i$ is the phenotypic value of individual i;
• i indexes individuals of the sample (i = 1, 2, …, n);
• $\mu$ is the mean of the model;
• $\alpha_r$ is the marginal effect of putative QTL r;
• $x^*_{ir}$ is an indicator variable denoting the genotype of putative QTL r (defined by 1/2 or −1/2 for the two genotypes), which is unobserved but can be inferred from marker data in the sense of probability;
• $\beta_{rs}$ is the epistatic effect between putative QTL r and s;
• $r \neq s \subset (1, \ldots, m)$ denotes a subset of QTL pairs that each show a significant epistatic effect, because if all pairs of m QTL are fitted in the model, the model can be over-parameterized;
• m is the number of putative QTL chosen by either their significant marginal effects or significant epistatic effects;
• t is the number of significant pairwise epistatic effects; and
• $e_i$ is a residual effect of the model assumed to be normally distributed with mean zero and variance $\sigma^2$.

As the genotypes of an individual at many genomic locations are not observed (but marker genotypes are), the model contains missing data. So the likelihood function
of the data given the model is a mixture of normal distributions

$$
L(E, \mu, \sigma^2) = \prod_{i=1}^{n} \left[ \sum_{j=1}^{2^m} p_{ij}\, \phi(y_i \mid \mu + D_{ij} E, \sigma^2) \right] \qquad (6.7)
$$

The term in brackets is the weighted sum of a series of normal density functions, one for each of the $2^m$ possible multiple-QTL genotypes. $p_{ij}$ is the probability of each multi-locus genotype conditional on marker data; E is a vector of QTL parameters ($\alpha$'s and $\beta$'s); $D_{ij}$ is a vector of the genetic model design specifying the configuration of the $x^*$'s associated with each $\alpha$ and $\beta$ for the jth QTL genotype (see Kao and Zeng, 1997); and $\phi(y \mid \mu, \sigma^2)$ denotes a normal density function for y with mean $\mu$ and variance $\sigma^2$. Thus the probability density of each individual is a mixture of $2^m$ possible normal densities with different means $\mu + D_{ij}E$ and mixing proportions $p_{ij}$, which are calculated from marker information.

The procedure to obtain MLEs using an EM algorithm has been described by Kao and Zeng (1997). In the [t + 1]th iteration, the E-step is

$$
\pi_{ij}^{[t+1]} = \frac{p_{ij}\, \phi(y_i \mid \mu^{[t]} + D_{ij} E^{[t]}, \sigma^{2[t]})}{\sum_{j=1}^{2^m} p_{ij}\, \phi(y_i \mid \mu^{[t]} + D_{ij} E^{[t]}, \sigma^{2[t]})} \qquad (6.8)
$$

The M-step is given by Eqns 6.9–6.11:

$$
E_r^{[t+1]} = \frac{\sum_i \sum_j \pi_{ij}^{[t+1]} D_{ijr} \left[ (y_i - \mu^{[t]}) - \sum_{s=1}^{r-1} D_{ijs} E_s^{[t+1]} - \sum_{s=r+1}^{w} D_{ijs} E_s^{[t]} \right]}{\sum_i \sum_j \pi_{ij}^{[t+1]} D_{ijr}^2} \qquad (6.9)
$$

$$
\mu^{[t+1]} = \frac{1}{n} \sum_i \left[ y_i - \sum_j \sum_r \pi_{ij}^{[t+1]} D_{ijr} E_r^{[t+1]} \right] \qquad (6.10)
$$

$$
\sigma^{2[t+1]} = \frac{1}{n} \left[ \sum_i (y_i - \mu^{[t+1]})^2 - 2 \sum_i (y_i - \mu^{[t+1]}) \sum_j \sum_r \pi_{ij}^{[t+1]} D_{ijr} E_r^{[t+1]} + \sum_r \sum_s \sum_i \sum_j \pi_{ij}^{[t+1]} D_{ijr} D_{ijs} E_r^{[t+1]} E_s^{[t+1]} \right] \qquad (6.11)
$$

where $E_r$ is the rth element of E and $D_{ijr}$ is the rth element of $D_{ij}$.

These equations can be expressed in a general form in matrix notation as (Kao and Zeng, 1997)

$$
E^{[t+1]} = \mathrm{diag}(V)^{-1} \left[ D'\Pi'(Y - \mu) - \mathrm{nondiag}(V) E^{[t]} \right] \qquad (6.12)
$$

$$
\mu = \frac{1}{n} \mathbf{1}' [Y - \Pi D E]
$$

$$
\sigma^2 = \frac{1}{n} \left[ (Y - \mu)'(Y - \mu) - 2 (Y - \mu)' \Pi D E + E' V E \right]
$$

with

$$
V = \{ \mathbf{1}' \Pi (D_r \# D_s) \}_{r,s = 1, \ldots, w} \quad \text{and} \quad \Pi = \{ \pi_{ij} \}
$$

where # denotes the Hadamard product, which is the element-by-element product of corresponding elements of two matrices of the same order, and ' denotes transposition of a matrix or vector.

The two forms (Eqns 6.9 and 6.12) are actually somewhat different for computation. Equation 6.12 implies the update of E as a vector in one step and Eqn 6.9 implies the update of each element in E in turn (always using the most recently updated
values for other parameters). Equation 6.9 is more stable than Eqn 6.12 numerically: Eqn 6.12 can lead to divergence in certain cases and Eqn 6.9 can always lead to convergence, although at a slightly slower pace (Z.-B. Zeng, North Carolina State University, personal communication).

Note on the meaning and difference between $p_{ij}$ and $\pi_{ij}$: $p_{ij}$ is the probability of each multi-locus QTL genotype conditional on marker genotype, and $\pi_{ij}$ is the probability of each multi-locus QTL genotype conditional on marker genotype and also phenotypic value.

The test for each QTL effect, say $E_r$, is performed by a likelihood ratio test conditional on the other selected QTL effects:

$$
\mathrm{LOD} = \log_{10} \frac{L(E_1 \neq 0, \ldots, E_{m+t} \neq 0)}{L(E_1 \neq 0, \ldots, E_{r-1} \neq 0, E_r = 0, E_{r+1} \neq 0, \ldots, E_{m+t} \neq 0)}
$$

For given positions of m putative QTL and m + t QTL effects, the likelihood analysis can proceed as outlined above. Now the task is to search and select the best genetic model (number, positions and interaction of QTL) that fits the data well.

6.4.2 Model selection

Pre-model selection

As the evaluation of the MIM model is computationally intensive, it is important to select a good pre-model for MIM analysis. The following procedure can be used. First, select a subset of significant markers. Then, use the results from marker selection to perform CIM to scan the genome for candidate positions. Finally, evaluate and test each parameter in the pre-model under MIM and drop any non-significant estimate in a stepwise manner.

Model selection using multiple interval mapping

After the first evaluation of the pre-model, perform the following stepwise selection analysis to finalize the search for a genetic model under MIM.

1. Begin with a model that contains m QTL and t epistatic effects.
2. Scan the genome to search for the best position of an (m + 1)th QTL and then perform a likelihood ratio test for the marginal effect of this putative QTL. If the test statistic exceeds the critical value, this effect is retained in the model.
3. Search for the (t + 1)th epistatic effect among the pairwise interaction terms not yet included in the model and perform the likelihood ratio test on the effect. If the LOD exceeds the critical value, the effect is retained in the model. Repeat the process until no more significant epistatic effects are found.
4. Re-evaluate the significance of each QTL effect currently fitted in the model. If the LOD for a QTL (marginal or epistatic) effect falls below the significance threshold conditional on other fitted effects, the effect is removed from the model. However, if the marginal effect of a QTL that has a significant epistatic effect with other QTL falls below the threshold, this marginal effect is still retained. This process is performed in a stepwise manner until the test statistic for each effect is above the significance threshold.
5. Optimize estimates of QTL positions based on the currently selected model. Instead of performing a multi-dimensional search around the regions of current estimates of QTL positions (which is an option), estimates of QTL positions are updated in turn for each region. For the ith QTL in the model, the region between its two neighbouring QTL is scanned to find the position that maximizes the likelihood (conditional on the current estimates of positions of other QTL and QTL epistasis). This refinement process is repeated sequentially for each QTL position until there is no change in the estimates of QTL positions.
6. Return to step 2 and repeat the process until no more significant QTL effects can be added into the model and estimates of QTL positions are optimized.

Stopping rules

An important issue associated with model selection is a stopping rule for the model search algorithm or a criterion for comparing different models. In regression analysis with model selection, the stopping rules are usually decided by minimizing the final prediction error (FPE) criterion or information criteria (IC).

The FPE criterion is

$$
S_k = (n + k)\,\mathrm{RSS}_k / (n - k)
$$

where $\mathrm{RSS}_k$ is the residual sum of squares and k is the number of parameters fitted in the model. The IC of the general form is

$$
\mathrm{IC} = -2 \left[ \log L_k - k\,c(n)/2 \right] \qquad (6.13)
$$

where $L_k$ is the likelihood of the data (Eqn 6.7) given a genetic model with k parameters and c(n) is a weighting function of the sample size (examples given below). This is approximately equivalent to

$$
\mathrm{IC} = \log [\mathrm{RSS}_k / n] + k\,c(n)/n
$$

in regression analysis.

The IC criteria can be related to the F-to-enter statistic (for regression analysis) or the LR-to-enter statistic (for likelihood analysis) in the stepwise selection procedure. It was shown (Miller, 1990: p. 208) that Eqn 6.13 leads to the F-to-enter statistic for regression analysis at the minimum (see Eqn 6.14 below) provided that c(n)/n is small. As $\mathrm{LR} = n \log(\mathrm{RSS}_k / \mathrm{RSS}_{k+1})$ in the setting of regression analysis, Eqns 6.13 and 6.14 imply that the LR-to-enter statistic for likelihood analysis at the minimum is

$$
\mathrm{LR}_k = -2 \log \frac{L_k}{L_{k+1}} \approx n \log \left( c(n)/n + 1 \right) \approx c(n)
$$

The criterion is basically defined by the choice of the penalty c(n). Using c(n) = 2 as suggested by Akaike (1969) would mean that the final threshold in LOD is 0.43. c(n) can take a variety of forms, such as c(n) = log(n), which is the classical Bayes information criterion (BIC), and c(n) = 2, which is the Akaike information criterion (AIC) (Zou and Zeng, 2008).

In reference to QTL analysis on markers, Broman (1997) suggested using c(n) = d log n and recommended that d be between 2 and 3. For n = 100–500, the threshold in LOD would be 2–2.7 for d = 2 and 3–4 for d = 3. However, this argument is still rather arbitrary and does not relate to the genetic length of the linkage map, the number of markers and linkage groups or the distribution of markers.

6.4.3 Estimating genotypic values and variance components of QTL effects

Given estimates of the QTL parameters, the genotypic values of an individual can be estimated. This estimation is complicated by the fact that QTL genotypes are not observed directly; only marker genotypes are observed. Thus, the estimate for an individual is the weighted mean of all possible genotypic values, weighted by the probability ($\pi_{ij}$) of each QTL genotype conditional on both the marker and the phenotypic data. From Eqn 6.10, this estimation equation is

$$
\hat{y}_i = \hat{\mu} + \sum_{j=1}^{2^m} \sum_{r=1}^{m+t} \hat{\pi}_{ij} D_{ijr} \hat{E}_r
$$

$$
\frac{\mathrm{RSS}_k - \mathrm{RSS}_{k+1}}{\mathrm{RSS}_{k+1} / (n - k - 1)} \approx (n - k - 1)\left( e^{c(n)/n} - 1 \right) \approx c(n) \left( 1 - \frac{k + 1}{n} \right) \qquad (6.14)
$$
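The LOD thresholds quoted for the stopping rules follow from the approximation that the LR-to-enter statistic is about c(n), together with the conversion LOD = LR / (2 ln 10). The short Python sketch below reproduces that arithmetic; the function names are hypothetical and the natural logarithm is assumed for log n.

```python
import math

def lod_threshold(c_n):
    """LOD equivalent of an LR-to-enter threshold of c(n)."""
    return c_n / (2.0 * math.log(10.0))

def penalty_log_n(n, d=1.0):
    """c(n) = d log n; d = 1 corresponds to the BIC penalty."""
    return d * math.log(n)

cases = [
    ("AIC, c(n) = 2", 2.0),
    ("d = 2, n = 100", penalty_log_n(100, 2)),
    ("d = 2, n = 500", penalty_log_n(500, 2)),
    ("d = 3, n = 100", penalty_log_n(100, 3)),
    ("d = 3, n = 500", penalty_log_n(500, 3)),
]
for label, c_n in cases:
    print(f"{label}: LOD threshold = {lod_threshold(c_n):.2f}")
```

The printed values (0.43, 2.00, 2.70, 3.00 and 4.05) agree with the thresholds given in the text for the AIC penalty and for d = 2 and d = 3 with n between 100 and 500.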
where the first summation is over all possible $2^m$ QTL genotypes and the second summation is over all effects of the model (m main effects and t epistatic effects). $\hat{\mu}$ is the MLE of $\mu$ obtained from Eqn 6.10 at the equilibrium of the final model and $\hat{E}_r$ is the MLE of QTL effect $E_r$ obtained from Eqn 6.9. $\hat{\pi}_{ij}$ is the MLE of $\pi$ obtained from Eqn 6.8.

To predict the genotypic values of quantitative traits based on marker information only, we need to use

$$
\hat{y}_i = \hat{\mu} + \sum_j \sum_r p_{ij} D_{ijr} \hat{E}_r
$$

as $\pi_{ij}$ is a function of the phenotype $y_i$, which is unavailable in early selection.

The genetic variances and covariances explained by each QTL effect can be estimated directly from the likelihood analysis. Applying the EM algorithm, Eqn 6.12 leads to

$$
\hat{E} = \hat{V}^{-1} D' \hat{\Pi}' (Y - \hat{\mu})
$$

This implies

$$
\hat{\sigma}^2 = \frac{1}{n} \left[ (Y - \hat{\mu})'(Y - \hat{\mu}) - \hat{E}' \hat{V} \hat{E} \right]
$$

or

$$
\begin{aligned}
\hat{\sigma}^2 &= \frac{1}{n} \left[ \sum_{i=1}^{n} (y_i - \hat{\mu})^2 - \sum_{r=1}^{m+t} \sum_{s=1}^{m+t} \sum_{i=1}^{n} \sum_{j=1}^{2^m} \hat{\pi}_{ij} D_{ijr} D_{ijs} \hat{E}_r \hat{E}_s \right] \\
&= \frac{1}{n} \left[ \sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{r=1}^{m+t} \sum_{s=1}^{m+t} \sum_{i=1}^{n} \sum_{j=1}^{2^m} \hat{\pi}_{ij} (D_{ijr} - \bar{D}_r)(D_{ijs} - \bar{D}_s) \hat{E}_r \hat{E}_s \right]
\end{aligned}
\qquad (6.15)
$$

where $\bar{y} = \sum_{i=1}^{n} y_i / n$ and $\bar{D}_r = \sum_{i=1}^{n} \sum_{j=1}^{2^m} \hat{\pi}_{ij} D_{ijr} / n$.

In this form $\hat{\sigma}^2$ is expressed as the difference between the MLE of the total phenotypic variance $\hat{\sigma}_p^2$ (the first part of Eqn 6.15) and that of the genetic variance $\hat{\sigma}_g^2$ (the second part of Eqn 6.15). $\hat{\sigma}_g^2$ can be further partitioned as

$$
\begin{aligned}
\hat{\sigma}_g^2 &= \sum_{r=1}^{m+t} \left[ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{2^m} \hat{\pi}_{ij} (D_{ijr} - \bar{D}_r)^2 \hat{E}_r^2 \right] + \sum_{r=2}^{m+t} \sum_{s=1}^{r-1} \left[ \frac{2}{n} \sum_{i=1}^{n} \sum_{j=1}^{2^m} \hat{\pi}_{ij} (D_{ijr} - \bar{D}_r)(D_{ijs} - \bar{D}_s) \hat{E}_r \hat{E}_s \right] \\
&= \sum_{r=1}^{m+t} \hat{\sigma}^2_{E_r} + \sum_{r=2}^{m+t} \sum_{s=1}^{r-1} \hat{\sigma}_{E_r, E_s}
\end{aligned}
$$

$\hat{\sigma}^2_{E_r}$ estimates the genetic variance due to the QTL effect $E_r$ and $\hat{\sigma}_{E_r, E_s}$ estimates the genetic covariance between QTL effects $E_r$ and $E_s$.

It is convenient and informative to combine the variance due to each QTL effect with half of the covariances between this QTL effect and other effects, and report this quantity as the variance component explained by this QTL effect

$$
\hat{\sigma}_r^2 = \hat{\sigma}^2_{E_r} + \frac{1}{2} \sum_{s \neq r} \hat{\sigma}_{E_r, E_s}
$$

Whereas $\hat{\sigma}^2_{E_r}$ estimates the variance of the rth QTL effect in linkage equilibrium (in which $\sigma_{E_r, E_s} = 0$), $\hat{\sigma}_r^2$ estimates the contribution to the total variance in the current population with linkage disequilibrium. Estimates of these variances, covariances and variance components can be given as a ratio of the total phenotypic variance. Note that $\hat{\sigma}_g^2 / \hat{\sigma}_p^2$ is the coefficient of determination ($R^2$) of the MIM model. Note also that whereas $\hat{\sigma}^2_{E_r}$ is always positive, $\hat{\sigma}_r^2$ is not necessarily positive.
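The EM updates of Section 6.4.1 and the variance partition above can be sketched together in Python. This is a minimal illustration under stated assumptions rather than the implementation of Kao and Zeng (1997): the function names are hypothetical, the design rows D[j] are assumed to depend only on the QTL genotype j (not on the individual), the priors p[i][j] are assumed to have been computed from marker data beforehand, and Eqn 6.11 is evaluated in the algebraically equivalent form $\sum_i \sum_j \pi_{ij} (y_i - \mu - D_j E)^2 / n$.

```python
import math

def normal_pdf(y, mean, sigma2):
    return math.exp(-(y - mean) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def mim_em(y, p, D, n_iter=100):
    """EM for the MIM mixture model.

    y : phenotypes; p[i][j] : prior genotype probabilities from markers;
    D[j][r] : design value of effect r for QTL genotype j (assumed layout).
    """
    n, g, w = len(y), len(D), len(D[0])
    mu = sum(y) / n
    sigma2 = sum((v - mu) ** 2 for v in y) / n
    E = [0.0] * w
    pi = [row[:] for row in p]
    for _ in range(n_iter):
        gval = [sum(D[j][r] * E[r] for r in range(w)) for j in range(g)]
        # E-step (Eqn 6.8): posterior genotype probabilities.
        pi = []
        for i in range(n):
            dens = [p[i][j] * normal_pdf(y[i], mu + gval[j], sigma2) for j in range(g)]
            total = sum(dens)
            pi.append([d / total for d in dens])
        # M-step (Eqn 6.9): update each effect in turn, reusing fresh values.
        for r in range(w):
            num = den = 0.0
            for i in range(n):
                for j in range(g):
                    rest = sum(D[j][s] * E[s] for s in range(w) if s != r)
                    num += pi[i][j] * D[j][r] * (y[i] - mu - rest)
                    den += pi[i][j] * D[j][r] ** 2
            E[r] = num / den
        gval = [sum(D[j][r] * E[r] for r in range(w)) for j in range(g)]
        # Eqn 6.10 and Eqn 6.11 (equivalent form).
        mu = sum(y[i] - sum(pi[i][j] * gval[j] for j in range(g)) for i in range(n)) / n
        sigma2 = sum(pi[i][j] * (y[i] - mu - gval[j]) ** 2
                     for i in range(n) for j in range(g)) / n
    return E, mu, sigma2, pi

def variance_components(pi, D, E):
    """Genetic variances, covariances, total genetic variance and components."""
    n, g, w = len(pi), len(D), len(E)
    dbar = [sum(pi[i][j] * D[j][r] for i in range(n) for j in range(g)) / n
            for r in range(w)]

    def cross(r, s):
        return sum(pi[i][j] * (D[j][r] - dbar[r]) * (D[j][s] - dbar[s])
                   for i in range(n) for j in range(g)) / n

    var = [cross(r, r) * E[r] ** 2 for r in range(w)]
    cov = {(r, s): 2 * cross(r, s) * E[r] * E[s] for r in range(w) for s in range(r)}
    sigma_g2 = sum(var) + sum(cov.values())
    comp = [var[r] + 0.5 * sum(c for pair, c in cov.items() if r in pair)
            for r in range(w)]
    return var, cov, sigma_g2, comp
```

Because each covariance is split equally between the two effects involved, the components returned in `comp` sum to the total genetic variance, which is why an individual component can be negative even though each variance term is not.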
214 Chapter 6

6.5 Multiple Populations/Crosses

6.5.1 Experimental designs

There are many different types of populations available in genetics and breeding programmes (Chapter 4). However, only very few of them, as discussed previously, have been exploited for QTL mapping. These crosses/populations, derived from divergent inbred lines, populations and species, provide potential opportunities for more convenient QTL mapping and better integration of mapping with plant breeding programmes.

Among the most frequently used crosses are BCs, which have only two genotypes at a locus and are simple to analyse. Another common cross, the F2, which has three genotypes at a locus, can be used to estimate both additive and dominance effects. Compared to the BC, it is more complex for data analysis, particularly for multiple QTL with epistasis, but it provides more opportunities and information to examine the genetic structure or architecture of QTL and has more power than the BC.

Some less frequently used crosses are: (i) F3, F4, etc. derived by selfing or random mating from F2. Random mating increases recombination and expands the length of the linkage map and thus increases the mapping resolution (estimation of QTL position). (ii) Repeated BCs that are derived by continuous backcrossing to one of the parental lines, the end-product of which would be the near-isogenic lines (NILs). (iii) Multi-way crosses, derived by crossing a hybrid with another hybrid or an inbred/cultivar, which results in a population with three or four parental lines involved. (iv) Multiple crosses derived from a complicated mating design such as NCII and diallel design (Chapter 4).

With all available crosses and populations, QTL mapping can be based on multiple populations derived from the same inbred parents or a hybrid, or on multiple crosses derived from different parental lines. In the former case, there are two possible alleles segregating in the populations for the diploid species, while in the latter case, more than two possible alleles will be segregating, depending on how many parental lines are involved. On the other hand, all crosses or populations could be derived from outbred parents, which results in an uncertain coupling/repulsion phase and heterozygosity for some or all parental lines.

For QTL mapping with crosses from segregating populations, models and analysis procedures similar to those for inbred crosses can be used, but the analysis is more complicated. The probability of the allelic origin for each genomic point needs to be estimated from observed markers. This type of population has low power for QTL analysis because QTL alleles may not be preferentially fixed in the parental populations, and it makes power calculations more difficult.

For multiple crosses developed from different heterogeneous parental lines, half-sib or full-sib relationships may exist among some of the individual plants used for mapping. Half sibs can be analysed based on the segregation of one parent, which is similar to the backcrossing model and analysis. This type of population is less powerful for QTL detection because there is more uncontrollable variability in the other parents. The allelic effect differences are only analysed for one parent, not for those between widely differentiated inbred lines, populations and species. Generally the relevant heritability is low for QTL analysis. For full sibs, there are four genotypes at a locus; allelic substitution effects can be estimated for male and female parents and their interaction (dominance). Information for QTL analysis is double that for half sibs and this type of population should be more powerful.

6.5.2 QTL for multiple crosses

QTL analysis with multiple crosses can be achieved separately for each cross, which is simple but inefficient and less powerful. Multiple crosses derived from different parents have more power because more individuals are involved and there are thus more informative markers. These types of crosses can be used to study the effect of QTL under different genetic backgrounds such as genotype by cross

interaction and epistatic interaction. For all multiple crosses, a more reasonable analysis would be the combined analysis over crosses. In this way, crosses created or evaluated at different times can be combined and multiple projects in a team or across research groups can be related and shared. Disadvantages include more complications for analysis, with few software packages available, and having to account for the multiple related crosses (where individuals may be correlated with each other both genotypically and phenotypically). QTL mapping approaches have been developed for four-way crosses (Xu, 1996) and crosses derived from multiple inbred lines (Liu and Zeng, 2000).

Broman et al. (2003a) discussed how to combine multiple crosses in QTL analysis. For crosses with founders unrelated to each other, a naïve sum of separate LODs by cross can be used assuming a different gene action in different crosses, or combined analysis can be used for independent crosses. For crosses with related founders, QTL analysis depends on genetic relationships within and between crosses. With constant genetic covariance within a cross, all individuals have the same genetic relationship and combined analysis has no effect on single cross analysis. However, genetic covariance may differ between crosses, depending on the expected number of alleles shared by identity by descent (IBD). It should be noted that covariance across multiple crosses is not constant. In these cases, combined analysis will provide results that are different from single cross analysis. The problems with multiple cross analysis can be fixed simply by the introduction of blocking factors for crosses as a random effect for genetic relationships. This addresses the constant covariance within each cross and different covariances between crosses, which provides an appropriate recombination model for crosses to relate the recombination rate to distance and a common phenotype model across all crosses to allow cross by genetic effect interactions.

Ignoring polygenic effects will result in a biased additive effect estimate, detection of dominance when it does not exist and increased variance, and will thus bias QTL results, although the location estimate is unbiased. Statistical power can be increased by combining crosses, which is important when several related crosses are created. The threshold idea for testing and loci intervals has been extended to multiple crosses (Zou et al., 2001).

Jannink and Jansen (2001) developed a method to map epistatic QTL by identifying loci with strong interaction between QTL and genetic background. The approach requires large populations derived from multiple related inbred-line crosses. The method was applied to simulated DH populations derived from a diallel among three inbred parents. This approach allows detection of QTL involved not only in pairwise but also higher-order interaction and does so with one-dimensional genome searches.

The North Carolina Experimental Design III (NCIII in Chapter 4), originally designed by Comstock and Robinson (1952), is the first complex design that was exploited for QTL mapping. In NCIII, the experimental units are produced from BC matings of F2 plants to the two parental lines from which the F2 was derived. Additive and dominance components of variance can be estimated with nearly equal precision under the assumption of diploidy, biallelic loci, equal gene frequencies and absence of linkage and epistasis. Cockerham and Zeng (1996) extended Comstock and Robinson's ANOVA to include linkage and epistasis for F2 and F3 progenies and developed orthogonal contrasts for QTL mapping using single-marker ANOVA. Melchinger et al. (2007) demonstrated the exceptional features of NCIII for identification of QTL contributing to heterosis. They defined a new type of heterotic gene effect, denoted as the augmented dominance effect $d_i^*$, which is equal to the net contribution of QTL $i$ to mid-parent heterosis (MPH). It comprises the dominance effect $d$ minus half the sum of additive × dominance epistatic interactions with genetic background. The novelty of their

approach is that QTL that significantly contribute to MPH are identified and both dominance and epistasis are accounted for.

An elegant experimental design that can provide a test of significance for the presence of epistasis is the triple testcross (TTC) design (Chapter 4), proposed by Kearsey and Jinks (1968), which is an extension of NCIII. In the TTC design, testcrosses are produced not only with the two parental lines but also with the F1 derived from them. For every progeny from a segregating population, e.g. F2 plant or RIL, three sets of data can be generated: (i) the average parental testcross performance; (ii) the difference between the parental testcross performances; and (iii) the deviation of testcross progenies with the F1 from the mean of the parental testcrosses. Kearsey et al. (2003) and Frascaroli et al. (2007) presented experimental results from QTL analyses based on the TTC design with data from Arabidopsis and maize, respectively. Melchinger et al. (2008) gave genetic expectations of QTL effects estimated with the TTC design in the presence of epistasis. With the TTC design, dominance × additive epistatic interactions of individual QTL with the genetic background can be estimated with one-dimensional genome scans. They demonstrated that the limitation of NCIII in the analysis of heterosis, namely the inability to separate QTL main effects from their epistatic interactions with all other QTL, can partially be overcome with the TTC design. They also presented genetic expectations of variance components for the analysis of TTC progeny tested in a split-plot design, assuming digenic epistasis and arbitrary linkage. Kusterer et al. (2007) used the theory to study heterosis for biomass-related traits in Arabidopsis.

6.5.3 Pooled analysis

Very often, more than two mapping populations are studied for the same or related traits. QTL analysis on pooled data from multiple mapping populations was suggested by Lander and Kruglyak (1995). Pooled analysis provides a means for evaluating, as a whole, evidence for the existence of a QTL from different studies and examining differences in gene effect of a QTL among different populations.

Walling et al. (2000) extended least squares interval mapping (Haley et al., 1994) to analysis of combined data from seven porcine populations, while Li, R. et al. (2005) extended the Bayesian QTL analysis method (Sen and Churchill, 2001) to analysis of combined data from four mouse populations. The former (Walling et al., 2000) is computationally simple, and general statistical software such as SAS is applicable. The latter (Li, R. et al., 2005) adopted a new QTL analysis method, which requires special software. Some earlier studies (Rebai and Goffinet, 1993; Xu, 1998; Liu and Zeng, 2000) also developed QTL analysis methods for data which may be produced from several populations.

Guo, B. et al. (2006) provided an example of pooled analysis of data from multiple QTL mapping populations. Least squares interval mapping was extended for pooled analysis by including populations as indicator variables and cofactor markers as covariate variables in the multiple linear models. The general linear test approach was applied to the detection of QTL. Single population-based and pooled analyses were conducted on data from two F2:3 mapping populations, Hamilton (susceptible) × PI 90763 (resistant) and Magellan (susceptible) × PI 404198A (resistant), for resistance to cyst nematode in soybean. It was demonstrated that where a QTL was shared among populations, pooled analysis showed increased LOD values for the QTL candidate region over single population analyses. Where a QTL was not shared among populations, however, the pooled analysis showed decreased LOD values for the QTL candidate region over single population analyses. Pooled analysis on data from genetically similar populations may have a higher power of QTL detection relative to single population-based analyses. An important issue emerges from such pooled analyses: because of this dilution effect, a QTL with strong effects, but existing in only one or few populations, may become undetectable if a large number of populations are pooled.

6.6 Multiple QTL

6.6.1 Reality of multiple QTL

A multiple QTL model is designed to: (i) effectively search over the space of genetic architecture for the number and positions of loci and gene action (additive, dominance, epistasis); (ii) select the best or better model(s), including what criteria to use and where to draw the line; and (iii) estimate features of the model such as means, variances and covariances, confidence regions and marginal or conditional distributions (Broman et al., 2003a).

The multiple QTL approach should have several advantages relative to single QTL approaches. First, statistical power and precision can be improved so that the number of QTL detected will increase and better estimates of loci (less bias, smaller intervals) will be provided. Secondly, the inference of complex genetic architecture including patterns and individual elements of epistasis can be improved; means, variances and covariances can be estimated appropriately and the relative contributions of different QTL can be assessed. Thirdly, estimates of genotypic values can be improved with less bias (more accurate) and smaller variance (more precise).

Is there any limit of estimation for QTL? As indicated by Bernardo (2001), the reasonable number of QTL for an efficient MAS is 10. A larger number such as 50 is too big. Phenotype is a better predictor than genotype when there are a large number of QTL. Increasing sample size does not give multiple QTL any advantage. Also it is hard to select many QTL simultaneously because there are $3^m$ possible genotypes to choose from when a trait is controlled by m QTL. Genetic linkage between QTL, i.e. multi-collinearity, will lead to correlated estimates of gene effects and the precision of each effect drops as more predictors are added. There is a need to balance bias and variance. A few QTL can dramatically reduce bias while many predictors (QTL) can increase variance. Finally, estimation of QTL parameters depends on sample size, heritability and environmental variation.

What can we do with the QTL below the limits of detection? There is a problem of selection bias: QTL of modest effect can sometimes be detected but their effects are biased upwards when detected (Beavis, 1994). To avoid a sharp in/out dichotomy, caution should be taken about only examining the best model, and the probability that a QTL is in the model should be considered. Building m detected loci into the QTL model will directly allow uncertainty in genetic architecture and model selection over number of QTL.

6.6.2 Selecting a class of QTL models

There are many parameters to be considered when selecting a class of QTL models (Broman et al., 2003a): (i) number of QTL, single QTL or multiple QTL of known or unknown number; (ii) location of QTL with known positions and widely spaced (no two QTL within a marker interval) or arbitrarily close; (iii) gene action including additive and/or dominance effects, epistatic effects (four combinations for diploid species aa, ad, da, dd; more combinations for species with higher levels of ploidy) and phenotypic distribution (normal, binomial, Poisson, etc.).

Consider a phenotype normally distributed with

$$\Pr(Y \mid Q, \theta) = N(\mu + G_Q, \sigma^2)$$

Typical assumptions that are required for building a model are: (i) normally-distributed environmental variation, i.e. residuals e (not Y!) give a bell-shaped histogram; (ii) genetic value $G_Q$ is a composite of m QTL, i.e. $Q = (Q_1, Q_2, \ldots, Q_m)$; and (iii) genetic effect uncorrelated with environment. That is,

$$Y = \mu + G_Q + e, \qquad e \sim N(0, \sigma^2)$$

$$E(Y \mid Q, \theta) = \mu + G_Q, \qquad \mathrm{var}(Y \mid Q, \theta) = \sigma^2$$

$$\theta = (\mu, G_Q, \sigma^2)$$
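The model Y = μ + G_Q + e can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: an F2 population, unlinked QTL with purely additive effects coded −1, 0, 1, and hypothetical effect sizes and function names that do not come from the text. Under these assumptions each locus contributes a²/2 to the genetic variance, so the phenotypic variance should be close to the sum of those contributions plus σ².

```python
import random


def simulate_f2_phenotypes(n, effects, mu, sigma, seed=0):
    """Simulate Y = mu + G_Q + e for n F2 individuals.

    Each unlinked QTL takes the additive code -1, 0 or 1 with
    F2 frequencies 1/4, 1/2, 1/4 (an assumption of this sketch).
    """
    rng = random.Random(seed)
    phenotypes = []
    for _ in range(n):
        g = sum(a * rng.choice((-1, 0, 0, 1)) for a in effects)
        phenotypes.append(mu + g + rng.gauss(0.0, sigma))
    return phenotypes


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


effects = [1.0, 0.5, 0.5]   # hypothetical additive effects
sigma = 1.0                 # residual standard deviation
ys = simulate_f2_phenotypes(20000, effects, mu=10.0, sigma=sigma, seed=1)

genetic_var = sum(a * a / 2.0 for a in effects)   # var of each code is 1/2
expected_var = genetic_var + sigma ** 2
print(mean(ys), variance(ys), expected_var)
```

The simulated mean and variance should agree with μ and the expected variance only up to sampling error, which shrinks as the number of simulated individuals grows.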

Considering multiple QTL, the genotypic value can be partitioned (assuming no epistasis) as

$$G_Q = \theta_{Q(1)} + \theta_{Q(2)} + \cdots + \theta_{Q(m)} \quad \text{or} \quad G_Q = \sum_j \theta_{Q(j)}$$

Thus genetic variance can be partitioned as

$$\mathrm{var}(G_Q) = \sigma_G^2 = \sum_j \sigma^2_{G(j)}, \qquad \sigma^2_{G(j)} = \mathrm{var}(\theta_{Q(j)})$$

with partitioned heritability $h^2$

$$h^2 = \frac{\sigma_G^2}{\sigma_G^2 + \sigma^2} = \sum_j \frac{\sigma^2_{G(j)}}{\sigma_G^2 + \sigma^2}$$

With many optional models for selection, alternative QTL models should be compared. The comparison can be based on the residual sum of squares (RSS), information criteria such as the Bayesian information criterion (BIC) and Bayes factors (Broman et al., 2003a).

1. Comparing models can be based on the RSS, which has a nice property in that it never increases as the model grows in size. The goal is to obtain a small RSS with the simplest model.

2. Classical linear-model criteria that can be used for comparing models include mean squared error (MSE), Mallows' Cp and adjusted R2.

3. Models can be compared based on re-sampling techniques, which include bootstrap (re-sampling with replacement from data), cross validation (repeatedly dividing data into estimation and test sets) and sequential permutation tests, which are conditional on the QTL already in the model and stop when added QTL are not significant.

4. There are some information criteria for comparing models built on RSS and likelihoods, which include the Akaike information criterion (AIC), Bayes/Schwarz information criterion (BIC), BIC-delta (BICd) and Hannan-Quinn information criterion (HQIC).

6.6.3 Multiple QTL with epistasis

When a trait is controlled by multiple QTL, it is very possible that some epistasis may occur between loci. With two QTL involved, there are four types of epistasis, aa, ad, da and dd. With more than two loci involved, there would be higher-order epistasis.

Considering genetic models with epistasis, genotypic values can be partitioned as

$$G_Q = \theta_{Q(1)} + \theta_{Q(2)} + \theta_{Q(1,2)}$$

The genetic variance can be partitioned accordingly,

$$\mathrm{var}(G_Q) = \sigma_G^2 = \sigma^2_{G(1)} + \sigma^2_{G(2)} + \sigma^2_{G(1,2)}$$

For 2-QTL interactions

$$G_Q = \sum_j \theta_{1Qj} + \sum_j \theta_{2Qj}$$

where $\theta_{1Qj} = \theta_{Q(j_1)}$, $\theta_{2Qj} = \theta_{Q(j_1, j_2)}$; $j_1, j_2 = 1, \ldots, m_j$.

With an extra subscript k to keep track of the order of loci, the genetic variance is partitioned as

$$\sigma_G^2 = \sigma^2_{1G} + \sigma^2_{2G}$$

$$\sigma^2_{kG} = \sum_j \sigma^2_{kGj}, \qquad \sigma^2_{kGj} = \mathrm{var}(\theta_{kQj})$$

Considering m QTL (m > 2) with higher-order epistasis, the genotypic value sums over order k and over QTL index j,

$$G_Q = \sum_k \sum_j \theta_{kQj}$$

$$\theta_{kQj} = \theta_{Q(j_1, j_2, \ldots, j_k)}$$

Genetic variance is partitioned as

$$\sigma_G^2 = \sum_k \sigma^2_{kG}, \qquad \sigma^2_{kG} = \sum_j \sigma^2_{kGj}, \qquad \sigma^2_{kGj} = \mathrm{var}(\theta_{kQj})$$

With so many parameters, a large sample size is needed for even modestly reasonable estimates.
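The trade-off between RSS and model size in the model-comparison criteria of Section 6.6.2 can be sketched in a few lines of Python. The formulas below are the common Gaussian-likelihood forms of AIC and BIC written in terms of RSS (up to an additive constant); the sample size, parameter counts and RSS values are hypothetical numbers chosen only to illustrate the behaviour, and the function names are not taken from any QTL package.

```python
import math


def aic(rss, n, k):
    """AIC for a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """BIC (Schwarz criterion) for a Gaussian linear model, up to a constant."""
    return n * math.log(rss / n) + k * math.log(n)


n = 100  # hypothetical number of individuals

# (label, number of fitted parameters, RSS); RSS never increases as QTL are added.
candidates = [
    ("1 QTL", 2, 200.0),
    ("2 QTL", 3, 150.0),
    ("3 QTL", 4, 148.0),
]

scores = {label: (aic(rss, n, k), bic(rss, n, k)) for label, k, rss in candidates}
best_by_aic = min(scores, key=lambda label: scores[label][0])
best_by_bic = min(scores, key=lambda label: scores[label][1])
print(scores)
print(best_by_aic, best_by_bic)
```

In this made-up example the third QTL lowers the RSS only slightly, so the penalty terms outweigh the gain in fit and both criteria prefer the two-QTL model, whereas RSS alone would always favour the largest model.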

QTL mapping incorporating multiple QTL with epistasis has received much attention, with various statistical approaches developed (e.g. Doebley et al., 1995; Jannink and Jansen, 2001; Boer et al., 2002; Carlborg and Andersson, 2002; Yi and Xu, 2002; Yang, 2004; Baieri et al., 2006; Alvarez-Castro and Carlborg, 2007). Examples of QTL mapping software that can handle multiple QTL with epistatic effects are QTL CARTOGRAPHER and MULTIQTL.

6.7 Bayesian Mapping

A coherent approach to statistical modelling is provided by the Bayesian paradigm, which has been applied successfully in various contexts (Malakoff, 1999), including problems in genetics (Shoemaker et al., 1999; Huelsenbeck et al., 2001; Sorensen and Gianola, 2002; Xu, S., 2003). In Bayesian analysis everything is treated as an unknown variable with a prior distribution. A variable can be classified into one of two classes: observables and unobservables. The observables include data (phenotypic values, marker scores and pedigrees, etc.). The unobservables include parameters. Generally, this approach includes a careful consideration of the structure of the problem at hand, which then culminates in a model (likelihood) and in prior beliefs of unobservables expressed in the form of a probability distribution. Given the likelihood and prior beliefs, Bayesian machinery then delivers exactly the relevant information through the posterior probability distribution of the unobservables. The prior used can range from non-informative to very informative distributions and should reflect the knowledge available (e.g. derived from earlier studies or from existing theory). The simulation (integration) method can be used to generate an approximate sample from the posterior distribution. The sampling algorithm tailored for a specific model is called the MCMC sampler.

6.7.1 Advantages of Bayesian mapping

Bayesian methodology has become popular in QTL mapping because of the availability of simulation-based MCMC algorithms. MCMC provides an approach for achieving a number of analytic goals that are otherwise difficult to achieve (Xu, S., 2002). Bayesian mapping allows the use of prior knowledge of QTL parameters, and the posterior variances and credibility intervals for the estimated QTL parameters are automatically obtained. With MCMC approaches, it is possible to perform linkage analysis with any number of marker loci, multiple-trait loci and multiple genomic segments. At the same time, these approaches allow the use of pedigrees of arbitrary size and complexity. In addition to mapping the loci, Bayesian reversible-jump MCMC approaches allow one to estimate the number of loci and associated individual-locus model parameters as well as covariate effects in a joint linkage and oligogenic segregation analysis. This is particularly useful when multiple contributing loci are considered, but the number is unknown. It is advantageous to be able to estimate the number of contributing loci, rather than to fix this number a priori. The compromise made in order to achieve these goals lies in the overall approach, which is based on statistical sampling rather than exact enumeration of all possible underlying but unobserved genotypes.

6.7.2 Bayesian mapping statistics: a brief overview

With observed phenotypic trait values, markers and linkage map data (Y, X) and unknown quantitative genotype (Q), we can study the unknowns ($\theta$, $\lambda$, Q), where $\lambda$ is the QTL location and $\theta$ the genetic effects.

$$Q \sim \Pr(Q \mid Y_i, X_i, \theta, \lambda)$$

Genotypes Q for every individual at m QTL can be sampled and their positions, marginal effects and epistatic effects $\theta$ can be tested.
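Sampling a QTL genotype from Pr(Q | Y, X, θ, λ) can be sketched for the simplest case. The snippet below is a minimal illustration, not the method of any cited paper: it assumes a backcross with two genotype classes coded 0 and 1, one putative QTL between two flanking markers, no crossover interference, known recombination fractions between the QTL and each marker, and a normal phenotype with mean μ ± effect/2. All function and parameter names are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def marker_given_qtl(marker, qtl, r):
    """Pr(marker genotype | QTL genotype) for one flanking marker in a backcross."""
    return 1.0 - r if marker == qtl else r


def genotype_posterior(y, m_left, m_right, r_left, r_right, mu, effect, sd):
    """Pr(Q = 1 | phenotype, flanking markers, location, effects)."""
    weights = []
    for q in (0, 1):
        # Pr(Q | X, lambda): marker information, equal prior genotype frequencies
        prior = 0.5 * marker_given_qtl(m_left, q, r_left) \
                    * marker_given_qtl(m_right, q, r_right)
        # Pr(Y | Q, theta): normal density around the genotypic mean
        likelihood = normal_pdf(y, mu + effect * (q - 0.5), sd)
        weights.append(prior * likelihood)
    return weights[1] / (weights[0] + weights[1])


def sample_genotype(rng, **kwargs):
    """Draw one genotype (0 or 1) from its full conditional distribution."""
    return 1 if rng.random() < genotype_posterior(**kwargs) else 0


# Non-recombinant flanking markers and an uninformative phenotype (effect = 0):
p_markers_only = genotype_posterior(10.0, 1, 1, 0.1, 0.1, mu=10.0, effect=0.0, sd=1.0)

# Recombinant flanking markers: the phenotype decides between the two genotypes.
args = dict(y=11.0, m_left=1, m_right=0, r_left=0.1, r_right=0.1,
            mu=10.0, effect=2.0, sd=1.0)
p_with_phenotype = genotype_posterior(**args)

rng = random.Random(7)
draws = [sample_genotype(rng, **args) for _ in range(5000)]
print(p_markers_only, p_with_phenotype, sum(draws) / len(draws))
```

When the flanking markers agree, the marker information dominates; when they disagree, the two genotypes are equally likely a priori and the phenotype shifts the sampling probability towards the genotype whose mean is closer to the observed value.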

The properties of the posterior distribution can be studied by using prior distributions that are independent between QTL and by drawing samples from posterior probabilities. The conditional posterior probability for multiple imputation or MCMC is

$$\Pr(\theta, \lambda, Q \mid Y, X) = \frac{\Pr(Q \mid X, \lambda)\,\Pr(Y \mid Q, \theta)\,\Pr(\lambda \mid X)\,\Pr(\theta)}{\Pr(Y \mid X)}$$

To construct a Markov chain around the posterior distribution, we need the posterior probability to be the stationary distribution of the Markov chain. In practice, the chain tends towards the stationary distribution. The MCMC algorithm starts with given values of the parameters in the prior distributions and the initial values for all the unknowns generated from their prior distributions

$$(\lambda, Q, \theta, m) \sim \Pr(\lambda, Q, \theta, m \mid Y, X)$$

and m-QTL model components from full conditionals are updated with the following updating steps:

• update genetic effects $\theta$ given genotypes and traits;
• update locus $\lambda$ given genotypes and marker map; and
• update genotypes Q given traits, marker map, locus and effects.

This generates the following chain of estimates:

$$(\lambda, Q, \theta, m)_1 \rightarrow (\lambda, Q, \theta, m)_2 \rightarrow \cdots \rightarrow (\lambda, Q, \theta, m)_N$$

Because the initial values may have low posterior probability, the chain is first run through a burn-in period (initial iterations of the MCMC process that are used to move the sampler into the relevant part of the sample space) to ensure that the chain mixes well.

After the burn-in period, realizations of ($\lambda$, Q, $\theta$, m) are sampled from the chain and stored. Once enough realizations have been sampled, empirical posterior distributions for parameters in ($\lambda$, Q, $\theta$, m) can be created from the posterior sample.

For a model with m QTL, it is hard to sample directly from the joint posterior probability

$$\Pr(\lambda, Q, \theta \mid Y, X) = \Pr(\theta)\,\Pr(\lambda)\,\Pr(Q \mid X, \lambda)\,\Pr(Y \mid Q, \theta)/\text{constant}$$

But it is easy to sample parameters from the full conditionals as follows:

$$\Pr(\theta \mid Y, X, \lambda, Q) = \Pr(\theta \mid Y, Q) = \Pr(\theta)\,\Pr(Y \mid Q, \theta)/\text{constant} \quad \text{(for genetic effects)}$$

$$\Pr(\lambda \mid Y, X, \theta, Q) = \Pr(\lambda \mid X, Q) = \Pr(\lambda)\,\Pr(Q \mid X, \lambda)/\text{constant} \quad \text{(for QTL locus)}$$

$$\Pr(Q \mid Y, X, \lambda, \theta) = \Pr(Q \mid X, \lambda)\,\Pr(Y \mid Q, \theta)/\text{constant} \quad \text{(for QTL genotypes)}$$

6.7.3 Bayesian mapping methods

When fully structuring the gene mapping problem in the Bayesian framework, the types of models considered to be suitable (e.g. no epistasis) need to be defined a priori. A prior opinion concerning the plausible values of the model dimension (the number of parameters), in addition to plausible values of the parameters themselves, then needs to be incorporated. This includes the prior distributions attached to the number of influential genes (QTL) and to their effects, which together reflect the prior beliefs concerning sensitivity towards small gene effects. In general, the MCMC analysis requires specific prior, or proposal, distributions and involves many iterations. In order for the MCMC process to provide useful estimates, it is necessary for the sampler to move around the sample space successfully.

Bayesian mapping was initiated by Hoeschele and VanRaden (1993a,b) and subsequently developed by Satagopan

et al. (1996) and Sillanpää and Arjas (1998, 1999). Since then, various Bayesian mapping methods have been developed for different models and genetic systems, including the reversible-jump MCMC Bayesian method (Green, 1995; Satagopan et al., 1996; Sillanpää and Arjas, 1998, 1999; Sillanpää and Corander, 2002), the model selection framework (Yi, 2004; Yi et al., 2005, 2007) and the shrinkage estimation (SE) method (Xu, S., 2003; Zhang and Xu, 2004; Wang, H. et al., 2005). Wu and Lin (2006) concluded that the SE method allows analytical strategies for QTL mapping to expand to whole-genome mapping of epistatic QTL by use of all markers. However, the number of variables involved is so large that the computation time is too long. To solve this problem, Zhang and Xu (2005) proposed the penalized maximum likelihood (PML) method. Yi and Shriner (2008) reviewed Bayesian mapping methods and associated computer software for mapping multiple QTL in experimental crosses. They compared and contrasted the various methods to clearly describe the relationship between them.

Bayesian shrinkage estimation (BSE) method

With BSE, the number of effects that can be handled can be larger than the number of observations. The BSE method has been extended to map multiple QTL (Zhang and Xu, 2004; Wang, H. et al., 2005) and epistatic QTL.

Assuming m QTL, $Q_1, Q_2, \ldots$ and $Q_m$, the model for the quantitative trait value can be written as

$$y_i = b_0 + \sum_{j=1}^{q} x_{ij} b_j + e_i$$

where $y_i$ is the quantitative trait value for individual i, $b_0$ is the mean, $b_j$ is the main effect of $Q_j$, $x_{ij}$ is coded as 1/2 or −1/2 if the genotype of $Q_j$ is $Q_jQ_j$ or $Q_jq_j$ (j = 1, …, m), and m is not the number of markers, as in multiple-marker analysis, but the number of marker intervals for multiple QTL analysis. The BSE method allows spurious QTL effects to be reduced towards zero, while QTL with large effects are estimated with virtually no shrinkage. To do this, each marker effect is allowed to have its own variance parameter, which in turn has its own prior distribution so that the variance can be estimated. First, prior distributions for all parameters are assumed, i.e. $p(b_0) \propto 1$, $p(\sigma_e^2) \propto 1/\sigma_e^2$, $p(b_j) = N(0, \sigma_j^2)$ and $p(\sigma_j^2) \propto 1/\sigma_j^2$ (j = 1, …, q); then conditional posterior distributions (CPD) for all parameters and hyperparameters are deduced, i.e. the CPD for $b_j$ is $N(\bar{b}_j, s_j^2)$ where

$$\bar{b}_j = \left(\sum_{i=1}^{n} x_{ij}^2 + \sigma_e^2/\sigma_j^2\right)^{-1}\sum_{i=1}^{n} x_{ij}\left(y_i - b_0 - \sum_{k \neq j}^{q} x_{ik} b_k\right)$$

and

$$s_j^2 = \left(\sum_{i=1}^{n} x_{ij}^2 + \sigma_e^2/\sigma_j^2\right)^{-1}\sigma_e^2$$

The CPD for $\sigma_j^2$ is an inverted chi-square distribution; and finally, we sample observations of all parameters from the corresponding CPD. When the sampling chain converges to the stationary distribution, the sampled parameters actually follow the joint posterior distribution. When the sample of a single parameter is considered, this univariate sample is actually the marginal posterior sample for this parameter. Therefore, the number, positions and effects of QTL can be estimated.

Provided the jth QTL is false (i.e. effect size is zero), the estimate of $\sigma_j^2$ will tend to zero and the mean and variance of the posterior distribution for $b_j$ regress to zero so that the sampled observations of $b_j$ are close to zero. Note that updating the variance $\sigma_j^2$ for the jth QTL is important because this either overcomes the shortcomings of the fixed ridge parameter in ridge regression or reflects the information of the data. If $b_j \to 0$ in the formula $\sigma_j^2 = b_j^2/\chi^2_{v=1}$, then $\sigma_j^2 \to 0$; however, dividing $b_j^2$ by a chi-square variable allows $\sigma_j^2$ a chance to recover because $\chi^2_{v=1}$ can be very small by chance.
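The conditional posterior distributions described for the BSE method translate fairly directly into a Gibbs sampler. The following Python sketch is a simplified illustration rather than the published implementation: it uses simulated, unlinked ±1/2 genotype codes, arbitrary starting values, a small numerical floor on each effect variance to avoid division by zero, and short chain lengths chosen only to keep the example fast. The function names, the data and the true effect size are all assumptions.

```python
import math
import random


def chi_square(rng, df):
    """Draw from a chi-square distribution with df degrees of freedom."""
    return rng.gammavariate(df / 2.0, 2.0)


def bse_gibbs(x, y, n_iter=500, burn_in=100, seed=0):
    """Gibbs sampler for y_i = b0 + sum_j x_ij * b_j + e_i.

    Each effect b_j has its own variance sigma_j^2 with prior 1/sigma_j^2.
    Returns the posterior means of b_1..b_q after burn-in.
    """
    rng = random.Random(seed)
    n, q = len(y), len(x[0])
    b0 = sum(y) / n
    b = [0.0] * q
    var_j = [1.0] * q        # starting values for sigma_j^2 (assumption)
    var_e = 1.0              # starting value for sigma_e^2 (assumption)
    sum_x2 = [sum(x[i][j] ** 2 for i in range(n)) for j in range(q)]
    resid = [y[i] - b0 for i in range(n)]   # y - b0 - X b, with b = 0 initially
    totals = [0.0] * q

    for it in range(n_iter):
        # b0 | rest ~ N(mean of (y - X b), sigma_e^2 / n)
        new_b0 = rng.gauss(b0 + sum(resid) / n, math.sqrt(var_e / n))
        resid = [r - (new_b0 - b0) for r in resid]
        b0 = new_b0

        for j in range(q):
            # b_j | rest ~ N(b_bar, s2), using the residual with effect j added back
            denom = sum_x2[j] + var_e / var_j[j]
            xr = sum(x[i][j] * (resid[i] + x[i][j] * b[j]) for i in range(n))
            new_bj = rng.gauss(xr / denom, math.sqrt(var_e / denom))
            delta = new_bj - b[j]
            for i in range(n):
                resid[i] -= x[i][j] * delta
            b[j] = new_bj
            # sigma_j^2 | b_j = b_j^2 / chi-square(1); the floor is a numerical guard
            var_j[j] = max(b[j] ** 2 / chi_square(rng, 1), 1e-12)

        # sigma_e^2 | rest = residual sum of squares / chi-square(n)
        var_e = sum(r * r for r in resid) / chi_square(rng, n)

        if it >= burn_in:
            for j in range(q):
                totals[j] += b[j]

    kept = n_iter - burn_in
    return [t / kept for t in totals]


# Simulated data: 200 individuals, 6 candidate positions, one true QTL (index 2).
data_rng = random.Random(42)
n, q = 200, 6
x = [[data_rng.choice((-0.5, 0.5)) for _ in range(q)] for _ in range(n)]
true_b = [0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
y = [5.0 + sum(x[i][j] * true_b[j] for j in range(q)) + data_rng.gauss(0.0, 1.0)
     for i in range(n)]

post_mean = bse_gibbs(x, y, n_iter=500, burn_in=100, seed=1)
print([round(v, 2) for v in post_mean])
```

In a run of this kind, the posterior mean at the true QTL should stay close to the simulated effect, while the means at the remaining positions should be pulled towards zero. A real analysis would need linked markers, much longer chains and convergence checks.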

Bayesian shrinkage analysis was used decrease the running time. PML is different
to develop a QTL model for mapping multi- from the ML method because the function
ple QTL for dynamic traits (such as growth to be maximized is a penalized likelihood
trajectories) under the maximum likelihood function rather than a likelihood function.
framework (Yang and Xu, 2007). The growth Penalized likelihood is similar to the poste-
trajectory was fitted by Legendre polynomi- rior distribution of the parameters, with the
als. The method combines the shrinkage prior distribution of the parameters serving
mapping for individual quantitative traits as the penalty. So PML method depends
with the Legendre polynomial analysis for on the prior distribution. It estimates
dynamic traits. The multiple-QTL model means and variances of prior distributions
was implemented in two ways: (i) a fixed- of QTL effects together with QTL effects,
interval approach where a QTL is placed i.e. QTL effects can be estimated by using the
in each marker interval; and (ii) a moving- equation shown at the bottom of the page.
interval approach where the position of a If, sj2 0 then bj mj. Additionally, mj = bj/
QTL can be searched in a range that covers many marker intervals. Simulation showed that the Bayesian shrinkage method generated much better signals for QTL than the interval mapping approach.

Model selection

A composite model space approach was proposed by Yi (2004) for mapping multiple non-epistatic QTL and extended by Yi et al. (2005) to epistatic QTL mapping for continuous traits. The key advantage of this approach is that it provides a convenient way to reasonably reduce the model space and to construct efficient algorithms for exploring the complicated posterior distribution. Yi et al. (2007) proposed a Bayesian model selection approach of genome-wide interacting QTL for ordinal traits in experimental crosses. They first developed a Bayesian ordinal probit model for multiple interacting QTL on the basis of the composite model space framework and then used this framework to develop an efficient MCMC algorithm for identifying multiple interacting QTL for ordinal traits.

Penalized maximum likelihood (PML) method

Integrating the shrinkage estimation with maximum likelihood (ML) method can

(m + 1), so bj ≈ 0. This explains the reason why the estimate of a false-QTL effect is close to zero. Note that the PML method can select variables in the estimation of parameters, handle a model with the number of considered effects ten times larger than the sample size (Zhang and Xu, 2005; Hoti and Sillanpää, 2006) and be a refined method of mapping QTL (Yi et al., 2006) because of small residual variance at the beginning of parameter estimation. However, the PML method cannot detect epistasis between nearby markers because of their multi-collinearity. For the real data analysis, two approaches are available for epistatic analysis. With the PML method along with the variable-interval approach, whole-genome mapping of epistatic QTL may be carried out by the use of all markers or the BSE method along with the variable-interval approach can be used to map epistatic QTL.

MCMC and especially the Gibbs sampler allow for the efficient exploration of very complex likelihood surfaces and calculation of Bayesian posterior distributions. For these reasons, Walsh (2001) predicted that the next 20 years will likely be marked by a strong influx of Bayesian methods replacing their likelihood counterparts. In contrast to classical methods, the Bayesian MCMC approach necessitates more human

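\[
b_j = \left[\sum_{i=1}^{n} x_{ij}^{2} + \frac{\sigma_e^{2}}{\sigma_j^{2}}\right]^{-1}
\left[\sum_{i=1}^{n} x_{ij}\left(y_i - b_0 - \sum_{k \neq j}^{q} x_{ik} b_k\right) + \mu \frac{\sigma_e^{2}}{\sigma_j^{2}}\right]
\]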
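As a rough illustration of how the expression for b_j behaves, the following Python sketch evaluates it for a single effect while every other quantity is held fixed. The function name `update_bj`, its argument names, the toy data and the reading of μ as the prior mean of the effect are assumptions made for this illustration; they are not taken from the cited papers.

```python
import numpy as np

def update_bj(X, y, b0, b, j, mu, var_e, var_j):
    """Evaluate the shrinkage expression for effect j (illustrative sketch).

    X: (n, q) design matrix; y: (n,) phenotypes; b0: intercept;
    b: (q,) current effects; mu: assumed prior mean of b_j;
    var_e: residual variance; var_j: prior variance of b_j.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.asarray(b, dtype=float)
    xj = X[:, j]
    # Partial residual: remove the intercept and every effect except the j-th.
    resid = y - b0 - (X @ b - xj * b[j])
    ratio = var_e / var_j  # penalty weight sigma_e^2 / sigma_j^2
    return (xj @ resid + mu * ratio) / (xj @ xj + ratio)

# Hypothetical toy data with a least-squares slope of exactly 2.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
weak = update_bj(X, y, 0.0, [0.0], 0, 0.0, 1.0, 1e12)     # very large prior variance
strong = update_bj(X, y, 0.0, [0.0], 0, 0.0, 1.0, 1e-12)  # very small prior variance
print(weak, strong)
```

With a very large prior variance the penalty vanishes and the value approaches the least-squares slope; with a very small prior variance the value is pulled to the prior mean (zero here), which is the behaviour described for false-QTL effects.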
Molecular Dissection of Traits: Theory 223

effort and care to ensure that the simulation produces a representative sample from the posterior distribution. This requires careful monitoring of the convergence and the mixing properties of the MCMC sampler.

6.8 Linkage Disequilibrium Mapping

QTL mapping methods most frequently used so far are largely based on segregating populations derived from two parental lines, although some of them may be modified to use multiple populations simultaneously. Linkage disequilibrium (LD) or association mapping to be discussed in this section can be exploited to identify QTL using collections of germplasm, cultivars and all available genetic and breeding materials, by which molecular dissection of complex traits can be more closely linked up with plant breeding programmes.

LD is also known as gametic phase disequilibrium, gametic disequilibrium and allelic association. Simply stated, LD is the non-random association of alleles at different loci. It is the correlation between polymorphisms (e.g. single nucleotide polymorphisms (SNPs)) that is caused by their shared history of mutation and recombination. The terms linkage and LD are often confused. Linkage refers to the correlated inheritance of loci through the physical connection on a chromosome, whereas LD refers to the correlation between alleles in a population. The confusion occurs because tight linkage may result in high levels of LD. For example, if two mutations occur within a few bases of one another, they undergo the same pressures of selection and drift through time. Because recombination between the two neighbouring bases is rare, the presence of these SNPs is highly correlated and the tight linkage will result in high LD. In contrast, SNPs on separate chromosomes experience different selection pressures and independent segregation; these SNPs thus have a much lower correlation or level of LD.

This section focuses on the basic concepts of LD mapping. Important references for this section include Jannink and Walsh (2002), Flint-Garcia et al. (2003), Breseghello and Sorrells (2006b), De Silva and Ball (2007), Mackay and Powell (2007), Oraguzie et al. (2007), Zhu et al. (2008), Buckler et al. (2009), Myles et al. (2009) and Yu et al. (2009).

6.8.1 Why linkage disequilibrium mapping?

Allele association between marker loci and association between marker alleles and phenotypes can be designated as marker–marker association and marker–trait association, respectively (Xu, Y., 2002). As discussed previously, the objective of linkage mapping is to identify simply inherited markers in close proximity to genetic factors affecting quantitative traits (QTL). This localization relies on processes that create a statistical association between marker and QTL alleles and that selectively reduce that association as a function of the distance between the marker and QTL. Recombination in meiosis that leads to DHs, F2 or RILs reduces the association between a given QTL and markers distant from it. Unfortunately, derivation of these populations (Chapter 4) requires relatively few meioses, such that even markers that are far from the QTL (e.g. 10 cM) remain strongly associated with it. Such long-distance association hampers precise localization of the QTL. One approach to fine mapping is to expand the genetic map, for example, through the use of RILs and advanced intercross lines (Chapter 7).

Although designed segregating populations are easy to create, they come with a number of disadvantages (Malosetti et al., 2007). First, the amount of segregating genetic variation within the population is limited, because at most two alleles per locus can segregate in a diploid species, where in the absence of allele polymorphisms between the parents no QTL can be identified. Secondly, the genetic backgrounds within which mapping studies take place are generally not representative of the backgrounds used in elite germplasm (Jannink et al., 2001). In order to increase the genetic polymorphism, the parental lines are
224 Chapter 6

usually selected from highly diverse germplasm. Thirdly, the relatively low number of generations after maximum LD, where the maximum LD is reached in the F1, implies a reduced number of sampled meioses within designed populations (typically a few hundred), leading to relatively long stretches of chromosome being in LD. Consequently, the characteristic size of confidence intervals for QTL locations is between 10 and 20 cM (Darvasi et al., 1993). In addition, germplasm resources and breeding populations that have been accumulating in breeding programmes with available phenotypic information cannot be used so that genetic mapping and breeding are usually two separate, independent procedures.

LD mapping takes advantage of events that created association in the relatively distant past. Assuming many generations and therefore meioses have elapsed since these events, recombination will have removed association between a QTL and any markers not tightly linked to it. LD mapping thus allows for much finer mapping than standard biparental cross approaches. At a fundamental level, both LD and linkage rely on the co-inheritance of adjacent DNA variants, with linkage capitalizing on this by identifying haplotypes that are inherited intact over several generations and LD relying on the retention of adjacent DNA variants over many generations. Thus, LD studies can be regarded as very large linkage studies of unobserved, hypothetical pedigrees (Cardon and Bell, 2001). LD analysis has the potential to identify a single polymorphism within a gene that is responsible for the difference in phenotype and is perfectly suited for sampling a wide range of alleles from germplasm collections with high resolution (Flint-Garcia et al., 2003). A less obvious additional attractive property is that LD mapping approaches offer possibilities for QTL identification in polyploid crops with hard-to-model segregation patterns (Malosetti et al., 2007).

For marker–trait association, differences in both phenotype and allele frequency can be identified in a group of cultivars that are derived from a common ancestral gene pool (Xu and Zhu, 1994). The procedure is regarded as an initial screening for identification of QTL (Bar-Hen et al., 1995; Virk et al., 1996). The development of saturated linkage maps and highly informative microsatellite and SNP markers in plants makes it possible to systematically survey marker–trait association on a whole-genome scale. Compared with transmission-based linkage mapping, LD mapping provides more opportunities for breeding applications since hundreds of germplasm accessions that are useful as parents in breeding are involved. An important asset of LD mapping strategies is the straightforward utilization of large amounts of historical phenotypic data that are available for mapping efforts at no or little extra cost, especially when evaluation of the trait is time and money consuming, as is the case with mean yield, adaptability and stability. As an increasing number of germplasm accessions are evaluated with molecular markers and phenotyped for agronomic traits, it is essential to consider using the LD mapping approach to map genes or at least to provide a pre-screen for linkage-based genetic mapping (Xu, Y., 2002).

6.8.2 Measurement of linkage disequilibrium

A variety of statistics have been used to measure LD. Devlin and Risch (1995) and Jorde (2000) reviewed the relative advantages and disadvantages of each statistical approach. Here, we introduce the two most common statistics for measuring LD: r² and D'. Consider a pair of loci with alleles A and a at locus one and B and b at locus two, with allele frequencies $p_A$, $p_a$, $p_B$ and $p_b$, respectively. The resulting haplotype frequencies are $p_{AB}$, $p_{Ab}$, $p_{aB}$ and $p_{ab}$. The basic component of all LD statistics is the difference between the observed and the expected haplotype frequencies,

\[
D_{ab} = p_{AB} - p_A p_B
\]

The distinction between these statistics lies in the scaling of this difference (Flint-Garcia et al., 2003).
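Because the four haplotype frequencies sum to one, the same difference can also be written in terms of haplotype frequencies alone, which is convenient when starting from a 2 × 2 table of haplotype counts. The short derivation below is added as a worked step and uses only the definitions given in the text:

\[
p_A p_B = (p_{AB} + p_{Ab})(p_{AB} + p_{aB}) = p_{AB}(p_{AB} + p_{Ab} + p_{aB}) + p_{Ab}p_{aB} = p_{AB}(1 - p_{ab}) + p_{Ab}p_{aB}
\]

\[
D_{ab} = p_{AB} - p_A p_B = p_{AB}\,p_{ab} - p_{Ab}\,p_{aB}
\]

As a hypothetical numerical check, haplotype frequencies of 0.5, 0, 0.25 and 0.25 for AB, Ab, aB and ab give allele frequencies of 0.5 for A and 0.75 for B, so that both forms yield 0.5 − 0.375 = 0.5 × 0.25 − 0 × 0.25 = 0.125.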



The first of the two measures, r², also described in the literature as Δ², is calculated as

\[
r^2 = \frac{(D_{ab})^2}{p_A\, p_a\, p_B\, p_b}
\]

It is convenient to consider r² as the square of the correlation coefficient between the two loci. However, unless the two loci have identical allele frequencies, a value of 1 is not possible. Statistical significance (P-value) for LD is usually calculated using either Fisher's exact test to compare sites with two alleles at each locus or multi-factorial permutation analysis to compare sites with more than two alleles at either or both loci.

Alternatively, the LD statistic D' (Lewontin, 1964) is calculated as

\[
|D'| = \frac{|D_{ab}|}{\min(p_A p_B,\; p_a p_b)} \quad \text{for } D_{ab} < 0
\]

\[
|D'| = \frac{|D_{ab}|}{\min(p_A p_b,\; p_a p_B)} \quad \text{for } D_{ab} > 0
\]

D' is scaled based on the observed allele frequencies, so it will range between 0 and 1 even if allele frequencies differ between the loci. D' will only be less than 1 if all four possible haplotypes are observed; hence, a presumed recombination event has occurred between the two loci.

The statistics r² and D' reflect different aspects of LD and perform differently under various conditions. Figure 6.3 presents three scenarios of how linked polymorphisms may exhibit different levels of LD (Flint-Garcia et al., 2003). Figure 6.3A shows an example of absolute LD, where the two polymorphisms are completely correlated with one another. An instance when absolute LD can develop is when two linked mutations occur at a similar point in time and no recombination has occurred between the sites. In this case, the history of mutation and recombination for the sites is the same. Both r² and D' have a value of 1 in this scenario. Figure 6.3B shows an example of LD when the polymorphisms are not completely correlated, but there is no evidence of recombination. One way this type of LD structure can develop is when the mutations occur on different allelic lineages. This situation can reflect the same recombinational history but different mutational histories. This is the situation in which r² and D' act differently, with D' still equal to 1, but where r² can be much smaller. Figure 6.3C shows an example of polymorphisms in linkage equilibrium. If the sites are linked, then equilibrium could be produced by a recombination event between the two sites. In this case, the recombinational history differs for the various haplotypes but the mutational history is the same. Hence, both r² and D' will be zero.

Although neither r² nor D' performs extremely well with small sample sizes and/or low allele frequencies, each has distinct advantages. Whereas r² summarizes both recombinational and mutational history, D' measures only recombinational history and is therefore the more accurate statistic for estimating recombination differences. However, D' is strongly affected by small sample sizes, resulting in highly erratic behaviour when comparing loci with low allele frequencies. This is due to the decreased probability of finding all four allelic combinations of low frequency polymorphisms even if the loci are unlinked. For the purpose of examining the resolution of association studies, the r² statistic is preferred, as it is indicative of how markers might correlate with the QTL of interest.

There are two common ways to visualize the extent of LD between pairs of loci (Flint-Garcia et al., 2003). LD decay plots are used to visualize the rate at which LD declines with genetic or physical distance (Fig. 6.4). Scatter plots of r² values versus genetic/physical distances between all pairs of alleles within a gene, along a chromosome or across the genome are constructed. Alternatively, disequilibrium matrices are effective for visualizing the linear arrangement of LD between polymorphic sites within a gene or loci along a chromosome (Plate 2). It should be noted that LD decay is unpredictable. Both plot types highlight the



[Figure 6.3: three panels (A, B, C), each showing the allelic states of two loci, the 2 × 2 table of haplotype counts and a possible tree.]
Panel A: haplotype counts 6, 0 / 0, 6; |D'| = 1, r² = 1.
Panel B: haplotype counts 6, 0 / 3, 3; |D'| = 1, r² = 0.33.
Panel C: haplotype counts 3, 3 / 3, 3; |D'| = 0, r² = 0.

Fig. 6.3. Hypothetical scenarios of linkage disequilibrium (LD) between linked polymorphisms caused by different mutational and recombinational histories demonstrating the behaviour of the r² and D' statistics. Images in the left column represent the allelic states of two loci. The middle column represents the 2 × 2 contingency table of haplotypes and the resulting r² and D' statistics. The right column represents a possible tree responsible for the observed LD present. (A) An example of absolute LD, where the two polymorphisms are completely correlated with one another. (B) An example of LD when the polymorphisms are not completely correlated, but there is no evidence of recombination. (C) An example of when polymorphisms are in linkage equilibrium. Modified from Rafalski (2002).
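The values shown in the three panels of Fig. 6.3 can be checked numerically. The Python sketch below computes the haplotype-frequency difference, r² and |D'| from the four haplotype counts of each table, using the usual normalization of |D'| by the largest difference attainable for the observed allele frequencies. The function name `ld_stats` and the row-wise reading of each table as counts of AB, Ab, aB and ab are assumptions made for this illustration.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2, abs_D_prime) from haplotype counts at two biallelic loci.

    Both loci are assumed to be polymorphic; otherwise r2 is undefined.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    p_a, p_b = 1 - p_A, 1 - p_B
    D = p_AB - p_A * p_B
    r2 = D * D / (p_A * p_a * p_B * p_b)
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if D < 0:
        d_max = min(p_A * p_B, p_a * p_b)
    else:
        d_max = min(p_A * p_b, p_a * p_B)
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    return D, r2, d_prime

# Haplotype counts read from the contingency tables of Fig. 6.3.
panels = {"A": (6, 0, 0, 6), "B": (6, 0, 3, 3), "C": (3, 3, 3, 3)}
for name, counts in panels.items():
    D, r2, d_prime = ld_stats(*counts)
    print(f"{name}: D = {D:.3f}, r2 = {r2:.2f}, |D'| = {d_prime:.2f}")
```

Panel B is the informative case: only three of the four haplotypes are present, so |D'| stays at 1, whereas the unequal allele frequencies at the two loci hold r² at one-third.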

random variation in LD owing to a variety of forces discussed below.

The limits of linkage analysis and LD mapping when they are used alone can be overcome by a joint mapping strategy as demonstrated by Wu and Zeng (2001) in which a random sample from a natural population and the open-pollinated progeny of the sample were analysed jointly. The joint linkage and LD mapping strategy was extended to map QTL segregating in a natural population (Wu et al., 2002b). The extension allows for simultaneous estimates of a number of genetic and genomic parameters including the allele frequency of QTL, its effects, its location and its population association with a known marker locus.

6.8.3 Factors affecting linkage disequilibrium

In a large, randomly mated population with loci segregating independently, but in the absence of selection, mutation or migration, polymorphic loci will be in linkage equilibrium (Falconer and

Mackay, 1996). Mutation provides the raw material for producing polymorphisms that will be in LD. Recombination is the primary force that eliminates both linkage and association over generations and the main phenomenon that weakens intrachromosomal LD, whereas interchromosomal LD is broken down by independent assortment. The rate for recombination to erode LD is slow between closely linked loci. For example, for loci that are 1 cM apart, more than 50% of the initial disequilibrium remains after 50 generations (Falconer and Mackay, 1996): with recombination fraction c, the disequilibrium remaining after t generations of random mating is the fraction (1 − c)^t of its initial value, and (1 − 0.01)^50 ≈ 0.61. However, LD decays with time but not in the case of LD beyond 5–10 cM except due to epistasis. A variety of mechanisms generate LD, including linkage, selection and admixture, several of which can operate simultaneously. Some common mechanisms are summarized from Jannink and Walsh (2002), Flint-Garcia et al. (2003) and Mackay and Powell (2007).

[Figure 6.4: scatter plot of r² (vertical axis, 0.0 to 1.0) against distance between SNPs in bp (horizontal axis, 0 to 8000).]

Fig. 6.4. Linkage disequilibrium (LD) decay plot of shrunken 1 (sh1) in maize. LD, measured as r², between pairs of polymorphic sites is plotted against the distance between the sites. For this particular gene, LD decayed within 1500 bp. Data from Remington et al. (2001).

1. Founder effect: when populations are expanded from a small number of founders, the haplotypes present in the founders will be more frequent than expected under equilibrium. Three special cases are noteworthy. First, genetic drift affects LD by this mechanism in that a population experiencing drift derives from fewer individuals than its present size. Secondly, by considering an individual with a new mutation as a founder, we see that its descendants will predominantly receive the mutation and loci linked to it in the same phase. Linked marker alleles will therefore be in LD with the mutant allele. Finally, an extreme case arises in the F2 population derived from the cross of two inbred lines. Here, all individuals are derived from a single F1 founder genotype and association between loci can be predicted on the basis of their mapping distance.
2. Mutation: immediately after a mutation occurs, it is in LD with all other loci: the new mutation only occurs on a single haplotype. In successive generations, recombination causes LD to decay as new haplotypes are created, but this process takes a long time for closely linked markers. Most of the polymorphisms we observe are old: many generations are required for allele frequencies to rise to a frequency at which we detect them. Therefore, most pairs of polymorphic loci show little LD originating from mutation unless closely linked.
3. Population structure: the presence of subgroups in the sample in which individuals are more closely related to each other than the average pair of individuals taken at random in the population. Substructure is a common cause of covariance of polygenic effects because relatives tend to share marker and gene alleles genome-wide. LD arises in structured populations when allelic frequencies differ at two loci across subpopulations, irrespective of the linkage status of the loci. Admixed populations, formed by the union of previously separate populations into a single panmictic one, can be considered a case of structured population where substructuring has recently ceased. As gene flow between individuals of genetically distinct populations is followed by intermating, an admixture results in the introduction of chromosomes of different ancestry and allele frequencies.
4. Selection: this changes allele frequencies at QTL determining the selected trait, which causes LD between the selected allele at a locus and linked loci. This process, called

hitchhiking, generates LD among markers around the selected locus. Moreover, selection for or against a phenotype controlled by two unlinked loci (epistasis) may result in LD despite the fact that the loci are not physically linked. Negative LD will occur between loci affecting a trait in populations under stabilizing or directional selection as a result of the Bulmer effect. Positive LD will occur between loci affecting a trait under disruptive selection. When loci interact epistatically, haplotypes carrying the allelic combination favoured by selection will also have higher frequencies than expected.
5. Mating patterns: population mating patterns can strongly influence LD. Generally, LD decays more rapidly in outcrossing species as compared to selfing species (Nordborg, 2000). This is because recombination is less effective in selfing species, where individuals are more likely to be homozygous, than in outcrossing species. LD breaks down rapidly with random mating (Pritchard and Rosenberg, 1999).
6. Genetic drift: population size plays an important role in determining the level of LD. In small populations, the effects of genetic drift result in the consistent loss of rare allelic combinations, which increases LD levels. When genetic drift and recombination are in equilibrium,

\[
r^2 = \frac{1}{1 + 4Nc}
\]

where N is the effective population size and c is the recombination fraction between sites (Weir, 1996). Therefore, LD can be created in populations that have recently experienced a reduction in population size (bottleneck) with accompanying extreme genetic drift (Dunning et al., 2000). During a bottleneck, only few allelic combinations are passed on to future generations. This can generate substantial LD. The activities of plant breeders themselves can result in bottlenecks: the introduction of a new disease resistance or agronomic trait might result in a period of breeding in which a small number of parental lines are used extensively, generating some degree of LD.
7. Migration: if two populations, differing in allele frequency, are brought together, LD is created. Less extreme population admixture or migration also generates LD.

6.8.4 Methods for linkage disequilibrium mapping

The transmission disequilibrium test and derivatives

The first and most robust method for distinguishing QTL–marker associations arising from LD between closely linked markers from spurious background associations is the transmission disequilibrium test (TDT) (Spielman et al., 1993). Neither linkage alone nor disequilibrium alone (i.e. between unlinked markers) will generate a positive result; hence the TDT is an extremely robust way of controlling for false positives.

The single progeny in each family is usually selected for an extreme phenotype. Parents and progeny are genotyped, but only parents heterozygous at the marker locus are included in the analysis. From each parent, one allele must be transmitted to the progeny and one is not. Over all families a count is made of the number of transmissions and non-transmissions. In the absence of linkage between QTL and marker, the expected ratio of transmission to non-transmission is 1:1. In the presence of linkage it is distorted to an extent that depends on the strength of LD between the marker and QTL. The distortion is tested in a chi-squared test; with T transmissions and U non-transmissions of a given marker allele from heterozygous parents, the usual statistic is (T − U)²/(T + U), with one degree of freedom. Power depends on the strength of LD and on the effectiveness of selection of extreme progeny in driving segregation away from expectation (Mackay and Powell, 2007).

This elegant test is extremely robust to the effects of population structure, particularly in human genetics, but is susceptible to an increase in false positive results generated by genotype error and biased allele calling (Mitchell and Chakravarti, 2003). This risk can be reduced by modelling genotype errors and missing data in the analysis or by comparing the transmission ratio for extreme phenotypes with that

for control individuals or for the opposite extreme. The TDT has been extended to study haplotype transmissions, quantitative traits, the use of sib pairs rather than parents and progeny and information from extended pedigrees.

In crops, parental and progeny lines are usually separated by several generations of gametogenesis rather than by one. In this case, the TDT is still valid, but might no longer be so robust: the process of breeding might itself distort segregation patterns. A family-based association test that is applicable to plant breeding programmes has been proposed by Stich et al. (2006). The authors point out that for candidate gene studies, this method is more cost-effective than the alternative methods described below given that no additional control markers are required. However, some power will be lost because only progeny derived from F1s known to have a heterozygous marker genotype are informative. Laird and Lange (2006) reviewed TDT and other family-based association tests.

Structured association

Structured association provides a sophisticated approach to detecting and controlling population structure (Pritchard et al., 2000a,b; Falush et al., 2003; Mackay and Powell, 2007). To deal with non-functional, spurious associations between a phenotype and an unlinked candidate gene caused by the presence of population structure and unequal distribution of alleles within sub-populations, several methods have been proposed. Pritchard et al. (2000b) proposed a method of testing association that depends on the inferred ancestries of individuals. Ancestries were inferred by a Bayesian method proposed by Pritchard et al. (2000a). Thornsberry et al. (2001) extended this method to deal with a quantitative trait and studied a candidate gene for the control of flowering time in maize.

The computer program STRUCTURE (http://pritch.bsd.uchicago.edu/software/structure2_1.html; Pritchard et al., 2000a) uses computationally intensive methods to partition individuals into populations given molecular marker data. Many individuals or lines will not belong uniquely to one population, but will be the descendants of crosses between two or more ancestral populations. STRUCTURE also estimates the proportion of ancestry attributable to each population. Following allocation of individuals to populations, the test for association is carried out in a model fitting exercise. Here, the principle is that variation attributable to population membership is accounted for first, using estimates of population membership from STRUCTURE, and then the presence of any residual association between the marker and phenotype is tested. For example, to test for association between a quantitative trait and a microsatellite, the trait is first regressed on the estimated coefficients of population membership and then on the marker coded as a factor as if in an analysis of variance (Aranzana et al., 2005). Alternatively, the groups can be integrated as an extra factor or a set of covariables in a statistical model relating phenotype to genotype (Thornsberry et al., 2001; Wilson et al., 2004).

As a valid alternative to the use of STRUCTURE, classical multivariate analysis methods can be used to classify genotypes. In that case a matrix of genetic/genotypic distances is calculated from molecular marker information and used as input for clustering and/or scaling techniques (Ivandic et al., 2002; Kraakman et al., 2004). For collections of cultivars and breeding lines, genotypic relationships as obtained from the pedigree or from similarities in neutral marker profiles (Yu et al., 2006) can be translated into distances that are subsequently analysed by cluster analysis. The groups detected by such a cluster analysis can be interpreted as representing population structure and form an approximation to the original relationships between the genotypes as present before the grouping. The identified groups can be used as a kind of correction factor in association analyses.

Principal component analysis

A method termed EIGENSTRAT (Price et al., 2006) is based on principal component

analysis (PCA) across a large number of biallelic control markers with a genome-wide distribution. The PCA summarizes the variation observed across all markers into a smaller number of underlying component variables. These can be interpreted as relating to separate, unobserved, sub-populations from which the individuals in the dataset (or their ancestors) originated. The loadings of each individual on each principal component describe the population membership or the ancestry of each individual. However, these estimates are not ancestral proportions (values can be negative) in the same way that estimates of ancestry from STRUCTURE are. The loadings are used to adjust individual candidate marker genotypes (coded numerically) and phenotypes for their ancestry. The adjusted values are independent of estimated ancestry, so a statistically significant correlation between an adjusted candidate marker and adjusted phenotype is evidence of close linkage of a trait locus to the marker.

The EIGENSTRAT approach is similar to that of structured association but is less dependent on assessing the number of ancestral populations. Although each principal component is attributed to a separate population, the analysis is robust to the number included in the analysis, provided this is sufficiently large to capture all true population effects.

EIGENSTRAT was developed for analysing human datasets, which have high-density genotyping and low levels of population differentiation. Many crops have much higher levels of population differentiation than those found in human data sets and often only low densities of markers are available. EIGENSTRAT, unlike structured association, will not readily handle multi-allelic markers. However, a microsatellite with ten alleles could be coded as ten biallelic loci, all in complete LD. An analysis of human data showed that EIGENSTRAT was little affected by LD among > 3 million SNPs. The method shows great promise but additional research is required to establish its suitability for crops.

Mixed models

Parisseaux and Bernardo (2004) show how to integrate the pedigree-based relationship matrix into a QTL mapping analysis within a mixed-model framework. Yu et al. (2006) proposed the QK mixed-model LD mapping approach that promises to correct for LD caused by population structure and family relatedness. In this approach, both a marker-based relationship matrix (K) and a factor representing population structure are included in a mixed model for association analysis of a single trait in a single environment. The population structure matrix Q was calculated by the software STRUCTURE, which gives for each individual under consideration the probability of membership in each subpopulation.

Malosetti et al. (2007) proposed an LD approach based on mixed models with attention to the incorporation of the relationships between genotypes, whether induced by pedigree, population substructure or otherwise. Furthermore, they emphasized the need to pay attention to the environmental features of the data as well, i.e. adequate representation of the relations among multiple observations on the same genotypes. They illustrated their modelling approach using 25 years of Dutch national cultivar list data on late blight resistance in the genetically complex crop of potato. As markers, they used nucleotide binding-site markers, a specific type of marker that targets resistance or resistance-analogue genes. To assess the consistency of QTL identified by their mixed-model approach, a second independent data set was analysed. Two markers were identified that are potentially useful in selection for late blight resistance in potato.

Malosetti et al. (2007) showed how powerful and flexible a mixed-model framework can be for association mapping in plant species. They illustrated the use of mixed models by analysing two independent data sets on resistance to late blight, where the second data set served as an empirical check on the QTL identified in the first set. The approach can be implemented in any statistical package with extended facilities for mixed-model analyses.
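To make the structure of a Q + K analysis more concrete, the Python sketch below estimates a single marker effect by generalized least squares, with population membership (Q) and the marker as fixed effects and a polygenic background whose covariance is taken to be proportional to 2K. It is a simplified illustration rather than the procedure of the cited authors: the function name `qk_marker_test`, the toy data, the 2K scaling and the use of variance components supplied by the user (in practice they would be estimated, for example by REML) are all assumptions.

```python
import numpy as np

def qk_marker_test(y, marker, Q, K, var_g, var_e):
    """GLS estimate and standard error of a marker effect (illustrative sketch).

    y: (n,) phenotypes; marker: (n,) numerically coded genotypes;
    Q: (n, s) membership covariates (drop one column if rows sum to 1);
    K: (n, n) kinship matrix; var_g, var_e: assumed-known variance components.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    Q = np.asarray(Q, dtype=float).reshape(n, -1)
    marker = np.asarray(marker, dtype=float)
    X = np.column_stack([np.ones(n), Q, marker])  # intercept, structure, marker
    V = 2.0 * var_g * np.asarray(K, dtype=float) + var_e * np.eye(n)
    V_inv = np.linalg.inv(V)
    cov = np.linalg.inv(X.T @ V_inv @ X)          # covariance of the GLS estimates
    beta = cov @ X.T @ V_inv @ y
    return beta[-1], np.sqrt(cov[-1, -1])

# Toy data: two subpopulations of three lines each (all values are made up).
Q = np.array([[1.0], [1.0], [1.0], [0.0], [0.0], [0.0]])
marker = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
y = 1.0 + 0.5 * Q[:, 0] + 2.0 * marker           # noise-free, true marker effect = 2
K = 0.5 * np.eye(6)
K[:3, :3] += 0.25 * (1 - np.eye(3))              # relatedness within subpopulation 1
K[3:, 3:] += 0.25 * (1 - np.eye(3))              # relatedness within subpopulation 2
effect, se = qk_marker_test(y, marker, Q, K, var_g=1.0, var_e=1.0)
print(round(effect, 6), round(se, 3))
```

Because the toy phenotypes contain no noise, the marker effect is recovered whatever the kinship structure; with real data the ratio of the estimate to its standard error would provide a Wald-type test of marker–trait association after accounting for Q and K.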

Stich et al. (2008) evaluated various based on computer simulations model-


methods for LD mapping in the autogamous ling 55 years of hybrid maize breeding in
species wheat using an empirical data set, Central Europe. Furthermore, QIPDT was
determined a marker-based kinship matrix applied to a cross-section of 49 European
using restricted maximum-likelihood elite maize inbred lines genotyped with
(REML) estimate of the probability of two 722 AFLP markers and phenotyped in four
alleles at the same locus being identical in environments for several days to anthesis.
state but not identical by descent, and com- Compared to LRRT, the power to detect
pared the results of LD approaches based on QTL was higher with QIPDT when using
adjusted entry means (two-step approaches) data collected routinely in plant breeding
with the results of approaches in which programmes. Application of QIPDT to the
the phenotypic data analysis and the asso- 49 European maize inbreds resulted in a
ciation analysis were performed in one step significant (P < 0.05) association located at
(one-step approaches). On the basis of the a position for which a consensus QTL was
phenotypic and genotypic data of 303 soft detected in a previous study.
winter wheat inbreds, their results indi-
cated that the ANOVA approach was inap-
Bayesian methods
propriate for LD mapping in the germplasm
set examined. Their observations suggested Bayesian methods based on the MCMC
that the QK methods proposed by Yu et al. algorithm have been developed for map-
(2006) are appropriate for LD mapping not ping multiple QTL as discussed in Section
only in allogamous species such as humans 6.7. Those based on Bayesian variable selec-
and maize, but also in the autogamous spe- tion (e.g. Yi et al., 2003;Yi, 2004; Sillanp
cies wheat. LD mapping approaches using and Bhattacharjee, 2005) are advantageous
a kinship matrix estimated by REML were in that they can be implemented via a sim-
more appropriate for LD mapping than the ple and easy-to-use Gibbs sampler and can
QK method proposed by Yu et al. (2006) be extended to the whole genome LD map-
with respect to: (i) the adherence to the ping (Kilpikari and Sillanp, 2003).
nominal a-level; and (ii) the adjusted power Iwata et al. (2007) proposed an
for detection of QTL. They showed that the approach that combines a Bayesian method
data set could be analysed by using two- for mapping multiple QTL with a regres-
step approaches of the proposed LD method sion method that directly incorporates
without substantially increasing the empiri- estimates of population structure and mul-
cal Type I error rate in comparison to the tiple QTL. The efficiency of the approach
corresponding one-step approaches. in simulated- and real-trait analyses of a
rice germplasm collection was evaluated.
Quantitative inbred pedigree Simulation analyses based on real marker
disequilibrium test data showed that the model could suppress
both false-positive and false-negative rates
Stich et al. (2006) conducted a study and the error of estimation of genetic effects
to: (i) adapt the quantitative pedigree over single QTL models, indicating statis-
disequilibrium test to typical pedigrees of tically desirable attributes over single QTL
inbred lines produced in plant breeding models.
programmes; (ii) compare the newly devel-
oped quantitative inbred pedigree disequi-
librium test (QIPDT) with the commonly
employed logistic regression ratio test 6.8.5 Applications of linkage
(LRRT), with respect to the power and Type disequilibrium mapping
I error rate of QTL detection; and (iii) dem-
onstrate the use of the QIPDT by applying There are many reports on LD mapping
it to flowering data of European elite maize in various plants (e.g. Thornsberry et al.,
inbreds. QIPDT and LRRT were compared 2001; Kraakman et al., 2004; Breseghello

and Sorrels, 2006a; Auzanneau et al., 2007; Crossa et al., 2007; Brown et al., 2008; D'hoop et al., 2008; Raboin et al., 2008; Weber et al., 2008; Buckler et al., 2009; Chan et al., 2009; McMullen et al., 2009; Stich et al., 2009). Earlier attempts at establishing association between traits and markers across germplasm collections concerned rice, oats, maize, sea beet and barley. In rice, Virk et al. (1996) predicted the value for six traits using multiple linear regression. In oats, Beer et al. (1997) found associations between markers and 13 quantitative traits in a set of 64 landraces and cultivars. In maize, Thornsberry et al. (2001) found associations between Dwarf8 polymorphisms and flowering time. In sea beet, Hansen et al. (2001) mapped the bolting gene, using AFLP markers in four populations. In barley, Igartua et al. (1999) concluded that marker–trait associations for heading date, found in mapping populations, were, to some extent, maintained in 32 cultivars.

Ivandic et al. (2003) found association between markers and the traits of water-stress tolerance (chromosome 4H) and powdery mildew resistance in 52 wild barley lines. Chromosome 4H is, according to Forster et al. (2000), known for many loci involving abiotic stress tolerance, including salt tolerance, water use efficiency and adaptation to drought environments.

Using 237 rice accessions collected from around the world and genotypic data for 100 restriction fragment length polymorphism (RFLP) and 60 simple sequence repeat (SSR) marker loci and phenotypic data for 12 traits, a stronger marker–marker association was found in the cultivar groups that had greater genetic variation or closer pedigree relationship (Xu, Y., 2002). Markers within linkage groups showed stronger allelic association than markers between linkage groups. The statistical associations, however, could not be interpreted solely from genetic linkage. Comparison of marker–trait association in different cultivar groups demonstrated that both phenotypic variation and pedigree relationship among rice accessions strongly influenced the association detection. A highly consistent allele–trait association was revealed among multiple alleles at a given locus. Several chromosomal regions as hot spots for marker–trait associations have been assigned to QTL clusters. Several highly consistent allele–trait associations were revealed among multiple alleles at specific loci.

The same data set was used to evaluate the potential of discriminant analysis, a multivariate statistical procedure, to detect candidate markers associated with agronomic traits (Zhang et al., 2005). Model-based methods revealed population structure among the lines. Marker alleles associated with all traits were identified by discriminant analysis at high levels of correct percentage classification within sub-populations and across all lines. Associated marker alleles pointed to the same and different regions on the rice genetic map when compared to previous QTL mapping experiments. Results suggested that candidate markers associated with agronomic traits can be readily detected among inbred lines of rice using discriminant analysis combined with other methods.

Using 236 AFLP markers and 146 modern two-row spring barley cultivars, associations between markers were found for markers as far apart as 10 cM (Kraakman et al., 2004). Subsequently, for the 146 cultivars the complex traits mean yield, adaptability (Finlay–Wilkinson slope) and stability (deviations from regression) were estimated from the analysis of cultivar trial data. Regression of those traits on individual marker data disclosed marker–trait associations for mean yield and yield stability. Many of the associated markers were located in regions where earlier QTL were found for yield and yield components. In tetraploid potato, LD mapping has been successfully applied to studying disease resistances for which candidate genes were defined (Gebhardt et al., 2004; Simko et al., 2004a,b).

Historical multi-environmental trial data provides comprehensive phenotypic data for LD mapping and modelling genotype-by-environment interaction. Crossa et al. (2007) reported a comprehensive study using historical wheat data. Mapped diversity array technology (DArT) markers (Chapter 2) were used to find association with resistance to stem rust, leaf rust, yellow rust and powdery mildew, plus grain yield
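The adaptability and stability measures used for the barley cultivars above (Finlay–Wilkinson slope and deviations from regression) can be illustrated with a short Python sketch. It assumes a balanced cultivar-by-environment table; the yields, the cultivar names and the function name finlay_wilkinson are hypothetical, and this is not the analysis code of Kraakman et al. (2004). Each cultivar's yield is regressed on an environmental index (environment mean minus grand mean); the slope is read as adaptability and the residual mean square as (in)stability.

```python
def finlay_wilkinson(yields):
    """Regress each cultivar's yields on the environmental index.

    yields: dict cultivar -> list of yields, one per environment
            (balanced data, environments in the same order for all cultivars).
    Returns dict cultivar -> (mean yield, slope, deviation mean square).
    """
    cultivars = list(yields)
    n_env = len(yields[cultivars[0]])
    # Environmental index: environment mean over cultivars minus the grand mean.
    env_means = [sum(yields[c][j] for c in cultivars) / len(cultivars)
                 for j in range(n_env)]
    grand_mean = sum(env_means) / n_env
    index = [m - grand_mean for m in env_means]
    sxx = sum(x * x for x in index)

    results = {}
    for c in cultivars:
        y = yields[c]
        mean_y = sum(y) / n_env
        slope = sum(x * (v - mean_y) for x, v in zip(index, y)) / sxx
        residuals = [v - mean_y - slope * x for x, v in zip(index, y)]
        # Two parameters (mean and slope) are fitted per cultivar.
        deviation_ms = sum(r * r for r in residuals) / (n_env - 2)
        results[c] = (mean_y, slope, deviation_ms)
    return results


# Hypothetical yields (t/ha) of three cultivars in four environments.
trial = {
    "A": [4.0, 5.0, 6.0, 7.0],
    "B": [4.5, 5.6, 5.2, 6.1],
    "C": [3.5, 4.9, 6.6, 8.0],
}
for name, (mean_y, slope, dev_ms) in finlay_wilkinson(trial).items():
    print(name, round(mean_y, 3), round(slope, 3), round(dev_ms, 4))
```

The three per-cultivar values (mean, slope, deviation mean square) are the kind of derived traits that would then be regressed on marker data in an association analysis.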

in five historical wheat international multi-environment trials from the International Maize and Wheat Improvement Center (CIMMYT) conducted from 1970 to 2004. Two linear mixed models were used to assess marker–trait associations incorporating information on population structure and covariance between relatives. Several LD clusters bearing multiple host plant resistance genes were found. Most of the associated markers were found in genomic regions where previous reports had found genes or QTL influencing the same traits. In addition, many new chromosome regions for disease resistance and grain yield were identified. Phenotyping across up to 60 environments and years allowed modelling of genotype-by-environment interaction, thereby making possible the identification of markers contributing to both additive and additive-by-additive interaction effects.

As whole genome sequences become available for more and more plant species, and thus sequence-based markers covering whole genomes, genome-wide association (GWA) studies are becoming popular in genetic studies to replace the candidate gene-based approach. This boom follows a long germination period during which the necessary concepts, resources and techniques were developed and assembled (Kruglyak, 2008). With the completion of the initial wave of GWA scans in humans, McCarthy et al. (2008) reviewed each major step in the implementation of a GWA scan, highlighting areas where there is an emerging consensus over the ingredients for success and those aspects for which considerable challenges remain.

In plants, it is apparent that molecular marker-based germplasm evaluation will produce a large data set that can be explored for LD studies for crops with different levels of LD (e.g. Lu et al., 2009), with the development of highly informative DNA markers (e.g. SNPs) and high-throughput genotyping technology. The number of SNPs that are required for GWA obviously depends on the genomic extent of LD because genotyped SNPs must be spaced sufficiently densely to be in LD with most of the variants that are not genotyped. In sugarbeet, LD extended up to 3 cM (Kraft et al., 2000), while in some Arabidopsis populations LD even exceeded 50 cM (Nordborg et al., 2002). In contrast, in maize LD had already diminished after 2000 bp (Remington et al., 2001). The marker density in many plant species will allow effective GWA.

6.9 Meta-analysis

The explosion of interest in QTL mapping has led to numerous studies in plants, each based on its own experimental population(s). Each experiment is limited in size and usually restricted to a single population or a cross, planted in a specific environment(s). Therefore, QTL effects that can be detected are also limited. One direction for QTL analysis is to combine information from several studies, for example by meta-analysis of the results of QTL studies (Goffinet and Gerber, 2000) or joint analysis of the raw data (Haley, 1999) as discussed in Section 6.5.3.

Efforts to combine findings from separate studies have a long history. Glass (1976) proposed a method to integrate and summarize the findings from a body of research. He called the method 'meta-analysis'. Since then, meta-analysis has become a widely accepted research tool in a variety of disciplines (Hedges and Olkin, 1985). Meta-analysis involves the application of standard statistical principles (hypothesis testing, inference) to situations where only summary information is available (e.g. published reports) and not the source unit record data. Well-conducted meta-analysis allows for a more objective appraisal of the evidence, which may lead to resolution of uncertainty and disagreement. Meta-analysis makes the literature review process more transparent, compared with traditional narrative reviews where it is often not clear how the conclusions follow from the data examined (Smith and Egger, 1998). The application of meta-analysis to QTL detection is recent (Goffinet and Gerber, 2000; Hayes and Goddard, 2001). The combining of the results across studies can provide a more precise and consensus estimate of the

location of a QTL and its effect as compared with any single study. However, there are many challenges in combining the results of QTL mapping across studies, including differences in marker density, linkage map, sample size, study design, as well as statistical methods used. One aspect that might transcend the meta-analysis problem and benefit the whole field of QTL detection and location is the reliability of the principal parameters which characterize QTL: position, confidence interval, R² and LOD score (Hanocq et al., 2007). These parameters are critical to the meta-analysis process but are often only partially reported in research papers.

6.9.1 Meta-analysis of QTL locations

We followed the method described by Goffinet and Gerber (2000). In summary, with a total of m published reports of a QTL on a particular chromosome, the statistical question is to decide on whether these reports represent a single QTL, two QTL, etc. up to m separate QTL (one for each publication).

Assessment of the number of QTL can be made on the basis of a likelihood ratio test, AIC or adjusted AIC, as in the method outlined by Goffinet and Gerber (2000). This involves selecting from among the best-fitting models with 1, 2, …, m distinct QTL. As a result each published QTL can then be allocated to its respective consensus QTL. Note that usually, only the latest paper in a publication series on the same study population was included, to avoid duplication of the same QTL report. For a publication to be included in meta-analysis, it ideally provides the interval map (test statistic profile). As well as providing the estimate of the QTL location ($\hat{d}_i$), the interval map also enables estimation of the standard error for the QTL location, $s_i = \mathrm{se}(\hat{d}_i)$, after conversion of the test statistic to an (approximate) log-likelihood (ln L) scale. It has been suggested that the standard error can be estimated from the curvature (Fisher information) of the log-likelihood profile at the estimated map position

$$ s_i = \left[ -\left. \frac{\partial^2 \ln L}{\partial d^2} \right|_{d = \hat{d}_i} \right]^{-1/2} $$

In particular, the curvature is estimated by fitting a local quadratic near the maximum of ln L and determining the coefficient of the quadratic term. These standard errors are used to construct a weighted estimate of QTL location, the weights being inversely proportional to the squared standard errors ($w_i = s_i^{-2}$).

For studies that did not include an interval map, average standard errors

$$ \bar{s} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} s_i^2} $$

can be computed based on the studies where interval maps were available.

6.9.2 Meta-analysis of QTL maps

Integration of genetic maps and QTL by iterative projections on a reference map is now widely used to position both markers and QTL on a single and homogeneous consensus map (e.g. Arcade et al., 2004; Sawkins et al., 2004). Comparison of multiple QTL mapping experiments by alignment to a common reference or consensus map offers a more complete picture of the genetic control of a trait than can be obtained in any one study. In order to study QTL congruency, Goffinet and Gerber (2000) proposed an original approach based on a meta-analysis strategy. Etzel and Guerra (2003) developed a meta-analysis-based approach to overcome the between-study heterogeneity and to refine both QTL location and the magnitude of the genetic effects. Both the methods of Goffinet and Gerber (2000) and Etzel and Guerra (2003) are limited to a small number of underlying QTL positions (from one to four for the former and only one for the latter), which is a serious limitation for a whole genome study of QTL
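The calculations of Section 6.9.1 can be sketched in Python as follows. This is a simplified illustration, not the implementation of Goffinet and Gerber (2000): the reported positions and standard errors are hypothetical, reports lacking a standard error are given the root-mean-square average of the available ones (an assumed reading of the average standard error), and the one-QTL versus two-QTL comparison considers only contiguous splits of the sorted reports under a normal model for each reported position.

```python
import math


def pooled_se(known_ses):
    """Root-mean-square average of the available standard errors."""
    return math.sqrt(sum(s * s for s in known_ses) / len(known_ses))


def consensus_location(positions, ses):
    """Weighted mean position (weights 1/s_i^2) and its standard error."""
    weights = [1.0 / (s * s) for s in ses]
    total = sum(weights)
    d_hat = sum(w * d for w, d in zip(weights, positions)) / total
    return d_hat, math.sqrt(1.0 / total)


def log_lik(positions, ses, centre):
    """Normal log-likelihood of reported positions around one QTL location."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - (d - centre) ** 2 / (2.0 * s * s)
        for d, s in zip(positions, ses)
    )


def aic_one_vs_two(positions, ses):
    """AIC of the one-QTL model and of the best contiguous two-QTL split."""
    pairs = sorted(zip(positions, ses))
    d = [p for p, _ in pairs]
    s = [e for _, e in pairs]
    ll_one = log_lik(d, s, consensus_location(d, s)[0])
    ll_two = max(
        log_lik(d[:k], s[:k], consensus_location(d[:k], s[:k])[0])
        + log_lik(d[k:], s[k:], consensus_location(d[k:], s[k:])[0])
        for k in range(1, len(d))
    )
    # AIC = -2 ln L + 2 * (number of QTL positions estimated)
    return -2.0 * ll_one + 2 * 1, -2.0 * ll_two + 2 * 2


# Hypothetical reports (cM) on one chromosome; None marks a missing standard error.
reported_positions = [47.0, 51.0, 49.0, 88.0, 86.0]
reported_ses = [3.0, 3.0, None, 3.5, None]

s_bar = pooled_se([s for s in reported_ses if s is not None])
filled_ses = [s_bar if s is None else s for s in reported_ses]

aic_one, aic_two = aic_one_vs_two(reported_positions, filled_ses)
print("AIC one QTL:", round(aic_one, 2), " AIC two QTL:", round(aic_two, 2))
```

A lower AIC for the two-QTL model would argue for allocating the reports to two consensus QTL; extending the comparison to more than two QTL follows the same pattern.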

congruency. Even if the average number of QTL per experiment is around four in plants (Kearsey and Farquhar, 1998; Xu, Y., 2002; Chardon et al., 2004), it would be expected that more than four genes can be involved in the trait variation on a single chromosome.

A meta-analysis of flowering time and related traits in maize from 22 QTL detection studies concluded that a total of 62 different QTL are likely to be involved in the variation of these traits, whereas on average four to five QTL were detected in single-population analyses (Chardon et al., 2004). To remove these impediments, Veyrieras et al. (2007) developed a new two-stage meta-analysis procedure in order to integrate multiple independent QTL mapping experiments with the aim of creating a global framework to evaluate the homogeneity of both genetic marker and QTL mapping results from literature and public databases. First, it implements a new statistical approach to merge multiple distinct genetic maps into a single consensus map which is optimal in terms of weighted least squares and can be used to investigate recombination rate heterogeneity between studies. Secondly, assuming that QTL can be projected on the consensus map, METAQTL, a computational and statistical package developed for the whole-genome meta-analysis of QTL mapping experiments, offers a new clustering approach based on a Gaussian mixture model to decide how many QTL underlie the distribution of the observed QTL. Contrary to existing methods, METAQTL offers a complete statistical process to establish a consensus model for both the marker and the QTL positions on the whole genome.

6.9.3 Meta-analysis of QTL effects

After estimating the consensus QTL position using the above approach, a meta-analysis can be conducted for the effect size for each consensus QTL. Suppose that for a consensus QTL, the QTL allelic substitution effects (a) differ from sire to sire and assume that $a \sim N(0, \sigma_A^2)$. The purpose behind this meta-analysis is to estimate the variance of these effects, $\sigma_A^2$. Next assume that for each sire in the available studies, the estimate of the QTL allelic substitution effect, $a_i$, is $\hat{a}_i$ with corresponding standard error $V_i = \mathrm{se}(\hat{a}_i)$ and variance $V_i^2$, i = 1, 2, …, n, where n is the number of sires. To model the imprecision of $\hat{a}_i$ estimating $a_i$, we assume that $\hat{a}_i \mid a_i \sim N(a_i, V_i^2)$ and consequently, the unconditional distribution of estimated effects will be $\hat{a}_i \sim N(0, V_i^2 + \sigma_A^2)$. As also considered by Hayes and Goddard (2001), there are two other features that need to be modelled in the meta-analysis. First, since it is to a certain extent arbitrary which sire allele is labelled as having a positive effect, we will ignore the sign and condition on $a_i > 0$ and $\hat{a}_i > 0$. Secondly, only significant QTL tend to be published (resulting in potential publication bias), so we assume that $\hat{a}_i > c$ where c is the threshold QTL effect that just reaches publication level. With these constraints, the probability density function, h(·), for the observed QTL effects will be

$$ h(\hat{a}_i \mid \hat{a}_i > c) = \frac{n_i(\hat{a}_i)}{1 - N_i(c)}, \quad \hat{a}_i > c $$

where, say,

$$ n_i(y) = \frac{1}{\sqrt{2\pi (V_i^2 + \sigma_A^2)}} \exp\left[ -\frac{y^2}{2 (V_i^2 + \sigma_A^2)} \right] $$

is the normal probability density function and $N_i(y) = \int_{-\infty}^{y} n_i(t)\,dt$ the corresponding cumulative normal distribution function. So there are two parameters to be estimated, $\sigma_A^2$ and c, and this is achieved by an ML procedure.

For those papers where $V_i$ was not reported, the average value ($\bar{V}$) is computed in a similar way to that of $\bar{s}$. However, because the different studies were conducted under different conditions, there was a large variation in the phenotypic standard deviation across studies for a particular trait. Consequently, both the effect estimates and their standard errors were re-scaled by dividing by their reported phenotypic standard deviations (where reported) or by appropriate consensus standard deviations used for international evaluations where
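The likelihood of Section 6.9.3 can be maximized numerically; the Python sketch below is one hedged way to do so. The text states only that an ML procedure is used, so the grid search, the function names and the effect values (assumed to be already expressed in phenotypic standard deviation units) are assumptions made for illustration. Because 1 − N_i(c) decreases as c increases, the truncated density rises with c up to the smallest published effect, so the ML estimate of c is that smallest effect, which leaves a one-dimensional search over the variance of QTL effects.

```python
import math


def log_lik(sigma2_a, c, effects, ses):
    """Log-likelihood of published effects under the truncated normal model.

    Each effect has density n_i(a_hat) / [1 - N_i(c)] for a_hat >= c, where
    n_i is the N(0, V_i^2 + sigma2_a) density and N_i its distribution function.
    """
    total = 0.0
    for a_hat, v in zip(effects, ses):
        if a_hat < c:
            return float("-inf")  # an effect below the threshold is impossible
        var = v * v + sigma2_a
        log_density = -0.5 * math.log(2.0 * math.pi * var) - a_hat ** 2 / (2.0 * var)
        # ln[1 - N_i(c)], using 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
        log_tail = math.log(0.5 * math.erfc(c / math.sqrt(2.0 * var)))
        total += log_density - log_tail
    return total


def fit(effects, ses, grid_max=1.0, steps=2000):
    """Grid-search ML estimates of (sigma2_a, c)."""
    c_hat = min(effects)  # the likelihood increases in c up to this value
    grid = [grid_max * i / steps for i in range(steps + 1)]
    sigma2_hat = max(grid, key=lambda s2: log_lik(s2, c_hat, effects, ses))
    return sigma2_hat, c_hat


# Hypothetical published effects and standard errors (phenotypic SD units).
effects = [0.25, 0.30, 0.22, 0.40, 0.28]
ses = [0.08, 0.10, 0.07, 0.12, 0.09]

sigma2_hat, c_hat = fit(effects, ses)
print("sigma2_A estimate:", sigma2_hat, " threshold c estimate:", c_hat)
```

A finer grid or a one-dimensional optimizer could replace the grid search; the grid is used here only to keep the sketch dependency-free.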

this was not reported. Consequently, the consensus estimate of $\sigma_A^2$ will be the proportion of the phenotypic variance explained by the consensus QTL.

6.9.4 Examples of meta-analysis

Meta-analysis of all identified QTL promises to contribute to our understanding of fundamental questions and to expedite crop improvement. Khatkar et al. (2004) reviewed the results of QTL mapping in dairy cattle. Based on the information available in the public domain, they developed an online QTL map for milk production traits. To extract the most information from these published records, a meta-analysis was conducted to obtain consensus on QTL location and allelic substitution effect of these QTL. The meta-analysis indicated a number of consensus regions, the most striking being two distinct regions affecting milk yield on chromosome 6 at 49 cM and 87 cM explaining 4.2 and 3.6% of the genetic variance of milk yield, respectively. Outputs from such analyses highlight the specific areas of the genome where future resources should be directed to refine characterization of the QTL.

To identify the genome regions of bread wheat involved in the control of earliness and its three components (photoperiod sensitivity, vernalization requirement and intrinsic earliness), Hanocq et al. (2007) carried out a QTL meta-analysis to examine the replicability of QTL across 13 independent studies and to propose meta-QTL. QTL were projected on to the reference map using the BIOMERCATOR 2.0 software (Arcade et al., 2004). To assess the reliability of this projection, five quality variables were calculated for each QTL: (i) the percentage of QTL confidence interval (CI) included in the linked region; (ii) Nm (the number of common markers characterizing a QTL CI region, i.e. within and flanking it); (iii) local map density (which is computed as the local average distance of the Nm markers on the projected map); (iv) maximum gap size or the size of the largest interval between adjacent markers when considering the Nm markers; and (v) the weighted standard deviation standardized to 100 cM (which evaluates the heterogeneity in homothetic coefficients for intervals within and flanking a QTL CI region). Chromosomes of groups 2 and 5 had greater control over the incidence of earliness as they carry the known, major genes Ppd and Vrn. The other four chromosome regions played an intermediate role in control of earliness.

In cotton, a total of 432 QTL involving cotton fibre quality, leaf morphology, flower morphology, resistance to bacteria, trichome distribution and density and other traits that were mapped in one diploid and ten tetraploid interspecific cotton populations, was aligned using a reference map which consisted of 3475 loci in total and was depicted in a CMAP resource (Rong et al., 2007). Meta-analysis of polyploid cotton QTL showed unequal contributions of sub-genomes to a complex network of genes and gene clusters implicated in lint fibre development. QTL correspondence across studies was only modest, suggesting that additional QTL for the target traits remain to be discovered. Crosses between closely-related genotypes differing by single-gene mutants yield profoundly different QTL landscapes, suggesting that fibre variation involves a complex network of interacting genes. Meta-analysis linked to synteny-based and expression-based information provides clues about specific genes and families involved in QTL networks.

Munafò and Flint (2004) described how meta-analysis works and considered whether it will solve the problem of underpowered studies or whether it is another affliction visited by statisticians on geneticists. A crucial question for any meta-analysis is the degree of heterogeneity that exists between the individual studies, which is, perhaps not surprisingly, common. Ioannidis et al. (2001) conducted a meta-analysis of 370 studies addressing 36 genetic associations. They found that significant between-study heterogeneity is frequent and that the results of the first study often correlate

only modestly with subsequent research on the same association. It has been argued that meta-analysis is analogous to averaging the characteristics of apples and oranges (Hunt, 1997) and consequently, its outcome is meaningless. Another concern in meta-analysis is publication bias that can exist when non-significant findings remain unpublished, thereby artificially inflating the apparent magnitude of the effect. The concern is not new and was raised in the late 1950s in relation to psychiatric and psychological research in humans (Sterling, 1959).

As indicated by Munafò and Flint (2004), meta-analysis has been successful in revealing unexpected sources of heterogeneity, such as publication bias. If heterogeneity is adequately recognized and taken into account, meta-analysis can confirm the involvement of a genetic variant, but it is not a substitute for an adequately powered primary study.

6.10 In Silico Mapping

As an alternative to designed mapping experiments using an F2 or BC mapping population, in silico mapping was developed to detect genes by simultaneously exploiting existing phenotypic, genotypic and pedigree data available in breeding programmes and genomic databases. Grupe et al. (2001) were the first to use this approach to investigate whether chromosomal regions regulating quantitative traits (QTL intervals) could be computationally predicted with the use of the mSNP database and available phenotypic information obtained from mouse inbred strains. The phenotypic and genotypic information was analysed in silico to identify candidate QTL intervals. Ability of the computational method to correctly predict QTL intervals was evaluated and 19 of 26 experimentally verified QTL intervals for ten phenotypic traits were correctly identified. In silico mapping can eliminate many months to years of laboratory work required to generate, characterize and genotype intercross progeny, reducing the time required for QTL interval identification to milliseconds when a large number of related data become available.

6.10.1 Pros and cons

As massive amounts of phenotypic data for different traits have accumulated in public and private plant breeding programmes in major crop species, in silico mapping in plants has become possible and attractive. Compared with designed mapping experiments, in silico mapping has several advantages (Grupe et al., 2001; Parisseaux and Bernardo, 2004). First, in silico mapping exploits larger populations than designed mapping experiments. In maize, for example, thousands of experimental hybrids are evaluated each year (Smith et al., 1999). In contrast, the small populations (e.g. fewer than 500 progenies) often used in designed mapping experiments lead to a low power for detecting QTL (Melchinger et al., 1998), overestimation of QTL effects (Beavis, 1994) and imprecise estimates of QTL location (van Ooijen, 1992; Visscher et al., 1996). Secondly, phenotypic data used for in silico mapping are obtained through more extensive testing under multiple, diverse environments. An experimental maize hybrid is typically evaluated in 20 environments; those that are eventually released as cultivars are evaluated in up to 1500 location–year combinations (Smith et al., 1999). The use of many environments permits the sampling of a sufficient set of QTL × environment interactions. Thirdly, the hybrids and inbreds tested typically represent a wide sample of the germplasm and genetic backgrounds. In contrast, only a narrow genetic background is exploited in designed mapping experiments that use F2 or BC populations. Fourthly, the data used for in silico mapping are already available without extra cost.

Offsetting these advantages are three main complications to in silico mapping (Parisseaux and Bernardo, 2004). First, the performance data are highly unbalanced:

the same set of hybrids or inbreds are evaluated in a different set of environments, as some hybrids or inbreds that fail to perform well are discarded and those that perform well are subjected to more testing. Secondly, the hybrids or inbreds do not comprise a single homogeneous population. Any in silico mapping procedure would therefore have to account for pedigree relationships and differences in the genetic backgrounds among tested hybrids or inbreds. Thirdly, few crops have enough data available for in silico mapping.

6.10.2 Mixed-model approach

The usefulness of in silico mapping has been explored via a mixed-model approach in maize (Zea mays L.) to determine whether the procedure gave results that were repeatable across populations (Parisseaux and Bernardo, 2004). Multi-location data were obtained from the 1995–2002 hybrid testing programme of Limagrain Genetics in Europe, which included: (i) multi-location phenotypic data for 22,774 single-cross hybrids; (ii) SSR marker data at 96 loci for the 1266 parental inbreds of the single-cross hybrids; and (iii) pedigree records for the 1266 parental inbreds which were classified into nine different heterotic groups. Using a mixed-model approach, the general combining ability effect associated with marker alleles in each heterotic pattern was estimated. The numbers of marker loci with significant effects, 37 for plant height, 24 for smut (Ustilago maydis (DC.) Cda.) resistance and 44 for grain moisture, were consistent with previous results from designed mapping experiments. Each trait had many loci with small effects and few loci with large effects. For smut resistance, a marker in bin 8.05 on chromosome 8 had a significant effect in seven (out of a maximum of 18) instances. For this major QTL, the maximum effect of an allele substitution ranged from 5.4% to 41.9%, with an average of 22.0%. It is concluded that in silico mapping via a mixed-model approach can detect associations that are repeatable across different populations.

Because of differences in the germplasm used, the numbers of QTL identified through in silico mapping were not directly comparable with those previously detected through designed mapping experiments. On the one hand, the wide range of germplasm sampled with in silico mapping enhances the detection of many QTL. On the other hand, mapping populations are often developed by crossing two parents that are widely divergent for a trait, e.g. susceptible parent and resistant parent for smut. A diverse mapping population also enhances the detection of many QTL. In the largest QTL mapping study published in maize (976 families from an F2 population, genotyped with 172 markers and evaluated in 19 environments), Openshaw and Frascaroli (1997) detected 36 significant markers for plant height and 32 for grain moisture (data for smut resistance were absent). This result for plant height (36 QTL) was consistent with the number of significant markers detected for plant height (37) via in silico mapping. The number of significant markers (44) for grain moisture was larger than that detected by Openshaw and Frascaroli (1997), perhaps because of a wider range of maturities sampled in the in silico mapping germplasm than in the single F2 population used by Openshaw and Frascaroli (1997). For smut resistance, Lübberstedt et al. (1998a) detected 19 significant markers across four populations, whereas Kerns et al. (1999) detected 22 significant markers in one population. These previous results were consistent with the number of significant markers (24) detected for smut resistance in in silico mapping.

6.10.3 Statistical power

It has been shown that the heritability and genetic architecture (e.g. number of QTL and distribution of effects) of the trait and resources available for QTL mapping (e.g. sample size and number of markers) affect

the statistical power of designed QTL mapping experiments as discussed in this chapter. These genetic and non-genetic factors are also expected to affect the power of in silico mapping via a mixed-model approach.

The statistical power of the in silico mapping method was evaluated via a mixed-model approach in hybrid crops (Yu et al., 2005). Simulation mimicked a two-stage breeding process in maize, with inbred development and hybrid testing. First, two opposite heterotic groups were considered, each having a total of n1 = n2 = 112 inbreds developed from different ancestral inbreds. Secondly, it was assumed that n = 600 or 2400 hybrids, among all potential single-cross hybrids (112 × 112 = 12,544) between the two heterotic groups, had data available from multi-location performance trials. The number of inbreds in each heterotic group and the number of hybrids with available phenotypic data were chosen to agree with the empirical data of Parisseaux and Bernardo (2004).

A total of 64 simulation experiments was conducted. These 64 experiments had contrasting values of six different parameters: level of initial LD (t = 10 or 20 generations of random mating), significance level (α = 0.01 or 0.0001), number of QTL (l = 20 or 80), heritability (H = 0.40 or 0.70), number of markers (m = 200 or 400) and sample size (n = 600 or 2400 hybrids). For each experiment, 50 runs were conducted with different locations of QTL and markers on the genetic map and different inbreds and hybrids.

It was found that the average power to detect QTL ranged from 0.11 to 0.59 for a significance level of α = 0.01 and from 0.01 to 0.47 for α = 0.0001. The false discovery rate ranged from 0.22 to 0.74 for α = 0.01 and from 0.05 to 0.46 for α = 0.0001. As with designed mapping experiments, a large sample size, high marker density, high heritability and small number of QTL led to the highest power for in silico mapping via a mixed-model approach. The power to detect QTL with large effects was greater than the power to detect QTL with small effects. It is concluded that gene discovery in hybrid crops can be initiated by in silico mapping. Finding an acceptable compromise, however, between the power to detect QTL and the proportion of false QTL would be necessary.

In plant breeding programmes, the phenotypic data are highly unbalanced and the inbreds and hybrids have a pedigree structure. In silico mapping via a mixed-model approach accommodates unbalanced data, pedigree relationships and different heterotic groups of parental inbreds by fitting relevant terms in the mixed model. Furthermore, the relative effects of the QTL are measured by the regression coefficients of the significant markers and the approximate positions of the QTL are indicated by the location of the significant markers.

As with other QTL mapping methods, the results from in silico mapping should be followed by fine mapping at the target regions, sequence analysis and functional tests of gene effects (Glazier et al., 2002). In hybrid crops for which multiple heterotic groups exist, in silico mapping via a mixed-model approach can be applied to different heterotic patterns. Subsequently, the markers or the genomic regions that show a repeatable association with the trait of interest across different populations can be considered as the prime targets for further analysis (Parisseaux and Bernardo, 2004). Cross validation by conducting in silico mapping in multiple heterotic patterns would result in better control of the overall false discovery rate and provide increased confidence for conducting further investigation in putative QTL regions.

6.11 Sample Size, Power and Thresholds

6.11.1 Power and sample size

There are two types of errors that can be made when carrying out a statistical test. A false positive (a Type I error) occurs when the null hypothesis is rejected when in fact it is correct. We control for this by setting
240 Chapter 6

a low significance level α for a test (the probability of a false positive). The other source of error is a false negative (a Type II error), i.e. failing to reject the null hypothesis when in fact it is false. The power of a test is defined to be the probability that the null hypothesis is rejected when it is indeed false. Hence if β is the probability of a false negative, the power is 1 − β. The discussion in this section is based on Broman et al. (2003a) and Zeng's presentation at the Plant and Animal Genome XI meeting, 2003 (http://statgen.ncsu.edu/zeng/QTLPower-Presentation.pdf).

The simplest case, taken as a point of departure, is one marker and one QTL in an F2 population. Assume that the QTL genotypic effects for Q1Q1, Q1Q2 and Q2Q2 are a, d and −a, respectively, and that r is the recombination frequency between the marker and the QTL.

The marker effects can be tested, referring to Eqn 6.1, by

\[ t_1 = \frac{\mu_{M_1M_1} - \mu_{M_2M_2}}{\sqrt{\dfrac{\sigma^2}{n/4} + \dfrac{\sigma^2}{n/4}}} = \frac{(1 - 2r)\,2a}{\sqrt{8\sigma^2/n}} \tag{6.16} \]

and referring to Eqn 6.2, by

\[ t_2 = \frac{\mu_{M_1M_2} - (\mu_{M_1M_1} + \mu_{M_2M_2})/2}{\sqrt{\dfrac{\sigma^2}{n/2} + \dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{n}}} = \frac{(1 - 2r)^2 d}{\sqrt{4\sigma^2/n}} \tag{6.17} \]

Note that μ_M1M2 does not contribute to the test in Eqn 6.16; adding μ_M1M2 in Eqn 6.16 does not increase the efficiency of the test unless |d| ≥ a/2 (but see below for the calculation of sample size required with dominance).

When n is large, the observed difference t is approximately normally distributed and the power 1 − β to detect the difference (for one-tailed test) is

\[ 1 - \beta = \Pr[\,t > z_\alpha \text{ with } t \sim N(\tau, 1)\,] = 1 - \Phi(z_\alpha - \tau) \]

where z_α is the z critical value of the test with (1 − α) confidence under the null hypothesis τ = 0 and Φ(x) is the standard normal cumulative distribution function. For given α and β, the sample size n required for the test is

\[ n_1 = 8\left[\frac{z_\alpha + z_\beta}{(1 - 2r)\,2a/\sigma}\right]^2 \tag{6.18} \]

for additive effect, and

\[ n_2 = 4\left[\frac{z_\alpha + z_\beta}{(1 - 2r)\,d/\sigma}\right]^2 \tag{6.19} \]

for dominance effect.

Factors determining the required sample size

1. If the test is two-tailed (the usual case), z_α should be replaced by z_{α/2}.
2. For interval mapping the required sample size can be reduced by a factor of (1 − r*) where r* is the recombination frequency between an interval of two marker loci. Example: if r* is about 0.23 for a 30-cM interval, then (1 − 2r)² in Eqns 6.18 and 6.19 can be replaced by (1 − r*) = 0.77 to account for the worst case scenario where a QTL is located in the middle of an interval (r ≈ r*/2).
3. In the test, if many unlinked markers are used for controlling genetic background, most of genetic variance in the population can be removed from the residual variance (the idea of CIM) and σ_r² may be roughly approximated by the environment variance σ_e². The overall heritability of the trait matters enormously.
4. For a systematic search for QTL in a genome, the Type I error α for each test should be substantially lower to account for increased false positive probability in an overall search. In most cases, the use of α* = 0.001 (a very conservative level) for each individual test should be sufficient to ensure an overall false positive rate of less than 5%. The relevant sample size can be calculated as

\[ n_1 \geq \frac{8}{0.77}\left[\frac{z_{\alpha^*} + z_\beta}{2a/\sigma_e}\right]^2 \]

for additive effect.
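As a rough illustration of Eqn 6.18 and of the power expression 1 − β = 1 − Φ(z_α − τ), the following Python sketch computes the required sample size for an additive effect and the power obtained at a given sample size. The function names and the argument `std_effect` (standing for 2a/σ) are illustrative choices rather than part of the original presentation, and the one-tailed normal approximation described above is assumed throughout.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def z_upper(p):
    """Upper-tail critical value z_p, i.e. Pr[Z > z_p] = p."""
    return STD_NORMAL.inv_cdf(1 - p)


def n_additive(alpha, beta, std_effect, r=0.0):
    """Sample size from Eqn 6.18 for an additive effect (one-tailed test).

    std_effect is 2a/sigma (assumed name); r is the marker-QTL
    recombination frequency.
    """
    z_sum = z_upper(alpha) + z_upper(beta)
    return 8 * (z_sum / ((1 - 2 * r) * std_effect)) ** 2


def power_additive(n, alpha, std_effect, r=0.0):
    """Power 1 - beta = 1 - Phi(z_alpha - tau), with tau taken from Eqn 6.16."""
    tau = (1 - 2 * r) * std_effect / (8 / n) ** 0.5
    return 1 - STD_NORMAL.cdf(z_upper(alpha) - tau)


if __name__ == "__main__":
    n = n_additive(alpha=0.05, beta=0.10, std_effect=0.5)
    print(f"required n = {n:.1f}")
    print(f"power at that n = {power_additive(n, 0.05, 0.5):.3f}")
```

For a two-tailed test, α/2 would be passed in place of α, in line with the first factor listed above.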

Now it remains to determine the likely magnitudes of 2a/σ_e. Suppose that a QTL contributes to a proportion f of the genetic variance σ_g² in an F2 population. Assuming that no other genes are linked to the QTL and ignoring the dominance (d = 0),

\[ \frac{(2a)^2}{8\sigma_e^2} = f\,\sigma_g^2/\sigma_e^2 \]

σ_g²/σ_e² is an unknown quantity. For example, assuming h²_F2 = σ_g²/(σ_g² + σ_e²) = 0.6 means

\[ \frac{\sigma_g^2}{\sigma_e^2} = 1.5 \quad \text{and} \quad \frac{(2a)^2}{\sigma_e^2} = 12f \]

Given that α* = 0.001 and β = 0.1 (z_0.001 + z_0.1 = 3.09 + 1.28 = 4.37), the required sample sizes for detecting leading QTL for f = 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4 and 0.5 are n = 1653, 826, 330, 165, 82, 55, 41 and 33.

Effects of dominance

Depending on the degree of the dominance effect, the sample size required for detecting a dominance effect may need to be substantially increased. Dominance does not, however, affect the calculation of the power detecting QTL. For example, suppose d = a. In this case we may use

\[ t_3 = \frac{\mu_{M_1\_} - \mu_{M_2M_2}}{\sqrt{\dfrac{\sigma^2}{3n/4} + \dfrac{\sigma^2}{n/4}}} = \frac{(1 - 2r)\,2a}{\sqrt{16\sigma^2/(3n)}} \]

But because of dominance

\[ \frac{3(2a)^2}{16} = f\,\sigma_g^2 \]

Thus as long as f, the proportion of the genetic variation attributed to the QTL, is fixed, the required sample size for the test is unchanged.

Effect of linkage: multiple linked QTL

There are two issues that need to be considered:

1. Detection of QTL on the chromosome: for two linked QTL, if the model is misidentified (two QTL analysed as one), the power to identify the one QTL is based on the joint effect of QTL (a weighted sum). If the two QTL are in coupling linkage, the joint effect is aggregated and thus power is increased. If the two QTL are in repulsion linkage, the joint effect is reduced. Thus power is decreased and it can be very low. However, if the model can be identified correctly (searching for two QTL or conditional searching), the issue is about separating linked QTL and the power to identify repulsion-linked QTL is not necessarily very low.
2. Separating linked QTL (identifying both QTL): the required sample size is increased by a factor (Zeng, 1993)

\[ \frac{\sigma_i^2}{\sigma_{i \cdot j}^2} = \frac{1/4}{r(1 - r)} \]

where σ_i² is the variance of marker i and σ²_{i·j} is the variance of marker i conditional on marker j.

The values for these factors corresponding to the recombinant frequency, r, between the two QTL are shown in the table below.

r              0.5    0.4    0.3    0.2    0.15   0.1
1/[4r(1 − r)]  1      1.04   1.19   1.56   1.96   2.78

r              0.09   0.08   0.07   0.06   0.05   0.04   0.03   0.02    0.01
1/[4r(1 − r)]  3.05   3.40   3.84   4.43   5.26   6.51   8.59   12.76   25.25
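The two sets of numbers in this section can be checked with a few lines of Python. The sketch below assumes the values used in the text (z_α* + z_β rounded to 4.37, the interval-mapping factor 0.77, and (2a)²/σ_e² = 12f, which follows from a heritability of 0.6); the function names are illustrative only. The sample sizes quoted in the text correspond to truncating the computed value, so truncation is used here as well, although rounding up would be the more cautious choice in practice.

```python
def sample_size_for_f(f, z_sum=4.37, interval_factor=0.77, var_ratio=12.0):
    """n = 8 * z_sum^2 / (interval_factor * var_ratio * f), truncated.

    var_ratio * f stands for (2a)^2 / sigma_e^2; var_ratio = 12 assumes
    a heritability of 0.6, as in the text.
    """
    return int(8 * z_sum ** 2 / (interval_factor * var_ratio * f))


def separation_factor(r):
    """Sample-size inflation 1 / (4 r (1 - r)) for separating two linked QTL."""
    return 1.0 / (4.0 * r * (1.0 - r))


if __name__ == "__main__":
    for f in (0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
        print(f"f = {f:<5} n = {sample_size_for_f(f)}")
    for r in (0.5, 0.2, 0.1, 0.05, 0.01):
        print(f"r = {r:<5} factor = {separation_factor(r):.2f}")
```

The rapid growth of the factor as r falls below about 0.1 is the reason why separating tightly linked QTL demands much larger populations than detecting their joint effect.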

QTL detection and power calculation depend on QTL mapping analysis procedure: CIM is more powerful than simple interval mapping; MIM is more powerful than CIM.

The power of the test can be increased by combining information from multiple related traits, multiple crosses and multiple environments. The genetic structure becomes more complex in this case and so does the statistical analysis. But, there are definite advantages in the joint multiple trait analysis for QTL identification (Jiang and Zeng, 1995) and of course for hypothesis testing (pleiotropy) and parameter estimation.

There are many factors that determine how large a sample size is required for a specific QTL mapping experiment. The sample size depends on the heritability of the trait of interest (any knowledge or guess), how large an effect of a QTL (as a minimum) is expected to be detected (for example, detect a QTL that explains 5% variation), what is the likely complexity of the genetic architecture of QTL and how many QTL, distribution of effects, epistasis, etc.

6.11.2 Cross validation and sample size

Cross validation (CV) is a re-sampling technique that samples from a genetic cross (e.g. F2 intercross) and divides a large-size sample into several subsamples (e.g. k = 5). As an example, cross validation that was used for a sample size study (Melchinger et al., 2004) will be discussed in this section.

Melchinger et al. (2000) compiled a literature survey based on 45 published QTL studies in crops encompassing 34 complex traits. Sample sizes ranged between 60 and 380 with a median of 150. In most studies only a small number of QTL (median of six) were detected which generally explained a surprisingly large proportion (50% and more) of the genetic variance. Although these findings seem to contradict Fisher's (1918) infinitesimal model upon which quantitative genetics is based, Beavis (1998) conjectured from simulation results that different conclusions might be drawn if larger experimental populations were evaluated. Experimental QTL studies in plant species have been inadequate for drawing inferences about numbers, magnitudes and distribution of QTL for most quantitative traits. Unless large numbers of progeny are evaluated for QTL, MAS will have minimal impact on plant breeding (Gimelfarb and Lande, 1994a) and new breeding strategies based on evaluation of large numbers of progeny will be necessary to realize the potential of MAS.

To test this hypothesis, the largest QTL experiment available in plants conducted by Pioneer Hi-Bred in maize was analysed in detail by Schön et al. (2004). This study consisted of testcrosses of 976 F4:5 lines derived from the cross of two elite lines. These materials together with testcrosses of their parents were evaluated in 16 environments. The F4:5 lines were assayed with 172 RFLP markers covering the entire genome. With the entire data set (comprising N = 976 genotypes and E = 16 environments) the number of detected QTL confirmed the infinitesimal model of quantitative genetics (e.g. 30 QTL detected with LOD ≥ 2.5 for plant height, explaining 61% of the genetic variance).

For studying the effects of sample size as well as genotypic and environmental sampling on the outcome of QTL analyses, the entire data set was partitioned into smaller data sets with N = 488, 244, 122 and E = 16, 4, 2. After randomization of genotypes and environments, the partitioning of the experimental data reference population, PED(N, E), was repeated to obtain a total of 120 different small data sets for given values of N and E. Within each PED(N, E), heritabilities were estimated and QTL analyses were performed for each data set with LOD = 2.50 and 3.21. Fivefold CV accounting for genotypic sampling was applied by subdividing each small data set into five genotypic samples. Four genotypic samples were used as the estimation set for localization of QTL and estimation of their effects. The fifth sample was used as a test set to obtain asymptotically unbiased estimates of the proportion of the genotypic variance explained by QTL in each test set. For each small data set five

different estimation sets and corresponding test sets are possible. By randomization of the genotypes assigned to the five subsamples 120 estimation sets and test sets were generated and averaged for estimation of parameters. In the estimation set step, some (four) subsamples are used to predict QTL loci l and effects q. Heritability h² can be also predicted by

\[ h^2 = \frac{\sigma_g^2}{\dfrac{\sigma^2}{re} + \dfrac{\sigma_{ge}^2}{e} + \sigma_g^2} \]

In the test set step, other (one) subsamples can be tested by using loci from estimation sets to predict effects q and the proportion of genotypic variance explained.

Reducing N from 976 to 244 or even to 122 decreased the average number of detected QTL (nQTL) by more than half irrespective of E (Fig. 6.5). By comparison, reducing E from 16 to 4 or 2 had a much smaller effect on nQTL. In all instances, a large variation in nQTL was observed among different data sets especially for smaller values of N and E. Although nQTL decreased with smaller data sets, the estimates of genetic variance explained remained almost the same due to a tremendous increase in the bias. This illustrates that QTL effects obtained from smaller sample sizes are usually highly inflated, leading to an overly optimistic assessment of the prospects of MAS. Moreover, inferences about the genetic architecture (number of QTL and their effects) of complex traits cannot be achieved reliably with smaller sample sizes.

[Figure 6.5: two panels, LOD = 2.50 and LOD = 3.21; vertical axis: number of QTL detected; horizontal axis: N (976, 488, 244, 122); separate series for E = 16, 4 and 2.]

Fig. 6.5. Average number of QTL (nQTL) detected in the estimation set of 120 different data sets and partitionings PED(N, E) using standard cross validation and LOD = 2.50 and 3.21 for plant height. From Melchinger et al. (2004) with kind permission of Springer Science and Business Media.

6.11.3 Confidence interval of QTL location

In genome scans to detect QTL or targeted scans to attempt to replicate previously reported QTL, estimation of the position of a QTL is usually achieved with low precision, even in controlled crosses from inbred lines, but the accuracy of the position is important, either for subsequent introgression or fine mapping (e.g. Lynch and Walsh, 1998). Darvasi and Soller (1997) presented empirical predictions of the confidence interval of QTL location for dense marker maps in experimental crosses. They showed from simulation results for BC and F2 populations from inbred lines that the 95% confidence interval was a simple function of sample size and the effect of the QTL:

\[ CI_{95}^{BC} = \frac{3000}{n(a + d)^2} \quad \text{for BC} \]

\[ CI_{95}^{F_2} = \frac{1500}{n a^2} \quad \text{for } F_2 \]

where a and d are the additive and dominance effects of the QTL in residual standard deviation units, respectively, and n the sample size. The results from Darvasi and Soller (1997) can be used in combination with power studies before an experiment to assess the expected confidence region of a QTL given its effect and the sample size of the experiment. After an experiment is conducted and a QTL is detected, either the LOD drop-off method (Fig. 6.2; Lander and Botstein, 1989) or a bootstrap procedure (Visscher et al., 1996; Talbot et al., 1999) can be used to estimate the confidence interval of the QTL.
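The Darvasi and Soller (1997) expressions lend themselves to a quick planning calculation. The Python sketch below is only an illustration: the function names are hypothetical, and the F2 expression is read here as 1500/(n a²), by analogy with the backcross form 3000/[n(a + d)²], with effects expressed in residual standard deviation units and the interval in centimorgans.

```python
def ci95_backcross(n, a, d=0.0):
    """Predicted 95% confidence interval (cM) for a backcross population."""
    return 3000.0 / (n * (a + d) ** 2)


def ci95_f2(n, a):
    """Predicted 95% confidence interval (cM) for an F2 population.

    Assumes the F2 expression is 1500 / (n * a^2).
    """
    return 1500.0 / (n * a ** 2)


def n_for_target_ci_f2(target_cm, a):
    """Sample size at which the predicted F2 interval equals target_cm."""
    return 1500.0 / (target_cm * a ** 2)


if __name__ == "__main__":
    # A QTL with additive effect of half a residual standard deviation
    print(ci95_f2(n=200, a=0.5))                 # predicted interval in cM
    print(ci95_backcross(n=200, a=0.5))          # same effect in a backcross
    print(n_for_target_ci_f2(target_cm=10, a=0.5))
```

Because the interval shrinks only in proportion to 1/n, halving the predicted interval requires doubling the population, which is one reason fine mapping usually follows the initial scan.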

Visscher and Goddard (2004) derived by theory simple equations that can be used to predict any confidence interval and give expressions for the 95% interval. This confidence interval in centimorgans is

\[ CI_{(1-\beta)} \approx \frac{(200x)\,X_{1-\beta}}{n d^2} \]

with x = 2 for F2 and x = 4 for BC and X_{1−β} the threshold of a central chi-squared distribution with 1 degree of freedom corresponding to a cumulative density of (1 − β). For example, for BC and F2 populations, the 95% confidence intervals are predicted as

\[ CI_{95}^{BC} \approx \frac{(200)(4)(3.84)}{n d^2} = \frac{3073}{n d^2} \]

and

\[ CI_{95}^{F_2} \approx \frac{(200)(2)(3.84)}{n d^2} = \frac{1537}{n d^2} \]

The prediction of the CI as a function of the proportion of the variance explained by the QTL (q²) is

\[ CI \approx \frac{200\,X_{1-\beta}\,(1 - q^2)}{n q^2} \]

For example, the 95% CI for a QTL that explains 35% of the variation in either a BC or F2 population is

\[ CI_{95} \approx \frac{(200)(3.84)(1 - 0.35)}{n q^2} = \frac{499}{n q^2} \]

A general form of the prediction of the CI for dense marker maps that also applies to other population structures is

\[ CI_{(1-\beta)} \approx \frac{200\,X_{1-\beta}}{\lambda} \]

where λ is the non-centrality parameter of a chi-squared test for the presence of a QTL at the true QTL location.

6.11.4 QTL thresholds

QTL thresholds for interval mapping

Interval mapping scans across loci at location l in a genome to find the evidence for QTL with large LOD(l). The relationship between LOD and LR is LOD(l) = 0.217 LR. Setting a genome-wide LOD threshold can protect against one or more false positives and roughly adjust the size of each individual test. LOD distribution under the null hypothesis at any particular location l depends on design (1 degree of freedom for BC, 2 degrees of freedom for F2) with

\[ \frac{LOD(l)}{0.217} = LR \sim \chi_1^2 \text{ or } \chi_2^2 \]

Some point-wise P-values for different levels of LR and LOD are provided in Table 6.7.

Table 6.7. LR, LOD and point-wise P-values.

                      P-value (a)
LR        LOD     1 d.f.      2 d.f.
10        1       0.0319      0.1
31.6      1.5     0.0086      0.0316
100       2       0.0024      0.01
1,000     3       0.0002      0.001
10,000    4       < 0.0001    0.0001

(a) d.f., degree of freedom. In this table LR denotes the likelihood ratio itself (10 raised to the power LOD).

Genome-wide threshold

Assume a dense marker map with markers everywhere. LR test statistics are correlated and correlation drops off quickly with distance but there is no correlation for unlinked markers.

Based on Lander and Botstein (1989), a genome-wide threshold value, t, can be determined using the Ornstein–Uhlenbeck process as follows

\[ \Pr\left(\max_{l \,\in\, \text{genome}} LR(l) > t\right) = \alpha \approx (C + 2Gt)\,\alpha_t \]

with Pr(χ_1² > t) = α_t, where C = number of chromosomes; G = length of genome in Morgans; t = genome-wide threshold value; and α_t = corresponding point-wise significance level.

Figure 6.6 shows LOD thresholds for different point-wise and genome-wise P-values in BC and F2 populations (Broman et al., 2003a).

Permutation and thresholds

It is possible to derive the distribution of any test statistic under an appropriate null

hypothesis by shuffling the quantitative trait values among the individuals in the data set. If there is a QTL effect at specific location(s) in the genome, there will be an association between the trait values and the point of analysis on the genetic map. If there is no QTL present in the genome, or it is unlinked to the point of analysis, there is no marker–trait association (i.e. exactly the situation described under the null hypothesis). Permutation tests resolve the problem of finding a significance threshold by simulating a large number of permutations (say 1000) of the observed data set (marker observations are shuffled with respect to traits) so that the distribution of the test statistic (LOD score) can be estimated under the null hypothesis of no relationship between traits and markers. This distribution determines how large the LOD values obtained from a particular data set by chance can be.

[Figure 6.6: for each population (A, BC; B, F2 intercross), P-value plotted against LOD threshold (0 to 6), on a linear P-value scale (left panel) and a logarithmic P-value scale (right panel).]

Fig. 6.6. Point-wise and genome-wide P-values and LOD threshold for BC (A) and F2 intercross (B).

Shuffling trait values among individuals in the data set represents the situation under the null hypothesis (Churchill and Doerge, 1994), i.e. randomness. Shuffling the trait values does not change the summary statistics such as the number of individuals, mean and variance of the individuals. Estimating significance threshold values by permutation includes four steps (Doerge and Churchill, 1996):

1. Hold the genetic map fixed (i.e. keep the marker information from a sampled individual intact. If the individual has m and y, the elements of the vector m should be kept together and the trait values y should be shuffled over these).
2. Shuffle the trait values.
3. Analyse the shuffled data set by applying a t-test, likelihood ratio test or calculation of LOD score.
4. Store the test statistic from each analysis point of step 3 in an analysis matrix.

Repeat steps 2–4 N times.

Comparison-wise (per marker), chromosome-wise (chromosome-specific) and experiment-wise (experiment-specific) threshold values can be created. In the computer software developed for the

CIM method, the permutation method is incorporated into the model so that empirical thresholds for declaring significant QTL can be calculated.
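The permutation procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any QTL mapping package: the test statistic is a single-marker regression LOD, computed as (n/2) log10[1/(1 − r²)] from the marker–trait correlation, the experiment-wise threshold is taken as the (1 − α) quantile of the maximum LOD over all markers, and all function and variable names are hypothetical.

```python
import math
import random


def marker_lod(genotypes, trait):
    """Single-marker regression LOD: (n/2) * log10(1 / (1 - r^2))."""
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in trait)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, hence no evidence of association
    r2 = min(sxy * sxy / (sxx * syy), 1 - 1e-12)
    return -(n / 2) * math.log10(1 - r2)


def max_lod(markers, trait):
    """Largest LOD over all markers (one genotype list per marker)."""
    return max(marker_lod(m, trait) for m in markers)


def permutation_threshold(markers, trait, n_perm=1000, alpha=0.05, seed=0):
    """Experiment-wise LOD threshold estimated by permutation."""
    rng = random.Random(seed)
    shuffled = list(trait)            # step 1: marker data stay untouched
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)         # step 2: shuffle the trait values
        null_max.append(max_lod(markers, shuffled))  # steps 3 and 4
    null_max.sort()
    index = max(0, min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1))
    return null_max[index]
```

An observed LOD profile would then be compared with the returned value. Storing the per-marker statistics instead of only the maximum would give comparison-wise thresholds in the same way.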
6.11.5 False discovery rate

Declaring the presence of a QTL always carries some risk that such a declaration is false. The risk can be judged by the false discovery rate (FDR), which represents the probability that a QTL is false, given that a QTL has been declared. A high FDR can result in false leads and wasted resources in characterizing and exploiting genes for quantitative traits, as well as confuse the QTL literature and databases. Knowledge of the magnitude of the FDR would be helpful for designing QTL mapping experiments and for properly interpreting their results.

Take an example as given by Bernardo (2004). Suppose that out of 64 independent markers, 60 are unlinked to QTL in a mapping population. Of these 60 markers, three are incorrectly declared to be linked to a QTL (i.e. false positives, Fig. 6.7) and 57 are correctly declared to be unlinked to QTL (i.e. true negatives, Fig. 6.7). The comparison-wise significance level or Type I error rate, denoted by α_C, is equal to (number of false positives)/(number of false positives + number of true negatives). In the example in Fig. 6.7, α_C is equal to 3/(3 + 57) = 0.05. Studies to map QTL have differed in the significance levels used. Some investigators have used stringent significance levels of α_C ≤ 0.0001, as suggested by a permutation test to control the experiment-wise error rate (Churchill and Doerge, 1994), whereas other investigators (Openshaw and Frascaroli, 1997) have used a relaxed significance level of α_C = 0.1.

                       Not linked to QTL      Linked to QTL
Declare QTL            False positives (3)    True positives (1)
Fail to declare QTL    True negatives (57)    False negatives (3)

Fig. 6.7. Outcomes of a significance test for detecting QTL. From Bernardo (2004) with kind permission of Springer Science and Business Media.

Regardless of the significance level used, a misconception is that α_C is equal to the proportion of false positives among all declared marker–QTL linkages. In other words, if 20 QTL have been declared at a significance level of α_C = 0.05, a misconception is that only 20 × 0.05 = 1 of the 20 declared QTL is false. The false discovery rate is defined as the probability that a QTL is false, given that a QTL has been declared (Benjamini and Hochberg, 1995; Fernando et al., 2004); it is equal to (number of false positives)/(number of false positives + number of true positives). In the example in Fig. 6.7, the FDR is equal to 3/(3 + 1) = 0.75 rather than 0.05. The FDR can therefore be much greater than α_C (Fernando, 2002).

The simulation study by Bernardo (2004) was undertaken to determine the FDR in an F2 mapping population, given different numbers of QTL, population sizes and trait heritabilities. Markers linked to QTL were detected by multiple regression of phenotype on marker genotype. Phenotypic selection and marker-based recurrent selection were compared. The FDR increased as α_C increased. Notably, the FDR was often 10–30 times higher than the α_C level used. Regardless of the number of QTL, heritability or size of the genome, the FDR was ≤ 0.01 when α_C was 0.0001. The FDR increased to 0.82 when α_C was 0.05, heritability was low and only one QTL controlled the trait. An α_C of 0.05 led to a low FDR when many QTL (30 or 100) controlled the trait, but this lower FDR was accompanied by a diminished power to detect QTL. Larger mapping populations led to both a lower FDR and increased power. Relaxed significance levels of α_C = 0.1 or

0.2 led to the largest responses to marker-based recurrent selection, despite the high FDR. To prevent false QTL from confusing the literature and databases, a detected QTL should, in general, be reported as a QTL only if it was identified at a stringent significance level, e.g. α_C ≤ 0.0001. In conclusion, the question of 'what proportion of declared QTL in plants are false?' cannot be answered definitely because QTL studies have used different significance levels, traits differ in the number of underlying QTL and experiments have used different types of mapping populations (e.g. BC instead of F2 populations).

As a potential alternative to the FDR, Chen and Storey (2006) proposed a generalized version of genome-wide error rate (GWER). Rather than guarding against any single false positive linkage, the generalized GWER allows the researcher to guard against more than k false positive linkages, where k is chosen by the user. For example, if we set k = 1, then the goal for GWER_k is to prevent more than one false positive linkage from occurring. The user can apply this significance criterion at the appropriate value of k or at several values of k. GWER_k allows the user to provide a more liberal balance between true positives and false positives at no additional cost in computation or assumptions.

6.12 Summary and Prospects

QTL mapping has evolved from single marker-based, two flanking marker-based, to multiple marker-based approaches and finally to all-marker-based whole genome approaches. It started by using simple and well-characterized F2 or RIL populations derived from biparental lines and now extends to any population including those derived from multiple parents or randomly selected materials. As an increasing number of populations, maps and QTL information have been accumulating worldwide, it has become increasingly important to use all available data for meta-analysis, pooled analysis and in silico mapping.

QTL mapping has also been moving from single trait-based analysis to integrated analysis of multiple traits or even thousands of expression traits simultaneously. Methods for some specific traits including triploid endosperm and dynamic traits across different developmental stages and methods for complicated genetic effects including epistasis and genotype-by-environment interaction (Chapter 10) have also been developed.

Statistical methods have been developed for almost all types of complicated situation that would be encountered in the genetic dissection of complex traits. However, most methods still remain at the stage of theory, published by statisticians who keep moving forward to new publication opportunities or optimizing their methodologies with endless efforts, leaving few feasible options for geneticists. It should be noted that the best method means nothing but statisticians' games unless it can be translated into user-friendly software.

Practically, there might be no general method that can meet all requirements. This is because, on the one hand, the simplest methods (such as phenotypic comparison of alternative marker genotypes) are the best for simple traits; and on the other hand, modelling or simulation, no matter how many parameters can be incorporated into the model, might be too complicated for complex traits that interact with various environments. Ultimately, dissection of complex traits will rely on continuous research effort in applied QTL mapping and integrated utilization of the various information and materials that have been accumulating, including genetic and breeding materials (populations, structured materials), molecular markers (sequences and genes) and various phenotypic data collected across environments (years, seasons and locations).

Commercial breeding programmes grow thousands of progenies per year derived from multiple, related and unrelated crosses and evaluate them for many agronomically important traits in diverse environments. With the use of high-throughput facilities (DNA sequencers, DNA microarrays, protein

chips, etc.), these materials can be assayed in parallel with the use of genomic tools. Thus, the current limitations of classical QTL mapping studies may soon be overcome by pedigree-based and/or haplotype-based QTL mapping approaches (Jannink et al., 2001; Jansen et al., 2003). The main ideas include: (i) the exploitation of pedigree and phenotypic data, routinely collected in applied plant breeding programmes, for QTL mapping; and (ii) the sampling of full QTL variation present in a wide range of germplasm, which allows the breeders to search for the best alleles (allele mining) present in elite materials as well as in genetic resources. On the other hand, molecular dissection of complex traits will depend on the utilization of both linkage- and LD-based methods (Manenti et al., 2009; Myles et al., 2009).
7
Molecular Dissection of Complex Traits: Practice

In recent years, quantitative trait loci (QTL) research has been attracting many scientists, resulting in numerous publications every year. From the viewpoint of practice, however, one should consider the whole picture of QTL from single QTL to multiple QTL, single traits to trait complexes, homogeneous to heterogeneous genetic backgrounds and static mapping to dynamic mapping. Advances in molecular dissection of complex traits could answer the following questions: How many genes are involved in genetic control of each quantitative trait in a segregating population? Can we separate closely linked QTL into single units? How can we compare QTL across different genetic backgrounds and developmental stages? How can we handle multiple traits and expression QTL? Issues related to these questions will be addressed in this chapter. Some theoretical considerations will be also discussed as complementary to Chapter 6. For general resources, readers may refer to Xu (1997), Liu (1998), Lynch and Walsh (1998), Paterson (1998), Flint and Mott (2001), Xu, Y. (2002), Collard et al. (2005), Gibson and Weir (2005) and Wu and Lin (2006).

QTL mapping practice will usually include creating, genotyping and phenotyping mapping populations, generating genetic linkage maps and establishing marker–trait association. QTL mapping requires three major data sets (molecular marker data, phenotypic data and a linkage map), plus genetic mapping software that provides an appropriate mapping procedure with user-friendly output. The linkage map can be specific to the mapping population or inferred from physical positions available for the molecular markers.

7.1 QTL Separating

Most quantitative traits can be genetically associated with molecular markers that are located in different chromosomal regions. These regions represent either separate single QTL or multiple closely linked QTL. The number and distribution of multiple QTL on chromosomes determine their manipulability in genetics and breeding. Generally, multiple QTL affecting a specific trait have four possible distributions on chromosomes (Xu, 1997; Fig. 7.1): (i) independent QTL: genes are independently distributed on each chromosome; (ii) loosely linked QTL: genes are located on the same chromosome but separated by large distances so that they recombine with high frequency and can be easily separated; (iii) clustered QTL: genes are closely linked or clustered in a specific chromosomal region so that

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 249


250 Chapter 7

[Figure 7.1: linkage maps of Chromosomes I, II and III for Traits A, B and C, illustrating independent genes, loosely linked genes, and closely linked or clustered genes; likelihood maps labelled "Regional mapping" and "Minor gene mapping" accompany the linkage maps.]

Fig. 7.1. Models for QTL distribution. Three traits (A, B and C) are used as examples for independent QTL
(Trait A), loosely linked QTL (Trait B), closely linked or clustered QTL (Trait C, Chromosome II and III), and
mixed model (Trait C). Detectable QTL are indicated by circles, and their effects are represented by the
circle sizes. Likelihood maps for Trait C are given at the right side of each linkage map, and for Chromosomes
I and III, two likelihood maps are given to show expected results from minor QTL mapping and regional
mapping. From Xu (1997). This material is reproduced with permission of John Wiley & Sons, Inc.

they behave as one gene with major effect; and (iv) mixed distribution: for a specific
trait, QTL have a combined distribution of the three models above.

Because of continuous variation in quantitative traits, QTL genotypes cannot be easily
determined by inspecting the distribution of trait phenotype alone. This is one of the
fundamental problems of quantitative genetics. Historically important genetic parameters,
e.g. genetic variances and heritability (Chapter 1), summarize the effects caused by all
QTL but do not provide information to distinguish the effects of individual QTL. In order
to understand the genetic structure of QTL and ultimately clone them, multiple QTL
affecting the same trait must be mapped on to chromosomes and their effects must be well
separated. Theoretically, multiple QTL can be separated into single manipulable factors
by different methods including mapping and selection approaches.

7.1.1 Mapping approaches

In theory, molecular-marker-based QTL mapping can be used to draw inferences about QTL
allelic differences. Normally, however, it is difficult to determine whether the effect
detected with a particular molecular marker is due to one QTL with large effect or linked
QTL each with relatively small effect. For this reason, the term QTL usually describes a
region of a chromosome defined by linkage to a marker gene (Tanksley, 1993). Using a
mapping procedure, one can partition multiple QTL into single manipulable units and
determine whether a QTL is comprised of one or more genes (Fig. 7.1). This strategy
Molecular Dissection of Traits: Practice 251

depends on both the resolution of molecular maps and the mapping power for the QTL with
small effect and requires improvement of the statistical power in QTL analysis, saturation
of molecular maps and optimization of population structures.

Fine mapping

It is generally acknowledged that a typical higher plant genome includes 10,000–100,000
genes, scattered through a total of 10^8–10^10 bp of DNA. Consequently, 0.1% of the genome
would include an average of 10–100 genes. Several genes lying close together, each with a
small effect on a trait, could appear to be a single QTL of large effect (Michelmore and
Shaw, 1988; Paterson et al., 1988). Reducing the size of the regions identified as
containing QTL through fine mapping has been envisioned as an initial step in identifying
single QTL that ultimately could be manipulated using transformation (recombinant DNA)
technology (Stuber, 1994a; Tanksley et al., 1995). Current strategies for mapping QTL
depend on comparing the means of recombinant and non-recombinant classes. Given the
practical size of segregating populations (n = 200–300) and the marker density of
molecular maps most frequently used for QTL studies, preliminary mapping resolution of QTL
has been limited to approximately 10–20 cM, which is inadequate for distinguishing between
single gene and multi-gene composition. To reveal just what lies at a locus, techniques
with much higher resolution are necessary.

Conventional mapping populations such as backcross (BC) and F2 have limitations in fine
mapping of QTL due to the lack of sufficient recombinational events even in large
populations. Therefore, an alternative approach for fine mapping is to exploit populations
whose derivations are based on multiple cycles of recombination. These populations include
recombinant inbred lines (RILs) (Burr and Burr, 1991) and advanced intercrossing lines
(AILs) (Darvasi and Soller, 1995) or intermated recombinant inbred lines (IRILs) (Liu
et al., 1996). As described in Chapter 5, RILs are produced by continually selfing or
sibmating the progeny of individual members of an F2 population until virtual homozygosity
is achieved. Considering the fact that more and more RILs have been accumulated in
breeding programmes and such populations can be exploited for the mapping of one or more
traits, the RIL approach is practical for most crops. AILs are initiated by a cross
between two inbred lines and derived by sequentially and randomly intercrossing each
generation until advanced generations are attained. For these two kinds of populations,
many recombinational events required for fine mapping of QTL are accumulated in a single
relatively small population over the course of many generations. Due to more opportunity
for meiotic recombination, they have the advantage of possibly distinguishing more closely
linked QTL. For example, with the same population size and QTL effect, the 95% confidence
interval of a QTL map location of 20 cM in the F2 is reduced fivefold after eight
additional random mating generations (F10; Darvasi and Soller, 1995). RILs have similar
effects on the resolution power of QTL mapping. It is worth noting that increases in
recombinational events will reduce the effect due to the QTL associated with any
particular marker and thus, these populations are more suitable for fine mapping of QTL of
moderate and large effects. In another approach, Paterson et al. (1990) suggested that
recombinant individuals could be identified in primary generations and selectively
multiplied in subsequent generations so that the recombinant classes occur at near equal
frequency with the non-recombinant classes, increasing the power for statistical
comparisons among the classes.

The benefits of using designs such as RILs other than the conventional F2 and BC, where
genotyping and phenotyping could be done on the same set of individuals, include reducing
cost and environmental variance and taking advantage of the changes in population
structures of other RIL populations. Kao (2006) proposed a statistical method considering
the differences in population structures between different RIL populations on the basis of
a multiple-QTL model to map for QTL in different designs.
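The fivefold gain quoted for an F10 advanced intercross can be checked with a small calculation. The sketch below is only an illustration under a stated assumption: it uses the expression commonly attributed to Darvasi and Soller (1995) for the proportion of recombinants between two loci after t generations of random intermating, r_t = [1 - (1 - r)^(t-2) (1 - 2r)]/2, where r is the recombination fraction in a single meiosis. Neither this formula nor the function names appear in the text above; they are introduced here for illustration. For tightly linked loci the ratio r_t/r approaches t/2, which for t = 10 is in line with the fivefold reduction in confidence interval.

```python
def ail_recombinant_fraction(r, t):
    """Assumed AIL formula: proportion of recombinants in generation F_t.

    r is the recombination fraction in a single meiosis (0 <= r <= 0.5);
    t is the generation number, with t = 2 for a plain F2.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie between 0 and 0.5")
    if t < 2:
        raise ValueError("t must be at least 2 (the F2 generation)")
    return 0.5 * (1.0 - (1.0 - r) ** (t - 2) * (1.0 - 2.0 * r))


def map_expansion(r, t):
    """Ratio of the accumulated recombinant fraction in F_t to r (needs r > 0)."""
    if r <= 0.0:
        raise ValueError("r must be positive")
    return ail_recombinant_fraction(r, t) / r


if __name__ == "__main__":
    r = 0.01  # hypothetical tight linkage, roughly 1 cM
    for t in (2, 4, 6, 8, 10):
        print(f"F{t}: r_t = {ail_recombinant_fraction(r, t):.4f}, "
              f"expansion = {map_expansion(r, t):.2f}")
```

With r = 0.01 the expansion at F10 comes out a little under 5, and at F2 it is exactly 1, so the sketch reproduces the direction and rough size of the gain described for AILs. It says nothing about sampling error in small intermating populations.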
The proposed method has the potential to improve the resolution of genetic architecture of
quantitative traits and can serve as an effective tool to explore the QTL mapping study in
the system of RIL populations. Martin and Hospital (2006) described the non-independence
of multiple recombinations arising in RIL recombination data even though there may be no
interference in each meiosis. They also provide formulas for interference tests, gene
mapping and QTL detection in RIL populations.

A new genetic map of maize, ISU–IBM Map4, that integrates 2029 existing markers with 1329
new indel polymorphism (IDP) markers has been developed using IRILs from the intermated
B73 × Mo17 (IBM) population (Fu et al., 2006). The mosaic structures of the genomes of 91
IRILs, an important resource for identifying and mapping QTL and expression QTL (eQTL),
were defined. When this RIL population was evaluated in four environments for resistance
to southern leaf blight (SLB) disease caused by Cochliobolus heterostrophus race O
(Balint-Kurti et al., 2007), four common SLB resistance QTL were identified in all
environments, two in bin 3.04 and one each in bins 1.10 and 8.02/3. A comparison was made
between SLB QTL detected in two populations, independently derived from the same parental
cross: the IBM advanced intercross population and a conventional RIL population. Several
QTL for SLB resistance were detected in both populations, with the IBM providing between 5
and 50 times greater mapping resolution.

Population size and mating design are two important aspects to be given adequate
consideration during the development of IRILs. Although random intermating of F2
populations has been suggested for obtaining precise estimates of recombination
frequencies between tightly linked loci, Frisch and Melchinger (2008) in a recent
simulation study showed that sampling effects due to small population sizes in the
intermating generations have abolished the advantages of random intermating that were
reported in previous theoretical studies considering an infinite population size. They
also propose a mating scheme for intermating with planned crosses that yields more precise
estimates than those under random intermating.

Minor QTL mapping

In most QTL identification studies, rather stringent threshold probability levels have
been set so that there is a low risk in making Type I errors (i.e. false positives). Thus,
only those QTL with sufficiently large phenotypic effects to be detected statistically can
be identified while QTL with smaller effects will fall below the threshold of detection
(cf. Fig. 7.1). When multiple QTL are located in the same chromosomal region, the ones
with smaller effects cannot be detected in most instances. This overshadow effect of major
QTL over minor QTL makes the molecular marker approach biased towards the detection of QTL
of large phenotypic effects. It should be pointed out that these major QTL would be ones
with high heritabilities, easily manipulated through traditional breeding practices and
may already be fixed in many breeding lines. There are accumulated data from numerous QTL
studies to establish definitely that QTL affecting a number of quantitative traits are
distributed throughout the genome and certain chromosomal regions appear to contribute
greater effects than others. More surprising has been the finding that in many instances a
large proportion of quantitative variation can be explained by the segregation of a few
major QTL. It is not uncommon to find individual QTL that can account for more than 20% of
the phenotypic variation in a population (Table 1 of Tanksley, 1993) and values as high as
85.7% (Lin et al., 1995) (a major gene with distinct bimodal distribution) have been
reported for a single major QTL. It may be reasonable, therefore, to use marker technology
as a means for placing greater emphasis on those QTL showing only relatively minor effects
(minor QTL).

Detectability of a trait locus is severely limited by its genetic background. A
straightforward background effect is dilution, i.e. the more QTL alleles that exist, the
smaller the relative contribution of a given locus (Frankel, 1995). The smallest effects
a QTL can have and still be detected by the marker method depend on a number of factors
(Tanksley, 1993; Chapter 6), which include:

1. Map distance: the closer a QTL is to a marker, the smaller the QTL effect can be and
still be detected statistically. This relationship indicates that the power of QTL mapping
can be improved with the saturation of molecular maps. High-density linkage maps that
provide high quality reference maps for QTL mapping are now available for many crop
plants.
2. Sample size: the larger the sample (population) size, the more likely the effects of
smaller QTL will reach statistical significance. This relationship indicates that the
detection of QTL with a relatively small effect largely depends on the size of the mapping
population. Using a typical sample size (n < 500), two or more genes closely linked
(within 20 cM) will usually be detected as a single QTL (i.e. they cannot be distinguished
as separate QTL when mapped with the interval approach of two flanking markers). In maize
using an F2 population size of 1700 individuals and probability threshold of 0.05, a QTL
contributing as little as 0.3% of phenotypic variance was reported (Edwards et al., 1987).
In experiments with smaller sample sizes and higher probability thresholds, QTL that
explain less than 3% of the phenotypic variance are not normally detected. The bias
towards detecting QTL with larger effects means that it is unlikely that one will ever
detect, map and characterize all of the QTL affecting a character in any single
segregating population.
3. Heritability: the larger the environmental effect on the character (i.e. low
heritability), the less likely a QTL will be detected. Estimates of heritability can be
improved by controlling environmental error. Permanent mapping populations such as RILs,
doubled haploids (DHs) and advanced backcrossing populations or near-isogenic lines (NILs)
can be used to improve the mapping power by replicate phenotyping in different
environments (years, seasons or locations).
4. QTL threshold: higher probability thresholds for declaring a QTL effect significantly
reduce the chances of spurious QTL being reported, but also reduce the chances of
detecting QTL with smaller effects. This relationship indicates that development of QTL
mapping methods which can improve the mapping power with a specific sample size would
benefit the separation of minor QTL. Based on the concept of permutation tests, Churchill
and Doerge (1994) described a method to determine an appropriate threshold value for
declaring significant QTL effects, providing an alternative approach to the logarithm of
odds (LOD) drop-off method (Lander and Botstein, 1989). The conditional empirical
threshold and the residual empirical threshold yield critical values that can be used to
construct tests for the presence of minor QTL effects while accounting for effects of
known major QTL (Doerge and Churchill, 1996). The permutation method is now widely used in
various QTL mapping approaches and some QTL mapping software such as QTL CARTOGRAPHER
(http://statgen.ncsu.edu/qtlcart/WQTLCart.htm) provide the permutation function for the
statistical methods incorporated.

In rice, a total of 15 QTL for heading date (Hd1–Hd3, Hd3b–Hd14) were identified in
several populations derived from crosses between Nipponbare, a rice cultivar from Japan,
and Kasalath, a rice cultivar from India (as reviewed by Yano et al., 2001). Nine of these
have been mapped as single Mendelian factors and studies have shown that Hd1, Hd2, Hd3a,
Hd3b, Hd5 and Hd6 are involved in day-length response (reviewed by Uga et al., 2007).
Using an extremely late heading (202 days to heading) cultivar Nona Bokra from India and
japonica cultivar Koshihikari (105 days) from Japan, QTL analysis identified 12 QTL on
seven chromosomes. The Nona Bokra alleles of all QTL contributed to an increase in heading
date. Comparison of chromosomal locations between heading date QTL detected between these
two cultivars and 15 QTL identified from Nipponbare × Kasalath populations revealed that
eight of the heading date QTL were near Hd1, Hd2, Hd3a, Hd4, Hd5, Hd6, Hd9 and Hd13. The
results suggested that the strong photoperiod sensitivity in Nona Bokra was
generated mainly by the accumulation of additive effects of particular alleles at
previously identified QTL (Uga et al., 2007). This also indicates that multiple QTL for
complex traits like extremely late-heading can be dissected by QTL mapping.

Regional mapping

The most common method of placing molecular markers on a linkage map is by random cloning
of genomic/cDNA sequences, by PCR-based detection of polymorphism, or by single nucleotide
polymorphism (SNP) markers based on chip technology (Chapter 3), followed by linkage
analysis. This whole genome mapping method is extremely useful and can be used to
construct both low and high resolution maps of complex genomes. Nevertheless, this
approach is limited when one is interested in targeting a particular chromosomal region. A
majority of random markers will ultimately be mapped outside of a target interval and as
the interval size decreases the odds of any new randomly generated marker being placed
within it decrease (Tanksley et al., 1995).

Two strategies have been proposed to target the chromosomal region of interest (regional
mapping; Xu, 1997) and have proven effective for identifying from a large number of
markers the few that reside near a targeted locus. Both involve the use of genetic stocks
that are (almost) genetically identical, except in the regions flanking the targeted gene.
The first strategy uses NILs, which are generated by introgression (Wehrhahn and Allard,
1965). As discussed in Chapter 4, the inbred lines differ at the targeted locus or region.
If the donor parent and the recurrent parent are sufficiently divergent, it is possible to
detect polymorphisms between the pairs of NILs. The marker that detects such polymorphisms
will likely be linked to the target gene. As early examples, Young et al. (1988) used NILs
and pools of restriction fragment length polymorphism (RFLP) probes to detect new markers
within the Tm-2a region of tomato. Using a similar strategy, Martin et al. (1991) were
able to use random PCR amplification on NILs to isolate new markers near the tomato Pto
disease resistance locus. NILs are now widely used in map-based gene cloning through fine
mapping.

With the accumulation of permanent mapping populations such as RILs and DHs, it is
possible to select lines that are almost genotypically identical in the whole genome
except for only one or a few marker loci. Combined with phenotypic similarity, this
information can be used to obtain NILs for qualitative or quantitative traits (Xu, 1997).

The second strategy, referred to as DNA pooling or the bulked segregant analysis (BSA),
which is discussed later in this chapter, relies on the use of segregating populations
(Michelmore et al., 1991; Giovannoni et al., 1991), which does not require highly
specialized genetic stocks. This strategy is derived from the concept of selective
genotyping based on selection for contrasting phenotypes or bracket DNA markers.

Both regional mapping methods described above, when used in conjunction with high-volume
DNA marker technology, permit one to screen thousands of loci and selectively identify
those adjacent to the gene of interest in a specific chromosomal region and are very
suitable for analysis of clustered QTL. Moreover, these regional mapping approaches can be
accomplished without having a genetic map for the species. For major genes, the efficiency
of the NIL and BSA approaches has been demonstrated in plants and early examples of
success include Young et al. (1988), Michelmore et al. (1991), Giovannoni et al. (1991),
Schüller et al. (1992), Mackill et al. (1993) and Pineda et al. (1993).

To solve the issue associated with the masking effects of major QTL and epistatic
interactions of multiple QTL involved in RIL-based QTL mapping, Keurentjes et al. (2007a)
empirically compared the QTL mapping power of a genome-wide NIL population with an already
existing RIL population derived from the same parents in Arabidopsis thaliana. By
analysing and mapping QTL affecting six developmental traits with different heritability,
overall, QTL with smaller effects could be detected in the NIL population more easily than
in the RIL population, although
the localization resolution was lower. In general, population size is more important than
the number of replicates to increase the mapping power of RILs, whereas for NILs several
replicates are absolutely required.

In an effort to identify putative candidate genes underlying drought tolerance in rice,
Nguyen et al. (2004) developed several expression sequence-based markers using BSA for
saturation mapping of QTL regions. Thirteen of the markers were localized in the close
vicinity of the targeted QTL regions. In rice, substitution mapping of a flowering-time
QTL associated with transgressive variation has separated a previously located QTL,
dth1.1, into at least two sub-QTL (Thomson et al., 2006). The QTL dth1.1 was associated
with transgressive variation for days to heading in an advanced BC population derived from
the Oryza sativa cultivar Jefferson and an accession of the wild rice relative Oryza
rufipogon. A series of NILs containing different O. rufipogon introgressions across the
target region were constructed to dissect dth1.1 using substitution mapping. In contrast
to the late-flowering O. rufipogon parent, O. rufipogon alleles in the substitution lines
caused early flowering under both short and long day-lengths and provided evidence for at
least two distinct sub-QTL: dth1.1a and dth1.1b. Potential candidate genes underlying
these sub-QTL included genes with sequence similarity to Arabidopsis GI, FT, SOC1 and EMF1
and Pharbitis nil PNZIP. Evidence from families with non-target O. rufipogon
introgressions in combination with dth1.1 alleles also detected an early flowering QTL on
chromosome 4 and a late-flowering QTL on chromosome 6 and provided evidence for additional
sub-QTL in the dth1.1 region.

7.1.2 Screening for allele dispersion

When multiple QTL control a trait, their alleles of positive or negative effect
(increasing or decreasing trait value) tend to be dispersed among genetic stocks, with
positive alleles at one or some loci but negative alleles at others. The phenomenon where
QTL alleles of similar effect are dispersed among genetic stocks is referred to as allele
dispersion. However, a genetic stock may contain (associate) all the alleles of similar
(positive or negative) effect at the multiple QTL; this is referred to as allele
association. For traits naturally selected towards intermediate phenotype, alleles of
similar effect at multiple loci are more likely dispersed than associated. In natural or
breeding populations with neutral selection, there are many genetic stocks that have
positive alleles at some loci but negative alleles at others. The extreme phenotypes of
quantitative traits come from the association of QTL alleles while the intermediate
phenotype usually indicates allele dispersion. Therefore, different QTL alleles with
similar effect can be identified from the existing populations. On the other hand, if one
has allele-associated stocks in hand, the QTL alleles could be separated by selection for
different genotypes. Allele dispersion differs from linkage equilibrium in two respects.
First, allele dispersion refers to independent or linked loci controlling the same trait;
while in linkage equilibrium the related genetic loci are usually supposed to be
genetically linked and control different traits. Secondly, allele dispersion represents a
situation in which any two non-genetically related genotypes (strains) within a species
show allelic differences at the same genetic locus, while linkage equilibrium represents a
situation in which a constant gene frequency has been reached in a given population
derived from two related strains.

Genetic stocks with dispersed QTL alleles usually show similar phenotype, making it
difficult to identify genetic differences only by phenotypic evaluation. When these stocks
are used as the parents to produce segregating populations, however, a part of the progeny
will have transgressive phenotypes, i.e. they are phenotypically outside of the range of
the parents, because these progeny associate all alleles of similar effect with the result
of recombination of different QTL alleles. Positive and negative transgressive individuals
will arise from the associations of positive and negative alleles, respectively.
Transgression caused by
dominance and/or overdominance can also be excluded by successively selfing the
transgressive individuals to determine if they maintain the same phenotype in advanced
generations. If no significant epistasis can be detected by biometrical genetic analysis,
the transgressive segregation in populations derived from two genetic stocks provides
evidence for allele dispersion. Based on the additive-dominance model, Xu and Shen (1992a)
suggested three methods to screen the separable QTL alleles by detection of allele (gene)
dispersion, including: (i) testing the homogeneity between F2 phenotypic variance and the
environmental variance estimated from phenotypic variances of non-segregating populations
(P1, P2 and F1); (ii) testing the differences of means among F1, F2, F1 × P1 and F1 × P2;
and (iii) comparing the genetic parameters such as gene effects and genetic variances
estimated from the cross derived by intermating transgressive individuals of two kinds
with those estimated from the cross of the original stocks.

Classical genetic analysis provides some examples for allele dispersion. The first example
in plants may come from Nicotiana rustica. The allelic differences for final height,
flowering time and related characters were largely dispersed between two cultivars
(genotypes) 1 and 5 (Jinks and Perkins, 1969, 1972; Perkins and Jinks, 1973) with 127 and
103 cm of final height and 77 and 72 days of flowering time (days after sowing),
respectively. Among the random sample of 82 inbred lines derived from the cross between
these two cultivars, transgressive lines were found and two of them, B2 and B35, were the
shortest and tallest in final height (92 and 144 cm, respectively) and the earliest and
latest to flower (70 and 84 days, respectively). The simultaneous analysis of the two
contrasting crosses (1 × 5 and B2 × B35) indicated the allele dispersion in the original
cultivars (Jayasekara and Jinks, 1976). Another example is rice tiller angle (the angle
between the main stem and its tillers). Transgressive segregation was found in the two
crosses derived from four indica rice cultivars with similar tiller angle and the extreme
strains with largest and smallest tiller angles were obtained by successively selfing the
transgressive individuals (Xu and Shen, 1992b). Comparative genetic analysis of two
contrasting crosses (from the original cultivars and from the corresponding extreme
strains) revealed that two loci were responsible for the genetic difference of tiller
angle in each pair of the original cultivars, and alleles of similar effect were dispersed
in the original cultivars but associated in the extreme strains (largest tiller angle
strains having all the positive alleles and smallest tiller angle strains having all the
negative alleles). The second cycle of crossing between the extreme strains derived from
different original crosses revealed further transgression. Biometrical genetic analysis
and selection response indicate that four loci controlled the total variation of tiller
angle in four original cultivars, each cultivar carrying two positive alleles at only one
locus (Xu et al., 1998).

Allele dispersion can also be identified based on QTL mapping results. In QTL mapping,
phenotypic difference between parents is not necessary for detection of QTL. In most cases
where no parental difference is found, QTL are still detected, which could be due to the
complementary patterns of positive and negative allelic effects. QTL mapping can provide
information about the genetic constitution of each segregant in the mapping population so
that one can infer which individual carries desirable alleles and then separate the
multiple QTL by selection for individuals with different allele combinations. For example,
if four QTL are inferred to control a trait, the allelic constitution can be determined
for all individuals and each QTL. Therefore, one can easily screen the individuals
carrying the positive allele at each of the four QTL. Because of allele dispersion, it is
not likely that one QTL mapping experiment using any single population can detect all the
QTL affecting a given trait. Therefore, independent experiments tend to reveal different
QTL or QTL alleles. Comparison of QTL effects and mapping positions may result in
separation of the multiple QTL. However, this approach largely depends on the precision of
QTL mapping results available. So far, numerous QTL affecting the same traits
have been identified in most crops and they are different in locations and effects.
Variation among investigations and among populations can logically be expected for the
following reasons (Smith and Beavis, 1996): (i) different polymorphisms in the populations
studied; (ii) different number and location of polymorphic regions affecting the trait;
(iii) environmental effects or genotype × environment interaction; and (iv) small sample
size. With the development of highly polymorphic DNA markers such as simple sequence
repeats (SSRs), the first reason will become less important. Permanent populations can be
phenotyped at different seasons, years or locations, reducing the environmental effect on
QTL identification. Using relatively large sample size, combined with highly polymorphic
markers, permanent population and replicate phenotyping, will help to determine whether
the variation of QTL mapping comes from different QTL constitutions of populations or not.
It is of interest to note that cryptic factors were frequently uncovered (e.g. Stuber,
1995; Ragot et al., 1995), indicating the possibility of QTL dispersion. For example,
genetic factors contributing to high grain yield and tall stature in maize occasionally
have been associated with marker alleles from low-yielding, short-statured parental lines
(Edwards et al., 1992). A wild rice species with low yield potential contains genes that
may significantly increase the productivity of the high-yielding cultivated rice (Xiao
et al., 1996a). There are numerous recent examples available to support these early
reports.

As observed in rice QTL mapping, on the average, about four QTL are identified for each
trait (Table 7.1; Xu, Y., 2002), the same as the average obtained for 176 trial–trait
combinations as reviewed by Kearsey and Farquhar (1998). When QTL identified for the same
trait are summarized over different projects/populations, this number becomes much larger.
For example, rice plant height has been mapped using 13 populations, with 63 QTL reported.
Some of the QTL are allelic to each other, i.e. they were mapped to the same chromosomal
region or intervals of less than 15 cM. After elimination of possible allelic QTL, the
total number of QTL for plant height is reduced to 29, with up to five QTL existing on a
chromosome (Xu, Y., 2002). The QTL qPH1-1, which corresponded to a major semi-dwarf gene
sd-1, and qPH8-1 were each detected in six populations. QTL qPH2-2 and qPH3-3 were each
detected in five populations. Over 50 major

Table 7.1. The number of QTL identified in rice using permanent mapping populations. See Xu, Y. (2002)
for all references.

Population                   Population size  Number of markers  Number of traits  Number of QTL
IR64/Azucena DH              105–135          146–175            56                215
Zhaiyeqing 8/Jingxi 17 DH    132              137–243            35                115
9024/LH422 RIL               194              141                25                74
CO39/Moroberekan RIL         143–281          127                14                121
Lemont/Teqing RIL            255–315          113–217            8                 46
IR58821/IR52561 RIL          166              399                5                 28
IR74/Jalmagna RIL            165              144                5                 18
Nipponbare/Kasalath BIL      98               245                4                 19
Zhenshan 97/Minghui 63 RIL   238              171                3                 6
Asominori/IR24 RIL           65               289                2                 17
Acc8558/H359 RIL             131              225                1                 11
IR1552/Azucena RIL           150              207                1                 4
IR74/FR13A RIL               74               202                1                 4
IR20/IR55178-3B-9-3 RIL      84               217                1                 4
Overall                      65–315           113–399            161               682
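
The figure of about four QTL per trait can be recovered from the trait and QTL columns of Table 7.1. The short Python sketch below is only an illustration: it re-enters those two columns by hand (the variable and function names are hypothetical) and divides the total number of QTL by the total number of traits.

```python
# (population, number of traits, number of QTL), re-entered from Table 7.1.
TABLE_7_1 = [
    ("IR64/Azucena DH", 56, 215),
    ("Zhaiyeqing 8/Jingxi 17 DH", 35, 115),
    ("9024/LH422 RIL", 25, 74),
    ("CO39/Moroberekan RIL", 14, 121),
    ("Lemont/Teqing RIL", 8, 46),
    ("IR58821/IR52561 RIL", 5, 28),
    ("IR74/Jalmagna RIL", 5, 18),
    ("Nipponbare/Kasalath BIL", 4, 19),
    ("Zhenshan 97/Minghui 63 RIL", 3, 6),
    ("Asominori/IR24 RIL", 2, 17),
    ("Acc8558/H359 RIL", 1, 11),
    ("IR1552/Azucena RIL", 1, 4),
    ("IR74/FR13A RIL", 1, 4),
    ("IR20/IR55178-3B-9-3 RIL", 1, 4),
]


def qtl_per_trait(rows):
    """Return (total traits, total QTL, mean QTL per trait) over all rows."""
    total_traits = sum(traits for _, traits, _ in rows)
    total_qtl = sum(qtl for _, _, qtl in rows)
    return total_traits, total_qtl, total_qtl / total_traits


if __name__ == "__main__":
    traits, qtl, mean = qtl_per_trait(TABLE_7_1)
    print(f"{qtl} QTL over {traits} traits: {mean:.2f} QTL per trait")
```

The column sums agree with the "Overall" row (161 traits and 682 QTL). The ratio is a pooled average across populations, so populations in which many traits were scored dominate it.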
genes for dwarf and semi-dwarf mutants have been found (Kinoshita, 1995) and 14 of them were linked to molecular markers (Huang et al., 1996; Kamijima et al., 1996), with 13 of them (93%) co-localized with plant height QTL. More plant-height QTL will likely be co-localized with major loci as more major loci are linked with molecular markers. These co-localizations support Robertson's (1985) hypothesis that alleles for qualitative mutants are simply loss-of-function alleles at the same loci underlying quantitative variation. Until QTL are mapped to higher degrees of precision and/or cloned, however, it would be difficult to prove that the particular QTL actually correspond to known loci defined by macromutant alleles and which QTL are allelic to each other. The QTL allelism test and the determination of the major-gene and QTL correspondence depend on the availability of high-density molecular maps with a common set of markers shared among researchers.

With the generalization of this concept, non-allelic alleles can be searched for among an entire set of related species. The high incidence of transgressive segregation in interspecific crosses tells us that individuals that do not exhibit a particular trait often carry superior/hidden alleles that condition that trait. It is likely that non-allelic alleles usually would be present in other strains but missed due to limitations in genetic analysis. By using the entire set of the related species, it thus may be possible to identify all of the genes involved in a given trait or physiological process because the genes phenotypically hidden in one species may not be hidden in another species (Bennetzen, 1996).

7.2 QTL for Complicated Traits

7.2.1 Trait components

Many quantitative traits are a complex consisting of different or related components or subtraits. For example, there are six direct component effects upon days to flowering in legume seed crops. Four are genetic factors, which regulate three plant processes and one plant characteristic, and two are environmental factors, which modulate the ongoing gene actions of one or more of the three plant processes. These six direct components are the rate of node and leaf development, change from node to flower, vernalization requirement, node number at the minimal days to flowering, impacting photoperiod and impacting temperature (Wallace, 1985). In most crops, the yield trait is composed of several yield components, and oil or protein content is related to many compounds and amino acids. In rice, low fertility controlled by polygenes can be partitioned into several components, including male and female sterility, or ovary and pollen abortion, so that polygenes can be divided into several single genes with different effects and thus can be handled with ease. Dissecting or partitioning a complex trait into separate components can benefit both QTL mapping and cloning.

7.2.2 Correlated traits

Trait correlation may arise from either pleiotropic effects of single genes or from tight linkage of genes affecting different traits. Correlated traits often share some QTL mapped to similar chromosomal regions. In most Poaceae, increased plant height often correlates with late flowering. Comparative data support the possibility that different, closely linked genes, rather than a single gene, account for correlated traits. In sorghum, two of the three QTL affecting flowering time were associated with height QTL and two major QTL were mapped within overlapping 90% confidence intervals, explaining 85.7% and 54.8% of phenotypic variation, respectively, and showing similar gene action (dominance/additive = 0.72 and 0.73) (Lin et al., 1995). Many pairs of independent discrete mutations affecting height and flowering are closely linked in corresponding locations of wheat (Ppd1 and Rht8) (Worland and Law, 1986; Hart et al., 1993) and rice (Se-1/Se-3 and d-4/d-9) (Kinoshita and Takahashi, 1991; Causse et al., 1994).
Molecular Dissection of Traits: Practice 259

Correlated traits have been analysed separately in QTL mapping without using correlation information and thus correlation between traits will affect the mapping of any single trait involved.

Considering the correlation between different physiological characters, a polygenic complex controlling a physiological trait can be manipulated to a large extent by cloning only one or a few QTL from this complex. That is, all functions of a polygenic complex affecting multiple physiological characters can be started and pushed by making one of the functions into a highly efficient system. This was observed in growth-hormone-transformed mice (Palmiter et al., 1983): when growth hormone was produced largely by inserting additional copies, all other components could respond appropriately to this change, although growth hormone is only one of the components affecting development. It seems that QTL controlling closely related quantitative traits could be manipulated to show linked response by only manipulating one or a few of these QTL.

Multivariate analysis of complicated traits can be used to investigate the structure of a genetic system that includes allelic variation at multiple loci, intermediate phenotypes and their relationships. Jiang and Zeng (1995) proposed a method for QTL detection based on a multivariate normal model with unconstrained covariance structure. Alternatively, dimension reduction techniques, such as principal component analysis, can be applied to a set of correlated traits. Multivariate QTL analyses can provide enhanced power and resolution in QTL mapping when traits are highly correlated and share common genetic determinants (Korol et al., 2001).

Mapping studies that investigate clusters of related phenotypes often reveal a network of genetic effects, in which each phenotype is influenced by multiple loci (heterogeneity) and different phenotypes share one or more loci in common (pleiotropy). The complexity of observed QTL networks will vary depending on the traits and the power of the study design. It is also likely that physiological interactions independent of genetic factors may result in correlated phenotypic response. Li et al. (2006) introduced a method for the analysis of multi-locus, multi-trait genetic data that provides an intuitive and precise characterization of genetic architecture and they showed it is possible to infer the magnitude and direction of causal relationships among multiple correlated phenotypes. They illustrated the technique using body composition and bone density data from mouse intercross populations. The identification of causal networks sheds light on the nature of genetic heterogeneity and pleiotropy in complex genetic systems.

Genetic correlation can be understood at gene expression levels. Coordinated regulation of gene expression levels across a series of experimental conditions provides valuable information about the function of correlated transcripts. In order to annotate gene function and identify potential members of regulatory networks, Lan et al. (2006) explored correlation of expression profiles across a genetic dimension, namely genotypes segregating in a panel of 60 F2 mice derived from a cross used to explore diabetes in obese mice. They first identified 6016 seed transcripts for which they observed that gene expression is linked to a particular region of the genome. Then they searched for transcripts whose expression is highly correlated with the seed transcripts and tested for enrichment of common biological functions among the lists of correlated transcripts. They found and explored the properties of 1341 sets of transcripts that share a particular gene ontology term. Thirty-eight seeds in the G protein-coupled receptor protein signalling pathways were correlated with 174 transcripts, all of which are also annotated as G protein-coupled receptor protein signalling and 131 of which share a regulatory locus on chromosome 2. They noted that many of these findings would have been missed by simple eQTL analysis without the correlation step. Trait correlation combined with linkage mapping is more sensitive than linkage mapping alone.
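The correlation step described for expression data can be illustrated with a few lines of code. What follows is a minimal sketch only, not the procedure of Lan et al. (2006): the transcript names, the expression values and the threshold of 0.9 are hypothetical, and a real analysis would also need the linkage and enrichment tests mentioned in the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlated_with_seed(seed, expression, threshold=0.9):
    """Names of transcripts whose |r| with the seed transcript meets the threshold."""
    seed_values = expression[seed]
    return sorted(
        name for name, values in expression.items()
        if name != seed and abs(pearson(seed_values, values)) >= threshold
    )

# Hypothetical expression levels of four transcripts in six F2 individuals.
expression = {
    "seed_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "tx_1": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # rises with the seed
    "tx_2": [6.0, 5.1, 3.9, 3.0, 2.1, 0.9],   # falls as the seed rises
    "tx_3": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],   # only weakly related
}

hits = correlated_with_seed("seed_A", expression)
print(hits)
```

Taking the absolute value of the correlation keeps negatively co-regulated transcripts in the list; whether that is desirable depends on the biological question.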

7.2.3 Qualitative–quantitative traits

Many economically important quantitative characters, including plant height, pest and disease resistance and grain quality, in plant populations exhibit the combined effects of both major genes and polygenes. In other words, many traits are influenced by both qualitative and quantitative genes or by a major–minor gene system, showing a bimodal distribution (Fig. 7.2; for examples see Jiang et al., 1994 and Lin et al., 1995). These traits can be defined as quantitative–qualitative traits (QQT) (Mo, 1993a,b) or semi-quantitative traits (Stuber, 1995). Separation of the major gene effects from the polygenic effects is important for understanding the whole genetic system of these kinds of traits and for mapping and cloning of the genes involved.

Jiang et al. (1994) used Elston's (1984) model of mixed major locus and polygenes with modification and extension to obtain reliable information necessary for assessment of the use of major genes in a breeding programme. As indicated by Xu (1997), QTL with larger effects will mask the QTL with smaller effects in the same or nearby locations so that the latter cannot be easily detected if they are mapped simultaneously. In order to eliminate the phenotypic effect of the major gene from the residual (error) term in the analysis of the significance of other QTL, Lin et al. (1995) tried two approaches, one by adjustment of the phenotypic value of individual plants for the major gene effect and one by use of the fix QTL algorithm in MAPMAKER/QTL to fix the QTL effect with the largest LOD. As indicated by the authors, such approaches incur a risk that a fraction of the residual (error) variance will be removed from the experimental model, due to chance correlation with the fixed parameters, artificially reducing the remaining error term and increasing the likelihood of false positives. Doerge and Churchill (1996) used the conditional empirical threshold and the residual empirical threshold to search for multiple QTL. Once a major QTL has been detected, its

[Fig. 7.2: frequency plotted against phenotypic value for three cases (major QTL; major QTL + minor QTL; minor QTL), arranged along axes for increase of gene number, single versus compound environmental factors and uniformity of environments.]

Fig. 7.2. Relationship between phenotypes, genes and environments. Discrete phenotypic distribution for qualitative traits arises from major genes, bimodal distribution for qualitative–quantitative traits from the joint effect of a major QTL (with dominant effect) and some minor QTL, and normal distribution for typical quantitative traits from many minor QTL. With partition and uniformity of environments, some continuously distributed traits can be converted into bimodal or discretely distributed traits. From Xu (1997). This material is reproduced with permission of John Wiley & Sons, Inc.
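The distributions summarized in Fig. 7.2 can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions rather than a model taken from the cited studies: an F2 population, one fully dominant major QTL, ten additive minor QTL of equal effect and normally distributed environmental noise. All effect sizes are hypothetical.

```python
import random

def simulate_f2(n, major_effect, n_minor, minor_effect, env_sd, rng):
    """Simulate F2 phenotypes: one dominant major QTL plus additive minor QTL."""
    phenotypes = []
    for _ in range(n):
        # Major QTL: AA, Aa and aa occur 1:2:1, and A is fully dominant,
        # so three quarters of the plants show the major-gene effect.
        value = major_effect if rng.random() < 0.75 else 0.0
        # Minor QTL: each locus carries 0, 1 or 2 increasing alleles.
        for _ in range(n_minor):
            value += minor_effect * (rng.randint(0, 1) + rng.randint(0, 1))
        # Environmental deviation.
        value += rng.gauss(0.0, env_sd)
        phenotypes.append(value)
    return phenotypes

def histogram(values, low, high, width):
    """Count values falling in consecutive bins of the given width."""
    n_bins = int((high - low) / width)
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

rng = random.Random(1)
mixed = simulate_f2(2000, major_effect=10.0, n_minor=10,
                    minor_effect=0.5, env_sd=1.0, rng=rng)
minor_only = simulate_f2(2000, major_effect=0.0, n_minor=10,
                         minor_effect=0.5, env_sd=1.0, rng=rng)

for label, data in (("major + minor QTL", mixed), ("minor QTL only", minor_only)):
    print(label)
    for start, count in zip(range(0, 20, 2), histogram(data, 0.0, 20.0, 2.0)):
        print(f"{start:2d}-{start + 2:2d} {'#' * (count // 20)}")
```

With these settings the first population shows two peaks separated by a trough, whereas the second shows a single peak. Increasing `env_sd` or shrinking `major_effect` blurs the trough, which is the sense in which non-uniform environments can hide a major gene.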

phenotypic effects can be accounted for in the search for secondary QTL. This method is suitable for unlinked multiple QTL and/or QTL residing on different chromosomes. Mapping methods suitable for major–minor genes warrant further research.

7.2.4 Seed traits

The improvement of seed yield and quality is one of the most important objectives in cereal breeding. As a major storage organ of cereal seeds, endosperms provide humans with proteins, essential amino acids and oils. An understanding of the inheritance of endosperm traits is critical for the improvement of yield potential and seed quality. Genetic behaviour in triploid endosperms is very different from that of the maternal plants that supply the components for grain growth and development. Thus, methods suited for genetic analysis of traits in maternal plants (diploids for most cereal crops) cannot directly be used for endosperm traits (Xu, 1997). Based on triploid models, biometrical methods have been proposed for conventional genetic analysis of endosperm traits (Gale, 1975; Bogyo et al., 1988; Mo, 1988; Foolad and Jones, 1992; Pooni et al., 1992; Zhu and Weir, 1994). Any analytical method for endosperm traits needs to combine a QTL analytical method developed for diploid maternal plants with a triploid model proposed for conventional genetic analysis.

On the other hand, the genetic system controlling endosperm traits may be much more complicated than that which controls the traits of maternal plants. Because maternal plants provide seeds with a portion of their genetic material and almost all the nutrients required for growth and development, seed traits are genetically affected by both the seed nuclear genes and the maternal nuclear genes. In addition, cytoplasmic genes may also affect some seed traits through their indirect effects on the biosynthetic processes of chloroplasts and mitochondria. To understand endosperm traits with biological accuracy, one should take into consideration maternal genetic effects and cytoplasmic effects along with the direct genetic effects of seeds (Xu, 1997). As seeds initiate a new generation that differs from their maternal plants, some seed traits should be considered as a generation advanced over their maternal plants. Since the DNA used in most molecular analyses has been extracted from leaves or tissues of maternal plants, genetic analysis of endosperm traits should be based on the DNA extracted from both maternal plants and endosperm tissues in order to understand the relative contribution of the different genetic factors to the variation of endosperm traits.

Many years after Xu's (1997) advocacy, several articles were published detailing the unique difference associated with triploid traits and some statistical methods have been developed with consideration of the trisomic inheritance of the endosperm and the generation difference between the mapping population and the endosperm (Wu et al., 2002a,c; Xu, C. et al., 2003; Kao, 2004; Cui and Wu, 2005; Wang, X. et al., 2007). In general, the proposed triploid-based methods use the marker information either from only the maternal plants or from both the maternal plants and their embryos for mapping endosperm traits and provide better detection power and estimation precision than diploid-based methods. The genetic models are also developed to handle epistatic effects (Cui and Wu, 2005) and to use bulked grain samples (Wang, X. et al., 2007).

Zheng et al. (2008) conducted QTL analysis on maternal and endosperm genomes for three cooking quality traits (amylose content, gel consistency and gelatinization temperature) in rice using a genetic model with endosperm and maternal effects and environmental interaction effects. The results suggested that a total of seven QTL were associated with cooking quality of rice, which were subsequently mapped to chromosomes 1, 4 and 6. Six of these QTL were also found to have environmental interaction effects.

As we discussed earlier, several studies have shown that maternal genotypic variation could greatly influence the estimation of the direct effects of QTL underlying

endosperm traits. Recently, Wen and Wu (2008) proposed methods of interval mapping of endosperm QTL using seeds of F2 or BC1 (an equal mixture of F1 × P1 and F1 × P2 with F1 as the female parent) derived from a cross between two pure lines (P1 × P2). The most significant advantage of these experimental designs is that the maternal effects do not contribute to the genetic variation of endosperm traits and therefore the direct effects of endosperm QTL can be estimated without the influence of maternal effects. In addition, these experimental designs could greatly reduce environmental variation because a few F1 plants grown in a small block of field will produce sufficient F2 or BC1 seeds for endosperm QTL analysis. More recently, He and Zhang (2008) proposed mapping endosperm trait loci (ETL) and epistatic ETL (eETL) as an efficient way to genetically improve grain quality using an alternative random hybridization design. Using a penalized maximum likelihood method, the endosperm trait means of random hybrid lines together with known marker genotype information from their corresponding parental F2 plants were used to estimate efficiently and without bias the positions and all of the effects of eETL. This new method may enable us to map triploid eETL in the same way as diploid quantitative traits in future.

7.3 QTL Mapping across Species

For species connected by parallel genome mapping, as described in Chapter 3, it should be possible to compare the map positions of QTL for the same or similar characters. In this case, breeders might be able to predict the positions of important QTL (e.g. for growth rates in animals or yield in plants) in one species based on mapping studies from the others. Coincidence of map positions would support the hypothesis that loci underlying natural quantitative variation have been conserved during long periods of evolutionary divergence (i.e. they are orthologous genes). The collinearity of genomes among related species provides a central tool for the determination of the relative allelism of genes in different species (Bennetzen, 1996).

Many QTL mapping studies have been published for species connected by comparative linkage maps, which can be used to infer some conclusions regarding the hypothesis of conserved QTL among divergent species. Perhaps the first evidence for orthologous QTL comes from comparative mapping in mung bean and cowpea (Fatokun et al., 1992), where the researchers showed that the single most important QTL for determining seed weight in these two distinct species mapped to the same locus in both genomes and that the chance occurrence of such coincidental mapping is very unlikely. Lin et al. (1995) and Xiao et al. (1996c) discussed the putative orthologous QTL across grass species. Despite different chromosome numbers and ploidy levels, homoeologous relationships among rice, maize, wheat, oats and barley chromosomes have been defined by using common anchor probes (Ahn and Tanksley, 1993; Ahn et al., 1993; van Deynze et al., 1995a, b). This information allows the comparison of locations of QTL affecting the same or corresponding traits in different species. For the QTL documented, some of them show similarities in locations for the same or similar traits. As an example, take flowering traits (days to heading, flowering and anthesis). The QTL close to CDO1081 on rice chromosome 3 coincides with a similar QTL on the homoeologous chromosomes 1 (Stuber et al., 1992) and 9 (Koester et al., 1993; Veldboom et al., 1994) of maize, chromosome 4 of barley (Hayes et al., 1993) and chromosome 5 of hexaploid oat (Siripoonwiwat, 1995). Across 15 maize populations studied by seven groups of researchers, 55 QTL or mutants affecting flowering time were reported. A total of 26 (47%) are clustered in five regions that span 12.1% of the maize genome. One flowering QTL reported in sorghum (Pereira et al., 1994), three QTL in rice (Li et al., 1995), three discrete mutants in wheat (Hart et al., 1993) and one in barley (Laurie et al., 1994) were included. Such a coincidence of QTL map positions among these distinct species

suggests that this kind of locus can be traced back to the last common ancestor of these species. Paterson et al. (1995) indicated that in sorghum, rice and maize, three similar phenotypes (seed size, disarticulation of the mature inflorescence and day-neutral flowering) are largely determined by a small number of QTL that correspond closely in the three taxa, which impels the comparative mapping of complex phenotypes across large evolutionary distances.

Further studies identified QTL controlling important agronomic traits that showed similarities in locations for the same or similar traits (e.g. Fatokun et al., 1992; Lin et al., 1995; Xiao et al., 1996c; for a review, see Xu, 1997). Shattering and plant height are examples that were also mapped to collinear regions among grass genomes (Paterson et al., 1995; Peng et al., 1999). Chen et al. (2003) identified four QTL for quantitative resistance to rice blast that showed corresponding map positions between rice and barley, two of which had completely conserved isolate specificity and the other two had partially conserved isolate specificity. Such corresponding locations and conserved specificity suggested a common origin and conserved functionality of the genes underlying the QTL for quantitative resistance.

In forest trees, a comparative genetic and QTL mapping was performed between Quercus robur L. and Castanea sativa Mill., two major forest tree species belonging to the Fagaceae family (Casasoli et al., 2006). Oak EST-derived markers (sequence tagged sites, STSs) were used to align the 12 linkage groups of the two species. Fifty-one and 45 STSs were mapped in oak and chestnut, respectively. These STSs, added to SSR markers previously mapped in both species, provided a total number of 55 orthologous molecular markers for comparative mapping within the Fagaceae family. Homologous genomic regions identified between oak and chestnut allowed comparison of QTL positions for three important adaptive traits. Co-location of the QTL controlling the timing of bud burst was significant between the two species. However, conservation of QTL for height growth was not supported by statistical tests. No QTL for carbon isotope discrimination was conserved between the two species. Putative candidate genes for bud burst can be identified on the basis of co-locations between EST-derived markers and QTL.

Schaeffer et al. (2006) reported a strategy for consensus QTL maps that leverages the highly curated data in MaizeGDB, in particular, the numerous QTL studies and maps that are integrated with other genome data on a common coordinate system. In addition, they exploited a systematic QTL nomenclature and a hierarchical categorization of over 400 maize traits developed in the mid-1990s; the main nodes of the hierarchy are aligned with the trait ontology at Gramene, a comparative mapping database for cereals. Consensus maps are presented for one trait category, insect response (80 QTL); and two traits, grain yield (71 QTL) and kernel weight (113 QTL), representing over 20 separate QTL map sets of ten chromosomes each.

The use of anchor markers has enabled detection of possible orthologous QTL by comparing QTL across cereals or construction of phylogenetic relationships. Although it is unclear how many claimed orthologous QTL are real, detection of QTL that are common across cereals at least indicates that the same QTL could be identified from very different genetic backgrounds.

In a significant across-species study, Campbell et al. (2007) identified a set of evolutionarily conserved and lineage-specific rice genes, which are termed conserved Poaceae-specific genes (CPSGs), reflecting the presence of significant sequence similarity across three separate Poaceae subfamilies. Using the rice genome annotation, along with genomic sequence and clustered transcript assemblies from 184 species in the plant kingdom, they have identified a set of 861 rice genes that are evolutionarily conserved among six diverse species within the Poaceae yet lack significant sequence similarity with plant species outside the Poaceae. It was interesting to note that the vast majority of rice CPSGs (86.6%) encode proteins with no putative function or functionally characterized protein domain and for the remaining CPSGs, 8.8% encode

an F-box domain-containing protein and 4.5% encode a protein with a putative function. On average, the CPSGs have fewer exons, shorter total gene length and elevated GC content when compared with genes annotated as either transposable elements (TEs) or those genes having significant sequence similarity in a species outside the Poaceae. At the genome level, syntenic alignments between sorghum (Sorghum bicolor) and 103 of the 861 rice CPSGs (12.0%) could be made, demonstrating an additional level of conservation for this set of genes within the Poaceae.

7.4 QTL across Genetic Backgrounds

Phenotypic expression of quantitative traits is affected to a great extent by internal genetic background. That is at least in part because gene action is not always independent between QTL controlling different traits and between QTL and the corresponding major genes or other major genes. Difficulty in obtaining consistent results from different QTL mapping experiments for the same trait can be partly attributed to the action of the heterogeneous genetic background from which mapping populations are derived, besides the reasons mentioned in Section 7.1.2.

7.4.1 Homogeneous genetic backgrounds

Populations developed for QTL analysis can be very heterogeneous in genetic backgrounds, with hundreds or thousands of genes segregating simultaneously, or very homogeneous with only a target gene segregating. Homogeneous or isogenic backgrounds such as NILs can be created through the five approaches as described in Chapter 4 and Xu, Y. (2002). Tanksley and Nelson (1996) proposed an advanced BC QTL analysis in which a hybrid F1 is backcrossed to the recurrent parent several times to obtain advanced generations (e.g. BC2, BC3). If additional BC generations proceed with whole-genome selection, this approach can be used to create QTL-NILs. The uniformity of genetic background among QTL-NILs and their donors should permit straightforward phenotypic evaluation and facilitate QTL mapping.

In maize, a set of 89 NILs was created using marker-assisted selection (MAS) (Szalma et al., 2007). Nineteen genomic regions, identified by RFLP loci and chosen to represent portions of all ten maize chromosomes, were introgressed by backcrossing three generations from donor line Tx303 into the B73 genetic background. NILs were genotyped at an additional 128 SSR loci to estimate the size of introgression and the amount of background introgression. Tx303 introgressions ranged from 10 to 150 cM in size with an average of 60 cM. Across all NILs, 89% of the Tx303 genome is represented in targeted and background introgressions. A parallel experiment of testcrosses of each NIL to the unrelated inbred, Mo17, was conducted in the same environments to map QTL in NIL testcross hybrids.

In Arabidopsis, a genome-wide coverage NIL population was developed by introgressing genomic regions from the Cape Verde Islands (Cvi) accession into the Landsberg erecta (Ler) genetic background (Keurantjes et al., 2007b). QTL mapping power of the new population was empirically compared with an already existing RIL population derived from the same parents. For that, QTL affecting six developmental traits with different heritability were analysed and mapped. Overall, smaller-effect QTL could be detected in the NIL population than in the RIL population, although the location resolution was lower. Furthermore, the effect of population size and of the number of replicates on the detection power of QTL affecting the developmental traits was estimated. In general, population size is more important than the number of replicates to increase the mapping power of RILs, whereas for NILs several replicates are absolutely required. These analyses are expected to facilitate experimental design for QTL mapping

using these two common types of segregating populations.

QTL have been fine mapped by applying a mapping strategy based on analysis of large progenies derived from NILs. This approach requires the construction of highly inbred lines involving many generations prior to generating the cross needed for fine mapping. Instead of homogenizing the complete genetic background, as in the NIL approach, Peleman et al. (2005) have chosen to focus specifically on the loci involved in expression of the phenotype. The strategy involved simultaneous fine mapping of QTL already at the F2 stage rather than producing inbred lines prior to fine mapping. The main principle of the approach is the selective genotyping and phenotyping of only those plants that yield information on the map position of the QTL. Such plants are selected after a first rough-scale mapping by standard methods (e.g. 200 F2 individuals). After identification of the QTL for the trait of interest, a larger part of the population (e.g. 1000 F2 plants) is screened with markers flanking the QTL to identify sets of QTL isogenic recombinants (QIRs). QIR plants carrying a recombination event in one QTL while they are homozygous at all other QTL are most informative. The trait complexity can thus be reduced to a monogenic trait as plants with all but one QTL having an identical homozygous genotype are selected. These QIRs are subsequently genotyped with sufficient markers at the recombinant QTL region to precisely map the recombinant event within the QTL-bearing interval. Phenotyping other QIRs becomes more reliable by reducing the trait complexity as these plants are nearly isogenic for all QTL that affect the trait. Peleman et al. (2005) demonstrated that for fine mapping oligogenic traits, homogenizing the background genome is not required. The method was demonstrated by fine mapping a QTL responsible for erucic acid content in rapeseed. For quantitative traits that are controlled by many QTL each with relatively small effects, progeny test and background selection with markers to cover the whole genome would be required in order to minimize the genetic background effect and to confirm the phenotype for the recombinants selected.

7.4.2 Heterogeneous genetic backgrounds

Although genetic distances and order of DNA markers are comparable among very different crosses, QTL mapping using different populations derived from the same cross has identified very different QTL. Only some QTL are common across populations of different structures, such as DHs and RILs derived from a single cross (He et al., 2001) where there is an identical set of genes segregating. Heterogeneous genetic backgrounds can also come from various crosses derived from different cultivars. Genetic materials with heterogeneous genetic backgrounds can be used to estimate epistasis, detect non-allelic QTL and discover multiple alleles. QTL mapping for the same traits using different populations can be illustrated using seed dormancy in barley as an example, where QTL were compared across seven RIL populations and one DH population derived from crosses including 11 cultivated strains and one wild barley strain showing the wide range of seed dormancy levels (Hori et al., 2007). Linkage maps were constructed based on EST, SSR, RFLP and morphological markers, each map consisting of 82–1114 markers (Table 7.2). Using these populations, a total of 38 QTL clustered around 11 regions were identified on the barley chromosomes except chromosome 2H among eight populations. The QTL at the centromeric region of the long arm of chromosome 5H was identified in all populations with different degrees of dormancy depth and period (Fig. 7.3).

Considering several populations derived from diverse parental materials increases the probability that a QTL will be polymorphic in at least one population. To go beyond comparison of results between populations, some authors have proposed jointly analysing the different populations. This can be done first for independent

Table 7.2. Summary of linkage map information for eight permanent mapping populations in barley (from Hori et al. (2007) with kind permission of Springer Science and Business Media).

Population; number of markers (morphological, EST, SSR, RFLP, AFLP, total); total map length (cM)

Haruna Nijo × H602 DH (DHHS) 4 1055 35 16 1110 1362.7
Russia 6 × H.E.S.4 (RHI) 3 75 34 3 1134 1249 1595.7
Mokusekko 3 × Ko A (RIA) 2 102 1 105 1233.5
Harbin 2-row × Khanaqin 7 (RI1) 4 81 85 1217.0
Harbin 2-row × Turkey (RI2) 4 29 45 1 328 407 1377.1
Harbin 2-row × Turkey 45 (RI3) 2 80 82 1103.8
Harbin 2-row × Katana (RI4) 4 90 94 1208.2
Harbin 2-row × Khanaqin 1 (RI5) 2 76 32 110 1078.3

Fig. 7.3. A consensus barley linkage map based on eight mapping populations (for codes see Table 7.2)
and positions of QTL for seed dormancy at 5 and 10 weeks (5w and 10w, respectively) after ripening
in 2003 and 2005. Linkage groups are oriented with short arms at the top. The anchor loci, including
SSR, RFLP and morphological markers, are underlined. QTL positions are indicated by grey
boxes. Peaks of the significant marker intervals are indicated by triangles in the boxes. Only chromosome 5H
is included, which shows a large-effect QTL near the centromere on the long arm in all of the populations.
From Hori et al. (2007) with kind permission of Springer Science and Business Media.
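
The remark that several populations from diverse parents raise the chance of a QTL being polymorphic in at least one of them can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that each population segregates at the QTL with the same probability and independently of the others; neither assumption comes from the text, and populations that share a parent (such as RI1 to RI5 in Table 7.2) are not independent. The function name and the probability value are hypothetical.

```python
def prob_polymorphic_in_any(p_single: float, n_populations: int) -> float:
    """Probability that a QTL segregates in at least one of n populations.

    Illustrative assumptions (not from the text): every population has the
    same probability p_single of being polymorphic at the QTL, and the
    populations are independent of one another.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie between 0 and 1")
    if n_populations < 0:
        raise ValueError("n_populations must be non-negative")
    # Complement of "monomorphic in every population".
    return 1.0 - (1.0 - p_single) ** n_populations


if __name__ == "__main__":
    # Hypothetical per-population probability of 0.3.
    for k in (1, 2, 4, 8):
        print(k, round(prob_polymorphic_in_any(0.3, k), 3))
```

Under these assumptions the probability rises from 0.3 with one population to about 0.94 with eight, which is the intuition behind the multi-population designs discussed in the text.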

populations (no known pedigree relationship between the parents of the different populations)
(Muranty, 1996; Xu, 1998). In this case, QTL effects are nested (in the statistical sense) within
populations and the number of parameters to be estimated increases with the number of populations.
Also, the lack of connections between populations does not allow global comparison of the effects of
all QTL alleles segregating in the different populations. An alternative approach is therefore to
develop connected populations (common parents among populations). Under the assumption of
additivity, considering identical allelic effects over populations rather than nesting effects within
populations reduces the total number of parameters and consequently increases the power of QTL
detection (Rebai and Goffinet, 1993; Jannink and Jansen, 2001). In such an analysis, the effects of
the segregating alleles are estimated simultaneously, which facilitates a global comparison. This is
of particular interest for identifying the parental origin(s) of favourable allele(s) at each QTL.

QTL for six quality traits in tomato (fruit weight, firmness, locule number, soluble solid content,
sugar content and titratable acidity) were studied in order to investigate their individual effects
and their stability over years, generations and genetic backgrounds (Chaïb et al., 2006). Three sets
of genotypes corresponding to three generations were compared: (i) an RIL population containing 50%
of each parental genome; (ii) three BC3S1 populations segregating simultaneously for the five regions
carrying fruit quality QTL, but almost fully homozygous for the recipient genome on the eight
chromosomes carrying no QTL; and (iii) three sets of QTL-NILs (BC3S3 lines) which differed from the
recipient line only in one of the five chromosome regions. Eight of the ten QTL detected in RILs were
recovered in the QTL-NILs with the genetic background used for the initial QTL mapping experiment,
with the exception of two QTL for fruit firmness. Several new QTL were detected. In the two other
genetic backgrounds, the number of QTL in common with the RILs was lower, but several new QTL were
also detected in advanced generations.

7.4.3 Epistasis

Importance of epistasis

The importance of epistasis to the genetic control of quantitative traits has been debated. In early
support of the importance of epistasis, Eshed and Zamir (1996) reported that QTL epistasis is a
significant component in determining phenotypic values using tomato NILs. For the five
yield-associated traits, 20–40% of the 45 dichromosome segment combinations were epistatic, which is
much higher than would be expected by chance alone. The detected epistasis was predominantly
less-than-additive, i.e. the effect of double heterozygotes was smaller than the sum of the effects
of the corresponding single heterozygotes. Several other studies showed that the epistatic variance
can account for a large proportion of the genetic variance of quantitative traits (Carlborg et al.,
2005; Malmberg and Mauricio, 2005; Malmberg et al., 2005). Epistatic interaction among loci could
contribute substantially to the variation in complex traits (Carlborg and Haley, 2004; Marchini
et al., 2005).

In contrast, after a review of most studies conducted at that time, Tanksley (1993) suggested that
strong epistatic interactions are the exception and not the rule for naturally occurring polygenes.
These conclusions are supported to some extent by the few studies in which individual QTL have been
genetically identified by introgression from other QTL in NILs and have been shown to continue
producing their same individual effects (De Vicente and Tanksley, 1993; Eshed and Zamir, 1995) and
also by a recent report on negligible interaction using a barley DH population (Harrington × TR306)
developed by the North American Barley Genome Mapping Project for QTL mapping, which consisted of
145 lines and 127 markers covering a total genome length of 1270 cM. These DH lines were evaluated
in 25 environments for seven quantitative
traits: heading, height, kernel weight, lodging, maturity, test weight and yield. Xu and Jia (2007)
applied an empirical Bayes method that simultaneously estimates 127 main effects for all markers and
127(127 − 1)/2 = 8001 interaction effects for all marker pairs. The largest main-effect QTL (single
marker) and the largest epistatic effect (single pair of markers) explained 18 and 2.6% of the
phenotypic variance, respectively. On average, the sum of all significant main effects and the sum of
all significant epistatic effects contributed 35 and 6% of the total phenotypic variance,
respectively. Epistasis seems to be negligible for all seven traits. They also found that whether two
loci interact does not depend on whether or not the loci have individual main effects. This
invalidates the common practice of epistatic analysis in which epistatic effects are estimated only
for pairs of loci of which both have main effects.

The contradicting reports may result from the fact that QTL mapping studies and analytical methods
have not been able to detect epistasis and thus the conclusions could be biased, preferentially
identifying genes that have large effects and/or act independently (Xu, 1997). This argument is
supported by the results that QTL with large effects are detected in very different crosses and
environments. The second reason is that ordinary QTL analysis was made with populations segregating
for the whole genome simultaneously, so that it may be difficult to detect an interaction in a
specific combination of QTL genotypes. For example, Yano et al. (1997) predicted an interaction
between the two largest QTL, Hd1 and Hd2, for heading date. But the existence of another QTL, Hd6,
and its interaction could not be detected in their primary population (F2), where many epistatic
interactions could exist in so-called minor QTL. Successful examples for detection of epistatic
interactions by using primary populations seem to be related to population sizes and structures,
quantitative traits, the number of existing QTL and QTL effects. The more QTL involved, the more
difficult is the detection of significant differences for individual QTL. Although using a large
population size may help to detect epistatic interactions, it increases experimental errors, due to
increasing difficulty in managing such a population effectively.

Statistical methods for epistatic QTL

Methods for mapping QTL with epistatic effects are still immature. Some methods utilize models
including a single epistatic effect at a time (Holland, 1998; Malmberg et al., 2005), while others
apply a model selection strategy that searches for multiple epistatic effects (Carlborg et al., 2000;
Yi et al., 2003, 2005; Baierl et al., 2006). Xu (2007) developed an empirical Bayes method that can
simultaneously estimate main effects of all individual markers and epistatic effects of all pairs of
markers. Recently, epistatic QTL analysis has been extended to the genome-wide level. For example,
Yi et al. (2007) proposed a Bayesian model selection approach for mapping genome-wide interacting
QTL for ordinal traits in experimental crosses. Stich et al. (2007) examined a genome-wide QTL
mapping strategy using genome sequence information of RILs that were generated from several crosses
of parental inbreds. The SNP haplotype data of B73 and 25 diverse maize inbreds were used to simulate
the production of various RIL populations. Higher power to detect three-way interactions was
observed for RILs derived from optimally allocated distance-based designs than from nested designs
or diallel designs. The power and proportion of false positives to detect three-way interactions
using a nested design with 5000 RILs were, for both the 4-QTL and 12-QTL scenarios, of a magnitude
that seems promising for their identification. To find an optimal model for the epistatic effect,
Bayesian model selection (George and McCulloch, 1993), by taking advantage of Markov chain Monte
Carlo (MCMC) sampling, is a more efficient algorithm than both the exhaustive and the heuristic
searches. The simulation experiments conducted by Xu (2007) showed that the MCMC-based methods
performed satisfactorily when the sample size was 600. The empirical Bayes method is more robust to
small sample size than the MCMC-based full Bayes methods. Considering that most
QTL studies reported so far have sample sizes smaller, or much smaller, than 600, a larger mapping
population has to be created in order to use the MCMC-based full Bayes methods developed so far for
QTL mapping where epistatic effects are involved.

Population strategies for epistatic QTL studies

QTL interaction has been analysed using different types of plant materials including a series of
chromosomal substitution lines or QTL-NILs. If NILs are used, interaction between the target QTL and
other major genes/QTL can be eliminated and only epistasis between multiple target QTL needs to be
considered. With removal of noise from heterogeneous backgrounds, the proportion of variance
explained by the target QTL will increase and minor QTL can be identified.

Epistatic interactions between QTL and the genetic background can be addressed using connected
designs of multiple populations, provided the mating design contains loops (in the simplest case,
three populations derived from three parents: A × B, B × C and A × C). In such designs, epistasis can
be tested through the comparison between: (i) a connected additive model where the allele effects at
a QTL are assumed to be identical in the different populations; and (ii) a hierarchical model where
allele effects are nested within populations, which accounts for possible interactions with the
genetic background. Such an analysis tests for consistency of allelic effects over populations and
therefore permits evaluation of the contribution of QTL-by-genetic-background epistatic effects to
variation in QTL results observed among populations, relative to that of other factors such as
allelic relationships between parental inbreds and statistical noise. Tests for epistasis in
connected designs following this principle have been proposed by several authors (Rebai et al., 1994;
Charcosset et al., 1994; Jannink and Jansen, 2000, 2001). One of the advantages of these tests, when
compared to testing only for digenic interactions, is to enable the detection of epistatic
interactions of higher order (Charcosset et al., 1994). The statistical properties of
QTL-by-genetic-background interaction tests in the case of a single digenic interaction have been
analysed by means of simulations (Jannink and Jansen, 2001). The results showed that it was possible
to identify the two QTL involved by using an appropriate statistical test, and the authors also
proposed guidelines for the interpretation of the sign of the QTL-by-genetic-background interaction
effects. For more complex situations, the results are less predictable. Several digenic epistatic
interactions that involve a given QTL may add up if similar in sign, yielding a significant
interaction with the genetic background even though none of them is individually significant. They
may also cancel each other out if opposite in sign and lead to no detectable interaction with the
genetic background. It is therefore interesting to compare both types of interactions.

Blanc et al. (2006) presented results from six connected F2 populations of 150 F2:3 families each,
derived from four maize inbreds and evaluated for three traits of agronomic interest using the MCQTL
software (Jourjon et al., 2005). This software permits the joint analysis of multiple populations
using a composite interval mapping method based on a linearized regression model (Haley and Knott,
1992; Charcosset et al., 2000). They first detected QTL in each population independently
(single-population analysis), secondly on the whole design without taking into account connections
(multi-population disconnected analysis), and then on the global design using connections
(multi-population connected analysis). Lastly, they tested for digenic interactions and for
locus-by-genetic-background interactions, estimated the contribution of epistasis to the variation
of the traits studied and checked if epistatic interactions could explain discrepancies among the
analyses. The joint estimation of the different parental allele effects in a connected model allowed
them to identify, for each QTL, the parental inbred line(s) that carried the most interesting
allele(s). Taking into account the connections between populations increased the number of QTL
detected and the accuracy of QTL position estimates. Many epistatic interactions were
detected, particularly for grain yield QTL (R² increase of 9.6%). Allelic relationships and epistasis
both contribute to the lack of consistency for QTL positions observed among populations, in addition
to the limited power of the tests.

Melchinger et al. (2008) derived quantitative genetic expectations of QTL effects obtained from
one-dimensional genome scans with the triple testcross (TTC) design and pairwise interactions
between marker loci using two-way analyses of variance (ANOVA) under the F2- and the F∞-metric
model. It was demonstrated that the TTC design can partially overcome the limitations of the design
III in separating QTL main effects and their epistatic interactions in the analysis of heterosis, and
that dominance × additive epistatic interactions of individual QTL with the genetic background can be
estimated with a one-dimensional genome scan.

7.4.4 Multiple alleles at a locus

Two-parent derived populations in diploid crops have only two alleles segregating at each locus.
Identification of multiple alleles requires comparison of populations derived from different crosses.
To distinguish QTL alleles identified in one cross from those in another, all mapped alleles must be
accurately sized and documented.

Rice amylose content, mainly controlled by the wx gene, can be taken as an example of multiple
alleles at a locus. Wide variation in amylose content occurs, and cultivars with different amylose
content, varying from waxy (0–2%), very low (3–9%), low (10–19%) and intermediate (20–25%) to high
(>25%), have been selected in breeding programmes. Conventional genetic studies using cultivars with
different amylose contents revealed transgressive segregation in F2s in almost all possible parental
combinations (Pooni et al., 1993). A polymorphic microsatellite was identified in the wx gene (Bligh
et al., 1995) located 55 bp upstream of the putative 5'-leader intron splice site. Ayres et al. (1997)
determined the relationship between polymorphism at that locus and variation in amylose content.
Eight wx microsatellite alleles were identified from 92 long-, medium- and short-grain US rice
cultivars, which explained 85.9% of the variation. The amplified products ranged from 103 to 127 bp
in length and contained (CT)n repeats, where n ranged from 8 to 20. Average amylose content in
cultivars with different alleles varied from 14.9 to 25.2%. Using more diverse rice germplasm
accessions (n = 243), Zeng et al. (2000) identified 15 alleles at the wx locus, using microsatellite
class and G-T polymorphism, resulting in a total of 16 alleles identified so far. Now the question is
whether the multiple alleles identified at the waxy locus can be associated with QTL alleles and
whether the case can be extended to other traits or genetic loci.

Using molecular markers with multiple alleles in QTL mapping will help identify multiple QTL
alleles. QTL studies using different populations have identified some common QTL. It is necessary,
however, to further clarify whether they identified common or different alleles at those QTL.
Reporting the sizes of associated alleles and using allele-rich markers in QTL studies will provide
information required for this clarification, with the assumption that each marker allele has a
corresponding QTL allele.

7.5 QTL across Growth and Developmental Stages

Classical breeding methods rely heavily on end-point measurements of agricultural productivity,
which are influenced by different parameters and consequently different genes, in different
environments. If more specific measures of agricultural productivity can be identified, for example,
physical or chemical properties of the plant which relate directly to productivity under a
particular environmental stress, it will be much more feasible to identify the underlying genes by
mapping. However,

Plate 1. Circle diagram for comparative genomics in cereals. (From Gale and Devos (1998) © 1998 National
Academy of Sciences, USA.)

Plate 2. Disequilibrium matrix for polymorphic sites within sh1. Polymorphic sites are plotted on both the x-axis
and the y-axis. The pair-wise calculation of linkage disequilibrium (LD) (r²) is displayed above the diagonal with
the corresponding P-values for Fisher's exact test displayed below the diagonal. Coloration indicates the
corresponding P-value or r² value according to the scale bars on the right. Notice that some blocks of LD do
persist over larger distances within the gene, which do not necessarily correspond to tight linkage.
(Reproduced with permission of Annual Reviews Inc., from Flint-Garcia et al. (2003); permission conveyed
through Copyright Clearance Center, Inc.)

Plate 3. Ac/Ds tagging system in plants.

Plate 4. Carotenoid enhancement of the rice endosperm by transformation with psy orthologues and crtI.
(a) Schematic diagram of the T-DNAs used to generate transgenic rice plants. The T-DNAs comprise the rice
glutelin promoter (Glu) and the first intron of the catalase gene from castor bean (I), E. uredovora crtI functionally
fused to the pea RUBISCO chloroplast transit peptide (SSUcrtI) and a phytoene synthase from each of five plant
species (psy), with a nos terminator, as well as a selectable marker cassette comprising the maize polyubiquitin
(Ubi1) promoter with intron, hygromycin resistance (hpt) and nos terminator. (b) Photograph of polished wild-type
and transgenic rice grains containing the T-DNA (as above) with the daffodil psy (Np) or maize psy (Zm) showing
altered colour due to carotenoid accumulation. (c) Schematic diagram of the T-DNA in pSYN12424 used to
develop Golden Rice 2. pmi is the maize phosphomannose isomerase gene. (From Paine et al. (2005) reprinted by
permission from Macmillan Publishers Ltd.)

measures of agricultural productivity usually reflect the effects of many genes, acting at different
times during the lengthy period of growth and development of the organism. Genetic expression of
quantitative traits associates closely with development stages and may have one specific stage which
is most suitable for identification (Xu, 1997). For any biochemical process that takes place in
growth, possibly much less than 0.1% of the operations involved in the process would lead to the
activity and the final output of a plant cell (Peterson, 1992). Therefore, developmental research on
quantitative traits is important not only for quantitative genetic analysis by conventional methods,
but also for determination of the best stage for identification by molecular approaches. In rice, for
example, genetic analysis of tiller number was made at different growth stages and differential
phenotypic expression was found, indicating that evaluation and selection of tiller number should be
at the peak-tillering stage (Xu and Shen, 1991). At this stage, phenotypic differences in tiller
number among genetic materials were maximized and suitable for distinguishing different genotypes.
In general, intensive research in developmental genetics of quantitative traits is fundamental for
QTL mapping and cloning but limited work has been done in most crops.

7.5.1 Dynamic traits

Quantitative traits that can be measured repeatedly during the development of life are called
longitudinal traits in humans, but more often called dynamic traits in animals and plants. Some genes
control the phenotypic values of the dynamic traits at fixed time points and others may alter the
transitions of the phenotypes between consecutive time points. The growth pattern of a dynamic trait
is called the growth trajectory (Yang and Xu, 2007). On the other hand, the normal functions of
biological processes are strongly correlated with the genes that control them. Several studies in
animal models have identified various so-called circadian clock genes and clock-controlled
transcription factors through mutants. Circadian rhythms in plants are poorly understood and they can
be treated as complex dynamic traits.

Some dynamic traits do not fit a normal distribution as well as a typical quantitative trait because
of their extremely non-linear nature. These traits are best characterized in terms of sudden
transitions to qualitatively distinct phenotypic states as opposed to quantitative extensions of
previous states, and are called time-to-event or time-to-failure traits. The most notorious of such
traits is probably the death of the organism, and many other traits such as flowering can be
interpreted in this way. In a typical time-to-event (or time-to-failure) experiment one follows a
sample over time and records the times (e.g. hours, days) at which the event occurs to a given
individual. The resulting phenotypic distribution is usually right-skewed.

7.5.2 Dynamic mapping

Many agriculturally and biomedically important traits undergo predictable changes in genetic time.
These changes are in part driven by the temporal regulation of the genes or QTL underlying these
phenotypes. To understand genetic expression at different developmental stages, dynamic mapping has
been proposed (Xu, 1994, 1997; Xu and Zhu, 1994). Xu, Y. (2002) summarized three approaches to dynamic
analysis or time-related mapping using phenotypic data collected across developmental stages. One is
based on analysis of trait values measured at each observation time (e.g. Bradshaw and Stettler,
1995; Plomion et al., 1996; Price and Tomos, 1997; Verhaegen et al., 1997), from which the accumulated
effect of a QTL, from the beginning of ontogenesis to each observation time, can be estimated. This
is called effect-accumulation analysis or unconditional QTL mapping (Yan et al., 1998a). The second
approach is to analyse trait-value increments observed at sequential
time intervals (e.g. Bradshaw and Stettler, 1995; Plomion et al., 1996; Verhaegen et al., 1997), from
which the incremental or net effect of a QTL at each time interval can be estimated. This is called
effect-increment analysis or conditional QTL mapping (Yan et al., 1998a). Phenotypic data collected at
different growth stages or time intervals can be analysed either separately or jointly. Compared to
separate analysis, joint analysis can synthesize all the information from different times or time
intervals to give a comprehensive estimate of each QTL position, according to which a corresponding
complete expression (or expression rate) curve of each QTL can be estimated (Wu et al., 1999). In
practice, both separate and joint analyses should be conducted. A third approach to looking at the
QTL over time is to do a multivariate analysis based on fitting the parameters of the growth curve
(animal breeders call this general approach random regression).

The significant advantage of dynamic mapping is that it provides a quantitative framework for
testing the interplay between genetic (inter)actions and the pattern of development in a time
course. Dynamic mapping constructs a setting for precisely estimating and predicting a number of
fundamental events in the genetic control of development (Wu et al., 2004), which include: (i) the
timing at which a QTL turns on and off to affect growth in a time course; (ii) the duration of the
dynamic genetic effect of a QTL; (iii) the magnitude of the genetic effect of a QTL on maximal growth
rate; and (iv) the pleiotropic effect of the growth QTL on other developmental traits related to
growth processes.

In general, there are four types of QTL effects in time-to-event experiments (as modified from Wu
and Lin (2006) and Johannes (2007)). These include: (i) early-acting: QTL expressed at the early stage
of the developmental process but not during the rest of the process; (ii) late-acting: QTL expressed
only at the late stage of the developmental process; (iii) inversely acting: QTL highly expressed at
the early stage but with low expression at the late stage, or vice versa; and (iv) proportionally
acting: QTL either expressed with a proportionally increased or decreased rate or with a consistent
rate.

As an early example in dynamic QTL mapping, Yan et al. (1998a,b) used rice IR64/Azucena DHs to study
the developmental characteristics of QTL for tiller numbers and plant height by conditional and
unconditional interval mapping, in combination with phenotyping these traits every 10 days after
transplanting. They concluded that many QTL identified at the early stages were undetectable at the
final stage. Conditional mapping identified more QTL than unconditional mapping. Temporal patterns
of gene expression changed with developmental stages. Genes at a specific genomic region might have
opposite genetic effects at various growth stages. For chromosomal regions significantly associated
with plant height, conditional QTL were found only at one to several specific periods and no QTL for
plant height was continually active during the entire period of growth.

7.5.3 Statistical methods for dynamic mapping

Several dynamic mapping methods developed (Ma et al., 2002; Wu, W. et al., 2002; Wu et al., 2004; Wu
and Lin, 2006) have made it possible to test interesting hypotheses about the quantitative genetic
control of the rate of change in the phenotype as well as the time specificity of the genetic
effects. To be informative, these latter methods require that a measured trait value can be obtained
from the same individual at different time points and that the phenotype can be described as a
process unfolding along a continuous trajectory.

The biological and statistical advantages of dynamic mapping result from joint modelling of the
mean-covariance structures for developmental trajectories of a complex trait measured at a series of
time points. While an increased number of time points can better describe the dynamic
pattern of trait development, significant dif- same time, the EC model recovered all of
ficulties in performing dynamic mapping the QTL the CPH model detects. It was con-
arise from prohibitive computational times cluded that potentially important QTL may
required as well as from modelling the be missed if their time-dependent effects
structure of a high-dimensional covariance are not accounted for.
matrix. An efficient approach for applying The most cumbersome issue in multi-
dynamic mapping to high-dimensional data ple QTL mapping for dynamic traits is how
is through dimensional reduction, i.e. the to determine the optimal number of QTL.
transformation that brings data from a high- To do this, variable selection via stepwise
to low-order dimension. Zhao et al. (2007) regression is commonly used in maxi-
developed a statistical model for dynamic mum-likelihood mapping. Reversely-jump
mapping of QTL that govern the develop- Markov chain Monte Carlo (RJ-MCMC) is
mental process of a quantitative trait on the the corresponding variable selection proce-
basis of wavelet dimension reduction. By dure used in Bayesian analysis. However,
breaking an original signal down into a spec- RJ-MCMC is shown to be subjected to poor
trum by taking its averages (smooth coeffi- mixing and slow convergence to the sta-
cients) and difference (detail coefficients), tionary distribution. Variable selection by
they used the discrete Haar wavelet shrink- Bayesian shrinkage analysis and stochastic
age technique to transform an inherently search are more efficient than RJ-MCMC
high-dimensional biological problem into (reviewed in Yang and Xu, 2007). In these
its tractable low-dimensional representation methods, no variable selection is conducted
within the framework of dynamic mapping in an explicit manner; rather, a treatment
constructed by a Gaussian mixture model. similar to variable selection is made implic-
The wavelet-based parametric dynamic itly by shrinking the effects of excessive
mapping holds great promise as a powerful statistical tool to unravel the genetic machinery of developmental trajectories with large-scale high-dimensional data.

To be informative, methods based on the test of time specificity of the genetic effects require that a measured trait value can be obtained from the same individual at different time points and that the phenotype can be described as a process unfolding along a continuous trajectory. Johannes (2007) developed the idea of time-varying QTL effects in the context of time-to-event analysis. An extension of the Cox model (EC model) (Therneau and Grambsch, 2000) was applied to an interval-mapping framework. In its simplest form, this model assumes that the QTL effect changes at some time point t0 and follows a linear function before and after this change point. The approximate time point at which this change occurs is estimated. Using simulated and real data, the mapping performance of the EC model was compared to the Cox proportional hazards (CPH) model, which explicitly assumes a constant effect. The results showed that the EC model detects time-dependent QTL, which the CPH model fails to detect. At the

QTL to zero. Yang, R.Q. et al. (2006) developed an interval-mapping procedure to map QTL for dynamic traits under the maximum-likelihood framework. They fitted the growth trajectory by Legendre polynomials. The method was intended to map one QTL at a time and the entire QTL analysis involved scanning the entire genome by fitting multiple single-QTL models. Yang and Xu (2007) proposed a Bayesian shrinkage analysis for estimating and mapping multiple QTL in a single model. The method combines the shrinkage mapping for individual quantitative traits with the Legendre polynomial analysis, an extensively used linear growth model in animals, for dynamic traits. A simulation study showed that the method generated a much better signal for QTL than the interval-mapping approach.

Although various statistical methods have been developed to meet the requirements of dynamic mapping for different trait categories, the effectiveness and efficiency of these methods need further study. Application of these methods to QTL mapping needs the full support of user-friendly mapping software.
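The Legendre polynomial description of a trajectory mentioned above can be illustrated with a short sketch. It is not the procedure of Yang, R.Q. et al. (2006) or Yang and Xu (2007), whose parametrization is not given here. It only shows, under assumed coefficients, how a time-dependent QTL effect can be written as a weighted sum of Legendre polynomials after the measurement times are rescaled to the interval [-1, 1]. The function names and the coefficient values are hypothetical.

```python
def legendre(k, x):
    """Value of the Legendre polynomial P_k at x in [-1, 1] (Bonnet recurrence)."""
    if k == 0:
        return 1.0
    prev, curr = 1.0, x  # P_0 and P_1
    for n in range(1, k):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        prev, curr = curr, ((2 * n + 1) * x * curr - n * prev) / (n + 1)
    return curr


def rescale(t, t_min, t_max):
    """Map a measurement time onto [-1, 1], the domain of the polynomials."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def qtl_effect(t, coeffs, t_min, t_max):
    """Time-dependent effect: sum over k of coeffs[k] * P_k(rescaled t)."""
    x = rescale(t, t_min, t_max)
    return sum(b * legendre(k, x) for k, b in enumerate(coeffs))


# Hypothetical coefficients: constant, linear and quadratic components.
coeffs = [1.0, 0.5, -0.2]
times = [0, 2, 4, 6, 8, 10]
trajectory = [qtl_effect(t, coeffs, times[0], times[-1]) for t in times]
```

In a mapping analysis the coefficients would be estimated from the data at each tested genome position; a QTL whose higher-order coefficients are all zero has a constant effect over time.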

7.6 Multiple Traits and Gene Expression

Plant breeders manage numerous phenotypes simultaneously in order to develop a suitable breeding product for a suitable environment. Geneticists face the same challenge when handling many transcripts in genetic mapping for gene expression. In terms of complexity and variability, both the phenotypes handled by breeders and the transcripts handled by geneticists belong to the same category: multiple traits. However, discussion in this section will focus on gene expression. All issues discussed in this section can be found in the corresponding discussion on multiple quantitative traits in Chapter 1.

7.6.1 Features of gene expression

An emerging approach is to ask whether the parameters of gene activity at the level of transcription regarding additivity, heritability and complexity parallel those of classical phenotypic traits (Gibson and Weir, 2005). There are six features that characterize gene expression studies. First, there is now a reasonable expectation that for any tissue from any organism sampled under a particular set of environmental conditions, 10–50% of the transcripts will be found to vary as a result of heritable differences (Stamatoyannopoulos, 2004).

Secondly, numerous examples were observed of non-additivity of transcription including over- and under-dominance (F1 with higher or lower expression, respectively, than either parent), parent-of-origin, maternal and reciprocal F1 effects, indicating an unexpected complexity to the mapping of genotype on to the transcriptional phenotype. Similar results have been observed in studies targeting specific candidate genes in maize (Auger et al., 2005) and wheat (Sun et al., 2004) and in a massively parallel signature sequencing (MPSS) analysis of hybrid oysters (Hedgecock et al., 2002).

Thirdly, genetic complexity of transcript levels is reflected by the QTL number and effect size and also many forms of genetic complexity. Direct evidence for genetic complexity of transcript levels comes from detecting multiple QTL for at least some expression traits. Even the detected QTL typically explain only a minority of trait variation (Rockman and Kruglyak, 2006). In yeast, the median phenotypic effect of a detected QTL was 27% of genetic (inheritable) variance explained and only 23% of traits had a QTL that explained > 50% of the genetic variance (Brem and Kruglyak, 2005; Fig. 7.4). Whereas visible trait variation is often described by several QTL that collectively account for up to half of the genetic variance and individually rarely > 20% of it, eQTL accounting for 25–50% of transcriptional variation are prevalent (as summarized by Gibson and Weir, 2005). It is clear that major-effect QTL are more prevalent than many investigators would have expected.

Fourthly, transcriptional variation is probably highly polygenic. It is important to recognize that even in the cases where a major-effect eQTL explains half of the genetic variance for transcript abundance, the other half remains to be accounted for and in most cases will be caused by undetected loci. Because conservative thresholds of detection are required to adjust for the extraordinarily large number of comparisons involved in a genome-wide linkage scan for several thousand transcripts (the so-called multiple comparison problem), most true eQTL remain undetected. Based on a yeast data set studied by Brem and Kruglyak (2005), transcription is more often likely to be highly polygenic than monogenic: only 3% of highly heritable transcripts are consistent with single locus inheritance, 18% suggest control by two loci and > 50% require at least five loci under an additive model (Fig. 7.5). They also argue that more than half of the transcripts show transgressive segregation (transcript abundance in F2 progeny falls outside the range of both grandparents) and that > 15% are better explained by models that include epistatic interaction. Clearly, the landscape of gene expression in yeast is genetically complex and it can be expected that it will be, if anything, more complex in higher eukaryotes.
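The multiple comparison problem described above can be made concrete with a small calculation. The sketch below uses a Bonferroni correction, which is only one of several conservative adjustments and is not named in the studies cited here. It also treats every transcript-by-marker test as independent, which overstates the number of effective tests because linked markers are correlated. The function names and the numbers of transcripts and markers are hypothetical.

```python
def per_test_alpha(family_alpha, n_transcripts, n_markers):
    """Bonferroni per-test significance level for a genome-wide eQTL scan."""
    n_tests = n_transcripts * n_markers
    return family_alpha / n_tests


def expected_false_positives(nominal_alpha, n_transcripts, n_markers):
    """Expected number of false positives if every test used nominal_alpha."""
    return nominal_alpha * n_transcripts * n_markers


# Hypothetical scan: 5000 transcripts tested against 500 markers.
alpha = per_test_alpha(0.05, 5000, 500)               # about 2e-8 per test
false_hits = expected_false_positives(0.05, 5000, 500)  # about 125,000 tests
```

With a per-test threshold this stringent, eQTL of modest effect will rarely reach significance, which is consistent with the statement that most true eQTL remain undetected.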

[Figure 7.4. Upper plot: fraction of loci (y-axis, 0 to 0.30) against fraction of genetic variance (x-axis, 0 to 1.0). Lower panels: examples labelled Low (10%), Average (29%) and High (94%), each with columns Seg, BY, RM, Seg BY and Seg RM.]

Fig. 7.4. Most gene expression traits are affected by multiple loci. Each bar represents the fraction of
QTL that explain a percentage of genetic variance in the range on the x-axis. For each trait with significant
linkage(s), only the single most significant QTL is included. Data are derived from the first table in Brem
and Kruglyak (2005). The panels below the plot show examples of QTL that explain, from left to right,
low (10%), average (29%) and high (94%) percentages of genetic variance. In each panel, the left-most
column shows the relative expression of the corresponding genes in all 112 segregants (Seg), the next
two columns show the expression in replicates of the two parent strains (BY, RM), and the last two col-
umns show the expression in the segregants that inherit the QTL alleles for the first and second parent
strains (Seg BY, Seg RM). From Rockman and Kruglyak (2006) reprinted by permission from Macmillan
Publishers Ltd.

Fifthly, up to one-third of eQTL are cis-acting. If an eQTL is mapped to a genomic region where the expression trait (eTrait) gene is located, it may suggest a cis-regulatory mechanism for the eQTL, i.e. certain sequence variations around the gene region of the eTrait may directly influence the transcript abundance of the gene. In most of these cases, intuition suggests that the eQTL effect is likely to be caused by polymorphism in the regulatory region of the gene, namely sequence variation in the binding sites for transcription factors. Otherwise, mapping results may indicate trans-acting regulation, i.e. the variation of an eTrait is affected by sequence polymorphisms in other genes.

Sixthly, most eQTL studies detect eQTL hotspots that explain variation for multiple transcripts. The straightforward interpretation of these cases is that the eQTL identifies a regulatory gene that co-regulates as many as 25 downstream targets (but see de Koning and Haley (2005) and Pérez-Enciso (2004) for a more critical interpretation).

7.6.2 Examples of eQTL in plants

Dissection of the genetics underlying gene expression combines large-scale microarray analyses of expression profiles and conventional QTL mapping of the same segregating population. In this analysis the expression profiling is considered a quantitative phenotype affected by multiple

genes and environmental factors (Jansen and Nap, 2001). This approach has facilitated the identification of genomic regions or eQTL associated with transcript variation in co-regulated genes and, when correlated with phenotypic data from a quantitative character, has successfully identified candidate genes by co-localizing gene eQTL and trait QTL (Brem et al., 2002; Klose et al., 2002; Wayne and McIntyre, 2002; Schadt et al., 2003; Rockman and Kruglyak, 2006; Keurentjes et al., 2007b).

[Figure 7.5. Percentage of transcripts (y-axis, 0 to 100) against number of eQTL, n (x-axis, 1 to 10), with plotted series labelled Min > n, Max = n, Min < n + 1 and Min = n.]

Fig. 7.5. Inference of polygenic regulation from eQTL analysis. A plot of the data from Brem and Kruglyak (2005) showing the range of complexity of transcriptional regulation in yeast inferred by their likelihood analysis. The error bars indicate the range of the percentage of differentially expressed transcripts in the F2 segregation that are predicted to be regulated by n genes indicated on the x-axis. The large Xs indicate the minimum number of transcripts regulated by more than n eQTL: for example, at least 20% of transcripts are predicted to have more than ten eQTL. The circles place a lower limit on the number of transcripts regulated by up to n eQTL: for example, at least 10% of transcripts are regulated by four or fewer eQTL. Reprinted from Gibson and Weir (2005) with permission from Elsevier.

Plants exhibit massive changes in gene expression during morpho-physiological and reproductive development as well as when exposed to a range of biotic and abiotic stresses. These have been observed as differences in transcriptional profiles in many crops. Variation in transcript abundance is now being associated with gene expression using eQTL analysis in an increasing number of crops. For example, Kirst et al. (2004) dissected the genetic and metabolic network underlying variation in growth in an interspecific BC population of eucalyptus. QTL analysis of transcript levels of lignin-related genes showed that their mRNA abundance is regulated by two genetic loci, coordinating genetic control of lignin biosynthesis. These two loci co-localize with QTL for growth, suggesting that the same genomic regions are regulating growth and lignin content and composition. Using a high-density oligonucleotide array and phenotypically divergent rice accessions and their transgressive segregants, Hazen et al. (2005) measured the expression of approximately half of the genes in rice (21,000) to associate changes in stress-regulated gene expression with QTL for osmotic adjustment (OA), which is a known mechanism of drought tolerance. A total of 662 transcripts were observed to be expressed differentially between the parental lines. Only 12 genes were induced in the low OA parent (CT9993) at moderate dehydration stress levels while over 200 genes were induced in the high OA parent (IR62266). Sixty-nine genes were upregulated in all high OA lines and nine of those genes were not induced in any of the low OA lines, of which four could be annotated as follows: sucrose synthase, a pore protein, a heat shock protein and a late embryogenesis abundant (LEA) protein. Previous conventional QTL mapping using the same two rice accessions showed that the parental genotypes differed for five of the OA QTL, that two of these QTL are syntenic with other cereal drought stress QTL (Zhang et al., 2001) and a major OA QTL in the same genomic region on rice chromosome 7 is also reported in a different cross (Lilley et al., 1996). Of the 3954 probes that correspond to this part of the chromosome, few showed a differential expression pattern between the high and low OA lines. Thus, these preliminary results demonstrate the power of integrating quantitative analysis of gene expression data with genetic map information to identify genetic and metabolic networks that would not have been identified through conventional QTL analysis.
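The co-localization of eQTL with trait QTL described above, such as the lignin-related loci that coincide with growth QTL in eucalyptus, amounts to checking whether support intervals on the same chromosome overlap. The following is a minimal sketch of that check. The interval names, chromosomes and positions are invented for illustration and do not come from any of the cited studies, and real analyses would also weigh the width of the intervals and the chance of coincidental overlap.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A QTL support interval on a genetic map (positions in cM)."""
    name: str
    chrom: str
    start_cm: float
    end_cm: float


def overlaps(a, b):
    """True when two intervals lie on the same chromosome and share map positions."""
    return a.chrom == b.chrom and a.start_cm <= b.end_cm and b.start_cm <= a.end_cm


def colocalized(eqtl, trait_qtl):
    """All (eQTL name, trait QTL name) pairs whose intervals overlap."""
    return [(e.name, q.name) for e in eqtl for q in trait_qtl if overlaps(e, q)]


# Hypothetical intervals, for illustration only.
eqtl = [Interval("eQTL_A", "chr1", 10.0, 25.0), Interval("eQTL_B", "chr3", 40.0, 55.0)]
trait_qtl = [Interval("growth_1", "chr1", 20.0, 35.0), Interval("growth_2", "chr3", 60.0, 70.0)]
pairs = colocalized(eqtl, trait_qtl)
```

Here only eQTL_A and growth_1 overlap, so the gene behind eQTL_A would be the one carried forward as a candidate.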

Guo, M. et al. (2006) applied genome-wide transcript profiling to gain a global picture of the ways in which a large proportion of genes are expressed in the immature ear tissues of a series of 16 maize hybrids that vary in their degree of heterosis. Key observations include: (i) the proportion of allelic additively expressed genes is positively associated with hybrid yield and heterosis; (ii) the proportion of genes that exhibit a bias towards the expression level of the paternal parent is negatively correlated with hybrid yield and heterosis; and (iii) there is no correlation between the over- or under-expression of specific genes in maize hybrids with either yield or heterosis. The relationship of the expression patterns with hybrid performance is substantiated by analysis of a genetically improved modern hybrid (Pioneer hybrid 3394) versus a less improved older hybrid (Pioneer hybrid 3306) grown at different levels of plant density stress. The proportion of allelic additively expressed genes is positively associated with the modern high-yielding hybrid, heterosis and high-yielding environments, whereas the converse is true for the paternally biased gene expression. The dynamic changes of gene expression in hybrids responding to genotype and environment may result from differential regulation of the two parental alleles. Their findings suggested that differential allele regulation may play an important role in hybrid yield or heterosis and provide new insight into the molecular understanding of the underlying mechanisms of heterosis.

Recently, Keurentjes et al. (2007b) described the results of genome-wide expression variation analysis in an RIL population of A. thaliana and for many genes variation in expression could be explained by eQTL. The nature and consequences of this variation are discussed based on additional genetic parameters, such as heritability and transgression and by examining the genomic position of eQTL versus gene position, polymorphism frequency and gene ontology. In addition, the authors have developed an approach for genetic regulatory network construction by combining eQTL mapping and regulatory candidate gene selection.

Most eQTL mapping studies to date have searched for eQTL by analysing gene expression traits one at a time. As thousands of expression traits are typically analysed, this can reduce power because of the need to correct for the number of hypothesis tests performed. In addition, gene expression traits exhibit a complex correlation structure, which is ignored when analysing traits individually. To address these issues, Biswas et al. (2008) applied two different multivariate dimension reduction techniques, the singular value decomposition (SVD) and independent component analysis (ICA), to gene expression traits derived from a cross between two strains of Saccharomyces cerevisiae. In total, 21 eQTL were found, of which 11 were novel, and both cis and trans-linkages to the meta-traits were observed. These results demonstrated that dimension reduction methods are a useful and complementary approach for probing the genetic architecture of gene expression variation.

As we have discussed earlier, a range of biological and statistical tools enable research on natural variation to move from simple reductionistic studies focused on individual genes to integrative studies connecting molecular variation at multiple loci with physiological consequences. Hansen et al. (2008) provide a comprehensive review focusing on recent examples that demonstrate how expression QTL data can be used for gene discovery and to untangle complex regulatory networks. The latter is also briefly discussed in Chapter 10.

7.7 Selective Genotyping and Pooled DNA Analysis

As introduced in Chapter 6, replacing individual genotyping by selecting only the individuals from the high and low tails of the population distribution (selective genotyping) or DNA analysis in pools of the selected individuals (pooled DNA analysis) was proposed for QTL analysis and for testing of linkage between markers and a major gene. This concept is referred to as

tail analysis (Hillel et al., 1990; Dunnington et al., 1992; Plotsky et al., 1993), bulked segregant analysis (Giovannoni et al., 1991; Michelmore et al., 1991), or selective DNA pooling (Darvasi and Soller, 1994) and is an effective solution to reduce costs associated with genotyping large mapping populations. As reducing the size of a QTL mapping population will decrease the detection power (Charcosset and Gallais, 1996) and also increase the QTL confidence interval, as well as the risk of detecting false QTL, selective genotyping can save much more cost than using smaller population sizes while maintaining the same mapping power as the large populations. Take a large population with 500 individuals and select 25 individuals from each tail, which means that selective genotyping will only cost 10% (= 2 × 25/500) of the total cost required for genotyping the whole population. When pooled DNA analysis is used, two tails can be genotyped as two individuals, which brings the genotyping cost down to 0.4% (= 2/500) of the total cost. Clearly, the bigger the original population size, the more saving there will be on all related costs including genotyping.

7.7.1 Major gene-controlled traits

Major gene-controlled traits can be selectively genotyped through bulked segregant analysis (Fig. 7.6A; Xu and Crouch, 2008). Briefly, individuals are selected so that two groups of individuals have contrasting phenotypes, e.g. resistant versus susceptible plants, or early versus late maturity, with a randomized genetic background for other traits. Molecular markers that detect polymorphism between the two groups may be linked to the character and this possibility can then be tested in a segregating population. In order to isolate additional markers near a gene for which the genetic map location is known, individuals that are homozygous across a target interval can be selected from a segregating population based on known bracket markers. DNA from these individuals is then combined into two pools: one homozygous for one parental type and one for the other at the two marker loci. The result is that each DNA pool is homozygous at all loci within and adjacent to the target region. However, the homozygous target region differs between the two pools in parental origin, thus providing the basis for selection of polymorphic markers specific to the target region. When pooled DNA samples are subsequently utilized as templates for random primer amplification via PCR, polymorphism should result only if the primer primes within or adjacent to the target interval. This polymorphism can also be detected by probing with other molecular markers.

The genotyping in this approach becomes very simple since it relies on just two DNA pools, each of plants from one or other of the phenotypic extremes (Giovannoni et al., 1991; Michelmore et al., 1991). The pools that have been used in plants usually consist of 10–15 individuals taken from as large a population as possible. This approach has already been successfully used in many plants (e.g. Barua et al., 1993; Hormaza et al., 1994; Villar et al., 1996; van Treuren, 2001; Zhang et al., 2002).

7.7.2 Quantitative traits

When selective genotyping is used for quantitative traits (Fig. 7.6B; Lander and Botstein, 1989; Xu and Crouch, 2008), the changes of marker allele frequencies between two tails can be used to detect marker–trait association as suggested by Stuber et al. (1980, 1982). Due to the hitchhiking effect, selection would change frequencies of markers that are more closely linked to the QTL for the selected trait (Lebowitz et al., 1987). Darvasi and Soller (1992) have shown that selective genotyping led to a marked decrease in the number of individuals genotyped for a given power. This approach can be bidirectional if the two tails of the distribution are considered or unidirectional if only one tail is considered. The latter is more suitable for traits subjected to strong selection of unfavourable environments. Several applications of this method have already been reported for QTL detection or QTL validation in

[Figure 7.6. Panel A: R plants versus S plants; panel B: high tail versus low tail. Steps shown: population distribution, selection, DNA pools, genotyping. Allele frequency (0.0 to 1.0) is shown for two linked markers and one unlinked marker.]

Fig. 7.6. Selective genotyping and pooled DNA analysis. (A) Pooled analysis using disease resistant
(R) and susceptible (S) plants as an example. DNA pools are constructed from R and S plants selected
from a mapping population and then genotyped by molecular markers. When the two DNA pools show
different alleles at a specific marker locus, the marker is linked with the disease response, while when
both pools show the same heterozygous genotype, the marker is unlinked with the disease response.
(B) Pooled DNA analysis using extreme plants selected for a target quantitative trait from the two tails of
a normal distribution in the mapping population. Marker–trait linkage is revealed by allele frequency at
specific marker loci. When allele frequencies are significantly different between two pools at a marker
locus, the marker is linked with the target trait, while when the allele frequencies are very close to each
other (each approximately 0.5), the marker is unlinked with the target trait. In both A and B, assume
that the marker is dominant and reveals polymorphisms between the parental lines that are used to
derive the mapping population.
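The comparison of allele frequencies between the two tails described for Fig. 7.6B can be sketched as a two-proportion z-test. This is one standard choice and is not stated to be the test used in the studies cited in this section. The sketch assumes individually genotyped inbred lines, so that each selected line contributes one count for or against a given parental allele; with pooled DNA the frequencies would instead be estimated from relative signal strength. The function names and the counts are hypothetical.

```python
import math


def allele_freq_z(count_high, n_high, count_low, n_low):
    """z statistic for the difference in allele frequency between two tails.

    Assumes 0 < pooled frequency < 1; otherwise the standard error is zero.
    """
    p_high = count_high / n_high
    p_low = count_low / n_low
    pooled = (count_high + count_low) / (n_high + n_low)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_high + 1.0 / n_low))
    return (p_high - p_low) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts of lines carrying the parent-A allele, 30 lines per tail.
z_linked = allele_freq_z(24, 30, 6, 30)      # frequencies 0.8 versus 0.2
z_unlinked = allele_freq_z(16, 30, 14, 30)   # both close to 0.5
p_linked = two_sided_p(z_linked)
p_unlinked = two_sided_p(z_unlinked)
```

A marker with frequencies near 0.5 in both tails gives a z value near zero, matching the unlinked case in the figure, whereas a large frequency difference gives a large z value and a small p-value.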

plants (Foolad and Jones, 1993; Zhang, L.P. et al., 2003; Wingbermuehle et al., 2004; Coque and Gallais, 2006). It is especially useful for genes that have large effects on the trait of interest. It can also be used for traits controlled by a few major-effect QTL (Quarrie et al., 1999). Furthermore, in maize, with two cycles of recurrent selection on phenotype from a population of F4 independent families, Moreau et al. (2004) have shown that the significant changes in marker allele frequency were for a marker locus located in the vicinity of the detected QTL. An additional interest of the marker frequency approach is that it makes it possible to use DNA pooling of selected individuals to estimate the frequencies needed for the tests (Darvasi and Soller, 1994).

7.7.3 The power of selective genotyping and pooled DNA analysis

There are several problems associated with pooled or bulked DNA analysis in plants

as summarized by Xu and Crouch (2008). These include: (i) a relatively small number of markers has been used to try to cover the whole genome with the assumption that the recombination frequencies are consistent across the genome and genes of interest can be readily identified within a marker density of 15–25 cM; (ii) contrasting individuals have been selected from a relatively small population so that the phenotypic difference between the pools may be only big enough for identification of large-effect genes/QTL; (iii) when allele signal is judged by a gel-based genotyping system, allele frequency in each pool cannot be quantified accurately and the allele signal generated by a small percentage of individuals in the pool cannot be detected and thus, the genetic difference between the pools can be only scored as presence and absence; and (iv) because of the above reasons, a relatively small number of individuals (about 15) is included in each pool to guarantee that the real associated markers will not be missed, at a cost of a high level of false positives (marker and trait are not really associated with each other but are still indicated so statistically). The false positive markers have to be eliminated by a whole population validation step with all putative markers.

Simulation studies have been carried out by Xu et al. (2008) and Sun et al. (2009) using QTL ICIMAPPING (available at http://www.isbreeding.net), an integrated computing package for common QTL mapping methods including single marker analysis, traditional interval mapping (Lander and Botstein, 1989), and inclusive composite interval mapping for additive (Li et al., 2007) and interacting (Li et al., 2008) QTL. Several parameters associated with selective genotyping were simulated based on the assumption that phenotypic extremes from two tails of a recombinant inbred population can be reliably selected and that they can be genotyped either individually so that the allele frequency in each tail can be inferred or genotyped using bulked DNA from each tail so that the allele frequencies can be estimated based on the relative signal strength of two DNA pools. Simulated parameters include total population size (ranging from 200 to 3000), tail population size (15–100 plants in each tail, equivalent to 13–50% of selection rate), number of QTL (1–5), marker density (1–15 cM), QTL effect (explaining 1–20% of phenotypic variation), two linked QTL and two QTL with epistatic interaction. One hundred simulation runs were carried out for each scenario from which the power of QTL detection and the mean LOD score were then calculated.

Comparative analysis of two selective genotyping strategies (Fig. 7.7) indicated that conventional selective genotyping (Fig. 7.7A, Strategy A, where relatively small total and tail population sizes were used with a low density of marker coverage) resulted in the detection of only one marker in the target region with an average LOD score of 3.94 and power of detection of 67%. In contrast, Strategy B (Fig. 7.7B), where large total and tail population sizes were used along with a high density of marker coverage, resulted in the detection of multiple markers around the target region with the highest having a LOD score of 10.37 and a power of detection of 98%.

When various QTL effects (responsible for 1–20% of the total phenotypic variation), tail sizes (15–100) and total population sizes (200, 500, 1000 and 3000) (Fig. 7.8) were used in the simulation analysis, the power of QTL detection indicated the optimum total and tail population sizes required for detection of small QTL. Identifying QTL explaining 15% of the phenotypic variation with a 95% or higher power of QTL detection will require a population size of 200 or more with a minimum tail size of 15, which matches most reported cases of successful use of bulked DNA analysis. However, to detect QTL of small effect, ranging from 3 to 10% of the phenotypic variation, 50–100 individuals needed to be selected from each tail of a population with 1000 individuals in order to have a 95% power of QTL detection (Fig. 7.8). The simulation analysis also indicated that the power of detection would not change when multiple QTL (two to five) are involved, provided they are independent of each other. The simulation also indicated that selective genotyping can also be used to

[Figure 7.7. Panels A and B: power (%) (left y-axis) and LOD score (right y-axis) plotted against map position (cM, 0 to 150), with reference lines at LOD = 3.0 in panel A and LOD = 6.0 in panel B.]

Fig. 7.7. Effects of selective genotyping strategies on detection power and mean LOD score around the
target region (15 cM, grey area) assuming the QTL explain 10% of phenotypic variation. (A) Strategy A:
population size = 200, tail size = 15, marker density = 15 cM, resulting in only one marker showing posi-
tive in the target region with an LOD score = 3.94 and power of detection = 67%, which has been widely
used in conventional bulked DNA analysis. (B) Strategy B: population size = 500, tail size = 30, marker
density = 1 cM, resulting in multiple markers showing positive in the target region with LOD = 10.37 and
power of detection = 98%, which is proposed for selective genotyping-based fine mapping.

separate linked QTL at a distance of 25 cM and detect epistatic QTL.

A study of the optimization of selective genotyping that makes no assumption about the QTL effect, such as a negligible contribution (r_p^2) of the QTL to the phenotypic variance, was developed with selection either of both tails (bidirectional genotyping or BSG) or only one tail (unidirectional genotyping or USG) (Gallais et al., 2007). For a given population size of phenotyped plants the optimal proportion selected for selective genotyping is around 30% for each tail. For the same investment as in ANOVA, by investing more in phenotyping than in genotyping when the cost ratio of genotyping to phenotyping is higher than 1, the optimal proportion selected appears to be between 10 and 20% for each tail. It is mainly affected by the cost ratio and decreases when the cost ratio increases. At this optimum, BSG is competitive with ANOVA, or even more powerful, when the cost ratio is higher than 1. USG can also be competitive when the cost ratio is higher than 2. Using experimental data from two populations of about 300 F4 inbred families of maize, it was verified that BSG at the optimum gives the same results as

[Figure 7.8. Four surface plots (n = 200, n = 500, n = 1000 and n = 3000) of power (%) (0 to 100) against QTL effect (%) (1, 3, 5, 10, 15, 20) and tail size (15, 30, 50, 100).]

Fig. 7.8. Detection power of selective genotyping under various QTL effects (1–20%, percentage
phenotypic variation explained by identified QTL), tail sizes (15–100) and total population sizes
(200–3000). A total of 100 permutations were implemented for each case and each combination.

ANOVA or is better whereas USG is less powerful or equivalent.

7.7.4 Use of selective genotyping and pooled DNA analysis

Replacement of entire population genotyping

A general conclusion that can be drawn from the simulation analyses is that selective genotyping can be used to replace the entire population genotyping approach in almost all cases including QTL with relatively small effects as well as QTL with epistatic interactions or linked QTL. It is recommended that, for selective genotyping of QTL of different effects, the population sizes should be: 20 individuals from each tail of a population with 200 individuals for large QTL (15% or larger), 50 individuals from each tail of a population with 500–1000 individuals for medium-size QTL (3–10%) and 100 individuals from each tail of a population with 3000–5000 individuals for small QTL (0.2–3%).

All-in-one plate genetic mapping of all target traits in one step

A large number of trait-specific genetic and breeding materials, with novel properties including inbreds/cultivars with extreme phenotypes, eternal/fixed segregating populations (e.g. RILs, DHs, NILs, introgression lines (ILs)), genetic stocks (e.g. single-segment substitution lines (SSSLs)) and mutant libraries, have been developed and maintained across the world for most crops. These are valuable directly for the purpose for which they were developed but also offer a novel resource for genetic mapping and gene discovery when used collectively. These materials have often been phenotyped in multiple environments due to their permanently fixed genetic composition. By collecting phenotypic extremes from currently available genetic and breeding materials and utilizing selective genotyping and pooled DNA analysis, it is theoretically possible that one 384-well plate could be designed to cover almost all major gene/QTL-controlled

agronomic traits of importance in a crop species (Xu et al., 2008; Sun et al., 2009).

Genome-wide association mapping

Developments in SNP genotyping technologies and methodologies recently reported in human genomics have made it possible to carry out genome-wide linkage-disequilibrium-based association mapping in human beings by using an integrated technology package including selective genotyping, pooled DNA analysis and microarray-based SNP genotyping with 100,000 markers (Sham et al., 2002; Meaburn et al., 2006; Yang, H.-C. et al., 2006a). This system has the power to estimate allele frequencies and identify unique alleles from a pooled DNA sample of several hundreds of individuals. If this approach is successfully translated to plants it will resolve many of the constraints of pooled DNA analysis. The high frequency of false positive markers that would be detected when substantially fewer plants are used in each pool could be avoided if a DNA pool can be formed using many more plants selected from a large population. However, optimizing SNP genotyping systems for pooled DNA analysis is considerably more complicated than for SSR markers and suffers a much higher level of redundancy. Where this has been achieved in human genomics, it required at least half a million SNPs as a starting point in order to identify 100,000 optimized SNPs suitable for pooled analysis. This density of SNP markers is available in rice and maize and will become available in other crops in due course as whole genome sequences are generated.

Genome-wide association mapping may provide a shortcut to discovering functional alleles and allelic variations that are associated with agronomic traits of interest. Selective genotyping and pooled DNA analysis can be extended to using inbred lines with extreme phenotypes selected from various collections of germplasm. This is in principle similar to linkage-disequilibrium-based association mapping but using selected phenotypic extremes. For association mapping of quantitative traits governed by a large number of minor genes that interact with each other and the environment, selective genotyping will face the same challenges as experienced with linkage-based QTL mapping using entire population genotyping.

Integration with selective phenotyping

The selective phenotyping method involves preferentially selecting individuals to maximize their genotypic dissimilarity. Selective phenotyping is most effective when prior knowledge of genetic architecture allows focus on specific genetic regions (Jin et al., 2004; Jannink, 2005) and specific allele combinations. As genotyping becomes cheaper, it may be more efficient to first carry out low density genotyping of the whole population in order to identify the most informative subset of individuals in terms of minimum level of relatedness between individuals plus optimum subpopulation structure and allele representativeness. Precision phenotyping of this subset can then be carried out, particularly for the traits that are difficult or expensive to evaluate. Finally, dense whole genome genotyping is carried out on the individuals from the tails of the phenotypic distribution. In this way, the total number of individuals to be phenotyped and genotyped may not change, but the power of the analysis will be dramatically increased. This approach could also be applied to traits where phenotypic extremes can be easily identified by using a simple screening method, for example abiotic stress tolerance, where a large number of plants/families can be eliminated easily under stress conditions through visual scoring. As the original population can be selected under a strong environmental stress to eliminate a large proportion of the plants, only the most stress tolerant plants, and probably the most stress sensitive plants too, are selected for genotyping. Following selective genotyping of the individuals with extreme phenotypes, precision phenotyping of the resultant subset of individuals can be carried out using physiological component and surrogate traits. High-density planting and selection at early stages of plant development, combined with selective phenotyping
and genotyping should also be investigated as a potential option for some traits in order to allow one to work with more plants/families at the same cost (Xu and Crouch, 2008). Where the target trait is influenced by planting density or strong selection pressure this will clearly confound the ability to make genetic gain. However, many major-gene controlled traits can be investigated in this way without much disturbance.

Figure 7.9 shows this method for detection of marker–trait association for stress tolerance and other traits that can be selected for phenotypic extremes under a target environment. It can be inferred that phenotypic extremes or extremely stress tolerant plants are those with an accumulation of favourable alleles from multiple loci, each with small to large effects, so that genetic mapping will identify the genetic regions with relatively large accumulative effect on the target trait. In this case, lessons learnt from long-term selection of protein and oil content in maize may be highly instructive in this endeavour (Dudley and Lambert, 2004), particularly regarding the success of marker-assisted recurrent selection (MARS) to accumulate favourable alleles at numerous loci. In this approach, pyramiding of minor genes can be achieved using MARS to accumulate minor QTL where decades of breeding efforts have resulted in the fixation of all major genes.

[Figure 7.9: flowchart. (A) Large-size segregating populations (F2, BC, composite F1, RIL, DH, etc., n > 500) → high and low phenotypic extremes (30–50 plants/fixed lines each, selected from 500+ plants/fixed lines) → phenotype confirmation using families derived from selected plants/fixed lines under a target environment → extraction of DNA → genotyping using individual DNA or bulked DNA analysis → marker–trait association analysis by comparing allele frequencies or signal difference between high and low phenotypic extremes. (B) Large-size segregating populations (as in A) → phenotypic extremes (each with 30–50 plants/fixed lines, selected under a target environment) and phenotypic control (30–50 plants/fixed lines, randomly selected under a normal environment) → phenotype confirmation under the target and the normal environment, respectively → extraction of DNA → genotyping using individual DNA or bulked DNA analysis → marker–trait association analysis by comparing allele frequencies or signal difference between high and low phenotypic extremes or between high phenotypic extreme and phenotypic control.]

Fig. 7.9. Flowchart for large-scale selective genotyping and genetic mapping, including: selection of phenotypic extremes from large-size segregating populations, phenotype confirmation, DNA extraction, genotyping and marker–trait association analysis. (A) A procedure for most target traits which can be scored phenotypically for all individuals/fixed lines, and then high- and low-phenotypic extremes are selected for further analysis. (B) A procedure particularly suitable for abiotic and biotic stress tolerance where only the phenotypic extreme for tolerance is available under a target environment and comparison is made between the extreme and the phenotypic control that is randomly selected from the individuals/fixed lines under a normal environment. From Xu and Crouch (2008) with permission.

It can be expected that selective genotyping and pooled DNA analysis, which have been widely used with mixed success in genetic mapping, will become increasingly important in genetic mapping and MAS and will gradually replace entire population genotyping in many cases. Selective genotyping will greatly facilitate and improve genetic mapping and marker-assisted breeding procedures in general. As genome-wide selective genotyping becomes possible, an effective information management and data analysis system will be required to make full use of the potentialities of selective genotyping in genetics, genomics and plant breeding.
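The marker–trait association step in Fig. 7.9 compares allele frequencies between the high and low phenotypic extremes. A minimal sketch of that comparison for one biallelic marker is given below. It assumes that plants in each tail were genotyped individually, so that allele counts are available, and it uses a 2 × 2 chi-square test as one possible test statistic. The function name, the example counts and the choice of test are illustrative assumptions and are not taken from the text.

```python
import math

def allele_frequency_test(high_counts, low_counts):
    """Compare allele counts of one biallelic marker between two tails.

    high_counts, low_counts: (count of allele A, count of allele B).
    Returns (freq_A_high, freq_A_low, chi_square, p_value) for a 2 x 2
    chi-square test with 1 degree of freedom (no continuity correction).
    """
    a, b = high_counts
    c, d = low_counts
    n = a + b + c + d
    row_high, row_low = a + b, c + d
    col_a, col_b = a + c, b + d
    if min(row_high, row_low, col_a, col_b) == 0:
        raise ValueError("marker is monomorphic or a tail is empty")
    chi2 = n * (a * d - b * c) ** 2 / (row_high * row_low * col_a * col_b)
    # Upper tail of the chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return a / row_high, c / row_low, chi2, p_value

# Hypothetical data: 30 diploid plants per tail give 60 alleles per tail.
freq_high, freq_low, chi2, p = allele_frequency_test((45, 15), (20, 40))
print(f"Allele A frequency: high tail {freq_high:.2f}, low tail {freq_low:.2f}")
print(f"chi-square = {chi2:.2f}, p = {p:.2g}")
```

In a genome-wide scan the same test would be repeated for every marker, so the resulting p-values would need a multiple-testing correction. With pooled DNA only estimated allele frequencies are available rather than counts, and a test of this kind would have to be adapted accordingly.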
8
Marker-assisted Selection: Theory

Conventional breeding procedures have been limited by the fact that the sources of genetic variation amenable to manipulation are restricted in practice to those available within the gene pool of a single species. In addition, conventional breeding manipulations involve crossing entire genomes, relying on independent assortment and recombination to produce superior recombinants, and attempting to identify these from among many segregation products. Even for alleles with a major and clear-cut phenotypic effect this can involve many generations and the task may be almost impossible when desired and undesired alleles are closely linked. In principle, these limitations can be overcome by genetic engineering procedures such as recombinant DNA methodologies. These procedures have enabled the cloning of genes with defined activities and their introduction in a rapid and highly specific manner across species boundaries. Thus, all kinds of life now become available as a source of useful alleles and the mixing of entire genomes is avoided (Beckmann and Soller, 1986a). The basic limitation in genetic engineering applications at present seems to be the public acceptance of genetically modified organisms (GMOs) derived from gene cloning and transformation approaches as described in Chapters 11 and 12.

In marker-assisted breeding the plant breeder takes advantage of the association between agronomic traits and allelic variants of genetic, mostly molecular, markers. The general idea behind marker-assisted breeding is as follows. Before a breeder can utilize linkage-based associations between traits and markers, the associations have to be assessed with a certain degree of accuracy so that marker genotypes can be used as indicators or predictors of trait genotypes and phenotypes. When the alleles in question are few in number and have major effects on phenotype, such as a single gene-based disease resistance, the assessment of association is straightforward: mapping a monogenic trait goes along with the mapping of markers, as described in Chapter 2, while introduction of the desired alleles into the cultivar can be carried out readily by the classical breeding procedures of crossing, backcrossing, selfing and selection. In both cases, breeders depend on a clear relationship between genotype and phenotype to monitor the presence of the desired alleles in the populations of concern. For quantitative traits, however, a reliable assessment of trait–marker association requires large-scale field experiments as well as statistical techniques, known as quantitative trait loci (QTL) mapping as described in Chapters 6 and 7. Once marker–trait associations have

© Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu)



reliably been assessed, the breeder is able to monitor the transmission of trait genes via closely linked markers, thus enabling genotype building, i.e. construction of desired genotypes by deliberate crossing and selection, using the marker genotype as a selection criterion.

The potential value of genetic markers, linkage maps and indirect selection in plant breeding has been known for over 80 years. The advent of DNA marker technology in the 1980s has dramatically enhanced the efficiency of plant breeding. In the past 20 years, a number of breeding companies have, to varying degrees, used markers to increase the effectiveness of selection in breeding and to significantly shorten the development time of cultivars (Dwivedi et al., 2007). Now, advances in automated technology enable a new approach in marker-assisted breeding, called Breeding by Design. The advances in applied genomics and the possibility to generate large-scale marker data sets provide us with the tools to determine the genetic basis for all traits of agronomical importance. Also, methods for assessing the allelic variation at these agronomically important loci are now available. This combined knowledge will eventually allow the breeder to combine favourable alleles at all these loci in a controlled manner, leading to superior cultivars (Peleman and van der Voort, 2003).

Changing concepts and molecular approaches provide opportunities to develop rational and refined breeding strategies. Knowledge about map position and allelic variation at agronomically important loci, in concert with available, easy-to-assay molecular markers, has made possible the design of superior cultivars. Compared to phenotypic assays, as summarized from Xu, Y. (2002), Peleman and van der Voort (2003), Xu, Y. (2003) and Xu and Crouch (2008), DNA markers offer great advantages in accelerating cultivar development as a result of the following.

1. Increased reliability: the outcome of phenotypic assays is affected, among others, by environmental factors, the heritability of the trait, the number of genes involved, the magnitude of their effects and the way these loci interact. Hence, error margins on the measurement of phenotypes tend to be significantly larger than those of genotyping scores based on DNA markers.
2. Increased efficiency: DNA markers can be scored at seedling stage or even based on seed before germination. This is especially advantageous when selecting for traits which are expressed only at later stages of development, such as traits associated with flower, fruit and seed. By selecting at the seedling stage or based on seed DNA, considerable amounts of time and space can be saved.
3. Reduced costs: there are ample traits where the determination of the phenotype costs more than the performance of genotyping using a PCR assay or hybridization. In a high-throughput setting, the material and consumable cost for a PCR assay will typically not exceed US$2. In comparison, the growth of a tomato or pepper plant to full maturity in a heated greenhouse will cost approximately US$20. Every plant that can be rejected before planting, particularly in species whose seed is big enough for single seed-based DNA extraction, will in such settings save a considerable amount of money.

The use of DNA markers for indirect selection offers greatest benefits for quantitative traits with low heritability as these are the most difficult characters to assess in field experiments. Obviously, the development of marker-assisted assays for such traits is difficult and costly due to the extensive phenotypic assays required. However, once the knowledge exists to estimate the parameters which determine the trait of interest, a well-designed experimental set up will result in the availability of marker-assisted selection (MAS) tools, which can reduce to a major extent future application of phenotypical assays (Peleman and van der Voort, 2003). As described in previous chapters, molecular marker technology will help identify favourable alleles for agronomic traits, associate these alleles with specific molecular markers and introduce them from one genetic background

to another through MAS. Theoretical considerations in MAS will be discussed in this chapter and the practical issues in MAS will be discussed in Chapter 9.

8.1 Components of Marker-assisted Selection

Key issues in successful deployment of molecular markers in MAS are as follows, as summarized from Xu, Y. (2003) and Mohler and Singrün (2004):

1. Markers should co-segregate or map as close as possible to the target gene (e.g. less than 2 cM), in order to have low recombination frequency between the target gene and the marker. Accuracy of MAS will be improved if, rather than a single marker, two markers flanking the target gene are used. Ideally, gene-based markers that are developed from the sequence of the target gene, or functional markers that reveal functional differences associated with the target gene, are preferred as segregation between the marker and the target gene will no longer exist or will be reduced to a minimum.
2. For unlimited use in MAS, markers should display polymorphism between genotypes that have and do not have the target gene.
3. Cost-effective, simple and high-throughput markers are required to ensure the genotyping power needed for the rapid screening of large populations. Hybridization-based non-PCR markers that can reveal differences from DNA samples directly would be preferred.

In addition, marker-assisted background selection depends on molecular markers that are well characterized and distributed over the whole genome. It is most desirable if gene-based markers are used for both marker-assisted foreground and background selection. In this case, a core set of markers can be established for both purposes so the same markers can be used for foreground selection in some crosses but background selection in others.

As summarized by Xu, Y. (2003), there are five key components that are required for efficient MAS, including: (i) suitable genetic markers and their characterization; (ii) high-density molecular maps; (iii) established marker–trait associations for traits of interest; (iv) high-throughput genotyping systems; and (v) functional data analysis and delivery.

8.1.1 Genetic markers and maps

Desirable DNA markers for MAS should meet the following requirements: detection of high frequency of polymorphism, co-dominance, abundance, whole genome coverage, high duplicability, suitability for high-throughput analysis and multiplexing, technical simplicity, cost effectiveness, requirement of a small amount of DNA and user-friendliness (such as suitability for different genotyping systems and facilities). Among all these requirements, co-dominance is the most important for MAS in breeding hybrid crops, because the two parental inbreds and hybrid combinations can be distinguished unambiguously by co-dominant markers. Single nucleotide polymorphism (SNP) markers have great potential for super high-throughput analysis using chip technology and are becoming available in more and more plant species, while simple sequence repeat (SSR) markers are more widely used across different crops. Of all the types of DNA markers available so far, SSRs satisfy all the requirements. As estimated from a draft rice sequence, the density of SSRs in the genome is approximately one SSR per gene. These markers can be shared internationally through Internet-distributed primer sequences. SSR markers can be genotyped manually using agarose or polyacrylamide gels and ethidium bromide or silver staining, or in highly automated facilities. SSR markers can be multiplexed during PCR and for multiple-sample loading on gels using fluorescent labelling. DNA extracted from a small piece of leaf or from single dry seeds will be enough to run several hundreds of markers. As more and more genes are cloned, it will be possible to develop

molecular markers based on sequences within the gene of interest. Intragenic markers provide several advantages over gene-linked markers. First, there is no recombination between the marker and the gene, i.e. no intergenic recombination. Secondly, multiple alleles can be tagged and distinguished. An example is the SSR marker in rice, RM190, derived from a microsatellite sequence with a splice site in the waxy gene, which is responsible for amylose synthesis, an important grain quality trait for rice. As more and more cloned genes become available, functional markers developed from the target gene will be the best choice in MAS.

Selection of target chromosomal regions based on associated markers (foreground selection) and selection of genetic background for one of the parental genomes (background selection) may require different markers. Markers for foreground selection must be genetically mapped and associated with agronomic traits. Genetic markers revealing multiple bands or representing multiple loci are usually difficult to trace back to the specific allele/locus known to be associated with the trait, particularly when the population used for MAS is different from the population used for mapping. These types of markers include randomly amplified polymorphic DNAs (RAPDs) and amplified fragment length polymorphisms (AFLPs), which as such are not good for foreground selection. To use marker–trait associations based on these markers, it is best to use markers that are more locus-specific, such as sequence tagged site (STS) or SSR markers. For background selection, any type of marker that displays a high rate of polymorphism is useful. Background selection does not require the use of mapped markers as long as they can reveal genome-wide polymorphism and the revealed differences can be traced back to their parents. As indicated previously, however, it is desirable to develop a core set of gene-based markers for both foreground and background selection so this set of markers can be used as universal markers across different populations.

The efficiency of MAS largely depends on how well markers are linked to the target trait. Construction of a high-density genetic map using high-throughput molecular markers is the first step to a large-scale MAS programme. A reference map is required for each crop or species based on a permanent segregating population that can be shared internationally, allowing the placement of additional markers on the same map. This map should be constructed using markers that are friendly to users. There are two reasons why we need a high-density molecular map. First, a minimum requirement for MAS based on marker–trait association includes a three-marker system: one marker co-segregating with the trait for foreground selection and the other two flanking a target region for recombinant selection. Since the target gene can be located in any region of the genome, a dense map is required for identifying this triplet at any position in the genome. Even if a genic or functional marker is available for the target trait, the triplet is still needed for selection against the donor genome around the target when marker-assisted gene introgression is involved. Secondly, markers identified using mapping populations may not be polymorphic in breeding populations derived from other parental lines. To guarantee that the three-marker system will work for other breeding populations, many markers have to be identified around the target region. For the crop species with physical maps and whole genome sequences available, molecular markers will be available to cover the entire genome so that markers targeting specific genes and genomic regions can be selected from the marker and/or sequence databases. Using less expensive array-based genotyping systems, a core set of array-based molecular markers can be established for all genetic mapping and genome-wide MAS for different populations.

8.1.2 Marker characterization

It is not enough to just have thousands of genetic markers in hand. To use molecular markers efficiently, they have

to be characterized for many features, including: number of alleles; polymorphism information content (PIC); allelic difference (e.g. allele sizes and their range); allele feature (e.g. haplotypes) in standard or control cultivars; signal strength under specific genotyping conditions; background or noise signal; PCR or hybridization conditions; chromosome location (flanking markers and genetic distances); and information required for multiplexing.

Characterization of molecular markers helps to identify markers close to the genes of importance to breeding programmes and to evaluate germplasm and breeding materials. A core set of molecular markers should be characterized for each plant species and these markers should be evenly distributed on all chromosomes and suited for multiplexing. Core-set markers have now been established for many crop plants and have been used for evaluation of germplasm accessions, construction of heterotic pools and MAS (see Xu, Y. (2003) for an example in rice). Much effort has been devoted to characterizing array-based markers and optimizing genotyping systems.

Allele number

The number of alleles at a marker locus is related to the genetic diversity that can be revealed by a particular marker. The more alleles at a locus, the higher the degree of diversity that can be revealed and the more efficiently closely related lines can be distinguished. SNP markers show polymorphism by two different nucleotides, usually displaying two different alleles across germplasm. Restriction fragment length polymorphism (RFLP) markers have many fewer alleles per locus compared to SSR markers. As a typical example, an average of 2.7 RFLP alleles and 11.9 SSR alleles per locus has been reported by Xu et al. (2004) based on a large set of germplasm accessions.

Polymorphism information content (PIC) value

The relative informativeness of each marker can be evaluated based on its PIC value, as described by Botstein et al. (1980), which reflects the amount of polymorphism and is a function of the number of alleles and allele frequencies at any given locus. PIC value is used to refer to the relative value of each marker with respect to the amount of polymorphism exhibited, which can be estimated by $\mathrm{PIC}_i = 1 - \sum_{j=1}^{n} P_{ij}^{2}$, where $P_{ij}$ is the frequency of the jth allele for marker i and the summation extends over n alleles (Weir, 1990; Anderson et al., 1993). The calculation is based on the number of alleles detected by a marker at a given locus and the relative frequency of each allele in the tested accessions. In rice, the average PIC value was almost twice as high for SSRs (0.66) as for RFLPs (0.36) (Xu et al., 2004).

Informative markers

Based on PIC values and the number of alleles detected, a set of highly informative markers can be selected such that the same amount of genotyping information can be obtained by surveying fewer molecular markers. Selected markers should be evenly distributed throughout the genome. As an example, a group of 24 RFLP/SSR markers was selected from a rice data set of 236 accessions × 160 markers as a set of highly informative markers for preliminary fingerprinting of rice germplasm and breeding populations (Xu et al., 2004).

In addition to the requirements discussed previously for a marker system, there are some other requirements for a marker and a core set of markers. Taking SSR markers as an example, a useful marker should have many alleles per locus (> 10), high PIC value (> 0.8), suitable difference in allele sizes (4–10 bp between any two alleles), strong signal for detection, less background or noise signal and high replicability or reliability. As SNP markers have only two alleles at each locus, informative marker sets rather than individual markers should be used and their informativeness can be judged by their haplotypes. Allelic polymorphism should also be considered for the markers developed from different regions of a gene for candidate gene-based, gene-based or functional markers. No matter which type

of markers is used, a useful set of markers should provide whole genome coverage, even distribution on each chromosome and high potentiality for multiplexing or high-throughput genotyping.

8.1.3 Validation of marker–trait associations

Establishment of highly significant marker–trait associations is one of the prerequisites for MAS, which is discussed in Chapters 6 and 7. Demonstrated linkages between target traits/genes and molecular markers are traditionally based on genetic mapping experiments and it is important to confirm that these associations are consistent in mapping and breeding populations. As an example, Knoll and Ejeta (2008) validated three QTL for early-season cold tolerance in sorghum using two other populations and found that all three associated markers retained their influence in the different genetic backgrounds.

In many cases, however, genetic mapping results obtained from specific crosses cannot be used for MAS for the same traits in different crosses. There are three reasons for this phenomenon. First, quantitative traits are usually controlled by many genes. Genes are only segregating at the loci where two parents are genetically different and thus can be mapped using the population derived from these two parents. For a randomly selected mapping population, the parents will have a strong chance to share identical alleles at some of the genetic loci. There is a high probability that segregating genes in any breeding population could be different from the genes already mapped. Secondly, multiple alleles at a locus work in the same way to complicate MAS, because mapping parents could have alleles that are different from those of breeding populations. Interaction among these multiple alleles will modify marker–trait associations when different allele combinations are considered. Thirdly, genotype-by-environment interaction could make the establishment of marker–trait association depend on specific environments. Thus, QTL markers identified using a single mapping population may not be used directly in unrelated populations without marker validation and/or fine mapping (Nicholas, 2006). The marker–trait association must be validated in representative parental lines, breeding populations and phenotypic extremes before it can be used for routine MAS, particularly for QTL with relatively small effects. In a proportion of cases, markers will lose their selective power during this validation step. In these cases, there is a need to identify new markers (through fine mapping or candidate gene analysis) around the target locus in order to find marker–trait associations that are shared across different breeding populations. Shortcuts in this process may be possible through cross comparison with dense maps or by screening candidate gene markers for species where these are available. By finding several markers within a single gene, it is much more likely that the parents of any breeding population will be polymorphic for at least one of them, thus allowing breeders to track the alleles donated from each parent throughout the breeding process, speeding MAS and backcrossing in any cross. In the better studied species, up-to-date marker–trait associations and associated SNP markers can be routinely accessed through gene cloning and fine mapping reports.

MAS has been successfully applied to date for monogenic and oligogenic traits controlled by major genes and for QTL of large effect influencing complex traits, particularly in the private sector (Dwivedi et al., 2007). For more precise genotypic selection of complex traits such as the minor-gene controlled abiotic stress tolerance, more closely linked markers, preferably gene-based markers, or even better, functional nucleotide polymorphic markers (Rockman and Wray, 2002; Andersen and Lübberstedt, 2003), need to be developed. This should be combined with precision phenotyping in order to maximize the power of detection and minimize the chance of false negatives.

There are many reasons why close marker–trait associations are required:

(i) chromosomal location associated with the trait must be reduced to a manageable piece of DNA if cloning of specific genes is necessary; (ii) to identify all the related genes for a specific trait, a high-density genetic map is required because the fewer markers are used, the smaller the proportion of genetic factors contributing to that trait that will be sampled; (iii) large genetic distances between markers and target traits will contribute to the rapid decrease of MAS efficiency after several successive cycles of selection; and (iv) to minimize linkage drag involved in gene introgression, closely linked markers around the target region are needed.

QTL mapping presumes accurate phenotypic scoring methods, something that can be difficult to optimize and even more difficult to keep consistent for months or years. Just a few mis-scored individuals can totally confound QTL discovery and placement (Young, 1999). This is also true for fine mapping of major genes for map-based cloning, where mis-scoring of several plants in a population with thousands of individuals will result in a large error (up to 1 cM) in estimating genetic distances. High levels of accuracy are required to dissect a chromosomal region associated with a given trait and narrow down the candidate region to several megabases or to a single contig, that is, a set of clones that can be assembled into a linear order.

8.1.4 Genotyping and high-throughput genotyping systems

To make marker-based technology practical for breeding applications, an automated genotyping system is required. As an ultimate marker type, SNPs have gained wide acceptance as genetic markers for use in linkage and association studies, especially in human genetics and in many crop plants as well. High-throughput SNP genotyping has great potential for many applications, including MAS on the basis of whole genome approaches. This has led to a requirement for high-throughput SNP genotyping platforms. Development of such a platform depends on coupling reliable chemical assays with an appropriate detection system to maximize efficiency with respect to accuracy, speed and cost. With current technology platforms (e.g. Illumina), one lab can deliver throughputs in excess of one million data points per day, with an accuracy of > 99%, at a cost of US$0.06–0.10 per data point using an array-based SNP genotyping system. In order to meet the demands of the coming years, however, genotyping platforms need to deliver throughputs in the order of one million genotypes per day at a cost of only a few cents per genotype (instead of per data point). In addition, DNA template requirements must be minimized such that hundreds of thousands of SNPs can be interrogated using a relatively small amount of genomic DNA. Released whole-genome sequences in model and crop plants including Arabidopsis, rice and maize have been used to develop gene-based SNPs for other related species.

8.1.5 Data management and delivery

To handle the daily data flow from the laboratory to the breeder and integrate information from molecular markers, genetic mapping and phenotyping, many informatics tools are needed. Decision support tools required in molecular breeding are fully discussed in Chapter 15 so only data management and delivery are briefly described here as a component of MAS.

For efficient data management and delivery, it is important for all researchers to follow general rules through all these procedures. A standard reporting system is also critical for comparative genomics, QTL allelism tests, data sharing and mining and the correspondence between major genes and QTL. As discussed by Xu, Y. (2002), a standard system for marker–trait association should include associated alleles and allele characterization such as allele sizes, gene effects, variation explained by each gene or all genes in the model, gene interaction if more than one gene is identified and genotype-by-environment interaction
if more than one environment is involved. Genetic information should be shared and combined with data generated in plant breeding, for example, germplasm diversity, mapping populations, pedigrees, graphical genotypes, mutants and other genetic stocks.

With thousands or even millions of data points flowing out of a laboratory daily, timely scoring and delivery of the results to breeders are basic requirements for a high-efficiency breeding system. Well-trained assistants for genotyping and scoring, coupled with research scientists who can analyse data in meaningful ways, are the key components for a data management and delivery system. A laboratory with well-equipped facilities also has to be well equipped with qualified personnel and software required for data integration, manipulation, analysis and mining. Timely delivery of data to the breeder is equally important, because in many cases the time window the breeder can use for selection is very limited. With the high-throughput genotyping and data management systems currently available, it takes 5–10 days to generate and analyse data, including activities ranging from leaf tissue harvesting to DNA extraction, genotyping, data scoring, analysing, summarizing and reporting. The number of data points that can be generated and thus the number of plants that can be handled in a week in a lab depends on the level of throughput and the availability of genotyping facilities.

8.2 Marker-assisted Gene Introgression

Gene introgression involves the introduction of a target gene into a productive, recipient line or cultivar. It can be used in both backcrossing and intercrossing programmes. By using DNA markers to identify recombinants, introgressed chromosome segments might be trimmed to minimal size, reducing the extent to which the recurrent genotype is disrupted by undesirable alleles closely linked to the target trait. At the same time, genetic background can be selected genome-wide to minimize the donor genome content (DGC).

Historically, Tanksley and Rick (1980) and Tanksley (1983) considered the use of isozyme markers to speed the introgression of a trait controlled by a gene of major Mendelian effect from an exotic resource population to a cultivar. In this case, the general problem is to eliminate the exotic donor genome as rapidly as possible by replacing it with the recipient cultivar genome while retaining the gene of interest from the donor. This is generally accomplished by a large number of backcross (BC) generations. Tanksley and Rick (1980) pointed out that if the chromosomes of the donor strain carry isozyme or other markers differentiating them from the recipient chromosomes, the number of BC generations required can be reduced dramatically by selecting against the donor marker alleles. Because of the general lack of isozyme or other markers differentiating cultivars from one another, Tanksley and Rick (1980) and Tanksley (1983) proposed this scheme as primarily of use for introgression from wild species to cultivar. With the availability of informative molecular markers to differentially mark the whole genome, however, this technique has been used for cultivar-to-cultivar introgression within a species.

Efficiency of MAS in gene introgression is affected by many factors including population size, genome size, marker–gene linkage intensity and number of markers. Stam (2003) raised several issues that are relevant to the design of an introgression-breeding programme:

• What amount of variation in DGC is to be expected in generations BC1, BC2, etc.?
• To what extent does this depend on the number of chromosomes and the genome size?
• What population size is required to guarantee, with 90% certainty, that at least one individual in a BC1 has a DGC of less than e.g. 0.30?
• If markers are flanking the target gene, what are the optimal population sizes in successive generations to ensure
that the donor segment dragged along with the target gene is smaller than the segment bracketed by these flanking markers?
• Does it pay to increase the number of markers for background selection? If so, to what extent does this depend on population sizes used and/or genome size?
• If a certain pre-set goal, e.g. less than 0.05 DGC, is to be achieved in a given number of generations, should population size in successive generations be constant or is it better to vary population size over generations?
• If the number of generations is not a limiting factor, but the total number of plants to be genotyped is, then what is the optimal distribution of plant numbers over generations?
• Do the same guidelines for optimal transfer of a single target gene also hold for the transfer of multiple genes?

Some of these issues on gene introgression have also been addressed by a number of authors, using an analytical approach, numerical methods, computer simulations or a combination thereof (Hospital et al., 1992; Hospital and Charcosset, 1997; Hospital, 2001; van Berloo et al., 2001; Stam, 2003). As a special case, Frisch (2004) discussed the issues related to introgression of a recessive gene, where recurrent backcrossing without the aid of molecular markers requires progeny tests in each BC generation in order to determine whether a plant is a heterozygous carrier of the recessive gene or not.

8.2.1 Marker-assisted foreground selection

There are several approaches to using molecular markers to select an associated target gene or allele (foreground selection). Foreground selection can be used for gene introgression from one genetic background to another and for pyramiding multiple genes/alleles from multiple donors into one genotype as well. For a specific target gene or allele, foreground selection can involve one to several markers. The simplest way is to use one closely linked marker (on either side of the target locus). The most complicated approach is to integrate foreground selection with background selection using multiple markers for the target locus and many others for covering the entire genome; this is referred to as whole genome selection in this book and differs from genome-wide selection, which will be discussed later in this chapter. The most frequently used approach is to use a triplet, marker–target–marker. Depending on how close the linked markers are to the target, the population sizes required for identification of a particular genotype, and the cost and efficiency of foreground selection compared to phenotypic selection, vary significantly. For example, a two-locus model with one marker and one target locus involved can be simplified as selection for a single gene-based marker when the marker is developed from the target gene.

Selection using single markers

The reliability of foreground selection largely depends on the genetic distance between the markers and the target gene. If only one marker, located on one side of the target gene, is used in selection, the linkage between the marker and the gene has to be very tight in order to have relatively high selection efficiency. Suppose a marker locus (M/m) is linked with the target locus (Q/q) with recombination frequency of r and the F1 has genotype MQ/mq, where Q is the target allele to be selected; when M is linked to Q, Q can be selected on the basis of M. The probability that the Q/Q genotype can be obtained through selection of marker genotype M/M, that is, the probability for selecting the correct individuals, is

$$P_1 = (1 - r)^2 \tag{8.1}$$

From Fig. 8.1, the probability for selecting the correct individuals decreases rapidly with the increase of recombination frequency. In order to have over 90% probability, the recombination frequency between
the marker and the target gene must be less than 0.05. When r is larger than 0.10, the probability reduces to below 80%. However, if we just want to have at least one selected individual with the target genotype, MAS is still very helpful even if the linkage is very loose. If the probability for getting at least one desired individual with the target genotype is P2, the minimum sample size required to obtain the target genotype M/M can be obtained from the probability function of the binomial distribution as

$$n = \frac{\log(1 - P_2)}{\log(1 - P_1)} \tag{8.2}$$

Frisch et al. (1999b) provided the probabilities for getting the target genotype, P2, which are not only defined by factors associated with the target gene and its flanking markers but also by the condition that the complete chromosome region between a flanking marker and the nearest telomere consists entirely of the recurrent parent genome.

Figure 8.2 indicates the relationship between the minimum sample size required and the recombination frequency when P2 = 0.99. Even if the recombination frequency is as high as 0.3, only seven plants that have M/M genotype are needed in order to have 99% probability that at least one of them has the target genotype. In selection without using molecular markers, which is equivalent to non-linkage between the marker and the target gene (r = 0.5), at least 16 plants are needed.

Fig. 8.1. Relationship between the recombination frequency between marker and target gene and the probability for selecting correct plants based on linked markers. [Plot: probability for selecting correct plants (%) against recombination frequency (0.0–0.5), with curves for bracket markers and a single marker.]

Fig. 8.2. Relationship between the recombination frequency between marker and target gene and the minimum plants that should be selected in MAS. Assume the target gene is in the middle of two markers, i.e. r1 = r2, when bracket markers are considered. [Plot: minimum plants to be selected against recombination frequency (0.0–0.5), with curves for a single marker and bracket markers.]

Selection using bracket markers

By monitoring markers flanking the target locus and the recipient alleles at the flanking markers, the length of intact donor chromosome segment around the target gene can be reduced efficiently (Tanksley et al., 1989). This rationale can be used to determine the population size in a BC programme such that the recombinants between the target gene and the flanking markers can be found with a high probability.

To reduce false positives in MAS, flanking markers or multiple markers around the region should be used simultaneously. A three-marker system, with three markers located on a chromosome block, will be desirable in this case (Zhang and Huang, 1998). The marker in the middle, preferably intragenic or co-segregating with the gene, will be used to indicate the presence of the target gene in the selection process. The marker on each side will be used to indicate the absence of the chromosome segment from the donor parent (negative selection), that is, selection for recombination between the target gene locus and the marker loci. As more and more genes have been cloned, the
marker in the middle could be developed from the cloned gene. This system will be very useful when the target gene is only available in a wild species and linkage drag is associated with the chromosome segment to be introgressed.

Suppose there are two marker loci (M1/m1 and M2/m2) which are located on each side of the target gene (Q/q) with recombination frequencies r1 and r2 and the F1 has genotype M1QM2/m1qm2. The F1 will produce two gametes with the marker genotype M1M2, one of which is the parental type containing the target allele (M1QM2) and the other is the double-crossed type containing the non-target allele (M1qM2). Because the frequency of double crossing is very low, the double-crossed gamete is very rare. As a result, the probability of making the correct selection for the target allele Q based on the presence of M1 and M2 is very high. Under no interference, the probability of obtaining the target genotype Q/Q based on selection of M1M2/M1M2 in the F2 generation is

$$P_1 = \frac{(1 - r_1)^2 (1 - r_2)^2}{[(1 - r_1)(1 - r_2) + r_1 r_2]^2} \tag{8.3}$$

When the target gene is located in the middle of two flanking markers, i.e. r1 = r2, the probability of making the right selection is minimized. Figures 8.1 and 8.2 show the relationship between the minimum number of plants required and r1 (or r2), when P2 = 0.99 and r1 = r2. Selection efficiency is much higher using two flanking markers than using one marker. With interference between single crossovers (as is generally the case), the frequency of actual double crossovers is lower than the expected value (which assumes no interference). Therefore, the actual probability for making the right selection based on flanking markers should be higher than the theoretical expectation.

The population size required to generate (in a single BC generation) a high probability of obtaining at least one plant recombinant between the target gene and both flanking markers is greater than the reproductive rate for most crop species. For example, for a flanking marker distance of 5 cM on each side of the target gene, about 4000 individuals are required to find a double recombinant with a probability of 0.99 (Frisch et al., 1999b). Therefore, Frisch (2004) proposed a sequential strategy to find an individual with recombination between the target gene and one flanking marker in generation BC1 and a recombinant between the target gene and the second flanking marker in generation BC2 (also see Fig. 8.4b for further explanation of this strategy).

Table 8.1 gives the optimum population size n1 in generation BC1 and corresponding expected population size E(n2) in generation BC2 such that the expected total number of individuals E(n) = n1 + E(n2) required to introgress one gene with a minimum number of individuals in a two-generation BC programme is minimized. The values depend on the map distances d1 and d2 between the target gene and two flanking markers (Frisch, 2004).

Table 8.1. The expected total number of individuals E(n) = n1 + E(n2) required to introgress one gene with a minimum number of individuals in a two-generation BC programme (from Frisch (2004) with kind permission of Springer Science and Business Media).

                        Map distance d2 (cM)
Map distance d1 (cM)    4          6          8          12         16
                        n1/E(n2) (a)
4                       143/252    136/186    130/155    123/128    117/117
6                                  91/167     88/135     83/105     79/93
8                                             66/125     63/94      60/80
12                                                       48/83      41/68
16                                                                  32/62

(a) n1 is the optimum population size in generation BC1 and E(n2) the corresponding expected population size in generation BC2.
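
Equations 8.1 to 8.3 are simple enough to check numerically. The following Python sketch evaluates them; all function names are illustrative and do not come from the source. The last function is an added assumption rather than a formula from the text: it treats a BC plant as useful when it carries the target allele (probability 1/2) and is recombinant in both flanking intervals, assumes no interference, and converts cM to recombination frequency with Haldane's mapping function.

```python
import math


def p_single(r):
    """Eq. 8.1: probability that an M/M plant is Q/Q with one linked marker."""
    return (1.0 - r) ** 2


def p_bracket(r1, r2):
    """Eq. 8.3: probability that an M1M2/M1M2 plant is Q/Q (no interference)."""
    parental = (1.0 - r1) * (1.0 - r2)
    return (parental / (parental + r1 * r2)) ** 2


def min_plants(p1, p2=0.99):
    """Eq. 8.2: smallest whole n giving probability p2 of >= 1 target plant."""
    if not 0.0 < p1 <= 1.0:
        raise ValueError("p1 must lie in (0, 1]")
    if p1 == 1.0:
        return 1
    return math.ceil(math.log(1.0 - p2) / math.log(1.0 - p1))


def haldane(d_cm):
    """Assumed mapping function: recombination frequency for d_cm centimorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def bc_double_recombinant_plants(d1_cm, d2_cm, p2=0.99):
    """Assumed model: BC plants needed to find one carrier of the target
    allele that is recombinant on both sides in a single generation."""
    p = 0.5 * haldane(d1_cm) * haldane(d2_cm)
    return min_plants(p, p2)


print(min_plants(p_single(0.3)))        # single marker, r = 0.3 -> 7
print(min_plants(p_bracket(0.3, 0.3)))  # bracket markers, r1 = r2 = 0.3 -> 4
print(bc_double_recombinant_plants(5, 5))  # on the order of 4000
```

With r = 0.3 the single-marker case reproduces the seven plants quoted for P2 = 0.99, and Eq. 8.2 gives about 16.0 before rounding for r = 0.5. Under the stated assumptions, the single-generation double-recombinant estimate for 5 cM on each side is of the same order as the roughly 4000 individuals cited from Frisch et al. (1999b), which illustrates why a sequential two-generation strategy is attractive.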

Selection using multiple markers for multiple targets

MAS provides opportunities for simultaneous selection of multiple traits/genes using multiple markers. In some cases, multiple pathogen races or insect biotypes must be used to identify plants for multiple resistances, but in practice phenotypic selection may be difficult or impossible because different genes may produce similar phenotypes that cannot be distinguished from each other. Marker–trait association can be used to simultaneously select multiple resistances from different disease races and/or insect biotypes and pyramid them into a single line through MAS.

For example, to find a restorer for cytoplasmic male sterility (CMS) in rice through testcrossing and progeny test, a candidate male plant has to be testcrossed with a CMS line to find out if it has fertility restorability based on the fertility of testcross progeny. However, sterility in testcross progeny could result from the absence of either restorability genes or wide compatibility genes or both when an intersubspecific cross is involved. MAS using multiple markers could be used to distinguish the two different types of sterility. As another example, consider phenotypic selection for multiple traits in rice, such as thermal-sensitive genic male sterility (TGMS), amylose and wide compatibility. Candidate plants must be tested in two different environments where TGMS can be identified. Each plant must be testcrossed with wide compatibility testers, followed by a progeny test in the next season. At the same time, a relatively large amount of seed must be harvested for amylose measurement. While conventional selection methods require a delay until a large number of seeds are available and a reasonable level of homozygosity is reached, in MAS only a leaf harvested at any growth stage in any segregating population is required, provided that associated markers are available for these traits.

As genetic mapping information accumulates from different mapping populations, it may be possible to establish a complete profile for all the genes associated with a specific trait or trait category. The multiple-marker approach can be used to select the best trait/gene combinations based on selection for each of the target loci whose position in the genome is known. It is possible to select the best cassette for any traits and/or trait combinations.

When single chromosomes are distinguishable, partial genome selection or whole chromosome selection are possible as an alternative to whole genome selection so that the other chromosomes remain unchanged. MAS could be focused on a chromosomal region/arm if it is separable from the rest of the genome. Genes controlling the same traits or trait category may cluster in some specific chromosomal regions, which are called gene blocks. Regional mapping strategies (Xu, 1997; Monna et al., 2002), combined with a high-density genetic map, can help construct high-density regional maps that target gene blocks for separation of closely linked genes.

8.2.2 Marker-assisted background selection

In a BC programme, molecular markers can be used for indirect selection for the presence of a favourable allele (Tanksley, 1983) and for selection against the undesirable genetic background of the donor genotype (Tanksley et al., 1989). Selection for the remainder of the genome excluding the target gene(s), i.e. genetic background, is called background selection (Hospital and Charcosset, 1997).

Background selection is aimed at the whole genome. In a segregating population, each chromosome represents a random combination of two parental chromosomes. So we have to know the parental combination of each chromosome in order to do whole genome selection, i.e. the entire genome has to be covered by molecular markers. For an individual plant, we can infer the parental origin for each marker allele across the whole genome when genotypes at all marker loci are known and thus we can infer the parental combination for each chromosome.
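
As a minimal illustration of inferring parental origin from genome-wide marker scores, the Python sketch below summarizes the donor contribution per chromosome and over the whole genome. It assumes marker genotypes are stored per chromosome as strings of 1, 2 and 3 in map order, a numeric coding of the kind used for the RFLP data in Fig. 8.3; the assignment of codes to parents (1 = homozygous recurrent parent, 2 = heterozygous, 3 = homozygous donor), the plant IDs and the scores are all hypothetical. Markers are weighted equally, which ignores map distances between them, so the result is only a rough marker-based estimate.

```python
# Donor alleles carried at a marker for each score (assumed coding):
# 1 = homozygous recurrent parent, 2 = heterozygous, 3 = homozygous donor.
DONOR_ALLELES = {"1": 0, "2": 1, "3": 2}


def donor_fraction(scores):
    """Share of donor alleles among all scored marker alleles in a string."""
    return sum(DONOR_ALLELES[s] for s in scores) / (2 * len(scores))


def background_summary(genotypes):
    """Return (per-chromosome donor fractions, genome-wide donor fraction).

    genotypes maps a chromosome name to its string of marker scores.
    """
    per_chromosome = {c: donor_fraction(s) for c, s in genotypes.items()}
    genome_wide = donor_fraction("".join(genotypes.values()))
    return per_chromosome, genome_wide


def rank_plants(plants):
    """Plant IDs sorted by genome-wide donor fraction, lowest first."""
    return sorted(plants, key=lambda pid: background_summary(plants[pid])[1])


# Hypothetical BC1 plants scored at ten markers on three chromosomes.
plants = {
    "BC1-01": {"1": "2221", "2": "111", "3": "122"},
    "BC1-02": {"1": "2111", "2": "111", "3": "112"},
}

for plant_id in rank_plants(plants):
    per_chromosome, genome_wide = background_summary(plants[plant_id])
    print(plant_id, round(genome_wide, 3), per_chromosome)
```

Ranking by the genome-wide value puts the plant with the least donor genome first, while the per-chromosome values show where the remaining donor segments sit. A fuller treatment would weight each marker by the map interval it represents.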

Concept of graphical genotypes

In breeding programmes, it is important to consider the complete genome of individuals, in addition to specific target genes. In sexually reproducing organisms, segregating progeny contain chromosomes that are mosaics of chromosomal pieces derived from their parents. Knowledge of the molecular marker genotype at one specific locus or haplotype across several closely linked loci yields information about the parental origin of alleles at that particular site in the genome. Knowledge of the molecular marker genotypes of many linked loci throughout the entire genome yields an estimate of the exact composition of an individual's chromosomes in terms of its parents. In other words, information about linked points in a genome permits deduction of a continuous genotype, which can be displayed graphically (Fig. 8.3). Graphical genotypes provide a clear picture of the genomic structure of each plant, which facilitates MAS, and are particularly useful for background selection. The selection is first made for foreground to retain the target gene and background selection is then made among the plants already selected for foreground.

(a) 1: 1122222222; 2: 22222; 3: 111122; 4: 333333; 5: 22333; 6: 11111; 7: 333332; 8: 22111; 9: 3333; 10: 1233322; 11: 111; 12: 23333

Fig. 8.3. Graphical genotype for an individual from a tomato F2 population derived from a cross between Solanum esculentum and Solanum pennellii (also known as Lycopersicon esculentum and Lycopersicon pennellii). (a) Numerical RFLP data presented in order along the 12 chromosomes in the genome: 1, homozygous for S. esculentum; 2, heterozygous; 3, homozygous for S. pennellii. (b) Graphical genotypes derived from the numerical RFLP data shown in a. White intervals indicate segments derived from S. esculentum; blackened intervals from S. pennellii; and striped intervals indicate segments containing a crossover event. Two homologues of each chromosome pair are shown side by side. Two isomeric graphical genotypes of equal likelihood are shown that differ only in the region noted by the arrow and thick line. From Young and Tanksley (1989a) with kind permission of Springer Science and Business Media.

In any meiosis, zero, one, or more crossover events may occur between a given pair of homologues. When crossover has occurred, a complete description of an individual's genome would include information on changes in allelic constitution due to recombination, as well as information on the locations where crossover events occurred (Young and Tanksley, 1989a). In transmission genetics, individuals are routinely described by their genotype at one or more genetic loci of interest. The description is generally alphabetic or numerical in nature and provides precise information on the derivation and allelic constitution at the specific loci. High-density molecular maps can be used to determine the genotype of an individual at thousands of loci and thus it is possible to deduce the most probable genetic constitution for regions of interest or the entire genome in a given individual. The graphical genotype, which portrays molecular data in a graphical form, has a number of advantages over numerical genotypes. It is similar to cytological karyotypes in describing an entire genome in a single graphic image, but different in that graphical genotypes would be inferred from molecular marker data and thus would show the genomic constitution and parental derivation for all points in the genome.

To develop a graphical genotype, molecular marker data, obtained in a numerical form, need to be transformed into an easily interpretable and accurate graphic image. Young and Tanksley (1989a) developed the mechanics for conveying RFLP data in the form of a graphical genotype, applied this concept to BC and F2 populations of tomato
and discussed several issues relating to the potential power and application of graphical genotypes. The term RFLP markers used in their paper can be extended to include genotypes derived from all types of molecular markers that are co-dominant and haplotypes derived from di-allelic markers such as SNPs. This concept (developed on the basis of structured populations such as BC and F2) can be extended to all populations including natural populations that consist of germplasm accessions or cultivars.

Requirements for deducing graphical genotypes

In order to construct a graphical genotype, certain conditions must be met. First, a well-populated (high-density) molecular map for the entire genome of the species must be available. This map should consist of a large number of markers that cover the entire genome with at least one marker every 10 cM or less. In addition, it is also necessary that the cis–trans configuration for the molecular markers be known in order to prepare a graphical genotype. In populations derived from inbred lines, such as breeding populations consisting of BC or F2 progeny, the cis–trans configuration can be inferred simply from knowledge of the breeding scheme. In more complex situations, complete molecular marker data must be obtained for three generations in order to prepare graphical genotypes for individuals in the third generation. In humans, for example, molecular marker data must be determined for grandparents and parents in order to develop graphical genotypes for the children in the pedigree. Without this knowledge of cis–trans configuration, molecular marker data from some regions of the genome may have more than one possible graphical genotype, all of which are equally likely to be correct.

Assumptions employed in developing graphical genotypes

The primary assumption required for the development of graphical genotypes is that the genotype of a region between two molecular markers is inferred from the genotypes of the markers that delimit the interval. When inferring the graphical genotype of an interval from the genotypes of the marker endpoints, there are often alternative configurations that will satisfy the available marker data. Young and Tanksley (1989a) used the most likely configurations to develop a graphical genotype. Thus, simple configurations requiring the fewest crossover events were utilized in developing a graphical genotype, while alternative configurations that require one or more multiple crossover events were not. In practice, this means that if two consecutive loci have the same genotype, the genotype of the segment between the markers is inferred to be that of the two flanking markers. When two adjacent loci have different marker genotypes, it is inferred that a crossover event has taken place somewhere between the two loci.

Since the genotype of a non-recombinant interval is inferred from the genotype of its marker endpoints, double crossovers (or other even numbers of crossovers) in a given interval will falsify this inference and the likelihood of double crossovers increases as the square of the probability of a crossover between the adjacent molecular markers. Thus, for any interval, the probability that the inferred genotype will be correct is $1 - r^2$, where r is the probability of a crossover event between adjacent molecular markers. For the total genome, the probability that there are no incorrect intervals is

$$P_t = \prod_{n=1}^{\text{Total intervals}} (1 - r_n^2) \tag{8.4}$$

This equation considers only double crossovers and assumes interference between crossovers to be negligible. As an example, consider an organism with a total genome size of 1000 cM in which molecular markers are evenly spaced over the entire genome. The expected proportion of the genome which is described correctly by the graphical genotype is calculated by first determining the probability of 0, 1, 2, . . . intervals that are incorrectly described for a given spacing of molecular markers. These
probabilities, along with the spacing size, are then used to determine the expected length of the genome correctly inferred, which is then divided by the total genome size to yield the expected proportion of the genome that is accurately portrayed by the graphical genotype. With molecular markers spaced every 10 cM, an inferred graphical genotype will have a probability of only 30% of being exactly correct for all regions (i.e. no incorrect intervals). However, this same graphical genotype will be accurate in describing the genome constitution for over 99% of the genome. Even when the spacing between molecular markers increases to 30 cM, the inferred graphical genotype will be accurate for approximately 95% of the genome. As the number of manageable and available molecular markers becomes practically unlimited compared to the number available when the concept was proposed, the probability of a correct inference will improve significantly.

Cis–trans ambiguity happens in an F2 population when heterozygous loci are separated by a stretch of one or more homozygous loci. In this situation, two equally likely graphical genotypes are possible that differ in the cis–trans configuration of the flanking heterozygous regions (see Fig. 5 of Young and Tanksley, 1989a). Calculations based on the Poisson distribution indicated that only 6% of a genome consisting of ten chromosomes of 100 cM each will be ambiguous. The utility of graphical genotypes in F2 populations will not generally be seriously impaired by cis–trans ambiguities.

Application of graphical genotypes

A graphical representation of a genotype deduced from RFLP data for a randomly selected individual from a tomato F2 population provided by Young and Tanksley (1989a) is shown in Fig. 8.3. Note that it is not only possible to see which portions of each set of homologues are derived from each parent, but also the regions in which crossovers took place.

Using graphical genotypes, plants can be selected that not only contain the gene(s) of interest, but also have the highest probability that the rest of the genome will return to that of the recurrent parent with additional crossing. Although the concept of the graphical genotype was proposed a long time ago, it has been widely used in different fields of genomics. It has been used, as described in Chapter 4, for selection of genome-wide introgression lines as a library to cover all traits and the whole genome segment by segment. As molecular marker data increase exponentially with the availability of high-throughput genotyping systems, the concept of the graphical genotype and its derivatives have received more attention and are widely used in MAS, near-isogenic line (NIL) construction, introgression line library development and association mapping. As numerous points in the genome can be covered by markers, graphical genotypes can be simplified by displaying them using the physical positions of markers rather than the intervals determined by flanking markers.

8.2.3 Donor genome content in BC generations

DNA marker-based whole genome selection or background selection can be used to accelerate recovery of a recurrent genotype in the backcrossing process for improving parental lines. The basic principle of background selection (as opposed to foreground selection on the target gene) is that in any given BC generation the actual DGC varies around the theoretical mean value.

Once QTL alleles of interest in the resource (donor) parent have been identified by linkage to resource-specific marker alleles, repeated backcrosses to the cultivar (while choosing in each cycle only those backcrossing progeny carrying the exotic QTL-linked marker alleles) will allow the effective introgression of the linked quantitative alleles from the donor into the cultivar. Depending on the number of alleles to be introgressed, it may be possible to expedite matters by actively selecting against exotic marker alleles (and hence against the associated chromosomal regions) that are not in linkage to introgressed alleles.

Table 8.2 shows the frequency of a favourable allele after one to six BC generations, with and without selection for a
Marker-assisted Selection: Theory 301

Table 8.2. The frequency of a favourable allele after a given number of BC generations, with and without selection for a linked marker allele or for a pair of markers bracketing the favourable allele (marker bracket) and the proportion of recipient genome recovered with and without MAS against the remaining exotic genome. From Beckmann and Soller (1986a) by permission of Oxford University Press.

               MAS for favourable alleles                    MAS against remainder of exotic genome
               (frequency of favourable alleles)             (proportion of recipient genome recovered)
Number of BC
generations    None    Single marker (a)  Marker bracket (b)  None    Full marker coverage (c)
1              0.25    0.81               0.92                0.75    0.85
2              0.12    0.73               0.88                0.88    0.99
3              0.06    0.66               0.85                0.94    1.00
4              0.03    0.59               0.82                0.97    1.00
5              0.02    0.53               0.78                0.98    1.00
6              0.01    0.48               0.75                0.99    1.00

(a) Proportion of recombination between marker allele and linked favourable allele, 0.10.
(b) Proportion of recombination between the two markers of the bracket, 0.40.
(c) Two markers per chromosome.
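The entries of Table 8.2 that do not involve background selection can be reproduced to two decimals with simple geometric decay, which makes the comparison between columns concrete. The sketch below is an illustration, not the derivation used by Beckmann and Soller (1986a). It assumes that the unassisted frequency halves with each cross, that a single marker retains the favourable allele with probability 1 - r per meiosis (r = 0.10), and that a bracket with the QTL midway loses the allele only through a double crossover (probability (0.40/2)^2 = 0.04 per meiosis). The function names, the midpoint position of the QTL and the exponent t + 1 are assumptions chosen because they match the tabulated values.

```python
def unassisted_frequency(t):
    """Favourable-allele frequency after t backcrosses with no marker selection."""
    return 0.5 ** (t + 1)


def single_marker_frequency(t, r=0.10):
    """Frequency when selecting on one marker at recombination fraction r."""
    return (1.0 - r) ** (t + 1)


def marker_bracket_frequency(t, bracket_r=0.40):
    """Frequency when selecting on two flanking markers (QTL assumed midway)."""
    double_crossover = (bracket_r / 2.0) ** 2
    return (1.0 - double_crossover) ** (t + 1)


def recipient_genome_unassisted(t):
    """Expected proportion of recipient genome after t backcrosses without MAS."""
    return 1.0 - 0.5 ** (t + 1)


if __name__ == "__main__":
    print("BC  none  single  bracket  recipient")
    for t in range(1, 7):
        print(
            f"{t:>2}  "
            f"{unassisted_frequency(t):.2f}  "
            f"{single_marker_frequency(t):.2f}    "
            f"{marker_bracket_frequency(t):.2f}     "
            f"{recipient_genome_unassisted(t):.2f}"
        )
```

The "full marker coverage" column is not covered here, because it depends on selection among progeny and has no comparably simple closed form.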

linked marker allele. Also shown are results if selection is for a pair of marker alleles bracketing the QTL to be introgressed. Suppose that the proportion of recombination between marker allele and linked favourable allele is 0.10 when a single marker is used and the proportion of recombination between the two markers of the bracket is 0.40. The comparison of interest is the frequency of the introgressed alleles after three generations of marker-assisted backcrossing (MAB) (single marker, 0.66; marker bracket, 0.85) compared to the frequency of the introgressed allele after five to six unassisted BC generations (0.02 and 0.01). In the former case, the introgressed allele will have an immediate effect on cultivar value and can be rapidly brought to fixation by selfing or selection.

With two markers per chromosome used for MAS against the remainder of the exotic genome, the proportion of recipient (recurrent) genome recovered in BC2 will be equal to that obtained in BC6 without MAS (Table 8.2). This result is also shown in Fig. 8.4 and is well recognized by many authors (e.g. Tanksley et al., 1989; Hospital et al., 1992; Frisch et al., 1999a, 2000). Therefore, selection based on markers that distinguish between donor and recurrent parent genome may considerably accelerate the recovery of the recurrent parent genome.

If only two background selection markers on the target chromosomes are used (assuming direct selection for the target gene), the distances d1 and d2 between the target gene and markers can be chosen such that the expected DGC on the target chromosome is minimized if both markers are fixed for the recipient alleles (Hospital et al., 1992) by applying

\[ d_1 = d_2 = \frac{1}{2}\ln\left(1 + 2\sqrt{s}\right) \tag{8.5} \]

where s is the proportion of selected BC1 individuals. This approach is based on the assumption of an infinite population size and the optimum properties only hold true if two markers on the carrier chromosome of the target gene are used (Frisch, 2004).

8.2.4 Linkage drag in gene introgression

When transferring a single gene from a donor into the genetic background of a recurrent parent by repeated backcrossing, genetic linkage will cause fragments of the donor genome surrounding the target gene to be dragged along, which is called linkage drag, a persistent problem in plant breeding for gene introgression. Small donor genome fragments, not linked to the target gene,
302 Chapter 8

[Figure 8.4: graphical genotypes not reproduced. Values printed beneath the panels:]

(a) Per cent recurrent genome       BC1     BC2     BC3     BC6
    Traditional backcrossing        75.0    87.7    93.3    99.0
    Marker-assisted backcrossing    85.5    98.0    100

(b) Years                           F1     BC1    BC2    BC3    BC20   BC100
    Traditional backcrossing        0.5    1      1.5    2      10     50
    Marker-assisted backcrossing    0.5    1      1.5

Fig. 8.4. Comparison of traditional and marker-assisted backcrossing breeding (assuming that co-dominant markers are used). (a) Rate to return to recurrent parent genotype in regions of genome unlinked to gene(s) being introduced. (Top) Traditional backcrossing breeding. Graphical genotypes were generated for randomly selected individuals from various BC generations derived from a single BC1 individual by computer simulation. Only one homologue of each of the 12 tomato chromosomes is shown (the other homologue can be derived exclusively from the recurrent parent). Darkened regions indicate donor genome segments, striped regions indicate segments in which crossovers occurred, and white regions indicate recurrent genome segments. Each interval is 20 cM in length. The numbers beneath each graphical genotype indicate the percentage of the genome derived from the recurrent parent. The average number of generations required to return to the recurrent genome, as estimated from 20 independent simulations, was 6.5 ± 1.7 generations. (Bottom) Graphical genotypes of individuals from marker-assisted backcrossing breeding programme showing return to the recurrent parentage in only three generations. In each BC generation, 30 progeny were generated and the best (in terms of percentage recurrent parent genome) was used as the parent for the next BC generation. (b) Expected linkage drag around a selected gene held heterozygous during backcrossing. (Top) Traditional backcrossing breeding. (Bottom) MAS for plants carrying chromosomes with recombination near the selected gene. Markers tightly linked to the gene of interest are used to identify individuals with crossovers within 1 cM on one side of the selected gene in BC1. These recombinant individuals are then backcrossed to the recurrent parent and other tightly linked markers are used to select recombinants within 1 cM on the other side of the target gene in BC2. The expected number of years to obtain a given level of linkage drag (for a typical crop with a generation time of 0.5 years) is shown below. From Tanksley et al. (1989) reprinted by permission from Macmillan Publishers Ltd.
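The recombinant-selection step in panel (b) implies a sample-size calculation. As a rough sketch, assume that each backcross plant independently shows a crossover in a given window with probability p, and that 1 cM corresponds to about 1% recombination (reasonable only for short intervals). The smallest n with 1 - (1 - p)^n at or above the desired certainty then follows directly. The function name `plants_needed` is illustrative and not taken from the source.

```python
import math


def plants_needed(p, certainty=0.95):
    """Smallest n such that P(at least one recombinant among n plants) >= certainty.

    p is the per-plant probability of a crossover in the window of interest.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < certainty < 1.0:
        raise ValueError("certainty must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - certainty) / math.log(1.0 - p))


# BC1: a crossover within 1 cM on either side of the gene (two 1 cM windows).
either_side = plants_needed(0.02)
# BC2: a crossover within 1 cM on the one remaining side.
other_side = plants_needed(0.01)

if __name__ == "__main__":
    print(either_side, other_side)
```

Under these assumptions the function returns 149 and 299 plants, in line with the figures of approximately 150 and 300 BC plants quoted in this section for a 95% chance of recovering each recombinant.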

may also end up in the recipient's genetic background. The removal of linked segments occurs in a complex fashion that was described by Hanson (1959) and further elaborated by Stam and Zeven (1981). Their work showed that it takes many generations to remove the linked donor segments. For example, even after 20 BCs, one expects to find a sizable piece (10 cM) of the donor chromosome still linked to the gene being selected (Stam and Zeven, 1981), which is shown in Fig. 8.4b. In practice, this region may be larger or smaller than the expected value owing to the large variance associated

with the expected value and because a breeder inevitably practises selection among the progeny. In most plant genomes, 10 cM is enough DNA to contain hundreds of genes. Therefore, backcrossing results in the transfer, not only of gene(s) of interest, but also of additional linked genes from the donor. This phenomenon can often result in a new cultivar modified for characters other than those originally targeted. Not surprisingly, many examples of linkage drag are known in which undesirable traits that are closely linked to a target gene are carried along during the breeding programme, particularly when an exotic germplasm is involved.

In addition to linkage drag, unlinked DNA from the donor parent must also be removed during a BC breeding programme. In order to obtain a better idea of the relative importance of linked versus unlinked donor segments in BC breeding, a simple curve was derived from the works of Hanson (1959) and Stam and Zeven (1981) to compare the amount of foreign DNA due to these two sources as a function of the number of BC generations (Young and Tanksley, 1989b). The results of this analysis demonstrated that for a hypothetical genome of ten chromosomes of 100 cM each, the proportion of unlinked DNA derived from the donor genome is greater than that of remaining linked DNA only in the first four BC generations. After this time, the proportion of donor DNA due to linkage drag far exceeds that of unlinked DNA, by a factor of 50, and in the 20th BC generation linked donor DNA exceeds unlinked by a factor of more than 10^5. This simple analysis clearly emphasizes the importance of linkage drag as the prominent problem in BC breeding programmes.

In a traditional BC programme, the linked segments usually remain large for many generations not because recombination has not occurred in these regions, but because there is no effective way to identify recombinant individuals. In classical breeding it is usually only by chance that such recombinants, which contribute to a reduction in the size of the donor segment, are occasionally selected. With high-density molecular maps it is possible to directly select individuals that have experienced recombination near the gene of interest. In approximately 150 BC plants there is a 95% chance that at least one plant will have experienced a crossover within 1 cM on one side or the other of the gene being selected. Molecular markers allow unequivocal identification of these individuals (Young and Tanksley, 1989b). With one additional BC generation of 300 plants, there would be a 95% chance of a crossover within 1 cM of the other side of the gene, generating a segment surrounding the target gene of less than 2 cM. This would have been accomplished in two generations with molecular markers, while it would have required, on average, 100 generations without molecular markers (Fig. 8.4b). It is apparent that the ability to select for desirable recombinants in a region of interest is a function of the number of markers mapped in that region, as well as the number of plants assayed. As plant molecular maps become more saturated, the efficiency of selecting recombinants will increase.

Peleman and van der Voort (2003) provided an example of linkage drag that occurred during gene introgression in lettuce. In the 1990s, Keygene was involved in a marker-assisted breeding approach that led to the development of a novel lettuce cultivar resistant to the aphid Nasonovia ribisnigri (Jansen, 1996). This aphid is a major problem in field-grown lettuce areas in Europe and California, causing reduced and abnormal growth in addition to the spread of viral diseases. Resistance to this aphid could be introgressed from a wild relative of lettuce, Lactuca virosa, by repeated backcrossing. However, despite many rounds of backcrossing the new product was of extremely poor quality, bearing yellow leaves and a greatly reduced head. This could either have been caused by a pleiotropic effect of the resistance gene or by linkage drag, a negative trait closely linked to the positive trait of interest. Marker analysis eventually demonstrated that the reduced quality was caused by linkage drag. In this case, the linkage drag was recessive, only visible in the homozygous state, thereby seriously increasing the difficulty of selecting for recombinants based on the phenotype.

It was decided to use DNA markers flanking the introgression to pre-select for individuals that are recombinant in the vicinity of the gene. More than a thousand F2 plants were screened this way, leading to the selection of some 100 individuals bearing a recombination or even double recombinations in the vicinity of the gene. Only those individuals needed to be phenotyped for both the resistance and, at the F3 level, for the absence of the negative characteristics. This approach eventually led to the selection of an individual bearing recombination events very close to each side of the gene thereby removing the linkage drag. The results demonstrated that the (recessive) linkage drag was due to tightly linked factors on both sides of the resistance gene. As indicated by Peleman and van der Voort (2003), this result would have been very hard to obtain by classical selection methods.

8.2.5 Effect of genome size on gene introgression

The influence of genome size on the distribution of DGC in BC generations has important consequences for the attainable rate of donor genome substitution. The distribution of DGC in a BC1 generation is shown in Fig. 8.5 for three genome sizes (haploid number of chromosomes × map length): small: 5 × 100 cM; medium: 10 × 100 cM; large: 15 × 150 cM (Stam, 2003). The important feature that can be observed is that the variance in DGC decreases as genome size (total centimorgans) increases.

From the tabulated cumulative distribution of Fig. 8.5 the probability of less than a given DGC can be read. For example, the probability that DGC is less than 0.35 equals 0.21, 0.12 and 0.06 for the small, medium and large genome, respectively. From these probabilities one can calculate the population size required to ensure that with e.g. 90% certainty at least one plant will occur with less than a given DGC. Let the threshold DGC be x and let the corresponding probability be p_x. Then from Stam (2003) the required minimum population size N satisfies

\[ 1 - (1 - p_x)^N > P_C \tag{8.6} \]

where P_C is the pre-set level of certainty. For the three genome sizes, the population size required is given in Table 8.3 to find with 90% probability at least one or at

[Figure 8.5: plot not reproduced. Vertical axis: cumulative distribution (0.0 to 1.0); horizontal axis: DGC (%), 0 to 100; curves for small, medium and large genomes.]

Fig. 8.5. The cumulative distribution of donor genome content (DGC) in a BC1 generation for a small, medium or large genome. Results based on 50,000 replicate simulation runs. After Stam (2003).

Table 8.3. Population sizes (expressed as number of individuals) required in a BC1 to obtain (probability 0.90) at least one or at least two plants with less than a certain donor genome content (DGC) with a small, medium or large genome (Stam, 2003).

           At least one               At least two
DGC        Small   Medium   Large     Small   Medium   Large
< 0.45         5        5       6         8        9      11
< 0.40         7        9      14        14       16      25
< 0.35        10       18      40        17       31      68
< 0.30        16       41     169        28       69     285
< 0.25        28      111     822        48      187   >1000
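Equation 8.6 can be solved for N directly, and the "at least two" columns follow from the same binomial reasoning. The sketch below assumes that plants are independent draws, each with probability p_x of falling below the DGC threshold. `min_population` and `prob_at_least` are hypothetical helpers written for this illustration, not part of any software cited in the text. Because the p_x values quoted in the text are rounded to two decimals, the results land close to, but not always exactly on, the entries of Table 8.3.

```python
import math


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k))
    return 1.0 - below


def min_population(p_x, certainty=0.90, at_least=1, n_max=100000):
    """Smallest N giving at least `at_least` plants below the DGC threshold
    with probability >= certainty (Eqn 8.6 when at_least == 1)."""
    for n in range(at_least, n_max + 1):
        if prob_at_least(at_least, n, p_x) >= certainty:
            return n
    raise ValueError("no population size up to n_max reaches the requested certainty")


if __name__ == "__main__":
    # p_x for DGC < 0.35, as quoted in the text for the three genome sizes.
    for label, p_x in (("small", 0.21), ("medium", 0.12), ("large", 0.06)):
        print(label, min_population(p_x), min_population(p_x, at_least=2))
```

For p_x = 0.21, 0.12 and 0.06 the function gives 10, 19 and 38 plants for at least one qualifying plant, against 10, 18 and 40 in Table 8.3; the small differences are consistent with rounding of the probabilities read from Fig. 8.5.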

least two plants with less than a given DGC in a BC1 generation, which shows the importance of genome size. For example, for DGC to be less than 0.40 in at least one plant, a large genome requires approximately a twofold larger population size as compared to a small genome (14 versus 7). As the DGC threshold decreases, this ratio increases rapidly, to about tenfold for DGC less than 0.30 and about thirtyfold for DGC less than 0.25. From these simple calculations, the price to be paid for a rapid decline of DGC in a large genome is twofold (Stam, 2003): (i) the larger the genome size, the more markers (the more marker data points per plant) required; and (ii) the larger the genome size, the larger the population size required to attain a given rate of donor genome substitution.

When multiple BC generations are considered, population sizes can be allocated across generations in different ways. Employing increasing, constant, or decreasing population sizes from generations BC1 to BC3 in a simulation study had little effect on the recurrent parent genome values of the selected BC3 plants (Frisch et al., 1999a). For example, allocating a total of n = 300 plants such that 100 plants are generated in each of generations BC1 to BC3 (ratio n1:n2:n3 = 1:1:1) resulted in a 10% percentile of the recurrent parent genome (Q10) of 97.4%, while various ratios from 3:2:1 on the one extreme to 1:3:9 on the other resulted in Q10 values of 97.3 and 97.4%, respectively. In contrast, employing a large population size in generation BC1 multiplies the number of marker data points required for the marker-assisted BC programme. For example, only 2650 marker data points were required for n1:n2:n3 = 1:3:9, while 5000 or even 7250 marker data points were required for ratios 1:1:1 and 3:2:1, respectively. However, in a multi-stage selection for a quantitative trait, large populations in early generations are advantageous because when high selection intensity is applied, a large selection gain is expected due to the large segregation variance (Frisch, 2004).

8.2.6 Background selection at carrier chromosome

Donor genome substitution is most important, and at the same time most difficult, for chromosomes that carry the target gene(s). Suppose that the target gene is flanked by two markers at map distances d1 and d2 which can be used for background selection as described previously. Within a given number of generations the introgressed segment must be smaller than the segment covered by d1–d2. Then, given a pre-set probability of reaching this goal (e.g. a 99% success rate), what are the optimal population sizes in successive BC generations?

The answer to this question has been given by Hospital and Decoux (2002) and can readily be obtained with the software package POPMIN (http://moulon.inra.fr/fred/programs). Table 8.4 provides three important features based on the results from POPMIN. First, the smaller the segment (interval)

Table 8.4. Optimum population sizes (expressed as number of individuals) required in successive BC generations to achieve with 99% certainty two markers flanking the target gene becoming detached in at least one plant of the last BC generation. N, accumulated number of plants. (N), average accumulated number of plants; this is less than N because with a certain probability the goal may be reached before the final generation. Figures indicated in configuration column are distances in centimorgans (cM). T, target locus; d1, d2, flanking markers (Stam, 2003).

Configuration     Generations    BC1    BC2    BC3      N     (N)
d1-10-T-10-d2     2               62    100             162   (137)
                  3               25     36     76      137    (74)
d1-5-T-5-d2       2              118    200             318   (289)
                  3               48     70    149      267   (149)
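To see why the schemes in Table 8.4 spread the work over two or three generations, it helps to estimate what a single generation would cost. The sketch below is an illustration added here, not output from POPMIN. It assumes the Haldane mapping function (no crossover interference) and that a BC1 plant meets the goal only if the F1 gamete is recombinant on both sides of the target, which happens with probability 0.5 × r1 × r2. All function names are hypothetical.

```python
import math


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def double_recombinant_probability(d1_cm, d2_cm):
    """P(a BC1 plant carries the donor target allele with recipient alleles
    at both flanking markers)."""
    return 0.5 * haldane_r(d1_cm) * haldane_r(d2_cm)


def plants_for_one_generation(d1_cm, d2_cm, certainty=0.99):
    """Smallest BC1 population giving at least one such plant with the given certainty."""
    p = double_recombinant_probability(d1_cm, d2_cm)
    return math.ceil(math.log(1.0 - certainty) / math.log(1.0 - p))


if __name__ == "__main__":
    print(plants_for_one_generation(10, 10))
    print(plants_for_one_generation(5, 5))
```

With markers 10 cM either side of the target this gives roughly 1100 plants, and roughly 4000 plants at 5 cM, compared with accumulated totals of 162 and 318 plants for the two-generation schemes in Table 8.4. The gap is what makes a staged approach attractive.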

bracketed by the markers, the more plants are required because rare recombinants are less likely to occur in smaller populations. Secondly, population size should increase as generations proceed as two-sided detachment (crossover) is in most cases a two-stage process. If no detachment (crossover) occurs at any side in a given generation, more plants are required in the generation(s) thereafter. Thirdly, allowing more generations (three versus two) to achieve the goal requires fewer plants to be grown and genotyped in total, indicating a trade-off between speed and cost (total sample size) of the introgression programme.

In addition, the POPMIN software also allows the user to specify the initial genotype at both markers and the target locus. Given an initial condition of BC generation, e.g. BC1, the user can optimize population sizes in the following BC2, BC3 etc. Conversely, if no single recombinant has been obtained in a given BC generation, an increase of the originally planned population sizes in generations thereafter is needed.

In terms of the relative importance of background selection for the carrier chromosomes and the remainder genome, Hospital (2002) considered background selection on carrier chromosomes to be more important than on non-carriers and thus assigned different weights to carrier and non-carrier markers. Frisch and Melchinger (2001) considered multi-stage selection of markers: after selection of the target gene(s), one selects plants based on carrier markers and finally, from the obtained subset, one selects based on non-carrier markers.

8.2.7 Whole genome selection for genetic background

The question arises about the number of markers (per chromosome) that should be used for whole genome selection for genetic background and how this depends on genome and/or population sizes. Several authors (see e.g. Hospital and Charcosset, 1997; Frisch et al., 1999a,b) have shown that in a moderately sized population from which the most promising plant is selected for further backcrossing, an increase in the number of markers per chromosome beyond two is hardly rewarding (Table 8.5). An increase from 1 to 8 markers reduces DGC in relative sense (from 0.13 to 0.07 in BC2), but the absolute effect is limited. However, when rapid progress requires using larger population sizes, especially in the case of a large genome (where larger population sizes are required anyway), the situation is different (Table 8.6).

Table 8.5. Average decrease of DGC in a BC programme with a medium genome size and single target gene. Each chromosome has 1, 2 or 8 markers, uniformly distributed over the chromosome. One plant out of 50 is selected in each generation for backcrossing. The selected plant satisfies the following conditions: (i) it carries the target allele; and (ii) it has the smallest number of markers of donor signature. Results based on 5000 replicate simulation runs (Stam, 2003).

Number of markers    BC1     BC2     BC3
1                    0.34    0.13    0.07
2                    0.31    0.09    0.04
8                    0.30    0.07    0.02

Table 8.6. Average DGC attained in BC2 for small and large genome sizes with various population sizes and number of markers per chromosome (Stam, 2003).

                                       Genome size
Number of markers   Population size    Small    Large
2                    50                0.082    0.121
                    200                0.079    0.095
                    400                0.078    0.088
8                    50                0.040    0.100
                    200                0.021    0.067
                    400                0.019    0.055

Two general conclusions can be drawn (Stam, 2003): (i) For a small genome with few markers per chromosome, increasing the population size makes little sense. When many markers are available, however, an increase of population size does reduce DGC, but hardly so beyond N = 200. (ii) For a large genome, increasing the population size is beneficial, irrespective of the number of markers per chromosome. Obviously, with increasing genome size more independent recombination events are required to attain a given reduction in DGC, which in turn demands larger populations for their discovery.

Whole genome selection for background will help reduce the DGC. The question about what final level of DGC is acceptable cannot easily be answered in general terms. When only relying on estimated DGC, based on markers, one still runs the risk that after finalization a tiny donor fragment contains a few wild type genes that confer an undesirable trait. Especially in a rapid cycling introgression programme that hardly allows phenotypic selection for general agronomic performance, undesired donor traits may unexpectedly turn up despite an expensive and theoretically powerful BC scheme (Stam, 2003). On the other hand, desirable DGC levels across different BC populations largely depend on the genetic difference between the donor and recurrent parents. In many cases, unsaturated backcrossing with selection for the target gene may be enough, particularly when the donor parent is also a commercial cultivar.

8.2.8 Multiple gene introgression by repeated backcrossing

Since little additional effort is required to screen with multiple molecular markers after sampling and DNA extraction, one could consider adding many genes simultaneously to a cultivar through MAB. For example, batteries of disease resistance genes could be added in a few generations, as opposed to the many generations required with traditional breeding. The ability to rapidly adjust existing cultivars should allow breeders to more quickly respond to market demands, as well as unexpected environmental pressures, such as the appearance of new pathogens.

With marker-assisted introgression, allele frequencies for the introgressed alleles are sufficiently high that two or three alleles could be readily introduced and brought to fixation in a given breeding cycle (Beckmann and Soller, 1986a). Without MAS, many BC progeny will have to be screened for the introduced trait, due to the extreme rarity of BC progeny carrying desired exotic alleles.

Several authors have considered optimization aspects of multiple gene transfer by repeated MAB (van Berloo et al., 2001; Frisch and Melchinger, 2001; Hospital, 2002; Stam, 2003). Frisch (2004) discussed the introgression of two dominant genes. It is clear that, roughly speaking, the effects of population size, genome size and total number of markers on the efficiency of recurrent parent genome recovery are similar to those for single gene transfer. As an example, Table 8.7 shows the effect of population size for the introgression of three target genes in a genome of medium size using eight markers per chromosome for background selection (Stam, 2003). With multiple targets an increase of population size enhances the efficiency. However, the decrease in average DGC is most apparent when the population size increases from small to medium, such as from 50 to 100.

The number of target genes does affect the answer to the question whether a given total number of plants should be distributed

over two or three generations. Comparison of average DGC attained with a total of 900 plants, distributed over two or three BC generations with medium genome size and eight selection markers per chromosome (Stam, 2003), showed that three BC generations each with 300 plants is more effective than two BC generations each with 450 plants. The average DGC for the former is 0.010 for one target gene and 0.036 for three target genes while for the latter these two numbers are 0.023 and 0.083, respectively. Again, there is a trade-off among time (how many generations involved), cost (how many data points to generate per generation) and efficiency (how soon the recurrent parent genome can be recovered).

Table 8.7. Average DGC in BC2 and BC3 in an example of the simultaneous introgression of three target genes in a genome of medium size, using eight markers per chromosome for background selection. A single plant was selected for further backcrossing, carrying the three target alleles and having the smallest number of markers of donor signature. Averages based on 1000 replicate simulation runs (Stam, 2003).

Population size    BC2     BC3
 50                0.18    0.09
100                0.14    0.06
200                0.11    0.04
400                0.09    0.03

A complication arising with multiple QTL transfer is the uncertainty about the exact location of QTL. Hospital and Charcosset (1997) investigated the optimal location of markers to be used in foreground selection. This optimization process should also consider the relative economic importance of the target traits for the multiple QTL to be introgressed.

8.3 Marker-assisted Gene Pyramiding

Agricultural productivity is the result of growing superior genotypes in an environment which allows them to express their superiority (Boyer, 1982). Increasing genetic value relies on increasing the frequency of favourable genes controlling that trait. To create a superior genotype, the breeder must assemble many genes which work well together and, for a specific trait, assemble the alleles with similar effects from different loci. This process is called pyramiding, by which different QTL alleles can be recombined and the true-breeding lines associating alleles of similar (positive or negative) effect can be selected (Xu, 1997; Fig. 8.6). Related techniques include effectively identifying the individuals with favourable allele combinations, assembling different alleles into a common genetic stock to produce new genotypes and determining the joint effects of alleles at different loci. In the words of Allard (1988): 'Emphasis was therefore shifted . . . to a particulate approach . . . determining the individual effects of single marker loci on adaptive change, then determining the joint effects of pairs of loci.'

8.3.1 Gene-pyramiding schemes

If all genes cannot be fixed in a single step of selection, it is necessary to cross again selected individuals with incomplete, but complementary, sets of homozygous loci (Xu et al., 1998). However, such strategies are limited to small numbers of target loci. To accumulate more loci in a single genotype by selection on markers, Hospital et al. (2000) proposed a marker-based recurrent selection (MBRS) method using a QTL complementary strategy in a randomly mating population. When evaluating this method using simulations with 50 detected QTL in a population of 200, they found that the frequency of favourable alleles went up to 100% in ten generations when markers were located exactly on the QTL, but up to only 92% when marker–QTL distance was 5 cM. The reduced efficiency in the latter case comes from the probability of losing the QTL during the breeding scheme because of recombination between the markers and QTL. This effect becomes more severe with increasing duration of the breeding scheme because of the accumulation of meioses;

[Figure 8.6: flow diagram not reproduced. Labels in the diagram, from top to bottom: Germplasm (A to E, compared pairwise); Screening for non-allelic QTL; Map-based whole genome selection; Allele-dispersed materials (allele dispersion), each homozygous for the Q allele at one of four loci; Crossing; F2; Divergent selection; Pyramiding (downwards) and Separating (upwards); Allele-associated materials (allele association), Q1Q1 Q2Q2 Q3Q3 Q4Q4 and q1q1 q2q2 q3q3 q4q4.]

Fig. 8.6. A procedure for QTL separating and pyramiding. Non-allelic QTL with dispersed QTL alleles are identified by observation of transgressive segregation and map-based whole genome selection, and then recombinants are obtained by divergent phenotypic selection from crosses derived from non-allelic QTL materials. Two cycles of cross-selection are exemplified to pyramid non-allelic QTL at four loci (Q1-q1, Q2-q2, Q3-q3 and Q4-q4). QTL separating is a reverse process of pyramiding, in which allele-associated materials are used as parents to produce a segregating population (F2) and intermediate phenotype is selected in order to get allele-dispersion individuals. From Xu (1997). This material is reproduced with permission of John Wiley & Sons, Inc.

hence, it is important to accumulate and fix the target genes as rapidly as possible. Hospital et al. (2000) concluded that the optimization of pair-wise crosses between selected individuals is the most efficient way to decrease the duration of the breeding scheme under the constraint of constant cost.

Servin et al. (2004) developed a general framework to optimize breeding schemes to accumulate identified genes from multiple parents into a single genotype (gene-pyramiding schemes). This section will introduce the theory developed by these authors on marker-assisted gene pyramiding.

Definition

To accumulate into a single genotype the genes that have been identified in multiple parents, assume we have n loci of interest and a set of founding parents labelled {Pi, i ∈ [1, …, n]} with Pi being homozygous for the favourable alleles at the ith locus and homozygous for unfavourable alleles at the remaining n − 1 loci. We assume that the recombination fractions between the loci are known and we want to derive the ideal genotype (ideotype) that is homozygous for the favourable allele at all n loci.
310 Chapter 8

As shown in Fig. 8.7, the gene-pyramiding scheme has two parts. The first part is called a pedigree and is aimed at accumulating all target genes in a single genotype (called the root genotype). The second part is called the fixation steps, which aim at fixing the target genes into a homozygous state, that is, to derive the ideotype from the root genotype. A pedigree can be represented by a binary tree with $n$ leaves corresponding to the $n$ founding parents and $n - 1$ nodes. Each node of the tree is called an intermediate genotype and has two parents. Each intermediate genotype, which is a particular genotype selected from among the offspring, becomes a parent in the next cross. Denote the gametes (subsets of genes) passed on from the parents to the intermediate genotype as $s$. Taking $H_{(s_1)(s_2)}$ as an example, the intermediate genotype must produce and pass on to its offspring a gamete carrying all the favourable alleles in $s_1$ and $s_2$.

[Figure 8.7: pedigree diagram. Founding parents P1 to P6 (generation G0) are crossed in pairs to give H(1)(2), H(3)(4) and H(5)(6) (G1); H(1)(2) × H(3)(4) gives the node H(1,2)(3,4) (G2); crossing this with H(5)(6) gives the root genotype H(1,2,3,4)(5,6) (G3); the fixation steps then lead to the ideotype H(1,2,3,4,5,6)(1,2,3,4,5,6).]
Fig. 8.7. Example of gene pyramiding scheme cumulating six target genes. The pedigree part is aimed at cumulating one copy of all target genes from founding parents in a single genotype (root genotype). The fixation step is to derive the ideotype from the root genotype which fixes the target genes into a homozygous state. From Servin et al. (2004) with permission.

There are many possible procedures that can be used to fix the root genotype, one of which is to generate a population of doubled haploids (DHs) as described in detail in Chapter 4. Using the DH procedure, the ideotype can be developed in just one additional generation after the root genotype is obtained, plus one more generation for seed increase to produce large populations. The fixation steps using the DH procedure can be outlined as follows.
First, obtain a genotype carrying all favourable alleles in coupling, namely $H_{(1,2,\ldots,n)(B)}$, by crossing the root genotype with a blank parent (denoted as $H_{(B)(B)}$) containing none of the favourable alleles. This guarantees that the linkage phase of the offspring is known and that the $H_{(1,2,\ldots,n)(B)}$ genotype can be identified without ambiguity.
Second, self $H_{(1,2,\ldots,n)(B)}$ to give the ideotype in one generation.

Pedigree height

The number of generations a pedigree spans is called the pedigree height, denoted $h$. If the fixation steps span two generations, the complete gene-pyramiding scheme spans $h + 2$ generations. A pedigree is of maximum height when just one cross is performed at each generation (involving an intermediate genotype $H$ and a founding parent). This type of pedigree is called a cascading pedigree. Conversely, a pedigree is of minimum height when the maximum number of crosses is performed at each generation. The height $h$ of a pedigree cumulating $n$ genes satisfies

$$\lceil \log_2 n \rceil \le h \le n - 1 \qquad (8.7)$$

where $\lceil x \rceil$ denotes the smallest integer larger than or equal to $x$.

Number of pedigrees

The number of pedigrees accumulating $n$ genes is the number of binary trees with $n$ labelled leaves. The root genotype of a pedigree accumulating $n$ target genes comes from the cross of two parents carrying, respectively, $p$ and $n - p$ (non-overlapping) target genes, where $1 \le p \le n - 1$. Let $N(p)$ be the number of subpedigrees cumulating $p$ specified genes. Summing up over
all possible values of $p$, the number $N(n)$ of pedigrees cumulating $n$ genes can be computed via

$$N(n) = \frac{1}{2}\sum_{p=1}^{n-1}\binom{n}{p}\,N(p)\,N(n-p) \qquad (8.8)$$

The factor $\frac{1}{2}$ is there to ensure that the crossing of two given parents is counted only once. This recursion can be solved (see the Appendix in Servin et al., 2004) and leads to

$$N(n) = \prod_{k=2}^{n}(2k-3) = (2n-3)(2n-5)\cdots 1 \qquad (8.9)$$

for the total number of pedigrees cumulating the $n$ genes. The total number of pedigrees increases very fast with the number of loci considered. For example, when $n$ = 3, 4, 5, 6, 7, the total numbers of pedigrees are 3, 15, 105, 945 and 10,395, respectively.

Gene transmission probability through a pedigree

Given the recombination fractions between loci, we can compute the probability that an intermediate genotype $H_{(s_1)(s_2)}$ passes on to its offspring the set of genes $s$ that is the union of $s_1$ and $s_2$. If $v(s)$ denotes the total number of genes in the set $s$, we have $v(s) = v(s_1) + v(s_2)$. Let $\{a_i\}$ be the genes in set $s$ ranked according to their position on the genetic map, so that $s = (a_1, a_2, \ldots, a_{v(s_1)+v(s_2)})$. Let $r_{x,y}$ be the recombination fraction between $x$ and $y$. The probability that a gamete generated by $H_{(s_1)(s_2)}$ contains the set $s$ of genes is

$$P(H_{(s_1)(s_2)} \rightarrow s) = \frac{1}{2}\prod_{i=1}^{v(s)-1}\pi(i, i+1) \qquad (8.10)$$

where $\pi(i, i+1) = r_{a_i,a_{i+1}}$ if genes $a_i$ and $a_{i+1}$ are in different subsets and $\pi(i, i+1) = 1 - r_{a_i,a_{i+1}}$ otherwise.
Note that other target genes might be on the map, located between the $a_i$, but not belonging to the set $s$; recombinations between these genes do not matter here. As an example illustrating Eqn 8.10, consider the genotype $H_{(1,3)(2,5,6)}$. The probability that it passes the set (1, 2, 3, 5, 6) is

$$P(H_{(1,3)(2,5,6)} \rightarrow (1, 2, 3, 5, 6)) = \frac{1}{2}\, r_{1,2}\, r_{2,3}\, r_{3,5}\, (1 - r_{5,6}) \qquad (8.11)$$

Knowing these probabilities, the overall probability of obtaining the root genotype of a given pedigree is the product, over all the pedigree's nodes (other than the root node), of the probabilities calculated as in Eqn 8.10.

Minimum population sizes necessary to obtain ideotype

Let $P_f$ and $P_m$ be the probabilities, computed as in Eqn 8.10, that each parent of a given node passes on its particular subset of genes. From these probabilities we can compute the population size $N$ needed to get the intermediate genotype at this node with a probability of success $\gamma$. The probability that none of the $N$ offspring has the right genotype is $(1 - P_f P_m)^N$; identifying this with $1 - \gamma$ gives

$$N = \frac{\ln(1 - \gamma)}{\ln(1 - P_f P_m)} \qquad (8.12)$$

where ln denotes the natural logarithm. From Eqn 8.12, the population sizes required at each node can be computed. Now the overall probability of success of the pedigree is the product of the probabilities of success at each of its nodes. Similarly, the population sizes required for the fixation steps can be computed. The nodes associated with combining two founding parents always pass on their target genes. Let $p$ be the number of other nodes in the breeding scheme; if they all have the same probability of success $\gamma$ as considered here, then the overall probability of success of the gene-pyramiding scheme is $\gamma^p$. The sum of all population sizes needed
in the gene-pyramiding scheme (pedigree and fixation steps) is denoted by $N_{tot}$. The largest of the population sizes to be handled at any node or step during the whole gene-pyramiding scheme is $N_{max}$.

A case study

Servin et al. (2004) developed a computer program to build all pedigrees leading to the ideotype for a given number $n$ of genes. Given the $r_{i,j}$ values, the program determines the gene transmission probabilities and the cumulated population size $N_{tot}$ for each pedigree followed by the fixation steps.
A case study was provided by Servin et al. (2004) for cumulating four genes, a number that might frequently arise when accumulating major-gene controlled traits such as disease resistance. The 15 possible pedigrees for accumulating four genes located on a single chromosome were generated with the assumption that the recombination fractions between adjacent loci are the same and correspond to 20 cM using Haldane's mapping function (Haldane, 1919). As the recombination fraction is the same for all pairs of adjacent loci, some gene-pyramiding schemes have the same transmission probabilities or population sizes, which would not be true in almost all practical cases.
Figure 8.8 shows the three schemes that necessitate the smallest $N_{tot}$, each representing several gene-pyramiding schemes with the same cumulated population size. The population sizes were computed so that the probability of success of each scheme was 0.99. In the scheme based on a cascading pedigree (Fig. 8.8a), there are four nodes for which the probability of obtaining the intermediate genotype is not 1. The probability of success used at each of those nodes was thus $0.99^{1/4} = 0.9975$. In the two other schemes, the number of such nodes is three, so that the probability of success used at each of these nodes was $0.99^{1/3} = 0.9967$. The hybrids between founding parents are obtained with a probability of 1, so that the population required at the corresponding nodes was assumed to be one individual.
Figure 8.8a shows a gene-pyramiding scheme involving a cascading pedigree. It spans five generations ($h = n - 1 = 3$ for the pedigree height, plus two generations for the fixation steps) and requires the smallest cumulated population size ($N_{tot} = 325$) of all the schemes. The two other best schemes last four generations ($h = \lceil \log_2 n \rceil = 2$ for the pedigree height plus two generations for the fixation steps). The scheme that necessitates the next smallest $N_{tot}$ (= 961) is the one represented in Fig. 8.8b. It cumulates loci 1 and 4 on one subpedigree and 2 and 3 on the other, before generating the $H_{(1,2,3,4)(B)}$ genotype. The population sizes needed for this gene-pyramiding scheme are large at all nodes when compared to the cascading type. The gene-pyramiding scheme represented in Fig. 8.8c necessitates an even larger $N_{tot}$ (= 1001) because a huge population size is needed to produce the root genotype $H_{(1,2)(3,4)}$; conversely, the population size needed to produce the $H_{(1,2,3,4)(B)}$ genotype is much smaller ($N = 97$).
Xu et al. (1998) provided a practical example of accumulating four loci controlling tiller angle using phenotypic selection, following both a scheme similar to the cascading pedigree and one similar to the schemes described by Fig. 8.8b, c.
Although the cascading pedigree-based gene-pyramiding scheme needs the smallest population size to combine all favourable alleles, it takes more generations to derive the root genotype, compared to other schemes. When genotyping cost is a more important limiting factor than the time required for delivery of breeding products, the cascading pedigree scheme should be chosen. However, when quick development of a root genotype becomes more important, particularly in the private breeding sector, a couple of generations less would make a huge difference in market-share competition.
Theoretically the method described above can be extended to schemes involving many genes. As more genes are involved, the pedigree height (the number of generations a pedigree spans) increases and so does the cumulative population size. The difference in population sizes between different schemes will also increase. As the number of genes increases, the population size needed in each generation would
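[Figure 8.8: three pedigree diagrams (a, b and c) for cumulating four loci; B denotes the blank parent. Transmission probabilities and population sizes shown at the nodes:
Scheme a (cascading pedigree). P1 × P2: N = 1. Cross with P3: $\frac{1}{2} r_{1,2}$, N = 70. Cross with P4: $\frac{1}{2}(1 - r_{1,2})\, r_{2,3}$, N = 84. Cross with B: $\frac{1}{2}(1 - r_{1,2})(1 - r_{2,3})\, r_{3,4}$, N = 102. Fixation: $[\frac{1}{2}(1 - r_{1,2})(1 - r_{2,3})(1 - r_{3,4})]^2$, N = 68. $N_{tot}$ = 325.
Scheme b. P1 × P4 and P2 × P3: N = 1 each. Cross of the two hybrids: $\frac{1}{2} r_{1,4}$ and $\frac{1}{2} r_{2,3}$, N = 394. Cross with B: $\frac{1}{2}\, r_{1,2}(1 - r_{2,3})\, r_{3,4}$, N = 500. Fixation: $[\frac{1}{2}(1 - r_{1,2})(1 - r_{2,3})(1 - r_{3,4})]^2$, N = 65. $N_{tot}$ = 961.
Scheme c. P1 × P2 and P3 × P4: N = 1 each. Cross of the two hybrids: $\frac{1}{2} r_{1,2}$ and $\frac{1}{2} r_{3,4}$, N = 837. Cross with B: $\frac{1}{2}(1 - r_{1,2})\, r_{2,3}(1 - r_{3,4})$, N = 97. Fixation: $[\frac{1}{2}(1 - r_{1,2})(1 - r_{2,3})(1 - r_{3,4})]^2$, N = 65. $N_{tot}$ = 1001.]

Fig. 8.8. Representation of three different gene-pyramiding schemes cumulating four loci. Scheme a is based on a cascading pedigree. Schemes b and c differ by the order of crosses of the founding parents. The target genes are represented by solid circles and other genes by shaded boxes. At each node the transmission probabilities of the targeted genes from parent to offspring are given. When the probability is equal to one, it is not indicated. The population sizes needed at each node (N) and the cumulated population sizes ($N_{tot}$) are provided. From Servin et al. (2004) with permission.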
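The population sizes reported for Fig. 8.8 can be checked numerically from Eqns 8.9, 8.10 and 8.12. The following Python sketch is an illustration only and is not the program of Servin et al. (2004): the function names, the locus positions (0, 20, 40 and 60 cM) and the rounding up of each population size to a whole plant are assumptions made here.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane's mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def count_pedigrees(n):
    """Eqn 8.9: number of pedigrees cumulating n genes, (2n-3)(2n-5)...1."""
    return math.prod(2 * k - 3 for k in range(2, n + 1))

def transmission(s1, s2, pos):
    """Eqn 8.10: probability that H(s1)(s2) passes on all genes in s1 and s2.

    s1, s2 are tuples of locus labels (s2 is empty for a blank haplotype);
    pos maps each locus label to an assumed map position in cM.
    """
    genes = sorted(s1 + s2, key=lambda g: pos[g])
    prob = 0.5
    for a, b in zip(genes, genes[1:]):
        r = haldane_r(pos[b] - pos[a])
        same_subset = (a in s1) == (b in s1)
        prob *= (1.0 - r) if same_subset else r
    return prob

def pop_size(gamma, p_f, p_m=1.0):
    """Eqn 8.12, rounded up to a whole number of plants."""
    return math.ceil(math.log(1.0 - gamma) / math.log(1.0 - p_f * p_m))

POS = {1: 0.0, 2: 20.0, 3: 40.0, 4: 60.0}  # four loci, 20 cM apart (case study)
ALL = (1, 2, 3, 4)

def fixation_size(gamma):
    """Selfing H(1,2,3,4)(B): both gametes must carry all four favourable alleles."""
    p = transmission(ALL, (), POS)
    return pop_size(gamma, p, p)

def scheme_a():
    """Cascading pedigree (Fig. 8.8a): four nodes with probability below 1."""
    g = 0.99 ** (1 / 4)
    return [1,                                                 # P1 x P2
            pop_size(g, transmission((1,), (2,), POS)),        # cross with P3
            pop_size(g, transmission((1, 2), (3,), POS)),      # cross with P4
            pop_size(g, transmission((1, 2, 3), (4,), POS)),   # cross with B
            fixation_size(g)]

def scheme_two_by_two(pair1, pair2):
    """Two founding crosses combined (Fig. 8.8b, c): three nodes below 1."""
    g = 0.99 ** (1 / 3)
    return [1, 1,                                              # founding crosses
            pop_size(g,
                     transmission(pair1[:1], pair1[1:], POS),
                     transmission(pair2[:1], pair2[1:], POS)), # hybrid x hybrid
            pop_size(g, transmission(pair1, pair2, POS)),      # cross with B
            fixation_size(g)]

if __name__ == "__main__":
    print("pedigrees for n = 4:", count_pedigrees(4))
    for label, sizes in [("a", scheme_a()),
                         ("b", scheme_two_by_two((1, 4), (2, 3))),
                         ("c", scheme_two_by_two((1, 2), (3, 4)))]:
        print(label, sizes, "Ntot =", sum(sizes))
```

Under these assumptions the three totals should reproduce the $N_{tot}$ values of Fig. 8.8 (325, 961 and 1001). Changing the positions in `POS` shows how quickly the ranking of schemes shifts once the loci are no longer equally spaced.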
become so large for some schemes that it would be practically impossible. As a result, the cascading pedigree-based scheme would become the only choice, although it would take many more generations to derive the root genotype.

8.3.2 Crossing and selection strategies

Different crossing and selection strategies may require vastly different population sizes to recover a target genotype with the same certainty even when the same parents are used (Bonnet et al., 2005). Determination of the most efficient strategy has the potential to dramatically decrease the amount of resources (plants, plots, marker assays and labour) required to combine a set of target alleles into an ideal genotype (ideotype). Considerable efficiency gains can be achieved if plant breeders are able to choose the most appropriate cross (e.g. single cross, BC or top-cross) and the best MAS methods.
In using markers, several scenarios are commonly faced by breeders: (i) pyramiding alleles at multiple loci, including consideration of the most appropriate cross type; (ii) minimizing marker screening costs by sequential culling; (iii) use of incompletely linked markers to combine target alleles; and (iv) combining alleles linked in repulsion in crosses segregating for other unlinked target alleles. Wang et al. (2007) used population genetic theory to establish general rules for the numbers of markers required, the best crossing strategies and the level of inbreeding to maximize the efficiency of marker implementation where there is no recombination between marker and allele of interest.

Comparing biparental, back- and top-crosses

If $n$ loci differ between two parents with favourable alleles at $n_1$ loci in the first parent P1 and $n_2$ in the second parent P2, then the relative proportions of the target genotype in DH or recombinant inbred line (RIL) populations derived from F1, P1BC1 (backcrossed to P1) and P2BC1 (backcrossed to P2) are

$$f_{F_1} = \left(\frac{1}{2}\right)^{n}, \qquad f_{P_1BC_1} = \left(\frac{3}{4}\right)^{n_1}\left(\frac{1}{4}\right)^{n_2}, \qquad f_{P_2BC_1} = \left(\frac{1}{4}\right)^{n_1}\left(\frac{3}{4}\right)^{n_2} \qquad (8.13)$$

The three proportions were used as a guide as to whether a BC reduced population size and to indicate which parent should be used as the recurrent parent.
If the target alleles are dispersed among three parents, i.e. P1, P2 and P3, a top-cross (or three-way cross), e.g. (P1 × P2) × P3, is required to combine all alleles. If each parent carries different alleles, the alleles contributed by parents P1 and P2 in the first cross will be present at frequencies of 0.25 following a top-cross with P3 and the alleles contributed by P3 will each have a frequency of 0.5. If $n_1$, $n_2$ and $n_3$ are the numbers of target favourable alleles in the three parents, respectively, then under the condition of no selection the expected proportion of individuals with the target genotype in the DH or RIL population is

$$f_{TC} = \left(\frac{1}{4}\right)^{n_1+n_2}\left(\frac{1}{2}\right)^{n_3} = 2^{\,n_3 - 2n} \qquad (8.14)$$

where $n = n_1 + n_2 + n_3$. Equation 8.14 is used to determine the order in which to cross parents to minimize the population sizes required in a top-cross.

Minimizing the total number of marker assays with sequential culling

In a population of $N$ individuals to be screened sequentially with markers at $n$ independent loci and where only those with the target genotype are retained for screening with the next marker, the total number of marker assays ($M$) required to identify the target genotype at all loci can be calculated according to

$$M = N + Nf_1 + Nf_1f_2 + \cdots + Nf_1f_2\cdots f_{n-1} \qquad (8.15)$$

where $f_1, f_2, \ldots, f_n$ are the proportions of individuals retained after screening with each
marker. For any set of markers, $M$ will be minimized if the marker with the lowest retained fraction $f$ (or highest culling rate) is used first, followed by the next lowest and so on. The total cost ($C$) of marker assays can be determined from Eqn 8.15 by inclusion of the cost of each assay

$$C = Nc_1 + Nf_1c_2 + Nf_1f_2c_3 + \cdots + Nf_1f_2\cdots f_{n-1}c_n \qquad (8.16)$$

where $c_1, c_2, \ldots, c_n$ are the costs of the marker assays. The total cost, $C$, is minimized when

$$\frac{c_1}{1 - f_1} < \frac{c_2}{1 - f_2} < \cdots < \frac{c_n}{1 - f_n}.$$

It should be noted that the analytic expression for the cost of sequential culling ignores the costs of plant/line handling (tagging, leaf sampling, etc.) and DNA extraction, which are fixed by the total sample size and cannot be reduced by sequential culling. If these fixed costs are major parts of the expense for genotyping, the order of markers used in the sequential culling may become less important. The same applies where high-throughput genotyping systems assay all markers on all samples to make genotyping most cost-effective overall.

Enrichment of favourable alleles at early generations

When many (unlinked) markers have to be selected, the frequency of a target homozygous genotype will be low and a large population size will be required. For example, in the F2 of a biparental cross between two inbreds segregating at five unlinked loci, the frequency of the target genotype is $0.25^5 = 0.00098$ and the minimum population size (Eqn 8.2) to recover at least one target genotype is 4714 ($\alpha = 0.01$). If selection is made among homozygous lines (i.e. DH or RIL populations) from the same cross, the frequency of the target genotype is $0.5^5 = 0.03125$ with a minimum population size of only 146 ($\alpha = 0.01$), i.e. the target genotype is more readily recovered with a smaller population size if selection is delayed until greater homozygosity has been reached.
For more segregating loci, population sizes quickly increase even in DH or RIL populations. For example, in a biparental population with eight unlinked segregating loci, the frequency of the target genotype in a homozygous population is $0.5^8 = 0.0039$ and the minimum population size is 1177. In these instances, Bonnet et al. (2005) proposed a two-stage selection strategy. The first stage is F2 enrichment, where F2 individuals carrying the entire set of target alleles in either homozygous or heterozygous form are selected. F2 enrichment takes advantage of the high expected frequency of carriers (either homozygous or heterozygous) at each locus of 0.75. The value of the technique can be seen in a population segregating at 12 loci, where the frequency of genotypes selected in an F2 enrichment step is $0.75^{12} = 0.031676$, resulting in a minimum population size of 144 F2 individuals, compared to the frequency of $0.25^{12} = 5.960464 \times 10^{-8}$ and a population size of > 77 million to identify a single homozygous individual in the F2.
After F2 enrichment, the frequency of each of the 12 target alleles in the selected population is increased from 0.5 to 0.67. The second step is to generate a population of more or less homozygous lines from the selected F2. The frequency of the target genotype in DH/RIL populations generated from the enriched F2 will have been increased from $0.5^{12}$ to $0.67^{12}$, resulting in a decrease in minimum population size from 18,861 to 596. Thus, with enrichment, both the F2 and the DH/RIL populations are of a more practical size for breeding.
The allele enrichment can be done for more than one generation when multiple-generation selection is involved. Enrichment at two selection stages (e.g. in F2 and F3) always requires greater assay numbers than simple F2 enrichment (Wang et al., 2007). As indicated by Bonnet et al. (2005), F2 enrichment increased the frequency of selected alleles, allowing large reductions in minimum population size for recovery of target genotypes (commonly around 90%) and/or selection at a greater number
of loci. The gain from another cycle of allele enrichment selection in F3 following enrichment in F2 is therefore at best minor and often results in a small net increase in minimum population size.
For a top-cross of three adapted wheat lines from an existing breeding programme, simulation of changes in allele frequencies at nine target genes (seven unlinked) showed that population size was minimized with a three-stage selection strategy in the F1 generation of the top-cross (TCF1), the F2 generation of the top-cross (TCF2) and DHs. Enrichment of allele frequencies in TCF2 reduced the total number of lines screened from > 3500 to < 600. Eight of the genes were present at frequencies > 0.97 after selection (Wang et al., 2007).

8.3.3 Gene pyramiding for different traits

The methods discussed above are for pyramiding genes affecting a specific trait. However, aggregating favourable genes from different traits in a genotype has long challenged plant breeders. The principles discussed above can be used in the same way to accumulate QTL alleles controlling different traits. A distinct difference in concept is that alleles at different trait loci to be accumulated may have different favourable directions, i.e. negative alleles are favourable for some traits but positive alleles are favourable for others. Therefore, one may need to combine the positive QTL alleles of some traits with the negative alleles of others to meet breeding objectives. Marker-assisted gene pyramiding is also important when considering multiple traits, as in phenotypic selection each of these traits has to be tested in different environments, different developmental stages or different stages of a breeding programme.
Attention should be paid to trait correlation when one practises pyramiding of alleles for different traits. Positive correlation will facilitate the pyramiding process involved in selection for alleles with the same favourable direction, but impede the process of selection for QTL alleles with different favourable directions and vice versa for negative correlation. If correlation results from pleiotropic effects of a marker gene rather than linkage, it is difficult, if not impossible, to select towards the direction opposite to the correlation.

8.3.4 Marker-assisted recurrent selection versus genome-wide selection

Recurrent selection is considered as one of the selection approaches to combine favourable alleles distributed among different sources of germplasm. Various new versions of recurrent selection are available in which molecular marker information is incorporated. The key advantages of these new versions are the availability of genetic data for all progeny at each generation of selection, the integration of genotypic and phenotypic data, and the rapid cycling of generations of selection and information-directed matings at continuous nurseries.
Marker-assisted recurrent selection (MARS) was proposed in the 1990s (Edwards and Johnson, 1994; Lee, 1995; Stam, 1995). It uses markers at each generation to target all traits of importance for which genetic information can be obtained. Genetic information, which includes QTL locations and effects, is usually obtained from QTL analyses performed on experimental populations. When the QTL mapping is conducted based on a biparental population, both parents often contribute favourable alleles. As a result, the ideal genotype is a mosaic of chromosomal segments from the two parents. The goal of MARS is to obtain individuals with as many accumulated favourable alleles as possible. However, the ideal genotype, defined as the mosaic of favourable chromosomal segments from two parents, will usually never occur in any Fn population of realistic size (Stam, 1995). As discussed previously, a breeding scheme to produce or approach this ideal genotype based on individuals of the experimental population could involve several successive generations of crossing individuals (Stam, 1995; Peleman and van der Voort, 2003) and would therefore constitute
what is referred to as MARS or genotype construction. This idea can be extended to situations where favourable alleles come from more than two parents. Note that MARS can also start without any QTL information, with selection based on significant marker-trait associations established during the MARS process.
All simulation studies revealed that MARS was generally superior to phenotypic selection in accumulating favourable alleles in one individual (van Berloo and Stam, 1998, 2001; Charmet et al., 1999) and MARS was between 3% and almost 20% more efficient than phenotypic selection (van Berloo and Stam, 2001). The advantage of MARS over phenotypic selection was greater when the population under selection was larger or more heterozygous, as in BC1 or F2 populations.
Through simulation, Bernardo and Yu (2007) compared the response to MARS with the response to genome-wide selection and determined the extent to which phenotyping can be minimized and genotyping maximized in genome-wide selection. By their definition, MARS refers to the improvement of an F2 population by one cycle of MAS (i.e. based on phenotypic data and marker scores) followed by three cycles of marker-based selection (i.e. based on marker scores only) in an off-season nursery (Johnson 2001, 2004). The marker scores are typically determined from about 20 to 35 markers that have been identified, in a multiple regression model, as significantly associated with one or more traits of interest (Edwards and Johnson, 1994). Genome-wide selection refers to marker-based selection without significance testing and without identifying a subset of markers associated with the trait (Meuwissen et al., 2001). The effects on the target trait (i.e. breeding values) of all genotyped markers distributed across the genome are fitted as random effects in a linear model. The trait values are then predicted as the sum of an individual's breeding values across all the genotyped markers and selection is subsequently based on these genome-wide predictions. In this book, the term genome-wide selection will be used only for this specific situation.
Consider genome-wide selection and MARS as depicted in Fig. 8.9. Bernardo and Yu (2007) simulated genome-wide selection by evaluating DHs for testcross performance in Cycle 0, followed by two cycles of selection based on markers. Cycle 0 is evaluated during the regular growing season when phenotypic measurements are meaningful. Cycles 1 and 2 of genome-wide selection and MARS are conducted in an off-season nursery, where phenotypic evaluations are not meaningful but where three generations can be grown in 1 year. For Cycle 0, genome-wide selection and MARS can be applied either to F2 plants or to DHs. Response to marker-based selection is greater with DHs than with F2 plants.

[Figure 8.9: flow chart. Inbred 1 × Inbred 2 gives the F1, from which the Cycle 0 population of doubled haploids or F2 plants ($N_0$) is derived. In Cycle 0 the individuals are crossed to a tester, the testcrosses are evaluated, $N_{0\text{-Sel}}$ individuals are selected, genotyped with $N_M$ markers and intermated to give the F1 ($N_{0\text{-Sel}}$) and then Cycle 1. In Cycles 1 and 2, $N_{1\text{-Sel}}$ and $N_{2\text{-Sel}}$ plants are selected before flowering and intermated, using all $N_M$ markers under genome-wide selection or the significant markers only under MARS. Selection ends after Cycle 2.]
Fig. 8.9. Genome-wide selection and marker-assisted recurrent selection (MARS). From Bernardo and Yu (2007) with permission.

Assuming that individuals are genotyped for $N_M$ markers and that the breeding values associated with each of the $N_M$ markers were predicted and were all used in genome-wide selection, Bernardo and Yu (2007) found that across different numbers of QTL (20, 40 and 100) and levels of heritability, the response
to genome-wide selection was 18–43% larger than the response to MARS. Regardless of the heritability and the number of QTL, the response to genome-wide selection was smallest when $N_M = 64$ markers were used. A minimum of $N_M$ = 128–256 polymorphic markers should be used in genome-wide selection in maize and more markers should be used for complex traits that have, at the same time, a high heritability. In contrast, the response to MARS was largest with $N_M$ = 64 or 128 markers. Genome-wide selection is most useful for complex traits that are controlled by many QTL and have low heritability. Responses to selection were maintained when the number of DHs phenotyped and genotyped in Cycle 0 was reduced and the number of plants genotyped in Cycles 1 and 2 was increased. Such schemes that minimize phenotyping and maximize genotyping would be feasible only if the cost per marker data point is reduced to about 0.02 US dollars. With the availability of large numbers of SNP markers in many crop plants and of cheap array-based genotyping systems, genome-wide selection, as a brute-force and black-box procedure that exploits cheap and abundant molecular markers, is superior to MARS in plants. Note that genome-wide selection does not need any QTL information. Rather, one uses a general regression approach in a test set to obtain an estimate of breeding value from a very dense marker set and then selects on this marker set.

8.4 Selection for Quantitative Traits

The most significant distinction of quantitative inheritance is that there is no corresponding (simple) relationship between genotype and phenotype, although conventional plant breeding is based on selection of phenotypes. This is the major reason why the efficiency of conventional plant breeding is often low. Therefore, given their importance, quantitative traits should be the main objective of MAS. In principle, the methodologies developed for qualitative traits in MAS are also applicable to quantitative traits. However, more factors should be taken into consideration when quantitative traits are involved. First, QTL mapping so far has provided limited results so that there is no trait for which all related QTL have been located precisely. Therefore, it is very difficult, if not impossible, to make a comprehensive selection for any specific trait. It is also a complicated issue to select simultaneously for multiple QTL. Secondly, epistasis will affect both the efficiency and the final products of MAS. Thirdly, there are certain genetic correlations among quantitative traits so MAS for one trait may also modify other correlated traits. Therefore, it is much more difficult to apply MAS to quantitative traits.

8.4.1 Selection based on phenotypic values

The theoretical basis for phenotypic selection is that phenotypic value is an approximate estimate of genotypic value and thus, selection based on phenotypic value can be considered approximately as a selection based on genotypic value. The higher the relatedness between phenotypic and genotypic values, the higher the efficiency of phenotypic selection.
It should be noted that, under random mating, only a fraction of the total genotypic value, namely the component contributed by additive effects, can be transmitted from one generation to the next and therefore, only selection for the additive component of genotypic value is effective. More precisely, the more closely the phenotypic value of an individual resembles its additive effect, the higher the efficiency of phenotypic selection. In animal breeding, the additive value of an individual is often known as its breeding value. The relatedness of phenotypic value to the additive effect depends on narrow-sense heritability ($h^2 = \sigma_A^2/\sigma_P^2$), where $\sigma_A^2$ and $\sigma_P^2$ are additive genetic variance and phenotypic variance, respectively. The higher $h^2$, the greater is the relatedness between phenotypic value and additive effect. When $h^2 = 1$, the phenotypic value is equal to the additive effect value. The efficiency of phenotypic selection increases as the narrow-sense heritability increases.
Marker-assisted Selection: Theory 319

In conventional plant breeding, improvement of quantitative traits has relied on direct selection.
Direct selection means selecting the individuals with extreme phenotypes (those with either the
largest or the smallest phenotypic values) in each generation so that the population mean shifts
in the direction of selection. As discussed in Chapter 1, the efficiency of direct selection can be
measured by the response to selection (R) or genetic advance (G), which is defined as the
difference between the population mean of the progeny derived from the selected individuals
($\bar{y}$) and the population mean of the original or parental population ($\mu$), i.e.
$G = \bar{y} - \mu$ (Fig. 1.1). The higher the genetic advance, the higher the efficiency of
selection. Genetic advance increases with heritability. Under a given heritability, genetic advance
depends on the selection rate (the proportion of selected individuals to the total individuals in
the population). The smaller the selection rate, the larger the selection intensity (the difference
of population means between the selected individuals and the original population) and the larger
the genetic advance.

8.4.2 Selection based on marker scores

From the discussion above, under random mating, selection should be much more efficient if it can
be based on additive effects. The key issue here is how to estimate the additive effect for each
plant. Theoretically, it can be estimated through marker–QTL analysis. By primary QTL analysis,
however, it is difficult to detect all QTL and estimate their effects precisely, and thus the
estimate of additive effect is only an approximation with a potentially large estimation error. To
obtain an accurate estimate of an individual's additive effect, it is necessary to map each QTL
precisely. At present, MAS can only proceed on the basis of approximate additive effects.

Most QTL mapping methods can be used to obtain additive effects (see Chapter 6 for details). In
practice, however, a more convenient and efficient method is required. Here the MAS method proposed
by Lande and Thompson (1990), which is based on marker–trait regression and has been widely
accepted, will be described. Under the additive-effect model, the marker–trait regression equation
is

\[ y = \mu_0 + \sum_{i=1}^{N} a_i x_i + e \tag{8.17} \]

where y is the phenotypic value of an individual, $\mu_0$ is the model mean, $a_i$ is the additive
effect of marker i, $x_i$ is the category variable of marker i (with values of 1, 0 and -1 for
marker genotypes MM, Mm and mm), e is the random environmental error and N is the number of
markers. By step-wise regression, markers with significant effects on the target trait, and thus
most probably linked to QTL, can be selected and the additive effect estimates ($\hat{a}_i$) can be
used to calculate a marker score for each plant:

\[ m = \sum_{i=1}^{n} \hat{a}_i x_i \tag{8.18} \]

where n is the number of markers selected. Marker score m is an approximation of the additive
effect and the extent of approximation depends on the ratio of the additive genetic variance
explained by the selected markers ($\sigma_M^2$) to the total additive genetic variance
($\sigma_A^2$), that is, $p = \sigma_M^2/\sigma_A^2$. The higher the value of p, the better m is as
a predictor of an individual's additive genetic value. Only when p = 1 is m equal to an
individual's additive effect. Selection based on marker scores is called marker-score selection.

Both marker score and phenotypic value are approximations of the additive effect and their degrees
of approximation depend on p and $h^2$, respectively. The selection efficiencies of these two
methods therefore depend on the relative magnitudes of p and $h^2$. That is, selection based on
marker scores is more efficient than selection based on phenotype only if p is larger than $h^2$.

Let us now consider direct selection. Denote genetic advances obtained through
marker-score selection and phenotypic selection as $G_M$ and $G_P$, respectively. Under the same
selection rate, the relative efficiency of these two methods is

\[ RE_{MP} = \frac{G_M}{G_P} = \sqrt{\frac{p}{h^2}} \tag{8.19} \]

which indicates that the relative efficiency is determined by the relative magnitudes of p and
$h^2$. It can be inferred that for traits with relatively low heritability the relative efficiency
is high: the lower the heritability, the higher the relative efficiency. For traits with
relatively high heritability, phenotypic selection is efficient enough that marker-score selection
is unnecessary. In addition, marker-score selection may be less efficient than phenotypic selection
because of estimation errors in the marker scores.

Although selection based on marker scores has relatively higher efficiency when heritability is
low, the power for detecting QTL decreases and the sampling error of marker scores increases as
heritability falls, so the efficiency of selection based on marker scores will decrease if the
heritability is too low (Moreau et al., 1998). Under low heritability, therefore, it is necessary
to increase population size and use low thresholds for declaring QTL in order to improve the power
of QTL detection and to decrease the estimation error of marker scores (Gimelfarb and Lande, 1994a;
Hospital et al., 1997; Moreau et al., 1998).

If the estimates of marker scores are reliable, selection based on marker scores will give
significant genetic advance in early generations. However, genetic advance will often decrease with
the advance of generations and disappear in three to five generations so that no further
significant genetic advance is possible (Edwards and Page, 1994). There are two reasons for this
phenomenon. First, genetic recombination breaks the linkage between the marker and the QTL.
Secondly, favourable alleles with minor effects are lost during selection while the unfavourable
alleles are fixed (become homozygous) at a rate that is faster in marker-score selection than in
phenotypic selection (Hospital et al., 1997). The first issue can be solved by constant
re-evaluation and screening for markers that have significant effects on the trait. If molecular
markers are re-evaluated and selected for associations in each generation, selection efficiency
will be improved significantly (Gimelfarb and Lande, 1994a). However, this approach will increase
the cost of molecular marker analysis. It would be more reasonable to perform this re-evaluation
and selection every two to three generations (Hospital et al., 1997).

8.4.3 Index selection

As discussed above, both marker score and phenotypic value are approximations of the additive
effect, each containing only partial information on additive effects, and the two could complement
each other. If marker score and phenotypic value are combined, selection based on the integrated
information should have higher efficiency. Therefore, Lande and Thompson (1990) proposed that a
selection index should be constructed using marker score and phenotypic value:

\[ I = b_z z + b_m m \tag{8.20} \]

which can be optimized by choosing the weight coefficients $b_z$ and $b_m$ to maximize the rate of
improvement in the mean phenotype per generation.

The selection method based on the selection index is called index selection. In the above
equation, z is the phenotypic value and m is the marker score. The optimal weight coefficients
$b_z$ and $b_m$ are

\[ b_z = \frac{\sigma_G^2 - \sigma_M^2}{\sigma_P^2 - \sigma_M^2} = \frac{(1 - p)h^2}{1 - ph^2} \tag{8.21} \]

and

\[ b_m = \frac{\sigma_P^2 - \sigma_G^2}{\sigma_P^2 - \sigma_M^2} = \frac{1 - h^2}{1 - ph^2} \tag{8.22} \]

respectively, where $\sigma_G^2$ denotes the additive genetic variance.
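
The marker score, the relative efficiency of marker-score selection and the index weights described
above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions,
not the authors' implementation: the function names, the genotype strings and the example effect
estimates are hypothetical, Eqn 8.19 is taken in the square-root form given by Lande and Thompson
(1990), and p and the heritability are treated as known constants with their product below 1.

```python
import math

# Genotype coding from Eqn 8.17: MM -> 1, Mm -> 0, mm -> -1.
GENOTYPE_CODE = {"MM": 1, "Mm": 0, "mm": -1}


def marker_score(genotypes, effects):
    """Marker score m = sum of (estimated additive effect * genotype code), Eqn 8.18."""
    return sum(a * GENOTYPE_CODE[g] for g, a in zip(genotypes, effects))


def re_marker_vs_phenotype(p, h2):
    """Relative efficiency of marker-score vs phenotypic selection, Eqn 8.19."""
    return math.sqrt(p / h2)


def index_weights(p, h2):
    """Optimal index weights (b_z, b_m) from Eqns 8.21 and 8.22; requires p * h2 < 1."""
    denom = 1.0 - p * h2
    return (1.0 - p) * h2 / denom, (1.0 - h2) / denom


def selection_index(z, m, p, h2):
    """Selection index I = b_z * z + b_m * m, Eqn 8.20."""
    b_z, b_m = index_weights(p, h2)
    return b_z * z + b_m * m
```

For a plant with genotypes `["MM", "Mm", "mm"]` at three selected markers and hypothetical effect
estimates `[0.8, 0.5, 0.3]`, `marker_score` gives 0.8 + 0 - 0.3 = 0.5. With p = 0.5 and a
heritability of 0.2, `index_weights` puts most of the weight on the marker score (about 0.11 on the
phenotype and 0.89 on the marker score), in line with the argument that markers matter most when
heritability is low.
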

The selection index is also an approximation of the additive effect, and how close the
approximation is depends on its heritability (Knapp, 1998):

\[ h_I^2 = \frac{(1 - p)h^2}{1 - ph^2} + \frac{p(1 - h^2)}{h^2 - 2ph^2 + p} \tag{8.23} \]

From this equation, the higher the heritability of the selection index ($h_I^2$), the better the
selection index predicts the additive effect and the higher the selection efficiency. When p = 0,
$h_I^2 = h^2$, i.e. index selection is equivalent to phenotypic selection. For a given $h^2$,
$h_I^2$ increases as p increases and it increases dramatically when $h^2$ is low. $h_I^2$
increases rapidly when 0 < p ≤ 0.5 (Fig. 8.10). This indicates that low heritability has a strong
effect on marker score and in this case MAS should have a higher impact on selection.

For directional selection, the relative efficiency of index selection and phenotypic selection can
be expressed as (Lande and Thompson, 1990):

\[ RE_{IP} = \frac{G_I}{G_P} = \sqrt{\frac{p}{h^2} + \frac{(1 - p)^2}{1 - ph^2}} \tag{8.24} \]

where $G_I$ is the genetic advance of the selection index. Fig. 8.11 shows how $RE_{IP}$ changes
with p under different levels of $h^2$. For a given p, $RE_{IP}$ increases as $h^2$ decreases, that
is, MAS is more efficient when heritability is low; $RE_{IP}$ also increases with p, but the rate
of increase becomes slow when $h^2$ is high. Once $h^2$ reaches an intermediate level
($h^2 = 0.5$), index selection has no apparent advantage. When $h^2 = 1$, $RE_{IP}$ does not change
with p and has a constant value of 1, indicating that in this case molecular markers do not provide
any extra information, so that MAS makes no positive contribution at all.

[Figure 8.10: heritability of the MAS index (y-axis, 0.0 to 1.0) plotted against the proportion of
additive genetic variance explained by markers, p (x-axis, 0.0 to 1.0).]

Fig. 8.10. Relationship between heritability of the MAS index ($h_I^2$) and the proportion of the
additive genetic variance $p = \sigma_M^2/\sigma_G^2$ with p ranging from 0.0 to 1.0 and
heritability ($h^2$) ranging from 0.1 to 1.0, where $\sigma_M^2$ is the additive genetic variance
associated with markers and $\sigma_G^2$ is the additive genetic variance. From Knapp (1998) with
permission.

[Figure 8.11: relative efficiency (y-axis, 0 to 5) plotted against the proportion of additive
variance explained by markers (x-axis, 0.0 to 1.0), with one curve for each heritability: 0.025,
0.05, 0.10, 0.20, 0.50 and 1.00.]

Fig. 8.11. Efficiency of MAS in the improvement of a single trait relative to traditional
individual selection index with the same selection intensity, assuming very large sample sizes.
Relative efficiency is plotted as a function of the proportion of the additive genetic variance in
the trait significantly associated with the marker loci, for various values of the heritability of
the trait. From Lande and Thompson (1990) with permission.
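
The behaviour described for Eqns 8.23 and 8.24 and plotted in Figs 8.10 and 8.11 can be checked
numerically. The Python sketch below is only illustrative: the function names are hypothetical,
Eqn 8.24 is used in the square-root form of Lande and Thompson (1990), and the inputs are assumed
to satisfy 0 ≤ p ≤ 1 and 0 < heritability ≤ 1 with their product below 1.

```python
import math


def index_heritability(p, h2):
    """Heritability of the MAS index, Eqn 8.23 (after Knapp, 1998)."""
    phenotype_part = (1.0 - p) * h2 / (1.0 - p * h2)
    marker_part = p * (1.0 - h2) / (h2 - 2.0 * p * h2 + p)
    return phenotype_part + marker_part


def re_index_vs_phenotype(p, h2):
    """Relative efficiency of index vs phenotypic selection, Eqn 8.24."""
    return math.sqrt(p / h2 + (1.0 - p) ** 2 / (1.0 - p * h2))
```

With p = 0 both functions reduce to phenotypic selection (`index_heritability(0.0, h2)` returns
`h2` and `re_index_vs_phenotype(0.0, h2)` returns 1), and for a fixed p the relative efficiency is
larger at a heritability of 0.1 than at 0.5, which is the pattern the text describes.
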
The relative efficiency of index selection and marker-score selection can be expressed as

\[ RE_{IM} = \frac{G_I}{G_M} = \frac{RE_{IP}}{RE_{MP}} = \sqrt{1 + \frac{h^2(1 - p)^2}{p(1 - ph^2)}} \tag{8.25} \]

which indicates that, whatever values p and $h^2$ take, $RE_{IM} \geq 1$. Therefore, index
selection always has higher selection efficiency than marker-score selection, which has been proven
by computer
simulation (Whittaker et al., 1997) and is different from the situation where selection is based
on marker scores.

Index selection depends on both phenotypic value and marker score. Therefore, the factors that
affect the efficiency of marker-score selection will also affect the efficiency of index
selection. Computer simulation (Gimelfarb and Lande, 1994a,b, 1995) indicated that index selection
is more efficient than phenotypic selection at least for the first several generations, but this
advantage disappears very quickly with the advance of generations. In advanced generations, index
selection might be less efficient than phenotypic selection. This can happen when the marker score
explains the additive effect less well than the phenotypic value does (i.e. p < $h^2$), while
sampling errors in the weight coefficients of Eqn 8.20 amplify the relative importance of the
marker score, so that the proportion of additive genetic variance explained by the selection index
is lower than that explained by the phenotypic value ($h_I^2 < h^2$). Therefore, both marker-score
and index selection have an advantage only at the early stages of selection and, in advanced
generations, phenotypic selection works better.

Although index selection utilizes more genetic information and is thus more efficient than
marker-score selection, it costs more and needs more work in order to gain the extra information on
phenotypic value. Furthermore, measurement of phenotypic value is limited to the stage at which the
trait is expressed, which cancels the advantage that MAS can be done at any stage. In addition,
when the measurement of phenotype requires progeny testing, the cycle of index selection becomes
longer, so that the advantage of index selection may not offset this disadvantage. For example,
hybrid maize yield has to be measured by progeny test and each cycle of index selection will take
2 years, while four cycles of marker-score selection can be done in 2 years (Edwards and Page,
1994). Although marker-score selection has lower genetic advance per cycle than index selection, it
has higher genetic advance per unit time because more cycles can be done in a given period. Based
on this result, Hospital et al. (1997) proposed a selection strategy that alternates one generation
of index selection with several generations of marker-score selection. In the generation of index
selection, a relatively larger population is required for re-evaluation and selection based on
molecular markers, in order to maintain the reliability of the marker–trait regression. Conversely,
relatively smaller populations can be used in the generations of marker-score selection.

8.4.4 Genotypic selection

Both marker-score selection and index selection depend on genotypic value or, more specifically,
the additive component of genotypic value rather than the genotype itself. Both methods therefore
select genotypes through genotypic values, which is indirect, and they differ little in essence
from phenotypic selection. This is not exactly the concept of MAS that has been proposed and
expected. Because genotypic value is the result of genotypic expression, different genotypes may
have the same genotypic value, that is, one genotypic value could correspond to many genotypes.
There is a loss or degeneracy of genetic information from genotype to genotypic value. This
information degeneracy will result in a low efficiency of selection and the loss of some favourable
QTL alleles with relatively small effects. The more QTL involved in selection, the higher the
chance that favourable alleles will be lost. Therefore, a more efficient selection method should be
one based on the genotype itself (called genotypic selection), as in MAS for qualitative traits.
More specifically, each target QTL is selected based on its two flanking markers, a single closely
linked marker, or a gene-based marker.

Currently, genotypic selection of quantitative traits is limited by the availability of QTL that
have been fine mapped. For most quantitative traits, only QTL with large effects have been mapped
on a relatively rough scale, leaving many minor
QTL undetected. To improve the efficiency and reliability of MAS, markers flanking the target QTL
should be tightly linked. However, if the target region is too small, it might not contain the
target QTL because primary QTL mapping is not very accurate. It is important to develop flanking
markers that bracket the target QTL with high confidence in order to achieve high selection
efficiency.

As discussed in the previous section for selection of qualitative traits, it is better to use three
linked markers in selection, with the best positions of these markers determined by the confidence
interval of the QTL. The middle marker should be tightly linked to, located exactly at, or
identical to, the QTL, which should be bracketed by two flanking markers. The optimal window size
for the target region defined by the flanking markers increases with the confidence interval of the
QTL: the larger the QTL confidence interval, the larger the target region that the flanking markers
must bracket to guarantee that the QTL is located within it.

As indicated by Hospital and Charcosset (1997), using position-optimized markers to follow the
target QTL in BC breeding, favourable alleles at four independent QTL can be transferred from the
donor parent to the recurrent parent with a population consisting of several hundred individuals.
If there is linkage between QTL, if QTL are precisely mapped, or if larger population sizes are
used, more QTL can be transferred simultaneously.

8.4.5 Integrated marker-assisted selection

As discussed previously, a marker–trait association identified in one population has to be
validated before it can be used for MAS in other populations. One of the best ways to avoid the
marker validation step is to integrate genetic mapping with MAS, that is, marker–trait associations
identified in a breeding population are used for MAS in the same population. This is critical for
quantitative traits that are genetically controlled by many genes and interact with environments.
Advanced backcrossing QTL (AB-QTL) analysis, proposed by Tanksley and Nelson (1996) to accelerate
the process of molecular breeding, is one of the approaches that can be used for this purpose.
Stuber et al. (1999) discussed their effort to test a marker-based breeding scheme for
systematically generating superior lines without any prior identification of genes in the donor
sources. Identification and mapping of genes in the donor is a bonus obtained when the derived NILs
are evaluated. This method is somewhat similar to AB-QTL analysis. Other approaches include using
associations identified in F2 populations to select in the subsequent self-pollinated populations.

The AB-QTL strategy postpones QTL mapping until the BC2 or BC3 generation. Delaying QTL analysis
offers advantages for QTL characterization, in that QTL displaying epistatic interactions among
donor alleles are less likely to be detected because of the overall low frequency of donor alleles.
In fact, there will be a higher probability of detecting additive QTL which still function in a
near-isogenic background. During the generation of BC2 or BC3 populations, negative selection is
exercised to minimize the occurrence of unfavourable donor alleles. The advantage of focusing on
BC2 or BC3 populations is that they offer sufficient statistical power for QTL identification on
the one hand and, on the other hand, provide sufficient similarity to the recurrent parent to
select for QTL-NILs in a short time span (within 1–2 years). By use of QTL-NILs, the QTL discovered
can be verified and the NILs may serve directly either as improved cultivars or as a parent
cultivar in the case of hybrid crops (Peleman and van der Voort, 2003).

The AB-QTL approach can be exploited for pyramiding QTL alleles. Each time that AB-QTL analysis is
applied, the map positions of donor QTL affecting key traits will likely be discovered, so that QTL
mapping information derived from AB-QTL analysis is cumulative. Based on this knowledge, as
indicated by Tanksley and Nelson (1996),
it would be straightforward to combine favourable donor QTL alleles detected in one experiment with
non-allelic QTL affecting the same trait from other experiments in which a different donor parent
was used. In this way, it should be possible to pyramid all non-allelic QTL with similar effects
detected within a given species or across related species, provided they act without much
influence of epistasis.

The AB-QTL approach has been successfully used to identify markers for QTL contributing to fruit
size, shape, colour and firmness together with soluble solids and total yield in tomato. On this
basis, QTL–marker associations were identified in one BC generation and immediately applied in the
subsequent BC generation some 6 months later (Tanksley et al., 1996). In rice, a series of advanced
backcrossing populations have been developed through collaborations between Cornell University and
breeders around the world to identify and introgress trait-enhancing alleles from wild species into
high-yielding elite cultivars. The first such study employed a cross between the wild rice relative
Oryza rufipogon and the Chinese indica hybrid V20/Ce64 (Xiao et al., 1998). Although the
O. rufipogon accession was phenotypically inferior for all 12 traits studied, transgressive
segregation was observed for all traits and 51% of the QTL detected had beneficial alleles from
O. rufipogon. By MAS and field selection, an excellent CMS restorer line (Q661) carrying one of the
QTL for yield components has been developed. Its hybrid, J23A/Q661, out-yielded the check hybrid by
35% in a replicated trial for the second rice crop in 2001 (Yuan, 2002). A second QTL study used an
advanced BC population between the same O. rufipogon accession and the upland japonica rice
cultivar Caiapo and identified beneficial QTL alleles from O. rufipogon for 56% of the
trait-enhancing QTL detected (Moncada et al., 2001). A third study employed O. rufipogon in a cross
with the long-grain Jefferson, a US tropical japonica cultivar, and the O. rufipogon allele was
favourable for 53% of the yield and yield component QTL (Thomson et al., 2003). There are several
ongoing projects to introgress these favourable alleles from the O. rufipogon accession into
cultivated rice.

8.4.6 Response to marker-assisted selection

MAS for traits controlled by major genes will receive a strong response. However, the response to
selection, or genetic advance, for quantitative traits will depend on several factors: linkage
between markers and genes, trait heritability, gene effects, gene interactions, population size,
the number of plants selected and the breeding scheme. In classical selection theory, the
expectation, genetic variance and heritability of the target trait are required, as well as the
covariance between the target trait and the selection criterion in the case of indirect selection.
In backcrossing without selection, as described in Chapter 4, the expected donor genome proportion
in generation BCn is $1/2^{n+1}$. In backcrossing with selection for the presence of a target gene,
Stam and Zeven (1981) derived the expected donor genome proportion on the carrier chromosome of the
target gene. Their results were extended to a chromosome carrying the target gene and the recurrent
parent alleles at two flanking markers (Hospital et al., 1992) and to a chromosome carrying several
target genes (Ribaut et al., 2002a).

An example in Lande and Thompson (1990) demonstrated that, for a single trait, the potential
selection efficiency of using a combination of molecular and phenotypic information, compared to
standard methods of phenotypic selection, depends on the heritability of the trait, the proportion
of additive genetic variance associated with marker loci and the selection scheme. As discussed
previously, the relative efficiency of MAS is greatest for traits with low heritability if a large
fraction of the additive genetic variance is associated with marker loci. Limitations that may
affect the potential utility of MAS in applied breeding programmes include: (i) the level of
linkage disequilibrium in the populations, which affects the number of
marker loci needed; (ii) sample size needed to detect trait loci with low heritability; and (iii)
sampling errors in the estimation of relative weights in the selection indices.

Frisch and Melchinger (2005) developed a theoretical framework for MAS for the genetic background
of the recurrent parent in a BC programme to predict the response to selection and give criteria
for selecting the most promising BC individuals for further backcrossing or selfing. The approach
dealt with selection in generation n of the BC programme, taking into account pre-selection for the
presence of one or several target genes, the linkage map of the target gene(s) and markers and the
marker genotype of the individuals used as non-recurrent parents for generating BC generations.

Response to selection R is defined as the difference between the expected donor genome proportion
$\mu$ in the selected fraction of a BCn population and the expected donor genome proportion $\mu'$
in the unselected BCn population:

\[ R = \mu - \mu' \tag{8.26} \]

Prediction of the response to selection can be employed to compare alternative scenarios with
respect to population size and required number of markers. This application was illustrated by the
example of a BC1 population using model genomes close to maize (ten chromosomes of length 2 M) and
sugarbeet (nine chromosomes of length 1 M) with markers evenly distributed across all chromosomes,
a target gene located 66 cM from the telomere on a chromosome and one individual selected as the
non-recurrent parent of generation BC2.

The expected response to selection for maize ranged from 5% of the donor genome (20 markers, 20
plants) to 12% (120 markers, 1000 plants) and for sugarbeet it ranged from 7% to 15% (Fig. 8.12).
To obtain a response to selection of 10% with 60 markers, a population size of 180 is required in
maize, corresponding to 180/2 × 60 = 5400 marker data points (MDP). By comparison, in sugarbeet a
population size of 60 is sufficient, resulting in only 30% of the MDP required for maize. The
result indicates that the efficiency of MAB in crops with smaller genomes is much higher than that
in crops with larger genomes.

Using > 80 markers in maize (corresponding to a marker density of 25 cM) or > 60 markers in
sugarbeet (marker density 15 cM) resulted in only a marginal increase of the response to selection,
irrespective of the population size employed (Fig. 8.12). Increasing the population size up to 100
plants resulted in a substantial increase in response to selection in both crops and using even
larger populations still improves the expected response to selection. Frisch and Melchinger (2005)
concluded that increasing the response to selection by increasing the number of markers employed is
possible only up to an upper limit that depends on the number and length of chromosomes. In
contrast, increasing the response to selection by increasing the population size is possible up to
population sizes that exceed the reproduction coefficient of most crop species.

An optimum criterion for the design of MAS in a BC population can be defined by the expected
response to selection reached with a fixed number of MDP. For a fixed number of MDP in sugarbeet,
designs with large populations and few markers always reached larger values of response to
selection than designs with small populations and many markers (Fig. 8.12). For maize, the same
trend was observed for 500 and 1000 MDP, while for a larger number of MDP the optimum design ranged
between 40 and 50 markers. Therefore, in BC1 populations of maize and sugarbeet and for a fixed
number of MDP, MAS is, within certain limits, more efficient for larger populations than for higher
marker densities.

In theory, MAS is proposed to be more efficient than phenotypic selection when the heritability of
a trait is low, where there is tight linkage between QTL and markers (Dudley, 1993; Knapp, 1998),
with larger population sizes (Moreau et al., 1998) and in earlier generations of selection before
recombinational erosion of marker–trait associations (Lee, 1995). Edwards and Page (1994) proposed
that the distance between markers and QTL was the factor that most

[Figure 8.12: two panels, 'Ten chromosomes of length 2 M' (left) and 'Nine chromosomes of length
1 M' (right), each plotting response to selection (%) against number of markers (20 to 120) for
population sizes m = 20, 40, 60, 80, 100, 200, 500 and 1000, with a legend listing 500, 1000, 2000,
5000, 10,000 and 20,000 MDP.]

Fig. 8.12. Expected response to selection throughout the entire genome and expected number of
required marker data points (MDP) when selecting the best out of m = 20, 40, 60, 80, 100, 200, 500
and 1000 BC1 individuals. Model of the maize genome with ten chromosomes of length 2 M (left-hand
side). Model of the sugarbeet genome with nine chromosomes of length 1 M (right-hand side). From
Frisch and Melchinger (2005) with permission.

limited genetic gains from MAS. Yousef and Juvik (2001a) reported an empirical experiment that
provided equivocal results regarding the relative efficiency of MAS and phenotypic selection in
enhancing economically important quantitative traits in sweet corn. MAS and phenotypic selection
were applied to three F2:3 base populations with either the sugary 1 (su1), sugary enhancer 1 (se1),
or shrunken 2 (sh2) endosperm mutations. One cycle of selection was applied to both single and
multiple traits such as seedling emergence. Selection efficiencies were evaluated on the basis of
gains over one cycle. Among 52 paired comparisons between MAS and phenotypic selection composite
populations, MAS resulted in significantly higher gain than phenotypic selection for 38% of the
comparisons, while phenotypic selection was significantly greater in only 4% of the cases. The
average MAS and phenotypic selection gains, calculated as percent increase or decrease from the
randomly selected controls, were 10.9% and 6.1%, respectively.

Recognizing that small mapping populations are not adequate for QTL mapping is the first and most
important realization needed in the research community (Young, 1999). Scientists must understand
that simply demonstrating that a complex trait can be dissected into QTL and mapped to
approximate genomic regions using DNA markers is not enough. Projects need to utilize better
scoring methods, larger population sizes, multiple replications and environments, appropriate
quantitative genetic analysis, various genetic backgrounds and, whenever possible, independent
verification through advanced generations or parallel populations (Melchinger et al., 1998; Utz
et al., 2000; Schön et al., 2004). Only then will sufficient experimental evidence be in place for
a successful MAS programme.

What if we knew all the genes for a quantitative trait in hybrid crops? This was asked by Bernardo
(2001), when working on the prediction of hybrid performance through computer simulation. With
maize as a model species, he found through trait and gene best linear unbiased prediction (TG-BLUP)
that gene information is most useful in selection when few loci (e.g. ten) control the trait. With
many loci (≥ 50), the least squares estimates of gene effects become imprecise. Gene information
consequently improves selection efficiency among hybrids by only 10% or less and actually becomes
detrimental to selection as more loci become known. Bernardo further indicated that increasing the
population size and trait heritability to improve the estimates of gene effects also improves
phenotypic selection, leaving little room for improvement of selection efficiency via gene
information. He concluded that genomics is of limited value in selection for quantitative traits in
hybrid crops. Epistatic interactions, which were assumed absent in his study, would make the
estimation of gene effects even more difficult. It is unknown whether methods other than TG-BLUP or
multiple regression would substantially enhance the usefulness of gene information in selection.

8.5 Long-term Selection

As one of the most powerful tools available to biology, selection is used in the plant and animal
sciences to develop improved crop cultivars and livestock breeds. Selection is also used in
laboratory species to test many of the assumptions of the underlying quantitative genetic models
and to test the limits of selection itself. The power of selection is best illustrated by the
selection responses that have been observed in two important agricultural species. US maize yield
increased from a pre-1930 average of 1.6 t ha⁻¹ (26.1 bushels acre⁻¹) to an average of 8.6 t ha⁻¹
(134.7 bushels acre⁻¹) for the 5 year period from 1998 to 2002, a fivefold increase over 70 years
(http://www.usda.gov/nass/). Of course, not all of the increase is due to selection, but studies
have consistently shown that genetics can account for 50% of the increase. Milk yield in Holsteins
increased from 5870 kg in 1957 to 11,338 kg in 2001, representing a doubling in milk yield over 44
years (http://aipl.arsusda.gov/dynamic/trend/current/trndx.html). There is evidence that the
genetic trend continues to increase with time in Holsteins. Molecular techniques have provided
novel tools to analyse the final selection product and reveal the change of genetic structures with
the progress of selection experiments.

Including long-term selection in this chapter is justified by a concept, 'reversed
breeding-to-genetics', which starts with a selection programme to pyramid favourable alleles from
various sources of germplasm to create transgressive variation and a large selection response,
followed by genetic analysis (usually marker-assisted evaluation) to identify the genes and alleles
associated with the selection response. As it will take years to pyramid genes and alleles from
multiple sources through genetic mapping and MAS, the reversed breeding-to-genetics approach can
be used to exploit accumulated novel alleles and genes by taking advantage of the availability of
plant materials that have been accumulating in genetic and breeding programmes. Combined with the
strategy of selective genotyping revised by Xu et al. (2008) and Sun et al. (2009), it could be
more realistic to start with selection to pyramid alleles, followed by genetic analysis to identify
the genes, than to follow the genetics-to-breeding approach by which genes/QTL are mapped in
separate genetic analyses and then pyramided by MAS. In this section, we will
discuss the long-term selection experiments in plants and marker-assisted evaluation of the selection results, which can be considered the reversed breeding-to-genetics approach.

8.5.1 Long-term selection in maize

There are several long-term selection experiments in maize (Duvick et al., 2004; Hallauer et al., 2004; Dudley and Lambert, 2004). The best known is the selection for oil and protein contents, which has been running for over 100 generations. Details of this experiment can be found in the special volume of Plant Breeding Reviews (Volume 24, Part 1, 2004). Only some significant procedures and results will be summarized here.

Procedure

The long-term selection experiment for oil and protein content in maize was initiated at the University of Illinois by C.G. Hopkins before the rediscovery of the Mendelian laws (Hopkins, 1899). The most recent update on this long-term selection can be found in Dudley and Lambert (2004). Although the original goal was to produce agriculturally valuable crops by increasing the oil and protein content of the kernels, the results are also quite remarkable from a theoretical viewpoint. One of the most interesting results was that the continued selection did not deplete the variability. Indeed, the results were not in full compliance with simple Mendelian expectation.

In 1896, Hopkins initiated selection in the open-pollinated maize cultivar Burr's White (Hopkins, 1899). He analysed 163 ears for oil and protein concentration. The 24 ears highest in protein, the 12 ears lowest in protein, the 24 ears highest in oil and the 12 ears lowest in oil were selected to initiate the Illinois High Protein (IHP), Illinois Low Protein (ILP), Illinois High Oil (IHO) and Illinois Low Oil (ILO) strains, respectively. Both forward and reverse selection have been conducted at different times in the experiment. The forward phase of the experiment was divided into four segments as follows:

Segment 1. Generations 0–9, mass selection based on chemical composition. Numbers of ears analysed and selected varied but approximately 20% of the ears analysed were selected. Each strain was grown in a separate, isolated field.

Segment 2. Generations 10–25. 120 ears per strain were analysed and 24 were saved. Seed from each ear was planted ear-to-row. Alternate rows were detasselled and 20 ears were analysed from each of the six highest yielding rows. Four ears were saved per row.

Segment 3. Generations 26–52 in IHP and ILP; generations 26–58 in IHO and ILO. Twelve selected ears were arbitrarily divided into two lots (A and B) of six ears. Seed within each lot was bulked and planted in the nursery. Silks in lot A were pollinated by a bulk sample of pollen from 15–20 plants in lot B while silks in lot B were pollinated with pollen from lot A. Thirty ears from each lot were analysed and the 12 most extreme of the 60 ears analysed were saved.

Segment 4. Generations 53–90 in IHP and ILP; 59–90 in IHO and 59–87 in ILO. The selection procedure was the same as in segment 3 but 90–100 kg of N fertilizer ha⁻¹ were added to the soil. Only 87 generations were completed in ILO because of difficulties with seed set and seed quality, which caused the loss of some generations.

Following 48 generations of forward selection, reverse selection was initiated in each of the four strains to form four new strains: Reverse High Protein (RHP), Reverse Low Protein (RLP), Reverse High Oil (RHO) and Reverse Low Oil (RLO) (Figs 8.13 and 8.14). The objective was to determine the extent of residual variability available for selection. The selection procedure was the same as in the forward strains except that selection was for low protein in IHP, high protein in ILP, etc. Following seven generations of selection in RHO, selection was again reversed to initiate the Switchback High Oil (SHO)

[Plot: oil means, Oil (%) against Generation (0 to 100), with one series each for IHO, RHO, SHO, ILO and RLO.]

Fig. 8.13. Mean oil percentage plotted against generations for IHO, RHO, SHO, ILO and RLO derived
from 100 generations of selection. From Dudley and Lambert (2004). This material is reproduced with
permission of John Wiley & Sons, Inc.

[Plot: protein means, Protein (%) against Generation (0 to 100), with one series each for ILP, RLP, IHP and RHP.]

Fig. 8.14. Mean protein percentage plotted against generations for IHP, RHP, ILP and RLP derived
from 100 generations of selection. From Dudley and Lambert (2004). This material is reproduced with
permission of John Wiley & Sons, Inc.

strain. Beginning in generation 90 of ILP, a new strain called Reverse Low Protein 2 (RLP2) was initiated by selection for high protein in ILP. This strain was initiated to determine whether genetic variability still existed that could be exploited by selection after an apparent lack of progress in ILP for nearly 35 generations. The selection procedures were the same as in the regular and reverse selection strains, and have been described in detail (Dudley et al., 1974; Dudley and Lambert, 1992).

Limits to selection

Response over all generations is presented as the mean for each generation plotted against generation number for all strains (Figs 8.13 and 8.14). The data on which the figures are based are available in detail in Appendix Tables 5.A1–5.A5 of Dudley and Lambert (2004).

One of the objectives of this experiment was to determine the limits to selection for oil and protein in maize. The question has been answered for low oil and low protein in that progress ceased when oil became so low it was no longer measurable with the analytical tools available. Protein apparently reached a lower limit after approximately 65 generations when no further progress was possible with the selection methods used. This low limit is likely physiological in nature.

Three types of evidence suggested an upper limit has not been reached for oil in IHO and SHO. Significant genetic variation still exists in generation 98 and has not changed since generation 65, and results from the per se evaluation trials showed significant increases in oil during the last five generations measured. Thus 100 generations of selection have not eliminated genetic variability and an upper limit has not been reached for either IHO or SHO.

For IHP, the results are not clear. Data from the evaluation trials indicated no significant increase in protein since generation 88. Genetic variance in generation 98 was not significantly different from that in generation 68. Also there is no apparent viability problem in IHP as in experiments with other species where viability problems caused progress to cease, significant genetic variability was found in strains which had plateaued. Thus, it is not clear whether an upper limit has been reached for protein in IHP.

Based on significant progress in the reverse selection strains, genetic variance had not been exhausted at generation 48. The results from RLP2 are inconclusive as to whether exploitable genetic variance for high protein still existed at generation 90 in ILP. Results from the per se evaluation trials suggest that progress is being made, but the generation data do not confirm this result.

One unusual result of reverse selection occurred in RHP. All the gain from the first 48 generations of selection was dissipated by the next 15 generations of selection (Fig. 8.14). The progress per generation for these 15 generations was approximately 0.68% per generation, a rate at least three times that in any other segment of any strain for protein.

Explanation of progress

For oil, total gain in IHO is approximately four times the total gain in ILO. For protein, the gain in IHP is approximately three times the gain in ILP. Gain in the high direction for both oil and protein is greater from generations 49 to 100 than from generations 0 to 48. In contrast, nearly 90% of the gain from selection in ILO and ILP came in the first 48 generations of selection. Given that the lower limit to selection for ILO and ILP is near zero and the upper limit could approach 100%, this greater gain in the high direction is not surprising. The gain between generations 48 and 100 for IHO (9.7% oil) is similar to that in RHO (9.0% oil) and the gain in IHP (12.5% protein) from generations 48 to 100 is similar to that in RHP (12.2% protein). The gain in RLO from generations 48 to 100 is nearly ten times that in ILO and the gain in RLP over the same generations is 13 times that in ILP.

These results are consistent with the gene frequency estimates assuming a model with a relatively large number of genes
affecting the traits, each with relatively similar effects and additive gene action. The frequency of favourable alleles (q) in the original population was estimated as approximately 0.2; therefore, greater gain should be possible towards higher oil or protein than towards lower values. When reverse selection was initiated, q was estimated to be approximately 0.5 in both IHP and IHO. Thus, selection in either direction should be possible and the total possible change should be approximately the same in either direction. The switchback selection occurred at a gene frequency of 0.35, which could allow greater progress in the high direction than in the low, as was observed.

By evaluating the results from selection of 48 generations, Leng (1962) suggested four possible genetic interpretations: (i) accidental outcrossing; (ii) favouring of heterozygotes in selection; (iii) high rate of mutation of the chemical genes concerned; and (iv) release of variability by some unknown means. He immediately dismissed these interpretations because: (i) pollination has been under strict control throughout the long-term study. (ii) Favouring heterozygosity cannot be ruled out; however, the rapid response to reverse selection in all four strains, if it were attributed to residual heterozygosity alone, would have required the level of heterozygosity to have remained at nearly the same level through 48 generations of successful selection. This appears highly improbable. (iii) Since all four strains are relatively uniform and show no evidence of being highly mutable . . . mutation is not considered a likely explanation. (iv) A plausible mechanism is that continued recombination plays a role.

However, as indicated by Dudley (1977), it is possible to explain all the progress by segregation of a relatively large number of genes (n), each at a relatively low frequency (q) in the original population. The number of additive genetic standard deviations (σA) of progress possible for a given value of n and q, as calculated by Dudley (1977) based on theory derived by Robertson (1970), is shown in Table 8.8 for a sample of values of q and n. For the progress of 21 σA made in IHO and 18 σA in IHP, gene frequency needs to be low, approximately 0.25, and n > 50. Such values are consistent with estimates of q of approximately 0.2 for both oil and protein and n of 54 and 123 obtained for oil and protein, respectively. Although these results suggest all the progress could be explained by segregation of a large number of genes in the original population, mutation cannot be eliminated as a possible source of some of the variation upon which selection continues to operate.

Table 8.8. Ultimate limits to selection, measured as number of σA with varying values of n (number of loci segregating) and q (frequency of favourable alleles) (from Dudley (1977) with permission).

              q
  n      0.1   0.25    0.5   0.75    0.9
  10      13      8      3      3      2
  50      30     17     10      6      3
  100     42     24     14      8      5
  200     60     35     20     12      7

Goodnight (2004) and Eitan and Soller (2004) suggested epistasis as an important factor to explain the negative or positive heterosis for oil and protein observed in the crosses involving the long-term selection strains, supporting the hypothesis that additive × additive epistasis was important. Further evidence comes from the Design III study of Moreno-Gonzalez et al. (1975), where crosses of both the F2 and the F6 of the cross of IHO × ILO back to the parents exhibited negative heterosis for oil. This hypothesis is further supported by the presence of significant negative heterosis for protein in the crosses of IHP × RLP and IHP × ILP (Dudley et al., 1977).

Dudley et al. (1974) suggested part of the continued response in IHP could be due to a change in environment because the addition of N fertilization in generation 53 increased response per generation from 1.4 to 1.6 g kg⁻¹ protein per cycle. The increase of available N fertilizer presumably allowed alleles for higher protein to be expressed and selected.

Finally, Walsh (2004) argued that mutation was a necessary assumption to explain the result of the long-term selection experiment.
He indicated that gain based on mutational variance is expected to exceed that from gain based on residual segregation from the original population after about 46 generations for oil and 33 for protein. Although per-locus mutation rates are typically very small, for a wide range of traits the mutational variance introduced in each generation is on the order of 1/100th of the environmental variance. This can be quite significant after 10–20 generations. Keightley (2004) reviewed selection experiments in inbred lines and concluded that mutational variance was important in selection response. However, as indicated by Dudley (2007), neither Walsh nor Keightley considered the effects of epistatic interaction on selection response. Dudley concluded that epistasis may be an important factor in explaining long-term response to selection. This has been supported by the results from the crosses of IHO × ILO and IHP × ILP for epistatic interactions, as more epistatic interactions were significant than expected by chance and the number of markers associated only with significant epistatic effects ranged from 46.3 to 72.2% of the total number of significant markers detected (Dudley, 2008). I would rather suggest that both large numbers of loci at low gene frequency in the original population and their recombination and epistatic interaction in the long-term selection lines should have contributed to the long-term selection response.

Marker-assisted evaluation

Response to phenotypic selection can be evaluated and associated genes can be identified using molecular markers. The Illinois long-term selection experiment on maize oil and protein contents (Dudley and Lambert, 1992, 2004) and its marker-assisted evaluation (Goldman et al., 1993) provide such an example. The long-term divergent selection response can be attributed to the accumulative action of alleles with similar effect that had been dispersed among the individuals of the original population (Xu, 1997), while de novo mutations may be an alternative explanation for this divergence, as indicated by selection for bristle number in Drosophila (Mackay, 1995). The selection strains offer a unique opportunity to investigate the genetic basis of kernel chemical traits and have been used to produce maize populations to map the QTL responsible for the selection response (Goldman et al., 1993). By using 90 genomic and cDNA clones distributed throughout the maize genome to detect RFLPs between IHP and ILP strains, 22 loci distributed on ten chromosome arms were significantly associated with protein concentration and clusters of three or more significant loci were detected on chromosome arms 3L, 5S and 7L, suggesting the presence of QTL with large effects at these locations. A multiple linear regression model consisting of six significant loci on different chromosomes explained over 64% of the total variation (Goldman et al., 1993). These significant QTL associations can be used to account for the long-term selection response and the protein content difference between the IHP and ILP strains. It can be expected that the longer the selection proceeds, the bigger the difference in protein content between the resulting selection strains will be, and thus the greater the potential to detect additional QTL, as long as the populations continue to respond to selection. This expectation can be tested by QTL mapping using the crosses from the IHP and ILP strains derived from different cycles of selection.

Instead of using the extreme divergence of the parents to create mapping populations, Wassom et al. (2008) identified kernel QTL in a genetic background more relevant to practical breeding by using 150 BC1-derived S1 lines (BC1S1s) from IHO and recurrent parent B73. Oil, protein and starch were measured in BC1S1s and in Mo17-top-cross hybrids. Multiple regression models with 39 QTL detected for each trait by composite interval mapping explained 46.9, 45.2 and 44.3% of phenotypic variance for oil, protein and starch, respectively, in BC1S1s and 17.5, 22.9 and 40.1% for oil, protein and starch, respectively, in the testcross hybrids.

Laurie et al. (2004) used an association study to infer the genetic basis of dramatic changes that occurred in response to selection for changes in oil concentration. The study population was produced by a cross
between the high- and low-selection lines at generation 70 when the oil concentrations were estimated for IHO as 16.7% and for ILO as 0.4%, followed by ten generations of random mating and the derivation of 500 lines by selfing. These lines were genotyped for 488 genetic markers and the oil concentration was evaluated in replicated field trials. As a single admixture event between IHO and ILO created linkage disequilibrium (LD) between genes with different allelic frequencies and the ten generations of random mating eliminated essentially all association between unlinked markers and most of those between loosely linked markers, the population can be used for LD mapping. Three methods of analysis were tested in simulations for ability to detect QTL. Using the most effective method, model selection in multiple regression, 50 QTL were detected, which accounted for 50% of the genetic variance, suggesting that > 50 QTL are involved. The QTL effect estimates are small and largely additive. About 20% of the QTL have negative effects (i.e. not predicted by the parental difference), which is consistent with hitchhiking and small population size during selection. The large number of QTL detected accounts for the smooth and sustained response to selection throughout the 20th century.

Mikkilineni and Rocheford (2004) characterized RFLP variant frequencies in two cycles (65 and 91) of IHP, ILP, RHP and RLP. As revealed by RFLPs, considerable variation at the DNA level was maintained in the Illinois long-term selection protein strains even after 91 generations of selection. Only one locus was observed with a unique RFLP variant detected in just one of the four strains. Although only 35 RFLP loci were examined, it does not appear there was much variation that might potentially be attributable to mutation. The inbreeding values calculated from the RFLP data from cycles 65/69 and 91 were lower than those calculated on the strains before molecular marker data were available. Maize undergoes inbreeding depression and thus there may have been some natural selection within the selection strains for more vigorous and more heterozygous plants. Also, the effective population size, due to the system used for bulking of pollen from multiple tassels and using the bulked pollen to fertilize many ears, may be larger than previously calculated, contributing to less inbreeding than estimated earlier (Walsh, 2004). The lower levels of inbreeding observed for the reverse strains than the forward strains suggest the change in direction of selection for protein levels may have contributed to maintenance of heterozygosity over generations in the reverse strains. There were trends in the variant frequencies in the forward and reverse strains that are consistent with response to selection.

All the RFLP loci selected to assay on the strains based on association with QTL for protein contents in IHP × ILP-derived mapping populations showed frequency trends consistent with response to selection in one or both of the reverse strains. Only one RFLP locus that showed a trend has not been identified as a QTL. The selection of probes based on previous QTL associations most likely increased the probability of identifying loci with variant frequency trends. These probes are more likely to reveal variants that respond to reverse selection, if they were not fixed by cycle 48 when the reverse selection was initiated. These loci are therefore good candidates to look for changes in variant frequencies in response to reverse selection. Eight probes (23%) showed reverse trends for the RHP strain. Twelve probes (34%) showed trends for the RLP strain. One probe (3%) showed a trend for just the RHP strain. Five probes (14%) showed trends for just the RLP strain. Seven probes (20%) showed trends in common for both the strains. All seven loci that displayed trends in both directions were associated with QTL in IHP × ILP mapping populations (Goldman et al., 1993, 1994; Dijkhuizen et al., 1998).

RFLP genotypic and variant frequency differences among the cycle 90 oil strains, IHO, ILO, RHO and RLO, were also determined (Sughroue and Rocheford, 1994). A high degree of variant polymorphism was found among the four oil strains and many RFLP loci were still segregating within the oil strains after 90 generations
of selection. RFLP variant trends consistent with response to directional selection were detected in comparisons among the four oil strains.

Application to plant breeding

The total gain from selection, both in absolute value and in number of additive genetic standard deviations, is well beyond what might have been expected from the distribution of oil and protein values in the original population. Likewise, the gains are well beyond what has been possible by selection for agronomic traits such as grain yield. To illustrate the possible increases in maize grain yield if selection for yield was as effective as for oil and protein, estimates of grain yield and σA from two maize synthetics, RSSSC (a stiff-stalk synthetic) and RSL (a Lancaster derivative) obtained in Illinois were used. The original means were 6.66 t ha⁻¹ for RSL and 9.23 t ha⁻¹ for RSSSC. Assume a gain of 24 σA, approximately the average of what was observed for oil and protein. The gain would be 33.28 t ha⁻¹ for RSL and 27.44 t ha⁻¹ for RSSSC, or a yield at the limit of 39.94 t ha⁻¹ for RSL and 36.68 t ha⁻¹ for RSSSC. Assuming some heterosis, the ultimate yield would be around 43.96 t ha⁻¹. These values are not unreasonable given that a yield of over 31.4 t ha⁻¹ was reported in Iowa in 2002. As indicated by Dudley and Lambert (2004), these results suggest the existence of more genetic variability and more plasticity in the maize genome than is usually expected. They also suggest that limits to selection for yield have not been reached. To mark the importance of the long-term selection experiment for protein and oil in maize at the University of Illinois, the conference titled 'Long-term Selection: A Celebration of 100 Generations of Selection for Oil and Protein in Maize' was held on 17–19 June 2002 in Urbana, Illinois.

8.5.2 Divergent selection in rice

Xu et al. (1998) reported a divergent selection experiment in rice. Transgressive segregation of tiller angle was found in two rice F2 populations, 5002 × Zhu-Fei 10 and HA79317-7 × Zhen-Nong13. By divergent selection for tiller angle in each F2 population, two types of true-breeding extremes were obtained, one with larger tiller angle and the other with smaller tiller angle. Transgression of tiller angle was confirmed in the two extreme crosses (Xu and Shen, 1992b). For loci contributing to variation in tiller angle, the alleles of similar effect were shown to be dispersed in the original parents but associated (pyramided) in the extreme selections. By crossing two extreme strains each derived from one original cross, new transgression was found in the F2 and then two types of extremes were obtained by the second cycle of divergent selection. By crossing the second-cycle extremes with each other, followed by a third cycle of divergent selection for larger tiller angle, all positive alleles from the four original parents were pyramided. The transgression in each original cross can be explained by the complementary action of the genes, which had been dispersed between the original parents and complemented each other when they were pyramided in the extreme strains (Xu et al., 1998). Since this transgression was observed in replicate experiments (Xu and Shen, 1992b,c), it is unlikely that the results are due to mutation events as reported for experiments on divergent selection for bristle number in Drosophila (Mackay, 1995).

We would expect genetic fixation with long-term selection programmes. However, the selection experiments discussed above for maize for high and low protein or oil and in Drosophila for bristle numbers (Yoo, 1980) show no indication of genetic fixation from long-term selection resulting in remarkable changes in phenotype. Frequent identification of large-effect QTL, as reviewed by Tanksley (1993), Kearsey and Farquhar (1998) and Xu, Y. (2002), makes steady and sustained selection response puzzling: alleles of large effects should be fixed rapidly, after which no further response would be seen. Barton and Keightley (2002) named two factors that might explain this apparent paradox. First, QTL-mapping experiments
underestimate the number of QTL and overestimate their effects. Secondly, mutation generates alleles of large effect, which can be picked up quickly enough by selection to sustain a continuing selection response. Several mechanisms have been described that can create de novo variation, including intragenic recombination, unequal crossing over among repeated elements, transposon activity, DNA methylation and paramutation. Barton and Keightley (2002) listed several factors that make it difficult to estimate the true numbers and effects of loci influencing a quantitative trait. Hyne and Kearsey (1995) pointed out that in a typical experiment (heritability 40%, 300 F2 individuals), no more than 12 QTL are ever likely to be detected, which is supported by empirical data on the numbers of QTL detected in plants as reviewed by Tanksley (1993), Kearsey and Farquhar (1998) and Xu, Y. (2002). Both Beavis (1994) and Utz and Melchinger (1994) indicated that unless samples are large (> 500, for example), the effects of statistically significant QTL are substantially overestimated. As shown by the results from RFLP-based evaluation of long-term selected maize lines, the numbers of QTL discovered in various experiments have begun to get close to the numbers of genes required to explain the long-term selection response.

Studies of this type would not be possible without the availability of the long-term selection strains. This fact points to the importance of maintaining longer-term selection programmes so that these kinds of genetic stocks are available for various types of studies. Maintenance of long-term breeding materials is becoming more challenging in an era frequently focused on short-term genomic-based experiments funded by short-term competitive grants. However, it is genetic stocks developed by public sector long-term breeding and selection programmes that frequently facilitate many of these studies at the molecular level.
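
As a hedged numerical sketch of two calculations from Section 8.5.1, the Python below (i) computes an ultimate limit to selection in units of σA under a deliberately simple model and (ii) reworks the grain-yield projection for RSL and RSSSC. The closed form sqrt(2n(1 − q)/q) is an assumption made here for illustration: it follows if all n loci are unlinked, have equal and purely additive effects, start at the same favourable-allele frequency q and are eventually fixed for the favourable allele. It is not presented as the calculation of Dudley (1977) or Robertson (1970), although for n of 50 or more it agrees with the entries printed in Table 8.8 after rounding. The function names, and the σA values back-calculated as the quoted gain divided by 24, are likewise illustrative.

```python
import math


def limit_in_sigma_a(n_loci: int, q: float) -> float:
    """Total possible gain expressed in initial additive genetic SDs.

    Illustrative assumptions (not taken from the text): n_loci unlinked
    loci with equal additive effect a, each starting at favourable-allele
    frequency q, and every favourable allele eventually fixed.
        total gain = 2 * n * a * (1 - q)
        sigma_A    = a * sqrt(2 * n * q * (1 - q))
    The effect size a cancels in the ratio, leaving sqrt(2n(1 - q) / q).
    """
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return math.sqrt(2.0 * n_loci * (1.0 - q) / q)


def projected_limit(mean: float, sigma_a: float, gain_in_sigma_a: float) -> float:
    """Yield at the limit: starting mean plus gain_in_sigma_a SDs of gain."""
    return mean + gain_in_sigma_a * sigma_a


if __name__ == "__main__":
    # Grid of n and q values used in Table 8.8.
    q_values = (0.1, 0.25, 0.5, 0.75, 0.9)
    for n in (10, 50, 100, 200):
        row = " ".join(f"{limit_in_sigma_a(n, q):6.1f}" for q in q_values)
        print(f"n = {n:3d}: {row}")

    # Yield projection: means and gains (t/ha) are the figures quoted in
    # the text; sigma_A is back-calculated here as gain / 24.
    for name, mean, gain in (("RSL", 6.66, 33.28), ("RSSSC", 9.23, 27.44)):
        sigma_a = gain / 24.0
        limit = projected_limit(mean, sigma_a, 24.0)
        print(f"{name}: sigma_A = {sigma_a:.3f} t/ha, limit = {limit:.2f} t/ha")
```

Under these assumptions, q = 0.25 gives about 17 σA for n = 50 and about 24 σA for n = 100, which brackets the 18 to 21 σA of progress quoted for IHP and IHO. The reworked yield limits match the quoted values to within 0.01 t ha⁻¹.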
9
Marker-assisted Selection: Practice

Developments in genomics have provided new tools for discovering and tagging novel alleles and genes useful for improving target traits and for manipulating those genes in breeding programmes through marker-assisted selection (MAS), i.e. selection of phenotypic traits indirectly using markers that are closely linked to the traits or are developed from the gene-related sequences. Plant breeding will benefit from MAS through: (i) more effectively identifying, quantifying and characterizing genetic variation from all available germplasm resources (Tanksley et al., 1989; Tanksley and McCouch, 1997; Gur and Zamir, 2004); (ii) tagging, cloning and introgressing genes and/or quantitative trait loci (QTL) useful for enhancing the target trait through genetic transformation and molecular marker technologies (Dudley, 1993; Gibson and Somerville, 1993; Paterson, 1998; Peters et al., 2003; Gur and Zamir, 2004; Peña, 2004; Holland, 2004; Salvi and Tuberosa, 2005); and (iii) manipulating (differentiating, selecting, pyramiding and integrating) genetic variation in breeding populations (Stuber, 1992; Xu, 1997; Collard et al., 2005; Francia et al., 2005; Varshney et al., 2005b; Wang, J. et al., 2007). MAS can also have significant utility in plant breeding programmes through assisting PVP (plant variety protection) and DUS (distinctness, uniformity and stability testing) processes (CFIA/NFS, 2005; Heckenberger et al., 2006; IBRD/World Bank, 2006). If the marker loci are sufficiently close on genetic or physical maps then reasonably good inferences may be made about the cultivar's haplotype. Such information is used to establish identity, resolve disagreements related to germplasm ownership and acquisition, enforce laws intended to encourage genetic diversity of the hybrids and avoid using inbreds that contain transgenes which may violate regulatory considerations and restrictions. These are often the very first applications of genomics in private sector breeding programmes, which are discussed in Chapter 13.

Using molecular markers in plant breeding programmes has been widely discussed (Beckmann and Soller, 1986a; Paterson et al., 1991; Dudley, 1993; Stuber, 1994a; Xu and Zhu, 1994; Lee, 1995; Hospital and Charcosset, 1997; Xu, Y., 2002, 2003; Eathington et al., 2007; Bernardo, 2008; Collard and Mackill, 2008; Xu and Crouch, 2008; Xu et al., 2009b, d). Using rice and other cereal crops as examples, Xu, Y. (2003) provided a comprehensive review on the MAS system, germplasm evaluation, hybrid prediction and seed quality control. Much has happened in maize breeding since Stuber and Moll (1972) first reported that selection for grain yield in maize had resulted in changes in allele frequencies at several isozyme loci throughout

336 Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu)


Marker-assisted Selection: Practice 337

the genome. In so doing, they essentially laid the grounds for MAS in maize. Indeed, if phenotypic selection (PS) could produce a change in marker allele frequencies, then why could deliberately altering marker allele frequencies at specific loci not produce predictable phenotypic changes for one or several traits?

The success of MAS depends on location of the markers with respect to genes of interest. Three kinds of relationships between the markers and respective genes could be distinguished. (i) The molecular marker is located within the gene of interest, which is the most favourable situation for MAS and in this case, it could be ideally referred to as gene-assisted selection. While this kind of relationship is the most preferred one, it is also difficult to find this kind of marker. (ii) The marker is in linkage disequilibrium (LD) with the gene of interest throughout the population. LD is the tendency of a certain combination of alleles to be inherited together. Population-wide LD can be found when markers and genes of interest are physically close to each other. Selection using these markers can be called LD-MAS. (iii) The marker is in linkage equilibrium with the gene of interest throughout the population, which is the most difficult and challenging situation for applying MAS.

The efficiency of MAS depends on many factors associated with how the underlying marker–trait associations (MTAs) were identified, including: the size of the mapping population, the nature of the phenotyping, the design and analysis of the experiment, the number of markers used, the distance between marker loci, the genomic region containing the desired QTL and the proportion of additive genetic variance explained by the marker, the selection method and the experimental design. The efficiency of MAS also depends on many factors associated with its application, including: the crop and breeding system, the molecular breeding process and the nature of the genotyping pipeline. For private breeding programmes, MAS has offered several attractive features, most of which are related to time and resource allocations.

As conventional breeding systems attempt to combine more and more target traits, there tends to be an overall loss of breeding gain and an increase in the duration of breeding cycles (time to generate a new product). Hence, MAS offers potential to greatly improve the overall pace, precision and impact of the breeding progress by assembling target traits in the same genotype more precisely, with fewer unintentional losses and in fewer selection cycles.

9.1 Selection Schemes for Marker-assisted Selection

MAS is most useful for traits where phenotypic evaluation is expensive or difficult, particularly for those polygenic traits with low heritability that are highly affected by the environment. It is also useful to break linkages between the target traits and undesirable genes in so-called marker-accelerated backcross breeding. MAS may also offer the opportunity to address goals not possible through conventional breeding, such as pyramiding different sources of disease resistance that have similar phenotypes. Indirect selection based on marker genotype rather than phenotype can be used to accelerate the speed and increase the precision of genetic progress, reduce the number of generations and, when integrated into optimized molecular breeding strategies, it can also lower costs of selection. Xu, Y. (2002) discussed six situations that are most suitable for MAS. These include selection without testcrossing or a progeny test; selection independent of environments; selection without laborious fieldwork or intensive laboratory work; selection at an earlier breeding stage; selection for multiple genes and/or multiple traits; and whole genome selection.

9.1.1 Selection without testcrossing or progeny test

In plant breeding, many traits need testcrossing and a progeny test for unambiguous
338 Chapter 9

identification. Typical examples include male-sterility restorability, wide compatibility, heterosis and combining ability. In testcrossing, each candidate plant will be crossed to testers and then its genotype will be inferred from a progeny test in the next season. Each candidate plant must be harvested and maintained separately and only the plants with the target trait will advance to the next level. Testcrossing may continue for several generations until the selected plants reach a certain level of homozygosity. Using MAS, testcrossing and/or a progeny test can be eliminated since the target trait can be identified from the candidate plant itself, based on marker genotypes, saving laborious testcrossing and time-consuming progeny tests.

9.1.2 Selection independent of environments

Many traits must be screened in specific or controlled environments where they can be fully expressed. For example, photoperiod or temperature sensitivity can only be identified by comparison of their phenotypes in two distinct photoperiod or temperature conditions. For identification of insect/disease resistance, plants must be inoculated artificially or naturally. For abiotic resistance, such as drought, salinity and submergence tolerance and lodging resistance, selection in traditional breeding programmes can only be done when the specific stress is present. To measure responses to agrochemicals, such as herbicides and plant growth regulators, these chemicals must be applied to plants at the right stage under suitable environments. MAS has made it possible to perform indirect selection for these traits.

9.1.3 Selection without laborious field or intensive laboratory work

Many important traits are phenotypically invisible or unscorable by visual observation and must be measured in the laboratory using sophisticated equipment or facilities, or a large number of samples are required, which means that it cannot be measured until late generations when a relatively large amount of seed becomes available for each selection entry. Chemical and physical properties of grain are examples that fall under this category. Traits such as tissue cultivability need laborious laboratory work for testing each sample. Using MAS, a piece of leaf harvested at any growth stage of plants or even a piece of endosperm will be enough for accurate measurement of all the traits mentioned above, once associated markers have been identified.

9.1.4 Selection at an early breeding stage

Traits that are only measurable at or after the reproductive stage would be good candidates for MAS. For example, grain quality can only be tested using mature seeds. Yield heterosis and yield potential must be measured after harvest or/and in advanced generations. For forest and fruit trees, many traits have to wait up to several years until adult stages for phenotyping. MAS can be made at any stage and in any generation, so that breeders do not need to maintain a large number of candidate plants generation after generation (year after year). A recent summary report on wheat MAS (Kuchel et al., 2008) indicated that the integration of MAS for specific target genes, particularly at the early stages of a breeding programme, is likely to substantially increase genetic improvement.

9.1.5 Selection for multiple genes and multiple traits

In some cases, multiple pathogen races or insect biotypes must be used to identify plants for multiple resistances, but, in practice, this may be difficult or impossible, because different genes may produce similar phenotypes that cannot be distinguished

from each other. MTA can be used to select multiple resistances simultaneously.

Consider selection for multiple traits, for example, temperature-sensitive genic male sterility (TGMS), amylose and wide compatibility in rice. Candidate plants must be tested under two different environments where TGMS can be identified. Each plant must be testcrossed with wide-compatibility testers, following up with a progeny test in the next season. At the same time, a large amount of seed must be harvested for amylose measurement. Thus, using PS methods, we must wait until a large number of seeds are available and a reasonable level of homozygosity is reached.

9.1.6 Whole genome selection

MAS can also be practised at the whole genome level. Whole genome selection can be used to eliminate the donor genome in backcross breeding or to get rid of linkage drag when a wide cross is involved. Combined with MAS for multiple traits, whole genome selection allows the breeder to transfer multiple traits through backcrossing simultaneously.

High density molecular maps can be used to determine the genotype of an individual at many, sometimes thousands of loci and make it possible to deduce the most favourable genetic constitution for various regions throughout the entire genome in a given individual. By portraying molecular data in a graphical form, as discussed in Chapter 8, a graphical genotype can be inferred to show the genomic constitution and parental derivation for all points in the genome (Young and Tanksley, 1989a), which opens up the possibility of conveniently analysing quantitative traits in map-based whole genome selection. As an extension of this concept, the graphical genotype can be described for QTL and used to identify from mapping populations the desirable individuals with a favourable combination of different QTL alleles or with association of all alleles of similar effects across the whole genome.

9.2 Bottlenecks in Application of Marker-assisted Selection

To analyse the bottlenecks that may limit the application of MAS in plant breeding, it is necessary to have a brief overview of the current status of MAS. Several private companies have been routinely using MAS in breeding programmes, benefiting from their long-term basic research programmes and the availability of all the components of MAS. It is certainly a big investment for a breeding company/institution to start from scratch to run an efficient and fully operational MAS-based breeding programme. In contrast to conventional breeding schemes, the methods and design of infrastructure needed to support MAS have been the areas of greatest change. In order to utilize MAS, companies had to make significant investments to assemble or modify various aspects of infrastructure such as methods to detect DNA polymorphism, manage information, or analyse and track samples, software to relate genotype with phenotype and off-season or continuous nurseries (Ragot and Lee, 2007). These components had to be integrated with each other and with breeding activities, which meant that scientists needed to learn how and when MAS provided a comparative advantage over other methods when taking into account time and cost components.

MAS has been applied in the private sector for crops of great commercial interest including maize, soybean, canola, sunflower and vegetables. MAS in maize cultivar development aims at recovering an ideal genotype defined as a mosaic of favourable chromosomal segments from the parents (referred to as genotype construction). More specifically, MAS in maize has been used to simultaneously select for multiple traits (selection based on marker information only) such as yield, biotic and abiotic stress resistance and quality attributes (Ragot et al., 2000; Eathington, 2005; Eathington et al., 2007), several of which are polygenic in nature. Although there is very limited information on successful breeding product delivery, the first commercial products of molecular breeding (rather than limited MAS) have

been released from multinational breeding companies. The first molecular breeding maize hybrids developed by Monsanto entered the US commercial portfolio in the 2006 cropping season and as estimated by 2010, over 12% of the commercial crop in the USA will be derived from molecular breeding (Fraley, 2006).

MAS has also been used to some extent in the public sector for plant breeding through gene introgression and gene pyramiding, particularly for major-gene controlled disease resistance and for the crops of less interest to the private sector (for a review, see Dwivedi et al., 2007). William et al. (2007b) reported the use of MAS in the International Maize and Wheat Improvement Center (CIMMYT) wheat breeding programmes. Large MAS programmes have been developed to help wheat breeding in Australia with MAS extensively used for at least 19 genes, or chromosome regions for cultivar development in Australian wheat breeding programmes (Eagles et al., 2001). During the last few years, remarkable progress in implementation of MAS strategies for cultivar development has been achieved by the MAS Wheat Consortium in the USA and 80 MAS projects were completed and over 300 additional backcrossing programmes are attempting to incorporate 22 different disease and pest resistance genes and 21 alleles favouring bread-making and pasta quality (Dubcovsky, 2004). With all the efforts above and other MAS breeding programmes in the public sector worldwide, however, there are very few documented releases or registrations of new cultivars. Some examples available, as shown in Table 9.1, include two rice cultivars released in the USA, Cadet and Jacinto, with unique cooking and processing quality traits (http://www.ars.usda.gov/is/AR/archive/dec00/rice1200.pdf). In Indonesia, two rice cultivars, Angke and Conde, released possessing resistance to bacterial blight, produced 20% greater yield over IR64 (Bustamam et al., 2002). In common bean, USPT-ANT-1 was registered as an anthracnose-resistant pinto bean germplasm line which contained

Table 9.1. Examples of released crop cultivars developed through marker-assisted selection.

Crop | Trait | Breeding product | Reference
Common bean | Resistance to anthracnose | Resistance incorporated in pinto bean cultivar USPT-ANT-1, containing the Co-4² gene that confers resistance to all known North American races of anthracnose in the USA | Miklas et al. (2003)
Pearl millet | Downy mildew | The parental lines of the original hybrid (HHB 67) were improved for downy mildew resistance through MAS and conventional backcross breeding, and new hybrid HHB 67-2 with improved resistance to downy mildew released in India | Navarro et al. (2006)
Rice | Bacterial blight | Angke and Conde, possessing resistance to bacterial blight, produced 20% greater yield over IR64 and released in Indonesia | Bustamam et al. (2002)
Rice | Amylose content | Cadet and Jacinto with unique cooking and processing quality traits released in USA | http://www.ars.usda.gov/is/AR/archive/dec00/rice1200.pdf

the Co-4² gene conferring resistance to all known North American races of anthracnose in the USA (Miklas et al., 2003). In pearl millet, the parental lines of the original hybrid (HHB 67) were improved for downy mildew resistance through MAS and conventional backcross breeding and a new hybrid HHB 67-2 with improved resistance to downy mildew was released in India (Navarro et al., 2006). Quality protein maize from an extra-early single cross maize hybrid, Vivek Maize Hybrid-9, which was developed through MAS for the opaque2 gene (Babu et al., 2005) has recently been released in India.

The limited success in breeding product delivery in MAS can be further illustrated by the numbers of publications that have been generated on QTL mapping and MAS since the discovery of the first generation of DNA markers (Xu and Crouch, 2008). The term 'marker-assisted-selection' first appeared over two decades ago (Beckmann and Soller, 1986b) and was initially focused on the potential uses. A decade later, the community became increasingly interested in application of the genes tagged by molecular markers and the term appeared in over 100 journal articles in 1995 (Fig. 9.1). However, the first real article on application of MAS in plant breeding using DNA markers is probably the one published by Concibido et al. (1996) for soybean cyst nematode resistance. The volume of publications on the development and, to a lesser extent, application of markers for assisting plant breeding has increased dramatically during the last decade. As a result, the number of articles containing the term 'marker-assisted-selection' climbed to over 1000 in 2003 (Fig. 9.1). There is limited targeted public sector funding to support the large-scale validation, refinement and application of MAS in field breeding. This can be seen from the number of articles with the term 'marker-assisted-selection' (1390 in 2004) compared to the number of articles with the term 'quantitative trait locus' or 'quantitative trait loci' (1250 in 1998 and 4440 in 2005, Fig. 9.1). Most of the articles on MAS result from either investments from donors with a scientific mandate or academic institutions with a specific interest in showing promising applications of MAS in plant breeding. To convert promising publications into practical application in field breeding requires breaking through many practical, logistical and genetical bottlenecks (Xu and Crouch, 2008). This includes developing simple, quick and cheap technical protocols for sampling, DNA extraction

[Figure 9.1: line graph of the number of articles (y-axis, 0 to 5000) against year (x-axis, 1984 to 2004), with separate curves for QTL and MAS.]

Fig. 9.1. The numbers of articles with the terms QTL ('quantitative trait locus' or 'quantitative trait loci') and MAS ('marker-assisted-selection') by year (from Google Scholar, 4 August 2007). From Xu and Crouch (2008) with permission.
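
The article counts quoted in the text for Fig. 9.1 can be compared with a few lines of arithmetic. The sketch below is illustrative only: it uses just the counts stated in this section, treats 'over 100' and 'over 1000' as the lower bounds 100 and 1000 (an assumption), and the names `MAS_COUNTS`, `QTL_COUNTS` and `annual_growth` are hypothetical.

```python
# Article counts quoted in the text for Fig. 9.1 (Google Scholar, 4 August 2007).
# Assumption: "over 100" and "over 1000" are entered as the lower bounds 100 and 1000.
MAS_COUNTS = {1995: 100, 2003: 1000, 2004: 1390}
QTL_COUNTS = {1998: 1250, 2005: 4440}


def annual_growth(counts, start, end):
    """Compound annual growth rate between two years of a {year: count} dict."""
    return (counts[end] / counts[start]) ** (1 / (end - start)) - 1


mas_growth = annual_growth(MAS_COUNTS, 1995, 2003)
qtl_growth = annual_growth(QTL_COUNTS, 1998, 2005)
# Ratio of the latest QTL count quoted (2005) to the latest MAS count quoted (2004).
qtl_to_mas = QTL_COUNTS[2005] / MAS_COUNTS[2004]

print(f"MAS articles, 1995 to 2003: about {mas_growth:.0%} growth per year")
print(f"QTL articles, 1998 to 2005: about {qtl_growth:.0%} growth per year")
print(f"QTL (2005) to MAS (2004) article ratio: {qtl_to_mas:.1f}")
```

Because the two MAS end points are lower bounds and the two series are quoted for different years, the results indicate orders of magnitude rather than precise rates.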

and genotyping that remain reliable and precise when routinely applied at high-throughput. This also includes developing tailored sample and data tracking and management systems plus powerful decision support tools to ensure effective integration of genotyping into breeding programmes. Xu and Crouch (2008) discussed the bottlenecks associated with translation of MAS from publications to practice, particularly in public sector breeding programmes. William et al. (2007a) provided technical, economic and policy considerations on MAS in crops based on lessons from the experience at an international agricultural research centre. In principle, effective MAS systems are the result of the following activities:

• Developing DNA extraction and tissue sampling and tracking systems appropriate for large-scale field trials.
• Establishing a platform for molecular data generation, management and analysis that meet the needs of plant breeding.
• Developing analytical methods for synthetic cultivar development, heterotic group construction and hybrid prediction using molecular marker information.
• Exploiting genetic and breeding materials including populations, hybrids, open-pollinated populations, landraces under selection and synthetic cultivars from ongoing breeding programmes.
• Validating MTAs using any population that is genotyped genome-wide for marker-assisted backcrossing (MABC) and phenotyped for target traits, leading to the update and refinement of the marker set.
• Optimizing MAS systems and refining MAS breeding programmes by improving the following procedures: high-throughput sampling, DNA extraction and genotyping, environment control and characterization, precision phenotyping, integration of diversity analysis, genetic mapping and MAS, and data generation, interpretation and delivery systems.

9.2.1 Effective marker–trait association

QTL publications have been increasing tremendously in the past two decades as shown in Fig. 9.1 and involving almost all crop plants and all types of agronomic traits (as reviewed by Dwivedi et al., 2007). However, reports of QTL mapping to date have tended to be based on individual small to moderately sized mapping populations screened with a relatively small number of markers, providing relatively low resolution of MTAs (Xu, Y., 2002, 2003; Salvi and Tuberosa, 2005). Very few of the QTL reported have been utilized in plant breeding through MAS. Thus the community is investing a large amount of money and labour in generating a lot of publications with little impact on applied plant breeding. One of the approaches to effective MTA is selective genotyping and pooled DNA analysis discussed in Section 7.7.

Some inherent limitations to MAS are related to the estimates of QTL position and genetic effects and the rates of false positives and negatives. Confidence intervals for QTL are typically 10–15 cM; a genetic region that should not be a major barrier for implementing MAS although it could become a limitation to achieving genetic gain by preventing the selection of desired recombination events. The advent of association mapping and a growing pool of candidate genes should provide some resources needed to minimize problems related to the estimation of QTL position. The genetic effects of QTL are overestimated for many reasons, some of which are linked to experimental designs for phenotyping or population development while others are inherent to the process of QTL detection (Lee, 1995; Beavis, 1998; Melchinger et al., 1998; Holland, 2004).

9.2.2 Cost-effective and high-throughput genotyping systems

Private corporations have established or are developing the capacity to produce hundreds of millions of data points per year in

service laboratories, distinct from research units. Besides, smaller biotech companies are developing technologies that could reduce the cost of each marker data point to a mere few US cents (Ragot and Lee, 2007). Without considering any other cost in MAS, however, the current cost associated with DNA extraction alone is already a big burden for many plant breeding programmes in terms of sample-based cost, especially in the early stages when few assays are required on each sample. So a great effort will be needed first to minimize the cost associated with each step of DNA extraction including sampling, labelling, reagents and plastic consumables.

PCR amplification is a necessary and also expensive step for all PCR-based markers. Multiplexing PCR primers has been an approach to significantly reduce the PCR related cost but it takes a lot of effort to optimize the protocol for suitable multiplex marker sets. Multiplexed PCR primers work well for genetic diversity analysis. When they are used for genetic mapping and MAS, however, they have to be optimized and even redesigned for each specific cross or population because there is no universal marker set that contains markers that are polymorphic across all crosses or populations.

Another significant cost related to MAS is the step of marker detection after PCR amplification, which can be significantly different from one assay type to another. When screening PCR-based markers by agarose gel electrophoresis, which is considered more suitable for MAS of single traits, gel preparation and electrophoresis and scoring time for a 50–200-sample gel can take as long as 3–4 h. Using microtitre plates or dot blot detection of allele-specific gene-based markers offers substantially higher throughput and lower costs than gel-based assays. However, those systems are not suitable for large-scale MAS using large numbers of markers for both genetic background selection and multiple target traits. Effective and efficient marker genotyping systems for large-scale MAS depend on a high-throughput detection system that works with a large number of markers. In general, developing and optimizing such a detection system is time-consuming and also expensive.

Continual improvement in the capability of laboratories to generate molecular data has come through the development of new types of markers allowing increasing automation. However, this has tended to come with the negative consequence of an increase in the cost of equipment required to achieve high-throughput low-cost genotyping and in turn, the capacity to see molecular genotyping achieve impacts at the scale of modern plant breeding programmes. Due to the large up-front costs of assembling infrastructure and personnel for genotyping, it is unlikely that individual national marker laboratories could produce data points in a cost-efficient manner. In advanced laboratories and in animal and human research, this has led to an increased tendency towards centralization and in particular, a shift to an out-sourcing mode of operation. Therefore, the actual genotyping might be most efficiently and effectively carried out through regional hubs and/or out-sourcing services. Collard et al. (2008) discussed genotyping systems that might be suitable for different situations and breeding programmes including gel- and non-gel-based genotyping systems for remote breeding station laboratories and capillary- and array-based genotyping systems for regional hub laboratories.

9.2.3 Phenotyping and sample tracking

Once a high-throughput system has been established for DNA extraction, PCR amplification and marker detection, the bottleneck will be the phenotyping that is required for MTA before MAS and sample tracking that is required for a large number of plants and families during MAS. Phenotyping has been considered as critical in the era of post-genomics and is now receiving greater attention than ever. Precision and global phenotyping of a large number of plant samples is very expensive and time-consuming and is the limiting factor that affects the accuracy of genetic mapping and the power

of MAS. Private corporations have realized the need for such high precision phenotyping as can be seen from their active recruiting of trait-specific phenotyping scientists often located in targeted areas where the trait of interest can be more easily measured (e.g. positions dedicated to drought tolerance and located in arid regions of the world) (Ragot and Lee, 2007). Beyond laboratories, plant handling is becoming a bottleneck to high-throughput protocols. High-throughput facilities have to be established and equipped at continuous nursery sites potentially to handle millions of plants per year.

The level of heritability of measured traits depends on whether the phenotyping can be repeated across different seasons, locations and environments. Clustering target locations into mega-environments and comparing these with selection at different locations has been used to understand how the target environments for a breeding programme differentiate the germplasm with respect to yield and other agronomic traits (e.g. Rajaram et al., 1994; Lillemo et al., 2004; Chapter 10). Cross-population and environment comparison of phenotyping will determine how the MTAs identified under one environment can be used for selection under another. In this case, well characterized environments and well established selection criteria are essential prerequisites for the development of a reliable precision phenotyping system. Precision and high-throughput multi-locational phenotyping, together with effective sampling and data acquisition systems being developed for many traits, provides the potential to develop a phenomics-based protocol for trait-specific breeding programmes. This will not only help understand the phenotypic profile a plant possesses but also improve the precision of genetic mapping and thus MAS for the target phenotype.

Tracking samples from the field to the harvest bags to DNA plates for DNA extractions, PCR amplification and marker detection and then tracing back to the field plants selected based on the genotyping is a time-consuming and error-prone step, which translates into a large proportion of cost as a whole for MAS. Plant breeders always work with a large number of plants and populations, and some crop species cannot be as easily organized in the field as others to facilitate sample collecting and tracking. Sample tracking will therefore ultimately determine whether MAS can be processed in a high-throughput manner and thus whether MAS is practicable on a large scale.

9.2.4 Epistasis and genotype-by-environment interaction

Genetic effects related to epistasis are either poorly estimated or ignored by breeding programmes (Holland, 2001; Crosbie et al., 2006). Such assessments of genetic effects will inflate predictions of genetic gain. The relative merit of MAS will depend on the nature of predictions, actual results and costs of alternative methods.

The importance of genotype-by-environment interaction (GEI; discussed in detail in Chapter 10), as a bottleneck in marker-assisted breeding (MAB), has been recognized because it affects both the power of QTL detection and the response to MAS. To evaluate QTL by environment interaction, precision phenotyping at multiple location/environment trials is required. Selection of suitable locations for phenotyping and accurate estimation of QTL effects across environments are two factors that determine whether the QTL identified can be used for effective MAS. Also in MTA, either through linkage mapping or LD mapping, QTL-by-environment interaction effects should be incorporated into the statistical model for MTA.

9.3 Reducing Costs and Increasing Scale and Efficiency

Highly abundant single nucleotide polymorphism (SNP)-based genic markers provide great potential for increasing scale and efficiency and thus reducing the cost of MAS because genotyping can be automated. Developments of high-throughput

genotyping platforms are largely driven by human and animal research and applications.
However, there are many commonalities among MAB of livestock and human health diagnostics,
which will provide many important spillovers for molecular plant breeders. The feasibility
of marker-assisted approaches for plant breeding is heavily influenced by the relative cost
(in time and money) compared with conventional breeding.

There are several ways to reduce the MAS cost. First, high-throughput analysis using
automated genotyping and data scoring systems will help increase the daily data output.
Secondly, using the same sample for selection of multiple traits will reduce the
trait-based cost. Thirdly, selection at an early stage of plant development or before
planting and an early stage of the breeding process will minimize the number of plants
that need to be retained so that the overall breeding cost will come down. Fourthly,
optimization of MAS systems, including facilities and personnel, will result in less cost
per data point.

While not truly an inherent limitation of the methods involved, one unavoidable limitation
of MAS is the cost of assembling and integrating the necessary infrastructure and
personnel. These can be substantial and beyond the means of many programmes. For such
programmes, implementation of MAS could lead to a delusional or unbalanced reallocation of
resources from vital activities such as high-quality phenotypic evaluation and selection
in the target environment (Ragot and Lee, 2007). Currently, only the largest maize
breeding programmes in a given market or region have the scale of sales and diversity of
products that can justify and support MAS and withstand some of the financial burdens of
establishing and replacing components of the system (e.g. changes in the methods and
platforms for detecting DNA polymorphisms).

The economic story from DNA sequencing may tell us what we can expect in terms of cost
reduction in marker genotyping. Sequencing cost per finished base was US$10 in 1990, but
was reduced to US$1 in 1996, US$0.10 in 2002 and US$0.01 in 2006, which is a thousand
times cheaper than in 1990. The cost of genotyping using molecular markers depends on
marker type and its capacity in high-throughput analysis. With well-established marker
systems and sequencing facilities, genotyping with simple sequence repeat (SSR) markers
costs about US$0.30–0.80 per data point, depending on marker multiplexing and the number
of markers genotyped for each sample (Xu et al., 2002). For example, the lowest cost of
SNP analysis in maize is now about US$90 for 1536 samples.

9.3.1 Cost–benefit analysis

Cost–benefit analysis will help us understand which components in the system need to be
improved and where the bottlenecks for large-scale application of MAS are, as preliminary
outputs in this area have been achieved in maize (Dreher et al., 2003; Morris et al., 2003)
and wheat (Kuchel et al., 2005). This analysis needs to be constantly updated as new
genotyping systems become available and new optimizations are implemented in respective
genotyping labs. Since many factors that can reduce cost may influence genetic gain, it is
essential that cost–benefit analysis modules be integrated into those facilitating the
genetic modelling and simulation of different breeding systems (Wang et al., 2003, 2004;
Wang, J. et al., 2007).

The economic merit of MAS could include situations in which molecular costs are more than
offset by savings in phenotypic evaluation. If molecular costs are in addition to, not in
place of, phenotypic costs, the economic merit of MAS will become questionable and more
difficult to evaluate. In other cases, the ability to select early offsets the extra costs
that are associated with MAS. Detailed cost–benefit analysis of various elements of DNA
marker development and application, including the cost of the required genotyping
platforms and professional expertise, needs to be assessed at the earliest possible stage.
This is particularly important at this time when most public plant breeding programmes are
not adequately funded or poorly equipped to reach a critical threshold of marker assay
throughput.
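
The balance between molecular costs and phenotyping savings discussed in this section can be illustrated with a small calculation. This is only a minimal sketch, not a method taken from the studies cited: the function names, the per-plant phenotyping cost and the plant and marker counts are hypothetical assumptions, and only the SSR cost range of US$0.30–0.80 per data point comes from the text.

```python
def genotyping_cost(n_plants, n_markers, cost_per_data_point):
    """Total marker cost, assuming one data point per plant per marker."""
    return n_plants * n_markers * cost_per_data_point


def net_saving(n_plants, n_markers, cost_per_data_point,
               phenotyping_cost_per_plant, replaces_phenotyping=True):
    """Phenotyping cost avoided minus genotyping cost (hypothetical model).

    When markers are used in addition to phenotyping
    (replaces_phenotyping=False), nothing is saved and the result is
    simply the negative genotyping cost.
    """
    saved = n_plants * phenotyping_cost_per_plant if replaces_phenotyping else 0.0
    return saved - genotyping_cost(n_plants, n_markers, cost_per_data_point)


# Hypothetical project: 500 plants, 3 markers, US$2.00 phenotyping cost per plant.
# The two marker costs are the ends of the SSR range quoted in the text.
for cost in (0.30, 0.80):
    print(f"US${cost:.2f} per data point: net saving = "
          f"US${net_saving(500, 3, cost, 2.00):.2f}")
```

Under these assumed numbers the sign of the net saving changes within the quoted SSR cost range, which is one way to see why the text calls for project-specific budget analysis rather than a general rule.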
346 Chapter 9

Comparative studies exist about the benefit of MAS versus PS (van Berloo and Stam, 1999;
Yousef and Juvik, 2001a). The benefit depends on the heritability of the trait and the
population size. When the heritability is high, the cost involved in genotyping many
plants may not outweigh the expected benefits from PS. As calculated for recombinant
inbred lines (RILs), a benefit can be expected with a range of heritability of 0.1–0.3
(van Berloo and Stam, 1998). If the value is less than 0.1, it is not possible to detect
the QTL with the accuracy required to rely on flanking markers for selection.

In considering the use of molecular markers to improve the mean value of a generic trait
in a population formed by crossing two inbred lines, Moreau et al. (2000) show that MAS
will be more cost-effective than PS when the ratio of phenotypic to genotypic evaluation
costs exceeds a critical level. A comparison between MAS and conventional greenhouse
screening of common beans for resistance to common bacterial blight showed that the cost
of MAS is about one-third less than that of the greenhouse test (Yu et al., 2000).

A study designed to compare the cost-effectiveness of conventional breeding and MAB in
maize was carried out in Mexico at CIMMYT (Dreher et al., 2003; Morris et al., 2003). This
study proceeded in two stages. In the first stage, costs associated with use of
conventional and MAS methods for maize breeding were estimated using a spreadsheet-based
budgeting approach (Dreher et al., 2003). The costs of initially developing molecular
markers linked to the trait of interest were not considered; the analysis assumed that
suitable molecular markers were already available. Field operations and laboratory
procedures required for conventional and MAS breeding projects were identified and costed
out. Sensitivity analysis was then performed to determine how the costs of field
operations and laboratory procedures are likely to change with improvements in research
protocols and/or fluctuations in prices of key inputs. This information was used to
compare the cost of using conventional screening methods and MAS to achieve a well-defined
breeding objective: identification of plants carrying a mutant recessive form of the
opaque2 gene in maize that is associated with Quality Protein Maize (QPM). In addition to
generating empirical cost information that will be of use to CIMMYT research managers, the
first stage of the study produced four important insights, which can be applied in general
to other cases of MAS. First, for any given breeding project, detailed budget analysis
will be needed to determine the cost-effectiveness of MAS relative to PS methods.
Secondly, direct comparisons of unit costs for phenotypic and genotypic analysis provide
useful information for research managers, but in many cases, technology choice decisions
are not made solely on the basis of cost. For example, time considerations will often be
critical, since genotypic and phenotypic screening methods may differ in their time
requirements. Even when real time requirements are similar, for applications in which
phenotypic screening requires samples of mature grain, genotypic screening often can be
completed much earlier in the plant growth cycle. Thirdly, the choice between conventional
and MAS methods may be complicated even further because the two are not always direct
substitutes. Using molecular markers, breeders may be able to obtain more information
about what is going on at the genotypic level, such as genetic background, than they can
obtain using phenotypic screening methods. Fourthly, when used with empirical data from
actual breeding programmes, budgeting tools are needed to improve the efficiency of
existing protocols and to inform decisions about future technology choices.

In the second stage of the study (Morris et al., 2003), the costs associated with the use
of conventional and MAS methods at CIMMYT were compared for a particular breeding
application: introgressing an elite allele at a single dominant gene into an elite maize
line (line conversion). At CIMMYT, neither method shows clear superiority in terms of both
cost and speed: conventional breeding schemes are less expensive, but MAS-based breeding
schemes can be completed in less time. For applications involving trade-offs between time
and money, relative profitability can be evaluated using conventional investment theory.
Private firms, which can raise operating capital by drawing on corporate cash reserves,
floating shares in the stock market, or borrowing in commercial credit markets, have been
actively implementing MAS to maximize the net benefits generated by their breeding
programmes (i.e. profits) by opting for technologies that allow them to bring new products
into the market faster, even if these technologies are more costly to implement. In
contrast, public plant breeding programmes, which are more likely to face capital
constraints in the sense that they are usually required to operate within their budget
allocation, have been much slower to implement MAS. Public breeding programmes can
maximize the returns to their limited resources by sticking to lower-cost PS methods, even
though this means that breeding projects will take longer to complete.

For many plant breeding projects, the relative attractiveness of PS versus MAS will not be
in doubt. When switching between PS and MAS implies a trade-off between time and money,
the cost-effectiveness of DNA markers depends critically on four parameters: (i) the
relative cost of phenotypic versus genotypic screening; (ii) the time savings achieved
using MAS; (iii) the size and temporal distribution of benefits associated with
accelerated release of improved germplasm; and (iv) the availability to the breeding
programme of operating capital. All four of these parameters can vary significantly
between breeding projects, suggesting that detailed economic analysis may be needed to
predict in advance which selection technology will be optimal for a given breeding project
(Morris et al., 2003).

9.3.2 Seed DNA-based genotyping and MAS system

DNA extraction currently represents the single largest cost in most MAS pipelines and
often presents the rate-limiting step for scale-up of the whole process. Development and
optimization of a non-destructive single-seed-based DNA extraction system will play a
significant role in enhancing MAS efficiency, particularly for traits expressed late in
the cropping season. Compared to MAS using DNA extracted from leaves and other tissues,
seed-based DNA genotyping has many advantages, including: (i) identification of desirable
genotypes and discard of undesirable genotypes before planting; (ii) increasing the speed
of breeding cycles by selecting genotypes during the off season; (iii) reducing the
time-consuming and error-prone sample collecting step that currently involves harvesting
leaf tissue from plants in the field or glasshouse, which then need to be retraced when
the genotyping data are released; and (iv) saving land because only selected genotypes
(seeds) are planted. Although DNA extraction from single dried seed has been studied in
many plant species, most reports focus on destructive protocols. To develop a
comprehensive and operational system for MAS using single-seed-based and non-destructive
DNA extraction, the extracted DNA must have a high quality compared to leaf-tissue DNA so
as not to confound the PCR amplification and detection process. Similarly, the quantity of
DNA should be large enough for whole genome genotyping and DNA extraction should be
high-throughput, while sampled seeds should maintain a high level of germination.

A seed DNA-based genotyping system that is feasible for crop species with relatively large
seeds has been developed at CIMMYT for molecular breeding in maize (Gao et al., 2008). An
optimized genotyping method using endosperm DNA sampled from single maize seeds was
developed (Fig. 9.2), which can be high-throughput and is generally applicable to
different types of kernels. The seed DNA-based genotyping method involved excising
endosperm pieces from imbibed maize seeds, then grinding the pieces into powder in a
96-tube plate using a tissue shaker to improve efficiency. Sampled seeds were stored in
two 48-well plates as a unit to facilitate tracing data from desirable genotypes to
corresponding candidate seeds. Using the seed DNA-based genotyping method, the DNA
extraction process and following genotyping can be
done in 96-well plates using regular extraction buffers, the DNA quality is functionally
comparable with that of leaf DNA and the DNA amount extracted from 30 mg of endosperm is
sufficient for up to 200–400 agarose gel-based markers and several million chip-based SNP
markers. By comparing endosperm and corresponding leaf DNA of an F2 population, genotyping
errors caused by pericarp contamination and hetero-fertilization averaged 3.8% and 0.6%,
respectively, depending on the SSR markers used. Endosperm sampling did not affect
germination rates under controlled conditions, while under field conditions the
germination rate, seedling establishment and normalized difference vegetation index (NDVI)
were significantly lower than those of controls for some genotypes. Careful field
management could compensate for these slight effects on germination and seedling
establishment. Seed DNA-based genotyping lowered costs by 24.6% compared to leaf DNA-based
genotyping due to reduced field plantings and labour costs.

Fig. 9.2. Flowchart of large-scale seed DNA-based genotyping system: (1) soaking; (2)
sampling; (3) grinding; (4) DNA extraction; (5) PCR and genotyping; (6) tracking back and
planting. From Gao et al. (2008) with kind permission of Springer Science and Business
Media.

As seed DNA-based genotyping can be processed before planting, for example selecting on F2
seeds harvested from an F1 plant, it is possible to select desirable genotypes before
planting. This has a potentially large impact on breeding programmes, from changing
population sizes and selection pressures to differences in field design and strategies in
MAS. Over several breeding cycles, this is likely to result in accelerated gain and
improved efficiency. Another advantage of seed DNA-based genotyping is that genotyping can
continue until at least a minimum number of desirable genotypes are identified. This means
that target genotypes can be identified by genotyping populations as small as possible,
saving the cost of sampling all available plants in the field while avoiding the risk that
no desirable genotypes can be found with available plants in the field (as there is no way
to go beyond the plants that have been planted), compared to leaf DNA-based genotyping.
For example, the theoretical proportion of homozygotes at n target loci in an F2
population is (1/4)^n and thus for three loci, 1/64 of the plants in the population will
have the desirable genotype. As seed DNA-based genotyping can stop at any stage once a
suitable number of target seeds carrying the desirable genotype has been identified, the
number of seeds that have to be genotyped could be much less than, or in the worst case
equal to, the number of plants that have to be planted in the field. For leaf DNA-based
genotyping, to ensure a 99% probability of obtaining at least one desirable genotype, the
minimum number of plants that have to be planted is
log(1 − 0.99)/log(1 − 1/64) ≈ 292. As the number of target loci increases, the minimum
number of plants that have to be planted in the field will go beyond the capacity of most
current breeding programmes. All these factors have significant impacts on procedures,
methods and strategies for MAS that have been well established for leaf DNA-based
genotyping, simplifying the process and improving breeding efficiency. An important next
step is a comprehensive modelling and analysis of all aspects of this genotyping method,
as has been done for leaf DNA-based MAS under the assumption that selection is made after
planting since the pioneer study by Lande and Thompson (1990). This also needs to
incorporate both negative factors such as hetero-fertilization, endosperm triploidy and
potential pericarp contamination and positive factors such as reduced labour time and
selection of desirable genotypes before planting.

There are several issues to be considered before the seed DNA-based genotyping developed
in maize can be extended to other crops (Gao et al., 2008). First, crops should have
relatively large seeds with at least 8–10 mg of tissue that can be sampled for DNA
extraction in order to meet the requirement of single-seed-based genotyping, particularly
for agarose gel-based genotyping. Secondly, seed texture and tissues (endosperm from
monocots and cotyledons from dicots) should be suitable for sampling, or the seed can be
soaked without significantly affecting germination rate when the dry seed is too hard for
excising. Thirdly, the pericarp contamination can be neglected because it is at a
relatively low level or the pericarp can be removed easily during sampling. Finally, a
suitable DNA extraction protocol may need to be developed for each specific crop and can
be used for crop seeds with a high percentage of specific chemical compositions such as
fat, protein and starch. It can be expected that this approach would, to a great extent,
replace leaf DNA-based genotyping that has been used in many crops for intellectual
property protection, transgene detection, genetic testing for cultivar purity and
hybridity, gene mapping, genetic diversity analysis and MAS.

9.3.3 Integrated diversity analysis, genetic mapping and MAS

Genetic mapping and MAS usually involve multiple consecutive steps from development of
mapping populations, genetic mapping and marker validation to MAS. New multipurpose
methodologies are emerging that facilitate the integration of genetic diversity analysis,
MTA analysis, MAS validation and application within a single breeding programme context.
These methodologies rely on utilization of multiple approaches such as LD analysis using a
set of diverse genotypes, advanced backcross QTL (AB-QTL) mapping (Tanksley and Nelson,
1996) and mapping-as-you-go (MAYG) (Podlich et al., 2004), so that various steps in the
process can be integrated. In the MAYG approach, estimates of QTL allele effects are
continually revised by remapping new elite germplasm generated over cycles of selection,
thus ensuring that QTL estimates remain relevant to the current set of germplasm in the
breeding programme. The integration of genetic mapping and MAS offers two major
advantages: (i) ability to carry out MTA analysis using breeding populations directly
rather than having to follow time-consuming development of genetic populations; and (ii)
combining MTA and its validation. This saves time, both in the process itself and in the
generation of the necessary genetic materials. However, perhaps most importantly, the
common use of end-user relevant genetic materials throughout the process is likely to
dramatically reduce the level of redundancy that is commonly experienced when transferring
outputs from genetic studies and validating them in breeding populations.

9.3.4 Developing breeding strategies for simultaneous improvement of multiple traits

Strategy development for multiple trait improvement will include: understanding the
correlation between different traits (including the interaction between component traits
of a very complex trait such as drought
tolerance); genetic dissection of the developmental correlation between multiple traits;
understanding of genetic networks for correlated traits; and construction of selection
indices with multiple traits. Much progress has already been made in this area which is
relevant to drought tolerant crops, e.g. in maize (Edmeades et al., 2000; Bänziger et al.,
2006) and wheat (Babar et al., 2006, 2007).

A MAS kit can be developed to include markers associated with a set of key major-gene
controlled traits plus markers evenly covering the whole genome for marker-assisted
background selection. Several thousands of well-selected SNP markers can be fitted into a
single chip and they can be updated and ultimately replaced by gene-based and functional
markers as more and more genes are identified and functionally characterized for traits of
economic importance.

Selection for multiple traits can be completed in one step as long as the population is
large enough to allow desirable individuals to combine different traits. However, the
number of trait loci that can be manipulated in one step is limited as the population size
required to cover the recombinants increases exponentially with the increase of the number
of traits/loci. To manipulate multiple genes/traits that are beyond the population sizes
that are amenable, a two-stage selection strategy involving two generations, proposed by
Bonnet et al. (2005) and simulated by Wang, J. et al. (2007), can be employed, as
discussed in Chapter 8. In this approach, individuals are selected first by all target
markers for both homozygous and heterozygous forms to obtain a subset of the population
that contains higher frequencies of the target alleles so that a much smaller population
size is required in the following generation to obtain the homozygotes at the target loci.

9.4 Traits Most Suitable for MAS

With currently available molecular markers and genotyping systems, some traits are more
suitable for MAS than others. Xu, Y. (2002) evaluated various traits and listed those most
suitable for MAS, which include traits requiring testcrossing or progeny testing,
environment-dependent traits and seed and quality traits.

9.4.1 Traits requiring testcrossing or progeny testing

Cytoplasmic male sterility and fertility restoration

Many important crop species, including rice, sorghum and sunflower, depend on cytoplasmic
male sterility (CMS) and its fertility restoration for hybrid seed production. A large
amount of testcrossing and progeny testing is involved in breeding CMS lines and their
restorers. Testcrossing can start as early as with the F2 generation. F2 plants will be
selected first for other agronomic traits and selected plants are testcrossed to
maintainer lines for CMS-maintaining ability or to restorer lines for restorability. The
testcross progeny will be planted the following season for fertility observation. Only the
plants with completely sterile testcross progeny (for CMS) or completely fertile testcross
progeny (for restorability) will be moved to the next breeding procedure. MAS can be used
to replace testcross and progeny testing if markers for fertility restorability are
developed. Xu, Y. (2003) listed restorability genes that have been associated with
molecular markers in 12 crop species including maize, rice, sorghum, wheat, barley, rye,
sunflower, oilseed rape, sugarbeet and onion. Some of them have been cloned (e.g. Desloire
et al., 2003; Koizuka et al., 2003; Komori et al., 2004; Wang, Z. et al., 2006) and many
more cloning studies would be expected. Cloned genes provide an opportunity of developing
genic or functional markers for selection of fertility restorability.

Outcrossing

Evolutionary change in plant mating systems from outcrossing (cross-pollination) to
inbreeding (self-pollination) has occurred frequently throughout the history of flowering
plants and has been described as the most common evolutionary trend in angiosperm
reproduction (Stebbins, 1957, 1970). For example, wild rice is frequently
cross-pollinated, while cultivated rice is self-pollinated. Many characters involved in
mating system evolution, such as sizes of floral organs or amount of pollen produced, are
quantitative in nature. Hybrid seed production depends on the improvement of
outcrossing-related traits and for self-pollinated crops, it might involve a
reconstruction (or recovery) of the outcrossing mating system (Xu, Y., 2003).

Various techniques to produce hybrids have been developed depending on the crop, including
hand emasculation, roguing of staminate plants in dioecious lines, use of gynoecious or
highly female lines, CMS and genetic male sterility, protogyny, or self-incompatibility
(Janick, 1998). The rate of outcrossing is often the limiting factor determining whether a
hybrid has potential for commercialization: seed cost and price are both largely dependent
on how easy it is to produce high-quality hybrid seed that both seed providers and farmers
accept. Maize was particularly suitable for hybrid breeding because of monoecism and the
simple emasculation techniques practised in breeding that allowed for easy inbreeding and
outbreeding (Simmonds, 1979). The necessity of high seeding rates in highly
self-pollinated crops such as rice and wheat introduces an economic problem: seed
production costs must be low enough and yield of hybrids in the farmers' fields must be
high enough that farmers can profit from purchase and use of hybrid seed and companies can
profit from their production and sale (Goldman, 1999).

Yield of hybrid seed is determined by many variables, both genetic and environmental. In
productive, favourable environments, seed yield from seed set through cross-pollination
can approach those of conventional self-pollinated cultivars in wheat (Lucken, 1986) or
might be up to 80% of inbred lines in rice (Yuan and Chen, 1988; Lu et al., 2001). The
breeder's approach to high, stable seed production is: (i) to identify those plant and
flower features that affect cross-pollination; (ii) to find variation for these traits;
and (iii) to incorporate genes for favourable expression of traits into parental lines
(Lucken, 1986). Considering all hybrid cereal crops with the CMS system, measures for
increasing the outcrossing rate will include choice of favourable climate conditions for
seed production; ensuring flowering synchronization of the two parents; providing a
suitable pollen source; developing male sterile lines with desirable outcrossing traits;
supplementary pollination; and adjustment of flowering habit and stigma characteristics
using growth regulators such as gibberellic acid (Xu, Y., 2003).

Many plants are naturally self-pollinated. Their floral structure is adapted for
inbreeding. Breeding parental lines may need to completely convert the floral structure
and make them suitable for outcrossing. Outcrossing in rice depends on the capacity of
stigmas to receive alien pollen and the capacity of anthers to emit much pollen to
pollinate other plants in the proximity (Oka, 1988). Linkage between long exserted stigma
and undesirable agronomic traits in wild rice species is quite strong and needs to be
broken to incorporate these traits into selected genotypes. On the other hand, using the
gene eui (elongated uppermost internode) to correct the panicle enclosure associated with
CMS has been used in China for high-yielding seed production with minimized gibberellic
acid application. This gene has been cloned (Zhu et al., 2006) and hopefully the gene
transfer can be facilitated by MAS.

The floral structure of wheat is considered to be oriented towards cross-pollination
(Wilson, 1968). However, a close examination of its floral traits clearly indicated that
wheat is less suited, in its present form, to cross-pollination than crops such as maize,
sorghum and rye (Wilson and Driscoll, 1983). After review of the status of hybrid wheat,
Lucken and Johnson (1988) indicated the need for acquiring more knowledge about genetic
variation of floral biology, including: (i) spike and flower morphology; (ii) pollen
dispersal, buoyancy, durability and vigour; (iii) stigma accessibility, receptivity and
durability; and (iv) development of selection screens for these traits.

Many factors affecting outcrossing provide opportunities for MAS. However,
there are very few investigations on genetic mapping of traits related to outcrossing.
Grandillo and Tanksley (1996) examined anther length in a backcross between Lycopersicon
esculentum and Lycopersicon pimpinellifolium. They found two QTL affecting this trait, on
chromosomes 2 and 7, which accounted for only 24% of the phenotypic variation. Georgiady
et al. (2002) investigated traits that distinguish outcrossing and self-pollinating forms
of currant tomato, L. pimpinellifolium. A total of five QTL were found involving four
traits: total anther length, sterile anther length, style length and flowers per
inflorescence. Each of these four traits had a QTL of major (> 25%) effect on phenotypic
variance. In rice, some genetic mapping projects have been undertaken that target
outcrossing. Two QTL for the rate of exserted stigma were detected in the RILs derived
from the cross between an indica cultivar, Peikuh, and a wild rice, W1944 (Oryza rufipogon
Griff.; Uga et al., 2003). Nine QTL for the frequency of stigma exsertion were detected in
the RILs derived from the cross between a japonica cultivar, Asominori, and an indica
cultivar, IR24 (Yamamoto et al., 2003). A further QTL analysis was conducted using the F2
population between a japonica cultivar, Koshihikari, and a breeding line showing exserted
stigma selected from the backcross population between IR24 as a donor and japonica
cultivars (Miyata et al., 2007). A highly significant QTL (qES3), which had been predicted
in the RILs of IR24, was confirmed at the centromeric region on chromosome 3. qES3
increases the frequency of exserted stigmas by about 20% at the IR24 allele and explains
about 32% of the total phenotypic variance. A QTL near-isogenic line (NIL) for qES3
increased the frequency of the exserted stigma by 36% compared to that of Koshihikari in a
field evaluation.

It is anticipated that MAS will provide a powerful tool to help fix the
outcrossing-related issues in crops that are naturally self-pollinated but have great
potential in hybrid breeding. Traits that require testcrossing, such as stigma longevity
and receptivity, and labour-intensive traits, such as pollen load, can be selected much
more easily through linked markers.

Wide compatibility

Hybridization barriers exist in distant crosses of many crop species to some extent.
Because the parents are not genetically compatible, hybrids derived from intersubspecific
crosses such as indica × japonica in rice are partially or completely sterile with seed
set less than 30%. Some intermediate cultivars have little or no hybrid barrier with
either indica or japonica, which can be called wide-compatible cultivars. The wide
compatibility trait can thus be defined as the ability to make intersubspecific hybrids
fertile.

To identify wide compatibility and transfer the related genes to other genetic
backgrounds, testcrossing and progeny testing are required, as for fertility restoration.
In rice, several sets of testers were carefully selected for this purpose. A lot of work
is involved in testcrossing and progeny testing to find out the cultivars or plants with
wide compatibility. Molecular marker-assisted identification of wide compatibility genes
has been reported (Wang, G.W. et al., 2005, 2006; Zhao et al., 2006), which will
accelerate and facilitate the breeding process by eliminating or minimizing testcrossing
and progeny testing. Wide compatibility has been selected in rice using associated SSR
markers.

Heterosis

Exploitation of heterosis or hybrid vigour to increase crop yields started early in the
20th century with maize. From inbreeding a number of crop plants including maize, George
H. Shull developed a perspective on heterosis that he outlined in a 1908 publication
entitled 'The composition of a field of maize'. Hybrids provide many advantages in a crop
production system. The principal benefit is increased yield. In open-pollinating species,
one of the most often overlooked benefits is uniformity, an element which has allowed for
the rapid expansion of production in many crop plants such as the vegetables. Additional
benefits may include stress tolerance and pest resistance and other performance
characteristics. Breeders of hybrid crops can react faster and with more options to meet
changing markets, customer needs and production demands. Other advantages
of hybrids include the ability to combine useful dominant genes available in different
inbred lines, to optimize the expression of genes in the heterozygous state and to produce
unique traits.

Xu, Y. (2003) discussed four features of hybrid breeding associated with hybrid prediction,
including selection for hybrid performance, seed production and commercialization and grain
production. Hybrid performance depends on genes and their interactions and combinations.
Selection for hybrid performance in breeding programmes is based on testcrossing and progeny
testing. That is, we breed hybrids through selection of parental lines with desirable
agronomic traits. To associate the parental phenotype with hybrid performance, breeders have
to cross their candidate breeding lines with several testers and, from the hybrid progeny,
determine whether the candidates contain the genes required for hybrids and whether the
parental combinations produce useful hybrids. This indirect selection, based on testcrossing
and progeny testing, is time-consuming and very expensive. Furthermore, the association
between the parental line and hybrid from one cross cannot be used to make a prediction
about other associations.

A cross of two extremely low-yielding inbreds can give a hybrid with good mid-parent or
high-parent heterosis but poor performance, whereas a cross of two high-yielding inbreds
might exhibit less mid-parent or high-parent heterosis but nevertheless produce a hybrid
with good performance. High-yielding hybrids owe their yield not only to heterosis but also
to other heritable factors that are not necessarily influenced by heterosis. For effective
selection, one needs to know the relative importance of each genetic contribution of
heterosis and non-heterosis in individual hybrids (Duvick, 1999). MAS for heterosis could
become possible, as discussed later, and will facilitate breeding processes through
associated markers.

9.4.2 Environment-dependent traits

For some traits, phenotypic expression largely depends on specific environments. Typical
examples of environment-dependent traits include photoperiod/temperature sensitivity,
environment-induced genic male sterility (EGMS) and abiotic and biotic stresses. MAS is
particularly useful for such traits as they can be selected under any environment through
associated markers. Before MAS, however, MTA has to be established under the environments
where the phenotype can be expressed, in most cases, under controlled environments.
Controlled environments can be compared with each other or with natural environments. If two
environments mainly differ in one macro-environmental variable, they are considered
contrasting or near iso-environments (NIEs) and the standard plot-to-plot variation and
other residual micro-environmental effects can be neglected (Xu, Y., 2002). If the two
environments are from experiments in different years or locations, it is assumed that
location and year effects do not confound the effect of the macro-environmental factor.

Some traits need to be measured under NIEs, where plants respond differently. In such cases,
one environment imposes much less stress on plants than the other, for example, two
environments with normal and high temperatures. The effect of the stress environment can be
measured by comparing it to a much-less-stress or non-stress environment. A relative trait
value is then derived from two direct trait values measured in each environment to ascertain
the sensitivity of plants to the stress. If different plants have an identical phenotype
under the much-less-stress environment (this is not true for a segregating population in
most cases), the direct trait value in the stress environment can be used to measure
sensitivity. When both environments impose little stress on plants and the plants respond
differently, however, relative trait values should be used (Xu, Y., 2002).

Photoperiod/temperature sensitivity

A typical example for environment-dependent traits is photoperiod sensitivity in many plant
species that can only be measured in NIEs, one with short day-length and the other with long
day-length. Plants

start to flower when specific photoperiod and/or temperature conditions are met. In hybrid
crops, flowering synchronization of two parents is one of the factors influencing hybrid
seed production and thus the economic advantage over the inbred lines/cultivars. To
understand photoperiod and temperature responses, hybrids and their parents must be planted
in a variety of environments or NIEs. Genetic study of these responses will finally
characterize the parental photoperiod-thermo response pattern and its effect on their
hybrids and thus make hybrid photoperiod-thermo response predictable.

Using a rice doubled haploid (DH) population between Zhaiyeqing 8 and Jingxi 17,
days-to-heading and photo-thermo sensitivity were investigated in two environments (Beijing
and Hangzhou, China) that differ mainly in day-length and temperature (Xu, 2002). Four
chromosomal regions were significantly associated with days-to-heading in either or both
locations, whereas a different locus on chromosome 7 (G397A-RM248) was significantly
associated with photo-thermo sensitivity, indicating that the photo-thermo sensitivity QTL
was independent of the QTL for days-to-heading. By evaluating days-to-flowering of
individual CO39/Moroberekan RILs under 10 h and 14 h day-lengths and greenhouse conditions,
Maheswaran et al. (2000) identified 15 QTL for days-to-flowering. Only four of them were
also identified as influencing response to photoperiod.

Different QTL have been identified using direct and relative trait values and in rice,
days-to-heading and photoperiod sensitivity are often controlled by different QTL as
discussed above. On the other hand, direct and relative traits could share some QTL. That
means days-to-heading and photoperiod sensitivity are genetically related to some extent
because both traits are related to the basic vegetative growth that plants must achieve to
flower. There are QTL mapping studies undertaken in NIEs, but QTL were mapped using trait
values scored in each environment rather than using relative measures. The traits themselves
were mapped rather than the relative response measured under the NIEs. In rice, numerous QTL
for days-to-heading or -flowering have been mapped using molecular markers but very few of
them have been tested under both long- and short-day conditions. Using an F2 population
between japonica Nipponbare and indica Kasalath, Yano et al. (1997) identified two major and
three minor QTL for heading date. Three of them (Hd1, Hd2 and Hd3) were identified later as
photoperiod sensitivity genes by testing the QTL-NILs under different day-lengths (Lin et
al., 2000) and one of them (Hd1) was cloned (Yano et al., 2000).

Environment-induced genic male sterility

Male sterility can be induced by specific environmental factors. EGMS was first discovered
in rice by Shi (1981) from Nongken 58, a japonica cultivar. The mutant Nongken 58S is sterile
when the days are long (> 13.5 h) but becomes fertile when days are short (< 13.5 h). Thus,
fertility conversion is triggered by the length of photoperiod. EGMS has also been reported
in pepper, tomato, wheat, barley, sesame, pea, rape and soybean.

The dependency of male sterility on temperature or photoperiod-temperature interaction
requires two different environments in the breeding and selection process. Breeding
populations have to be planted in one environment where the plants will be sterile to make
sure of the presence of sterility genes and in another where the plants will be fertile to
confirm the fertility conversion and produce seeds. Using associated molecular markers,
confirmation of fertility conversion involving two environments can be avoided. Genetic
mapping studies in rice have laid a foundation for MAS in breeding EGMS lines. To facilitate
incorporation of the tms2 gene in rice, an SSR marker, RM11, located on chromosome 7, was
identified and found to be useful in identifying heterozygous fertile plants in F2
populations and F3–F4 progenies for selection of progenies in advance (Lu et al., 2004).
Lang et al. (1999) reported that PCR-based markers were 85% accurate in identifying tms3 in
the juvenile stage.
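
The distinction drawn in this section between direct and relative trait values can be
illustrated with a minimal Python sketch. It assumes that the relative value is taken as the
simple difference between the long-day and short-day measurements (a ratio would be another
reasonable choice); the text does not prescribe a formula, and the line names and
days-to-heading figures below are hypothetical.

```python
# Minimal sketch: direct vs. relative trait values scored in two
# near iso-environments (NIEs). Line names and numbers are hypothetical.

def relative_trait_value(long_day, short_day):
    """Photoperiod sensitivity, assumed here to be the difference in
    days-to-heading between the long-day and short-day environments."""
    return long_day - short_day

# Direct trait values: days-to-heading scored in each environment.
days_to_heading = {
    "line_A": {"short_day": 70, "long_day": 72},  # late but insensitive
    "line_B": {"short_day": 55, "long_day": 85},  # early but sensitive
    "line_C": {"short_day": 60, "long_day": 63},
}

# Relative trait value: one sensitivity score per line.
sensitivity = {
    line: relative_trait_value(v["long_day"], v["short_day"])
    for line, v in days_to_heading.items()
}

# Ranking by a direct value and ranking by the relative value can differ,
# which is one reason the two kinds of trait may map to different QTL.
rank_direct = sorted(days_to_heading, key=lambda l: days_to_heading[l]["short_day"])
rank_relative = sorted(sensitivity, key=sensitivity.get)

print(sensitivity)
print(rank_direct)
print(rank_relative)
```

In this toy data set the earliest line under short days is also the most photoperiod
sensitive, so selecting on the direct value alone would not select for insensitivity.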

Biotic and abiotic stresses

Breeding of insect and disease resistance and tolerance to abiotic stresses has become a
worldwide issue. To identify insect/disease resistance, plants must be inoculated
artificially or naturally, or in specific environments where the stress exists. Artificial
inoculation is impractical when the insects/diseases are under quarantine control. On the
other hand, evaluation of plant response to different insects/diseases or different
biotypes/strains/races of the same stress agents is very difficult, if not impossible, using
traditional screening methods.

In traditional breeding programmes, selection for tolerance to abiotic stresses such as
salinity, drought and submergence, and for lodging resistance, can only be done in specific
environments that are either present at specific locations or created in well-controlled
environments. Selection for these traits is considered most difficult in breeding
programmes.

For effective MAS, development of suitable selection criteria is critical for both MTA and
the following MAS, particularly for abiotic stresses. Taking drought tolerance in rice as an
example, current knowledge on physiology suggests that drought tolerance depends on one or
more of the following components: (i) the ability of roots to exploit deep soil water to
provide for evapotranspirational demand; (ii) the capacity for osmotic adjustment that
allows plants to retain turgidity and protect meristems from extreme desiccation; and (iii)
control over non-stomatal water loss from leaves (Nguyen et al., 1997). These components are
generally applicable to other cereal crops. A large number of QTL have been identified in
rice for osmotic adjustment, dehydration tolerance, abscisic acid accumulation, stomatal
behaviour, root penetration index, root thickness, total root number, root length, total dry
root weight, deep root dry weight and root pulling force (Zhang et al., 1999). In maize,
grain yield under drought stress is negatively correlated with the anthesis-silking interval
(ASI), the difference in days between pollen shedding and silk emergence. A short ASI means
rapid silk extrusion because time to anthesis is little affected by drought.

9.4.3 Seed and quality traits

Seed traits

As a major storage organ of crop seeds, endosperms provide humans with proteins, essential
amino acids and oils. An understanding of the inheritance of endosperm traits is critical
for the improvement of seed quality. Genetic behaviour in triploid endosperms is very
different from that of the maternal plants that supply assimilates for grain growth and
development. Thus, methods suited for genetic analysis of traits in maternal plants
(diploids for most cereal crops) cannot directly be used for endosperm traits (Xu, 1997).
Any genetic analytical method for endosperm traits needs to combine a genetic method
developed for diploid maternal plants with a triploid model proposed for conventional
genetic analysis.

The genetic system controlling endosperm traits may be much more complicated than that which
controls traits of the plant per se. Because the plant provides seeds with a portion of
their genetic material and almost all the nutrients required for growth and development,
seed traits are genetically affected by both the seed nuclear genes and the maternal nuclear
genes. In addition, cytoplasmic genes may also affect some seed traits through their
indirect effects on the biosynthetic processes of chloroplasts and mitochondria. To
understand endosperm traits with biological accuracy, one should take into consideration
maternal genetic effects and cytoplasmic effects along with the direct genetic effects of
seeds. As seeds initiate a new generation that differs from their maternal plants, some seed
traits should be considered as one generation advanced over their maternal plants. Genetic
analysis of endosperm traits should be based on the DNA extracted from both maternal plants
and endosperm tissues in order to understand the relative contribution of the

different genetic factors to the variation of endosperm traits (Xu, 1997). In many cases,
endosperm traits have been treated the same as other traits of the plant, with few reports
(Tan et al., 1999) that considered the generation advancement issue.

Hybrid seed traits

Although F1 plants are uniform, seeds borne on them represent the F2 seed generation and are
expected to segregate for some grain characteristics. Major determinants of grain quality,
for example in cereal crops, are milling; grain size, shape and appearance; and cooking and
eating characteristics. Some grain tissues are of maternal origin and some result from
fertilization and union of genetically diverse gametes. For example, the lemma and palea of
the rice hull are maternal tissues. Seed size and shape are determined by the shape and size
of hulls and the latter is determined by the genotype of F1 plants. As a result, all F2
seeds borne on F1 plants have nearly identical dimensions even though the parents could have
very different seed sizes. Endosperm is triploid tissue resulting from the union of one male
nucleus with two female nuclei. If the parents differ in endosperm traits, these traits
among F2 grains on F1 plants show clear-cut segregation (Kumar and Khush, 1986; Tan et al.,
1999). Single seed analysis of a rice hybrid, Shanyou 63, indicated that the amylose content
for seeds on an F1 plant could range from 8% to 32% when two parents had 15.8% and 27.2%
amylose content, respectively. A similar situation was reported for barley. If the parents
differ significantly in malting quality characters, the grain produced by barley hybrids
will be heterogeneous and heterozygous for characters critical to the malting process
(Ramage, 1983).

Quality traits

Many quality traits, including seed traits discussed above, are genetically controlled by
multi-genic loci, or by multiple alleles at a locus because of the triploidy of the
endosperm. As a result, the same phenotypes may come from different genetic factors or
different alleles from the same locus. PS for the same trait values may not result in the
same alleles or genes fixed in parents.

On the other hand, almost all quality traits are only measurable at or after the
reproductive stage. MAS will help distinguish different genetic loci that contribute to the
same quality traits. Methods for non-destructive extraction of DNA from single dry seeds, as
discussed in Section 9.3 and Xu et al. (2009d), provide an opportunity for selection of seed
traits so that selection can be carried out before planting. Early-stage selection also
provides more opportunities for selection of traits with relatively low heritability. MAS
could be used for early-stage quality tests or DNA-based quality tests, whereas such tests
would be delayed in a conventional breeding programme because a relatively large amount of
seed is required.

Genetic contribution to quality comes from both parents, but one of them could be more
important than the other in some specific situations. Endosperm properties might be affected
more by female parents due to the maternal effect, or more by male parents due to the xenia
effect. The composition and development of the kernels can be changed by the nature of
pollen. This was first shown by Kiesselbach (1926) as the change of a sweet corn endosperm
into a starchy endosperm after pollination of a sweet corn female by a flint endosperm male.
Large xenia effects were observed for sorghum malt quality in the F1 but this was entirely
lost in the F2 generation (Wenzel and Pretorius, 2000). Curtis et al. (1956) observed that
the germ is markedly influenced in weight, oil and protein content by both the seed parent
and the pollen parent of corn, with a pronounced maternal effect.

9.5 Marker-assisted Gene Introgression

As discussed in Chapter 8, major applications of MAS in plant breeding are the

transfer of novel alleles from a donor to elite germplasm and to pyramid all favourable
alleles from different sources into one genetic background. In the former case,
marker-assisted background selection is used to eliminate the donor genetic background,
while in the latter case the background selection may not be necessary depending on whether
the recipient is best commercialized or not. Although MAS has been widely used in the
private sector, both for breeding major-gene-controlled traits and as MARS for quantitative
traits, it has limited application in the public sector because of the reasons described in
the previous sections. Some significant applications of MAS in plant breeding have been made
in several Consultative Group on International Agricultural Research (CGIAR) centres. As an
example, molecular markers are being used to facilitate selection at CIMMYT for a set of
traits that have low heritability but high economic value or cannot be effectively screened
in Mexico on a seasonal basis. These traits include parameters associated with root health,
foliar diseases and factors associated with quality (Table 9.2). All MAS is tightly coupled
with the existing field breeding operations. The application of markers begins by
characterizing the cross block materials. Parental material is first characterized with
markers for known genes to identify the parents with favourable alleles, which are then
selectively combined in crosses (William et al., 2007b).

There are numerous examples available in MAS including its application in gene introgression
and pyramiding for both qualitative and quantitative traits. Several recent reviews provide
good coverage of general methods (Xu, 2003) and of applications in several major crops
(Dwivedi et al., 2007). Reviewing all the details for all traits and crops is beyond the
scope of this chapter. In this section, an overview is provided for marker-assisted gene
introgression.

Table 9.2. List of markers along with the chromosomal location of the target genes, currently in use for
MAS at CIMMYT (from William et al. (2007b) with kind permission of Springer Science and Business
Media).

Trait                                 Gene        Marker type   Chromosome

Resistance to Heterodera avenae       Cre1        STS           2BL
Resistance to H. avenae               Cre3        STS           2DL
Crown rot                             Qtl-2.49    SSR           1DL
Flour colour/Pratylenchus neglectus   Rlnn-1      STS           7BL
Boron tolerance                       Bo-1        SSR           7BL
Russian wheat aphid                   Dn2         SSR           7D
Russian wheat aphid                   Dn4         SSR           1D
Hessian fly                           H25         SSR           4A
Stem rust                             Sr24        STS/SSR       3DL
Stem rust                             Sr25        STS           7DL
Stem rust                             Sr26        STS           6AL
Stem rust                             Sr38        STS           2AS
Stem rust                             Sr39        STS           2B
Durable leaf and brown rust           Lr34/Yr18   STS           7DS
Swelling volume                       GBSS-null   STS           4B
Grain hardness                        Hardness    STS           5ABD
Dough strength                        Glu1BX      STS           1BS
Barley yellow dwarf virus             BDV2        STS           7DS
Agronomy                              Rht-B1b     STS           4B
Agronomy                              Rht-D1b     STS           4D
Agronomy                              Rht8        SSR           2D
Pairing homologue                     ph1b        STS           5B
High protein gene                     Gpc-B1      STS           6B
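
To put the role of marker-assisted background selection in perspective, the following
minimal Python sketch computes the expected proportion of recurrent parent genome in
backcross progeny when no background selection is applied, using the standard expectation
1 - (1/2)^(n+1) after n backcrosses. It assumes unselected progeny and ignores linkage drag
around the target gene; the function names are illustrative and do not come from the text.

```python
# Minimal sketch: expected recurrent-parent genome (RPG) recovery under
# conventional backcrossing, i.e. without marker-assisted background selection.

def expected_rpg(n_backcrosses):
    """Expected fraction of recurrent-parent genome in generation BCn.

    n_backcrosses = 0 corresponds to the F1 (50% from each parent).
    """
    return 1.0 - 0.5 ** (n_backcrosses + 1)

def backcrosses_needed(target):
    """Smallest number of backcrosses whose expected RPG reaches `target`."""
    n = 0
    while expected_rpg(n) < target:
        n += 1
    return n

# Print the expected recovery for the first six backcross generations.
for n in range(1, 7):
    print(f"BC{n}: {expected_rpg(n):.2%}")

print("Backcrosses needed for 99%:", backcrosses_needed(0.99))
```

Under these assumptions the 99% level is first passed at the sixth backcross, which is
consistent with the conventional figure quoted in Section 9.5.2. Background selection aims to
reach a comparable level in fewer generations by retaining, in each generation, the progeny
that carry more recurrent parent genome than this average.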

9.5.1 Marker-assisted gene introgression from wild relatives

Novel alleles and genetic diversity exist widely in wild relatives of cultivated plants.
Wild crop relatives are traditionally looked upon as potential sources of gene(s) for
various agronomic traits including resistance to many pests and diseases that are not
available in cultigens, thus making them a valuable resource for gene transfer in cultivated
species (Tanksley and McCouch, 1997). Both conventional crossing and selection and molecular
breeding (MAS and transgenics) have been used to transfer pest and disease resistances from
wild relatives to cultivated crop species. Resistance gene(s) from wild relatives have
facilitated large-scale cultivation of crops in disease or pest endemic regions of the
world, e.g. bacterial blight and grassy stunt virus in rice, bacterial blight in maize and
potato and nematodes in many crops. Wild relatives are usually inferior to modern cultivars
with respect to yield and seed quality. However, trait-enhancing alleles have been
identified and introgressed into cultivated species from wild species through
marker-assisted novel allele discovery. The successful transfer of improved fruit yield and
processing quality in tomato (Rick, 1974; de Vicente and Tanksley, 1993; Fulton et al.,
1997; Bernacchi et al., 1998a,b; Fridman et al., 2000; Yousef and Juvik, 2001b) led to the
realization that wild relatives can contain beneficial genes (in addition to resistance to
biotic stresses) associated with yield and seed quality, although these are often
phenotypically masked by deleterious genes and are thus difficult to identify and transfer
through PS and breeding.

Concerns about reduced genetic diversity among commercial hybrids and depletion of genetic
diversity in gene pools used in breeding may be partially alleviated by successful
implementations of MAS. MABC may revive interest in using essentially untapped exotic
germplasm as a source of favourable alleles for improvement of elite cultivars (Ragot and
Lee, 2007). Very small and targeted chromosomal segments of exotic origin can be
introgressed into elite inbred lines with limited risk of carrying along undesirable
characteristics. Such an approach could be beneficial in many crops although no accounts of
its implementation have been reported despite the many years of reports of its successful
use in tomato (Tanksley et al., 1996; Bernacchi et al., 1998a,b; Robert et al., 2001), rice
(Xiao et al., 1998) and soybean (Concibido et al., 2003).

Wild relatives of rice within the genus Oryza are not only a rich source of information on
the origins of variation within the genus but also a viable source of a wide variety of
agronomically important germplasm for future breeding in rice and other cereals as well. To
fill the gulf between national research programmes and breeding applications in developing
countries, an international programme, the Generation Challenge Programme: Cultivating Plant
Diversity for the Resource Poor (www.generationcp.org), has been established to begin to
characterize and utilize a wide spectrum of germplasm collections. Molecular markers have
been proven particularly useful for accelerating the backcrossing of a gene or QTL from
exotic cultivars or wild relatives into an elite cultivar or breeding line (Tanksley and
Nelson, 1996). Favourable genes or alleles from wild species of rice have been detected
after backcrossing to elite cultivars (Xiao et al., 1998; Moncada et al., 2001; Septiningsih
et al., 2003; Thomson et al., 2003). Similarly, this approach can identify alleles from
exotic cultivars that result in improved phenotype, even though the parent may possess an
inferior phenotype for this trait (Tanksley and McCouch, 1997; Xu, 1997, 2002). McCouch et
al. (2007) summarized results from a decade of collaborative research using advanced
backcross populations derived from O. rufipogon L. to: (i) identify QTL associated with
improved performance in cultivated rice Oryza sativa L.; and (ii) clone genes underlying key
QTL of interest. They demonstrated that AB-QTL analysis is capable of: (i) successfully
uncovering positive alleles in wild germplasm that were not obvious based on the phenotype
of the parent; (ii) offering an estimation of the breeding value of exotic germplasm;

(iii) generating NILs that can be used as the basis for gene isolation and also as parents
for further crossing in a cultivar development programme; and (iv) providing gene-based
markers for targeted introgression of alleles using MAS.

Development of exotic genetic libraries (also known as CSSL, chromosome segment substitution
line; IL, introgression line; or CL, contig line) is another approach to enhance utilization
of wild relatives to expand crop gene pools. These genetic stocks provide a
well-characterized potential resource for uplifting the yield barriers through pyramiding
beneficial loci and fixing of positive heterosis. For example, when tomato introgression
lines carrying three independent yield-promoting genomic regions were pyramided, the
progenies produced more than 50% greater yield compared to controls (Gur and Zamir, 2004).
Yoon et al. (2006) reported that several rice lines outperformed Hwaseongbyeo (approximately
1 t ha⁻¹ increase in grain yield). Several grain characteristics, including grain weight,
were improved after crossing an advanced introgression line containing Oryza grandiglumis
segments, HG101 (very similar to Hwaseongbyeo), with Hwaseongbyeo. The above examples
demonstrate that wild relatives contain desirable alleles for agronomic traits even though
their effect is phenotypically not evident in wild relatives. More emphasis should be given
to exploiting wild relatives to identify yield-enhancing alleles to further raise the yield
potential of crop cultivars.

Using AB-QTL analysis, yield and grain-quality enhancing alleles from wild relatives have
been successfully introgressed in rice, wheat, barley, sorghum, common bean and soybean.
Dramatic yield advantages have been reported in rice, for example, through the introduction
of two yield-enhancing QTL alleles (yld1.1 and yld2.1) from O. rufipogon (AA genome) into
9311 (one of the top performing parental lines used in the production of super hybrid rice
in China). This contributed in excess of 20% yield increases in rice; i.e. about 1 t ha⁻¹
gain in yield in some of the newly bred cultivars, largely because of increases in panicle
length, panicles per plant, grains per plant and grain weight. These improved lines with
9311-type genetic backgrounds are being used to raise the existing yield potential of super
hybrid rice in China (Liang et al., 2004). O. grandiglumis (allotetraploid, CCDD genome
species) is another wild relative contributing positive alleles for increased grain yield in
rice. In contrast, only a 6–8% increase in grain yield was reported when positive alleles
from Hordeum spontaneum were introgressed into barley. Wild relatives also contributed
positive alleles for improved grain characteristics in rice (long, slender and translucent
grains and grain weight), wheat (grain weight and hardness) and barley (grain weight,
protein content and some malt quality traits). Of particular interest is a locus for grain
weight, tgw2, which contributed positive alleles from O. grandiglumis that are independent
from undesirable effects of height and maturity (Yoon et al., 2006). In a similar study,
Ishimaru (2003) identified a grain weight QTL, tgw6, responsible for increased yield
potential without any adverse effects on plant type or grain quality in the Nipponbare
genetic background. Similarly, alleles from Glycine soja conveyed an 8–9% increase in grain
yield and improved the protein content in soybean (Concibido et al., 2003).

9.5.2 Marker-assisted gene introgression from elite germplasm

Unquestionably, the most pervasive and direct use of MAS by the private sector has been with
backcrossing of transgenes into elite inbred lines, the direct parents of the commercial
hybrids, particularly in maize (Ragot et al., 1995; Crosbie et al., 2006). Currently, the
most widely deployed transgenes and combinations thereof (i.e. gene stacks) are for
resistance to herbicides or insects (e.g. Ostrinia and Diabrotica). As the commercial maize
crop of any region, maturity zone, market or country is not yet uniform or homogeneous for
any transgene, maize breeders have elected to develop

near-isogenic versions (transgenic and non-transgenic) of elite inbreds and commercial
hybrids in order to satisfy combinations of licensing agreements, agronomic practices,
regulatory requirements, market demands and product development schemes (Ragot and Lee,
2007). This has required companies to have two parallel maize breeding programmes,
transgenic and non-transgenic. In this manner, MABC of transgenes and, to a lesser degree,
of native genes and QTL for other traits, has expedited the development of commercial
hybrids. Unless regulatory issues change dramatically, MABC will remain the preferred means
of delivering transgenes to the market.

MABC clearly provides the information needed to reduce the number of generations of
backcrossing, to combine (i.e. stack) transgenes, native genes or QTL into one inbred or
hybrid quickly and to maximize the recovery of the recurrent parent's genome in the
backcross-derived progeny. In several private breeding programmes, MABC has enabled the
number of backcrossing generations needed to recover 99% of the recurrent parent genome to
be reduced from six to three, reducing the time needed to develop a converted cultivar by 1
year (Crosbie et al., 2006; Ragot et al., 1995). As a line derived by MABC can be made to be
very similar to the original non-converted line, most of its attributes, including agronomic
performance, can be assumed to be equal or similar to those of the original line.

Marker-assisted gene introgression is thought to be promising in rice because a number of
rice cultivars are widely grown for their adaptation, stable performance and desirable grain
quality. Chen et al. (2000) used such an approach to transfer the bacterial blight
resistance gene Xa21 into Minghui 63, a widely used parent for hybrid rice production in
China. Ahmadi et al. (2001) used a similar approach to introgress two QTL controlling
resistance to rice yellow mottle virus into the cultivar IR64. Such approaches, however, can
only sample a small number of accessions.

Using PCR-based DNA markers for tracking the RB gene in potato breeding populations, several
marker-positive selected lines showed resistance to late blight. RB has also been cloned and
transformed into Katahdin, a highly susceptible potato cultivar. The Katahdin plants
transformed with RB showed broad-spectrum resistance against a wide range of late blight
isolates (Song et al., 2003). Clearly, by having the full sequence of the target gene, it
should be possible to develop a highly efficient low-cost assay system for this trait. The
best example of the use of MAS in commercial barley breeding is the barley yellow mosaic
virus complex where a variety of different markers have been developed for selection of the
rym4 and rym5 resistance genes and one, the SSR Bmac0029, is used by many European winter
barley breeders (Rae et al., 2007). The cloning of the rym4/5 locus (Stein et al., 2005)
provides the basis of a diagnostic marker for rym4/5-based virus resistance.

As reviewed by Dwivedi et al. (2007), MAS coupled with backcross and pedigree breeding
methods and field evaluation has led to reports in the literature of genetic enhancement for
resistance to bacterial blight (Xa21), gall midge (Gm-6t) and brown plant hopper (Bph1 and
Bph2) in rice; to leaf rust (Lr19, Lr51 and Yr15) in wheat; to yellow dwarf virus (Yd2),
stripe rust (Yr4) and powdery mildew (mlo-9) in barley; and to downy mildew (major QTL) in
pearl millet. The progenies showed the same resistance level as the donor parental lines
both in greenhouse and field evaluations.

9.5.3 Marker-assisted gene introgression for drought tolerance

The International Rice Research Institute (IRRI) has several drought-tolerance breeding
programmes using identified QTL and MAS. QTL affecting root parameters were identified using
a rice DH population derived from the cross IR64 × Azucena. An MABC programme was started to
transfer the alleles of Azucena (upland rice) at four QTL for deeper roots on chromosomes 1,
2, 7 and 9 from selected DH lines into IR64 (elite rice cultivar) (Shen et al., 2001). The
backcross progeny were selected

strictly on the basis of their genotype at the marker loci in the target regions up to the BC3F2, from which BC3F3 NILs were developed and compared to IR64 for the target root traits. Of the three tested NILs carrying target 1 (QTL on chromosome 1), one had significantly improved root traits over IR64. Three of the seven NILs carrying target 7 (QTL on chromosome 7) alone, as well as three of the eight NILs carrying both targets 1 and 7, showed significantly improved root mass. Four of the six NILs carrying target 9 (QTL on chromosome 9) had significantly improved maximum root length.

Steele et al. (2006) initiated MABC to improve the drought tolerance of Kalinga III, an upland indica cultivar. After five backcrosses and over 3000 marker assays (2548 restriction fragment length polymorphism (RFLP) and 700 SSR) on 323 plants, the NILs were developed and evaluated for root traits. The target segment on chromosome 9 (RM242-RM201) significantly increased root length under both irrigated and drought stress environments. Azucena alleles at the locus RM248 (below the target root QTL on chromosome 7) delayed flowering. However, selection for the recurrent parent allele at this locus produced early-flowering NILs that are suited to upland environments in eastern India.

Anthesis-silking interval (ASI) is an important trait associated with drought tolerance in maize. Ribaut et al. (1996, 1997) initiated a major MAB programme to transfer five genomic regions involved in the expression of a short ASI from Ac7643 (a drought tolerant line) to CML247 (an elite tropical breeding line). The five genomic regions were transferred using flanking PCR-based markers. Seventy of the best BC2F3 lines (i.e. S2 lines) were crossed with two testers, CML254 and CML274. These hybrids and the BC2F4 families derived from selected BC2F3 plants were evaluated for 3 years under drought stress conditions. The best five MABC-derived hybrids yielded, on average, at least 50% more than the control hybrids under water stress conditions (Ribaut et al., 2002b; Ribaut and Ragot, 2007). However, this difference became less marked when the intensity of stress decreased: for a stress inducing less than 40% yield reduction, performance of testcross hybrids resulting from MAS was no better than the original version of CML274.

A major QTL on linkage group 2 (LG2) is associated with increased grain yield and harvest index under terminal stress in pearl millet cultivar PRLT 2/89-33 (Yadav et al., 2002). The performance of QTL MAS-derived top cross hybrids (TCH) was compared with that of field-based TCH. Progenies with the best overall ability to maintain under terminal stress environments were used to generate the TCH and these were compared with randomly mated TCH made from randomly selected progenies from the entire population (irrespective of performance under terminal drought stress). In both cases progenies were selected irrespective of the presence or absence of favourable alleles at the putative drought tolerance QTL and evaluated across 21 environments (non-stress, terminal stress and gradient stress). The QTL MAS-derived hybrids were significantly, but only modestly, higher yielding both in full and in partial terminal stress environments. However, this advantage under stress came at the cost of lower yield of the same hybrids under non-stressed environments. The QTL MAS-derived hybrids flowered earlier and had limited effective basal tillers, low biomass and high harvest index. All these traits are similar to those of the drought tolerant parent, thus confirming the effectiveness of the putative drought tolerance QTL on LG2 (Bidinger et al., 2005).

9.5.4 Marker-assisted gene introgression for quality traits

Rice

Rice amylose content, mainly controlled by the wx gene, is a good example of MAS. Ayres et al. (1997) determined the relationship between polymorphism at that locus and variation in amylose content. Eight wx microsatellite alleles were identified from 92 long-, medium- and short-grain US rice cultivars. When used as predictors

of amylose content, these eight alleles explained an average of 85.9% of the variation. The amplified products ranged from 103 bp to 127 bp in length and contained (CT)n repeats, where n ranged from 8 to 20. Average amylose content in cultivars with different alleles varied from 14.9% to 25.2%. Although the microsatellite marker was located in the intron of the waxy gene, a complete association between marker alleles and amylose contents still depends on fully understanding other genes involved in starch synthesis.

To improve the most widely grown hybrid rice, Zhou et al. (2003) successfully introduced the wx-MH fragment from the restorer line Minghui 63 into the male sterile line Zhenshan 97B, which was subsequently transferred to Zhenshan 97A, using MAS in three generations of backcrossing followed by one generation of selfing. The introduction of this fragment has greatly improved the cooking and eating quality of inbred lines and their resultant hybrids, with the agronomic performance essentially the same as the original maintainer line and resultant hybrid. Liu et al. (2006) used MAS to introgress the Wx-T allele (conferring intermediate amylose content and thus good quality) into two widely used maintainers (Longtefu and Zhenshan 97) and their relevant male-sterile lines to generate improved indica hybrids. The resulting maintainer lines and hybrids showed improved cooking and eating quality with no significant alterations in their agronomic traits.

Rice with low glutelin content is suitable for patients affected by diabetes and kidney failure. The Lgc-1 locus, located on chromosome 2 between flanking markers, confers low glutelin content in the rice grain (Miyahara, 1999). This trait has been successfully incorporated into japonica rice with 93–97% selection efficiency using SSR2-004 and RM358 markers (Wang, Y.H. et al., 2005). Additionally, grain quality traits such as 1000-seed weight, kernel length/breadth ratio, basmati type aroma and high amylose content have been combined with resistance to bacterial blight using MABC breeding (Ramlingam et al., 2002; Joseph et al., 2003).

Wheat

Sun et al. (2005) used a novel sequence tagged site (STS) marker for improving polyphenol oxidase (PPO) activity in bread wheat. Breeding wheat cultivars with low PPO activity is the best approach to reduce undesirable darkening of bread wheat based end-products, particularly for Asian noodles. Based on the sequences of genes conditioning PPO activity during kernel development, 28 pairs of primers were developed. One of these markers, PPO18, mapped to chromosome 2AL, can amplify a 685-bp and an 876-bp fragment in the cultivars with high and low PPO activity, respectively. QTL analysis indicated that the PPO gene co-segregated with the STS marker PPO18 and is closely linked to Xgwm312 and Xgwm294 on chromosome 2AL, explaining 28–43% of phenotypic variance for PPO activity across three environments. A total of 233 Chinese wheat cultivars and advanced lines were used to validate the correlation between the polymorphic fragments of PPO18 and grain PPO activity. The results showed that PPO18 is a co-dominant, efficient and reliable molecular marker for PPO activity and can be used in wheat breeding programmes targeting noodle quality improvement.

Maize

The endosperm of the maize seed has several distinct regions that have different physical properties. The aleurone is the outer layer of the endosperm, composed of specialized cells that secrete hydrolytic enzymes during germination. Beneath the aleurone are starchy endosperm cells filled with starch and storage proteins, thus creating two distinct regions: the vitreous or glassy endosperm and the starchy endosperm. The vitreous endosperm transmits light, whereas the starchy endosperm does not. Typically, the endosperm is 90% starch and 10% protein (Gibbon and Larkins, 2005). Normal maize protein is deficient in two essential amino acids (lysine and tryptophan), has a high leucine:isoleucine ratio and biological

value (Babu et al., 2004). A naturally occurring recessive mutant gene opaque2, observed first in a Peruvian maize landrace, gives a chalky appearance to the kernels and has improved protein quality due to increased levels of lysine and tryptophan in the endosperm (Mertz et al., 1964). However, this trait appears to be associated with inferior agronomic traits such as brittleness and increased susceptibility to insect pests. With the discovery of modifier genes that alter the soft, starchy texture of the endosperm, maize breeders developed hard endosperm o2 mutants designated as Quality Protein Maize (QPM) (Prasanna et al., 2001; Nelson, 2001; Xu et al., 2009d), which have the phenotypes and yield potential of normal maize but maintain the increased lysine content of o2. Opaque2 is a recessive trait but, due to the effect of the modifiers, QPM behaves as a quantitative trait. Using SSRs and backcross breeding, Babu et al. (2005) developed maize lines that had twice the amount of lysine and tryptophan as compared to local cultivars and recovered up to 95% of the recurrent parent genome in two backcross generations.

Barley

Malt is a major raw material for the production of beer. Characters that affect malting quality include malt extract content, α- and β-amylase activity, diastatic power, malt β-glucan content, malt β-glucanase activity, grain protein content, kernel plumpness and dormancy, all of which are quantitatively inherited and variously influenced by the environment (Zale et al., 2000). There are a few barley cultivars with good malt quality that brewers are reluctant to change from due to their concerns about the resultant changes in flavour and brewing procedures. For example, the goal of the US Pacific Northwest barley breeding programme is to produce high yielding NILs that maintain traditional malting quality characteristics but transfer QTL associated with yield, via MABC, from the high yielding cv. Baronesse to the North American two-row malting barley industry standard cv. Harrington. Schmierer et al. (2004) targeted the Baronesse chromosome 2HL and 3HL fragments presumed to contain QTL that affect yield. Using backcross breeding and QTL/marker information, they identified a NIL (00170) that, when evaluated for yield over 22 environments and for malt quality over six environments, produced yield equal to Baronesse while maintaining a Harrington-like malt quality profile. Other studies have also reported the development of lines with improved malt quality: white aleurone colour and high α-amylase content (Ayoub et al., 2003) and high in β-glucan and fine-coarse difference (Igartua et al., 2000).

9.6 Marker-assisted Gene Pyramiding

Gene pyramiding is the process that brings the genes or alleles dispersed in different cultivars into a cultivar/genotype. QTL pyramiding is an important strategy for rebuilding the outputs from reductionist genomics research into whole traits of value for crop improvement. Genes can be pyramided through pedigree breeding by crosses involving multiple parental lines containing different favourable alleles or MABC to introgress those alleles into the same genetic background. One of the approaches for the pedigree method is to use NILs. Once the desirable QTL have been detected, NILs are generated for each QTL in a common elite genetic background and the effect of each QTL is individually evaluated. The selected NILs containing the most important QTL for the target trait are subjected to pairwise crosses to pyramid two or more QTL for one or more target traits. For example, in rice QTL for increased grain number (Gn1) and QTL for reduced plant height [Ph1(sd1)] were pyramided in the Koshihikari background, producing a 23% increase in grain yield while reducing the plant height by 20% compared with Koshihikari (Ashikari et al., 2005).

9.6.1 Gene pyramiding for major genes

The great opportunity offered by MAS to select superior lines based on genotype

rather than phenotype becomes clearly obvious, particularly in the case of combining different simply inherited resistance genes of large effects for a given pathogen in a single genotype (gene pyramiding). Gene pyramiding is particularly important for disease resistance breeding. It is a useful approach to improve the durability or level of pest and disease resistance, or to increase the level of abiotic stress tolerance. Genes controlling resistance to different races or biotypes of a pest or pathogen and genes contributing to agronomic or seed quality traits can be pyramided together to maximize the benefit of MAS through simultaneous improvement of several traits in an improved genetic background. Pyramiding multiple genes for resistance to different diseases can offer great financial rewards through extending the lifespan of new cultivars. Such an approach has been used for the backcross transfer of QTL for downy mildew resistance in pearl millet (Witcombe and Hash, 2000).

Many reports are available on marker-assisted gene pyramiding, although few of them have resulted in the release of commercial cultivars. Table 9.3 lists some representative examples from barley, common bean, rice, soybean and wheat, most of which are for major genes. Gene pyramiding includes combinations of genes for resistance to multiple races of the same disease, genes for resistance to different diseases and genes for disease and insect resistance. In rice, three blast resistance genes have been pyramided into one cultivar. First, three blast resistance genes (Pi-2, Pi-1 and Pi-4) were mapped on to rice chromosomes 6, 11 and 12. Gene pyramiding started with three NILs, each carrying one of the genes. After two cycles of crossing and selection, a plant containing the three was obtained and it has been used as a source for these genes in plant breeding. An integrated breeding programme including MAS was used to improve an elite hybrid rice, Shanyou 63, a cross between Zhenshan 97 and Minghui 63. Xa21, a wide-spectrum bacterial blight resistance gene, was introduced into the restorer Minghui 63 by MAS and a Bt gene that is toxic to stem borer was introduced into Minghui 63 through transformation. An allele at the Wx locus from Minghui 63 was transferred by MAS to Zhenshan 97 to improve cooking and eating quality of the hybrid, resulting in a new version of Zhenshan 97 with medium amylose content, soft gel consistency and high gelatinization temperature. The pyramiding of Bt, Xa21 and wx genes created an improved Shanyou 63 (He et al., 2002). Other successful examples in rice include improved pyramided lines and cultivars containing gene combinations for bacterial blight, blast, brown plant hopper, yellow stem borer and sheath blight. In wheat, powdery mildew (Pm2, Pm4a, Pm6, Pm8 and Pm21) pyramided lines and those with resistance to Fusarium head blight (six QTL), orange blossom midge (Sm1) and leaf rust (Lr21) were bred through MAS. Resistance to barley mild mosaic virus and barley yellow mosaic virus complex and stripe rust has been separately incorporated through MAS in barley. Many of these pyramided lines showed enhanced resistance to pests and diseases, and some even out-yielded the controls under high disease or pest pressure in field conditions. In legumes, resistances to rust and anthracnose (QTL) have been combined in common bean.

9.6.2 Gene pyramiding through marker-assisted recurrent selection

Marker-assisted recurrent selection (MARS) schemes and infrastructure have been developed for forward breeding of native genes and QTL for relatively complex traits such as disease resistance, abiotic stress tolerance and grain yield (Ribaut and Betrán, 1999; Ragot et al., 2000; Ribaut et al., 2000; Eathington, 2005; Crosbie et al., 2006). Eathington (2005) and Crosbie et al. (2006) reported that the rates of genetic gain achieved through MARS in maize were about twice those of PS in some reference populations. Marker-only recurrent selection schemes have been implemented for a variety of traits including grain yield and grain moisture (Eathington, 2005), or
Table 9.3. Examples of marker-assisted gene pyramiding for resistance to biotic stresses in crops.

Crop and target trait | Gene | Breeding scheme | Marker | MAS product | Reference
Barley yellow mosaic virus and barley mild mosaic virus | rym4, rym5, rym9 and rym11 | Simple and complex crosses using double haploids | RAPDs and SSRs | DHs carrying rym4, rym9 and rym11 and those with rym5, rym9 and rym11 | Werner et al. (2005)
Barley stripe rust | QTL (1H, 4H and 5H) | Backcross derived introgression lines | SSRs | Introgression lines carrying 1H, 4H or 5H individually or in combinations | Richardson et al. (2006)
Common bean rust (Uromyces appendiculatus) and anthracnose (Colletotrichum lindemuthianum) | Nine major genes each for rust and anthracnose | Three backcrosses | RAPDs | Lines combining resistance to rust and anthracnose | Faleiro et al. (2004)
Rice bacterial blight (BB) | Xa4, xa5, xa13 and Xa21 | Pedigree breeding | RFLPs | Pyramided lines showing broader spectrum of resistance to BB | Huang et al. (1997)
Rice bacterial blight (BB) | Xa7 and Xa21 | Pedigree breeding | 6 PCR-based markers | Pyramided lines showing stronger resistance to BB than lines with single genes | Zhang, J. et al. (2006)
Rice bacterial blight (BB), stem borer (SB), blast, and brown planthopper (BPH) | Xa21 and Xa7; Bt (SB); Pi1, Pi2, Pi3; and Qbph1 and Qbph2 | Pedigree breeding | AFLP 1415, STS P3, M5, 248, RM144, RM224 and Pi2 | Improved Minghui 63 showing broader resistance to BB and combined resistance to BB and SB, and improved Zhenshan 97 showing better resistance to BPH | He, Y. et al. (2004)
Rice bacterial blight (BB), yellow stem borer (YSB), sheath blight (SB) (Rhizoctonia solani) | Xa21, Bt and RC7 chitinase (Sb) | Pedigree breeding | Pc822 (Xa21), Bt and RC7 chitinase | Lines carrying three genes resistant to BB, YSB and SB | Datta et al. (2002)
Rice blast (BL) [Magnaporthe grisea (Hebert) Barr (anamorph Pyricularia oryzae Cav.)] | Pi1, Piz-5 and Pita | Pedigree breeding | RFLPs | The pyramided lines showing better resistance to blast | Hittalmani et al. (2000)
Rice blast (BL) and bacterial blight (BB) | Piz-1 and Piz-5 (BL) and Xa21 (BB) | Pedigree breeding | RZ536 and r10 (BL) and Xa21 (1.4 kb fragment of pC822) | The pyramids showing enhanced resistance to BL and BB | Narayanan et al. (2004)
Soybean corn earworm (CEW) (Helicoverpa zea Boddie) | QTL and Bt (cry1Ac) | Three backcrosses | Nine SSRs | The pyramid lines with a detrimental effect on larval weights and on defoliation by CEW | Walker et al. (2002)
Soybean corn earworm and soybean looper (Pseudoplusia includens) | cry1Ac and QTL (PI 229358) | Two backcrosses | Six SSRs and sequence-specific primers for cry1Ac | Lines carrying cry1Ac and QTL alleles resistant to three lepidopteran pests | Walker et al. (2004)
Wheat Fusarium head blight (FHB) (Fusarium graminearum), orange blossom midge (Sitodiplosis mosellana) and leaf rust (Lr21) | Six FHB QTL, Sm1 for midge and Lr21 for leaf rust | Two backcrosses | gwm533, gwm493 and wmc808 | Resistant progenies containing chromosome segments FHB, Sm1 and Lr21 | Somers et al. (2005)
Wheat powdery mildew (Erysiphe graminis DC. F. tritici Em. Marchal) | Pm2, Pm4a, Pm6, Pm8 and Pm21 | Pedigree breeding | RAPD and SCAR markers (a) | Lines with Pm2 and Pm4a immune to powdery mildew | Wang et al. (2001)

(a) RAPD, randomly amplified polymorphic DNA; SCAR, sequence characterized amplified regions.

abiotic stress tolerance (Ragot et al., 2000) and multiple traits are being targeted simultaneously. Selection indices were apparently based on ten to probably more than 50 loci, these being either QTL identified in the experimental population where MARS was being initiated, QTL identified in other populations, or genes. Marker genotypes are generated for all markers flanking QTL included in the selection indices (Ragot et al., 2000). Plants are genotyped at each cycle and specific combinations of plants are selected for crossing, as proposed by van Berloo and Stam (1998). Several, probably three to four, cycles of MARS are conducted per year using continuous nurseries. Results reported in these recent communications about private MARS experiments (Ragot et al., 2000; Eathington, 2005) are in sharp contrast to those in earlier publications (Openshaw and Frascaroli, 1997; Moreau et al., 2004). As summarized by Ragot and Lee (2007), this selection response can be attributed to: (i) rather large sizes of the populations submitted to selection at each cycle; (ii) use of flanking versus single markers; (iii) selection before flowering; (iv) increased number of generations from one to four generations per year; and (v) lower cost of marker data points.

Hybrid performance can be measured by the heterosis, the performance of a hybrid over its parental lines.

Suppose a breeder has 100 inbreds from heterotic group 1 and 100 inbreds from heterotic group 2. There are 10,000 possible (group 1 × group 2) single crosses. For developing new hybrids, there are 495,000 possible (group 1 F2) × (group 2 tester) combinations and 495,000 possible (group 1 tester) × (group 2 F2) combinations, if testcrossing starts from the F2. Due to limited resources, breeders are unable to test all combinations in all environments of interest but may test a limited set of single crosses and F2 × tester combinations. Typically, < 1% of the maize single crosses tested by a breeder eventually become commercial hybrids (Hallauer, 1990). Therefore, predicting hybrid performance has always been a primary objective in all hybrid-breeding programmes. Methods for predicting the performance of single crosses would greatly enhance the efficiency of hybrid breeding programmes. Development of a reliable method for predicting hybrid performance and/or heterosis without generating and testing hundreds or thousands of single cross combinations has been the goal of numerous studies using marker data and combinations of marker and phenotypic data, particularly in maize and rice.
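
The counts quoted for the 100 + 100 inbred example can be reproduced with simple combinatorics. The following Python sketch is only illustrative: it assumes that each F2 population comes from a cross between two different inbreds of the same heterotic group and that every inbred of the opposite group can serve as a tester. The function name and dictionary keys are hypothetical and are not taken from the text.

```python
from math import comb


def hybrid_testing_space(n_group1: int, n_group2: int) -> dict:
    """Count candidate crosses for two heterotic groups (illustrative only)."""
    # Every group 1 inbred can be crossed with every group 2 inbred.
    single_crosses = n_group1 * n_group2
    # Assumption: one F2 population per unordered pair of inbreds within a group.
    f2_group1 = comb(n_group1, 2)
    f2_group2 = comb(n_group2, 2)
    return {
        "single_crosses": single_crosses,
        # Each within-group F2 is testcrossed to each inbred of the other group.
        "group1_f2_x_group2_tester": f2_group1 * n_group2,
        "group1_tester_x_group2_f2": n_group1 * f2_group2,
    }


counts = hybrid_testing_space(100, 100)
for name, value in counts.items():
    print(f"{name}: {value:,}")
```

Under these assumptions, 100 inbreds give 4950 possible F2 populations per group, and multiplying by the 100 testers of the other group gives the 495,000 combinations quoted above.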

9.7 Marker-assisted Hybrid Prediction

Hybrid performance largely depends on general combining ability (GCA) of the parental lines and the specific combining ability (SCA) between the parents. GCA is defined as an attribute of an inbred line and is measured as the average performance of all hybrids made with that inbred line as a parent. The higher the GCA of an inbred, the higher the average performance of its hybrids. SCA is defined for specific combinations of parents and is measured by the deviation of the hybrid performance from the expected performance as estimated from the GCA of the parents. As a result, hybrid performance is determined by its parents' GCA and the cross's SCA.

9.7.1 Genetic basis of heterosis

QTL for heterosis

Heterosis is a complex physiological phenomenon affected by many factors. Yield is the most important trait in crop-based heterosis analysis. Understanding the genetic basis of heterosis is the fundamental basis for hybrid prediction. Several different hypotheses have been proposed for the explanation of heterosis. Among these hypotheses, arguments focused on the dominance hypothesis (Davenport, 1908) and the overdominance hypothesis (East, 1908; Shull, 1908), both of which are based on describing the genetic effects of single loci. Recent studies have indicated that epistasis plays an important

role in genetic control of both quantitative traits and heterosis. The dominance hypothesis proposes that heterosis results from the cancellation of effects from deleterious recessive alleles, contributed by one parent, by dominant alleles contributed by the other parent in the heterozygous F1. This hypothesis emphasizes the contribution of the dominance to heterosis. The overdominance hypothesis assumes that a specific heterozygous combination of alleles at a single locus is superior to either of the homozygous combinations of the parental alleles at that locus. With development of molecular markers, QTL mapping in rice and maize addressed the classical models by breaking down heterosis into Mendelian factors and assessing their modes of inheritance (Stuber et al., 1992; Xiao et al., 1995; Yu et al., 1997a; Li, Z.K. et al., 2001; Luo et al., 2001; Hua et al., 2002, 2003; Lu, H. et al., 2003). The evidence showed that both dominance and locus-specific overdominance have a role in heterosis, with some involvement of epistasis, although the relative contribution of each of these mechanisms is still unclear. Crow (1999, 2000) provided a historical review on the dominance and overdominance hypotheses. Xu, Y. (2003) and Lippman and Zamir (2007) provided a review on all possible hypotheses including epistasis.

In many investigations, genes for yield per se and genes for yield-related heterosis have been confounded with each other. Reports in the 1990s on dominance (Xiao et al., 1995), overdominance (Stuber et al., 1992) and epistasis (Yu et al., 1997a) were based on the use of yield and yield components per se to measure hybrid performance without use of parental lines as a control to derive values for the mid-parent or better-parent heterosis. The method of measurement will identify genes for yield and yield components rather than genes for heterosis.

Several investigations were reported later on for genetic analysis of heterosis per se in rice. Li, Z.K. et al. (2001) investigated the genetic basis of heterosis in rice using 254 RILs derived from a cross between Lemont (japonica) and Teqing (indica) and two backcross and two testcross populations derived from crosses between the RILs and their parents plus two testers (Zhong 413 and IR64). As a result, most QTL associated with decreased grain yield and biomass, or with heterosis in rice appeared to be involved in epistasis and about 90% of the QTL contributing to heterosis appeared to be overdominant. Hua et al. (2002, 2003) designed a mating scheme that generated a fixed or immortalized F2 population, using a population of 240 RILs derived from the Zhenshan 97 × Minghui 63 cross. In this design, crosses were made between the RILs chosen by random permutations of the 240 RILs. In each round of permutation, the 240 RILs were randomly divided into two groups and lines in the two groups were paired at random without replacement to provide parents for 120 crosses. Three rounds of such random permutations, including 360 crosses, resulted in two conclusions. First, all kinds of genetic effects, including single-locus heterotic effects caused mostly by overdominance and all three forms of digenic interactions (additive by additive, additive by dominance and dominance by dominance) appeared to play a role in the genetic basis of heterosis in the immortalized F2 population. However, the QTL were not fine mapped, leaving open the possibility that, as in maize, the single-locus effects were due to pseudo-overdominance, rather than true overdominance. Secondly, single-locus heterotic effects and dominance-by-dominance interaction could, together, adequately account for the genetic basis of heterosis in the F1 hybrid.
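
The pairing step of the immortalized F2 design described above (random division of the RILs into two groups, random pairing without replacement, repeated over several rounds) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used by Hua et al.: the function name, the RIL labels and the seed are hypothetical, and the sketch does not check whether the same pair is drawn again in a later round.

```python
import random


def immortalized_f2_crosses(ril_ids, rounds=3, seed=None):
    """Return a list of (parent_a, parent_b) crosses, len(ril_ids) // 2 per round."""
    if len(ril_ids) % 2 != 0:
        raise ValueError("an even number of RILs is needed for pairing")
    rng = random.Random(seed)
    crosses = []
    for _ in range(rounds):
        # A random permutation, split in half, gives two random groups whose
        # members are paired at random; each RIL is used once per round.
        shuffled = list(ril_ids)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        crosses.extend(zip(shuffled[:half], shuffled[half:]))
    return crosses


# Hypothetical labels for the 240 RILs of the Zhenshan 97 x Minghui 63 cross.
rils = [f"RIL{i:03d}" for i in range(1, 241)]
crosses = immortalized_f2_crosses(rils, rounds=3, seed=1)
print(len(crosses))  # 120 crosses per round, three rounds
```

Because each cross is made between two fixed inbred lines, the same F1-like genotype can be regenerated whenever needed, which is what makes the population "immortalized".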
For open-pollinated species like maize, which has severe inbreeding depression, it is very difficult (if not impossible) to do side-by-side comparisons of the F1 hybrids with their parents. But, theoretically, this comparison is absolutely necessary if heterosis rather than hybrid performance needs to be measured (Xu, Y., 2003).

To assess the importance of loci with overdominant (ODO) effects in expression of heterosis, Semel et al. (2006) employed NILs, carrying single marker-defined chromosome segments from distantly related wild species Solanum (Lycopersicon) pennellii, to partition heterosis into defined genomic regions, eliminating a major part of

the genome-wide epistasis. They detected 841 QTL for 35 diverse traits. NILs showing greater reproductive fitness are characterized by the prevalence of ODO QTL, which were virtually absent for the non-reproductive traits. ODO results from true ODO due to allelic interactions of a single gene or from pseudo ODO involving linked loci with dominant alleles in repulsion. In their study, the detection of dominant and recessive QTL for all phenotypic traits but of ODO QTL only for the reproductive traits indicates that pseudo ODO is unlikely to explain heterosis in the NILs; thus they favour the true ODO model, in which a single functional Mendelian locus is involved in heterosis.

Gene expression analysis of heterosis

Using serial analysis of gene expression (SAGE), Bao et al. (2005) surveyed transcriptomes in panicles, leaves and roots of a super-hybrid rice (LYP9) in comparison to its parental inbred cultivar genotypes (93-11 and PA64s). They identified 595 upregulated and 25 downregulated tags in LYP9 that were related to enhancing carbon- and nitrogen-assimilation, including photosynthesis in leaves, nitrogen uptake in roots and rapid growth in both roots and panicles. They found massive complementation at the transcript level that further suggests that the underlying mechanisms of heterosis may not be as simple as has been reported from studies of a small number of genes (Birchler et al., 2003).

Yao et al. (2005) used an interspecific hybrid between common wheat (Triticum aestivum L., 2n = 42, AABBDD) line 3338 and spelt (Triticum spelta L., 2n = 42, AABBDD) line 2463, which is highly heterotic both for aerial growth and for root-related traits. In their research they included an expression assay using modified suppression subtractive hybridization (SSH) to generate four subtracted cDNA libraries between the wheat hybrid and its parental genotypes. Of the 748 non-redundant cDNAs obtained, 465 cDNAs had high sequence similarity to GenBank entries in diverse functional categories, such as metabolism, cell growth and maintenance, signal transduction, photosynthesis, response to stress, transcription regulation and others. They further confirmed the expression patterns of 68.2% of the SSH-derived cDNAs by reverse Northern blot, while semi-quantitative RT-PCR exhibited similar results (72.2%). This suggests that the genes differentially expressed between hybrids and their parents are involved in diverse physiological pathways, which may contribute to heterosis in wheat.

Maize inbred lines B73 and Mo17 produce a heterotic F1 hybrid. Based on analysis with a 13,999-cDNA microarray, Swanson-Wagner et al. (2006) compared global patterns of gene expression in seedlings of the hybrid (B73 × Mo17) with those of its parental genotypes. A total of 1367 expressed sequence tags (ESTs) were observed to be significantly differentially expressed, using an estimated 15% false discovery rate as cut off. All possible modes of gene action were observed, including additivity, high- and low-parent dominance, underdominance and overdominance. A total of 1062 of the 1367 ESTs (78%) exhibited expression patterns that are not statistically distinguishable from additivity while the remaining 305 ESTs exhibited non-additive gene expression. About 181 of the 305 non-additive ESTs exhibited high-parent dominance, 23 ESTs showed low-parent dominance, while 44 ESTs displayed underdominance or overdominance. These results suggest that multiple genetic mechanisms, including overdominance, contribute to heterosis. This contrasts with previous studies that reported heterosis was due to gene action of only a small set of maize genes (Song and Messing, 2003; Guo et al., 2004; Auger et al., 2005).

Further analysis of allelic variation in gene expression in the maize hybrid and its parental lines (B73 and Mo17) identified a subset of 27 genes that are differentially expressed in parental lines. When the transcriptional contribution of each allele from the inbred line was analysed in the hybrid, the majority of the differential expression was observed to be due to cis-regulatory variation and not due to differences in trans-acting regulatory factors. This suggests a predominance of additive expression and a lack of epistatic effects,

as genes subject to cis-regulatory variation are expected to be expressed at mid-parent, or additive, levels in the hybrids (Stupar and Springer, 2006). Using a 57,000 maize gene-specific long-oligonucleotide microarray containing about 32,000 genes to study the differential gene expression between a maize hybrid and its parental genotypes (B73 and Mo17), Scheuring et al. (2006) revealed that at least 800 genes were expressed at two- to tenfold higher levels in the hybrid than the parent genotypes. Using Massively Parallel Signature Sequencing (MPSS), an open-ended mRNA profiling technology, Yang, X. et al. (2006) found that, of nearly 400 allelic signature tag pairs, 60% of the genes expressed in meristems of the hybrid were significantly different in allele-specific transcript level as compared to the parental genotypes. This suggests an abundance of cis-regulatory polymorphisms affecting hybrid meristem gene expression. Furthermore, when comparing the expression of the same allele in the hybrid versus inbred parents, they found 50% of the genes expressed at a significantly different level. Such differences in expression are likely to be attributed to the effect of trans-acting factors that differ between the hybrid and inbreds. While cis-regulatory variation predicts additive expression, trans-regulation may result in non-additive expression in the hybrid. Thus, studying the effect of transcript regulation at an allele-specific level provides a different level of understanding of gene regulation than focusing on overall expression in the hybrid.

As indicated by Lippman and Zamir (2007), however, differences in methodology aside, a fundamental problem in these studies is that they cannot associate novel expression patterns in hybrids with any heterotic phenotypes. As too many loci have been revealed to differ between two parental lines, the key issue is to understand further which of them really matter for the expression of heterosis. This is very much like the situation where experienced breeders can tell how many traits could be different between two parental lines they are working with but they cannot tell how the differences contribute to production of a heterotic hybrid. Mapping expression QTL (eQTL) and testing whether there is an association between eQTL and phenotypic QTL should be the next logical step. However, it can be expected that numerous eQTL will be identified across different species and populations, which would repeat the history of phenotypic QTL mapping where numerous QTL have been identified but nothing can be confirmed for their heterotic effects. Further research is required to identify the QTL that genetically control heterosis and their interactions in gene networks associated with heterotic effects across the whole genome.

Prospects on genetic basis of heterosis

Heterozygosity and its related gene interactions are the primary genetic basis for explanation of heterosis because the hybrid is heterozygous across all genetic loci that differ between the parents. Thus, the degree of heterosis depends on which loci are heterozygous and how within-locus alleles and inter-locus alleles interact with each other (Xu, Y., 2003). Interaction of within-locus alleles results in dominance, partial dominance, or overdominance, with a theoretical range of dominance degree from zero (no dominance) to larger than 1 (overdominance). Interaction of inter-locus alleles results in epistasis. Genetic mapping results have indicated that most QTL involved in heterosis and other quantitative traits had a dominance effect. As statistical methods that can estimate epistasis more efficiently become available, epistasis has been found more frequently and proven to be a common phenomenon in the genetic control of quantitative traits including heterosis (Xu, Y., 2003). With so many genetic loci involved, it is unlikely that there is no interaction at all between any pair of them.

It can be concluded that two different types of allele interaction, both within-locus and inter-locus, play an important role in the genetic control of heterosis. Contribution of a specific locus to heterosis could be due to any single type of these interactions. When multiple loci are involved which were not taken into account
in the early 1900s, various combinations of within-locus and inter-locus interactions (especially dominance-by-dominance interaction) could contribute to the genetic control of heterosis. For a specific cross and specific trait, heterosis might be explainable by any single type of these interactions (Xu, Y., 2003). For different crosses, species, or traits, however, their heterosis has to be explained by the dominance of different degrees in combination with all possible inter-locus interactions, as indicated by Goldman (1999). A full understanding of heterosis will depend on cloning and functional analysis of all genes that are related to heterosis. This process would be very similar to that for understanding disease resistance genes that functionally appear much simpler than heterosis.

9.7.2 Heterotic groups

Heterotic groups are the backbone of successful hybrid breeding. In most cases, breeding for heterosis without knowledge of heterotic patterns has proven to be a hit-or-miss approach (Jordaan et al., 1999). The concept of heterotic groups or heterotic pools was first developed in maize, based on the observation that inbreds selected out of certain populations tended to produce better performing hybrids when crossed to inbreds from other groups (Hallauer et al., 1988). This recognition resulted from the systematic crossing of thousands of inbred lines from different source populations and evaluation of the hybrids (Havey, 1998). In the review of capturing heterosis in forage crop cultivar development, Brummer (1999) indicated that the key to successful semi-hybrid production is to keep heterotic groups separate, only intercrossing them for testing and release. Breeding highly heterotic hybrids largely depends on selection of desirable parents as a prerequisite for most hybrid breeding programmes and thus depends on genetic diversity in the germplasm resources available to plant breeders. Therefore, construction or development of heterotic groups has been one of the key components in hybrid breeding for many crops. Introgressing exotic germplasm is often suggested as an approach to increase genetic differences between opposing heterotic populations, thereby potentially increasing heterotic response. An understanding of heterotic relationship between populations is needed to exploit exotic germplasm intelligently. Melchinger and Gumber (1998) reviewed the development of heterotic groups in five major crops with different pollination systems: allogamous maize and rye; partially allogamous faba bean and oilseed rape; and autogamous rice.

A possible explanation for heterotic groups is that populations of divergent genetic backgrounds have unique allelic diversity that may have arisen from founder effects, genetic drift, or the accumulation of unique allelic diversity by mutation or selection. Significantly greater heterosis could result from this genetic diversity by specific interallelic interactions (overdominance), repulsion-phase linkage among loci showing dominance (pseudo-overdominance) (Havey, 1998) and/or inter-locus interaction (epistasis). Apparently, the most obvious potential heterotic groups are either geographically separated populations or separate subspecies and ecotypes. Melchinger and Gumber (1998) recommended the following criteria for the identification of heterotic groups and patterns in descending order of importance: (i) high mean performance and large genetic variance in the hybrid population to ascertain future selection response; (ii) high per se performance and good adaptation of both or at least one of the parental heterotic groups; (iii) low inbreeding depression in the source materials for the development of inbreds; and (iv) a stable CMS system without deleterious side effects, as well as effective restorers and maintainers, if hybrid breeding is based on CMS.

Construction of heterotic groups based on hybrid performance

With large numbers of inbred or open-pollinated lines or populations available, it is not feasible in most crops to make diallel
crosses and produce sufficient F1 seed for multi-environment field-testing. Therefore, Melchinger and Gumber (1998) suggested a multi-stage procedure to identify heterotic groups, which consists of the following steps: (i) grouping the germplasm based on genetic similarity; (ii) selection of representative genotypes (e.g. two or four lines or one population) from each subgroup for producing diallel crosses; (iii) evaluation of diallel crosses among the subgroups together with parents in replicated field trials; and (iv) selection of the most promising cross combinations as potential heterotic patterns using the identification criteria. If established heterotic patterns are available, using selected elite genotypes from them as testers for the production and evaluation of the germplasm to be classified is recommended. Based on the testcross performance, populations or lines having similar combining ability and heterotic response could be merged to constitute a new independent heterotic group, if they behave differently from the existing heterotic groups; however, if their behaviour is similar to an existing heterotic group, they could be merged with it to enlarge its genetic base. Heterotic patterns in many crop species have been established solely based on the large numbers of testcrosses and breeding experience, without the use of molecular markers.

Ron Parra and Hallauer (1997) reviewed heterotic patterns used in the major maize production regions of the world. Some patterns have had importance in specific production regions. Others have been exploited on several continents, for example, the heterotic patterns based on Reid Yellow Dent (RYD) and Lancaster Sure Crop (LSC) from the temperate USA and Tuxpeño and Estación Tulio Ospina from tropical Mexico and South America. Two heterotic groups from which inbreds commonly are selected and used to produce superior maize hybrids are Iowa Stiff Stalk Synthetic (BSSS) and derivatives of LSC (Darrah and Zuber, 1986; Gerdes and Tracy, 1993). Although both populations are primarily comprised of southern dent germplasm, LSC has more northern flint germplasm than BSSS (Smith, 1986; Gerdes and Tracy, 1993). With genetically balanced sets of crosses, inter-group hybrids out-yielded the respective intra-group hybrids by 21% in RYD × LSC crosses (Dudley et al., 1991) and by 16% in flint × dent crosses (Dhillon et al., 1993). In both studies, the percentage of increase in heterosis for yield of inter-group over intra-group crosses was about twice as large as for the hybrid yield itself. Most heterotic grouping reports are for maize, with only very few on other crops including summer squash (Anido et al., 2004) and rapeseed (Qian et al., 2007).

Rice might be the only crop where hybrids are widely grown but very few studies on heterotic groupings have been reported. Heterosis in rice has been utilized largely through CMS. Fortunately, rice breeders in China identified the restorers for CMS from geographically distant rice cultivars from South-east Asia and used them in hybrid rice breeding. This resulted in high levels of heterosis among intra-subspecies (indica × indica) hybrids. A large-scale screening of diverse CMS maintainers and restorers provided some clue as to heterotic pattern. Three ecotypes from different subspecies, indica, japonica and javanica, have different morphological and physiological characteristics and ecogeographical distribution and, therefore, serve as a basis for defining distinct heterotic groups (Xu, Y., 2003). As summarized by Yuan (1992), heterosis for grain yield in crosses among the three rice ecotypes has the following trend: indica × japonica > indica × javanica > javanica × japonica > indica × indica > japonica × japonica. This mirrors the current situation of heterotic pools in rice. It is well known to hybrid rice breeders that a high level of heterosis results from crosses between CMS lines bred in China and restorer lines derived from South-east Asian indica cultivars, which is the heterotic pattern for indica × indica hybrids.

Construction of heterotic groups using molecular marker information

Molecular markers have been playing an increasingly important role in the construction of heterotic groups since the 1990s. Most
reports are focused on maize, wheat, barley and canola. Because marker-based groupings reflect the genetic differences among parental lines, they can contribute to parental improvement and to effective selection for heterotic hybrids. In general, heterotic groups constructed on the basis of marker information match up very well with pedigrees, but have the advantage that missing historical information, such as the incomplete pedigree information or ambiguous pedigree, will not affect the marker-based method.

In maize, different types of molecular markers have been successfully used to differentiate heterotic groups with results that are consistent with pedigree-based grouping (Mumm and Dudley, 1994; Liu et al., 1997; Peng et al., 1998; Wu et al., 2000; Menkir et al., 2004). Based on heterosis and combining ability analyses using cultivars from different heterotic groups, Peng et al. (1998) proposed seven heterotic patterns for the utilization of maize heterosis. Divergence at molecular marker loci has been useful in assigning maize inbreds to known heterotic groups previously established in breeding programmes and the molecular information agreed with pedigree information (Lee et al., 1989; Melchinger et al., 1991; Messmer et al., 1993). Side-by-side phenotypic evaluation of a sequence of successful maize hybrids produced by Pioneer Hi-Bred International, Inc., representing each decade from the 1930s to present, provides a description of the phenotypic changes for a number of key traits that the breeders have directly or indirectly changed. Genetic fingerprints of the inbred parents of these hybrids provide a description of the genotypic changes that have occurred in association with the sustained breeding effort (Fig. 9.3; Cooper et al., 2004). Important phases can be identified over this period of breeding. Initially double-cross hybrids (1920s-1960s) were developed. From the 1960s there was a relatively rapid transition to the use of single-cross hybrids, the foundation of which was the organization of the maize germplasm into heterotic groups, represented in this example by the Stiff Stalk Synthetic (SS) and the Non Stiff Stalk Synthetic (NSS) groups (Fig. 9.3).

[Figure 9.3: scatter plot of inbred scores; x-axis, Component 1 (31%); y-axis, Component 2 (12%); point groups labelled Old, SS, SS-old, NSS and NSS-old.]

Fig. 9.3. A plot of the inbred scores on the first two principal components from analysis of SSR marker profiles of the parents of the maize hybrids (SS, Stiff Stalk Synthetic inbred line; NSS, Non Stiff Stalk Synthetic inbred line). The large boundaries distinguish three main groups of lines: Old, the old inbred lines used before the formation of the heterotic groups; the other two groups represent SS and NSS inbred lines. The arrows indicate the direction of the progression of inbred improvement in the SS and NSS heterotic groups. From Cooper et al. (2004) with permission.

Using 160 RFLP markers and 21 wide-compatibility cultivars and three indica and three japonica cultivars, Zheng et al. (1994) constructed a dendrogram tree and discussed the potential of wide compatibility in hybrid breeding using indica × japonica crosses. Based on diallel crosses among eight indica lines representing the parents of the best-performing commercial rice hybrids grown in China, Zhang et al. (1995) studied molecular divergence and hybrid performance. Their results suggest the existence of two heterotic groups within indica, one comprised of rice strains from southern China and the other comprised of strains from South-east Asia. Using two types of molecular markers, RFLPs and amplified fragment length polymorphisms (AFLPs), Mackill et al. (1996) obtained similar grouping results. Using RAPD and SSR markers, Xiao et al. (1996b) separated the ten parental lines into two major groups that correspond to indica and
japonica subspecies. These results and the results from barley (Melchinger et al., 1994) and wheat (Sun et al., 1996; Ni et al., 1997) also supported the conclusion that DNA markers are very useful tools for construction of heterotic groups.

Future direction

It is evident from the review of various studies that adapted populations, isolated by time and/or space, are the most suitable candidates for promising heterotic patterns. Genetic diversity can be related to geographic origin of parental lines. The geographical variation can be related to ecological and environmental variations that, in turn, dictate survival fitness, created by spontaneous and induced genetic variation in natural and directed-selection situations. Consequently, the parental lines derived from different geographic origins are considered to have more genetic diversity than those derived from the same geographic origin. During internationalization of plant breeding efforts and massive exchange of unimproved and improved germplasm throughout the world, attention needs to be paid to avoid the negative effect of using distant crosses that might mix up heterotic groups existing among cultivars of different geographic origins. For example, breeding wide-compatible inbred cultivars as a bridge for harnessing indica/japonica heterosis in rice has reduced heterosis compared to what would be expected from crosses between typical indica and japonica cultivars (Xu, Y., 2003).

Heterotic groups should not be considered as closed populations, but should be broadened continuously by introgressing unique germplasm to warrant medium- and long-term gains from selection. Heterotic groups consisting of poorly utilized and unadapted germplasm should be enhanced through joint public-private breeding ventures. Different phenotypes may or may not reflect divergent genetic backgrounds. Phenotypically different populations may possess the same genetic background and divergent phenotypes may be conditioned by allelic differences at relatively few loci (Havey, 1998). MAS can be useful in creating, maintaining and improving heterotic groups. As discussed above, marker-based grouping of germplasm and breeding populations will help establish heterotic groups that hold maximum genetic diversity between groups but minimum diversity within groups. Identification of marker alleles that are specific to each heterotic group will help keep them genotypically separated. MAS can be used to improve the existing heterotic groups through introgressing target genes from one heterotic group or outsource germplasm to another with minimum linkage drag from the donor.

9.7.3 Marker-assisted hybrid prediction

It is reasonably believed that heterosis originates, in some way, from the genetic differences or heterozygosity between the parents. Theoretically, hybrid performance is equal to the average parental performance plus heterosis. In the past several decades, hybrid prediction has been largely based on the evaluation of genetic diversity among parental lines. It has been expected that understanding the relationship between heterozygosity/parental difference and heterosis would help predict hybrid performance. The development of molecular marker techniques has provided new tools for hybrid prediction and DNA markers have been used extensively in investigating correlations between parental genetic distance (GD) and hybrid performance.

Genome-wide heterozygosity and hybrid prediction

The relationship between parental genetic divergence and hybrid performance was first studied in maize. Variability for molecular markers generally agreed with pedigree information and assignment (based on hybrid performance) to known heterotic groups (Smith, O.S. et al., 1990; Dudley et al., 1991; Melchinger et al., 1991); however, variability at molecular marker loci was ineffective in predicting specific hybrid performance from crosses among
maize inbreds (Lee et al., 1989; Melchinger et al., 1992). Some reports indicated high correlation between hybrid performance/heterosis and parental GDs or the degree of heterozygosity (Lee et al., 1989; Smith, O.S. et al., 1990; Stuber et al., 1992; Reif et al., 2003), while others revealed very weak correlations (Godshalk et al., 1990; Dudley et al., 1991). Correlations between single-cross performance and molecular marker diversity for unrelated parental inbreds have been too low to be of any predictive value (Godshalk et al., 1990; Melchinger et al., 1990; Dudley et al., 1991), which is also supported by the result from sorghum (Jordan et al., 2004). Molecular-based GD estimates also failed to predict superior hybrid performance in oat (Moser and Lee, 1994), soybean (Gizlice et al., 1993), chickpea (Sant et al., 1999) and pepper (Geleta et al., 2004). A recent large-scale experiment in maize also supported this unpredictability. Using three sets of six sister-line inbred lines, each set being highly related and derived from a common parent cross and 45 sister-line hybrids generated by a partial diallel, Lee, E.A. et al. (2007) re-examined the relationship between degree of relatedness, genetic effects and heterosis in maize. The three sets of sister lines ranged between 47 and 77% identical-by-descent, creating a series of lines that potentially vary in gene frequency. They reported three relevant findings regarding heterosis for grain yield: (i) substantial genome-wide heterozygosity is not a requirement for the expression of heterosis; (ii) there is not a consistent relationship between degree of relatedness and the magnitude of heterosis; and (iii) the presence of non-additive genetic effects is not a requirement for the manifestation of heterosis.

Hybrids are more predictable within than between heterotic groups

Correlations between heterozygosity/GD and hybrid performance/heterosis varied for hybrids between lines that belong to the same heterotic group (within-group hybrids). In maize, correlations of GD with F1 performance and heterosis were significant and positive for all traits of within-group hybrids, flint × flint crosses, but not for the subset of flint × dent and dent × dent crosses (Boppenmaier et al., 1993). This was supported by Benchimol et al. (2000) using 18 tropical maize inbred lines where correlations of parental GDs with single crosses and their heterosis for grain yield were higher for line crosses from the same heterotic groups than the crosses from different heterotic groups. In rice, Xiao et al. (1996b) reported that yield potential and its heterosis showed significantly positive correlations with GD for indica × indica or japonica × japonica crosses, but the correlations were not significant for indica × japonica crosses. It was confirmed by Zhao et al. (1999) that very little correlation was detected in intersubspecific crosses using diallel crosses derived from 11 elite rice cultivars. In other cases, however, weak or no correlation was found for within-group hybrids. Examples include weak or no significant associations of GD with F1 performance and mid-parent heterosis in soybean (Cerna et al., 1997), wheat (Martin et al., 1995) and US long-grain rice cultivars (Saghai Maroof et al., 1997). These results may be due to the low levels of heterosis in these cultivar groups.

Based on results from various studies in maize, Melchinger (1993) summarized the relationship between parental GD and mid-parent heterosis (MPH) in a schematic representation. For crosses among related lines, there exists a tight association between GD and MPH for yield characters because both measures are a linear function of co-ancestry, f, and thus decrease with increasing f. For intra-group crosses, the correlation r(GD, MPH) is generally positive, too. This can be explained by hidden relatedness between some parents considered to be unrelated based on their pedigree and the presence of the same linkage phase between QTL and marker loci in the maternal and paternal gametic arrays of intra-group hybrids, which results in a positive covariance between GD and MPH (Charcosset et al., 1991). In contrast, no significant association between both measures exists for inter-group hybrids. In this case,
the maternal and paternal gametic arrays may differ in the linkage phase for many QTL-marker pairs; as a consequence, positive and negative terms cancel each other in their net contribution to covariance (GD, MPH), resulting in a low or zero correlation (Charcosset and Essioux, 1994).

Heterosis-associated markers and hybrid prediction

It has been common practice in most studies to determine GD or heterozygosity estimates from a set of DNA markers chosen for good coverage of the entire genome but not for linkage to genes influencing heterosis of the target trait. Theoretical investigations (Charcosset et al., 1991) and computer modelling (Bernardo, 1992) demonstrated that with intra- and inter-group crosses the correlation between GD and MPH is expected to decrease if genes influencing heterosis are not closely linked to markers used for calculation of genetic estimates and vice versa if markers employed for calculation of GDs are not linked to genes controlling the trait. Hence, increasing the marker density alone will not necessarily improve the ability to predict MPH by GD estimates; rather, markers must additionally be selected for tight linkage to genes affecting heterosis of the target trait in the germplasm under study. This is corroborated by comparison of results obtained with 209 AFLPs versus 135 RFLPs (Ajmone Marsan et al., 1998) and a study by Dudley et al. (1991). Using these associative loci will help establish strong correlations between heterozygosity and heterosis. However, allelic differences at marker loci do not assure allelic differences at linked loci for heterosis. For a limited number of markers to be useful as predictors for hybrid performance, the effects of alleles at the loci linked to specific marker alleles must be ascertained (Stuber et al., 1999).

Zhang et al. (1994) proposed two statistical parameters, general and specific heterozygosity, to measure genotypic heterozygosity. The former is the heterozygosity calculated from the GDs between the parents using all possible markers and the latter is that from using marker loci that are significantly associated with the traits of interest revealed by single factorial analysis of variance. The results from rice indicated that there was a weak correlation between general heterozygosity and heterosis but a significant correlation between specific heterozygosity and heterosis for yield and biomass.

Favourable allele combination and hybrid prediction

Heterogenic gene combinations may not always lead to heterosis and heterosis may ultimately depend upon the balance between favourable and unfavourable interactions of genes. It is reasonably inferred that heterosis could be caused by specific gene combinations derived from the two parents. Those genes may simultaneously produce different genetic effects in different genetic backgrounds. So, for parental improvement and hybrid prediction, investigating the specific gene combinations that contribute to heterosis should be more important than studying any single gene or QTL. Using 99 half-diallel rice hybrids derived from nine CMS lines and 11 restorer lines, Liu and Wu (1998) found that four favourable alleles and six favourable heterotic patterns on the parental lines significantly contributed to the heterosis of their hybrids for grain yield, whereas six unfavourable alleles and six unfavourable heterotic patterns significantly reduced heterosis. They suggested that optimal hybrids with superior grain yield could be developed by assembling those favourable alleles into and removing the unfavourable alleles from their parental lines.

Conclusions and prospects

There are several conclusions that can be drawn from the numerous investigations on the relationships between heterozygosity and GD with hybrid performance and heterosis. First, the higher the heterozygosity between the parents, the stronger the heterosis is. Secondly, using more markers alone will not improve the prediction.
Thirdly, prediction is possible using markers known to be associated with hybrid performance or heterosis if the association is used to predict performance of a hybrid derived from the same heterotic pattern. Fourthly, genetic variation (the presence of heterosis) is a prerequisite for prediction. Fifthly, the relationship of heterozygosity with heterosis and with hybrid performance will be different if the two involve different genes (Xu, Y., 2003). The last conclusion was supported by results of Zhu et al. (2001), in which the correlation of heterozygosity with heterosis was highly significant but that with hybrid performance was not, when 57 rice accessions from six ecotypes and their hybrids were genotyped by 48 SSR and 50 RFLP markers. It is anticipated that prediction could be possible if heterozygosity is derived from specific marker loci that are associated with heterosis and hybrid performance and all possible associated loci have been identified and their effects and interactions clearly defined.

Considering the fact that only heterotic crosses are of commercial importance and of interest to the breeder, the practical value of the genetic distance approach for prediction of heterosis and hybrid performance is limited (Vuylsteke et al., 2000). This is true for some crop species like maize. For rice, however, the reproductive barrier between the two subspecies, indica and japonica, has enforced a limitation on the utilization of indica/japonica heterosis, although the use of the wide-compatibility gene(s) has had a great impact on the limitation. Hybrid breeding for indica rice has been based on crosses within the indica group. The strong relationship between the heterozygosity at marker loci and heterosis within the indica group as reported before (Xiao et al., 1996b) indicates that GD estimates based on molecular markers could be very useful in assigning indica cultivars into different subgroups for hybrid indica rice development.

Screening for heterosis-related molecular markers as suggested by Melchinger et al. (1990b), using specific heterozygosity proposed by Zhang et al. (1994) and identifying favourable combinations of alleles and heterotic patterns (Liu and Wu, 1998) are among the approaches that could be exploited further to improve the prediction of hybrid performance/heterosis using molecular markers. Understanding genetic variation among cultivars to be tested and identifying markers associated with heterosis and heterosis-related traits are two important components in hybrid prediction. We should keep in mind that marker-heterosis associations identified in one cross may not be suitable for selection in others because heterosis could be controlled by many genes and each cross has different genes and gene combinations in action.

Despite their low values, the inbred-hybrid yield correlations were positive. They indicated a tendency for high-yielding inbreds to produce high-yielding hybrids. Hybrid breeding is always accompanied by the improvement of parental lines. Modern maize inbreds, grown at today's high density, can yield nearly as much as hybrids of the 1930s (Duvick, 1984; Meghi et al., 1984). Duvick (1999) has suggested that if as much effort had been put into improvement of open-pollinated varieties (OPVs) as has been devoted to hybrid improvement over the years, the gap between the best hybrids and the best OPVs might be less than what it currently is. Some authors even argue that OPVs might be superior to hybrids (Lewontin and Berlan, 1990), but their assumption is not backed up by data.

The potential application of DNA markers in hybrid breeding depends very much upon whether divergent heterotic groups have been established or not and upon crop species. If well-established heterotic groups are unavailable, marker-based GD estimates can be used to avoid producing and testing crosses between closely related lines. Furthermore, crosses with inferior MPH could be discarded prior to field-testing based on prediction. Another potential application exists. If new lines of unknown heterotic pattern or inbreds developed from crosses between parents from different heterotic groups (e.g. commercial hybrids) are to be evaluated for testcross performance, GD estimates could assist the breeder in the choice of appropriate testers for evaluating the combining ability of the lines.

9.8 Opportunities and Challenges

Plant breeding has generally accounted for one-half of the increases in productivity of the major crops and the future will continue to depend on its advances. However, the rate, scale and scope of uptake of genomics in crop breeding programmes have continually lagged behind expectations. This is little different to the adoption of quantitative genetics, mechanization and computerization during the last century. This is partly due to the long product development cycle in plant breeding and in turn the long-term nature of feedback from the market regarding the impact of any changes in the cultivar development pipeline. Opportunities and challenges we are facing in MAS will be discussed in this section.

9.8.1 Molecular tools and breeding systems

The prerequisite for increasing accessibility of MAS to breeders is developing a highly efficient breeding system, particularly for resource-limited plant breeding programmes in developing countries. Several strategies can be used to establish such a system through the use of MAS, including: (i) selection at early breeding stages to eliminate most segregants, particularly for highly heritable traits; (ii) selection at early developmental stage using high-selection pressure and an optimized selection rate, particularly for large-size plants; (iii) one-step selection for multiple traits using high-throughput genotyping; (iv) utilization of cost-effective genotyping systems; (v) highly efficient phenotyping, sample tracking and data acquisition; (vi) development and utilization of quick fixation and stabilization approaches; and (vii) genotyping once and phenotyping multiple times. To increase accessibility of MAS to breeders, the most important thing is to build skills and capacity in developing countries and to develop decision support tools to facilitate MAB programmes. Over the next decade, MAS technologies will become cheaper and easier to apply on a large scale, MAS can be carried out for all genes related to important target traits and using information from genotyping of all germplasm in the breeding system.

9.8.2 Crop-specific issues

The bottlenecks in MAS could be specific to crops except for those discussed in the previous section. For example, a possible limitation of MAS with maize is the structure and content of various gene pools. Examples of maize gene pools would include European flint and dent germplasm, US dents and various heterotic groups within each of these and other larger pools. Surveys with DNA markers have established differences among such groups of germplasm (Smith and Smith, 1992; Niebur et al., 2004). In addition, the efficacy for MAS in relatively complex populations such as synthetics and OPVs has not been investigated.

For open-pollinated crops, breeding for complex traits is limited by an additional bottleneck that there is no standardized protocol available for MAS that can be automatically applied to the various breeding systems required for development of inbred, hybrid, population and synthetic cultivars, where the material at many stages in the breeding process is highly heterogeneous and highly heterozygous. This is very different from breeding systems for inbred crops such as wheat that almost always start and end with inbred lines (Koebner and Summers, 2003) and rice that may start with inbreds and end with inbreds or inbred-based hybrids (Xu, Y., 2003). Thus, MAS efforts in open-pollinated crops can consist of two simultaneous approaches, one using the MTAs that have been identified previously and the other based on an integrated genetic diversity analysis, MTA analysis and MAS approach to discover, validate and apply new marker associations all in the same breeding populations albeit at different generations. It is a challenge to make MAS applicable from the earliest possible stages of the breeding programme
while giving the flexibility to sequentially improve the power of the MAS as data accumulates and information is integrated through subsequent breeding processes.

9.8.3 Quantitative traits

Traditionally the heritability of quantitative traits was the most common predictor of genetic gains for different plant breeding methods. DNA markers may be used today to accelerate and enhance overall breeding methods by combining DNA marker and phenotyping data in a selection index. Geneticists and plant breeders need to deal with linkage disequilibrium while using MAS in recurrent selection, especially when using polymorphic markers arising from mapping populations, which tend to be from diverse parents and thus may not be relevant for target breeding materials. The power of MAS will also continue to rely heavily on the accuracy and precision of phenotyping and the characterization and evaluation of germplasm in the field. Issues such as the error term to test for the significance of a QTL, detecting small effects with narrow genetic variance, or the number of QTL not related to genetic variance or divergence of parents are all under-researched areas that need priority attention by geneticists. Addressing these issues will allow plant breeders to define the optimum number of individuals/lines and markers to be used in their MAS programmes.

Plant breeders are ready to apply MAS for quantitative traits when the genetic gain and time or cost efficiency from doing so are clearly higher than through PS methods. Initial emphasis in this area should be on traits for which a robust cost-effective phenotyping system is not available. To quickly reach this stage requires a paradigm shift in strategy among the marker–trait identification community: from efforts to identify all QTL influencing the target trait to a focus on identification of a few QTL having the largest effect on the target trait. QTL of major effect may be easier to detect (in the right genetic material) and be less influenced by GEIs and genetic background effects. Of great importance will be a shift away from analysis of entire genetic populations to an emphasis on selected individuals with extreme phenotypes from relevant breeding populations and genetic stocks and, likely, pooled DNA analysis using the selected individuals (Xu and Crouch, 2008). Of equal importance will be a shift from linked markers to diagnostic gene-based markers, which will generally be SNP-based and thus readily scalable for high-throughput haplotyping.

9.8.4 Genetic networks

The potential for MAS to contribute to improvements in crops should increase in parallel with our understanding of the relationships among genomes, the environment and phenotypes. Candidate transgenes will be developed on a regular basis and their contributions to crop improvement will be realized in the most efficient manner with MAS. Likewise, the identification of candidate native genes and their gene products and functions and of other DNA sequences (e.g. micro-RNA (miRNA), matrix attachment and regulatory regions), will improve the power of methods such as association mapping and genome scans to assess their genotypic value in the context of defined reference populations of significance to plant breeding.

Plants exhibit massive changes in gene expression during morpho-physiological and reproductive development as well as when exposed to a range of biotic and abiotic stresses. A new field of genetics of global gene expression has emerged based on the application of traditional techniques of linkage and association analysis for the thousands of transcripts measured by microarrays. Dissecting the architecture of quantitative traits in this way connects DNA sequence variation with phenotypic variation and is improving our understanding of transcriptional regulation and regulatory variation (Rockman and Kruglyak, 2006).

9.8.5 Marker-assisted selection in developing countries

There are many additional factors that will affect the application of MAS in developing countries. Building the necessary skills among national programme staff and ensuring those programmes have possession or access to sufficient capacity is an essential prerequisite.

Several crop-specific biotechnology networks were established in Asia, Africa and Latin America during the 1980s and 1990s. Many of these covered a wide range of activities including upstream research and capacity building. Unfortunately, in some cases major donors have pulled out from further funding of such networks. However, all these networks still present an excellent basis for the development of molecular breeding communities of practice that can be used to validate, refine and apply new technologies in national breeding programmes. Conversely in other crops, conventional breeding networks have sufficiently matured to become prime candidates for the introduction of MAS systems and other molecular breeding approaches. However, many of these breeding programmes are not receiving international development assistance or are significantly under-funded, which seriously threatens their long-term impact. Molecular breeding consortia accessing joint venture genotyping hubs or commercial service providers appear to be an increasingly realistic option where those facilities can provide the right quality, quantity and timeline of service to fit the given breeding system.

Capacity building will upgrade the skills of participating plant breeders and improve the understanding of plant breeding and associated molecular technologies among the broader community. As many molecular techniques become sufficiently routine, there will be many opportunities for scientists to profitably shift their attention to experimental design, analysis and interpretation as opposed to their current predominant time contribution to data generation.

Policy options for research, development and diffusion of the products of MAS in developing countries depend on the development objectives and priorities of the agricultural sector, its various subsectors and cross-cutting activities dealing with science and technology. Dargie (2007) discussed the policy considerations and options for developing and implementing MAS programmes and projects for developing countries. He considered three categories of countries with different capacities of facilities and personnel. (i) Countries with high-quality personnel and facilities for phenotypic evaluation and selection and in molecular biology: through the establishment of centralized centres of excellence and sectoral/subsectoral institutions, they have the potential to develop and validate molecular markers and apply MAS routinely. (ii) Countries with reasonable capacities for phenotype evaluation and selection and some capacities to apply molecular marker methods: these countries have less comprehensive breeding programmes and therefore can cover fewer species. Using regional centres of excellence, such as Bioscience eastern and central Africa (BecA), is an option to implement MAS in their breeding programmes. (iii) Countries with limited capacities in phenotypic evaluation and selection and no capacities to apply molecular techniques: their options are to partner with institutions of the CGIAR system and other advanced institutions in developed and developing countries and import cultivars and advanced breeding lines developed by these institutions through MAS that contain the needed traits. Establishment of a molecular breeding community of practice will help upgrade scientific and technical expertise in molecular biology itself and in linking molecular and phenotypic approaches through species and theme-specific networks, workshops, training courses, scientific visiting, etc., to implement MAS across these types of developing countries.
10
Genotype-by-environment Interaction

Genotype is defined as an individual's genetic make-up, the nucleotide sequence of DNA that is transmitted from parents to offspring, as discussed in Chapter 2. The phenotypic expression of a genotype depends on environments that may be defined as the sum total of circumstances surrounding or affecting an organism or a group of organisms. Cultivars of a crop as genotypes, when grown under a wide range of conditions, are exposed to different soil types, fertility levels, moisture contents, temperatures, photoperiods, biotic and abiotic stresses and cultural practices. As gene expression may be modified, enhanced, silenced, or timed by the regulatory mechanisms of the cell in response to internal and external factors, the genotypes (cultivars) may specify a range of phenotypic expression that is called the norm of reaction, or plasticity, which is simply the expression of variability in the phenotype of individuals of identical genotype (Bradshaw, 1965). As a result, one cultivar may have the highest yield in some environments and a second cultivar may excel in others. Changes in the relative performance of genotypes across different environments are referred to as genotype-by-environment interaction (GEI). GEI must be explained from the environmental part, from the genotypic part and from both simultaneously. Environmental characterization using data from Geographic Information System (GIS) tells us the actual environmental factors such as maximum temperature, minimum temperature, precipitation, sun radiation, etc. GEI can arise due to changes in the genotype, the environment, or both. It is ubiquitous, occurring for virtually every aspect of plant growth and development and touching every discipline of biological science. A large proportion of biological research in agricultural science is concerned with study of GEI. Scientists are increasingly aware that much scientific inference is conditional because of GEI. This awareness has led to greater interest, and therefore advances, in our understanding of the factors influencing plant growth and development. In turn, there has been considerable improvement in the performance of many crop species (Cooper and Byth, 1996). Nevertheless, we are far from developing an adequate understanding of the factors influencing adaptation, even for our major agricultural species. Consequently, there is considerable opportunity for improvement in strategies of plant breeding.

The relative performance of genotypes across environments determines the importance of an interaction. There is no GEI when the relative performance among genotypes remains constant across environments. In Fig. 10.1a, cultivar A has the same yield superiority over cultivar B across two environments (E1 and E2). No GEI is present

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 381


[Figure 10.1: three panels (a), (b) and (c), each plotting Yield (vertical axis, scale 20 to 120) against Environment (E1, E2) for cultivars A and B.]

Fig. 10.1. The relative performance of two cultivars (A and B) in two environments (E1 and E2).
(a) No GEI is present. (b) GEI is present but does not alter genotypic ranking. (c) GEI is present and
alters genotypic ranking. Modified from Allard and Bradshaw (1964).
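
As a rough illustration of the three cases in Fig. 10.1, the following Python sketch classifies the interaction between two cultivars in two environments from the sign and size of the yield differences. The function name classify_gei and the yields are hypothetical. The yields were chosen only to match the differences described in the text (50 units in both environments for panel a, 20 and 50 units for panel b, a rank reversal for panel c) and are not read from the original figure.

```python
def classify_gei(a, b):
    """Classify the interaction for two cultivars grown in two environments.

    a and b are (yield in E1, yield in E2) for cultivars A and B.
    """
    diff_e1 = a[0] - b[0]  # superiority of A over B in E1
    diff_e2 = a[1] - b[1]  # superiority of A over B in E2
    if diff_e1 == diff_e2:
        return "no GEI"             # same differential in both environments
    if diff_e1 * diff_e2 < 0:
        return "crossover GEI"      # the sign flips, so the ranking changes
    return "non-crossover GEI"      # differential changes, ranking does not


# Hypothetical yields, chosen to mimic the three panels of Fig. 10.1.
panels = {
    "a": ((100, 120), (50, 70)),   # A leads by 50 units in E1 and in E2
    "b": ((80, 120), (60, 70)),    # A leads by 20 units in E1, 50 in E2
    "c": ((100, 60), (60, 100)),   # A leads in E1, B leads in E2
}

results = {name: classify_gei(a, b) for name, (a, b) in panels.items()}
for name, label in results.items():
    print(name, label)
```

With more than two genotypes or environments the same idea applies pairwise, although in practice the interaction would be tested statistically rather than by exact equality of differences.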

because the yield differential between the cultivars is 50 units in both environments; proportionality is maintained, that is, the difference between any two genotypes in any two environments is the same. GEIs can occur in two ways. (i) The difference among genotypes can vary without any alteration in their rank, which is referred to as non-crossover interaction. In Fig. 10.1b, a GEI is present because cultivar A yields 20 units more than cultivar B in environment E1 but 50 units more in environment E2. (ii) The rank among cultivars changes across environments, which is referred to as crossover interaction (COI). In Fig. 10.1c, cultivar A is more productive in environment E1, but cultivar B is more productive in environment E2. The most important GEI for the plant breeder is the COI caused by changes in rank among genotypes.

Existence of GEI has significant influence on the efficiency of crop improvement via plant breeding, largely because it confounds comparisons among genotypes with the environment of test and complicates the definition of breeding objectives. It is argued that to overcome these constraints to crop improvement we need to develop an understanding of the differences in plant adaptation associated with the differences in performance and in particular the GEI. GEI is of interest to plant breeders for several reasons (Fehr, 1987). (i) The need to develop cultivars for specific purposes is determined by an understanding of GEI. Unique cultivars may be required for different row spacings, soil types or planting dates. (ii) The potential need for unique cultivars in different geographical areas requires an understanding of GEI. The importance of this interaction can determine if division of a large geographical area into subareas is needed and justified for testing new genotypes and recommending cultivars to crop producers. (iii) Effective allocation of resources for testing genotypes across locations and years is based on the relative importance of genotype × location, genotype × year and genotype × location × year interactions. (iv) The response of genotypes to variable productivity levels among environments provides an understanding of their stability of performance. An understanding of the genotype stability across environments helps in determination of their suitability for the fluctuations in growing conditions that are likely to be encountered.

There are several key areas in GEI study: (i) methodology for effective environmental characterization and classification; (ii) strategies for partitioning GEIs into repeatable and non-repeatable components; (iii) experimental evidence to quantify the relative efficiencies of direct selection for target traits and indirect selection strategies based on crop physiological principles; (iv) integrated utilization of multi-environment trial data, pedigree information and genotypic data of cultivars; and (v) determination of genetic loci responsible for GEI and molecular dissection of GEI components. Discussion in
this chapter is mainly based on several important references including Fehr (1987), Romagosa and Fox (1993), Knapp (1994), Xu and Zhu (1994), Cooper and Hammer (1996), Bernardo (2002), Chahal and Gosal (2002), Kang (2002), Crossa et al. (2004), Cooper et al. (2005), van Eeuwijk et al. (2005) and Yan et al. (2007).

10.1 Multi-environment Trials

A major objective in plant breeding programmes is to assess the suitability of individual crop genotypes for agricultural purposes across a range of agro-ecological conditions. Appropriate experimental procedures are required to understand and determine the importance of GEI. For this purpose breeders conduct so-called multi-environment trials (METs). In a MET, a set of genotypes is evaluated across a number of environments that hopefully represent the target environment to select widely or specifically adapted genotypes. As an example, Table 10.1 provides a MET data set for 18 winter wheat cultivars tested at nine Ontario locations in 1993 from Yan et al. (2007). The performance of genotypes in METs is analysed by statistical models developed to describe and interpret genotype-by-environment data (GED). The statistical analysis should provide estimates for parameters that indicate both how well genotypes perform on average across the environmental range and how well they perform in specific environmental conditions.

10.1.1 Experimental design

An understanding of the steps involved in the design, implementation, analysis and interpretation of METs can be useful. Planning of any experiment begins with a statement of the concept or hypothesis to be evaluated, sometimes phrased in the form of a question. Is the relative performance among genotypes different with conservation tillage versus conventional tillage? Do genotypes respond differently to high versus low rates of inorganic nitrogen fertilization? The breeder may have a hypothesis about the answer to the question on the basis of practical experience. It is critical that the hypothesis should not be regarded as factual, an attitude that can bias the interpretation of the experimental results. A MET that involves multiple genotypes, years and locations is usually required. The GEI is considered to be absent if all genotypes perform similarly across all the environments, i.e. total variation is explained only by main effects of environments and genotypes.

The empirical mean response, $\bar{y}_{ij}$, of the $i$th genotype ($i = 1, 2, \ldots, I$) in the $j$th environment ($j = 1, 2, \ldots, J$) with $r$ replications in each of the $IJ$ cells is expressed as

$$\bar{y}_{ij} = \mu + \tau_i + \delta_j + (\tau\delta)_{ij} + \bar{e}_{ij} \qquad (10.1)$$

where $\mu$ is the grand mean over all genotypes and environments, $\tau_i$ is the additive effect of the $i$th genotype, $\delta_j$ is the additive effect of the $j$th environment, $(\tau\delta)_{ij}$ is the non-additivity, GEI, of the $i$th genotype in the $j$th environment and $\bar{e}_{ij}$ is the (average) error assumed normally and independently distributed, i.e. NID$(0, \sigma^2/r)$, where $\sigma^2$ is the within-environment error variance, assumed to be constant.

Except for $\mu$, all the terms in Eqn 10.1 are usually treated as random effects. To provide a complementary framework for a genetic interpretation of the observed trait variation, we can also consider the trait phenotypic variation as the combination of a genetic signal component $[\tau_i + (\tau\delta)_{ij}]$, an environmental context component $(\delta_j)$ and an environmental noise component $(\bar{e}_{ij})$. For the variance-covariance (VCOV) structure of the error term, $\bar{e}_{ij}$, various choices are possible, the simplest being that $\bar{e}_{ij}$ is independently identically normally distributed. The terms of Eqn 10.1 can also be considered as fixed effects depending on the sampling methods used and the general purpose of the study. For example, if environment refers to locations, then they may be considered a fixed effect when they are not randomly chosen from all possible sites in an area, while if the environment

Table 10.1. Mean yield (Mg ha⁻¹) of 18 winter wheat cultivars (G1–G18) tested at nine Ontario locations
(E1–E9) in 1993 (from Yan et al. (2007) with permission).

Test environments

Genotypes E1 E2 E3 E4 E5 E6 E7 E8 E9 Mean

G1 4.46 4.15 2.85 3.08 5.94 4.45 4.35 4.04 2.67 4.00
G2 4.42 4.77 2.91 3.51 5.70 5.15 4.96 4.39 2.94 4.31
G3 4.67 4.58 3.10 3.46 6.07 5.03 4.73 3.90 2.62 4.24
G4 4.73 4.75 3.38 3.90 6.22 5.34 4.23 4.89 3.45 4.54
G5 4.39 4.60 3.51 3.85 5.77 5.42 5.15 4.10 2.83 4.40
G6 5.18 4.48 2.99 3.77 6.58 5.05 3.99 4.27 2.78 4.34
G7 3.38 4.18 2.74 3.16 5.34 4.27 4.16 4.06 2.03 3.70
G8 4.85 4.66 4.43 3.95 5.54 5.83 4.17 5.06 3.57 4.67
G9 5.04 4.74 3.51 3.44 5.96 4.86 4.98 4.51 2.86 4.43
G10 5.20 4.66 3.60 3.76 5.94 5.35 3.90 4.45 3.30 4.46
G11 4.29 4.53 2.76 3.42 6.14 5.25 4.86 4.14 3.15 4.28
G12 3.15 3.04 2.39 2.35 4.23 4.26 3.38 4.07 2.10 3.22
G13 4.10 3.88 2.30 3.72 4.56 5.15 2.60 4.96 2.89 3.80
G14 3.34 3.85 2.42 2.78 4.63 5.09 3.28 3.92 2.56 3.54
G15 4.38 4.70 3.66 3.59 6.19 5.14 3.93 4.21 2.93 4.30
G16 4.94 4.70 2.95 3.90 6.06 5.33 4.30 4.30 3.03 4.39
G17 3.79 4.97 3.38 3.35 4.77 5.30 4.32 4.86 3.38 4.24
G18 4.24 4.65 3.61 3.91 6.64 4.83 5.01 4.36 3.11 4.48
Mean 4.36 4.44 3.14 3.49 5.68 5.06 4.24 4.36 2.90 4.19
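
To make the terms of Eqn 10.1 concrete, the sketch below estimates the grand mean, the genotype and environment main effects and the non-additive residual for a small subset of Table 10.1 (G1 to G3 across E1 to E9). It is only an illustration under stated assumptions: the function name decompose is hypothetical, the effects are simple estimates from cell means, and because no replicate data are available here the residual confounds the GEI term with the average error. Effects estimated from three genotypes will also differ from those estimated from all 18.

```python
# Mean yields (Mg/ha) of G1-G3 at E1-E9, copied from Table 10.1.
yields = {
    "G1": [4.46, 4.15, 2.85, 3.08, 5.94, 4.45, 4.35, 4.04, 2.67],
    "G2": [4.42, 4.77, 2.91, 3.51, 5.70, 5.15, 4.96, 4.39, 2.94],
    "G3": [4.67, 4.58, 3.10, 3.46, 6.07, 5.03, 4.73, 3.90, 2.62],
}


def decompose(table):
    """Split a genotype-by-environment table of means into additive parts.

    Returns (grand mean, genotype effects, environment effects, residuals),
    where each residual is y_ij - grand - genotype_i - environment_j.
    """
    genotypes = list(table)
    n_gen = len(genotypes)
    n_env = len(table[genotypes[0]])

    grand = sum(sum(row) for row in table.values()) / (n_gen * n_env)
    # Genotype effect: row mean minus grand mean.
    g_eff = {g: sum(table[g]) / n_env - grand for g in genotypes}
    # Environment effect: column mean minus grand mean.
    e_eff = [
        sum(table[g][j] for g in genotypes) / n_gen - grand
        for j in range(n_env)
    ]
    # What the additive model leaves unexplained (GEI plus error here).
    resid = {
        g: [table[g][j] - grand - g_eff[g] - e_eff[j] for j in range(n_env)]
        for g in genotypes
    }
    return grand, g_eff, e_eff, resid


grand, g_eff, e_eff, resid = decompose(yields)
print("grand mean:", round(grand, 3))
for g in yields:
    print(g, "effect:", round(g_eff[g], 3),
          "residuals:", [round(x, 2) for x in resid[g]])
```

By construction the residuals sum to zero within every genotype and within every environment, which is a quick check that the decomposition was computed correctly.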

refers to years then they can be considered as randomly chosen. If years and locations are typically representing a normal combination of years and locations they can be perfectly considered as random effects.

The genotypes chosen for an assessment of possible interactions are an important consideration in designing the experiment. Some analyses of GEI are not based on an experiment specifically designed for that purpose, particularly the assessment of the importance of interactions with locations and years. Instead, breeders utilize data from test genotypes including cultivars, hybrids, populations and experimental lines that have been evaluated over locations and years as a part of normal testing programmes.

It is desirable to have at least two replications in each location and year to obtain an estimate of experimental error so that it is possible to test the significance of the interactions of interest. Any additional replications will allow a more reliable estimate of the experimental error. However, sometimes resources are not available for replicating all genotypes so that only some entries are replicated. In this case, it is an augmented design; it is a perfectly legitimate design, although the precision is lower.

10.1.2 Basic data analysis and interpretation

For all MET data, basic analyses should include the calculation of mean values, determination of the statistical significance of the sources of variation and estimation of appropriate variance components. The sources of variation in an experiment are partitioned into main effects and their interactions (Table 10.2). The mean squares for the sources of variation are determined and appropriate F-tests are conducted to assess the probability that a source of variation is significant. Components of variance can be calculated for the main effect of the genotypes and their interactions with the locations and years. Standard errors can be computed for each variance component.

Data interpretation includes the statistical significance of various variation sources

Table 10.2. Analysis of variance for experiments in an annual crop with different numbers of locations
and years (from Johnson et al. (1955) with permission).

Sources of variation                      Degrees of freedom         Expected mean squares

One location in 1 year
  Replications                            r − 1
  Genotypes                               g − 1                      $\sigma_e^2 + r(\sigma_g^2 + \sigma_{gl}^2 + \sigma_{gy}^2 + \sigma_{gly}^2)$
  Error                                   (r − 1)(g − 1)             $\sigma_e^2$

One location in 2 or more years
  Years                                   y − 1
  Replications in years                   y(r − 1)
  Genotypes                               g − 1                      $\sigma_e^2 + r(\sigma_{gy}^2 + \sigma_{gly}^2) + ry(\sigma_g^2 + \sigma_{gl}^2)$
  Genotypes × years                       (g − 1)(y − 1)             $\sigma_e^2 + r(\sigma_{gy}^2 + \sigma_{gly}^2)$
  Error                                   y(r − 1)(g − 1)            $\sigma_e^2$

One year at two or more locations
  Locations                               l − 1
  Replications in locations               l(r − 1)
  Genotypes                               g − 1                      $\sigma_e^2 + r(\sigma_{gl}^2 + \sigma_{gly}^2) + rl(\sigma_g^2 + \sigma_{gy}^2)$
  Genotypes × locations                   (g − 1)(l − 1)             $\sigma_e^2 + r(\sigma_{gl}^2 + \sigma_{gly}^2)$
  Error                                   l(r − 1)(g − 1)            $\sigma_e^2$

Two or more locations in 2 or more years
  Years                                   y − 1
  Locations                               l − 1
  Replications in years and locations     yl(r − 1)
  Years × locations                       (y − 1)(l − 1)
  Genotypes                               g − 1                      $\sigma_e^2 + r\sigma_{gly}^2 + ry\sigma_{gl}^2 + rl\sigma_{gy}^2 + ryl\sigma_g^2$
  Genotypes × years                       (g − 1)(y − 1)             $\sigma_e^2 + r\sigma_{gly}^2 + rl\sigma_{gy}^2$
  Genotypes × locations                   (g − 1)(l − 1)             $\sigma_e^2 + r\sigma_{gly}^2 + ry\sigma_{gl}^2$
  Genotypes × years × locations           (g − 1)(y − 1)(l − 1)      $\sigma_e^2 + r\sigma_{gly}^2$
  Error                                   yl(r − 1)(g − 1)           $\sigma_e^2$
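
The expected mean squares in the last block of Table 10.2 (two or more locations in 2 or more years) can be inverted to estimate the variance components by the method of moments. The sketch below shows that arithmetic. The function names, dictionary keys and the numerical values are hypothetical and serve only as a round-trip check, not as results from any trial.

```python
def expected_mean_squares(var, r, y, l):
    """Expected mean squares for the multi-location, multi-year block.

    var holds the variance components under the keys
    'e', 'gly', 'gy', 'gl' and 'g'.
    """
    base = var["e"] + r * var["gly"]
    return {
        "error": var["e"],
        "gly": base,
        "gy": base + r * l * var["gy"],
        "gl": base + r * y * var["gl"],
        "g": base + r * y * var["gl"] + r * l * var["gy"] + r * y * l * var["g"],
    }


def variance_components(ms, r, y, l):
    """Method-of-moments estimates obtained by equating observed mean
    squares to their expectations and solving for each component."""
    return {
        "e": ms["error"],
        "gly": (ms["gly"] - ms["error"]) / r,
        "gy": (ms["gy"] - ms["gly"]) / (r * l),
        "gl": (ms["gl"] - ms["gly"]) / (r * y),
        "g": (ms["g"] - ms["gy"] - ms["gl"] + ms["gly"]) / (r * y * l),
    }


# Hypothetical components and design sizes, used only to check the algebra.
true_var = {"e": 1.0, "gly": 0.5, "gy": 0.3, "gl": 0.8, "g": 2.0}
r, y, l = 2, 3, 4

ms = expected_mean_squares(true_var, r, y, l)
estimates = variance_components(ms, r, y, l)
for name in true_var:
    print(name, round(estimates[name], 6))
```

With real mean squares these differences can come out negative because of sampling error; how such estimates are handled is a separate decision that this sketch does not make.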

and their practical implications. The genotype × location interaction measures the consistency of performance among genotypes at different locations. The consistency of performance of genotypes in different years is indicated by the genotype × year interaction. The genotype × location × year interaction measures the consistency of the genotype × location interaction across years. For all of these mentioned interactions, an examination of mean values is necessary to determine if a significant interaction is due to a change in rank among genotypes or to changes in the differences among genotypes without rank change (ref. Fig. 10.1).

Genotype × location interaction

Wide fluctuations in the rank of genotypes across test locations suggest that it may be desirable to develop genotypes for different locations through independent selection and testing programmes. The cost of establishing independent programmes for different geographical areas is substantial; therefore, the decision can be difficult. Before establishing independent breeding programmes, the breeder should make a detailed examination of the environmental factors responsible for the genotype × location interaction. As suggested by Fehr (1987), if the differences among locations are due to soil type or other factors that are consistent from year to year, independent programmes may be appropriate. Temporary differences among locations associated with unusual climate conditions would not justify this.

Another consideration in determining the implications of genotype × location interaction is that fluctuations in rank may not preclude selection of superior genotypes for multiple locations. Assume that a group of genotypes are divided into three classes: good, intermediate and poor. A genotype ×
location interaction could be caused by fluctuations in rank among genotypes within the three classes, but not among classes. Such an interaction would be unlikely to justify the establishment of breeding programmes for independent locations, at least for the initial stages of testing.

Genotype × year interaction

An inconsistent ranking among genotypes grown in different years is in some regards more difficult to deal with than a genotype × location interaction. A breeder does not have the option of establishing independent breeding programmes for different years (Fehr, 1987). The primary option available is to identify genotypes that exhibit superior performance on the average across years. This involves the testing of genotypes in several years before selection of one for release as a cultivar. To reduce the length of time for genetic improvement, multiple locations in 1 year often are used as a substitute for years. The substitution is only effective when the range of climate conditions among locations in single years is comparable to that among years.

Genotype × year × location interaction

This interaction can first be used to test if the genotype × location interaction is repeatable across years and thereby mega-environments can be established. It can be used secondly when there are fluctuations in the ranking of genotypes associated with individual location–year combinations. Here the breeder must identify genotypes with superior average performance over locations and years. When METs are performed across several years, the interaction is referred to as a three-mode (three-way) data array, in which the modes are genotypes, locations and years. By extension of a two-way additive main effect and multiplicative interaction (AMMI) model to a three-way model, Varela et al. (2006) offered us a natural approach for assessing the response in locations and years or for studying the multi-attribute response of genotypes in environments. The three-way model was applied to two data sets. Data set 1 comprised genotype (25) × location (4) × sowing time (4) interaction with eight traits measured. The structure of data set 2 is genotype (20) × irrigation regimes (4) × year (3) on grain yield. Their results showed that the three-way AMMI analysis gave sensible and useful information that would otherwise have been unavailable to the breeder in relation to the differential responses of genotypes in different locations and in several years and the different relationship between locations in different years.

10.2 Environmental Characterization

The objectives of GED analysis (i.e. MET data for a single trait) should include three major aspects: (i) mega-environment analysis; (ii) test-environment evaluation; and (iii) genotype evaluation (Yan and Kang, 2003), all of which are associated with environmental characterization. Yan et al. (2007) use the yield data listed in Table 10.1 as an example to illustrate the three aspects of bi-plot analysis. When supplemental information (e.g. data on environmental or genotypic covariates) is available, a fourth aspect, which is to understand the causes of genotype main effect (G) and GEI (Yan and Kang, 2003; Yan and Tinker, 2006), can be included as described in Section 10.3.2.

Environmental characterization involves definition of the key factors which influence both performance level and the relative performance of genotypes in an experiment, as well as assessment of the relevance of these factors to the target environments. This provides a basis for understanding the results from individual experiments and predicting their application elsewhere. This can be extended to include the sociological factors that influence the utilization of cultivars by farmers. In the majority of METs conducted there is no clear definition of the environmental challenge. Further, in many METs there is no measurement of how well the test environments match those of the target
environments. Different categories of sites may be identified to assist environmental characterization. Benchmark sites, which are intensively monitored, could be useful for gaining an understanding of the mixture of environments encountered in the production system. Cooper and Hammer (1996) listed three strategies for environmental characterization: (i) direct measurement of environmental variables during an experiment, which is possible but is both time consuming and resource intensive and, therefore, costly; (ii) quantitative analytical approaches based on statistical methods and simulation models; and (iii) utilization of reference and probe genotypes for specific environmental factors. It should be noted that statistical methods discussed in the next section can also be used for environmental characterization.

10.2.1 Classification of environments

Every factor that is a part of the environment has the potential to cause differential performance, i.e. GEI. Environmental factors can be classified as either predictable or unpredictable factors (Allard and Bradshaw, 1964). Predictable factors are those that occur in a systematic manner or are under human control, such as soil types, planting dates, plant densities, fertilizer rates, tillage practices and crop rotation patterns. Unpredictable factors are those that fluctuate and cannot be artificially controlled, including rainfall, temperature and relative humidity. From an applied perspective it may also be useful to distinguish between environmental factors that can be manipulated by the farmer and those that cannot. In METs, management factors are generally considered as part of the environment and often are not explicitly distinguished from climatic, temporal and regional factors. They are considered to be sampled in combination with the different locations and years in the METs. However, wherever possible plant breeders can and often do separate, out of the pool of things called environment, factors identified to be repeatable and important in the target environments. Predictable factors can be evaluated individually and collectively for their interaction with genotypes. Studies have been made of genotype × soil type, genotype × row spacing, genotype × planting date, genotype × plant population and genotype × fertilization interactions.

To maximize growers' yields, the growing region often has to be subdivided into relatively homogeneous mega-environments and appropriate genotypes deployed for each of these mega-environments. A mega-environment is defined as a portion (not necessarily contiguous) of a crop species' growing region with a fairly homogeneous environment that causes similar genotypes to perform best. For several reasons, identifying mega-environments has attracted much attention (Gauch and Zobel, 1997). First, interest has grown in providing adapted materials for marginal environments, which are stressed in a variety of ways that typically generate large GEIs, so genotypes that win in highly favourable environments may rank poorly there. Secondly, greater concern about long-term soil conservation and about lowering pesticide and fertilizer usage has stimulated a greater diversity of management practices, hence creating a variety of mega-environments within major crop regions previously managed in a much more uniform manner. Finally and more generally, most plant breeders feel that they are exploiting rather than ignoring the potential for yield increases that resides in GEIs. Genotype evaluation and test-environment evaluation become meaningful only after the mega-environment issue is addressed.

Mega-environments are broad, usually international and frequently transcontinental, and can be defined by similar biotic and abiotic stresses, cropping system requirements, consumer preferences and, for convenience, by a volume of production of the relevant crop sufficient to justify its attention. For example, tropical lowland, late-maturing, white dent maize with relevant disease resistances occupies 3.8 million ha across 18 countries (Gauch and Zobel, 1997). This definition encompasses environmental, genotypic, geographical and even economic aspects of mega-environments. Subdivision
of a crop's growing region into several mega-environments implies more work for plant breeders and seed producers, but it also implies higher heritabilities, faster progress for plant breeders and potentially stronger competitiveness for seed producers. On the other hand, the necessary and sufficient condition for mega-environment division is a repeatable which-won-where pattern rather than merely a repeatable environment-grouping pattern (Yan and Rajcan, 2002; Yan and Kang, 2003). Mega-environments are used to allocate resources in a breeding or research programme, to rationalize germplasm and information exchanges between breeding programmes (allowing even small programmes to progress by focusing on the most promising material), to increase heritabilities within relatively well-defined and predictable environments, to increase the efficiency of testing and breeding programmes and to target genotypes to appropriate production areas. Many other terms, however, have essentially the same meaning, such as agro-climatic or eco-geographic regions.

Appropriate mega-environment analysis should classify the target environment into one of three possible types (Table 10.3). Type 1 is the easiest target environment one can hope for, but it is usually an overoptimistic expectation. Type 2 suggests opportunities for exploiting some of the GEI. Such opportunities should not be overlooked if they exist, which is the whole point of mega-environment analysis and GEI analysis. Type 3 is the most challenging target environment and, unfortunately, also the most common one. Statistical methods for grouping environments involve classification procedures, ordination procedures (i.e. using coordinates in a graph to depict relationships among environments), or the joint use of a classification and an ordination procedure (DeLacy and Cooper, 1990).

Table 10.3. Three types of target environments based on mega-environment analysis (from Yan et al. (2007) with permission).

                  With crossover GEI                           No crossover GEI

Repeatable        Type 2: target environment consisting        Type 1: target environment
across years      of multiple mega-environments.               consisting of a single, simple
                  Strategy: select specifically adapted        mega-environment.
                  genotypes for each mega-environment.         Strategy: testing at a single test
                  A single-year multilocation trial may        location in a single year suffices
                  be sufficient.                               to select a single best cultivar.

Not repeatable    Type 3: target environment consisting
across years      of a single but complex mega-environment.
                  Strategy: select a set of cultivars for
                  the whole region based on both mean
                  performance and stability, using data
                  from multiyear and multilocation tests.

Clustering analysis, which can be used for environment classification, usually involves the creation of hierarchical groups of environments, just as described for germplasm classification in Chapter 5. A given environment is more similar to an environment in the same cluster than to an environment in a different cluster, in terms of genotype rankings rather than the physical factors of the environments per se. The clustering procedure requires some measure of dissimilarity or distance between environments. Data from METs are typically unbalanced because the genotypes and locations often vary from year to year. But the statistical distance between two environments can be determined from the performance of the subset of genotypes that are grown in both environments (DeLacy and Cooper, 1990). The distance measures for genotypes summarized by Lin et al. (1986) can be used for environment clustering, whereas Ouyang et al. (1995) measured the distance between environments j and j' as

$$D_{jj'} = \frac{1}{g}\sum_{i=1}^{g}\left[\frac{P_{ij}-m_{j}}{s_{j}}-\frac{P_{ij'}-m_{j'}}{s_{j'}}\right]^{2} = \frac{1}{g}\sum_{i=1}^{g}\left(t_{ij}-t_{ij'}\right)^{2} \qquad (10.2)$$

where $g$ is the number of genotypes grown in both $j$ and $j'$; $m_j$ (or $m_{j'}$) is the mean of all the genotypes in environment $j$ (or $j'$); and $s_j$ (or $s_{j'}$) is the phenotypic standard deviation among all the genotypes in environment $j$ (or $j'$), so that $t_{ij} = (P_{ij}-m_j)/s_j$ is the standardized performance of genotype $i$ in environment $j$. When all genotypes grown in environment $j$ are also grown in environment $j'$, Eqn 10.2 can be rewritten as (Ouyang et al., 1995)

$$D_{jj'} = 2\left(1-\frac{1}{n}\right)\left(1-r_{jj'}\right) \qquad (10.3)$$

where $r_{jj'}$ is the correlation between environment $j$ and environment $j'$ across genotypes. This implies that the distance between two environments is $D_{jj'} = 0$ if the performance of genotypes is perfectly correlated in the two environments, i.e. $r_{jj'} = 1$. In contrast, the distance approaches $D_{jj'} = 2$ if $r_{jj'} = 0$. The distance approaches a maximum of $D_{jj'} = 4$ when COIs occur and $r_{jj'}$ approaches $-1$.

As described in Chapter 5, several methods are available for joining individual environments and environment clusters together to form a (new) cluster on the basis of $D_{jj'}$. A common procedure is the average linkage method, also called the unweighted pair-group method using arithmetic averages (UPGMA), in which the distance between two clusters is equal to the average distance between an environment in the first cluster and an environment in the second cluster. The cluster diagram or dendrogram is used to graphically illustrate the groups of environments (Fig. 10.2). The cluster diagram indicates the hierarchical clustering of environments and the average distances at which they are joined. The clusters based on $D_{jj'}$ from METs are often consistent with geographical groupings (Bernardo, 2002). For example, Ouyang et al. (1995) partitioned the 90 counties in Iowa on the basis of the performance of seven maize hybrids grown in a total of 2006 environments. Cluster analysis partitioned the counties into a northern group and a southern group, although two south-eastern Iowa counties were clustered with the northern Iowa group (Fig. 10.2). The north–south groups are consistent with

Fig. 10.2. Cluster analysis of Iowa counties. Adapted from Ouyang et al. (1995); original figure provided
by Rex Bernardo.
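The distance measure of Eqns 10.2 and 10.3 and the average linkage (UPGMA) rule can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names and the yield values are hypothetical, the standard deviation is taken as the sample standard deviation (divisor n - 1), and n in Eqn 10.3 is taken to be the number of genotypes, which is the reading under which the two equations agree for balanced data.

```python
from itertools import combinations
from statistics import mean, stdev


def env_distance(perf_j, perf_k):
    """Eqn 10.2: mean squared difference between standardized performances.

    perf_j and perf_k map genotype name -> performance in environments j and j'.
    Means and standard deviations use all genotypes grown in each environment;
    the sum runs only over the genotypes grown in both.
    """
    common = sorted(set(perf_j) & set(perf_k))
    if not common:
        raise ValueError("the two environments share no genotypes")
    m_j, s_j = mean(perf_j.values()), stdev(perf_j.values())
    m_k, s_k = mean(perf_k.values()), stdev(perf_k.values())
    squared = [
        ((perf_j[g] - m_j) / s_j - (perf_k[g] - m_k) / s_k) ** 2 for g in common
    ]
    return sum(squared) / len(common)


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def env_distance_balanced(perf_j, perf_k):
    """Eqn 10.3: valid only when both environments hold the same genotypes."""
    genotypes = sorted(perf_j)
    n = len(genotypes)  # assumption: n is the number of genotypes
    r = correlation([perf_j[g] for g in genotypes], [perf_k[g] for g in genotypes])
    return 2 * (1 - 1 / n) * (1 - r)


def upgma(performance):
    """Average-linkage clustering of environments.

    performance maps environment name -> {genotype: performance}.
    Returns the merges in order as (cluster_a, cluster_b, average_distance).
    """
    d = {
        frozenset(pair): env_distance(performance[pair[0]], performance[pair[1]])
        for pair in combinations(performance, 2)
    }

    def average(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        return sum(d[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in performance]
    merges = []
    while len(clusters) > 1:
        dist, c1, c2 = min(
            ((average(a, b), a, b) for a, b in combinations(clusters, 2)),
            key=lambda item: item[0],
        )
        merges.append((c1, c2, dist))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical yields of four hybrids in four environments (not the Iowa data).
trials = {
    "north1": {"H1": 8.0, "H2": 7.0, "H3": 6.0, "H4": 5.0},
    "north2": {"H1": 9.0, "H2": 8.5, "H3": 7.0, "H4": 6.5},
    "south1": {"H1": 5.0, "H2": 6.0, "H3": 7.5, "H4": 8.0},
    "south2": {"H1": 6.0, "H2": 6.5, "H3": 8.0, "H4": 9.0},
}
merges = upgma(trials)
```

With these made-up data the two northern and the two southern environments are joined first, and the two resulting clusters are joined last at a much larger average distance, because the hybrid rankings are reversed between north and south. The list of merges with their distances holds the information that a dendrogram displays.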

differences in days to maturity between the high latitudes and the low latitudes. The Iowa counties were further subdivided into a south-eastern cluster, a south-western cluster, a northern cluster and a central cluster.

In the which-won-where view of the genotype main effect (G) plus genotype-by-environment interaction (GGE) bi-plot (Fig. 10.3) based on the data in Table 10.1, the nine environments fell into two sectors with different winning cultivars. Specifically, G18 was the highest yielding cultivar in E5 and E7 (but only slightly higher than several other cultivars with markers in close proximity to G18) and G8 was the highest yielding cultivar in the other environments. This crossover GE suggests that the target environments may be divided into different mega-environments.

The effectiveness of a cultivar evaluation system largely depends on the genetic correlation between genotype performance in METs and in the target population of environments (TPE). Plant breeders have favoured classifications based on the similarity of cultivar discrimination in trials. However, these efforts frequently fail to provide adequate assessments of the TPE, since they require long-term performance data, which are not normally collected due to high cost. To describe the TPE, Löffler et al. (2005) performed crop simulations for each US Corn Belt township for the period 1952–2002, using standard Crop Environment Resource Synthesis (CERES)-Maize model inputs. To classify METs, input data were collected at or near the trial sites. Grain yield and biotic stress data for model confirmation were collected from 18 hybrids grown in replicated trials in 266 environments in 2000–2002. On the basis

[Figure 10.3: GGE bi-plot of genotypes G1–G18 and environments E1–E9; x-axis PC1 (58.9%), y-axis PC2 (19.1%), sum 78%. Settings: Transform = 0, Scaling = 0, Centering = 2, SVP = 2.]

Fig. 10.3. The which-won-where view of the GGE bi-plot based on the G × E data in Table 10.1. The data were not transformed (Transform = 0), not scaled (Scaling = 0), and were environment-centred (Centering = 2). The bi-plot was based on environment-focused singular value partitioning (SVP = 2) and therefore is appropriate for visualizing the relationships among environments. It explained 78% of the total G + GE. The genotypes are labelled as G1–G18 and the environments are labelled as E1–E9. From Yan et al. (2007) with permission. PC, principal component.
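The which-won-where pattern that the bi-plot displays can also be tabulated directly from a genotype-by-environment table. The following Python sketch does only that and does not construct a bi-plot, which would additionally need a singular value decomposition of the environment-centred table. The function names and the yield values are hypothetical and are not the Table 10.1 data.

```python
from statistics import mean


def environment_centre(yields):
    """Subtract each environment's mean, leaving G + GE (Centering = 2)."""
    centred = {}
    for env, by_genotype in yields.items():
        env_mean = mean(by_genotype.values())
        centred[env] = {g: y - env_mean for g, y in by_genotype.items()}
    return centred


def which_won_where(yields):
    """Group environments by their highest-yielding genotype."""
    centred = environment_centre(yields)
    winners = {}
    for env, by_genotype in centred.items():
        best = max(by_genotype, key=by_genotype.get)
        winners.setdefault(best, []).append(env)
    return winners


# Hypothetical yields (t/ha) of three genotypes in three environments.
yields = {
    "E1": {"G1": 4.1, "G2": 5.3, "G3": 4.8},
    "E2": {"G1": 3.9, "G2": 5.0, "G3": 4.6},
    "E3": {"G1": 6.2, "G2": 5.1, "G3": 5.9},
}
sectors = which_won_where(yields)
```

Here G2 wins in E1 and E2 while G1 wins in E3, a crossover of the kind that suggests more than one mega-environment. As noted in the text, such a grouping only justifies mega-environment division if the pattern is repeatable across years.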

of prevailing conditions during key growth stages and observed patterns of GEI, six major environment classes (EC) were identified. The relative frequency of each EC varied greatly from year to year and significant hybrid × EC interaction variance was observed. This environmental classification system provided a useful description of some of the features of both the TPE and the MET. Knowledge of the spatial (locations) and temporal (years) distributions of ECs that influence the incidence of GEI can be used to improve cultivar performance predictability in the US Corn Belt TPE.

Subdivision of a crop's growing region into several mega-environments could be avoided if genotypes could be found with yield superiority throughout the region, that is, if cultivars bred in favourable environments would also perform best in different or unfavourable environments. However, one can hardly expect a single cultivar or a hybrid to flourish the world over, under all environments and management practices. A cultivar planted outside its mega-environment frequently suffers yield reductions. Furthermore, even if the breeding goal is wide adaptation (rather than mega-environment directed breeding), it would still be the best strategy to identify several mega-environments and place a test location in each to select for wide adaptation. It has been a normal practice that multinational breeding companies have their programmes established to target specific eco-geographic regions.

10.2.2 GIS and environment characterization

Modern plant breeding programmes increasingly use information from different sources, including geographic information provided by GIS (http://www.gis.com). GIS integrates hardware, software and data for capturing, managing, analysing and displaying all forms of geographically referenced information. GIS allows us to view, understand, question, interpret and visualize data in many ways that reveal relationships, patterns and trends in the form of maps, globes, reports and charts. A GIS can be viewed in three ways. The Database View: a GIS is a unique type of database of the world, a geographic database (geodatabase). It is an information system for geography. Fundamentally, a GIS is based on a structured database that describes the world in geographic terms. The Map View: a GIS is a set of intelligent maps and other views that show features and feature relationships on the earth's surface. Maps of the underlying geographic information can be constructed and used as windows into the database to support queries, analysis and editing of the information. The Model View: a GIS is a set of information transformation tools that derive new geographic data sets from existing data sets. These geo-processing functions take information from existing data sets, apply analytic functions and write results into new derived data sets. To utilize GIS data more effectively, several software packages have been developed. ESRI software offers scalable solutions for researchers at National Agricultural Research Services (NARS), universities and international research centres. From field-based products like ArcPad to the server level Spatial Database Engine (ArcSDE), data can be collected and managed. The Internet Map Server (ArcIMS) allows research sites separated by great geographical distances to be connected in real time and ArcGIS provides all of the necessary tools to analyse the spatial components of agricultural data sets.

There is a growing need to classify production environments by combining biophysical criteria with socio-economic factors. Geospatial technologies, especially GIS, are playing a role in each of these areas and spatial analysis provides unique insights. Use of GIS to characterize wheat production environments is described by Hodson and White (2007) by drawing from examples at the International Maize and Wheat Improvement Center (CIMMYT). Since the 1980s, the CIMMYT wheat programme has classified production regions into mega-environments based on climatic, edaphic and biotic constraints. Advances in spatially disaggregated data sets and GIS tools allow mega-environments to be
characterized and mapped in a much more quantitative manner. The combination of improved crop distribution data and key biophysical data at high spatial resolutions also permits exploring scenarios for disease epidemics, as illustrated for the stem rust race Ug99. Availability of spatial data describing future climate conditions may provide insights into potential changes in wheat production environments in the coming decades. Increased availability of near real-time daily weather data derived from remote sensing should further improve characterization of environments, as well as permit regional-scale modelling of dynamic processes such as disease progression or crop water status. Below are some examples where plant breeding research is benefiting from implementing a spatial aspect to environment characterization.

The first example is to use GIS parameters in grouping sites to ensure that breeders choose as many variable sites as possible to represent the target region. The present mega-environments in the Southern African Development Community (SADC) countries are confounded within each country, which limits the exchange of germplasm among them. A study was undertaken to revise and group similar maize-testing sites across the SADC countries that are not confounded within each country (Setimela et al., 2005). The study was based on 3 years (1999–2001) of regional maize yield trial data and GIS parameters from 94 sites. Sequential retrospective (Seqret) pattern analysis methodology was used to stratify testing sites and group them according to their similarity and dissimilarity based on mean grain yield. The methodology used historical data, taking into account imbalances of data caused by changes over locations and years, such as additions and omissions of genotypes and locations. Cluster analysis grouped regional trial sites into seven mega-environments, mainly distinguished by GIS parameters related to rainfall, temperature, soil pH and soil nitrogen with an overall R² = 0.70. This analysis can reveal challenges and opportunities to develop and deploy maize germplasm in the SADC region faster and more effectively.

The second example is to use GIS parameters to determine the Striga-prone areas in Africa. Striga is an obligate parasitic weed that attacks cereal crops in sub-Saharan Africa. In western Kenya, it has been identified by farmers as their major pest problem in maize. A new technology, consisting of coating seed of imidazolinone-resistant (IR) maize cultivars with the imidazolinone herbicide, imazapyr, has proven to be very effective in controlling Striga on farmers' fields. To help extension agents and seed companies to develop appropriate strategies, the potential for this technology was analysed by combining different data sources into a GIS (De Groote et al., 2008). Superimposing secondary data, field surveys, agricultural statistics and farmer surveys made it possible to clearly identify the Striga-prone areas in western Kenya. By extrapolation over the maize area in the zone, total potential demand for IR-maize seed is estimated at 2000–2700 t year⁻¹. Similar calculations, but based on much less precise data and expert opinion rather than farmer surveys or trials, give an estimate of the potential demand for IR-maize seed in Africa as 153,000 t year⁻¹.

The third example is to classify maize growing environments based on drought-related parameters. GEIs in southern African maize growing environments result from factors related to maximum temperature, season rainfall, season length, within-season drought, subsoil pH and socio-economic factors that result in sub-optimal input application. The difficulty of choosing appropriate selection environments has restricted breeding progress for abiotic stress tolerance in highly variable target environments. Bänziger et al. (2006) applied cluster analysis to the most prominent GEIs and grouped trial sites into eight mega-environments mainly distinguished by season rainfall, maximum temperature, subsoil pH and N application. GIS information available for season rainfall, maximum temperature and subsoil pH (Hodson et al., 2002) was used to map maize mega-environments (Table 10.4; Fig. 10.4). Classification by maximum temperature distinguished different elevations: mega-environments A–E corresponding to the mid-altitudes; mega-environments F

Table 10.4. Characteristics of maize mega-environments in southern Africa as identified through sequential retrospective pattern analysis of multi-environment trials (reprinted from Bänziger et al. (2006) with permission from Elsevier).

Maize mega-    Maximum             Season                Subsoil pH    Area in southern    Area in southern
environment    temperature (°C)    precipitation (mm)    (water)       Africa (10³ ha)     Africa (%)

A              24–27               > 700                 < 5.7         46,282              18.2
B              24–27               > 700                 > 5.7         28,826              11.4
C              24–30               < 700                               48,291              19.0
D              27–30               > 700                 < 5.7         17,166               6.8
E              27–30               > 700                 > 5.7         49,589              19.6
F              > 30                > 700                               17,146               6.8
G              > 30                < 700                               38,403              15.1
H              < 24                                                     7,897               3.1

[Figure 10.4: map with legend entries for mega-environments A–H.]

Fig. 10.4. Maize mega-environments in southern Africa delineated by combinations of maximum temperature, season precipitation and subsoil pH. Table 10.4 gives details of the eight environments A–H. White areas with rainfall < 400 mm were excluded from the analysis. Squares indicate trial sites used for defining mega-environments. Climatic and edaphic data were from Hodson et al. (2002). Reprinted from Bänziger et al. (2006) with permission from Elsevier.
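The thresholds in Table 10.4 amount to a small decision rule, which the following Python sketch writes out. It is an illustration rather than the procedure of the cited study: the function name is hypothetical, and the table does not say how values falling exactly on a boundary (24, 27 or 30 °C, 700 mm, pH 5.7) are treated, so the handling of those cases below is an assumption.

```python
def classify_mega_environment(max_temp_c, season_precip_mm, subsoil_ph=None):
    """Return the Table 10.4 mega-environment letter, or None if undetermined.

    Boundary handling is an assumption: values exactly on a threshold are
    assigned to the warmer, drier or less acid class.
    """
    if season_precip_mm < 400:
        return None  # Fig. 10.4: areas with rainfall < 400 mm were excluded
    if max_temp_c < 24:
        return "H"  # highlands
    if max_temp_c > 30:
        return "F" if season_precip_mm > 700 else "G"  # lowlands
    # Mid-altitudes: maximum temperature of 24 to 30 degrees C.
    if season_precip_mm <= 700:
        return "C"
    if subsoil_ph is None:
        return None  # A/B and D/E cannot be separated without subsoil pH
    acid = subsoil_ph < 5.7
    if max_temp_c < 27:
        return "A" if acid else "B"
    return "D" if acid else "E"


# Hypothetical site: 28 degrees C, 820 mm season precipitation, subsoil pH 5.2.
example = classify_mega_environment(28.0, 820.0, 5.2)
```

The hypothetical site above falls in mega-environment D. In practice the three inputs would be read from GIS layers for each grid cell or trial site.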

and G to the lowlands; and mega-environment H to the highlands. However, they also seem to be related to disease incidence, with leaf diseases such as Cercospora zeae-maydis, Puccinia sorghi and Exserohilum turcicum the most prevalent in mega-environments A and B, downy mildews occurring in mega-environments F and G and Puccinia polysora and Helminthosporium maydis likely occurring mostly in mega-environment F. Most trials with suboptimal N application clustered with trials in mega-environments D and E, which may be indicative of less fertile soil types in those areas or may simply be coincidental.

10.2.3 Selection of locations for testing

The purpose of test-environment evaluation is to identify test environments that effectively identify superior genotypes for
a mega-environment. An ideal test environment should be both discriminating of the genotypes and representative of the mega-environment. The selection of locations for the evaluation of a quantitative character involves a number of considerations. Locations generally are chosen to represent the area where a new cultivar is to be grown commercially. The cost of transporting machinery and personnel may influence the distance of a location from the main research centre, when the testing is largely based on a mechanized system. The availability of suitable land may be a factor when the size of the test area is large. The test environments should be evaluated for being, or not being, representative of the target environment and for their power to discriminate among genotypes.

A primary consideration in site selection is the diversity of environments that can be obtained within a year. This is particularly important when widely adapted cultivars are desired. A breeder will attempt to test at locations that have environments as diverse as those that would be encountered at one location in 2 or more years (Fehr, 1987). Selection of locations for testing can be based on analysis of variance (ANOVA), correlation and cluster analyses, like those used for selection and evaluation of genotypes tested in METs, as will be discussed in the next section.

Developing specific cultivars for each subregion of a target region, instead of widely adapted cultivars, may exploit positive genotype-by-location (GL) interactions to increase crop yields. With reference to the Algerian durum wheat (Triticum durum Desf.) region, Annicchiarico et al. (2005) performed a study aimed at: (i) comparing AMMI versus joint regression modelling of GL effects; (ii) verifying the reliability of a GIS-based definition of two subregions that extended the site classification on the basis of GL effects as a function of long-term winter mean temperature; and (iii) comparing wide versus specific adaptation in terms of observed and predicted yield gains. Twenty-four cultivars from international centres in Europe and North Africa were evaluated across 3 years in a total of 47 environments by randomized complete block designs with four replications per trial. Results indicated that the AMMI + cluster analysis and pattern analysis classified test locations consistently and in good agreement with the GIS-based subregion definition. Under the hypothesis of six selection environments assigned to subregions in proportion to their size (three sites in each of 2 years) for late stage selection, specific adaptation provided 27% greater gains than wide adaptation over the region at similar costs. The advantage of specific adaptation was much larger (39% determined on the basis of observed gains) for the smaller, stressful inland subregion, where specific adaptation may also enhance food security.

Repeatable GL interaction revealed in METs can be exploited by site-specific cultivar recommendations. There is uncertainty, however, on methods for defining recommendations and extending results to non-tested locations. With reference to durum wheat in Algeria, Annicchiarico et al. (2006) compared methods for defining the best pair of cultivars for local recommendation based on: (i) observed data; (ii) joint regression-modelled data; (iii) AMMI-modelled data; (iv) factorial regression-modelled data; (v) AMMI modelling interfaced with a GIS; and (vi) factorial regression modelling interfaced with a GIS. The last two methods extended the recommendations to all sites in a GIS as a function of long-term climatic data. GIS-based recommendations implied a slight yield decrease relative to those based on conventional modelling. However, they allowed for about 9% higher yields than those of most-grown cultivars, while enlarging the scope for site-specific recommendations and assisting national seed production and distribution systems.

10.3 Stability of Genotype Performance

In general, there are two concepts of stability for genotype performance, static and
Genotype-by-environment Interaction 395

dynamic. Static stability, also referred to as plants over different environments (Allard
the biological concept of stability, implies that and Bradshaw, 1964; Briggs and Knowles,
a genotype has a stable performance across 1967). It has been shown that heterozygous
environments with no among-environment individuals, such as F1 hybrids, are more
variance, i.e. a genotype is non-responsive stable than their homozygous parents.
to increased levels of inputs. Dynamic sta- The stability of heterozygous individu-
bility implies that a genotypes performance als seems to be related to their ability to
is stable, but for each environment, its per- perform better under stress conditions than
formance corresponds to the estimated or homozygous plants. The terms genetic home-
predicted level, which is also referred to as ostasis and population buffering were used
the agronomic concept of stability. Lin et al. to describe the stability of a group of plants
(1986) classified statistical methods for sta- that exceeds that of its individual mem-
bility analysis into four groups: bers (Lerner, 1954; Allard and Bradshaw,
1964). Heterogeneous cultivars generally
Group A: based on deviation from aver-
age genotype effect (DE) represents sums of squares;
Group B: based on GEI represents sums of squares;
Group C: based on either DE or GEI represents regression coefficient against environment mean; and
Group D: based on either DE or GEI represents deviations from regression.

In Group A (Type 1 stability), which is equivalent to biological stability, a genotype is regarded as stable if its among-environment variance is small. In Groups B and C (Type 2 stability), which is equivalent to agronomic stability, a genotype is regarded as stable if its response to environments is parallel to the mean response of all genotypes in a test. In Group D (Type 3 stability), a genotype is regarded as stable if the residual mean square following regression of genotype performance or yield on environmental index is small. Lin and Binns (1988) proposed a Type 4 stability on the basis of predictable and unpredictable non-genetic variation. They suggested the use of a regression approach for the predictable portion. The mean square for years-within-locations for each genotype as a measure of the unpredictable variation was referred to as Type 4 stability.

The stability of cultivar performance across environments is influenced by the genotype of individual plants and the genetic structure of the plants. The terms homeostasis and individual buffering have been used to describe the stability in performance of individual plants or groups of

have higher stability than homogeneous cultivars.

A number of statistical procedures have been developed to enhance our understanding of GEI and to select genotypes that perform consistently well across many environments. The earliest approach was the linear regression analysis. Finlay and Wilkinson (1963), Eberhart and Russell (1966) and Tai (1971) popularized variations of the regression approach, assuming an expected linear response of yield to environments. Other statistical methods that have received significant attention are pattern analysis (DeLacy et al., 1996), the AMMI model (Gauch and Zobel, 1996), the shifted multiplicative model (SHMM) (Cornelius et al., 1996; Crossa et al., 1996), linear-bilinear and mixed models (Crossa et al., 2004) and the non-parametric methods of Hühn (1996). The methods of Hühn (1996) and Kang (1988, 1993) integrate yield and stability into one statistic that can be used as a selection criterion. Flores et al. (1998) and Hussein et al. (2000) conducted comparative evaluation of 22 and 15 stability statistics/methods, respectively. Flores et al. (1998) classified 22 univariate and multivariate methods into three main groups. Group 1 statistics are mostly associated with yield level and show little or no correlation with stability parameters. In Group 2, both yield and stability of performance are considered simultaneously to reduce the effect of GEI. Group 3 statistics emphasize only stability. Recently, mixed model approaches have become increasingly important in GEI and stability analyses.
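The variance-based and regression-based stability types described above can be illustrated with a small numerical sketch. The Python (NumPy) example below is only an illustration under stated assumptions: the yield matrix is made up, the function name `regression_stability` is hypothetical, the environmental index is taken as the environment mean minus the grand mean, and the deviation mean square is a simplified version that does not subtract a pooled error term. For each genotype it returns the among-environment variance (Type 1), the regression coefficient on the environmental index (Type 2) and the mean square of deviations from regression (Type 3).

```python
import numpy as np

def regression_stability(Y):
    """Y: genotypes x environments matrix of mean yields (hypothetical layout)."""
    Y = np.asarray(Y, dtype=float)
    n_env = Y.shape[1]
    # Environmental index: environment mean minus grand mean (sums to zero).
    env_index = Y.mean(axis=0) - Y.mean()
    # Type 1: among-environment variance of each genotype.
    type1_var = Y.var(axis=1, ddof=1)
    # Type 2: slope of each genotype's yield on the environmental index.
    b = (Y @ env_index) / (env_index @ env_index)
    # Type 3: mean square of deviations from each genotype's own regression line.
    fitted = Y.mean(axis=1, keepdims=True) + np.outer(b, env_index)
    dev_ms = ((Y - fitted) ** 2).sum(axis=1) / (n_env - 2)
    return type1_var, b, dev_ms

# Made-up yields (t/ha) for 3 genotypes in 4 environments.
yields = np.array([
    [4.0, 5.0, 6.0, 7.0],
    [4.5, 5.0, 5.5, 6.0],
    [3.0, 5.5, 6.0, 8.5],
])
type1_var, b, dev_ms = regression_stability(yields)
for i in range(yields.shape[0]):
    print(f"genotype {i + 1}: variance={type1_var[i]:.3f}, "
          f"slope={b[i]:.3f}, deviation MS={dev_ms[i]:.3f}")
```

In this sketch the slopes average to one by construction, so a slope near one indicates a response parallel to the mean response of all genotypes, while a small variance or a small deviation mean square corresponds to the Type 1 and Type 3 notions of stability, respectively.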

10.3.1 Linear-bilinear models for studying GEI

Statistical methods for detecting and quantifying COI and for forming subsets of environments and/or genotypes with negligible COI have been based on fixed effect linear-bilinear models. Several classes of these models have been developed, some of which are widely used. In this section, linear-bilinear model development will be discussed, mainly based on Crossa et al.'s (2005) review.

An early approach towards the analyses of GEI included the conventional fixed effect two-way (FE2W) ANOVA model with the sum to zero constraints running over indices as shown in Eqn 10.1. Yates and Cochran (1938) proposed to relate the GEI term in Eqn 10.1 linearly to the environmental main effect, that is, $(\tau\delta)_{ij} = \xi_i\delta_j + d_{ij}$, where $\xi_i$ is the linear regression coefficient of the ith genotype on the environmental mean and $d_{ij}$ is a deviation. This approach was later used by Finlay and Wilkinson (1963) and modified by Eberhart and Russell (1966). Williams (1952) linked the FE2W model with principal component analysis (PCA) by considering the model $y_{ij} = \mu + \tau_i + \lambda\alpha_i\gamma_j + \varepsilon_{ij}$, where $\lambda$ is the largest singular value of $ZZ'$ and $Z'Z$ (for $Z = y_{ij} - \bar{y}_{i.}$) and $\alpha_i$ and $\gamma_j$ are the corresponding eigenvectors.

Gollob (1968) and Mandel (1969, 1971) extended Williams' (1952) work by considering the bilinear GEI term as $(\tau\delta)_{ij} = \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk}$. Thus, the general formulation for the linear-bilinear model is

$$y_{ij} = \mu + \tau_i + \delta_j + \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij} \tag{10.4}$$

where the constant $\lambda_k$ is the singular value of the kth multiplicative component (kth PCA axis), ordered $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_t$; $\alpha_{ik}$ corresponds to the left singular vector of the kth component and represents genotypic sensitivities to hypothetical environmental factors represented by the right singular vector of the kth component, $\gamma_{jk}$. The $\alpha_{ik}$ and $\gamma_{jk}$ satisfy the ortho-normalization constraints $\sum_i\alpha_{ik}\alpha_{ik'} = \sum_j\gamma_{jk}\gamma_{jk'} = 0$ for $k \ne k'$ and $\sum_i\alpha_{ik}^2 = \sum_j\gamma_{jk}^2 = 1$ for $k = k'$. When Eqn 10.4 is saturated the number of bilinear terms is $t = \min(I - 1, J - 1)$ and for any smaller value, the model is said to be truncated. The interaction parameters $\lambda_k$, $\alpha_{ik}$ and $\gamma_{jk}$ of the GEI subspace are estimated from the data themselves. The linear-bilinear model of Eqn 10.4 is a generalization of the regression on the mean model with more flexibility for describing GEI because more than one genotypic and environmental dimension is considered.

Several classes of linear-bilinear models, described by Cornelius et al. (1996), which are generally derived from Eqn 10.4, are the Genotypes Regression Model (GREG), $y_{ij} = \mu_i + \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij}$; the Sites (environments) Regression Model (SREG), $y_{ij} = \mu_j + \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij}$; the Completely Multiplicative Model (COMM), $y_{ij} = \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij}$; and the Shifted Multiplicative Model (SHMM), $y_{ij} = \beta + \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij}$.

Two linear and bilinear models, SHMM and SREG, have been used for studying GEI and for clustering genotypes or sites into groups with statistically negligible COI (Cornelius et al., 1992, 1993; Crossa and Cornelius, 1997, 2002; Crossa et al., 1993, 1995). Only the SREG model permits the detection of COIs (Bernardo, 2002).

The SREG model has been used for grouping environments without genotypic rank change (Crossa and Cornelius, 1997). The interaction parameters $\alpha_{ik}$ and $\gamma_{jk}$ of these linear-bilinear models define the behaviour of the genotypes and the environments and, when $\alpha_{i1}$, $\alpha_{i2}$ and $\gamma_{j1}$, $\gamma_{j2}$ are plotted together in the bi-plot (Gabriel, 1978), useful interpretations of the relationships between genotypes, environments and GEI are obtained. In the bi-plot, the interaction between the ith genotype and the jth environment is obtained from the projection of either vector on to the other. Crossa et al. (2002) used SREG1 analysis (a reduced SREG model) to examine GEI. The primary effects among the 20 environments ranged from −0.41 to 0.43 and, consequently, the ranking of the nine genotypes differed among environments (Fig. 10.5). The primary effect for a given environment depends on the other environments included in the analysis. A subset of ten environments, in which the resulting primary effects are all positive and COIs were absent, was found (Fig. 10.5).

[Figure 10.5: two panels, '20 environments' and 'Subset of 10 environments', each plotting SREG1 predicted yield (t ha⁻¹) against the primary effect of environments.]

Fig. 10.5. Predicted yield from SREG1 analysis of nine maize genotypes. Data from Crossa et al. (2002) courtesy of R. Bernardo (2002).

Gabriel (1978) described the least squares fit of Eqn 10.4 and explained how the residual matrix of the GEI term, $Z = y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}$, is subjected to a singular value decomposition (SVD) after adjusting the additive (linear) terms. Zobel et al. (1988) and Gauch (1988) called Eqn 10.4 additive main effects and multiplicative interaction (AMMI) and proposed a cross-validation procedure for determining the number of important bilinear components. The AMMI model separates the multiplicative portion of GEI into specific patterns of response of genotypes and environments. In this analysis the information about GEI after taking out the main effects of genotypes and environments from METs is used for PCA to extract patterns of GEI or residual variation to understand the underlying causes of such interactions. It is thus a combination of ANOVA and the PCA.

In AMMI analysis, the least squares estimates of the parameters along with mean values of genotypes and environments are interpreted to classify genotypes and environments for their stability. A bi-plot is developed by placing both genotype and environment means on the x-axis and placing the first PCA scores of the genotypes and environments on the y-axis. This bi-plot can be used to facilitate identification of any pattern of GEI, i.e. specific interactions of individual genotypes and environments based on the sign and magnitude of PCA values especially,

assuming that the first PCA accounts for the most important pattern of the GEI. The bi-plot helps to visualize the relationship between eigenvalues for PCA 1 and genotypic and environment means. Any genotype with a PCA 1 value close to zero shows general adaptation to the tested environments (Fox et al., 1997). However, it should be noted that a genotype that is poor everywhere would have a zero PCA 1 score too. So the PCA 1 value should be used along with the average performance across the tested environments. A large genotypic PCA 1, with a high average performance, reflects more specific adaptation to environments with a PCA 1 score of the same sign. Genotypes and environments with PCA values of the same sign show positive interactions and suggest specific adaptations. The reverse sign of the PCA value of genotypes and environments depicts negative interaction, i.e. poor performance of genotypes in such environments.

10.3.2 GGE bi-plot analysis

As one of the major statistical methods for GED analysis, GGE bi-plot analysis has been developed by Yan et al. (2000), Yan and Kang (2003) and Yan and Tinker (2006). The method is based on the bi-plot originally developed by Gabriel (1971), which is a popular data visualization tool in many scientific research areas, including psychology, medicine, business, sociology, ecology and agricultural sciences. The bi-plot tool has become increasingly popular among plant breeders and agricultural researchers since its use in cultivar evaluation and mega-environment investigation. Yan et al. (2000) referred to bi-plots based on singular value decomposition (SVD) of environment-centred or within-environment standardized GED as GGE bi-plots, because these bi-plots display both G and GE, the two sources of variation that are relevant to cultivar evaluation.

The GGE bi-plot is based on the SREG linear-bilinear (multiplicative) model (Cornelius et al., 1996), which can be written as

$$y_{ij} - \mu_j = \sum_{k=1}^{t}\lambda_k\alpha_{ik}\gamma_{jk} + \varepsilon_{ij} \tag{10.5}$$

The model is subjected to the constraint $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_t \ge 0$ and to ortho-normality constraints on the $\alpha_{ik}$ scores, as indicated for Eqn 10.4.

The least squares solution for $\mu_j$ is the empirical mean ($\bar{y}_{.j}$) for the jth environment and the least squares solutions for parameters in the term $\lambda_k\alpha_{ik}\gamma_{jk}$ (for i = 1, …, I; j = 1, …, J) are obtained from the kth PC of the SVD of the matrix $Z = [z_{ij}]$, where $z_{ij} = y_{ij} - \bar{y}_{.j}$. The maximum number of PCs available for estimating the model parameters is p = Rank(Z). In general, $p \le \min(J, I - 1)$, with equality holding in most cases. For k = 1, 2, 3, …, $\alpha_{ik}$ and $\gamma_{jk}$ have also been characterized as primary, secondary, tertiary, etc., multiplicative effects of the ith cultivar/genotype and jth environment. Thus, Eqn 10.5 may be described as modelling the deviations of the cell means from the environment means as a sum of PCs, each of which is the product of a cultivar score ($\alpha_{ik}$), an environment score ($\gamma_{jk}$) and a scale factor (the singular value, $\lambda_k$).

The GGE bi-plot is constructed from the first two PCs from the SVD of Z with markers, one for each cultivar, plotted with $\lambda_1^{f}\alpha_{i1}$ as abscissa and $\lambda_2^{f}\alpha_{i2}$ as ordinate. Similarly, markers for environments are plotted with $\lambda_1^{1-f}\gamma_{j1}$ as abscissa and $\lambda_2^{1-f}\gamma_{j2}$ as ordinate. The exponent f, with $0 \le f \le 1$, is used to rescale the cultivar and environment scores to enhance visual interpretation of the bi-plot for a particular purpose. Specifically, singular values are allocated entirely to cultivar scores if f = 1 (cultivar-focused scaling), or entirely to environment scores if f = 0 (environment-focused scaling); and f = 0.5 will allocate the square roots of the $\lambda_k$ values to cultivar scores and also to environment scores (symmetric scaling). Mathematically, a GGE bi-plot is a graphical representation of the rank 2 least squares approximation of the rank p matrix Z. This representation is unique except for possible simultaneous sign changes on all $\alpha_{i1}$ and $\gamma_{j1}$ and/or all $\alpha_{i2}$ and $\gamma_{j2}$. An important

property of the bi-plot is that the rank 2 approximation of any entry in the original matrix Z can be computed by taking the inner product of the corresponding genotype and environment vectors, i.e. $(\lambda_1^{f}\alpha_{i1}, \lambda_2^{f}\alpha_{i2})(\lambda_1^{1-f}\gamma_{j1}, \lambda_2^{1-f}\gamma_{j2})' = \lambda_1\alpha_{i1}\gamma_{j1} + \lambda_2\alpha_{i2}\gamma_{j2}$. This is known as the inner-product property of the bi-plot.

The GGE bi-plot methodology consists of a set of bi-plot interpretation methods, whereby important questions regarding genotype evaluation and test-environment evaluation can be visually addressed. Within a single mega-environment, cultivars should be evaluated for their mean performance and stability across environments. Figure 10.6 is the Average Environment Coordination (AEC) view, which is based on genotype-focused singular value partitioning (SVP), that is, the singular values are entirely partitioned into the genotype scores (GGE bi-plot option SVP = 1). This AEC view with SVP = 1 is also referred to as the Mean versus Stability view because it facilitates genotype comparisons based on mean performance and stability across environments within a mega-environment. Since GGE represents G + GEI and since the AEC abscissa approximates the genotypes' contributions to G, the AEC ordinate must approximate the genotypes' contributions to GEI, which is a measure of their stability or instability. Thus, G4 in the figure was the most stable genotype, as it was located almost on the AEC abscissa and had a near-zero projection on to the AEC ordinate. This indicates that its rank was highly consistent across environments within this mega-environment. In contrast, G17 and G6 were two of the least stable genotypes with above average mean performance.

Several recent articles reviewed and compared the two statistical approaches discussed above, AMMI and GGE bi-plot analyses. For their pros and cons, readers are referred to Gauch (2006), Yan et al. (2007) and Gauch et al. (2008).

[Figure 10.6: GGE bi-plot of PC2 against PC1 showing genotypes G1 to G18 and environments E1 to E9. Plot annotations: PC1 = 58.9%, PC2 = 19.1%, Sum = 78%; Transform = 0, Scaling = 0, Centering = 2, SVP = 1.]

Fig. 10.6. The mean versus stability view of the GGE bi-plot based on a subset of the G × E data in Table 10.1. The data were not transformed (Transform = 0), not scaled (Scaling = 0), and were environment-centred (Centering = 2). The bi-plot was based on genotype-focused singular value partitioning (SVP = 1) and therefore is appropriate for visualizing the similarities among genotypes. It explained 79.5% of the total G + GE for the subset. From Yan et al. (2007) with permission.
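The mean versus stability view can also be sketched numerically. The Python (NumPy) example below rests on assumptions that the text does not spell out: the AEC abscissa is taken as the line through the bi-plot origin and an "average environment" marker whose coordinates are the means of the environment PC1 and PC2 scores, the AEC ordinate is the perpendicular through the origin, and the function name `mean_vs_stability` and the data are hypothetical. Genotype markers are projected on to both axes using genotype-focused partitioning (SVP = 1).

```python
import numpy as np

def mean_vs_stability(Y):
    """Project genotype markers of a GGE bi-plot on to the AEC axes (a sketch)."""
    Y = np.asarray(Y, dtype=float)
    Z = Y - Y.mean(axis=0, keepdims=True)      # environment-centred data
    U, lam, Vt = np.linalg.svd(Z, full_matrices=False)
    gen = U[:, :2] * lam[:2]                   # SVP = 1: singular values to genotypes
    env = Vt[:2].T                             # environment scores
    avg_env = env.mean(axis=0)                 # assumed "average environment" marker
    # Note: the direction is undefined if avg_env is the zero vector.
    unit = avg_env / np.linalg.norm(avg_env)   # direction of the AEC abscissa
    perp = np.array([-unit[1], unit[0]])       # direction of the AEC ordinate
    mean_coord = gen @ unit                    # approximates contribution to G
    stab_coord = gen @ perp                    # approximates contribution to GEI
    return mean_coord, stab_coord, gen, env

# Made-up data with known structure: main effects plus one interaction pattern.
g_main = np.array([1.0, 0.5, 0.0, -0.5, -1.0])   # genotype main effects
e_main = np.array([4.0, 5.0, 6.0, 7.0])          # environment main effects
a = np.array([0.0, 1.0, -1.0, 0.5, -0.5])        # genotype interaction scores
bvec = np.array([-1.0, 0.5, 1.0, -0.5])          # environment interaction scores
Y = g_main[:, None] + e_main[None, :] + np.outer(a, bvec)

mean_coord, stab_coord, gen, env = mean_vs_stability(Y)
most_stable = int(np.argmin(np.abs(stab_coord)))  # index of the most stable genotype
```

In this constructed example the first genotype has no interaction component and therefore lies on the AEC abscissa, while the abscissa coordinates are proportional to the genotype main effects. With real data, which are rarely of rank 2, both projections are approximations.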

10.3.3 Mixed model

A useful and statistically efficient approach would be to generate disjoint subsets of environments and genotypes with no significant COI within the linear mixed model framework and to detect COI in that context. Linear mixed models and the factor analytic (FA) VCOV structure offer a more realistic and effective approach for quantifying COI and forming subsets of environments and genotypes without COI.

Based on Crossa et al. (2004), the mixed model fitted to the data from s sites is

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_s \end{bmatrix}
=
\begin{bmatrix} 1\mu_1 \\ 1\mu_2 \\ \vdots \\ 1\mu_s \end{bmatrix}
+
\begin{bmatrix} Z_{R1} & 0 & \cdots & 0 \\ 0 & Z_{R2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Z_{Rs} \end{bmatrix} r
+
\begin{bmatrix} Z_{G1} & 0 & \cdots & 0 \\ 0 & Z_{G2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Z_{Gs} \end{bmatrix} g
+ e \tag{10.6}
$$

where $y_j$ is the vector of the response variable (i.e. grain yield) in the jth site (j = 1, 2, …, s); 1 is a vector of ones; $\mu_j$ is the population mean of the jth site; $Z_{Rj}$ and $Z_{Gj}$ are the incidence matrices of the random effects of replicates and genotypes within the jth site, respectively; r, g, e are the vectors that contain the random effects of replicates within sites, genotypes within sites and residuals within sites, respectively, and are assumed to be random and normally distributed with zero mean vectors and VCOV matrices R, G and E, respectively, such that

$$
\begin{bmatrix} r \\ g \\ e \end{bmatrix}
\sim N\left(
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} R = \mathrm{var}(\mathrm{rep}) & 0 & 0 \\ 0 & G = \mathrm{var}(\mathrm{genotype}) & 0 \\ 0 & 0 & E = \mathrm{var}(\mathrm{error}) \end{bmatrix}
\right) \tag{10.7}
$$

R and E are assumed to have the simple variance component structure:

$$R = \mathrm{var}(\mathrm{rep}) = \Sigma_{rep} \otimes I_r = [\mathrm{diag}(\sigma_{rj}^2,\ j = 1, 2, \ldots, s)] \otimes I_r \tag{10.8}$$

and

$$E = \mathrm{var}(\mathrm{error}) = \Sigma_{error} \otimes I_{rg} = [\mathrm{diag}(\sigma_{ej}^2,\ j = 1, 2, \ldots, s)] \otimes I_{rg} \tag{10.9}$$

where r is the number of replicates, g is the number of genotypes and $I_r$ and $I_{rg}$ are the identity matrices of orders r and r × g, respectively; $\Sigma_{rep} = \mathrm{diag}(\sigma_{rj}^2,\ j = 1, 2, \ldots, s)$ and $\Sigma_{error} = \mathrm{diag}(\sigma_{ej}^2,\ j = 1, 2, \ldots, s)$ are the s × s replicate and error VCOV matrices among pairs of s sites, respectively; $\sigma_{rj}^2$ and $\sigma_{ej}^2$ are the replicate and residual variances within the jth site, respectively, and $\otimes$ is the Kronecker (or direct) product of the two matrices.

The VCOV matrix G can be represented as

$$
G = \mathrm{var}(\mathrm{genotype}) = \Sigma_g \otimes I_g =
\begin{bmatrix}
\sigma_{g1}^2 & \rho_{12}\sigma_{g1}\sigma_{g2} & \cdots & \rho_{1s}\sigma_{g1}\sigma_{gs} \\
\rho_{12}\sigma_{g1}\sigma_{g2} & \sigma_{g2}^2 & \cdots & \vdots \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{s1}\sigma_{gs}\sigma_{g1} & \cdots & \cdots & \sigma_{gs}^2
\end{bmatrix} \otimes I_g \tag{10.10}
$$

where

the jth diagonal element of the s × s matrix $\Sigma_g$ is the genetic variance $\sigma_{gj}^2$ within the jth site and the ijth element is the genetic covariance $\rho_{ij}\sigma_{gi}\sigma_{gj}$ of genotypic effects in sites i and j; thus $\rho_{ij}$ is the correlation of genotypic effects in sites i and j. The hypothesis of interest is that the $\rho_{ij}$ for pairs of sites within subsets of sites (or genotypes) previously identified as non-COI subsets by SHMM or SREG clustering are all unity (Crossa et al., 2004).

The approach proposed by Crossa et al. (2004) is a step in the right direction for incorporating the linear mixed model methodology into the quantification of COI. However, detection of COI using SREG (and SHMM) has generally been done within the fixed effect linear-bilinear framework (Cornelius and Seyedsadr, 1997), that is, the differences between any two genotypic effects in any two environments are linear functions of least squares solutions for model parameters regarded as fixed. Recently, Yang (2007) recognized that in statistical analyses of METs, either genotypes or environments, or both, should be considered as random effects and, therefore, the detection of COI must consider that the difference between genotypic effects in a random environment is a predictable function that involves Best Linear Unbiased Estimators (BLUEs) as well as Best Linear Unbiased Predictions (BLUPs).

As a development of Crossa et al.'s (2004) approach, Burgueño et al. (2008) presented an integrated methodology for clustering environments and genotypes with negligible COI based on results obtained from fitting FA to MET data, which was used to detect COI using predictable functions based on the linear mixed model with FA and BLUP of genotypes.

For the genotype factor, the identity matrix $I_g$ (of order g), described above, is used when it is assumed that the genotypes are not related and the breeding value of each genotype will be predicted only by the value of the empirical responses of the genotype itself. The genotypic component of G can be modelled by the identity matrix $I_g$, which assumes no relationship among genotypes.

The genetic environmental component ($\Sigma_g$) of the variance of the random effect vector, g, can be modelled by the FA, which expresses the random effect of the ith genotype in the jth environment as a linear function of latent variables $x_{ik}$ with coefficients $\delta_{jk}$ for k = 1, 2, …, t, plus a residual $\eta_{ij}$. Then

$$G = (\Delta\Delta' + \Psi) \otimes I_g = FA(k) \otimes I_g \tag{10.11}$$

where

$$
\Delta = \begin{bmatrix}
\delta_{11} & \delta_{12} & \cdots & \delta_{1t} \\
\delta_{21} & \delta_{22} & \cdots & \delta_{2t} \\
\vdots & \vdots & \ddots & \vdots \\
\delta_{s1} & \delta_{s2} & \cdots & \delta_{st}
\end{bmatrix}
$$

is a matrix of order s × t with the kth column containing the environment loadings for the kth latent factor (k = 1, …, t). The FA model can be interpreted as the linear regression of genotype and GEI on latent environmental covariates (environmental loadings, $\delta_{jk}$), with each genotype having a separate slope (genotypic scores, $x_{ik}$) but a common intercept (if main effects of genotypes are not distinguished from GEI). The slopes of genotypes measure the sensitivity of the genotypes to hypothetical environmental factors represented by the loadings of each environment.

As an example, two CIMMYT maize international METs were used to illustrate the method for searching for subsets of environments and genotypes with negligible COI. Results from both data sets showed that the proposed method formed subsets of environments and/or genotypes with negligible COI. The main advantage of the integrated approach is that one unique linear mixed model, the FA model, can be used for: (i) modelling the association among environments; (ii) forming subsets of environments without COI; (iii) grouping genotypes into non-COI subsets; and (iv) detecting COI using the appropriate predictable function.

The multivariate approaches help to identify patterns of variation for genotypes on the basis of their multi-dimensional response to many environments. Such grouping, however, does not represent any specific type of stability index, but it can be used to draw meaningful conclusions in relation to the response of a cultivar that has been known for its stability. All the new genotypes grouped along with a well-known cultivar or falling near it are considered to have the same type of stability, since groups of genotypes that are known for their stability have been quite stable over years. Along with these known cultivars, a large number of new lines can be screened for their stability from 1-year experiments.

In general the multivariate methods do not provide a simple measure of stability for a specific genotype which could be used as a trait in breeding programmes. A detailed description of all such techniques is beyond the scope of this book and the reader is encouraged to consult Freeman (1973), Kang (1990) and Fox et al. (1997). Commercial statistical software packages such as SAS can be used for different models described in this section. For example, mixed models can be fitted using SAS PROC MIXED.

10.4 Molecular Dissection of GEI

Recent advances in molecular biology have provided some of the best tools for obtaining insights into the molecular mechanisms associated with GEI. Molecular markers can be employed to find genomic regions with stable responses. Marker-assisted QTL-by-environment interaction (QEI) analysis will ultimately provide a better genetic understanding and possible regulation of this phenomenon. Regions of plant genomes that provide stable responses across diverse environments can be identified. Experimental strategies have been proposed for resolving environmental factors into several components that affect specific quantitative traits so their effects can be either estimated or controlled (Xu, Y., 2002). Various statistical methods have been developed for mapping quantitative trait loci (QTL) involved in GEIs. In this section, genetic models involving GEI and molecular dissection of GEI will be discussed.

10.4.1 Partition of environmental factors

METs involve various environmental factors and some of them can be partitioned into several key components. Environment partition can be used to understand the effect of each environmental component, the response of a genotype to specific environmental factors and the genetic control of environment-dependent traits such as temperature or photoperiod-induced male sterility.

Genetic analysis in general involves extracting a genetic signal from many sources of noise, such as those from external environments and internal genetic backgrounds. For accurate genetic analysis, the noise must be minimized or eliminated. Controlled environments or genetic backgrounds are usually created for filtering the noise. In Chapter 4, we described the development of a set of individuals such as near-isogenic lines (NILs) that have homogeneous genetic background. Similarly, Xu, Y. (2002) proposed the concept of near-iso environments (NIEs). Plant populations used for genetic analysis can be evaluated in either natural or controlled environments or both. Controlled environments can be compared with each other or with natural environments. If two environments mainly differ in one macro-environmental factor, they are considered contrasting or NIEs, if the standard plot-to-plot variation and other residual micro-environmental effects can be neglected. A relative trait value is then derived from two direct trait values measured in each environment to ascertain the sensitivity of plants to the stress (see for example Ni et al., 1998).

Xu, Y. (2002) provided an example of how rice plants respond to photoperiod and temperature. Using Zhaiyeqing 8/Jingxi 17

doubled haploids (DHs), days-to-heading (DTH) and photo-thermo sensitivity (PTS) were measured in two environments (Beijing and Hangzhou) that mainly differ in day-length and temperature. At the photo-thermo sensitive stage, Beijing has long day-length (14.5–15 h) and low temperature (20–27°C), whereas Hangzhou has short day-length (13–13.5 h) and high temperature (25.5–30°C). Rice is considered a short-day plant and development from vegetative to reproductive stages is promoted under short day-length and high-temperature conditions. Differences in photoperiod and temperature in the two locations resulted in differences in DTH of 0–39 days for individual DH lines (Fig. 10.7). Using the relative difference, (DTH in Beijing − DTH in Hangzhou)/DTH in Beijing × 100, genes associated with PTS were mapped with 155 restriction fragment length polymorphism (RFLP) and 92 simple sequence repeat (SSR) markers. Four chromosomal regions were identified significantly associated with DTH in either or both locations, whereas logarithm of odds (LOD) scores for the PTS in these regions were much lower than the threshold. A region on chromosome 7 (G397A–RM248) was significantly associated with PTS (LOD = 4.47), where LOD scores for DTH in both locations were much lower than the threshold (Fig. 10.7), indicating that this PTS QTL is independent of the QTL for heading date. As the rice breeding programme has been accelerated by growing rice in an off-season or an off-location where it is not a targeted environment, marker-assisted selection (MAS) for these types of traits would be important as they can only be identified under NIEs.

[Figure 10.7. Left panels: number of DH lines against days-to-heading (DTH, axis from 65 to 119) for Beijing (BJ) and Hangzhou (HZ). Top right panel: number of DH lines against photo-thermo sensitivity (PTS, axis from 2 to 34). Bottom right panel: LODs for DTH and PTS, tabulated below.]

LODs for DTH and PTS
Chr   Marker interval   DTH(BJ)   DTH(HZ)   PTS
1     RG400–RM84        2.41*     1.68      0.87
7     G379A–RM248       1.21      0.56      4.47*
8     RG885–RM44        7.35*     6.56*     1.07
10    C16–RM228         2.67*     3.04*     0.41
12    RG463–RG323       2.30      2.55*     0.82

Fig. 10.7. QTL mapping for photo-thermo sensitivity (PTS) in rice under two environments (Beijing and Hangzhou). Left: Days-to-heading (DTH) distribution in Zhaiyeqing 8/Jingxi 17 DH population planted in Beijing and Hangzhou. Top right: PTS distribution in the population when PTS was measured by the difference of DTHs in the two environments divided by the DTH in Beijing. Bottom right: QTL identified for DTH in Beijing and Hangzhou and for PTS (*LOD > 2.4). Modified from Xu, Y. (2002). Chr, chromosome.

A second example in rice is from CO39/Moroberekan recombinant inbred lines (RILs), grown under greenhouse conditions and exposed to two different photoperiod regimes (Maheswaran et al., 2000). Days-to-flowering (DTF) of individual lines was evaluated under 10-h and 14-h day-lengths and loci associated with photoperiod sensitivity were identified based on the delay in flowering under the 14-h photoperiod (DTF at 14 h − DTF at 10 h). In total, 15 QTL were associated with DTF. Only four of them were also identified as influencing response to photoperiod. None of these QTL is allelic to the PTS QTL on chromosome 7.

Genetic mapping performed on environmental sensitivity has provided much better quantitative evaluations of QEI and has been used successfully to investigate plasticity and GEI for agriculturally relevant traits in animals and plants.

10.4.2 QTL mapping across environments

Do genes function similarly in different environments? The answer is negative. Phenotypic expression of quantitative traits is affected by external environmental factors such as day-length, temperature, moisture and soil conditions, which can greatly modify the phenotype of quantitative traits. In many cases, external environments act as a regulator of expression of the traits. It was found that when the same mapping population was phenotypically evaluated in different environments, some QTL could be detected in all environments tested but others could be detected only in some of them (Paterson et al., 1991; Stuber et al., 1992; Lu et al., 1996; Veldboom et al., 1996), indicating that QTL detection depends on the specific environment. These QTL can be defined as environment-dependent (sensitive) QTL. For improving mapping power and efficient QTL cloning, therefore, the specific conditions highly suitable for expression of the quantitative trait of interest should be identified. The results from QTL mapping across multiple environments provide some evidence for QEIs, in addition to the information of how QTL detection depends on the environments and which traits are more environment-dependent.

QTL can be studied under adverse environments (abiotic stress), NIEs or a uniform environment by replicating DH or RIL populations and splitting tillers or ratooning a segregating population. Xu, Y. (2002) summarized the QTL mapping experiments in rice that have been done in two or more environments by using permanent populations. For the convenience of comparison, rice QTL mapped in two environments were selected for sharing analysis (Table 10.5). A total of 159 QTL was identified in ten QTL mapping reports for 11 categories of quantitative traits. For different traits, QTL-sharing frequencies between the two

Table 10.5. Comparison of QTL mapped in two environments using the same populations in rice (Xu, Y., 2002).

                       Number of QTL           Mean VE^b (%)
Trait^a                Total   Shared (%)      Total   Shared QTL   Unshared QTL
Yield                  15      2 (13.3)        8.7     12.8         8.1
Panicle per plant      7       3 (42.9)        7.1     6.7          7.4
Grain per panicle      16      4 (25.0)        11.7    12.9         11.3
1000-grain weight      17      9 (52.9)        12.7    14.0         11.1
Root                   30      9 (30.0)        11.8    15.0         10.4
Drought tolerance      21      2 (9.5)         9.8     10.2         9.8
Flood tolerance        12      3 (25.0)        24.4    48.8         16.3
Al tolerance           4       2 (50.0)        12.5    16.0         9.0
Disease resistance     17      7 (41.2)        10.4    11.0         10.1
Seedling vigour        13      3 (23.1)        16.0    19.5         14.9
Paste viscosity        7       2 (28.6)        19.5    37.7         11.9
Total                  159     46 (30.0)       12.6    16.7         10.9

a Traits in each category: Yield, grains (t ha⁻¹); Root, root number, root length and thickness; Drought tolerance, leaf rolling and relative water content; Flood tolerance, initial plant height, plant height increment, internode increment and leaf-length increment; Seedling vigour, shoot length, root length, coleoptile length and mesocotyl length; Paste viscosity, peak viscosity, hot paste viscosity and cool paste viscosity.
b VE, variance explained.

environments ranged from 9.5% for drought tolerance to 52.9% for 1000-grain weight and, for all traits, on average, 46 (30%) of them are shared or are common between the two environments. For all shared QTL, the mean variance explained is 16.7%, whereas for the unshared QTL, it is 10.9%. QTL with large effect (higher proportion of the variance explained) are shared more frequently. Major-gene-related QTL (for flooding tolerance and paste viscosity) had the highest QTL-sharing frequencies. When compared across three or more environments, QTL-sharing frequencies become lower. For example, a total of 22 QTL for six agronomic traits were identified in Zhaiyeqing 8/Jingxi 17 DHs, only seven of which were shared in all three tested environments (Lu et al., 1996). In three trials using Tesanai 2/CB F2 and its two equivalent F3s, eight QTL were identified, two of which were detected in all three trials (Zhuang et al., 1997). In another report, three of 11 QTL identified for leaf rolling were shared in the three trials with different drought-stress intensities (Courtois et al., 2000).

With grain yield and test weight evaluated in four trials and for grain yield components evaluated in eight trials, Blanco et al. (2001) detected a total of 52 QTL in durum wheat that were significant in at least one environment at P < 0.001 or in at least two environments at P < 0.01. Paterson et al. (2003) described the impact of well-watered versus water-limited growth conditions on the genetic control of fibre quality, a complex suite of traits that collectively determine the utility of cotton. Fibre length, length uniformity, elongation, strength, fineness and colour (yellowness) were influenced by 6, 7, 9, 21, 25 and 11 QTL, respectively, that could be detected in one or more treatments. The genetic control of cotton fibre quality was markedly affected both by general differences between growing seasons (years) and by specific differences in water management regimes. Seventeen QTL were detected only in the water-limited treatment while only two were specific to the well-watered treatment.

Inconsistent QTL detection across environments may also be the result of QEI, which presumably represent the genetic factors underlying the GEI observed in line-based phenotypes (Beavis and Keim, 1996). QEI has been predicted by comparing the QTL detected separately in different environments in many crops. That a QTL can be detected in one environment but not in others, as discussed earlier, could result from experimental noise, sampling error or experimental error and thus does not necessarily indicate QEI. As indicated by Jansen et al. (1995), the chance for simultaneous detection of QTL in multiple environments is small. On the other hand, sharing QTL among environments does not necessarily mean lack of QEI. This is supported by the fact that QEI was identified for some shared QTL by incorporating QEij into QTL analysis (e.g. Yan et al., 1999) and by the fact that QTL effects estimated across environments could be very different.

10.4.3 QTL mapping with incorporated GEI

There are two approaches for the analysis of QEIs (Leflon et al., 2005). The first approach deduced interactions by comparing QTL detected separately in different environments as described in the previous section: in many cases, an interaction was merely detected and no estimate made of the interaction itself. In other cases, QEIs were assessed by co-localization between QTL detected for the main effect and QTL detected for stability statistics (Emebiri and Moody, 2006). The second approach takes interaction effects into account in the analysis of multi-environment trials by introducing QTL main effects and QEI effects, like studies of GEI (see, for instance, Crossa et al., 1999; Campbell et al., 2003, 2004; Groos et al., 2003). These methods are powerful but a large number of environmental measurements is necessary for their application.

With data collected from multiple location trials on a core set of genotypes, GEI can be detected by ANOVA and various statistical procedures that measure genotype stability (Lin et al., 1986; Kang, 1993) as described in previous sections. To determine genetic
factors responsible for GEI, QEI can be evaluated on the basis of agronomic data collected on a mapping population in multiple location trials and comparing QTL detection across environments by ANOVA to test marker locus × environment interactions. More recent efforts in QTL mapping involving GEI have proven far more effective, largely due to the incorporation of a QEI component by integrating this interaction component into actual mapping algorithms (Jiang and Zeng, 1995; Wang, D.L. et al., 1999).

To analyse GED produced in METs, various statistical models have been proposed that differ in the extent to which additional genetic, physiological and environmental information is incorporated into the model formulation. The simplest model is the additive two-way ANOVA model, without GEI and with parameters whose interpretation depends strongly on the set of included genotypes and environments. The most complicated model is a synthesis of a multiple QTL model and an eco-physiological model to describe a collection of genotypic response curves. Among these models, factorial regression models allow direct incorporation of explicit genetic, physiological and environmental co-variables on the levels of the genotypic and environmental factor. They are also very suitable for the modelling of QTL main effects and QEI (van Eeuwijk et al., 2005).

In the framework of factorial regression, modelling of QEI is a natural extension of modelling main effect QTL, i.e. QTL that are supposed to have constant expression across environments. A model with a QTL main effect and QEI at the same location in the genome can be written as

$$\mu_{ij} = \mu + x_i \rho + G_i^* + E_j + x_i \rho_j + (GE)_{ij}^* \qquad (10.12)$$

The $(GE)_{ij}$ from the ANOVA model is partitioned in part due to a differential QTL expression, $x_i \rho_j$, and a residual, $(GE)_{ij}^*$, that is usually taken to be random and for that reason then disappears from the expression for the expectation. In the light of QEI, the parameter $\rho_j$ adjusts the average QTL expression across environments, $\rho$, to a more appropriate level for the individual environment $j$. The QEI parameters, $\rho_j$, can themselves be regressed on an environmental co-variable, $z$, in an attempt to link differential QTL expression directly to key environmental factors. The QEI term $x_i \rho_j$ is replaced by a regression term $x_i(\lambda z_j)$ and a residual term $x_i \rho_j^*$, that again disappears from the expectation when $\rho_j^*$ is assumed to be random. The parameter $\lambda$ is a proportionality constant that determines the extent to which a unit change in the environmental co-variable $z$ influences the effect of a QTL allele substitution.

Mixed models

Using mixed models, several papers have been dedicated to the incorporation of the genetic basis of GEI: differential expression of QTL in relation to changing environmental conditions, or QEI. Early work on QEI was done by Jansen et al. (1995), Jiang and Zeng (1995) and Korol et al. (1998), who used a mixed model approach. Regression based approaches were presented by Sari-Gorla et al. (1997), Caliński et al. (2000), Hackett et al. (2001) and van Eeuwijk et al. (2001, 2002). Piepho (2000) and Verbyla et al. (2003) presented other relevant work on QEI. These authors developed QTL mapping methods for the analysis of METs using mixed model theory, thereby giving special attention to the modelling of heterogeneity of variance across environments and correlations between environments, where the latter correlations may be due to undetected QTL.

Jansen et al. (1995) developed an analytic approach, multiple QTL mapping, which accommodates both the mapping of multiple QTL and GEI. This approach was compared to interval mapping in the mapping of QTL for flowering time in Arabidopsis thaliana under various photoperiod and vernalization conditions. Procedures developed by Jiang and Zeng (1995) for estimating the effect of QTL for multiple traits can be used to test the significance of QEI.

A least squares interval mapping approach developed by Sari-Gorla et al. (1997) allows inclusion in the model of the parameters describing the experimental and environmental situation so that the QEI can
be tested. The analysis was performed on data concerning two components of maize pollen competitive ability, obtained from an experiment over 2 years. The method, in comparison with the traditional single marker approach, has been shown to be more powerful in detecting QTL and more precise in determining their map position. The analysis has identified QTL expressed across years, putative QTL with major effects and QTL accounting for GEI.

Piepho (2000) proposed a mixed model method to detect QTL with significant mean effect across environments and to characterize the stability of effects across multiple environments. He treated environment main effects as random, which meant that both environment main effects and QEI effects were random.

Verbyla et al. (2003) developed an approach for multi-environment QTL analysis. To accommodate a multi-environment analysis, the size of a QTL effect was assumed to be a random effect. The approach resulted in a multiplicative mixed model for QEI of the factor analytic type. The full genetic model may also include a factor analytic model for the residual GEI, whereas the environmental model for the non-genetic variation involves local, global and extraneous variation. The approach was used to determine QTL for yield in the Arapiles × Franklin DH population of the National Barley Molecular Marker Program.

Malosetti et al. (2004) presented a strategy for modelling QEI using mixed model methodology in combination with regression ideas. They proposed a simple interval mapping approach that consists of fitting along the genome a mixed model with both a fixed QTL main effect and a fixed QEI term and, for the random part (the residual genetic variation), a factor analytic model with one multiplicative term and residual heterogeneity. For chromosome positions with identified QTL expression and QEI, a second modelling step regresses the QEI on one or more environmental co-variables. To illustrate the approach, they analysed grain yield data stemming from the North American Barley Genome Project (NABGP) (http://barleyworld.org/NABGP.html). QEI for yield, as identified at the 2H chromosome, could be described as QTL expression in relation to the magnitude of the temperature range during heading.

Factorial regression model

If climatic data are available for precipitation, temperature and solar radiation, factorial regression models (van Eeuwijk et al., 1996) and partial least squares models (Aastveit and Martens, 1986) can be used to determine the degree to which each of these factors influences GEI and QEI (Crossa et al., 1999). Hence, just as molecular markers are commonly used to model the effects of chromosomal segments (QTL) on a particular quantitative trait, climatic data can also be used to model particular aspects of the environment that contribute to the differential performance of genotypes across a range of testing environments. Using factorial regression models, Crossa et al. (1999) were the first to explain QEI and found that temperature differences across environments accounted for a large portion of the QEI detected in a tropical maize (Zea mays L.) mapping population. They showed how regression methods such as the partial least squares regression and the factorial regression models, together with genetic markers and environmental co-variables (such as maximum and minimum temperature and sun hours), could be used to: (i) detect relevant sets of correlated markers and environmental variables that explain a significant proportion of the total GEI; and (ii) study the influence of environmental variables on the expression of QTL with the objective of assessing and interpreting the QEI that accounts for GEI. Vargas et al. (2006) used factorial regression and partial least squares methods for mapping QTL and QEI for the CIMMYT maize drought stress programme.

Van Eeuwijk et al. (2001, 2002) extended the factorial regression models for GEI and QEI developed by Crossa et al. (1999) from the original marker-based regressions to interval mapping and composite interval mapping. The authors presented: (i) a randomization test for controlling the genome-wise error rate, following the logic introduced by
Churchill and Doerge (1994); and (ii) a partial least squares (PLS) strategy to deal with the problem of multi-collinearity among multiple cofactors. The PLS strategy consisted of: (i) taking all the markers outside the chromosome being evaluated as cofactors; (ii) regressing the phenotypic responses on this set of markers using multivariate PLS; (iii) calculating the fitted values for the phenotypic responses; and (iv) using the corrected phenotypic observations, i.e. the residuals from the PLS regression, in a simple interval mapping (SIM) procedure for the chromosome being evaluated.

Structural equation model

Most agronomically important traits are the result of a number of genetic, molecular and physiological mechanisms that affect the trait of interest either directly or indirectly through other intermediate traits. The GEI of each trait in the network among variables will be influenced, either directly or indirectly, by a number of QEI and GEI of other traits, which may, in turn, be influenced by other factors (Campbell et al., 2003). A single dependent variable quantitative approach cannot describe the complicated relationship between traits, QTL and environments where some traits function simultaneously as both dependent variables to be predicted by other genetic and environmental factors and as independent predictor variables of other traits.

Dhungana et al. (2007) developed a systematic approach for understanding GEI of complex interrelated traits by combining chromosome substitution lines that allowed studying the effects of genes on a single chromosome with a structural equation model (SEM) that approximated the complex process involving genes, environmental conditions and traits. Structural equation modelling is a generalization of path analysis proposed by Wright (1921) and is used to quantitatively analyse the causal structure among a number of variables where each may function as a dependent variable in some equations and an independent variable in others (Bollen, 1989). Because of this, SEM is ideal for characterizing the complex relationships of GEI where many traits function as not only dependent variables to be predicted by environmental and genetic factors, but also as independent predictor variables of other traits further downstream (Dhungana et al., 2007). To use SEM to analyse GEI, prior knowledge of the direction of the causal relationships is assumed and specified through a path diagram and the model is then algebraically specified by a system of regression-type equations where each variable is adjusted to contain only GEI effects. A final model is then developed by fitting successive models and retaining significant QEI variables which result in a better fitting model. The final model yields path coefficients and a path diagram that contains only significant paths, thus giving insight into important relationships between traits, QTL and the environmental variables.

The approach was applied to recombinant inbred chromosome wheat lines grown in multiple environments. The final model explained 74% of the yield GEI variation and it was found that spikes per square metre GEI had the highest direct effect on yield GEI and that the genetic markers were mostly sensitive to temperature and precipitation during the vegetative and reproductive periods. In addition, a number of direct and indirect causal relationships were identified that described how genes interact with environmental factors to affect GEI of several important agronomic traits.

QEI mapping examples

There are numerous examples now available for QEI mapping using some of the approaches described above. Only a few examples will be discussed here to represent different approaches.

Romagosa et al. (1996) assessed AMMI's value in QTL mapping. This was done through the analysis of a large two-way table of GED of barley (Hordeum vulgare L.) grain yields. Grain yield data of 150 DHs derived from the Steptoe × Morex cross and the two parental lines were taken by the NABGP at 16 environments throughout the barley production areas of the USA and Canada.
Four regions of the genome were identified to be responsible for most of the differential genotypic expression across environments. They accounted for approximately 50% of the genotypic main effect and 30% of the GEI sums of squares. The magnitude and sign of AMMI scores for genotypes and sites facilitate inferences about specific interactions. The parallel use of classification (cluster analysis of environments) and ordination (PCA of the GE matrix) techniques allowed most of the variation present in the GE matrix to be summarized in just a few dimensions, specifically four QTL showing differential adaptation to four clusters of environments.

An illustration of the uncertainties occurring when attempting to find specific genetic factors for yield is presented by Reyna and Sneller (2001). They attempted to introgress alleles for yield into elite soybean material in the southern soybean area of the USA. As variation was scarce, the authors tried to exploit beneficial alleles for yield identified as QTL in cv. Archer from the northern soybean area of the country (Orf et al., 1999). Reyna and Sneller (2001) built up four NILs for each QTL and tested them under different environmental conditions. They found that the QTL for yield identified in a particular cultivar and environmental condition did not contribute significantly to improved yields in a different genetic background and different environmental conditions. The authors concluded: 'It may be difficult to capture the value assigned to QTL alleles when the alleles are introgressed into populations with different genetic backgrounds, or when tested in different environments.'

Ungerer et al. (2003) examined inflorescence development patterns in Arabidopsis under different, ecologically relevant photoperiod environments for two RIL mapping populations (Ler × Col and Cvi × Ler) using a combination of quantitative genetics and QTL mapping. Plasticity and GEI were regularly observed for the majority of 13 inflorescence traits. These observations can be attributed (at least partly) to variable effects of specific QTL. Pooled across traits, 12/44 (27.3%) and 32/62 (51.6%) of QTL exhibited significant QEIs in the Ler × Col and Cvi × Ler lines, respectively. These interactions were attributable to changes in magnitude of effect of QTL more often than to changes in rank order (sign) of effect. Multiple QEIs (in Cvi × Ler) clustered in two genomic regions on chromosomes 1 and 5, indicating a disproportionate contribution of these regions to the phenotypic patterns observed.

By using factorial regression models, agronomic and molecular genotype data and three environmental covariates (daily mean temperature, precipitation and solar radiation) recorded in each test environment, Campbell et al. (2004) sought to: (i) detect which of these three environmental covariates may account for GEI by testing individual genotype × environmental covariate interactions; and (ii) detect marker × environmental covariate interactions that provide explanations of variable QTL genotypic differences across environments. Agronomic performance and molecular marker data available for a population of chromosome 3A recombinant inbred chromosome lines (RICLs-3A) in seven environments were used along with environmental covariate data to construct individual factorial regressions to explain GEI and QEI. Precipitation and temperature before anthesis had the greatest influence on agronomic performance traits for the RICLs-3A and explained a sizeable portion of the total GEI for those traits. Individual molecular marker × environmental covariate interactions explained a large portion of the total marker × environment interactions for several agronomic traits.

Laperche et al. (2007) used three methodologies to reveal QTL × nitrogen interactions: (i) QTL detected separately under both types of N supply; (ii) QTL detected for global interaction variables assessed as N+/N− and N−/N+; and (iii) QTL considered for factorial regression slope and ordinate parameters, which represent a plant's sensitivity to N stress and plant performance under a limited N supply. In total, 233 QTL were detected for the traits measured in each combination of environment and N supply (N+: high supply; N−: low
supply). Comparison of QTL detected under N+ and N− levels identified 13 non-specific QTL, eight N+ specific loci and seven N− specific loci. For QTL for global interaction variables, four adaptive loci were validated and eight constitutive loci were found to be involved in G × nitrogen interaction. Nine interactive loci were validated and three new loci detected using factorial regression variables.

10.4.4 Utilization of MET and genotypic data

While the international METs have been used effectively to exchange germplasm, there have been only limited analyses of the large data sets that they generate. Further, many of these analyses have focused on a specific international MET conducted in 1 year. Analyses which integrate the information from international METs across years have been attempted in only a few cases. The strength of these studies is that they integrate large quantities of data on spatial and temporal GEIs and provide a basis for identifying repeatable interactions. However, their weakness is that in most cases there is only a limited information base for explanation of these interactions. There are great opportunities for synergy between the statistical, genetical and biophysical modelling methodologies (Cooper and Hammer, 1996). The complete data set, which contains genotype, phenotype and environment information for numerous genetics and breeding materials, opens the door for comprehensive use of them in both genetics and plant breeding, including GEI through genome-wide association mapping. In Chapter 6, we provided an example of the use of cultivars in METs and their phenotypic and genotypic data in linkage disequilibrium mapping (Crossa et al., 2007).

10.5 Breeding for GEI

How can a breeder deal with GEI? Eisemann et al. (1990) listed three ways of dealing with GEI in a breeding programme: (i) ignoring them, i.e. using genotypic means across environments even when GEI exists; (ii) avoiding them; or (iii) exploiting them. Kang (2002) discussed these three ways. Interactions should not be ignored when they are significant and of the crossover type. The second way of dealing with these interactions, i.e. avoiding them, involves minimizing the impact of significant interactions. One approach is to group similar environments (forming mega-environments) via a cluster analysis as discussed in previous sections. With environments being more or less homogeneous, genotypes evaluated in them would not be expected to show COIs. By clustering environments, potentially useful information may be lost. International research centres such as CIMMYT aim to identify maize and wheat genotypes with broad adaptation (i.e. stable performance across diverse environments) at many international sites. If the subgrouping is used to eliminate the environments that share the same factors and are identical to each other (redundant test environments), optimization of the environment sites will also help determine the broad adaptation by using as few environment sites as possible. The third approach encompasses stability of performance across diverse environments by analysing and interpreting genotypic and environmental differences. This approach allows researchers to select genotypes with consistent performance, identify the causes of GEI and provide the opportunity to correct the problem. When the cause for the unstable performance of a genotype is known, either the genotype can be improved by genetic means or a proper environment (inputs and management practices) can be provided to enhance its productivity.

The best approach for breeders and geneticists would be to understand the nature and causes of GEI and to try to minimize its deleterious implications and exploit its beneficial potential through appropriate breeding, genetic and statistical methodologies (Singh et al., 1999). Appropriate analyses of data can provide an opportunity for exploiting GEI through methods, such as AMMI and GGE bi-plot, using
climatic factors to explain GEI, evaluating risk of production and optimizing allocation of land resources to various genotypes for selection in heterogeneous environments.

10.5.1 Breeding for resource-limited environments

Breeding for resource-limited environments is one of the major objectives for many international breeding programmes. Commonly the performance of a genotype in an environment is a function of the influence of many interacting factors and environments differ in the type, intensity and timing of these challenges.

The performance level is important. Where productivity is low due to an overriding environmental limitation, it may be comparatively easy to accomplish improvement in performance through genetic and/or environmental change (Kang, 2002). A relatively simple genetic change may have quite a fundamental influence on performance and hence adaptation; for example, use of an early maturity, vernalization requirement or even a morphological character to avoid frost damage, genetic resistance to a specific disease, genetic tolerance of a nutritional disorder, etc. Equally, environmental modification to overcome the limitation may be possible. In these situations, the key to plant improvement is the recognition of the nature of the stress or challenge and of the adaptive response (Cooper and Byth, 1996).

In order to increase crop productivity through enhanced yield potential, heterosis, modified plant types, improved yield stability, gene pyramiding and exotic and transgenic germplasm, it is important to identify the factors that are responsible for GEI. Brancourt-Hulmel (1999) used crop diagnosis with the analysis of interaction by factorial regression in wheat. She provided an agronomic explanation of GEI and defined the responses or parameters for each genotype and each environment. Earliness at heading, susceptibility to powdery mildew and susceptibility to lodging were the major factors responsible for GEI. In the same study, factorial regression revealed that water deficits during the formation of grain number and N level were also associated with GEI.

To alleviate GEI concerns caused by stresses, breeders need to know as much about the various characteristics of genotypes as possible. They also need to characterize environments as fully as possible (Kang, 2002). Knowledge of soil characteristics and ranges of weather variables and stresses that plant materials will be exposed to is a prerequisite to exploiting the beneficial potentials of the genotypes and environments and to targeting appropriate cultivars to specific environments.

Although biotic stresses and interactions among them and/or with abiotic factors remain poorly understood, they have significant relevance to GEI in plants. Plants may respond to pathogen infection by inducing a long-lasting, broad-spectrum, systemic resistance to subsequent infections. Induced disease resistance has been referred to as physiological acquired immunity, induced resistance or systemic acquired resistance. Differences in insect and disease resistance among genotypes can be associated with stable or unstable performance. It is highly desirable to identify QTL for a complex trait that is expressed in a number of environments. Crossa et al. (1999) found that higher maximum temperature in low- and intermediate-altitude sites affected the expression of some QTL, whereas minimum temperature affected the expression of other QTL, in tropical maize.

10.5.2 Breeding for adaptation and stability

An understanding of the genetic basis of adaptation and stability and their physiological and environmental causes is of fundamental importance for understanding GEI, for assessing the association between phenotypic and genotypic values and for enhancing the selection of superior and stable genotypes. The presence of COIs has important implications for breeding
strategies that aim to improve either broad or specific adaptation or some combination of both components of adaptation.

The broad adaptation concept, the need to minimize GEIs (and maximize G), was successful for the rapid adoption of the seed-based technology of the Green Revolution. But is it appropriate for the green evolution? Cultivars must be diversified and matched with the diversity of pest systems to ensure effective and durable pest management. Genotypes will need to be matched with a less predictable water supply in the irrigated system in rice. Scientists may also need to match genotypes with radiation levels (again unpredictable) to address the challenge of increasing the yield by 50%.

When we look at plant adaptation, which exploits spatial GEI, are we interested in this primarily as a proxy for temporal GEIs to help ensure the stability of performance of our chosen genotypes over time? We need to have clarity in our objectives in pursuing this particular topic. Are we searching for adaptability per se in order to exploit technological spillover, or are we using spatial GEIs to ensure stability of performance over time for farmers using improved cultivars in particular locations? Tools like modelling and simulation can complement spatial genotype-by-environment experimentation and analysis aimed at addressing either adaptability or stability issues.

More commonly, however, there may be a range of responses to the environmental challenge and the same adaptive response can result from different challenges. In these circumstances, the factors influencing adaptation are multivariate, quantitative and complex and may vary in an undefined manner between different genotypes. Thus it is more difficult to recognize the nature of the challenge and explain the adaptive response. In advanced testing programmes where genotypes exhibit reasonably high levels of performance, relative differences in adaptation and the specific nature of the GEI become increasingly important in defining breeding objectives and strategies. However, where adaptation is a function of response to environments differing quantitatively in time and degree for a number of uncontrollable factors, analysis of genetic differences becomes complex (Cooper and Byth, 1996).

With a MET, a breeder can identify cultivars with specific adaptation as well as those with broad adaptation, which will not be possible from testing in a single environment. Broad adaptation provides stability against the variability inherent in an ecosystem, but specific adaptation may provide a significant yield advantage in particular environments as discussed in Section 10.1. MET makes it possible to identify cultivars that perform consistently from year to year (small temporal variability) and those that perform consistently from location to location (small spatial variability). There is a need for developing cultivars with broad adaptation to a number of diverse environments (adaptability) and a need for farmers to use new cultivars with reliable or consistent performance from year to year (reliability) (Evans, 1993). Genetic improvement for low-input conditions would require capitalizing on GEI, and slower or limited gains in low-input or stress environments suggested that conventional high-input management of breeding nurseries and evaluation trials might not effectively select genotypes with improved performance at low-input levels (Smith, M.E. et al., 1990). Because of the success in favourable environments, plant breeders have tried to solve the problems of poor farmers living in unfavourable environments by simply extending the same methodologies and philosophies applied to favourable, high-potential environments, without considering the possible limitations associated with the presence of a large GEI (Ceccarelli et al., 2001). Responses to selection under stress and non-stress environments and to selection at high- and low-input levels need to be compared theoretically and practically.

As indicated by Kang (2002), stability of cultivars would be enhanced if multiple resistances/tolerances to stress factors were incorporated into the germplasm used for cultivar development. If every cultivar (different genotypes) possessed equal resistance/tolerance to every major stress encountered
in diverse target environments, GEI would be reduced. Conversely, if genotypes
possessed differential levels of resistance (a heterogeneous group) and,
somehow, we could make all target environments as homogeneous as possible, GEI
would again be reduced. Since we do not have any control over unpredictable
environments from year to year, the only approach would be the former.

Plants have incorporated a variety of environmental signals into their
development pathways that have provided for their wide range of adaptive
capacities over time. In response to severe environmental changes, a genome can
respond by selectively regulating (increasing, decreasing or even shutting
down) the expression of specific genes. Jiang et al. (1999) used molecular
markers to investigate adaptation differences between highland and lowland
tropical maize. They concluded that breeding for broad thermal adaptation
should be possible by pooling genes showing adaptation to specific thermal
regimes, albeit at the expense of reduced progress for specific adaptation.

10.5.3 Measurement of GEI in breeding programmes

Measure interaction at intermediate growth stages

A crop is exposed to variable environmental factors throughout the
developmental stages and the growing season. Generally, researchers investigate
the causes of GEI for a quantitative trait such as yield that is phenotyped at
the final harvest stage. To critically investigate GEI, one may need to record
environmental variables and plant-growth measurements at specific time
intervals throughout the growing season, as suggested by Xu (1997) for dynamic
QTL mapping. This would help determine what effect, if any, the environmental
variables from an earlier period had on GEI at intermediate stages and on the
final yield. This may provide a better understanding of the dynamic development
process of a quantitative trait.

Multi-environmental testing at early stages of breeding

Kang (2002) proposed early multi-environment testing. Usually, there is a
shortage of seed at the earliest stages of breeding, which prevents extensive
testing at multiple locations. However, in a clonally propagated crop, such as
sugarcane or potato, one stalk of sugarcane or one tuber of potato can be
divided into at least two pieces and planted in more than one environment.
Similarly, in other crops, if only 20 kernels are available, one could plant
ten seeds each in two diverse environments. In the absence of GEI, one would
obtain a better evaluation of the genotypes, but, if GEI was present, one would
obtain information about the consistency or inconsistency of performance of
genotypes early in the programme. This strategy would prevent gene loss or
genetic erosion, which could occur if testing was done in only one environment
and would also result in an increased breeding effort without a corresponding
increase in expenditure of resources.

Unbalanced data

Plant breeders often deal with unbalanced data. Searle (1987) classified
unbalancedness as planned unbalanced data and missing observations. When a set
of genotypes is grown in a specific set of environments, oftentimes a balanced
data set (without any missing scores) is not possible, especially when a wide
range of environments is used, or long-term trials are conducted.
Hybrids/cultivars are continually replaced year after year. Also the number of
replications may not be equal for all genotypes because experimental plots may
be discarded for one reason or another. In such cases, plant breeders must deal
with unbalanced data.

Researchers have used different approaches for studying GEI in unbalanced data
(Kang, 2002). Usually environmental effects are considered as random and
cultivar effects as fixed. Inference on random effects using least squares, in
the case of unbalanced data, is not appropriate because information on
variation among random

effects is not incorporated (Searle, 1987). For this reason, mixed model
equations are recommended (Henderson, 1975). The restricted maximum likelihood
methodology is generally preferred to maximum likelihood estimates because it
considers the degrees of freedom for fixed effects for calculating error. The
calculation of restricted maximum likelihood stability variances for unbalanced
data allows one to obtain a reliable estimate of stability parameters and
overcomes the difficulties of manipulating unbalanced data (Kang and Magari,
1996).

10.5.4 MAS for QEI

Genotype means across test environments are used to select lines, populations,
hybrids and cultivars for target environments and to select marker and QTL
alleles for MAS across target environments. Test environments (samples of
years, locations and other factors) are selected to maximize the speed of a
selection cycle while minimizing the cost of testing and maximizing selection
gains for target environments. GEIs can cause test environments to fail to
maximize selection gains for target environments, with equivalent consequences
for selection with and without markers. At the extreme, this happens when
differences between genotypes are observed across test environments and there
are no differences across target environments, or when differences between
genotypes are not observed across test environments and there are differences
across target environments. The consequences are either to fix unfavourable
alleles, or, for MAS only, to fix alleles at QTL which have no mean effect
across target environments, but which had an effect across the sample of test
environments used. The root of the problem with GEIs is differences between
test and target environments. Nothing can be done about the outcome of
selection if test and target environments are fixed (Knapp, 1994).

Additional methods are needed to determine the nature of QEIs. These methods
are hardly necessary for practising MAS. Only the means of QTL genotypes across
test environments need to be estimated to select QTL for MAS. This only fails
if every QTL manifests COI and the test environments did not uncover these
interactions, both of which are very unlikely for carefully selected test and
target environments. Suppose, for example, a QTL manifests a COI which is not
observed among the QTL genotype by test environment means, but which leads to
no overall effect of the QTL within target environments, then putting selection
pressure against this QTL is equivalent to selecting a neutral locus (Knapp,
1994). This diminishes the selection response by decreasing selection
intensity, as do many errors (Edwards and Page, 1994).

Differentiating between non-crossover and crossover QEIs might be important for
optimizing MAS. Crossover QEIs could affect the outcome of MAS, whereas
non-crossover QEIs should be of no consequence to the outcome of MAS.
Understanding and characterizing the nature of QEIs is useful for optimizing
MAS or conventional selection.

10.6 Future Perspectives

QTL mapping so far has provided strong evidence that in addition to some
consistent additive QTL effects across genetic background and environments,
genetic architecture in elite breeding populations involves important
components of epistasis, GEI and pleiotropy. However, this type of information
has not been used to enhance breeding strategies. The realized progress from
selection has usually been considerably lower than the predicted response,
which in most cases might have been associated with GEI.

To examine the potential of molecular-enhanced breeding strategies to enhance
the predictive response, many attempts have been made and are currently
underway to construct relevant gene-to-phenotype models for traits to assist
the plant breeding process. As one such example, Cooper et al. (2005, 2007)
have tackled this as a genetic modelling problem by developing a flexible
quantitative framework for studying the genetic architecture of traits in terms

of gene networks. This framework, called the E(NK) model, is an extension of
the NK gene network model that was introduced and used by Kauffman (1993) to
study the behaviour of gene networks and their influences on organism
development and evolutionary processes. When the E(NK) model is applied to the
study of issues relevant to plant breeding processes, it allows for the
property that the influence of a gene network on the expression of a trait can
differ in varying environmental conditions. Thus, E identifies different
environment-types within the context of a defined TPE, N identifies the
different genes and K identifies the degree of connection between subsets of
the total set of N genes, i.e. the gene network topology (Kauffman, 1993;
Cooper et al., 2005). Thus, in the terminology of quantitative genetics the
E(NK) model is a finite locus polygenic model that can be defined to include
effects of epistasis and GEIs. The parentheses around the NK term are used to
indicate that the N genes can interact in K different ways to determine the
trait phenotype in E different environment-types. K = 0 indicates that the N
genes act independently in the model and a larger K indicates increased levels
of interactions among the N genes.

Kauffman's landscape concept can be used in combination with the E(NK) model to
examine how the shape of the phenotype landscape changes with the genetic
architecture of a trait, as determined by changes in the levels of E, N and K.
The simple additive finite locus model is defined by the case where E = 1 and
K = 0, thus E(NK) = 1(N:0) (Fig. 10.8). As E and K are increased for a given
level of N, the effects of the alternative alleles for the N genes become
increasingly context dependent on the genotypes of other genes and on the range
of environment-types in the target population of environments. Thus, context
dependent effects of genes due to epistasis and GEI can be simulated (Cooper
and Podlich, 2002). Building on the landscape metaphor (Fig. 10.8), it is

[Figure 10.8 (image not reproduced). Panel labels: E = 1 and E > 1 (GE
interactions); K = 0 (additive), 0 < K < N - 1 and K = N - 1 (increasing
epistasis); individual environment-types, (N:K) landscapes; target population
of environments, E(N:K) landscapes.]

Fig. 10.8. Schematic three-dimensional representation of the phenotypic
state-space performance landscapes for gene-to-phenotype (G→P) models simulated
using the E(NK) model. The additive E(NK) = 1(N:0) G→P model is depicted as a
single-peak landscape. Models with increasing levels of epistasis (i.e. from
K = 1 to K = N - 1) are depicted by an increasingly more rugged landscape
surface. Models with GEIs are depicted as a series of different landscape
surfaces for different environment-types (E). The G→P response surface for the
target population of environments (TPE) is depicted as a mixture of the
response surfaces from the different environment-types. From Cooper et al.
(2005) CSIRO 2009.

observed that as E and K are increased we move from a single peaked additive
landscape for the E(NK) = 1(N:0) case to a multiple peaked landscape and
ultimately a random landscape when K = N - 1 and E > 1 (Cooper et al., 2002b).

Recent advances in genome sequencing and high-throughput technologies, such as
DNA and protein chips as described in Chapter 3, allow us to measure the
spatio-temporal expression levels of thousands of genes or proteins. As gene
expression in microarray experiments becomes increasingly popular in genetic
studies of quantitative traits, it is possible to detect the relative signals
of many genes simultaneously, allowing better understanding of genetic networks
associated with specific developmental stages and/or environmental factors.
Combining large-scale microarray experiments with genetic network simulation,
it can be expected that GEI will be further revealed at the whole genome level
and in the context of genetic networks.
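
The landscape argument made for the E(NK) model earlier in this section can be
explored numerically. The following is a minimal sketch, not the implementation
of Cooper et al.: it assumes two alleles per gene, uniformly random gene
contributions, randomly chosen interacting genes and equally frequent
environment-types, and all function names are hypothetical. It builds an
E(N:K) landscape and counts its local peaks by exhaustive enumeration, which is
only practical for small N.

```python
import itertools
import random


def make_nk_landscape(n, k, rng):
    """Build one (N:K) landscape: a function mapping a genotype to a value in [0, 1]."""
    # Gene i's contribution depends on its own allele and on k other genes
    # (the interacting genes are drawn at random; this is an assumption).
    inputs = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One random contribution per gene for every allele combination of its inputs.
    tables = [
        {state: rng.random() for state in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def value(genotype):
        return sum(
            tables[i][tuple(genotype[j] for j in inputs[i])] for i in range(n)
        ) / n

    return value


def make_enk_landscape(e, n, k, seed=0):
    """E(N:K): mean of e independent (N:K) landscapes (equal weights assumed)."""
    rng = random.Random(seed)
    landscapes = [make_nk_landscape(n, k, rng) for _ in range(e)]
    return lambda genotype: sum(f(genotype) for f in landscapes) / e


def count_peaks(value, n):
    """Count genotypes that no single-allele change can improve."""
    peaks = 0
    for g in itertools.product((0, 1), repeat=n):
        vg = value(g)
        if all(vg > value(g[:i] + (1 - g[i],) + g[i + 1:]) for i in range(n)):
            peaks += 1
    return peaks


if __name__ == "__main__":
    n = 8
    for e, k in [(1, 0), (1, 3), (1, n - 1), (3, n - 1)]:
        peaks = count_peaks(make_enk_landscape(e, n, k), n)
        print(f"E={e}, K={k}: {peaks} peak(s)")
```

With K = 0 every gene contributes independently, so the count is one peak, the
single-peak additive case; larger K generally gives more peaks, in line with
the increasingly rugged surfaces described for Fig. 10.8. Averaging over
environment-types here stands in for the TPE mixture; unequal environment
frequencies would need explicit weights.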
11
Isolation and Functional Analysis of Genes

One of the challenges in molecular breeding is to understand how thousands of
gene products interact with each other to control development and the ability
of an organism to respond to its environment. Gene isolation and its functional
analysis are not only for development of functional markers but also for
manipulating plants through genetic transformation. For sequenced plant
species, identification of function for each gene has become a major focus in
the era of functional genomics. For example, the Arabidopsis community has
developed an initiative to empirically identify the function of all Arabidopsis
genes by the year 2010.

To isolate and characterize all the genes in plants, it is important to first
define what we mean by a gene. A gene was initially defined as the nucleic acid
sequence that codes for a peptide. The definition now is extended to encompass
many more features including the presence of gene families within a plant,
alternative splicing, RNA that functions without translation into a protein and
other confounding factors that together make a simple universal definition more
difficult (Cullis, 2004). A gene, when defined as a transcribed (and
translated) unit, is usually split into coding pieces (exons) that are
separated by intervening sequences (introns) in the eukaryotic genomes.

All techniques for gene isolation exploit one or more of the four
characteristics that define genes (Gibson and Somerville, 1993): they have a
defined primary structure (sequence); they occupy a particular location within
the genome; they encode an RNA with a particular expression pattern; and many
genes encode protein or mRNA products with a defined function. Therefore,
identification and functional analysis of genes can start at various points in
the process of gathering information about genomes: they can be identified from
their locations relative to closely linked markers on a genetic map, from their
presence in populations of RNAs, from an analysis of genomic sequence data with
gene finding programs, from comparisons with the genomic sequence data from
related organisms, or from their disruption with the subsequent appearance of a
phenotypic variant (Fig. 11.1; Cullis, 2004).

In a simple organism such as E. coli, one can readily isolate individual genes
associated with a particular function. In a few Petri dishes one might select
among millions of individuals to identify mutants in the function of interest,
then artificially introduce wild-type (i.e. non-mutant) DNA into the mutants
and identify those segments which restore normal function. Having a gene in
hand, one might determine the sequence of genetic information comprising the
gene,

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 417



[Figure 11.1 (flow chart, image not reproduced). Content of the boxes:]

Genomic sequence

Gene discovery: in silico gene prediction; ESTs; full-length cDNAs; tiling and
expression arrays; MPSS; SAGE

Gene functional annotation: in silico prediction; gene expression; mutant
analysis; protein-protein interaction

Forward genetics:
  Select a biological process
  Generate highly redundant mutant populations
  Screen for mutants with a desirable phenotype
  Map and clone the gene responsible for the phenotype

Reverse genetics:
  Select a gene or genes of interest
  Generate highly redundant mutant populations
  Develop and conduct sequence-based mutant screens
  Analyse the phenotype of the mutants

Fig. 11.1. From genomic sequence to gene function. Steps and experimental approaches that are used
in the functional annotation of the genome. MPSS, massively parallel signature sequencing; SAGE, serial
analysis of gene expression. Modified from Alonso and Ecker (2006).

the protein product encoded by this information, the function of the protein,
the regulation of activity of the gene or protein by environmental factors and
so on. Such elegant schemes for gene cloning are now being used for plants, but
with significant challenges. Higher plants tend to have relatively large
quantities of DNA in their genome and more non-coding DNA than coding DNA,
making it difficult to identify particular genes of interest. Further, higher
plants have relatively long generation times, months to years (versus a few
minutes for E. coli). Based on phenotypes, one is seldom able to study millions
of individuals or know enough about the function of a gene in order to isolate
it from the many thousands of other genes in the organism. Many major genes
have been cloned based on various approaches, but for traits affected by
several genes or quantitative trait loci (QTL), the effects of any one are
often partly masked by others and/or by environments. Thus, it is difficult to
discern the effect of a single gene (QTL) by merely looking at the appearance
(phenotype).

The main source of empirical information about gene function and structure has
been the capture and characterization of mRNA transcripts (Fig. 11.1). A
variety of high throughput methodologies have been successfully used for the
model plant, Arabidopsis thaliana (Alonso and Ecker, 2006), including expressed
sequence tags (ESTs) and full-length cDNA sequencing, whole genome tiling
microarrays and gene-expression arrays, a massively parallel signature
sequencing (MPSS) technique and serial analysis of gene expression (SAGE). Such
methods provide information on gene splicing and transcribed units. The most
widely used methods for isolating genes based on their functions involve
protein purification, complementation of mutant phenotypes, positional cloning
using genetic maps and mutagenesis-based gene identification. The major
limitation to

gene cloning based on its function is that for about half of the genes in most
organisms the functional or physiological properties of their gene products are
unknown or the corresponding proteins cannot be purified in sufficient
quantities to permit amino acid sequence determination or preparation of
antibodies. As described in this chapter, however, there are many elegant
strategies for cloning genes in plants.

Objectives of this chapter are to review some basic methods that have been used
for isolation and functional analysis of plant genes. These methods include
those based on in silico prediction, comparative genomics, cDNA sequencing,
microarray, map-based cloning and mutagenesis. For more comprehensive
discussion, readers are recommended to refer to Gibson and Somerville (1993),
Foster and Twell (1996), Jenks and Feldmann (1996), Paterson (1996b), Weigel
et al. (2000), Davuluri and Zhang (2003), Ramakrishna and Bennetzen (2003),
Cullis (2004), Jeon et al. (2004), Seki et al. (2005), Windsor and
Mitchell-Olds (2006), Gibrat and Marin (2007), Nicolas and Chiapello (2007),
Candela and Hake (2008) and Jung et al. (2008).

11.1 In Silico Prediction

Recent improvement in sequencing technologies has made large-scale DNA
sequencing practical and widely accessible, which has enabled the development
of sequence-based methods for identifying genes through exon discovery in
genomic sequence data (Nunberg et al., 1996). Ideally analytical methods should
not depend only on a gene from another species that has already been isolated
and sequenced. For organisms with compact genomes such as bacteria and yeast,
exons tend to be large and the introns are either non-existent or short, so
that the identification of genes by computational approaches is relatively
straightforward. However, the challenge is much greater for larger genomes such
as those of plants, because the exonic signal is buried under non-genic noise.

Currently, computational sequence analysis methods are usually exploited as a
complement to and component of other functional genomics approaches. There are
two main categories of computational methods that detect genes: extrinsic
approaches relying essentially on comparison with other related sequences and
intrinsic approaches based only on the local properties of the sequence under
scrutiny (nucleotide composition and sequence motifs) (Windsor and
Mitchell-Olds, 2006). Methods using information intrinsic to the sequence
(Fig. 11.2A) are based on the analysis of sequence, without referring to other
sequences stored in the databases. These methodologies are commonly encountered
as ab initio tools and are, by definition, not comparative (Davuluri and Zhang,
2003). Ab initio gene prediction algorithms display high sensitivity, but a low
specificity in their output models. Extrinsic gene prediction methods are based
on extrinsic data (including expression evidence) and/or sequence similarity
(Fig. 11.2B, C, D), which supplement ab initio prediction by providing improved
specificity and complementary sensitivity.

The first two features to be placed on newly acquired assembled genomic
sequences are the possible open reading frames (ORFs) and splice sites. These
two sets of data are combined to identify both already known and putative
genes. Other available information that can be added to the sequence includes
markers for genetic maps, submitted genes and ESTs and other EST data from
different species. Comparisons of gene family members within and between
species yielded the expected result that the genes were most highly conserved
between the most closely related species. Moreover, sequence conservation was
greatest in the protein-coding portions of the exons. All these make
computational methods for gene identification practical.

11.1.1 Evidence-based gene prediction

Gene prediction frameworks using expression data, also called evidence-based
gene

[Figure 11.2 (diagram, image not reproduced). A query DNA sequence is analysed
along paths A, B, C and D, each comparing predicted with actual gene models.
Path C draws on peptide data, EST/mRNA data and full-length cDNA data. Legend:
coding sequences; false exon/coding region (A); false meaningful non-coding
sequence (B); exon/intron discrepancy; true meaningful non-coding sequence;
missed feature.]

Fig. 11.2. Intrinsic (light grey arrows) and extrinsic (dark grey arrows) methods for gene prediction. Thick
black lines represent query sequence data. Watson- and Crick-strand coding sequences are indicated above
or below sequence strands. Path (A), ab initio gene prediction algorithms model gene content with data
from the query sequence itself. These methods can miss features such as small ORFs and small introns.
Exons can also be missed, and ab initio methods can erroneously identify exons or whole coding sequences.
Generally, these methods are not applicable to the prediction of functional non-coding sequences.
Path (B), similarity-based gene prediction. These methods are comparative and incorporate data from
the alignment of one or more syntenic DNA sequences. Similarity-based methods display improved
sensitivity and specificity for coding and non-coding sequences over ab initio methods. The ability to
predict genes or conserved features is a function of the number of sequences compared, the evolutionary
distance of these sequences, and the degeneracy and size of the features in the homologous sequences.
Path (C), evidence-based gene prediction. These methods can be computational or experimental and
display high specificity but low sensitivity. The efficacy of the prediction is contingent on the quality/extent
of available expression data. Path (D), combinatorial approaches. In the example presented, similarity
evidence is combined with an ab initio prediction to improve the overall prediction of gene content.
Reprinted from Windsor and Mitchell-Olds (2006) with permission from Elsevier.
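
The sensitivity and specificity terms used in the caption of Fig. 11.2 can be
made concrete at the nucleotide level. The sketch below assumes the convention
common in the gene-prediction literature, where sensitivity is the fraction of
actual coding positions that are predicted and specificity is the fraction of
predicted coding positions that are actual; the function names and the
coordinates are hypothetical illustrations, not data from the sources cited.

```python
def covered_positions(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(predicted, actual):
    """Return (sensitivity, specificity) of predicted exons against actual exons."""
    pred = covered_positions(predicted)
    act = covered_positions(actual)
    true_pos = len(pred & act)    # predicted and actually coding
    false_pos = len(pred - act)   # predicted but not coding
    false_neg = len(act - pred)   # coding but missed
    sensitivity = true_pos / (true_pos + false_neg) if act else 0.0
    specificity = true_pos / (true_pos + false_pos) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    actual = [(101, 200), (301, 400)]
    # Over-prediction: both real exons are found plus one false exon.
    over = [(101, 200), (301, 400), (501, 600)]
    # Under-prediction: only the exon supported by expression data is reported.
    under = [(101, 200)]
    print("over-prediction :", nucleotide_accuracy(over, actual))
    print("under-prediction:", nucleotide_accuracy(under, actual))
```

The over-predicting model finds every real exon (sensitivity 1.0) but a third
of its predicted bases are false, the profile the caption attributes to ab
initio methods; the under-predicting model reports only true exon sequence
(specificity 1.0) but recovers half of it, the profile attributed to
evidence-based methods.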

prediction, integrate empirical transcription and protein expression data with
genome sequence to produce gene models (Fig. 11.2C) and facilitate annotation.
Such data provide high specificity to gene model prediction, but sensitivity is
contingent on the extent of the expression data sets. This property has
negative impacts on the identification of sequences with tightly regulated or
low-abundance transcripts or of RNA species that are not translated. The
incorporation of expression data from multiple species can overcome some of
these limitations and allows the use of species with no or partial genome
sequence in comparative analyses (Windsor and Mitchell-Olds, 2006).

The analysis of the transcriptome and proteome data sets for the kingdom
Plantae suggested that approximately 19,000 gene functions are encoded in the
green plant lineage (Vandepoele and Van de Peer, 2005), nearly 6500 of which
are encoded by orphan genes, novel genes that cannot be definitively assigned
to a characterized homologue or gene family.

11.1.2 Homology-based gene prediction

Sequence homology or similarity is a very powerful type of evidence for
detecting functional elements in genomic sequences. The homology-based methods
to detect genes use either intraspecies or interspecies sequence comparison in
different ways

(Davuluri and Zhang, 2003; Windsor and Mitchell-Olds, 2006; Nicolas and
Chiapello, 2007). Most of the algorithms use one of three types of external
information: protein sequences, mRNA sequences or DNA sequences.

Comparison with EST/cDNA databases

It has been demonstrated that homology-based cloning was a very effective way
to identify tissue-, organ- and developmental stage-specific expressed genes by
assembling EST sequences derived from the same library and performing homology
searches in databases. In an early report, about 18% of 5000 human clones were
assigned probable function (Adams et al., 1991) by simply picking random cDNA
clones, obtaining partial sequence from the 5' end and comparing the six
possible translations of the partial cDNA sequences with the sequences of known
proteins in the various databanks. The sequencing of mRNA through the
sequencing of cDNA is an experimental technique to determine the sequence of
genes that are transcribed. As sequencing takes place after splicing, cDNA
sequencing also allows precise determination of the intron-exon structure of
genes. By directly comparing a genomic DNA sequence (query) with ESTs or cDNA,
regions of the query sequence that correspond to processed mRNA can be
identified.

BLASTN is a common program that identifies nucleotide sequences in databases
(nr/EST) that are similar to the query sequence (see Basic Local Alignment
Search Tool (BLAST) help at http://www.ncbi.nlm.nih.gov/BLAST for further
details about BLASTN and other programs). The similarity between the sequences
is estimated by aligning the sequences as closely as possible. The BLASTN
algorithm finds similar sequences by generating an indexed table or dictionary
of short subsequences called words for both the query and the database.
However, determination of the DNA structure of genes from ESTs is not trivial,
because these sequences are generally incomplete (often sequenced at the 3'
end), of poor quality (sequenced only once), redundant (as a function of the
abundance of mRNA) and of alternative splicing. The problem is more severe when
the mRNA sequence has been obtained in a different organism (Nicolas and
Chiapello, 2007). If the query sequence is very long, MEGABLAST is a better
choice, as it is specifically designed to efficiently find long alignments
between very similar sequences. MEGABLAST is also optimized for aligning
sequences that differ slightly as a result of sequencing errors. Davuluri and
Zhang (2003) suggested the use of an expected value (e-value) of 0.1 and
filtering for low complexity repeats. When a larger word size (with the default
value of 28) is used, it increases the search speed and limits the number of
database hits. For BLASTN, the word size can be reduced from the default value
of 11 to a minimum of 7 to increase the sensitivity. Algorithms that can be
used for DNA-cDNA and DNA-EST alignments include SIM4 (Florea et al., 1998) and
GENESEQER (Usuka et al., 2000).

Similarity-based methods (e.g. BLASTN, BLASTX) are perhaps the best to
determine whether a given region of the genome is transcribed or not. A BLASTN
match to a cDNA/EST or BLASTX match to a protein is good evidence that the
region belongs to a gene. However, these methods have their own limitations
(Davuluri and Zhang, 2003). Even the most comprehensive cDNA projects will miss
low copy number transcripts and those transcripts whose expression is low,
cell- or tissue-specific, or expressed only under unusual conditions. cDNA or
ESTs can contain one or more introns, if the mRNA was partially spliced, which
could lead to misclassification of intron regions as exons. Some cDNA sequences
may result in incorrect protein prediction. Partial BLASTX alignment to a
target protein should not be considered, as the protein may not be a true
orthologue of the source gene and only shares some domains, although it would
still give some information for gene prediction.

Comparison with protein sequence databases

DNA-protein similarity can be compared to predict the protein coding sequences

that resemble proteins already present in the databases. BLASTX (Gish and
States, 1993) and FASTX (Pearson et al., 1997) are two programs for similarity
analysis by local alignment between a DNA sequence translated in six reading
frames and a protein sequence library. Subsequently, spliced alignment programs
such as GENEWISE, GENESEQER, or PROCRUSTES can be used to find gene structure
by comparing the genomic sequence to the target protein sequences. These
programs derive an optimal alignment based on sequence similarity score of the
predicted gene product to the protein sequence and intrinsic splice site
strength of the predicted introns. However, to predict the structure of a
coding sequence composed of multiple exons, the DNA sequence and the protein
sequence must be aligned taking into account the presence of introns that
constitute long fragments of the DNA sequence that do not match with the
protein sequence (Nicolas and Chiapello, 2007). PROCRUSTES (Gelfand et al.,
1996) and PAIRWISE (Birney et al., 1996) were the first two programs developed
to resolve this spliced alignment problem.

Protein coding DNA from closely related plant species, such as sorghum and
maize, shows considerable sequence similarity. VISTA/AVID and PIPMAKER can be
used to compare large genomic sequences to find orthologous genomic sequences
from closely related species. For example, sequence analysis of orthologous
genes from rice, maize and sorghum showed that the exons are more conserved
than introns (Schmidt, 2002). The degree of sequence conservation, in terms of
sequence identity, across species has been shown to be consistent with the
divergence times of the respective species. For gene prediction programs, it
would be best to compare two genomes that are very closely related, but distant
enough that their intergenic repeat elements differ significantly. As a rule of
thumb, two species are considered closely related if they diverged within the
last 25 million years. For example, maize and sorghum are closely related
species as they diverged 15-20 million years ago. If homologous genomic
sequences from two species are known, then a recently developed gene prediction
tool called SGP-1 can be used to find protein-coding genes (Davuluri and Zhang,
2003).

Comparison of a translated genomic sequence with translated nucleotide database

TBLASTX can be used to identify similarities among protein coding regions by
selecting the 'Nucleotide query - Translated db [tblast]' option from the BLAST
web page. TBLASTX takes a nucleotide query sequence, translates it in all six
frames and compares the translations to nucleotide database sequences that are
dynamically translated in all six frames.

Comparison of homologous genomic sequences

Often scientists want to isolate from a particular organism a gene orthologous
to one already isolated from another organism, while others may be interested
in isolating from the same organism other members of a gene family (paralogues)
for which at least one cloned member is available. The homologous genes
(orthologues or paralogues) may be used as probes to isolate the target genes
from a library or by the degenerate primer method. As the full genomes of plant
species such as Arabidopsis and rice are more precisely annotated, the finding
and isolation of potential genes in other, less well-defined systems may be
possible with reference to the position of the sequence in a particular cluster
of genes through synteny. However, these predictions are likely to be
complicated by the presence of multiple copies of genes, the divergence between
paralogues and orthologues in other species (see Chapter 1 in Cullis, 2004) and
the micro- and macro-rearrangements of the chromosomes over evolutionary time.
Therefore, any candidates will need to be extensively characterized to
demonstrate that they are performing the same function in both time and space.
Comparison of rice

and Arabidopsis genome sequences showed that 90% of the Arabidopsis genes had a putative
homologue in rice, whereas only 71% of rice genes had a putative homologue in Arabidopsis
(IRGSP, 2005). The creative application of similarity-based analyses has allowed the
identification of novel coding sequences. Several thousand conserved unannotated regions
were recognized in A. thaliana relative to the partial genome sequences of Brassica oleracea
(Ayele et al., 2005; Katari et al., 2005). In these approaches, conserved genomic regions,
as identified by TWINSCAN, with physical proximity in the A. thaliana reference genome were
chained together to produce novel gene models.

Homology-based cloning has been effective when the gene of interest is a known member of a
multi-gene family. Often, amino acid sequence alignments of family members reveal
particularly conserved regions. Consequently, degenerate oligonucleotides can be designed
and used in library screening or directly for PCR cloning of the gene.

11.1.3 Ab initio gene prediction

Comparison of sequence differences between coding and non-coding regions has encouraged the
development of prediction methods based on the probabilistic modelling of DNA sequences,
which helps overcome the limitations of homology-based methods. Ab initio gene finding
programs recognize signals of compositional features in an input genomic sequence by pattern
matching or statistical methods. The performance of a gene-finding program is typically
measured in terms of the sensitivity, defined as the proportion of true signals (e.g. donor
signals, exons) that are correctly predicted, and the specificity, defined as the proportion
of predicted signals that are correct. A program is considered accurate if its sensitivity
and specificity are simultaneously high. A comprehensive review of these programs can be
found at http://linkage.rockefeller.edu/wli/gene/. Stein (2001) reviewed various genome
annotation methods.

Splice site prediction

As most plant genes have several exons, precise gene structure prediction in plants very
much depends on correct splice site prediction. Although nearly all introns begin with GT
and finish with AG, this information is not enough for the spliceosome to choose the
splicing sites. Other sequence signals around these dinucleotides are used. Most splicing
site recognition methods are based on the evaluation of the sequences of potential sites by
using a probabilistic model that describes position by position the nucleotide composition
of actual splicing sites. Many first generation gene prediction programs used simple
position weight matrix methods to model the compositional biases present in the 5' and 3'
splice sites. A limitation of such models is that they do not account for dependencies
between positions that are not consecutive (O'Flanagan et al., 2005). Most recent programs
have investigated the correlations between different positions by using Markov models,
maximal dependence decomposition models, decision tree models and artificial neural networks
(as reviewed by Davuluri and Zhang, 2003). GENESPLICER, NETPLANTGENE, NETGENE2 and
SPLICEPREDICTOR are some of the splice site prediction programs that use splice site models.
Other models such as Bayesian networks that account for correlation between non-adjacent
positions have also been proposed (Chen et al., 2005).

Exon prediction

Exons are defined by what is retained in a spliced mRNA, which includes untranslated regions
(UTRs) and the protein coding regions. The protein coding exons typically are of four types:
(i) initial exons (ATG to first donor site); (ii) internal exons (acceptor site to donor
site); (iii) terminal exons (acceptor site to stop codon); and (iv) single exons (ATG to
stop codon without introns). Most of the gene prediction programs have been developed to
predict protein coding exons. The accuracy of splice site prediction, and hence exon
prediction, by second generation programs (e.g. GENSCAN, GENEMARK.HMM, MZEF or

SPL) is significantly higher than that of the simple splice site prediction programs
described above, because these programs integrate splice site models with additional types
of information, such as compositional features of exons and introns. MZEF, based on
quadratic discriminant analysis, was specifically trained to predict internal exons
(Davuluri and Zhang, 2003). It was shown to perform better than FGENESP, GRAIL, GENSCAN and
GENEMARK.HMM in predicting internal exons for the Arabidopsis genome. For predicting initial
and terminal exons, GENSCAN and GENEMARK.HMM are the best options, even though the accuracy
of predicting these exons is significantly lower than that of internal exon prediction.

Gene modelling

The accuracy of individual exon prediction can be further improved by combining the
compatibility of the reading frames of adjacent exons to make a full coding transcript.
Probabilistic models, such as Hidden Markov Models, have been used to incorporate this
information in GENSCAN and GENEMARK.HMM, which model different states (exon, intron,
intergenic regions, etc.) of a gene. The GENEMARK program implements a sliding window
strategy. The window slides along the sequence and at each position the program computes the
probability of the sequence contained in the window under seven models: non-coding and
coding on each of the two strands in each of three reading frames, to obtain a probability
of the locally coding nature of the sequence in each reading frame. A simple alternative to
sliding windows is implemented by GLIMMER (Nicolas and Chiapello, 2007). The program first
extracts ORFs longer than a certain threshold and, secondly, attempts to classify them
according to their coding or non-coding nature.

11.1.4 Gene prediction by integrated methods

Gene prediction by homology-based methods is perhaps the most efficient way of finding genes
in genomic sequences, since the evidence of support (mRNA, EST, protein) was already derived
experimentally (Davuluri and Zhang, 2003). Ab initio gene-prediction programs do not rely on
such data, but miss some known genes (false negatives) and predict some that are not real
(false positives). A combination of ab initio gene prediction programs and homology-based
approaches has been automated in several programs such as GENOMESCAN and RICEGAAS to produce
more reliable predictions of protein-coding regions. GENOMESCAN incorporates protein
homology information (BLASTX hits) with the exon–intron predictions of GENSCAN. It first
masks the interspersed repetitive elements in the genomic sequences with REPEATMASKER and
then combines the GENSCAN predicted peptides with BLASTX hits. The program determines the
most likely parse (gene structure), conditional on the given similarity information under a
probabilistic model of the gene structural and compositional properties of genomic DNA for
the given organism. There are two major ways to integrate different approaches to improve
the prediction (Nicolas and Chiapello, 2007). The first way is to use the programs
separately, then carry out a post-treatment of the results. JIGSAW (Allen and Salzberg,
2005) is an example of a program developed for this task. It uses dynamic programming
algorithms to automatically combine predictions made with independent programs. The second
way is to develop programs that have their predictions simultaneously based on intrinsic and
extrinsic criteria.

Despite great progress, gene prediction by computational approaches alone is still far from
perfect. Comparing the performances of different approaches is a delicate and difficult task
as indicated by Nicolas and Chiapello (2007). The difficulties arise partly from the
diversity of information taken into account as well as the diversity of the predictions made
(complete coding sequences, exons, splicing sites). Just as the quality of intrinsic methods
depends clearly on their adjustment to a given species, that of extrinsic methods depends on
the degree of similarity between the sequences compared.
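
Comparisons of this kind are usually scored with the sensitivity and specificity measures
defined in Section 11.1.3. The following Python sketch illustrates the calculation at the
level of exactly matching exons. It is only an illustration under stated assumptions: the
function name exon_accuracy and the exon coordinates are hypothetical and do not come from
any of the programs cited, and an exon counts as correct only if both boundaries match.

```python
def exon_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) for exact exon matches.

    sensitivity = correctly predicted exons / all true exons
    specificity = correctly predicted exons / all predicted exons
    (the definitions used in gene-finding evaluations, as in Section 11.1.3)
    """
    true_set = set(true_exons)
    pred_set = set(predicted_exons)
    correct = true_set & pred_set  # exons with both boundaries right
    sensitivity = len(correct) / len(true_set) if true_set else 0.0
    specificity = len(correct) / len(pred_set) if pred_set else 0.0
    return sensitivity, specificity


# Hypothetical (start, end) coordinates, for illustration only.
annotated = [(100, 250), (400, 520), (700, 910), (1200, 1350)]
predicted = [(100, 250), (400, 520), (705, 910), (1500, 1600), (1200, 1350)]

sn, sp = exon_accuracy(annotated, predicted)
print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
# prints: sensitivity = 0.75, specificity = 0.60
```

In this toy case one exon is missed because its acceptor boundary is off by five bases, and
one predicted exon has no annotated counterpart, so neither measure alone would reveal both
kinds of error.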

Last but not least is the problem of a gold standard that could serve as a reference for the
comparison of the approaches.

Before running any gene-finding program, Davuluri and Zhang (2003) suggested the use of
programs such as REPEATMASKER, which identifies known classes of interspersed repeats and
long and short interspersed nuclear elements (LINEs and SINEs), which exist in non-coding
regions of the genome. Almost all gene finding programs can predict only protein coding
regions and have not been trained to predict untranslated exons and untranslated portions of
first and last coding exons. In addition, identifying the exact boundaries of all the exons
and assembly of the exons into different genes is not possible by computational approaches
alone. As indicated by Davuluri and Zhang (2003), however, even the partial predictions are
of immense value in designing the experiments that can determine the complete gene structure
faster than would be possible by experimental methods alone.

11.1.5 Detecting protein function from genomic sequences

There are three major classes of in silico methods used to obtain information on protein
function: methods using information intrinsic to the sequence; homology search methods; and
methods based on the context of genes (Gibrat and Marin, 2007). Homology search techniques
provide precise information and thus occupy a central place, while others give only general
information.

Intrinsic methods

The methods using information intrinsic to the sequence detect recognisable protein
structure such as transmembrane segments, zones of low complexity, coiled coils and cellular
sorting signals (Gibrat and Marin, 2007). As transmembrane segments are mostly made up of
hydrophobic residues, the detection methods are based on the search for segments that have
an appropriate size and present a marked hydrophobic character. Most of the transmembrane
segments are made up of α helices and around 25–30 residues are required for the polypeptide
chain to cross the membrane in the form of an α helix. Some segments of protein sequences
have an over-representation of a small number of amino acids or even show a more or less
regular repetition of a particular peptide. They do not have a conventional
three-dimensional globular structure. These zones rich in specific amino acids supply
practically no information on the function of proteins and they must be masked before using
homology search methods, because their abnormal amino acid composition disturbs the
statistics associated with these techniques and often results in false inference of
homology. Coiled-coil zones are formed by a bundle of two or three α helices and they can be
detected based on statistical techniques that take into account the probability of observing
a particular amino acid at each of several characteristic positions (Lupas, 1996). Cellular
sorting of proteins towards organelles that are their final destination depends on signals
present in the primary structure. Techniques based on the overall composition of amino acids
can be used to predict the location of proteins in various organelles.

Homology search methods

Homology search methods play a central role in in silico functional analysis to discover
similar proteins in databases. There are two principal categories of homologous proteins:
orthologous proteins (which result from a process of speciation) and paralogous proteins
(which result from a process of duplication). Gibrat and Marin (2007) evaluated different
means by which a relationship of homology between two proteins can be inferred, including
sequence comparison, profile detection, motif detection and fold recognition.

As with DNA sequence comparison, protein sequence comparison is the most natural and the
oldest method of indicating a relationship of homology between two proteins. BLAST and FASTA
are sequence comparison methods based on the principle

that two sequences with a common ancestor should maintain some traces of that relationship
in the sequences. Profile detection, based on a multiple alignment of similar sequences, can
be used to estimate the variability of each of the positions along the sequence of a
protein. Multiple alignments can be made by using PSI-BLAST (Altschul et al., 1997), which
constructs the multiple alignments iteratively during the search by comparing with profiles
of families of proteins from databases. Motif detection searches for motifs that correspond
to a functional signature, or even to residues necessary to maintain the correct geometry of
the active site of a protein. As these residues are crucial for the function of the protein
they are strongly conserved. Some programs, such as SCANREGEXP and PFSCAN, can be used to
search for motifs characteristic of particular proteins in the Prosite library of motifs
(Hofmann et al., 1999). Fold recognition methods are based on the alignment of a sequence on
a three-dimensional structure to indicate relationships of homology, which can be used to
reveal the distant homologues that cannot be detected by sequence comparison methods.

Gene context methods

The methods based on the context of a gene depend on studying the co-location of genes in
different genomes. Unlike homology search techniques that often provide information on the
molecular function of proteins, this type of technique generally provides information on the
interactions between proteins and thus their cellular role. With gene context methods
protein function can be detected based on three different concepts: gene fusion, gene
proximity and gene co-occurrence (Gibrat and Marin, 2007).

Gene fusion is evidenced by the observation that a metabolic pathway made up of independent
enzymes in the prokaryotes is catalysed by a multi-enzyme system. Two proteins that exist in
the independent form in one genome and that are fused in another genome have a high chance
of being in close interaction. The gene fusion of functional models can be classified into
those that occurred in an ancestor common to most of the present lines of organisms, those
concerning the genetically mobile domain that are found in different proteins and those
similar to cytochrome P450, providing some information on interaction between proteins
(Gibrat and Marin, 2007). Gene proximity methods are based on the observation that
functionally linked genes are co-regulated and have a tendency to be close together in the
genome and that the position of a gene in the genome may provide information about its
function. Protein function can be predicted, e.g. by measuring a local proximity that
involves the conservation of nearby gene pairs in different genomes being compared. Gene
co-occurrence is based on the concept that genes that are implicated in a particular
cellular process, common to genomes of this set, can share an identical phylogenetic
profile. Thus, an unknown protein that shares a phylogenetic profile with proteins that are
known to be implicated in a particular cellular pathway has a high chance of playing a role
in that pathway.

11.2 Comparative Approaches for Gene Isolation

Comparative approaches for gene isolation include in silico methods that have been described
in the previous section, experimental procedures involving map-based cloning using
comparative analysis between closely related species and major-gene assisted QTL mapping by
comparing traits controlled by genes with different functions. These approaches have been
used at different scales due to their various levels of practicability.

11.2.1 Genomic bases of comparative approaches

It is expected that the functional genomics of model plants will contribute to the
understanding of basic plant biology as well as to

the exploitation of genomic information for crop improvement. This is because a large number
of gene functions are conserved across species, either directly or after identifying the
functional homologues. Perhaps the most exciting application of comparative genomics will be
the identification of different versions of genes for a target species from other related
species. Orthologous genes in related species will be similar in sequence and function to
those in the target species but could result in markedly different phenotypes (Xu et al.,
2005).

Conservation of gene content and gene order among closely related plant species greatly
assists in gene identification and annotation. Even in closely related plant genomes, whose
ancestors diverged from each other more than 10 million years ago (mya), only genes are
conserved in orthologous regions. All of the plant species with large genomes studied have
evolved through the movement of retrotransposons within the last 6 million years and these
sequences vary greatly between species (Ramakrishna and Bennetzen, 2003). Hence, plant
species that diverged from each other more than 50 mya only have exonic regions conserved
among genes. This feature has been used to improve gene annotation with great success
(Tikhonov et al., 1999; Dubcovsky et al., 2001; Ramakrishna et al., 2002). Gene structure
can be predicted more accurately using comparative sequence analysis than by the combined
use of ESTs, homology to entries in protein databases and gene prediction programs as
described in the previous section. Conservation of genomic collinearity, gene content and
order among plant genomes greatly assists in gene isolation from cross-species comparisons.

Differences in gene content are sometimes observed in otherwise collinear regions of plant
genomes (Tikhonov et al., 1999; Tarchini et al., 2000; Ramakrishna et al., 2002; Bennetzen
and Ramakrishna, 2002). This phenomenon can complicate gene isolation, but does not
completely invalidate the approach. Under almost all circumstances, a small genome species
will provide numerous DNA markers on a single bacterial artificial chromosome (BAC), which
permits more detailed mapping in the large genome species. Chromosome walking, as a basic
step in map-based cloning to be described in detail later, is often difficult with large
genomes such as barley, maize and wheat. In these cases, related plant species with small
genomes such as rice, which show genomic collinearity with the large genome species, can be
used to identify and isolate desired genes. This approach has potential pitfalls, especially
with respect to some disease resistance genes (Kilian et al., 1997; Leister et al., 1998;
Pan et al., 2000). Resistance gene regions often undergo rapid rearrangement that results in
a lack of micro-collinearity caused by deletion or translocation of the target loci.
However, at the very least, the comparative genomic approach provides numerous probes from
one species, which can be used for gene mapping and isolation in another species
(Ramakrishna and Bennetzen, 2003).

Comparative genetics has been facilitated by the development of massive databases, efficient
querying and comparison software and ever-improving computers. Many of the first genes
sequenced in rice and other grasses were represented by abundant mRNAs (e.g. those encoding
storage proteins and photosynthetic proteins). Members of the same gene families (e.g.
paralogues), including those that were mapped to the same genomic position and thus were
derived by vertical descent from a common ancestral gene (i.e. orthologues), were often
cloned and analysed in multiple species (Bennetzen and Ma, 2003).

When sequences from two separate parts of the gene are moderately conserved, degenerate
oligonucleotides based on the two sequences can be used to attempt to amplify the
intervening sequence by PCR. An amplified RT-PCR product or a degenerate oligonucleotide may
be used in nucleic acid hybridization to screen the colonies or plaques of a cDNA/genomic
library for clones containing the gene of interest. Positive clones from a genomic library
must be verified to actually contain the gene and then examined further to identify where on
the clone the gene is located.
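
To make the idea of a degenerate oligonucleotide concrete, the Python sketch below computes
the degeneracy of a primer written with the standard IUPAC nucleotide ambiguity codes and
lists the individual sequences it represents. It is a minimal illustration, not a primer
design tool: the function names degeneracy and expand are hypothetical, and the example
primer is simply the back-translation of the peptide Met-Trp-Asp-Lys (ATG TGG GAY AAR),
chosen here for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer: back-translation of Met-Trp-Asp-Lys.
primer = "ATGTGGGAYAAR"
print(degeneracy(primer))  # prints 4
for seq in expand(primer):
    print(seq)
```

Conserved regions rich in amino acids with few codons, such as Met and Trp, keep the
degeneracy low, which is one reason such regions are attractive targets when degenerate
oligonucleotides are designed.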

11.2.2 Experimental procedures involved in comparative analysis

Ramakrishna and Bennetzen (2003) described methods for plant gene isolation based on
comparative genetic map and/or genomic sequence information. This technique involves
identification of collinear regions, followed by clone selection and finally, sequence
analyses to identify the gene of interest.

Basic procedures

IDENTIFICATION OF COLLINEAR REGIONS. The genetic map position of the targeted locus in the
plant species with a large genome size must be determined accurately by segregation analysis
of the locus with tightly linked markers. These markers should map to a collinear region in
a related plant species with a small genome to enable isolation of the targeted locus.
Comparative genetic linkage maps with common molecular markers serve as the best starting
point. For example, the maize genome is about 2400 Mb in size, corresponding to a genetic
map of about 2500 cM. This translates to an average of about 1 Mb/cM for the maize genome. A
large mapping population of 5000 gametes with no recombinants in the segregating progeny
makes it likely that the targeted gene is present within a 500-kb region. The rice genome
has a size of 380–450 Mb and a genetic map of about 1600 cM, i.e. roughly 0.24–0.28 Mb/cM.
This makes map-based gene isolation much easier in rice than in maize.

In cases where the gene of interest is absent in the small genome, we can still use markers
from the orthologous region in the small genome to fine-map in the large genome, as markers
are often the limiting factor for fine-mapping in some crop species. The complete genome
sequence in crop plants provides abundant information for choosing suitable markers. For
example, the maize BAC libraries can then be screened with suitable markers from rice to
identify BACs that harbour the gene of interest. The next step is to look for the presence
of flanking markers (tightly linked to the targeted gene) on contiguous BACs in maize. The
results of such studies will show whether overall collinearity is maintained in the region.

CLONE SELECTION AND MAPPING. Several thousand clones from the small genome BAC libraries are
screened for individual clones that show homology to DNA markers mapped in the collinear
regions in different plant species.

CONSTRUCTION OF SHOTGUN LIBRARIES AND SEQUENCING. These two steps can follow a standard
procedure described in Chapter 3.

SEQUENCE ANALYSES AND ANNOTATION. The first step in the sequence analysis of collinear BACs
(for instance, when a collinear sorghum BAC is sequenced to isolate a gene based on the
genetic map location in maize) is the delimitation of regions that are conserved and not
conserved relative to rice. Conserved regions are usually or always genes, while the
unconserved regions are usually not genes. Complete sequences from orthologous BACs are then
compared using the program DOTTER to identify the conserved regions. Genes can be predicted
as described in the previous section.

CONFIRMATION OF CANDIDATE GENES. The possible functions of candidate genes can be
investigated using several independent approaches. Sequence analyses and annotation, as
described above, using comparative sequence analyses, gene-finding programs and BLAST
searches, identify putative genes. Sequence variations and gene structure analysis of the
gene identified in the region, for instance in susceptible and resistant lines in the case
of disease resistance genes, can help verify a candidate gene. As an example, preliminary
mapping, cloning, sequencing, gene finding and BLAST searches identified two candidate genes
for barley Rpg1. These were differentiated by segregation analysis in 8518 gametes and by
sequence analysis in barley lines susceptible or resistant to stem rust (Brueggeman et al.,
2002).

Additional experimental analyses can be performed to evaluate candidate gene

function. Several approaches can be used including mutation analysis and expression
analysis.

Mutation analysis can follow these steps (Ramakrishna and Bennetzen, 2003):

• Analysis of knock-out mutations (i.e. transfer DNA (T-DNA) or transposon insertions).
• Lines in which the gene of interest is either non-functional or overexpressed can be
  generated by transforming wild-type plants with antisense or sense gene constructs.
• RNA interference can be employed, where homologous double-stranded RNA (dsRNA) is used to
  suppress a gene, generally resulting in a null phenotype.
• Complementation studies, where a wild-type copy of the gene of interest is transformed
  into the mutant to see if the T1 progeny yields wild-type phenotype and whether
  complementation co-segregates with the transgene in subsequent generations.
• Searching for point mutations by targeting induced local lesions in genomes (TILLING) to
  provide an allelic series of mutations.

Tissue-specific expression of the genes can be studied using Northern blot analysis,
microarrays, reporter constructs, or reverse transcription-PCR to see if the expression
patterns agree with the predicted biology of the targeted gene.

Examples

Probably the most comprehensive application of collinearity in plants was the attempt to
clone specific barley disease resistance genes by chromosome walking using rice. The
collinearity provided numerous DNA markers from rice that facilitated the chromosome walk in
barley, leading to the isolation of the desired stem-rust resistance gene Rpg1, although the
synteny with rice failed to yield the gene because it does not seem to exist in rice
(Brueggeman et al., 2002). Another example is the Lr21 leaf-rust-resistance gene of bread
wheat that was successfully isolated using a strategy of shuttle-mapping between diploid
wheat as a model and bread wheat (Huang et al., 2003). Most of the time, however, there are
breakages in microsynteny that prevent the straightforward identification of a candidate
gene for a given trait. This was the case when attempts were made to isolate the
leaf-rust-resistance gene Rph7 (Brunner et al., 2003) or the photoperiod response gene
Ppd-H1 (Dunford et al., 2002) from barley. A similar story was reported for the Rfo restorer
genes isolated from radish: markers flanking these genes in radish are collinear with the
Arabidopsis sequence, but the gene itself is not present in Arabidopsis although many
homologues are present elsewhere in the Arabidopsis genome (Brown et al., 2003; Desloire
et al., 2003).

Examples for the use of a shuttle-mapping strategy have to be evaluated on a case-by-case
basis. The present information, from both successes and failures, strongly suggests that the
development of efficient tools for isolating genes of agronomic importance within each
important family should continue to be a priority and that, as indicated by Delseny (2004),
restricting ourselves to use of several model species would be unwise, although collinearity
has been useful in providing additional markers with which to saturate fine genetic and
physical maps.

11.2.3 Cloning QTL facilitated by related major genes

Robertson (1985) presented evidence that qualitative and quantitative traits may be the
result of different types of variation of genic DNA at the loci involved. At any given
locus, variation of a minor nature may result in wild-type alleles responsible for gene
products with different efficiencies (quantitative alleles) while major genic rearrangements
or changes in the region of the gene essential for a normal functioning gene product may
result in qualitative mutant alleles. Based on this hypothesis, Robertson proposed a
possible approach to

cloning QTL. It is apparent from previous work that the alleles for quantitative variation
assume possible allelic interactions and have a smaller individual effect than alleles from
qualitative variation. However, it is possible that alleles for qualitative mutants are
simply loss-of-function alleles at the same loci underlying quantitative variation.
Consider, for instance, a trait such as plant height. In maize, at least 17 known
qualitative mutants that affect plant height have been identified (cf. Robertson, 1985).
These are non-allelic mutants, all of which have been placed on chromosomes. In rice, over
50 loci responsible for semi-dwarfism or dwarfism have been found and mapped (Kinoshita,
1995). If all these loci had two or more wild-type alleles responsible for quantitative
variation, the combination of these would come close to being sufficient to explain the
quantitative inheritance pattern observed by breeders. Theoretically, QTL mapping studies
can provide a test of this hypothesis. If a gene contributing to quantitative variation is
allelic to a gene controlling qualitative variation, then these genes should map to the same
locus along the chromosome. For some organisms (e.g. maize and Drosophila), many of the
major qualitative loci controlling morphological variation have been mapped with a high
degree of precision on genetic maps and these locations should be predictive of the
locations of QTL for the same character. As indicated by Robertson (1985), it seems
unreasonable, to say nothing of wasteful, to assume that a living organism would have two
sets of loci, one for qualitative traits and one for quantitative traits, when one set could
account for both patterns.

Robertson (1989) gave two examples to support his hypothesis. One of them is that a
difference in gibberellin deficiency, controlled by a major gene, resulted in a quantitative
difference of plant height. He also listed a series of qualitative traits which are related
to the quantitative traits of the same kind. This hypothesis has been tested in maize. Beavis
and colleagues attempted to test the relationship of qualitative mutants to quantitative
variation by mapping QTL for plant height in four maize F2 populations and comparing the map
position of these QTL with previously known positions of qualitative variations for the same
character (Beavis et al., 1991). The results showed a general concordance in map positions
of QTL and major genes affecting height, which is consistent with the hypothesis.

With the development of practical QTL mapping, the similar location of QTL and major genes
is supported by many research results and only a few early examples will be discussed here.
In maize, many QTL affecting plant height were located at known major loci (Edwards et al.,
1992), indicating that some QTL may be allelic to the major genes. In genetic analysis of
rice blast resistance, a major gene was located on chromosome 8 by randomly amplified
polymorphic DNA (RAPD) analysis on resistant and susceptible plants of a doubled haploid
(DH) population (Zhu et al., 1994). A QTL controlling quantitative resistance was found in
the same chromosomal region when using molecular markers to map the resistance gene with
quantitative phenotype data (Wang et al., 1994). In A. thaliana, five QTL affecting
flowering time were identified in a cross between two ecotypes, H51 and Landsberg erecta,
and four of them were located in regions containing mutations or loci previously identified
as conferring a late flowering phenotype (Clarke et al., 1995). Generally, associations
between qualitative mutants and QTL occur more often than expected by chance. In maize, for
example, 75% of chromosome intervals harbouring discrete height mutants also harboured
height QTL and 43% of intervals harbouring QTL also harboured mutants (cf. Lin et al.,
1995), although the association is by no means absolute. A report on QTL mapping in five
rice populations detected 23 plant height QTL. According to linkage relationships determined
with restriction fragment length polymorphism (RFLP) markers, all of the 13 major dwarfing
or semi-dwarfing genes were found to be in close proximity to these plant-height QTL (Huang
et al., 1996). In Drosophila, the map positions of bristle QTL in every case corresponded
approximately to those of candidate neurogenic loci or loci with major bristle phenotypes
(Long et al., 1995). However, the QTL were located on the map with a low degree of
resolution in most cases mentioned above, raising the possibility that the QTL are linked,
but not identical, to the qualitative loci, as indicated by Tanksley (1993). Until QTL are
mapped to higher degrees of precision and/or cloned, it will be difficult to prove that
the particular QTL actually correspond to known loci defined by macromutant alleles. As
more and more QTL are cloned, whether there are any corresponding macromutant alleles can
be tested.

As suggested by Helentjaris et al. (1992), the theory proposed by Robertson (1985)
provides a possible approach to identification and cloning of important quantitative
genes. For QTL cloning based on Robertson's proposal, both major and minor genes should be
examined based on their relative contribution to expression of traits and their
interaction with each other. It can be expected that genetic relationships can be
established between known major genes and QTL, or QTL can be paralleled with the extreme
mutants, and one can verify these relationships and facilitate QTL cloning through cloning
the related major genes.

11.3 Cloning Based on cDNA Sequencing

One way to identify genes is to clone and sequence RNAs. Short stretches of cDNA
sequences, derived from mRNA, are referred to as expressed sequence tags (ESTs). ESTs are
usually 200–500 nucleotides long and generated by sequencing either one or both ends of an
expressed gene. cDNA sequence-based approaches for gene cloning have been extensively
applied in humans, Caenorhabditis elegans and plants. The precise nature of the sequence
information obtained by EST analysis and the ever increasing number of gene sequences of
known function make it possible and productive to identify specific genes by sequence
similarity as discussed in the previous section. The idea is to sequence cDNA that
represents genes expressed in certain cells, tissues, or organs from different organisms
and use these tags to fish a gene out of a portion of chromosomal DNA by matching base
pairs. The challenge associated with identifying genes from genomic sequences varies among
organisms and is dependent upon genome size as well as the presence or absence of introns,
the intervening DNA sequences interrupting the protein coding sequence of a gene. A large
number of new genes have been identified in many species by randomly sequencing cDNA
clones to produce ESTs. In view of the efficiency of this approach as a mechanism for
establishing relationships between plant phenotypes and the large amount of sequence
information available for other organisms, it is desirable to obtain large numbers of
partial sequences.

11.3.1 Generation of ESTs

To generate EST sequences, the mRNA is isolated and reverse transcribed into cDNA. The
cDNA clones are sequenced from either the 5' or 3' ends of the cDNA or from both ends
(Chapter 3). The sequences are then clustered to identify a series of tentative unique
genes (TUGs) or tentative contigs (TCs) and to estimate the number of different RNAs
present in the initial sample. The TUGs/TCs can then be compared with the current
databases to identify which of these have already been described in the species under
consideration and which are still absent from the current databases. Where hits occur to
ESTs from other organisms, a possible function may be ascribable to the sequence (Cullis,
2004). The sequencing of any given sample is continued until the rate of finding new
sequences drops below an acceptable level. Although a huge redundancy of highly abundant
RNAs will be produced, low-abundance RNAs and those genes that are only expressed in
specialized cells are still likely to be missed. Therefore, techniques facilitating the
isolation of specific tissues or cells, such as laser capture microscopy and RNA
amplification, may help identify genes that are expressed at low levels or in very few
cells.
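
The clustering step described above can be illustrated with a minimal sketch. It is not
the procedure used by any particular EST pipeline: the function name cluster_ests, the
k-mer length and the two toy transcripts are assumptions made for illustration only.
ESTs are grouped by single linkage whenever they share an exact k-mer, which ignores
sequencing errors and reverse-complement reads that a real clustering tool would have to
handle.

```python
from collections import defaultdict


def cluster_ests(ests, k=12):
    """Group ESTs into tentative clusters: two reads are linked if they share a k-mer."""
    parent = list(range(len(ests)))

    def find(i):
        # Walk up to the cluster representative, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # k-mer -> index of the first EST in which it was observed
    for idx, seq in enumerate(ests):
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if kmer in first_seen:
                union(idx, first_seen[kmer])
            else:
                first_seen[kmer] = idx

    clusters = defaultdict(list)
    for idx in range(len(ests)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Hypothetical toy data: overlapping reads drawn from two invented transcripts.
transcript_a = "ATGGCTAGCTAGGATCCGATCGATTACGCTAGCTAGGCTA"
transcript_b = "TTGACCGGTAACCGTTAGGCCATATGCGCGTTAACCGGAT"
reads = [
    transcript_a[0:25],
    transcript_a[10:35],
    transcript_a[15:40],
    transcript_b[0:25],
    transcript_b[12:40],
]

clusters = cluster_ests(reads, k=12)
print(clusters)                    # [[0, 1, 2], [3, 4]]
print(len(reads) / len(clusters))  # average number of reads per tentative cluster
```

In this toy case five reads collapse into two tentative clusters, so the number of
clusters, rather than the number of reads, serves as the estimate of distinct RNAs in
the sample.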

Because a gene can be transcribed into mRNA many times, ESTs ultimately derived from this
mRNA may be redundant. That is, there may be many identical, or similar, copies of the
same EST. Such redundancy and overlap means that when someone searches dbEST for a
particular EST, they may retrieve a long list of tags, many of which may represent the
same gene. Searching through all of these identical ESTs can be very time-consuming. To
resolve the redundancy and overlap problem, National Center for Biotechnology Information
(NCBI) investigators developed the UniGene database. UniGene automatically partitions
GenBank sequences into a non-redundant set of gene-oriented clusters. The clustering and
assembly of individual ESTs into TUGs/TCs will result in decreased sequence redundancy and
a final consensus sequence that should be both more accurate and longer than any of the
underlying individual ESTs in the database. The clustering algorithms will identify all
the transcripts from a gene family and generate a consensus sequence from the EST data. An
alternative way of reducing redundant sequencing is to enrich the RNA populations for
low-abundance transcripts. Cullis (2004) described a number of normalization and
subtraction methodologies for enrichment of these low-abundance RNAs before cloning.
Abundant cDNA clones can be removed before sequencing by screening high-density cDNA
filters with labelled RNA. The clones that have strong hybridizations are eliminated and
the minimally-hybridizing clones are re-arrayed and sequenced. The ultimate goal of EST
projects is to develop a UniGene set that eventually contains all the genes for the
organism.

When a cDNA library is used in the process of gene cloning, the sequenced cDNAs are used
as molecular markers for genetic mapping, generating a saturated cDNA marker-based genetic
map to locate the gene and eventually determine which cDNA is related to the gene effects.
Additionally, a cDNA can be verified as the gene by a transformation/complementation test.
With the development of reverse genetic techniques, one can also study the genetic effect
of a mutated cDNA and its effect on the quantitative trait by site-directed mutation and
genetic engineering, which can help determine which cDNAs are related to a given trait
(Xu, 1997).

Although it is widely recognized that the generation of ESTs constitutes an efficient
strategy to identify genes, it is important to acknowledge that there are several
limitations associated with the EST approach. One is that it is very difficult to isolate
mRNA from some tissues and cell types. This results in a paucity of data on certain genes
that may only be found in these tissues or cell types. Second is that important gene
regulatory sequences may be found within an intron, as well as in untranscribed regions of
the gene (promoter). Because ESTs are small segments of cDNA, generated from an mRNA in
which the introns have been removed, much valuable information may be lost by focusing
only on cDNA sequencing. Despite these limitations, ESTs continue to be invaluable in
characterizing plant genomes.

11.3.2 Generation of full-length cDNAs

Construction of full-length cDNAs is a central focus in the post-sequence era of the
various genome projects. As an essential resource for the functional analysis of plant
genes, full-length cDNAs can be used in many fields such as genome annotation including
splice sites, expression profiling, protein structure determination using X-ray
crystallography and transgenic analysis (Cullis, 2004). By validating with a full-length
cDNA, predicted transcription units from genomic sequence data and the occurrence of
alternative splicing events can be confirmed. The full-length cDNA can be used in both
homologous and heterologous expression systems to generate large amounts of protein for
functional and structural studies to determine gene function. In addition, sequencing of
the full-length transcripts will allow the identification of RNAs from different members
of gene families.

Full-length cDNA library construction is more technically challenging compared
with EST generation. A full-length first-strand cDNA is not efficiently produced by
reverse transcription, especially if the mRNA has a stable secondary structure. Libraries
made from cDNAs, therefore, can contain both full-length and partial cDNAs. One method for
constructing cDNA libraries with a high content of full-length clones involves starting
from the first transcribed nucleotide. A number of critical issues pertaining to synthesis
and cloning of full-length cDNAs have been recently identified. Most important is the
purity and integrity of the starting material. mRNA is often contaminated with
heterogeneous nuclear RNA (hnRNA) owing to the difficulty of isolating exclusively
cytoplasmic RNA from plant tissues. True full-length cDNAs will yield sequence information
from both 5' and 3' non-coding regions as well. A full-length cDNA should encompass all
sequences from the CAP site to the poly(A) addition site. However, it is generally agreed
that a cDNA comprising the entire coding sequence of a protein should be considered worthy
of full-length sequencing at high accuracy.

Cullis (2004) described a process of construction of full-length cDNA. A biotin label for
the CAP structure has been developed based on the principle that the CAP site and 3' end
of mRNA are the only sites that carry the diol structure. The diol groups at each end of
the mRNA are biotinylated and then the first strand of cDNA is synthesized. This synthesis
is primed with a degenerate primer (XTTTTTTTT(restriction site)). The reaction mixture is
then digested with RNase I, which cleaves single-stranded RNA molecules at any site, to
destroy RNA molecules, or the parts of them, that are not paired with their cDNA.
Therefore, the 5' ends of all the mRNAs not protected by their partial cDNAs and exposed
as single-stranded are removed (along with the biotinylated CAP structure), as are all the
biotinylated 3' ends. The full-length cDNAs are captured on streptavidin-coated magnetic
beads, the cDNA is released from the beads and the mRNA is destroyed by treatment with
RNase H and alkaline hydrolysis. The cDNA is then tailed with oligo(dG) that is used to
prime the second-strand synthesis. Again, this primer also has an extension that includes
a restriction enzyme site. After the second-strand synthesis the full-length cDNA is
cloned with the restriction sites inserted with the first- and second-strand primers.

The full-length cDNA for a desired gene can be obtained using the 5' and 3' RACE (rapid
amplification of cDNA ends) technique. RACE results in the production of a DNA copy of the
RNA sequence of interest, produced through reverse transcription, followed by PCR
amplification of the DNA copy. The amplified DNA copy is then sequenced to obtain a
partial sequence of the original RNA. RACE can provide the sequence of an RNA transcript
from a small known sequence within the transcript to the 5' end (5' RACE-PCR) or 3' end
(3' RACE-PCR) of the RNA.

11.3.3 Full-length cDNA sequencing

The wide availability and usefulness of cDNA clones has spurred an interest in using a
high-throughput approach to obtaining the complete sequence of full-length clones. The
approach to obtaining the sequence of a full-length cDNA clone is different from that used
to generate EST data. Many of the full-length cDNAs are likely to be longer than the reads
resulting from sequencing both the 3' and the 5' ends of the insert. Therefore, additional
sequencing strategies are necessary to obtain the full-length cDNA sequence. There are
three possible strategies for full-length sequencing (Cullis, 2004):

• Transposon mutagenesis (Kimmel et al., 1997): a transposon is randomly inserted, in
  vitro, into the cDNA insert and primers designed from both sides of the transposon are
  used for sequencing. Typically, each cDNA clone is subjected to a transposon reaction to
  produce a population of subclones, each harbouring a transposon at a distinct location.
  The subclones are then sequenced using transposon-specific primers, most often from each
  end of
  the inserted transposon. Sequencing a number of independent transposon sites will be
  sufficient to assemble the complete cDNA sequence.
• Concatenated cDNA sequencing (CCS) (Yu, W. et al., 1997): multiple cDNA inserts are
  isolated, pooled and enzymatically concatenated. The entire population of concatenated
  cDNAs is then subjected to shotgun sequencing (as if it was a single large-insert
  genomic clone), with the individual cDNA sequences then derived by computer analysis
  following sequence assembly. This approach is well suited for existing high-throughput
  sequencing environments. However, it is also associated with a number of technical
  challenges, including the initial construction of the cDNA concatemers and shotgun
  libraries, the computational de-convolution of the cDNA sequences and problems
  associated with the uneven molar representation of individual cDNAs.
• Primer walking: primers are designed from 5' and 3' end sequences and used for a second
  round of sequencing. Additional primers are then made and used until the whole
  contiguous cDNA sequence is obtained. This method has been applied to the large-scale,
  full-insert sequencing of cDNA clones (Wiemann et al., 2001); however, it is also
  associated with several limitations. First, the extensive use of synthetic
  oligonucleotides adds considerably to the overall costs. Secondly, the sequencing of
  larger cDNA clones is associated with many iterative walking steps, in some cases
  requiring a protracted effort. Thirdly, there are logistical demands of ensuring correct
  primer–template associations, especially when applied on a large scale.

11.3.4 Directed EST screens to identify specific genes

When genes are to be isolated from a particular organism based on marginal similarities
to known genes or predicted similarities, the similarities are frequently too distant for
hybridization or PCR-based methods to be useful. In this case, a directed EST screen may
prove useful. EST data are obtained and searched for characteristic sequences to identify
a specific EST as a candidate gene. Maximal information about the characteristic sequence
motifs and their approximate locations in the target protein sequence is critical to the
success of this effort.

The success of directed EST screens for identification of specific genes depends on two
factors (Nunberg et al., 1996): (i) adequate sequence information from related species
allowing reliable prediction on the conservation of specific motifs and the ability to
recognize these and overall sequence similarities even with the relatively limited
sequence data available; and (ii) the mRNAs of the target genes are moderately abundant
and are enriched in the specific tissues. There are many successful examples of directed
EST screens in plants and the directed EST approach can be the method of choice for the
isolation of new genes based on sequence motif conservation or the isolation of poorly
conserved known genes from new species. In these situations, directed EST screens should
be intrinsically more rapid and reliable than hybridization- or PCR-based screens.

11.3.5 Full-length cDNAs for the discovery and annotation of genes

Sequencing the full-length cDNA solves the following problems (Seki et al., 2005). First,
it permits accurate identification of the 5' and 3' UTRs. Secondly, in comparison with
complete genomic sequence, it enables one to identify the precise locations of all
introns. Thirdly, it aids in the discovery of new genes.

In all gene predictions from genomic DNA the precise identification of the gene
boundaries and exon–intron structure is hindered by the lack of supporting experimental
evidence. Full-length cDNA sequences and bioinformatics software can produce insights on
the structure of genes in chromosomal DNA. Therefore, full-length cDNA sequences are
essential for confirmation of the predicted genes within a sequenced genome. Having a
full-length cDNA enables the checking of both the extent of the coding region of the gene
as well as the sequences immediately 5' upstream and 3' downstream of the coding sequence.
In addition, having a full-length cDNA makes it possible to train the gene-finding
programs so that the unknown regions of the genome can be more accurately annotated as far
as the presence of genes is concerned (Cullis, 2004). The availability of many full-length
cDNAs and trained gene-finding programs from a small number of model plants will also ease
the identification of genes in partial genomic sequences of more exotic plant species.

11.4 Positional Cloning

High-density linkage maps of molecular markers provide an alternative gene cloning
approach, positional cloning or map-based cloning. Positional cloning usually consists of
identifying the markers that flank and show tight genetic linkage to the target gene,
walking to the gene by using various genomic libraries constructed in, for example, yeast
artificial chromosome (YAC) vectors and confirming the gene effects by the comparison of
the isolated gene with a wild-type allele or complementation of the recessive phenotype by
transformation (Meyer et al., 1996; Paterson, 1996b). Theoretically, positional cloning
methods permit the isolation of any gene which can be precisely mapped.

11.4.1 Theoretical considerations of positional cloning

For a plant species with no complete sequence available, one can obtain the genic DNA
corresponding to the target locus through walking from one marker to the other by
identifying two DNA markers which flank a target locus and using these markers to identify
a series of contiguous DNA clones (contigs), which are fundamental to a tremendous amount
of research. On a local scale, chromosome walks in particular regions provide the means to
isolate genes which have been assigned to the region by genetic mapping. On a global
scale, assembly of contigs for entire chromosomes can effectively provide a resource which
will reduce the future need for small-scale investigations of particular locations.

Fig. 11.3. Chromosome walking. (A) Chromosome walking begins with a clone containing mkrB.
The ends of the clone (boxed) are used to probe a genomic library until a clone contains
either mkrA or mkrC sequences. Clones between these two markers (mkrB and mkrC) are then
evaluated for the presence of the target gene yfg. (B) Chromosome walking is interrupted
because a segment of the region to be walked through is non-clonable. (C) Chromosome
walking is detoured because one of the clone end probes (filled box) is a repeated
sequence.

Chromosome walking is a technique to clone a gene (e.g. a disease gene) from its known
closest markers. The closest linked marker (e.g. EST or a known gene) to the gene is used
to probe a genomic library. A restriction fragment isolated from the end of the positive
clones is used to re-probe the genomic library for overlapping clones. This process is
repeated several times to walk across the chromosome and reach the gene of interest. In
the diagram (Fig. 11.3A), the chromosome walk begins with a clone containing mkrB. The
ends of the
clone (boxed) are used to probe a genomic library. Clones from adjacent genome segments
are thus identified and isolated. The distal ends of those clones are used to re-probe the
library. These steps are continued until a clone contains either mkrA or mkrC sequences.
Clones between two markers, mkrB and mkrC, must then be evaluated for the presence of the
target gene yfg.

If the target species is a species whose genome has been completely molecularly mapped,
an ordered set of YACs, P1-derived artificial chromosomes (PACs), BACs or cosmid clones
will be available. Knowing which molecular markers are adjacent to the target gene
automatically identifies the YACs and/or cosmids that need to be tested.

One difficulty of chromosome walking is recognizing where the gene is located between the
two markers. Zoo (or garden) blots, where DNAs of a variety of species have been
restricted, electrophoresed and Southern blotted, can be useful. Gene sequences are more
likely to be conserved during evolution than intergenic sequences. The identification of
GC islands or the use of exon trapping can also be useful. There are other problems with
walking. Chromosome walks can be interrupted if a segment of the region to be walked
through is non-clonable (for example if it is toxic to the host cell) (Fig. 11.3B).
Chromosome walks can be detoured in many directions if one of the clone end probes (filled
box in Fig. 11.3C) is a repeated sequence.

By using a complete genetic map to estimate the total genetic length of a genome, one can
calculate the average quantity of DNA corresponding to a genetic distance of 1%
recombination (i.e. 1 cM). The physical quantity of DNA corresponding to 1 cM varies
widely among higher plants, from about 280 kb in Arabidopsis to more than 7000 kb in
barley. Curiously, despite gross differences in the average amount of DNA per cM in
different taxa, the genetic (recombinational) distance between orthologous loci tends to
be remarkably similar. This tends to suggest that the largely repetitive DNA elements
which account for the differences in physical size of plant genomes may be relatively
inactive in recombination. On the other hand, the correspondence between genetic and
physical distance varies widely at different locations within a genome. In tomato, an
organism with an average of about 750 kb cM⁻¹, individual regions have been estimated to
show from as little as 50 kb cM⁻¹ to as much as over 4000 kb cM⁻¹ (Pillen et al., 1996).
It is known that centromeric regions tend to be subject to recombination suppression and
many genetic maps show clustering of DNA markers near the centromere. Factors other than
repetitive DNA, such as introgressed chromatin or recombinational hotspots, can also
markedly influence the relationship between genetic and physical distance (Paterson,
1996a).

Positional cloning suffers from unpredictability in terms of the number of post-meiotic
progeny that a research study can expect to genotype to narrow a candidate chromosomal
region to a small number of candidate genes. For example, in rice, only 1600 gametes were
genotyped to narrow the Pi36(t) allele to a resolution of 17 kb (Liu, X.Q. et al., 2005),
whereas 18,944 gametes were genotyped to map the Bph15 allele to a lower resolution of 47
kb (Yang et al., 2004). Dinka et al. (2007) described a detailed methodology to improve
this prediction using rice as a model system. They derived and/or validated and then
fine-tuned equations that estimate the mapping population size by comparing these
theoretical estimates to 41 successful positional cloning attempts. Then they used each
validated equation to test whether neighbourhood meiotic recombination frequencies
extracted from a reference RFLP map can help researchers predict the mapping population
size.

A primary consideration in contig assembly is the size of each step, i.e. the amount of
DNA which can be held by the cloning vector used to construct the library. This
consideration is a two-edged sword: larger steps afford faster progress in assembling the
contig, but yield lower resolution because target genes must be identified from a larger
DNA segment. Chromosome walking has used different cloning vectors that can carry from
10–20 kb of exogenous
DNA by bacteriophage lambda (λ) up to 400–700 kb by YAC vectors.

While chromosome walking is straightforward in organisms with small genomes, it is more
difficult to apply in most plant species with large and complex genomes. The strategy of
chromosome walking is based on the assumption that it is difficult and time-consuming to
find DNA markers that are physically close to a gene of interest. Technological
developments have invalidated this assumption for many species. As a result, the mapping
paradigm has changed such that it is often possible to isolate one or more DNA marker(s)
at a physical distance from the targeted gene that is less than the average insert size of
the genomic library being used for clone isolation. The DNA marker is then used to screen
the library and isolate (or land on) the clone containing the gene, without any need for
chromosome walking and its associated problems (Tanksley et al., 1995). Through this
chromosome landing approach, Martin et al. (1993) isolated the tomato gene Pto, conferring
resistance to the bacterial pathogen Pseudomonas syringae pv. tomato. This exemplifies the
advantages of chromosome landing, in that initial emphasis on the isolation of many
closely linked DNA markers eliminated the need for chromosome walking and the development
of a high-resolution linkage map expedited the identification of candidate cDNAs. This
approach has become the main strategy by which positional cloning is applied to isolate
both major genes and QTL in plant species.

Contig assembly, or chromosome walking/landing, facilitates positional cloning, the
isolation of genes based on genetic map information. Positional cloning has proven an
effective means of isolation of genes in higher plants, but can be complicated by
physically large genomes, prominent repetitive DNA fractions and polyploidy. Positional
cloning has several basic requirements (Paterson, 1996b): (i) delineation of a target gene
to a small chromosomal interval, preferably flanked by two DNA markers and spanned by a
single megabase DNA clone, or by a contig of several megabase DNA clones; (ii) a means for
identifying transcripts in the megabase DNA clone; and (iii) an efficient transformation
system for introducing exogenous DNA into the plant species of interest, permitting
identification of the target gene by mutant complementation. Currently, an essential
requirement for positional cloning is the availability of comprehensive genomic libraries
of relatively large DNA fragments, typically in YAC vectors. The chromosome landing
paradigm can readily be applied to cloning QTL that can be mapped with a high degree of
resolution. Progress has already been made in high-resolution mapping of QTL in plants and
chromosome landing has been used to clone genes for quantitative traits, as exemplified in
the following section. As anticipated more than a decade ago by Xu (1997), QTL that have
been cloned so far are those with large effect that can be easily verified by
transformation.

Genome sequence information has reshaped the procedures of positional cloning as the
chromosome-aligned genome sequence information allows several of the steps in positional
cloning to be skipped (Jander et al., 2002). With a larger number of sequence-based
molecular markers available, a certain level of genetic mapping may be able to quickly
associate the target trait to a specific genomic region and further fine mapping effort
may narrow the target genomic region to several candidate genes based on sequence
information. This would be followed by cloning, complementation by transformation and high
quality de novo determination of the sequence of the entire region of interest without a
previously determined wild-type DNA sequence as a guide. Fig. 11.4 provides a comparison
of map-based cloning in Arabidopsis between 1995 and 2002 (with complete genomic sequence
available). The total effort required for map-based cloning reduced from 3–5 person-years
to less than 1 person-year.

1995 | Key steps in map-based cloning process | 2002
3–5 person-years | Total effort | < 1 person-year
Mixture of DNA and visible markers | First-pass mapping (20 cM resolution) | Standard molecular marker set is available
Build physical map (YACs or cosmids) | Generate physical map | Physical map exists
Develop markers from YACs or cosmids | Fine-scale mapping (40 kbp resolution) | Choose markers from Cereon database
No DNA sequence for candidate genes available | Consider candidate genes | Identify candidate genes from Col-0 sequence
Clone and complement, then de novo sequencing | Final identification of mutation | Design PCR primers from Col-0, then sequence

Fig. 11.4. Comparison of effort involved in map-based cloning in Arabidopsis. The key
steps that have become easier between 1995 and 2002 are presented. From Jander et al.
(2002), reproduced with permission of the American Society of Plant Biologists.

Methods have been proposed for cloning of multiple QTL and QTL with small effects.
Peleman et al. (2005) proposed a method to fine map multiple QTL in a single population.
As a first step, a rough mapping analysis is performed on a small part of the population.
Once the QTL have been mapped to a
chromosomal interval by standard procedures, a large population of 1000 plants or more is
analysed with markers flanking the defined QTL to select QTL isogenic recombinants (QIRs).
QIRs bear a recombination event in the QTL interval of interest, while other QTL have the
same homozygous genotype. Only these QIRs are subsequently phenotyped to fine map the QTL.
By focusing at an early stage on the informative individuals in the population only, the
efforts in population genotyping and phenotyping are significantly reduced as compared to
prior methods. Linkage disequilibrium methods for fine mapping may also offer improved
accuracy of QTL detection (Bink and Meuwissen, 2004; Grapes et al., 2004).

For QTL with small effects, fine-scale mapping and positional cloning will be very
difficult in the absence of a whole genome sequence. However, in these cases, reverse
genetics may offer a solution, through functional genomics analysis of candidate genes
that underlie QTL. For example, Liu et al. (2004) identified five candidate defence
response (DR) genes that co-located with QTL for resistance to blast disease and were
associated with the level of blast resistance.

11.4.2 Examples of positional cloning

Positional/candidate gene isolation has been very successful. Some early and significant
examples include: (i) identification of genes underlying qualitative phenotypes using
mutant analysis and a sequenced genome (Jander et al., 2002) and with no mutant in a
sequenced or unsequenced genome (Büschges et al., 1997); (ii) identification of genes
underlying quantitative phenotypes in an unsequenced genome (Frary et al., 2000) and using
positional analysis and structure/function interpretation in a sequenced genome (Yano et
al., 2000); and (iii) comparing the function of the gene with its orthologous counterparts
in other species and exploring how the gene interacts with other genes in a pathway (Izawa
et al., 2003).

With advances made in rice genomics, several QTL associated with the same

traits or trait components have been cloned. These include four QTL for heading date, Hd1, Hd3a, Hd6 and Ehd1 (Yano et al., 2000; Takahashi et al., 2001; Kojima et al., 2002; Doi et al., 2004), and QTL for grain number (Gn1a) and grain size (GS3) (Ashikari et al., 2005; Fan et al., 2006). More recently, the first QTL with significant pleiotropic effects has been isolated (Xue et al., 2008). The QTL Ghd7, isolated from an elite hybrid rice and encoding a CONSTANS, CONSTANS-LIKE, TOC1 (CCT) domain protein, has major effects on an array of traits in rice, including number of grains per panicle, plant height and heading date. Enhanced expression of Ghd7 under long-day conditions delays heading and increases plant height and panicle size. Sakamoto and Matsuoka (2008) summarized the genes identified for rice grain yield and its component traits, including grain number, grain weight, grain filling, plant height and tillering. In this section, cloning of fw2.2 (Frary et al., 2000) will be discussed in detail as an example for identifying a gene(s) underlying a quantitative phenotype in an unsequenced genome.

Preliminary genetic mapping

1. Several QTL (11) associated with tomato fruit weight were identified in primary interspecific mapping populations: Lycopersicon pennellii (small fruit) × Lycopersicon esculentum (large fruit).
2. All wild Lycopersicon spp. contain small-fruit alleles at the locus fw2.2; modern cultivars have large-fruit alleles, which suggested that this locus is a domestication locus and partially recessive mutations lead to large fruit. The alleles from modern cultivars at fw2.2 increase fruit weight by 5–30% in segregating populations but 47% in near-isogenic line (NIL) populations.
3. NILs were developed with a total of 41.9 cM of L. pennellii DNA (containing fw2.2) in an L. esculentum background (Fig. 11.5A).
4. Fine mapping narrowed the region to two YAC clones (150 kb region) containing fw2.2 (Alpert and Tanksley, 1996) (Fig. 11.6; Fig. 11.5A).

Fine mapping and candidate gene identification

1. YAC clones containing fw2.2 were used as templates to screen the cDNA library that contains the dominant allele (Fw2.2), which allows a positionally targeted search for candidate genes.
2. 100 positive cDNA clones and four unique transcripts were identified.
3. 3472 F2 individuals (derived from NIL × recurrent parent (RP) cross) were screened with four markers to establish the marker order of the cDNAs along the YACs (Fig. 11.5B). An alternative would be to sequence the YACs, which, however, is more expensive.
4. cDNAs were used to identify four cosmid clones (Fig. 11.5B) from an L. pennellii cosmid library consisting of 15–50 kb genomic clones, which are large enough to contain more than one gene per clone including enhancer/promoter, 5' and 3' UTRs, introns and exons.

Complementation tests

1. Identified cosmid candidate clones were used in transformation experiments with two cultivated genetic backgrounds (Mogeor, TG496). In the hemizygous R0 generation, the fruit weight of transformants was not significantly different from the controls due to partial dominance of the L. pennellii allele. Thus, R0 plants were selfed and homozygous R1 individuals with and without transgenes were compared for phenotype.
2. Significant differences in fruit weight were observed only for COS50 transformants and the differences were significant in both Mogeor and TG496 genetic backgrounds.
3. By sequencing the COS50 clone, two ORFs were identified: cDNA44 (used as probe) and ORFX (Fig. 11.5C).
4. Recombination with COS50 (XO33) delimited fw2.2 to a region containing ORFX.

Exploration of ORFX identity

1. A significantly higher level of ORFX transcript in small-fruited NILs (Fw2.2) was found, compared to large-fruited (cultivated) RP (fw2.2). No ORFX transcript was

[Fig. 11.5 image. Panel A: marker map of chromosome 2 (markers TG154 to CT9, distances in cM) with a Log P profile (scale 0 to 14). Panel B: contig spanning TG91, cDNA70, cDNA27, TG687, cDNA38, cDNA44, HSF24 and TG167, with cosmids COS62, COS84, COS69 and COS50 (scale bar ~5 kb) and crossovers XO31 and XO33. Panel C: map of COS50 (0 to 14 kb) showing IGS, ORFX, cDNA44 and XO33.]

Fig. 11.5. High-resolution mapping of the fw2.2 QTL. (A) The location of fw2.2 on tomato chromosome
2 in a cross between Lycopersicon esculentum and a NIL containing a small introgression (grey area)
from Lycopersicon pennellii (Alpert and Tanksley, 1996). (B) Contig of the fw2.2 candidate region,
delimited by recombination events at XO31 and XO33 (from Alpert and Tanksley, 1996). Arrows
represent the four original candidate cDNAs (70, 27, 38 and 44), and heavy horizontal bars are the four
cosmids (COS62, 84, 69 and 50) isolated with these cDNAs as probes. The vertical lines are positions of
RFLP or cleaved amplified polymorphism (CAPs) markers. (C) Sequence analysis of COS50, including
the positions of cDNA44, ORFX, the A-T-rich repeat region, and the rightmost recombination event,
XO33. From Frary et al. (2000). Reprinted with permission from AAAS.

detected in the L. pennellii cDNA library. The transcript was only detected at low levels with RT-PCR in mRNA extracted from pre-anthesis carpels.
2. Carpels were heavier in RP but cell size was the same in NILs and RP, which suggested that ORFX controls carpel cell number before anthesis.
3. ORFX was found to encode a 163-aa polypeptide of 22 kDa.
4. BLASTP showed matches only with plant genes in GenBank (dicots, monocots, gymnosperm), and none of the putative homologues have known function.
5. The three-dimensional shape of the predicted protein was similar to heterotrimeric guanosine triphosphate-binding proteins in rat, which are associated with control of cell division.

Characterization of ORFX

1. ORFX represents a previously uncharacterized plant-specific multi-gene family and it has at least four paralogues in tomato and eight homologues in Arabidopsis (organized in two- or three-gene clusters).
2. Sequence comparison of ORFX alleles from L. pennellii and L. esculentum (830-nt fragment, which included 95 nt from 5' UTR, 55 nt from 3' UTR) showed 42 nt differences.
3. ORFs are highly conserved with only 35 nt differences in introns; four silent changes; three causing amino acid changes. It was concluded that functional differences in alleles may be due to a combination of sequence changes in coding and upstream regions of ORFX.
4. Reduction of cell division in the small-fruited NIL correlates with higher levels of

[Fig. 11.6 image: graphical genotypes of six homozygous recombinants at markers CD66, TG91, TG686, TG687, HSF24, TG167 and TG361 (scale 0.1 to 0.7 cM), with the location of fw2.2 marked. Boxes indicate homozygous Lycopersicon pennellii DNA, homozygous Lycopersicon esculentum DNA, and the interval in the chromosome where the crossover took place.]

Average ten fruit weight (g)

Plant ID      CA       NY
3             NA       62.1b
11            69.3b    NA
12            63.3b    NA
31            NA       63.0b
33            NA       50.5a
34            41.1a    44.1a
Controls
M82-1-8       72.4     71.9
NIL 939-2     49.9     49.6

a = Significantly different (P < 0.01) from M82-1-8 large-fruited control
b = Significantly different (P < 0.01) from NIL 939-2 small-fruited control

Fig. 11.6. Graphical genotypes of homozygous recombinants in the fw2.2 region of chromosome 2. Five
replications of each recombinant plant were grown in California (CA) and New York (NY). The average gram
(g) weight of ten fruit from each recombinant was compared with the large-fruited, M82-1-8, and the small-
fruited, NIL 939-2, controls. Recombinants #3, #11, #12, and #31 were significantly larger (b; P < 0.01) for
average ten fruit weight in comparison to the small-fruited control, NIL 939-2, while recombinants #33 and
#34 were significantly smaller (a; P < 0.01) for average ten fruit weight in comparison to the large-fruited
control, M82-1-8. Recombinants #31 and #33 delineate the fw2.2 region (bracketed by arrows), based on
the smallest region demonstrating statistical significance. Plants for which few or no fruit were harvested
due to pest infection were not available (NA) for fruit weight analysis. The black and white boxes indicate
the homozygous condition for Lycopersicon pennellii (NIL 939-2) and Lycopersicon esculentum (M82-
1-8) at the molecular markers, respectively. The grey boxes indicate the approximate position between
two molecular markers where the genetic recombination event took place. The genetic distance between
molecular markers (separated by dashed lines) is indicated by the scale shown in centiMorgans (cM). From
Alpert and Tanksley (1996) National Academy of Sciences, USA 1996.
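
The delimitation logic of Fig. 11.6 can be sketched in code: a marker remains a candidate location for the QTL only if, in every homozygous recombinant, the parental origin at that marker agrees with the parent that the recombinant's phenotype resembles. The Python sketch below is illustrative only. The marker names M1 to M5 and the genotype strings are hypothetical (the graphical genotypes themselves are not reproduced in the text), and the phenotype classes are assumed to have been assigned already from the comparisons with the two controls.

```python
# 'P' = homozygous L. pennellii, 'E' = homozygous L. esculentum.
MARKERS = ["M1", "M2", "M3", "M4", "M5"]  # hypothetical markers, in map order

# Hypothetical recombinants: genotype at each marker and phenotype class.
# "small" resembles the small-fruited NIL, "large" the large-fruited control.
RECOMBINANTS = {
    "r1": ("PPEEE", "large"),
    "r2": ("PPPPE", "small"),
    "r3": ("EEEPP", "large"),
    "r4": ("EEPPP", "small"),
}

# Allele expected at the QTL for each phenotype class.
EXPECTED_ALLELE = {"small": "P", "large": "E"}


def consistent_markers(markers, recombinants):
    """Return the markers whose genotype agrees with the phenotype in all recombinants."""
    keep = []
    for i, marker in enumerate(markers):
        if all(geno[i] == EXPECTED_ALLELE[pheno]
               for geno, pheno in recombinants.values()):
            keep.append(marker)
    return keep


if __name__ == "__main__":
    print(consistent_markers(MARKERS, RECOMBINANTS))
```

With these made-up genotypes only M3 survives, so the QTL would be placed between the crossovers flanking M3, in the same way that recombinants #31 and #33 bracket fw2.2 in the figure.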

ORFX transcript, suggesting a role as a negative regulator of cell division.

11.5 Identification of Genes by Mutagenesis

Phenotypic variation that has been used in genetic analysis and plant breeding comes from either natural variation or induced mutations. Natural phenotypic variation is observed in germplasm collections and exists as a random collection of diverse mutations throughout the genome, although natural selection has led to maintenance of these mutations. Mutagenesis has been used as a tool in plant breeding for many years with numerous cultivars released. As described in Chapter 1, various chemical and physical mutagens have been used to create a wide variety of unique plant mutants to increase the amount of variation. Mutagenesis approaches have attracted the attention of plant molecular biologists as they provide a means for identifying desired genes (Xu et al., 2005). Whole genome mutagenesis brings an opportunity of mutating every gene contained in a plant species.

In functional genomics, mutant populations or libraries that cover all possible genes become an increasingly important tool. Mutant libraries can be constructed

using chemical and physical mutagenesis, T-DNA insertion and transposon tagging (e.g. Jeon et al., 2000; Leung et al., 2001; Xue and Xu, 2002; An et al., 2003; Hirochika, 2003; Hirochika et al., 2004). These libraries can be used for functional analysis based on loss-of-function analyses. Gain-of-function approaches such as T-DNA activation tagging and gene overexpression are powerful complements to insertional mutagenesis for the successful identification of mutant phenotypes. Libraries of enhancer trap lines have also been developed which facilitate the detection and isolation of regulatory elements (Wu, C. et al., 2003).

An alternative to nucleic acid sequence characterization is the direct demonstration that the sequence has a function. This can be done by the re-introduction of the sequence into the appropriate plant or by knock-out of the gene through mutagenesis (Cullis, 2004). The technology used to mutagenize genes and identify those mutants includes both insertional mutagenesis (Azpiroz-Leehan and Feldmann, 1997) and the TILLING methodology (McCallum et al., 2000), which identifies single base changes in a gene of interest. A third method for disrupting gene function is by gene silencing through RNA interference (RNAi) (Cogoni and Macino, 2000).

Perhaps the most popular mutagens in use by plant molecular biologists for gene cloning are the insertion mutagens, T-DNAs and transposable elements (transposons). The advantage of these mutagens is that they both disrupt the gene and also serve as vehicles for recovery of the plant genic DNA. This process, known as gene tagging, provides a means to examine the biochemical and developmental consequences of mutations in the gene of interest, as well as a relatively facile means for isolating the affected genes.

The nature of the damage that is caused by mutagenesis determines the functional class of genetic alterations that are produced. Deletions, insertions and rearrangements are more likely to result in loss-of-function alleles, whereas point mutations can lead to a broader range of effects, including hypomorphic, hypermorphic and neomorphic effects (that is, alleles of reduced, enhanced or novel gene function, in corresponding order) (Alonso and Ecker, 2006).

11.5.1 Generation of mutant populations

The most reliable method to ascertain gene function is to disrupt the gene and to determine the phenotype change in the resulting mutant individual. A large collection of mutants for an organism will be extremely valuable to the scientific community and will accelerate the speed of gene function analysis. Table 11.1 lists mutagenic agents and related mutations. Two popular methods for mutagenesis are insertion and deletion.

Mutagenized populations that are derived from a single homozygous genotype, i.e. an inbred line, share a common genetic background and individuals differ only at one or a few mutagenized loci in the genome (isolines). Transgenic mutagenized lines carry molecular tags that facilitate rapid identification of candidate loci. Chemically mutagenized lines carry only small insertion/deletion or point mutations (Table 11.1) which are difficult to find unless the candidate

Table 11.1. Mutagenic agents and related mutations.

Mutagenic agents                Type of mutation

Ethylmethane sulfonate (EMS)    CG >> AT transitions (point mutations)
Di-epoxy butane (DEB)           Point mutations, small deletions (6–8 bp)
Fast neutrons (FN)              Deletions (up to 1 kb)
X-rays                          Chromosome breaks, rearrangements
T-DNA                           Insertion
Transposable elements (TE)      Insertion/deletion
RNAi constructs                 Insertion where transcribed product causes gene silencing
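
As a rough check on the population sizes needed when screening for the small lesions listed in Table 11.1, the chance of recovering at least one mutant allele of a given gene among n mutagenized families can be approximated as 1 − (1 − μ)^n, where μ is the per-gene mutation rate. The Python sketch below is a simplification and not a published method: it assumes that every family carries a new allele of the gene independently and with the same probability, and the function names are hypothetical. It uses μ = 10^-3 and n = 3000, the figures reported for maize by Candela and Hake (2008).

```python
import math


def prob_at_least_one_allele(mutation_rate, n_families):
    """Probability that at least one of n families carries a mutant allele of the gene."""
    return 1.0 - (1.0 - mutation_rate) ** n_families


def families_needed(mutation_rate, target_p):
    """Smallest number of families giving at least target_p (same independence assumption)."""
    return math.ceil(math.log(1.0 - target_p) / math.log1p(-mutation_rate))


if __name__ == "__main__":
    p = prob_at_least_one_allele(1e-3, 3000)
    print(f"P(at least one allele in 3000 families) = {p:.3f}")
    print("Families needed for 99%:", families_needed(1e-3, 0.99))
```

Under these assumptions 3000 families give a recovery probability close to 0.95, which is consistent with the statement that an allele of any given gene might be found in a screen of that size.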

region has been defined. Mutation rates as high as $10^{-3}$ alleles per gene have been reported in maize, suggesting that alleles of any given gene might be found by screening as few as 3000 M2 families, or 3000 M1 plants in the case of a non-complementation screen (Candela and Hake, 2008).

Knock-out mutagenesis, including most that is chemically induced, has the following limitations: (i) redundancy: a high level of gene duplication in plants provides genetic buffering (backup/second copy) such that knocking out one member of a gene family may not affect phenotype; and (ii) lethality: some genes confer essential functions; disruption will lead to lethality so knock-outs at those loci will never be retrieved in generated plants or in offspring. Conditional lethals may be retrieved, if necessary conditions, such as temperature sensitivity, are met.

Resource populations in model systems provide genome-wide resources for all biologists so they do not have to develop them for each experiment. These populations allow results of independent experiments to build on each other, because data on the same set of mutants can be maintained in a common database, facilitating worldwide collaboration.

11.5.2 Insertional mutagenesis

Insertional mutagenesis occurs naturally in a number of plant species through the excision and reintegration of endogenous transposable elements. The insertion of a known DNA segment into a gene of interest has been an extremely valuable genomic tool for a number of systems in mammals and plants. Insertional events can be classified as T-DNA tagging, transposon tagging, retrotransposon tagging, or entrapment tagging, depending on the type of element used. Insertion events can also be labelled according to their sites and types of insertions (Jeon et al., 2004). The knock-out is a null mutation with an insertion in the coding or regulatory region of a gene. Knock-down mutations cause reduced expression due to an insertion in the promoter or 3' UTRs. The knock-on (or activation tagging) mutation has an insertion element carrying a constitutive promoter, such as the cauliflower mosaic virus (CaMV) 35S promoter, that is capable of driving the expression of genes adjacent to the insertions. The knock-about mutation is an insertion that does not inhibit normal functioning of the gene. The knock-knock mutation has more than one insertion, causing multiple knock-outs. Finally, the knock-worst mutation includes insert events that lead to large-scale chromosomal rearrangement.

Gene knock-outs imply that the activity of a gene has been eliminated. In plants the two major methods for generating these are by inserting either a T-DNA or a transposon sequence (Azpiroz-Leehan and Feldmann, 1997). The unique advantage of using foreign DNA as a mutagen is that the inserted fragment not only disrupts the gene function but also tags the affected gene with known sequences, which greatly facilitates gene isolation. The DNA has a defined sequence and acts as a marker for the location of the mutation. Thus, by using oligonucleotide screening or specialized PCR, the mutagenized gene can be identified and sequenced easily. This method was first illustrated by the cloning of the white eye locus of Drosophila (Bingham et al., 1981).

As a principle of insertion tagging, an endogenous or an engineered DNA fragment (with known sequence) is allowed to insert at random into the genome. When it lands in a gene, it generally causes a recessive, loss-of-function mutation. For insertion mutagenesis to be useful for isolating all genes from a plant genome, it will be necessary to saturate the genome with insertions so that every single gene has been mutated. The probability that an insertion will be found within a given gene can be estimated based on the size of the gene, the size of the genome and number of inserts distributed among the population (Krysan et al., 1999, 2002). Assuming random chromosomal insertion, tagging efficiency can be calculated according to the formula

$$P = 1 - [1 - (L/C)]^{nf}$$

where P is the probability of finding an insertion within a given gene, L is an average

length for the gene, C is the haploid genome size, n is the number of independent insertional lines and f is the average number of loci inserted per line.

Consider an example in which: (i) the rice haploid genome size is $3.8 \times 10^{8}$ bp; (ii) the average rice gene is 3.0 kb long; and (iii) the mean number of insertion loci per line is 1.4. A total of 417,000 tagging lines would be required for establishing a population in which a T-DNA insertion could be found within a given gene at 99% probability. The number of tagging lines required for saturation mutagenesis of a genome is highly dependent on the length of the target genes. A group of 1 kb genes in rice requires 1,250,000 lines to achieve 99% probability of being mutated, whereas 5 kb genes need 250,000 lines in the T-DNA tagging population. Jung et al. (2008) summarized the rice insertional mutants generated by different mutagens including T-DNA, Ac/Ds, Spm/dSpm, T-DNA with enhancer, full-length cDNA over-expresser (FOX) system and Tos17.

Insertion tagging has the following advantages: (i) insertion tagging generally inactivates a gene, which simplifies phenotypic evaluation (disrupts an ORF, interrupts promoter, interferes with intron-splicing); (ii) it marks the gene for isolation via inverse-PCR, TAIL-PCR (thermal asymmetric interlaced), transposon display, AIMS (amplification of insertion-mutagenized sites), cDNA-AFLP, etc.; and (iii) it can be used for both forward and reverse genetics. In forward genetics, it can be used to screen for an interesting phenotype, using the tag to isolate the gene. In reverse genetics, it can be used to identify an insertion in a gene sequence of interest and to figure out the phenotypic consequences of the insertion; three-dimensional pools of insertion line DNA can be used for efficient screening.

Insertional mutagenesis as a tool for gene cloning has its limitations: (i) redundancy and lethality (the same as chemical mutagenesis); (ii) some species lack endogenous TEs or cannot mobilize them efficiently; (iii) engineered systems require the ability to make and propagate large numbers of transformants; (iv) the predominance of loss-of-function alleles; (v) the biased distribution of insertion in the genome; (vi) the inability to characterize lethal mutations; and (vii) the difficulty of generating populations that are large enough to reach complete saturation of the genome.

T-DNA tagging

The transfer DNA (T-DNA) is a defined segment of the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens and is delimited by short (25 bp) imperfect-repeat border sequences called left and right T-DNA borders. The insertion of a T-DNA element into a chromosome can lead to many different outcomes: insertion into the coding region can lead to partial or complete inactivation of the gene, while insertion into the promoter region can lead to complete inactivation of the gene, reduced expression of the gene, or increased expression of the gene.

Several methods have been developed for introducing T-DNA into Arabidopsis. These include various tissue culture and whole plant techniques. However, most tissue culture-based transformation protocols developed for Arabidopsis were not directed toward insertion mutagenesis. The vast majority of T-DNA tagged genes have been isolated from populations of transformants generated with whole plant transformation protocols (Jenks and Feldmann, 1996). A computer database has been established for Arabidopsis that contains the precise genomic locations of over 50,000 T-DNA insertions. Any gene of interest can quickly be found, if the collection contains a mutation in that gene, by performing a simple BLAST search. The database of these insertions can be found at http://signal.salk.edu/cgi-bin/tdnaexpress and the Arabidopsis Knock-out Facility at the University of Wisconsin. A number of other crop plants have similar resources, especially rice, which is already well served with T-DNA insertion lines (Parinov

and Sundaresan, 2000; Ramachandran and Sundaresan, 2001). In rice, a mutant population containing 55,000 lines was generated and about 81% of the population carried one to two T-DNA copies per line. T-DNA was preferentially (80%) integrated into genic regions (Hsing et al., 2007).

T-DNA insertion is the most generally applicable method because it can be used for any plant that can be transformed and regenerated. Because each transformant is an independent event with the T-DNA being relatively randomly inserted into the genome, a large number of independent transformation events are needed to inactivate every gene. The need for the generation of large numbers of independent transformants therefore limits this technology to plant species or particular lines that are capable of being transformed in a high-throughput manner (Cullis, 2004).

T-DNA tagging has the advantages of effective interruption of genes, low copy number (1.5) and preferential insertion into genic regions. Low copy number per line makes it easier to clone the tagged gene but requires a larger number of transformants to hit every gene with a T-DNA insertion. Disadvantages include the need for time-consuming and high efficiency transformation methods, somatic variation in tissue culture and a high percentage of untagged mutants. The T-DNA does not always segregate with the mutation (hit and run); approximately 35–40% of mutants are actually tagged with the T-DNA. Dipping of whole plants in Agrobacterium avoids the tissue culture step and the somaclonal variation that results.

Transposon tagging

Transposable elements (TEs) or transposons are sections of DNA (sequence elements) that move, or transpose, from one site in the genome to another and have been used as a tool for gene isolation. The insertion and excision of transposable elements result in changes to the DNA at the transposition site. The transposition can be identified when a known DNA sequence or selection markers are inserted within the elements.

Three major families of endogenous transposable elements have been used for gene tagging in maize (Candela and Hake, 2008). All transposable elements fall into one of the following two classes: DNA elements and retroelements. DNA elements, such as Ac/Ds and Spm in plants, P elements in animals and Tn in bacteria, transpose via DNA intermediates. A common feature of DNA elements is the flanking of the element by short inverted repeat sequences. The enzyme transposase recognizes these sequences, creates a stem/loop structure and then excises the loop from the region of the genome. The excised loop can then be inserted into another region of the genome. Retroelements transpose via RNA intermediates. The RNA is copied by reverse transcriptase into DNA and the DNA integrates into the genome. Two types of endogenous transposable elements have been identified in plants: autonomous and non-autonomous. TEs that are autonomous code for a transposase that cleaves the element from its insertion site in the chromosome. TEs that are non-autonomous lack transposase function but are competent to transpose if transposase is supplied.

TEs can be either low copy (e.g. Ac/Ds in maize, Tos17 in rice), typically one to three copies per genome, or high copy (e.g. Mu in maize), up to 100s of copies per genome, which means very small populations are needed to ensure saturation tagging. It can be difficult to isolate a mutagenized gene by forward genetics because many genes are hit simultaneously with a TE insertion and one or many may affect phenotype, but TEs are very good for reverse genetics.

There are two major advantages in using the so-called Class II transposons such as Ac/Ds, En/Spm and Mu elements of maize, Tam elements of snapdragon and dTph1 from petunia (Jeon et al., 2004). First, unlike T-DNA, transposons can be excised from the disrupted gene in the presence of transposase, resulting in functional revertants that can confirm the phenotypic

consequences of the mutation. Secondly, excision often leads to insertion close to the original site, which can be exploited for local mutagenesis through targeted insertion into specific domains of a gene. To follow the excision and insertion of transposon elements, phenotypic assay systems have also been developed using marker genes containing transposon inserts, in which excision can be monitored by restoration of marker-gene activity. Transposon excision can be selected by employing visualized markers such as beta-glucuronidase (GUS), luciferase, streptomycin resistance or green fluorescent protein (GFP).

If the maize transposon Ac is used, the movement of the transposon is likely to be to relatively close sites on the same chromosome as the original insertion point. Therefore, a number of starter lines can be constructed with the Ac present at various known chromosomal locations across the genome (Cullis, 2004). Then the appropriate starter line can be chosen that will have a high probability of generating an insertion in a closely linked gene of interest.

As with T-DNA insertions, construction of the transposon to function as an enhancer trap is also possible. Such engineered transposons can be introduced into the plant genome using T-DNA-mediated transformation. Once inserted, the transposon can hop from one chromosome location to another as long as an active transposase is present (Smith et al., 1996). Although most transposons tend to hop to linked sites, a strategy has been devised to select for transposons that land at unlinked loci (Sundaresan et al., 1995).

An Ac/Ds tagging system in plants is shown in Plate 3. The Ac/Ds tagging system has both advantages and disadvantages. It is an efficient and cost-effective method to generate a large mutant population, although secondary transposition complicates gene identification and this transposon system is not available in many species. The advantage of using transposons is that they can then be activated and moved into many regions of the genome. Therefore, after the generation of a small number of lines with the transposon present, the transposons can be launched to move around the genome and generate insertions in every gene. This technique is most easily applied to maize because this is the plant from which most of the transposable elements have been isolated (Cullis, 2004). The initial lines are, in essence, always available. The engineering of two-component transposon systems that include an inducible promoter will make this particular technique applicable to a wide variety of other plant species.

Retrotransposon tagging

Class I retrotransposable elements are a group of mobile elements that transpose through reverse transcription of RNA intermediates by reverse transcriptase, RNaseH and integrase enzymes, leaving a copy of the retrotransposon at the original site. This replicative mode allows the retrotransposons to generate genetic diversity by producing insertional mutations. Retrotransposons are abundant in species with large genomes; however, only a small fraction of them are active. When the frequency of transposition in the original or heterologous hosts is considered, three Ty1-copia retrotransposons, Tnt1 and Tto1 in tobacco and Tos17 in rice, seem suitable for gene tagging. These retrotransposons prefer low-copy, gene-rich regions (Jeon et al., 2004). Retrotransposon mutagenesis is shown in Fig. 11.7.

Of the rice genome, 17% is estimated to consist of retrotransposons. Tos17 is an endogenous, copia-like element that can be activated by tissue culture in rice. The sequenced cultivar from the japonica subspecies, Nipponbare, has only two copies of Tos17. A total of 47,196 Tos17-induced insertion mutant lines were generated and this population carried 500,000 insertions. Tos17 was three times more likely to be found in genic regions containing introns and exons than in intergenic regions. Insertion frequency was low in centromeric and pericentromeric regions, which was not the case for other types of retrotransposons. A total of 78% of insertions were in hotspots (clustered), with an average of 6.5 insertions/hotspot (Miyao et al., 2003).
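
The tagging-efficiency formula given under Insertional mutagenesis, P = 1 − [1 − (L/C)]^{nf}, can be checked against the population sizes quoted in this section. The Python sketch below is illustrative rather than a published implementation: the function names are hypothetical, and it keeps the formula's assumption of random, independent insertion, which the Tos17 hotspot data above show to be only an approximation. It first solves the formula for n using the rice values from the worked T-DNA example (C = 3.8 × 10^8 bp, f = 1.4, P = 0.99), and then applies it to the 500,000 Tos17 insertions, treating n × f as the total number of insertions in the population.

```python
import math

RICE_GENOME_BP = 3.8e8  # haploid genome size used in the worked example


def tagging_probability(gene_len_bp, genome_bp, n_lines, loci_per_line):
    """P = 1 - [1 - (L/C)]^(n*f), assuming random, independent insertions."""
    return 1.0 - (1.0 - gene_len_bp / genome_bp) ** (n_lines * loci_per_line)


def lines_needed(gene_len_bp, genome_bp, loci_per_line, target_p=0.99):
    """Solve the same formula for n, the number of independent insertion lines."""
    log_miss_per_insert = math.log1p(-gene_len_bp / genome_bp)
    return math.ceil(math.log(1.0 - target_p) / (loci_per_line * log_miss_per_insert))


if __name__ == "__main__":
    # T-DNA example: lines needed for 99% probability at three gene lengths.
    for gene_len in (1_000, 3_000, 5_000):
        n = lines_needed(gene_len, RICE_GENOME_BP, loci_per_line=1.4)
        print(f"{gene_len} bp gene: about {n:,} lines")

    # Tos17 population: 47,196 lines carrying 500,000 insertions in total.
    f_tos17 = 500_000 / 47_196
    p = tagging_probability(3_000, RICE_GENOME_BP, 47_196, f_tos17)
    print(f"Tos17, 3 kb gene: P = {p:.2f} (random-insertion assumption)")
```

Under these assumptions the sketch gives roughly 1,250,000, 417,000 and 250,000 lines for 1, 3 and 5 kb genes, in line with the figures quoted in the text, and a probability of about 0.98 that a 3 kb gene is hit in the Tos17 population. Because Tos17 insertions cluster in hotspots and favour genic regions, the real coverage of any particular gene may differ considerably from this estimate.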

[Fig. 11.7 image: a retrotransposon (containing a reverse transcriptase gene) in the donor DNA is transcribed into RNA and copied into cDNA; the original retrotransposon remains integrated and a new copy of the retrotransposon is integrated.]

Fig. 11.7. Retrotransposon transposition. Modified from Buchanan et al. (2002).

Activation tagging

T-DNA mutagenesis can be adapted to generate gain-of-function alleles by activation tagging (Weigel et al., 2000). To achieve this, several copies of a strong transcriptional enhancer are introduced into the T-DNA. On integration, the enhancers stimulate the transcription of a nearby gene and cause its ectopic expression. Conventional T-DNA mutagenesis shares with transposon tagging the disadvantage that it is not efficient for tagging genes that are functionally redundant, because a phenotype will not be observed. In addition, neither T-DNA tagging nor transposon tagging will identify genes that are required during multiple stages of a life cycle and where loss of function results in early embryonic or gametophytic lethality. To overcome these drawbacks, an activation-tagging system consisting of T-DNA vectors that contain strong transcriptional enhancers was developed in Arabidopsis and successfully used in gene cloning and the analysis of the function of redundant genes (Weigel et al., 2000). Multiple transcriptional enhancers from the CaMV 35S gene are positioned near the right T-DNA border. Genes immediately adjacent to the inserted CaMV 35S enhancers are over-expressed. A transposon-mediated activation tagging system has also been developed on the basis of a self-stabilizing Ac transposon derivative, using a Ds element that carries the tetramerized CaMV 35S enhancer (Suzuki et al., 2001).

To overcome the tedious transformation process for many crop plants, a new strategy that combines activation tagging and Ac/Ds transposon tagging was proposed. In this system, the T-DNA carries the Ds element, which contains the Bar gene and a tetramer of the 35S enhancer. The T-DNA and the Ds element promote constitutive expression of genes in their vicinity after integration or transposition.

Inserted (T-DNA or TE) tags contain a selectable marker and multiple enhancer elements (CaMV 35S enhancers) and may also contain a reporter gene (GUS or GFP) with a weak promoter (Weigel et al., 2000). When the T-DNA construct inserts in or near a gene (within about 3.5 kb either upstream or downstream), transcriptional signals (enhancers) on the T-DNA construct interact with the native promoter and enhance gene expression, producing dominant, gain-of-function mutations. The reporter gene may report the original expression pattern, or the new pattern. Selection can be applied to primary transformants and the resulting transformants analysed for desired phenotypes or insertion events. The T-DNA tag is then cloned and nearby genes can be characterized.

As an early example, Weigel et al. (2000) characterized over 30 dominant, morphological mutants with various phenotypes from the T-DNA activation-tagging pools of Arabidopsis. T-DNA activation has also been used as a tool for isolation of the regulators of a complex metabolic pathway.
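As a rough illustration of the proximity rule described in this section (enhancers on the tag acting on genes within about 3.5 kb upstream or downstream of the insertion), the following Python sketch lists candidate genes for one mapped insertion. The `Gene` record, the function name `genes_near_insertion`, the coordinates and the gene names are all hypothetical and serve only the illustration; the 3.5 kb window is the approximate figure quoted in the text, not a fixed biological constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: name plus start/end coordinates (bp) on one chromosome."""
    name: str
    start: int
    end: int


def genes_near_insertion(genes, insertion_pos, window_bp=3500):
    """Return (gene name, distance in bp) for genes within window_bp of the insertion.

    Distance is 0 when the insertion lies inside the gene; otherwise it is the
    distance to the nearer gene boundary. Results are sorted by distance.
    """
    hits = []
    for gene in genes:
        if gene.start <= insertion_pos <= gene.end:
            distance = 0
        else:
            distance = min(abs(insertion_pos - gene.start),
                           abs(insertion_pos - gene.end))
        if distance <= window_bp:
            hits.append((gene.name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Invented example: three genes and an activation tag mapped to position 4500.
example_genes = [
    Gene("geneA", 1000, 2000),
    Gene("geneB", 6000, 7000),
    Gene("geneC", 12000, 13000),
]
candidates = genes_near_insertion(example_genes, insertion_pos=4500)
# candidates == [("geneB", 1500), ("geneA", 2500)]; geneC is 7500 bp away and is excluded
```

A list like this only narrows the search. Which of the nearby genes is actually over-expressed still has to be confirmed experimentally, as the text describes for cloning the tag and characterizing nearby genes.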
448 Chapter 11

This was done in a genetically non-tractable plant species (van der Fits et al., 2001), where hundreds of thousands of Catharanthus roseus suspension cells (transformed with T-DNA that carried constitutive enhancer elements) were screened relatively easily for their resistance to a toxic substrate. In a recent example, Wan et al. (2008) generated about 50,000 individual transgenic rice plants by an Agrobacterium-mediated transformation approach with the pER38 activation tagging vector. The vector contains tandemly arranged double 35S enhancers next to the right border of T-DNA. Comparative field phenotyping of the activation tagging and enhancer trapping populations in two generations (6000 and 6400 lines, respectively, in the T0 generation and 36,000 and 32,000 lines, respectively, in the T1 generation) identified about 400 dominant mutants, indicating that the activation tagging pool is a valuable alternative tool for functional analysis of the rice genome.

Activation lines can be used to identify the function of specific members of gene families where over-expression is diagnostic but loss of function provides no phenotype, or to identify the function of genes where knock-outs are lethal. Activation lines may also be used to clarify new functions of previously characterized genes where phenotype depends on expression differences.

Entrapment/enhancer/promoter tagging

By creating fusions between tagged genes and a reporter gene such as GUS or GFP, an entrapment-tagging system allows one to monitor gene activity. Insertion of the promoterless reporter or the reporter with a minimal promoter not only destroys normal gene function but also activates expression of the reporter gene. There are three commonly used entrapment systems: promoter trap, enhancer trap and gene trap (Springer, 2000). Promoter trap elements contain a promoterless reporter gene. The reporter gene is expressed when it is inserted into an exon and forms a translational fusion between the endogenous gene and the reporter. Enhancer trap elements carry a reporter gene with a minimal promoter. Reporter gene expression is activated by a chromosomal enhancer element located near the insertion point. Gene trap elements contain a reporter gene with an intron that carries multiple splicing donor and acceptor sites. The reporter gene can be expressed when the element is inserted into the transcribed region.

The use of gene traps consists of placing a reporter gene in a vector whereby the reporter gene is only activated when inserted within a functional gene. The reporter gene has a visual phenotype, so the tissue specificity of the promoter region (and therefore the endogenous gene itself) can be identified directly. The reporter activation demonstrates the spatial and temporal expression of the disrupted gene. Because expression levels can be monitored in heterozygous plants, the gene trap system is useful for studying the patterns of most plant genes, including essential genes that cause lethal mutations when homozygous. A finer dissection of various patterns within an organ has been demonstrated for enhancer trap GUS fusions in Arabidopsis roots (Cullis, 2004).

Devices for entrapment can be transferred into plant cells as part of T-DNA or transposable elements. This approach has been applied to genes that are difficult to identify by traditional methods. Because genes are identified based on their reporter gene expression, a mutant phenotype is not required. With this advantage, both functionally redundant genes and genes that have functions at multiple developmental stages can be identified (Jeon et al., 2004).

11.5.3 Non-tagging mutagenesis

While insertional mutants offer obvious advantages in gene cloning and reverse genetics, there are a number of limitations associated with transformation-mediated mutagenesis. These include: (i) the high initial investment in producing the mutant lines or starter lines; (ii) the transgenic nature of the mutants prevents large-scale cultivation in the field and exchange of mutation collections in different countries; (iii) parts of the genome may not be accessible to insertional inactivation (e.g. hot
spot insertion by Ac/Ds), thus preventing complete genome coverage; and (iv) in some organisms, insertional mutagenesis has never reached the efficiency needed for large-scale mutagenesis. Due to these issues, other types of mutations are also used in plant species, including deletion and chemical mutagenesis.

Chemical and radiation-induced mutations have been widely used for random mutagenesis in plants, resulting in a broader spectrum of mutant alleles that occur randomly in the genome. Chemical mutagenesis produces a broad range of mutant alleles such as loss-of-function, gain-of-function, reduction-of-function and novel functions, in contrast to insertion and deletion mutagenesis, which cause mainly loss-of-function mutations. When an efficient transformation tool is not available, it is not possible to adopt a gene tagging strategy, but these random, non-tagging systems can be utilized to create a mutagenized library.

Ionizing radiation has been widely used to induce mutations for plant breeding and classical genetic analysis, but the consequences of ionizing radiation have only recently been examined closely at the molecular level. Several genes have been identified in animals and plants using deletion mutants. Fast neutron, gamma ray, X-ray and UV radiations have been used in different systems. Usually, fast neutrons produce large deletions while the other three radiations yield small deletions or point mutations.

Besides ionizing radiation, a number of chemicals have been used to generate large mutant collections. Many chemicals can be used as mutagens but di-epoxy butane (DEB), N-ethyl-N-nitrosourea (ENU), ethylmethane sulfonate (EMS), di-epoxy octane (DEO), ultraviolet-activated trimethylpsoralen (UVTMP) and hexamethylphosphoramide (HMPA) are common mutagens used in animals and plants. Generally, deletions caused by chemical mutagens are relatively small, ranging from point mutations to mutations of several kilobases.

Chemical agents such as EMS and nitrosomethylurea (NMU) are extremely efficient mutagens in A. thaliana. Under optimal conditions, EMS treatment of seeds can generate about 4000 mutations per genome, compared with an average of 1.5 insertions per T-DNA (transferred DNA) mutant (Alonso et al., 2003; Till et al., 2003). Chemical agents generate a broader range of DNA alterations; these are predominantly single base-pair substitutions, but small insertions and deletions are also induced. Importantly, the distribution of EMS-induced mutations is unbiased (Alonso and Ecker, 2006).

Point mutations

EMS, a base-alkylating agent that generates point mutations (of which the vast majority are G/C to A/T transitions, which often lead to the creation of stop codons/nonsense mutations), has been used most commonly because of its ease of use and the diversity of potential mutants. As EMS causes a high density of mutations, fewer plants need to be screened in order to target all genes, compared with other mutagenesis systems.

However, point mutations induced by EMS are subtle changes whose detection can be challenging. Once a phenotypic mutant is identified, it is necessary to determine the locus in the genome by a positional cloning strategy in order to clone the corresponding gene, as discussed in the previous section. If a mutant that exhibits an identical or similar phenotype has already been identified, complementation crosses are a first step to determine whether the new mutation is allelic. The existence of multiple alleles can give information on gene function and be useful for breeding.

Strategies have been developed recently so that subtle changes like point mutations can be detected easily. To adapt chemically induced mutagenesis efficiently for reverse genetics in Arabidopsis and other plants, McCallum et al. (2000) developed a large-scale screening system, targeting induced local lesions in genomes (TILLING), which allows a point mutation to be identified. In the basic TILLING method, seeds are mutagenized by treatment with EMS. The resulting M1 plants are self-fertilized and DNA is prepared from the M2 individuals. To screen many individuals a pooling strategy is used. DNA samples are pooled.
The pools are arrayed on microtitre plates and subjected to gene-specific PCR. High-throughput TILLING (Colbert et al., 2001; Till et al., 2003) uses the CEL I mismatch cleavage enzyme, which recognizes base-pair mismatches (Oleykowski et al., 1998). The PCR is performed using a mixture of labelled and unlabelled primers. One primer is labelled with IR Dye 700 and the other with IR Dye 800. Melting and re-annealing of PCR products is followed by CEL I treatment, which preferentially cleaves mismatches in heteroduplexes between wild-type and mutant DNA sequences. CEL I-treated PCR products are separated by slab gel electrophoresis, then detected in two separate channels by LI-COR scanners. Mutations are indicated by shorter, cleaved PCR products. If a mutation is detected in a pool, the individual DNA samples that went into the pool can be analysed separately to identify the individual that carries the mutation. Once this individual has been identified, its phenotype can be determined. This screening procedure can locate a mutation to within a few base pairs for PCR products of up to 1 kb in size. A potential problem with this method is that any one individual will carry multiple mutations. Genetic analysis is therefore necessary to confirm that any observed phenotypic alteration is associated with the mutation in the target gene and not with another mutation elsewhere in the genome. However, TILLING often results in a number of allelic mutations in different lines, which can help to confirm phenotype as well as provide information on protein function. An important advantage of the TILLING method is that it can be applied to any species for which a gene sequence is known.

TILLING has moved from proof of concept to production with the establishment of publicly available services for Arabidopsis, maize, lotus and barley. Pilot-scale projects have been completed on several other plant species, including wheat. The protocols developed for TILLING have been adapted for discovery of natural nucleotide variation linked to important phenotypic traits, a process termed EcoTILLING (Comai et al., 2004). Till et al. (2007) reviewed the current TILLING and EcoTILLING technologies and discussed the progress that has been made in applying these methods to many different plant species.

In addition, new methods for efficient genome-wide detection of point mutations are appearing on the horizon, including mismatch-repair detection on tag arrays (Faham et al., 2005). Mismatch-repair detection allows > 1000 amplicons to be screened for variations in a single laboratory reaction. This approach can be scaled up to allow sequence comparison in whole-genome coding regions among large sets of lines and controls at a reasonable cost.

Deletion mutagenesis

Ionizing radiation mutagenesis causes deletions and other types of chromosomal alterations. In plants, fast neutrons are well-established, very effective deletion mutagens (Koornneef et al., 1982; Li, X. et al., 2001). Approximately ten genes are randomly deleted in each line when treated with fast neutrons at a dose of 60 Gy (Koornneef et al., 1982). As fast neutron deletion mutagenesis can be performed on numerous dry seeds and plant transformation is not necessary, it is easy to produce a great number of mutants with a high probability of finding a mutation in every gene.

Cloning a gene mutated by a deletion requires chromosome walking, as with chemical mutagenesis. However, deletion mutants can also be effective for reverse genetics. Deletion libraries have been established that contain knock-out mutants in Arabidopsis and rice (Li, X. et al., 2001). Deletions can be identified by gene-specific PCR screening of pooled DNA, where PCR extension time is shortened so that amplification of the longer wild-type fragment is suppressed and only mutant lines yield products (Jeon et al., 2004). An experimental approach for identifying DNA deletions in pools of mutants that are generated by high-energy ionizing radiation, called Deleteagene, has been developed (Li and Zhang, 2002). Deleteagene can be applied to plants in which transformation is inefficient; it might also provide a means of simultaneously mutating (deleting) tandem
duplicated genes (Li and Zhang, 2002; Zhang, S. et al., 2003). In contrast to insertion mutants, where the probability of finding a mutant is proportional to the size of the target gene, meaning that identification is difficult in a small gene (Krysan et al., 1999), it is easier to find a knock-out of a small gene from a deletion mutant pool.

11.5.4 RNA interference

All gene disruption approaches have some inherent limitations. For example, it is difficult to identify the function of redundant genes or the functions of genes required in early embryogenesis or gametophyte development. One way to overcome the redundant gene problem is to simultaneously inhibit all the members of a gene family through gene silencing. RNA interference (RNAi) is a mechanism that inhibits gene expression by causing the degradation of specific RNA molecules or hindering the transcription of specific genes (Fire et al., 1998). RNAi refers to the function of homologous double-stranded RNA (dsRNA) to specifically target a gene's product, resulting in null or hypomorphic phenotypes. As long as the interference is targeted to a region of the gene that is conserved within all the members of the gene family, all members of the family will be similarly inhibited (Tang et al., 2003).

The most interesting aspects of RNAi are the following (Cullis, 2004):

• dsRNA, rather than single-stranded antisense RNA, is the interfering agent.
• It is highly specific.
• It is remarkably potent (only a few dsRNA molecules per cell are required for effective interference).
• The interfering activity (and presumably the dsRNA) can cause interference in cells and tissues far away from the site of introduction.

The RNAi pathway is initiated by the enzyme DICER, which cleaves long dsRNA molecules into short fragments of 20-25 bp. One of the two strands of each fragment, known as the guide strand, is then incorporated into an RNA-induced silencing complex and pairs with complementary sequences. The most well-studied outcome of this recognition event is post-transcriptional gene silencing. In this way all the RNA transcripts from any of the members of a gene family can be simultaneously silenced if highly homologous regions are used. Any resulting phenotype can then be attributed to the functioning of that gene family, but it will still need to be determined whether the family members contribute redundant functions or whether only one of the members of the gene family actually conditions the particular phenotype observed.

Artificial microRNAs (amiRNAs), which are designed to target one or several genes of interest, provide a new and highly specific approach for effective post-transcriptional gene silencing in plants. Warthmann et al. (2008) devised an amiRNA-based strategy for both japonica and indica types of cultivated rice. Using an endogenous rice miRNA precursor and customized 21mers, amiRNA constructs were designed to target three different genes (Phytoene desaturase, Pds; Spotted leaf, Spl11; and elongated uppermost internode, Eui1/CYP714D1). Upon constitutive expression of these amiRNAs in the cultivars Nipponbare (japonica) and IR64 (indica), the target genes were down-regulated by amiRNA-guided cleavage of the transcripts, resulting in the expected mutant phenotypes. The effects were highly specific to the target gene, the transgenes were stably inherited and they remained effective in the progeny. Ossowski et al. (2008) reviewed various strategies for small RNA-based gene silencing, described the design and application of artificial miRNAs for gene silencing in many plant species and compared the small RNA pathways mediating transgene-induced gene silencing, including post-transcriptional gene silencing, transcriptional gene silencing and virus-induced gene silencing.

11.5.5 Gene isolation via mutagenesis

There are two main approaches for disrupting gene function on the basis of its DNA
sequence: using one of the targeted techniques such as RNAi or ectopic expression, or screening a collection of randomly generated mutants for a knock-out. The amplification and sequencing of genomic DNA next to an inserted transposon or T-DNA is an essential step in identifying a mutation within a gene. Several steps are required to isolate the disrupted flanking DNA, incorporating different possible techniques (Jenks and Feldmann, 1996), including:

1. Screen for the mutant: this can be done through the generation of genomic libraries from the mutants and screening with sequences homologous to the right or left border regions. When the screen is based on visible phenotypes, all the mutants are grown under regular growth conditions if screening for morphological variation; or they are grown under a special condition if screening for conditional mutants such as those responding to biotic or abiotic stresses.

2. Confirm co-segregation: because a large proportion of mutant lines are untagged in T-DNA or transposon mutagenized collections, co-segregation analysis of the T-DNA sequence or selection marker with the phenotype is the first step towards cloning the gene. It is estimated that 35-40% of the mutants in Arabidopsis are possibly due to deletion, rearrangement or somatic mutation during transformation. Once co-segregation is established for a given mutant, isolation of the mutated gene may be achieved by several methods such as plasmid rescue, IPCR or TAIL-PCR.

3. Plasmid rescue: this involves utilizing bacterial selectable markers and an origin of replication from a linearized bacterial plasmid incorporated into the T-DNA to isolate T-DNA-plant junctions in E. coli. It includes the following procedures:

• Restriction enzyme sites are present in the T-DNA at the ends of the bacterial plasmid sequence.
• After extracting purified genomic DNA, the genomic DNA is digested with the appropriate restriction enzyme. After removal of the enzyme, samples are ligated.
• The ligated DNA is precipitated and transformed by electroporation into recombination-deficient E. coli cells to maximize the stability of the multimerized CaMV 35S enhancers. Recovered plasmids can then be sequenced to identify captured flanking sequences.

4. Inverse PCR (IPCR): utilizing primers made from the left or right border sequences on circularized genomic fragments. IPCR has been implemented to isolate DNA segments of the genome that flank the inserted molecule in transgenic plants tagged by T-DNA, transposons or retrotransposons. The technique involves digestion with appropriate restriction enzymes to give fragments containing the known sequence and its flanking region (Jeon et al., 2004). Many thousands of restriction fragments are circularized by self-ligation with T4 DNA ligase and the circularized DNA is then used as a template in PCR. The unknown flanking DNA segment is amplified by two primers located at the ends of the known sequence. The first primer is designed to lie near the junction point between the insert and plant sequences and the second primer is located near the enzyme site that is used for digestion of the mutagenized DNA. At least 50 nucleotides should be left between the primer sites and the junctions for nested PCR to isolate specific amplification products and for DNA sequencing.

5. Thermal asymmetric interlaced (TAIL)-PCR: this uses nested border-specific primers and arbitrary degenerate primers. The TAIL-PCR strategy has been used to isolate insert-end sequences from P1 and YAC clones (Liu and Whittier, 1995), genomic sequences that flank T-DNA insertions from transgenic lines of Arabidopsis (Liu et al., 1995) and genomic DNA flanking Tos17 in rice (Yamazaki et al., 2001). TAIL-PCR depends on amplification between a set of three nested primers for the known sequences and shorter, arbitrary degenerate primers with low Tm values. Accordingly, the PCR programme is set to thermally control specific and non-specific products. In the primary reaction, five high-stringency cycles are used to specifically amplify linear products from the target flanking sequences by the known
insert-specific primer. This is followed by one low-stringency cycle to allow annealing by the arbitrary degenerate primers in the flanking sequence. The next cycles alternate between two high-stringency cycles and one reduced-stringency cycle, resulting in logarithmic amplification of the target sequences. Secondary and tertiary reactions by the nested primers decrease non-specific amplification. TAIL-PCR depends on the accidental position of an arbitrary primer sequence in the flanking sequence. Therefore, it is important to design optimal primers in order to successfully amplify a given insertion site.

6. Adaptor-ligated PCR: this was an early method for isolating the flanking region of a known DNA sequence and now a modified method, PCR walking, has been developed for isolating genomic sequences that flank T-DNA borders (Balzergue et al., 2001; Cottage et al., 2001). The method comprises three major steps: restriction, adaptor ligation and PCR amplification. The genomic DNA carrying the T-DNA or transposon is digested with blunt-end restriction enzymes. An asymmetric adaptor cassette is ligated to the digested DNAs. By using a primer specific for the adaptor cassette and a primer specific to the T-DNA or transposon, unknown target DNAs that flank the insertions can be amplified.

7. Complementation test: to prove the isolated gene is the one responsible for the mutant phenotype, molecular complementation should be performed by introducing the wild-type allele into the mutant background and restoring the wild-type phenotype. Alternatively, good evidence can be obtained by molecular characterization of several alleles at the locus. Transposon mutants can sometimes be reverted by excision of the transposon, which can also be used as evidence that the insertion was the cause of the phenotype.

All approaches which produce lesions in unpredictable chromosomal locations can be used both in forward and in reverse genetic approaches for gene discovery. There are, however, three main requirements that must be met for a collection of random mutations to be efficiently used in a reverse genetic screen (Alonso and Ecker, 2006). First, the number of mutations in the collection should exceed (by five- to ten-fold) the number of genes in the genome. This redundancy is necessary to ensure that mutations in a particular gene will be found with a sufficiently high probability. Secondly, each individual plant in the mutagenized populations should be catalogued, propagated and pooled so that it can then be effectively screened. The number of individual mutants that need to be screened tends to be very high, making it necessary to pool individual mutants before testing for the presence of the mutation of interest. This requirement has promoted the development of more sophisticated pooling strategies that minimize the number of assays required but still allow the identification of an individual mutant line in one-step or two-step screens (Winkler and Feldmann, 1998; Alonso et al., 2003). The optimal strategy for screening the DNA pools depends on the type of mutagen used or, more specifically, the nature of the DNA lesion. Thirdly and perhaps most important, a DNA sequence-based screening approach needs to be developed that is sensitive enough for a single plant with a specific sequence alteration to be detected within a pool of wild-type individuals. Different types of DNA lesion (deletions, insertions and point mutations) require different screening methodologies as described by Alonso and Ecker (2006).

A major drawback of reverse genetic methods is that the screen has to be repeated for every gene. However, high-throughput identification of genome insertion sites in mutagenized populations can be achieved by taking advantage of the known sequence of the inserted DNA and the availability of a complete genome sequence in sequenced plant species. Various PCR-based strategies have been adapted for this purpose, including TAIL-PCR (Sessions et al., 2002) and the adaptor ligation approach (Alonso et al., 2003).

One of the most exciting uses of the near complete collection of gene-indexed mutations for a given species is the ability to carry out whole-genome forward genetic
screens (Carpenter and Sabatini, 2004). This will enable researchers to test simultaneously the role of all genes in the genome for involvement in a particular biological process (Alonso and Ecker, 2006). The first step towards this goal involves generating a non-redundant collection of homozygous mutants. From more than 300,000 gene-indexed mutant lines in A. thaliana, ideally two independent lines per gene need to be selected, the mutations need to be confirmed and homozygous plants obtained. The hypothetical end product for this step will be a collection of 521 96-well plates that correspond to 50,000 mutant lines, two lines for each of the 25,000 genes. This seed library could then be systematically screened to study the role of each one of the 25,000 represented genes in any given biological process. The identification of mutants affected in the selected biological process allows the immediate identification of the underlying genes. By having two independent mutant lines per gene, false positives and the need for experimental replicates are substantially reduced.

Several important advances towards gene-function analysis in Arabidopsis are on the horizon (Alonso and Ecker, 2006), which should provide some guidelines for all plant species: the ability to do systematic forward genetics using reverse genetic tools (simultaneous phenotypic analysis of all gene-indexed mutants), the development of new phenomic platforms, improvements in targeted mutagenesis (specifically, homologous recombination) and the utilization of natural variation in gene function studies. Induced mutations (point mutations, deletions and transposon- and T-DNA-generated mutations) and other means of reducing gene expression (such as the use of small inhibitory RNAs (siRNAs), amiRNAs and artificial repressor proteins) will continue to be used for some time. However, the importance of natural allelic variation to study gene function in plants is likely to increase. The rapid advancement of ultra high-throughput sequencing (UHTS) technologies, which allow 1 gigabase of sequence in 48 h for a cost of US$3000 (Service, 2006), is likely to have a profound effect in two areas of gene-function analysis (Alonso and Ecker, 2006). First, induced alleles with phenotypes that have only been observed in a specific genetic background (Sanda and Amasino, 1996) point to the need for the creation and sequencing of insertions in large mutant populations using various accessions or ecotypes, a process that will be facilitated by UHTS. Secondly, UHTS technology will allow complete genome re-sequencing of many hundreds, or even thousands, of accessions. With the concomitant development of more phenotyping platforms and corresponding community phenotyping databases, whole-genome association studies that link genotype and phenotype will become the approach of choice to interrogate plant gene function and the role of natural allelic variation in plant adaptation to a range of local growth habitats (Weigel and Nordborg, 2005).

11.6 Other Approaches for Gene Isolation

As described in Chapter 3, DNA chip and microarray technologies make it possible to do gene isolation in a high-throughput way. Gene isolation through microarrays aims to identify target gene(s) from a genome. There are two different approaches: (i) parallel analysis of gene expression by comparison of expression among different species or different individuals within a species, or expression of the same individuals at different growth or developmental stages or under different environments. Microarray-based gene expression analysis can be used to detect the type and abundance of mRNA in the cell by hybridization, which needs few samples and is highly automatable. (ii) Genes can be isolated from cDNA or EST microarrays by using homologous probes.

11.6.1 Gene expression analysis

One of the major applications of DNA microarrays is gene expression profiling (Tessier
et al., 2005). Gene profiling via microarrays involves determining the expression of genes under specific conditions. Similarly, microarrays have enabled genome-wide class comparisons such as organs, genotypes or conditions. Several studies have identified genes that are consistently differentially expressed between two or more predefined classes. The degree to which a gene is active in a certain organ or tissue can be measured by the amount of mRNA found in the cells, although the correlation between mRNA and active protein is not always absolute due to post-transcriptional regulation. Such approaches hope to find the complete set of genes that differentiate between cellular states and shed light on the underlying differences between these classes at the molecular level.

Initial strategies for detecting differentially expressed genes between two classes are straightforward. Essentially they involve two-sample comparisons of the differences between mean log expressions of the classes. The significance of the differences is estimated using t-tests modified specifically for array data (simple t-tests are almost never used; usually more complex t-tests are needed due to the nature of the data) or their non-parametric analogues. When more than two classes are involved, F-statistics and non-parametric analogues can be applied.

A wide variety of statistical techniques is available for class prediction (Finak et al., 2005), including linear discriminant analysis, weighted voting, nearest-neighbour classifiers, support vector machines, neural nets and Bayesian methods. At the centre of all of these methods is the issue of feature selection. The goal is to select a subset of features (genes) that best distinguish between known classes and can predict new, unseen samples.

Class discovery experiments attempt to determine biologically relevant subclasses of a particular cellular state. Several methods are used for this purpose and the most popular techniques are k-means clustering, hierarchical clustering, self-organizing maps and principal component analysis. The goal of clustering is to organize similar sample data together. Either the gene or the array dimensions can be clustered according to similarity indices, enabling one to see both genes with similar expression profiles across arrays and arrays that have underlying similarities across genes (Finak et al., 2005).

Class prediction experiments aim to find subsets of genes that can best distinguish between two or more classes of sample. The first step is sequence analysis. Probes are designed using gene-specific DNA fragments and oligonucleotide or DNA arrays are made for all genes of an organism. All mRNA probes reverse transcribed under different classes from the organism are then hybridized with the DNA microarray. Based on the intensity of the hybridization signal, differential gene expression or co-expression under different classes can be detected. Class-dependent gene expression can be identified by comparison of expression profiles across all genes between different classes. The set of genes can be mapped on to the gene ontology or to metabolic pathways. In this way, physiological functions of a gene can be analysed and related functional genes can be determined. There are two early examples (Lockhart et al., 1996; Wang, X. et al., 1999). In addition, several tools for exploring mRNA expression data with known protein-DNA and protein-protein interaction databases have been reported. An example is CYTOSCAPE, which maps expression data on to the protein interaction network (Shannon et al., 2003). This approach can yield important insights into the protein complexes perturbed in a given experiment and establish functional roles for the genes distinguished.

11.6.2 Using homologous probes

Availability of adequate quantities of sufficiently pure protein for production of specific antibodies or for partial peptide sequencing opens the door to cloning of the gene specifying that protein. Physiological and biochemical investigations may lead
to the identification of a protein responsible for the phenotype or biological property of interest. If the purified polypeptide has a non-blocked N-terminus, the N-terminal sequence may be determined directly. Sequences from the interior of the protein can be obtained by producing and analysing proteolytic fragments of the polypeptide. If sufficient quantities of purified protein are available, the protein may be used to immunize animals (commonly rabbits or mice). The animals usually produce antibodies that specifically recognize the protein and secrete them into the serum. The antisera may be used to detect the immunizing protein specifically.

Peptides of 10 to 30 residues in length can be chemically synthesized efficiently. Small peptides chemically coupled to larger carriers can be effective immunogens. Antisera can be used to recognize clones of an expression library that are synthesizing the cognate antigen. The antibodies bind to protein from the colony or plaque. Bound antibodies can be detected by any of a variety of methods such as radioimmune precipitation and enzyme-linked immunosorbent assay (ELISA).

Nucleotide sequences that could code for the determined sequence of amino acids can be deduced from the genetic code. Since the genetic code is redundant, multiple nucleotide sequences can encode the same peptide sequence. To be sure that the actual nucleotide sequence is present in a probe oligonucleotide, the oligonucleotide is synthesized incorporating, where needed, multiple nucleotides. The product is called a degenerate oligonucleotide.

When amino acid sequences from two separated parts of the polypeptide chain are available, degenerate oligonucleotides based on the two sequences can be used to attempt to amplify the intervening sequence by PCR. An amplified RT-PCR product may be used in nucleic acid hybridization to screen the colonies or plaques of a cDNA library for clones containing complementary sequences. Positive clones obtained from the cDNA library can be used as nucleic acid hybridization probes to screen a library of genomic DNA to identify clones containing the gene that can produce the corresponding mRNA. Positively-reacting genomic library clones need to be further examined to identify where on the clone the gene is located and prove that the clone actually contains the gene.

Functional genomics has been broadly applied to include many endeavours aimed at determining functions of genes on a genome-wide scale, such as transcriptional profiling to determine gene expression patterns, and yeast two-hybrid and other interaction analyses to help identify pathways, networks and protein complexes (Chapter 3; Henikoff and Comai, 2003). Although this is a daunting task, several approaches have already been established, including the use of T-DNA knock-out lines and over-expression studies. In contrast to the previously prevalent gene-by-gene approaches, new high-throughput methods are being developed for expression analysis as well as for the recovery and identification of mutants. The experimental approach is consequently changing from hypothesis-driven to non-biased data collection and an archiving methodology that makes these data available for analysis by bioinformatics tools. Reverse genetics (sequenced gene to mutant and function) may play a more prominent role in functional genomics studies in the future (Xu et al., 2005).

In true directed mutagenesis, researchers choose the gene to be perturbed. The most elegant and precise targeted mutagenesis approach relies on homologous recombination to target foreign DNA to a homologous sequence in the host genome. This is rarely possible in plants and therefore alternative approaches have been developed to alter the expression of selected genes. There are two main variants of directed mutagenesis: gene-silencing (RNAi) and zinc-finger nucleases. In these strategies, specific sequences that are unique for each gene to be disrupted must be engineered in vitro and then introduced into the plant. It has been shown that the expression of a sequence-specific zinc-finger nuclease in A. thaliana generates mutations in the target gene in planta (Lloyd et al., 2005). The large battery of well-characterized zinc-fingers, each with different
and specific DNA-recognition sequences, should allow the use of this methodology to target almost any gene in the A. thaliana genome (Alonso and Ecker, 2006).

Developments in high-throughput genomics will facilitate the process of dissecting the genetic basis of complex traits, including defining genetic intervals, identifying candidate genes and verifying an allele's contribution to the phenotype. Technologies such as microarrays, fluorescence polarization, mass spectrometry and molecular barcodes could achieve throughputs of 10,000 markers, which shows promise for high resolution association studies based on natural variation/natural populations and for high-throughput map-based cloning.
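
The degenerate oligonucleotide idea from Section 11.6.2 can be illustrated with a short Python sketch. It assumes the standard genetic code and the IUPAC ambiguity codes; the function name `degenerate_oligo` and the example peptides are hypothetical and are not taken from the text. For each amino acid the sketch takes the union of bases seen at each codon position, so for six-codon amino acids (Leu, Ser, Arg) the synthesized pool also contains some non-coding sequences, which is why the pool size can exceed the number of exact coding sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity codes, keyed by the sorted set of bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def degenerate_oligo(peptide):
    """Return (oligo, n_coding, pool_size) for a peptide in one-letter code."""
    oligo = []
    n_coding = 1   # distinct DNA sequences that encode the peptide exactly
    pool_size = 1  # distinct sequences present in the degenerate mixture
    for aa in peptide:
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        n_coding *= len(codons)
        for pos in range(3):
            # Union of the bases used at this codon position.
            bases = "".join(sorted({c[pos] for c in codons}))
            oligo.append(IUPAC[bases])
            pool_size *= len(bases)
    return "".join(oligo), n_coding, pool_size


print(degenerate_oligo("MWK"))  # Met-Trp-Lys
print(degenerate_oligo("MS"))   # Met-Ser
```

Peptide stretches rich in Met and Trp (one codon each) give the least degenerate probes, which is one reason such stretches are usually preferred when choosing a region for probe design.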
12
Gene Transfer and Genetically Modified Plants

© Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu)

As described in previous chapters, intraspecific transfer of genes is easily performed by cross-hybridization in all plants with a sexual cycle. Gene transfer by cross-hybridization becomes more difficult or impossible with increasing phylogenetic distance and, as a result, inter-generic gene transfer is very rare. By genetic transformation, DNA from any organism can be transferred into other species' genomes. The inserted gene sequence (known as the transgene) may come from another unrelated plant, or from a completely different species: transgenic Bt maize, for example, which produces its own insecticide, contains a gene from a bacterium. This powerful tool enables plant breeders to do what they have always done, namely generate more useful and productive crop cultivars containing new combinations of genes, but it expands the possibilities beyond the limitations imposed by traditional cross-pollination and selection techniques. Plants containing transgenes are often called genetically modified- or GM-crops, although in reality all crops have been genetically modified from their original wild state by domestication, selection and controlled breeding over long periods (Chapter 1). In this book, the term 'transgenic' is used to describe a crop plant that has transgenes inserted.

Issues in gene transfer and GM-crops have been a hot topic for many years and some early reviews can be found in McElroy and Brettell (1994), Christou (1996) and McElroy (1996) among many other books and journal articles. Only some basic concepts will be introduced in this chapter. For a full coverage, readers are recommended to seek information in recent books, including Liang and Skinner (2004), Parekh (2004), Peña (2004), Skinner et al. (2004) and the Transgenic Crops series by Springer.

12.1 Plant Tissue Culture and Genetic Transformation

12.1.1 Plant tissue culture

Plant tissue culture exploits the in vitro plasticity of plant growth and development because whole plants can be regenerated from a wide range of plant cells (totipotency). For the majority of species gene transfer is carried out using explants competent for regeneration to obtain complete, fertile plants. Cell division and callus (dedifferentiated tissue) formation, embryogenesis and organogenesis can be induced using combinations of plant growth regulators. Auxins like 2,4-dichlorophenoxyacetic acid (2,4-D), picloram and dicamba and cytokinins like benzylaminopurine (BAP), kinetin and zeatin are usually used
in the tissue culture media. There are no universally applicable methods of plant tissue culture and thus protocols need to be modified for each genus, species, cultivar and tissue. Within individual cereal species the elite germplasm is usually least amenable to tissue culture.

12.1.2 Genetic transformation

Goals of plant transformation for crop improvement are to produce fertile transgenic plants with integrated transgenes at reasonable frequencies from elite backgrounds. Once a gene has been isolated through one of the approaches described in Chapter 11 and cloned (amplified in a bacterial vector), it must undergo several modifications before it can be effectively inserted into a plant. Components of any successful plant transformation system include delivery of DNA to the plant genome without compromising cell viability, selection of transformed cells, regeneration to produce intact fertile plants and the transmission of transgenes into subsequent generations. A simplified representation of a constructed transgene, containing the necessary components, which need to be developed in parallel, for successful integration and expression is as follows:

1. The promoter is the on/off switch that controls gene expression at different developmental stages and in response to certain environmental changes, or specific to certain tissues and organs. On the other hand, promoters like the most commonly used cauliflower mosaic virus (CaMV) 35S are constitutive. The genes under constitutive promoters are expected to be expressed throughout the life cycle of the plant in most tissues and organs.
2. The gene of interest is modified to achieve greater expression in a plant. For example, the Bt gene for insect resistance is of bacterial origin and has a higher percentage of A-T nucleotide pairs compared to plants, which prefer G-C nucleotide pairs. In a clever modification, researchers substituted A-T nucleotides with G-C nucleotides in the Bt gene without significantly changing the amino acid sequence. The result was the enhanced production of the gene product in plant cells.
3. The termination sequence signals to the cellular machinery that the end of the gene sequence has been reached.
4. A selectable marker gene in the gene construct is used to identify plant cells with the integrated transgene. This is necessary because achieving incorporation and expression of transgenes in plant cells is a rare event, occurring in just a few of the targeted tissues or cells. Selectable marker genes encode proteins that provide resistance to agents that are normally toxic to plants, such as metabolic inhibitors, antibiotics or herbicides. As explained below, only plant cells that have the integrated selectable marker gene will survive when grown on a medium containing the appropriate antibiotic or herbicide. Similar to the gene of interest, marker genes also require promoter and termination sequences for their proper function.

Conventional plant breeding represents the principal approach to crop improvement. It employs methods such as hybridization, introgression breeding, induced mutagenesis and somatic hybridization to randomly modify genomes and, as a result, create genetic variation (Fig. 12.1a). Genetic engineering is different from the traditional methods in that any modification can be designed and tailored to achieve the desired effect. This method often fuses promoters and genes to produce expression cassettes that are introduced into plants using bacterial transfer DNAs (T-DNAs) (Fig. 12.1b). It excludes the transfer of known allergen- or toxin-encoding genes and analyses the sequence of insertion sites. The ability to identify rapidly and eliminate plants containing inadvertent fusions or disruptions of genes is not available to traditional plant breeding, where genes can be inactivated through unpredictable transposition of resident mobile elements. The second advantage of transgenic applications is that it generally takes less than a year to transform an existing cultivar with one or several traits.
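
The modification of the Bt gene described in item 2 above (raising G-C content without changing the encoded protein) can be sketched in Python as a synonymous codon substitution. This is only a minimal illustration under stated assumptions: it uses the standard genetic code, it simply picks the most G-C-rich synonym for every codon, and the function names and the 12-nucleotide example fragment are hypothetical. Real codon optimization normally relies on codon-usage tables for the host plant, which are not modelled here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_count(seq):
    """Number of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq)


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return gc_count(seq) / len(seq)


def translate(cds):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def gc_enrich(cds):
    """Replace every codon with its most G-C-rich synonymous codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        # Ties go to the alphabetically last codon, so the result is deterministic.
        out.append(max(synonyms, key=lambda c: (gc_count(c), c)))
    return "".join(out)


original = "ATGAAATTATCA"          # hypothetical A-T-rich fragment: Met-Lys-Leu-Ser
optimized = gc_enrich(original)
print(original, translate(original), round(gc_fraction(original), 2))
print(optimized, translate(optimized), round(gc_fraction(optimized), 2))
```

In this sketch the protein sequence is kept identical, whereas the text only requires that it is not significantly changed.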
[Figure 12.1: chart comparing crop improvement methods by source of DNA and genetic distance (xenogenic, transgenic, famigenic, intragenic), % genome introduced, issues (transfer of unknown DNA, transfer of foreign DNA, genetic complexity, public concerns), development time, trait potential and proposed regulation. Panel (a), traditional breeding: variety crosses, mutation breeding, introgression breeding and interspecies somatic hybridization. Panel (b), genetic engineering: transgenic, xenogenic, intragenic and cisgenic modification.]

Fig. 12.1. Summary of various methods for crop improvement. The genetic distance between DNA
source and target crop is indicated in the left four columns, including foreign and sexually compatible.
The species barrier is shown as a dotted vertical line. Xenogenic, synthetic DNA; transgenic, DNA
from unrelated species, such as viruses, bacteria, fungi and plants that belong to different families;
famigenic, DNA from plants that belong to the same family; and intragenic, DNA from within the same
sexual compatibility group. The % genome column shows the estimated size of the introduced DNA as a
percentage of the entire genome. Proposed regulatory requirements are shown in bold letters with Basic
implying multi-year field tests on agronomic performance and an assessment of the nutritional profile,
and Full indicating more extensive studies, which include biosafety assessments of foreign proteins as
well as environmental studies. Regulatory requirements for cisgenic applications are dependent on the
trait (Dep.). In these cases, the transfer of traits that resemble native traits, such as those associated
with disease resistance, should be considered for the basic regulatory assessment described above.
However, traits that are new to the sexual compatibility group would require more extensive analyses.
(a) Methods in traditional breeding. M0 stands for an original plant derived from induced mutagenesis.
Random mutations are shown as triangles, and can represent hundreds of point mutations/chromosome
induced by ethylmethane sulfonate (EMS) or deletions of up to 100 kb pairs triggered by di-epoxy
butane (DEB) or low linear energy transfer radiation (LET). (b) Methods in genetic engineering. Tn, plant
transformation. Reprinted from Rommens et al. (2007) with permission from Elsevier.
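
The four DNA-source categories defined in the caption can be written as a small decision rule. The Python sketch below is only an illustration: the function name and its three boolean arguments are hypothetical, and the order of the checks (synthetic first, then sexual compatibility, then family) is an assumption about how overlapping cases would be resolved.

```python
def classify_dna_source(synthetic, same_family, sexually_compatible):
    """Map the origin of introduced DNA to the categories used in Fig. 12.1."""
    if synthetic:
        return "xenogenic"    # synthetic DNA
    if sexually_compatible:
        return "intragenic"   # DNA from within the same sexual compatibility group
    if same_family:
        return "famigenic"    # DNA from plants that belong to the same family
    return "transgenic"       # DNA from unrelated species or different families


# A bacterial gene such as Bt placed in maize: not synthetic, not the same family.
print(classify_dna_source(synthetic=False, same_family=False, sexually_compatible=False))
```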
There are several important fields in plant transformation that will not be discussed in detail in this chapter but are worthy of brief mention here: (i) high-throughput transformation, by which all candidate genes can be used for transformation prior to functional analysis; (ii) plastid transformation with a major advantage that in many plant species plastid DNA is not transmitted through pollen, preventing gene flow from the GM-plant to other plants; and (iii) chromosome construction and transformation, by which high molecular weight DNA and multiple genes can be delivered into plant cells. Ogawa et al. (2008) established a large-scale, high-throughput protocol to construct Arabidopsis thaliana suspension-cultured cell lines, each of which carries a single transgene, using Agrobacterium-mediated transformation. They took advantage of RIKEN Arabidopsis full-length (RAFL) cDNA clones and the Gateway cloning system for high-throughput preparation of binary vectors carrying individual full-length cDNA sequences. Throughout all cloning steps, multiple-well plates were used to treat 96 samples simultaneously in a high-throughput manner. They evaluated the protocol by generating transgenic Arabidopsis T87 cell lines carrying 96 individual metabolism-related RAFL cDNA fragments and showed that the protocol was useful for high-throughput and large-scale production of gain-of-function lines for functional genomics. Plastid transformation is suitable only for certain crop species. For example, Ruf et al. (2007) studied genetically modified tobacco in which the transgene was integrated in chloroplasts. In a large screen, they detected low-level paternal inheritance of transgenic plastids in tobacco. Mini-chromosomes will be briefly discussed in Section 12.2.2.

12.1.3 Development of core plant transformation facilities

Transformation, particularly for cereals, is presently a technically demanding activity that is usually carried out as part of an industrial research process at large seed or agrochemical companies. Most university-based research groups do not have access to the physical or human resources necessary to establish a cereal transformation effort for their own target crop. These limitations have led to the establishment of core plant transformation facilities (PTFs) at a number of academic institutions. Examples of North American PTFs include Cornell University (tomato, algae, fungi), Iowa State University (maize and soybean), University of Nebraska at Lincoln (wheat and soybean), Texas A&M University (cotton, rice, sorghum, banana, conifers), University of Wisconsin at Madison (lucerne) and the National Research Council (NRC) of Canada (canola, wheat and pea). Core PTFs have the advantage of exploiting the economies of scale associated with the centralization of a labour-intensive activity, i.e. they assemble a critical mass of transformation specialists working on related problems with continuity of activity over long time frames, provide an in-house resource dedicated to fulfilling the exclusive transformation needs of the institution's own community, eliminate the need to compete for limited collaborative opportunities elsewhere, offer on-site teaching resources in plant tissue culture and transformation, facilitate funding from local grower groups and generate funds from private enterprises that contract out transformation activities to public sector organizations as a matter of economy. The core PTFs have been working well at big companies and international centres. A problem with core PTFs is that they may forget their reason for being and go off on tangents, thus being of limited use to the wider community.

12.2 Transformation Approaches

Transformation is the heritable change in a cell or organism brought about by the uptake and establishment of introduced DNA. Different methods have been developed to introduce foreign genes into plants. A common feature is that the foreign DNA first has
to enter the plant cell by penetrating the plant cell wall and the plasma membrane and then must reach the nucleus and integrate into the resident chromosomes. There are two major techniques for introducing foreign genetic material into an organism (IBRD/World Bank, 2006). One is based on Agrobacterium tumefaciens, a bacterium that is able to insert its own or other genes into a plant genome. This method is accomplished through a plasmid (an autonomous piece of DNA) from the bacterium. The plasmid is used as the basis for the construction of a vector that incorporates the genes that are to be transferred to the plant cells. The other is based on the direct, physical transfer of foreign genes into target plant cells. The most common example is particle bombardment (biolistics), in which metal particles are used as carriers of plasmids and are introduced into target plant tissue at high velocity. In this section, these two major approaches will be introduced along with a brief discussion of other direct gene transfer methods.

12.2.1 Agrobacterium-mediated transformation

Agrobacterium strains

Agrobacterium tumefaciens is a remarkable species of soil-dwelling bacteria that has the ability to infect plant cells with a piece of its DNA. When the bacterial DNA is integrated into a plant chromosome, it effectively hijacks the plant's cellular machinery and uses it to ensure the proliferation of the bacterial population. Many gardeners and orchard owners are unfortunately familiar with A. tumefaciens, because it causes crown gall diseases in many ornamental and fruit plants.

The DNA in an A. tumefaciens cell is contained in the bacterial chromosome as well as in another structure known as a Ti (tumour-inducing) plasmid. The Ti plasmid contains a stretch of DNA termed T-DNA (20 kb long) that is transferred to the plant cell in the infection process and a series of vir genes that direct the infection process.

To harness A. tumefaciens as a transgene vector, the oncogenes (gall-forming sequences) have been removed from the T-DNA and in their place engineered expression cassettes with genes from virtually any source may be substituted, usually by convenient insertion into multiple cloning sequences that have been incorporated into these plasmids. The transgene and the selectable markers are inserted in the vector between two unique sequences, called the left border (LB) and the right border (RB). Only T-DNA is expected to be transferred to the plant cell and become integrated into the plant's chromosomes (Wong, 1997).

At the present time gene transfer by Agrobacterium is the established method of choice for the genetic transformation of most plant species. It has been successfully practised in both dicots (broadleaf plants like soybeans and tomatoes) and monocots (banana, grasses and their relatives). A general scheme for Agrobacterium-mediated transformation is outlined in Fig. 12.2. It is perceived to have several advantages over other forms of transformation (such as biolistics), including the ability to transfer large segments of DNA with minimal rearrangement and with fewer copies of inserted genes at higher efficiencies with lower cost (reviewed by Hiei et al., 1997). In addition, Agrobacterium transformation may facilitate the removal of plant-selectable marker genes by segregation (Komari, 1996; Matthews et al., 2001).

Application in cereals

As a vector-mediated transformation system, Agrobacterium was thought to have a restricted range of interacting hosts. For example, cereal cells were initially thought to be recalcitrant to Agrobacterium-mediated transformation because most monocot species are outside the natural host range of A. tumefaciens. Current Agrobacterium-mediated transformation protocols for cereals still involve a callus initiation stage. There are still issues like genotype-dependency and somaclonal variation. Development of reliable and efficient protocols is of great importance for the
successful application of transformation technology in crop species.

[Figure 12.2: flow diagram of the general steps in Agrobacterium-mediated transformation of cereals: explant (immature embryos with and without preculture treatment, or embryogenic calli), co-cultivation (liquid then solid co-culture), resting phase (eliminating Agrobacterium), selection phase-1, selection phase-2, pre-regeneration, regeneration, rooting, and rooting and transfer to soil to give transgenic plants.]
Fig. 12.2. General scheme for Agrobacterium-mediated transformation of cereal plants. From Shrawat and Lörz (2006) with permission from Wiley-Blackwell.

Several factors influencing Agrobacterium-mediated transformation of monocots have been investigated and elucidated (as reviewed by Cheng et al., 2004; Jones et al., 2005), including the screening of the most responsive genotype and explant, Agrobacterium strain, binary vector, selectable marker gene and promoter, inoculation and co-culture conditions and tissue culture and regeneration medium. Of these factors, genotype and explant are considered the major limiting factors in Agrobacterium-mediated transformation of cereals, especially in extending the host range to commercial cultivars.

The explant type, explant quality and source of the explant have been found to be correlated with successful reports on the Agrobacterium-mediated genetic transformation of cereals (reviewed by Repellin et al., 2001). For example, freshly isolated immature embryos with and without pretreatment have been part of the majority of successful reports on the genetic transformation of cereals and are considered the best explant type (Cheng et al., 1997; Wu, H. et al., 2003). Embryogenic callus derived from mature seeds has been reported to be the best explant for Agrobacterium-mediated transformation of rice as a result of active cell division (Hiei et al., 1994). In conclusion, any explants at a vigorously dividing stage can be good for transformation.

Host plant genes involved in Agrobacterium-mediated transformation

The identification and molecular characterization of the plant genes involved in successful Agrobacterium-mediated transformation have opened up new avenues for a better understanding of the plant response to Agrobacterium infection (Veena et al., 2003).
Such information may help to develop methods to enhance the transformation frequency of economically important plant species. In addition, in-depth studies and evaluation of the genes responsible for stimulating plant cell division and the competency of plant cells to Agrobacterium may not only extend transformation protocols to elite genotypes but also increase the transformation efficiency in cereals (Shrawat and Lörz, 2006).

In recent years, efforts have also been made to understand the interactions of host plants with Agrobacterium at the molecular level (Ditt et al., 2001; Veena et al., 2003). Hwang and Gelvin (2004) have identified four Arabidopsis proteins that interact with the main T-pilus protein, VirB2, and have shown that the presence of these proteins is required for efficient transformation.

12.2.2 Particle bombardment

The microprojectile bombardment method, also known as gene gun or biolistics, uses fine metal particles (typically tungsten or gold) coated with DNA that are usually accelerated with helium gas under pressure (Fig. 12.3). Particle bombardment involves the acceleration of DNA-coated microparticles into cells and tissues. In biolistic bombardment, the primary delivering systems are the Biolistic PDS-1000/He helium-powered gun and similar designs, or the particle inflow gun. Parameters involved in the biolistic gun include pressure (ranging from 900 to 1300 psi), particle size (0.6–1.1 μm) and type of material (gold and tungsten), target distance (7.5–10 cm) and target material (cell suspension, callus, meristem, protoplast, immature embryo). Particle acceleration is rapid enough to penetrate the cell wall

[Figure 12.3: (A) labelled view of the Biolistic PDS-1000/He instrument (helium pressure gauge, fire switch, vac/vent/hold switch, power switch, bombardment chamber door, vacuum gauge, vacuum/vent rate control valves, rupture disk, disk-retaining cap, microcarrier launch assembly, target shelf); (B) schematic of the biolistic process before and after firing (gas acceleration tube, rupture disk, macrocarrier, DNA-coated microcarriers, stopping screen, target cells).]

Fig. 12.3. Gene gun and system. (A) The biolistic system. The Biolistic PDS-1000/He instrument
consists of the bombardment chamber (main unit), connective tubing for attachment to vacuum source,
and all components necessary for attachment and delivery of high pressure helium to the main unit
(helium regulator, solenoid valve, etc.). (B) Biolistic process. The Biolistic PDS-1000/He system uses
high pressure helium, released by a rupture disk and partial vacuum to propel a macrocarrier sheet
loaded with millions of microscopic tungsten or gold microcarriers towards target cells at high velocity.
The microcarriers are coated with DNA or other biological materials for transformation. The macrocarrier
is halted after a short distance by a stopping screen. The DNA-coated microcarriers continue travelling
towards the target to penetrate and transform the cells. The launch velocity of microcarriers for each
bombardment is dependent upon the helium pressure (rupture disk selection), the amount of vacuum in
the bombardment chamber, the distance from the rupture disk to the macrocarrier, the macrocarrier travel
distance to the stopping screen, and the distance between the stopping screen and target cells.
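
The bombardment parameters quoted in the text (helium pressure, particle size and material, target distance and target material) lend themselves to a simple pre-run check. The following is a minimal sketch in Python, assuming the ranges given in the text; the class, field and constant names are hypothetical and do not correspond to any instrument software.

```python
from dataclasses import dataclass

# Ranges as quoted in the text; every name here is illustrative only.
PRESSURE_PSI = (900.0, 1300.0)
PARTICLE_SIZE_UM = (0.6, 1.1)
TARGET_DISTANCE_CM = (7.5, 10.0)
PARTICLE_MATERIALS = {"gold", "tungsten"}
TARGET_MATERIALS = {"cell suspension", "callus", "meristem",
                    "protoplast", "immature embryo"}


@dataclass(frozen=True)
class BombardmentSettings:
    helium_pressure_psi: float
    particle_size_um: float
    particle_material: str
    target_distance_cm: float
    target_material: str

    def problems(self):
        """Return descriptions of settings that fall outside the quoted ranges."""
        issues = []
        numeric_checks = [
            ("helium_pressure_psi", self.helium_pressure_psi, PRESSURE_PSI),
            ("particle_size_um", self.particle_size_um, PARTICLE_SIZE_UM),
            ("target_distance_cm", self.target_distance_cm, TARGET_DISTANCE_CM),
        ]
        for name, value, (low, high) in numeric_checks:
            if not low <= value <= high:
                issues.append(f"{name}={value} outside {low}-{high}")
        if self.particle_material not in PARTICLE_MATERIALS:
            issues.append(f"unknown particle material: {self.particle_material}")
        if self.target_material not in TARGET_MATERIALS:
            issues.append(f"unknown target material: {self.target_material}")
        return issues


ok = BombardmentSettings(1100, 1.0, "gold", 9.0, "callus")
too_soft = BombardmentSettings(650, 1.0, "gold", 9.0, "callus")
print(ok.problems())        # nothing to report
print(too_soft.problems())  # reports the helium pressure
```

The ranges are treated here as hard limits only for illustration; in practice they are starting points that are optimized for each species and tissue.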
Gene Transfer and GM Plants 465

without causing excessive damage. For transformation, DNA is coated on to the surface of
micron-sized tungsten or gold particles by precipitation with calcium chloride and spermidine
and DNA must be delivered into cells from which whole plants can be generated. If the foreign
DNA reaches the nucleus then transient expression is likely to result and the transgene may
become incorporated in a stable manner into host chromosomes. Stanford developed the original
bombardment concept (Stanford et al., 1987; Stanford, 2000) and coined the term biolistics
(short for biological ballistics) for both the process and the device. Particle bombardment
has been especially useful in transforming monocot species as it has no biological constraints
or host limitations, can target diverse cell types and is the most convenient way for achieving
organelle transformation. However, it is widely believed that particle bombardment produces
large, multi-copy and highly complex transgenic loci that are prone to further recombination,
instability and silencing.

Particle bombardment facilitates a wide range of transformation strategies

The Biolistic and Helios systems can be used to circumvent the need to maintain viruliferous
populations of insect vectors, allowing direct introduction of infectious viral nucleic acids
into a range of plant species. An attractive feature of such systems is the flexibility by
which co-infections can be achieved with different viral species and genomic components,
generating a powerful tool for investigating mechanisms of pathogenicity and host resistance.
Particle bombardment was utilized both to produce transgenic cassava plants and to challenge
them by simultaneous inoculation with two species of geminiviruses (Chellappan et al., 2004).
Particle bombardment also has an important role to play in extending virus-induced gene
silencing (VIGS) into economically important crop plants (Fofana et al., 2004).

Particle bombardment has no biological constraints or host limitations

Particle bombardment overcomes the boundaries defined by classical host ranges of
Agrobacterium-mediated transformation by exploiting physical principles to introduce the DNA
into the plant cell and then relying on factors that are common to all plants (i.e. DNA repair
mechanisms) to enable stable transgene integration. For stable transformation and the recovery
of transgenic plants, particle bombardment is restricted only by the requirement to deliver
DNA into regenerable cells (Altpeter et al., 2005a). By removing almost all the incidental
biological constraints that limit other transformation methods, particle bombardment has
facilitated the transformation of some of the most recalcitrant plant species.
The ability to transform diverse cell types by particle bombardment facilitates a broad range
of applications that are difficult or impossible to achieve by other transformation methods.
This is critical when the rapid analysis of large numbers of constructs in a specific tissue
or cell type is required.

Diverse cell types can be targeted efficiently for foreign DNA delivery

Particle bombardment does not depend on any particular cell type as long as the DNA can be
introduced into the cell without killing it. In rice, the range of suitable tissues includes
immature embryos (7–8 days after anthesis), embryogenic callus derived from either immature
embryos or mature seeds and suspension culture cells (Datta et al., 1998, 2001; Tu et al.,
1998a, b; Baisakh et al., 2001). Transient expression has even been achieved using the intact
immature seed endosperm following bombardment with a vector carrying the gusA reporter gene
(Grosset et al., 1997; Clarke and Appels, 1998).

Vectors are not required for particle bombardment

The exogenous DNA used in transformation experiments typically comprises a plant expression
cassette inserted in a vector based on a high-copy number bacterial cloning plasmid. Neither
of these components is required for DNA transfer and only the expression cassette is required for
466 Chapter 12

transgene expression. The expression cassette typically comprises a promoter, open reading
frame and polyadenylation site that are functional in plant cells, although other components
may be present, such as a protein-targeting signal (Altpeter et al., 2005a). Once this plasmid
has been isolated from the bacterial culture it can be purified and used directly for
transformation.
During Agrobacterium-mediated transformation, the T-DNA is naturally excised from the vector
during the transformation process. This frequently, although not always, prevents the
integration of vector backbone sequence into the plant genome (Fang et al., 2002; Popelka and
Altpeter, 2003), necessitating time-consuming sequence analysis of transgene insertion sites
following Agrobacterium-mediated gene transfer. In contrast, particle bombardment involves no
such processing. Cloning vectors are used in particle bombardment for convenience rather than
necessity. Consequently, Fu et al. (2000) devised a 'clean DNA' strategy in which all vector
sequences were removed prior to particle loading. A standard plasmid vector was used to clone
the plant expression cassette and transgene of interest in bacteria and then the cassette was
excised from the plasmid and purified by agarose gel electrophoresis. This minimal, linear
cassette was then used to coat the metal particles and carry out transformation.

High molecular weight DNA delivery into plant cells

Until recently, one serious limitation to plant transformation technology was the inability
to introduce large intact DNA constructs into the plant genome. Such large constructs could
incorporate multiple transgenes, or could comprise a segment of genomic DNA to facilitate the
map-based cloning of plant genes. In Agrobacterium-mediated transformation, this limitation
has been addressed by the development of binary bacterial artificial chromosome (BIBAC) and
transformation-competent artificial chromosome (TAC) vectors (Shibata and Liu, 2000). The
transfer of yeast artificial chromosome (YAC) DNA by particle bombardment was first
demonstrated by Vaneck et al. (1995) using cell suspensions of two tomato cultivars. Only one
of the cultivars yielded YAC transformants and initial studies suggested that the integrated
YAC was fairly intact in four of the five transformants recovered, based on the presence of
two marker genes. The most promising way of introducing high molecular weight DNA into plant
cells is to create engineered mini-chromosomes in maize and to add genes to those
mini-chromosomes (Yu et al., 2007). Mini-chromosomes are able to function in many of the same
ways as chromosomes but allow for genes to be stacked on them. The technique developed in
maize should be transferable to other plant species.

Particle bombardment is the most convenient way to achieve organelle transformation

Thus far, most genetically engineered plants have been subject to nuclear transformation. An
alternative approach is to introduce transgenes into the chloroplast genome. This strategy
offers advantages such as very high levels of transgene expression, uniparental plastid gene
inheritance in most crop plants (preventing pollen transmission of transgenes), the absence of
gene silencing and position effects, integration via a homologous recombination process that
facilitates targeted transgene insertion, elimination of vector sequences, precise transgene
control and sequestration of foreign proteins in the organelle, which prevents adverse
interactions within the cytoplasmic environment (as reviewed by Altpeter et al., 2005a).

Comparison with other methods

In addition to the properties discussed for particle bombardment, Altpeter et al. (2005a)
provided a comprehensive review of this method by comparing it with other transformation
methods. Transgene integration, mediated by either A. tumefaciens or particle bombardment, is
a random process that appears to correlate with the position of naturally occurring chromosome
breaks. Transcriptionally active regions of

the genome are favoured, particularly the subterminal regions of the chromosomes, perhaps
because the DNA is more accessible in these areas. It is possible, although still a matter of
speculation, that further breaks may be caused by particle bombardment since the
microprojectiles may shear the ends of DNA loops in the nucleus (Abranches et al., 2000; Kohli
et al., 2003), which may partially explain the relative efficiency of bombardment in terms of
stable transformation compared to other techniques.
Compared to biolistic techniques, Agrobacterium-mediated transformation offers several
advantages (Tzfira and Citovsky, 2006), such as simpler integration patterns resulting in
lower mutational consequences for the transgenic plant and limited transgene silencing via
co-suppression. In addition, the option for fine tuning the Agrobacterium-based transformation
protocols renders more and more cereal species amenable for efficient genetic engineering
(Shrawat and Lörz, 2006; Conner et al., 2007).

12.2.3 Electroporation and other direct gene transfer approaches

There are several less popular means of gene transfer that may be effective in specific cases:
polyethylene glycol (PEG)-facilitated protoplast fusion, microinjection, sonication, in planta
transformation and electroporation. The mechanism is to cause transient micro-wounds in the
cell wall and the plasma membrane, allowing DNA in the medium to enter the cytoplasm before
repair or fusion of the damaged cellular structures. The direct transfer of DNA to protoplasts
using PEG, or electroporation resulting in the transient permeabilization of the cell membrane
using high-voltage electric fields, has been shown to be possible in various plants. Leaf
tissue or embryogenic calli are often used to isolate protoplasts by enzymatic treatment.
Using protoplasts as the starting material for transformation in cereals often employs callus
induction, suspension culture initiation and maintenance, protoplast isolation, callus
formation and plant regeneration. It is important to generate and maintain a cell suspension
culture for its embryogenic capacity. Extensive time in tissue culture often results in low
reproducibility and poor regeneration capacity.
A breakthrough in Arabidopsis research was the invention of the vacuum-infiltration procedure,
a simple and reliable method of obtaining transformants at high efficiency while avoiding the
use of tissue culture (Bent, 2000). In planta transformation involves floral dip, vacuum
infiltration and spraying. These methods yield transformants at frequencies ranging up to
several percent, with the most common frequency being 0.1–1%.
Electroporation utilizes short, high-intensity electric fields to reversibly permeabilize the
lipid bilayers of the cell membrane. It is widely believed that the electric pulse causes
extensive compression and thinning of the plasmalemma. The resulting transient formation of
pores permits free diffusion of various classes of macromolecules including dyes, antibodies,
RNA, viral particles and DNA. Transient expression from electroporated plant cells has been
used to define functional elements within a promoter, to examine the effects of antisense RNA
on gene expression, to study the translocation of proteins into both plastids and nuclei of
intact protoplasts, to examine cell-cycle-specific gene expression and to study responses to
plant hormones.
As a method of DNA transfer, electroporation is convenient and the results are consistently
duplicated as a daily routine. In most cases it is more efficient than other methods designed
for the same purpose, such as particle bombardment. In addition, it does not suffer from
host-range limitations imposed by biology-based systems such as those employing A. tumefaciens
or toxicity problems sometimes encountered using a PEG-based procedure. Finally,
electroporation coupled with a transient expression assay is rapid, allowing for the
reproducible detection of gene products within hours of the introduction of DNA. This is in
contrast to a stable transformation strategy that involves months to regenerate transformants
and suffers from larger, uncontrollable variations in gene expression because of positional
effects. In the context of a transformation programme in which stable integration of genetic
materials is required, transient expression may be used to rapidly demonstrate functionality
of the new transgene sequences before they are used to generate transformants by some other
methods of DNA introduction.
An electroporation-based transfection system consists of a number of potentially important
variables, including methods of protoplast preparation, electric pulse strength and duration,
ionic concentration and composition of the electroporation buffer and DNA purity, concentration
and topology. Fisk and Dandekar (2004) have analysed the importance of these variables in
addition to a few others with the goal of identifying and optimizing the parameters necessary
to increase expression frequency among a population of tobacco protoplasts. They described
optimized conditions for an electroporation-based transient expression assay that routinely
results in nearly 90% expression frequency.

12.3 Expression Vectors

The progress in plant genetic engineering could not have been as productive as it is today
without the development of small, easy-to-manipulate and simple-to-use Agrobacterium binary
vectors (Komari et al., 2006, 2007). The first generation of plant transformation binary
vectors were rather simple in design, lacked cloning and expression versatility and offered
very little flexibility for their manipulation for specific research or application purposes
(Fig. 12.4; Tzfira et al., 2007).

[Figure 12.4 labels, from top to bottom: modular vectors; induced expression vectors, custom-made
expression cassettes, multi-gene expression vectors and viral vectors; unique applications,
downregulation of genes and overexpression of genes; plant cells; novel traits, functional data,
transgenic plants and live cell imaging.]

Fig. 12.4. From vectors to applications to cellular functions. Introduction of genetic information into target
plant cells and acquisition of new data as a result of transgene expression may require a network of
modular vectors, flexible gene cloning and expression systems, and specialized plasmids that result in
different modes of transgene expression. Modular vectors may represent a starting point for assembly of
custom-made expression vectors, multi-gene expression vectors, and other types of plant transformation
vectors. These vectors in turn provide the users with the abilities to overexpress and downregulate
genes, as well as with the capacity for specific, and often unique, applications, useful for obtaining novel
traits and functional data, protein imaging in living plant cells, and generating transgenic plants for plant
research and biotechnology.

A crucial improvement made to the first generation of binary plasmids was the introduction of
an empty plant expression cassette, a feature that gave plant biologists a simpler and more
direct route for cloning their gene of interest under the control of a constitutive promoter
active in plants. The constant improvements in binary vectors extended even to the most famous
binary vector, one that has dominated the landscape of binary plasmids for several decades:
pBin19 (Bevan, 1984; Komori et al., 2007). This plasmid offers several features, including
incorporation of the lacZ gene into the multiple cloning site (MCS) to facilitate
identification of recombinant plasmids using a colorimetric assay, a bacterial
kanamycin-resistance gene, an E. coli origin of replication, a complete plant selection marker
expression cassette and an extended MCS.
New vectors were designed and constructed to provide users with a more specialized set of
tools suitable for carrying out various tasks in plant cells, e.g. the transfer of extremely
long DNA molecules (Hamilton, 1997), the expression of fluorescent protein fusions (Goodin et
al., 2002) and the detection of protein–protein interactions (Bracha-Drori et al., 2004), while
others were specifically designed for versatility and simplicity, allowing plant biologists not
only a choice but also the ability to manipulate these vectors for their own needs. The latter
group of vectors are typically constructed as families of plasmids and include, for example,
the pCB mini-binary vector series, a collection of extremely small pBin19-derivative vectors
(Xiang et al., 1999), and the pGreen series, a versatile and flexible set of binary vectors
(Hellens et al., 2000b). These and many other families of vectors provide the plant research
community with a vast number of versatile tools for various plant expression analyses. Some
well-known binary and superbinary vectors are listed in Table 12.1.

Table 12.1. Well-known binary and superbinary vectors (from Komori et al. (2007) reproduced with
permission of the American Society of Plant Biologists).

                Plant                   Bacterial   Replication     Replication                                            Frequency of
                selection               selection   origin for      origin for                                             use in recent
Vector          marker (a)              marker (b)  A. tumefaciens  E. coli      Mobilization  Reference                   literature (c)

pBin19          Kan                     Kan         IncP            IncP         Yes           Bevan (1984)                40%
pBI121          Kan                     Kan         IncP            IncP         Yes           Jefferson (1987)            40%
pCAMBIA series  Kan or Hyg              Cm or Kan   pVS1            ColE1        Yes           www.cambia.org              30%
pPZP series     Kan or Gen              Cm or Sp    pVS1            ColE1        Yes           Hajdukiewicz et al. (1994)  30%
pGreen series   Kan, Hyg, Sul, or Bar   Kan         IncW            pUC          Yes           Hellens et al. (2000)       3%
pGA482          Kan                     Tc, Kan     IncP            ColE1 (d)    Yes           An et al. (1985)            3%
pSB11 (e)       None                    Sp          None            ColE1        Yes           Komari et al. (1996)        3%
pSB1 (e)        None                    Tc          IncP            ColE1 (d)    Yes           Komari et al. (1996)        3%
pPCV001         Kan                     Ap          IncP            ColE1 (d)    Yes           Koncz and Schell (1986)     1%
pCLD04541       Kan                     Tc, Kan     IncP            IncP         Yes           Tao and Zhang (1998)        1%
pBIBAC series   Kan or Hyg              Kan         pRi             F factor     Yes           Hamilton (1997)             0%
pYLTAC series   Kan or Bar              Kan         pRi             Phage P1     No            Liu et al. (1999)           0%

(a) Kan, kanamycin; Hyg, hygromycin; Gen, gentamycin; Sul, sulfonylurea; Bar, phosphinothricin.
(b) Kan, kanamycin; Cm, chloramphenicol; Sp, spectinomycin; Tc, tetracycline; Ap, ampicillin.
(c) From issues between 2005 and 2007 of 12 leading plant journals, 180 papers in which plant
    transformation mediated by A. tumefaciens is described were randomly chosen and surveyed.
(d) Although IncP is also active in E. coli, it is likely that the plasmid is replicated mainly by the
    ColE1 system.
(e) pSB11 and pSB1 are an intermediate vector and an acceptor vector of the superbinary vector
    system, respectively.
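
When choosing among the vectors in Table 12.1 it can help to treat each row as a record and filter on the properties of interest. The following is a minimal sketch in Python, assuming the table values as printed; the record type, field names and `find_vectors` helper are hypothetical, and where the table gives alternatives (e.g. "Kan or Hyg") they are stored as the set of options available within the series.

```python
from typing import NamedTuple, Optional, Tuple


class VectorRecord(NamedTuple):
    name: str
    plant_markers: Tuple[str, ...]      # options offered; empty if none
    bacterial_markers: Tuple[str, ...]
    ori_agrobacterium: Optional[str]    # None where the table says "None"
    ori_ecoli: str
    mobilizable: bool


# Rows transcribed from Table 12.1 (reference and frequency columns omitted).
TABLE_12_1 = [
    VectorRecord("pBin19", ("Kan",), ("Kan",), "IncP", "IncP", True),
    VectorRecord("pBI121", ("Kan",), ("Kan",), "IncP", "IncP", True),
    VectorRecord("pCAMBIA series", ("Kan", "Hyg"), ("Cm", "Kan"), "pVS1", "ColE1", True),
    VectorRecord("pPZP series", ("Kan", "Gen"), ("Cm", "Sp"), "pVS1", "ColE1", True),
    VectorRecord("pGreen series", ("Kan", "Hyg", "Sul", "Bar"), ("Kan",), "IncW", "pUC", True),
    VectorRecord("pGA482", ("Kan",), ("Tc", "Kan"), "IncP", "ColE1", True),
    VectorRecord("pSB11", (), ("Sp",), None, "ColE1", True),
    VectorRecord("pSB1", (), ("Tc",), "IncP", "ColE1", True),
    VectorRecord("pPCV001", ("Kan",), ("Ap",), "IncP", "ColE1", True),
    VectorRecord("pCLD04541", ("Kan",), ("Tc", "Kan"), "IncP", "IncP", True),
    VectorRecord("pBIBAC series", ("Kan", "Hyg"), ("Kan",), "pRi", "F factor", True),
    VectorRecord("pYLTAC series", ("Kan", "Bar"), ("Kan",), "pRi", "Phage P1", False),
]


def find_vectors(plant_marker=None, ori_agrobacterium=None):
    """Return names of vectors matching every criterion that is not None."""
    hits = []
    for vector in TABLE_12_1:
        if plant_marker is not None and plant_marker not in vector.plant_markers:
            continue
        if ori_agrobacterium is not None and vector.ori_agrobacterium != ori_agrobacterium:
            continue
        hits.append(vector.name)
    return hits


print(find_vectors(plant_marker="Hyg"))
print(find_vectors(plant_marker="Bar", ori_agrobacterium="pRi"))
```

A filter of this kind only narrows the candidates; insert size, Agrobacterium strain and target species still have to be weighed separately.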

The 2007 Focus Issue of Plant Physiology presented a collection of original articles describing
the development of new vector systems useful for plant research and biotechnology, as well as
a compilation of short review articles that highlight some of the major developments in
vector-assisted plant research technologies (Tzfira et al., 2007). It includes papers
describing an extensive collection of MultiSite Gateway-based plant expression vectors (Karimi
et al., 2007), a guide to vectors for chloroplast transformation (Lutz et al., 2007) and a
system of transformation vectors with the superpromoter (Lee, L.-Y., et al., 2007). For a
recent update on binary vectors, the reader is referred to the review by Komori et al. (2007).

12.3.1 Binary vectors

The binary vector was invented soon after it had been elucidated that crown gall tumorigenesis
was caused by genetic transformation of plant cells with a piece of T-DNA from a Ti plasmid
(tumour-inducing plasmid) harboured by A. tumefaciens (Fraley et al., 1986). A key finding was
that the virulence genes, which are involved in the transfer of T-DNA, could be placed on a
replicon separate from the one with T-DNA (Hoekema et al., 1983). Thus, the combination of a
disarmed strain, which carries a Ti plasmid without the wild-type T-DNA, and an artificial
T-DNA within a plasmid that can be replicated both in E. coli and A. tumefaciens turned out to
be fully functional in plant transformation. The term binary vector literally refers to the
entire combination, but the plasmid that carries the artificial T-DNA is usually called a
binary vector.
A binary vector consists of T-DNA and the vector backbone (Fig. 12.5). T-DNA is the segment
delimited by the border sequences, the right border (RB) and the left border (LB), and may
contain MCS, a selectable marker gene for plants, a reporter gene and other genes of interest.
The vector backbone carries plasmid replication functions for E. coli and A. tumefaciens,
selectable marker genes for the bacteria, and optionally a function for plasmid mobilization
between the bacteria and other accessory components (Komori et al., 2007).
The RB and the LB are imperfect, direct repeats of 25 bases and are said to be the only
essential cis-elements for T-DNA transfer (Yadav et al., 1982). The RB and the LB are
integrated in binary vectors as DNA fragments cloned from well-known Ti plasmids, of either the
octopine or nopaline type.
Insertion of genes of interest into appropriate locations of a binary vector is traditionally
carried out by standard subcloning techniques. MCS, which are similar or identical to those in
pUC, pBluescript and other standard vectors, are still very useful in this regard, but
recently constructed vectors are
Ti plasmid without the wild-type T-DNA regard, but recently constructed vectors are

[Figure 12.5 diagram: a binary vector comprising the T-DNA, bounded by RB and LB, and the vector
backbone. Components and the options shown for each:]

T-DNA
  RB (right border):                  octopine Ti plasmid; nopaline Ti plasmid
  LB (left border):                   octopine Ti plasmid; nopaline Ti plasmid
  Pro (promoter):                     CaMV 35S; T-DNA genes; ubiquitin; actin
  Reporter:                           GUS; LUC; GFP
  MCS (multiple cloning sites):       pUC; pBluescript
  3' (3' signal):                     CaMV 35S; T-DNA genes
  P Select (plant selectable marker): kanamycin; hygromycin; phosphinothricin; glyphosate;
                                      phosphomannose isomerase

Vector backbone
  OriA (plasmid replication function for A. tumefaciens): IncP; IncW; pVS1; pRi
  OriE (plasmid replication function for E. coli):        pUC; ColE1; IncP; F factor; phage P1
  B Select (bacterial selectable marker):                 kanamycin; ampicillin; gentamycin;
                                                          spectinomycin; chloramphenicol; tetracycline
  Mob (plasmid mobilization):                             OriT; Bom

Fig. 12.5. Typical structure of a binary vector. Key components and their major options are displayed.
From Komori et al. (2007) reproduced with permission of the American Society of Plant Biologists.
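
The statement that the T-DNA is the segment delimited by RB and LB, with everything else forming the vector backbone, can be made concrete with a small example. The following is a minimal sketch in Python, assuming the plasmid is described as an ordered list of feature labels read around the circle in the direction from RB to LB; the function name and the example feature order are hypothetical and are not taken from any particular vector.

```python
def split_binary_vector(features):
    """Split a circular feature list into (T-DNA elements, backbone elements).

    Assumes exactly one "RB" and one "LB" label, with features listed in the
    order in which they are met when reading from RB towards LB.
    """
    if features.count("RB") != 1 or features.count("LB") != 1:
        raise ValueError("expected exactly one RB and one LB")
    # The plasmid is circular, so rotate the list to start at the right border.
    start = features.index("RB")
    rotated = features[start:] + features[:start]
    end = rotated.index("LB")
    t_dna = rotated[1:end]          # everything between the two borders
    backbone = rotated[end + 1:]    # everything outside the borders
    return t_dna, backbone


# Hypothetical layout using the abbreviations of Fig. 12.5.
example = ["OriE", "RB", "Pro", "Reporter", "3'", "MCS", "P Select", "LB",
           "B Select", "OriA", "Mob"]
t_dna, backbone = split_binary_vector(example)
print("T-DNA:", t_dna)
print("Backbone:", backbone)
```

Only the elements returned in the first list would be expected to be transferred to the plant cell; as noted earlier in the chapter, backbone sequence is nevertheless sometimes integrated.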

more user-friendly. Recognition sites for rare cutters, which are restriction enzymes with long
recognition sequences, are very convenient in this respect because the DNA fragments that are
to be inserted seldom contain such sites. In some of the recently created vectors, termed
modular vectors, a series of these rare sites is placed in the T-DNA (Chung et al., 2005). An
extensive set of auxiliary plasmids, which have full sets or subsets of these rare sites and
other restriction sites, is provided and some of the plasmids also carry frequently used
promoters, marker genes and/or 3' signals. Various types of expression units may be constructed
in auxiliary plasmids and then the units may be inserted into the modular binary vectors. Thus,
several expression cassettes can easily be assembled in a binary vector.
Until the early 1990s, Agrobacterium-mediated transformation had been used mainly in
dicotyledons and it had been difficult to apply the method to cereals. The finding that some
of the virulence genes exhibited gene dosage effects led to the development of a superbinary
vector, which carried additional virulence genes. The superbinary vector has been highly
efficient in the transformation of various plants and especially useful in the transformation
of recalcitrant plants, such as important cereals.
A superbinary vector was developed and successfully used for the transformation of
monocotyledons, such as rice and maize (Hiei et al., 1994; Ishida et al., 1996). The
superbinary vector is an improved version of a binary vector and carries the 14.8-kb KpnI
fragment that contains the virB, virG and virC genes derived from pTiBo542, which is
responsible for the supervirulence phenotype of an A. tumefaciens strain, A281 (Jin et al.,
1987; Komari, 1990).

12.3.2 Gateway-based binary vectors

Binary vectors used for generation of transgenic cereal species are typically cumbersome due to
their large size and the rather limited number of useful restriction sites. To bypass laborious
preparation of constructs, Gateway technology (Invitrogen) is used especially for binary
vectors generating knock-down lines. Gateway-derived cloning systems are based on the
site-specific recombination system from bacteriophage λ (Landy, 1989) and circumvent
traditional cloning methods involving restriction and ligation of DNA sequences. A number of
Gateway-based binary vector sets for plant functional genomics have been developed, thereby
allowing overexpression or knock-down of effector genes, expression of fusion proteins (as
reviewed by Earley et al., 2006) and transformation of multiple genes (Chen et al., 2006).
The Gateway system provides another user-friendly feature. A DNA fragment flanked by a pair of
short, specific sequences may easily be replaced with another DNA fragment by the Gateway
system. Thus, introduction of DNA fragments into a binary vector with the sites for the Gateway
system is a straightforward step and is useful in many applications. Combination of the
modularity based on rare-cutting restriction enzymes and the Gateway recombination sites
provides a highly versatile cloning system, which is especially useful in the production of
T-DNA with multiple genes (Chen et al., 2006).
Gateway-based binary vectors have been developed for dicotyledonous plants (e.g. Wesley et al.,
2001; Curtis and Grossniklaus, 2003; Tzfira et al., 2005). However, these are typically not
useful for monocotyledons, mainly because of the limited functionality of promoters that are
used to drive either the gene of interest or the plant selection marker. In addition, other
specific vector elements, such as the plant-selectable marker and the origin of replication,
may impede the amenability of a binary vector.
Himmelbach et al. (2007) provided a set of generic binary vectors that is made available for
phenotypic studies in stably transformed cereal species. Its modular configuration permits
convenient insertion of promoter and effector sequences, as well as of plant selection marker
cassettes of choice. The insertion of effector sequences into the binary overexpression and
knock-down vector series is facilitated by the highly efficient Gateway recombination system.
The spectrum of applications is further extended by the options to test constructs in transient
expression assays (e.g. in barley) prior to starting the laborious stable transformation
procedure and by the option to transform monocotyledonous and dicotyledonous plants using the
same binary vector. Vector derivatives with strong, constitutive promoters, such as the maize
ubiquitin promoter (ZmUbi1; Furtado and Henry, 2005), the double-enhanced CaMV 35S promoter
(d35S; Furtado and Henry, 2005), or the rice actin promoter (OsAct1; McElroy et al., 1990;
Vickers et al., 2006), are provided. In addition, the wheat glutathione S-transferase promoter
(TaGstA1; Altpeter et al., 2005b) permits the expression of transgenes confined to leaf
epidermis in a constitutive manner. With the availability of a combination of the highly
efficient Gateway cloning system, a selection of cereal promoters controlling the expression of
genes of interest, different plant selection markers and the option of further convenient
vector modifications, the functional characterization of DNA sequences in cereal species will
be greatly facilitated.

12.3.3 Choice of transformation vectors

As a wide range of binary vectors and superbinary vectors is available now, helpful guidance
for selection of the vectors is needed and has already been provided in the literature (Hellens
et al., 2000a; Komari et al., 2006). Unfortunately there is no vector that is good for all
purposes, but, fortunately, many of the vectors currently available are quite versatile. They
may be used in various types of experiments and there is a good chance that vectors being
routinely used can be employed in the experiment of interest. If this is not the case or
better options are worth searching for, a series of

questions needs to be asked about the size and nature of the DNA fragments, the strains of
A. tumefaciens to be employed, the species of plants to be transformed and the purposes of the
experiments. If the DNA fragments are larger than 15 kb, IncP, BIBAC and TAC vectors are
recommended. Otherwise, high-copy-number plasmids are very convenient and a wide range of
vectors varying in restriction sites, selectable markers and Gateway sites is available. A
series of vectors designed for specific purposes, e.g. vectors for suppression of plant genes
by RNA interference (RNAi) technology (Miki and Shimamoto, 2004), may also be chosen.
Newer generations of plant transformation vectors provide us with improved strategies for
cloning and delivering our genes of interest into plant cells, typically using Agrobacterium
as a vehicle for the transformation process. Some of these vectors were developed as families
of plasmids and others represented single constructs designed for specific purposes. One can
find a plasmid for every task, including such relatively unique applications as activation
tagging (e.g. the pSKI015 and pSKI074 binary vectors; Weigel et al., 2000) or
dexamethasone-inducible expression (e.g. the pOp/LhGR transcription activation system;
Samalova et al., 2005). In addition, vectors have been constructed that allow us to take
advantage of radically new cloning methodologies and utilize new gene expression technologies.
Moreover, new vector systems are being produced to utilize transgenic technologies in an
ever-expanding range of plant species, such as forest trees and transformation-recalcitrant
crops (e.g. Meyer et al., 2004; Coutu et al., 2007). Furthermore, vectors for systemic gene
expression without permanent genetic modification of the plant are being developed based on
different plant viruses (e.g. Gleba et al., 2005; Marillonnet et al., 2005).

12.4 Selectable Marker Genes

The use of selectable marker gene systems facilitates the transformation process and allows the
relatively straightforward recovery of transgenic crop plants (Ramessar et al., 2007). Without
them, the few plant cells that take up and stably integrate the foreign DNA would simply be
lost in an ocean of wild-type cells, which would certainly overgrow these transformed cells in
the absence of effective selection against them. However, under certain conditions, selectable
marker genes may not be necessary and it may be feasible to get transgenic plants without
selection of a marker gene.

12.4.1 Functions of selectable marker genes

Once a plant cell has incorporated the introduced DNA in a stable manner (i.e. covalently
integrated within the host plant's genome), the next step is to regenerate plants from the
transformed cells. Position, frequency and scope of regeneration events are critical to the
isolation of transgenic plants. Most often, the major limiting step in the isolation of
transgenic plants is a lack of regeneration occurring from within the transformed cell
populations. There is a large amount of variability in the frequency and scope of regeneration
among different angiosperm species as well as among different cultivars of any one species.
A critical step in the regeneration of transgenic plants is the ability to distinguish between
transformed plant cells with an integrated transgene and the bulk of non-transformed cells. The
traditional way to achieve this goal is to use marker genes within the transgene and to select
for their expression. Genes conferring resistance to various antibiotics or herbicides are
commonly used in laboratory transformation research. Selective marker genes act by expressing
either an enzyme that inactivates the selective agent (detoxification) or a resistant variant
of the selective agent's target enzyme (tolerance). For example, the aminoglycoside
antibiotics, such as kanamycin, neomycin and G418, kill cells by inhibiting protein
translation. The E. coli nptII gene, encoding neomycin phosphotransferase, inactivates these
antibiotics by phosphorylation, thus allowing

preferential growth of plant cells transformed with this gene on media containing these
selection agents. The herbicide phosphinothricin is an analogue of glutamate and acts by
irreversibly inhibiting glutamine synthetase, a key enzyme for ammonium assimilation and the
regulation of nitrogen assimilation in plants. The bar gene, cloned from the bacterium
Streptomyces hygroscopicus, encodes phosphinothricin acetyltransferase, which converts
phosphinothricin into the non-toxic acetylated form and allows growth of transformed plant
cells in the presence of phosphinothricin or commercial glufosinate ammonium-based
herbicides.
In general, all systems have low transformation efficiencies in the absence of selectable
markers. However, in the presence of a selectable marker, in systems such as tobacco, rice
and maize cells, transformation frequencies are extremely high. With high co-transformation
frequencies, selectable markers facilitate the identification of plants containing
co-transformed transgenes. The utility of the individual selectable marker genes is a
function of both the properties of the respective resistance protein they encode and the
relative sensitivity of the target tissue to their corresponding selective agent. The timing
of selective agent application is critical to its successful use, because transformed cells
need time to recover and compete. The relative insensitivity of monocots to high levels of
the antibiotic kanamycin (commonly used in dicot transformation) led to attempts to replace
this antibiotic with other selective agents. The features of a particular transformation
system (especially the nature of the material to be transformed and the route of transgenic
plant regeneration) should be considered when choosing the resistance mechanism and the
individual marker gene to be employed in any selection scheme. Patent and freedom to
operate (FTO) issues often influence the choice of selectable marker gene.
Following the gene insertion process, plant tissues are transferred to a selective medium
containing an antibiotic or herbicide, depending on which selectable marker was used. When
grown on selective media, only plant tissues that have successfully integrated the transgene
construct and express the selectable marker gene will survive. It is assumed that these
plants will also possess the transgene of interest. Thus, subsequent steps in the process
will only use these surviving plants.

12.4.2 Selectable marker genes for plants

There are two major classes of selectable marker genes: antibiotic and herbicide resistance
genes. Antibiotic resistance genes are used in two important phases of transgenic plant
production: (i) before plant transformation, to select bacteria during routine molecular
biology operations to manipulate transgenes and create expression vectors; and (ii) during
the transformation process itself, to select cells and plants that have stably integrated
introduced transgenes (selectable markers and gene(s) of interest) (Ramessar et al., 2007).
There are two issues frequently raised with respect to antibiotic resistance genes:
(i) effects on the therapeutic efficacy of clinically used antibiotics, i.e. concerns that
antibiotic resistance gene products in transgenic crops or products might render clinically
important therapeutic antibiotics ineffective; and (ii) potential for horizontal gene
transfer, i.e. concerns about the potential transfer of the antibiotic resistance marker gene
to intestinal and soil microorganisms. For herbicide resistance genes the issues are:
(i) gene flow, by which new genes can spread by normal outcrossing to wild or weedy
relatives of the engineered crops; (ii) weediness, i.e. the potential for a crop or its
sexually compatible wild relatives to become established and to persist and spread into new
habitats as a result of newly introduced genes; and (iii) toxicity and allergenicity, an
issue associated with human health and the safety of novel foods and potential negative
effects on non-target organisms.
Choice of selectable marker genes is a key factor in plant transformation. Genes that
give resistance to antibiotics or herbicides,

such as kanamycin, hygromycin, phosphinothricin and glyphosate, are very popular.
Kanamycin resistance has been most frequently employed in the transformation of many
dicotyledonous plants. If the aim is to develop herbicide-resistant plants, the trait gene
can also serve as the selectable marker gene. Because of concerns over antibiotic resistance
genes in commercial transformants, genes that add metabolic capabilities have been drawing
considerable attention. For example, plant cells expressing a phosphomannose isomerase can
grow on media with mannose as the sole carbon source. Such markers are referred to as
positive selection markers (Joersbo et al., 1998). Table 12.2 provides a list of selectable
marker genes used in plant transformation. Although to date more than 20 selectable marker
genes have been reported in the transformation of higher plants, many of them were tested
only in a limited number of plant species on a limited scale. Therefore, further studies of
marker genes may contribute to improvement of the transformation of certain plant species
(Komori et al., 2007).
Selectable marker genes are driven by constitutive promoters. The promoters of the CaMV
35S transcript (Odell et al., 1985) and the nopaline synthase gene of A. tumefaciens

Table 12.2. Selectable marker genes for plant transformation.

Selectable marker gene  Gene product                                  Source                                      Selection

nptII                   Neomycin phosphotransferase                   Tn5                                         Kanamycin; G418; paromomycin; neomycin
ble                     Bleomycin resistance                          Tn5 and Streptoalloteichus hindustanus      Bleomycin; phleomycin
dhfr                    Dihydrofolate reductase                       Plasmid R67                                 Methotrexate
cat                     Chloramphenicol acetyltransferase             Phage p1Cm                                  Chloramphenicol
aphIV                   Hygromycin phosphotransferase                 E. coli                                     Hygromycin B
ept                     Streptomycin phosphotransferase               Tn5                                         Streptomycin
aacC3, aacC4            Gentamycin-3-N-acetyltransferase              Serratia marcescens, Klebsiella pneumoniae  Gentamycin
bar                     Phosphinothricin acetyltransferase            Streptomyces hygroscopicus                  Phosphinothricin; bialaphos
epsp                    5-enolpyruvylshikimate-3-phosphate synthase   Petunia hybrida                             Glyphosate
bxn                     Bromoxynil-specific nitrilase                 Klebsiella ozaenae                          Bromoxynil
psbA                    Qa protein                                    Amaranthus hybridus                         Atrazine
tfdA                    2,4-D monooxygenase                           Alcaligenes eutrophus                       2,4-Dichlorophenoxyacetic acid
dhps                    Dihydrodipicolinate synthase                  E. coli                                     S-Aminoethyl-L-cysteine
ak                      Aspartate kinase                              E. coli                                     High concentrations of lysine and threonine
sul                     Dihydropteroate synthase                      Plasmid R46                                 Sulfonamide
Csr1-1                  Acetolactate synthase                         Arabidopsis thaliana                        Sulfonylurea herbicides
Tdc                     Tryptophan decarboxylase                      Catharanthus roseus                         4-Methyl tryptophan

(Depicker et al., 1982) are very popular in dicotyledons and the promoters of the ubiquitin
gene of maize (Christensen et al., 1992) and the actin gene of rice are popular in
monocotyledons (Zhang et al., 1991). Selectable marker genes are followed by a DNA
fragment, the so-called 3' signal. The 3' regions of the CaMV 35S transcript and the
nopaline synthase gene in the wild-type T-DNA of A. tumefaciens are frequently used as a
3' signal.

Antibiotic resistance genes

Aminoglycoside antibiotics are bacterial inhibitors of prokaryotic, mitochondrial and
chloroplast protein synthesis. Kanamycin, gentamycin/geneticin (G418) and paromomycin bind
the 30S ribosomal subunit to inhibit translation initiation. Hygromycin interacts with the
elongation factor EF-2 to inhibit peptide chain elongation. Exposure of plants to these
antibiotics leads to an inhibition of chlorophyll biosynthesis and leaf bleaching. The most
widely used selectable markers in cereal transformation are the genes encoding neomycin
phosphotransferase (nptII), hygromycin phosphotransferase (hpt) and phosphinothricin
acetyltransferase (bar) (Cheng et al., 2004). These genes confer resistance to kanamycin and
some related aminoglycosides (such as G418 and paromomycin), hygromycin and PPT,
respectively. Transformed cells in these systems are able to survive and non-transformed
cells are killed by the selective agents. This type of selection is referred to as negative
selection. Cereals have proven to be insensitive to relatively high concentrations of
kanamycin. Paromomycin has been used for selection and regeneration of rice, maize, wheat,
oats and barley transformed with the nptII gene. Resistance to hygromycin is encoded by the
aphIV gene (commonly referred to as the hpt gene) of E. coli, which codes for hygromycin
phosphotransferase (HPT). Rice showed relatively high sensitivity to hygromycin. There has
been a move away from antibiotic marker genes in commercial cereal biotechnology because of
associated regulatory concerns, especially in Europe, and the difficulty of using them in
breeding programmes for transgene identification.

Herbicide tolerance genes

A number of herbicides have been used as selective agents in cereal transformation.
Markers have been developed by engineering tolerance to herbicides that inhibit amino acid
biosynthesis. Both herbicides and antibiotics can be used to select materials by addition to
the tissue culture media or by spraying the full-grown plants. Both can be readily used in
breeding programmes to select for the inheritance of linked transgenes. However, herbicides
have a more serious intellectual property problem than antibiotics. A problem with herbicide
resistance genes is that we end up with plants that are herbicide resistant, although this
may not be a desired goal. Two strategies have been developed for engineering herbicide
tolerance in transgenic cereals: introducing a herbicide-tolerant variant of an amino acid
biosynthetic enzyme, e.g. a mutant als gene for sulfometuron methyl (Oust) tolerance, and
introducing an enzyme which inactivates the herbicide, e.g. the bar gene for
phosphinothricin (PPT, Liberty) tolerance. Resistance to PPT-based herbicides using the bar
gene from S. hygroscopicus has been used for the selection of fertile transgenic cereals,
e.g. rice, maize, wheat and barley, while Monsanto uses 5-enol-pyruvylshikimate-3-phosphate
synthase (EPSPS) and DuPont uses imidazolinone, chlorsulfuron or acetolactate synthase (ALS).

Engineering detoxification of herbicides that inhibit glutamine synthetase

The enzyme glutamine synthetase (GS) catalyses the synthesis of glutamine from glutamate
and free ammonium. PPT is a glutamate analogue that acts by inhibiting GS activity,
resulting in a cytotoxic accumulation of ammonium. Inactivation of PPT and PPT-containing
herbicides (Liberty) is conferred by the bar gene from S. hygroscopicus, which encodes a
phosphinothricin acetyltransferase.
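The markers discussed so far fall into two mechanistic groups: those whose product inactivates the selective agent (detoxification) and those that supply a tolerant variant of the agent's target enzyme (tolerance). The following Python sketch is one possible way to hold that classification as a small lookup; the `Marker` class, the `MARKERS` dictionary and the helper function names are illustrative assumptions, and only markers and agents named in the text are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    gene: str          # marker gene name as used in the text
    product: str       # encoded protein
    agents: tuple      # selective agents the marker protects against
    mechanism: str     # "detoxification" or "tolerance"


# Illustrative subset only; entries follow the examples given in the text.
MARKERS = {
    "nptII": Marker("nptII", "neomycin phosphotransferase",
                    ("kanamycin", "G418", "paromomycin"), "detoxification"),
    "hpt": Marker("hpt", "hygromycin phosphotransferase",
                  ("hygromycin",), "detoxification"),
    "bar": Marker("bar", "phosphinothricin acetyltransferase",
                  ("phosphinothricin",), "detoxification"),
    "mutant als": Marker("mutant als", "herbicide-tolerant acetolactate synthase",
                         ("sulfometuron methyl",), "tolerance"),
}


def markers_for(agent):
    """Return the names of markers that allow growth on the given selective agent."""
    return sorted(name for name, m in MARKERS.items() if agent in m.agents)


def by_mechanism(mechanism):
    """Return the names of markers that act through the given mechanism."""
    return sorted(name for name, m in MARKERS.items() if m.mechanism == mechanism)


if __name__ == "__main__":
    print(markers_for("paromomycin"))
    print(by_mechanism("tolerance"))
```

A lookup of this kind only records what the text states; the choice of marker for a real experiment also depends on tissue sensitivity, regeneration route and freedom-to-operate considerations, as noted earlier in this section.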

Engineering tolerance to and detoxification of herbicides that inhibit
5-enol-pyruvylshikimate-3-phosphate synthase

The chloroplast-localized enzyme EPSPS catalyses a common step in aromatic amino acid
biosynthesis. Glyphosate, the active ingredient in the herbicide Roundup, inhibits the
plastid enzyme EPSPS and thus prevents the synthesis of chorismate-derived aromatic amino
acids and secondary metabolites in plants. Dominant mutations that confer resistance to
glyphosate-containing herbicides (Roundup) have been shown to result from base pair
substitutions within epsps genes. Transformation of plants with a mutant epsps gene renders
them tolerant to glyphosate. Inactivation of glyphosate-containing herbicides is conferred
by a bacterial gox gene that encodes a glyphosate oxidase. Howe et al. (2002) have described
the development of an efficient selectable marker system for the production of transgenic
maize plants using genes that confer resistance to the herbicide glyphosate.

12.4.3 Positive selection

In most cases, negative selection markers are used to select transformed cells from a
population of non-transformed cells, as described above. As a result of increasing concern
worldwide regarding the use of antibiotic or herbicide markers in transgenic crop plants,
although it is completely unfounded scientifically, several positive selection systems have
been developed in recent years and successfully used for the production of transgenic
plants. In the case of positive selection, a transformed cell acquires the ability to
metabolize a substrate that it previously could not use (or not use efficiently) and thereby
grows out of the mass of non-transformed tissue. In contrast with negative selection,
positive selection does not kill the non-transgenic cells, but gives clear advantages to the
transformed cells. Positive selection systems include benzyladenine N-3-glucuronide (Joersbo
and Okkels, 1996), xylose (Haldrup et al., 1998a, b) and mannose (Miles and Guest, 1984).
These selection systems allow transgenic plants to be produced without antibiotic or
herbicide resistance genes in many plant species (e.g. Negrotto et al., 2000; Lucca et al.,
2001; Reed et al., 2001; He, Z. et al., 2004; Gao et al., 2005).
Positive selection can be of many types, from inactive forms of plant growth regulators
that are converted to active forms by the introduced enzyme, to alternative carbohydrate
sources that are not utilized efficiently by non-transformed cells but become available upon
transformation with an enzyme that allows them to be metabolized. Non-transformed cells
either grow slowly in comparison to transformed cells or not at all. Using positive
selection, non-transformed cells may die, but, typically, the production of phenolic
compounds observed with negative selection markers does not occur.
The first example of positive selection was provided by Joersbo and Okkels (1996); these
researchers demonstrated that transgenic tobacco plants could be obtained from a leaf disc
transformed with β-glucuronidase (GUS) when a cytokinin glucuronide was provided as a
substrate and cytokinin was absent from the media. Only cells expressing GUS could
metabolize the cytokinin glucuronide. These cells could then proliferate and differentiate
into shoots, whereas cells without GUS activity could not. This concept of positive
selection was further expanded to include not only plant hormones, but also carbohydrate and
nitrogen sources (Okkels and Whenham, 1994). Carbohydrates represent one of the most
readily used aspects of positive selection because plant cells in culture require the
presence of a carbohydrate source. Typically sucrose, glucose or maltose is incorporated
into plant culture media. However, if another carbohydrate such as mannose is introduced
instead into the media, in the majority of cases studied, the plant cells will be unable to
proliferate and may die. In the case of mannose and many other carbohydrates, the compound
is metabolized but the product of that step cannot be further metabolized.
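The contrast between negative and positive selection drawn above can be made concrete with a deliberately simple toy model. In the sketch below, every number (growth factor, kill rate, starting cell counts) is a hypothetical value chosen for illustration, not a measurement from any of the cited studies: transformed cells multiply each culture cycle, whereas non-transformed cells are either killed (negative selection) or merely held back (positive selection).

```python
def simulate(transformed, untransformed, cycles, mode,
             growth=2.0, kill=0.5, background=1.0):
    """Toy model of a mixed cell population under selection.

    transformed / untransformed: starting cell counts (hypothetical).
    growth: per-cycle multiplication of transformed cells.
    kill: fraction of non-transformed cells lost per cycle (negative selection).
    background: per-cycle multiplication of non-transformed cells under
        positive selection (1.0 means arrested but not killed).
    Returns the (transformed, untransformed) counts after `cycles` cycles.
    """
    for _ in range(cycles):
        transformed *= growth
        if mode == "negative":
            untransformed *= (1.0 - kill)
        elif mode == "positive":
            untransformed *= background
        else:
            raise ValueError("mode must be 'negative' or 'positive'")
    return transformed, untransformed


def transformed_fraction(transformed, untransformed):
    """Share of the population that is transformed."""
    return transformed / (transformed + untransformed)


if __name__ == "__main__":
    for mode in ("negative", "positive"):
        t, u = simulate(10, 990, cycles=5, mode=mode)
        print(mode, round(transformed_fraction(t, u), 3))
```

Under these assumed parameters the transformed cells are enriched in both regimes, but in the positive regime the non-transformed tissue persists, which matches the description of transformed cells growing out of a mass of surviving non-transformed tissue.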

Other examples of the use of alternative carbohydrate sources as a means of selection for
transgenic cells have utilized deoxyglucose, xylose and ribitol. The best documented of
these carbohydrate sources is the use of mannose combined with the phosphomannose isomerase
(PMI) gene of E. coli (manA). PMI catalyses the conversion of mannose-6-phosphate to
fructose-6-phosphate, which can be utilized as a carbohydrate source. The PMI system has
been shown to be effective for sugarbeet, maize, rice, wheat, Arabidopsis and many other
dicot and monocot species (reviewed by Wenck and Hansen, 2004).

12.4.4 Elimination of selectable marker genes from transgenic plants

Selectable marker genes are required for efficient generation of transgenic plants in
nearly all transformation procedures, but serve no purpose once plants have been obtained
that are homozygous for the transgene. On the contrary, their continued presence can pose
technological problems because it precludes retransformation with the same marker systems.

Use of co-transformation

A simple approach is to co-transform plant cells with two separate pieces of T-DNA, one
with a selectable marker gene and the other with genes of interest, and to select
marker-free progeny segregated from the co-transformants (Hohn et al., 2001). Unlinked
integrations of the two T-DNAs lead to the segregation of the marker gene from the gene of
interest in the T1 generation. Co-transformation can be performed using either two strains,
or a single strain, of A. tumefaciens. A mixture of two strains, each harbouring a binary
vector (Komari et al., 1996), or a co-integrate and a binary vector (De Buck et al., 2000),
has been used to study the factors that influence co-transformation frequencies. An
alternative method for co-transformation using a single strain of Agrobacterium has been
shown to yield higher co-transformation frequencies (Komari et al., 1996). However, the
convenience of the method will largely depend on the suitability of the analytical method
required. Moreover, Hohn et al. (2001) suggested that the elimination of marker genes by
co-transformation may be especially useful when using Agrobacterium-mediated
transformation. In combination with twin T-DNA vectors, marker-free transgenic plants can be
produced by carefully designing the transformation vectors.
To carry out co-transformation in the superbinary vector system, a T-DNA with a selectable
marker was placed in an acceptor vector. For example, pSB4 and pSB6 were constructed by
inserting a T-DNA carrying the hygromycin resistance gene and the phosphinothricin
resistance gene, respectively, and have been tested in a number of plant species (Komari
et al., 1996; Ishida et al., 2004). The frequency of co-transformation, which is the
proportion of plants carrying the selectable marker gene that also carry the gene(s) of
interest, has been quite high, typically ranging between 50% and 80%. Marker-free
transformants have then been obtained from more than 50% of the co-transformants.
Co-transformation may be carried out using other types of vectors. For example, Huang
et al. (2004) placed a marker gene in the vector backbone in a regular binary vector and
observed that plants were co-transformed with one T-DNA processed from the right border and
another T-DNA processed from the left border.
Another method for producing marker-free transgenic plants was proposed by Vain et al.
(2003) using a new dual binary vector system, pGreen/pSoup (Hellens et al., 2000a). pGreen
is a small Ti binary vector unable to replicate in Agrobacterium without the presence of
another binary plasmid, pSoup, in the same strain. Co-transformation with pGreen, carrying
the gene of interest, and pSoup, carrying the selectable marker, may lead to the production
of marker-free transgenic plants in subsequent progeny (Vain et al., 2003).
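The segregation argument behind co-transformation can be checked with a short calculation. The sketch below assumes the simplest case, which the text does not spell out: a selfed primary transformant that is hemizygous for a single copy of the gene of interest and a single copy of the marker, integrated at unlinked loci. The function names are illustrative, and the screening figures in the last function simply reuse the 50% rates quoted above as example inputs.

```python
from fractions import Fraction


def marker_free_goi_fraction():
    """Expected share of selfed T1 progeny carrying the gene of interest but no marker.

    Assumes one hemizygous insertion of each T-DNA at unlinked loci.
    """
    p_goi_present = Fraction(3, 4)    # at least one copy of the gene of interest
    p_marker_absent = Fraction(1, 4)  # no copy of the marker T-DNA
    return p_goi_present * p_marker_absent


def prob_at_least_one(n_progeny, p):
    """Probability that at least one of n_progeny plants has the desired genotype."""
    return 1 - (1 - p) ** n_progeny


def expected_marker_free_lines(n_marker_selected, cotransformation_freq, marker_free_rate):
    """Expected number of selected lines that go on to yield marker-free progeny."""
    return n_marker_selected * cotransformation_freq * marker_free_rate


if __name__ == "__main__":
    p = marker_free_goi_fraction()
    print("marker-free T1 fraction with gene of interest:", p)
    print("chance of at least one such plant among 16 progeny:",
          float(prob_at_least_one(16, p)))
    print("expected useful lines from 100 selected plants:",
          expected_marker_free_lines(100, Fraction(1, 2), Fraction(1, 2)))
```

Linked or multi-copy integrations depart from these ratios, which is one reason the observed recovery of marker-free plants varies between experiments.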

Removal of marker genes and other unnecessary segments by recombination

Recombinases from phages and yeasts, such as cre, FLP and R, which recombine at the specific
sites loxP, FRT and RS, respectively, are powerful tools to remove selectable marker genes
(Ow, 2001) and have proved effective in a few model systems. A DNA segment placed between
two of the specific recombination sites may be excised from the plant chromosome if the
corresponding recombinase is expressed in the plant cell. For example, transgenic lines that
contained the loxP sites were crossed with lines that expressed the cre recombinase gene
(Moore and Srivastava, 2006). Various sophisticated vector configurations and means to
express the recombinases were reported to exploit this system (Wang, Y. et al., 2005; Jia
et al., 2006). The recombinases may be able to cut out not only marker genes but also other
unnecessary DNA segments. For example, tandem integration of two or more copies of T-DNA in
a single locus has been observed quite frequently (Krizkova and Hrouda, 1998); this is a
troublesome phenomenon because clean, single-copy transformants are generally preferred. If
the T-DNA carries a recombination site, a segment between two of the sites in the tandem
T-DNA could be deleted so that a clean, single T-DNA integration pattern could be generated.
The multi-auto transformation (MAT) vector system uses recombinase-based excision to
enable the production of marker-free transgenic plants (Sugita et al., 2000). An
Agrobacterium isopentenyltransferase (ipt) gene provides a positive visual selectable marker
for transformation by catalysing cytokinin synthesis and inducing a shooty phenotype on
hormone-free medium. After selection, subsequent excision via the R/RS system produces
marker-free transgenic plants with a normal phenotype, allowing ipt and MAT to be used again
for another round of transformation. Recent improvements to the method have increased its
efficiency and have allowed it to be applied to species that do not regenerate through
cytokinin-dependent organogenesis, but rather via somatic embryogenesis (Endo et al., 2002b).
Verweire et al. (2007) presented a vector system to obtain homozygous marker-free
transgenic plants without the need for extra handling and within the same period as
transformation methods in which the marker is not removed. By introducing a germline-specific
auto-excision vector containing a cre recombinase gene under the control of a
germline-specific promoter, transgenic plants become genetically programmed to lose the
marker when its presence is no longer required (i.e. after the initial selection of primary
transformants). Using promoters with different germline functionality, two modules of this
genetic programme were developed. In the first module, the promoter, placed upstream of the
cre gene, confers CRE functionality in both the male and the female germline or in the
common germline (e.g. floral meristem cells). In the second module, a promoter conferring
single germline-specific CRE functionality was introduced upstream of the cre gene.
Recently, Mlynarova et al. (2006) and Luo et al. (2007) showed that it was possible to
remove transgenes (selectable markers and others) efficiently by using an auto-excision
vector in which a promoter that was specifically functional during microsporogenesis, in
pollen or in seed, was placed upstream of a site-specific recombinase gene. More efficient
transmission of the recombined allele to the progeny was observed compared to previously
described auto-excision strategies that rely on chemical or physical induction of the
recombinase. The results presented by Verweire et al. (2007), together with the results
obtained by Mlynarova et al. (2006) and Luo et al. (2007), clearly indicate that
germline-specific auto-excision is an efficient, flexible and versatile system to remove
selectable markers from transgenic plants.

Use of transposons

The maize Ac/Ds transposable element system has been used to create novel T-DNA vectors
for separating genes that are linked together on the same T-DNA after insertion into plants.
The expression of the

Ac transposase from within the T-DNA can induce the transposition of the gene of interest
from the T-DNA to another chromosomal location (Shrawat and Lörz, 2006). This results in the
separation of the gene of interest from the T-DNA and selectable marker gene.

Use of homologous recombination

Homologous recombination between direct repeats provides a method for excising marker
genes after transgenic cells and shoots have been isolated. The strategy uses native plant
enzymes and is simple because it avoids the need for foreign site-specific DNA recombinases
(Corneille et al., 2001; Hajdukiewicz et al., 2001). Efficient implementation of the method
requires high rates of homologous recombination relative to illegitimate recombination
pathways. The procedure works well in plastids, where homologous recombination predominates.
Marker genes are flanked by engineered direct repeats. The number and length of direct
repeats flanking a marker gene influence the excision rate. Excision is automatic and loss
of the marker gene is controlled by selection alone. After transgenic cells have been
isolated, selection is removed, allowing loss of the marker genes. Excision is a
unidirectional process resulting in the rapid accumulation of high levels of marker-free
plastid genomes. Cytoplasmic sorting of marker-free plastids from marker-containing plastids
leads to the isolation of marker-free plants. Marker-free plants can be isolated following
vegetative propagation or among the progeny of sexual crosses.

Use of positive markers

Ebinuma et al. (2001) developed removal systems combined with a positive marker, which are
called MAT vectors. The MAT vector system is designed to use the oncogenes (ipt, iaaM/H,
rol) of Agrobacterium, which control the endogenous levels of plant hormones and the cell
response to plant growth regulators, to differentiate transgenic cells and to select
marker-free transgenic plants. The oncogenes are combined with the site-specific
recombination system (R/RS). On transformation, the oncogenes drive the regeneration of
transgenic plants and are then removed by the R/RS system to generate marker-free transgenic
plants. The choice of a promoter for the oncogenes and the recombinase (R) gene, the state
of plant materials and the tissue culture conditions greatly affect the efficiency of both
the regeneration of transgenic plants and the generation of marker-free plants (Ebinuma
et al., 2004). These conditions have been evaluated in several plant species to increase
their generation efficiency and the MAT system has been applied to tobacco and rice (Endo
et al., 2002a, b).
As discussed above, marker-free transgenic cereal plants can be generated at varying
efficiencies using different approaches and techniques, followed by segregation of the genes
in the subsequent sexual generation. However, there are limitations associated with these
techniques (Shrawat and Lörz, 2006). For example, co-transformation technology is not
suitable for all plant species and its efficiency is clearly dependent on a number of
variables, including the Agrobacterium strain and the plant tissue being transformed. In
addition, this technique is labour intensive, requiring the production of a large number of
transgenic plants to isolate the plant of interest. Although site-specific recombinases hold
the greatest promise for the excision of selectable marker genes, concerns also exist about
pleiotropic effects induced by the action of recombinase on cryptic excision sites in the
plant genomes. The use of a transposon to separate the selectable marker gene from the gene
of interest (Goldsbrough et al., 1993) is of limited use. Homologous recombination
approaches, although interesting from a scientific point of view, are only effective for a
few model systems.

12.5 Transgene Integration, Expression and Localization

Once whole plants are generated and produce seeds, evaluation of the progeny

begins. The transgenic plants should be evaluated for transgene integration, expression and
localization.

12.5.1 Transgene integration

As a part of the regulatory process associated with commercial release of a transgenic
plant product, transgene integration events must be fully characterized. For transgene
technology to be useful, transgenes must have predictable and stable expression.
Technologies have been sought that would enhance our ability to create transgenic plants
with the desired expression characteristics. One of these technologies involves the use of
matrix attachment regions (MARs). MARs are DNA sequences that bind specifically to a network
of proteinaceous fibres, called the nuclear matrix, which permeates the nucleus. These
MAR-matrix interactions are thought to organize chromatin into a series of independent loop
domains. When MARs are positioned at the 5'- and 3'-ends of a transgene, more predictable
expression of the transgene results (Allen et al., 2000).
Transgenic plants often contain complex integration structures at an undetermined genomic
location, which may cause variations in gene expression. It has been demonstrated that the
precise integration of a transgene in a pre-determined genomic location can reduce the
variation in transgene expression (Day et al., 2000). The integration of transgenes in a
pre-determined genomic locus can be achieved by the use of site-specific recombinase
systems, such as cre/lox and FLP/frt (Ow, 2002). Integration by homologous recombination
would favour the establishment of a simple integration pattern and allow the insertion of a
transgene into a known and stable region of the genome.
Individual transgenic lines with complex integration patterns are generally considered
undesirable. There has been a drive to achieve cereal transformation using Agrobacterium and
other target recombination/integration systems. Agrobacterium-mediated DNA integration is a
defined process that generally results in low copy integration of defined T-DNAs, often into
transcriptionally active sites. Gene targeting has the potential to place foreign gene
sequences in predetermined regions of the genome, thus potentially overcoming so-called
position effects on transgene expression. Transposons can be used to deliver recombination
targets for subsequent site-specific integration.

12.5.2 Transgene expression

Transformation technologies can be used for characterizing expression elements using
reporter genes, utilizing transgene expression to modify endogenous metabolic activities,
introducing transgenes conferring novel phenotypic characteristics, inactivating genes using
anti-sense or co-suppression technologies and identifying genes by complementation. The
characterization of constitutive and non-constitutive promoter elements has advanced the
most in cereal transformation, but there are other non-promoter elements that regulate and
control gene expression in transgenic plants, which include transcript termination,
transcript stability, post-transcriptional modification, translation efficiency and protein
targeting.
Transgenes currently used in cereal transformation have a relatively simple structure.
They usually contain: (i) a promoter, usually of plant, bacterial or viral origin, which may
be constitutive (Act1), inducible (Hsp70) or tissue-specific (Amy1) and which may have been
modified for optimal activity; (ii) a coding sequence, which may have been modified for
optimal expression in transgenic plants, e.g. translation initiation site modification,
targeting information, glycosylation site modification and codon usage modification; and
(iii) a transcript termination sequence.

12.5.3 Confirmation of transgene and analysis of gene expression in transgenic plants

Commonly used methods to confirm the putative transgenic plants, as discussed in
482 Chapter 12

Chapter 3, are PCR, Southern blotting, Western blotting, Northern blotting,
enzyme-linked immunosorbent assay (ELISA), functional assay (testing the presence of
selectable marker and the target gene), in situ hybridization and progeny analysis
(segregation of the target gene). In Southern blotting, whether an introduced gene is
indeed present in the plant DNA and whether multiple transgenic plants carry the
introduced genes on the same size of DNA fragment (suggesting a single transformation
event) or on different sized fragments (suggesting independent transformation events)
can be determined using the cloned gene as a probe. Northern blotting is used to
determine whether the introduced gene has been transcribed into mRNA and accumulates
in the transgenic plant. Western blotting, ELISA and specific techniques must be used
for analysis of protein (enzyme) activity. Western blotting detects the protein of the
transgene in an extract of protein prepared from various parts of the transgenic plants
and is, therefore, an assay for a functional transgene. When the selectable markers used
are antibiotic- or herbicide-resistance genes, a functional assay can be made by spraying
antibiotics or smearing herbicide on the leaves of those putative transgenic seedlings or
plants in later segregating populations. With stably transformed genes, progeny testing
should show the presence and activity of the selectable marker and target genes, such as
the gene gfp encoding green fluorescent protein (GFP), or bar and disease resistance.

When the PCR method is used, two primers specific for the selectable marker (bar or
cah gene, for example) are used in a PCR reaction with genomic DNA extracted from the
transgenic plants. The DNA fragment yielded should have the predicted size (the length
equal to the number of base pairs between the two primers in the transgene). The
presence of additional transgenes in the same plants carrying the selectable marker can
be detected with a different set of primers using the same template DNA.

It should be noted that organization of the transgenic locus with inverted repeats of
multiple copies results in silencing, while concatemeric head-to-tail multiple copies are
good for high levels of stable expression. Numerous medium-to-high throughput
analytical techniques are available to quantify the levels of mRNA transcripts of large
numbers of genes. RNA-based techniques for the analysis of transgene expression begin
with the extraction of RNA and its evaluation in terms of purity and integrity by
spectrophotometry and gel electrophoresis, respectively, followed by the use of Northern
blotting, reverse transcription- (RT-) PCR and quantitative or real-time RT-PCR
(QRT-PCR) for quantifying RNA and in situ hybridization for studying tissue-level
expression patterns. QRT-PCR is a highly sensitive technique for quantifying mRNA copy
numbers of specific genes. The method permits a direct measurement of products during
the log-linear phase of the PCR reaction via the incorporation of a fluorescent probe in
the PCR reaction mix and the use of a thermocycler equipped with an optical sensor for
fluorescence quantification. Details on the selection, suitability and comparison of
various techniques for analysis of gene expression in general can be found in Jones
(1995) and Bartlett (2002).

The stable inheritance and expression of foreign genes are of critical importance in
the application of GM-plants to agriculture. The perfect transformation would contain a
single copy of the transgene that would segregate in a Mendelian fashion, with uniform
expression from one generation to the next. However, studies on transgene behaviour
indicated that segregation in transformed lines of cereals does not always follow the
typical Mendelian pattern and can be aberrant (Barro et al., 1998; Vain et al., 2003;
Wu, H. et al., 2006). This is a highly undesirable character when it occurs for
transgenes encoding a useful trait. Many factors can contribute to variation in transgene
expression, including tissue culture-induced variation or chimerism in the primary
transformant, the integration site (position effects), transgene copy number (dosage
effects), transgene mutation and epigenetic gene silencing (as reviewed by Shrawat and
Lörz, 2006). Gene silencing, the decline or loss of
Gene Transfer and GM Plants 483

gene expression in subsequent generations of primary transformants, can occur at the
transcriptional or post-transcriptional level and the phenomenon has often been
associated with a high transgene copy number (Matzke and Matzke, 1995; Matzke et al.,
2000). Studies have indicated that the problem of transgene silencing raises serious
concerns regarding the selection of transgenic lines for crop improvement with specific
trait(s). Therefore, it now appears imperative that transgenic lines carrying gene(s) of
economic importance need to be carefully tested for gene expression levels over many
generations.

Particle bombardment has featured strongly in the burgeoning field of cereal functional
genomics, specifically through the development of transposon-tagged plant lines for the
systematic functional characterization of plant genes. For example, Kohli et al. (2001,
2004) produced a large population of transgenic rice plants tagged with the maize Ac
transposon. They found that this population was suitable for saturation mutagenesis and
the rapid PCR-based cloning of interrupted genes using unique barcode elements present
in the DNA cassette used for transformation (Kohli et al., 2001). Callus induced from
specific transposon-tagged rice plants was maintained in a dedifferentiated state prior to
regeneration into clonal transgenic lines, prolonging the developmental phase
characterized by hypomethylation of genomic DNA (Kohli et al., 2004). This resulted in a
dramatically increased frequency of secondary transposition events compared to
seed-derived plants, thus increasing the rate of genome saturation.

As detailed more fully in Chapter 6 of Cullis (2004), the use of tagged full-length
cDNAs in transgenic plants can be a first step in isolating and identifying the protein
complexes that exist in vivo. Genetic transformation can also be used to develop a
protein atlas of where in the cell each of the genes is expressed. A full-length cDNA can
be tagged with a dye and the tagged probe transformed back into the plant under the
control of its native promoter. The site of the fluorescence will indicate the organ or
tissue where the gene is expressed, as well as the cellular localization of the protein.
The reintroduction of the full-length cDNA into a plant can also result in either
overexpression or silencing of that gene. The subsequent phenotype that is observed
provides clues as to the function of the gene. In addition, overexpression of such a gene,
for which a full-length cDNA is available, can be accomplished in a heterologous system,
such as yeast or E. coli, followed by in vitro studies of the protein function.

Transformation of allelic series into isogenic backgrounds can confirm the function of
individual sequence motifs. However, current plant transformation protocols based on
non-homologous end joining result in random genomic integration of transgenic DNA,
position effects, multiple insertions of the transgene and transgene alterations (Xu,
1997; Hanin and Paszkowski, 2003), obscuring quantitative phenotypic differences
between alleles. This can be circumvented using homologous recombination-based,
locus-targeted integration of alleles. Recently, 1% of insertion events in rice were found
to result from homologous recombination (Terada et al., 2002). If this finding can be
confirmed, rice genomics-genetics will be revolutionized. Further, if the method can be
applied to other species, a similar advance in genomics of all plants would occur.

Virus-based vectors can be efficiently used for high levels of transient expression of
foreign proteins in transfected plants and permit non-Agrobacterium bacterial species to
be employed for the production of transgenic plants (reviewed by Chung et al., 2006).
Viral vectors hold great promise as efficient tools for transient recombinant protein
expression in plant cells because of their ability to replicate in host cells autonomously
(Marillonnet et al., 2004, 2005). These viral vectors are built on the backbones of
plus-sense RNA viruses, such as tobacco mosaic virus (TMV) or potato virus and have been
used for the expression of foreign sequences in plants (Porta and Lomonossoff, 2002;
Gleba et al., 2004).

The recent development of reliable and efficient Agrobacterium-mediated
transformation technologies for cereals (for review,
see Shrawat and Lörz, 2006; Goedeke et al., 2007) has stimulated a variety of strategies
towards functional gene characterization, thereby paving the way for deeper
understanding of crop plant biology in cereals (Himmelbach et al., 2007). Comprehensive
analyses of gene function include stable transformation with sequences for
overexpression or knock-out of plant genes.

12.5.4 Reporter genes

Reporter genes, whose expression can be easily monitored, are useful in many ways in
plant transformation. Strength and temporal, spatial and other types of regulation of
promoters and other elements may be conveniently assayed by connecting these elements
to the reporter genes. Genes for GUS (Jefferson, 1987), luciferase (Ow et al., 1986) and
GFP (Pang et al., 1996) are popular examples. Gene fusions of the reporters and proteins
of interest may be employed to examine the subcellular localization of the proteins.

Reporter genes that are connected to constitutive promoters may be used to monitor
the process of transformation. The establishment of genetic transformation procedures
has relied on, among other factors, the use of efficient reporter genes, which easily allow
the detection of transgenic events after a transformation experiment, in either a
transient or stable expression assay. Expression of the reporter genes soon after the
inoculation of plant cells with A. tumefaciens is referred to as transient expression.
Expression of the reporter genes later in a cluster of cells growing on selection media is
a piece of evidence for integration of the T-DNA in plant chromosomes. A binary vector
that carries a constitutive selectable marker and a constitutive reporter is very useful as
a control vector both in transformation experiments and in assays of gene expression
(Komori et al., 2007).

It should also be mentioned that gene reporter systems have played a key role in many
gene expression and regulation studies, in which expression of a reporter gene under,
for instance, the direction of different promoters or the presence of different
transcription factors may be investigated. Reporter genes are used in cereal
transformation for analysing gene function, monitoring selection efficiency in both
transformed tissue and transgenic plants and following the inheritance of foreign genes
in subsequent plant generations.

Transient expression assays using promoter-reporter fusion genes may be used to
analyse gene regulation and function. There can be incongruity between results obtained
from transient assays and those observed in stably transformed plants. The utility of
different reporter genes in cereal transformation is a function of the properties of the
respective protein products they encode. The properties a good reporter gene should
have include: (i) expression in plant cells; (ii) low background activity in transgenic
cereals; (iii) no detrimental effects on plant metabolism; (iv) only moderate stability in
vivo so as to detect down-regulation of gene expression as well as gene activation; and
(v) an assay system that is non-destructive, quantitative, sensitive, versatile, simple to
carry out and inexpensive. The coral-derived red fluorescent protein DsRed is one of the
reporter systems currently used in cereal transformation that have all these desired
properties.

β-Glucuronidase

β-Glucuronidase (GUS) catalyses the hydrolysis and cleavage of a wide range of
fluorometric and histochemical β-glucuronide substrates. Since the GUS gene (gus, gusA,
or uidA) was first isolated from E. coli, many efforts have been made to develop the
E. coli uidA gene as a reporter system for plant transformation. Indeed, it has become
the most widely used marker system, mainly because of the enzyme stability and high
sensitivity and amenability of the assay to detection by fluorometric,
spectrophotometric, or histochemical techniques. In addition, there is little or no
detectable GUS activity in almost any higher plant tissues. The expression of gus gene
fusions can be quantified by fluorometric assay.
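
As a rough illustration of how a fluorometric GUS reading becomes a number, the sketch below assumes the commonly used substrate 4-methylumbelliferyl β-D-glucuronide (MUG), which GUS cleaves to the fluorescent product 4-methylumbelliferone (4-MU). Neither the substrate nor the function names or readings appear in the text above; they are assumptions made purely for illustration. The calculation fits a 4-MU standard curve, fits the sample's fluorescence over time, and normalizes the resulting rate to the amount of protein in the extract.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gus_specific_activity(std_pmol, std_fluor, times_min, sample_fluor, protein_mg):
    """GUS activity in pmol 4-MU per minute per mg protein (hypothetical helper)."""
    # Standard curve: fluorescence units per pmol of 4-MU (assumed product).
    fluor_per_pmol, _ = linear_fit(std_pmol, std_fluor)
    # Reaction time course: fluorescence units gained per minute.
    fluor_per_min, _ = linear_fit(times_min, sample_fluor)
    return fluor_per_min / fluor_per_pmol / protein_mg


# Made-up readings, for illustration only.
activity = gus_specific_activity(
    std_pmol=[0, 50, 100, 200], std_fluor=[5, 105, 205, 405],
    times_min=[0, 10, 20, 30], sample_fluor=[10, 110, 210, 310],
    protein_mg=0.05,
)
print(round(activity, 1))
```

With these invented readings the standard curve has a slope of 2 fluorescence units per pmol and the sample gains 10 units per minute, so the extract releases about 100 pmol of product per minute per mg of protein. Using the slope of a time course rather than a single end-point reading is a design choice that keeps the estimate within the linear phase of the reaction.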
Histochemical analysis can be used to localize gene activity in transgenic tissues.

There are a number of problems associated with the use of gus reporter genes. The
expression assays of the gus gene are destructive. The GUS protein shows high in vivo
stability, leading to problems when used to monitor gene inactivation. Histochemical
localization of GUS enzyme activity can be leaky. Dependence on the use of gus genes to
monitor the efficiency of cereal transformation protocols has often been misleading.

Luciferase

The product of the firefly (Photinus pyralis) luciferase gene (luc) catalyses the
oxidation of D(-)-luciferin in the presence of ATP to generate oxyluciferin and
yellow-green light. The activity of luciferase gene fusions can be assayed in transformed
cereal tissue non-destructively. There are a number of problems associated with the use
of luc reporter genes. First, penetration of the luciferin substrate can be limiting in
whole plant material. Secondly, detection equipment presently needed to monitor
luciferase gene expression is relatively expensive. luc genes are widely used as an
internal standard with gus fusions constructed to study gene expression in transient
assays and in transgenic plants.

Anthocyanin biosynthetic pathway genes

C1, B and R genes code for trans-acting factors that regulate the anthocyanin
biosynthetic pathway in maize seeds. Introduction of these regulatory genes, with
constitutive promoters, into cereal cells induces cell autonomous pigmentation in
non-seed tissues. This reporter system does not require the application of external
substrates for its detection.

Green fluorescent protein

For a monitoring system to be effective, the genetic marker technology should be
accurate with few false positives or negatives, detectable throughout the life cycle of
the plant and able to inform on the status of genetically linked or fused transgenes of
interest. Green fluorescent protein (GFP) has been proposed as a whole-plant marker for
field-level applications. The gene for GFP was isolated from a jellyfish (Aequorea
victoria) in 1992 and has since been modified for specific applications and transformed
into many different organisms. GFP monitoring has the potential to track transgenes over
large spatial scales utilizing visual or instrumental detection of the characteristic green
fluorescence of transgenic materials. There are other versions of GFP fluorescing at
different wavelengths that allow detection of multiple proteins. GFP expression in
mammalian cells yields a green fluorescence when excited by blue light; it does not
require additional gene products or exogenous substrates for activity, and detection is
non-destructive.

GFP showed relatively weak activity in transformed plant cells and a number of
modifications have been made to increase GFP expression in plants. The modifications
include: (i) point mutations to increase signal intensity and shift excitation peak;
(ii) mutations to alter codon usage for efficient translation and increased mRNA
stability; (iii) mutation to remove cryptic intron splice junctions to increase mRNA
processing and stability; (iv) subcellular localization, targeting to the endoplasmic
reticulum (ER), to reduce mild phytotoxicity; and (v) mutation to inhibit thermosensitive
protein misfolding. The mgfp5-er variant gene has been shown to be a feasible transgene
monitor in plants under field conditions (Haseloff et al., 1997; Harper et al., 1999).
GFP has also been shown to be a feasible qualitative marker for the presence of a linked
synthetic Bt cry1Ac endotoxin transgene (Harper et al., 1999; Halfhill et al., 2001).
With these beneficial characteristics, the next step in the development of a GFP
monitoring system is to better describe the system and resolve weaknesses that could
limit the utility of the monitoring system (Halfhill et al., 2004b).

12.5.5 Promoters

Promoters for constitutive transgene expression

The uses of constitutive promoters in transgenic cereals include: (i) the expression of
reporter genes to monitor transformation protocols; (ii) expression of marker genes for
transgenic cell selection; (iii) expression of herbicide tolerance genes; (iv) repression
of endogenous and/or pathogenic gene expression through antisense and co-suppression
technologies; (v) overproduction of biomolecules; and (vi) overexpression of disease and
stress tolerance genes prior to the employment of targeted gene expression strategies.
The constitutive promoter of the CaMV 35S RNA transcript (35S) was initially used for
constitutive gene expression in cereal transformation. The CaMV 35S promoter had been
used extensively in dicot transformation, was available at the onset of cereal
transformation and was used to express selectable marker genes in rice and maize. There
are problems associated with the CaMV 35S promoter in transgenic cereals. It has
relatively low activity in transient assays and is not completely constitutive in
transgenic cereal plants. In general, the CaMV 35S promoter is not a strong promoter for
cereals. To obtain high-level expression from it, an intron or other enhancers are needed.

A number of strategies have been used to increase gene expression in monocots. These
strategies are: (i) enhancement of the CaMV 35S promoter, e.g. e35S, 2× 35S;
(ii) incorporation of an intron into the foreign gene transcript unit to elevate mRNA
abundance; (iii) modification of a monocot promoter for high-level constitutive activity
in cereals, e.g. the modified maize Adh1 sequence in the Emu promoter; and
(iv) isolation of monocot promoters that show high-level constitutive activity in
cereals, e.g. the rice actin (Act1) and maize ubiquitin (Ubi1) promoters.

Most overexpression studies employ a strong, constitutive promoter, such as the CaMV
35S promoter, followed by phenotypic analysis of the transgenic plant. In many cases,
ectopic expression experiments gave important insight into gene function (Jack et al.,
1994). However, as a possible consequence of ubiquitous overexpression and misdirection
of gene products, undesirable pleiotropic effects on the plant may be caused. In
addition, strong accumulation of unnecessary proteins leads to wasteful energy
consumption, which could, in turn, generate phenotypes that are not directly correlated
with the recombinant protein itself. To avoid such unwanted pleiotropic effects that
occlude phenotypic analysis, as reviewed by Himmelbach et al. (2007), transgene
expression can be controlled temporally and spatially by the use of cell- and
tissue-specific or chemically inducible promoters.

Promoters for non-constitutive transgene expression

Progress in rice transformation has facilitated the study of non-constitutive promoters
in transgenic cereals using gus reporter gene fusions. Non-constitutive promoters used in
rice transformation include cereal promoters (maize Adh1, wheat His3 and rice rbcS),
dicot promoters (tomato rbcS, potato pinII), bacterial promoters (Agrobacterium
rhizogenes rolC) and viral promoters (rice tungro bacilliform virus major transcript). A
number of general conclusions can be drawn from these studies in rice: (i) inclusion of
the promoter region alone is (usually) enough to give the expected pattern of reporter
gene expression; (ii) signal transduction pathways are usually conserved between cereals,
e.g. barley α-amylase promoter activity in transgenic rice; (iii) cereal promoters can
show higher activity than their dicot homologues, e.g. rice versus tomato rbcS-gus gene
expression; and (iv) intron-mediated enhancement of gene expression in cereal cells does
not generally alter their pattern of activity. The isolation and utilization of cereal
promoters will continue because of differences in the biochemistry, physiology and/or
morphology between monocots and dicots (e.g. aleurone-specific expression) and the need
to obviate potential problems with the use of non-cereal genetic elements in transgenic
cereals.

12.5.6 Transgene inactivation

What causes transgene inactivation is not conclusively known. Although it might be associated
with high copy/complex integration events and nucleic acid interactions between
multiple copies of homologous DNA sequences, there are many examples where single copy
events, simple integration patterns and heterologous sequences also cause transgene
inactivation. Two things matter in this context: one is organization of the transgenic
locus, with inverted repeats of multiple copies causing silencing; the other is the
integrity of the inserted DNA sequence, where partial or rearranged copies cause
problems.

Inactivation can act at different steps in transgene expression: (i) transcriptional
inactivation, by de novo methylation (often of promoter regions) and/or heterochromatin
formation, e.g. homology-dependent trans-inactivation of the maize A1 gene in transgenic
petunia and natural paramutation at the maize B locus; and (ii) post-transcriptional
inactivation, by increased RNA turnover, antisense and/or defective transcript effects
(e.g. overproduction of untranslatable tobacco etch virus coat protein transcripts in
transgenic tobacco) and co-suppression, such as de novo methylation of coding regions.

The potential signals for transgene inactivation include: (i) DNA structure/integration
site, the recognition of an introduced gene as foreign and its subsequent de novo
methylation, e.g. maize A1 gene in petunia (but not the related gerbera A1 gene);
(ii) DNA-DNA association between genes leading to a transmission of chromatin-based
transcription states and/or de novo methylation, e.g. repeat-induced transgene
methylation in Arabidopsis; and (iii) DNA-RNA association, RNA (antisense/aberrant)
accumulation causing a feedback signal to reduce gene expression via DNA methylation or
increased RNA turnover, e.g. potato spindle tuber viroid DNA methylation during viroid
RNA-RNA replication.

There are several steps one can take to minimize transgene inactivation, which have
been very useful for Arabidopsis or tobacco. These are: (i) transgene integration by
development of recombination systems for transgene targeting to suitable genomic
locations; (ii) transgene structure, the elimination of repeated elements from
transgenes, including optimizing codon usage to produce less foreign-looking transgenes
and the isolation and utilization of insulating sequences; (iii) transgene expression,
the regulation of transgene transcription rates and/or transcript structure to reduce
excess (aberrant read-through/antisense) RNA-mediated transcript turnover; (iv) the use
of site-directed gene targeting (e.g. cre/lox, FLP/frt) to target transgenes into
chromosomal regions that provide an optimal sequence environment for stable expression;
(v) use of doubled-haploid systems to rapidly evaluate transgene stability in homozygous
plants; (vi) stress-mediated induction of hypermethylation in tissue culture using stress
mimics such as propionic or butyric acid; and (vii) evaluation of transgene expression
under different field conditions and different genetic backgrounds. RNA silencing and
associated RNAi, as a fundamental mechanism of gene regulation in plants, has great
potential application in plant science including regulating transgene expression (Eamens
et al., 2008).

12.6 Transgene Stacking

Multiple gene transfer to plants is necessary for sophisticated genetic manipulation
strategies, such as the stacking of transgenes specifying different agronomic traits, the
expression of different polypeptide subunits making up a multimeric protein, the
introduction of several enzymes acting sequentially in a metabolic pathway or the
expression of a target protein and one or more enzymes required for specific types of
post-translational modification. As discussed in Chapter 1, most agronomic traits are
multigenic in nature. Plant genetic improvement will require manipulation of complex
metabolic or regulatory pathways involving multiple genes. A plant breeder tries to
assemble a combination of genes in a crop plant that will make it as useful and
productive as possible. Combining the best genes in one plant is a long and difficult
process, especially as traditional plant breeding has been limited to artificially
crossing plants within the same species or with closely related species to bring
different genes together. The growing interest in dissecting and analysing complex
metabolic pathways and the need to exploit the full potential of multi-gene traits for
plant biotechnology (for review, see Halpin and Boerjan, 2003; Tyo et al., 2007) provide
a mandate for the development of new methods and tools for the integration of multiple
transgenes into the plant genome (multi-transgene pyramiding or stacking) and
coordinated expression of these transgenes in transformed plants.

Several approaches can be considered when using single-gene vectors for the delivery
of multiple genes into plant cells (Halpin et al., 2001; Daniell and Dhingra, 2002;
Halpin and Boerjan, 2003). Some of the approaches used for the production of transgenic
plants carrying multiple new traits include: (i) re-transformation (Singla-Pareek et al.,
2003; Seitz et al., 2007), the stacking of several transgenes by successive delivery of
single genes into transgenic plants; (ii) co-transformation (Li, L. et al., 2003; Altpeter
et al., 2005a), the combined delivery of several transgenes in a single transformation
experiment; and (iii) sexual crosses (Ma et al., 1995; Zhao et al., 2003; Lucker et al.,
2004) between transgenic plants carrying different transgenes.

In this section, several transgene-stacking/pyramiding methods will be discussed. The
discussion is mainly based on two reviews, by Francois et al. (2002a) and Dafny-Yelin
and Tzfira (2007), and on a revision of the different multi-transgene-pyramiding
methods. Table 12.3 summarizes the advantages, disadvantages and examples of the
different multi-transgene-stacking methods in plants.

12.6.1 Sexual crosses

In a crossing experiment, two plants are crossed to obtain progeny that combine the
traits of the two parents. In the case of transgenic plants, a first gene is introduced in
one of the parents and a second gene in the other. Crossing both transgenic parental
lines results in progeny of which 25% (in case both parents were hemizygous for the
transgenes) or all (in case both parents were homozygous for the transgenes) contain the
two transgenes.

The main advantage of the crossing-based method for transgene stacking is that the
method is technically simple. It only involves transfer of pollen from one parent to the
female reproductive organ of the other. One other advantage is that transgenic
populations of each parent can be screened for optimal expression of each transgene,
thus facilitating the combination of two optimally expressed transgenes. However, the
procedure is relatively time-consuming, certainly if more than two transgenes need to be
combined by sequential crossing. The two transgenes in the lines resulting from the cross
will most probably reside on different chromosomal loci, which complicates further
breeding through conventional methods. Furthermore, for some agronomically important
crops like potato and cassava, the high level of heterozygosity in the species makes
crossing approaches difficult and time-consuming. Crossing is very difficult to apply to
plants that are vegetatively propagated (e.g. perennial fruit crops and many ornamentals)
since the (desired) heterozygous nature of the genetic background will be altered due to
recombination during meiosis (Gleave et al., 1999).

Sexual crosses among transgenic plants make it possible to exploit powerful
super-traits that are not attainable through traditional methods. One example of a crop
carrying such new characteristics is Monsanto's multi-stacked maize, which was produced
via conventional crossing of three inbred transgenic maize lines: MON863, MON810 and
NK603. The elements incorporated into this multi-stack include five loci, four of which
carry a synthetic gene linked to combinations of strong regulatory elements from viruses,
bacteria and unrelated plants. Expression of the first two synthetic genes produces an
EPSPS that resembles the EPSPS from E. coli and is, unlike most plant versions, not
inactivated by herbicides containing glyphosate. The third synthetic
Table 12.3. Summary of advantages, disadvantages and examples of multi-transgene-stacking methods in plants.

Techniques Advantages Disadvantages Example and reference

Crossing
  Advantages: Technically simple; pre-selection of parents with optimal gene expression
  Disadvantages: Time-consuming; difficulties in further breeding; not applicable to vegetatively propagated plants
  Examples: Mercury detoxification (Bizily et al., 2000); antibody engineering (Hiatt et al., 1989); antimicrobial resistance (Zhu et al., 1994)

Sequential transformation
  Advantages: Applicable to vegetatively propagated plants; allows maintenance of elite genotype
  Disadvantages: Time-consuming; necessity for different selection markers
  Examples: Plant fertility restoration system (Hird et al., 2000); removal of selectable marker gene (Gleave et al., 1999)

Co-transformation: single plasmid
  Advantages: Linked integrationᵃ; single transformation event
  Disadvantages: Technically demanding; linked integrationᵃ
  Examples: Reporter gene expression (Christou and Swain, 1990)

Co-transformation: multiple plasmid
  Advantages: Technically simple; single transformation event
  Disadvantages: Dependence on co-transformation frequency
  Examples: Production of vitamin A-enriched rice (Golden Rice) (Ye et al., 2000); polyhydroxyalkanoate production (Slater et al., 1999)

IRESᵇ-based approach
  Advantages: Use of only a single set of promoter/terminator sequences; single transformation event; natural system for co-expression of multiple genes; linked integrationᵃ
  Disadvantages: Tissue-specificity; developmental regulation; lower expression levels; linked integrationᵃ
  Examples: Reporter gene expression (Urwin et al., 2000)

Transplastomic technology
  Advantages: Use of only a single set of promoter/terminator sequences; single transformation event; natural system for co-expression of multiple genes; linked integrationᵃ; high expression levels; avoiding positional effects
  Disadvantages: Chloroplast containment of gene products; linked integrationᵃ; number of plant species to which transplastomic technology is applicable is limited; low success rate of gene insertion into chloroplast genome
  Examples: Insect resistance (De Cosa et al., 2001)

Polyprotein approach: potyviral system
  Advantages: Use of only a single set of promoter/terminator sequences; coordinated expression
  Disadvantages: Energetically wasteful; aberrant cleavage of endogenous plant substrates by viral protease
  Examples: Expression of mannityl opine biosynthetic pathway (Beck von Bodman et al., 1995)

Polyprotein approach: cleavage dependent on plant protease
  Advantages: Single transformation event; possible targeting of gene products
  Disadvantages: Necessity for that plant protease
  Examples: Nematode resistance (François et al., 2002c); antifungal resistance (François et al., 2002b)

Polyprotein approach: 2A system
  Advantages: Linked integrationᵃ
  Disadvantages: Addition of 2A originating sequences to the gene products
  Examples: Reporter gene expression (Urwin et al., 1998)

ᵃ Linked integration of the transgenes can be advantageous when the transgenic line is to be used in traditional breeding, whereas linked integration can be undesirable when one of the transgenes (e.g. the selectable marker gene) is to be removed via outcrossing.
ᵇ IRES, internal ribosome entry site.
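
The table's entry 'dependence on co-transformation frequency' can be made concrete with a small probability sketch. It assumes, purely for illustration, that each of n unlinked transgenes integrates independently with the same probability p; this is a simplifying assumption, and linked integration at a single locus would violate it. The function names and the value p = 0.7 are hypothetical and are not taken from any cited study.

```python
import math


def prob_all_genes(p: float, n: int) -> float:
    """Probability that one plant carries all n transgenes,
    assuming each integrates independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return p ** n


def plants_needed(p: float, n: int, confidence: float = 0.95) -> int:
    """Smallest number of transgenic plants to screen so that, with the
    given confidence, at least one plant carries all n transgenes."""
    q = prob_all_genes(p, n)
    if q <= 0.0:
        raise ValueError("no plant can carry all genes when p is 0")
    if q >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - q))


if __name__ == "__main__":
    # Hypothetical per-gene co-integration probability (illustrative only).
    p = 0.7
    for n in (2, 5, 9):
        print(n, round(prob_all_genes(p, n), 4), plants_needed(p, n))
```

Under this independence assumption the required population grows quickly with the number of genes, which is why a co-integration frequency higher than the independent expectation matters in practice.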

gene encodes the insecticidal Cry3Bb1 protein with activity against specific Coleoptera, whereas the fourth gene product, Cry1Ab, provides tolerance against certain Lepidopteran insects. The fifth gene is a bacterial kanamycin resistance gene encoding neomycin phosphotransferase (nptII). The pentuple stack maize currently occupies millions of hectares in the USA and supports a substantial reduction in pesticide usage.

12.6.2 Co-transformation via plasmids

Co-transformation is defined as the simultaneous introduction into a cell of multiple genes followed by the integration of the genes in the cell genome. The genes are either present on the same plasmid used in transformation (single-plasmid co-transformation) or on separate plasmids (multiple-plasmid co-transformation). The main advantage of co-transformation for transfer of multiple genes into a plant is that a single transformation event can result in the integration of multiple transgenes, as opposed to sequential transformation, which requires multiple, time-consuming transformation events.

Co-transformation nevertheless has some technical limitations. For single-plasmid co-transformation, the main technical limitation is the difficulty of assembling complex plasmids with multiple gene cassettes (François et al., 2002a). Standard transformation vectors are poorly suited to this task. A major problem is that their multiple cloning sites consist merely of hexa-nucleotide restriction sites, which are often present within one or more of the sequences that one wishes to insert in the vector. Insertion of more than one or two expression cassettes often requires inefficient partial digests, the use of linkers to convert one restriction site to another, or inefficient blunt-end cloning. When plant transformation vectors with multiple expression cassettes are eventually finalized, it is often not possible to move or replace the cassettes in single cloning steps, due to the presence of restriction sites at undesired locations.

Co-transformation with multiple plasmids has the obvious advantage that assembly of the different expression cassettes is technically easier as it is done independently on different plasmids (Komari et al., 1996). The success of this technique depends on the frequency with which two (or more) independent transgenes are both transferred to the plant cell and integrated into the cell genome (= co-transformation frequency). Agrawal et al. (2005) transformed rice simultaneously with five minimal cassettes, each containing a promoter, coding region and polyadenylation site but no vector backbone. They found that multi-transgene co-transformation was achieved with high efficiency using multiple cassettes, with all transgenic plants generated containing at least two transgenes and 16% containing all five. They concluded that gene transfer using minimal cassettes is an efficient and rapid method for the production of transgenic plants containing and stably expressing several different transgenes. Their results facilitate effective manipulation of multi-gene pathways in plants in a single transformation step.

12.6.3 Co-transformation via particle bombardment

Particle bombardment is the most convenient method for multiple gene transfer to plants since DNA mixtures comprising any number of different transformation constructs can be used, with no need for complex cloning strategies, multiple Agrobacterium strains or sequential crossing (Altpeter et al., 2005a). Many studies describe successful integration of two or three different transgenes, in addition to the selectable marker, into plants by particle bombardment.

Wu, L. et al. (2002) examined the co-transformation of rice with nine transgenes via particle bombardment and documented the levels of transgene expression. They found that non-selected transgenes were present along with the selectable marker in about 70% of the plants and that 56% carried seven or more genes. This was much

higher than expected given the independent integration frequencies, agreeing with a model proposing that the integration of one gene into a specific locus in the rice genome could mediate the insertion of other genes into the same locus (Kohli et al., 1998). This phenomenon is important when large numbers of genes are considered, since a much larger transgenic population would be required if each integration event were independent. Wu, L. et al. (2002) also found that all nine transgenes were expressed and that the expression of each gene was independent of the others. These findings are very useful in designing multiple plasmid transformation experiments such as those required for plant metabolic engineering.

One of the most interesting recent developments of particle bombardment is the combination of multiple gene transfer and clean DNA techniques, i.e. the simultaneous transfer of multiple gene cassettes into rice plants. Three coat protein genes from the same virus were introduced simultaneously to generate rice plants with pyramidal resistance against a single pathogen (Sivamani et al., 1999). Similarly, Maqbool et al. (2001) have shown how the same transformation strategy can provide pyramidal insect resistance in rice. Datta et al. (2003) have succeeded in the development of Golden indica rice lines containing four genes, i.e. those required to extend the existing carotenoid metabolic pathway (psy, crtI and lcy) in addition to the selectable marker gene, either phosphomannose isomerase (pmi) or hygromycin phosphotransferase (hpt). Romano and colleagues synthesized polyhydroxyalkanoates (PHAs) in transgenic potatoes by simultaneously introducing the phaG and phaC genes, encoding acyl-CoA trans-acylase and PHA polymerase, along with the neomycin phosphotransferase selectable marker in three separate constructs (Romano et al., 2005).

In addition to applications in metabolic engineering and multi-gene resistance strategies, the direct transfer of multiple genes has also become a practical strategy for generating crops that produce multimeric proteins. For example, Nicholson et al. (2005) produced full-sized multimeric antibodies in transgenic plants. These proteins comprise at least two components, the heavy and light chains, but more complex antibody forms such as secretory antibodies (sIgA) also require a joining chain and a secretory component. Nicholson et al. (2005) simultaneously delivered all four genes, together with a fifth gene encoding a selectable marker, into rice by particle bombardment.

For many applications of transgenesis, production of different heterologous proteins, and hence introduction of multiple transgenes (multi-transgene-stacking), is highly desired. During the last decade, the number of approaches for multi-transgene-stacking in plants using transgenesis has significantly increased. For all the benefits and simplicity of combining co-transformation, retransformation and crosses while using single-gene vectors for the delivery of multiple genes into plant species, these methods suffer from several drawbacks. These include the undesirable incorporation of a complex T-DNA integration pattern, often observed during integration of T-DNA molecules from multiple sources (De Neve et al., 1997; De Buck et al., 1999), and the time needed for retransformation or crosses between transgenic plants. More importantly, transgenes derived from different sources typically integrate at different locations in the plant genome, which may lead to various expression patterns and possible segregation of the transgenes in the offspring.

Besides those discussed above, other approaches for transgene stacking include vector assembly, internal ribosome entry site (IRES), transplastomic technology and the polyprotein approach. After evaluating the pros and cons of the different methods, one should now be able to select an appropriate approach for most purposes. Moreover, the potential of the different methods can be significantly increased by combining approaches. For example, for the delivery of different antimicrobial protein (AMP) genes, it has been possible to double the capacity of a modular plant transformation vector by combining it with a polyprotein strategy (Goderis et al., 2002). For this purpose, single transgene units of the original vector were replaced by

poly-AMP encoding expression cassettes and transformed into A. thaliana. Single biologically active AMPs could be demonstrated in the resulting transgenic plants.

12.7 Transgenic Crop Commercialization

Genetic transformation has the potential to address some of the most challenging biotic and abiotic constraints faced by farmers in non-industrialized agriculture, which are not easily addressed through conventional plant breeding alone. The major constraints include insect pests and viruses, as well as drought. A second advantage of genetic transformation is that it can add an economically valuable trait while maintaining other desirable characteristics of the host cultivar. For example, enhanced product quality or micronutrients can be added to a well-adapted cultivar that already yields well under local conditions. This feature is particularly attractive for semi-commercial, small-holder farmers in non-industrialized agriculture, who are more likely to consume as well as sell their farm products. The poor of the developing world should benefit from the deployment of desirable transgenic crops that follow scientifically sound biosafety and food safety standards and appropriate intellectual property management and stewardship (Ortiz and Smale, 2007).

Intrinsic to the production of transgenic plants is an extensive evaluation process to verify whether the inserted gene has been stably incorporated without detrimental effects on other plant functions, product quality, or the intended agroecosystem. Initial evaluation includes attention to activity of the introduced gene, stable inheritance of the gene and unintended effects on plant growth, yield and quality.

If a plant passes these tests, it may not be used directly for crop production, but will be crossed with improved cultivars of the crop. This is because not all cultivars of a given crop can be efficiently transformed, and those that can generally do not possess all the producer and consumer qualities required of modern cultivars. The initial cross to the improved cultivar must be followed by several cycles of repeated backcrosses to the improved parent. The goal is to recover as much of the improved parent's genome as possible, with the addition of the transgene from the transformed parent.

The next step in the process is multi-location and multi-year evaluation trials in greenhouse and field environments, as described in Chapter 10, to test the effects of the transgene and overall performance. This phase also includes evaluation of environmental effects and food safety.

12.7.1 Commercial targets

Commercialization of transgenic products is influenced by markets, i.e. consumer demand for improved processes and new products, and depends on technology, i.e. scientific discoveries in molecular genetics and biochemistry. Some examples of commercial targets include:

1. Hybrid seed systems for heterosis and intellectual property protection, such as nuclear male sterility systems for inbred line production.
2. Pest and disease tolerance genes: Bt genes, α-amylase inhibitors, viral coat proteins.
3. Stress tolerance genes: barley Hva1, maize ZmPLC1.
4. Herbicide resistant crops: mutation screens for resistance to sethoxydim (Poast), an acetyl-CoA carboxylase (ACCase) inhibitor; transgenic plants for glyphosate (Roundup) resistance.
5. Genes for commercially valuable oils, proteins and starches: fatty acid biosynthetic gene modification in high oil corn; modification of seed storage proteins; generation of transgenic corn with improved amino acid profiles; manipulation of carbon-partitioning genes for novel starch production.
6. Genes for improved plant performance: generation of dwarf cultivars of wheat and rice; PhyA expression for narrow-row crop production.

Most of these potential products generate revenue by lowering the costs (financial and/or environmental) of plant production,

e.g. reducing the level of chemical inputs such as insecticides for both pests and viral vectors. The following are examples of transgenes of agronomic importance that have been introduced into transgenic cereals:

Maize:
1. Insect resistance: synthetic truncated version of the CryIA(b) protein from Bacillus thuringiensis for tolerance to European corn borer.
2. Virus resistance: coat protein-mediated tolerance to maize dwarf mosaic virus.
3. Herbicide resistance: the bar gene for PPT (Liberty) tolerance; mutant EPSP synthase (epsps) genes for glyphosate (Roundup) tolerance; and mutant als gene for sulfonylurea (Glean) tolerance.

Rice:
1. Resistance to fungal and bacterial pathogens: chitinase gene conferring enhanced tolerance to sheath blight; Xa-21 bacterial blight resistance gene.
2. Virus resistance: coat protein-mediated rice stripe virus tolerance; coat protein-mediated rice dwarf phytoreovirus tolerance.
3. Insect resistance: Bt CryIA(b) gene expression for leaf folder and stem borer tolerance.
4. Improved grain quality: bean β-phaseolin seed storage gene expression in endosperm for improved lysine and isoleucine levels.

Barley:
1. Virus resistance: coat protein-mediated barley yellow dwarf virus tolerance.
2. Improved malting/brewing characteristics: hybrid bacterial β-glucanase gene expression for enzyme thermotolerance.

Wheat:
1. Improved bread-making characteristics: chimeric Dy10-Dx5 high molecular weight gluten gene expression in endosperm.
2. Transgenes conferring herbicide resistance: the bar gene for PPT (Liberty) tolerance; mutant EPSP synthase (EPSPS) genes for glyphosate (Roundup) tolerance.

12.7.2 Current status of transgenic crop commercialization

Commercial adoption by farmers of transgenic crops has been one of the most rapid cases of technology diffusion in the history of agriculture (Borlaug, 2000). Commercialization of transgenic crops started in 1996. Fig. 12.6 provides data on the global areas of biotech/GM-crops grown

Fig. 12.6. Global area of biotech/GM crops (1996–2007), in million ha, shown as the total for 23 biotech crop countries and separately for industrial and developing countries. From James (2008) with permission.

over the last 12 years (1996–2007) (James, 2008). As a result of consistent and substantial benefits during the first dozen years of commercialization, farmers have continued to plant more biotech/GM-crops every single year. In 2007, the global area of biotech/GM-crops reached 114.3 million ha, an unprecedented 67-fold increase between 1996 and 2007, making it the fastest adopted crop technology in recent history. The proportion of the global area of biotech/GM crops grown by developing countries has increased consistently and by 2007, 43% of the global biotech crop area, equivalent to 49.4 million ha, was grown in developing countries (Table 12.4). The USA, followed by Argentina, Brazil, Canada, India and China, are the principal adopters of biotech/GM crops globally, with the USA retaining its top world ranking with 57.7 million ha (50% of global biotech area) (Table 12.4). Notably, 63% of biotech/GM-maize, 78% of biotech/GM cotton and 37% of all biotech/GM crops in the USA in 2007 were stacked products containing two or three traits that delivered multiple benefits.

Soybean is the principal biotech/GM-crop, occupying 58.6 million ha (51% of global biotech/GM area), followed by fast-growing maize (35.2 million ha at 31%), cotton (15.0 million ha at 13%) and canola (5.5 million ha at 5% of global biotech/GM-crop area) (Fig. 12.7A; James, 2008). Since the genesis of commercialization in 1996, herbicide tolerance has consistently been the dominant trait (Fig. 12.7B). In 2007, herbicide tolerance, deployed in soybean, maize, canola, cotton and lucerne, occupied 63% or 72.2 million ha of the global biotech/GM-crops.

The most recent survey of the global impact of biotech/GM-crops for the period 1996–2006 estimates that the global net economic benefits to biotech/GM-crop farmers in 2006 were US$7 billion, and US$34 billion (US$16.5 billion for

Table 12.4. Global area of biotech/GM-crops in 2007 by country (from James (2008) with permission).

Rank  Country         Area (million hectares)  Biotech/GM crops
1ᵃ    USA             57.7                     Soybean, maize, cotton, canola, squash, papaya, lucerne
2ᵃ    Argentina       19.1                     Soybean, maize, cotton
3ᵃ    Brazil          15.0                     Soybean, cotton
4ᵃ    Canada          7.0                      Canola, maize, soybean
5ᵃ    India           6.2                      Cotton
6ᵃ    China           3.8                      Cotton, tomato, poplar, petunia, papaya, sweet pepper
7ᵃ    Paraguay        2.6                      Soybean
8ᵃ    South Africa    1.8                      Maize, soybean, cotton
9ᵃ    Uruguay         0.5                      Soybean, maize
10ᵃ   Philippines     0.3                      Maize
11ᵃ   Australia       0.1                      Cotton
12ᵃ   Spain           0.1                      Maize
13ᵃ   Mexico          0.1                      Cotton, soybean
14    Colombia        < 0.1                    Cotton, carnation
15    Chile           < 0.1                    Maize, soybean, canola
16    France          < 0.1                    Maize
17    Honduras        < 0.1                    Maize
18    Czech Republic  < 0.1                    Maize
19    Portugal        < 0.1                    Maize
20    Germany         < 0.1                    Maize
21    Slovakia        < 0.1                    Maize
22    Romania         < 0.1                    Maize
23    Poland          < 0.1                    Maize

ᵃ Thirteen biotech mega-countries growing 50,000 ha or more of biotech/GM crops.

Fig. 12.7. Global area of biotech/GM crops (1996–2007), in million ha. (A) By crops (soybean, maize, cotton, canola). (B) By traits (herbicide tolerance, insect resistance, herbicide tolerance/insect resistance). From James (2008) with permission.

developing countries and US$17.5 billion for industrial countries) for the accumulated benefits during the period 1996–2006; these estimates include the very important benefits associated with the double cropping of biotech/GM-soybean in Argentina (Brookes and Barfoot, 2008). The cumulative reduction in pesticides for the period 1996–2006 was estimated at 289,000 t of active ingredient, which is equivalent to a 15.5% reduction in the associated environmental impact of pesticide use on these crops, as measured by the Environmental Impact Quotient (EIQ), a composite measure based on the various factors contributing to the net environmental impact of an individual active ingredient.

While 23 countries planted commercialized biotech/GM-crops in 2007, an additional 29 countries, totalling 52, have granted regulatory approvals for biotech/GM-crops for import for food and feed use and for release into the environment since 1996. A total of 615 approvals have been granted for 124 events for 23 crops. Thus, biotech/GM-crops have been accepted for import for food and feed use and for release into the environment in 29 countries, including major food importing countries like Japan, which do not plant biotech/GM-crops.
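
The 2007 area figures quoted in this section (from James, 2008) can be cross-checked with a few lines of arithmetic. The sketch below recomputes the percentage shares and the 1996 area implied by the 67-fold increase. The variable and function names are illustrative only, and the numbers are simply those cited in the text.

```python
# Areas in million hectares, as quoted in the text for 2007 (James, 2008).
TOTAL_2007_MHA = 114.3
CROP_AREA_MHA = {"soybean": 58.6, "maize": 35.2, "cotton": 15.0, "canola": 5.5}
OTHER_AREAS_MHA = {
    "USA": 57.7,
    "developing countries": 49.4,
    "herbicide tolerance": 72.2,
}
FOLD_INCREASE_1996_2007 = 67


def share_percent(area_mha: float, total_mha: float = TOTAL_2007_MHA) -> float:
    """Share of the global biotech/GM area, in percent."""
    return 100.0 * area_mha / total_mha


if __name__ == "__main__":
    for name, area in {**CROP_AREA_MHA, **OTHER_AREAS_MHA}.items():
        print(f"{name}: {area} million ha = {share_percent(area):.1f}% of total")

    # The four principal crops together account for the quoted global total.
    print(f"sum of four crops: {sum(CROP_AREA_MHA.values()):.1f} million ha")

    # A 67-fold increase implies a 1996 area of roughly 1.7 million ha.
    implied_1996 = TOTAL_2007_MHA / FOLD_INCREASE_1996_2007
    print(f"implied 1996 area: {implied_1996:.2f} million ha")
```

Rounded to whole percentages, the recomputed shares agree with the values given in the text (51%, 31%, 13% and 5% for the four crops; 50%, 43% and 63% for the USA, developing countries and herbicide tolerance).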

The most important potential contribution of biotech/GM-crops will be their contribution to the humanitarian Millennium Development Goals (MDG) of reducing poverty and hunger by 50% by 2015. With a dozen years of accumulated knowledge and significant economic, environmental and socio-economic benefits, biotech crops are poised for even greater growth in coming years, particularly in developing countries that have the greatest need for this technology. The number of biotech/GM-crop countries, crops and traits and hectarage are projected to double between 2006 and 2015, the second decade of commercialization (James, 2008).

Despite globally organized opposition, few innovations in agriculture have spread as rapidly as GM-crops. Still, much remains to be done, particularly the expansion of disease-resistant cultivars, increased yields, biofortification of food for poor consumers, substitution of plant-produced targeted endotoxins for broad-band pesticides and, perhaps most crucially, drought-tolerant and salt-tolerant cultivars (Herring, 2008). Disaggregating the concept of the genetically modified organism (GMO) is a necessary condition for confronting misconceptions that constrain the use of biotechnology in addressing imperatives of development and escalating challenges from nature. There are still several problems associated with commercialization. There is a high investment cost associated with the long lead time (8–10 years) for products to reach the marketplace; profitability of some technology-push products at the onset of product development is uncertain; intellectual property issues limit freedom to operate with key technologies; and uncertainty associated with regulatory and consumer acceptance issues inhibits trade and investment.

12.7.3 Regulating transgenic crops

There are many reasons why governments regulate and oversee processes and products for transgenic crops (Jaffe, 2004). One major concern about the use of GM-foods is that the molecular alterations designed to produce a beneficial trait could also result in unintentionally hazardous effects. Individual transgenic crops could potentially present risks to humans or the environment, although there is no evidence that this has happened over the 10 years of commercialization. In general, a strong, but not stifling, regulatory system needs to be established and properly implemented to ensure that crops are safe for humans and the environment. Other reasons include fraud avoidance and social, ethical and public concerns.

Risk assessment

All transgenic plants are required to undergo thorough and rigorous safety and risk assessments before commercialization. A risk assessment consists of hazard identification, hazard characterization, exposure assessment and risk characterization (Codex Alimentarius Commission, 2001; Craig et al., 2008; Nickson, 2008). Regulatory justifications for these assessments differ between countries. In most countries there are two kinds of regulations that govern research and development of transgenic plants: (i) contained-use rules governing genetic modification in the laboratory, concentrating mainly on worker health and safety issues; and (ii) field-release regulations focusing on environmental risk assessment appropriate to the nature and final use of the transgenic plant. Each release is considered case by case to build up experience with particular crop and transgene combinations.

The United States National Research Council identified four categories of potential environmental hazards from the release of a transgenic crop:

    (i) hazards associated with the movement of the transgene itself with subsequent expression in a different organism or species, (ii) hazards associated directly or indirectly with the transgenic plant as a whole, (iii) non-target hazards associated with the transgene product outside the plant and (iv) resistance evolution in the targeted pest population.
    (NRC, 2002)

The potential human hazards from transgenic crops are also generally recognized by

the scientific and regulatory communities (Jaffe, 2004). The potential risks generally relate to:

    the possibility of introducing new allergens or toxins into food-plant varieties, the possibility of introducing new allergens into pollen, or the possibility that previously unknown protein combinations now being produced in food plants will have unforeseen secondary or pleiotropic effect.
    (NRC, 2001)

However, not all tests currently being applied to assessing allergenicity have a sound scientific basis (Goodman et al., 2008). Therefore, factors to be borne in mind in the risk assessment should include, but are not limited to: (i) the function of the gene in the donor organism; (ii) the effect of the transgene on the phenotype of the transgenic plant; (iii) evidence of toxicity and/or allergenicity, e.g. Brazil nut seed storage protein; (iv) persistence in agricultural habitats (weediness); (v) invasiveness in natural habitats; (vi) impact on non-target organisms, e.g. for Bt maize, regulation requires extensive analysis to identify any potential problem; and (vii) the likelihood/consequence of transgene movement to other plants by cross-pollination or to other (pathogenic) organisms by horizontal gene transfer, e.g. sexual compatibility between cultivated oats (Avena sativa) and wild oats (Avena fatua), viral host range extension by transgene-encoded coat protein transencapsidation or genetic recombination. More recent discussion on several specific issues can be found in Craig et al. (2008), Nickson (2008), Romeis et al. (2008) and Tabashnik et al. (2008).

The risk assessment process for transgenic plants consists of two steps: (i) a comparative analysis (substantial equivalence) to identify potential differences with their non-engineered counterpart(s); followed by (ii) an assessment of the environmental and food/feed safety or nutritional impact of any identified differences (Ramessar et al., 2007).

Regulatory systems

There are two kinds of regulation systems (the data assessment in both is similar): (i) horizontal, a process-based system that applies to all plants produced by transformation methods, e.g. Europe; and (ii) vertical, a product-based system that defines the characteristics of modified plants that require them to be regulated, e.g. the USA.

In the USA, transgenic plants are regulated by three federal agencies: the Department of Agriculture (USDA), the Environmental Protection Agency (EPA) and the Food and Drug Administration (FDA). The USDA controls permits for inter-state movement of transgenic materials, assesses the pest character of transgenic plants, and determines when transgenic plants can be field-grown without notification or permits. The FDA determines whether a transgenic plant has been adequately evaluated in accordance with its biotechnology food and feed policy, e.g. for the safety of antibiotic selectable marker genes. The EPA regulates plants with pesticide properties, e.g. Bt plants, and registers herbicides to be used on herbicide-resistant plants.

In the European Union (EU), EU Directive 2001/18/EC (European Parliament, 2001) sets forth regulations governing the deliberate release into the environment of GMOs. The Directive put in place a step-by-step approval process based on a case-by-case assessment of the risks to human health and the environment before any GMO or products consisting of or containing GMOs can be released into the environment or placed on the market (European Parliament, 2003). There are no European Community (EC)-wide regulations governing novel feed and foods, and some countries have established their own national regulations.

In Japan, transgenic plants are regulated by the Ministry of Agriculture, Forestry and Fisheries and the Ministry of Health and Welfare. In Canada, transgenic plants are regulated by Agriculture and Agri-Food Canada and Health Canada. Novel products are regulated in the same way whether they are generated by mutation or transformation.

There is no international harmonization of regulations to ensure that transgenic plant cultivars released in one country will be accepted in another. Antibiotic resistance genes in food products might inhibit international trade in transgenic products.

12.7.4 Product release and marketing strategies

In order to recover their substantial research investment, the developer of a potential commercial product (e.g. a university, government agency, technology development company or large agrochemical company) must either generate and market the transgenic seed directly or negotiate a royalty with one or more seed companies. The marketing strategies for any one particular trait will be influenced by the nature of the product. Herbicide, disease or stress tolerance traits which enhance yield or reduce inputs will help increase market shares and/or increase seed sale premiums, e.g. Bt maize. Herbicide-tolerant crops will help the developer benefit from increased chemical sales, e.g. Roundup Ready maize. Improved grain quality, for on-farm/downstream processing uses, will influence the seed sale premium, e.g. high lysine maize.

12.7.5 Monitoring transgenes

One of the principal concerns about GM-crops is the likelihood and possible consequence of the introduced transgenes being transferred through pollen dispersal to wild relatives or non-transgenic crops (Chandler and Dunwell, 2008). For pollen-mediated gene flow to occur among plant populations, dispersal of pollen to a different population must occur with successful fertilization of an ovule. Although the movement of pollen is a critical step in transgene escape, there are currently few systems for the direct monitoring of transgenic pollen movement under field conditions. Previous attempts to measure gene flow have revolved around the analyses of genetic markers (Slatkin, 1985) as discussed in Chapter 13. These systems have limitations because they are species-specific and require expensive assays that hardly yield results in real time or in the field. Shi et al. (2008) reported on the gene flow between foxtail millet (Setaria italica), an autogamous crop, and its weedy relative, Setaria viridis, growing within or beside fields containing three kinds of inherited herbicide resistance: dominant, recessive, or maternal. Over the 6-year study, in the absence of herbicide selection, the maternal chloroplast-inherited resistance was observed at a frequency of 2 × 10⁻⁶ in the weed populations. Resistant weed plants were observed 60 times as often, at 1.2 × 10⁻⁴, in the case of the nuclear recessive resistance and 190 times as often, at 3.9 × 10⁻⁴, in the case of the dominant resistance. The results indicated that the hereditary mode of transmission of transgenes played a major role in interspecific gene flow. More recently, visual markers such as GFP have been proposed for use, using whole plant expression to monitor gene flow under agricultural conditions. This method has been used successfully to assess outcrossing events in canola (Brassica napus) under field conditions (Halfhill et al., 2004a). A direct method could be the use of GFP-tagged pollen to monitor pollen movement under field conditions. This system would allow the quantification of pollen flow directly from a group of individuals in the field and would determine the distance and directional patterns of pollen dispersal within a plant population. In Hudson et al.'s (2004) report, a pollen-specific promoter was used to express the GFP gene in tobacco (Nicotiana tabacum L.). GFP was visualized in pollen and growing pollen tubes using fluorescence microscopy. Furthermore, the goal of the research was to compare the dynamics of pollen movement with that of gene flow by using another method, whole plant expression of GFP, to estimate the outcrossing rate by progeny analysis. Pollen movement and gene flow were quantified under field conditions. Pollen was collected in traps and screened for the presence of GFP-tagged pollen using fluorescence microscopy. Progeny from wild-type plants were screened with a hand-held ultraviolet light for detection of the GFP phenotype. It should be noted that the GFP gene is derived from an animal (the jellyfish Aequorea victoria) and thus it should be handled very carefully. The examples given here are only proposals by researchers and they are not used for commercial purposes.

A built-in strategy was developed to create selectively terminable transgenic rice,

where the transgenic rice plants mixed in the conventional rice could be selectively eliminated by a spray of bentazon, a herbicide commonly used for rice weed control (Lin et al., 2008). The gene(s) of interest is tagged with an RNAi cassette, which specifically suppresses the expression of the bentazon detoxification enzyme CYP81A6 and thus renders transgenic rice sensitive to bentazon. Transgenic rice plants were generated by this method using a new glyphosate-resistant EPSPS gene from Pseudomonas putida as the gene of interest and it was demonstrated that these transgenic rice plants were highly sensitive to bentazon but tolerant to glyphosate, which is exactly the opposite of conventional rice. Field trials of these transgenic rice plants further confirmed that they could be selectively killed at 100% by one spray of bentazon at a regular dose used for conventional rice weed control. Furthermore, it was found that the terminable transgenic rice created showed no difference in growth, development and yield compared to its non-transgenic control. Therefore, this method of creating transgenic rice constitutes a novel strategy of transgene containment, which appears simple, reliable and inexpensive for implementation.

Metabolomics is being developed to assess the safety of GM-foods (Kuiper et al., 2003). Because society demands that producers demonstrate substantial equivalence between transgenic and non-transgenic crop plants, metabolite profiling is expected to provide a reliable means of detecting differences in metabolite levels between the two types of plants and identifying potential problems.

Monitoring of transgenic crops has been a wide concern among the centres of the Consultative Group on International Agricultural Research (CGIAR). However, decisions, policies and procedures about monitoring should be science-based and this requires education, an area where the CGIAR centres can play an important role. There will be a need to continue to evaluate the need for and type of monitoring, as new (and unique) products are developed and released in the emergent economies of the world (Hoisington and Ortiz, 2008).

Regarding future developments in genetic transformation, new techniques for producing transgenic plants will improve the efficiency of the process and will help resolve some of the environmental and health concerns. Among the expected changes are the following: (i) more efficient transformation, that is, a higher percentage of plant cells will successfully incorporate the transgene; (ii) better marker genes to replace the use of antibiotic resistance genes; (iii) better control of gene expression through more specific promoters, so that the inserted gene will be active only when and where needed; and (iv) transfer of multigene DNA fragments to modify more complex traits.

12.8 Perspectives

Improvement of crop plants through genetic transformation, also called transgenic breeding, is one of the two major approaches in molecular breeding. It can utilize genes from any organism, by which obstacles associated with sexual hybridization can be overcome. In addition, transgenic breeding provides a quick approach to pyramid genes of different sources into one genetic background. There are numerous examples of transgenic plants with single genes or traits incorporated. Typical examples include GM-crops containing genes for pest and disease resistance and for improved quality as discussed in Section 12.7.2. Examples of GM-crops with multiple traits that have been improved are Monsanto's multi-stacked maize as discussed in Section 12.6.1 and an ongoing project in China to develop green super hybrid rice by stacking genes for many traits including insect resistance, disease resistance, nutrient efficiency, drought tolerance, grain quality and yield (Zhang, 2007).

Transformation-based gene transfer should be integrated with genomics and other molecular approaches. For example, functional genomics, discussed in Chapters 3 and 11, will bring new frontiers and horizons to transgenic breeding by providing

more genes with well-characterized function and tools optimized for transgene expression. Molecular markers can be used to facilitate the transformation process, to transfer the transgenes to a different genetic background and to identify and select transgenic plants as discussed in Chapter 13. Genetic transformation will also be increasingly combined with conventional breeding approaches, which will contribute to improved breeding efficiency.

As technology develops in transgenic breeding, transformation technologies that limit many steps in transgenic breeding will become less demanding compared to the discovery and characterization of genes and commercialization of transgenic products.

It can be expected that transgenic breeding will become increasingly important by producing good-quality and high-yielding agricultural products. All regulatory and biosafety issues, both of which are man-made and currently slow or stop the adoption of transgenic crops by farmers in many countries, will be brought under control at a reasonable level.
13
Intellectual Property Rights and Plant Variety Protection

Intellectual property rights (IPR), especially patents, are becoming increasingly important as crop improvement increasingly becomes an industrial process with large private sector investments driven by an expectation of high economic returns. Biotechnological inventions, particularly in the field of agri-biotechnology, are increasing with the worldwide expansion of the cultivation of transgenic plants and the increasing reliance of commercial breeding programmes on genomics tools such as marker-assisted selection (MAS). Proprietary control over many new breeding technologies and plant variety protection (PVP) of the products of crop improvement is increasingly influencing the nature and extent of public funding available for applied research in this area. Genomics tools also provide a means for DNA fingerprinting plants in order to obtain information on the relationship between new breeding lines and commercial cultivars which can be used in the protection of plant cultivars once they have been registered.

Intellectual property (IP) can take the form of genes, markers, technological processes, information and concepts and in some countries even whole plants. Major criteria for granting patents require that the IP be novel and non-obvious, useful and not previously disclosed. A particular piece of IP may be owned exclusively or jointly by an individual, a group of individuals, an organization/company, or a government. An exclusive owner of a particular piece of IP may choose to do nothing with it; use or practise it directly; or sell, license or gift it to others (Boyd, 1996). For a detailed coverage of IPR, please refer to Krattiger et al. (2006).

As has been chronicled extensively elsewhere, patents covering modified living organisms were first approved by the US Supreme Court in 1980. Since then, IPR covering organisms and/or their components have become commonplace in many countries and expanded internationally through treaties such as the International Convention for the Protection of New Varieties of Plants (known by its French acronym UPOV) and the World Trade Organization's (WTO) 1994 Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS). This chapter is devoted to various aspects of PVP including its needs, impacts, strategies and related international agreements. It also covers IPR that affect development and application of molecular breeding techniques in crop improvement. Publications that serve as key references for this chapter include Heitz (1998), the proceedings from a seminar on the use of molecular techniques for plant variety protection organized by the Canadian Food Inspection Agency and National Forum on Seed (CFIA/NFS,

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 501


502 Chapter 13

2005), Chan (2006), Louwaars et al. (2006), a publication by The International Bank for Reconstruction and Development and The World Bank that is cited as IBRD/World Bank (2006), Tripp et al. (2006) and Henson-Apollonio (2007).

13.1 Intellectual Property and Plant Breeders' Rights

13.1.1 Basic aspects of intellectual property

As a specialized area of law, IPR covers all things which emanate from the exercise of the human brain (Walden, 1998). Classically IPR have been divided into two groups: (i) industrial property (patents for inventions, trade secrets, trademarks, special rights for integrated circuits, etc.); and (ii) literary and artistic property (copyright, rights of performers, etc.; Walden, 1998; Ngetich, 2005) although molecular breeding programmes may pursue rights under either group. National governments have established IPR laws in order to achieve several goals: (i) to create incentives to stimulate new technological advances by providing mechanisms to ensure capture of financial returns on investments; (ii) to reward inventors with exclusive rights for a certain period of time thereby ensuring that others do not capture financial returns on the inventors' investments; and (iii) to create avenues for public disclosure of new technologies, which provides stimulus for further advances. In nearly all national legal systems, individuals and companies are usually able to acquire IPR, protect the economic returns on their investment and sell, lease or license those IPR to third parties.

National IPR laws vary in terms of the specific details of registration, scope and duration of protection. General IPR laws are often considered inadequate for certain industrial sectors, leading to the development of so-called sui generis systems to protect, for example, integrated computer circuits, databases and plant cultivars (e.g. plant breeders' rights, PBR). It is important to realize that IPR are only enforceable within the national territory within which they have been registered. Moreover, the level and nature of enforcement of IPR laws vary considerably across geographical regions. These factors significantly and differentially influence the product development and deployment strategies of commercial companies operating in different countries.

13.1.2 Intellectual property rights in plant breeding

Various attempts have been made to establish IPR systems for crop cultivars, but the concept of PVP and the possibility of plant patents emerged just under a century ago. Different IPR are now available to the plant breeding sector: PBRs, patents, trade secrets and trademarks are already highly important. In addition, copyright and database protection are likely to play an increasingly important role in molecular breeding. Very few countries (e.g. the USA and Japan) offer patent protection to cultivars so most plant breeders must still rely on PBRs for conventionally bred material. However, the increasing range of biotechnology interventions used in plant breeding can be patented, thus providing increasing scope for use of these patents in protecting new cultivars (Louwaars et al., 2006).

An IPR regime in plant breeding should perform two basic roles (IBRD/World Bank, 2006). First, in the interest of the public, the IPR regime should ensure that knowledge and materials enter the public domain at the earliest possible point and it should stimulate improvements and innovations that increase the choices available to farmers and consumers. Secondly, in the interest of the rights-holder, the IPR regime should provide opportunities for breeders to recover their investments, which may include the rights to recover royalties from farmers who save seed for planting the next season, directly for themselves or for sharing with neighbours through informal sale of the seed. In addition, the IPR regime should keep
IPR and PVP 503

competing commercial seed producers from multiplying and marketing the protected cultivar without a licence. Many breeding companies would like to keep competing plant breeders from using a protected cultivar or technology in the development of a new cultivar. In this case, they must use the patent system as PBRs have been explicitly structured to encourage rather than exclude this type of activity. The degree to which IPR and PVP systems in developing countries are able to limit these practices depends on economic, administrative and political factors (Tripp et al., 2006). A general prohibition on saving seed of protected cultivars is an unlikely strategy in most developing countries, although coordinated systems of cleaning and dressing farmer-saved seed while collecting royalties have been successful in some Organisation for Economic Co-operation and Development (OECD) countries.

Crop cultivars present several important challenges for an IPR system (IBRD/World Bank, 2006). First, they are biological products that are easily reproduced and whose very use entails multiplication. Secondly, the users (and potential copiers) of the technology are millions of individual farmers whose compliance with any protection regime is difficult and expensive to monitor, particularly in developing countries. Thirdly, the agricultural sector involves cultural values and food security issues that in many countries affect the livelihoods and even potential survival of the rural poor, making the imposition of any controls a sensitive political issue. Fourthly, the inherent diversity of crop cultivars makes it difficult to apply the narrow technical criteria of novelty and reproducibility used in the conventional patent system, whereas the use of standard breeding methodologies may frustrate the application of the inventive step criterion. Fifthly, the development of new crop cultivars has always relied to some extent on public research, partly in response to the traditional public goods nature of crop-related biodiversity. Thus, the application of IPR to the products of a publicly funded endeavour can be problematic. Sixthly, the increasing use of biotechnologies has brought additional challenges for the application of IPR in plant breeding.

Analysis of the historical seed-saving practices of soybean farmers in the USA indicates that large US farms have consistently saved seed, as much as 60% in some years. However, with the introduction of Roundup Ready soybeans the nature of seed saving was drastically changed. The combination of an expanding array of IPR on technologies used in the development of new breeding materials, new genetically modified (GM) technologies that in some countries have led to whole plant patents and the increasing application of industrial concepts to plant breeding has brought huge new private sector investments to plant breeding that are dramatically changing the nature of the business worldwide (Mascarenhas and Busch, 2006).

Not only do some countries allow the use of patents to protect plants, cultivars and genes, but the majority of the tools and processes of molecular biology and genetic transformation can be patented as well. Many of the biotechnology techniques, which are becoming increasingly important in conventional plant breeding, are also protected, thereby raising implications for the ownership of any cultivar resulting from their use. In addition, because biotechnology allows a much more precise understanding of the genetic make-up of any crop cultivar, it opens the door to sophisticated screening and reverse engineering techniques, which in turn offer new possibilities for utilizing protected cultivars, leading to pressure for more stringent protection.

Although a range of attempts have been made to provide a set of IPR for crop cultivars, only within the last few decades has a mechanism for PVP firmly taken hold in industrialized countries. International treaties such as UPOV and TRIPS are attempting to establish common features of certain IPR although most developing country signatories are slow to ratify and implement these agreements through their national laws. One of the most controversial of these features is contained in Article 27.3(b) of the TRIPS Agreement, which requires all WTO member states to

provide IP protection, through patents or an effective sui generis system or both, for crop cultivars.

13.2 Plant Variety Protection: Needs and Impacts

13.2.1 Needs for protection of crop cultivars

The agriculture and biotechnology sector as a whole has become a huge business with very large research and development (R&D) investments. For example, Syngenta had total sales in these areas of US$6340 million in 2004, while its total R&D investment was US$1738 million, which means about 27.4% of total sales was used for R&D. This compares with about 10% of profits reinvested into research for other crop science companies such as Monsanto (9.4%), BASF (8%), Pioneer Hi-Bred (10.9%), Bayer CropScience (11.4%) and Dow AgroSciences (9.9%). This R&D investment in 2004 ranged from US$350 million (Dow AgroSciences) to US$926 million (Bayer CropScience), compared to US$428 million for the entire multidisciplinary R&D budget across all Consultative Group on International Agricultural Research (CGIAR) research centres in 2004, of which only 5–10% is spent on biotechnology (Spielman et al., 2006).

Clearly, the CGIAR and many other public sector research organizations have little hope of competing with the private-sector agrobiotechnology investments. Instead, the public sector should strengthen their IP management capabilities in order to establish effective private sector partnerships that capitalize on these private sector investments and generate synergy with their own product development programmes. Seeking patents is the most frequently used approach in IP protection. For example, Pioneer alone submitted 463 applications to the USA and 176 to Australia in fiscal year 1999–2000. Considering all submitted patent applications, the percentage of patent applications that were granted by the end of 2004 varied greatly by country: the USA granted 81.4% of the 1040 applications while Australia approved about 30% of the 399 applications (Chan, 2006).

Plant breeding and crop biotechnology product development is a long-term process. It usually takes 7–12 years to develop a product from concept initiation to delivering a product to the market (Fig. 13.1), where this involves MAS or use of a transgene in cultivar development. Initial advances in biotechnology were heralded as offering substantial savings in time to bring new products to market. However, as the discipline has matured, the added value of the products possible through use of biotechnology tools has become increasingly important. Thus, the creation of a new crop cultivar continues to require a substantial investment in terms of skills, time, labour, resources and finances. The value of biotechnology investments for commercial agriculture, e.g. in OECD countries, now seems to be well established. However, the great remaining challenge is to apply these advances to the development of the new crop cultivars that are an essential tool for the improvements in sustainable agriculture and food security in developing countries. A new cultivar, once released to the market place, can in many cases be readily reproduced by others thereby preventing the original breeders from recovering the full profit on the investment. The granting of exclusive rights to a breeder of a new cultivar encourages future investments in plant breeding while the ability for other breeders to use that new cultivar in their breeding programmes contributes to the development of agriculture, horticulture and forestry. However, as companies make increasing investments in specific traits or processes there is an increasing pressure for them to protect these investments through patents.

Improved cultivars developed by breeders routinely replace old cultivars because they provide higher yield, better quality and stronger adaptability to ever-changing environments and market demands. The new cultivar is available on the market at a cost that is broadly similar to the cost of the cultivar(s) from the previous generation

[Figure 13.1: flow diagram. Stages, from discovery to market introduction: product concept; gene discovery; transformation or MAS; GH and field evaluation; line selection; variety development; production; market; post market. Projected time for the biotechnology product development process: 7–12 years.]

Fig. 13.1. Steps involved in crop biotechnology product development using transformation or
marker-assisted selection (MAS). GH, greenhouse.

and thus the marginal cost to the farmer of shifting cultivars is low. The experience of Argentina shows that the advent of protected cultivars did not increase the price of the seed: breeders and seed producers, to remain competitive on the seed market with unprotected cultivars, rationalized the production of and trade in their seed and the royalty was taken from the savings. However, the situation is not always comparable in developing countries where many farmers may be currently obtaining seed through informal systems and are thus faced with significant initial investment requirements in order to shift to a new cultivar.

Many countries have strong investments in public research of relevance to plant breeders. But experience shows that this approach is not enough: public funds cannot adequately cover the needs of every crop, every agroclimatic zone, every market preference, etc. In addition, the interaction between strategic research and product development, between public plant breeding and private-sector product deployment, is frequently deficient (Heitz, 1998). The recognition of these difficulties is one of the main reasons for the interest shown by many developing countries in PVP: PVP is therefore an indispensable element of the new seed policies that give an important role to both public and private-sector plant breeding.

The exercise of the breeders' rights

The way in which plant breeders choose to exercise their right depends upon many factors and the scope of the protection conferred on them is only one of the options on hand. The breeder of seed crops will seek to organize the production of commercial (certified) seed in a fairly loose manner and seek to collect a royalty at each multiplication stage (to spread the risks); he will apply a very open licence policy. The breeder of an ornamental plant will seek to organize the production and sale of cut flowers and not just propagating material.

Heitz (1998) gave two reasons why economic theories and constructions based upon the notion of monopoly are totally

inappropriate in the case of crop cultivars: (i) breeders are bound to associate with others (in effect, partners) to exploit their cultivars; the success of a particular cultivar and of the commercial strategy of its breeder is the result of many individual decisions; and (ii) the breeder of protected cultivars is almost always bound to compete with other breeders and their cultivars. Another relevant factor is the existence and scope of a farmers' privilege (the need to make commercial seed competitive with farm-saved seed).

The derived benefits

There are three main reasons for national agricultural research institutes (NARIs) to embrace IPR: recognition, technology access and transfer and revenue (Louwaars et al., 2006). In commercial breeding, the last reason prevails; IPR create additional value for the crop cultivar by providing a legal basis for licence contracts between the breeder and seed producers, which commonly include a royalty payment that serves as an important tool to recoup the investment in research. In public research, however, cultivar development is funded from public sources and research managers tend to put some emphasis on other research objectives. IPR formally link the cultivar to the institute and individual breeders. Furthermore, IPR may facilitate seed production when only an exclusive market will entice an individual seed producer to take a new cultivar into its product range (to facilitate technology transfer) and technology may be more easily acquired if patents can be traded.

The direct effect of PVP is to promote plant breeding. All countries which have become members of UPOV and whose agricultural sector is of a size that justifies investments in plant breeding have reported increases in the volume of plant breeding activities in developing countries with direct effects on their agriculture. However, mergers and acquisitions keep reducing the number of players in the seed sector in developed countries. The others have reported increases in the assortment of cultivars made available by foreign breeders.

Given the declining public funding of agricultural research in many countries, revenue generation is an attractive option for many public institutions. Income from IPR can support the institution to cover operational costs or hire additional staff and provides managers with a financial tool to support particularly innovative researchers or research groups. Public cultivars can generate a ready income, especially if cultivars bred in the past can be protected.

13.2.2 Impacts of plant variety protection

Plant breeding

MAINTENANCE BREEDING. The PVP system is not only there to encourage creative plant breeding activities. The full benefit of a new and improved cultivar can only be drawn if, first of all, the cultivar is properly maintained and, secondly, if authentic propagating material of the cultivar is made available to users. The PVP system ensures that the breeder has a lasting interest in ensuring these activities (at least as long as the cultivar is commercially successful).

GENETIC DIVERSITY. The increased number of breeding programmes which enter into competition, if this happens, implies a diversification of the programmes that results in an increasing probability of obtaining superior and genetically diverse cultivars. This scenario provides a strong counterbalance to the trend for uniformity that may be generated by the market demand in products (Heitz, 1998). However, the trend in the seed market for a few to be private providers of bred-seed worldwide contributes to a few crop cultivars that dominate the market, reducing genetic diversity. The pressure on the natural ecosystems can be lessened through: (i) providing uniformity for use of a cultivar in single fields (to allow rational and efficient production) but diversity for use across fields; and (ii) contributing to the widespread use of a relatively narrow, but improved, gene pool (to maximize

agricultural production) and to the preservation of the large gene pool that serves as a genetic reservoir for further progress.

PUBLIC RESEARCH. PVP also benefits public breeding. Public institutions use the system to generate income and optimize the exploitation of their cultivars. The income can be used as an argument to resist cutbacks in their programmes. In addition, PVP helps to organize optimal distribution of the tasks between the various partners, for example, by sharing the tasks in an organized manner and handling the competition between public and private breeding.

UPOV (2005) provided a report that stated the introduction of the UPOV system of PVP and membership of the UPOV can open a door to economic development, particularly in the rural sector as shown by their individual research in Argentina, China, Kenya, Poland and the Republic of Korea. This demonstrated that PVP can produce benefits in a range of ways that differ from country to country reflecting the specific circumstances of each. Hence, a key conclusion is that the UPOV system of PVP provides an effective incentive for plant breeding in many different situations and in various sectors, resulting in the development of new, improved cultivars of benefit to farmers, growers and consumers.

Agricultural production and trade

Plant breeders draw their revenue, at least in the case of the major crops, from the trade in seeds and plant material by choosing serious partners and driving less serious people out of the market. In many countries, breeders have created licensing societies, so that seed production is also organized on a fairly uniform basis across cultivars and species (Heitz, 1998). Seed markets respond to the changes because of the involvement of plant breeders and their products, cultivars that are protected by IPR. As a primary user, the farmer does not just receive genetic potential in a high quality seed. In the highly competitive markets, breeders also offer fringe benefits, for example in the form of crop insurance. They undertake extension activities, in particular comparative trials and provide farmers with detailed technical information on crop management.

International trade becomes even more important with the extension of international trade and the reinforcement of IPR. For example, a producer of a tropical fruit may be refused access to the consumer market on the basis of the breeder's right granted in the country concerned. Conversely, the producer operating on the basis of a licence concluded under the breeder's right valid in the production country will secure, in the licence agreement, the right to enter the consumer market (Heitz, 1998). Subsidies, as measures taken by national governments to protect local farmers from market forces or to ensure abundant availability of agricultural products, will affect the trade of other nations by artificially interfering with the global markets. The issue of PVP generally mirrors the debate on trade barriers created from lack of IP rights. That is, lack of IPR for innovations in plant breeding results in depriving the property holder of rightful royalties to which they would otherwise be entitled (Ragavan, 2006). The TRIPS Agreement was signed in 1994 to address the wider issue of facilitating international trade that was affected by the lack of IP protection in some countries. While introducing PBR provides one facet of the analysis concerning the reduction of distortions to trade, the discussion on subsidies provides a contextual understanding of all barriers, including those that are unrelated to IP, affecting agricultural trade.

Transfer of technology and know-how

PVP plays an essential role in the introduction of foreign cultivars and other novel genetic materials (and the associated technology) that enrich the assortment available to the farmer. It also contributes to the improvement of agricultural production, both in permitting such introduction and in speeding up the introduction.

Foreign plant breeders also take a more direct interest in breeding activities in specific regions, in particular by breeding plants for a specific environment
so that the resulting cultivars grow better in that environment. This is done both
through the organization of subsidiaries and through partnership or licensing
agreements. In both cases there is a flow of technology and know-how, in both
directions, as subsidiaries have to rely on the local seed trade.

Breeding strategies

Introducing the concept of revenue generation in public plant breeding is likely to
have an impact on the distribution of funds within the NARI and on the breeding
strategies applied. Louwaars et al. (2006) discussed the impact of IPR on plant breeding
strategies, which is summarized as follows.

First, IPR can be generated in plant breeding relatively easily compared with other
agricultural research undertakings. As a result, the pursuit of revenue could lead to
important disciplines such as soil science, socio-economics and plant pathology being
marginalized or downgraded to supporting only breeding efforts.

A second possible impact is that funds will be distributed more to crops with a high
value in seed production. These include, in general, crops that are produced for the
market (where investment in seed is common), which are difficult to reproduce on-farm
(e.g. cross-pollinated crops) and that have a low seed rate. In practical terms, this
means that maize-breeding programmes will get priority over those for open-pollinated
small grains, most pulses and root crops. The latter crops, however, may be important
for the nutrition security of most of the population.

The third level of impact is within breeding programmes themselves, where researchers
have to choose which ecological areas or client groups to target. Revenue generation
will focus breeding on commercial farmers and hybrids rather than on resource-poor
farmers who cannot afford to buy hybrid seed and have to use open-pollinated cultivars
instead. In the latter situation, the seed industry is unlikely to generate profits and
pay royalties to the breeder.

The shift to commercial crops and farmers may be consistent with recent changes in
national agricultural policy and trends of commercialization of public entities. In some
countries, however, the public task of a NARI is to support both equity and national
agricultural production. The trend towards crop diversification and breeding for
low-input agriculture, which means better yield stability, may be reversed when NARIs
focus on using IPR for revenue generation. Another strategy of a NARI may be to secure a
choice of cultivars for farmers in a market that may otherwise be dominated by large
commercial companies owing to IPR. However, this latter option also may shift research
priorities away from smallholder farmers' needs (Louwaars et al., 2006). Policy makers
and research managers need to carefully consider the impact of the use of IPR in public
breeding before including protection in their research strategies. If national public
organizations are not supposed to protect their inventions, governments will have to
provide the necessary funds for their research.

NARI organizations

Louwaars et al. (2006) discussed how PVP-related issues would greatly affect the NARI
organizations. When a NARI intends to commercialize its cultivars using IPR, it must
realize that the right holders are responsible for implementing their rights and that
the NARI needs capacities to design commercialization strategies and licence contracts,
as well as to follow up on these contracts. In addition, research managers have to be
aware that there are many costs involved in IP protection such as those for additional
personnel, IPR acquisition, implementation, application and maintenance fees. Commercial
decisions have to be made on which rights to apply for and when to surrender them. A
significant cost can arise when the rights have to be defended, especially against
experienced negotiators of commercial companies with significant resources.

While crop cultivars are in almost all cases freely used as parents in further
breeding, this is not the case in patented biotechnologies. Hence, NARIs will need to
develop ways and means to observe rights on technologies and materials that they use in
breeding. Most countries have a fairly liberal research exemption in their patent laws.
This situation commonly leads to a licence contract in which the patent holder can
specify the uses, the ways of commercialization and benefit sharing (royalty payment)
(Louwaars et al., 2006). A NARI needs therefore to identify possible risks associated
with the use of patented technologies. An IP plan needs to be developed for each
project, in which it is decided when and how contact will be established with the
technology provider.

The introduction of IPR brings new tasks and responsibilities to the NARI. It requires
not just access to lawyers, IP specialists, negotiators and marketers, but more
importantly, it calls for a shift in culture among the researchers. All researchers will
have to be aware of the potential impact of IPR on their work, when they commonly prefer
to concentrate on their own science and not be bothered by administrative rules. Senior
management will have to lead the way in this gradual shift, assisted by well-designed
capacity-building initiatives and support systems.

International agricultural research

The same considerations are important in international agricultural research (Louwaars
et al., 2006). Strategies for protecting inventions by the International Agricultural
Research Centres (IARCs) concentrate on the technology transfer argument on the one hand
and the original objective to develop international public goods on the other. Several
IARCs are developing agribusiness parks or other mechanisms to link them directly with
the private sector to provide additional routes for technology transfer. There are,
however, also cases in which an IARC may obtain research funds for developing cultivars
jointly with the private sector or obtain royalties on the commercialization of such
cultivars.

Another challenge IARCs face is to get access to protected technologies without
diminishing their primary task of poverty alleviation. Materials and tools may not be
used in research if the products cannot be made available to the target groups (i.e. the
resource poor) without restrictions. Humanitarian licences and cooperation agreements
should at least contain such provisions.

A less debated result of the spread of IPR on IARCs is the impact of the
commercialization of some NARIs on the capabilities of IARCs to reach the resource poor
(Louwaars et al., 2006). NARIs that concentrate their strategy on revenue generation
through IPR, and thus move away from producing solutions for resource-poor farmers in
favour of commercial production, may not always be suitable partners of IARCs for
reaching the poor. The latter may need to look for other ways, for example, through
non-governmental organizations and in some cases, direct contacts with seed producers.
All the IARCs have IPR policies, although most of these are still subject to adjustment
and elaboration. The increased use of IPR has caused IARCs to re-evaluate their modes of
interaction with both NARI organizations and seed companies. Various approaches have
been taken to ensure that NARI germplasm reaches the farmers for whom it is intended.

13.3 International Agreements Affecting Plant Breeding

The international agreements related to regulatory systems that affect plant breeding
include the TRIPS, the Convention on Biological Diversity (CBD), the International
Treaty on Plant Genetic Resources for Food and Agriculture and the discussions in the
Intergovernmental Committee on Intellectual Property and Genetic Resources, Traditional
Knowledge and Folklore of the World Intellectual Property Organization (WIPO). In
addition, the development of a Substantive Patent Law Treaty (SPLT) as discussed within
WIPO is likely to reduce the
current flexibility to protect plant cultivars (Wall Tvet, 2005). These agreements that
affect plant breeding are shown in Fig. 13.2.

[Figure 13.2: diagram linking four organizations to their agreements and to the subject
matter each agreement covers, all of which bear on breeders.
Convention on Biological Diversity (CBD): access and benefit sharing (genetic
resources); Cartagena Protocol (living modified organisms).
World Trade Organization (WTO): Trade-Related Aspects of Intellectual Property Rights
(TRIPS) (breeder's right, patents, trademarks, trade secrets).
World Intellectual Property Organization (WIPO): Patent Cooperation Treaty (PCT) and
Substantive Patent Law Treaty (SPLT) (harmonization of IPRs); Intergovernmental
Committee on Intellectual Property and Genetic Resources, Traditional Knowledge, and
Folklore (traditional knowledge, genetic resources, folklore).
Food and Agriculture Organization of the United Nations (FAO): International Treaty on
Plant Genetic Resources for Food and Agriculture (IT PGRFA) (facilitated access,
farmers' rights).]

Fig. 13.2. International agreements that affect plant breeding. From IBRD/World Bank (2006) The
World Bank 2006.

13.3.1 The UPOV Convention and UPOV

After decades of attempts to obtain patent protection for their achievements, plant
breeders, together with a segment of the IP specialists, requested that consideration be
given to a specially designed protection system. The request was taken up by the French
government, through the conferences and meetings it hosted between 1957 and 1961,
leading to the signing on 2 December 1961 of the International Convention for the
Protection of New Varieties of Plants (also known as the UPOV Convention).

The UPOV system, revised in 1972, 1978 and 1991, has gradually strengthened the rights
of plant breeders. The last revision, 30 years after the initial adoption, was
substantial. The revisions were made: (i) to clarify certain provisions in the light of
the experience of the UPOV member states in operating the Convention since 1961; (ii) to
strengthen the protection offered to plant breeders in certain specific ways; and (iii)
to reflect technological changes. The rights defined under UPOV are known as plant
variety protection (PVP). The UPOV system is considered as the most straightforward
choice for countries wishing to comply with the TRIPS Agreement. The UPOV Convention is
the only model for a PVP system. It is not only an IP treaty, but also an instrument in
the field of agricultural policies.

The breeder is defined in the 1991 Act of the UPOV Convention as the person who bred, or
discovered and developed, a variety (cultivar in this book). Protection has thus to be
afforded not only where a cultivar has originated from breeding in the somewhat
restricted sense of crossing parent plants and selecting from within the progeny, but
also where a person identifies a mutation or a variation, of known or
unknown origin, in existing plant material and ensures that the mutation or variation is
isolated and propagated as a new cultivar (Heitz, 1998).

The UPOV Convention has been very successful in impressing upon the crop cultivars and
seed sector, in particular in the UPOV member states, a notion of variety (cultivar)
that, from the technical point of view, is identical with protectable variety. According
to Article 1(vi) of the 1991 Act of the UPOV Convention, a variety basically is a plant
grouping that meets the conditions of distinctness, uniformity and stability, but not
necessarily to the degree required for protection.

Distinctness, uniformity and stability (DUS)

There are five required conditions for protection.

1. Novelty: The cultivar to be protected must not have been the subject of commercial
acts before certain dates determined on the basis of the date of application.
2. Distinctness (Article 7): The variety shall be deemed to be distinct if it is clearly
distinguishable from any other variety whose existence is a matter of common knowledge
at the time of the filing of the application. Distinctness is established on the basis
of individual characteristics (descriptors in genetic resources parlance) that are
botanical in nature and are not necessarily related to the agricultural or technological
properties or value of the cultivar.
3. Uniformity (or homogeneity) (Article 8): The variety shall be deemed to be uniform
if, subject to the variation that may be expected from the particular features of its
propagation, it is sufficiently uniform in its relevant characteristics.
4. Stability (Article 9): The variety shall be deemed to be stable if its relevant
characteristics remain unchanged after repeated propagation or, in the case of a
particular cycle of propagation, at the end of each such cycle.
5. Denomination: The cultivar must be given a denomination under which it will be
commercialized.

UPOV provides protocols for assessing and describing the unique characteristics of a new
cultivar, ensuring that it is distinct, uniform and stable (DUS). These standards are
adapted to the mode of reproduction of the protected species: cross-fertilizing crops
admit a wider tolerance than the relatively strict requirements for uniformity in
vegetatively propagated crops. Any cultivar that fulfils the DUS criteria and that is
new (in the market) is eligible for protection and there is no need to demonstrate an
inventive step or industrial application, as required under a patent regime. A DUS
examination involves growing the candidate cultivar together with the most similar
cultivars as per common knowledge, usually for at least two seasons and recording a
comprehensive set of morphological (and in some cases agronomic) descriptors (IBRD/World
Bank, 2006).

Characteristics are used to assess DUS and include descriptors such as flower colour, or
leaf shape. A characteristic must meet a number of basic requirements for it to be used
for DUS testing or for producing a cultivar description. The characteristic must:

1. Result from a given genotype or combination of genotypes.
2. Be sufficiently consistent and repeatable in a particular environment.
3. Exhibit sufficient variation between cultivars to be able to establish distinctness.
4. Be capable of precise definition and recognition.
5. Allow uniformity and stability requirements to be fulfilled.

Characteristics may have direct commercial relevance or no commercial relevance. For
example, using the criteria above may eliminate some commercially important traits, for
example, yield. Chemical constituents may be acceptable characteristics, provided they
meet the criteria. It is important that characteristics based on chemical constituents
be well defined and supported by an appropriate method for examination.

UPOV test guidelines have been developed for individual species or cultivar groupings to
provide guidance related to
growing cycles, number of plants, material to be tested, or characteristics to be
examined. The DUS test may be undertaken directly by the authority of the UPOV member,
by a party designated by the authority (e.g. an institute, the breeder), or the
authority may take into account the results from previous tests or trials conducted by,
for example, other UPOV members. There can be therefore a high level of cooperation in
DUS testing, including, for example, the purchase of DUS test reports, bilateral
arrangements to avoid duplication of testing and centralized DUS testing at regional or
global levels. Cooperation between authorities can minimize the time for DUS testing,
minimize costs and optimize examination of characteristics in growing trials.

Essentially derived varieties/cultivars

Under the 1978 Act of the UPOV Convention, any protected cultivar may be freely used as
a source of initial variation to develop further cultivars. Any such cultivar may itself
be protected and, what is more important, exploited without any obligation on the part
of its breeder and users towards the breeder of the cultivar that was used as a source
of initial variation. These rules have with certain exceptions worked well in practice
and have been reaffirmed in the 1991 Act.

However, the rules did not prevent a person finding a mutation within a crop cultivar
(such mutations are for a few traits in some species), or selecting some other minor
variant from within a cultivar, from exploiting the mutant or variant with no
authorization from, or recognition of the contribution of, the original breeder to the
final result. The lack of recognition of that contribution in such circumstances was
generally considered to be improper (Heitz, 1998). Modern biotechnology has greatly
increased the likelihood of such situations; it may take 12 years to develop a new
cultivar but a mere 3 months to modify it by adding a transgene or genes introduced
through genetic engineering in the laboratory.

This situation indeed can be a disincentive to the continued pursuit of classical plant
breeding (and also of genetic engineering since it suffices to add yet another gene to
escape the protection of the cultivar taken as host for that gene). The concept of
essentially derived variety (EDV) embodied in Article 14(5) of the 1991 Act of the UPOV
Convention is designed to ensure that the Convention continues to provide an adequate
incentive for plant breeding. Under that Article, a cultivar that is essentially derived
from a protected cultivar may be the subject of protection (if it fulfils the normal
protection criteria of DUS and novelty), but cannot be exploited without the
authorization of the breeder of the protected cultivar. For practical purposes,
cultivars will only be essentially derived when they are developed in such a way that
they retain virtually the whole genetic structure of the earlier variety.

Farmers' privilege

The most prominent issues in the sui generis systems involve the so-called farmers'
privilege and breeder's exemption. The traditional right of farmers to save seed from
their harvests to plant the following season is an important aspect of sui generis
systems and is one of the most contentious aspects of IPR in plant breeding. Although
this practice is often described as a farmers' right, it is referred to here by the UPOV
term of farmers' privilege to distinguish it from the broader concept of farmers'
rights.

The 1978 UPOV Convention assumed that farmers were permitted to save and reuse seed of
protected cultivars as part of private and non-commercial use. However, Article 15(2) of
the 1991 UPOV Convention rules that on-farm seed saving is not permitted without the
consent of the breeder, although it allows member states to specify crops for which the
use of farm-saved seed is permitted, taking into account the legitimate interests of the
breeder. In the European Union (EU), this provision is interpreted as the right of
smallholder farmers to save seed for specific crops and the right of the breeder to
collect royalties on farm-saved seed used on larger farms. The 1991 Convention also
prohibits any transfer of seed of protected cultivars (through sale, barter or gift)
between farmers. Utility patents on plant cultivars are even more rigid and a
patented cultivar normally cannot be saved for subsequent use as seed on the farm or
traded or exchanged with other farmers.

Various interpretations of farmers' privilege have favoured the adoption of laws based
on the more liberal 1978 UPOV Convention in many developing countries. In most cases in
these countries, restrictions on saving seed of food crops on the farm are neither
administratively feasible nor politically acceptable. Making the transfer of seed from
farmer to farmer illegal is widely considered incompatible with the traditions of
small-scale farming.

The issue of seed saving is a good example of how IPR in plant breeding must be tailored
to the conditions of national seed systems. Even within a single country, the
requirements and conditions of different crop production systems are not uniform and
countries could consider legal options that address this variability. For instance,
earlier seed law in the Netherlands included severe restrictions on saving planting
materials for ornamental crops, while field crops were regulated on the basis of the
more liberal UPOV 1978 Convention. Many vegetatively propagated commercial flower
species can be multiplied very rapidly by farmers, which would considerably reduce
revenues for breeders and provide inadequate incentives for innovation in a sector that
is very important for Dutch agriculture. Thus an amendment to the law made the
farm-level propagation of such species illegal. This example emphasizes that countries
need to design appropriate levels of protection for different types of commodities, in
accord with the domestic agricultural economy and plant breeding capacities.

The breeder's rights

A completely new approach taken in the 1991 Act defines the scope of the breeder's
rights. The basic right pertains to seven acts.

1. Production or reproduction (multiplication).
2. Conditioning for the purpose of propagation.
3. Offering for sale.
4. Selling or other marketing.
5. Exporting.
6. Importing.
7. Stocking for any of these purposes.

The purpose of this more detailed enumeration is not so much to give a more extensive
right to the breeder, than to give him or her a more effective one. Conditioning, for
instance, is essentially one (technical) step of seed or plant production.

The right applies to two classes of material to which such acts must relate and one
class to which they may relate: (i) the propagating material; (ii) the harvested
material (including whole plants and parts of plants), provided this has been obtained
through the unauthorized use of propagating material and that the breeder has had no
reasonable opportunity to exercise his or her right in relation to the propagating
material; and (iii) optionally (at the discretion of the member state), products made
directly from harvested material, provided this has been obtained through the
unauthorized use of harvested material and that breeders have had no reasonable
opportunity to exercise their rights in relation to the harvested material.

In the case under (ii), for instance, the breeder gets a more extensive right in
relation to certain imported harvested material. Where harvested material has been
produced with illegal seed, he or she now has another opportunity to exercise their
right.

Furthermore, the 1991 Act specifies four subject matters to which the breeder's right
extends:

1. The protected cultivar itself.
2. Cultivars that are not clearly distinguishable from the protected cultivar.
3. Cultivars that are essentially derived from the protected cultivar.
4. Cultivars whose production requires the repeated use of the protected cultivar.

The addition of cultivars that are not clearly distinguishable is designed to make the
breeder's right more effective; to prevent an infringer from claiming that he was not
exploiting the protected cultivar, but a very similar one falling outside the protection
perimeter.
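
The scope rules described in this section (three classes of material and four subject
matters) can be read as a small decision procedure. The Python sketch below is only an
illustration of that reading. The names Material, Relation, Situation, right_extends_to
and right_applies are hypothetical, chosen for this example, and the sketch deliberately
ignores the exceptions to the breeder's right, so it is a reading aid under those
assumptions rather than a statement of the law.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Material(Enum):
    """The three classes of material named in the 1991 Act."""
    PROPAGATING = auto()
    HARVESTED = auto()
    DIRECT_PRODUCT = auto()  # products made directly from harvested material


class Relation(Enum):
    """How a cultivar relates to the protected cultivar (hypothetical labels)."""
    PROTECTED_ITSELF = auto()
    NOT_CLEARLY_DISTINGUISHABLE = auto()
    ESSENTIALLY_DERIVED = auto()
    REQUIRES_REPEATED_USE = auto()
    INDEPENDENT = auto()  # falls under none of the four subject matters


@dataclass
class Situation:
    material: Material
    relation: Relation
    # True if the material was obtained through unauthorized use of the
    # preceding class (propagating material, or harvested material for products).
    upstream_use_unauthorized: bool = False
    # True if the breeder had a reasonable opportunity to exercise the right
    # at that preceding stage.
    breeder_had_opportunity: bool = False
    # True if the member state has opted to cover direct products.
    state_covers_products: bool = False


def right_extends_to(relation: Relation) -> bool:
    """The right extends to the four listed subject matters only."""
    return relation is not Relation.INDEPENDENT


def right_applies(s: Situation) -> bool:
    """Simplified scope test; exceptions to the right are not modelled."""
    if not right_extends_to(s.relation):
        return False
    if s.material is Material.PROPAGATING:
        return True
    # Harvested material and direct products are covered only when obtained
    # through unauthorized use and the breeder could not act earlier.
    cascade = s.upstream_use_unauthorized and not s.breeder_had_opportunity
    if s.material is Material.HARVESTED:
        return cascade
    return s.state_covers_products and cascade


if __name__ == "__main__":
    # Imported grain grown from unauthorized seed of the protected cultivar.
    grain = Situation(Material.HARVESTED, Relation.PROTECTED_ITSELF,
                      upstream_use_unauthorized=True)
    print(right_applies(grain))
```

Keeping the subject-matter test separate from the material test mirrors the way the text
presents them: first whether the cultivar falls within the right at all, then whether
the particular material is reached by it.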

The 1991 Act establishes three compulsory exceptions to the breeder's right and one
optional exception. The three compulsory exceptions are: (i) acts done privately and for
non-commercial purposes (in particular the reproduction of a protected cultivar by a
subsistence farmer or by an amateur gardener); (ii) acts done for experimental purposes;
and (iii) acts done for the purpose of breeding other cultivars and (provided protection
has not been specifically extended to them, as for instance in the case of an EDV) for
the purpose of exploiting such other cultivars.

The optional exception relates to farm-saved seed. States that are party to the 1991 Act
of the UPOV Convention may exempt farm-saved seed from the breeder's right, within
reasonable limits and subject to safeguarding the legitimate interests of the breeder.
Each member state will exercise this option in the light of its own national conditions.
Some states have chosen to give farmers an unconditional right to replant seed from
their previous harvest while others have limited this right to certain crops or to small
farmers.

Breeder's exemption

As plant breeding is generally considered as incremental, breeders have built on
existing cultivars to develop improved ones. To make progress, contrary to the situation
in mechanics or chemistry, the description of the invention is not enough, as it is not
possible to rebuild a whole genome starting from nucleotides. That is why the UPOV
Convention included an exception to breeders' rights: 'The utilisation [by others] of
the [protected] new cultivar as an initial source of variation for the purpose of
creating other new cultivars and the marketing of such cultivars' (Art. 5.3 of the 1961
Act). This exception, widely known as the breeder's exemption, has been one of the
engines of the breeding industry since the late 1960s. It stems from the traditionally
unrestricted use of seed by farmers and breeders. It provides that any person is allowed
to use a protected cultivar for further breeding without requiring the consent of the
rights holder. It is seen as a way of promoting the development of the best cultivars
for farmers, limiting the development of long-term commercial advantages, improving
opportunities for smaller breeding companies and thus promoting competition in the
sector. Unlike the farmers' privilege, the breeder's exemption has not dramatically
changed in later UPOV Conventions, prompting some companies in the USA to look to the
patent system for protecting their germplasm. The only modification in the 1991
Convention is the limitation on EDVs, which may fall under the rights of the original
breeder.

13.3.2 The 1983 International Undertaking on Plant Genetic Resources

In 1983, the Food and Agriculture Organization of the United Nations (FAO) established a
Commission on Plant Genetic Resources (later renamed the Commission on Genetic
Resources), the first permanent intergovernmental forum devoted to germplasm
conservation and development. The Commission's first major action was to adopt a
non-binding resolution known as the International Undertaking on Plant Genetic Resources
(hereafter the Undertaking), which is based on the principle that plant genetic
resources are a common heritage of mankind to be preserved and to be freely available
for use, for the benefit of present and future generations. The purpose of the
Undertaking is to ensure that genetic resources will be explored, preserved, evaluated
and made available for breeding and science. It is based on the following underlying
principles:

• Genetic resources are a heritage of humanity and should be available without
restriction.
• Establishes farmers' rights: farmers should be compensated for development and
conservation of genetic resources.
• Sovereign rights of nations to preserve, protect and be compensated for innovative
utilization of their native genetic resources.

Article 5 of the Undertaking (Availability of Plant Genetic Resources) provides as
follows:

    It will be the policy of adhering Governments and institutions having plant genetic
    resources under their control to allow access to samples of such resources and to
    permit their export, where the resources have been requested for the purposes of
    scientific research, plant breeding or genetic resources conservation. The samples
    should be made available free of charge, on the basis of mutual exchange, or on
    mutually agreed terms.

13.3.3 The 1992 Convention on Biological Diversity

In 1992, in Rio de Janeiro, the United Nations hosted an Earth Summit to consider the
state of the world's environment. In addition to producing a number of non-binding
declarations of international environmental policy, the Earth Summit gave birth to the
Convention on Biological Diversity (CBD). The specific concern of the CBD is biological
diversity, which the convention defines as the variability among living organisms from
all sources including diversity within species, between species and of ecosystems. The
CBD has the following objectives:

• to establish conserving biological diversity as an international priority;
• to promote fair and equitable sharing of benefits from genetic resources;
• to maintain appropriate access and transfer of relevant technology among countries;
• to reaffirm sovereign rights of states over natural resources, including genetic
resources; and
• to promote international agreements, efforts in technology transfer, licensing,
protection, sharing of R&D, cooperative training.

The CBD marked the end of the 'common heritage of mankind' conception of genetic
resources. The CBD does not refer to a common heritage and its preamble states only that
conservation of biodiversity is a common concern of humankind. The CBD is based on a new
set of principles:

    Affirming that the conservation of biological diversity is a common concern of
    humankind,

    Reaffirming that States have sovereign rights over their own biological resources,

    Reaffirming also that States are responsible for conserving their biological
    diversity and for using their biological resources in a sustainable manner, [...]

The CBD must be viewed as a framework Convention that needs implementing measures. The
analysis of the true scope of the obligations under the CBD is very difficult for its
text is cluttered with limitations such as 'as far as possible and as appropriate' and
'subject to national legislation'.

It is nevertheless clear that there is no contradiction or conflict between the UPOV
Convention and the Undertaking under the FAO aegis, on the one hand and the CBD on the
other. The measures that may be taken to implement the CBD, however, may create such a
contradiction or conflict if they do not properly consider the prior legal instruments
(and their objective background and rationale) and may even run counter to the CBD's
stated objectives.

In particular, respect for IPR is expressly called for under Article 16(2), relating to
access to and transfer of technology:

    Access to and transfer of technology to developing countries shall be provided
    and/or facilitated under fair and most favourable terms, including on concessional
    and preferential terms where mutually agreed and, where necessary, in accordance
    with the financial mechanism established by Articles 20 and 21. In the case of
    technology subject to patents and other intellectual property rights, such access
    and transfer shall be provided on terms which recognize and are consistent with the
    adequate and effective protection of intellectual property rights.

With respect to the fair and equitable sharing of the benefits arising out of the
utilization of genetic resources, it should also be obvious that it implies, first, the
creation of benefits and, secondly, the identification
of a person who would be called upon to developing countries and especially the
share the benefits which he and his part- least developed among them, secure a
ners have created. All agreements that have share in the growth in international trade
been publicized so far and follow the commensurate with the needs of their
economic development,
pattern created by the Merck-INBio agree-
ment (http://www.american.edu/projects/ Being desirous of contributing to these
mandala/TED/MERCK.HTM) include as objectives by entering into reciprocal and
a major component the sharing of royalties mutually advantageous arrangements
derived from patents. directed to the substantial reduction of
tariffs and other barriers to trade and to
elimination of discriminatory treatment in
international trade relations, [...]
13.3.4 The 1994 TRIPS Agreement
Article 27.3 provides for an obligation
The Uruguay Round of multilateral trade to protect crop cultivars which became effec-
negotiations held under the framework of tive for developed countries on 1 January
the General Agreement on Tariffs and Trade 1996 and became effective for developing
was concluded on 15 December 1993. The countries on 1 January 2000 (1 January 2006
agreement embodying the results of those for least-developed countries):
negotiations, the Agreement Establishing 3. Members may also exclude from
the World Trade Organization (WTO Agree- patentability:
ment), was adopted on 15 April 1994, in []
Marrakech, Morocco. (b) plants and animals other than
The result of those negotiations, con- micro-organisms and essentially biological
tained in an Annex to the WTO Agreement, processes for the production of plants or
was the Agreement on Trade-Related As- animals other than non-biological and
pects of Intellectual Property Rights (the microbiological processes. However,
Members shall provide for the protection
TRIPS Agreement). The WTO Agreement,
of plant cultivars either by patents or by
including the TRIPS Agreement (which is an effective sui generis system or by any
binding on all WTO members), came into combination thereof. The provisions of this
force on 1 January 1995. The former agree- subparagraph shall be reviewed four years
ment established a new organization, the after the date of entry into force of the
World Trade Organization (WTO), which WTO Agreement.
began its work on 1 January 1995.
The purpose and objective of the WTO It is clear that WTO members enforced
Agreement is described in its preamble: this obligation through the adoption of a
sui generis protection system. At the Fourth
Recognizing that their relations in the field Extraordinary Session of the FAO Commission
of trade and economic endeavour should be
on Genetic Resources for Food and Agriculture
conducted with a view to raising standards
of living, ensuring full employment and a
(Rome, 15 December 1997), the FAO Legal
large and steadily growing volume of real Adviser commented as follows:
income and effective demand and expanding
In fact, the concept of a sui generis system
the production of and trade in goods and
in the TRIPS Agreement is a very general
services, while allowing for the optimal use
concept that allows States to exercise
of the worlds resources in accordance with
ample discretion. The TRIPS Agreement
the objective of sustainable development,
does not give any direct indication on the
seeking both to protect and preserve the
elements or components that should be
environment and to enhance the means for
included in the sui generis system; nor does
doing so in a manner consistent with their
it require to follow the criteria of UPOV,
respective needs and concerns at different
which is already a sui generis system of
levels of economic development,
plant cultivar protection although not
Recognizing further that there is need for the only possible one. Nevertheless, it is
positive efforts designed to ensure that possible to infer, from the general context
of the TRIPS Agreement, some of the minimum requirements of the sui generis system,
namely: (i) it should be, at least in the broad sense, a system to protect intellectual
property rights; (ii) it should be applicable, in principle, to all traded plant
cultivars; (iii) it should be effective, that is, enforceable; (iv) it should be
non-discriminatory as regards the country of origin of the applicant (principle of
national treatment); and (v) it should accord the most-favoured-nation treatment.

13.3.5 The 2001 International Treaty on Plant Genetic Resources for Food and
Agriculture

On 3 November 2001 in Rome and after more than 15 sessions of the FAO Commission on
Genetic Resources and its subsidiary bodies, representatives of 116 nations approved a
new International Treaty on Plant Genetic Resources for Food and Agriculture (hereafter
the Treaty). The Treaty applies only to plant genetic resources useful for food and
agriculture. It establishes the following objectives (Sullivan, 2004; Fowler and Lower,
2005):

• to encourage the conservation of plant genetic resources in order to preserve and
  enhance the genetic diversity of plant species and cultivars of value to food or
  agriculture;
• to provide a workable, juridical basis for rewarding farmers for their contribution
  in conserving, improving and making available plant genetic resources;
• to further develop the system of national sovereignty over plant genetic resources
  first established in the CBD, while ensuring that such exercise of sovereignty does
  not hinder international exchange of such resources; and
• to establish a multilateral system (MS) of access and benefit-sharing (ABS) that will
  coordinate exchanges of plant genetic resources and in some cases, require payments
  by persons or entities who commercially exploit such resources, to the nations from
  which such resources originated.

The MS of ABS applies to an initial Annex (1) of 35 food crops and 29 genera of
forages, which include important staples such as wheat, rice, maize and potato.
Collectively Annex 1 lists crops representing 80% of the world's calorie intake.

The Treaty is an ambiguous document. It seeks to vindicate the interests of parties
that previously were underrepresented in international legal policy relating to plant
genetic resources. At the same time, it seeks to assure industrial users of such
resources that their economic interests will not be harmed. Although the Treaty is more
specific in some aspects than the CBD, its policies are stated broadly and often with
significant practical detail (Sullivan, 2004). For these and other reasons, the Treaty
can be thought of as an international policy to be built for plant genetic resources.
What that structure will look like when it is completed will depend upon a variety of
political, economic and scientific influences that are already at work to shape the
policies of the future.

The Treaty came into force on 29 June 2004, i.e. 90 days after 40 governments had
ratified it. Governments that have ratified it will make up its Governing Body. At its
first meeting, held in Madrid in June 2006, this Governing Body addressed important
questions, such as the level, form and manner of monetary payments on
commercialization, mechanisms to promote compliance with the Treaty, the funding
strategy and an approved standard material transfer agreement (SMTA) for plant genetic
resources for food and agriculture. Each country that ratifies will then develop the
legislation and regulations it needs to implement the Treaty.

Different sides in the MS bargained hard for an equal position before and after the
Treaty. On one hand, developed countries wanted the Treaty to guarantee access to all
crops. On the other hand, developing countries tended to believe that they were being
exploited, as modern cultivars protected by IPR have been marketed to them at high
prices when, in fact, these cultivars are based on genetic materials donated to
developed countries' breeders. Developing
countries also saw themselves as donors, not as recipients, of germplasm (Fowler and
Lower, 2005).

A tremendous asset associated with the Treaty is the genetic resources, mostly of the
world's major food crops, that are held at the centres of the CGIAR. Historically,
these have been considered as an international heritage and have been freely available
to everyone, most recently under the terms of a formal agreement between FAO and the
centres in which it is agreed that the centres are holding the materials in trust for
the benefit of the international community. The agreements signed by the centres with
FAO on behalf of the Governing Body of the Treaty on 16 October 2006 oblige the centres
to deal differently with the plant genetic resources for food and agriculture (PGRFA)
they hold and have brought under the Treaty, depending on whether or not the PGRFA is
listed in Annex 1 of the Treaty. All transfers of PGRFA of crops listed in Annex 1 must
be under the SMTA. It is assumed that the SMTA's prohibition against recipients
acquiring IPR on the germplasm and related information refers to the access and use of
the material with few onerous restrictions. It therefore encourages use and development
of the materials while keeping it available for use in the future by others (Fowler et
al., 2005).

13.4 Plant Variety Protection Strategies

A number of mechanisms are available to protect the interests of plant breeders and
contribute to the development of a competitive and dynamic national seed sector. In
addition to PVP (through the granting of plant breeders' rights) and patents,
additional options include biological processes (such as the hybrid cultivar system),
national seed laws, contract law, brand protection and other IPR (such as trademarks),
as well as trade secrets. As with patents and PVP, the effectiveness of these
alternatives depends on the local capacity for enforcement (IBRD/World Bank, 2006).

13.4.1 Plant variety protection or plant breeders' rights

UPOV is the most widely used system for PVP, currently with 63 member states
(http://www.upov.int/). Most countries of the OECD and some developing countries are
members of one of the UPOV conventions, although that is not the only sui generis
option under the WTO's 1994 TRIPS Agreement. Countries wishing to join UPOV must
present legislation compatible with the 1991 Convention. UPOV membership offers a
number of advantages, including a source of technical backstopping for cultivar testing
and the assurance of a PVP system recognized and respected by foreign investors. On the
other hand, the 1991 Convention imposes potential restrictions on farmer seed
management practices that may be politically unacceptable, a potential threat to food
security and impossible to enforce in some circumstances. For these reasons some
developing countries have declined to join UPOV. Only in specific cases where seed
saving might threaten a market (e.g. export markets for flowers) or seed exchange would
reduce incentives for plant breeding (e.g. informal seed sale by larger farmers or
sales by grain merchants in competition with the commercial seed sector) would
restrictions be justified in most developing countries (Tripp et al., 2006).

The UPOV mission statement is to provide and promote an effective system of PVP, with
the aim of encouraging the development of new cultivars of plants, for the benefit of
society. PVP provides the opportunity for breeders to gain a return on the investment
made in breeding a new cultivar. For large-scale commercial farmers, market forces
under UPOV schemes will generally lead to largely positive scenarios. However, the
situation is very different and substantially more complex for farmers in developing
countries. Cultivar protection systems may not be inappropriate in developing countries
as long as resource-poor farmers continue to have choices through access to public
cultivars or the right to save seed for their own purposes from commercial
cultivars. PVP or PBR systems will benefit commercial farmers and agricultural
productivity in all countries by stimulating private sector investment, increasing
cultivar choice for farmers and facilitating technology transfer and agricultural
development efforts, including the acquisition of crop biotechnology. However, for
subsistence farmers, who operate under different systems in developing countries, the
benefits are more complicated.

The PVP system has been largely beneficial for breeders and farmers in OECD countries
over the past few decades. However, some believe that PVP rights are too burdensome to
acquire in relation to the relatively limited protection they provide for returns on
investment (Janis and Kesan, 2002). Naseem et al. (2005) investigated this issue for
the case of cotton in the USA, first by examining trends in cotton cultivars planted
and then by quantifying the effect of PVP cultivars on cotton yields. The analysis
suggested that PVP had led to the development of more cultivars and that through these
cultivars PVP had an overall impact on cotton yields.

Chiarrolla (2006) addressed the question of whether sui generis PVP legislation is
becoming redundant due to the growing use of patents (to be discussed in the next
section) for the protection of plant-related inventions. However, for the patent system
to be a fully functional standalone system, it would need to be modified in order to
prevent agricultural exemptions, enjoyed by plant breeders and farmers under sui
generis PVP systems, from being overridden by patent claims, particularly those related
to entire plant cultivars. There is a danger that, if sui generis PVP regimes continue
to focus on broad societal objectives and promoting sustainable agriculture, a
two-tiered system will emerge where PVP-only cultivars fall behind in the
private-sector-funded technology race. Alternatively the IPR system could be refined to
support the needs of commercialization, research, further breeding and developing
country agriculture. Important issues of global social relevance include the
diversification of cropping systems, the promotion of development and commercialization
of under-utilized crops and species, the development of new markets for local cultivars
and maintenance of diversity-rich products.

13.4.2 Patents

A patent is a legal right, granted by a government to the original and first discoverer
or inventor of a new IP, to exclude others from making, using or selling the subject IP
invention for a defined period of time. A patent is allowed by the grantor only if the
claimed IP is deemed useful, novel and unobvious to others skilled in the art (Boyd,
1996). A patent application is a written document which must fully disclose and
describe the claimed invention in sufficient detail and completeness to allow one
skilled in the art to use or practise the invention. Patents are granted for inventions
that are new, involve a creative step and can be applied in industry. Since the late
1980s, private companies, universities and federal governments have all increased
patenting in agricultural biotechnology particularly rapidly and they now hold a
greater proportion of agricultural biotechnology patents than they do of patents in
general. Private companies tend to dominate patenting in plant technologies and
molecular-level agricultural biotechnology. As Heisey et al. (2005) indicated,
differences in patterns of patent production suggest not only differences in
agricultural research investment but also differences in motivations for patenting.

PVP through the utility patent system is provided only in a few countries (for example,
the USA, Australia and Japan). The US Utility Patent law designates four broad
categories of patentable subject matter: composition, machines, articles of manufacture
and processes. Plants and biological subject matter are not explicitly included.
However, in 1980, the Supreme Court decision in Diamond v. Chakrabarty construed
section 101 to encompass genetically modified organisms (GMOs). This case undoubtedly
helped to open the door for ensuing patents
for genetically engineered biological materials and plant/plant cultivars. Only plant
cultivars invented or discovered in a cultivated area are eligible for patents, thus
limiting the possibility of patents on wild relatives. Under the 1978 UPOV Convention,
a cultivar could not be protected by both a patent and PVP, but the 1991 Convention
allows this double protection. Table 13.1 provides a comparison of three major IP
systems for plant cultivars. In Japan, the patent system is used only for plant
cultivars that are considered innovative and not merely a product of normal plant
breeding.

Table 13.1. Comparison of major intellectual property systems for plant cultivars
(varieties) (from IBRD/World Bank (2006) © the World Bank).

Criterion           UPOV 1978                UPOV 1991                  Utility patents (USA)

Protection          Varieties of species or  Varieties of all genera    Sexually reproduced plants
                    genera as listed         and species                (and genes, tools, methods
                                                                        to produce varieties)
Exclusion           Unlisted species         None                       First-generation hybrids,
                                                                        uncultivated varieties
Requirements        Novelty (in trade)       Novelty (in trade)         Novelty (in public knowledge)
                    Distinctness             Distinctness               Utility
                    Uniformity               Uniformity                 Non-obviousness
                    Stability                Stability                  Industrial application
Disclosure          Description (DUS)        Description (DUS)          Enabling disclosure
                                                                        Best mode disclosure
                                                                        Deposit of novel materials
Rights              Prevent others from      Prevent others from        Prevent others from making,
                    commercializing          commercializing            using or selling the
                    propagating              propagating materials      claimed invention or selling
                    materials                and, under certain         a component of the
                                             conditions, using          invention
                                             harvested materials
Seed saving         Allowed for private      For use on own holding     Not allowed without consent
                    and non-commercial       only (for listed crops     of patent holder
                    use                      only)
Seed exchange       Allowed when             Not allowed without        Not allowed without consent
                    non-commercial           consent of rights          of patent holder
                                             holder
Breeders'           Use in breeding          Use in breeding            Not allowed without consent
exemption           allowed                  allowed (but sharing       of patent holder
                                             rights in case of EDV)
Duration            15–20 years              20–25 years                20 years from filing or
                    (depending on            (depending on crops)       17 years from granting
                    crops)                                              (prior to June 1995)
Double protection   Not allowed              Allowed                    Allowed
(PVP and
patent)

Utility patents for plant cultivars are not considered a reasonable option for
developing country IPR systems (IBRD/World Bank, 2006). Nevertheless, aspects of
patents for plant cultivars become increasingly important because of the pressure from
some parts of the seed industry to move in this direction and because this option is
included in some of the bilateral trade negotiations between the USA and several Latin
American countries.

Patent protection first became available in 1985 and companies used both PVP and patent
systems for some years; recently a
reliance on patents has dominated. There are reasons for that choice, despite the
higher cost of utility patents (Lesser, 2005). PVP allows farmers' reuse of seed
(although not an issue for F1 hybrids) as well as open breeding access. Patents allow
neither. Moreover, underfunding and resultant delays in issuing certificates reduced
the value of PVP for breeders.

13.4.3 Biological protection

The oldest mechanism for protecting a plant cultivar is hybridization. The discovery of
the phenomenon of hybrid vigour (heterosis) in the early 20th century opened new
possibilities for producing high-yielding and uniform cultivars of cross-fertilizing
crops and offered two distinct advantages for protecting the interests of commercial
seed provision. First, seed of hybrid origin will lose some yield potential and other
valuable characteristics (such as uniformity) in subsequent generations, which reduces
farmers' incentives for saving seed. Secondly, competing seed companies cannot
duplicate a particular hybrid cultivar if they do not have access to the inbred lines
used to develop the hybrid cultivar. If the inbreds can be physically protected, they
have the character of a trade secret. Hybrids from self-pollinated species including
rice were first commercialized in China in the 1970s using genetic male sterility and
now over 50% of rice land is planted with hybrid rice that has a huge seed market in
China and South-east Asia. The use of hybrids thus provides a steady demand for seed,
overcoming much of the uncertainty in the conventional seed market, where factors such
as the weather determine how much seed is saved on the farm and hence the demand for
fresh seed. In China a thriving and diverse commercial seed sector has existed for more
than two decades because of the development of hybrid rice.

A more recent example of biological protection mechanisms is the introduction of
genetic use restriction technologies, operating at the cultivar level (V-GURTS)
(Louwaars et al., 2002). In the absence of special treatments, plants containing these
technologies produce sterile seed, thereby ensuring that farmers cannot save commercial
seed of self-fertilizing crops (e.g. wheat and beans) for subsequent planting. The
technologies also make it difficult for other breeders to use the protected germplasm.
Companies are using the methods of genetic transformation to develop several such
protection mechanisms including the so-called terminator technology, a colloquial name
given to proposed methods for restricting the use of GM-plants by genetically switching
off a plant's ability to germinate a second time as next-generation seed. None is
commercially viable yet, but the possibility of this technology has led to widespread
debate and concern in the popular press (e.g. http://www.banterminator.org/; Guidetti,
1998) and has caused the technology to be specifically banned in India's Protection of
Plant Varieties and Farmers' Rights Act (IBRD/World Bank, 2006).

13.4.4 Seed laws

Plant breeding and seed production are already subjected to a set of national
regulations on cultivar release and seed quality control. These regulations are related
to seed saving, seed exchange, the scope of protection, the breadth of coverage and the
relation of PVP and patents to the concerns of farmers' rights. They have played an
important part in determining the current evolution of seed systems. The following
discussion on seed laws is based on IBRD/World Bank (2006).

Conventional seed laws can provide opportunities for controlling access to plant
cultivars, even in the absence of IPR legislation. They determine what cultivars may be
produced and establish regulations for seed certification and quality control. They can
also limit the production and sale of seed by competitors and can perform some of the
functions expected of PVP. Seed laws usually specify the extent to which seed must be
certified and define the types of cultivar
that may be offered for sale. Where seed certification is compulsory, the breeder may
determine who is to produce seed by controlling access to breeder's (or pre-basic)
seed. Any unauthorized multiplication will not be acceptable to the certification
agency. A public or private breeder can establish an exclusive contract with a seed
company for the production of specified cultivars. When a cultivar is not protected by
PVP, the authorities can assign one or more maintainers to meet the continued demand
for seed. Seed certification requirements can also be used to limit informal seed
sales, especially when they occur on a large scale.

Where seed law specifies that a cultivar must be approved through a registration
process or on the basis of performance tests before entering commercial seed
production, this provision can also prohibit the sale of a released cultivar under a
different name. In this way, the law limits the extent to which a competing company can
market seed of a protected or an essentially derived version of a released cultivar,
including the unauthorized use of a transgene.

Commercial seed systems usually begin with products that are difficult for farmers to
save (hybrid cultivars or small seeded vegetables) and that generally require little IP
protection. As the seed industry matures and farmers recognize the value of commercial
seed, companies will offer a wider range of products, some of which may require
attention to IPR. Seed industry development usually parallels the growth of
agribusiness and markets for particular commodities may demand specific attention to
IPR. Seed companies can sell seed to farmers who recognize the quality and convenience
of commercial seed, on the basis of reputation and branding as is the case for
small-size vegetable seeds in various countries.

13.4.5 Contract law

A contract is a legally binding exchange of promises or agreement between parties that
the law will enforce. Breach of a contract is recognized by the law and remedies can be
provided. Contract law can be classified, as is habitual in civil law systems, as part
of a general law of obligations. Various types of contracts can be effective in
providing legally enforceable agreements that restrict the use of a breeder's cultivar
and offer complements or substitutes to IPR. Some contracts are aimed primarily at
preventing seed saving and multiplication, whereas others are aimed at protecting the
germplasm from being used in competitors' breeding programmes.

One type of contract that is increasingly prevalent in the US seed market is the grower
contract, or bag tag. This simple (unsigned) agreement restricts the farmer from using
or disposing of any part of the harvest as seed. Farmers are considered to comply with
the provisions of such contracts when they open the seed bag. If controlling the market
for the harvested product becomes possible, another type of contract can be enforced.
The breeder can oblige a grower to use crop cultivars in certain ways and can impose
restrictions on the saving or multiplication of planting material. In the cut flower
industry, for example, the vast majority of the output is sold in a limited number of
wholesale markets. If a flower cultivar is protected in the country where a major
wholesale market is located, growers in other countries may have to sign contracts
limiting multiplication or unauthorized sale of that cultivar, or they risk being
denied further access to the major market.

Access to germplasm may also be controlled through material transfer agreements (MTA),
which may be seen as another form of contract regulating the use of plant germplasm.
One example of an MTA is the agreements signed by the CGIAR centres with the FAO, as
discussed in Section 13.3.5. MTA and other contractual arrangements can be used by
private companies to control access to genes or transgenic cultivars that are protected
by IPR in one country, even if the recipient country does not recognize the particular
IPR. For example, when a national agricultural research organization contracts with a
major biotechnology
company to use particular proprietary transgenes, the contract may specify how the
national organization is to use the genes, the rights to any technologies that are
produced and the company's obligations (for example, to provide training or other
assistance). Access to various tools and processes of biotechnology, such as genetic
transformation techniques or diagnostic methods, is also usually subject to contracts
specifying limitations on their use and the rights of the provider in relation to
commercial products.

13.4.6 Brands and trademarks

Brands and trademarks are symbols, such as a name, logo, slogan or design scheme, that
embody all the information connected to a company, product or service. They are part of
IP law, but their utility in the seed industry is often overlooked in the policy debate
about IPR (IBRD/World Bank, 2006). A minor point to remember is that terms such as AFLP
and Breeding by Design, both trademarks of Keygene, Inc., should carry the ® or ™
designation. Seed companies frequently register their brands and trademarks as a way of
distinguishing their products from those of their competitors and building up a loyal
customer base. In the absence of other IP instruments, the development of a strong
brand image and reputation can protect a company from some types of competition. While
trademarks can be effective in communication with customers (farmers), they do not
protect a breeder from competitors who steal the cultivar and include it in their own
(branded) product portfolio.

As there is usually a prohibition against using a cultivar name registered under PVP as
a trademark, it is much less common for crop cultivars to be trademarked. However, in
some cases a trademarked cultivar name may be very useful (IBRD/World Bank, 2006). For
example, flower breeders often register a cultivar through PVP under one name but
market it under a second, trademarked name, which can be used and protected long after
the PVP expires. Some countries prohibit the use of separate trade names and prescribe
that the name registered in the PVP or seed law lists is to be used in commerce.

13.4.7 Trade secrets

A trade secret can be considered a formula, practice, process, design, instrument,
pattern, or compilation of information used by a business to obtain an advantage over
competitors within the same industry or profession. In some instances, secrecy is an
effective way to protect certain technologies and the choice between patenting and
secrecy may depend on the type of technology and the size of the company. Trade secrets
may not be included in a separate body of law but come under standard trade law. In
plant breeding, the primary example of a trade secret is the protection of the inbred
lines used to produce a hybrid. The ability to exploit this type of secrecy depends to
an important extent on the degree of physical security that can be provided to plant
breeding facilities and seed multiplication plots. Registration requirements (under PVP
or seed law) may require the breeder to provide information on the pedigree (e.g. the
specific inbred lines) or even deposit samples of the different parent lines. This
requirement can nullify the trade secret unless the registration authority can keep the
information and materials confidential. Advances in biotechnology make this type of
secrecy more difficult to maintain, as reverse engineering of new cultivars becomes
easier. Even though such actions might be covered by the enforcement of provisions on
EDVs, they help to explain the pressure from some parts of the seed industry for
further limitations on the breeder's exemption (IBRD/World Bank, 2006). Trade secrets
are also useful for protecting certain aspects of plant biotechnology, particularly
procedures or techniques that cannot be detected in the final product, such as markers
and regeneration methods.
13.5 Intellectual Property Rights Affecting Molecular Breeding

As more and more plant patents are granted, IPR will increasingly affect each procedure of molecular breeding, including methods for the generation, identification, transfer and selection of genetic variation. Genetic materials (DNA, markers, genes and sequences) and methodologies (marker detection, MAS, genetic transformation and plant regeneration) that are key to molecular breeding are heavily affected by biotechnology patents.

13.5.1 Genetic transformation technologies

Since the late 1980s, the contributions of biotechnology have transformed the science of plant breeding. The most visible (and controversial) aspect of plant biotechnology is the ability to transfer segments of DNA from one organism to another, resulting in GM plants. The range of commercial transgenic crop cultivars is still quite narrow (the majority feature herbicide tolerance or insect resistance) and about one-third of the global area planted to transgenic crops is in developing countries (the majority in Argentina, Brazil, China, India, Paraguay and South Africa) (James, 2006).

One of the recurring themes of the debates concerning the application of genetic transformation technology has been the role of IPR. This term covers both the content of patents and the confidential expertise usually related to methodology. The possession of appropriate genes and sequences is obviously not sufficient to produce a transgenic plant cultivar. As described in Chapter 12, there are two major techniques for introducing foreign genetic material into an organism. One is based on Agrobacterium tumefaciens, a species that is able to insert its own or other genes into a plant genome. The second is based on the direct, physical transfer of foreign genes into target plant cells, e.g. through biolistics. Both major techniques for introducing foreign genetic material into an organism are complex procedures characterized by a wide range of modifications and improvements. For instance, the Agrobacterium methodology was initially unsuccessful at transforming monocotyledons (such as cereals), but several recent advances have overcome this limitation. Similarly, the success of biolistics depends on a number of engineering considerations governing particle delivery. Hence both technologies are subject to a large number of broad and specific patent claims that make their utilization (and any claims on the resulting cultivars) far from straightforward. The particle gun technique was developed by US public researchers and licensed exclusively to a multinational company, thus providing an exclusive right to use and sublicense the technique. A development that may address restricted access to transformation methodology was the recent announcement of the discovery of transformation methods based on several genera of bacteria other than Agrobacterium and the establishment of an open source licensing facility for these techniques (Broothaerts et al., 2005). However, whether this methodology proves as efficient as the older methods remains to be seen.

Plant regeneration is a process of growing an entire plant from a single cell or group of cells through tissue culture, in which fragments of tissue from a plant are transferred to an artificial environment in which they can continue to survive and function. In genetic transformation, the transformed plant cells (the products of Agrobacterium or biolistic methods) must be regenerated to produce whole GM plants (Chapter 12). Various techniques, which are mostly from tissue culture, are used to accomplish this goal and each regeneration protocol is appropriate to particular species or even cultivars. The majority of these transformation methods are described in the published literature and hence are available to all researchers. However, modifications that provide higher efficiency or that are appropriate to specific species may be kept secret by individual laboratories, because it is impossible to detect their utilization in the final product.

Dunwell (2005) reviewed the wide range of existing patents that cover all aspects of transgenic technology, from selectable markers and novel promoters to methods of gene introduction. Although few of the patents in this area have any real commercial value, there are a small number of key patents that restrict the freedom to operate of new companies seeking to exploit the methods. Since the late 1980s, these restrictions have forced extensive cross-licensing between agricultural biotechnology companies and have been one of the driving forces behind the consolidation of these companies.

During the period since the production of the first transgenic plants a wide diversity of patents have been sought on all aspects of the process, ranging from the underlying tissue culture methods through to the means of introducing the heterologous DNA and to the composition of the DNA construct so introduced. The summary of granted US utility patents in the category of genetic transformation from 1976 to 2000 is available at http://www.ars.usda.gov/data/AgBiotechIP/. For detailed analysis of several of the key areas under discussion, the reader is referred to detailed summaries published elsewhere, for example in the series of comprehensive CAMBIA White Papers (http://www.cambia.org/daisy/bios/home.html). Frequently, the main point of interest in these discussions is the coverage of the patent(s) in question.

Transformation methods

As described in Chapter 12, there are several techniques for the introduction of recombinant vectors containing heterologous genes of interest into plant cells and the subsequent regeneration of plants from such cells. Some of the patents covering these techniques are summarized in Table 13.2.

Table 13.2. Selection of patents/applications covering plant transformation methods (from Dunwell (2005) with permission from Wiley-Blackwell).

Method                 Company/institution        Patent/application number

Agrobacterium          University of Toledo       US 5177010, WO 02/102979
                       Texas A&M University       US 5104310, WO 03/048369
                       Leiden University          EP 120516, 159418, 176112
                                                  US 5149645, 5469976, 5464763
                                                  US 4940838, 4693976
                       Max Planck                 EP 116718, 290799, 320500
                       Japan Tobacco              US 5591616
                                                  EP 604662, 627752
                       Ciba-Geigy                 EP 267159, 292435
                       Washington University      US 6051757
                       Calgene                    US 5463174, 4762785
                       Agracetus                  US 5004863, 5159135
                       Monsanto                   WO 03/007698
                       BASF                       WO 03/017752
                       Purdue University          WO 01/020012
Particle bombardment   Cornell University         US 4945050
                       DowElanco                  US 5141131
                       Dekalb                     US 5538877, 5538880
                       Agracetus                  US 5015580, 5120657
Electroporation        Boyce Thompson Institute   WO 87/06614
                       Dekalb                     US 5472869, 5384253
                       PGS                        US 5679558, 5641664
                                                  WO 92/09696, 93/21335
Whiskers               Zeneca                     US 5302523, 5464765
Protoplasts            Ciba-Geigy                 US 5231019

Most of these methods involve a tissue culture step and many of these enabling protocols are also the subject of patent claims. The most extensive publication in this area is the 360-page CAMBIA White Paper (Roa-Rodriguez and Nottenburg, 2003a) on Agrobacterium-mediated transformation. This document focuses on the patents directed to methods and materials used for transformation, mainly of plants, but also of other organisms such as fungi.

Genes and DNA sequences

Much of the debate in this area concerns the ability to apply for patents on DNA sequences of unproven function. There have been several attempts to do so and the decisions on such applications have not been finalized. However, the fact remains that there is much useful sequence information available in patent databases and much of it is ignored by academic research scientists. Specifically, it is estimated that some 30–40% of all DNA sequences are only available in patent databases, since there is of course no obligation for commercial (or other) applicants to submit their sequences to public databases. Possibly, the best way to access this information is via the GENESEQ system, a commercial (Derwent) service.

As described in Chapter 12, transgenic crops are distinguished by the presence of several types of foreign genetic material. These include: (i) functional genes (that is, genes that code for insect resistance, herbicide tolerance, or other desired characteristics); (ii) selectable marker genes (which have characteristics easily identifiable in the laboratory and, when linked to a functional gene, facilitate the detection of transformed cells); (iii) promoters (which regulate the timing and location of the expression of functional or marker genes); and (iv) end sequences (portions of DNA that terminate transcription). These different types of genes, sequences and techniques used in developing transgenic crops, as well as the diagnostic tools and processes of marker-assisted breeding used to produce conventional crop cultivars, are all candidates for patent protection (IBRD/World Bank, 2006).

Almost all the significant components of the constructs used in plant transformation have been the subject of patent coverage. These include the effect gene as well as its associated regulatory sequences, the selectable or screenable marker and additional sequences that might be required for the subsequent excision of the transgene.

It is important to recognize that patents on plant genes affect more than just the production of transgenic cultivars. It is possible to identify and protect genes that are used in more conventional breeding procedures. For instance, several herbicide-tolerant crop cultivars commercially available in North America incorporate patented genes that have been identified through techniques such as mutagenesis or whole cell selection and then incorporated in new crop cultivars through conventional breeding. Another example is imidazolinone-resistant maize, which is being tested in sub-Saharan Africa to control the weed Striga. The key to patent protection in these cases is the definition of novelty; that is, some countries prohibit patent protection on substances found in nature, which are considered to be discoveries rather than innovations. In most cases, a discovery must be further developed in order to be considered an innovation and eventually gain a patent that may effectively include the discovery. However, genes discovered and developed in the course of conventional breeding can be patented in several countries. IBRD/World Bank (2006) provided an example of resistance to the aphid Nasonovia ribisnigri in lettuce, patented by a Dutch breeding company in the USA and Europe. The European patent is, however, under appeal from various sides, including some important vegetable seed companies. So far, the US Patent and Trademark Office (USPTO) and the European Patent Office (EPO) have treated isolated and purified nucleotide sequences as if they were the same as man-made chemicals (Doll, 1998). Andrews (2002) argued that the useful properties of a gene sequence (such as its ability to bind to a complementary strand of DNA for diagnostic purposes) are not ones that scientists have invented, but instead, are natural,
inherent properties of the genes themselves. Moreover, gene patents do not meet the criteria of non-obviousness, because, through in silico analysis, the function of genes can now be predicted on the basis of their homology to other genes.

Although the possibility of patenting genes is controversial, the concept itself seems straightforward. Even so, several issues contribute to making this area a particularly complex one for patent law. One problem is related to broad patent claims, which may cut a swathe as wide as all genetically engineered cotton plants. Although such comprehensive claims may be more difficult to make now than in the early years of biotechnology, the issue of broad patents remains a concern for many areas of research, including the plant breeding industry (Barton, 2000). Another issue that affects gene patenting is the degree to which claims are allowed for genetic material whose functions are incompletely understood. For instance, the Human Genome Project witnessed a rush towards patents for a wide range of DNA sequences without any corresponding characterization and although such practices are more prevalent in the pharmaceutical industry than in plant breeding, they illustrate that there is not yet a widely accepted definition of how genetic material qualifies for a patent. This issue is related to a third issue, which concerns the type of genes or DNA sequences that might be patented. Claims have been made for protecting DNA that does not constitute a complete gene, including promoters, nucleic acid probes (used to identify DNA sequences) and polymorphisms. On the other hand, patents have been sought for collections of genes, from bacterial cloning vectors to entire genomes (IBRD/World Bank, 2006). Both the EPO and USPTO now have stronger guidelines concerning claims on genes: there must be a good knowledge and description of the gene's function.

A fourth issue that complicates the granting and defence of gene patents is the variable nature of the genes themselves. A good example of the difficulties in identifying what precisely is eligible for protection is provided by the Bt genes that are used for insect resistance in cotton, maize and other crops (IBRD/World Bank, 2006). The Bt bacterium produces certain insecticidal proteins and has been used as a source of natural insecticide for many years. The techniques of biotechnology have allowed the identification and transfer of the genes that code for these crystalline (Cry) proteins; the nomenclature describes a series of different cry genes (found in different strains of the bacterium), each coding for a distinct Cry protein that is effective against specific insects. Thus the cry1Ac gene codes for the Cry1Ac protein that is effective against the cotton bollworm and is the basis of most versions of Bt-cotton. Not only are there various claims on genes that code for specific Cry proteins, but the cry genes that are used in transgenic plants are also synthetic and significantly different from the original wild genes found in the bacterium. In most cases, the Bt genes are codon modified because part of the code that functions in a bacterium must be changed to be more effective in a plant. So although the insecticidal protein that is produced by the transgenic plant may be essentially identical to that produced by the bacterium, the governing gene may look somewhat different and patent claims can be made on the modified gene and the techniques used for its modification. A cry gene may be further altered by eliminating certain portions to produce a truncated form of the gene (which may prove more effective) and research has also created fusion genes that code for novel proteins combining parts of two different Cry proteins. The various types of cry genes must be linked with specific promoters as well. The potential patent claims on various aspects of the process and disputes over definitions of novelty explain why Bt technology causes considerable uncertainty among scientists in developing countries and it is the subject of continuing legal disputes among the major biotechnology multinational corporations. Although the Bt example is particularly complex, it illustrates that genetic modification is rarely a case of simply identifying and moving a gene from one organism to another and it
demonstrates how patent claims on genes may cover a range of issues.

Selection and identification of transformants

The production of transgenic organisms, including plants, involves the delivery of a gene of interest and the use of a selectable marker that enables the selection and recovery of transformed cells. This is necessary because only a minor fraction of the treated cells become transgenic while the majority remain untransformed. It has been estimated recently (Miki and McHugh, 2004) that approximately 50 marker genes used for transgenic and transplastomic plant research or crop development have been assessed for efficiency, biosafety, scientific applications and commercialization.

Selectable marker genes (see Table 13.3 for selected patents) can be divided into several categories depending on whether they confer positive or negative selection and whether selection is conditional or non-conditional on the presence of external substrates. The most common strategy currently used for selection is negative selection, the elimination of non-transformed cells in conditions where the transformed cells are allowed to thrive. Elimination is often effected by treatment of cells with chemicals (e.g. antibiotics or herbicides) in conjunction with a transgene that confers resistance or tolerance to the chemical through detoxification or modification of the chemical. Much of the original work was conducted using antibiotic resistance marker (ARM) genes, which confer resistance to antibiotics such as neomycin, kanamycin and hygromycin. Roa-Rodriguez and Nottenburg (2003b) provided a summary of the most important scientific aspects of such resistance genes, together with an analysis of selected patents that relate to the most widely used ARM. Many of these marker genes are covered by patents or patent applications (Table 13.3) with a thorough IP analysis available on antibiotic markers and Basta resistance (Mayer et al., 2004). As an alternative, or addition, to the use of selectable markers, transformants are often identified through the use of reporter or visualization molecules.

Promoters and other regulatory elements

Regulatory elements are crucial to gene expression in all organisms. The patent landscape of transcriptional regulators that are constitutively active, spatially active (e.g. tissue-specific) and temporally active (e.g. induced or active in response to a certain chemical or physical stimulus) has been well summarized (Roa-Rodriguez, 2003).
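
Returning briefly to the Bt example under 'Genes and DNA sequences': the statement that a codon-modified gene can look different while coding for an essentially identical protein may be easier to follow with a small sketch. The code below is illustrative only. CODON_TABLE is a short excerpt of the standard genetic code, whereas PREFERRED (a set of supposedly plant-preferred codons) and the input sequence are invented for the illustration and are not real codon-usage data or a real cry gene.

```python
# Toy illustration of codon modification: the DNA changes, the protein does not.
# CODON_TABLE is an excerpt of the standard genetic code.
CODON_TABLE = {
    "ATG": "M", "GAT": "D", "GAC": "D", "AAT": "N", "AAC": "N",
    "CCA": "P", "CCC": "P", "GAA": "E", "GAG": "E",
    "ATT": "I", "ATC": "I", "TAA": "*", "TGA": "*",
}

# Hypothetical "plant-preferred" synonymous codon for each amino acid
# (assumption made for this sketch, not measured codon usage).
PREFERRED = {"M": "ATG", "D": "GAC", "N": "AAC", "P": "CCC",
             "E": "GAG", "I": "ATC", "*": "TGA"}


def translate(dna):
    """Translate a DNA string codon by codon into a one-letter protein string."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


def recode(dna):
    """Rewrite a coding sequence using the preferred codon for each residue."""
    return "".join(PREFERRED[residue] for residue in translate(dna))


bacterial = "ATGGATAATAATCCAGAAATTTAA"  # made-up example sequence
plant = recode(bacterial)

print(bacterial, translate(bacterial))
print(plant, translate(plant))
```

The two printed sequences differ at several positions but translate to the same protein string, which is the situation described in the text: a claim can attach to the modified sequence and to the method of modification even though the protein is unchanged.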

Table 13.3. Selection of patents covering selectable marker genes (Pardey et al., 2003).

Selectable marker         Company           Region/country       Patent number

Phosphinothricin, Basta   Aventis/AgrEvo    Europe, USA et al.   EP 531716 et al.
                                                                 US 5767371 et al.
Kanamycin                 Monsanto          Europe, USA et al.   EP 131623
                                                                 US 6174724 et al.
Hygromycin                Novartis          Europe, USA et al.   EP 186425 et al.
                                                                 US 5668298 et al.
Sulfonamide               Rhone Poulenc     USA                  US 5714096
Cyanamide                 Syngenta Mogen    Europe, USA          EP 97201140
                                                                 US 6660910
Aldehyde                  Calgene           Europe, USA et al.   EP 0800583
                                                                 US 5633153
Mannose/xylose            Novartis          Europe, USA et al.   US 5767378 et al.
Glucosamine               Danisco           USA et al.           US 6444878
2,4-D                     Unknown           Europe, USA          EP 0738326
                                                                 US 5608147
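
The logic of negative selection described under 'Selection and identification of transformants' can be sketched in a few lines. This is a toy model rather than a laboratory protocol: the Cell record, the AGENT_RESISTANCE mapping and the marker labels are hypothetical names introduced for the illustration, and the agents simply echo those named in the text and in Table 13.3.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from a selective agent to the marker that confers
# tolerance to it (labels are illustrative, not gene names).
AGENT_RESISTANCE = {
    "kanamycin": "kanamycin-resistance",
    "hygromycin": "hygromycin-resistance",
    "phosphinothricin": "phosphinothricin-tolerance",
}


@dataclass
class Cell:
    cell_id: int
    marker: Optional[str] = None  # None means the cell was not transformed


def select(cells, agent):
    """Return the cells that survive treatment with the given agent."""
    required = AGENT_RESISTANCE[agent]
    return [cell for cell in cells if cell.marker == required]


# 100 treated cells, of which only three took up the construct.
cells = [Cell(i) for i in range(100)]
for i in (7, 42, 91):
    cells[i].marker = "hygromycin-resistance"

survivors = select(cells, "hygromycin")
print([cell.cell_id for cell in survivors])
```

Only the three cells carrying the matching marker remain after selection with hygromycin, and none would remain under kanamycin. This mirrors why the marker and the selective agent are used together, and why both tend to be covered in the same patent family.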

Although the inventions protected by individual patents cannot be exactly the same, in certain cases there are patents that, due to the breadth of their scope, may encompass other protected inventions, or there may be patents which share common features. Where that is the case, Dunwell (2006) pointed out the juxtaposition of the different inventions and the possible room left to manoeuvre around the different entities in the field. It also needs to be taken into account that there are patents that, while not totally directed to promoters, may have an effect on gene expression control. This is the case for the restrictive reproductive technologies, for example those termed 'terminator' technologies, which may have a great impact on the use and development of methods to regulate the expression of genes related to plant reproduction and seed generation.

Golden Rice as an example for freedom-to-operate

One of the issues of over-riding importance to all companies is whether or not they are free to commercialize any particular product. Such freedom-to-operate is determined by the status of any IPR that might cover the product in question and analysis of such IPR requires continuous (and therefore expensive) surveillance.

A well known example that can be used to demonstrate the complexity of this issue is Golden Rice, a transgenic line that is enhanced for β-carotene (pro-vitamin A) (Ye et al., 2000). Vitamin A deficiency causes symptoms ranging from night blindness to those of xerophthalmia and keratomalacia, leading to total blindness. In developing countries, 500,000 children per year go blind and up to 600 per day die from vitamin A malnutrition (Potrykus, 2005). As oral delivery of vitamin A is problematic, mainly due to the lack of infrastructure, alternatives might be found in supplementation of the major staple food with pro-vitamin A. As a table food for many countries, rice is usually milled to remove the oil-rich aleurone layer that turns rancid upon storage. The remaining edible part of rice grains, the endosperm, lacks several essential nutrients including pro-vitamin A. Thus, predominant rice consumption promotes vitamin A deficiency. A combination of transgenes enabled biosynthesis of pro-vitamin A in the endosperm (Ye et al., 2000). GM rice that produces β-carotene (pro-vitamin A) in the endosperm shows the yellow colour of the grain that is visible after milling and polishing, from which the generic name Golden Rice is derived. Golden Rice could be used in food-based approaches, and complement others, in reducing the persistent problem of vitamin A deficiency in rice-dependent populations. The Golden Rice technology was developed by I. Potrykus and P. Beyer with their co-workers and was funded by the Rockefeller Foundation, the Swiss Federal Institute of Technology, the EU and the Swiss Federal Office for Education and Science.

Golden Rice and its use in grain production have been the subject of considerable controversy. It has been suggested that extensive patenting has hampered delivery of this rice to those in need since about 40 organizations hold 72 patents on the technology underlying its production (Kryder et al., 2000). The range of patents covering various components of the pBin 19hpc plasmid used in the production of this rice includes ones on the phytoene trait genes, the promoter sequences, the selectable marker and the transit peptide. Table 13.4 shows the product clearance profile detailing the possible required licences and/or agreements for Golden Rice. Table 13.5 lists the tangible property received by ETH-Zurich, including the apparatuses used in the transformation. Some components were obtained under research-only licences or research-only MTA whereas others included use licences. The challenges to freedom-to-operate for Golden Rice at national and international levels include: (i) the technology is quite complex with many sophisticated components and processes; (ii) there are many potential IP owners or assignees; (iii) the range of potential producers and consumers of Golden Rice is wide; (iv) a rapidly evolving global
IP landscape; and (v) Golden Rice may have significant commercial value (Kryder et al., 2000). This issue has been overcome by a coordinated international programme designed to streamline the production and distribution of this material (http://www.goldenrice.org/). However, perceived problems with access to Golden Rice and essential medicines have stimulated debate within the USA on the obligations of US universities to facilitate the provision of goods for the public benefit (Kowalski and Kryder, 2002; Phillips et al., 2004).

Table 13.4. Product clearance profile: possible required licences and/or agreements for Golden Rice (from Kryder et al. (2000) with permission).

Company/institution [a]      Patent number

AMOCO                        US 5545816, EP 0471056, US 5530189, WO 9113078, US 5530188, US 5656472
Bio-Rad Inc.                 US 5186800
Biotechnica                  WO 8603516
Calgene                      WO 9907867, WO 9806862
Centra National de la RSK    WO 9636717
Cetus                        WO 8504899, US 4965188, EP 0258017
Columbia University          US 4399216, US 4634665, WO 8303259
DuPont                       WO 9955889, WO 995588, WO 9955887
Eli Lilly                    US 5668298
Hoffman-La-Roche             US 4683202, EP 0509612, EP 0502588, US 4889818
ICI, Ltd                     WO 9109128
Japan Tobacco                EP 0927765, US 5591616, EP 0604662, EP 0672752, US 5731179, EP 0687730, WO 9516031
Kirin Brewery                JP 3058786, US 5429939, US 5589581, EP 0393690, US 5350688
Max Planck Gesell.           EP 0265556, EP 0270822, EP 0257472
Monsanto                     US 5352605, US 5858742, WO 8402913
National Foods RI            JP 63091085
NRC Canada                   WO 9419930
Nederlandse OVT              EP 0765397, WO 9535389
Phytogen                     US 4536475
Plant Genetic Systems        US 5717084, US 5778925, WO 8603776, WO 9209696
Promega                      US 4766072
Rhone-Poulenc Agro           US RE36449, WO 9967357
Stanford University          US 4237224
Stratagene                   US 5128256, US 5188957, US 5286636, EP 0286200, WO 880508
University of Maryland       WO 9963055
University of California     US 4407956, WO 9916890
Yissum RDC                   US 5792903, EP 0820221, WO 9628014
Zeneca Corp.                 US 5750865, EP 0699765A1

[a] Note that these are the names of the owners or assignees of the rights under the relevant patents. Because of possible subsequent licensing or assignment, these are not necessarily the current entities to approach for licences.

The deal for Golden Rice has the following clauses:

The inventors have reached an agreement with Greenovation and Zeneca (now Syngenta) to enable the delivery of this technology free-of-charge for humanitarian purposes in the developing world. Inventors (Beyer and Potrykus) assigned their rights exclusively to [Syngenta] for all uses; [Syngenta] licensed inventors for humanitarian uses, with right to sublicense public research institutes and poor farmers in developing countries; the technology is to be made freely available, poor farmers can trade Golden Rice locally; [Syngenta] will support inventors in this task; and [Syngenta] retains commercial rights. In the Golden Rice Deal, Syngenta's role is to
help the inventors in the management of Golden Rice deployment for humanitarian purposes and with other companies and universities obtained FTO for humanitarian use; provide biosafety expertise; and share available regulatory data. Here Humanitarian Use means (research leading to): developing country use (FAO list); resource poor farmer use (<US$10,000 pa from farming); in public germplasm (= seed); there must be no charge for technology (normal costs can be recovered; no premium); local sales are allowed by such farmers (urban needs); and replanting is allowed. Other license terms include regulatory requirements: national sovereignty (or international standards); no export of grain allowed (or seed, except for research, to other licensees); liability, trade, biosafety approvals; and obliged to fulfil all regulatory requirements.

Table 13.5. Material transfer agreements (MTAs), licences, documents and agreements relevant to Golden Rice (from Kryder et al. (2000) with permission).

Product                                                             Company holding the licence/agreement

Rice germplasm transformed with gene construct(s)                   Taipei 309, obtained from International Rice Research Institute (IRRI)
pGEM4                                                               Promega
pBluescriptKS                                                       Stratagene
pCIB900                                                             Ciba-Geigy Limited (now Novartis Seeds AG)
CaMV35S promoter (component of pCIB900)                             Monsanto
CaMV35S terminator (component of pCIB900)                           Monsanto
AphIV gene: hygromycin phosphotransferase (component of pCIB900)    Ciba-Geigy Limited (now Novartis Seeds AG)
pKSP-1                                                              Thomas Okita, Washington State University
GT1 promoter: glutelin storage protein (component of pKSP-1)        Thomas Okita, Washington State University
pUCET4                                                              N. Misawa, Kirin Brewery Co., Ltd
Pea Rubisco transit peptide (component of pUCET4)                   N. Misawa, Kirin Brewery Co., Ltd
CrtI gene: phytoene desaturase (component of pUCET4)                N. Misawa, Kirin Brewery Co., Ltd
pPZP100                                                             Pal Maliga, Rutgers University
pYPIET4                                                             Clontech, but now marketed by Life Technologies
Electroporation apparatus                                           Bio-Rad Corp., Gene Pulser II System
Microprojectile bombardment apparatus                               Bio-Rad Corp.

Golden Rice 1 (SGR1) and Golden Rice 2 (SGR2), as the second generation of Golden Rice, were developed by Syngenta as a part of their commercial pipeline and their pro-vitamin A (β-carotenoid) levels are up to 8.0 and 36.7 µg g⁻¹, respectively, compared to 1.2–1.8 µg g⁻¹ for Golden Rice (Plate 4, colour photograph from Paine et al., 2005). Consistent with Syngenta's support of the Humanitarian Project for Golden Rice, SGR2 transgenic events will be donated for further research and development. The use of the SGR2 events will be governed by the strategic directions of the Golden Rice Humanitarian Board and full regulatory compliance. It is expected that the third generation of Golden Rice will be the rice with a high level of pro-vitamin A and the normal colour of polished rice grain, which is more acceptable to most rice consumers. Looking into the future, an interesting issue has been pointed out by Potrykus (2005), that is, experience with the Humanitarian Golden Rice project has shown that extreme precautionary regulation, not IPR, prevents use of the GMO potential to the benefit of the poor and that the public domain is incompetent and unwilling to deliver products. But a decade after its invention, Golden Rice is still stuck in the laboratory. Well-organized
opposition and a thicket of regulations on transgenic crops have prevented the plant from appearing on Asian farms within 2–3 years (Enserink, 2008). Although the first field trial of Golden Rice in Asia started in 2008, no farmer will plant the rice before 2011.

Although there are also some successful stories in other crops such as metabolic engineering of potato carotenoid content through tuber-specific overexpression of a bacterial mini-pathway (Diretto et al., 2008), whether justified or not, the turmoil over Golden Rice has shaped other efforts to improve the nutritional value of crops. For example, Harvest Plus has a US$13 million annual budget that aims to boost levels of three key nutrients, vitamin A, iron and zinc, and it relies almost entirely on conventional breeding with only some efforts involving marker-assisted breeding.

13.5.2 Marker-assisted plant breeding

Marker-assisted selection (MAS) has been described in detail in Chapters 8 and 9. The standard steps employed in MAS generally include: (i) selecting individuals to be tested; (ii) harvesting materials; (iii) extracting DNA from the materials, amplifying the DNA by PCR to enrich for gene sequences or DNA fragments associated with a particular trait or phenotype; and (iv) separating these fragments, visualizing and identifying the DNA fragments and interpreting and utilizing the information. MAS techniques are subject to varying degrees of protection. They rely on DNA sequences and probes, some of which may be available publicly whereas others are covered by patents. Jorasch (2004) provided an example for IPR in the field of molecular marker analysis, including a figure showing some of the patents that claimed different steps of this typical marker experiment and indicating which step of the experiment is claimed (Fig. 13.3). Henson-Apollonio (2007) reviewed the impacts of IPR on MAS research and application for agriculture in developing countries, providing some examples for patents, copyrighted software and trademarks that are relevant to MAS. It is certain that many new patents have been claimed since then, particularly now that the third generation of molecular markers such as single nucleotide polymorphisms (SNPs) is used; readers can update the figure with those related to the multiple procedures.

[Figure 13.3: flow diagram of the steps DNA, Selection of SSR-primers, PCR, Analysis and Marker-assisted plant breeding methods, with patent numbers 1 to 19 marked against the steps.]

Fig. 13.3. An overview of a typical experiment involving SSR markers and some sample patents that are relevant for the different steps as described in Chapters 2 and 3. Patents are indicated by numbers. Starting with DNA isolation, the DNA is sometimes cut by restriction enzymes. After selection of specific SSR primers, a PCR reaction is carried out. There are different possible methods for the analysis of the PCR product including gel electrophoresis (the fluorescent label of the PCR product is indicated by *), mass spectrometry, and microarray analysis. The result from molecular marker analysis can be used in marker-assisted plant breeding, which involves several patents. From Jorasch (2004) with kind permission of Springer Science and Business Media.

Selection of microsatellite primers and the PCR

There are some patents that claim primers for PCR analysis. Figure 13.3 shows two typical patents claiming primers for microsatellite (simple sequence repeat, SSR) markers. One is patent No. 1 (Röder et al., 1997), which claims specific microsatellite markers for wheat, and the other is patent No. 2 (Nagaraju, 2003), which claims a certain class of SSR primers, the inter-simple sequence repeat-PCR primers (ISSR). Patents claiming specific primer sequences for marker analysis have become rare recently as primer sequences are increasingly treated as a business secret that is licensed to users. After selecting primers, the PCR experiment will be affected by the basic patents in PCR that were registered in 1985 including patents Nos 3–5 (Mullis, 1992; Fig. 13.3). Meanwhile, there are many other patents that claim special polymerases or methods like reverse transcription PCR (RT-PCR) and quantitative PCR.

There are patents that claim both the primer sequences and the PCR methods such as patents No. 6 (Morgante and Vogel, 1997), 7 (Kuiper et al., 1997) and 8 (van Eijk et al., 2001; Fig. 13.3). These patent specifications claim processes for detecting polymorphisms between nucleic acid segments using defined primer sequences, sometimes starting with the prior restriction of the DNA sample by restriction endonucleases and the ligation of certain adaptor sequences similar to the amplified fragment length polymorphism (AFLP) approach.

Analysis of PCR products

Figure 13.3 shows three different methods for the analysis of the resulting PCR
products. The most common one, the analysis by gel electrophoresis, can also be claimed by
patents if, for example, special fluorescent labels are used for detection. Such a method is
claimed by patent No. 9 (Shuber and Pierceall, 2002). The claimed method comprises the PCR
with fluorescent primers, the detection of the labelled extension products and the comparison
of the PCR product sizes. A second method of analysis, mass spectrometry, is claimed by patent
No. 10 (Hillenkamp and Köster, 1999). This patent claims the analysis of nucleic acids by mass
spectrometry in general. The microarray technique that can be used for high-throughput
analysis of probes is protected by a patent of Affymetrix, patent No. 12 (Fodor et al., 1998).
This specification not

only protects the detection of microsatellite markers by microarray analysis, but also the
detection of nucleic acid sequences in general, which include microsatellites. Another
high-throughput technique, described in patent No. 11 (Olek, 1996), combines the methods of
mass spectrometry and microarray analysis of microsatellite markers. In addition, patent
specifications No. 13 (Caskey and Edwards, 1992), No. 14 (Perlin, 1995) and No. 15
(Saint-Louis and Paquin, 2003) summarize the complete experimental process from DNA
extraction to the analysis of PCR products, in which different PCR methods are combined, e.g.
use of certain labelled nucleotide triphosphates and different analytical tools such as mass
spectrometry or computer analytical tools.

Marker-assisted breeding methods

The most comprehensive patent specifications claim complete plant breeding methods in which
molecular marker analysis is used. Examples are patent specifications No. 16 (Byrum and
Reiter, 1998), No. 17 (Beavis, 1999), No. 18 (Openshaw and Bruce, 2001) and No. 19 (Jansen
and Beavis, 2001) (Fig. 13.3). These comprise the previously mentioned experimental steps and
claim the association of genotype with phenotypic traits of interest for molecular marker
analysis. The patents differ in the selection of plant populations that are the basis for the
analysis, the statistical methods applied in the analysis and the integration of molecular
biological techniques such as expression profiling of genes.
This section describes how biotechnology patents would affect marker-assisted plant breeding
using microsatellites only as an example. As SNP markers and gene-based markers become
increasingly feasible, more patents will be claimed in association with their application in
plant breeding. Although MAS techniques are used in conventional plant breeding and no
foreign DNA sequences become part of any resulting cultivar, the use of patented diagnostic
technology may have implications for a plant breeder's ability to claim ownership of the
final product. The exact situation will depend on the conditions under which the technology
was acquired and the wording of any contract with the supplier. So-called 'reach-through'
claims have not seemed to play a significant role in this area to date. Patent offices have
also become aware of the negative effect of these claims and are very critical in granting
wide claims.
National patent systems have been unable to keep pace with the rapid development of plant
biotechnology, leaving many areas of uncertainty and dispute. In developing countries, only a
small minority of patent offices have begun to consider applications related to plant
biotechnology, while in several industrialized countries a number of claims to basic
technologies are still the subject of complex court cases. It is therefore impossible to
chart an unambiguous course for the development of effective IPR regimes for plant
breeding-related biotechnology, but it is important to recognize the major parameters and to
identify the issues that will affect IPR policy in the coming years. Areas of particular
concern include the protection of genes and other sequences, the methods used for genetic
transformation, information in bioinformatics databases and the diagnostic techniques that
biotechnology offers conventional plant breeding (IBRD/World Bank, 2006).

13.5.3 Product development and commercialization

There are multiple steps involved in the procedure of developing biotechnology and breeding
products, each of which might be associated with specific IPR issues. For example, the
development of Bt maize involves multiple steps that are associated with specific patents and
IPR issues: (i) gene ownership (Cry1F, PAT marker gene); (ii) enabling technologies
(microprojectile bombardment, herbicide selection, backcrossing, production of fertile
transgenic plants); (iii) enhanced expression (chimeric genes using viral promoters, enhanced
expression, enhanced transcription efficiency, selective gene expression); and (iv) developing
elite maize

inbreds and hybrids (patented inbreds, hybrids and patents for associated traits and genes).
There are many IPR issues involved in delivering a transformation product from research to
the farmer's field. For example, in Bt maize, IPR issues would include: (i) research
agreements among major players allowing forward movement in plant biotechnology; and
(ii) cross-licences for Roundup Ready (RR) YieldGard. Monsanto licenses Herculex 1 whereas
Pioneer licenses RR for maize, soybean and canola, or Pioneer needs to deal with Monsanto on
germplasm issues. Likewise, there was competition, from developing basic technologies to
making the most effective use of technologies to develop improved products. In addition,
payment for technology or germplasm research is ultimately dependent on farmer purchases of
seed.
It can be expected that a large number of new techniques will be developed and associated
patents will be claimed in the field of molecular breeding in the near future. The new patents
that may add to the current patent list and will affect molecular breeding include:
• High-throughput automated molecular marker profiling.
• High-throughput gene expression assays using DNA on silicon chips.
• High-throughput proteomics assays.
• High-throughput DNA sequencing facilities.
• The ability to DNA profile both the female and the male parents of hybrids without
  accessing either parent per se via use of maternally inherited tissue.
• The ability to conduct genome-wide gene-trait association studies involving hundreds or
  thousands of genotypes, including heterogeneous complexes such as landraces.
• The ability to conduct genome-wide scans comparing domesticated cultivars or landraces and
  to compare them with wild relatives to identify potentially useful loci and novel genetic
  diversity.
It would not be surprising at all if, several years from now, many other fields and
associated techniques and knowledge will have to add to and modify the best list we can
generate so far.

13.6 Use of Molecular Techniques in Plant Variety Protection

Molecular techniques, particularly molecular markers, have been widely used in all procedures
involved in plant breeding and in some fields of PVP as well. For example, on 16 and 17 June
2005, the Plant Production Division of the Canadian Food Inspection Agency (CFIA) and the
National Forum on Seed jointly held a seminar on UPOV plant variety protection and the use of
molecular techniques. The objectives of the seminar were: (i) to provide information to
Canadian plant breeders and other stakeholders on PVP and the use of molecular markers under
the UPOV Convention; and (ii) to facilitate discussion on the potential application of
molecular techniques to PBR, cultivar registration and seed certification in Canada.
Information from this seminar is available at
http://www.inspection.gc.ca/english/plaveg/pbrpov/molece.shtml (CFIA/NFS, 2005).
There are a number of advantages to the use of DNA marker techniques in plant breeding, as
described in Chapter 8, most of which are also applicable to PBR. Among all available
molecular markers discussed in Chapter 2, SNP is the most prolific and is very efficient and
inexpensive to use once developed. The technology holds enormous potential for multiplexing
and high throughput. SNP technology is being used to characterize germplasm and in breeding
programmes. Broad adoption of this technology would be useful to the plant protection
regulatory systems, especially for cultivar identification and protection purposes. While
SSRs are now generally accepted in the courts, there are some inherent limitations that would
be overcome through the use of SNP.
Traditional methods based on morphological observations take time to complete and results
are influenced by the environment. Molecular tools can play a

complementary role to traditional methods. In the USA, in addition to complete descriptions of
similar cultivars, colour chart references, statistical analyses, photographs, or plant
specimens, the applicant may also provide, as supplementary information, results of isozyme
analysis, restriction fragment length polymorphism (RFLP), SSR, SNP or other genetic
fingerprinting testing results. Protection can now be granted on the basis of molecular
marker differences if those differences meet the definition of distinctness, that is, if they
are clear.

13.6.1 DUS testing

As tools for DUS testing, molecular techniques offer the potential for more rapid and
cost-effective results that are less influenced by the environment, year, growth stage and
other factors. The UPOV Working Group on Biochemical and Molecular Techniques and
DNA-Profiling in Particular (BMT) reviewed three options for introducing molecular techniques
into the UPOV system. For some applications, such as the use of gene-specific markers to
identify a phenotypic characteristic (option 1) or the use of molecular markers for the
management of reference collections (option 2), molecular techniques would be acceptable
within the terms of the UPOV Convention and would not undermine the effectiveness of the
protection it provides. A third proposal (option 3), which would create a new system, raises
a key concern that molecular techniques could potentially provide the opportunity to use a
limitless number of markers to find differences between cultivars. There is also concern that
differences would be found at the genetic level which are not reflected in morphological
characteristics.
GEVES (Groupe d'étude et de contrôle des variétés et des semences) in France and the
Department for Environment, Food and Rural Affairs (Defra) in the UK have used molecular
techniques for DUS testing. GEVES used fingerprinting (ISSR, AFLP, SSR, sequence tagged sites
(STS)) and gene-specific tests (GMO, species-specific reference genes or resistance genes).
The National Institute of Agricultural Botany (NIAB, Cambridge, UK) carries out some of the
technical work in this area for Defra and has been very active in research into the use of
molecular markers for DUS testing. NIAB has undertaken research relating to the three UPOV
BMT options. For example, Defra funded a project to develop a set of SSR primer pairs for
wheat that could be used independently of the detection platform. The results indicated that
in principle distinctness, uniformity and stability could all be assessed using
well-characterized SSR markers.
Early investigation of molecular techniques clearly indicated the potential of molecular
markers for discrimination between sets of cultivars, confirmation of cultivar identity or
measurement of diversity, etc., as discussed in Chapter 5. Overall, marker technology has the
potential to assist the PBR process in determining distinctness and uniformity. Further
studies are required to determine marker type, number of markers, quality of markers,
distribution of markers on a genome and considerations pertaining to seed source and sampling
(CFIA/NFS, 2005).
There was potential for molecular techniques for PBR DUS testing of some specific crops or
species, rather than generalized broad-based use throughout the sector. Molecular techniques
were considered to be particularly applicable to plants with novel traits and GMOs. However,
for some crops, molecular techniques may be an unnecessary complication (CFIA/NFS, 2005). It
was noted that molecular techniques are especially effective for identifying newness and
therefore distinctness. Generally, molecular techniques, particularly molecular markers, are
considered to be more useful for distinctness than for uniformity and stability.
While the potential for molecular data to demonstrate stability was considered to be low,
there is some potential regarding uniformity. As breeder seed samples are really populations
of genotypes, genotypic non-uniformity is very typical. Some variability in a cultivar is not
a bad thing. What is important is that the amount of variability that is acceptable is
defined (CFIA/NFS, 2005). When molecular markers are used

for a uniformity test, a number of questions are raised:
• If genotypic data are used in DUS-like testing, how is a heterogeneous locus interpreted in
  a bulk seed sample?
• What is an appropriate sample size for single seed analysis?
• Genotypic stability will not be perfect in heterogeneous mixtures. Will the guidelines
  account for registration of deliberate mixtures of lines?
To have a cultivar that can be identified as unique only by markers is of no value if the
physical appearance is not unique. If there is no physical inspection, someone could,
theoretically, apply for and receive a PBR for an obsolete cultivar. A library of DNA
fingerprints could prevent this from occurring, but there would be significant costs to
create it. It seems that a system that uses both botanical and molecular measurements to
protect cultivars provides the best protection.
It was noted that molecular techniques would be particularly advantageous for crops with few
variable morphological traits or where cultivars are very similar. It was also suggested that
the preferred approach would be for molecular techniques to be used on a voluntary,
case-by-case and crop-by-crop basis. Molecular markers would be considered by the Plant
Breeders' Rights Office to be an additional descriptor or supplement (CFIA/NFS, 2005).
However, if there is a move towards creating a new platform in which molecular techniques
would be a standalone tool, then all elements should be re-addressed, such as thresholds,
definition of distinctness, purpose of protection, sampling methods, etc.
Finally, there are a number of issues to be considered and questions to be raised about the
use of DNA markers in DUS testing:
• The variation measured may not have a genetic basis.
• If markers are used in conjunction with phenotypic traits, how will the two systems be
  weighted?
• Marker thresholds for distinctness will need to be established.
• On the issue of uniformity, it may be difficult to find cultivars that are fixed at all
  marker loci.
• There may be laboratory-to-laboratory variations in genotypic scores for some types of
  markers.

13.6.2 Essentially derived varieties

An EDV is distinct and predominantly derived from a protected initial cultivar, while
retaining the essential characteristics of that initial cultivar. To avoid genetic erosion in
breeding germplasm and to support creative, additive plant breeding, the principle of
breeder's exemption allows breeders to use protected cultivars for developing new cultivars.
However, a newly derived cultivar may be developed from an already protected initial cultivar
owned by a competitor, by applying breeding methods that are regarded as illegitimate. The
potential occurrence of plagiarism has increased with the development of biotechnologies that
enable the introduction of a single gene into a cultivar, such as genetic engineering or
marker-assisted backcrossing, which facilitates deliberate selection of lines that retain a
large amount of the genome of the parental line.
The EDV provision is meant to limit the possibility of cosmetic breeding, which produces a
cultivar that is only slightly different from the original. Introduction of the concept of
EDV in the UPOV Convention improved the protection of the breeder of the initial cultivar. A
cultivar is deemed to be essentially derived from another cultivar (the initial cultivar)
when it: (i) is predominantly derived from the initial cultivar; (ii) is clearly
distinguishable from the initial cultivar; and (iii) conforms to the initial cultivar in the
expression of essential characteristics that result from the genotype of the initial cultivar
(UPOV, 1991). Fowler et al. (2005) discussed a series of possible definitions of derivatives,
each based on a different approach. These definitions are based on allelic differences,
allelic frequencies,

phenotype, breeding action, or composite definitions including two of the above. An
alternative approach would be to employ a definition which, while allowing for IPR, aims to
ensure the designated accessions or the designated accession and its components remain
available for use by other recipients in defined ways. The goal of this approach would be to
keep the materials in the public domain while encouraging research on designated accessions.
An EDV can be protected when it is DUS and new, but to commercialize the cultivar, the
breeder of the EDV must have the consent of the person or entity that holds the rights to the
initial cultivar. The EDV concept is susceptible to different interpretations, however, and
is the subject of an ongoing debate among breeders (ISF, 2004). Under some laws, a newly bred
cultivar may be considered an EDV and the IPR to that cultivar then depend on the rights to
the initial cultivar (IBRD/World Bank, 2006).
Although the EDV concept offers protection against piracy and violation of PBR, breeding
companies have not yet agreed upon a catalogue of specific breeding procedures considered to
yield EDVs. As a consequence, official guidelines for the estimation of genetic conformity
between an initial cultivar and an EDV, as well as crop-specific thresholds to distinguish
between independently derived cultivars and an EDV, are important and should be established.
Some larger companies would like to introduce the concept of genetic distance in the
definition of an EDV, but others fear that this step could lead to the monopolization of
certain gene pools. After much debate, seed company representatives agreed upon arbitration
rules for EDV disputes (ISF, 2005).
Historic samples can be used by a government agency related to PVP to establish thresholds
regarding distinctness. Crops could be reviewed to determine what might be considered similar
and what might be considered dissimilar. On the other hand, stakeholders may create working
groups to develop marker thresholds. The underlying problem is the functionality of PVP as
protecting the entire plant and not specific traits as is possible with patents (Lesser,
2005). As a result, we have to consider what portion of the germplasm was derived from the
purported initial cultivar, something we can refer to as 'relatedness'. In addition, the
alleged EDV must express the essential characteristics of the initial cultivar. The
complexity arises because traits transferred into a cultivar may involve few genes for a
simple trait or multiple genes for a complex trait. The complication arises when the degree
of relatedness is set. If it is set high (e.g. 95% or higher), then there is an incentive,
when the trait is discrete, to engage in unproductive cosmetic breeding to circumvent the
dependency standard. Conversely, if the relatedness is low (e.g. 90% or lower) and the trait
discrete, then it is possible that an independently discovered trait will be identified as
dependent, greatly expanding the control of the initial cultivar owner over cultivars he or
she made no contribution to. This could happen when highly polymorphic molecular markers are
used. Lesser and Mutschler (2004) concluded that a single relatedness requirement for a
species cannot equitably be applied to both discrete and complex traits.
Among possible tools to assess genetic conformity between putative EDVs and their initial
cultivars, including morphological traits, agronomic descriptions and heterosis, molecular
markers seem to be the most promising for establishing genetic conformity, for the reasons
described previously. Genetic distance estimates based on molecular markers represent a key
to the assessment of essential derivation and effective determination of genetic conformity
between a protected initial cultivar and a putative EDV. Highly polymorphic markers such as
AFLP, SSR and SNP are useful in genotypic analyses because they can distinguish closely
related cultivars.
A statistical test for the identification of EDVs with genetic distances from molecular
markers was proposed by Heckenberger et al. (2005a). For a progeny line derived from a
biparental cross, the genetic distance to each of the parents depends on the genetic distance

between the parents and p, the parental genome contribution transmitted to the progeny. The
treatise provides estimates of p and the variances of p (σ²_i; Wang and Bernardo, 2000).
Morphological distances based on 25 traits and midparent heterosis for 12 traits were
observed for a total of 58 European maize inbred lines comprising 38 triplets. A triplet
consisted of one homozygous line derived from an F2, BC1 or BC2 population and both parental
inbreds. All inbreds were genotyped with 100 uniformly distributed SSR markers and 20 AFLP
primer combinations in companion studies for calculation of genetic distances. Correlations
between the coancestry coefficient, genetic distances and morphological distances and
midparent heterosis were significant and high for the majority of traits. However, thresholds
for EDVs to discriminate between F2- and BC1-derived, or BC1- and BC2-derived progenies using
only morphological distances or heterosis yielded a considerably higher probability of error
than observed with genetic distances based on SSRs and AFLPs (Heckenberger et al., 2005b).
Consequently, morphological traits and heterosis are less suited for identification of EDVs
in maize than molecular markers.
Heckenberger et al. (2006) observed considerable differences between AFLP- and SSR-based mean
genetic distance estimates for unrelated inbred lines. With each marker system, the genetic
distance between progeny lines and parents was little affected by the variation in genetic
distance between the parents. Substantial differences in Type I and Type II errors were
detected between flint and dent maize germplasm pools with different marker systems and when
fixed EDV thresholds were considered. It was suggested that threshold levels should be
crop-specific. Within a crop, thresholds should be germplasm pool-specific. In addition,
thresholds should also be molecular marker system-specific because marker systems vary in the
way they generate polymorphism. Heckenberger et al. (2005a) reported that correlation between
true and estimated genetic distances was considerably lower for randomly distributed markers,
particularly with medium-to-low marker densities.

13.6.3 Cultivar identification

The identification of cultivars is an important aspect of plant production systems and is
central to the protection of IPR through PBR. Preston et al. (1999) discussed the application
of a range of molecular marker technologies to three points in the PBR registration process:
(i) the analysis of the genetic distance between a candidate cultivar and the existing pool
of cultivars in order to define a set of comparison cultivars; (ii) the contribution to the
generation of a description of the cultivar for PBR registration; and (iii) the use of DNA
markers to investigate and resolve the identity of cultivars in cases where infringement of
PBR is claimed. Molecular techniques may be particularly useful in resolving disputes relating
to cultivar infringement (i.e. someone selling another's cultivar) dealt with by breeders in
the courts. For example, molecular techniques are used in Canada by the Grain Research
Laboratory (GRL) for cultivar identification testing of wheat and barley. The GRL has used
two methods: acidic polyacrylamide gel electrophoresis (acid PAGE) and high performance
liquid chromatography (HPLC). Protein fingerprinting works well; there are, however,
limitations. With acid PAGE, estimates of sample composition are based on single kernels and
large numbers of kernels can be necessary for statistically reliable estimates. HPLC can be
used on ground samples, but it is not suitable for complex mixtures. Both methods are limited
by finite protein diversity; there are a limited number of protein differences among
cultivars and not all cultivars are distinguishable.
Quantitative DNA methods are now being developed using SNP and insertion/deletion
polymorphisms (Indels). The goal is to be able to look at ground samples of grain to
determine the cultivars present in a mixture and their proportions. Key challenges ahead
include the development of

accurate and sensitive quantification methods and, ultimately, the development of portable
technologies that are capable of delivering rapid results.
Japanese barberry (Berberis thunbergii) is an ornamental shrub desired for its hardiness and
attractiveness. However, because it is a host of black stem rust of wheat, it has been banned
from importation. With permitted importation of 11 rust-resistant cultivars, molecular
identification methods were used to assist CFIA inspectors with identification of the
permitted cultivars, as their appearance is not always consistent with the morphological
criteria, particularly for plants imported in the dormant state. AFLP test results identified
33 reference polymorphic bands. A sample is of the same cultivar if 31 or more polymorphic
bands are shared, whereas if the number of shared bands is 28 or fewer, it is not considered
to be of the same cultivar. Should results show 29 or 30 shared bands, the DNA is re-extracted
and more primer sets are used to raise the number of reference bands to 64 (CFIA/NFS, 2005).
While one gene or one trait may be sufficient for identifying a cultivar in one species, it
may not be sufficient in all species. It may be appropriate to take a case-by-case and
crop-by-crop approach.

13.6.4 Seed certification

The purpose of seed certification is to provide high quality seed to consumers by maintaining
the cultivar identity and purity of seed and ensuring high standards of germination, seed
health and mechanical purity. For example, in Canada, for seed to be certified, it must be a
recognized cultivar, multiplied according to strict rules that include process standards and
cultivar purity standards established and monitored by the Canadian Seed Growers' Association
(CSGA).
Common seed must meet germination, disease and mechanical purity standards, but there are no
cultivar identity or purity guarantees associated with the purchase or use of the seed. For
most crop kinds, common seed may not be sold by cultivar name. Common seed is less costly
than certified seed and includes farm-saved seed.
Certified (pedigreed) seed is used by farmers who want additional assurance on seed quality,
cultivar purity and performance. It is derived from a crop that has been issued a crop
certificate from a seed association like CSGA indicating it has been granted Breeder, Select,
Foundation, Registered or Certified status. Production of certified seed involves the
planting of known seed stocks, previous land use restrictions, minimum isolation distances
and field inspections.
CFIA seed laboratories use a number of test methods for the determination of cultivar purity
and identity, depending on the crop kind and other factors. CFIA seed laboratories are
International Organization for Standardization (ISO)-accredited and therefore carry
responsibilities regarding the use of validated methods. The methods they use are classified
as routine or non-routine and range from field, growth chamber and greenhouse grow-outs to
PCR.
Certified seed must be processed by an approved conditioner or by the grower of the seed and
it must be sampled, tested and graded by accredited industry personnel. Molecular markers are
not currently used in CFIA seed laboratories because seed certification has traditionally
been based on phenotypic traits observable during crop inspection. However, molecular markers
have the potential to be used as a control tool to ensure seed certification is working as it
should. This could expand the level of public confidence regarding the purity level and
security of certified seed or grain.

13.6.5 Seed purification

Breeder seed purification is a 3-year process. For example, in wheat, the first year involves
head selection from seed increases of material also being tested in first-year collaborative
trials. Year 2 includes growing single-head-derived breeder lines in hill or short row plots
with line discards based on visual and in some cases chemical phenotype. During the third
year the

remaining breeder lines are grown as individual breeder long rows, with further line
discarding based on visual phenotype and with all rows of the remaining single-head-derived
lines bulked as the first breeder seed. In addition, during year 3, the cultivar is visually
described in order to be registered and for purposes of future pedigree seed production
(CFIA/NFS, 2005). Purification for molecular characterization is the same, with line discards
during year 3 also based on molecular characterization and purification, but the molecular
characterization process occurs in the laboratory rather than in the field.
An important application of molecular techniques in seed purification is seed purity testing,
particularly for hybrid crops. Using cucumber as an example, Staub (1999) illustrated the
usefulness of genetic markers in hybrid seed production including purity testing. Xu, Y.
(2003) discussed the use of molecular markers in seed quality assurance including
identification of off-types and false hybrids in rice seed production. When a two-line hybrid
system is involved, the false hybrids in rice usually come from the selfing of the female
parents due to the sterility instability of environment genic male sterility lines caused by
temperature fluctuation beyond their critical temperature for fertility conversion. False
hybrids that co-exist with true hybrid seed can also occur in other hybrid crops for various
reasons.

13.7 Plant Variety Protection Practice

13.7.1 Plant variety protection in the EU

The European Community's plant variety rights system (CPVO) was established in 1994. The IPR
granted under this system are valid throughout the 25 member states of the EU. Most of the
members of the EU are members of UPOV. The system is in line with UPOV 1991. It provides a
'one application, one procedure, one examination, one decision' approach to the granting of
rights. The system follows UPOV's DUS requirements. The cultivar must also be novel, i.e.
commercialized for less than 1 year within the EU and commercialized for less than 5 years
outside the EU (6 years for trees). Protection is granted for 25 years (30 years for trees,
vines and potatoes) and provides that authorization of the right-holder is required for the
multiplication, sale or international trade of the cultivar.
There is currently no use of molecular techniques in CPVO DUS testing protocols; however,
CPVO funds research and development projects on the potential use of molecular techniques and
supports ongoing discussions and consultations on implications and issues related to
molecular techniques. CPVO received requests from breeders to add a genetic fingerprint to
the official cultivar description to facilitate the enforcement of European Community plant
variety rights.

13.7.2 Plant variety protection in the USA

In the USA, IPR for plants are provided through plant patents, PVP and utility patents. Plant
patents provide protection for asexually (vegetatively) reproduced cultivars excluding tuber
crops. PVP provides protection for sexually (by seed) reproduced cultivars including tuber
crops, F1 hybrids and EDVs. Utility patents currently offer protection for any plant type or
plant parts. A plant cultivar can also receive double protection under a utility patent and
PVP.
The US Plant Variety Protection Office is responsible for administering the PVP Act, which
provides plant cultivar owners with exclusive marketing rights within the USA. The
requirements of protection are that the cultivar be new, uniform, stable and distinct from
all other cultivars. The PVP Act states that a novel cultivar is distinct when it clearly
differs by one or more identifiable morphological, physiological, or other characteristics
. . . from all prior cultivars of public knowledge. The meanings of 'characteristic' and
'identifiable'
are purposefully vague in this definition to allow for future advances in knowledge and methodology.

PVP Office protection applies to cultivars that are sexually (seed) reproduced or tuber propagated and F1 hybrids. Cultivars sold or used in the USA for longer than 1 year or more than 4 years in a foreign country are ineligible for protection. Fungi and bacteria are specifically excluded by the PVP Act. Asexually propagated crops fall under the purview of the US Patent Office.

A Certificate of Protection remains in effect for 20 years from the date of issue, or 25 years in the case of vines or trees. There are two exemptions to the rights granted. One exists to allow farmers to save seed for use on their own farm. Another exemption allows research to be conducted using the cultivar. This allows for the free exchange of germplasm within the research community.

Important events in the US history of IPR for plants and agriculture include: (i) hybrid cultivars could be protected through trade secrecy (1930s); (ii) the Plant Patent Act (1930), administered through the US Patent Office, provided protection for asexually propagated plants only (plants reproduced through buds or grafting) including horticultural crops and nursery stocks, with potato excluded; (iii) the Plant Variety Protection Act (1970), with the goal to promote commercial investments in plant breeding, provided patent-like protection for plants reproduced by seed; and (iv) the Utility Patent of living organisms (1980), as shown in the Diamond v. Chakrabarty Supreme Court decision in 1980, established that 'anything under the sun made by man' is patentable, broadened patent law to encompass living organisms and established ownership of plant cultivars, traits, parts and processes.

13.7.3 Plant variety protection in Canada

The Canadian Plant Breeders' Rights Act came into force on 1 August 1990. The legislation makes it possible for plant breeders to legally protect new cultivars of plants. Crop cultivars may be protected under the legislation for a period of up to 18 years. All plant species are eligible for protection.

The owners of new cultivars who receive a Grant of Rights will have exclusive rights over the use of the cultivar and will be able to protect their new cultivars from exploitation by others. To be protected, a cultivar must be new, distinct, uniform and stable.

The Plant Breeders' Rights Office, which is part of the CFIA, functions to secure the rights of plant breeders by granting protection for their new cultivars. It reviews and accepts applications, conducts site examinations, reviews data and comparative descriptions, publishes descriptions of cultivars and comparative photographs and grants rights.

13.7.4 Plant variety protection in developing countries

Systems for IPR have been recognized for more than a century, yet until recently IPR have not been an issue in the plant breeding and seed sector in most developing countries. Developing countries are being urged to strengthen IPR to foster innovation and expand trade. The field of agriculture is no exception and the TRIPS Agreement requires all WTO members to provide either patent or sui generis protection for plant cultivars. Developing countries will almost certainly look towards sui generis options for PVP to meet their TRIPS obligations (Tripp et al., 2006). IPR are being introduced or strengthened in developing countries as a result of the TRIPS Agreement of the WTO, bilateral trade negotiations and pressure from export-oriented sectors in agriculture.

Most developing countries are in the early stages of implementing and/or enforcing IPR related to plant cultivars. The use of IPR in plant breeding in developing countries raises a number of important issues, including smallholders' access to technology, the role of public agricultural research, the growth of the domestic private seed sector, the status of farmer-developed
cultivars and the growing north–south technology divide that restricts access to plant germplasm and research tools (IBRD/World Bank, 2006).

Relatively few developing countries have any significant experience with protecting cultivars. In systems where there is heavy emphasis on hybrid cultivars and considerable commercial competition, such as those in China and India, most interest centres on PVP for parent lines and hybrids, particularly in rice and maize. In countries where the production of ornamental plant materials is important, these materials dominate PVP applications.

The protection of transgenic crops has proven particularly difficult in developing countries. Most experience with transgenic crops revolves around Roundup Ready soybean and Bt cotton. The IBRD/World Bank (2006) report shows that the presence of IPR systems is not necessarily correlated with the effectiveness of controlling access to seed of transgenic cultivars.

Pressure to strengthen IPR in plant breeding in developing countries presents both immediate and long-term challenges to policy makers and development investors. The immediate challenges are related to framing and implementing appropriate legislation that is consistent with TRIPS and that supports national agricultural development goals. The long-term challenges are derived from the fact that an IPR regime, on its own, is not likely to provide the incentives that elicit the emergence of a robust plant breeding and seed sector; attention to other institutions and the provision of an enabling environment are also necessary (IBRD/World Bank, 2006). Collaboration and understanding between the south and the north should be strengthened for a better worldwide PVP.

13.7.5 Participatory plant breeding and plant variety protection

Participatory plant breeding (PPB) is the development of a plant breeding programme in collaboration between breeders and farmers, marketers, processors, consumers and policy makers (food security, health and nutrition, employment). In the context of plant breeding in the developing world, PPB is breeding that involves close farmer–researcher collaboration to bring about plant genetic improvement within a species.

PPB is seen as a way to overcome the limitations of conventional breeding by offering farmers the possibility to choose, in their own environment, which cultivars better suit their needs and conditions. PPB exploits the potential gains of breeding for specific adaptation through decentralized selection, defined as selection in the target environment, and is the ultimate conceptual consequence of a positive interpretation of genotype-by-environment interactions (Ceccarelli and Grando, 2007). As one of the models, selection is conducted jointly by breeders, farmers and extension specialists in a number of target environments and the best selections are used in further cycles of recombination and selection.

In developing countries, plant breeding in the public sector is seldom a profit-making activity. Public sector plant breeders rarely make financial gains from their released products. This is unlikely to change if plant breeders' rights are introduced. Hence the issue of how to reward farmers is not complicated by a need to divide profits. Farmers participating in breeding programmes benefit from early access to new material, gain recognition from the community and learn new techniques. In Nepal, farmers involved in PPB have gained all of these benefits and have sold seed of the new cultivar at a higher price than the local landrace (Witcombe, 1996).

13.8 Future Perspectives

13.8.1 Extension and enforcement

A PVP system will not meet its goals unless it is supported by the full range of stakeholders. Breeders, seed producers, traders and farmers need to understand the objectives of the system in order to comply with it. The development of a PVP system should thus include an extensive information campaign
involving all stakeholders, including the legal profession. One of the major challenges for a PVP system is providing effective enforcement. Establishing elaborate restrictions on seed use is counterproductive if there is no enforcement capacity. Private companies and public institutes that lobby for the establishment of PVP must be made aware that most enforcement responsibilities will fall on their shoulders. Likewise, identifying offenders is of little use if the court system is unable to understand or interpret PVP legislation. Developing judicial experience in PVP may take some time (Tripp et al., 2006).

The following next steps should be taken in moving forward as suggested for Canada by CFIA/NFS (2005), which should be applicable to other countries:

1. Develop standardized protocols.
2. Update marker systems.
3. Develop stakeholder agreement on threshold levels and techniques to be used.
4. Work towards harmonization of crop-specific protocols as they relate to PBR both nationally and internationally.
5. Develop a means for validation of tests and accrediting laboratories.
6. Review current national and international crop-specific projects to determine all possible available markers.
7. Initiate research projects for selected species.
8. Canada should establish and lead a BMT subgroup on barley and possibly one on peas; and should participate in existing soybean, wheat and canola BMT subgroups.
9. Improve Canadian involvement in and feedback to Canadian experts and stakeholders from UPOV BMT meetings.
10. Explore further collaboration with the National Forum on Seed.

13.8.2 Administrative challenges for implementing PVP

In addition to establishing a framework for PVP legislation, there are administrative challenges for implementing PVP, including decisions on where to house the new authority, how to establish eligibility of a new cultivar for protection, which crops to protect first, how to recruit personnel with requisite technical and legal capacities and how the authority can pursue cost recovery while ensuring that small players can afford to apply for protection (Tripp et al., 2006). Research managers and policy makers responsible for public research, who are commonly in favour of using IPR in public sector breeding, have to consider the potential impact on breeding strategies and on the costs and benefits before giving their unconditional support to IPR in plant breeding and their use in public agricultural research (Louwaars et al., 2006).

Developing molecular techniques for IPR in plant breeding requires greater attention to strengthening capacities in national patent offices. As new methods of cultivar identification become available, the PVP Office should consult with the plant breeding community and research experts to best use these procedures. On the other hand, new tools also raise some concerns, including legal considerations relating to conformity with the UPOV Convention and the potential impact on the strength of protection. For example, countries that use transgenic cultivars will need to ensure adequate protection, although in many cases credible enforcement of the right combination of biosafety regulations, seed laws and PVP may offer adequate protection for transgenic cultivars, at least in the early stages of their availability in developing countries (IBRD/World Bank, 2006).

Not all crops need to be covered by PVP initially and choices should be made about which crop-breeding efforts would benefit most from IPR. With respect to public plant-breeding efforts, policy makers must distinguish between situations in which PVP will help stimulate the deployment of crop cultivars developed by public institutes and those in which PVP may turn national research institutes away from their public mandate. A further decision involves the protection afforded to extant (usually public) cultivars. Given that the rationale for IPR is to provide incentives for future breeding, rather than to reward past achievement, it seems reasonable to limit the protection periods for extant cultivars (Tripp et al., 2006).

The general concept of a PVP-type system is appropriate and important to provide affordable IP for plant breeders while retaining the availability of germplasm as an initial source of variation in breeding. PVP remains especially important to provide IP for successful breeders who, either because of the incredible and still largely incomprehensible complex biology of their crop species or through lack of expensive technology, cannot describe an individual gene and its agronomic impact, but who, nonetheless, develop improved cultivars that are needed in agriculture, horticulture, or forestry. Other forms of IP (trade secrets, contracts, patents) are also important.

The use of molecular techniques for cultivar registration needs to be harmonized with its use for PBR. To do this, international agreement on methods and procedures needs to be established. There may be legal problems associated with the use of molecular markers that may require third party verification. Related government agencies should act as coordinators or verifiers of molecular markers and accredit or certify laboratories that wish to perform molecular techniques. The PBR Office may not be responsible for establishing standards or thresholds or for review of molecular markers. Instead, this would be handled in a similar manner to botanical descriptions.

13.8.3 The need to update UPOV

UPOV was updated once due to changes in technology. It is time to update the provisions once again to accommodate advances in technology that have occurred since 1991, in order to encourage continued infusions of new germplasm into breeding pools. As suggested by Donnenwirth et al. (2004), these UPOV updates should include:

• Providing compensation for and limits on saved seed in all countries.
• Making the EDV system more effectively defined to avoid technological loopholes.
• Revising the breeders' exemption to include a period of x years from the date of a PVP application during which the breeders' exemption would not be available for UPOV-protected material including commercialized cultivars.
• Requiring a seed deposit for all UPOV-related applications.
• Requiring the disclosure of all material deposited with PVP applications at the end of x years and making all material deposited available for research under the breeders' exemption at the end of x years unless the disclosure and availability would be in conflict with a utility patent on the same material.
• Placing all UPOV-related deposits (excepting parents and synthetics) into the public domain following expiration of UPOV protection.
• Creating a PCT (Patent Cooperation Treaty)-like system to facilitate filing of PVP applications on an international basis.
• Providing for and facilitating under UPOV global benefit sharing consistent with the International Treaty on Plant Genetic Resources for Food and Agriculture.

Janis and Smith (2007) made two novel and provocative claims in 'Obsolescence in intellectual property regimes'. They first argued that the legal regime for protecting new plant cultivars has become hopelessly outdated in light of recent changes in technology. They next asserted that the fate of the PVP system illustrates a broader and more disturbing phenomenon in IP law: the potential for sui generis, industry-specific IP regimes to become increasingly ineffective over time. Helfer (2006) believed that 'Obsolescence in intellectual property regimes' offered an insightful legal analysis of PVP, one of IP law's least understood sui generis regimes, and that the article also made a persuasive case that the lynchpin of the PVP system is outdated and needs to be replaced with more flexible unfair competition principles. International and domestic policy makers interested in advancing innovation in the plant breeding industry and legal scholars concerned with the ever-evolving relationship between law
and technological change would do well to consider the arguments from Janis and Smith (2007).

13.8.4 Collaboration in use of genetic resources

Historically, there has been excellent collaboration between the US Land Grant Institutions and publicly supported IARCs in crop improvement efforts. A hallmark of the collaboration has been the free exchange of plant germplasm and information. Now there are increasing restrictions on the use and exchange of germplasm from the USA to the IARCs and from the private sector to the public sector, although the reverse cannot happen due to the international public good nature of the CGIAR centres as well as their agreement with the International Treaty on Plant Genetic Resources for Food and Agriculture. This situation results in the following consequences:

• restricted access and use of germplasm;
• legal costs and enforcement;
• restrictions on progeny and publications;
• joint ownership of progenies and discoveries;
• complication caused by biotech patents on single genes and processes;
• public programmes increasingly being unable to access and use technology; and
• companies becoming increasingly restrictive and demanding.

On the other hand, international collaboration in the use of genetic resources becomes increasingly important and IPR issues are worth more attention. The recent revolution in the field of biotechnology has triggered another round of controversy between the developed countries of the north and the developing countries of the south concerning access to genetic resources and equitable sharing of their benefits. Developed countries, as the genetic resource poor, assert ownership claims on associated technologies, while developing countries, as the genetic resource rich, claim ownership of genetic resources. The heart of the matter, however, lies in the application of conflicting conventions and protocols in respect of genetic resources and biotechnology: genetic resources are treated as public goods, while biotechnology is treated as a private good (Adi, 2006). Developing countries that claim ownership to a large reserve of the Earth's pool of genetic resources feel that this exposes them to the exploitative tendencies of multinational corporations (MNC) that are mainly owned by developed countries of the north, considering that 74% of agbiotech patents are held by six 'gene giants' (Monsanto, Dupont, Syngenta, Dow, Aventis and Grupo Pulsar) (http://www.etcgroup.org/upload/publication/247/01/com_globilization.pdf). MNC are sometimes regarded as exploiting the advantages as well as the weaknesses in the various conventions increasingly to monopolize the seed and germplasm industry, without due consideration for farmers and developing countries (Adi, 2006). It will take a long time to introduce a more even playing field that is mutually favourable to both parties and to establish a better regime of benefit sharing that recognizes farmers' or indigenous rights alongside patents and plant breeders' rights.

Public–private partnerships will need to be established to manage IP issues related to the transfer of information, material or technologies from private companies to developing countries (Naylor et al., 2004). The African Agricultural Technology Foundation is one initiative that has been established to deal with such issues. Several private corporations with major investments in MAS in maize have agreed to provide access to germplasm and knowledge for African countries (Naylor et al., 2004; Delmer, 2005).

13.8.5 Technology and intellectual property interaction

Technology can be a two-edged sword with respect to the effective level of IPR and the utilization of genetic resources. While technology can facilitate the use of genetic
resources, it can also be used in a fashion that threatens to undermine existing levels of IPR. Donnenwirth et al. (2004) gave the following examples:

• Molecular marker technologies can be used to attack trade secrets by rapid identification of female parent inbred line contaminants in bags of hybrid seed. These inbred lines might then be used directly as parents of hybrids or as parents for further breeding.
• Molecular marker technology can be used to identify segregating molecular characteristics in an otherwise uniform cultivar and thus to select a distinct new cultivar from the segregating source without any breeding effort being expended.
• An existing cultivar could be transformed by genetic engineering and thus achieve cultivar status by virtue of its distinctness but without any effort expended to change the genetic base of the cultivar.
• An existing cultivar could be changed just sufficiently and even only cosmetically using marker-assisted breeding so that it retains the important agronomic attributes of the initial cultivar but would evade the dependency resulting from its status as an EDV through selection for a molecular marker profile that is sufficiently different from the initial cultivar.
• An existing cultivar could be changed dramatically in its overall DNA marker profile yet contain some or all of the key genetics impacting important agronomic traits due to targeted selection of its genetics using molecular marker or genomics data.
• An inbred containing the key genetics of the female parent of a hybrid can be rapidly recreated using one or a suite of technologies including di-haploidy, molecular markers, genomics, winter nurseries and high-throughput laboratory genetic profiling and screening. The inbred can then either be used as a parent of a hybrid or as a parent for further breeding.
• An inbred containing the key genetics of the male parent of a hybrid (hitherto essentially impossible to access via a hybrid) can similarly be recreated and used.

13.8.6 Seed saving and plant variety protection

Seed saving is a historical cultural phenomenon that dates back to the beginning of agriculture itself. It helps farmers control their enterprises and maintain their independence; it allows them to predict how well a crop will perform in the following season; it allows them to participate in maintaining the crop; it serves as insurance against inadequate supplies of seed; it helps to maintain food security; and it creates a viable market that ensures that seed prices remain affordable (Mascarenhas and Busch, 2006). Because a seed contains within itself the means for its own reproduction, seeds have offered a particularly large stumbling block to capital accumulation. In the USA, IPR legislation and Supreme Court decisions have played a profound role in overcoming these unique characteristics. According to the ETC Group (2005) the top ten seed companies including Monsanto, Dupont and Syngenta now account for an estimated market value of US$21 billion for commercial seed sales worldwide and about 50% of the global seed market. Mascarenhas and Busch (2006) argued that the combination of expanding IPR, new GM technology and the ideology of the 'technological treadmill' have successfully overcome seed's inherent obstacles to capitalist accumulation. As a result, US farmers are facing further loss of control of the farm production process.

For example, large US soybean farmers have consistently saved seed, as much as 60% in some years. However, with the introduction of Roundup Ready soybeans the nature of seed saving was drastically changed. Savings rates have ranged from a peak of 63% in 1960 to 33% in 1991. The decline in saved soybean
seed from 1955 to 1974 before GM soybean was approximately 1.4% per year. However, with the introduction in 1996 of Monsanto's Roundup Ready soybean, a genetically modified herbicide tolerant cultivar, the rate of decline in soybean seed saving increased to 2.3% per year from 1996 to 2002.

More remarkable perhaps has been the intensive adoption of Monsanto's Roundup Ready soybean since its introduction, e.g. Monsanto accounted for 91% of the worldwide GM-soybean area in 2004 (ETC Group, 2005). However, Ervin et al. (2000) suggest that when examined worldwide, all currently available transgenic crops account for a yield increase of no more than 2%. On the contrary, government data sources reveal that in some areas seed saving has all but ceased (USDA, 2002a). In order to explain the apparent contradiction, Mascarenhas and Busch (2006) invoked the theory of 'technological treadmill' (Cochrane, 1993) and they considered the rapid adoption of Roundup Ready soybeans a classic example of the 'technological treadmill'. As the theory suggests, given the inability of farmers to affect the prices they receive for their commodity crops, farmers can only increase their profits by adopting new technologies that decrease their costs. However, only early adopters of new technology gain because the efficiencies (usually in terms of increased profits) gained from widespread adoption themselves push the prices received by all farmers downwards, thus abolishing any comparative advantage. When confronted with the rapidly expanding technologies of nature's production farmers are left with few options: loyalty to the technological treadmill or exiting the industry altogether, the latter being an option few are willing to consider.

For as rapidly as seed saving has been declining, the cost of seed has been rising. For example, in 1975 a bushel of soybean seed cost US$7.34. Twenty years later, in 1994, it was US$12.21. However, in 1997, 1 year after the introduction of Roundup Ready soybeans, the price of soybean seed jumped to US$17.40 and 6 years later, in 2003, sold for US$24.20 per bushel. The decline in seed saving has shifted a significant portion of the value of bin-run seed from farmers to commercial seed retailers and their parent owners. The value of bin-run seed in 2000 was about US$170 million or approximately half its value before the introduction of Roundup Ready soybeans. This decline in bin-run seed amounted to approximately US$374 million in additional profits in 2001 to commercial seed retailers (Mascarenhas and Busch, 2006).

The above information is not to say that farmers who adopted Roundup Ready seed necessarily lost money. The major draw of Roundup Ready soybeans was that they required less farmer labour and management time. This point is significant, particularly when one recalls that the persistence of family farms has been through their ability to self-exploit farm labour. Furthermore, GM soybeans are relatively simple to use and increased flexibility in herbicide application (provided that one used a glyphosate herbicide such as Roundup) allows spraying to occur throughout most of the crop cycle. This flexibility also fits well with conservation tillage and other production inputs currently in practice (USDA, 2002b).

The decline in seed saving that has happened in the USA as shown by soybeans, and would also be expected elsewhere in the world, brings up two important issues. First, to prepare for natural disaster or civil disturbance (such as particularly severe weather and global climate change), it is essential to domestic food security that farmers: (i) already have some saved seed on hand; and (ii) have the requisite skills needed to properly save that seed, skills that are only maintained if seed can be regularly saved. The second issue is associated with genetic diversity that might be narrowed down because of the decline in seed saving. If this narrowing continues it will result in great homogeneity in domestic crops that are planted across expansive contiguous areas. And, as demonstrated in the case of the 1972–1973 southern corn leaf blight, this lack of planted biodiversity can prove to be very costly.
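
The seed prices quoted in this section can be converted into average annual growth rates, which makes the comparison between periods explicit. The Python sketch below is illustrative only. The function name annual_growth_rate and the use of a compound-growth formula are assumptions introduced here, not part of the analysis by Mascarenhas and Busch (2006). The prices are treated as nominal values with no inflation adjustment, and the years are used as printed (1975, 1994, 1997 and 2003).

```python
# Illustrative sketch: average compound annual growth of soybean seed price
# between the years quoted in the text. Prices are US$ per bushel, used as
# printed; no inflation adjustment is applied (an assumption of this sketch).

def annual_growth_rate(start_price, end_price, years):
    """Return the compound annual growth rate as a fraction (0.05 means 5%)."""
    if years <= 0 or start_price <= 0:
        raise ValueError("years and start_price must be positive")
    return (end_price / start_price) ** (1.0 / years) - 1.0

# (year, price) pairs quoted in the text
prices = [(1975, 7.34), (1994, 12.21), (1997, 17.40), (2003, 24.20)]

# Growth rate for each pair of consecutive quoted years
rates = {}
for (y0, p0), (y1, p1) in zip(prices, prices[1:]):
    rates[(y0, y1)] = annual_growth_rate(p0, p1, y1 - y0)

for (y0, y1), r in rates.items():
    print(f"{y0}-{y1}: {100 * r:.1f}% per year")
```

Under these assumptions the rate for 1994 to 1997 comes out several times higher than the rate for 1975 to 1994, which is the jump the text describes around the introduction of Roundup Ready soybeans.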

13.8.7 Other plant products

As discussed in Chapters 1 and 5, plants provide various resources for humans, and crop cultivars developed by breeders through various breeding programmes are mainly for food, feed, fibre, fuel and other needs. In many cases, breeding efforts are combined with collection and selection of natural products from plants to meet human demands. As an example, traditional medicine has been an important part of human health care in many developing countries, and developed countries as well, for many years. It is estimated that between 25,000 and 75,000 plant species are used for traditional medicine, 1% of which are known by scientists and accepted for commercial purposes (Aguilar, 2001). The world market for herbal medicines has been estimated at US$60 billion with annual growth rates of between 5 and 15%. At the moment, the mechanisms for IPR are not able to protect traditional knowledge and indigenous peoples (Kartal, 2007). Local communities believe that they are subject to bio-piracy, which is unauthorized use of traditional knowledge or biological resources (Aguilar, 2001). Researchers or companies may claim IPR over biological resources and traditional knowledge, after slightly modifying them. However, the IP laws do not protect traditional knowledge adequately. A harmonized system of traditional knowledge protection should be effectively implemented on both national and international levels (Kartal, 2007). A 'golden triangle' consisting of traditional knowledge, modern medicine and modern science with systems orientation will converge to form an innovative discovery engine for newer, safer and more affordable and effective therapies (Patwardhan, 2005). Countries and peoples providing the resources for natural products research and drug development now have well-defined benefits and rights. This will have a direct impact on the sharing of benefits accruing from collaborations (Gurib-Fakim, 2006). Increased legislation, rather than assisting traditional knowledge holders, may in fact be to their detriment and also discourage companies from investing in bio-prospecting activities. Of greater concern to the world is the loss of traditional knowledge brought about by rapid sociocultural changes and inhibition of research by creating bureaucratic minefields that may prevent this knowledge from being documented and transmitted (Dutfield, 2003).

The book edited by Biber-Klemm and Cottier (2006) discussed the basic issues and perspectives on rights to plant genetic resources and traditional knowledge. It covers the means, instruments and institutions needed to create incentives to promote the conservation and sustainable use of traditional knowledge and plant genetic resources for food and agriculture, within the framework of the existing world trade order.
14
Breeding Informatics

In previous chapters, we have discussed genetic variation as it relates to plant breeding and the molecular tools used for the dissection, transfer and selection of novel traits and genes. Using these molecular techniques may result in the generation of a large amount of data. Extracting useful information from this ocean of data requires the integration of different sources of data and the ability to analyse and visualize the data in effective and efficient ways. The genomics projects of the last decade are most often carried out within universities, research institutes and companies, closely allied with laboratories producing large quantities of data. While bioinformatics has been involved with the primary data, it has yet to become focused significantly on applied areas such as plant breeding. However, this situation is beginning to change. Advances in plant breeding will depend heavily on how well we can manage and utilize all relevant information (Xu et al., 2009b). In this chapter, breeding-related informatics will be discussed, including information collection, storage, integration and mining.

14.1 Information-driven Plant Breeding

As in other disciplines, plant breeding, and in particular molecular plant breeding, has been driven by the availability and accessibility of various types of information. The first computer network, ARPAnet, was developed in the late 1960s as a product of the Cold War. By the 1980s, universities throughout North America and Western Europe were connected via countrywide networks such as the UK's Joint Academic Network (JANet). Molecular biologists were regularly logging in to central servers to run sequence analysis programs and transferring data from one machine to another. In the early 1990s, the World Wide Web (WWW) was invented and turned the Internet into the worldwide cultural phenomenon that it is today. The WWW has made the concept of a global village, developed by Marshall McLuhan decades ago, into a reality. In 1991, Tim Berners-Lee and Robert Cailliau, scientists working at CERN (the Organisation Européenne pour la Recherche Nucléaire: European Organization for Nuclear Research) in Geneva, developed the Hypertext Transfer Protocol (HTTP) as a way of linking and cross-referencing documents held on different computers. Many professionals, including plant breeders, now make regular use of the Internet as an integral part of their work.

Through these networks, information has been transferred at an increasing rate across the world and the quantity of information has been increasing exponentially.
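As a rough illustration of what exponential growth with a fixed doubling time implies, the following minimal sketch computes the growth factor over a five-year period. The function name and the doubling times used here are illustrative assumptions, not measurements.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Factor by which a quantity grows over `months` for a fixed doubling time."""
    return 2.0 ** (months / doubling_time_months)


# Doubling times below are assumed values chosen only for illustration.
for doubling_time in (6, 12, 24):
    factor = growth_factor(60, doubling_time)
    print(f"doubling every {doubling_time} months: x{factor:,.1f} after 5 years")
```

A quantity that doubles every 6 months grows about a thousand-fold in 5 years, whereas one that doubles every 12 months grows 32-fold, which is why data volumes can outpace the capacity available to process them.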

550 Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu)



Considering only DNA sequence data, the volume of biological information is doubling roughly every 6 months. This is faster than the exponential rate of increase in computing power, as suggested by Moore's law (an empirical observation made long ago that has held until today: the doubling of processor power every 12 months) (Sobral, 2002). With the development of high-throughput technologies, genotypic information, including genetic polymorphisms and gene expression profiling, will increase exponentially. Since the first molecular genetic maps based on restriction fragment length polymorphism (RFLP) markers were developed in the 1980s, a significant amount of molecular marker and genetic map information has been generated and become available for many plant species. Genotype information is now generated primarily using PCR-based markers such as simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) and high-throughput systems. These molecular polymorphisms can be accurately sized and readily compared across laboratories and experiments.

Historically, plant breeding has been driven by phenotypic information and a large amount of phenotype and pedigree data has been accumulating for many decades in plant breeding programmes. A typical example is the use of multi-environment trials (METs), which have been in general practice for most plant breeding programmes for years. Yield trial data in many private breeding companies can often be traced back to the very beginning of each specific breeding programme. Most breeding institutions/companies have extensive facilities and expertise in collecting phenotype data for various agronomic traits. Integrating this type of information with other sources of information from genetics and genomics will lead to more efficient use of both types of information for plant breeding.

14.1.1 Basics of informatics

Bioinformatics can be considered a combination of several scientific disciplines including biology, biochemistry, statistics and computer and information science. It involves the use of computer technologies and statistical methods to manage and analyse a huge volume of biological data. Bioinformatics provides a common conceptual framework for molecular biologists, biochemists, molecular evolutionists, statisticians, computer scientists, information technologists and many others to work together.

Databases allow people to organize and manipulate large amounts of data and to quickly translate and deliver that information in useful summaries and formats. A database can be defined as a structured collection of records or data which is stored on a computer. The database can be queried and the records retrieved can be used to make decisions. The computer software used to manage and query a database is known as a database management system (DBMS).

The structural description of a database is known as a schema. The schema describes the objects that are represented in the database and the relationships among them. There are several different database models (or data models) and the most common in use today is the relational model, which represents information in the form of data records in different sets of tables and the relationships between them.

There are four main components to any database application: (i) a method for entering or editing data, usually data entry screens or import functions; (ii) a data storage mechanism, a way of storing the data on the computer; (iii) a query mechanism to allow users to filter and summarize data in structured ways; and (iv) a report generator to extract and interpret information from the stored data.

The first basic concept to understand about databases is the difference between data and information. What we call data is really a collection of facts in a specified domain; the facts may be measured values, observations, responses or even pictures. Data by itself is meaningless, but once it is organized in useful ways, it becomes meaningful information. Therefore, essentially a database is nothing more than a tool to organize and access large amounts

of data so that people can turn it into useful information.

The content of a database determines its type. The main types of databases include those listed below and a combination of any of them:

• Bibliographic: examples are library catalogues and an article index. A library catalogue is a database that describes what the library owns. Each item in the catalogue describes a book or other item in the library. An article index is a database that describes the contents of a particular set of journals, magazines, newspapers and/or other documents.
• Full text: a full-text database provides the full text of a publication. For instance, the research library in GALILEO (Georgia Library Learning Online) provides not only the citation to a journal article, but often the entire text of the article as well.
• Numeric databases: examples are Census Bureau databases and databases for stock market information, each containing primarily numeric data (statistics, census data, economic indicators, etc.).
• Image databases: these collect only image information (EBSCO host image collection, www.ebscohost.com).
• Audio databases: those containing MP3 or WAV files, etc.

Meta-databases contain information about databases. They allow users to search for content that is indexed by other databases. For example, the Genomes Online Database (GOLD, http://www.genomesonline.org/) is an Internet resource for access to information regarding complete and ongoing genome projects around the world, and JAKE (jointly administered knowledge environment, http://jake.openly.com) is a meta-database of bibliographic databases. If you find a citation for an article in one of the bibliographic databases and want to determine if the article is available in full text in another database, you could do a search for the journal in JAKE to get a list of all the databases that index that specific publication and whether those databases include it in full text.

14.1.2 Gaps between bioinformatics and plant breeding

There has been some delay in the uptake of bioinformatics within the plant breeding community. Most bioinformatics databases are lacking information on phenotypes, traits and other organism data, largely because bioinformatics grew out of the fields of molecular biology and biochemistry. When applied to plant breeding, bioinformatics data must be combined with other types of information, including plant phenotype and information on the environment where the phenotype is measured. Therefore, breeding informatics focuses on the development of breeding-centric databases and algorithms and statistical tools to analyse, interpret and mine these datasets (Xu et al., 2009b).

Although the recent explosion of genetic and genomic data for a wide range of plant species has led to a proliferation of publicly available plant databases, this wealth of knowledge has not yet found its way into mainstream plant breeding. There may be several explanations for this. First, it is not obvious to many plant breeders how or if much of the primary information generated in plant genomics can be applied to real-life breeding situations. Secondly, breeding requires the integration of information from different sources, usually stored in different databases and managed by different groups of scientists (for example, pedigree, genotype and phenotype). Thirdly, many of the publicly available tools and interfaces available for bioinformatic data are oriented at the cellular/molecular level, while most breeders are working and thinking at the organism level. Fourthly, until recently, much of the genomic research and therefore the publicly available data has concentrated on the comparison of genes between species rather than the gene diversity within species required for plant breeding. Therefore, there is a need to re-orient the tools and information so that crop researchers and biologists in general can query and use them properly. As in many informatics projects, an essential factor for success in plant bioinformatics will be the ability to integrate related information and to view and analyse

it with tools that support decision-making functions. As the volume of information continues to increase, the need for such tools grows.

Bioinformatics data typically includes cDNA and genomic sequence data, genetic maps of mutants, DNA markers and maps, candidate genes and quantitative trait loci (QTL), physical maps based on chromosome breakpoints, gene expression data and libraries of large inserts of DNA such as bacterial artificial chromosomes and radiation hybrids. Information flow from molecular markers to genetic maps to sequences and to genes has been established. However, there is a gap between the sequence-based information and breeding-related information such as germplasm, pedigree and phenotype. We will depend on phenotyping as the basis for the functional analysis of about 40% of genes, even though a complete sequence is available. Therefore, the integration of breeding-related information with genomics databases is required for genomics-based breeding programmes.

Mayes et al. (2005) discussed how genetic information can be integrated into plant breeding programmes to produce cultivars from molecular variation using bioinformatics and what crop scientists might want from bioinformatics. They examined how bioinformatics tools might be used to track down the underlying genes controlling sustainability traits and how these may then be exploited in plant breeding programmes using marker-assisted selection (MAS).

14.1.3 A universal system for information management and data analysis

Modern plant breeding needs a standardized and widely accepted system for information acquisition, deposition, classification, integration, interpretation and utilization. Three major types of information (genotypic, phenotypic and environmental) should be brought under a single umbrella, with comprehensive tools for integrating, extracting and analysing useful information.

The success stories of the data-hosting institutions and of those managing large-scale genome sequencing programs provide us with a clue that a web-based information management system is a better bet for modern plant breeding programmes than local, stand-alone systems. The web-based information management systems, once fully developed as stand-alone systems, are anticipated to offer several advantages, including the following:

• They provide highly efficient cutting-edge technology solutions for breeding institutions/programmes looking to dramatically improve quality and reduce costs of information management.
• They provide a universal information management system that is suitable for all breeding programmes so that end-user application setup and maintenance is simplified in the Internet computing environment because there is nothing to install, configure, or maintain on the user's computer. Each institution has no need to maintain, manage, and integrate their data using their own facilities and personnel.
• They accelerate breeding procedures by providing a much more affordable information management system to breeders. The application needs only to be installed, configured and modified on the web server, reducing the risk of inconsistent configurations and incompatible versions of software between the client and server machines.
• They create a knowledge base for typical customer support questions; a single integrated source for all customer support inquiries; and the ability to analyse information more effectively.
• They stimulate collaborations by providing more accessible approaches to sharing data and sharing intellectual property based on mutual interests.
• They provide effective, flexible and competitive approaches to converting data into knowledge that is critical to companies and institutions continuously seeking ways to improve their products and services.
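To make the idea of bringing genotypic, phenotypic and environmental information under a single umbrella more concrete, the following minimal sketch uses a relational schema in Python's built-in `sqlite3` module. All table names, column names and records are hypothetical and chosen only for illustration; they are not the schema of ICIS or of any other existing system.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical breeding tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE germplasm (
            germplasm_id INTEGER PRIMARY KEY,
            name         TEXT NOT NULL
        );
        CREATE TABLE environment (
            env_id   INTEGER PRIMARY KEY,
            location TEXT NOT NULL,
            year     INTEGER NOT NULL
        );
        CREATE TABLE genotype (
            germplasm_id INTEGER REFERENCES germplasm(germplasm_id),
            marker       TEXT NOT NULL,
            allele       TEXT NOT NULL
        );
        CREATE TABLE phenotype (
            germplasm_id INTEGER REFERENCES germplasm(germplasm_id),
            env_id       INTEGER REFERENCES environment(env_id),
            trait        TEXT NOT NULL,
            value        REAL NOT NULL
        );
    """)
    # Invented example records (two lines, two trial sites, one marker).
    conn.executemany("INSERT INTO germplasm VALUES (?, ?)",
                     [(1, "Line A"), (2, "Line B")])
    conn.executemany("INSERT INTO environment VALUES (?, ?, ?)",
                     [(1, "Site X", 2008), (2, "Site Y", 2008)])
    conn.executemany("INSERT INTO genotype VALUES (?, ?, ?)",
                     [(1, "M1", "150"), (2, "M1", "162")])
    conn.executemany("INSERT INTO phenotype VALUES (?, ?, ?, ?)",
                     [(1, 1, "yield", 5.0), (1, 2, "yield", 6.0),
                      (2, 1, "yield", 4.0), (2, 2, "yield", 4.5)])
    return conn


def mean_trait_by_allele(conn: sqlite3.Connection, marker: str, trait: str):
    """Join genotype and phenotype tables; average a trait per marker allele."""
    return conn.execute(
        """
        SELECT g.allele, AVG(p.value)
        FROM genotype AS g
        JOIN phenotype AS p ON p.germplasm_id = g.germplasm_id
        WHERE g.marker = ? AND p.trait = ?
        GROUP BY g.allele
        ORDER BY g.allele
        """,
        (marker, trait),
    ).fetchall()


conn = build_demo_db()
print(mean_trait_by_allele(conn, "M1", "yield"))
```

The query crosses the boundary between genotype and phenotype data in a single statement, which is the kind of integrated question that becomes difficult when the two sources are held in separate, unlinked databases.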

With such a system, customers can use the power of data management and analysis, along with computational biology and comparative genomics, to create an intellectual property portfolio associated with their particular cultivars and hybrids. The International Crop Information System (ICIS), to be discussed later in this chapter, is under development towards providing such a universal system to worldwide plant breeding programmes, with great potential but a long way to go.

14.1.4 Transforming information to new cultivars

A challenge to modern plant breeding is how to best utilize all relevant information efficiently and comprehensively, harnessing the power of informatics to support molecular breeding. Integrated exploration of genotypic, phenotypic and environmental information is critical for more efficient and predictable plant breeding.

A database would organize genetic information and give breeders the opportunity to pose specific questions through a software interface, helping them make selections and identify desired parents and progeny. Breeders will be able to look for particular traits they want to breed for by going back through breeding history and pedigrees to see where traceable characteristics could come from.

As a result of years of research on genetic mapping, allele mining and molecular and functional diversity analysis of germplasm collections, we have amassed a large body of knowledge regarding the genomic location of factors/alleles that affect specific agronomic traits and the allelic variation available for utilization in plant breeding, but often it is not in an easy format for all researchers to use. For most of the traits, it is very difficult to detect the presence of particular alleles when the lines are only examined phenotypically in the field; however, by examining the lines at the DNA or sequence level, this becomes possible. Xu, Y. (2002) provided an example to illustrate the importance of this effort in information-driven plant breeding. A rice doubled haploid (DH) population derived from IR64/Azucena has been shared and used worldwide for the genetic mapping of many different traits with hundreds of genes/QTL identified, largely based on the first-generation RFLP map. However, the original phenotypic data has never been shared. The phenotypic data collected across laboratories should have been analysed through a meta-analysis and fine mapped using the updated genetic map consisting of about a thousand SSR markers, rather than individual efforts using the first version of the molecular map consisting of only 175 RFLP markers. The same story can be found in almost all well-studied crop plants. As both parental lines involved in the rice example have been used widely for breeding yield and adaptive traits, a collective effort bringing all related information on one page and mining it through an integrated data analysis would help transform them into new cultivars.

14.2 Information Collection

14.2.1 Data collection procedures

Planning the research and developing data collection strategies is the first step in research management. Before data collection begins, the following questions should be clearly answered: What hypothesis is to be tested? What data needs to be collected? How will this data be collected? What equipment or supplies are needed for the data collection?

Plant breeding information comes from many different sources and in many different forms, including a description of the plant itself, its genotype and phenotype and a depiction of the environment (Fig. 14.1). What data should be in the repository depends on many factors; however, if human and computing resources are not limiting, it is advisable to preserve all the historical data, so that it can be reanalysed for new hypotheses and guide new research. Whatever system will be built, it should be flexible, because there are clients

[Figure 14.1: diagram of four linked concepts. Germplasm "has" a Genotype and "has" a Phenotype. Other relationship labels shown in the diagram: "Determines", "Molecular expression", "Affects" (from Environment) and "Genetic analysis".]
Germplasm: inventory; genealogy.
Genotype: genetic maps; physical maps; DNA sequence; DNA markers; functional annotation; molecular variation (natural or induced).
Phenotype: anatomical; developmental; field performance; stress response; transcriptome; proteome; metabolome; physiology.
Environment: location (geographic information systems, GIS); climate; day length; ecosystem; agronomy; stresses; soil and water; cropping systems; accompanying organisms.

Fig. 14.1. Crop biological concepts, relationships and breeding-related information. Modified from Richard Bruskiewich (ICIS Workshop, 2005, http://www.icis.cgiar.org).

who require minimum data but there are others who require all. In general, data should include germplasm information (passport data, pedigree and genealogy, genetic stocks), genotypic information (DNA markers, sequences, and expression information), phenotypic information and environmental information.

A reliable data-collection technique will ensure that information is systematically collected in a manner compatible with other existing information and it should take into account the following considerations: controls or checks to be used in data collection, sampling method, sample size, testing sites, replications and previously used data-collection techniques. Bias during data collection could come from defective instruments, biased observation, sampling errors, etc. Quality control for data collection can be done by checking relevant data and comparing and contrasting the collected information with expectations, controls and hypotheses. In addition, before the data is entered into a database, some preliminary organization and analysis might be needed. As phenotyping procedures are affected by multiple factors including environmental and measurement errors, multiple replications are usually required for most quantitatively inherited traits. To check the data quality and phenotyping reliability, data collected from multiple replications within a trial can be analysed for between-replication correlation. When a relatively large genetic variation exists within the tested population or cultivars, correlation coefficients should be high, e.g. 0.6 and 0.8 or higher, for traits with medium and high heritability, respectively.

Often there is relevant information that has already been collected by others, although it may not necessarily have been analysed or published. Taking the effort to locate and review this information is a good starting point and can help in planning a more efficient experiment.

14.2.2 Germplasm information

Information for a specific germplasm accession could include passport data, pedigree

and genealogy information and all other measurements at a genotypic or phenotypic level. For a germplasm collection, the information can include genetic relationships and structure within the collection and other characteristics determined through population genetic analysis.

As a major effort in characterizing genetic resources for several important crop species, the Generation Challenge Program (GCP) has undertaken molecular characterization of thousands of accessions and is to cross-link the information to facilitate the discovery of novel alleles and germplasm for crop improvement, with a focus on stress traits (drought in particular). This project contains a series of informatics development components, aimed at the development of an integrated web-enabled platform for crop research (Bruskiewich et al., 2008). Obviously this kind of effort has been going on internally in most large breeding companies as well. Some selected Internet resources on germplasm resources are provided in Table 14.1.

Passport data

Passport data includes accession number and/or other numerical identifiers, attributes describing the origin (country of origin, collection site, collection expedition, donor institute), botanical classification (scientific name, taxonomic system, crop, regeneration method) and breeding information (institute, method and stage). Most plant germplasm databases contain passport data for their maintained accessions.

Pedigree and genealogy

A pedigree represents the ancestral history of a strain and shows how a particular accession/strain was derived from its parents, including crossing, selfing, backcrossing and selection. Genealogy describes the ancestral relationships among sets of germplasm, which is a more general description of the breeding history of an accession/strain. For most crop plants, there is a registration system which requires pedigree and genealogy information. Names are assigned when a

Table 14.1. Selected Internet resources on germplasm resources.

Intergovernmental organizations
Commission on Genetic Resources for Food and Agriculture (FAO): http://www.fao.org/ag/cgrfa/
Consultative Group on International Agricultural Research (CGIAR): http://www.cgiar.org/
Convention on Biological Diversity Secretariat: http://www.biodiv.org/
FAO Plant Genetic Resources: http://www.fao.org/ag/cgrfa/PGR.htm
Bioversity International: www.bioversityinternational.org
CGIAR's System-wide Information Network for Genetic Resources (SINGER): http://singer.grinfo.net/
System-wide Information Network for Genetic Resources (SINGER): http://singer.cgiar.org/
National/regional activities
Asian Vegetable Research and Development Center: http://www.avrdc.org/
Information System Genetic Resources: http://www.genres.de/genres-e.htm
Centre for Genetic Resources, The Netherlands: http://www.cgn.wur.nl/UK/
UK Plant Genetic Resource Group: http://ukpgrg.org/
N.I. Vavilov Research Institute of Plant Industry, Russia: http://www.vir.nw.ru/
Southern African Development Community (SADC) Plant Genetic Resources Project: http://www.ngb.se/sadc/sadc.html
United States Department of Agriculture (USDA) Genetic Resources Information Network: http://www.ars-grin.gov/
Chinese Crop Germplasm Information System: http://icgr.caas.net.cn/cgris_english.html
Non-governmental organizations
Conservation International: http://www.conservation.org/
Global Biodiversity Forum: http://www.gbf.ch/
World Resources Institute: http://www.wri.org/
Genetic Resources Action International (GRAIN): http://www.grain.org/

cultivar is released or an accession/strain is collected; however, there is no standard for translating the name from one language to another. As a result, a given cultivar developed in a country like China can have several different names when it is translated into English, depending on which Chinese pronunciation system is used and which rules are used to separate multiple Chinese words in the cultivar name. Also, the way names are recorded by different breeders varies with dashes, spaces and codes.

Genealogy management systems have been established in several international research organizations. For example, the International Center for Agricultural Research in the Dry Areas (ICARDA) has such systems for barley and chickpea. Creation of a genealogy ontology is required for a standardized genealogy management system. Figure 14.2 provides an example of using the concept of ontology to name a germplasm accession.

Genetic stocks

A genetic stock is a plant sample that expresses a specific variation (or a specific small set of variations). Thus, genetic stocks are living examples of their underlying genetic variation. A database with genetic stock information helps scientists interested in a particular variation or locus to find and acquire living tissue that contains a desired variation.

The most frequently used types of plant genetic stocks include near-isogenic lines (NILs), single, series or genome-wide

By an accession number
externally given
locally given
Is a name for a commercial cultivar
heterozygous
propagated asexually
is a homogenous collection of a single heterozygous genotype
is a clonal traditional cultivar
propagated by crossing
of a small number of inbred lines
by farmer selection
by mass selection
of inbred lines
homozygous
derived from a single plant of a highly inbred population
is a traditional cultivar
is a deliberate mixture of fixed lines
Is the name of a (collection of) homozygous individual(s)
from a breeding population
collected
from a bulk of many plants
is a weed
not a weed
from a single plant
Is the name of a population (collection of heterogeneous /heterozygous individuals)
used for breeding
a recurrent selection cycle name
a recognized genetic stock
a deliberate mixture of populations
named after place of collection
collected off farm

Fig. 14.2. Genealogy ontology: how a germplasm accession is named. From the International Crop
Information System (ICIS) Workshop (2005).

mutants, populations with segregating genotypes (recombinant inbred lines (RILs), DHs, introgression lines (ILs)), cytogenetic material (primary trisomics, translocation lines, etc.), cell culture lines and gene and DNA clones. Genetic stock availability may vary greatly from one crop species to another. For example, the National Institute of Genetics (Japan) provides information on about 11,000 genetic stocks developed in Japan. These resources include marker gene testers, mutant lines, isogenic lines, autotetraploid lines, primary trisomics, reciprocal translocation homozygote lines, cytoplasm substitution lines and cell cultured lines.

14.2.3 Genotypic information

It is genotypic information that fundamentally drives the breeding informatics and products. Genotypic scores are determined by the genotype of an individual or a pooled DNA sample of multiple individuals. A genotype is determined by DNA sequences, the genes encoded by the sequences and the gene products translated from the sequences. Therefore, genotypic information is based on underlying DNA polymorphisms, which can be detected with many different techniques (Fig. 14.1). The action of these genes is not always additive, thus epistasis, the particular combination of alleles and genotype-by-environment interaction can be of great importance.

Molecular markers

Many molecular breeding projects involve a large number of molecular markers (hundreds to thousands) that cover the whole plant genome. These markers are used to genotype, or fingerprint, a large number of accessions (an entire or core collection, individuals derived from a selected cross, a set of landraces or a population). This will create a valuable database of information that can be used to determine which crosses or individuals may be more valuable than others.

Information for DNA markers (e.g. primers for PCR-based markers) includes characteristics describing the markers and how to best apply them to plant breeding. As an example, information about a PCR-based marker is given in Table 14.2, which is what a molecular database could manage and provide.

There are some databases that have been developed with features for displaying marker-related information. Despite the large numbers involved with SNPs and polymorphism data, presenting SNPs on a genome browser has become fairly straightforward. For example, Ensembl can show SNP locations as a track in ContigView displays, with colour coding to highlight those located in coding, intronic or upstream areas of genes. Clicking on an SNP produces a SNPView page with further details including, where appropriate, primers, validation status, heterozygosity, strain differences and links to entries in variation databases such as dbSNP and HGVBase (Hammond and Birney, 2004) and Panzea for plants.

Table 14.2. Information associated with PCR-based DNA markers.

Marker per se
  Marker name and synonyms
  Repeat motif/enzyme and repeat length
  Primer sequence
  PCR protocol (i.e. annealing temperature and number of cycles)
  Expected allele size in a control cultivar or a group of cultivars
  Number of alleles
  Allele frequency (most/least frequent allele)
  Signal strength
  Allele size/range
  Polymorphic information content
  Chromosome location
  Linkage to other markers
  Images and gel pictures
  References
  Source (inventory)
  Patent information
  Historical data (e.g. associated with a trait)
  Project data (date, title, germplasm, reports, etc.)
Marker-derived
  Genetic maps (including haplotype block)
  Physical maps
  Consensus maps
  Comparative maps
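As a minimal sketch of how a few of the items in Table 14.2 might be held in a single record, the Python class below stores basic marker attributes and derives the number of alleles and the polymorphic information content (PIC) from allele frequencies. The class name, field names, primer strings and frequencies are hypothetical. The PIC calculation assumes the commonly used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, which the table itself does not specify.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class MarkerRecord:
    """Hypothetical record for a PCR-based marker (subset of Table 14.2)."""
    name: str
    primer_forward: str
    primer_reverse: str
    annealing_temp_c: float
    chromosome: str
    # Maps allele size in bp to its frequency in the genotyped panel.
    allele_frequencies: dict = field(default_factory=dict)

    def number_of_alleles(self) -> int:
        return len(self.allele_frequencies)

    def pic(self) -> float:
        """PIC under the assumed definition given in the text above."""
        freqs = list(self.allele_frequencies.values())
        if abs(sum(freqs) - 1.0) > 1e-6:
            raise ValueError("allele frequencies must sum to 1")
        sum_squares = sum(p * p for p in freqs)
        pair_term = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
        return 1.0 - sum_squares - pair_term


# Invented example values, for illustration only.
marker = MarkerRecord(
    name="SSR_demo_1",
    primer_forward="ACGTACGTACGTACGTAC",
    primer_reverse="TGCATGCATGCATGCATG",
    annealing_temp_c=55.0,
    chromosome="1",
    allele_frequencies={150: 0.5, 162: 0.5},
)
print(marker.number_of_alleles(), round(marker.pic(), 3))
```

Storing allele frequencies rather than a precomputed PIC value keeps the derived statistics consistent when new accessions are genotyped and the frequencies are updated.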

As discussed in the previous chapters, molecular markers are required for a broad spectrum of gene screening approaches, ranging from gene mapping within a traditional forward-genetics approach, to QTL identification studies, to genotyping and haplotyping studies. As we enter the post-genomics era, the need for genetic markers does not diminish, even in the species with fully sequenced genomes. Regardless of the ultimate reason for the application of molecular markers, there is a general need for the characterization of molecular markers and high density, uniform maps that represent whole genomes. The availability of complete genomic sequences in several plant species has led to the creation of several genome-based resources to expedite and facilitate traditional breeding and mapping approaches. Now, large expressed sequence tag (EST) collections are available for more and more plant species, with more than 1800 species having over 63 million ESTs in dbEST (25 September 2009; http://www.ncbi.nlm.nih.gov/dbEST). EST sequence information along with the underlying redundancy and parental associations has been used in predictive approaches for SNPs, SSRs and conserved orthologue set (COS) markers and the speed and success of marker characterization and validation typically exceeds traditional approaches. Given the technical ability to detect molecular markers in silico, there is truly a vast potential collection of molecular markers present with these sequences. A database resource, PlantMarkers, has been created for the prediction, analysis and display of plant molecular markers (Rudd et al., 2005). Techniques have been developed to identify putative SNP, SSR and COS markers from the available sequence collections. A systematic approach to identify a broad range of putative markers has been undertaken by screening the available openSputnik unigene consensus sequences from over 50 plant species. Putative markers have been anchored to available protein-coding sequences where possible. Underlying sequence annotations that relate to clone library, strain or cultivar have been retained, allowing for the selection of putative polymorphic sequences that are segregating between different collections. A web presence at http://markers.btk.fi provides functionality allowing a user to search for species-specific markers on the basis of many specific criteria, not limited to non-synonymous SNPs segregating between different cultivars or measured polymorphic SSRs.

Sequences

The typical data for sequences are strings of nucleic (or amino) acid residues. Each DNA or protein sequence database entry has the following information: an assigned accession number, source organism, name of locus, reference, key words that apply to the sequence, features in the sequence such as coding regions, intron splice sites and mutations and finally the sequence itself. Amino acid sequences are derived from the translation of cDNA sequences or predicted gene structures in genomic DNA sequences. Partial sequences are also derived by the translation of EST sequences or genomic DNA sequences in all six reading frames. There are many structural features that can be used to describe a specific protein, including active site, sequence and structural motifs, domains, fingerprints, primary structure, sequence and structural profiles, three-dimensional structure and family classification.

Plant genome sequence data have been accumulating from three major sources: (i) whole genome sequencing and assembly (e.g. Arabidopsis thaliana, rice, Medicago truncatula); (ii) genome survey sequencing (maize); and (iii) ESTs (for all target species). This data flow will likely continue, with a focus on the complete sequencing of reference species (Arabidopsis, rice, sorghum, maize, M. truncatula, tomato), draft sequencing of other selected species and further EST and full-length cDNA sequencing.

A challenge facing the plant biotechnology and bioinformatics research community is the translation of complete genome sequence data into protein structures and predicted functions. Such a step will provide a vital link between the genetics of
an organism and its expressed phenotype. Proteomes have a strong influence on the measured phenotype of the plant, either directly through protein content or function, or indirectly through the relationship of a protein with the metabolome.

Expression information

In contrast to DNA sequences, which may show variation across a population but invariance within individuals, gene expression is highly dynamic and it is this variation in the expression patterns that researchers use to study the relationships between genes. In isolation, gene expression data contain little inherent information. The value or meaning of gene expression information comes from the context of the experiment (Sobral, 2002). What were the taxonomy, sex and developmental stages of the organism? What were its growth conditions? From which organ and tissue was the sample extracted? What protocols were used in sample preparation?

Microarrays, designed from sequence data, have been used to measure changes in gene expression in response to changes in ecologically relevant variables. Microarrays provide high-throughput identification of the transcriptional activity of the cell. This capacity places them at the centre of the revolution of plant functional genomics, just as high-throughput sequencing once was. There are many computational challenges relevant to gaining new insights from the analysis of the patterns of gene expression in response to environmental stress and responses. These include new methods for functional sequence annotation by combining sequence and expression data, better probabilistic techniques that make use of variation in sequence and expression data at the population and species level and the means to use genomic data in models that improve our explanatory and predictive capabilities.

The application of microarrays and sequence-based methods to expression profiling has added an extra dimension to current genomic data and has led to the development of several statistics-based disciplines within bioinformatics. Microarray technology continues to expand: cDNA arrays are being produced for gene expression analysis in many plant species and complete oligonucleotide-based UniGene arrays are being developed for the major plant species. Thus, the volume of plant breeding-related data being generated continues to expand. With continued developments in the field of microarray data production, significant improvements in data analysis and integration are necessary before these data can be structured for efficient interrogation.

14.2.4 Phenotypic information

The definition and scoring of phenotypes has a key role in genomics and plant breeding. Phenotypic data consist of all data collected in various genomics and breeding programmes, either for basic research or product delivery, which describe a distinguishable feature, characteristic, quality or physical feature of a developing or mature individual. Examples are concepts of glutinous endosperm, disease resistance, plant height, photosensitivity, male sterility, etc. Phenotypes result from the expression of the genotype in specific environments. Here environment can be thought of in very general terms, including not only external growing conditions but also internal conditions such as other genes, regulators, organ and growth stage. In these terms, gene expression data are phenotypic.

Phenotypes are important for several reasons. They allow us to observe genetically inherited traits and events and aid in genetic manipulations. Genetic changes that alter a trait that can be scored physically have been exploited to great advantage. Phenotypes are also crucial because they are the expression of genotypes and reveal gene function. In this regard, phenotypes are an essential intermediate in the pathway from basic genetics to biological understanding.

Traditionally, plant breeders generate and collect large amounts of phenotypic
data. These data are associated with different breeding procedures or stages. The most systematically collected information relating to plant breeding is probably yield trial data, which have been accumulating for many years and in many plants, since controlled plant breeding programmes started. As more and more breeding objectives are added and advanced instruments are developed, many additional phenotypic traits are now measured, such as nutritional characteristics, chemical responses, stress tolerance, etc.

Categories of phenotypes of interest to plant breeders include yield and yield components, product quality and biochemical characteristics, morphological characteristics (e.g. plant height), physiological characteristics (e.g. flowering time) and abiotic and biotic stress tolerance. A more comprehensive list can be derived from the breeding objectives as listed in Section 1.7.

In addition to phenotypic data generated by plant breeders, more phenotypes are being generated by physiologists, geneticists, pathologists and other biologists for both model and non-model organisms. New technologies such as RNA interference (RNAi) now make genome-wide knock-down studies feasible and have already been applied in a high-throughput manner for many novel characteristics. As microarray techniques become widely used in transcriptomics and metabolomics, molecular phenotypic data will keep widening the definition of phenotype.

We need efficient, precise and comprehensive large-scale phenotyping techniques. This presents a difficult challenge because phenotypes are numerous and diverse and they can be observed and annotated at the molecular, cellular and organism levels. Bochner (2003) described the efforts to develop new and efficient technologies for assessing cellular phenotypes for simple microbial-cell model organisms such as E. coli and Saccharomyces cerevisiae. Such a system could be exploited for the characterization of in vitro culture of any plant species. Phenotypic profiling through a whole plant procedure will facilitate phenotypic data collection and processing.

14.2.5 Environmental information

Environmental informatics may be viewed as a merging of biodiversity and ecological informatics with geographic information systems (GIS) and other environmental data (Fig. 14.1). Environmental data include all the environmental factors that contribute to crop growth and development, including soil type as well as chemical, moisture and nutritional components in the soil, daily, monthly and annual temperature, humidity and precipitation profiles, day-length and even winds and other climatic factors as listed in Table 14.3, plus many environmental factors, such as drought and cold temperature, that cause stress to crop plants.

Table 14.3. Environmental factors affecting plants and plant breeding.

Soil
  Texture
  Water content
  Fertility
  Nutrient content
  Production index
Air
  Pollutants
  Emission of CO2
Light
  Light intensity
  Day length
Temperature
  Average, maximum, minimum daily temperature
  Effective temperature
  Length of available growing period
Water
  Humidity
  Precipitation
  Ground water
  Water quality
  Potential evapotranspiration
Cropping systems
  Intercropping
  Previous crop
Accompanying organisms
  Root microorganisms
  Weeds
  Pathogens
  Insects

GIS has proven to be of great utility to predict the environments in which wild
ancestors of crop plants can be expected to flourish, through a mathematical comparison of climatic data at known collection sites with that at all other sites. In the strictest sense, GIS is a computer system capable of integrating, storing, editing, analysing, sharing and displaying geographically referenced information. In a more generic sense, GIS is a tool that allows users to create interactive queries, analyse spatial information, edit data and maps and present the results of all these operations. GIS technology can be used for scientific investigations, resource management, asset management, environmental impact assessment, urban planning, cartography, criminology, history, sales, marketing and route planning. For example, Canadian Geographic Information Systems was used to store, analyse and manipulate data collected for the Canada Land Inventory (CLI), an initiative to determine the land capability for rural Canada by mapping information about soils, agriculture, recreation, wildlife, waterfowl, forestry and land use.

While breeding programmes in high-income countries may employ real-time GIS information to more accurately weight information from METs (Podlich et al., 1999), those opportunities rarely exist in low-income countries as there is a lack of both real-time GIS information and the resources for conducting a large number of METs. In the 1997 season, the International Maize and Wheat Improvement Center (CIMMYT) initiated a programme targeted at improving maize for the drought-prone mid-altitudes of southern Africa. The breeding programme was product-oriented and therefore simultaneously addressed several high-priority constraints including drought, low N and major leaf and ear diseases. To develop breeding strategies that increase productivity in highly variable drought-prone environments, cluster analysis applied to the most prominent genotype-by-environment interactions grouped trial sites into eight mega-environments (Bänziger et al., 2004), mainly distinguished by GIS information available for seasonal rainfall, maximum temperature, subsoil pH and N application (Hodson et al., 2002).

14.3 Information Integration

For genomics to deliver the promise of making agricultural systems more efficient, sustainable and environmentally friendly, various types of biological data must be integrated to reveal the functional relationships between DNA, RNA, proteins, environment and phenotypes (Sobral et al., 2001). Such integrative approaches will rely heavily on the development of information systems and analytical methods that will require the same rigour and resources that have been applied to the development of laboratory experimental protocols. The new challenges facing the field of bioinformatics are to provide complex data integration between traditional genetics through the genome, transcriptome, proteome and metabolome ('omics' disciplines) and the observed phenotypes (Edwards and Batley, 2004). The major challenges for the plant science community will be how to extend genomics from models to crops and how to integrate various types of data.

It is increasingly recognized that many, if not most, biological functions result from interactions or networking among many components. This suggests that biological disciplines will need to incorporate the concept of systems, as is typical in engineering. As indicated by Sobral (2002), it will be necessary to provide integration for at least the following types of molecular data: (i) structural genomics: DNA sequences (to complete genomes) and maps (genetic, physical or cytological); (ii) gene expression: mRNA profiling and single gene profiles (Northern); and (iii) biochemistry: pathways (metabolic and signalling), metabolites, proteomics. When considering breeding-related information, the integration should be extended to include all kinds of phenotypic and environmental information.

Meeting the demands and challenges of an ideal plant improvement strategy remains a matter of combining traditional breeding concepts and genomic tools through rigorous phases of experimentation, including information integration. Logical connections among various types of information will enhance the intrinsic
value of raw data of all types and facilitate new biological discoveries. For a given gene, for example, a database could horizontally link sequence, structure, map position and associated germplasm accessions and could include related elements pertaining to the expression profile of the gene, its protein structure, example phenotypes and environmental factors that affect gene expression. All this information should be correlated with the genetic resources available for a given crop.

At the level of databases, there are three main ways to integrate information, referred to as link integration, view integration and data warehousing. For link integration, researchers begin their query with one data source and then follow links to related information in other data sources. View integration leaves the information in its source database, but builds an environment around the databases that makes them appear to be part of one large system. A data warehouse brings all the data together under one roof in a single database. Information integration will be promoted by standardized data collection, shared vocabularies/terms (ontology) and the development of database tools that help cross-database querying and parallel analysis of related data.

14.3.1 Data standardization

Genomics and plant breeding are generating a large and heterogeneous set of data. Efficient sharing, computational integration and accurate scientific interpretation of research outputs will require some agreement about the format and semantics of the basic data. A common set of biological domain models is essential to achieve this goal. A standardized nomenclature will facilitate database searches, comparisons and extrapolations throughout model biological systems from bacteria to Arabidopsis and rice.

To achieve some of the benefits possible from an integrative approach to biological questions and information, it is necessary to create standards for data interpretation and comparison and its transformation into information and knowledge (Sobral, 2002). Standardization of databases has been receiving more and more attention because it is required for integration across databases. Currently the emphasis is on functional genomics but is expanding to other fields including plant breeding. A number of initiatives have proposed standard reporting guidelines for functional genomics experiments. Associated with these are data models that may be used as the basis of the design of software tools that store and transmit experiment data in standard formats.

Data standards and the formal data descriptions that underlie them may yield a range of benefits including the following (Jenkins et al., 2005): (i) consideration and development of best practice and standard operating procedures, which, in turn, enable proper interpretation of experimental results, principled dataset comparison and experiment repetition; (ii) standardized reporting of experiments and deposition and archiving of data associated with publications or other standard pieces of work; and (iii) development of databases and verifiable transmission mechanisms for storage, collection and dissemination of results.

When phenotypes, the characteristics of organisms that arise via the interaction of the genome with the environment, are characterized as a whole (the phenome), there is a need for phenotypic standardization, which has been recognized by breeding and stock centres. Several projects associated with handling genetic mutants have begun to develop a standardized approach to developing annotation and databases. Jenkins et al. (2005) described the collection of datasets that conform to the recently proposed data model for plant metabolomics known as ArMet (architecture for metabolomics) and illustrated a number of approaches to robust data collection that have been developed in collaboration between software engineers and biologists.

Database curators are working to convert raw datasets contributed by researchers from their original format, including structure, syntax, assumptions, naming rules and conventions, into a format compatible and contextually consistent with respective
genome databases, while maintaining accuracy of fact and interpretation. In addition, curators help users access and query the data and cooperate with other groups to improve the software and data distribution infrastructure.

14.3.2 Development of generic databases

The increasing number and types of databases and software applications make it more and more difficult for researchers to determine which databases to use for various types of information. A universal or updatable database system is required and so is an automated updating system. In addition, the different ways in which data are accessed and presented create an additional burden on researchers who seek to apply the available resources to their research. Using biodiversity as an example, the difficulties in finding, accessing and using biodiversity data include the long history of the bottom-up evolution of scientific biodiversity information, the mismatch between the distribution of biodiversity itself and the distribution of information describing it and, most importantly, the inherent complexity of biodiversity and ecological data. This stems from numerous data types, the non-existence of a common underlying language and the multiple perceptions of different researchers/data recorders across spatial or temporal distance or both (Lane et al., 2000).

Emerging technologies to solve these problems have been proposed, such as the BioMOBY initiative (Wilkinson et al., 2005). BioMOBY is an international research project involving biological data hosts, service providers and coders whose aim is to explore various methodologies for biological data representation, distribution and discovery. In addition, the National Human Genome Research Institute has funded a collaborative project called the Generic Model Organism Database (GMOD; http://www.gmod.org) to promote the development and sharing of software, schemas and standard operating procedures. The project's major aim is to build a generic organism database toolkit to allow researchers to set up a genome database off the shelf. Recent developments in the Ensembl system include access to inter-species sequence levels and improvements to the display of polymorphism data while users can display their own data in the context of other annotation (Hammond and Birney, 2004).

14.3.3 Use of controlled vocabularies and ontologies

The diverse databases reflect the expertise and interests of the groups that maintain them. A current limitation of complex annotation and integration is the lack of agreed-upon formats across databases. There are many integration challenges. One of the most difficult is the one that might seem the most minor: how do you assign and maintain the correct names of biological objects across databases?

A more subtle problem is the clash of concepts as users move from one database to another. An extreme example, first noted by Michael Ashburner, considers the use of the term 'pseudogene' by different researchers and research communities. To some, a pseudogene is a gene-like structure that contains in-frame stop codons or evidence of reverse transcription. To others, the definition of a pseudogene is expanded to include gene structures that contain full open reading frames (ORFs) but are not transcribed. Some members of the Neisseria gonorrhoeae research community, meanwhile, use 'pseudogene' to mean a transposable cassette that is rearranged in the course of antigenic variation (Stein, 2003).

There are also more subtle disagreements. The human genetics community uses the term 'allele' to refer to any genomic variant, including silent nucleotide polymorphisms that lie outside of genes, whereas members of many model organism communities prefer to reserve the term 'allele' to refer to variants that change genes. Even the concept of the gene itself can mean radically different things to different research
communities because it has been refined as the field of genetics moves forward. Some researchers may treat the gene as the transcriptional unit itself, whereas others extend this definition to include up- and downstream regulatory elements and still others use the classical definition of cistron and genetic complementation. Plant breeders may consider a gene to be a manipulable unit during the breeding process, which can be as big as a gene complex that is transferred in conventional backcross breeding, or as small as a single nucleotide difference that can be detected in MAS.

It will be increasingly desirable for inter-database queries to be performed to exploit comparative genomic and phenomic strategies in order to elucidate functional aspects of plant biology and study synteny. However, terms used to describe comparative objects within and between databases are sometimes quite variable and limit the ability to accurately and successfully query information in and across different databases. To solve this problem, controlled vocabularies and ontologies become increasingly important. Unique identifiers that are associated with each concept in biological ontologies (bio-ontologies) can be used for linking and querying databases (Bard and Rhee, 2004). Natural language processing (NLP) techniques are increasingly being used to automate the capture of new biological discoveries described in text. A novel representational schema, PGschema, was developed that enables translation of phenotypic, genetic and other related information found in textual narratives to a well-defined data structure comprising phenotypic and genetic concepts taken from established ontologies along with modifiers and relationships (Friedman et al., 2006).

Shared ontologies can help bioinformaticians agree on how to describe biological objects, but they do not necessarily help them agree on how to name them. The same biological object might have multiple names and the same name might denote multiple objects. One approach is to establish globally unique identifiers to standardize the description. There are two main lines of thought among groups that are interested in globally unique identifiers. One line holds that object identifiers should point to the objects themselves and use a Uniform Resource Locator (URL) syntax. The other decouples the notion of the location of a resource from its authoritative source.

Dictionaries, encyclopedias and database schemas are examples of ontologies, as are many web-based entities, such as the search engines Yahoo! and Google. One approach to this problem is to have biologists describe and conceptualize common biological elements and produce a dynamic, controlled vocabulary that can be applied to certain types of organisms. An ontology is simply an organized set of concepts about a specified domain. It generally consists of two components: (i) an indexed controlled vocabulary of terms (the concept); and (ii) information about semantic relationships between these terms. As sophisticated types of controlled vocabularies that attempt to capture the main concepts in knowledge domains, ontologies are important facilitators but they do not, by themselves, lead to the integration of biological databases. The existence of a shared ontology allows two databases to be merged with some guarantee that a term used in one database corresponds to the same term in the other.

There are many ontology projects that range from descriptions of mutant phenotypes in plants to anatomical structures in vertebrates (details can be found at the Global Open Biological Ontologies web site). Imperfect ontologies in biology are the gene ontology of terms for protein and gene sequences, the minimum information about a microarray experiment (MIAME) (Brazma et al., 2001) and plant ontologies for broader plant-based information (Plant Ontology Consortium, 2002). Once an ontology is established, databases need to be annotated under the agreed terms. At present, only a few plant genome databases, such as Arabidopsis and rice, have primary gene ontology annotation.

In an effort to address the need for consistent descriptions of gene products in different databases, the Gene Ontology (GO) project began as a collaboration between three model organism databases (Gene Ontology
Consortium, 2000): FlyBase (Drosophila), At lower taxonomic levels the develop-


the Saccharomyces Genome Database (SGD) ment of phenotypic trait ontologies promises
and the Mouse Genome Database (MGD) in to provide a formal framework for navigat-
1998. Since then, the GO Consortium has ing between crop phenotype and genome.
grown to include many databases, including The Trait Ontology initiative is at an early
several of the worlds major repositories for stage of development and considerable work
plant, animal and microbial genomes (see is required to match the plant development-
the GO web page http://www.geneontology. focused hierarchical phenotype to relevant
org for a full list of member organizations). agronomic traits. Initial progress (http://
The GO collaborators are developing three www.gramene.org) in matching traits associ-
structured, controlled vocabularies (ontolo- ated with rice to underlying biological entries
gies) that describe gene products in terms of provides a focus for discussion among the rel-
their associated biological processes, cellu- evant research communities. Although this
lar components and molecular functions in is starting to be achieved by associating rice
a species-independent manner. trait ontologies with trait definitions estab-
As an extended paradigm for the lished by the International Crop Information
GO Consortium, the Plant Ontology System (ICIS), definition of a broader range
Consortium (POC), funded by the National of vocabularies is required (King, 2004). As
Science Foundation, aims to develop, curate an example of Plant Ontology, maize leaf
and share controlled vocabularies (ontolo- morphology and specifically the ligule, can
gies) that describe plant structures and be explored as shown in Figure 2 of the Plant
growth/developmental stages providing a Ontology Consortium (2002) through the
semantic framework for meaningful cross- Ontology Search link at Gramene.
species queries across databases (Plant The problems associated with estab-
Ontology Consortium, 2002; Avraham lishing satisfactory vocabularies, especially
et al., 2008). The first task of the POC project across species, are substantial. Even in tax-
is to efficiently integrate diverse vocabularies onomy, where there has been an established
currently in use to describe anatomy, mor- formalism for centuries, there is disagree-
phology and growth and developmental ment on vocabulary. The situation is much
stages in Arabidopsis, maize and rice. In less settled for such recent and dynamic
coming years, POC will extend this control- domains as developmental biology, plant and
led vocabulary to encompass the Fabaceae, animal pathologies, gene and protein nam-
Solanaceae and other plant families. The ing conventions, metabolic relationships or
description of plant phenotypes has become laboratory protocols, all of which will play a
one of the key issues in plant ontology that major role in determining the utility of gene
is based on complex knowledge and phe- expression systems. There is a need for com-
notyping protocols which are either to be munities to collaborate and share restricted
developed or vary greatly from one species vocabularies. An example of successful col-
to another. laboration in this respect is the development
More recently, the Plant Structure of the System-wide Information Network for
Ontology (PSO), the first generic ontological Genetic Resources (SINGER). SINGER is a
representation of anatomy and morphology common gateway to the genebank databases
of a flowering plant, was created (Ilic et al., managed in 12 centres of the Consultative
2007). The PSO is intended for the broad Group on International Agricultural Research
plant research community, including bench (CGIAR). In developing SINGER it was nec-
scientists, curators of genomic databases essary to adopt a common structure based
and bioinformaticians. The initial release on agreed taxonomy and other descriptors,
of the PSO integrated existing ontologies for while retaining the identity and independ-
Arabidopsis, maize and rice; more recent ence of the individual databases contribut-
versions of the ontology encompass terms ing to SINGER in terms of their software and
relevant to the Fabaceae, Solanaceae, addi- hardware platforms and structure (Sobral,
tional cereal crops and poplar. 2002).
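The SINGER arrangement described above, a shared descriptor layer over independently managed databases, can be illustrated with a minimal Python sketch. The centre names, field names and sample records are hypothetical and are not taken from SINGER itself; the sketch only shows how each centre could keep its own schema while a small agreed mapping exposes a common structure for cross-centre queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonAccession:
    """Agreed descriptor set onto which every centre's records are mapped (hypothetical)."""
    centre: str
    accession_id: str
    genus: str
    species: str
    country_of_origin: str


# Each centre keeps its own column names; only the mapping to the shared
# descriptors is agreed in advance. Both mappings are invented for illustration.
FIELD_MAPS = {
    "centre_a": {"acc_no": "accession_id", "genus": "genus",
                 "sp": "species", "origin": "country_of_origin"},
    "centre_b": {"ID": "accession_id", "GENUS": "genus",
                 "SPECIES": "species", "COUNTRY": "country_of_origin"},
}


def to_common(centre: str, record: dict) -> CommonAccession:
    """Translate one local record into the common structure."""
    field_map = FIELD_MAPS[centre]
    mapped = {common: str(record[local]).strip()
              for local, common in field_map.items()}
    # Normalize taxonomy spelling so that queries behave the same across centres.
    mapped["genus"] = mapped["genus"].capitalize()
    mapped["species"] = mapped["species"].lower()
    return CommonAccession(centre=centre, **mapped)


def search_by_taxon(records, genus: str, species: str):
    """Single query that runs over records contributed by every centre."""
    return [r for r in records
            if r.genus == genus.capitalize() and r.species == species.lower()]


# Local data stay in their original, centre-specific form.
local_a = [{"acc_no": "A-001", "genus": "oryza", "sp": "Sativa", "origin": "PHL"}]
local_b = [{"ID": "77", "GENUS": "ORYZA", "SPECIES": "sativa", "COUNTRY": "IND"},
           {"ID": "78", "GENUS": "Zea", "SPECIES": "mays", "COUNTRY": "MEX"}]

gateway = ([to_common("centre_a", r) for r in local_a]
           + [to_common("centre_b", r) for r in local_b])
hits = search_by_taxon(gateway, "Oryza", "sativa")
print([(h.centre, h.accession_id) for h in hits])
```

Keeping the mapping separate from the data means a centre can change its software or internal layout without affecting the gateway, provided the mapping is updated.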
Breeding Informatics 567

14.3.4 Interoperable query system

Scientists have realized that there is a need to make existing data from different organisms simultaneously searchable, visible and, most importantly, comparable. To look for genes involved in a particular trait one has to search different databases and manually figure out the orthology relationships among the relevant genes. These species-specific databases are widely dispersed and tailored to different objectives and they store phenotypic data in different formats. Considerable handwork is therefore necessary to compare the phenotype of the same gene in different organisms. A simple meta-search engine or an interoperable query system for these databases alone does not resolve this kind of problem.

Productive utilization of databases requires interoperability: that is, the precise yet flexible interrelating of information from one database to another. There are, at present, two major impediments to achieving wide-scale interoperability: the state of database protection legislation and computer security issues (Greenbaum et al., 2005). While most non-commercial/academic databases may not be overly concerned with the protection of their intellectual property, they still put up barriers to entrance and consequently interoperability, due to concerns regarding the security of their computing infrastructure.

For rice and maize, cross-database querying and display of text objects can be implemented using a web-based object-oriented query system called the OPM (Object-Protocol Model) data management tools of Gene Logic Inc. These tools are unique in their capacity to impose a uniform object-oriented data model on an existing relational database framework where users can explore and assemble biological information from heterogeneous databases. This query system promotes direct analysis of collinearity at the nucleotide level in cereal species and may also be applied for exploring multiple crop databases. The further incorporation of datasets from different studies will close the loop and create the foundation for meta-integrated databases that facilitate queries across whole systems.

14.3.5 Redundant data condensing

Just as redundant accessions are found in germplasm collections, redundant data are unavoidable because duplicate datasets using the same genotypes are generated by different research groups or for different purposes. The availability of complete genome sequences, as well as the flood of other sequence data, is leading to alternative views on how these data can be organized and interrogated. The high level of redundancy in gene discovery programmes is being condensed through reference to consensus or complete genome sequences. If a complete genome sequence is unavailable for a specific crop, closely related syntenic genomes can be used. The ever-increasing size of DNA sequence databases continues to push bioinformatic capabilities and there is a growing need to condense redundant data (Edwards and Batley, 2004). Information integration and redundant data condensing often are two procedures that can interact and support each other.

14.3.6 Database integration

Sequence databases that evolve from rigorous and systematic sequencing efforts should not merely function as warehouses for millions of bases or amino acids. Of particular importance is the ability to attach substantial genomic information to the sequences. Studies will follow on identifying genes and predicting the proteins they encode, determining when and where the proteins are expressed and how they interact and how these expression and interaction profiles are modified in response to environmental signals. Emphasis on the underlying value of genotypic and genomic elements must be balanced with a phenocentric approach. One way to address this need is to link the resources containing the various types of information, such as genomic data, phenotypic or expression data and genetic resources.

The different source databases all use different gene loci description systems (i.e. gene indices) and the orthology relationships are not always obvious, so many
568 Chapter 14

important phenotypic relationships may be difficult to discover. Therefore, a common data model combining the data with a common gene index is required. Orthology data must be available and a case-oriented user interface should facilitate access to phenotypic data. There are some difficulties in integrating phenotypic terminology that varies significantly between each organism-specific research community.

Once integration from the molecular to the organism levels has occurred, the next frontier becomes integration of ecosystems information (environment and interactions among organisms and populations) as the concluding step in bridging from molecules to ecosystems. There are various efforts underway to tackle linking environmental information to the organism.

One such effort is represented by FLORAMAP (http://www.floramap-ciat.org), which is a computer tool for predicting the distribution of plants and other organisms in the world. Approaches like FLORAMAP could enable scientists to link disparate studies based on environmental characteristics (Sobral, 2002). Conceptually, it should be possible to perform a similar function with crop plants using GIS, to predict which environments are similar and can be expected to give a similar genotypic response, although in practice the process is quite distinct.

14.3.7 Tool-based information integration

In order to address biological questions more fully and to extract more information from various databases, researchers require tools that allow them to integrate different datasets in a dynamic, hypothesis-driven fashion. For example, researchers need efficient and intuitive tools to help identify common genomic regions and where possible specific genes, influencing the expression of target traits across diverse germplasm and growing conditions. To address these needs, Sawkins et al. (2004) developed a Comparative Map and Trait Viewer (CMTV) as a part of the ISYS platform that can help serve as an intuitive and extensible framework for the integration of various kinds of genomic data. Another example of this is PHENOMAP (http://www.deltaphenomics.com), which can create consensus maps, overlay all QTL on the consensus map, identify regions of synteny between species, connect common markers across maps, etc. GENEFLOW (http://www.geneflowinc.com/) also provides curated, annotated databases containing mapping and QTL information for the Poaceae, Fabaceae and Solanaceae.

A multi-species genotype/phenotype database, PhenomicDB (http://www.phenomicDB.de), was created by merging public genotype/phenotype data from a wide range of model organisms and Homo sapiens (Kahraman et al., 2005). This wealth of data was compiled into a single integrated resource by coarse-grained semantic mapping of the phenotypic data fields, by including common gene indices (NCBI Gene) and by the use of associated orthology relationships. With its use-case-oriented interface, PhenomicDB allows scientists to compare and browse known phenotypes for a given gene or a set of genes from different organisms simultaneously.

Although it is tempting to treat the integration of biological databases as a technological problem, in fact the main impediment to achieving this goal is not technological but sociological. As Stein (2002) indicated, a meaningful scalable integration cannot be achieved without the cooperation of the data providers. This also includes the cooperation of the data generators. As long as the data providers continue to produce online databases without regard for the way in which the information will be aggregated, integration will be a monumental task (Stein, 2003).

14.4 Information Retrieval and Mining

14.4.1 Information retrieval

Information retrieval (IR) can encompass searching for documents themselves and information within documents and metadata which describe documents, or searching
a database including stand-alone databases or hypertextually networked databases such as the WWW. In this section, we will discuss data retrieval, document retrieval and text retrieval, each of which has its own body of literature, theory, practice and technologies.

Automated IR systems are used to reduce information overload. Many universities and public libraries use IR systems to provide access to books, journals and other documents. IR systems are often based on objects and queries. An object is an entity that keeps or stores information in a database. User queries are matched to objects stored in the database. Web search engines such as Google, Live.com, or Yahoo! search are some of the current IR applications.

Bibliographic databases

Use of the literature is fundamental to the pursuit of all knowledge including plant breeding. Through searching and reading, we learn what our colleagues are doing, develop a broader perspective on our field of interest, get ideas and confirm our discoveries. The scientific literature has become an expanding knowledge base that represents the collective archive of the work carried out by the international scholarly community (Trawick and McEntyre, 2004) and bibliographic databases have become part of the daily life of scientists.

Traditionally, the term 'bibliographic databases' referred to the abstracting and indexing service for scholarly literature. As technology has advanced, this has expanded to include full-text articles, original data, images and books. For the average biologist, mining the literature usually means a keyword search in PubMed. However, methods for extracting biological facts from the scientific literature have improved considerably and the associated tools will probably soon be used to automatically annotate and analyse the growing number of system-wide experimental data sets. Thanks to the increasing body of text and the open-access policies of many journals, literature mining has also become useful for both hypothesis generation and biological discovery (Jensen et al., 2006).

Abstracts

Searching an online collection of abstracts is often the first approach to investigating a new topic or seeking an update on a known research area. There are several abstracting and indexing services available online, many of which require a subscription. There is considerable content overlap among major bibliographic databases. There are three major databases for abstracts: Web of Science, Current Contents and BIOSIS Previews.

The Institute for Scientific Information (ISI) produces the Web of Science, an interface to the ISI Citation Database that contains more than 5300 scientific journals, dating from 1980, which is updated weekly.

One of the most valuable features of the Web of Science is the inclusion of the citations associated with each abstract. It is possible to: (i) view the abstracts of all articles cited in the original (parent) article; (ii) find all articles published since the original (parent) article and those that have cited it; and (iii) find all the articles that have cited a particular author.

The Web of Science also interfaces with the ISI Current Contents databases. Current Contents divides into broad subject categories, among which the Life Science category (coverage of about 1400 journals) is the most relevant to plant breeding. The Current Contents database can be searched, abstracts of articles found can be viewed and from these the table of contents of the journal issue can be displayed and browsed.

BIOSIS Previews is made up of two databases: Biological Abstracts, which contains about 12 million records from more than 5000 journals and Biological Abstracts/RRM, which covers reports, reviews and meetings information not formally published in scientific research journals. This includes references to items from meetings, symposia, workshops, review articles, books, book chapters, software and US patents related to life sciences. It covers the biological sciences, from biochemistry to zoology, including almost all the areas related to plant breeding.

As a fully searchable abstracts database of internationally published research,
Plant Breeding Abstracts is available at http://www.cabi.org. The Plant Breeding Abstracts database contains the most up-to-date, relevant information about all aspects of plant breeding and genetics, including: (i) plant breeding for specific traits, genetic resources, cultivar trials and cultivar descriptions; (ii) plant genetics, both classical and molecular, cytogenetics, genetics of specific traits; (iii) plant biotechnology, genetic engineering, transgenic plants; (iv) taxonomy and evolution; (v) in vitro culture; (vi) pest, disease and pesticide resistance; (vii) stress tolerance; (viii) breeding for and genetics of crop production, botany, stability and quality traits; and (ix) reproductive behaviour. Each week Plant Breeding Abstracts Online delivers all the new highly targeted, searchable summaries covering key English and non-English language journal articles, reports, conferences and books about plant breeding, genetics and plant biotechnology. Over 16,000 records are added to the database each year.

Full text of research articles

Several thousand biology journals are now available in electronic form, most of which are online counterparts to the paper editions. Some journals are only available online. More and more journals have made articles from back issues freely available and more publishers offer free access to articles.

There are several common routes to online journal articles. Full articles can be accessed through abstract databases. Many databases have links between abstracts and the corresponding online full-text article allowing the user seamless access to the full text if the following are true: (i) the journal (more specifically, the journal issue) is published online; (ii) the publisher of the journal has agreed with the database to make the article available via this route; and (iii) the individual or the individual's library subscribes to the journal, or the publisher makes the article freely available.

Full articles can also be accessed by logging on directly to the publisher web sites. Many publishers have developed their own online interfaces to their journal databases. Science Direct (http://www.sciencedirect.com/) and Link (http://link.springer.de) are two of the major publishers that have a collection of full-text articles. Open-access journals (freely available online for reading, downloading, copying, distributing and using) have become increasingly popular. PLoS (Public Library of Science, http://www.plos.org/), as one of the open-access publishers, is now publishing seven online peer-reviewed journals. Another example is Hindawi Publishing Corporation (http://www.hindawi.com/), which publishes over 100 open-access journals in science, technology and medicine.

Finally, HighWire Press (http://highwire.stanford.edu/) works with scientific societies and publishers to create online counterparts to their print journals. It hosts the largest repository of high-impact, peer-reviewed content, with 1036 journals and 6,100,549 full-text articles from over 130 scholarly publishers. HighWire-hosted publishers have collectively made 1,940,665 articles free. With its partner publishers, HighWire produces 71 of the 200 most-frequently cited journals. It allows a basic search across all the journals with which they collaborate.

Books and text-rich web sites

While books have been slower than journals to make the transition from paper to electronic form, there are more and more books becoming available online. A growing trend is for books to be associated with web sites for further information and corrections. A project to put biomedical textbooks online, make them searchable and integrate them with PubMed and other data resources has recently begun at the National Center for Biotechnology Information (NCBI). Book publishers also provide information and brief descriptions for new and recently published books on their web sites.

Any search engine can be used to search for molecular biology information. Many publishers, biotech companies, research laboratories, teachers and others display information that can be browsed freely. As indicated by Trawick and McEntyre (2004),
information found in this way should be carefully evaluated. Be aware that anyone can publish almost anything on the Internet, so a key factor in assessing the validity of online information is the reliability of its source. It is important to assess what qualifies the individual or organization to publish the information and what their motivation for doing so might be. As with any literature research, the information found should be cross-checked and critically evaluated.

A search engine provided by Google (http://scholar.google.com/) has recently become popular for literature searches. Searching by authors, key words or author's affiliation will bring up all related publications (articles, books, etc.) with article title, author list, the number of citations, etc. A full article can be browsed from a provided link. All the articles that cite a specific article can be browsed. Therefore, a key-word search for a specific topic would provide a series of linked information.

14.4.2 Information mining

The first major goal for plant biologists in the post-genome era is to understand the function of every gene and how individual gene products interact and contribute to major plant processes. This new challenge for plant functional genomics is destined to become the most difficult hurdle in plant biology and requires the systematic application of global molecular approaches integrated through bioinformatics. Several tools are now required to decipher gene function including the traditional methods of random mutagenesis, gene knock-out and silencing, as well as high-throughput 'omics' disciplines of transcriptomics, proteomics and metabolomics. Mining this genomics information and effectively applying it to plant breeding is a significant challenge indeed.

Data mining

Data mining, or knowledge discovery in databases (KDD), has been described as the nontrivial extraction of implicit, previously unknown and potentially useful information from data (Frawley et al., 1991). The majority of data mining exercises in bioinformatics at present are founded on the requirement to screen through large, usually sequence-based, datasets searching for homology. Bioinformatics has traditionally helped to identify the molecular constituents of the cell and their functions, often described in relation to a biochemical activity. This has included gene finding, motif recognition, similarity searches, multiple sequence alignment, protein structure prediction, phylogenetic analysis and other related methods.

Compared to sequence databases, extracting information related to plant breeding is not an easy task. It is like finding a tiny bit of gold in the voluminous portion of ore taken from a gold mine. Breeding related data consist of many inter-related, complex data types and therefore require complex queries to search, retrieve and analyse them. Classical multivariate and discriminant statistics are relevant to many biological data mining exercises. Significant progress has yet to be made in carrying out systematic or integrated data mining for the disparate and complex information now available to plant scientists.

Plant breeders may want to mine for the following information: (i) germplasm information collected across worldwide institutions; (ii) marker–trait associations reported for traits of interest to specific breeding programmes; (iii) genes that are required for the improvement of traits of agronomic importance through transformation and introgression; and (iv) molecular markers and marker-related information for the development of MAS tools.

Comparative informatics

As comparative genetics is now viewed as a key component to expanding existing knowledge on plant genomes and genes, comparative bioinformatics remains an essential strategy of this pursuit. Comparative informatics facilitates linking the genomes of various crop species and will provide keys to understanding how genes and genomes are structured and how they evolve. Through the identification of synteny, it will be possible to isolate genes from crop plants with
large genomes using information about homologous genes in related crops with smaller genomes. Linkages and interactions should also be promoted between databases of plants and non-plant species.

Horan et al. (2005) clustered all protein sequences from Arabidopsis and rice into similarity groups, calculated their corresponding alignments, localized their conserved domains and generated distance trees. The resulting datasets provide comprehensive information about the similarities and dissimilarities between a monocotyledon and dicotyledon representative with regard to the size, quantity and composition of their family and singlet proteins. The provided datasets represent a foundation for future studies of orthologous and paralogous sequences of the two species. The user-friendly Genome Cluster Database (GCD; http://bioinfo.ucr.edu/projects/GCD) was designed to provide an efficient cluster mining tool for Arabidopsis and rice, to perform various intraspecific and interspecific comparisons and also to retrieve related sequences from other organisms.

There are four basic comparative bioinformatics analyses, including:

• DNA–DNA conservation: the alignment of complete DNA sequence from two plant species to determine DNA–DNA conservation is computationally demanding and the algorithms that perform this are under active development.
• Syntenic blocks: the identification of segments of the genome in which the order of particular genes is conserved between two species (syntenic blocks) is of interest not only for studying the evolution of chromosome structure but also for helping to predict and identify pairs of genes between species that are (or are not) orthologues.
• Orthologues: another type of comparative analysis focuses on genes and proteins and attempts to identify the orthologous genes in different plants. In most cases, the orthologous genes can be expected to be functionally equivalent.
• Phenomic similarity: phenotypic and physiological similarities can be used to identify different genes for the same or similar phenotype.

Sequence similarity analysis

DNA sequence similarity analysis can be used to trace allele, gene or chromosomal fragments, identify similarities between sequences or genes and align multiple sequences. Protein sequence analysis includes searching for protein similarity and looking at primary, secondary and tertiary structure.

BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is a protein or DNA sequence. The BLAST services available include: NUCLEOTIDE BLAST; PROTEIN BLAST; TRANSLATED BLAST; GENOMIC BLAST pages (human genome, eukaryotes, microbial genomes); and specialized BLAST pages (VECSCREEN, a BLAST-based detection of vector contamination; IGBLAST, for analysis of immunoglobulin sequences in GenBank; GEO BLAST, for gene expression data; and SNP BLAST). These services are produced and made available on the Internet by the US NCBI.

14.5 Information Management Systems

The core components of information management tools for plant breeding should ideally support the acquisition, storage and analysis of information on genomes, proteins, biochemical pathways, cellular systems, organism models, ecological systems, geographic biodiversity, germplasm evaluation, field trial measurements, phenotypic variation and environmental interactions. In addition, the systems should adapt to emerging information infrastructure, such as: (i) scalable computing; (ii) distributed sensors, data, people and computers; (iii) web- and object-oriented software architecture; and (iv) decentralization of content and software authoring (Sobral, 2002).

Currently available information management and data analysis systems have their pitfalls:
• There is a big concept gap between breeding and molecular biology in terms of what information is available and how it can be used.
• Many of the systems have been developed independently for phenotypic and genotypic data and used by two different groups of scientists including breeders/agronomists and geneticists/molecular biologists.
• Breeding information has been managed in most breeding programmes by using relatively simple tools such as MS ACCESS and AGROBASE (http://www.agronomix.mb.ca/), for which less training is required. However, these tools are not suitable for data management and statistical analysis when molecular data and multiple-resource data are incorporated.
• Insufficient communication between breeders/agronomists and geneticists/molecular biologists has contributed to the paucity of tools suitable for both groups. As a result, hardware, database and software support in most breeding institutions is very limited or very different from those established in the biotechnology and IT industries.
• Understanding and communication between IT scientists and breeders and between facility developers and breeders, is also lacking. This contributes to underdevelopment of information systems designed for plant breeding and the limited use of those currently available in genomics.
• Many breeding companies and institutions, especially in developing countries, are lacking the personnel and facilities for information management which are well developed in the biotechnology industry.

14.5.1 Laboratory information management systems

To handle the constant flow of data from the lab to the breeder and to integrate information from molecular markers, genetic mapping and phenotyping, many information management tools are needed. High-throughput laboratories, often required by molecular breeding programmes, make Laboratory Information Management Systems (LIMS) a necessity. LIMS manage data, samples, laboratory users, instruments, standards and support laboratory functions such as invoicing, plate management, sample tracking and work flow automation. Taking a typical genotyping project as an example, the LIMS may include tracking the samples from field to plate and to storage, managing the data flow from plate to genotyping facilities and to computers and organizing and optimizing experiments both internally and externally.

Today's trend is to move the whole process of information collection, management, analysis, decision making, review and release into the workplace. The goal of a LIMS is to create a seamless organization in which:

• Instruments are integrated in the lab network where they receive instructions and work lists from the LIMS and return finished results, including raw data, back to a central repository where the LIMS can update relevant information to external systems.
• Laboratory personnel perform calculations, review and document results using online information from connected instruments, reference databases and other resources using electronic lab notebooks connected to the LIMS.
• Management can supervise the lab process, react to bottlenecks in workflow and ensure regulatory requirements are met.
• Laboratory participants can place work requests and follow up on progress, review results and other documentation.

With several thousand data points flowing out of the laboratory every day, timely scoring and delivery of the results to breeders are basic requirements for an efficient breeding system. Well-trained assistants for genotyping and scoring, coupled with research scientists who can analyse data in meaningful ways, are the key components for a data management
and delivery system. A laboratory with well-equipped facilities must also be well equipped with qualified personnel and the appropriate software for data integration, manipulation, analysis and mining. Timely delivery of data is equally important, because in many cases the window of time the breeder has to make selections is very limited. With high-throughput genotyping and data management systems currently available, it takes about a week to generate and analyse data for a breeding population consisting of several hundred individuals. This includes activities ranging from leaf tissue harvesting to DNA extraction, genotyping, data scoring, analysing, summarizing and reporting.

14.5.2 Breeding information management systems

Organizations involved in molecular plant breeding should establish information management systems capable of handling information from multiple sources, including institution-owned information and that in public databases. The data model serves as a foundation for data transfer between different entities in the field of plant breeding. Legacy data collected from different sources should be cleaned and organized according to this specification while new data are fed from users through a standard interface to ensure the format of the data. The collected data are stored in a centralized database. Different data warehouses can be built to meet the needs of data analysis and decision making. Raw phenotypic and genotypic information generated within the breeding institution can be stored in two separate databases; however, a knowledge base should be created which combines data from the two data sources.

The data management system serves as a bridge connecting genotypic and phenotypic information and provides tools for gathering data, integrating public information into the breeder's data warehouse and extracting useful information for breeding. The components of such a system include information support, IT infrastructure support, data scoring, acquisition and data formatting, data hosting, data integration and data mining.

Most bioinformatics data and tools are available through the Internet. Post-genomics experiments require access to dozens of data types for tens of thousands of data points simultaneously. This cannot be achieved with common web-based tools; such analyses require programmatic access to web interfaces such that large quantities of data can be pipelined from one interface to the next. A biological web service interoperability initiative, BioMOBY (http://www.biomoby.org/), was established to provide a simple, extensible platform through which the myriad of online biological databases and analytical tools can offer their information and analytical services in a fully automated and interoperable way (Wilkinson et al., 2005).

The components of an information management system designed for molecular breeding are database modules that link gene location, function and allele value data with target environment characterization data and the germplasm units used in the breeding programme, together with the tools for querying the database. The importance and enormity of this task cannot be overstated. As discussed previously, web-based breeding information management systems provide several advantages over local and stand-alone systems because most public breeding institutions have limited IT and service support.

14.5.3 International Crop Information System

Informatics has become a prerequisite to molecular breeding because the volume of breeding-related information is increasing at such a high rate that collecting, storing, mining and manipulating this information for selection decisions is not possible without appropriate statistical, biometrical and informatics tools. An integrated breeding tool is therefore needed to rapidly collect, analyse and represent breeding-related data, uniquely identify the germplasm units, document their co-ancestry and associate gene
information with these units in the short window of time available for most selection decisions. This is critical to practical implementation of any molecular-based breeding strategy. In addition, computational tools are required to translate and integrate research outputs into a usable form for plant breeding programmes (Dwivedi et al., 2007). The International Crop Information System (ICIS) is identified as the key component that can link the gene, phenotype and environment data with uniquely identified germplasm units used and manipulated in breeding programmes.
ICIS is a database system prototyped since 1996 by a CGIAR multi-centre group of biologists and information scientists (www.icis.cgiar.org) to manage and integrate all research data on genetic resources, crop improvement and resource management and to link this information to global environmental and genomic data resources (McLaren et al., 2005). ICIS is attempting to level the information playing field between developed and developing nations and it addresses the CGIAR mandate to share research information as well as germplasm and technology. Modest resources with strong commitments have so far produced an innovative prototype for tracking and recording generically all the processes in germplasm collection, characterization, evaluation and development. The system has been used or evaluated for rice, wheat, maize, barley, cowpea and common bean and is used by private and public breeding programmes.
ICIS has a modular structure with a core consisting of the Genealogy Management System (GMS), which manages data on nomenclature, origin, development and deployment of germplasm, and the Data Management System (DMS), which manages and documents characterization and evaluation data. Specialized user interfaces deliver data views and decision support tools to crop scientists from different disciplines who access the same data resources, leading to efficient use and re-use of research data. The development of distinct crop databases (separate ICIS implementations) is resulting in focused data and information management for each crop. Meanwhile, a common structure ensures that huge economies are gained by shared commitments to training in national agricultural systems and through collaboration in terms of intellectual development, programming, testing and maintenance.
Linkages between the GMS and DMS provide biological scientists with powerful querying functionality. The querying capabilities of ICIS will not place sensitive data at risk. To permit researchers to manage their own data in parallel with those from other sources, ICIS has a parallel structure of central and local versions. This structure provides local read/write capabilities, allowing data generated locally to be merged and harmonized with the central database at the local user's discretion.
ICIS must have seamless links to other information technologies used in agriculture. The System-wide Genetic Resources Program (SGRP) has endorsed ICIS as a critical initiative in germplasm information systems. A current project with SGRP ensures that ICIS and the System-wide Information Network for Genetic Resources (SINGER) exchange data smoothly; another project with the Collaborative Research Centre for Molecular Plant Breeding in Australia targets linkages between conventional evaluation data and molecular marker data within ICIS. In addition, ICIS is in many ways complementary to, and increasingly dependent upon, the content and technologies of plant genome databases. As ICIS finds itself increasingly drawn towards the integration of breeding and field evaluations with associated molecular data, this requirement has inspired ongoing collaborations to integrate ICIS data with external genetic, genomic, transcriptomic and proteomic datasets, such as species-specific plant genomic databases, the United States Department of Agriculture (USDA) Gramene comparative genomics database, the European PlaNet group and others.
Although ICIS has created some fundamental components required for molecular breeding, there are several general needs for plant breeding, which still require a great deal of development: (i) databasing
for all breeding-related information such as climate, soil and phenotype data for selection and target environments; (ii) data mining for specific breeding purposes such as environment classification, genotype-by-environment interaction and identification of novel alleles and genetic variation; (iii) modelling breeding processes and selection schemes using multiple sources of breeding information to eliminate some field and lab tests required for making selection decisions, which may be critical for complex traits; and (iv) extracting useful information by an integrated exploration of the information created in a specific breeding programme with all related information from public databases.
Phenotypic and genetic data should be stored in a generic database such as ICIS or, where necessary, in other databases compatible with a standard informatics platform. For example, the CIMMYT MAIZE FIELDBOOK, currently used for capturing phenotypic and trial data in maize, is being integrated with ICIS. The functionality developed in the MAIZEFINDER software is being utilized as a data warehouse and an interface for queries on data in ICIS. This integration will allow breeders to continue to use the functions provided by both MAIZE FIELDBOOK and MAIZEFINDER, while also giving access to the functionalities of ICIS, such as pedigree management and storage for genomic data. ICIS also provides integration with the GCP informatics platform, called Pantheon (http://pantheon.generationcp.org). This software includes a web-based search engine, standalone Java graphical user interfaces and integration with other third-party software such as ISYS (http://www.ncgr.org/cmtv), the Genomic Diversity and Phenotype Connection (GDPC) (http://www.maizegenetics.net/gdpc/index.html) and BioMOBY web service compliant tools (http://www.biomoby.org/). Through GDPC the platform has access to the TASSEL software used for association analysis and through ISYS access to visualization tools such as the Comparative Map and Trait Viewer (http://www.ncgr.org/cmtv) that are useful for QTL mapping and MAS.
The data should be cross-linked with other publicly available data, including possibly several of the following data types: (i) QTL and (comparative) genetic mapping data from both specific projects and public sources of such information as in Gramene, GrainGenes and MaizeGDB; (ii) additional public sequence and annotation data, as it becomes available, possibly including molecular marker data of various types; (iii) crop mutant data; and (iv) information from other pertinent international plant databases, in particular, The Arabidopsis Information Resource (TAIR) and equivalent model organism databases.
To help users choose the most appropriate experimental design and data analysis methods and to provide them with a regularly updated selection of appropriate options, the system under development by the GCP project aims to provide automatic transition for data flow between all permutations and combinations of software to be used. This integrated decision support system for marker-assisted plant breeding (analogous to AGROBASE), called iMAS, will facilitate an integrated, error-free and appropriate data analysis from the beginning to the end of the molecular breeding pathway. iMAS was developed to facilitate marker-assisted plant breeding by integrating freely available quality software involved in the journey from phenotyping and genotyping of individuals to identification and application of trait-linked markers, and by providing simple-to-understand-and-use online decision guidelines to correctly use these software programs and interpret and use their product. Potentially useful software identified includes programs for generation of experimental design, biometric analysis of phenotypic data, building a linkage map, marker identification through QTL analysis, marker identification through association analysis and determination of sample sizes required for foreground and background selection. ICIS should finally integrate with iMAS to make all the software available under one single umbrella.
Other statistical tools and software should also be incorporated into the same
platform through ICIS. One such tool is CROPSTAT, a computer program for data management and basic statistical analysis of experimental data. CROPSTAT is freely available from www.irri.cgiar.org and has been developed primarily for the analysis of data from agricultural field trials, but many of the features can be used for analysis of data from other sources. The main modules and facilities are:
• data management with a spreadsheet;
• text editor;
• summary statistics and scatter plot graphics;
• analysis of variance;
• regression and correlation;
• mixed model analysis;
• single site analysis of plant breeding cultivar trials;
• cross site and additive main effect and multiplicative interaction (AMMI) analysis;
• pattern analysis of genotype-by-environment interaction;
• generalized linear models;
• log linear models;
• QTL analysis;
• randomization and layout of experimental designs;
• display of linear forms for general factorial expected mean squares (EMS); and
• generation of coefficients for orthogonal polynomials.
Although CROPSTAT is an easy-to-use software package, it is not suitable for analysing large-scale data sets.

14.5.4 Other informatics tools

There are several informatics tools available from either private or public sectors. Some of them have multiple functions relevant to plant breeding, while others only provide specific applications in plant breeding. Only some representative tools will be described here.
AGROBASE Generation II (http://www.agronomix.com/) is a comprehensive database management and analysis system for agronomists, plant breeders and plant researchers. Its Basic System offers data management, experiment management and statistical analysis. The Varietal Comparisons Module compares relative performance of cultivars or treatments within a trial or across all trials, locations and years and also analyses genotype-by-environment interactions. The Advanced Statistics Module supports the randomization and analysis of more advanced experimental designs, spatial analyses of yield trials, multivariate analyses and other advanced statistical analyses. The Pedigree Data Management Module supports the plant breeding needs of many types of crops. The Image Display Module supports the display of images of cultivars or treatments including the growth stages, flower colour or shape, plant components and characteristics, or molecular markers, for any cultivar or genotype.
GERMINATE (Lee et al., 2005), developed by the Scottish Crop Research Institute and the John Innes Centre (http://germinate.scri.sari.ac.uk/), is a generic plant data management system designed to hold a diverse variety of data types, ranging from molecular to phenotypic and to allow querying between such data for any plant species. Data are stored in GERMINATE in a technology-independent manner, such that new technologies can be accommodated in the database as they emerge, without modification of the underlying schema.
The Plabsoft database (Heckenberger et al., 2008) is a comprehensive database management system (DBMS) for integrating phenotypic and genomic data in academic and commercial plant breeding programmes. The database structure is capable of managing the following types of data observed in breeding programmes of all major crops: (i) germplasm data of any species including pedigree data; (ii) phenotypic data of any traits and trait complexity; (iii) trial management data for any field and trial design; (iv) molecular marker data for all common types of markers; and (v) project and study management data. By implementing the database structure into the DBMS, functions have been developed for data import, data retrieval and data transfer from and to commonly used statistical analysis software.
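The integration described for Plabsoft, in which germplasm, phenotypic and marker data are held in one structure and queried together, can be illustrated with a small relational sketch. The table and column names used here (`germplasm`, `phenotype`, `marker_genotype`, `gid`) and the toy records are hypothetical and are not the actual Plabsoft schema; the example relies only on Python's built-in `sqlite3` module and shows how a shared germplasm identifier lets marker and trait data be joined.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables and toy records."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE germplasm (gid INTEGER PRIMARY KEY, name TEXT, pedigree TEXT);
        CREATE TABLE phenotype (gid INTEGER REFERENCES germplasm(gid),
                                trial TEXT, trait TEXT, value REAL);
        CREATE TABLE marker_genotype (gid INTEGER REFERENCES germplasm(gid),
                                      marker TEXT, allele TEXT);
    """)
    # Germplasm units with pedigree strings (made-up names).
    con.executemany("INSERT INTO germplasm VALUES (?, ?, ?)",
                    [(1, "LineA", "P1/P2"), (2, "LineB", "P1/P3"), (3, "LineC", "P2/P3")])
    # Trait observations from two trials (made-up values).
    con.executemany("INSERT INTO phenotype VALUES (?, ?, ?, ?)",
                    [(1, "T2008", "yield", 5.0), (2, "T2008", "yield", 7.0),
                     (3, "T2008", "yield", 6.0), (1, "T2009", "yield", 5.4),
                     (2, "T2009", "yield", 7.4), (3, "T2009", "yield", 6.2)])
    # One marker scored on every line.
    con.executemany("INSERT INTO marker_genotype VALUES (?, ?, ?)",
                    [(1, "m1", "A"), (2, "m1", "B"), (3, "m1", "B")])
    return con


def mean_trait_by_allele(con, marker, trait):
    """Return {allele: (mean trait value, number of lines)} for one marker."""
    rows = con.execute("""
        SELECT g.allele, AVG(p.value), COUNT(DISTINCT p.gid)
        FROM marker_genotype AS g
        JOIN phenotype AS p ON p.gid = g.gid
        WHERE g.marker = ? AND p.trait = ?
        GROUP BY g.allele
        ORDER BY g.allele
    """, (marker, trait)).fetchall()
    return {allele: (mean, n_lines) for allele, mean, n_lines in rows}


if __name__ == "__main__":
    db = build_demo_db()
    for allele, (mean, n_lines) in mean_trait_by_allele(db, "m1", "yield").items():
        print(allele, round(mean, 2), n_lines)
```

Keeping raw observations in separate tables and joining them on the germplasm identifier at query time is one simple way to combine the two data sources; a production system would add trial design, study management and access control, which this sketch omits.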
Compared to the above informatics tools, GENEFLOW (http://www.geneflowinc.com/geneflow.html) is a comprehensive tool more relevant to molecular breeding. The system integrates pedigree, genotype and phenotype data, information traditionally kept in separate databases but which gains considerable value and power by being linked together. This allows the researcher to study the inheritance of a trait, explore the relationship between genetic make-up and observed phenotype, look for genetic components associated with a trait, track genetic fingerprints for a set of individuals and identify ancestors that are the likely source of a gene or trait, all from a single platform. The primary components of the system include:
• a pedigree module to use as a pedigree-based display and to support the overlay and analysis of genetic and phenotypic data within the context of known family relationships;
• a genotype module to provide a detailed, chromosome-level view of the genetic content and organization of various individuals;
• a population module to analyse structured populations, ranking the progeny and producing a detailed report and display; and
• a report module to generate a large number of key reports and graphs, shedding light on the structure of genetic diversity and the relationship between genes and traits.

14.5.5 Future needs for informatics tools

A comprehensive statistical analysis system needs to be developed. This system should provide both basic and advanced statistical methods, analytical tools and web-based analysis and visualization software for managing, analysing and mining all kinds of data including those related to phenotype, genotype, sequence and expression. The following are examples of functional components that should be included in the system:
• multi-year trial data: environment classification and yield stability analysis;
• heterotic pattern analysis and heterotic pooling;
• genotype-by-environment interaction;
• germplasm characterization and classification;
• marker characterization and genetic mapping;
• genetic mating systems;
• identification and quantification of novel variation at both phenotypic and genotypic levels;
• molecular and functional analysis of genetic diversity and evolutionary process;
• association of genotypes with phenotypes through linkage genetics;
• statistical methods for fine mapping using multiple populations, multiple environments, comparative mapping and linkage disequilibrium mapping;
• simulation and modelling of gene networks and biological interactions including major gene interaction, major gene-QTL interaction and QTL-QTL interaction;
• heterosis and combining ability analysis and hybrid and inbred performance prediction; and
• statistical methods for molecular data from multiple sources.
Currently, bioinformatics is conducted by a specialized group of individuals. The majority of biologists use only basic bioinformatics tools and there is little involvement by plant breeders. This has been considered a major limitation of bioinformatics today (Rhee, 2005). In the decades to come, the majority of biologists will need skills such as programming, database development and management of large datasets and quantitative and statistical analysis of data. The richness and enormity of available information, such as understanding the function of every gene in an organism, will shift research into more theoretical biology using informatics approaches. Other current issues with bioinformatics include the heterogeneity of data and how they are analysed, annotated and displayed and the lack of connectivity
among the available data. Recent movements towards the creation of a scientific society for database curators (http://www.biocurator.org) and projects that bring together the efforts of different model organism databases (http://www.gmod.org) provide early hints that bioinformatics is developing into a more coherent discipline of biology. With the maturation of genomics has come the adoption of standard data formats and schemata for crop genome information, and it is likely that future databases will be designed with cross-connectivity capabilities as a priority. The availability of complete genome sequences enables further mining for novel promoter sequences and other regulatory features such as micro-RNA. This tertiary annotation provides links to both the phenotype and the complex regulatory mechanisms that govern development and response to the environment (Edwards and Batley, 2004).
One of the more significant changes to crop genome databases has been the move towards graphical user interfaces that provide a more user-friendly search environment. The Ensembl database schema, which has a strong emphasis on graphical user interaction, is used in the cereal comparative genomic database Gramene (Liang et al., 2008). No single database can attempt to store all of the possible information about an organism. Therefore, a key role of genome browsers is to provide a rich variety of links to external databases. As genome sequences become available from more organisms, projects such as Ensembl are attempting to provide access to genome-wide inter-species comparisons of genomic and protein sequences. The same strategy is needed to develop inter-species crop informatics resources aimed at serving the plant breeding community.
The rapid evolution of the field of phenomics, the genome-wide study of gene dispensability by quantitative analysis of phenotype, has resulted in an increasing demand for new data analysis and visualization tools. Most of the valuable phenotypic data reside in the public literature, not captured in databases. Effective text mining is needed to gather these data as well. A prerequisite for text mining, however, is the availability of specific thesauri to catalogue and validate terms.
To meet this demand, a public resource for mining, filtering and visualizing phenotypic data, the PROPHECY database, was designed to allow easy and flexible access to physiologically relevant quantitative data for the growth behaviour of mutant strains in the yeast deletion collection during conditions of environmental challenges (Fernandez-Ricaud et al., 2005). We would expect a similar effort in crop plants.
Some informatics tools are discussed in Chapter 15.

14.6 Plant Databases

A list of currently available molecular biology databases can be found at http://www.oxfordjournals.org/nar/database/a/. The list is updated in the January issue of Nucleic Acids Research each year. The 2008 update contains 1078 databases, which can be classified into 14 categories (Table 14.4; Galperin, 2008). In addition, a comprehensive list of databases is available at ExPASy Life Science Directory (http://expasy.ch/alinks.html).
Many attempts are being made to understand biological subjects at a systems level. Biological databases are a major resource for these approaches, storing large volumes of information about DNA, RNA and protein sequences, including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. As an example of a comprehensive resource, the NCBI provides analysis and retrieval tools for the data in GenBank and other biological data made available through NCBI's web site, in addition to maintaining the GenBank nucleic acid sequence database (Wheeler et al., 2007). NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST LINK (BLINK), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes,
Table 14.4. Molecular biology databases: categories and numbers. Summarized from http://www.oxfordjournals.org/nar/database/a/; the number in parentheses is the number of databases in the category.

Nucleotide sequence databases
    International Nucleotide Sequence Database Collaboration (3)
    Coding and non-coding DNA (41)
    Gene structure, introns and exons, splice sites (25)
    Transcriptional regulator sites and transcription factors (60)
RNA sequence databases (63)
Protein sequence databases
    General sequence databases (15)
    Protein properties (16)
    Protein localization and targeting (22)
    Protein sequence motifs and active sites (22)
    Protein domain databases; protein classification (38)
    Databases of individual protein families (65)
Structure databases
    Small molecules (15)
    Carbohydrates (9)
    Nucleic acid structure (16)
    Protein structure (75)
Genomics databases (non-vertebrate)
    Genome annotation terms, ontologies and nomenclature (12)
    Taxonomy and identification (10)
    General genomics databases (44)
    Viral genome databases (25)
    Prokaryotic genome databases (61)
    Unicellular eukaryotes genome databases (15)
    Fungal genome databases (32)
    Invertebrate genome databases (51)
Metabolic and signalling pathways
    Enzymes and enzyme nomenclature (12)
    Metabolic pathways (19)
    Protein-protein interactions (70)
    Signalling pathways (5)
Human and other vertebrate genomes
    Model organisms, comparative genomics (63)
    Human genome databases, maps and viewers (19)
    Human ORFs (28)
Human genes and diseases
    General human genetics databases (13)
    General polymorphism databases (28)
    Cancer gene databases (22)
    Gene-, system- or disease-specific databases (50)
Microarray data and other gene expression databases (65)
Proteomics resources (18)
Other molecular biology databases (41)
    Drugs and drug design (23)
    Molecular probes and primers (9)
Organelle databases
    Mitochondrial genes and proteins (18)
Plant databases
    General plant databases (38)
    Arabidopsis thaliana (26)
    Rice (17)
    Other plants (18)
Immunological databases (27)

Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modelling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases.
Information relevant to plant breeding may be housed in nucleotide, RNA and protein sequence databases, structure databases, genomics databases, metabolic and signalling pathways, microarray data and other gene expression databases, proteomics resources, organelle databases and plant databases. Some of the databases to be discussed in this section are not for plant species; however, they may be useful in comparative genomics, genetics and phenomics.

14.6.1 Sequence databases

Nucleotide sequence databases

The most important DNA sequence databases are listed in Table 14.5 with their URLs
Table 14.5. DNA and protein sequence databases.

Database name | Uniform Resource Locator (URL) | Database description
DDBJ | http://www.ddbj.nig.ac.jp | DNA Data Bank of Japan (DDBJ), one of the three major databases for the International Nucleotide Sequence Database Collaboration
EMBL Nucleotide Sequence Database | http://www.ebi.ac.uk/embl | The EMBL Nucleotide Sequence Database is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with DDBJ and GenBank at the NCBI (USA)
GenBank | http://www.ncbi.nlm.nih.gov | A comprehensive sequence database that contains publicly available DNA sequences for more than 170,000 different organisms, obtained primarily through the submission of sequence data from individual laboratories and batch submissions from large-scale sequencing projects
EXProt | http://www.cmbi.kun.nl/EXProt | A non-redundant protein database containing a selection of entries from genome annotation projects and public databases, aimed at including only proteins with an experimentally verified function
MIPS | http://mips.gsf.de | Databases at Munich Information Center for Protein Sequences
NCBI Protein | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein | The NCBI Entrez Protein database comprises sequences taken from a variety of sources, including Swiss-PROT, the Protein Information Resource, the Protein Research Foundation, the Protein Data Bank, and translations from annotated coding regions in the GenBank and RefSeq databases
Patome | http://www.patome.org | Biological sequence data disclosed in patents and published applications, as well as their analysis information
PIR-PSD | http://pir.georgetown.edu | The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. PIR has provided many protein databases and analysis tools to the scientific community, including the PIR-International Protein Sequence Database (PSD) of functionally annotated protein sequences
PRF | http://www.prf.or.jp/en/index.shtml | Protein Research Foundation database of peptides: sequences, literature and unnatural amino acids
RefSeq | http://www.ncbi.nlm.nih.gov/RefSeq | The NCBI Reference Sequence (RefSeq) database provides curated non-redundant sequence standards for genomic regions, transcripts (including splice variants), and proteins
Swiss-PROT | http://www.expasy.org/sprot | The UniProt/Swiss-PROT Protein Knowledgebase is a curated protein sequence database, providing a high level of annotation (such as the description of protein function, domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other databases. It is part of the Universal Protein Knowledgebase (UniProtKB)
TCDB | http://www.tcdb.org | The Transporter Classification Database (TCDB) is a curated, relational database containing sequence, classification, structural, functional and evolutionary information about transport systems from a variety of living organisms
UniProt | http://www.uniprot.org | UniProt (Universal Protein Resource) is the world's most comprehensive catalogue of information on proteins. It is a central repository of protein sequences and functions created by joining the information contained in Swiss-PROT, TrEMBL and PIR. UniProt has three components, each optimized for different uses. The UniProt Knowledgebase (UniProtKB) is the central access point for extensive curated protein information, including function, classification and cross-reference. The UniProt Reference Clusters (UniRef) databases combine closely related sequences into a single record to speed searches. The UniProt Archive (UniParc) is a comprehensive repository, reflecting the history of all protein sequences
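Entries in the nucleotide sequence databases of Table 14.5 are cited by accession number, and a specific revision of a sequence is denoted by the accession followed by a period and a version number. A minimal sketch of splitting such identifiers is given here; the function and type names are illustrative only, and the identifiers in the usage lines are made-up placeholders rather than real records.

```python
from typing import NamedTuple, Optional


class SequenceId(NamedTuple):
    accession: str           # permanent identifier of the record
    version: Optional[int]   # None when no ".N" suffix is present


def parse_sequence_id(identifier: str) -> SequenceId:
    """Split 'ACCESSION.VERSION' into its two parts (illustrative helper)."""
    text = identifier.strip()
    accession, sep, version = text.rpartition(".")
    # Accept the suffix only if it is a plain number after a period.
    if sep and accession and version.isdigit():
        return SequenceId(accession, int(version))
    return SequenceId(text, None)


def same_record(a: str, b: str) -> bool:
    """True when two identifiers share an accession, whatever their versions."""
    return parse_sequence_id(a).accession == parse_sequence_id(b).accession


if __name__ == "__main__":
    print(parse_sequence_id("XX000001.2"))
    print(parse_sequence_id("XX000001"))
    print(same_record("XX000001.1", "XX000001.2"))
```

Separating the two parts makes it straightforward to check whether two citations point to the same record while still distinguishing the exact sequence revision that was used.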
and a brief description. The International Nucleotide Sequence Database Collaboration is a joint effort of the European Bioinformatics Institute (EBI), the DNA Data Bank of Japan (DDBJ) and the US NCBI. The nucleotide sequence databases are data repositories, accepting nucleic acid sequence data from the community and making it freely available.
Each entry in a database has a unique identifier, which is a string of letters and numbers corresponding to that record. This unique identifier, known as the Accession Number, can be quoted in the scientific literature. As the Accession Number is permanent, another code is used to indicate the number of changes that a particular sequence has undergone. This code is known as the Sequence Version and is composed of the Accession Number followed by a period and a number indicating the specific version.
Since their inception in the 1980s, the nucleic acid sequence databases have experienced exponential growth, with archives doubling in size about every 18 months, reflecting advances in sequencing technologies.

Protein sequence databases

The protein sequence databases are the most comprehensive source of information on proteins, some of which are listed in Table 14.5. They can be classified into universal databases, covering proteins from all species, and specialized data collections storing information about specific families or groups of proteins, or about the proteins of a specific organism. Two categories of universal protein sequence databases can be discerned: (i) simple archives of sequence data; and (ii) annotated databases where additional information has been added to the sequence record.
The Protein Information Resource (PIR) is the oldest protein sequence database. It was established in 1984 by the National Biomedical Research Foundation (NBRF) and has been maintained since 1988 by PIR. Swiss-PROT, established in 1986 as an annotated universal protein sequence database and maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the EBI, provides a high level of annotation, a minimal level of redundancy, a high level of integration with other biomolecular databases and extensive external documentation. Each entry in Swiss-PROT is thoroughly analysed and annotated to ensure a high standard of annotation and maintain the quality of the database. In Swiss-PROT two classes of data can be distinguished: the core data and the annotation. The core data consist of the sequence data, the citation information (bibliographical references) and the taxonomic data (description of the biological source of the protein). The annotation describes the function(s) of the protein, post-translational modification (carbohydrates, phosphorylation, acetylation, glycosylphosphatidylinositol-anchor, etc.), domains and sites (calcium binding regions, ATP-binding sites, zinc fingers, homeobox, kringle, etc.), secondary structure, quaternary structure (homodimer, heterotrimer, etc.), similarities to other proteins, disease(s) associated with deficiencies in the protein and sequence conflicts, variants, etc.
TrEMBL (Translation of EMBL nucleotide sequence database), the supplement of Swiss-PROT, was created in 1996 to make new sequences available as quickly as possible, since maintaining the high quality of Swiss-PROT is a time-consuming process that involves extensive sequence analysis and detailed curation by expert annotators. TrEMBL consists of computer-annotated entries derived from the translation of all coding sequences in the EMBL nucleotide sequence database, except for those already included in Swiss-PROT.
Searches in protein databases have become a standard research tool in the life sciences. To produce valuable results, the source databases should be comprehensive, non-redundant, well-annotated and up-to-date. However, the lack of a single protein sequence database satisfying all four criteria has forced users to search multiple databases. By unifying the PIR, Swiss-PROT and TrEMBL database activities, PIR International and its partners, EBI and SIB, have produced a single worldwide database
of protein sequence and function, UniProt, which is the central
repository of protein sequence and function data, created by joining the
information contained in Swiss-PROT, TrEMBL and PIR.

14.6.2 General genomics and proteomics databases

Table 14.6 provides some general genomics and proteomics databases.
Databases of two-dimensional gel electrophoresis data, such as
Swiss-2DPAGE, which is maintained collaboratively by the Central
Clinical Chemistry Laboratory of the Geneva University Hospital and SIB,
have been considered among the classical proteomics databases.

A number of databases address some aspect of genome or proteome
comparisons. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a
knowledge base for systematic analysis of gene functions, linking
genomic information with higher order functional information. KEGG
mainly addresses regulation and metabolic pathways, although the KEGG
scheme is being extended to include a number of non-metabolism-related
functions. Clusters of Orthologous Groups of proteins (COGs) is a
phylogenetic classification of proteins encoded in completely sequenced
genomes. COGs group together related proteins with similar but sometimes
non-identical functions.

The Proteome Analysis Initiative at EBI has the more general aim of
integrating information from a variety of sources that will together
facilitate the classification of the proteins in complete proteome sets.
These proteome sets are built from the Swiss-PROT and TrEMBL protein
sequence databases that provide reliable, well-annotated data as the
basis for the analysis. The Proteome Analysis Initiative provides a
broad view of the proteome data classified according to signatures
describing particular sequence motifs or sequence similarities. At the
same time it affords the option of examining various specific details
like structure or searchable functional classification. The InterPro
(http://www.ebi.ac.uk/interpro) and CluSTr (http://www.ebi.ac.uk/clustr)
resources have been used to classify the data by sequence similarity.
Structural information includes amino acid composition for each of the
proteomes, the Homology derived Secondary Structure of Proteins (HSSP)
classification and links to the Protein Data Bank (PDB). A searchable
functional classification using the Gene Ontology (GO) is also
available. The Proteome Analysis Database contains statistical and
analytical data for the proteins from completely sequenced genomes.

14.6.3 General plant databases

Table 14.7 lists the databases that generally cover multiple plant
species or have multiple functions. Information contained in these
databases includes genetic and physical mapping, sequencing, clustering,
microarray analysis, functional annotation, signal transduction
analysis, etc.

Some databases contain relatively simple information such as cis-element
motifs, plant genome sizes (C-values), promoter sequences, small
nucleolar RNAs (snoRNAs) or non-coding RNAs (ncRNAs), mitochondrial
proteins, cis-acting regulatory elements/enhancers/repressors and
clusters of predicted plant proteins.

In addition to data integration and visualization, some databases
provide specific tools required for managing and mining the data, such
as plant EST clustering and functional annotation, signal transduction
analysis, functional analysis of agricultural plant and animal gene
products, classification of repetitive sequences, phylogeny-based tools
for comparative genomics and retrieval of plant protease inhibitors
(PIs) and their genes.

Some databases cover a group of plant species, such as GrainGenes for
wheat, barley, rye, triticale and oats, TropGENE DB for tropical crops
and PLANTS Database for the vascular plants, mosses, liverworts,
hornworts and lichens of the USA and its territories.

Some databases provide multiple categories of information and also tools
Table 14.6. General genomics and proteomics databases.

Database name Uniform Resource Locator (URL) Database description

TIGR Gene Indices http://compbio.dfci.harvard.edu/tgi Databases to identify and classify transcribed sequences in eukaryotic species using available EST and gene sequence data
GO http://www.geneontology.org Gene Ontology Consortium database
KEGG http://www.genome.ad.jp/kegg KEGG (Kyoto Encyclopedia of Genes and Genomes) is the primary database resource of the Japanese GenomeNet service for understanding higher order functional meanings and utilities of the cell or the organism from its genome information
Swiss-2DPAGE http://www.expasy.org/ch2d Maintained collaboratively by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics (SIB), the database contains data on proteins identified on various two-dimensional PAGE and SDS-PAGE reference maps from human, mouse, Arabidopsis thaliana, Dictyostelium discoideum, Escherichia coli, Saccharomyces cerevisiae and Staphylococcus aureus
COGs http://www.ncbi.nlm.nih.gov/COG Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in complete genomes, representing major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogues from at least three lineages and thus corresponds to an ancient conserved domain
ERGO http://www.ergo-light.com ERGO, formerly WIT database, provides links to information about the functional role of enzymes (via links to data in KEGG); links to NCBI Medline entries for each enzyme; and links to enzymes and metabolic pathways records for each enzyme. The database also provides access to thoroughly annotated genomes within a framework of metabolic reconstructions, connected to the sequence data; protein alignments and phylogenetic trees, and data on gene clusters, potential operons and functional domains
wwPDB http://www.wwpdb.org The Worldwide Protein Data Bank (wwPDB) consists of organizations that act as deposition, data processing and distribution centres for PDB data. The mission of the wwPDB is to maintain a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available to the global community
Genome Project Database http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=genomeprj The NCBI Entrez Genome Project Database is intended to be a searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms
Entrez Gene http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene Entrez Gene is NCBI's database for gene-specific information with focus on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis
Entrez Genomes http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome NCBI's collection of databases for the analysis of complete and unfinished viral, pro- and eukaryotic genomes
ACeDB http://www.acedb.org Caenorhabditis elegans, Schizosaccharomyces pombe, and human sequences and genomic information
FlyBase http://flybase.org An integrated resource for genetic, molecular and descriptive data concerning the Drosophilidae, including interactive genomic maps, gene product descriptions, mutant allele phenotypes, genetic interactions, expression patterns, transgenic constructs and their insertions, anatomy and images, and genetic stock collections
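
The Proteome Analysis Database described in Section 14.6.2 reports, among other statistics, the amino acid composition of each proteome. As a rough illustration of what such a statistic involves, the Python sketch below counts the standard residues across a set of protein sequences and reports each residue's fraction of the total. The function name amino_acid_composition and the two toy sequences are assumptions made for this example only; they are not taken from the database or from any EBI tool.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequences):
    """Return {residue: fraction} over all standard residues in `sequences`.

    Characters outside the 20 standard residues (e.g. X or *) are ignored.
    An empty input yields an empty dictionary.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in STANDARD_RESIDUES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {res: counts[res] / total for res in STANDARD_RESIDUES}


# Hypothetical two-protein "proteome", used only to exercise the function.
toy_proteome = ["MKVLA", "MAAGK"]
composition = amino_acid_composition(toy_proteome)

# Print the residues that occur, most frequent first.
for residue, fraction in sorted(composition.items(), key=lambda kv: -kv[1]):
    if fraction > 0:
        print(f"{residue}: {fraction:.2f}")
```

For a real proteome the input would be the protein sequences of one completely sequenced genome, for example read from a FASTA file; the counting logic stays the same.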
Table 14.7. General plant databases.

Database name Uniform Resource Locator (URL) Database description

AgBase http://www.agbase.msstate.edu A curated, open-source, web-accessible resource for functional analysis of agricultural plant and animal gene products
BarleyBase http://www.barleybase.org An online database for plant microarrays with integrated tools for data visualization and statistical analysis
Cereal Small RNA Database http://sundarlab.ucdavis.edu/smrnas An integrated resource for small RNAs expressed in rice and maize that includes a genome browser and a smRNA-target relational database as well as relevant bioinformatic tools
CR-EST Crop ESTs http://pgrc.ipk-gatersleben.de/cr-est A publicly available online resource providing access to sequence, classification, clustering, and annotation data of crop EST projects at IPK Gatersleben, Germany
CropNet http://ukcrop.net The UK Crop Plant Bioinformatics Network (UK CropNet) was established to harness the extensive work in genome mapping in crop plants in the UK. The resource facilitates the identification and manipulation of agronomically important genes by laying a foundation for comparative analysis among crop plants and model species. A number of software tools have been developed to facilitate data visualization and analysis
FLAGdb++ http://urgv.evry.inra.fr/projects/FLAGdb++/HTML/index.shtml Dedicated to the integration and visualization of data for high-throughput functional analysis of a fully sequenced genome, as illustrated for Arabidopsis
Génoplante-Info http://www.genoplante.com Integrates and makes publicly available the data (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, mutation data) and tools (databases, interfaces, analysis software) generated through a collaboration between public French institutes and private companies that aims at developing genome analysis programs for crop species (maize, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis thaliana and rice)
GeneFarm http://urgi.versailles.inra.fr/Genefarm Expert annotation of Arabidopsis gene and protein families
GrainGenes http://wheat.pw.usda.gov Molecular and phenotypic information on wheat, barley, rye, triticale and oats
Gramene http://www.gramene.org A comparative genome mapping database for grasses with both automatic and manual curation performed to combine and interrelate information on genomic and EST sequences, genetic, physical and sequence-based maps, proteins, molecular markers, mutant phenotypes and QTL, and publications
MIPSPlantsDB http://mips.gsf.de/proj/plant/jsf The MIPS (Plant Genome Bioinformatics at the Institute for Bioinformatics) plant genomics group focuses on the bioinformatics of plant genomes. It developed from the Arabidopsis Genome Annotation Group and currently provides the following databases: the MIPS Arabidopsis thaliana genome database, the maize genome, the rice genome (MOsDB), the Medicago Genome database, the Lotus Genome database, the Tomato Genome database, Cis-Regulatory Element Detection Online (CREDO), mips Repeat Element database (mips-REdat), mips Repeat Element catalogue (mips-REcat), and MotifDB
MPIM http://www.plantenergy.uwa.edu.au/applications/mpimp/index.html A database containing information on the mitochondrial protein import apparatus from a wide range of organisms, including yeast, human, rat, mouse, Drosophila, Danio rerio, Caenorhabditis elegans, Arabidopsis, rice and Plasmodium falciparum
PathoPlant http://www.pathoplant.de A database on plant-pathogen interactions and components of signal transduction pathways related to plant pathogenesis
ICIS http://www.icis.cgiar.org The International Crop Information System (ICIS) is a database system for the management and integration of global information on genetic resources and crop improvement for any crop
Phytome http://www.phytome.org A comparative genomics database designed to facilitate functional plant genomics, molecular breeding, and evolutionary studies. It contains predicted protein sequences, protein family assignments, multiple sequence alignments, phylogenies, and functional annotations for proteins from a large, phylogenetically diverse set of plant taxa
PHYTOPROT http://urgi.versailles.inra.fr/phytoprot Clusters of predicted plant proteins
dbEST http://www.ncbi.nlm.nih.gov/dbEST A division of GenBank that contains sequence data and other information on single-pass cDNA sequences, or ESTs, from a number of organisms
PLACE http://www.dna.affrc.go.jp/htdocs/PLACE A database containing cis-element motifs found in plant genes
Plant DNA C-values database http://www.kew.org/genomesize/homepage.html A one-stop, user-friendly database for plant genome sizes. The most recent release (release 4.0, October 2005) contains genome size data for 5150 species comprising 4427 angiosperms, 207 gymnosperms, 87 monilophytes and lycopods, 176 bryophytes and 253 algal species
Plant Genome Central http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html Providing access to data from large-scale sequencing projects, genetic maps, and large-scale EST sequencing projects
Plant MPSS http://mpss.udel.edu MPSS (Massively Parallel Signature Sequencing) is a sequencing-based technology that uses a unique method to quantify gene expression level, generating millions of short sequence tags per library. The Plant MPSS databases are the largest publicly available set of tag-based gene expression data
Plant Ontology database http://www.plantontology.org The Plant Ontology (PO) is a collaborative effort among several plant databases and experts in plant systematics, botany and genomics to develop simple yet robust and extensible controlled vocabularies that accurately reflect the biology of plant structures (morphology and anatomy) and developmental stages
Plant snoRNA DB http://bioinf.scri.sari.ac.uk/cgi-bin/plant_snorna/home Small nucleolar RNA (snoRNA) genes in plant species
PLANT-PIs http://bighost.area.ba.cnr.it/PLANT-PIs A database for facilitating retrieval of information on plant protease inhibitors (PIs) and their genes
PlantGDB http://www.plantgdb.org A database for plant genomic sequences, in particular ESTs that correspond to fragments of genes that are actively transcribed under particular conditions (currently with data for 48 plant species)
PlantProm http://mendel.cs.rhul.ac.uk/mendel.php?topic=plantprom A database for plant promoter sequences
PlantsP/PlantsT http://plantsp.sdsc.edu PlantsP and PlantsT are plant-specific curated databases that combine sequence-derived information with experimental functional genomics data. PlantsP focuses on proteins involved in the phosphorylation process (i.e. kinases and phosphatases), whereas PlantsT focuses on membrane transport proteins
POGs/PlantRBP http://plantrbp.uoregon.edu A relational database that integrates data from rice, Arabidopsis, and maize by placing the complete Arabidopsis and rice proteomes and available maize sequences into putative orthologous groups (POGs)
TAED http://www.bioinfo.no/tools/TAED TAED (The Adaptive Evolution Database) is a phylogeny-based tool for comparative genomics
TIGR plant repeat database http://www.tigr.org/tdb/e2k1/plant.repeats Classification of repetitive sequences in plant genomes
TIGR Plant Transcript Assembly Database http://plantta.tigr.org The database uses expressed sequences collected from the NCBI GenBank Nucleotide database for the construction of transcript assemblies. The sequences collected include ESTs and full-length and partial cDNAs, but exclude computationally predicted gene sequences
TropGENE DB http://tropgenedb.cirad.fr A database that manages genetic and genomic information about tropical crops
PLEXdb http://www.plexdb.org PLEXdb (Plant Expression Database) is a unified public resource for gene expression for plants and plant pathogens, serving as a bridge to integrate new and rapidly expanding gene expression profile data sets with traditional structural genomics and phenotypic data
PLANTS Database http://plants.usda.gov The PLANTS Database provides standardized information about the vascular plants, mosses, liverworts, hornworts, and lichens of the USA and its territories
UK CropNet Databases http://ukcrop.net/db.html Contains six databases (Arabidopsis Genome Resource, BarleyDB, BrassicaDB, CropSeqDB, FoggDB, MilletGenes) and mirrors many other plant-related databases


for multiple functions. A typical example is ICIS, which has been
described in the previous section. As another example, Phytome is an
online comparative genomics resource that is built upon publicly
available sequence and map information from a diverse set of plant
species, with a focus on the angiosperms, or flowering plants. It
provides an interface to the results from a variety of phylogenomic
analyses. Phytome is designed to facilitate functional genomics,
molecular breeding and evolutionary studies in model and non-model plant
species. Currently, Phytome contains phylogenetic and functional
information for predicted protein sequences (Unipeptides). Future
development will incorporate data and tools for analysis of
sequence-based comparative maps.

There are some databases supporting functions for comparative biology.
Gramene is one such database for comparative genome mapping of grasses,
with both automatic and manual curation performed to combine and
interrelate information on genomic and EST sequences, genetic, physical
and sequence-based maps, proteins, molecular markers, mutant phenotypes
and QTL, and publications. As an information resource, Gramene's purpose
is to provide added value to data sets available within the public
sector, which will facilitate researchers' ability to understand the
rice genome and leverage the rice genomic sequence for identifying and
understanding corresponding genes, pathways and phenotypes in other crop
grasses. This is achieved by building automated and curated
relationships between rice and other cereals. The automated and curated
relationships are queried and displayed using controlled vocabularies
and web-based displays. The controlled vocabularies (ontologies)
currently being utilized include Gene ontology, Plant Ontology, Trait
ontology, Environment ontology and Gramene Taxonomy ontology. The
web-based displays for phenotypes include the Genes and Quantitative
Trait Loci (QTL) modules. Sequence-based relationships are displayed in
the Genomes module using the genome browser adapted from Ensembl, in the
Maps module using the comparative map viewer (CMAP) from GMOD and in the
Proteins module displays. BLAST is used to search for similar sequences.

Some sites host their own databases and also mirror other related
databases. The UK Crop Plant Bioinformatics Network (UK CropNet),
established to harness the extensive work in genome mapping in crop
plants in the UK, is one such example. The UK CropNet contains six
databases (Arabidopsis Genome Resource, BarleyDB, BrassicaDB, CropSeqDB,
FoggDB, MilletGenes). It also mirrors many other plant-related databases
(Table 14.7).

14.6.4 Individual plant databases

Tables 14.8, 14.9 and 14.10 list databases for specific plants. As model
plants, Arabidopsis and rice databases are listed in separate tables.
Although most plant databases differ considerably in content, the
general subject matter for a species-specific database may include:
    genetic and cytogenetic maps;
    genomic probes, nucleotide sequences;
    genes, alleles and gene products;
    phenotypes, quantitative traits and QTL;
    genotypes and pedigrees of cultivars, genetic stocks and other germplasm;
    pathologies and the corresponding pathogens, insects and abiotic stresses;
    taxonomy of the crops and related species;
    addresses and research interests of colleagues; and
    relevant bibliographic citations.

As a model crop, rice has highly diversified databases, which results in
each database containing specific information, such as mutants and T-DNA
insertions, or serving a specific function, such as annotation and
proteomic analysis (Table 14.9). There are two annotation-related
databases, one oriented around contigs for high-quality manual
annotation (RAD) and the other providing a system to integrate programs
for prediction and analysis of protein-coding gene structure (RiceGAAS).
The evidence
Table 14.8. Arabidopsis thaliana databases.

Database name Uniform Resource Locator (URL) Database description

AGNS http://wwwmgs.bionet.nsc.ru/agns AGNS (Arabidopsis GeneNet supplementary) database provides access to description of the functions of the known Arabidopsis genes at various levels: the levels of mRNA, protein, cell, tissue, and ultimately at the levels of organs and the organism in both wild-type and mutant backgrounds
AGRIS http://arabidopsis.med.ohio-state.edu Arabidopsis Gene Regulatory Information Server (AGRIS) is an information resource of Arabidopsis promoter sequences, transcription factors and their target genes, currently containing two databases, AtTFDB (Arabidopsis thaliana transcription factor database) and AtcisDB (Arabidopsis thaliana cis-regulatory database)
Arabidopsis Mitochondrial Protein Database http://www.plantenergy.uwa.edu.au/applications/ampdb/index.html Experimentally identified mitochondrial proteins in Arabidopsis
Arabidopsis MPSS http://mpss.udel.edu/at Arabidopsis gene expression detected by massively parallel signature sequencing
Arabidopsis Nucleolar Protein Database http://bioinf.scri.sari.ac.uk/cgi-bin/atnopdb/proteome_comparison Comparative analysis of nucleolar proteomes of human and Arabidopsis
ARAMEMNON http://aramemnon.botanik.uni-koeln.de A curated database for A. thaliana transmembrane (TM) proteins and transporters
ARTADE http://omicspace.riken.jp/ARTADE A database containing transcriptional structures elucidated by ARTADE which estimates exon/intron structures of structurally unknown genes based on both tiling array data and genomic sequence data
ASRP http://asrp.cgrb.oregonstate.edu A database for A. thaliana small RNA project
AtGDB http://www.plantgdb.org/AtGDB A part of PlantGDB Plant Genome Database and Analysis Tools to provide a convenient sequence-centred genome view for A. thaliana, with a narrow focus on gene structure annotation
AthaMap http://www.athamap.de Genome-wide map of putative transcription factor binding sites in A. thaliana
ATTED-II http://www.atted.bio.titech.ac.jp A database providing co-regulated gene relationships based on co-expressed genes deduced from microarray data and the predicted cis elements
DATF http://datf.cbi.pku.edu.cn The database of Arabidopsis Transcription Factors (DATF) collects all Arabidopsis transcription factors (total of 1922 loci and 2290 gene models) and classifies them into 64 families
GABI-Kat http://www.gabi-kat.de A Flanking Sequence Tag (FST)-based database for T-DNA insertion mutants generated by the GABI-Kat project
MAtDB http://mips.gsf.de/proj/thal/db MIPS (Plant Genome Bioinformatics at the Institute for Bioinformatics) A. thaliana database
NASCarrays http://affymetrix.arabidopsis.info Nottingham Arabidopsis Stock Centre microarray database
PLprot http://www.pb.ipw.biol.ethz.ch/proteomics A. thaliana chloroplast protein database
RARGE http://rarge.gsc.riken.jp RIKEN Arabidopsis Genome Encyclopaedia (RARGE) contains Arabidopsis cDNAs, mutants and microarray data
SeedGenes http://www.seedgenes.org Genes essential for Arabidopsis development
SUBA http://www.suba.bcs.uwa.edu.au The Arabidopsis Subcellular Database (SUBA) contains publicly available protein subcellular localization data from a variety of sources from the model plant Arabidopsis
TAIR http://www.arabidopsis.org The Arabidopsis Information Resource (TAIR) contains data for A. thaliana genome
Table 14.9. Rice databases.

Database name Uniform Resource Locator (URL) Database description

Oryza Tag Line http://urgi.versailles.inra.fr/OryzaTagLine A database to organize data resulting from the phenotypic characterization of a library of T-DNA insertion lines of rice (Oryza sativa L. cv. Nipponbare)
BGI-RISe http://rise.genomics.org.cn Beijing Genomics Institute Rice Information System (BGI-RISe), containing comprehensive data from O. sativa L. ssp. indica, genome information from O. sativa L. ssp. japonica and EST sequences available from other cereal crops. Sequence contigs of indica (93-11) have been further assembled into Mbp-sized scaffolds and anchored on to the rice chromosomes referenced to physical/genetic markers, cDNAs and BAC-end sequences. The rice genomes have been annotated for gene content, repetitive elements, gene duplications (tandem and segmental) and SNPs between rice subspecies
WhoGA http://rgp.dna.affrc.go.jp/whoga WhoGA is a rice genome annotation viewer using the GBrowse web-server application. In addition to predicted genes, WhoGA also includes gene models for pseudogenes with or without EST/full-length cDNA support, regions wherein genes could not be modelled although showing significant homology to known genes, and ORFs predicted by a single gene prediction program
IRIS http://www.iris.irri.org The International Rice Information System (IRIS) is the rice implementation of the International Crop Information System (ICIS, www.cgiar.org/icis), a database system for the management and integration of global information on genetic resources and crop improvement for any crop
MOsDB http://mips.gsf.de/proj/plant/jsf/rice/index.jsp A resource for publicly available sequences of the rice (O. sativa L.) genome to provide all available data about rice genes and genomics, including mutant information and expression profiles
OryGenesDB http://orygenesdb.cirad.fr A database for rice genes, T-DNA and transposable elements flanking sequence tags
Oryzabase http://www.shigen.nig.ac.jp/rice/oryzabase A comprehensive rice science database with the original aim to gather as much knowledge as possible ranging from classical rice genetics to recent genomics and from fundamental information to hot topics
RAP-DB http://rapdb.lab.nig.ac.jp Rice Annotation Project Database (RAP-DB) provides access to the annotation data. By connecting the annotations to other rice genomics data, such as full-length cDNAs and Tos17 mutant lines, the RAP-DB serves as a hub for rice genomics
RetrOryza http://www.retroryza.org RetrOryza is a database that aims at providing the research community with the most complete resource on long terminal repeat-retrotransposon for rice
RAD http://golgi.gs.dna.affrc.go.jp/SY-1102/rad/index.html The Rice Annotation Database (RAD) is a contig-oriented database for high-quality manual annotation of the Rice Genome Project, which can present non-redundant contig analyses by merging the accumulated PAC/BAC clones
RMD http://rmd.ncpgr.cn The Rice Mutant Database (RMD) contains the information of approximately 129,000 rice T-DNA insertion (enhancer trap) lines generated by an enhancer trap system
Rice Pipeline http://cdna01.dna.affrc.go.jp/PIPE A unification tool which dynamically collects and compiles data from scientific databases in National Institute of Agrobiological Sciences (NIAS) to provide a unique scientific resource of rice that pools publicly available data
Rice Proteome Database http://gene64.dna.affrc.go.jp/RPD/main_en.html Rice proteome database
RiceGAAS http://RiceGAAS.dna.affrc.go.jp Rice Genome Automated Annotation System (RiceGAAS)
RMD http://www.ricefgchina.org/mutant Rice mutant database

for database diversification within a single species is also apparent
from the list of Arabidopsis databases (Table 14.8).

For other plant databases (Table 14.10), two will be briefly described
here. MaizeGDB, the Maize Genetics and Genomics Database, provides a
central repository for public maize information and presents it in a way
that creates intuitive biological connections for the researcher with
minimal effort. It also provides a series of computational tools that
directly address the questions of the biologist in an easy-to-use form.
Its data centre contains the following information: data centres;
bacterial artificial chromosomes (BACs); ESTs; gene products;
locus/loci; maps; metabolic pathways; microarrays; overgos;
people/organizations; phenotypes; probes; QTL; references; sequences;
SSRs; stocks; variations. At CIMMYT, two crop-specific databases for
wheat and maize (http://iwis.cimmyt.org/ICIS5/) have been developed.

Dendrome is a collection of forest tree genome databases and information
resources for the international forest genetics community. Dendrome is
part of a larger collaborative effort to construct genome databases for
major crop and forest species. The primary genome database of Dendrome
is called TreeGenes. TreeGenes provides curated information about
genetic maps, DNA sequences, germplasm, markers, QTL and ESTs. The goal
of this effort is to provide an improved interface for comparison
between maps and to integrate expression and EST data.

14.7 Future Prospects for Breeding Informatics

Plant breeding in the future will be largely driven by molecular biology
and informatics. Breeding efficiency will depend on how much information
breeders can access and how wisely and effectively they can use it in
their breeding programmes.

Breeding-related databases and information systems have to be improved
so they are more user-friendly to breeders. The lack of mutual
understanding between breeders and scientists in other disciplines will
continue to be a major limiting factor, so information management
systems and tools should be enhanced so that they can be accessed and
used by breeders more easily.

The great proliferation of relevant databases and informatics tools
makes them less accessible to most breeders. The use of these resources
is often hampered by the fact that they are designed for specific
application areas and thus lack universality. As users, breeders have to
visit many different databases and use different tool packages for
specific purposes, depending on which crop species the breeder works
with, the types of information the breeder wants to retrieve and the
different functions the breeder wants to perform. As a result, knowing
how to access and use these databases demands a significant investment
of time and effort.

Data stored in central databases such as KEGG, BRENDA or SABIO-RK is
often limited to read-only access. If researchers want to store their
own data, they must either develop their own information system for
managing that data, which can be time-consuming and costly, or they must
store their data in existing systems, which is often restricted. Hence,
an out-of-the-box information system for managing breeding-related data
is needed. As an example of such effort, Weise et al. (2006) designed
META-ALL, an information system that allows the management of metabolic
pathways, including reaction kinetics, detailed locations, environmental
factors and taxonomic information. Data can be stored together with
quality tags and in different parallel versions.

As many information systems and databases are developed through
specially funded projects, which generally only run for a specific
period of time, they become outdated and ill-supported. They may also be
abandoned completely. Maintaining the databases and tools that have been
developed requires continuous funding and technical support, which is
almost impossible if the number of databases and informatics tools keeps
growing at the present rate. One
596
Table 14.10. Databases for other plants excluding Arabidopsis and rice.

Database name | Uniform Resource Locator (URL) | Database description
Brassica BASC | http://bioinformatics.pbcbasc.latrobe.edu.au | The BASC system provides tools for the integrated mining and browsing of genetic, genomic and phenotypic data, hosting information on Brassica species supporting the Multinational Brassica Genome Sequencing Project
Diatom EST Database | http://www.biologie.ens.fr/diatomics/EST | ESTs from two diatom algae, Thalassiosira pseudonana and Phaeodactylum tricornutum
ForestTreeDB | http://foresttree.org/ftdb | A resource that centralizes large-scale EST sequencing results from several tree species
Legume Information System | http://www.comparative-legumes.org | The Legume Information System (LIS), formerly the Medicago Genome Initiative (MGI), is an EST sequence database and analysis system that supports EST sequencing at the Noble Foundation Center for Medicago Genome Research
MaizeGDB | http://www.maizegdb.org | The Maize Genetics and Genomics Database (MaizeGDB) is a central repository for maize sequence, stock, phenotype, genotypic and karyotypic variation, and chromosomal mapping data. In addition, MaizeGDB provides contact information for over 2400 maize cooperative researchers, facilitating interactions among members of the rapidly expanding maize community
MtDB | http://www.medicago.org/MtDB | Medicago truncatula genome database
NRESTdb | http://genome.ukm.my/nrestdb | Natural Rubber EST Database (NRESTdb) serving as a molecular resource for functional genomics of the rubber tree
Panzea | http://www.panzea.org | The Panzea Database contains the genotype, phenotype, and polymorphism data produced by the Molecular and Functional Diversity in the Maize Genome project
PoMaMo | https://gabi.rzpd.de/PoMaMo.html | PoMaMo (Potato Maps and More), established within the German Plant Genome Project GABI, harbours information on molecular maps of all 12 potato chromosomes with about 1000 mapped elements, sequence data, putative gene functions, results from BLAST analysis, SNP and Indel information from different diploid and tetraploid potato genotypes, publication references, and links to other public databases like NCBI or SGN (see below) for example
SGMD | http://psi081.ba.ars.usda.gov/SGMD/default.htm | Soybean genomics and microarray database
SoyGD | http://soybeangenome.siu.edu | The Soybean Genome Database (SoyGD) genome browser integrates the publicly available physical map, BAC sequence database and genetic map-associated genomic data
TED | http://ted.bti.cornell.edu | Tomato expression database
TIGR Maize database | http://maize.tigr.org | A repository of publicly available maize genomic sequences
TomatEST DB | http://biosrv.cab.unina.it/tomatestdb/index2.php | A secondary database integrating EST/cDNA sequence information from different libraries of multiple tomato species collected from dbEST
Soybean Genome | http://www.soybeangenome.org | Dedicated to the sharing and dissemination of public information on all aspects of soybean genomics and the application of genome information to soybean
BarleyBase | http://www.plexdb.org/plex.php?database=Barley | BarleyBase is a MIAME-compliant and Plant Ontology enhanced expression database for plant microarray data
Dendrome | http://dendrome.ucdavis.edu | Dendrome is a collection of forest tree genome databases and other forest genetic information resources for the international forest genetics community
TropGENE | http://tropgenedb.cirad.fr | A database that manages genetic and genomic information about tropical crops studied by the Agricultural Research Centre for International Development (known by its French acronym, CIRAD), including banana, cocoa, coconut, coffee, cotton, oil palm, rice, rubber tree and sugarcane
Cotton | http://www.cottondb.org | A database that contains genomic, genetic and taxonomic information for cotton (Gossypium spp.). It serves both as an archival database and as a dynamic database which incorporates new data and user resources
CyanoBase | http://bacteria.kazusa.or.jp/cyano | CyanoBase provides an easy way of accessing the sequences and all-inclusive annotation data on the structures of the cyanobacterial genomes
BeanGenes | http://beangenes.cws.ndsu.nodak.edu | A plant genome database currently containing information relevant to Phaseolus and Vigna species
SGN | http://sgn.cornell.edu | The SOL Genomics Network (SGN) is a Clade Oriented Database (COD) containing genomic, genetic and taxonomic information for species in the Euasterid clade, including the families Solanaceae (e.g. tomato, potato, eggplant, pepper, petunia) and Rubiaceae (coffee)
RAPESEED | http://rapeseed.plantsignal.cn | Shanghai RAPESEED database contains information collected on ESTs, full-length cDNA, unique serial analysis of gene expression (SAGE) tags, and EMS mutants for Brassica napus
ICIS | http://www.icis.cgiar.org | A database system that provides integrated management of global information on crop improvement and management both for individual crops and for farming systems


approach is to develop databases and tools that need minimum maintenance or that can be
upgraded or updated automatically. Another way is to develop a universal database and
informatics tool package for information-driven plant breeding, which needs a worldwide
collaboration through a global scientific programme in a way similar to the human genome
sequencing project.
Developing a universal database or a database of all databases would require a universal
language that can be shared across all plant species. Gene Ontology and Plant Ontology
projects represent a good beginning of such effort. Another universal language is also
needed that can be used for communications among breeders, database curators,
bioinformaticians, molecular biologists and tool developers. Breeders should be a major
player rather than an observer in the development of such a universal database or language.
15
Decision Support Tools

Molecular breeding involves identification of beneficial genetic variation and selection
of desirable recombinants, and it manages and utilizes genetic variation more effectively
and efficiently through molecular technology including two major procedures,
marker-assisted selection (MAS) and genetic transformation. In general, MAS relies on the
reliable identification and application of simply inherited markers that are inside of, or
in close proximity to, genetic factors affecting simple, oligogenic and multi-genic traits
of importance to crop improvement. The journey from phenotyping and genotyping individuals
from genetic populations to identifying marker–trait associations and finally applying
markers in molecular breeding programmes depends on a sequential use of a number of
decision support tools that facilitate communication and collaboration between molecular
biologists, geneticists, bioinformaticians, trait specialists and breeders towards
effective interdisciplinary decision making. Ultimately, molecular breeding programmes
will combine MAS with a diverse range of technology-assisted interventions including whole
genome scans, advanced biometrical analyses and quantitative genetics modelling that will
require increasingly complex facilitating software.
Effective molecular breeding requires a careful balance of many diverse elements in order
to provide the best compromise between time, cost and genetic gain:

• Identify new sources of beneficial genetic variation and develop robust marker–trait
  associations.
• Manage and manipulate large amounts of genotype, pedigree and phenotype data.
• Select desirable recombinants through an optimum combination (in time and space) of
  phenotypic and genotypic information.
• Develop breeding systems that minimize population sizes, number of generations and
  overall costs while maximizing genetic gain for traditional and novel target traits.

Figure 15.1 summarizes the forward and reverse genetics approaches associated with
molecular breeding product delivery. Decision support tools are required to manage and
optimize many components of molecular plant breeding. Many decision support tools are in
the form of software. From the UK's gateway to high-quality Internet resources in biology
and biomedical research (http://bioresearch.ac.uk/browse/mesh/D012984.html), there is a
list of all software available, some of which are related to molecular breeding. The
Laboratory of Statistical Genetics at Rockefeller University also provides web resources
of genetic

Yunbi Xu 2010. Molecular Plant Breeding (Yunbi Xu) 599



[Figure 15.1: flowchart, image not reproduced. Branch labels: FORWARD GENETICS; REVERSE
GENETICS. Box labels: Functional analysis; Differential expressed genes between genetic
stocks (chips and arrays); Annotation through comparison with model genome; Candidate
genes; Detailed expressional analysis (RT-PCR); Gene functional validation (RNAi);
Association mapping; Linkage mapping; Comparative validation; Functional validation;
Validated through multiple populations and environments (epistasis, G × E); Physical map
and gene isolation; Multiple backgrounds (epistasis); Multiple environments (G × E);
Candidate gene-based genomics technologies; Gene-based genomics technologies;
Marker-assisted breeding; GMO products; Allele and gene mining. Output label: MOLECULAR
BREEDING OUTPUTS.]

Fig. 15.1. Flowchart for molecular breeding approaches and outputs. Various molecular
breeding approaches discussed in this book are summarized, including forward and reverse
genetics approaches and their associated breeding outputs. Decision support tools may be
needed in each step. G × E, genotype-by-environment interaction.

linkage analysis with various software listed in alphabetic order
(http://linkage.rockefeller.edu/soft/list.html).
This chapter provides an overview of key decision support tools that are needed to support
molecular breeding programmes, including germplasm evaluation, breeding population
management, genotype-by-environment interaction (GEI), genetic map construction,
marker–trait linkage and association analysis, MAS and breeding system design and
simulation. Plant variety protection and breeding information management are discussed in
Chapters 13 and 14, respectively.

15.1 Germplasm and Breeding Population Management and Evaluation

Germplasm collections and breeding populations are basic materials required in crop
improvement. Decision support tools are needed to manage and evaluate crop genetic
resources and breeding materials including genetic diversity and variation analysis,
population structure evaluation and, for hybrid crops, use of the genetic diversity to
define heterotic groups and predict hybrid performance.

15.1.1 Germplasm management and evaluation

Genetic resources provide the foundation of any plant breeding programme. Efficient
germplasm utilization requires well-founded sampling strategies. Genetic diversity
analysis and its relationship with functional variation of the target trait is the
fundamental basis of germplasm evaluation (Chapter 5). Novel alleles and new genes from
both cultivated and wild relatives provide the engine

of germplasm enhancement. However, marker technologies promise to dramatically increase
the pace, precision and efficiency with which that genetic variation can drive new product
development.
As discussed in Chapter 5, genebank curators use a variety of methodologies for guiding
their germplasm collection and management strategies. Geographical information systems
(GIS) data associated with the site from which germplasm was collected is an important
component of the standard description (passport data) of an accession. Plant breeders can
use GIS data to access valuable information about the ecological environment where new
genetic resources were growing. Thus, genetic resources collected from drought prone areas
or hot spots for important diseases can be rich sources of new beneficial genetic
variation for crop improvement programmes. GIS data alone can be misleading in this
respect where the site of collection is not related to the location. However, DNA marker
analysis provides a valuable complement to GIS data, as it can assist in establishing
genetic relationships and estimating genetic distances between genetic resources. Thus,
the combined use of GIS and DNA marker data can help to prioritize emphasis of germplasm
screening efforts.
As the size of germplasm collections has increased, genebank curators have attempted to
stratify genetic resources in order to provide breeders and researchers with a small
number of accessions that represent a large proportion of the overall genetic variation.
This has led to the establishment of so-called core collections (Chapter 5). Breeders and
researchers will then typically first evaluate the core collection to identify accessions
with high levels of their desired target trait and then move on to screen closely related
germplasm from the main collection. Unfortunately, a core collection can only be as good
as the data on which it is based, which often constitutes a relatively small number of
taxonomic descriptors and agronomic traits. More robust core collections could be
generated through combined analysis of phenotype, genotype and GIS data. However, a tool
is needed that can be used to combine these different criteria. As more and more genetic
information from functionally characterized genes and genomic regions becomes available,
the construction of core collections can also utilize this functional diversity rather
than the neutral information that is revealed by molecular markers used in most current
studies. Development of core collections based on all possible types of data will require
powerful new computational tools. Ultimately, germplasm users need to be provided with
on-line dynamic selector tools that allow them to tailor the selection of a subset of
germplasm based upon analysis of all available data using their own unique criteria.
Marker-assisted germplasm evaluation (MAGE) will play an important role in the procedures
related to the acquisition/distribution, maintenance and use of germplasm (Bretting and
Widrlechner, 1995; Xu, Y., 2003; Chapter 5). Efficient marker-assisted germplasm
management relies on the availability of several key resources (Xu, Y., 2003), including:
(i) suitable genetic markers characterized for the number of alleles, polymorphic
information content (PIC value), allele sizes and ranges, signal strength, working
conditions and the necessary information for multiplexing; (ii) high-density molecular
maps allowing the selection of markers evenly spread over the whole genome or densely
spread over the specific region of interest; (iii) established marker–trait associations
for traits of agronomic importance; (iv) high-throughput genotyping systems; and (v) an
efficient data management and analysis system. In addition, the germplasm collection
should provide a large number of relevant accessions for the target research purpose.
Although computational programs are available for all relevant analyses, as discussed in
Chapter 5, including computer simulation and re-sampling (Xu et al., 2004), a fully
integrated, user-friendly graphical program is needed to bring all these functions
together to facilitate decisions through all aspects of germplasm evaluation.
The recent focus on molecular characterization of germplasm collections and the
subsequent use of those collections in crop

improvement has led to a number of bioinformatics projects developing new tools to improve
the power and scope of such analyses. Ambiguous germplasm identification, difficulty in
tracing pedigree information and lack of integration of databases across genetic
resources, characterization, evaluation and utilization have been identified as the major
constraints to developing knowledge-led germplasm enhancement programmes.
Visualization tools enable us to simultaneously view large quantities of these data and to
identify underlying patterns in our data sets. We also need analytical tools to help
search for association between the target trait and individual markers or marker
haplotypes and to look for patterns of genetic diversity within our germplasm collections.
GENE-MINE (http://www.gene-mine.org; Davenport et al., 2004) was developed to bring
together experts in database development, data querying and visualization, quantitative
methods and computational methods, to develop novel tools for the analysis of germplasm
collections characterized by molecular markers. Within the GENE-MINE project, a generic
information system was developed for studying the relationships between large-scale
collections databases from genebank material such as molecular marker data, trait
phenotype, passport and environmental data. The system uses a generic model for
associating properties, such as traits, genetic data and molecular data, with accession
data. Users are able to make queries using terms from germplasm query language (GQL), an
extension of structured query language. GQL allows the definition of specialist query
terms that are not held within the database. For germplasm analysis, these may include
pedigree terms such as grandparent or ancestor, geographical terms such as neighbouring
country and marker terms such as haplotype.
Graphical tools for germplasm analysis that are considered to be essential to the
GENE-MINE and other similar tools include: (i) a geographical tool that could show the
origin of accessions and the distribution of genetic diversity; (ii) a haplotype tool
showing genotype information; (iii) a graph drawing tool that could show both pedigrees
and phylogenetic trees or networks (graphs containing closed loops, which can be used to
represent genetic exchange between organisms); (iv) a map tool showing the distribution of
genetic markers on the relevant linkage maps; and (v) a plotting tool that could show
scatter plots of, for example, diversity distances between pairs of accessions and
principal components (Davenport et al., 2004).
A high-throughput platform for identifying single feature polymorphisms (SFPs) in complex
genomes has been developed by Borevitz et al. (2003). This is based on hybridizing
Arabidopsis genomic DNA against an RNA expression GeneChip. Their informatics analysis
involved development of analytical tools to identify 4000 SFPs by comparing the reference
ecotype Columbia against Landsberg erecta. A linear clustering algorithm enabled
identification of SFPs representing potential deletions in 111 transposons, disease
resistance genes and genes involved in secondary metabolism, at 5% error rate. In crop
plants, a genome-wide rice DNA polymorphism database has been constructed based on the
genomic sequences from two subspecies, indica (93-11) and japonica (Nipponbare). This
database contains 1,703,176 single nucleotide polymorphisms (SNPs) and 479,406
insertion/deletions (Indels), approximately one SNP every 268 bp and one Indel every
953 bp in the rice genome (Shen et al., 2004).
Several commercialized or freely available software packages, such as STATISTICA, JMP,
SAS, NTSYS, GENEFLOW, STRUCTURE and POWERMARKER, can be used for germplasm evaluation,
including principal component or coordinate analysis to identify distinct groups or
populations, and cluster or structure analysis to find population structure. STRUCTURE,
developed by Pritchard et al. (2000a), uses multi-locus genotype data to investigate
population structure, which can be used to infer the presence of distinct populations,
assign individuals to populations, study hybrid zones, identify migrants and admixed
individuals and estimate population allele frequencies in situations where many
individuals are migrants

or admixed. It can be applied to most of the commonly used genetic markers.
POWERMARKER (http://www.powermarker.net), as a software package to perform statistical
analysis of marker data collected from a set of germplasm accessions, delivers a
data-driven, integrated analysis environment (IAE) for marker data. The IAE integrates the
data management, analysis and visualization in a user-friendly graphic interface. It
accelerates the analysis process and enables users to maintain data integrity throughout
the process. POWERMARKER handles a variety of data from most of the commonly used genetic
markers including simple sequence repeat (SSR), SNP and restriction fragment length
polymorphism (RFLP). The results can be exported as frequency, distance and tree.
Various data analyses can be performed by POWERMARKER. Its summary statistics include
basic statistics, allele and genotype frequencies, haplotype frequencies, Hardy–Weinberg
disequilibrium, two-locus linkage disequilibrium and multi-locus linkage disequilibrium.
Structure analysis includes population differentiation test, classic F-statistics,
population-specific F-statistics and co-ancestry matrix. In phylogenetic analysis, tree
construction can be made after computing frequencies and frequency-based distances with
bootstrapping implemented. Association analysis can be done through a single-locus case
control test, single-locus F-test and haplotype trend regression.
There are several software packages for treatment and analysis of data collected for
germplasm accessions. GRAPHICAL GENOTYPES (GGT) software, developed by van Berloo (1999),
allows the user to transform molecular marker data into simple colourful chromosome
drawings. Besides graphical representation, GGT can also be used for selection or
filtering of marker data. POPDIST calculates a number of different genetic identities,
phylogeny reconstructing measures and distance reconstructing measures
(http://genetics.agrsci.dk/bg/popgen/). ADEGENET is a package dedicated to the handling of
molecular marker data for multivariate analysis
(http://pbil.univ-lyon1.fr/software/adegenet/). This package is related to ADE4, an R
package for multivariate analysis, graphics, phylogeny and spatial analysis.
The exploration of DNA sequence variation for making inferences on evolutionary processes
in populations has become increasingly important recently and requires the coordinated
implementation of a Suite of Nucleotide Analysis Programs (SNAP;
http://www.cals.ncsu.edu/plantpath/people/faculty/carbone/snap.html), each bound by
specific assumptions and limitations. A workbench tool was developed to make existing
population genetic software more accessible and to facilitate the integration of new tools
for analysing patterns of DNA sequence variation, within a phylogenetic context.
Collectively, SNAP tools can serve as a bridge between theoretical and applied population
genetic analysis (Aylor et al., 2006).

15.1.2 Breeding population management

Decision support tools for the management of breeding populations are needed to assist in
the choice of parental lines, types of crosses and the nature of breeding system.
Computational tools may also assist in the establishment and maintenance of heterotic
groups, selection of lines for creation of a synthetic cultivar, prediction of progeny and
hybrid performance, and monitoring of genomic profiles during population improvement.

Establishing heterotic patterns

Generating highly heterotic hybrids depends heavily on having sufficient genetic diversity
in the germplasm pool of potential parents. However, it is still not possible in many
crops to predict the level of hybrid vigour from analysis of parental lines. For example,
commercial maize hybrids are typically generated from crosses between inbreds from
complementary heterotic groups. Therefore, construction or development of heterotic groups
has been one of the key strategies in hybrid breeding for many crops. However, moving to a
more definitive

system to predict which genotypes in each heterotic group should be crossed to maximize
heterosis is still not possible in many crops.
Genotyping parental lines on a genome-wide scale, especially when gene-based markers are
available, may provide an opportunity for establishing parent–hybrid performance
relationships at the molecular level. Genome-wide heterozygosity and specific combinations
of alleles (linkats) may be useful determinants in some crops for maximizing heterosis and
hybrid vigour. Melchinger and Gumber (1998) suggested a multi-stage procedure to identify
heterotic groups (Chapter 9). Determining heterotic patterns is a continual process, each
cycle of which consists of three steps: (i) cluster analysis to identify broad heterotic
groups; (ii) combining ability and heterosis analysis to define the heterotic pattern; and
(iii) update and maintain heterotic groups. Tools for heterotic group identification are
usually the same as those that have been used in germplasm classification and grouping.

Predicting hybrid performance

A successful hybrid development process depends on a full understanding of the parental
genotypes and the consequences of their genetic combinations and interactions in the
hybrid. Hybrid breeding includes two major procedures: breeding parental lines and
selection of the best combinations of those parental lines for hybrid production. These
procedures involve a large amount of work for field evaluation, testcrossing and progeny
tests. Breeders continually have to decide which experimental single crosses to test,
which advanced hybrids to recommend for further testing or commercialization and which
inbred parents to cross to form new base populations for inbred/population development
(Bernardo, 1999). As a result, large-scale testcrossing is required for all hybrid-related
inbred development. Testcrossing might be carried out at many stages in the breeding
process, often beginning from the very first generations. This 'trying all to find the
best' process consumes a large proportion of the breeding effort but there is currently no
alternative as hybrid performance is highly unpredictable in most crops. Therefore,
predicting hybrid performance has always been a primary objective in all hybrid-breeding
programmes.
Methods for predicting the performance of single crosses would greatly enhance the
efficiency of hybrid breeding programmes. Development of a reliable method for predicting
hybrid performance or heterosis without generating and testing hundreds or thousands of
single cross combinations has been the goal of numerous studies using marker data and
combinations of marker and phenotypic data, particularly in maize and rice. Considering
that hybrid performance must be governed by many genes, genotyping parental lines on a
genome-wide scale, especially when gene-based markers are used, provides an opportunity of
establishing parent–hybrid performance relationships at the molecular level. Genome-wide
heterozygosity and allele combination analysis may provide some clue for breeding more
heterotic and vigorous hybrids. Therefore, using parental genotyping may reduce the
required level of testcross-based phenotyping analysis.
The best linear unbiased prediction (BLUP) procedure has been used for decades for
evaluating the genetic merit of animals, especially dairy cattle. Intrapopulation,
additive genetic models have traditionally been used for BLUP in animal breeding
(Henderson, 1975). Bernardo (1994, 1996) used BLUP in maize breeding with interpopulation
genetic models that involve both general combining ability and specific combining ability
and found that BLUP is useful for routine prediction of single-cross performance. The
predicted performance of single crosses may subsequently be used to predict the
performance of F2 tester combinations, three-way crosses, or double crosses. Along with
the pedigree relationship, the BLUP method can use trait data, or both trait and marker
data, for prediction.
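
As a rough illustration of how BLUP can predict an untested single cross from tested ones,
the following minimal sketch solves Henderson's mixed model equations for a deliberately
simplified model, y = mu + g_female + g_male + e, with a fixed mean and random general
combining ability (GCA) effects. It is not the interpopulation model of Bernardo (1994,
1996): specific combining ability is omitted, the parents are assumed to be unrelated
(identity relationship matrices in place of pedigree- or marker-based ones), and the
variance ratios lam_f and lam_m are assumed to be known. The function names and the small
data set are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def blup_gca(yields, females, males, n_f, n_m, lam_f, lam_m):
    """BLUP of parental GCA effects for y = mu + g_female + g_male + e.

    lam_f and lam_m are the assumed variance ratios sigma_e^2 / sigma_g^2.
    Parents are treated as unrelated (identity relationship matrices).
    Returns (mu, female effects, male effects).
    """
    size = 1 + n_f + n_m
    lhs = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    # Accumulate W'W and W'y, where each record has a 1 in the mean column,
    # in its female column and in its male column.
    for y, f, m in zip(yields, females, males):
        cols = [0, 1 + f, 1 + n_f + m]
        for i in cols:
            rhs[i] += y
            for j in cols:
                lhs[i][j] += 1.0
    # Shrink the random effects: add the variance ratio to their diagonal.
    for i in range(n_f):
        lhs[1 + i][1 + i] += lam_f
    for j in range(n_m):
        lhs[1 + n_f + j][1 + n_f + j] += lam_m
    sol = solve_linear(lhs, rhs)
    return sol[0], sol[1:1 + n_f], sol[1 + n_f:]


# Hypothetical yields of six tested single crosses (female index, male index).
females = [0, 0, 1, 1, 2, 2]
males = [0, 1, 1, 2, 0, 2]
yields = [9.0, 8.5, 7.0, 6.5, 6.0, 5.5]

mu, g_f, g_m = blup_gca(yields, females, males, 3, 3, lam_f=1.0, lam_m=1.0)

# Predicted performance of the untested cross female 0 x male 2.
predicted = mu + g_f[0] + g_m[2]
print(round(predicted, 2))
```

Larger variance ratios shrink the estimated parental effects towards zero, so that a parent
observed in only a few crosses contributes less to the prediction than its raw cross means
would suggest.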

In some specific cases within a breeding programme, tools are needed for selective
genotyping and pooled DNA analysis as described in Chapter 7 and by Xu et al. (2008).
GENEPOOL (http://genepool.tgen.org/) is such a software package that provides analytical
tools for the detection of shifts in relative allele frequency between pooled genomic DNA
from cases and controls using SNP-based genotyping microarrays. GENEPOOL supports
genotyping platforms from Affymetrix and Illumina (Pearson et al., 2007). Another package
is PDA, POOLED DNA ANALYSER, a tool for analysis of pooled DNA data
(http://www.ibms.sinica.edu.tw/csjfann/first%20flow/programlist.htm; Yang, H.-C. et al.,
2006b).
In addition to the tools for germplasm and breeding population management and evaluation
described above, decision support tools are needed for intellectual property rights and
plant variety protection. Chapter 13 provides a section on how molecular markers can be
used for this purpose.

15.2 Genetic Mapping and Marker–Trait Association Analysis

Construction of genetic maps using molecular markers (Chapter 2) and use of these maps in
marker–trait association analysis (Chapters 6 and 7) are two prerequisite steps required
for MAS. There is a large number of methodologies and tools currently available for
various types of populations and markers. In this section, only some of these tools will
be discussed and at the same time, we expect more tools will be developed for new mapping
strategies and markers and new types of populations.

15.2.1 Genetic map construction

Genetic maps can be constructed using segregating populations of different types for
species with different levels of ploidy as described in Chapter 2. The first and most
frequently used software for map construction is MAPMAKER/EXP, which was developed by the
Whitehead Institute (Lander et al., 1987). Almost all molecular maps based on the first
generation of molecular markers, RFLPs, were constructed using this software. As an
alternative, MAP MANAGER CLASSIC is a graphic, interactive program to map Mendelian loci
using intercrosses with co-dominant markers, backcrosses or recombinant inbred lines
(RILs) in experimental plants or animals (Manly, 1993;
http://www.mapmanager.org/mapmgr.html).
Some special statistical modifications may be needed to construct a map using markers with
severe distortion of segregation. MAPDISTO (web/ftp: http://mapdisto.free.fr/) is such a
program for mapping genetic markers in case of segregation distortion using experimental
segregating populations such as backcross, doubled haploid (DH) and RIL populations. It
can: (i) compute and draw genetic maps through a graphical interface; and (ii) facilitate
the analysis of marker data showing segregation distortion due to differential viability
of gametes or zygotes.
Maps or data from multiple populations derived from different crosses can be combined into
single or consensus maps through joint mapping. JOINMAP is a software package for
construction of genetic linkage maps for several types of mapping populations: BC1, F2,
RIL, F1- and F2-derived DH and out-breeder full-sib family
(http://www.kyazma.nl/index.php/mc.JoinMap/). It can combine (join) data derived from
several sources into an integrated map, with several other functions including linkage
group determination, automatic phase determination for out-breeder full-sib family,
several diagnostics and map charts (van Ooijen and Voorrips, 2001).
A software package with comparative function is CMAP, which was developed as a web-based
tool to allow users to view comparisons of genetic and physical maps. The package also
includes tools for curating map data (http://www.gmod.org/cmap; Ware et al., 2002).

15.2.2 Linkage-based QTL mapping

Demonstrated linkages/associations between target traits/genes and molecular markers

are based on genetic linkage and linkage- A currently widely used QTL mapping
disequilibrium (LD) mapping experiments software is QTL CARTOGRAPHER (http://stat
(Chapters 6 and 7). Decision support tools gen.ncsu.edu/qtlcart/cartographer.html),
required for genotype–phenotype asso- which implements several statistical meth-
ciation include: (i) statistical methods and ods using multiple markers simultaneously
tools to establish, validate and compare including composite interval mapping and
genotypephenotype associations through multiple composite interval mapping.
linkage mapping, LD or association map- Interaction between identified QTL can also
ping and in silico mapping, using single be estimated. PLABQTL uses composite inter-
populations, multiple populations or all val mapping with many functions similar
genetic resources with information avail- to QTL CARTOGRAPHER. QTL can be localized
able from multiple trials across years, sea- and characterized in populations derived
sons and locations; (ii) statistical methods from a biparental cross by selfing or produc-
and tools for identification of genetic back- tion of DHs. Simple and composite interval
ground effects, quantitative trait loci (QTL) mapping are performed using a fast multi-
alleles at multiple loci and multiple alleles ple regression procedure. As an additional
at a locus; (iii) tools facilitating the process function to many other software packages, it
from linked markers to functional markers can be used for QTL environment interac-
and candidate genes; and (iv) tools facili- tion analysis (Utz and Melchinger, 1996).
tating management of genetic populations, QGENE (http://www.qgene.org/) is
maps and related marker and phenotypic intended for doing comparative analyses
data. of QTL mapping data sets in computation-
There are many commercial or freely ally efficient ways that are of maximum use
available software packages for establishing to analysts. It is also written with a plug-in
association between marker genotypes and architecture for ready extensibility. QGENE
trait phenotypes. The most commonly used was begun in about 1991 as a map and
are QTL CARTOGRAPHER, MAPQTL, PLABQTL and population simulation program, to which
QGENE. All of these only handle bi-allelic pop- QTL analyses were added on. Recently
ulations, while MCQTL (Jourjon et al., 2005) QGENE has been rewritten in the Java lan-
also performs QTL mapping in multi-allelic guage, allowing it to run on any computer
situations, including bi-parental populations operating system. It offers most conven-
made from segregating parents, or sets of bi- tional QTL-mapping methods and allows
parental, bi-allelic populations. The most fre- their side-by-side comparison. Its interface
quently used software during the 1980s and can be rendered in any human language
1990s was MAPMAKER/QTL, which is a sister desired; the conversion requires only that
software package to MAPMAKER/EXP, developed the interested user writes a translation file.
by Lander et al. (1987) (http://www-genome. QGENE can be used for analysis of trait, QTL
wi.mit.edu/genome_software). This software and permutation and simulation of popula-
is based on maximum likelihood estimation tions and traits as well.
of linkage between marker and phenotype Several software packages can be
using interval mapping, which deals with used for constructing linkage maps in out-
simple QTL and several standard popula- crossing plant species. ONEMAP provides
tions. Another early software package, MAPL such an environment using full-sib families
(MAPping and QTL analysis; http://lbm. derived from two outbreed (non-inbreed-
ab.a.u-tokyo.ac.jp/software.html; Ukai et al., ing) parent plants (http://www.ciagri.usp.
1995) allows a user to get results on segrega- br/aafgarci/OneMap/; Garcia et al., 2006).
tion ratio, linkage test, recombination value, Another is MAPQTL (software for the calcu-
group markers, order markers by metric lation of QTL positions on genetic maps,
multi-dimensional scaling, draw a map and http://www.mapqtl.nl), which can be used
graphical genotype and map QTL through for several types of mapping populations
interval mapping and analysis of variance including BC1, F2, RILs, (doubled) haploids
(ANOVA). and full-sib family of out-breeders. It can
be used for QTL mapping through interval mapping, composite interval mapping and non-parametric mapping, with functions for automatic cofactor selection and permutation test.

A few mapping software programs consider epistasis in QTL mapping. EPISTACY is an SAS program designed to test all possible two-locus combinations for epistatic (interaction) effects on a quantitative trait. The program is really an SAS program template that users must modify to suit their own data sets. In the simplest cases, users will need only to change the names of the files containing their data. However, the program uses least squares methods and does not employ interval mapping methods (Holland, 1998).

Bayesian QTL mapping has received a lot of attention in recent years, with several software packages developed. For example, BQTL, Bayesian Quantitative Trait Locus mapping, was developed for the mapping of genetic traits from line crosses and RILs (http://hacuna.ucsd.edu/bqtl; Borevitz et al., 2002). It performs: (i) maximum likelihood estimation of multi-gene models; (ii) Bayesian estimation of multi-gene models via Laplace Approximations; and (iii) interval mapping and composite interval mapping of genetic loci. BLADE, Bayesian LinkAge DisEquilibrium mapping, was developed for Bayesian analysis of haplotypes for LD mapping (http://www.people.fas.harvard.edu/junliu/TechRept/03folder/; Liu et al., 2001; Lu, X. et al., 2003). MULTIMAPPER is a Bayesian QTL mapping software for analysing backcross, DH and F2 data from designed crossing experiments of inbred lines (Martinez et al., 2005). MULTIMAPPER/OUTBRED extended this to the populations derived from out-bred lines (http://www.rni.helsinki.fi/mjs/).

Several mapping software packages were developed for QTL mapping required in some specific situations. MCQTL was developed for simultaneous QTL mapping in multiple crosses and populations (http://www.genoplante.com; Jourjon et al., 2005). It allows the analysis of the usual populations derived from inbred lines and can link the families by assuming that the QTL locations are the same in all of them. Moreover, a diallel modelling of the QTL effects is allowed when using multiple related families. MAPPOP was developed for selective mapping and bin mapping by choosing good samples from mapping populations and for locating new markers on pre-existing maps (Vision et al., 2000). In addition, QTLNETWORK was developed for mapping and visualizing the genetic architecture underlying complex traits for experimental populations derived from a cross between two inbred lines (http://ibi.zju.edu.cn/software/qtlnetwork; Yang et al., 2008).

As web-based tools become increasingly important, web-based QTL analytical tools are becoming available. For example, WEBQTL was developed as an interactive web site useful for exploring the genetic modulation of thousands of phenotypes gathered over a 30-year period by hundreds of investigators using reference panels of recombinant inbred strains of mice (http://www.webqtl.org/search.html). WEBQTL includes dense error-checked genetic maps, as well as extensive gene expression data sets (Affymetrix) acquired across more than 35 strains of mice. As a web-based user-friendly package to map QTL in out-bred populations, QTL EXPRESS (http://qtl.cap.ed.ac.uk; Seaton et al., 2002) was developed for line crosses, half-sib families, nuclear families and sib-pairs. It provides two options for QTL significance tests: permutation tests to determine empirical significance levels and bootstrapping to estimate empirical confidence intervals of QTL locations. Fixed effects/covariates can be fitted and models may include single or multiple QTL.

15.2.3 eQTL mapping

With the availability of whole genome sequences in many plant species, linkage analysis, positional cloning and microarray are gradually becoming powerful tools for revealing the links between phenotype and genotype or genes. To display the myriad of relationships between eTraits, markers and genes, we need a convenient bioinformatics
tool to visualize eQTL mapping results at a variety of scales ranging from a single locus to the entire genome. Additionally, researchers need quick and straightforward ways to integrate these results with the extra information from previous studies on the organism. To address these needs, eQTL Explorer was developed (Mueller et al., 2006) to store expression profiles, linkage data and information from external sources in a relational database, enabling simultaneous visualization and intuitive interpretation of the combined data via a Java graphical interface. Zou et al. (2007) developed eQTL Viewer, a web-based tool that plots eQTL mapping results. The resulting plot displays eQTL for thousands of eTraits in a single view, which makes patterns such as cis- and trans-regulations readily identifiable. They also empowered such a plot with the ability to present annotations, highlight features and organize eTraits in biological groups, such as biochemical pathways. All these characteristics make eQTL Viewer an intuitive and information-rich environment to discover and understand genome-wide transcriptional regulation patterns.

A web site developed by Bhave et al. (2007), PhenoGen, can be used to search for candidate genes that control a complex trait based on the co-occurrence of differentially expressed genes in microarray experiments and phenotypic QTL or co-occurrence of phenotypic QTL and expression QTL. PhenoGen needs to know how many candidate genes exist within the QTL region according to known literature reports and detailed information of those candidates and related reports indicating their candidacy. Xiong et al. (2008) developed a software tool, PGMAPPER, for automatically matching phenotype to genes from a defined genome region or a group of given genes by combining the mapping information from the Ensembl databases and gene function information from the OMIM (http://www.ncbi.nlm.nih.gov/sites//entrez?db=omim) and PubMed databases (http://www.ncbi.nlm.nih.gov/sites//entrez). PGMAPPER is currently available for candidate gene search of human, mouse, rat, zebrafish and 12 other species.

15.2.4 Linkage-disequilibrium based QTL mapping

Association or LD mapping has become increasingly popular (Chapter 6). It uses unstructured populations that consist of unrelated individuals, germplasm accessions, or randomly selected cultivars. Before LD mapping, genotyped units are subjected to statistical analysis to remove the most important factor, population structure, which can cause false positive associations due to circumstantial correlations rather than real linkage. For example, the STRUCTURE software (Pritchard et al., 2000a) can be used for this purpose. Some software packages have been developed for LD mapping with the population structure analysis functionality included. STRAT, as a companion program to STRUCTURE, uses a structured association method for LD mapping, enabling valid case-control studies even in the presence of population structure (http://pritch.bsd.uchicago.edu/software/STRAT.html; Pritchard et al., 2000b).

LD-based QTL mapping

TASSEL is a comprehensive software package for trait analysis by association, evolution and linkage, which performs a variety of genetic analyses including LD mapping, diversity estimation and calculating LD (http://sourceforge.net/projects/tassel/; Zhang, Z. et al., 2006). The LD analysis between genotypes and phenotypes can be performed by either a general linear model or a mixed linear model. The general linear model allows users to analyse complex field designs, environmental interactions and epistasis. The mixed model is specially designed to handle polygenic effects at multiple levels of relatedness including pedigree information. These analyses should permit LD analysis in a wide range of plant and animal species.

Other software packages include Multiallelic Interallelic Disequilibrium Analysis Software (MIDAS), which was designed for analysis and visualization of interallelic disequilibrium between multi-allelic markers (http://www.genes.org.uk/software/midas; Gaunt et al., 2006) and PEDGENIE (http://bioinformatics.med.utah.edu/PedGenie/index.html; Allen-Brady et al., 2006), which was developed as a general-purpose tool to analyse association and transmission disequilibrium (TDT) between genetic markers and traits in families of arbitrary size and structure. With PEDGENIE, any size pedigree may be incorporated, from independent individuals to large genealogies. Independent individuals and families may be analysed together.

GENERECON (http://www.daimi.au.dk/mailund/GeneRecon/) is another software package for LD mapping using coalescent theory. It is based on a Bayesian Markov-chain Monte Carlo method for fine-scale LD mapping using high-density marker maps in animals. GENERECON explicitly models the genealogy of a sample of the case chromosomes in the vicinity of a disease locus. Given case and control data in the form of genotype or haplotype information, it estimates a number of parameters, most importantly, the disease position (Mailund et al., 2006).

Genome-wide association mapping

Genome-wide association (GWA) studies are now being widely undertaken to find the link between genetic variations and common diseases in humans and agronomic traits in plants. Ideally, a well-powered GWA study will involve the measurement of hundreds of thousands of SNPs in thousands of individuals. The sheer volume of data generated by these experiments creates very high analytical demands. There are a number of important steps during the analysis of such data, many of which may present bottlenecks. The data need to be imported and reviewed to perform initial quality control before proceeding to LD testing. Evaluation of results may involve further statistical analysis, such as permutation testing, or further quality control of associated markers, for example, reviewing raw genotyping intensities. Finally, significant associations need to be prioritized using functional and biological interpretation methods, browsing available biological annotation, pathway information and patterns of LD (Pettersson et al., 2008). GOLDSURFER2 (GS2), a comprehensive tool for the analysis and visualization of GWA studies, was developed by Pettersson et al. (2008). GS2 is an interactive and user-friendly graphical application that can be used in all steps in GWA projects from initial data quality control and analysis to biological evaluation and validation of results. The program is implemented in Java and can be used on all platforms. With GS2, very large data sets (e.g. 500K markers and 5000 samples) can be quality assessed, rapidly analysed and integrated with genomic sequence information. Candidate SNPs can be selected and functionally evaluated.

Other tools that are developed for GWA studies include GENOMIZER (a platform-independent Java program for the analysis of GWA experiments; http://www.ikmb.uni-kiel.de/genomizer), PLINK (a whole-genome LD analysis toolset; http://pngu.mgh.harvard.edu/purcell/plink; Purcell et al., 2007), MAPBUILDER (for chromosome-wide LD mapping; http://bios.ugr.es/BMapBuilder; Abad-Grau et al., 2006) and power Calculator for Association with Two Stage design (CATS), which calculates the power and other useful quantities for two-stage GWA studies (http://www.sph.umich.edu/csg/abecasis/CaTS) (Skol et al., 2006).

The results of large GWA studies are being deposited in public databases with increasing frequency. But the currently available software to analyse and interpret GWA data sets can be difficult to use (Buckingham, 2008). User-friendly software is urgently needed to provide new ways of making GWA data sets easy to explore and share among researchers and to design analysis packages that deal with the increasing computational demands posed by these data sets.

Integrated haplotype and LD analysis

The analysis of large amounts of SNP data creates difficulties for the analysis of haplotypes and their association to traits of interest. Commonly fairly simple methods, such as two- or three-SNP sliding windows
are used to create haplotypes across large regions, but these may be of limited value when adjacent SNPs are in strong LD and provide redundant information. Genetic analysis of SNP data and haplotypes has received more and more attention recently and various software packages have been developed for haplotype analysis; these are sometimes integrated with LD analysis.

HAPLOBUILD (http://snp.bumc.bu.edu/modules.php?name=HaploBuild) was created for constructing and testing haplotypes for SNPs in close physical proximity to one another but which are not necessarily contiguous (Laramie et al., 2007). The number of SNPs contained in the haplotype is not restricted, thereby permitting the evaluation of complex haplotype structures.

HAPLOVIEW (http://www.broad.mit.edu/personal/jcbarret/haploview) was designed to simplify and expedite the process of haplotype analysis by providing a common interface to several tasks relating to such analyses. HAPLOVIEW currently allows users to examine block structures, generate haplotypes in these blocks, run association tests and save the data in a number of formats. All functionalities are highly customizable (Barrett et al., 2005).

HAPSTAT (http://www.bios.unc.edu/lin/hapstat/) is a user-friendly software interface for the statistical analysis of haplotype–disease association. HAPSTAT allows the user to estimate or test haplotype effects and haplotype–environment interactions by maximizing the (observed-data) likelihood that properly accounts for phase uncertainty and study design. The current version considers cross-sectional, case-control and cohort studies.

Other related software packages include:

DPPH (Direct method for Perfect Phylogeny Haplotyping; http://wwwcsif.cs.ucdavis.edu/gusfield/dpph.html; Bafna et al., 2003);
EHAP (detecting association between haplotypes and phenotypes; http://wpicr.wpic.pitt.edu/WPICCompGen/ehap__v1.htm);
HAPLOBLOCK, a software package which provides an integrated approach to haplotype block identification, haplotype resolution and LD mapping, suitable for high-density phased or unphased SNP data (http://bioinfo.cs.technion.ac.il/haploblock);
HAPLOT, a simple program for graphical presentation of haplotype block structures, tagSNP selection and SNP variation (Gu et al., 2005);
HAPLOREC, population-based haplotyping software (Eronen et al., 2004); and
HAP, a haplotype analysis system which is aimed at helping to perform disease association studies and a phasing method which is based on the assumption of imperfect phylogeny (http://research.calit2.net/hap).

15.2.5 Genotype-by-environment interaction analysis

To better separate the genetic effects from the environmental effects and their interaction, statistical methods are of paramount importance in a traditional as well as molecular breeding programme (Chapter 10). These methods become even more essential when developing MAS systems for abiotic stress tolerance where germplasm must be tested under drought or low nitrogen conditions, for example. Under such stress conditions the soil where the plants are grown becomes extremely variable and patchy so that the separation of genetic effects from environmental effects is much more difficult than under normal conditions.

Various processes contribute to the characterization of a genotype–environment system (Cooper et al., 1999). There is a great need for integrated decision support tools for genotype-by-environment interaction (GEI) analysis: (i) developing field experimental designs; defining the target population of environments (TPE) and genotypes; (ii) assessing GEI for various field conditions and determining subsets of genotypes and sites with negligible crossover interaction effects from which subgroups of sites and genotypes with similar response can be identified in order to maximize responses
to selection; (iii) mapping QTL and QTL-by-environment interaction (QEI) of component traits important for the target traits; (iv) developing a selection index for phenotypic as well as molecular marker data in order to select the best genotypes to be used in the next cycle of selection; (v) incorporating environmental and/or genotypic variables into statistical models to explain the causes of GEI (physical and chemical soil conditions may be of importance under drought and may be the main cause of GEI); (vi) studying genetic diversity of crop genotypes associated with the target traits; (vii) performing LD mapping of those traits; and (viii) studying gene expression of genes under target conditions from microarray experiments.

Decision support tools are required for classifying the most important testing environments into mega-environments that will then define the appropriate TPE. Based on these environmental classifications, breeding strategies can be developed and established for a more efficient and rapid realization of genetic gains targeting those specific environments. Furthermore, the incorporation of climatic variables (attributes of environments) and molecular markers (attributes of genotypes such as QTL) into statistical models facilitates the identification of the causes of GEI and therefore helps explain QEI. This allows interpreting, understanding and exploiting GEI and QEI and it allows identification of the regions of the chromosomes affecting a trait that are highly affected by external climatic conditions. This also facilitates grouping of environments with negligible genetic crossover effects as well as clustering genotypes with no genotypic crossover GEI.

Podlich and Cooper (1998) developed the QUGENE software for carrying out quantitative genetic analyses of GEI in crop breeding and this has become an increasingly widely utilized decision support tool in breeding programmes. More recently, a statistical model developed by Crossa et al. (2006) incorporates pedigree information (through the coefficient of parentage) for test genotypes when modelling GEI. This model can be used to perform more efficient LD mapping studies as well as in silico QTL detection. Among various models, mixed linear models are fundamental in the process of in silico QTL linkage and LD mapping. These decision support tools are being further refined through the integration of whole-plant physiology models.

15.2.6 Comparative mapping and consensus maps

In the past few decades, a wealth of genomic data has been produced in a wide variety of species using a diverse array of functional and molecular marker approaches. In order to unlock the full potential of the information contained in these independent experiments, researchers need efficient and intuitive means to identify common genomic regions and genes involved in the expression of target phenotypic traits across diverse conditions. Experimenters who seek to apply many diverse studies on QTL face complex problems in summarizing, interrelating and integrating them. Tools for QTL consensus map building offer extensive analysis or meta-analysis of data prior to assigning a consensus QTL location for a trait (Sawkins et al., 2004; Arcade et al., 2004).

CMTV (Comparative Map and Trait Viewer; Sawkins et al., 2004) was developed as a software component to help serve as an intuitive and extensible framework for the integration of various kinds of genomic data. The software components use the ISYS (Integrated SYStem) integration platform developed by the National Center for Genome Resources (Siepel et al., 2001) to access and visualize map data and related information such as germplasm pedigree relationships. CMTV is based on algorithmically determining correspondences between sets of objects on multiple genomic maps, and can display syntenic regions across taxa, combine maps from separate experiments into a consensus map, or project data from different maps into a common coordinate framework. For example, Schaeffer et al. (2006) used a strategy for consensus QTL maps that leverages the highly
curated data in MaizeGDB, in particular, the numerous QTL studies and maps that are integrated with other genome data on a common coordinate system. In addition, they exploited a systematic QTL nomenclature and a hierarchical categorization of over 400 maize traits developed in the mid-1990s; the main nodes of the hierarchy are aligned with the trait ontology at Gramene, a comparative mapping database for cereals (http://www.gramene.org). Consensus maps are presented for one trait category, insect response (80 QTL), and two traits, grain yield (71 QTL) and kernel weight (113 QTL), representing over 20 separate QTL map sets of ten chromosomes each. The strategy is germplasm-independent and reflects any trait relationships that may be chosen.

A systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis has also been practised. The underlying principle is that species sharing a phenotype may share orthologous genes associated with the same biological process and thus correlations between the presence and absence of both genes and traits across species should indicate relevant genotype–phenotype associations (Korbel et al., 2005). In a global analysis involving 92 prokaryotic genomes, 323 clusters containing a total of 2700 significant gene–phenotype associations were retrieved from the MEDLINE literature database that reflect phenotypic similarities of species. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations: for example, a group related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, an enrichment of pathogenicity-related associations was observed, suggesting that this approach revealed many novel genes likely to play a role in infectious diseases (Korbel et al., 2005).

The explosion of interest in marker–trait association studies has led to numerous reports in plants, each based on its own experimental population(s). Each experiment is limited in size and usually restricted to a single population or a cross planted in a specific environment. As suggested by Xu, Y. (2002), it is important for researchers to follow general rules for naming and reporting genes and traits. This will then facilitate the combination of information from several studies, for example, through meta-analysis of results of QTL studies (Goffinet and Gerber, 2000) or joint analysis of the raw data (Haley, 1999). Extension of current databases to include raw data from gene mapping projects will stimulate this effort. On the other hand, many permanent populations have been shared internationally for genomic studies and the raw phenotype and genotype data should also be shared at the same time. A rice RFLP map based on a DH population from the cross between IR64 and Azucena has been saturated with about a thousand SSR markers (Chen et al., 1997; Temnykh et al., 2000; McCouch et al., 2002). However, researchers involved in QTL mapping continued to use a molecular map consisting of only 175 RFLP markers for many years after the SSR markers were developed. Clearly, sharing marker and phenotype information through a well-established database such as Gramene or GrainGenes has made all sources of data more valuable.

A standard reporting system is also critical for comparative genomics, QTL allelism tests, data sharing and mining and the association between major genes and QTL. As discussed by Xu, Y. (2002), a standard system for marker–trait association should include allele characterization data such as allele sizes, gene effects, variation explained by each gene or all genes in the model, gene interaction if more than one gene is identified and GEI if more than one environment is involved. Genetic information should be shared and combined with data generated by plant breeding programmes, for example, germplasm diversity, mapping populations, pedigrees, graphical genotypes, mutants and other genetic stocks.

Finally, as a comprehensive tool, The Rosetta Syllego System (http://www.rosettabio.com/products/syllego/; Broman
et al., 2003b) was developed as a genetic data management and analysis system to advance whole genome linkage, LD and eQTL studies. Designed for biologists, statistical geneticists and investigators responsible for generating genotyping data, the Syllego system provides an easy-to-use project workspace in which users can organize, analyse and share genotype and phenotype data along with analysis results. It automates data management and data formatting tasks so that genetic analysis workflows can be streamlined using analysis methods of choice. The Syllego system converts public and private genotype data sets and reference annotations, such as dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/) and HapMap (http://www.hapmap.org/), as well as individual (sample) information into a single, consistent repository for fast, convenient access.

15.3 Marker-assisted Selection

MAS is one of the major activities in molecular breeding (Chapters 8 and 9). It needs various decision support tools including those for foreground and background selection and identification of the recombinants with favourable alleles and allele combinations. However, only a few tools are available so far for some procedures of MAS. Development of decision support tools for fully functional MAS still faces a lot of challenges.

A huge amount of data will be generated with large-scale MAS and this set of data needs to be analysed and also integrated with other types of data to make selection decisions in a short time window, e.g. 4 weeks during vegetative to flowering stages, or harvest to planting the next season. Thus, decision support tools are essential to accelerate this process while maintaining accuracy and precision. Although many tools have been developed for assisting germplasm evaluation, genetic mapping and MAS, they either work independently, depend on different operating systems or require different data formats, which makes it impossible to complete a comprehensive data analysis and make the results available to breeders for decision making in such a short time window.

15.3.1 MAS methodologies and implementation

There are many factors that affect the efficiency of MAS. In theory, MAS is expected to be more efficient than phenotypic selection when the heritability of a trait is low, where there is tight linkage between the QTL and the DNA markers (Dudley, 1993; Knapp, 1998), with larger population sizes (Moreau et al., 1998) and in earlier generations of selection before recombinational erosion of marker–trait associations (Lee, 1995). Edwards and Page (1994) proposed that the distance between the markers and the QTL was the single largest constraining factor for gains from MAS. An example in Lande and Thompson (1990) demonstrated that on a single trait the potential selection efficiency, using a combination of molecular and phenotypic information, depends on the heritability of the trait, the proportion of additive genetic variance associated with marker loci and the selection scheme. The relative efficiency of MAS is greatest for traits with low heritability if a large fraction of the additive genetic variance is associated with marker loci.

Decision support tools are required for the following procedures related to MAS: (i) determining minimum sample size for foreground/background selection; (ii) estimation of genetic gains (response to selection); (iii) construction of selection indices for multiple traits and whole genome selection; (iv) estimation and graphical display of recipient genome content of selected individuals at each generation of introgression; (v) identification of desirable plants based on both phenotype and genotype; (vi) cost–benefit analysis; and (vii) software for
614 Chapter 15

MAS and simulations (using all available information).
There has been much interest in the development of software that simulates MAS using genetic models. Early efforts had somewhat limited value. For example, GREGOR implements the basic principles, but its interactive use and the fact that it simulates MAS based only on some predefined genetic linkage maps restrict its value for simulation of MAS in breeding programmes (Tinker and Mather, 1993).
Frisch et al. (2000) present PLABSIM, a tool for simulation of MAS programmes. The software can be used to investigate the effect of varying population size, marker density and positions and selection strategies on the genetic composition of the breeding product and on the required number of marker data points. It has the following features: (i) simulations can be made for any diploid genome with an arbitrary number of loci at arbitrary positions on an arbitrary number of chromosomes; (ii) the implemented reproduction schemes include all common breeding methods; (iii) an arbitrary number of selection steps can be combined with a selection strategy; (iv) selection can be carried out for genotypes at defined loci, or for selection indices calculated from allele frequencies at several loci; and (v) the simulated data can be analysed for a broad range of genetic parameters including population size, marker density and positions and selection strategies for the genetic composition of the breeding product and on the required number of marker data points.
To integrate various tools into a common platform to assist their effective deployment in crop improvement, iMAS (www.generationcp.org) is a preliminary attempt to create a publicly available computational platform to assist the development and application of marker-assisted breeding. iMAS currently integrates freely available software for the journey from phenotyping-and-genotyping of individuals to identification and application of trait-linked markers. iMAS also provides simple-to-understand-and-use on-line decision-support guidelines to help the user correctly operate the software and correctly interpret the outputs.
Other MAS tools include: (i) POPMIN, a program for the numerical optimization of population sizes in marker-assisted backcross programmes (Hospital and Decoux, 2002); (ii) BCSIM, backcross simulation software for evaluation of marker-assisted backcross programmes (http://www.plantbreeding.wur.nl/UK/software_bcsim.html); and (iii) the GGT, GRAPHICAL GENOTYPES software, allowing the user to transform molecular marker data into simple colourful chromosome drawings (van Berloo, 1999).

15.3.2 Marker-assisted inbred and synthetic creation

For open-pollinated crops, a synthetic cultivar is developed by inter-crossing selected clones or inbred lines, with seed production of the cultivar through open pollination. For self-pollinated crops, a synthetic cultivar is a mix of different inbred lines. The breeding procedures used to develop a synthetic cultivar depend on the feasibility of developing superior inbred lines and clones. For species such as maize, inbred lines for synthetic cultivars are developed by the same procedures used for the development of hybrid cultivars. For many forage crops, inbreeding depression is too severe to permit the formation of inbred lines, but the parent can be maintained and reproduced readily by cloning. The factors to consider in the development of a synthetic cultivar include: (i) formation of a population; (ii) evaluation of individual inbreds/clones per se; (iii) evaluation of the combining ability of the inbreds/clones; (iv) evaluation of experimental synthetics; and (v) preparation of seed for commercial use (Fehr, 1987).
Synthetic cultivars can be developed by mixing inbred lines that have been bred by MAS or by mixing individual plants derived from any stage of MAS (Dwivedi et al., 2007). With genotypic information available across the whole genome for all the
Decision Support Tools 615

selected individuals or inbred lines, support tools are needed to facilitate developing synthetic cultivars to contain complementary genotypes, fixed heterozygosity and the best combinations of genetic structure.

15.4 Simulation and Modelling

Along with the fast development in molecular biology and biotechnology, a large amount of biological data is becoming increasingly available for important breeding traits, which in turn allows selection based on information from multiple sources. As discussed in the previous sections, however, available information has not been effectively used in crop improvement due to the lack of appropriate tools. In this section, plant breeding through simulation and modelling will be discussed, including utilizing the vast and diverse information by incorporating simulation and modelling into breeding programmes to develop and upgrade various decision support tools.

15.4.1 Importance of simulation and modelling

The accumulation of genomic information for breeding traits has made simulation and modelling more and more practical and important, as computer simulation can help to investigate many 'what if' crossing and selection scenarios, allowing many scenarios to be tested in silico in a short period of time, which in turn helps breeders make important decisions to minimize and optimize highly resource-demanding field experiments. As the number of published genes and QTL for various traits continues to increase, for example, plant breeders face a challenge to determine how to best utilize this multitude of information for crop improvement. Although quantitative genetics provides much of the framework for the design and analysis of selection methods used within breeding programmes (Falconer and Mackay, 1996; Lynch and Walsh, 1998; Goldman, 2000), various assumptions are made in quantitative genetics to render theories mathematically or statistically tractable. Some of these assumptions can be easily tested or satisfied by certain experimental designs; others, such as the assumptions of no linkage, no multiple alleles and no GEI, can seldom, if ever, be met. Other assumptions, like the presence or absence of epistasis and pleiotropy, are statistically difficult to define and test. Computer simulation provides a tool to investigate the implications of relaxing some of the assumptions and the effect it has on the implementation of a breeding programme (Kempthorne, 1988). Computer simulation provides an opportunity to lessen the impact of these assumptions by accommodating these factors, thereby improving the validity of genetic models for use in plant breeding. This approach would be very helpful when the breeders want to compare breeding efficiencies from different selection strategies, to predict the cross performance with known gene information and to utilize efficiently identified major genes and QTL in breeding.
As agronomically important traits are significantly affected by the environment, whole-plant physiology modelling is becoming increasingly important for partitioning complex traits into their components and understanding how those components interact with each other and contribute to the overall trait expression in different environmental conditions. With a commitment to genomic analysis of component traits, whole-plant physiology modelling provides a critical link between molecular genetics and crop improvement. Crop models with generic approaches to underlying physiological processes (Wang et al., 2002) provide a means to link phenotype and genotype, through simulation analysis, of an in silico or virtual plant (Tardieu, 2003). In this way it is possible to dissect the physiological basis of adaptive traits and determine their control at whole-plant level through modelling.
A plant requires information about its environment and its interaction with that environment and uses that information to
dictate its adaptive responses that result in the plant phenotype. Significant endeavours in the field of whole-plant modelling are now being directed at understanding genetic regulation and aiding crop improvement (Cooper et al., 2002a; Chapman et al., 2002, 2003; Hammer et al., 2002; Yin et al., 2003; Wang et al., 2004; Yin, X. et al., 2004; Wang, J. et al., 2005). There are three areas in which crop modelling could assist in assessing in silico the multitude of options to improve the efficiency of plant breeding (Cooper et al., 2002a): (i) characterizing environments to define the target population of environments; (ii) assessing the value of specific putative traits in improved plant types; and (iii) enhancing integration of molecular genetic methodologies. Hence, plant breeders can pose questions that range from how to better utilize field performance data to how knowledge of gene action or function can be utilized for selection in a complex TPE.

15.4.2 Genetic models used in simulation

Multiple mathematical formalisms have been used to model genetic and, more generally, metabolic networks. Examples include: (i) Boolean (ON/OFF) networks; (ii) Petri (concurrent information flow) nets; (iii) S-systems (continuous time models motivated by chemical kinetics); (iv) differential equation models; (v) neural network models; and (vi) Bayesian networks (Welch et al., 2004). Despite this extensive effort, little attention has focused on predicting phenotypes of interest to plant breeders or on integrating the effect of multiple environmental factors.
Simulation, using relatively simple genetic models, has been used for many special studies in plant breeding (Casali and Tigchelaar, 1975; Reddy and Comstock, 1976; van Oeveren and Stam, 1992; van Berloo and Stam, 1998; Frisch and Melchinger, 2001). When it is used for genetic models with complex traits involved, however, the result is uncertain. Coors (1999) summarized many of the published recurrent selection studies for grain yield in maize. The synopsis we can take from Coors's synthesis of published studies strongly suggested that the realized progress from selection for this trait is considerably lower than the predicted response. For most involved in applied breeding this result is not surprising. However, this quantified observation forces us to consider the possible reasons for the discrepancies between the predictions made from classical quantitative genetic theory and the realized responses from applied breeding.
A crop can be analysed for processes at various scales: community, population, plant, organ, tissues, cell and downwards to molecular levels. White and Hoogenboom (2003) identified six levels of genetic detail for simulation to elucidate differences in plant growth and development among cultivars:

1. Generic model with no reference to species.
2. Species-specific model with no reference to genotypes.
3. Genetic differences represented by cultivar-specific parameters.
4. Genetic differences represented by specific alleles, with gene action and gene effects presented through linear effects on model parameters.
5. Genetic differences represented by genotypes with gene action explicitly simulated based on knowledge of regulation of gene expression and effects of gene products.
6. Genetic differences represented by genotypes, with gene action simulated at the level of interactions of regulators, gene products and other metabolites.

The first two levels are found in early models of crops and are still used for models where only generic representations of species are required. Most current crop models are at level 3. Level 4 corresponds to the approach used in GeneGro Version 1 (White and Hoogenboom, 1996) and linear models of gene effects, and level 5 is partially represented in the phenology routines of GeneGro Version 2 (Hoogenboom and White, 2003) and based on knowledge of gene action. The feasibility of level
6 is implicitly considered for unicellular organisms in models such as E-CELL (Tomita et al., 1999), which can advance our understanding of cell biochemistry and gene regulation, but current applications are far from providing the capacity of simulating growth of a plant, even if simplified to a few key cell types and maintained in a constant environment (White and Hoogenboom, 2003). The last three levels represent a continuum of approaches involving greater levels of genetic and biochemical details.
To study the behaviour of gene networks and their influences on organism development and evolutionary processes, Cooper et al. (2005) developed the E(NK) model, as discussed in Chapter 10, which is an extension of the NK gene network model introduced and used by Kauffman (1993). van Eeuwijk et al. (2004) presented various statistical models for the analysis of multi-environment trial data that differ in the extent to which additional genetic, physiological and environmental information is incorporated into the model formulation. Their models range from the simplest, an additive two-way ANOVA model, to a complex one involving a synthesis of a multiple QTL model and an eco-physiological model to describe a collection of genotypic response curves. Between these extremes, they discussed linear-bilinear models, whose parameters can only indirectly be related to genetic and physiological information, and factorial regression models that allow direct incorporation of explicit genetic, physiological and environmental co-variables on the levels of the genotypic and environmental factors.
Hammer et al. (2005) explored whether physiological dissection and integrative modelling of complex traits could link the complexity of the phenotype to underlying genetic systems in a way that could enhance the power of molecular breeding strategies in sorghum. This approach was applied to four key adaptive traits (phenology, osmotic adjustment, transpiration efficiency and stay-green). It was assumed that the three to five genes associated with each trait had two alleles per locus acting in an additive manner. The results indicated that use of a crop growth and development modelling framework can link phenotype complexity to underlying genetic systems in a way that enhances the power of molecular breeding strategies. The environmental characterization and physiological knowledge helped to dissect and explain gene and environment context dependencies in the data and, based on estimated gene effects, to simulate a range of MAS breeding strategies.
QTL mapping allows the dissection of a phenotype into underlying genetic factors but it has limited ability to predict how QTL detected in one set of environmental factors or management practices will behave in a new set of conditions (Stratton, 1998). Eco-physiological modelling provides an insight into the factors influencing GEI (Tardieu, 2003), but it does not help define the genetic basis for differences in response to environmental changes. Combining eco-physiological modelling with genetic mapping provides the opportunity for creating a QTL-based crop physiology model that could be a powerful tool for resolving the genetic basis of complex environment-dependent yield-related traits. For example, using this approach researchers predicted specific leaf area in barley (Yin et al., 1999), stay-green response to nitrogen in sorghum (Borrell et al., 2001), leaf growth response to temperature and water deficit in maize (Reymond et al., 2003) and pre-flowering duration in barley (Yin et al., 2005). By removing gene and environment context dependencies, it was possible to devise breeding strategies that generated an enhanced rate of yield improvement over several cycles of selection. Messina et al. (2006) combined an eco-physiological model (CROPGRO-Soybean) with a linear model that predicted cultivar-specific parameters as a function of E-loci. This approach predicted 75% of the variance in time to maturity and 54% of the variance in yield, demonstrating that agricultural genomics data can be effectively used for predicting cultivar performance and refining crop breeding systems.
The genotype-to-phenotype (GP) model, as a key component of breeding
design, describes how different genotypes interact with environments to produce different phenotypes (Cooper et al., 2005). Using information from genes, core germplasm collections and cornerstone parents, when combined with the biological characteristics and breeding objectives for the target environments, breeding procedure and selection methods can be simulated and optimized and desirable genotypes and the probability of breeding new cultivars can be predicted.
Comparisons among genomes of different crop species reveal high levels of similarity and it appears likely that models of gene action in one crop can be extrapolated to other crops in the same botanical family (e.g. among legumes or among cereals). However, Helentjaris and Briggs (1998) noted that efforts to identify maize homologues for genes described in other species have proven more difficult than originally anticipated. One problem is that a single species may have multiple genes with similar sequences but different functions.
In the future it will be possible to build more realistic genetic models if advances in genomics improve our understanding of the GP relationship and GEIs (Bernardo, 2002; Cooper et al., 2005). Conclusions on the relative merits of breeding strategies based on simple GP models may have to be re-evaluated in the context of an exponentially growing knowledge base. This information will aid in determining gene number and gene effects on phenotype. In addition, conventional plant breeding provides a wealth of information about trait heritabilities and correlations. This information, once determined, will help define errors, linkage and pleiotropic effects. In addition, crop physiological models may also help fine-tune the genetic models for breeding modelling (Reymond et al., 2003; Yin, X. et al., 2004; Hammer et al., 2005). White and Hoogenboom (2003) discussed several practical issues in gene-based modelling, including how to access genetic and molecular data, which species, traits and what scale and level of detail to model, the relevance of results from animal systems to plant biology and how to ensure effective collaboration among crop modellers, geneticists and molecular biologists.

15.4.3 A simulation module for genetics and breeding: QULINE

Typically, breeding is done by crossing and selecting from progeny. With the opportunity to make predictions of crop performance and to explicitly model in silico the desired genotype × environment × breeding scheme combinations, breeding shifts in its character. Breeders become model testers themselves while model systems (or other information-rich systems) become useful tools for model building. Once phenotype × genotype × environment models are verified through explicit breeding experiments, the task is to move the models themselves around through breeding programmes of different crop species. One of the interesting efforts pursuing this type of paradigm shift is the QUantitative GENEtics (QUGENE; www.pig.ag.uq.edu.au/qu-gene) system.
QUGENE is a simulation platform for quantitative analysis of genetic models, which provides the opportunity to develop a general simulation program for actual breeding programmes through its two-stage architecture (Podlich and Cooper, 1998). The first stage is the engine, which has two roles: (i) to define the genotype-by-environment (GE) system (i.e. all the genetic and environmental information of the simulation experiment); and (ii) to generate the starting population of individuals (base germplasm). The second stage includes the application modules, whose role is to investigate, analyse, or manipulate the starting population of individuals within the GE system defined by the engine. The application module usually represents the operation of a breeding programme. The core model within the engine can incorporate many of the features for the architecture of traits that are revealed by the characterization of the GE system. It includes multiple traits and QTL with different effects, genome positional information such as that provided by molecular
maps, epistasis within gene networks, differential gene expression, GEIs and structure within the TPE. Cooper et al. (1999) provided an example of this approach for comparisons between conventional phenotypic and MAS strategies.
Using QUGENE software, a breeding module was developed for sorghum by incorporating physiological constraints and was implemented by linking QUGENE to the Agricultural Production System Simulator (APSIM) cropping systems model (Keating et al., 2003; http://www.apsru.gov.au). This module can be used to simulate breeding line performance in a given environment and extrapolate the effects of long-term selection over many breeding cycles and seasons. Another project supported by the Generation Challenge Programme links QUGENE/APSIM with QTL data on maize leaf growth under drought. These projects aim to deliver modelling tools into the hands of molecular breeders and other researchers to extend the scope and impact of their use, particularly with respect to molecular breeding of complex traits such as drought tolerance (Dwivedi et al., 2007).
As a QUGENE application module, QULINE was developed at the International Maize and Wheat Improvement Center (CIMMYT) specifically for wheat-breeding programme simulation. It is a computer tool capable of defining a range of genetic models from simple to complex and simulating breeding processes for developing final advanced lines. Simulation indicated that it can be used to optimize breeding methodology and improve breeding efficiency. QULINE can be used to integrate various genes with multiple alleles functioning within epistatic networks and differentially interacting with the environment and predict the outcomes from a specific cross following the application of a real selection scheme (Wang et al., 2003, 2004). The breeding methods that can be simulated by QULINE are mass selection, pedigree system (including single seed descent), bulk population system, backcross breeding, top cross (or three-way cross) breeding, DH breeding, MAS and many combinations and modifications of these methods.
QULINE has the potential to provide a bridge between the vast amount of biological data and breeders' queries on optimizing selection gain and efficiency. It has been used to compare two selection strategies (Wang et al., 2003), study the effects on selection of dominance and epistasis (Wang et al., 2004), predict cross performance using known gene information (Wang, J. et al., 2005) and optimize MAS to efficiently pyramid multiple genes (Kuchel et al., 2005; Wang, J. et al., 2007).
By defining a breeding strategy, QULINE translates the complicated breeding process into a form that the computer can understand and simulate. QULINE allows for several breeding strategies to be defined simultaneously. The programme then starts with the same virtual crosses for all the defined strategies at the first breeding cycle, including the same initial population, crosses and genotype and environment systems, allowing appropriate comparisons. A breeding strategy in QULINE is defined to include all activities involved in an entire breeding cycle such as crossing, seed propagation and selection (Wang and Pfeiffer, 2007). A breeding cycle begins with crossing and ends at the generation when the selected advanced lines are returned to the crossing block as new parents. The genotypic value of a genotype is calculated based on the definition of gene actions. The phenotypic value and family mean are derived from the genotypic value and its associated error (environmental deviation). With all defined phenotypic and genotypic values, QULINE then makes within-family selection from phenotypic values and among-family selection from family means.
To simulate in QULINE, the seed propagation type must be defined to describe how the selected plants in a retained family from the previous selection round or generation are propagated to generate the seed for the current selection round or generation. Wang and Pfeiffer (2007) defined nine options for seed propagation, which can be presented in the order of increasing genetic diversity (the F1 excluded) as: (i) clone (asexual reproduction); (ii) DH (doubled haploid); (iii) self (self-pollination);
(iv) singlecross (single crosses between two parents); (v) backcross (backcrossed to one of the two parents); (vi) topcross (crossed to a third parent, also known as a three-way cross); (vii) doublecross (crossed between two F1s); (viii) random (random mating among the selected plants in a family); and (ix) noself (random mating but self-pollination is eliminated). The seed for the F1 is derived from crossing among the parents in the initial population (or crossing block). QULINE randomly determines the female and the male parents for each cross from a defined initial population, or alternatively, one may select some preferred parents from the crossing block. The selection criteria used to identify such preferred parents can be defined in terms of among-family and within-family selection descriptors within the crossing block (referred to as the F0 generation). By using the parameter of seed propagation type, most if not all methods of seed propagation in self-pollinated crops can be simulated in QULINE.

15.4.4 The future of simulation and modelling

There are several practical implementation issues in simulation and modelling to be solved, including: (i) the communications and training required to combine modelling and simulation with real breeding programmes through involvement of other scientists including breeders, agronomists and geneticists; (ii) the standardization and documentation of data collection for phenotypic, environmental and genomic information, which need to be enforced throughout the project; (iii) the unexpected and great variation within selection and target environments, which requires much more comprehensive data collection than in breeding environments with far fewer stress factors; and (iv) the need, as more and more factors are involved in modelling and simulation, for data generation and collection with more data dimensions including more locations, samples and replications.
Practical applications often oblige crop modellers to emphasize simulation of economic yield. A set of traits that are involved in stress response is also worth considering. While allowing precise control of plant response and gene expression, specific stress responses may largely be survival mechanisms. Thus, whereas their study could improve the simulation of plant survival, the results might prove harder to relate to the simulation of basic processes of growth and partitioning (White and Hoogenboom, 2003). Innovative simulation models will bridge the gap between molecular and conventional plant breeding and will inform both strategic research and tactical breeding decisions (www.generationcp.org/sccv10/sccv10_upload/modelling_links.pdf). Simulation models integrate molecular information about interaction between genes and simpler traits to allow realistic predictions for more complex traits such as drought tolerance and yield.
Developing and implementing a design-led breeding system for complex traits requires enhanced attention to precision phenotyping, eco-physiological modelling and marker validation to ensure robustness and selective power. These approaches require the iterative and systemic integration of a range of scientific disciplines including modellers, physiologists, geneticists, breeders and molecular biologists. Nevertheless, the first preliminary studies reviewed in this section suggest that a new paradigm in knowledge-led, design-driven plant breeding is a feasible option and that for the first time, genomics may finally realize its potential impact on breeding complex traits (Dwivedi et al., 2007).
Although many public databases on genes, alleles, gene and genomic sequences and related information are maintained by geneticists and molecular biologists, physiologists and modellers may find these databases less useful than expected. The user interfaces assume familiarity with bioinformatics. Databases of gene sequences and protein structure lack information on actual gene function in most cases. The number of lines or cultivars characterized for a given gene is usually limited to the
parents used in describing the gene and few data on field performance are found (White and Hoogenboom, 2003). For example, the Arabidopsis Information Resource (2000) purports to provide more phenotypic data for Arabidopsis as a model plant, but still falls short of meeting the requirements of a whole-plant model. The same is true for rice and its related databases.

15.5 Breeding by Design

The advances in applied genomics and the possibility of generating large-scale marker data sets provide us with the tools to determine the genetic basis for all traits of agronomic importance. In addition, methods for assessing the allelic variation at these agronomically important loci are now available. This combined knowledge will eventually allow the breeder to combine the most favourable alleles at all these loci in a controlled manner to design superior cultivars in silico. This concept is called 'breeding by design' (Peleman and van der Voort, 2003) and has been generalized to breeding design using genome-wide QTL-marker associations identified through all types of effort due to the fast development in molecular marker technology (Bernardo, 2002; Peleman and van der Voort, 2003). The goal can be reached following a three-step approach: (i) mapping loci involved in all agronomically relevant traits; (ii) assessing the allelic variation at those loci; and (iii) following the breeding by design approach. Because the positions of all loci of importance are mapped precisely, recombinant events can be accurately selected using flanking markers to collate the different favourable alleles next to each other. Software tools should enable us to determine the optimal route for generation of those mosaic genotypes by crossing lines and using markers to select for the specific combinations that will eventually combine all those alleles. The prerequisites for this approach include extremely saturated marker maps available to enable the generation of high-resolution chromosome haplotypes, and extensive phenotyping of all agronomic traits for both the mapping populations and the inbred lines that are used for chromosome haplotyping and allele assessment. Breeding by design involves the integrative, complementary application of technological tools and the materials currently available to develop superior cultivars. During this process, an enormous resource of knowledge is generated and accumulated that should enable breeders to deploy more rational and refined breeding strategies in the future. The developments in high-throughput genotyping and genetic mapping with associated statistical methodology have now brought this strategy within reach. The optimal exploitation of the naturally available genetic resources should create unsurpassed possibilities to generate new traits and crop performance.

15.5.1 Parental selection

Selecting parents to make crosses is the first and essential step in plant breeding (Fehr, 1987). Due to incomplete gene information (i.e. some resistance genes and their effects on phenotype are known, while other genes and most genes for other agronomic traits are unknown), many seemingly good crosses are discarded during the segregating phase of a breeding programme. Almost all agronomic traits including disease resistance, stress tolerance and yield involve complex genetics. It makes sense to understand as much as we can about the plant parents, including genotype, before we make decisions about crossing one parent with another. In most plant breeding programmes, less than 1% of all the crosses made end up in a cultivar. To a layperson, that may seem incredibly inefficient, but that's the nature of the beast. What is most important in plant breeding is to pick the right parents so that breeders would have fewer crosses to deal with and would be able to spend more time and attention on the crosses that will result in superior material.
Generally speaking, the cross with the highest progeny mean and largest genetic
variance has the most potential to produce the best lines (Bernardo, 2002).
Under an additive genetic model, the mid-parent value is a good predictor of
the progeny mean, but the variance cannot be deduced from the performance of
the parents alone. The best way to estimate the progeny variance is to generate
and test the progeny. Breeders normally use one of two types of parental
selection: one based on parental information, such as parental performance or
the genetic diversity among parents; the other based on parental and progeny
information. In the first case, previous studies found that both high × high
and high × low crosses have the potential to produce the best lines. In the
second case, the progeny needs to be grown and tested, which precludes
selection of parents before crossing. Due to complicated intra-genic and
inter-genic interactions and GEIs, no method has given a precise prediction of
cross performance (Wang et al., 2005).

Breeders are already aware of what parents are available, but often breeders'
phenotypic and field data come in spreadsheets with numerous columns and reams
of data without much association with other types of data generated in genetics
and genomics. Once software becomes available to show a full genome genotype of
all possible parents, one can ask, for example, which parents will provide high
yield and resistance to a specific disease. The informatics tool will indicate
to the breeder what genes will be traceable in the progeny and which are the
best sets of molecular markers for tracking these genes.

15.5.2 Breeding product prediction

Designing effective breeding systems requires information about target genes,
donor germplasm and proposed elite recurrent parents. This can then be combined
with evaluation data on the target biological characteristics and breeding
objectives for the TPE, in order to optimize the breeding procedure and
selection methods through modelling and simulation analysis. This type of
analysis will also predict the desirable target genotype and the probability of
successfully generating new cultivars through the proposed breeding system.

Cross performance can be accurately predicted when information about the genes
controlling the traits of interest is known. If progeny arrays after selection
in a breeding programme could be predicted, then the efficiency of plant
breeding would be greatly increased. Take wheat as an example. For the majority
of economically important traits in wheat breeding the genes controlling their
expression remain unknown. However, for certain aspects of wheat quality this
information is known, though incompletely (Eagles et al., 2002, 2004). Wang and
Pfeiffer (2007) demonstrated how cross performance, following selection, can be
predicted in wheat quality breeding by using QULINE, under the condition that
all the gene information of key selection traits is known.

Plant breeders have always been confronted with the problem of predicting the
expected phenotypic performance of new individuals with untested gene
combinations (new genotypes) with limited information on the
genotype-to-phenotype (GP) architecture for traits. The success of molecular
breeding relies on an effective prediction of phenotypic variation based on
allelic variation. There are opportunities to apply molecular technologies to
further refine the pedigree-based breeding strategies used today. Ultimately it
will not be sufficient to demonstrate that we can predict phenotypic variation
and the phenotypic changes that result from selection using genetic
information; this knowledge must also allow us to improve on the outcomes that
are currently being achieved by conventional selection on phenotype alone.

15.5.3 Selection method evaluation

To develop new genotypes that are genetically superior to those currently
available for a specific target environment, plant breeders employ a range of
selection methods. Many field experiments have been conducted to compare the
efficiencies
of different breeding methods. However, because of the time and effort spent in
conducting field experiments, the concept of modelling and prediction has
always been of interest to plant breeders.

Taking the bread wheat breeding at CIMMYT as an example, breeders spend great
efforts in choosing parents to make the targeted crosses and approximately
50-80% of crosses are discarded in generations F1 to F8, following selection
for agronomic traits (e.g. plant height, lodging tolerance, tillering,
appropriate heading date and balanced yield components), disease resistance
(e.g. stem rust, leaf rust and stripe rust) and end-use quality (e.g. dough
strength and extensibility, protein quantity and quality). Then, after two
cycles of yield trials (i.e. preliminary yield trial in F8 and replicated yield
trial in F9), only 10% of the initial crosses remain, among which 1-3% of the
crosses originally made are released as cultivars from CIMMYT's international
nurseries (Wang et al., 2003, 2005). This is true across plant breeding
programmes of different species, which calls for a more efficient breeding
system.

Two selection methods are commonly used in CIMMYT's wheat breeding programmes.
Pedigree selection was used primarily from 1944 until 1985. From 1985 until the
second half of the 1990s the main selection method was a modified pedigree/bulk
method (MODPED), which resulted in many widely adapted wheat cultivars and was
replaced in the late 1990s by the selected bulk method (SELBLK) (van Ginkel et
al., 2002). The MODPED method begins with pedigree selection of individual
plants in the F2 followed by three rounds of bulk selection from F3 to F5 and
pedigree selection in the F6; hence the name modified pedigree/bulk. In the
SELBLK method, spikes of selected F2 plants within a cross are harvested in
bulk, resulting in one F3 seed lot per cross. This process continues from F3 to
F5, while pedigree selection is used only in the F6. A major advantage of
SELBLK compared with MODPED is that fewer seed lots need to be harvested,
threshed and visually selected for seed appearance, leading to significant
savings in time, labour and costs associated with nursery preparation, planting
and plot labelling (van Ginkel et al., 2002).

Before simulation, the breeders already knew that SELBLK can save costs
compared with MODPED. Some small-scale field experiments have been conducted
comparing the efficiencies of MODPED and SELBLK (Singh et al., 1998), but the
relative efficiency of the two methods remains untested on a larger scale. Wang
and Pfeiffer (2007) illustrated the simulation principles by using the QULINE
module with CIMMYT's wheat breeding programme as an example. They developed the
genetic models accounting for epistasis, pleiotropy and GEI. For each selection
method, the simulation experiment comprised the same 1000 crosses derived from
200 parents with an assumption that a total of 258 advanced lines remained
following ten generations of selection. The tests for the two methods were each
repeated 500 times on 12 GE systems. The simulation not only provided a clear
answer that the adoption of SELBLK would not cause a yield-gain penalty, but
also revealed something that CIMMYT's breeders had not realized, namely that
SELBLK can retain more crosses in the final selected population than MODPED.

15.6 Future Perspectives

The use of appropriate experimental design and data analysis is a critical
component for successful development and application of molecular breeding
approaches, in particular, marker-assisted breeding systems. Figure 15.2 shows
an information flowchart from data to outputs through use of various analytical
tools. Making these choices correctly is a highly specialized function. There
is a lack of proper and simple-to-use guidelines for non-specialists, which
makes it difficult for them to confidently choose the appropriate design and
analysis methods offered by various types of software. Having a centralized and
evolving resource offering biometric inputs required for molecular breeding
would be a tremendously valuable asset to the research and breeding community.
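
The kind of simulation experiment described in Section 15.5.3, in which two
selection methods are run on the same set of crosses and the comparison is
repeated many times, can be sketched in miniature. The code below is only an
illustrative toy under stated assumptions: it is not QULINE, and the two
schemes labelled "A" and "B" are hypothetical stand-ins rather than MODPED or
SELBLK. It assumes a purely additive trait, unlinked loci, fully inbred progeny
lines and normally distributed phenotypic error. All function names, parameter
names and default sizes are invented for illustration and are far smaller than
the 1000 crosses, 200 parents and 500 replications reported in the text.

```python
import random
import statistics


def make_parents(n_parents, n_loci, rng):
    """Random inbred parents; at each locus 1 = favourable, 0 = unfavourable."""
    return [[rng.randint(0, 1) for _ in range(n_loci)] for _ in range(n_parents)]


def make_crosses(n_parents, n_crosses, rng):
    """Each cross is a random pair of distinct parent indices."""
    return [tuple(rng.sample(range(n_parents), 2)) for _ in range(n_crosses)]


def inbred_line(p1, p2, rng):
    """One fully inbred progeny line, assuming unlinked loci."""
    return [rng.choice((a, b)) for a, b in zip(p1, p2)]


def run_scheme(parents, crosses, scheme, rng, lines_per_cross=20,
               n_final=25, error_sd=2.0):
    """Return (mean genetic value, number of crosses) in the final selection."""
    candidates = []  # tuples of (phenotype, genetic value, cross index)
    for idx, (i, j) in enumerate(crosses):
        family = []
        for _ in range(lines_per_cross):
            g = sum(inbred_line(parents[i], parents[j], rng))  # additive value
            family.append((g + rng.gauss(0.0, error_sd), g, idx))
        if scheme == "A":  # hypothetical scheme A: cull within each cross first
            family = sorted(family, reverse=True)[: lines_per_cross // 4]
        candidates.extend(family)  # hypothetical scheme B: no within-cross cull
    final = sorted(candidates, reverse=True)[:n_final]
    mean_g = statistics.mean(g for _, g, _ in final)
    return mean_g, len({idx for _, _, idx in final})


def compare_schemes(n_reps=20, n_parents=20, n_crosses=50, n_loci=30, seed=1):
    """Run both schemes on the same crosses in every replicate, then average."""
    rng = random.Random(seed)
    results = {"A": [], "B": []}
    for _ in range(n_reps):
        parents = make_parents(n_parents, n_loci, rng)
        crosses = make_crosses(n_parents, n_crosses, rng)
        for scheme in results:
            results[scheme].append(run_scheme(parents, crosses, scheme, rng))
    return {
        s: (statistics.mean(g for g, _ in r), statistics.mean(c for _, c in r))
        for s, r in results.items()
    }


if __name__ == "__main__":
    for scheme, (gain, crosses_kept) in compare_schemes().items():
        print(scheme, round(gain, 2), round(crosses_kept, 1))
```

Using the same parents and crosses for both schemes within each replicate
mirrors the paired design described in the text, so that differences in the two
summary statistics (mean genetic value and number of crosses represented among
the selected lines) reflect the selection scheme rather than the sampling of
crosses.
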
[Figure 15.2: flowchart with three columns, Data, Tools and Output, above the
label 'Integrated information management system for molecular breeding'.]

Data:   Genotype (sequences, markers, maps, genealogy); Phenotype (yield,
        quality, agronomy, stress response); Environment (water, fertilizer,
        soil, temperature, precipitation, day-length).
Tools:  BLASTN/X, MAPMAKER, MULTIQTL, GENEFLOW, QTL CARTOGRAPHER, SAS/JMP,
        STRUCTURE, GENEMAPPER, POWERMARKER, ARLEQUIN, BIPLOT, CMTV, TASSEL,
        ..., GIS.
Output: Gene functional analysis; genetic diversity; germplasm evaluation;
        germplasm classification; variety identification; genetic mapping;
        marker-trait association; marker-assisted selection; G × E
        interaction; environmental classification; variety
        stability/adaptability.

Fig. 15.2. Analytical tools and outputs associated with procedures in plant breeding. Three types of data
from genotype (G), phenotype (P) and environment (E) are analysed using various tools, and outputs will
be delivered to breeders for decision making.
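
One such decision is the choice of parents (Section 15.5.1). The statement that,
under an additive genetic model, the mid-parent value predicts the progeny mean
but not the progeny variance can be checked with a small worked sketch. The
example assumes fully inbred parents and progeny, unlinked loci and no
selection, so that every combination of parental alleles at the segregating
loci is equally likely. The locus effects and parental genotypes are
hypothetical values chosen only for illustration.

```python
from itertools import product
from statistics import mean, pvariance


def genotypic_value(line, effects):
    """Additive value: sum of effects at loci carrying the favourable allele (1)."""
    return sum(e for allele, e in zip(line, effects) if allele == 1)


def progeny_mean_and_variance(p1, p2, effects):
    """Exact mean and variance over all fully inbred progeny of p1 x p2.

    Assumes unlinked loci and no selection, so every combination of parental
    alleles at the segregating loci is equally likely.
    """
    choices = [(a,) if a == b else (a, b) for a, b in zip(p1, p2)]
    values = [genotypic_value(line, effects) for line in product(*choices)]
    return mean(values), pvariance(values)


effects = [1.0, 1.0, 1.0, 1.0]           # hypothetical equal locus effects
cross_1 = ([1, 1, 0, 0], [1, 1, 0, 0])   # parents carry the same alleles
cross_2 = ([1, 1, 0, 0], [0, 0, 1, 1])   # parents carry complementary alleles

for p1, p2 in (cross_1, cross_2):
    mid_parent = (genotypic_value(p1, effects) + genotypic_value(p2, effects)) / 2
    print(mid_parent, progeny_mean_and_variance(p1, p2, effects))
```

Both crosses have a mid-parent value of 2.0 and a progeny mean of 2.0, but the
progeny variance is 0 for the first cross and 1.0 for the second, because only
the second cross segregates. Parental performance alone cannot distinguish the
two; genotype information on the parents can.
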

There is an urgent need for integrated molecular tools including those for
facilitating molecular breeding design, integrated mapping and MAS, and
communication between genomics scientists, geneticists, bioinformaticians and
breeders.

There is also a need to develop molecular breeding decision support tools that
can use modelling and simulation analysis of all pre-existing and project
generated data. These tools will help breeders design and implement the most
efficient breeding schemes (including cost- and time-related factors) based on
the optimum combination of MAS (for both foreground and background) and
phenotypic selection. Other decision support tools that are needed in molecular
breeding include: (i) those for sample collecting, depositing, retrieving and
tracking; (ii) those for data acquiring, collecting, processing and mining; and
(iii) databases.

Independent of the platform and the analysis methods used, the result of
microarray experiments is, in most cases, a list of differentially expressed
genes. An automatic ontological analysis approach using Gene Ontology has been
proposed to help with the biological interpretation of such results (Khatri et
al., 2002). Currently this approach is the de facto standard for the secondary
analysis of high-throughput experiments and a large number of tools have been
developed for this purpose. Khatri and Draghici (2005) provided a detailed
comparison of 14 such tools using the following criteria: scope of the
analysis, visualization capabilities, statistical model(s) used, correction for
multiple comparisons, reference microarrays available, installation issues and
sources of annotation data. This detailed analysis of the capabilities of these
tools will help researchers choose the most appropriate tool for a given type
of analysis. More importantly, in spite of the fact that this type of analysis
has been generally adopted, this approach has several intrinsic drawbacks.
These drawbacks are associated with all tools discussed and represent
conceptual limitations of the current state-of-the-art in ontological analysis.
These are challenges for the next generation of secondary data analysis tools.
It would be more beneficial if future tools expand the current approach by
trying to address some of these limitations rather than providing endless
variations of the same idea.

There is a need for systematic construction of biological relationship graphs
from the integration of gene, protein, metabolite and phenotype data
(Blanchard, 2004). The challenge is now to use the large-scale data sets in a
meaningful way to weed out the high false positive rates associated with
high-throughput techniques and to encapsulate knowledge to validate and extend
existing models (Blanchard, 2004). Graphical models represent a union between
probability theory and graph theory and thus represent a natural extension of
the work on relationship graphs.
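
To make the idea of a graphical model concrete, the sketch below encodes a
three-node chain (gene, metabolite, phenotype) as a directed graph with a
conditional probability table at each node, and answers queries by summing the
joint distribution. It is a minimal illustration, not a method taken from the
works cited: the network structure, the node names and every probability value
are hypothetical.

```python
from itertools import product

# Hypothetical three-node chain: gene G -> metabolite M -> phenotype P.
# Each variable is binary (1 = active/high/present, 0 = otherwise).
P_G = {1: 0.3, 0: 0.7}
P_M_GIVEN_G = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # P_M_GIVEN_G[g][m]
P_P_GIVEN_M = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # P_P_GIVEN_M[m][p]


def joint(g, m, p):
    """Joint probability from the factorization implied by the graph."""
    return P_G[g] * P_M_GIVEN_G[g][m] * P_P_GIVEN_M[m][p]


def probability(**evidence):
    """Sum the joint over all states consistent with the given evidence."""
    total = 0.0
    for g, m, p in product((0, 1), repeat=3):
        state = {"g": g, "m": m, "p": p}
        if all(state[k] == v for k, v in evidence.items()):
            total += joint(g, m, p)
    return total


# Marginal probability of the phenotype, and the probability that the gene is
# active given that the phenotype is observed.
p_phenotype = probability(p=1)
p_gene_given_phenotype = probability(g=1, p=1) / p_phenotype
print(round(p_phenotype, 3), round(p_gene_given_phenotype, 3))
```

The graph supplies the factorization (which variables depend directly on
which), and probability theory supplies the rules for combining the factors.
Exhaustive enumeration is only practical for very small networks such as this
one.
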
References

Aastveit, H. and Martens, H. (1986) ANOVA interactions interpreted by partial least squares regression.
Biometrics 42, 829-844.
Abad-Grau, M.M., Montes, R. and Sebastiani, P. (2006) Building chromosome-wide LD maps. Bioinformatics
22, 1933-1934.
Able, J.A., Langridge, P. and Milligan, A.S. (2008) Capturing diversity in the cereals: many options but little
promiscuity. Trends in Plant Sciences 12, 71-79.
Abranches, R., Santos, A.P., Williams, S., Wegel, E., Castilho, A., Christou, P., Shaw, P. and Stoger, E.
(2000) Widely-separated multiple transgene integration sites in wheat chromosomes are brought
together at interphase. The Plant Journal 24, 713-723.
Acosta-Gallegos, J.A., Kelly, J.D. and Gepts, P. (2007) Prebreeding in common bean and use of genetic
diversity from wild germplasm. Crop Science 47(S3), S44-S59.
Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A.,
Olde, B., Moreno, R.E., Kerlavage, A.R., Combie, W.R. and Venter, J.C. (1991) Complementary DNA
sequencing: expressed sequence tags and human genome project. Science 252, 1651-1653.
Adams, R.P. (1997) Conservation of DNA: DNA banking. In: Callow, J.A., Ford-Lloyd, B.V. and Newbury,
H.J. (eds) Biotechnology and Plant Genetics Resources Conservation and Use. CAB International,
Wallingford, UK, pp. 163-174.
Adi, B. (2006) Intellectual property rights in biotechnology and the fate of poor farmers' agriculture. The
Journal of World Intellectual Property 9, 91-112.
Aebersold, R. and Goodlett, D.R. (2001) Mass spectrometry in proteomics. Chemical Reviews 101,
269-295.
Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature 422, 198-207.
Agrawal, P.K., Kohli, A., Twyman, R.M. and Christou, P. (2005) Transformation of plants with multiple
cassettes generates simple transgene integration patterns and high expression levels. Molecular
Breeding 16, 247-260.
Aguilar, G. (2001) Access to genetic resources and protection of traditional knowledge in the territories of
indigenous peoples. Environmental Science and Policy 4, 241-256.
Ahmadi, N., Albar, L., Pressoir, G., Pinel, A., Fargette, D. and Ghesquiere, A. (2001) Genetic basis and
mapping of the resistance to rice yellow mottle virus. III. Analysis of QTL efficiency in introgressed
progenies confirmed the hypothesis of complementary epistasis between two resistance QTL.
Theoretical and Applied Genetics 103, 1084-1092.
Ahmadian, A., Gharizadeh, B., Gustafsson, A.C., Sterky, F., Nyren, P., Uhlen, M. and Lundeberg, J. (2000)
Single nucleotide polymorphism analysis by pyrosequencing. Analytical Biochemistry 280, 103-110.
Ahn, S.N. and Tanksley, S.D. (1993) Comparative linkage maps of the rice and maize genomes. Proceedings
of the National Academy of Sciences of the United States of America 90, 7980-7984.
Ahn, S.N., Anderson, J.A., Sorrells, M.E. and Tanksley, S.D. (1993) Homoeologous relationships of rice,
wheat and maize chromosomes. Molecular and General Genetics 241, 483-490.
Ajmone Marsan, P., Castiglioni, P., Fusari, F., Kuiper, M. and Motto, M. (1998) Genetic diversity and its
relationship to hybrid performance in maize as revealed by RFLP and AFLP markers. Theoretical and
Applied Genetics 96, 219-227.
Akaike, H. (1969) Fitting autoregressive models for prediction. Annals of the Institute of Statistical
Mathematics 21, 243-247.
Alan, A.R., Mutchler, M.A., Brants, A., Cobb, E. and Earle, E.D. (2003) Production of gynogenic plants from
hybrids of Allium cepa L. and A. roylei Stearn. Plant Science 165, 1201-1211.
Allard, R.W. (1956) Formulas and tables to facilitate the calculation of recombination values in heredity.
Hilgardia 24, 235-278.
Allard, R.W. (1988) Genetic changes associated with the evolution of adaptedness in cultivated plants and
their progenitors. Journal of Heredity 79, 225-238.
Allard, R.W. (1999) Principles of Plant Breeding, 2nd edn. John Wiley & Sons, Inc., New York, 254 pp.
Allard, R.W. and Bradshaw, A.D. (1964) Implications of genotype-environmental interactions in applied
plant breeding. Crop Science 4, 503-507.
Allen, G.C., Spiker, S. and Thompson, W.F. (2000) Use of matrix attachment regions (MARs) to minimize
transgene silencing. Plant Molecular Biology 43, 361-376.
Allen-Brady, K., Wong, J. and Camp, N.J. (2006) PedGenie: an analysis approach for genetic association
testing in extended pedigrees and genealogies of arbitrary size. BMC Bioinformatics 7, 209.
Allison, D.B., Cui, X., Page, G.P. and Sabripour, M. (2006) Microarray data analysis: from disarray to
consolidation and consensus. Nature Reviews Genetics 7, 55-65.
Alonso, J.M. and Ecker, J.R. (2006) Moving forward in reverse: genetic technologies to enable
genome-wide phenomic screens in Arabidopsis. Nature Reviews Genetics 7, 524-536.
Alonso, J.M., Stepanova, A.N., Leisse, T.J., Kim, C.J., Chen, H., Shinn, P., Stevenson, D.K., Zimmerman, J.,
Barajas, P., Cheuk, R., Gadrinab, C., Heller, C., Jeske, A., Koesema, E., Meyers, C.C., Parker, H.,
Prednis, L., Ansari, Y., Choy, N., Deen, H., Geralt, M., Hazari, N., Hom, E., Karnes, M., Mulholland, C.,
Ndubaku, R., Schmidt, I., Guzman, P., Aguilar-Henonin, L., Schmid, M., Weigel, D., Carter, D.E.,
Marchand, T., Risseeuw, E., Brogden, D., Zeko, A., Crosby, W.L., Berry, C.C. and Ecker, J.R. (2003)
Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 301, 653-657.
Alpert, K.B. and Tanksley, S.D. (1996) High-resolution mapping and isolation of a yeast artificial
chromosome contig containing fw2.2: a major fruit weight quantitative trait locus in tomato. Proceedings of the
National Academy of Sciences of the United States of America 93, 15503-15507.
Altpeter, F., Baisakh, N., Beachy, R., Bock, R., Capell, T., Christou, P., Daniell, H., Datta, K., Datta, S.,
Dix, P.J., Fauquet, C., Huang, N., Kohli, A., Mooibroek, H., Nicholson, L., Nguyen, T.H., Nugent, G.,
Raemakers, K., Romano, A., Somers, D.A., Stoger, E., Taylor, N. and Visser, R. (2005a) Particle
bombardment and the genetic enhancement of crops: myths and realities. Molecular Breeding 15,
305-327.
Altpeter, F., Varshney, A., Abderhalden, O., Douchkov, D., Sautter, C., Kumlehn, J., Dudler, R. and Schweizer,
P. (2005b) Stable expression of a defense-related gene in wheat epidermis under transcriptional
control of a novel promoter confers pathogen resistance. Plant Molecular Biology 57, 271-283.
Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. (1997) Gapped BLAST
and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25,
3389-3402.
Álvarez-Castro, J.M. and Carlborg, Ö. (2007) A unified model for functional and statistical epistasis and its
application in quantitative trait loci analysis. Genetics 176, 1151-1167.
Amaratunga, D. and Cabrera, J. (2004) Exploration and Analysis of DNA Microarray and Protein Array Data.
John Wiley & Sons, Inc., New York.
An, G., Watson, B.D., Stachel, S. and Gordon, M.P. (1985) New cloning vehicles for transformation of higher
plants. EMBO Journal 4, 277-284.
An, G., Jeong, D.-H., An, S., Kang, H.-G., Moon, S., Han, J., Park, S., Lee, H. S. and An, K. (2003) Activation
tagged mutants to discover novel rice genes. In: Mew, T.W., Brar, D.S., Peng, S., Dawe, D. and Hardy, B.
(eds) Rice Science: Innovations and Impact for Livelihood. Proceedings of the International Rice
Research Conference, 16-19 September 2002, Beijing, China, International Rice Research Institute,
Chinese Academy of Engineering and Chinese Academy of Agricultural Sciences, pp. 195-204.
Andersen, J.R. and Lübberstedt, T. (2003) Functional markers in plants. Trends in Plant Science 8,
554-560.
Anderson, J.A., Churchill, G.A., Autrique, J.E., Tanksley, S.D. and Sorrells, M.E. (1993) Optimizing parental
selection for genetic linkage maps. Genome 36, 181-186.
Andrews, L.B. (2002) Genes and patent policy: rethinking intellectual property rights. Nature Reviews
Genetics 3, 803-808.
Anido, F.L., Cravero, V., Asprelli, P., Firpo, T., García, S.M. and Cointry, E. (2004) Heterotic patterns in
hybrids involving cultivar-groups of summer squash, Cucurbita pepo L. Euphytica 135, 355-360.
Annicchiarico, P., Bellah, F. and Chiari, T. (2005) Defining subregions and estimating benefits for a
specific-adaptation strategy by breeding programs: a case study. Crop Science 45, 1741-1749.
Annicchiarico, P., Bellah, F. and Chiari, T. (2006) Repeatable genotype × location interaction and its
exploitation by conventional and GIS-based cultivar recommendation for durum wheat in Algeria. European
Journal of Agronomy 24, 70-81.
Antonio, B.A., Inoue, T., Kajiya, H., Nagamura, Y., Kurata, N., Minobe, Y., Yano, M., Nakagahra, M. and
Sasaki, T. (1996) Comparison of genetic distance and order of DNA markers in five populations of
rice. Genome 39, 946-956.
Arabidopsis Information Resource (2000) The Arabidopsis Information Resource (TAIR). TAIR, Stanford,
California. Available at: http://www.arabidopsis.org (accessed 17 November 2009).
Aranzana, M.J., Kim, S., Zhao, K., Bakker, E., Horton, M., Jakob, K., Lister, C., Molitor, J., Shindo, C., Tang, C.,
Toomajian, C., Traw, B., Honggang Zheng, H., Bergelson, J., Dean, C., Marjoram, P. and Nordborg, M.
(2005) Genome-wide association mapping in Arabidopsis identifies previously known flowering time
and pathogen resistance genes. PLoS Genetics 1, e60.
Arcade, A., Labourdette, A., Falque, M., Mangin, B., Chardon, F., Charcosset, A. and Joets, J. (2004) BioMercator:
integrating genetic maps and QTL towards discovery of candidate genes. Bioinformatics 20, 2324-2326.
Arcellana-Panlilio, M. (2005) Principles of application of DNA microarrays. In: Sensen, C.W. (ed.) Handbook
of Genome Research, Genomics, Proteomics, Metabolomics, Bioinformatics, Ethical and Legal
Issues. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany, pp. 239-260.
Arumuganathan, K. and Earle, E.D. (1991) Nuclear DNA content of some important plant species. Plant
Molecular Biology Reporter 9, 208-219.
Ashikari, M., Sakakibara, H., Lin, S., Yamamoto, T., Takashi, T., Nishimura, A., Angeles, R.E., Qian, Q., Kitano, H.
and Matsuoka, M. (2005) Cytokinin oxidase regulates rice grain production. Science 309, 741-745.
Ashman, K., Moran, M.F., Sicheri, F., Pawson, T. and Tyers, M. (2001) Cell signalling - the proteomics of it
all. Science's STKE. Available at: http://stke.sciencemag.org/cgi/content/full/sigtrans;2001/103/pe33
(accessed 17 November 2009).
Ashmore, S. (1997) Status Report on the Development and Application of in vitro Techniques for the
Conservation and Use of Plant Genetic Resources. Engelmann, F. (vol. ed.) International Plant
Genetic Resources Institute, Rome.
Auger, D.L., Gray, A.D., Ream, T.S., Kato, A., Coe, E.H., Jr and Birchler, J.A. (2005) Nonadditive gene
expression in diploid and triploid hybrids of maize. Genetics 169, 389-397.
Auzanneau, J., Huyghe, C., Julier, R. and Barre, P. (2007) Linkage disequilibrium in synthetic varieties of
perennial ryegrass. Theoretical and Applied Genetics 115, 837-847.
Avise, J.C. (1986) Mitochondrial DNA and the evolutionary genetics of higher animals. Philosophical
Transactions of the Royal Society of London B 312, 325-342.
Avise, J.C. (2004) Molecular Markers, Natural History and Evolution, 2nd edn. Sinauer Associates, Inc.,
Sunderland, Massachusetts.
Avraham, S., Tung, C.-W., Ilic, K., Jaiswal, P., Kellogg, E.A., Susan McCouch, S., Pujar, A., Reiser, L.,
Rhee, S.Y., Sachs, M.M., Schaeffer, M., Stein, L., Stevens, P., Vincent, L., Zapata, F. and Ware, D.
(2008) The Plant Ontology Database: a community resource for plant structure and developmental
stages controlled vocabulary and annotations. Nucleic Acids Research 36, D449-D454.
Ayele, M., Haas, B.J., Kumar, N., Wu, H., Xiao, Y., Van Aken, S., Utterback, T.R., Wortman, J.R., White, O.R.
and Town, C.D. (2005) Whole genome shotgun sequencing of Brassica oleracea and its application to
gene discovery and annotation in Arabidopsis. Genome Research 15, 487-495.
Aylor, D.L., Price, E.W. and Carbone, I. (2006) SNAP: combine and map modules for multilocus population
genetic analysis. Bioinformatics 22, 1399-1401.
Ayoub, M., Armstrong, E., Bridger, G., Fortin, M.G. and Mather, D.E. (2003) Marker-based selection in
barley for a QTL region affecting α-amylase activity of malt. Crop Science 43, 556-561.
Ayres, N.M., McClung, A.M., Larkin, P.D., Bligh, H.F.J., Jones, C.A. and Park, W.D. (1997) Microsatellites and
a single-nucleotide polymorphism differentiate apparent amylose classes in an extended pedigree of
US rice germ plasm. Theoretical and Applied Genetics 94, 773-781.
Azpiroz-Leehan, R. and Feldmann, K.A. (1997) T-DNA insertion mutagenesis in Arabidopsis: going back
and forth. Trends in Genetics 13, 152-156.
Babar, M.A., Reynolds, M.P., van Ginkel, M., Klatt, A.R., Raun, W.R. and Stone, M.L. (2006) Spectral
reflectance to estimate genetic variation for in-season biomass, leaf chlorophyll and canopy
temperature in wheat. Crop Science 46, 1046-1057.
Babar, M.A., van Ginkel, M., Klatt, A.R., Prasad, B. and Reynolds, M.P. (2007) The potential of using
spectral reflectance indices to estimate yield in wheat grown under reduced irrigation. Euphytica 150,
155-172.
Babu, R., Nair, S.K., Prasanna, B.M. and Gupta, H.S. (2004) Integrating marker assisted selection in crop
breeding - prospects and challenges. Current Science 87, 607-619.
Babu, R., Nair, S.K., Kumar, A., Venkatesh, S., Sekhar, J.C., Singh, N.N., Srinivasan, G. and Gupta, H.S.
(2005) Two-generation marker-aided backcrossing for rapid conversion of normal maize lines to
Quality Protein Maize (QPM). Theoretical and Applied Genetics 111, 888-897.
Bachem, C.W.B., van der Hoeven, R.S., de Bruijn, S.M., Vreugdenhil, D., Zabeau, M. and Visser, G.R.F.
(1996) Visualization of differential gene expression using a novel method of RNA fingerprinting
based on AFLP: analysis of gene expression during potato tuber development. The Plant Journal 9,
745-753.
Bafna, V., Gusfield, D., Lancia, G. and Yooseph, S. (2003) Haplotyping as perfect phylogeny: a direct
approach. Journal of Computational Biology 10, 323-340.
Bagge, M. and Lübberstedt, T. (2008) Functional markers in wheat: technical and economic aspects.
Molecular Breeding 22, 319-328.
Bagge, M., Xia, X. and Lübberstedt, T. (2007) Functional markers in wheat. Current Opinion in Plant Biology
10, 211-216.
Baginsky, S. and Gruissem, W. (2004) Chloroplast proteomics: potentials and challenges. Journal of
Experimental Botany 55, 1213-1220.
Baginsky, S. and Gruissem, W. (2006) Arabidopsis thaliana proteomics: from proteome to genome. Journal
of Experimental Botany 57, 1485-1491.
Baierl, A., Bogdan, M., Frommlet, F. and Futschik, A. (2006) On locating multiple interacting quantitative
trait loci in intercross designs. Genetics 173, 1693-1703.
Baisakh, N., Datta, K., Oliva, N., Ona, I., Rao, G.J.N., Mew, T.W. and Datta, S.K. (2001) Rapid
development of homozygous transgenic rice using anther culture harboring rice chitinase gene for enhanced
sheath blight resistance. Plant Biotechnology 18, 101-108.
Baker, R.J. (1986) Selection Indices in Plant Breeding. CRC Press, New York.
Bal, U. and Abak, K. (2007) Haploidy in tomato (Lycopersicon esculentum Mill.): a critical review. Euphytica
158, 1-9.
Balint-Kurti, P.J., Zwonitzer, J.C., Wisser, R.J., Carson, M.L., Oropeza-Rosas, M.A., Holland, J.B. and
Szalma, S.J. (2007) Precise mapping of quantitative trait loci for resistance to southern leaf blight,
caused by Cochliobolus heterostrophus race O and flowering time using advanced intercross maize
lines. Genetics 176, 645-657.
Balzergue, S., Dubreucq, B., Chauvin, S., Le-Clainche, I., Le Boulaire, F., de Rose, R., Samson, F.,
Biaudet, V., Lecharny, A., Cruaud, C., Weissenbach, J., Caboche, M. and Lepiniec, L. (2001) Improved
PCR-walking for large-scale isolation of plant T-DNA borders. Biotechniques 30, 496-503.
Bänziger, M., Setimela, P.S., Hodson, D. and Vivek, B. (2004) Breeding for improved drought tolerance in
maize adapted to southern Africa. In: New Directions for a Diverse Planet, Proceedings of the 4th
International Crop Science Congress, 26 September-1 October 2004, Brisbane, Australia. Published
on CD-ROM. Available at: http://www.cropscience.org.au/icsc2004 (accessed 17 November 2009).
Bänziger, M., Setimela, P.S., Hodson, D. and Vivek, B. (2006) Breeding for improved abiotic stress
tolerance in maize adapted to southern Africa. Agricultural Water Management 80, 212-224.
Bao, J.B., Lee, S., Chen, C., Zhang, X.-Q., Zhang, Y., Liu, S.-Q., Clark, T., Wang, J., Cao, M.-L., Yang,
H.-M., Wang, S.M. and Yu, J. (2005) Serial analysis of gene expression study of a hybrid rice strain
(LYP9) and its parental cultivars. Plant Physiology 138, 1216-1231.
Barclay, I.R. (1975) High frequencies of haploid production in wheat (Triticum aestivum) by chromosome
elimination. Nature 256, 410-411.
Bard, J.B.L. and Rhee, S.Y. (2004) Ontologies in biology: design, applications and future challenges. Nature
Reviews Genetics 5, 213-222.
Bar-Hen, A., Charcosset, A., Bourgoin, M. and Guiard, J. (1995) Relationship between genetic markers
and morphological traits in a maize inbred lines collection. Euphytica 84, 145-154.
Barrett, J.C., Fry, B., Maller, J. and Daly, M.J. (2005) Haploview: analysis and visualization of LD and haplotype
maps. Bioinformatics 21, 263–265.
Barrett, S.C.H. and Kohn, J.R. (1991) Genetic and evolutionary consequences of small population size in
plants: implications for conservation. In: Falk, D.A. and Holsinger, K.E. (eds) Genetics and Conservation
of Rare Plants. Oxford University Press, Oxford, UK, pp. 3–30.
Barro, F., Cannell, M.E., Lazzeri, P.A. and Barcelo, P. (1998) The influence of auxins on transformation of
wheat and tritordeum and analysis of transgene integration patterns in transformants. Theoretical and
Applied Genetics 97, 684–695.
Bartlett, J.M.S. (2002) Approaches to the analysis of gene expression using mRNA – a technical overview.
Molecular Biotechnology 21, 149–160.
Barton, J. (2000) Reforming the patent system. Science 287, 1933–1934.
Barton, N.H. and Keightley, P.D. (2002) Understanding quantitative genetic variation. Nature Reviews
Genetics 3, 11–21.
Barua, U.M., Chalmers, K.J., Hackett, C.A., Thomas, W.T., Powell, W. and Waugh, R. (1993) Identification
of RAPD markers linked to a Rhynchosporium secalis resistance locus in barley using near-isogenic
lines and bulked segregant analysis. Heredity 71, 177–184.
Beaujean, A., Sangwan, R.S., Hodges, M. and Sangwan-Norreel, B.S. (1998) Effect of ploidy and homozygosity
on transgene expression in primary tobacco transformants and their androgenetic progenies.
Molecular and General Genetics 260, 362–371.
Beavis, W.D. (1994) The power and deceit of QTL experiments: lessons from comparative QTL studies. In:
49th Annual Corn and Sorghum Industry Research Conference. American Seed Trade Association,
Washington, DC, pp. 250–266.
Beavis, W.D. (1998) QTL analyses: power, precision and accuracy. In: Paterson, A.H. (ed.) Molecular
Dissection of Complex Traits. CRC Press, Boca Raton, Florida, pp. 145–162.
Beavis, W.D. (1999) QTL mapping in plant breeding populations. Patent EP 1042507.
Beavis, W.D. and Keim, P. (1996) Identification of QTL that are affected by environment. In: Kang, M.S. and
Gauch, H.G. (eds) Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida, pp. 123–149.
Beavis, W.D., Grant, D., Albertson, M. and Fincher, R. (1991) Quantitative trait loci for plant height in
four maize populations and their associations with qualitative genetic loci. Theoretical and Applied
Genetics 83, 141–145.
Beck von Bodman, S., Domier, L.L. and Farrand, S.K. (1995) Expression of multiple eukaryotic genes from
a single promoter in Nicotiana. Bio/Technology 13, 587–591.
Beckert, M. (1994) Advantages and disadvantages of the use of in vitro/in situ produced DH maize plants.
In: Bajaj, Y.P.S. (ed.) Biotechnology in Agriculture and Forestry, Vol. 25. Springer-Verlag, Berlin, pp.
201–213.
Beckmann, J.S. and Soller, M. (1986a) Restriction fragment length polymorphisms in plant genetic improvement.
Oxford Surveys of Plant Molecular and Cell Biology 3, 196–250.
Beckmann, J.S. and Soller, M. (1986b) Restriction fragment length polymorphisms and genetic improvement
of agricultural species. Euphytica 35, 111–124.
Bedell, J.A., Budiman, M.A., Nunberg, A., Citek, R.W., Robbins, D., Jones, J., Flick, E., Rohlfing, T., Fries, J.,
Bradford, K., McMenamy, J., Smith, M., Holeman, H., Roe, B.A., Wiley, G., Korf, I.F., Rabinowicz, P.D.,
Lakey, N., McCombie, W.R., Jeddeloh, J.A. and Martienssen, R.A. (2005) Sorghum genome sequencing
by methylation filtration. PLoS Biology 3, 0103–0115.
Beer, S.C., Siripoonwiwat, W., O'Donoughue, L.S., Souza, E., Matthews, D. and Sorrells, M.E. (1997)
Associations between molecular markers and quantitative traits in a germplasm pool: can we infer
linkages? Journal of Agricultural Genomics 3. Available at: http://www.ncgr.org/research/jag/papers97/
paper197/indexp197.html (last accessed 31 December 2007).
Bekaert, S., Storozhenko, S., Mehrshahi, P., Bennett, M.J., Lambert, W., Gregory, J.F. III, Schubert, K.,
Hugenholtz, J., van der Straeten, D. and Hanson, A.D. (2008) Folate biofortification in food plants.
Trends in Plant Science 13, 28–35.
Benchimol, L.L., de Souza, C.L., Jr, Garcia, A.F.F., Kono, P.M.S., Mangolin, C.A., Barbosa, A.M.M., Coelho,
A.S.G. and de Souza, A.P. (2000) Genetic diversity in tropical maize inbred lines: heterotic group
assignment and hybrid performance determined by RFLP markers. Plant Breeding 119, 491–496.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful
approach to multiple testing. Journal of the Royal Statistical Society, Series B 57, 289–300.
Bennet, S.T., Barnes, C., Cox, A., Davies, L. and Brown, C. (2005) Toward the 1,000 dollar human genome.
Pharmacogenomics 6, 373–382.
Bennett, M.D., Finch, R.A. and Barclay, I.R. (1976) The time rate and mechanism of chromosome elimination
in Hordeum hybrids. Chromosoma 54, 175–200.
Bennetzen, J.L. (1996) The use of comparative genome mapping in the identification, cloning and manipulation
of important plant genes. In: Sobral, B.W.S. (ed.) The Impact of Plant Molecular Genetics.
Birkhäuser, Boston, Massachusetts, pp. 71–85.
Bennetzen, J.L. and Ma, J. (2003) The genetic colinearity of rice and other cereals on the basis of genomic
sequence analysis. Current Opinion in Plant Biology 6, 128–133.
Bennetzen, J.L. and Ramakrishna, W. (2002) Numerous small rearrangements of gene content, order and
orientation differentiate grass genomes. Plant Molecular Biology 48, 821–827.
Benson, E.E. (1990) Free Radical Damage in Stored Plant Germplasm. International Board for Plant
Genetic Resources (IBPGR), Rome.
Bent, A.F. (2000) Arabidopsis in planta transformation. Uses, mechanisms and prospects for transformation
of other species. Plant Physiology 124, 1540–1547.
Bernacchi, D., Beck-Bunn, T., Emmatty, D., Eshed, Y., Inai, S., Lopez, J., Petiard, V., Sayama, H., Uhlig, J.,
Zamir, D. and Tanksley, S.D. (1998a) Advanced backcross QTL analysis of tomato: II. Evaluation of
near-isogenic lines carrying single-donor introgressions for desirable wild QTL-alleles derived from
Lycopersicon hirsutum and L. pimpinellifolium. Theoretical and Applied Genetics 97, 170–180.
Bernacchi, D., Beck-Bunn, T., Eshed, Y., Lopez, J., Petiard, V., Uhlig, J., Zamir, D. and Tanksley, S.D. (1998b)
Advanced backcross QTL analysis in tomato. I. Identification of QTLs for traits of agronomic importance
from Lycopersicon hirsutum. Theoretical and Applied Genetics 97, 381–397.
Bernardo, R. (1991) Retrospective index weights used in multiple trait selection in a maize breeding program.
Crop Science 31, 1174–1179.
Bernardo, R. (1992) Relationship between single-cross performance and molecular marker heterozygosity.
Theoretical and Applied Genetics 83, 628–634.
Bernardo, R. (1993) Estimation of coefficient of coancestry using molecular markers in maize. Theoretical
and Applied Genetics 85, 1055–1062.
Bernardo, R. (1994) Prediction of maize single-cross performance using RFLPs and information from
related hybrids. Crop Science 34, 20–25.
Bernardo, R. (1996) Best linear unbiased prediction of maize single-cross performance. Crop Science 36,
50–56.
Bernardo, R. (1999) Best linear unbiased predictor analysis. In: Coors, J.G. and Pandey, S. (eds) Genetics
and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison, Wisconsin, pp. 269–276.
Bernardo, R. (2001) What if we knew all the genes for a quantitative trait in hybrid crops? Crop Science
41, 1–4.
Bernardo, R. (2002) Breeding for Quantitative Traits in Plants. Stemma Press, Woodbury, Minnesota,
369 pp.
Bernardo, R. (2004) What proportion of declared QTL in plants are false? Theoretical and Applied Genetics
109, 419–424.
Bernardo, R. (2008) Molecular markers and selection for complex traits in plants: learning from the last 20
years. Crop Science 48, 1649–1664.
Bernardo, R. and Yu, J. (2007) Prospects for genomewide selection for quantitative traits in maize. Crop
Science 47, 1082–1090.
Bernot, A. (2004) Genome, Transcriptome and Protein Analysis. John Wiley & Sons, Ltd, Chichester, UK.
Betrán, F.J., Ribaut, J.M., Beck, D. and Gonzalez de León, D. (2003) Genetic diversity, specific combining
ability and heterosis in tropical maize under stress and nonstress environments. Crop Science 43,
797–806.
Bevan, M. (1984) Binary Agrobacterium vectors for plant transformation. Nucleic Acids Research 12,
8711–8721.
Bhave, S.V., Hombaker, C., Phang, T.L., Saba, L., Lapadat, R., Kechris, K., Gaydos, J., McGoldrick, D.,
Dolbey, A., Leach, S., Soriano, B., Ellington, A., Ellington, E., Jones, K., Mangion, J., Belknap, J.K.,
Williams, R.W., Hunter, L.E., Hoffman, P.L. and Tabakoff, B. (2007) The PhenoGen informatics web-
site: tools for analyses of complex traits. BMC Genetics 8, 59.
Bhojwani, S.S. (ed.) (1990) Plant Tissue Culture: Applications and Limitations. Elsevier Science Publishers,
The Netherlands.
Biber-Klemm, S. and Cottier, T. (2006) (eds) Rights to Plant Genetic Resources and Traditional Knowledge:
Basic Issues and Perspectives. CAB International, Wallingford, UK, 448 pp.
Bidinger, F.R., Serraj, R., Rizvi, S.M.H., Howarth, C., Yadav, R.S. and Hash, C.T. (2005) Field evaluation of
drought tolerance QTL effects on phenotype and adaptation in pearl millet (Pennisetum glaucum (L.)
R. Br.) top cross hybrids. Field Crops Research 94, 14–32.
Bijlsma, R., Allard, R.W. and Kahler, A.L. (1986) Nonrandom mating in an open-pollinated maize population.
Genetics 112, 669–680.
Bingham, P.M., Levis, R. and Rubin, G.M. (1981) Cloning of DNA sequences from the white locus of
Drosophila melanogaster by a general and novel method. Cell 25, 693–704.
Bink, M.C.A.M. and Meuwissen, T. (2004) Fine mapping of quantitative trait loci using linkage disequilibrium
in inbred plant populations. Euphytica 137, 95–99.
Birchler, J.A., Auger, D.L. and Riddle, N.C. (2003) In search of the molecular basis of heterosis. The Plant
Cell 15, 2236–2239.
Birney, E., Thompson, J.D. and Gibson, T.J. (1996) PairWise and SearchWise: finding the optimal alignment
in a simultaneous comparison of a protein profile against all DNA translation frames. Nucleic Acids
Research 24, 2730–2739.
Biswas, S., Storey, J.D. and Akey, J.M. (2008) Mapping gene expression quantitative trait loci by singular
value decomposition and independent component analysis. BMC Bioinformatics 9, 244.
Bizily, S.P., Rugh, C.L. and Meagher, R.B. (2000) Phytodetoxification of hazardous organomercurials by
genetically engineered plants. Nature Biotechnology 18, 213–217.
Blakeslee, A.F. and Avery, A.H. (1937) Methods of inducing chromosome doubling in plants. Journal of
Heredity 28, 393–411.
Blanc, G., Charcosset, A., Mangin, B., Gallais, A. and Moreau, L. (2006) Connected populations for detecting
quantitative trait loci and testing for epistasis: an application in maize. Theoretical and Applied
Genetics 113, 206–224.
Blanchard, J.L. (2004) Bioinformatics and systems biology, rapidly evolving tools for interpreting plant
response to global change. Field Crops Research 90, 117–131.
Blanco, A., Lotti, C., Simeone, R., Signorile, A., De-Santis, V., Pasqualone, A., Troccoli, A. and Di-Fonzo, N.
(2001) Detection of quantitative trait loci for grain yield and yield components across environments in
durum wheat. Cereal Research Communications 29, 237–244.
Bligh, H.F.J., Till, R.I. and Jones, C.A. (1995) A microsatellite sequence closely linked to the waxy gene of
Oryza sativa. Euphytica 86, 83–85.
Blow, N. (2008) Mass spectrometry and proteomics: hitting the mark. Nature Methods 5, 741–747.
Bochner, B.R. (1989) Sleuthing out bacterial identities. Nature 339, 157–158.
Bochner, B.R. (2003) New technologies to assess genotype–phenotype relationships. Nature Reviews
Genetics 4, 309–314.
Boer, M.P., ter Braak, C.J.F. and Jansen, R.C. (2002) A penalized likelihood method for mapping epistatic
quantitative trait loci with one-dimensional genome searches. Genetics 162, 951–960.
Bogyo, T.P., Lance, R.C.M., Chevalier, P. and Nilan, P.A. (1988) Genetic models for quantitatively inherited
endosperm characters. Heredity 60, 61–67.
Bohanec, B., Jakse, M. and Havey, M.J. (2003) Genetic analysis of gynogenetic haploid production in
onion. Journal of American Horticulture Science 128, 571–574.
Bollen, K.A. (1989) Structural Equations with Latent Variables. John Wiley & Sons, New York.
Bonnet, D.G., Rebetzke, G.J. and Spielmeyer, W. (2005) Strategies for efficient implementation of molecular
markers in wheat breeding. Molecular Breeding 15, 75–85.
Boppenmaier, J., Melchinger, A.E., Seitz, G., Geiger, H.H. and Herrmann, R.G. (1993) Genetic diversity
for RFLPs in European maize inbreds. III. Performance of crosses within versus between heterotic
groups for grain traits. Plant Breeding 111, 217–226.
Borevitz, J.O. and Ecker, J.R. (2004) Plant genomics: the third wave. Annual Review of Genomics and
Human Genetics 5, 443–477.
Borevitz, J.O., Maloof, J.N., Lutes, J., Dabi, T., Redfern, J.L., Trainer, G.T., Werner, J.D., Asami, T., Berry,
C.C., Weigel, D. and Chory, J. (2002) Quantitative trait loci controlling light and hormone response in
two accessions of Arabidopsis thaliana. Genetics 160, 683–696.
Borevitz, J.O., Liang, D., Plouffe, D., Chang, H.S., Zhu, T., Weigel, D., Berry, C.C., Winzeler, E. and Chory, J.
(2003) Large-scale identification of single-feature polymorphisms in complex genomes. Genome
Research 13, 513–523.
Borevitz, J.O., Hazen, S.P., Michael, T.P., Morris, G.P., Baxter, I.R., Hu, T.T., Chen, H., Werner, J.D.,
Nordborg, M., Salt, D.E., Kay, S.A., Chory, J., Weigel, D., Jones, J.D.G. and Ecker, J.R. (2007)
Genome-wide patterns of single-feature polymorphism in Arabidopsis thaliana. Proceedings of the
National Academy of Sciences of the United States of America 104, 12057–12062.
Borlaug, N.E. (1972) The Green Revolution, Peace and Humanity. CIMMYT Reprint and Translation Series
No. 3, International Maize and Wheat Improvement Center, Mexico DF.
Borlaug, N.E. (2000) Ending world hunger. The promise of biotechnology and the threat of antiscience
zealotry. Plant Physiology 124, 487–490.
Borlaug, N.E. (2001) Feeding the world in the 21st century: the role of agricultural science and technology.
Speech given at Tuskegee University, April 2001. Available at: http://www.agbioworld.org/biotech-info/
topics/borlaug/borlaugspeech.html (accessed 17 November 2009).
Borrell, A.K., Hammer, G.L. and van Oosterom, E. (2001) Staygreen: a consequence of the balance
between supply and demand for nitrogen during grain filling? Annals of Applied Biology 138, 91–95.
Botstein, D.R., White, R.L., Skolnick, M. and Davis, R.W. (1980) Construction of a genetic linkage map in man
using restriction fragment length polymorphisms. American Journal of Human Genetics 32, 314–331.
Boumedine, K.S. and Rodolakis, A. (1998) AFLP allows the identification of genomic markers of ruminant
Chlamydia psittaci strains useful for typing and epidemiological studies. Research in Microbiology
149, 735–744.
Bourgault, R., Zulak, K.G. and Facchini, P.J. (2005) Applications of genomics in plant biology. In: Sensen,
C.W. (ed.) Handbook of Genome Research, Genomics, Proteomics, Metabolomics, Bioinformatics,
Ethical and Legal Issues. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany, pp. 59–80.
Bowers, J.E., Abbey, C., Anderson, S., Chang, C., Draye, X., Hoppe, A.H., Jessup, R., Lemke, C.,
Lennington, J., Li, Z., Lin, Y.-R., Liu, S.-C., Luo, L., Marler, B.S., Ming, R., Mitchell, S.E., Qiang, D.,
Reischmann, K., Schulze, S.R., Skinner, D.N., Wang, Y.-W., Kresovich, S., Schertz, K.F. and Paterson,
A.H. (2003a) A high-density genetic recombination map of sequence-tagged site for Sorghum, as
a framework for comparative structural and evolutionary genomics of tropical grains and grasses.
Genetics 165, 367–386.
Bowers, J.E., Chapman, B.A., Rong, J. and Paterson, A.H. (2003b) Unravelling angiosperm genome evolution
by phylogenetic analysis of chromosomal duplication events. Nature 422, 433–438.
Bowman, J.G.P., Blake, T.K., Surber, L.M.M., Habernicht, D.K. and Bockelman, H. (2001) Feed-quality
variation in the barley core collection of the USDA National Small Grains Collection. Theoretical and
Applied Genetics 41, 863–870.
Boyd, M.R. (1996) The position of intellectual property rights in drug discovery and development from
natural products. Journal of Ethnopharmacology 51, 17–27.
Boyer, J.S. (1982) Plant productivity and environment. Science 218, 443–448.
Bracha-Drori, K., Shichrur, K., Katz, A., Oliva, M., Angelovici, R., Yalovsky, S. and Ohad, N. (2004) Detection
of protein–protein interactions in plants using bimolecular fluorescence complementation. The Plant
Journal 40, 419–427.
Bradshaw, A.D. (1965) Evolutionary significance of phenotypic plasticity in plants. Advances in Genetics
13, 115–155.
Bradshaw, H.D., Jr and Stettler, R.F. (1995) Molecular genetics of growth and development in Populus. IV. Mapping
QTLs with large effects on growth, form and phenology traits in a forest tree. Genetics 139, 963–973.
Brancourt-Hulmel, M. (1999) Crop diagnosis and probe genotypes for interpreting genotype environment
interaction in winter wheat trials. Theoretical and Applied Genetics 99, 1018–1030.
Branton, D., Deamer, D.W., Marziali, A., Bayley, H., Benner, S.A., Butler, T., Ventra, M.D., Garaj, S., Hibbs, A.,
Huang, X., Jovanovich, S.B., Krstic, P.S., Lindsay, S., Ling, X.S., Mastrangelo, C.H., Meller, A., Oliver,
J.S., Pershin, Y.V., Ramsey, J.M., Riehn, R., Soni, G.V., Tabard-Cossa, V., Wanunu, M., Wiggin, M. and
Schloss, J.A. (2008) The potential and challenges of nanopore sequencing. Nature Biotechnology 26,
1146–1153.
Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge,
W., Ball, C.A., Causton, H.C., Gaasterland, T., Glenisson, P., Holstege, F.C., Kim, I.F., Markowitz, V.,
Matese, J.C., Parkinson, H., Robinson, A., Sarkans, U., Schulze-Kremer, S., Stewart, J., Taylor, R.,
Vilo, J. and Vingron, M. (2001) Minimum information about a microarray experiment (MIAME) – toward
standards for microarray data. Nature Genetics 29, 365–371.
Breitling, R., Pitt, A.R. and Barrett, M.P. (2006) Precision mapping of the metabolome. Trends in
Biotechnology 24, 543–548.
Brem, R.B. and Kruglyak, L. (2005) The landscape of genetic complexity across 5,700 gene expression
traits in yeast. Proceedings of the National Academy of Sciences of the United States of America 102,
1572–1577.
Brem, R.B., Yvert, G., Clinton, R. and Kruglyak, L. (2002) Genetic dissection of transcriptional regulation in
budding yeast. Science 296, 752–755.
Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D.H., Johnson, D., Luo, S., McCurdy, S., Foy, M.,
Ewan, M., Roth, R., George, D., Eletr, S., Albrecht, G., Vermaas, E., Williams, S.R., Moon, K.,
Burcham, T., Pallas, M., DuBridge, R.B., Kirchner, J., Fearon, K., Mao, J.-I. and Corcoran, K. (2000)
Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays.
Nature Biotechnology 18, 630–634.
Breseghello, F. and Sorrells, M.E. (2006a) Association mapping of kernel size and milling quality in wheat
(Triticum aestivum L.) cultivars. Genetics 172, 1165–1177.
Breseghello, F. and Sorrells, M.E. (2006b) Association analysis as a strategy for improvement of quantitative
traits in plants. Crop Science 46, 1323–1330.
Bretting, P. and Duvick, D. (1997) Dynamic conservation of plant genetic resources. Advances in Agronomy
61, 1–51.
Bretting, P.K. and Goodman, M.M. (1989) Genetic variation in crop plants and management of germplasm
collections. In: Stalker, H.T. and Chapman, C. (eds) Scientific Management of Germplasm:
Characterization, Evaluation and Enhancement. International Board for Plant Genetic Resources
(IBPGR) Training Courses: Lecture Series 2. Department of Crop Science, North Carolina State
University, Raleigh, North Carolina and IBPGR, Rome, pp. 41–54.
Bretting, P.K. and Widrlechner, M.P. (1995) Genetic markers and plant genetic resource management.
Plant Breeding Reviews 13, 11–86.
Brick, M.A., Byrne, P.F., Schwartz, H.F., Ogg, J.B., Otto, K., Fall, A.L. and Gilbert, J. (2006) Reaction to
three races of Fusarium wilt in the Phaseolus vulgaris core collection. Crop Science 46, 1245–1252.
Briggs, R.N. and Knowles, P.F. (1967) Introduction to Plant Breeding. Reinhold Books, New York.
Broman, K.W. (1997) Identifying quantitative trait loci in experimental crosses. PhD thesis, Department of
Statistics, University of California, Berkeley.
Broman, K.W. (2005) The genomes of recombinant inbred lines. Genetics 169, 1133–1146.
Broman, K.W., Churchill, G.A., Yandell, B.S. and Zeng, Z.B. (2003a) Statistical methods for mapping
quantitative trait loci in experimental crosses. Available at: http://www.stat.wisc.edu/yandell/statgen
(accessed 17 November 2009).
Broman, K.W., Wu, H., Sen, S. and Churchill, G.A. (2003b) R/qtl: QTL mapping in experimental crosses.
Bioinformatics 19, 889–890.
Brondani, C., Rangel, N., Brondani, V. and Ferreira, E. (2002) QTL mapping and introgression of yield-
related traits from Oryza glumaepatula to cultivated rice (Oryza sativa) using microsatellite markers.
Theoretical and Applied Genetics 104, 1192–1203.
Brookes, G. and Barfoot, P. (2008) GM Crops: Global Socio-economic and Environmental Impacts 1996–2006.
PG Economics, Dorchester, UK.
Broothaerts, W., Mitchell, H.J., Weir, B., Kaines, S., Smith, L.M.A., Yang, W., Mayer, J.E., Roa-
Rodriguez, C. and Jefferson, R.A. (2005) Gene transfer to plants by diverse species of bacteria.
Nature 433, 629–633.
Brown, A.H.D. (1989a) The case for core collections. In: Brown, A.H.D., Frankel, O.H., Marshall, R.D. and
Williams, J.T. (eds) The Use of Plant Genetic Resources. Cambridge University Press, Cambridge,
UK, pp. 136–156.
Brown, A.H.D. (1989b) Core collection: a practical approach to genetic resources management. Genome
31, 818–824.
Brown, A.H.D. and Brubaker, C.L. (2002) Indicators for sustainable management of plant genetic resources:
how well are we doing? In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T.
(eds) Managing Plant Genetic Diversity. International Plant Genetics Resources Institute (IPGRI),
Rome, pp. 249–262.
Brown, A.H.D. and Weir, B.S. (1983) Measuring genetic variability in plant populations. In: Tanksley, S.D. and
Orton, T.J. (eds) Isozymes in Plant Genetics and Breeding, Vol. 1A. Developments in Plant Genetics
and Breeding 1. Elsevier, Amsterdam, pp. 219–240.
Brown, G.G., Formanova, N., Jin, H., Wargachuk, R., Dondy, C., Patil, P., Laforest, M., Zhang, J., Cheung,
W.Y. and Landry, B.S. (2003) The radish Rfo restorer gene of Ogura cytoplasmic male sterility encodes
a protein with multiple pentatricopeptide repeat. The Plant Journal 35, 262–272.
Brown, P.J., Rooney, W.L., Franks, C. and Kresovich, S. (2008) Efficient mapping of plant height quantitative
trait loci in a sorghum association population with introgressed dwarfing genes. Genetics 180,
629–637.
Brown, S.D. and Peters, J. (1996) Combining mutagenesis and genomics in the mouse – closing the
phenotype gap. Trends in Genetics 12, 433–435.
Brown, S.M. and Kresovich, S. (1996) Molecular characterization for plant genetic resources conservation.
In: Paterson, A.H. (ed.) Genome Mapping in Plants. R.G. Landes Co., Austin, Texas, pp. 85–93.
Brown, T.A. (2002) Genomics, 2nd edn. Wiley-Liss, Wilmington, Delaware, pp. 125–159.
Brownstein, M.J., Carpten, J.D. and Smith, J.R. (1996) Modulation of non-templated nucleotide addition by
Taq DNA polymerase: primer modifications that facilitate genotyping. Biotechniques 20, 1004–1006.
Brueggeman, R., Rostoks, N., Kudrna, D., Kilian, A., Han, F., Chen, J., Druka, A., Steffenson, B. and
Kleinhofs, A. (2002) The barley stem rust-resistance gene Rpg1 is a novel disease-resistance gene
with homology to receptor kinases. Proceedings of the National Academy of Sciences of the United
States of America 99, 9328–9333.
Brummer, E.C. (1999) Capturing heterosis in forage crop cultivar development. Crop Science 39, 943–954.
Brummer, E.C. (2006) Breeding for cropping systems. In: Lamkey, K.R. and Lee, M. (eds) Plant Breeding:
the Arnel R. Hallauer International Symposium. Blackwell Publishing, Oxford, UK, pp. 97–106.
Brunner, S., Keller, B. and Feuillet, C. (2003) A large rearrangement involving genes and low copy DNA
interrupts the micro-collinearity between rice and barley at the Rph7 locus. Genetics 164, 673–683.
Bruskiewich, R., Senger, M., Davenport, G., Ruiz, M., Rouard, M., Hazekamp, T., Takeya, M., Doi, K.,
Satoh, K., Costa, M., Simon, R., Balaji, J., Akintunde, A., Mauleon, R., Wanchana, S., Shah, T.,
Anacleto, M., Portugal, A., Ulat, V.J., Thongjuea, S., Braak, K., Ritter, S., Dereeper, A., Skofic, M.,
Rojas, E., Martins, N., Pappas, G., Alamban, R., Almodiel, R., Barboza, L.H., Detras, J., Manansala,
K., Mendoza, M.J., Morales, J., Peralta, B., Valerio, R., Zhang, Y., Gregorio, S., Hermocilla, J.,
Echavez, M., Yap, J.M., Farmer, A., Schiltz, G., Lee, J., Casstevens, T., Jaiswal, P., Meintjes, A.,
Wilkinson, M., Good, B., Wagner, J., Morris, J., Marshall, D., Collins, A., Kikuchi, S., Metz, T., McLaren, G.
and van Hintum, T. (2008) The Generation Challenge Programme platform: semantic standards and
workbench for crop science. International Journal of Plant Genomics, Article ID 369601, 6 pages.
Available at: http://www.hindawi.com/journals/ijpg/2008/369601.html (accessed 17 November 2009).
Buchanan, B., Gruissem, W. and Jones, R.L. (eds) (2002) Biochemistry and Molecular Biology of Plants.
John Wiley & Sons Inc., Chichester, UK.
Buckingham, S.D. (2008) Scientific software: seeing the SNPs between us. Nature Methods 5, 903–908.
Buckler, E.S. IV and Thornsberry, J.M. (2002) Plant molecular diversity and applications to genomics.
Current Opinion in Plant Biology 5, 107–111.
Buckler, E.S., Holland, J.B., Bradbury, P.J., Acharya, C.B., Brown, P.J., Browne, C., Ersoz, E., Flint-Garcia,
S., Garcia, A., Glaubitz, J.C., Goodman, M.M., Harjes, C., Guill, K., Kroon, D.E., Larsson, S., Lepak,
N.K., Huihui Li, H., Mitchell, S.E., Pressoir, G., Pfeiffer, J.A., Oropeza Rosas, M., Rocheford, T.R.,
Cinta Romay, M., Romero, S., Salvo, S., Sanchez-Villeda, H., Sofia da Silva, H., Qi Sun, Q., Tian, F.,
Upadyayula, N., Ware, D., Yates, H., Yu, J., Zhang, Z., Kresovich, S. and McMullen, M.D. (2009) The
genetic architecture of maize flowering time. Science 325, 714–718.
Burgueño, J., Crossa, J., Cornelius, P.L. and Yang, R.-C. (2008) Using factor analytic models for joining environment
and genotypes without crossover genotype × environment interaction. Crop Science 48, 1291–1305.
Burns, J., Fraser, P.D. and Bramley, P.M. (2003) Identification and quantification of carotenoids, tocopherols
and chlorophylls in commonly consumed fruits and vegetables. Phytochemistry 62, 939–947.
Burr, B. and Burr, F.A. (1991) Recombinant inbreds for molecular mapping in maize: theoretical and practical
considerations. Trends in Genetics 7, 55–60.
Burr, B., Burr, F.A., Thompson, K.H., Albertson, M.C. and Stuber, C.W. (1988) Gene mapping with recombinant
inbreds in maize. Genetics 118, 519–526.
Burton, G.W. (1981) Meeting human needs through plant breeding: past progress and prospects for the
future. In: Frey, K.J. (ed.) Plant Breeding II. Iowa State University Press, Ames, Iowa, pp. 433–466.
Busch, W. and Lohmann, J.U. (2007) Profiling a plant: expression analysis in Arabidopsis. Current Opinion
in Plant Biology 10, 136–141.
Büschges, R., Hollricher, K., Panstruga, R., Simons, G., Wolter, M., Frijters, A., van Daelen, R., van der
Lee, T., Diergaarde, P., Groenendijk, J., Töpsch, S., Vos, P., Salamini, F. and Schulze-Lefert, P. (1997)
The barley Mlo gene: a novel control element of plant pathogen resistance. Cell 88, 695–705.
Busso, C.S., Liu, C.J., Hash, C.T., Witcombe, J.R., Devos, K.M., deWet, J.M.J. and Gale, M.D. (1995)
Analysis of recombination rate in female and male gametogenesis in pearl millet (Pennisetum glaucum)
using RFLP markers. Theoretical and Applied Genetics 90, 242–246.
Bustamam, M., Tabien, R.E., Suwarmo, A., Abalos, M.C., Kadir, T.S., Ona, I., Bernardo, M., VeraCruz, C.M.
and Leung, H. (2002) Asian rice biotechnology network: improving popular cultivars through marker-
assisted backcrossing by the NARES. Abstract of International Rice Congress, 16–22 September
2002, Beijing China. Available at: http://www.irri.org/irc2002/index.htm (last accessed 31 December
2007).
Butlin, R.K. and Tregenza, T. (1998) Levels of genetic polymorphism: marker loci versus quantitative traits.
Philosophical Transactions of the Royal Society of London B 353, 1–12.
Byrum, J. and Reiter, R. (1998) A method for identifying genetic marker loci associated with trait loci. Patent
EP 0972076.
Caetano-Anollés, G., Bassam, B.J. and Gresshoff, P.M. (1991) DNA amplification fingerprinting using very
short arbitrary oligonucleotide primers. Bio/Technology 9, 553–557.
Caicedo, A.L. and Purugganan, M.D. (2005) Comparative plant genomics. Frontiers and prospects. Plant
Physiology 138, 545–547.
Caliński, T., Kaczmarek, Z., Krajewski, P., Frova, C. and Sari-Gorla, M. (2000) A multivariate approach to
the problem of QTL localization. Heredity 84, 303–310.
Campbell, B.T., Baenziger, P.S., Gill, K.S., Eskridge, K.M., Budak, H., Erayman, M., Dweikat, I. and Yen, Y.
(2003) Identification of QTLs and environmental interactions associated with agronomic traits on chromosome
3A of wheat. Crop Science 43, 1493–1505.
Campbell, B.T., Baenziger, P.S., Eskridge, K.M., Budak, H., Streck, N.A., Weiss, A., Gill, K.S. and Erayman,
M. (2004) Using environmental covariates to explain genotype × environment and QTL × environment
interactions for agronomic traits on chromosome 3A of wheat. Crop Science 44, 620–627.
Campbell, M.A., Zhu, W., Jiang, N., Lin, H., Ouyang, S., Childs, K.L., Haas, B.J., Hamilton, J.P. and Buell,
C.R. (2007) Identification and characterization of lineage-specific genes within the Poaceae. Plant
Physiology 145, 1311–1322.
Candela, H. and Hake, S. (2008) The art and design of genetic screens: maize. Nature Reviews Genetics
9, 192–203.
Cardon, L.R. and Bell, J.I. (2001) Association study designs for complex diseases. Nature Reviews Genetics
2, 91–98.
Carlborg, Ö. and Andersson, L. (2002) Use of randomization testing to detect multiple epistatic QTLs.
Genetical Research 79, 175–184.
Carlborg, Ö. and Haley, C.S. (2004) Epistasis: too often neglected in complex trait studies? Nature Reviews
Genetics 5, 618–625.
Carlborg, Ö., Andersson, L. and Kinghorn, B. (2000) The use of a genetic algorithm for simultaneous mapping
of multiple interacting quantitative trait loci. Genetics 155, 2003–2010.
Carlborg, Ö., Brockmann, G.A. and Haley, C.S. (2005) Simultaneous mapping of epistatic QTL in DU6i ×
DBA/2 mice. Mammalian Genome 16, 481–494.
Carninci, P. and Hayashizaki, Y. (1999) High-efficiency full-length cDNA cloning. Methods in Enzymology
303, 19–44.
Carpenter, A.E. and Sabatini, D.M. (2004) Systematic genome-wide screens of gene function. Nature
Reviews Genetics 5, 11–22.
Cartwright, D.A., Troggio, M., Velasco, R. and Gutin, A. (2007) Genetic mapping in the presence of genotyping
errors. Genetics 176, 2521–2537.
Casali, V.W.D. and Tigchelaar, E.C. (1975) Computer simulation studies comparing pedigree, bulk and single
seed descent selection in self-pollinated populations. Journal of American Society of Horticulture
Science 100, 364–367.
Casasoli, M., Derory, J., Morera-Dutrey, C., Brendel, O., Porth, I., Guehl, J.-M., Villani, F. and Kremer, A.
(2006) Comparison of quantitative trait loci for adaptive traits between oak and chestnut based on an
expressed sequence tag consensus map. Genetics 172, 533–546.
Caskey, T. and Edwards, A. (1992) DNA typing with short tandem repeat polymorphisms and identification
of polymorphic short tandem repeats. Patent EP 0639228.
Castle, W.E. (1921) On a method of estimating the number of genetic factors concerned in cases of blending
inheritance. Science 54, 93–96.
Causier, B., Graham, J. and Davis, B. (2005) Large-scale yeast two-hybrid analysis. In: Leister, D. (ed.)
Plant Functional Genomics. Food Products Press, New York, pp. 119–135.
Causse, M.A., Fulton, T.M., Cho, Y.G., Ahn, S.N., Chunwongse, J., Wu, K., Xiao, J., Yu, Z., Ronald, P.C.,
Harrington, S.E., Second, G., McCouch, S.R. and Tanksley, S.D. (1994) Saturated molecular map of
the rice genome based on an interspecific backcross population. Genetics 138, 1251–1274.
Cavalli-Sforza, L.L. and Edwards, A.W.F. (1967) Phylogenetic analysis: models and estimation procedures.
American Journal of Human Genetics 19, 233–257.
Ceccarelli, S. and Grando, S. (2007) Decentralized participatory plant breeding: an example of demand
driven research. Euphytica 155, 349360.
Ceccarelli, S., Grando, S., Amri, A., Asaad, F.A., Benbelkacem, A., Harrabi, M., Maatougui, M., Mekni,
M.S., Mimoun, H., El-Einen, R.A., El-Felah, M., El-Sayed, A.F., Shreidi, A.S. and Yahyaoui, A. (2001)
Decentralized and participatory plant breeding for marginal environments. In: Cooper, H.D., Spillane,
C. and Hodgkin, T. (eds) Broadening the Genetic Base of Crop Production. CAB International,
Wallingford, UK, pp. 115–135.
Cerna, F.J., Cianzio, S.R., Rafalski, A., Tingey, S. and Dyer, D. (1997) Relationship between seed yield heterosis
and molecular marker heterozygosity in soybean. Theoretical and Applied Genetics 95, 460–467.
CFIA/NFS (Canadian Food Inspection Agency/National Forum on Seed) (2005) Seminar on the use of
molecular techniques for plant variety protection. Available at:
http://www.inspection.gc.ca/english/plaveg/pbrpov/molece.shtml (last accessed 30 June 2008).
Chagné, D., Batley, J., Edwards, D. and Forster, J.W. (2007) Single nucleotide polymorphisms genotyping
in plants. In: Oraguzie, N.C., Rikkerink, E.H.A., Gardiner, S.E. and De Silva, H.N. (eds) Association
Mapping in Plants. Springer, Berlin, pp. 77–94.
Chahal, G.S. and Gosal, S.S. (2002) Principles and Procedures of Plant Breeding, Biotechnological and
Conventional Approaches. Alpha Science International Ltd, Pangbourne, UK.
Chaïb, J., Lecomte, L., Buret, M. and Causse, M. (2006) Stability over genetic backgrounds, generations
and years of quantitative trait loci (QTLs) for organoleptic quality in tomato. Theoretical and Applied
Genetics 112, 934–944.
Chan, E.K.F., Rowe, H.C. and Kliebenstein, D.J. (2009) Understanding the evolution of defense metabolites
in Arabidopsis thaliana using genome-wide association mapping. Genetics (in press).
Chan, H.P. (2006) International patent behaviour of nine major agricultural biotechnology firms. AgBioForum
9, 59–68.
Chandler, P.M., Marion-Poll, A., Ellis, M. and Gubler, F. (2002) Mutants at the Slender1 locus of barley cv
Himalaya. Molecular and physiological characterization. Plant Physiology 129, 181–190.
Chandler, S. and Dunwell, J.M. (2008) Gene flow, risk assessment and the environmental release of transgenic
plants. Critical Reviews in Plant Sciences 27, 25–49.
Chapman, S.C., Hammer, G.L., Podlich, D.W. and Cooper, M. (2002) Linking biophysical and genetic models
to integrate physiology, molecular biology and plant breeding. In: Kang, M.S. (ed.) Quantitative
Genetics, Genomics and Plant Breeding. CAB International, Wallingford, UK, pp. 167–187.
Chapman, S., Cooper, M., Podlich, D.W. and Hammer, G.L. (2003) Evaluating plant breeding strategies by
simulating gene action and dryland environment effects. Agronomy Journal 95, 99–113.
Charcosset, A. and Essioux, L. (1994) The effect of population structure on the relationship between heterosis
and heterozygosity at marker loci. Theoretical and Applied Genetics 89, 336–343.
Charcosset, A. and Gallais, A. (1996) Estimation of the contribution of quantitative trait loci (QTL) to the
variance of a quantitative trait by means of genetic markers. Theoretical and Applied Genetics 93,
1193–1201.
Charcosset, A., Lefort-Buson, M. and Gallais, A. (1991) Relationship between heterosis and heterozygosity
at marker loci: a theoretical computation. Theoretical and Applied Genetics 81, 571–575.
Charcosset, A., Causse, M., Moreau, L. and Gallais, A. (1994) Investigation into the effect of genetic background
on QTL expression using three recombinant inbred lines (RIL) populations. In: van Ooijen,
J.W. and Jansen, J. (eds) Biometrics in Plant Breeding: Applications of Molecular Markers. Centre for
Plant Breeding and Reproduction Research, Wageningen, The Netherlands, pp. 75–84.
Charcosset, A., Mangin, B., Moreau, L., Combes, L., Jourjon, M.F. and Gallais, A. (2000) Heterosis in maize
investigated using connected RIL populations. In: Quantitative Genetics and Breeding Methods: the
Way Ahead. Institut National de la Recherche Agronomique (INRA), Paris, pp. 89–98.
Chardon, F., Virlon, B., Moreau, L., Falque, M., Joets, J., Decousset, L., Murigneux, A. and Charcosset, A.
(2004) Genetic architecture of flowering time in maize as inferred from quantitative trait loci meta-analysis
and synteny conservation with the rice genome. Genetics 168, 2169–2185.
Charmet, G., Robert, N., Perretant, M.R., Gay, G., Sourdille, P., Groos, C., Bernard, S. and Bernard, M.
(1999) Marker-assisted recurrent selection for cumulating additive and interactive QTLs in recombinant
inbred lines. Theoretical and Applied Genetics 99, 1143–1148.
Chase, S.S. (1969) Monoploids and monoploid derivatives of maize (Zea mays L.). The Botanical Review
35, 117–167.
Chavarriaga-Aguirre, P., Maya, M.M., Tohme, J., Duque, M.C., Iglesias, C., Bonierbale, M.W., Kresovich, S.
and Kochert, G. (1999) Using microsatellites, isozymes and AFLPs to evaluate genetic diversity and
redundancy in the cassava core collection and to assess the usefulness of DNA-based markers to
maintain germplasm collections. Molecular Breeding 5, 263–273.
Chellappan, P., Masona, M.V., Vanitharani, R., Taylor, N.J. and Fauquet, C.M. (2004) Broad spectrum resistance
to ssDNA viruses associated with transgene-induced gene silencing in cassava. Plant Molecular
Biology 56, 601–611.
Chen, H., Wang, S., Xing, Y., Xu, C., Hayes, P.M. and Zhang, Q. (2003) Comparative analyses of genomic
locations and race specificities of loci for quantitative resistance to Pyricularia grisea in rice and barley.
Proceedings of the National Academy of Sciences of the United States of America 100, 2544–2549.
Chen, J., Griffey, C.A., Chappell, M., Shaw, J. and Pridgen, T. (1999) Haploid production in twelve wheat
F1 by wheat × maize hybridization method. In: Proceedings of National Fusarium Head Blight Forum,
December 1999, Sioux Falls, South Dakota, pp. 147–149.
Chen, J.Q., Zhou, H.M., Chen, J. and Wang, X.C. (2006) A GATEWAY-based platform for multiple plant
transformation. Plant Molecular Biology 62, 927–936.
Chen, L. and Storey, J.D. (2006) Relaxed significance criteria for linkage analysis. Genetics 173,
2371–2381.
Chen, M., Presting, G., Barbazuk, W.B., Goicoechea, J.L., Blackmon, B., Fang, G., Kim, H., Frisch, D., Yu,
Y., Sun, S., Higginbottom, S., Phimphilai, J., Phimphilai, D., Thurmond, S., Gaudette, B., Li, P., Liu, J.,
Hatfield, J., Main, D., Farrar, K., Henderson, C., Barnett, L., Costa, R., Williams, B., Walser, S., Atkins,
M., Hall, C., Budiman, M.A., Tomkins, J.P., Luo, M., Bancroft, I., Salse, J., Regad, F., Mohapatra, T.,
Singh, N.K., Tyagi, A.K., Soderlund, C., Dean, R.A. and Wing, R.A. (2002) An integrated physical and
genetic map of the rice genome. The Plant Cell 14, 537–545.
Chen, S., Lin, X.H., Xu, C.G. and Zhang, Q. (2000) Improvement of bacterial blight resistance of Minghui
63, an elite restorer line of hybrid rice, by molecular marker-assisted selection. Crop Science 40,
239–244.
Chen, T.M., Lu, C.C. and Li, W.H. (2005) Prediction of splice sites with dependency graphs and their
expanded Bayesian networks. Bioinformatics 21, 471–482.
Chen, X., Temnykh, S., Xu, Y., Cho, Y.G. and McCouch, S.R. (1997) Development of microsatellite framework
map providing genome-wide coverage in rice (Oryza sativa L.). Theoretical and Applied Genetics
95, 553–567.
Chen, Y., Lu, C., He, P., Shen, L., Xu, J., Xu, Y. and Zhu, L. (1997) Gametic selection in a doubled haploid
population derived from anther culture of indica/japonica cross of rice. Acta Genetica Sinica 24,
322–329.
Cheng, M., Fry, J.E., Pang, S., Zhou, H., Hironaka, C., Duncan, D.R., Conner, T.W. and Wan, Y. (1997)
Genetic transformation of wheat mediated by Agrobacterium tumefaciens. Plant Physiology 115,
971–980.
Cheng, M., Lowe, B.A., Spencer, T.M., Ye, X. and Armstrong, C.L. (2004) Factors influencing Agrobacterium-mediated
transformation of monocotyledonous species. In Vitro Cellular and Developmental Biology –
Plant 40, 31–45.
Chiarolla, C. (2006) Commodifying agricultural biodiversity and development-related issues. The Journal
of World Intellectual Property 9 (1), 25–60.
Chin, H.E. and Roberts, E.H. (eds) (1980) Recalcitrant Crop Seeds. Tropical Press Sdn. Bhd., Kuala
Lumpur, Malaysia.
Cho, Y.G., Ishii, T., Temnykh, S., Chen, X., Lipovich, L., McCouch, S.R., Park, W.D., Ayres, N. and
Cartinhour, S. (2000) Diversity of microsatellites derived from genomic libraries and GenBank
sequences in rice. Theoretical and Applied Genetics 100, 713–722.
Choisne, N., Samain, S., Demange, N., Orjeda, G., Michelet, L., Pelletier, E., Salanoubat, M., Weissenbach,
J. and Quetier, F. (2007) The sequencing of plant nuclear genomes. In: Morot-Gaudry, J.F., Lea, P. and
Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New Hampshire, pp. 23–51.
Choo, T.M., Reinbergs, E. and Park, S.J. (1982) Comparison of frequency distribution of doubled haploid
and single seed descent lines in barley. Theoretical and Applied Genetics 61, 215–218.
Choo, T.M., Reinbergs, E. and Kasha, K.J. (1985) Use of haploids in breeding barley. Plant Breeding
Reviews 3, 219–252.
Christensen, A.H., Sharrock, R.A. and Quail, P.H. (1992) Maize polyubiquitin genes: structure, thermal perturbation
of expression and transcript splicing and promoter activity following transfer to protoplasts
by electroporation. Plant Molecular Biology 18, 675–689.
Christiansen, M.J., Andersen, S.B. and Ortiz, R. (2002) Diversity changes in an intensively bred wheat
germplasm during the 20th century. Molecular Breeding 9, 1–11.
Christou, P. (1996) Transformation technology. Trends in Plant Science 1, 423–431.
Christou, P. and Swain, W.F. (1990) Cotransformation frequencies of foreign genes in soybean cell cultures.
Theoretical and Applied Genetics 79, 337–341.
Chung, S.M., Frankman, E.L. and Tzfira, T. (2005) A versatile vector system for multiple gene expression in
plants. Trends in Plant Science 10, 357–361.
Chung, S.-M., Vaidya, M. and Tzfira, T. (2006) Agrobacterium is not alone: gene transfer to plants by viruses
and other bacteria. Trends in Plant Science 11, 1–4.
Churchill, G.A. and Doerge, R.W. (1994) Empirical threshold values for quantitative trait mapping. Genetics
138, 963–971.
Clark, R.L., Shands, H.L., Bretting, P.K. and Eberhart, S.A. (1997) Germplasm regeneration: developments
in population genetics and their implications. Crop Science 37, 1–6.
Clark, R.M., Schweikert, G., Toomajian, C., Ossowski, S., Zeller, G., Shinn, P., Warthmann, N., Hu, T.T.,
Fu, G., Hinds, D.A., Chen, H., Frazer, K.A., Huson, D.H., Schölkopf, B., Nordborg, M., Rätsch, G.,
Ecker, J.R. and Weigel, D. (2007) Common sequence polymorphisms shaping genetic diversity in
Arabidopsis thaliana. Science 317, 338–342.
Clarke, B.C. and Appels, R. (1998) A transient assay for evaluating promoters in wheat endosperm tissue.
Genome 41, 865–871.
Clarke, J.H., Mithen, R., Brown, J.K.M. and Dean, C. (1995) QTL analysis of flowering time in Arabidopsis
thaliana. Molecular and General Genetics 248, 278–286.
Coburn, J., Temnykh, S., Paul, E. and McCouch, S.R. (2002) Design and application of microsatellite marker
panels for semi-automated genotyping of rice (Oryza sativa L.). Crop Science 42, 2092–2099.
Cochrane, W. (1993) The Development of American Agriculture. University of Minnesota Press, Minneapolis,
Minnesota.
Codex Alimentarius Commission (2001) Codex Guidelines (ALINORM 01/22). FAO/WHO, Rome. Available
at: http://www.codexalimentarius.net (accessed 17 November 2009).
Coe, E.H. (1959) A line of maize with high haploid frequency. American Naturalist 93, 381–382.
Cogoni, C. and Macino, G. (2000) Post-transcriptional gene silencing across kingdoms. Current Opinion in
Genetics and Development 10, 638–643.
Cockerham, C.C. and Zeng, Z.-B. (1996) Design III with marker loci. Genetics 143, 1437–1456.
Colbert, T., Till, B.J., Tompa, R., Reynolds, S., Steine, M.N., Yeung, A.T., McCallum, C.M., Comai, L. and
Henikoff, S. (2001) High-throughput screening for induced point mutations. Plant Physiology 126,
480–484.
Collard, B.C.Y. and Mackill, D.J. (2008) Marker-assisted selection: an approach for precision plant breeding
in the twenty-first century. Philosophical Transactions of The Royal Society B 363, 557–572.
Collard, B.C.Y., Jahufer, M.Z.Z., Brouwer, J.B. and Pang, E.C.R. (2005) An introduction to markers, quantitative
trait loci (QTL) mapping and marker-assisted selection for crop improvement: the basic concepts.
Euphytica 142, 169–196.
Collard, B.C.Y., Vera Cruz, C.M., McNally, K.L., Virk, P.S. and Mackill, D.J. (2008) Rice molecular breeding
laboratories in the genomics era: current status and future considerations. International Journal of Plant
Genomics Article ID 524847, 25 pp. Available at: http://www.hindawi.com/journals/ijpg/2008/524847.html
(accessed 17 November 2009).
Collins, W.W. and Qualset, C.O. (1999) Biodiversity in Agroecosystems. CRC Press, Boca Raton, Florida.
Comai, L., Young, K., Till, B.J., Reynolds, S.H., Greene, E.A., Codomo, C.A., Enns, L.C., Johnson, J.E.,
Burtner, C., Odden, A.R. and Henikoff, S. (2004) Efficient discovery of DNA polymorphisms in natural
populations by Ecotilling. The Plant Journal 37, 778–786.
Complex Trait Consortium (2004) The Collaborative Cross, a community resource for the genetic analysis
of complex traits. Nature Genetics 36, 1133–1137.
Comstock, R.E. and Robinson, H.F. (1952) Estimation of average dominance of genes. In: Gowen, J.W.
(ed.) Heterosis. Iowa State College Press, Ames, Iowa, pp. 494–516.
Comstock, R.E., Robinson, H.F. and Harvey, P.H. (1949) A breeding procedure designed to make maximum
use of both general and specific combining ability. Agronomy Journal 41, 360–367.
Concibido, V.C., Denny, R.L., Lange, D.A., Orf, J.H. and Young, N.D. (1996) RFLP mapping and molecular
marker-assisted selection of soybean cyst nematode resistance in PI 209332. Crop Science 36,
1643–1650.
Concibido, V.C., La Vallee, B., Mclaird, P., Pineda, N., Meyer, J., Hummel, L., Yang, J., Wu, K. and Delannay, X.
(2003) Introgression of a quantitative trait locus for yield from Glycine soja into commercial soybean
cultivars. Theoretical and Applied Genetics 106, 575–582.
Cone, K.C., McMullen, M.D., Bi, I.V., Davis, G.L., Yim, Y.-S., Gardiner, J.M., Polacco, M.L., Sanchez-Villeda, H.,
Fang, Z., Schroeder, S.G., Havermann, S.A., Bowers, J.E., Paterson, A.H., Soderlund, C.A., Engler,
F.W., Wing, R.A. and Coe, E.H. (2002) Genetic, physical and informatics resources for maize. On the
road to an integrated map. Plant Physiology 130, 1598–1605.
Conner, A.J., Barrell, P.J., Baldwin, S.J., Lokerse, A.S., Cooper, P.A., Erasmuson, A.K., Nap, J.P. and Jacobs,
J.M.E. (2007) Intragenic vectors for gene transfer without foreign DNA. Euphytica 154, 341–353.
Cooper, M. and Byth, D.E. (1996) Understanding plant adaptation to achieve systematic applied crop
improvement – a fundamental challenge. In: Cooper, M. and Hammer, G.L. (eds) Plant Adaptation
and Crop Improvement. CAB International, Wallingford, UK, pp. 5–23.
Cooper, M. and Hammer, G.L. (1996) Synthesis of strategies for crop improvement. In: Cooper, M. and
Hammer, G.L. (eds) Plant Adaptation and Crop Improvement. CAB International, Wallingford, UK,
pp. 591–623.
Cooper, M. and Podlich, D.W. (2002) The E(NK) model: extending the NK model to incorporate gene-by-environment
interactions and epistasis for diploid genomes. Complexity 7, 31–47.
Cooper, M., Podlich, D.W. and Chapman, S.C. (1999) Computer simulation linked to gene information databases
as a strategic research tool to evaluate molecular approaches for genetic improvement of crops.
Workshop on Molecular Approaches for the Genetic Improvement of Cereals for Stable Production in
Water-Limited Environments, Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Mexico,
21–25 June 1999. Available at: http://www.cimmyt.org/ABC/map/research_tools_results/wsmolecular/workshopmolecular/WorkshopMolecularcontents.htm
(accessed 30 June 2008).
Cooper, M., Chapman, S.C., Podlich, D.W. and Hammer, G.L. (2002a) The GP problem: quantifying gene-to-phenotype
relationships. In Silico Biology 2, 151–164.
Cooper, M., Podlich, D.W., Micallef, K.P., Smith, O.S., Jensen, N.M., Chapman, S.C. and Kruger, N.L.
(2002b) Complexity, quantitative traits and plant breeding: a role for simulation modelling in the genetic
improvement of crops. In: Kang, M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB
International, Wallingford, UK, pp. 143–166.
Cooper, M., Smith, O.S., Graham, G., Arthur, L., Feng, L. and Podlich, D.W. (2004) Genomics, genetics and
plant breeding: a private sector perspective. Crop Science 44, 1907–1914.
Cooper, M., Podlich, D.W. and Smith, O.S. (2005) Gene-to-phenotype and complex trait genetics. Australian
Journal of Agricultural Research 56, 895–918.
Cooper, M., Podlich, D.W. and Luo, L. (2007) Modelling QTL effects and MAS in plant breeding. In: Varshney,
R.K. and Tuberosa, R. (eds) Genomics-Assisted Crop Improvement. Volume 1. Genomics Approaches
and Platforms. Springer, Dordrecht, Netherlands, pp. 57–95.
Coors, J.G. (1999) Selection methodologies and heterosis. In: Coors, J.G. and Pandey, S. (eds) The Genetics
and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison, Wisconsin, pp. 225–245.
Coque, M. and Gallais, A. (2006) Genomic regions involved in response to grain yield selection at high and
low nitrogen fertilization in maize. Theoretical and Applied Genetics 112, 1205–1220.
Corneille, S., Lutz, K., Svab, Z. and Maliga, P. (2001) Efficient elimination of selectable marker genes
from the plastid genome by the CRE-lox site-specific recombination system. The Plant Journal 27,
171–178.
Cornelius, P.L. and Seyedsadr, M.S. (1997) Estimation of general linear–bilinear models for two-way tables.
Journal of Statistical Computation and Simulation 58, 287–322.
Cornelius, P.L., Seyedsadr, M. and Crossa, J. (1992) Using the shifted multiplicative model in search for
separability in corn cultivar trials. Theoretical and Applied Genetics 84, 161–172.
Cornelius, P.L., van Sanford, D.A. and Seyedsadr, M.S. (1993) Clustering cultivars into groups without rank-change
interactions. Crop Science 33, 1193–1200.
Cornelius, P.L., Crossa, J. and Seyedsadr, M.S. (1996) Statistical tests and estimates of multiplicative models
for GE interaction. In: Kang, M.S. and Gauch, H.G., Jr (eds) Genotype-by-Environment Interaction.
CRC Press, Boca Raton, Florida, pp. 199–234.
Correns, C. (1901) Bastarde zwischen Maisrassen, mit besonderer Berücksichtigung der Xenien.
Bibliotheca Botanica 53, 1–161.
Cottage, A., Yang, A.P., Maunders, H., de Lacy, R.C. and Ramsay, N.A. (2001) Identification of DNA sequences
flanking T-DNA insertion by PCR walking. Plant Molecular Biology Reporter 19, 321–327.
Courtois, B. (1993) Comparison of single seed descent and anther culture-derived lines of three single
crosses of rice. Theoretical and Applied Genetics 85, 625–631.
Courtois, B., McLaren, G., Sinha, P.K., Prasad, K., Yadav, R. and Shen, L. (2000) Mapping QTL associated
with drought avoidance in upland rice. Molecular Breeding 6, 55–66.
Coutu, C., Brandle, J., Brown, D., Brown, K., Miki, B., Simmonds, J. and Hegedus, D.D. (2007) pORE:
a modular binary vector series suited for both monocot and dicot plant transformation. Transgenic
Research 16, 771–781.
Craig, W., Tepfer, M., Degrassi, G. and Ripandelli, D. (2008) An overview of general features of risk assessments
of genetically modified crops. Euphytica 164, 853–880.
Cravatt, B.F., Simon, G.M. and Yates, J.R. (2007) The biological impact of mass-spectrometry-based proteomics.
Nature 450, 991–1000.
Cregan, P.B., Shoemaker, R.C. and Specht, J.E. (1999) An integrated genetic linkage map of the soybean
genome. Crop Science 39, 1464–1490.
Cresham, D., Dunham, M.J. and Botstein, D. (2008) Comparing whole genomes using DNA microarrays.
Nature Reviews Genetics 9, 291–302.
Crosbie, T.M., Eathington, S.R., Johnson, G.R., Edwards, M., Reiter, R., Stark, S., Mohanty, R.G., Oyervides,
M., Buehler, R.E., Walker, A.K., Dobert, R., Delannay, X., Pershing, J.C., Hall, M.A. and Lamkey, K.R.
(2006) Plant breeding: past, present and future. In: Lamkey, K.R. and Lee, M. (eds) Plant Breeding: the
Arnel R. Hallauer International Symposium. Blackwell Publishing, Oxford, UK, pp. 3–50.
Croser, J.S., Lulsdorf, M.M., Davies, P.A., Clarke, H.J., Bayliss, K.L., Mallikarjuna, N. and Siddique, K.H.M.
(2006) Toward doubled haploid production in the Fabaceae: progress, constraints and opportunities.
Critical Reviews in Plant Sciences 25, 139–157.
Crossa, J. and Cornelius, P.L. (1997) Sites regression and shifted multiplicative model clustering of cultivar
trial sites under heterogeneity of error variances. Crop Science 37, 406–415.
Crossa, J. and Cornelius, P. (2002) Linear–bilinear models for the analysis of genotype-environment interaction.
In: Kang, M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB International,
Wallingford, UK, pp. 305–322.
Crossa, J. and Franco, J. (2004) Statistical methods for classifying genotypes. Euphytica 137, 19–37.
Crossa, J., Cornelius, P.L., Seyedsadr, M. and Byrne, P. (1993) A shifted multiplicative model cluster analysis
for grouping environments without genotypic rank change. Theoretical and Applied Genetics 85,
577–586.
Crossa, J., Cornelius, P.L., Sayre, K. and Ortiz-Monasterio, R.J.I. (1995) A shifted multiplicative model
fusion method for grouping environments without cultivar rank change. Crop Science 35, 54–62.
Crossa, J., Cornelius, P.L. and Seyedsadr, M.S. (1996) Using the shifted multiplicative model cluster methods
for crossover GE interaction. In: Kang, M.S. and Gauch, H.G., Jr (eds) Genotype-by-Environment
Interaction. CRC Press, Boca Raton, Florida, pp. 175–198.
Crossa, J., Vargas, M., van Eeuwijk, F.A., Jiang, C., Edmeades, G.O. and Hoisington, D. (1999) Interpreting
genotype × environment interaction in tropical maize using linked molecular markers and environmental
covariables. Theoretical and Applied Genetics 99, 611–625.
Crossa, J., Cornelius, P.L. and Yan, W. (2002) Biplots of linear–bilinear models for studying crossover genotype ×
environment interaction. Crop Science 42, 619–633.
Crossa, J., Yang, R.-C. and Cornelius, P.L. (2004) Studying crossover genotype × environment interaction
using linear–bilinear models and mixed models. Journal of Agricultural, Biological and Environmental
Statistics 9, 362–380.
Crossa, J., Burgueño, J., Autran, D., Vielle-Calzada, J.-P., Cornelius, P.L., Garcia, N., Salamanca, F. and
Arenas, D. (2005) Using linear–bilinear models for studying gene-expression × treatment interaction
in microarray experiments. Journal of Agricultural, Biological and Environmental Statistics 10,
337–353.
Crossa, J., Burgueño, J., Cornelius, P.L., McLaren, G., Trethowan, R. and Krishnamachari, A. (2006)
Modeling genotype × environment interaction using additive genetic covariance of relatives for predicting
breeding values of wheat genotypes. Crop Science 46, 1722–1733.
Crossa, J., Burgueño, J., Dreisigacker, S., Vargas, M., Herrera-Foessel, S.A., Lillemo, M., Singh, R.P.,
Trethowan, R., Warburton, M., Franco, J., Reynolds, M., Crouch, J.H. and Ortiz, R. (2007) Association
analysis of historical bread wheat germplasm using additive genetic covariance of relatives and population
structure. Genetics 177, 1889–1913.
Crow, J.F. (1999) Dominance and overdominance. In: Coors, J.G. and Pandey, S. (eds) Genetics and
Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison, Wisconsin, pp. 49–58.
Crow, J.F. (2000) The rise and fall of overdominance. Plant Breeding Reviews 17, 225–257.
Cui, Y. and Wu, R. (2005) Statistical model for characterizing epistatic control of triploid endosperm triggered
by maternal and offspring QTLs. Genetical Research 86, 65–75.
Cullis, C.A. (2004) Plant Genomics and Proteomics. John Wiley & Sons, Inc., Chichester, UK.
Curtis, J.J., Brunson, A.M., Hubbard, J.E. and Earle, F.R. (1956) Effect of the parent on oil content of the
corn kernel. Agronomy Journal 48, 551–555.
Curtis, M.D. and Grossniklaus, U. (2003) A Gateway cloning vector set for high-throughput functional analysis
of genes in planta. Plant Physiology 133, 462–469.
Dafny-Yelin, M. and Tzfira, T. (2007) Delivery of multiple transgenes to plant cells. Plant Physiology 145,
1118–1128.
D'Amato, F. (1975) The problem of genetic stability in plant tissues and cell cultures. In: Frankel, O. and
Hawkes, J.G. (eds) Crop Genetic Resources for Today and Tomorrow. Cambridge University Press,
Cambridge, UK, pp. 333–348.
Damude, H.G. and Kinney, A.J. (2008) Enhancing plant seed oils for human nutrition. Plant Physiology
147, 962–968.
Daniell, H. and Dhingra, A. (2002) Multigene engineering: dawn of an exciting new era in biotechnology.
Current Opinion in Biotechnology 13, 136–141.
Dargie, J.D. (2007) Marker-assisted selection: policy considerations and options for developing countries.
In: Guimarães, E.P., Ruane, J., Scherf, B.D., Sonnino, A. and Dargie, J.D. (eds) Marker-Assisted
Selection, Current Status and Future Perspectives in Crops, Livestock, Forestry and Fish. Food and
Agriculture Organization of the United Nations, Rome, pp. 441–471.
Darrah, L.L. and Zuber, M.S. (1986) 1985 United States maize germplasm base and commercial breeding
strategy. Crop Science 26, 1109–1113.
Darvasi, A. and Soller, M. (1992) Selective genotyping for determination of linkage between a molecular
marker and a quantitative trait. Theoretical and Applied Genetics 85, 353–359.
Darvasi, A. and Soller, M. (1994) Selective DNA pooling for determination of linkage between a molecular
marker and a quantitative trait. Genetics 138, 1365–1373.
Darvasi, A. and Soller, M. (1995) Advanced intercross lines, an experimental population for fine genetic
mapping. Genetics 141, 1199–1207.
Darvasi, A. and Soller, M. (1997) A simple method to calculate resolving power and confidence interval of
QTL map location. Behavior Genetics 27, 125–132.
Darvasi, A., Weinreb, A., Minke, V., Weller, J.I. and Soller, M. (1993) Detecting marker-QTL linkage and estimating
QTL gene effect and map location using a saturated genetic map. Genetics 134, 943–951.
Datta, K., Vasquez, A., Tu, J., Torrizo, L., Alam, M.F., Oliva, N., Abrigo, E., Khush, G.S. and Datta, S.K.
(1998) Constitutive and tissue-specific differential expression of cryIA(b) gene in transgenic rice
plants conferring resistance to rice insect pest. Theoretical and Applied Genetics 97, 20–30.
Datta, K., Tu, J., Oliva, N., Ona, I., Velazhahan, R., Mew, T.W., Muthukrishnan, S. and Datta, S.K. (2001)
Enhanced resistance to sheath blight by constitutive expression of infection-related rice chitinase in
transgenic elite indica rice cultivars. Plant Science 160, 405–414.
Datta, K., Baisakh, N., Thet, K.M., Tu, J. and Datta, S.K. (2002) Pyramiding transgenes for multiple resistance
in rice against bacterial blight, yellow stem borer and sheath blight. Theoretical and Applied
Genetics 106, 1–8.
Datta, K., Baisakh, N., Oliva, N., Torrizo, L., Abrigo, E., Tan, J., Rai, M., Rehana, S., Al-Babili, S., Beyer,
P., Potrykus, I. and Datta, S.K. (2003) Bioengineered golden indica rice cultivars with beta-carotene
metabolism in the endosperm with hygromycin and mannose selection systems. Plant Biotechnology
Journal 1, 81–90.
Davenport, C.B. (1908) Degeneration, albinism and inbreeding. Science 28, 454–455.
Davenport, G., Ellis, N., Ambrose, M. and Dicks, J. (2004) Using bioinformatics to analyse germplasm collections.
Euphytica 137, 39–54.
Davuluri, R.V. and Zhang, M.Q. (2003) Computer software to find genes in plant genomic DNA. In:
Grotewold, E. (ed.) Methods in Molecular Biology, Vol. 236: Plant Functional Genomics: Methods and
Protocols. Humana Press, Inc., Totowa, New Jersey, pp. 87–107.
Day, C.D., Lee, E., Kobayashi, J., Holappa, L.D., Albert, H. and Ow, D.W. (2000) Transgene integration into
the same chromosome location can produce alleles that express at a predictable level, or alleles that
are differentially silenced. Genes and Development 14, 2869–2880.
Day Rubenstein, K., Heisey, P., Shoemaker, R., Sullivan, J. and Frisvold, G. (2005) Economic Information
Bulletin No. (EIB-2), p. 47. Available at: http://www.ers.usda.gov/publications/eib2/ (accessed 17
November 2009).
De Buck, S., Jacobs, A., Van Montagu, M. and Depicker, A. (1999) The DNA sequences of T-DNA junctions
suggest that complex T-DNA loci are formed by a recombination process resembling T-DNA integration.
The Plant Journal 20, 295–304.
De Buck, S., De Wilde, C., Van Montagu, M. and Depicker, A. (2000) T-DNA vector backbone sequences
are frequently integrated into the genome of transgenic plants obtained by Agrobacterium mediated
transformation. Molecular Breeding 6, 459–468.
De Cosa, B., Moar, W., Lee, S.B., Miller, M. and Daniell, H. (2001) Overexpression of the Bt cry2Aa2 operon
in chloroplasts leads to formation of insecticidal crystals. Nature Biotechnology 19, 71–74.
De Groote, H., Wangare, L., Kanampiu, F., Odendo, M., Diallo, A., Karaya, H. and Friesen, D. (2008) The
potential of a herbicide resistant maize technology for Striga control in Africa. Agricultural Systems
97, 83–94.
De Hoog, C.L. and Mann, M. (2004) Proteomics. Annual Review of Genomics and Human Genetics 5,
267–293.
de Koning, D.J. and Haley, C.S. (2005) Genetical genomics in humans and model organisms. Trends in
Genetics 21, 377–381.
De Neve, M., De Buck, S., Jacobs, A., Van Montagu, M. and Depicker, A. (1997) T-DNA integration patterns
in co-transformed plant cells suggest that T-DNA repeats originate from co-integration of separate
T-DNAs. The Plant Journal 11, 15–29.
De Silva, H.N. and Ball, R.D. (2007) Linkage disequilibrium mapping concepts. In: Oraguzie, N.C.,
Rikkerink, E.H.A., Gardiner, S.E. and De Silva, H.N. (eds) Association Mapping in Plants. Springer,
Berlin, pp. 103–132.
De Vicente, M.C. and Tanksley, S.D. (1991) Genome-wide reduction in recombination of backcross progeny
derived from male versus female gametes in an interspecific cross of tomato. Theoretical and Applied
Genetics 83, 173–178.
De Vicente, M.C. and Tanksley, S.D. (1993) QTL analysis of transgressive segregation in an interspecific
tomato cross. Genetics 134, 585–596.
Dean, R.E., Dahlberg, J.A., Hopkins, M.S. and Kresovich, S. (1999) Genetic redundancy and diversity
among Orange accessions in the U.S. national sorghum collection as assessed with simple sequence
repeat (SSR) markers. Crop Science 39, 1215–1221.
Deimling, S., Röber, F.K. and Geiger, H.H. (1997) Methodik und Genetik der in-vivo-Haploideninduktion bei
Mais. Vortr. Pflanzenzüchtg. 38, 203–224.
DeLacy, I.H. and Cooper, M. (1990) Pattern analysis for the analysis of regional variety trials. In: Kang,
M.S. (ed.) Genotype-by-Environment Interaction and Plant Breeding. Louisiana State University
Agricultural Center, Baton Rouge, Louisiana, pp. 301–334.
DeLacy, I.H., Cooper, M. and Basford, K.E. (1996) Relationships among analytical methods used to study
genotype-by-environment interactions and evaluation of their impact on response to selection. In:
Kang, M.S. and Gauch, H.G., Jr (eds) Genotype-by-Environment Interaction. CRC Press, Boca Raton,
Florida, pp. 51–84.
DellaPenna, D. and Last, R.L. (2008) Genome-enabled approaches shed new light on plant metabolism.
Science 320, 479–481.
Delmer, D.P. (2005) Agriculture in the developing world: connecting innovations in plant research to
downstream applications. Proceedings of the National Academy of Sciences of the United States of
America 102, 15739–15746.
Delseny, M. (2004) Re-evaluating the relevance of ancestral shared synteny as a tool for crop improvement.
Current Opinion in Plant Biology 7, 126–131.
Devlin, B. and Risch, N. (1995) A comparison of linkage disequilibrium measures for fine-scale mapping.
Genomics 29, 311–322.
Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM
algorithm. Journal of the Royal Statistical Society Series B 39, 1–38.
Depicker, A., Stachel, S., Dhaese, P., Zambryski, P. and Goodman, H.M. (1982) Nopaline synthase: transcript
mapping and DNA sequence. Journal of Molecular and Applied Genetics 1, 561–573.
Dereuddre, J., Blandin, S. and Hassen, N. (1991) Resistance of alginate-coated somatic embryos of carrot
(Daucus carota L.) to desiccation and freezing in liquid nitrogen: 1. Effects of preculture. Cryo-Letters
12, 125–134.
Desloire, S., Gherbi, H., Laloui, W., Marhadour, S., Clouet, V., Cattolico, L., Falentin, C., Giancola, S.,
Renard, M., Budar, F., Small, I., Caboche, M., Delourme, R. and Bendahmane, A. (2003) Identification
of the fertility restoration locus, Rfo, in radish, as a member of the pentatricopeptide-repeat protein
family. EMBO Reports 4, 588–594.
Devaux, P. and Zivy, M. (1994) Protein markers for anther culturability in barley. Theoretical and Applied
Genetics 88, 701–706.
Devaux, P., Kilian, A. and Kleinhofs, A. (1995) Comparative mapping of the barley genome with male and female
recombination-derived, doubled haploid populations. Molecular and General Genetics 249, 600–608.
DeVerna, J.W., Chetelat, R.T., Rick, C.M. and Stevens, M.A. (1987) Introgression of Solanum lycoper-
sicoides germplasm. In: Nevins, D.J. and Jones, R.A. (eds) Tomato Biotechnology. Proc. Seminar,
University of California, Davis, California, 20–22 August 1986. Plant Biology Vol. 4, Alan R. Liss, New
York, pp. 27–36.
DeVerna, J.W., Rick, C.M., Chetelat, R.T., Lanini, B.J. and Alpert, K.B. (1990) Sexual hybridization of
Lycopersicon esculentum and Solanum rickii by means of a sesquidiploid bridging hybrid. Proceedings
of the National Academy of Sciences of the United States of America 87, 9486–9490.
Dhillon, B.S., Boppenmaier, J., Pollmer, W.G., Herrmann, R.G. and Melchinger, A.E. (1993) Relationship of
restriction fragment length polymorphisms among European maize inbreds with ear dry matter yield
of their hybrids. Maydica 38, 245–248.
D'hoop, B.B., Paulo, M.J., Mank, R.A., van Eck, H.J. and van Eeuwijk, F.A. (2008) Association mapping of
quality traits in potato (Solanum tuberosum L.). Euphytica 161, 47–60.
Dhungana, P., Eskridge, K.M., Baenziger, P.S., Campbell, B.T., Gill, K.S. and Dweikat, I. (2007) Analysis
of genotype-by-environment interaction in wheat using a structural equation model and chromosome
substitution lines. Crop Science 47, 477–484.
Dias, A.P., Brown, J., Bonello, P. and Grotewold, E. (2003) Metabolite profiling as a functional genomics tool.
In: Grotewold, E. (ed.) Methods in Molecular Biology 236. Plant Functional Genomics: Methods and
Protocols. Humana Press, Totowa, New Jersey, pp. 415–425.
Diatchenko, L., Lau, Y.-F.C., Campbell, A.P., Chenchik, A., Moqadam, F., Huang, B., Lukyanov, S.,
Lukyanov, K., Gurskaya, N., Sverdlov, E.D. and Siebert, P.D. (1996) Suppression subtractive hybridi-
zation: a method for generating differentially regulated or tissue-specific cDNA probes and libraries.
Proceedings of the National Academy of Sciences of the United States of America 93, 6025–6030.
Dijkhuizen, A., Dudley, J.W., Rocheford, T.R., Haken, A.E. and Eckhoff, S.R. (1998) Comparative analysis
for kernel composition using near infrared reflectance and 100g Wetmill Analysis. Cereal Chemistry
75, 266–270.
Dilday, R.H. (1990) Contribution of ancestral lines in the development of new cultivars of rice. Crop Science
30, 905–911.
Dinka, S.J., Campbell, M.A., Demers, T. and Raizada, M.N. (2007) Predicting the size of the progeny mapping
population required to positionally clone a gene. Genetics 176, 2035–2054.
Diretto, G., Al-Babili, S., Tavazza, R., Papacchioli, V., Beyer, P. and Giuliano, G. (2008) Metabolic engineering
of potato carotenoid content through tuber-specific over-expression of a bacterial mini-pathway. PLoS
ONE 2(4), e350. doi:10.1371/journal.pone.0000350. Available at: http://www.plosone.org (accessed
17 November 2009).
Ditt, R.F., Nester, E.W. and Comai, L. (2001) Plant gene expression response to Agrobacterium tumefaciens.
Proceedings of the National Academy of Sciences of the United States of America 98,
10954–10959.
Dixon, A.L., Liang, L., Moffatt, M.F., Chen, W., Heath, S., Wong, K.C., Taylor, J., Burnett, E., Gut, I., Farrall,
M., Lathrop, G.M., Abecasis, G.R. and Cookson, W.O.C. (2007) A genome-wide association study of
global gene expression. Nature Genetics 39, 1202–1207.
Dodds, J.H. (1991) Introduction: conservation of plant genetic resources – the need for tissue culture. In:
Dodds, J.H. (ed.) In Vitro Methods for Conservation of Plant Genetic Resources. Chapman & Hall,
London, pp. 1–9.
Doebley, J. (1992) Molecular systematics and crop evolution. In: Soltis, D.E., Soltis, P.S. and Doyle, J.J.
(eds) Molecular Systematics of Plants. Chapman & Hall, New York, pp. 202–222.
Doebley, J., Stec, A. and Gustus, C. (1995) Teosinte branched1 and the origin of maize: evidence for epistasis
and the evolution of dominance. Genetics 141, 333–346.
Doerge, R.W. and Churchill, G.A. (1996) Permutation tests for multiple loci affecting a quantitative character.
Genetics 142, 285–294.
Doi, K., Izawa, T., Fuse, T., Yamanouchi, U., Kubo, T., Shimatani, Z., Yano, M. and Yoshimura, A. (2004)
Ehd1, a B-type response regulator in rice, confers short-day promotion of flowering and controls
FT-like gene expression independently of Hd1. Genes and Development 18, 926–936.
Doll, J. (1998) The patenting of DNA. Science 280, 689–690.
Dong, Y.S., Cao, Y.S., Zhang, X.Y., Liu, S.C., Wang, L.F., You, G.X., Pang, B.S., Li, L.H. and Jia, J.Z. (2003)
Establishment of candidate core collections in Chinese common wheat germplasm. Journal of Plant
Genetic Resources 4, 1–8.
Donnenwirth, J., Grace, J. and Smith, S. (2004) Intellectual property rights, patents, plant variety protection
and contracts: a perspective from the private sector. IP Strategy Today, No. 9.
Doumas, P., Al-Ghazi, Y., Rothan, C. and Robin, S. (2007) DNA microarrays in plants. In: Morot-Gaudry, J.F.,
Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New Hampshire,
pp. 165–190.
Dreher, K., Khairallah, M., Ribaut, J.M. and Morris, M. (2003) Money matters (I): cost of field and laboratory
procedures associated with conventional and marker-assisted maize breeding at CIMMYT. Molecular
Breeding 11, 221–234.
Dubcovsky, J. (2004) Marker-assisted selection in public breeding programs. The wheat experience. Crop
Science 44, 1895–1898.
Dubcovsky, J., Ramakrishna, W., SanMiguel, P.J., Busso, C.S., Yan, L., Shiloff, B.A. and Bennetzen, J.L.
(2001) Comparative sequence analysis of colinear barley and rice BACs. Plant Physiology 125,
1342–1353.
Dudley, J.W., Saghai Maroof, M.A. and Rufener, G.K. (1991) Molecular markers and grouping of parents in
a maize breeding program. Crop Science 31, 718–723.
Dudley, J.W. (1977) Seventy six generations of selection for oil and protein percentage in maize. In: Pollak,
E., Kempthorne, O. and Bailey, T.B. (eds) Proceedings of International Conference on Quantitative
Genetics. Iowa State University Press, Ames, Iowa, pp. 459–473.
Dudley, J.W. (1993) Molecular markers in plant improvement: manipulation of genes affecting quantitative
traits. Crop Science 33, 660–668.
Dudley, J.W. (1997) Quantitative genetics and plant breeding. Advances in Agronomy 59, 1–23.
Dudley, J.W. (2007) From means to QTL: the Illinois Long-Term Selection Experiment as a case study in
quantitative genetics. Crop Science 47(S3), S20–S31.
Dudley, J.W. (2008) Epistatic interactions in crosses of Illinois High Oil × Illinois Low Oil and of Illinois High
Protein × Illinois Low Protein corn strains. Crop Science 48, 59–68.
Dudley, J.W. and Lambert, R.J. (1992) Ninety generations of selection for oil and protein in maize. Maydica
37, 81–87.
Dudley, J.W. and Lambert, R.J. (2004) 100 generations of selection for oil and protein in corn. Plant Breeding
Reviews 24 (Part 1), 79–110.
Dudley, J.W., Lambert, R.J. and Alexander, D.E. (1974) Seventy generations of selection for oil and protein
concentration in the maize kernel. In: Dudley, J.W. (ed.) Seventy Generations of Selection for Oil and
Protein in Maize. Crop Science Society of America, Madison, Wisconsin, pp. 181–212.
Dudley, J.W., Lambert, R.J. and de la Roche, I.A. (1977) Genetic analysis of crosses among corn strains
divergently selected for percent oil and protein. Crop Science 17, 111–117.
Dunford, R.P., Yano, M., Kurata, N., Sasaki, T., Huestis, G., Rocheford, T. and Laurie, D.A. (2002)
Comparative mapping of the barley Ppd-H1 photoperiod response gene region, which lies close to a
junction between two rice linkage segments. Genetics 161, 825–834.
Dunn, G. and Everitt, B.S. (1982) An Introduction to Mathematical Taxonomy. Cambridge Studies in
Mathematical Biology. Vol. 5. Cambridge University Press, Cambridge, UK.
Dunning, A.M., Durocher, F., Healey, C.S., Teare, M.D., McBride, S.E., Carlomagno, F., Xu, C.-F., Dawson, E.,
Rhodes, S., Ueda, S., Lai, E., Luben, R.N., Van Rensburg, E.J., Mannermaa, A., Kataja, V., Rennart, G.,
Dunham, I., Purvis, I., Easton, D. and Ponder, B.A.J. (2000) The extent of linkage disequilibrium in four
populations with distinct demographic histories. American Journal of Human Genetics 67, 1544–1554.
Dunnington, E.A., Haberefeld, A., Stallard, L.G., Siegel, P.B. and Hillel, J. (1992) Deoxyribonucleic-acid fingerprint
bands linked to loci coding for quantitative traits in chicken. Poultry Science 71, 1251–1258.
Dunwell, J.M. (2005) Intellectual property aspects of plant transformation. Plant Biotechnology Journal 3,
371–384.
Dunwell, J.M. (2006) Patents and transgenic plants. In: Fári, M.G., Holb, I. and Bisztray, G.D. (eds) Proceedings
of Vth International Symposium on In Vitro Culture and Horticultural Breeding. International Society
for Horticultural Science. Acta Horticulturae 725, 719–732.
Dutfield, G. (2003) Protecting Traditional Knowledge and Folklore, Issue Paper 1. International Centre
on Trade and Sustainable Development and United Nations Conference on Trade and Development
Project on Intellectual Property Rights and Sustainable Development, Geneva.
Duvick, D.N. (1977) Major USA crops in 1976. Annals of the New York Academy of Sciences 287,
86–96.
Duvick, D.N. (1984) Genetic contribution to yield gains of U.S. hybrid maize, 1930–1980. In: Fehr, W.R.
(ed.) Genetic Contributions to Yield Gains of Five Major Crop Plants. Crop Science Society of America
(CSSA) Spec. Publ. 7. CSSA and American Society of Agronomy (ASA), Madison, Wisconsin, pp.
15–47.
Duvick, D.N. (1990) Genetic enhancement and plant breeding. In: Janick, J. and Simon, J.E. (eds) Advances
in New Crops. Proc. First National Symposium on New Crops: Research, Development, Economics.
Timber Press, Portland, Oregon, pp. 90–96.
Duvick, D.N. (1999) Heterosis: feeding people and protecting natural resources. In: Coors, J.G. and Pandey,
S. (eds) Genetics and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison, Wisconsin, pp.
19–29.
Duvick, D.N., Smith, J.S.C. and Cooper, M. (2004) Long-term selection in commercial hybrid maize breeding
programs. Plant Breeding Reviews 24 (Part 2), 109–151.
Dwivedi, S.L., Blair, M., Upadhyaya, H.D., Serraj, R., Balaji, J., Buhariwalla, H.K., Ortiz, R. and Crouch,
J.H. (2005) Using genomics to exploit grain legume biodiversity in crop improvement. Plant Breeding
Reviews 26, 176–357.
Dwivedi, S.L., Crouch, J.H., Mackill, D.J., Xu, Y., Blair, M.W., Ragot, M., Upadhyaya, H.D. and Ortiz, R.
(2007) The molecularization of public sector crop breeding: progress, problems and prospects.
Advances in Agronomy 95, 163–318.
Eagles, H.A., Bariana, H.S., Ogbonnaya, F.C., Rebetzke, G.J., Hollamby, G.J., Henry, R.J., Henschke, P.H.
and Carter, M. (2001) Implementation of markers in Australian wheat breeding. Australian Journal of
Agricultural Research 52, 1349–1356.
Eagles, H.A., Hollamby, G.J., Gororo, N.N. and Eastwood, R.F. (2002) Estimation and utilization of glutenin
gene effects from the analysis of unbalanced data from wheat breeding programs. Australian Journal
of Agricultural Research 53, 367–377.
Eagles, H.A., Eastwood, R.F., Hollamby, G.J., Martin, E.M. and Cornish, G.B. (2004) Revision of the estimates
of glutenin gene effects at the Glu-B1 locus from southern Australian wheat breeding programs.
Australian Journal of Agricultural Research 55, 1093–1096.
Eamens, A., Wang, M.-B., Smith, N.A. and Waterhouse, P.M. (2008) RNA silencing in plants: yesterday,
today and tomorrow. Plant Physiology 147, 456–468.
Earley, K.W., Haag, J.R., Pontes, O., Opper, K., Juehne, T., Song, K. and Pikaard, C.S. (2006) GATEWAY-compatible
vectors for plant functional genomics and proteomics. Plant Journal 45, 616–629.
East, E.M. (1908) Inbreeding in corn. Rep. Connecticut Expt. Stat. Years 1907–1908, pp. 419–428.
Eathington, S.R. (2005) Practical applications of molecular technology in the development of commercial
maize hybrids. In: Proceedings of the 60th Annual Corn and Sorghum Seed Research Conferences.
American Seed Trade Association, Washington, DC.
Eathington, S.R., Crosbie, T.M., Edwards, M.D., Reiter, R.S. and Bull, J.K. (2007) Molecular markers in a
commercial breeding program. Crop Science 47(S3), S154–S163.
Eberhart, S.A. and Russell, W.A. (1966) Stability parameters for comparing varieties. Crop Science 6,
36–40.
Ebinuma, H.K., Sugita, K., Matsunaga, E., Endo, S., Yamada, K. and Komamine, A. (2001) Systems for
removal of a selection marker and their combination with a positive marker. Plant Cell Reports 20,
383–392.
Ebinuma, H.K., Sugita, E., Endo, S., Matsunaga, E. and Yamada, K. (2004) Elimination of marker genes
from transgenic plants using MAT vector system. In: Peña, L. (ed.) Methods in Molecular Biology,
vol. 286: Transgenic Plants: Methods and Protocols. Humana Press Inc., Totowa, New Jersey, pp.
237–253.
Eder, J. and Chalyk, S. (2002) In vivo haploid induction in maize. Theoretical and Applied Genetics 104,
703–708.
Edmeades, G.O., Bänziger, M. and Ribaut, J.M. (2000) Maize improvement for drought-limited environments.
In: Otegui, M.E. and Slafer, G.A. (eds) Physiological Bases for Maize Improvement. Food
Products Press, New York, pp. 75–111.
Edwards, D. and Batley, J. (2004) Plant bioinformatics: from genome to phenome. Trends in Biotechnology
22, 232–237.
Edwards, D., Forster, J.W., Chagné, D. and Batley, J. (2007a) What are SNPs? In: Oraguzie, N.C., Rikkerink,
E.H.A., Gardiner, S.E. and De Silva, H.N. (eds) Association Mapping in Plants. Springer, Berlin, pp.
41–52.
Edwards, D., Forster, J.W., Cogan, N.O.I., Batley, J. and Chagné, D. (2007b) Single nucleotide polymorphism
discovery. In: Oraguzie, N.C., Rikkerink, E.H.A., Gardiner, S.E. and De Silva, H.N. (eds)
Association Mapping in Plants. Springer, Berlin, pp. 53–76.
Edwards, J.D., Janda, J., Sweeney, M.T., Gaikwad, A.B., Liu, B., Leung, H. and Galbraith, D.W. (2008)
Development and evaluation of a high-throughput, low-cost genotyping platform based on oligonucle-
otide microarrays in rice. Plant Methods 4, 13.
Edwards, M. and Johnson, L. (1994) RFLPs for rapid recurrent selection. In: Proceedings of Symposium
on Analysis of Molecular Marker Data. American Society of Horticultural Science and Crop Science
Society of America, Corvallis, Oregon, pp. 33–40.
Edwards, M.D. and Page, N.J. (1994) Evaluation of marker-assisted selection through computer simulation.
Theoretical and Applied Genetics 88, 376–382.
Edwards, M.D., Stuber, C.W. and Wendel, J.F. (1987) Molecular-marker-facilitated investigations of quan-
titative trait loci in maize. I. Numbers, genomic distribution and types of gene action. Genetics 116,
113–125.
Edwards, M.D., Helentjaris, T., Wright, S. and Stuber, C.W. (1992) Molecular-marker-facilitated inves-
tigations of quantitative trait loci in maize. 4. Analysis based on genome saturation with isozyme
and restriction fragment length polymorphism markers. Theoretical and Applied Genetics 83,
765–774.
Eisemann, R.L., Cooper, M. and Woodruff, D.R. (1990) Beyond the analytical methodology, better interpretation
and exploiting of GE interaction in plant breeding. In: Kang, M.S. (ed.) Genotype-by-Environment
Interaction and Plant Breeding. Louisiana State University Agricultural Center, Baton Rouge,
Louisiana, pp. 108–117.
Eitan, Y. and Soller, M. (2004) Selection induced genetic variation. In: Wasser, S. (ed.) Evolutionary Theory
and Processes: Modern Horizon. Papers in honour of Eviatar Nevo. Kluwer Academic Publishers,
Dordrecht, Netherlands, pp. 154–176.
Elston, R.C. (1984) The genetic analysis of quantitative trait differences between two homozygous lines.
Genetics 108, 733–744.
Emebiri, L.C. and Moody, D.B. (2006) Heritable basis for some genotype-environment stability statistics:
inference from QTL analysis of heading date in two-rowed barley. Field Crops Research 96, 243–251.
Empig, L.T., Gardner, C.O. and Compton, W.A. (1972) Theoretical gains for different population improvement
procedures. Nebraska Agricultural Experiment Station Miscellaneous Publications 26 (revised).
Emrich, S., Li, L., Wen, T.J., Ashlock, D., Aluru, S. and Schnable, P. (2007b) Nearly identical paralogs:
implications for maize (Zea mays L.) genome evolution. Genetics 175, 429–439.
Endo, S., Kasahara, Y., Sugita, K. and Ebinuma, H. (2002a) A new GST-MAT vector containing both the ipt
gene and iaaM/H genes can produce marker-free transgenic plants with high frequency. Plant Cell
Reports 20, 923–928.
Endo, S., Sugita, K., Sakai, M., Tanaka, H. and Ebinuma, H. (2002b) Single-step transformation for generating
marker-free transgenic rice using the ipt-type MAT vector system. The Plant Journal 30, 115–122.
Engelmann, F. and Engels, J.M.M. (2002) Technologies and strategies for ex situ conservation. In: Engels,
J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing Plant Genetic Diversity.
International Plant Genetic Resources Institute, Rome, pp. 89–103.
Engels, J.M.M. and Visser, L. (2003) A guide to effective management of germplasm collections. IPGRI
Handbook for Genebanks No. 6. International Plant Genetic Resources Institute, Rome.
Enserink, M. (2008) Tough lessons from golden rice. Science 320, 468–471.
Eronen, L., Geerts, F. and Toivonen, H. (2004) A Markov chain approach to reconstruction of long haplotypes.
Pacific Symposium on Biocomputing 9, 104–115.
Ervin, D., Batie, S., Welsh, R., Carpentier, C.L., Fern, J.I., Richman, N.J. and Schulz, M.A. (2000) Transgenic
Crops: an Environmental Assessment. Henry A. Wallace Center for Agricultural and Environmental
Policy at Winrock International, Arlington, Virginia.
Erwin, T. (1991) An evolutionary basis for conservation strategies. Science 253, 750–752.
Eshed, Y. and Zamir, D. (1994) A genomic library of Lycopersicon pennellii in L. esculentum: a tool for fine
mapping of genes. Euphytica 79, 175–179.
Eshed, Y. and Zamir, D. (1995) An introgression line population of Lycopersicon pennellii in the cultivated
tomato enables the identification and fine mapping of yield associated QTL. Genetics 141, 1147–1162.
Eshed, Y. and Zamir, D. (1996) Less-than-additive epistatic interactions of quantitative trait loci in tomato.
Genetics 143, 1807–1817.
Esquinas-Alcázar, J.T. (1993) Plant genetic resources. In: Hayward, M.D., Bosemark, N.O. and Romagosa, I.
(eds) Plant Breeding: Principles and Prospects. Chapman & Hall, London, pp. 33–51.
Esquinas-Alcázar, J. (2005) Protecting crop genetic diversity for food security: political, ethical and technical
challenges. Nature Reviews Genetics 6, 946–953.
ETC Group (Action Group on Erosion, Technology and Concentration) (2005) Global seed industry concentration
2005. Communiqué September/October 2005, pp. 1–12.
Etzel, C. and Guerra, R. (2003) Meta-analysis of genetic-linkage of quantitative trait loci. American Journal
of Human Genetics 71, 56–65.
Eujayl, I., Sorrells, M.E., Baum, M., Wolters, P. and Powell, W. (2002) Isolation of EST-derived microsatellite
markers for genotyping the A and B genomes of wheat. Theoretical and Applied Genetics 104,
399–407.
European Parliament (2001) Directive 2001/18/EC of the European Parliament and of the Council of 12
March 2001 on the deliberate release into the environment of genetically modified organisms and
repealing Council Directive 90/220/EEC – Commission Declaration. Official Journal of European
Community L 106, 1–39.
Evans, L.T. (1993) Crop Evolution, Adaptation and Yield. Cambridge University Press, New York.
Faham, M., Zheng, J., Moorhead, M., Fakhrai-Rad, H., Namsaraev, E., Wong, K., Wang, Z., Chow,
S.G., Lee, L., Suyenaga, K., Reichert, J., Boudreau, A., Eberle, J., Bruckner, C., Jain, M., Karlin-
Neumann, G., Jones, H.B., Willis, T.D., Buxbaum, J.D. and Davis, R.W. (2005) Multiplexed variation
scanning for 1,000 amplicons in hundreds of patients using mismatch repair detection (MRD) on
tag arrays. Proceedings of the National Academy of Sciences of the United States of America 102,
14717–14722.
Falconer, D.S. (1960) Introduction to Quantitative Genetics. Oliver & Boyd, Edinburgh, UK.
Falconer, D.S. (1981) Introduction to Quantitative Genetics, 2nd edn. Longman, London.
Falconer, D.S. (1989) Introduction to Quantitative Genetics, 3rd edn. Wiley, New York.
Falconer, D.S. and Mackay, T.F.C. (1996) Introduction to Quantitative Genetics, 4th edn. Longman Scientific
& Technical Ltd, Harlow, UK.
Faleiro, F.G., Ragagnin, V.A., Moreira, M.A. and de Barros, E.G. (2004) Use of molecular markers to accelerate
the breeding of common bean lines resistant to rust and anthracnose. Euphytica 138, 213–218.
Falque, M. and Santoni, S. (2007) Molecular markers and high-throughput genotyping analysis. In: Morot-
Gaudry, J.F., Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New
Hampshire, pp. 503–527.
Falque, M., Decousset, L., Dervins, D., Jacob, A.-M., Joets, J., Martinant, J.-P., Raffoux, X., Ribière, N.,
Ridel, C., Samson, D., Charcosset, A. and Murigneux, A. (2005) Linkage mapping of 1454 new maize
candidate gene loci. Genetics 170, 1957–1966.
Falush, D., Stephens, M. and Pritchard, J.K. (2003) Inference of population structure using multilocus genotype
data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587.
Fan, C., Xing, Y., Mao, H., Lu, T., Han, B., Xu, C., Li, X. and Zhang, Q. (2006) GS3, a major QTL for grain
length and weight and minor QTL for grain width and thickness in rice, encodes a putative transmembrane
protein. Theoretical and Applied Genetics 112, 1164–1171.
Fang, Y.-D., Akula, C. and Altpeter, F. (2002) Agrobacterium-mediated barley (Hordeum vulgare L.)
transformation using green fluorescent protein as a visual marker and sequence analysis of the
T-DNA:genomic DNA junctions. Journal of Plant Physiology 159, 1131–1138.
FAO (Food and Agriculture Organization of the United Nations) (1998) The State of the World's Plant
Genetic Resources for Food and Agriculture. FAO, Rome.
Faris, J.D., Laddomada, B. and Gill, B.S. (1998) Molecular mapping of segregation distortion loci in Aegilops
tauschii. Genetics 149, 319–327.
Faris, J.D., Fellers, J.P., Brooks, S.A. and Gill, B.S. (2003) A bacterial artificial chromosome contig span-
ning the major domestication locus Q in wheat and identification of a candidate gene. Genetics 164,
311–321.
Fashena, S.J., Serebriiskii, I. and Golemis, E.A. (2000) The continued evolution of two-hybrid screening
approaches in yeast: how to outwit different preys with different baits. Gene 250, 1–14.
Fatokun, C.A., Menancio-Hautea, D.I., Danesh, D. and Young, N.D. (1992) Evidence for orthologous seed
weight genes in cowpea and mung bean based on RFLP mapping. Genetics 132, 841–846.
Fauquet, C.M. and Tohme, J. (2004) The global cassava partnership for genetic improvement. Plant
Molecular Biology 56, v–x (editorial).
Fehr, W.R. (1987) Principles of Cultivar Development. Vol. 1. Theory and Techniques. Macmillan Publishing
Company, London.
Feltus, F.A., Singh, H.P., Lohithaswa, H.C., Schulze, S.R., Silva, T.D. and Paterson, A.H. (2006) A comparative
genomic strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding
sequences in orphan crops. Plant Physiology 140, 1183–1191.
Fenn, J.B., Mann, M., Meng, C.K., Wong, S.F. and Whitehouse, C.M. (1989) Electrospray ionization for the
mass spectrometry of large biomolecules. Science 246, 64–71.
Fernandez-Ricaud, L., Warringer, J., Ericson, E., Pylvanainen, I., Kemp, G.J.L., Nerman, O. and Blomberg,
A. (2005) PROPHECY – a database for high-resolution phenomics. Nucleic Acids Research 33,
D369–D373.
Fernando, R.L. (2002) Methods to map QTL. Available at: http://meishan.ansci.iastate.edu/rohan/notes-dir/
QTL.pdf (accessed 31 December 2007).
Fernando, R.L., Nettleton, D., Southey, B.R., Dekkers, J.C.M., Rothschild, M.F. and Soller, M.
(2004) Controlling the proportion of false positives in multiple dependent tests. Genetics 166,
611–619.
Ferrie, A.M.R. (2007) Doubled haploid production in nutraceutical species: a review. Euphytica 158, 347–357.
Ferro, M., Salvi, D., Rivière-Rolland, H., Vernat, T., Seigneurin-Berny, D., Grunwald, D., Garin, J., Joyard,
J. and Rolland, N. (2002) Integral membrane proteins of the chloroplast envelope: identification and
subcellular localization of new transporters. Proceedings of the National Academy of Sciences of the
United States of America 99, 11487–11492.
Fiehn, O. (2002) Metabolomics – the link between genotypes and phenotypes. Plant Molecular Biology
48, 155–171.
Fiehn, O., Wohlgemuth, G., Scholz, M., Kind, T., Lee, D.Y., Lu, Y., Moon, S. and Nikolau, B. (2008) Quality
control for plant metabolomics: reporting MSI-compliant studies. The Plant Journal 53, 691–704.
Fields, S. and Song, O. (1989) A novel genetic system to detect protein–protein interactions. Nature 340,
245–246.
Filipski, A. and Kumar, S. (2005) Comparative genomics in eukaryotes. In: Gregory, T.R. (ed.) The Evolution
of the Genome. Elsevier Inc., Amsterdam, pp. 521–583.
Finak, G., Hallett, M., Park, M. and Pepin, F. (2005) Bioinformatics tools for gene-expression studies.
In: Sensen, C.W. (ed.) Handbook of Genome Research. Genomics, Proteomics, Metabolomics,
Bioinformatics, Ethical and Legal Issues. WILEY-VCH, Weinheim, Germany, pp. 415–434.
Finlay, K.W. and Wilkinson, G.N. (1963) The analysis of adaptation in a plant-breeding programme.
Australian Journal of Agricultural Research 14, 742–754.
Fire, A., Xu, S., Montgomery, M., Kostas, S., Driver, S. and Mello, C. (1998) Potent and specific genetic
interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806–811.
Fisher, R.A. (1918) The correlation between relatives on the supposition of Mendelian inheritance.
Transactions of the Royal Society of Edinburgh, Earth Sciences 52, 399–433.
Fisher, R.A. (1935) The detection of linkage with dominant abnormalities. Annals of Eugenics 6, 187–201.
Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7,
179–188.
Fisk, H.J. and Dandekar, A.M. (2004) Electroporation. In: Peña, L. (ed.) Methods in Molecular Biology, Vol.
286. Transgenic Plants: Methods and Protocols. Humana Press Inc., Totowa, New Jersey, pp. 79–90.
Flint, J. and Mott, R. (2001) Finding the molecular basis of quantitative traits: successes and pitfalls. Nature
Reviews Genetics 2, 437–445.
Flint-Garcia, S.A., Thornsberry, J.M. and Buckler, E.S. (2003) Structure of linkage disequilibrium in plants.
Annual Review of Plant Biology 54, 357–374.
Florea, L., Hartzell, G., Zhang, Z., Rubin, G.G. and Miller, W. (1998) A computer program for aligning a
cDNA sequence with a genomic DNA sequence. Genome Research 8, 967–974.
Flores, F., Moreno, M.T. and Cubero, J.I. (1998) A comparison of univariate and multivariate methods to
analyze G × E interaction. Field Crops Research 56, 271–286.
Fodor, S., Dower, W. and Solas, D. (1998) Detection of nucleic acid sequences. Patent EP 0834576.
Fofana, I.B.F., Sangaré, A., Collier, R., Taylor, C. and Fauquet, C.M. (2004) A geminivirus-induced gene
silencing system for gene function validation in cassava. Plant Molecular Biology 56, 613–624.
Foolad, M.R. and Jones, R.A. (1992) Models to estimate maternally controlled genetic variation in quantitative
seed characters. Theoretical and Applied Genetics 83, 360–366.
Foolad, M.R. and Jones, R.A. (1993) Mapping salt-tolerance genes in tomato (Lycopersicon esculentum)
using trait-based marker analysis. Theoretical and Applied Genetics 87, 184–192.
Forster, B.P. and Thomas, W.T.B. (2004) Doubled haploids in genetics and plant breeding. Plant Breeding
Reviews 25, 57–88.
Forster, B.P., Ellis, R.P., Thomas, W.T.B., Newton, A.C., Tuberosa, R., This, D., El-Enein, R.A., Bahri, M.H.
and Ben Salem, M. (2000) The development and application of molecular markers for abiotic stress.
Journal of Experimental Botany 51, 19–27.
Forster, B.P., Heberle-Bors, E., Kasha, K.J. and Touraev, A. (2007) The resurgence of haploids in higher
plants. Trends in Plant Science 12, 368–375.
Foster, G.D. and Twell, D. (eds) (1996) Plant Gene Isolation: Principles and Practice. John Wiley & Sons,
Chichester, UK, 426 pp.
Fowler, C. and Hodgkin, T. (2004) Plant genetic resources for food and agriculture: assessing global availability.
Annual Review of Environment and Resources 29, 143–179.
Fowler, C. and Lower, R.L. (2005) Politics of plant breeding. Plant Breeding Reviews 25, 21–55.
Fowler, C., Hawtin, G., Ortiz, R., Iwanaga, M. and Engels, J. (2005) The question of derivatives: promoting
use and ensuring availability of non-proprietary plant genetic resources. The Journal of World
Intellectual Property 7, 641–663.
Fox, P.N., Crossa, J. and Romagosa, I. (1997) Multi-environment testing and genotype × environment
interaction. In: Kempton, R.A. and Fox, P.N. (eds) Statistical Methods for Plant Variety Evaluation.
Chapman & Hall, London, pp. 117–138.
Fraley, R. (2006) Presentation at Monsanto European Investor Day, 10 November 2006. Available at: http://
www.monsanto.com (accessed 17 November 2009).
Fraley, R.T., Rogers, S.G. and Horsch, R.B. (1986) Genetic transformation in higher plants. Critical Reviews
in Plant Sciences 4, 1–46.
Francia, E., Tacconi, G., Crosatti, C., Barabaschi, D., Bulgarelli, D., Dall'Aglio, E. and Valè, G. (2005) Marker
assisted selection in crop plants. Plant Cell, Tissue and Organ Culture 82, 317–342.
Franco, J., Crossa, J., Taba, S. and Shands, H. (2005) A sampling strategy for conserving genetic diversity
when forming core subsets. Crop Science 45, 1035–1044.
Franco, J., Crossa, J., Warburton, M.L. and Taba, S. (2006) Sampling strategies for conserving maize diversity
when forming core subsets using genetic markers. Crop Science 46, 854–864.
François, I., Broekaert, W. and Cammue, B. (2002a) Different approaches for multi-transgene-stacking in
plants. Plant Science 163, 281–295.
François, I.E.J.A., De Bolle, M.F.C., Dwyer, G., Goderis, I.J.W.M, Wouters, P.F.J., Verhaert, P., Proost, P.,
Schaaper, W.M.M., Cammue, B.P.A and Broekaert, W.F. (2002b) Transgenic expression in Arabidopsis
thaliana of a polyprotein construct leading to production of two different antimicrobial proteins. Plant
Physiology 128, 1346–1358.
François, I.E.J.A., Dwyer, G.I., De Bolle, M.F.C., Goderis, I.J.W.M, van Hemelrijck, W., Proost, P., Wouters,
P.F.J., Broekaert, W.F. and Cammue, B.P.A. (2002c) Processing in transgenic Arabidopsis thaliana
plants of polyproteins with linker peptide variants derived from the Impatiens balsamina antimicrobial
polyprotein precursor. Plant Physiology and Biochemistry 40, 871–879.
Frankel, O. (1984) Genetic perspectives of germplasm conservation. In: Arber, W., Illmensee, K., Peacock,
W.J. and Starlinger, P. (eds) Genetic Manipulation: Impact on Man and Society. Cambridge University
Press, Cambridge, UK, pp. 161–170.
Frankel, O.H. (1986) Genetic resources – museum or utility. In: Williams, T.A. and Wratt, G.S. (eds) Plant
Breeding Symposium, DSIR 1986. Agronomy Society of New Zealand, Christchurch, pp. 3–7.
Frankel, O.H. and Brown, A.H.D. (1984) Current plant genetic resources: a critical appraisal. In: Genetics:
New Frontiers (Vol. IV). Oxford & IBH, New Delhi.
Frankel, W.N. (1995) Taking stock of complex trait genetics in mice. Trends in Genetics 11, 471–477.
Frary, A., Nesbitt, T.C., Frary, A., Grandillo, S., van de Knaap, E., Cong, B., Liu, J., Meller, J., Elber, R.,
Alpert, K.B. and Tanksley, S.D. (2000) fw2.2: a quantitative trait locus key to the evolution of tomato
fruit size. Science 289, 8588.
Frascaroli, E., Canè, M.A., Landi, P., Pea, G., Gianfranceschi, L., Villa, M., Morgante, M. and Pè, M.E.
(2007) Classical genetic and quantitative trait loci analyses of heterosis in a maize hybrid between
two elite inbred lines. Genetics 176, 625–644.
Frawley, W.J., Piatetsky-Shapiro, G. and Matheus, C.J. (1991) Knowledge discovery in databases: an overview.
In: Piatetsky-Shapiro, G. and Frawley, W.J. (eds) Knowledge Discovery in Databases. AAAI
Press, Menlo Park, California and MIT Press, Cambridge, Massachusetts, pp. 1–27.
Freeman, G.H. (1973) Statistical methods for the analysis of genotype–environment interactions. Heredity
31, 339–354.
Freudenreich, C.H., Stavenhagen, J.B. and Zakian, V.A. (1997) Stability of CTG:CAG trinucleotide repeat in
yeast is dependent on its orientation in the genome. Molecular and Cellular Biology 4, 2090–2098.
Fridman, E., Pleban, T. and Zamir, D. (2000) A recombination hotspot delimits a wild-species quantitative
trait locus for tomato sugar content to 484 bp within an invertase gene. Proceedings of the National
Academy of Sciences of the United States of America 97, 4718–4723.
Fridman, E., Carrari, F., Liu, Y.S., Fernie, A.R. and Zamir, D. (2004) Zooming in on a quantitative trait for
tomato yield using interspecific introgressions. Science 305, 1786–1789.
Friedman, C., Borlawsky, T., Shagina, L., Xing, H.R. and Lussier, Y.A. (2006) Bio-ontology and text: bridging
the modelling gap. Bioinformatics 22, 2421–2429.
Frisch, M. (2004) Breeding strategies: optimum design of marker-assisted backcross programs. In: Lörz, H.
and Wenzel, G. (eds) Biotechnology in Agriculture and Forestry, Vol. 55. Molecular Marker Systems in
Plant Breeding and Crop Improvement. Springer-Verlag, Berlin, pp. 319–334.
Frisch, M. and Melchinger, A.E. (2001) Marker-assisted backcrossing for simultaneous introgression of two
genes. Crop Science 41, 1716–1725.
Frisch, M. and Melchinger, A.E. (2005) Selection theory for marker-assisted backcrossing. Genetics 170,
909–917.
Frisch, M. and Melchinger, A.E. (2008) Precision of recombination frequency estimates after random
intermating with finite population sizes. Genetics 178, 597–600.
Frisch, M., Bohn, M. and Melchinger, A.E. (1999a) Comparison of selection strategies for marker-assisted
backcrossing of a gene. Crop Science 39, 1295–1301.
Frisch, M., Bohn, M. and Melchinger, A.E. (1999b) Minimum sample size and optimal positioning of flanking
markers in marker-assisted backcrossing for transfer of a target gene. Crop Science 39, 967–975.
Frisch, M., Bohn, M. and Melchinger, A.E. (2000) PLABSIM: software for simulation of marker-assisted
backcrossing. Journal of Heredity 91, 86–87.
Fu, H. and Dooner, H.K. (2002) Intraspecific violation of genetic colinearity and its implications in maize.
Proceedings of the National Academy of Sciences of the United States of America 99, 9573–9578.
Fu, X.D., Duc, L.T., Fontana, S., Bong, B.B., Tinjuangjun, P., Sudhakar, D., Twyman, R.M., Christou, P. and
Kohli, A. (2000) Linear transgene constructs lacking vector backbone sequences generate low-copy
number transgenic plants with simple integration patterns. Transgenic Research 9, 11–19.
Fu, Y., Wen, T.J., Ronin, Y.I., Chen, H.D., Guo, L., Mester, D.I., Yang, Y., Lee, M., Korol, A.B., Ashlock, D.A.
and Schnable, P.S. (2006) Genetic dissection of intermated recombinant inbred lines using a new
genetic map of maize. Genetics 174, 1671–1683.
Fu, Y.B., Peterson, G.W., Williams, D., Richards, K.W. and Fetch, J.M. (2005) Patterns of AFLP variation
in a core subset of cultivated hexaploid oat germplasm. Theoretical and Applied Genetics 111,
530–539.
Fulton, T.M., Beck-Bunn, T., Emmatty, D., Eshed, Y., Lopez, J., Petiard, V., Uhlig, J., Zamir, D. and Tanksley,
S.D. (1997) QTL analysis of an advanced backcross of Lycopersicon peruvianum to the cultivated
tomato and comparisons with QTLs found in other wild species. Theoretical and Applied Genetics
95, 881–894.
Fulton, T.M., van der Hoeven, R., Eannetta, N.T. and Tanksley, S.D. (2002) Identification, analysis and
utilization of conserved ortholog set markers for comparative genomics in higher plants. The Plant Cell
14, 1457–1467.
Furtado, A. and Henry, R.J. (2005) The wheat Em promoter drives reporter gene expression in embryo and
aleurone tissue of transgenic barley and rice. Plant Biotechnology Journal 3, 421–434.
Gabriel, K.R. (1971) The biplot graphic display of matrices with application to principal component analysis.
Biometrika 58, 453–467.
Gabriel, K.R. (1978) Least squares approximation of matrices by additive and multiplicative models. Journal
of the Royal Statistical Society, Series B 40, 186–196.
Gale, M.D. (1975) High α-amylase – breeding and genetical aspects of the problem. Cereal Research
Communications 4, 231–243.
Gale, M.D. and Devos, K.M. (1998) Comparative genetics in the grasses. Proceedings of the National
Academy of Sciences of the United States of America 95, 1971–1974.
Galinat, W.C. (1977) The origin of corn. In: Sprague, G.F. (ed.) Corn and Corn Improvement, 2nd edn.
American Society of Agronomy, Madison, Wisconsin, pp. 1–48.
Gallais, A. and Bordes, J. (2007) The use of doubled haploids in recurrent selection and hybrid development
in maize. Crop Science 47(S3), S190–S201.
Gallais, A., Moreau, L. and Charcosset, A. (2007) Detection of marker–QTL associations by studying
change in marker frequencies with selection. Theoretical and Applied Genetics 114, 669–681.
Galperin, M.Y. (2008) The molecular biology database collection: 2008 update. Nucleic Acids Research
36, D2–D4.
Galperin, M.Y. and Koller, E. (2006) New metrics for comparative genomics. Current Opinion in Biotechnology
17, 440–447.
Gao, S., Martinez, C., Skinner, D.J., Krivanek, A.F., Crouch, J.H. and Xu, Y. (2008) Development of a
seed DNA-based genotyping system for marker-assisted selection in maize. Molecular Breeding 22,
477–494.
Gao, Z., Xie, X., Ling, Y., Muthukrishnan, S. and Liang, G.H. (2005) Agrobacterium tumefaciens-mediated
sorghum transformation using a mannose selection system. Plant Biotechnology Journal 3,
591–599.
Garcia, A.A., Kido, E.A., Meza, A.N., Souza, H.M., Pinto, L.R., Pastina, M.M., Leite, C.S., Silva, J.A., Ulian,
E.C., Figueira, A. and Souza, A.P. (2006) Development of an integrated genetic map of a sugarcane
(Saccharum spp.) commercial cross, based on a maximum-likelihood approach for estimation of linkage
and linkage phases. Theoretical and Applied Genetics 112, 298–314.
Gauch, H.G., Jr (1988) Model selection and validation for yield trials with interaction. Biometrics 44,
705–715.
Gauch, H.G. (2006) Statistical analysis of yield trials by AMMI and GGE. Crop Science 46, 1488–1500.
Gauch, H.G. and Zobel, R.W. (1988) Predictive and postdictive success of statistical analysis of yield trials.
Theoretical and Applied Genetics 76, 1–10.
Gauch, H.G. and Zobel, R.W. (1996) AMMI analysis of yield trials. In: Kang, M.S. and Gauch, H.G., Jr (eds)
Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida, pp. 85–122.
Gauch, H.G. and Zobel, R.W. (1997) Identifying mega-environments and targeting genotypes. Crop
Science 37, 311–326.
Gauch, H.G., Piepho, H.-P. and Annicchiarico, P. (2008) Statistical analysis of yield trials by AMMI and
GGE: further considerations. Crop Science 48, 866–889.
Gaunt, T.R., Rodriguez, S., Zapata, C. and Day, I.N.M. (2006) MIDAS: software for analysis and visualisation
of interallelic disequilibrium between multiallelic markers. BMC Bioinformatics 7, 227.
Gaut, B.S. and Ross-Ibarra, J. (2008) Selection on major components of angiosperm genomes. Science
320, 484–486.
Gayen, P., Madan, J.K., Kumar, R. and Sarkar, K.R. (1994) Chromosome doubling in haploids through
colchicine. Maize Genetics Cooperation Newsletter 68, 65.
Gebhardt, C., Ballvora, A., Walkemeier, B., Oberhagemann, P. and Schüler, K. (2004) Assessing genetic
potential in germplasm collections of crop plants by marker–trait association: a case study for potatoes
with quantitative variation of resistance to late blight and maturity type. Molecular Breeding 13,
93–102.
Gedil, M.A., Wye, C., Berry, S., Segers, B., Peleman, J., Jones, R., Leon, A., Slabaugh, M.B. and Knapp,
S.J. (2001) An integrated restriction fragment length polymorphism-amplified fragment length
polymorphism linkage map for cultivated sunflower. Genome 44, 213–221.
Geldermann, H. (1975) Investigations on inheritance of quantitative characters in animals by gene markers.
I. Methods. Theoretical and Applied Genetics 46, 319–330.
Geleta, L.F., Labuschagne, M.T. and Viljoen, C.D. (2004) Relationship between heterosis and genetic distance
based on morphological traits and AFLP markers in pepper. Plant Breeding 123, 467–473.
Gelfand, M.S., Mironov, A.A. and Pevzner, P.A. (1996) Gene recognition via spliced sequence alignment.
Proceedings of the National Academy of Sciences of the United States of America 93, 9061–9066.
Gene Ontology Consortium (2000) Gene ontology: tool for the unification of biology. Nature Genetics 25,
25–29.
George, E.I. and McCulloch, R.E. (1993) Variable selection via Gibbs sampling. Journal of the American
Statistical Association 91, 883–904.
Georgiady, M.S., Whitkus, R.W. and Lord, E.M. (2002) Genetic analysis of traits distinguishing outcrossing
and self-pollinating forms of currant tomato, Lycopersicon pimpinellifolium (Jusl.) Mill. Genetics 161,
333–344.
Gepts, P. (2006) Plant genetic resources conservation and utilization: the accomplishments and future of a
societal insurance policy. Crop Science 46, 2278–2292.
Gerdes, J.T. and Tracy, W.F. (1993) Pedigree diversity within the Lancaster Surecrop heterotic group of
maize. Crop Science 33, 334–337.
Gerdes, J.T., Behr, C.F., Coors, J.G. and Tracy, W.F. (1993) Compilation of North America Maize Breeding
Programs. Crop Science Society of America, Madison, Wisconsin.
Gernand, D., Rutten, T., Varshney, A., Rubtsova, M., Prodanovic, S., Brüß, C., Kumlehn, J., Matzk, F. and
Houben, A. (2005) Uniparental chromosome elimination at mitosis and interphase in wheat and pearl
millet crosses involves micronucleus formation, progressive heterochromatinization and DNA
fragmentation. The Plant Cell 17, 2431–2438.
Gerry, N.P., Witowski, N.E., Day, J., Hammer, R.P., Barany, G. and Barany, F. (1999) Universal DNA microarray
method for multiplex detection of low abundance point mutations. Journal of Molecular Biology
292, 251–262.
Gethi, J.G., Labate, J.A., Lamkey, K.R., Smith, M.E. and Kresovich, S. (2002) SSR variation in important
U.S. maize inbred lines. Crop Science 42, 951–957.
Gibbon, B.C. and Larkins, B.A. (2005) Molecular genetic approaches to developing quality protein maize.
Trends in Genetics 21, 227–233.
Gibrat, J.F. and Marin, A. (2007) Detecting protein function from genome sequences. In: Morot-Gaudry, J.F.,
Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New Hampshire,
pp. 87–106.
Gibson, G. and Weir, B. (2005) The quantitative genetics of transcription. Trends in Genetics 21, 616–623.
Gibson, S. and Somerville, C. (1993) Isolating plant genes. Trends in Biotechnology 11, 306–313.
Gill, B.S., Appels, R., Botha-Oberholster, A.-M., Buell, C.R., Bennetzen, J.L., Chalhoub, B., Chumley, F.,
Dvořák, J., Iwanaga, M., Keller, B., Li, W., McCombie, W.R., Ogihara, Y., Quetier, F. and Sasaki, T.
(2004) A workshop report on wheat genome sequencing: International Genome Research on Wheat
Consortium. Genetics 168, 1087–1096.
Gimelfarb, A. and Lande, R. (1994a) Simulation of marker-assisted selection in hybrid populations.
Genetical Research 63, 39–47.
Gimelfarb, A. and Lande, R. (1994b) Simulation of marker-assisted selection for non-additive traits.
Genetical Research 64, 127–136.
Gimelfarb, A. and Lande, R. (1995) Marker-assisted selection and marker-QTL associations in hybrid
populations. Theoretical and Applied Genetics 91, 522–528.
Giovannoni, J.J., Wing, R.A., Ganal, M.W. and Tanksley, S.D. (1991) Isolation of molecular markers from
specific chromosome intervals using DNA pools from existing populations. Nucleic Acids Research
19, 6553–6558.
Gish, W. and States, D.J. (1993) Identification of protein coding regions by database similarity search.
Nature Genetics 3, 266–272.
Gizlice, Z., Carter, T.E., Jr and Burton, J.W. (1993) Genetic diversity in North American soybean: II.
Prediction of heterosis in F2 populations of southern founding stock using genetic similarity measures.
Crop Science 33, 620–626.
Glass, G.V. (1976) Primary, secondary and meta-analysis of research. Educational Researcher 5, 3–8.
Glazier, A.M., Nadeau, J.H. and Aitman, T.J. (2002) Finding genes that underlie complex traits. Science
298, 2345–2349.
Gleave, A.P., Mitra, D.S., Mudge, S.R. and Morris, B.A.M. (1999) Selectable marker-free transgenic plants
without sexual crossing: transient expression of cre recombinase and use of a conditional lethal dominant
gene. Plant Molecular Biology 40, 223–235.
Gleba, Y., Marillonnet, S. and Klimyuk, V. (2004) Engineering viral expression vectors for plants: the full
virus and the deconstructed virus strategies. Current Opinion in Plant Biology 7, 182–188.
Gleba, Y., Klimyuk, V. and Marillonnet, S. (2005) Magnifection – a new platform for expressing recombinant
vaccines in plants. Vaccine 23, 2042–2048.
Goderis, I.J.W.M., De Bolle, M.F.C., François, I.E.J.A., Wouters, P.F.J., Broekaert, W.F. and Cammue, B.P.A.
(2002) A set of modular plant transformation vectors allowing flexible insertion of up to six expression
units. Plant Molecular Biology 50, 17–27.
Godshalk, E.B., Lee, M. and Lamkey, K.R. (1990) Relationship of restriction fragment length polymorphisms
to single-cross hybrid performance of maize. Theoretical and Applied Genetics 80,
273–280.
Goedeke, S., Hensel, G., Kapusi, E., Gahrtz, M. and Kumlehn, J. (2007) Transgenic barley in fundamental
research and biotechnology. Transgenic Plant Journal 1, 104–117.
Goff, S.A., Ricke, D., Lan, T.H., Presting, G., Wang, R., Dunn, M., Glazebrook, J., Sessions, A., Oeller,
P., Varma, H., Hadley, D., Hutchison, D., Martin, C., Katagiri, F., Lange, B.M., Moughamer, T., Xia,
Y., Budworth, P., Zhong, J., Miguel, T., Paszkowski, U., Zhang, S., Colbert, M., Sun, W.L., Chen, L.,
Cooper, B., Park, S., Wood, T.C., Mao, L., Quail, P., Wing, R., Dean, R., Yu, Y., Zharkikh, A., Shen, R.,
Sahasrabudhe, S., Thomas, A., Cannings, R., Gutin, A., Pruss, D., Reid, J., Tavtigian, S., Mitchell, J.,
Eldredge, G., Scholl, T., Miller, R.M., Bhatnagar, S., Adey, N., Rubano, T., Tusneem, N., Robinson,
R., Feldhaus, J., Macalma, T., Oliphant, A. and Briggs, S. (2002) A draft sequence of the rice genome
(Oryza sativa L. ssp. japonica). Science 296, 92–100.
Goffinet, B. and Gerber, S. (2000) Quantitative trait loci: a meta-analysis. Genetics 155, 463–473.
Goldman, I.L. (1999) Inbreeding and outbreeding in the development of a modern heterosis concept. In:
Coors, J.G. and Pandey, S. (eds) Genetics and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA,
Madison, Wisconsin, pp. 7–18.
Goldman, I.L. (2000) Prediction in plant breeding. Plant Breeding Reviews 19, 15–40.
Goldman, I.L., Rocheford, T.R. and Dudley, J.W. (1993) Quantitative trait loci influencing protein and starch
concentration in the Illinois long term selection maize strains. Theoretical and Applied Genetics 87,
217–224.
Goldman, I.L., Rocheford, T.R. and Dudley, J.W. (1994) Molecular markers associated with maize kernel oil
concentration in the Illinois High Protein × Illinois Low Protein Cross. Crop Science 34, 908–915.
Goldsbrough, A.P., Lastrella, C.N. and Yoder, J.I. (1993) Transposition mediated re-positioning and subsequent
elimination of marker genes from transgenic tomato. Bio/Technology 11, 1286–1292.
Gollob, H.F. (1968) A statistical model which combines features of factor analytic and analysis of variance.
Psychometrika 33, 73–115.
Goodin, M.M., Dietzgen, R.G., Schichnes, D., Ruzin, S. and Jackson, A.O. (2002) pGD vectors: versatile
tools for the expression of green and red fluorescent protein fusions in agroinfiltrated plant leaves. The
Plant Journal 31, 375–383.
Goodman, R.E., Vieths, S., Sampson, H.A., Hill, D., Ebisawa, M., Taylor, S.L. and van Ree, R. (2008)
Allergenicity assessment of genetically modified crops – what makes sense? Nature Biotechnology
26, 73–81.
Goodnight, C.J. (2004) Gene interaction and selection. Plant Breeding Reviews 24 (Part 2), 269–291.
Gorg, A., Obermaier, C., Boguth, G. and Weiss, W. (1999) Recent developments in two-dimensional gel
electrophoresis with immobilized pH gradients: wide pH gradients up to pH 12, longer separation
distances and simplified procedures. Electrophoresis 20, 712–717.
Grandillo, S. and Tanksley, S.D. (1996) QTL analysis of horticultural traits differentiating the cultivated tomato
from the closely related species Lycopersicon pimpinellifolium. Theoretical and Applied Genetics 92,
935–951.
Graner, A., Jahoor, A., Schondelmaier, J., Siedler, H., Pillen, K., Fischbeck, G., Wenzel, G. and
Herrmann, R.G. (1991) Construction of an RFLP map of barley. Theoretical and Applied Genetics
83, 250–256.
Grapes, L., Dekkers, J.C.M., Rothschild, M.F. and Fernando, R.L. (2004) Comparing linkage disequilibrium-based
methods for fine mapping quantitative trait loci. Genetics 166, 1561–1570.
Green, P.J. (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination.
Biometrika 82, 711–732.
Greenbaum, D., Smith, A. and Gerstein, M. (2005) Impediments to database interoperation: legal issues
and security concerns. Nucleic Acids Research 33, D3–D4.
Greene, S.L. and Guarino, L. (eds) (1999) Linking Genetic Resources and Geography: Emerging Strategies
for Conserving and Using Crop Biodiversity. American Society of Agronomy (ASA) and Crop Science
Society of America (CSSA), Madison, Wisconsin.
Gregory, B.D., Yazaki, J. and Ecker, J.R. (2008) Utilizing tiling microarrays for whole-genome analysis in
plants. The Plant Journal 53, 636–644.
Groos, C., Robert, N., Bervas, E. and Charmet, G. (2003) Genetic analysis of grain protein-content,
grain yield and thousand-kernel weight in bread wheat. Theoretical and Applied Genetics 106,
1032–1040.
Grosset, J., Alary, R., Gautier, M.F., Menossi, M., Martinez-Izquierdo, J.A. and Joudrier, P. (1997)
Characterization of a barley gene coding for an alpha-amylase inhibitor subunit (CMd protein) and
analysis of its promoter in transgenic tobacco plants and in maize kernels by microprojectile bombardment.
Plant Molecular Biology 34, 331–338.
Grupe, A., Germer, S., Usuka, J., Aud, D., Belknap, J.K., Klein, R.F., Ahluwalia, M.K., Higuchi, R. and Peltz,
G. (2001) In silico mapping of complex disease-related traits in mice. Science 292, 1915–1918.
Gu, S., Pakstis, A.J. and Kidd, K.K. (2005) HAPLOT: a graphical comparison of haplotype blocks, tagSNP
sets and SNP variation for multiple populations. Bioinformatics 21, 3938–3939.
Guidetti, G. (1998) Seed terminator and mega-merger threaten food and freedom. Available at: http://www.
sustainable-city.org/articles/terminat.htm (accessed 17 November 2009).
Guo, B., Sleper, D.A., Sun, J., Nguyen, H.T., Arelli, P.R. and Shannon, J.G. (2006) Pooled analysis of data from
multiple quantitative trait locus mapping populations. Theoretical and Applied Genetics 113, 39–48.
Guo, M., Rupe, M.A., Zinselmeier, C., Habben, J., Bowen, B.A. and Smith, O.S. (2004) Allelic variation of
gene expression in maize hybrids. The Plant Cell 16, 1707–1716.
Guo, M., Rupe, M.A., Yang, X., Crasta, O., Zinselmeier, C., Smith, O.S. and Bowen, B. (2006) Genome-wide
transcript analysis of maize hybrids: allelic additive gene expression and yield heterosis. Theoretical
and Applied Genetics 113, 831–845.
Gupta, P.K. and Rustgi, S. (2004) Molecular markers from the transcribed/expressed region of the genome
in higher plants. Functional and Integrative Genomics 4, 139–162.
Gur, A. and Zamir, D. (2004) Unused natural variation can lift yield barriers in plant breeding. PLoS Biology
2(10), e245.
Gurib-Fakim, A. (2006) Medicinal plants: traditions of yesterday and drugs of tomorrow. Molecular Aspects
of Medicine 27, 1–93.
Haanstra, J.P.W., Wye, C., Verbakel, H., Meijer-Dekens, F., Van den Berg, P., Odinot, P., van Heusden,
A.W., Tanksley, S., Lindhout, P. and Peleman, J. (1999) An integrated high-density RFLP-AFLP map of
tomato based on two Lycopersicon esculentum × L. pennellii F2 populations. Theoretical and Applied
Genetics 99, 254–271.
Haberer, G., Young, S., Bharati, A.K., Gundlach, H., Raymond, C., Fuks, G., Butler, E., Wing, R.A., Rounsley,
S., Birren, B., Nusbaum, C., Mayer, K.F.X. and Messing, J. (2005) Structure and architecture of the
maize genome. Plant Physiology 139, 1612–1624.
Hackett, C.A., Meyer, R.C. and Thomas, W.T.B. (2001) Multi-trait QTL mapping in barley using multivariate
regression. Genetical Research 77, 95–106.
Hagberg, A. and Hagberg, G. (1980) High frequency of spontaneous haploids in the progeny of an induced
mutation barley. Hereditas 93, 341–343.
Hahn, W.J. and Grifo, F.T. (1996) Molecular markers in plant conservation genetics. In: Sobral, B.W.S. (ed.)
The Impact of Plant Molecular Genetics. Birkhäuser, Boston, Massachusetts, pp. 113–136.
Hajdukiewicz, P., Svab, Z. and Maliga, P. (1994) The small, versatile pPZP family of Agrobacterium binary
vectors for plant transformation. Plant Molecular Biology 25, 989–994.
Hajdukiewicz, P.T.J., Gilbertson, L. and Staub, J.M. (2001) Multiple pathways for Cre/lox-mediated recombination
in plastids. The Plant Journal 27, 161–170.
Haldane, J.B.S. (1919) The combination of linkage values and the calculation of distance between the loci
of linkage factors. Journal of Genetics 8, 299–309.
Haldane, J.B.S. and Smith, C.A.B. (1947) A new estimate of the linkage between the genes for colour-blindness
and haemophilia in man. Annals of Eugenics 14, 10–31.
Haldane, J.B.S. and Waddington, C.H. (1931) Inbreeding and linkage. Genetics 16, 357–374.
Haldrup, A., Petersen, S.G. and Okkels, F.T. (1998a) Positive selection: a plant selection principle based on
xylose isomerase, an enzyme used in the food industry. Plant Cell Reports 18, 76–81.
Haldrup, A., Petersen, S.G. and Okkels, F.T. (1998b) The xylose isomerase gene from Thermoanaerobacterium
thermosulfurogenes allows effective selection of transgenic plant cells using D-xylose as the selection
agent. Plant Molecular Biology 37, 287–296.
Haley, C. (1999) Advances in quantitative trait locus mapping. In: Dekkers, J.C.M., Lamont, S.J. and
Rothschild, M.F. (eds) From Jay Lush to Genomics: Visions for Animal Breeding and Genetics. Animal
Breeding and Genetics Group, Department of Animal Science, Iowa State University, Ames, Iowa,
pp. 47–59.
Haley, C.S. and Knott, S.A. (1992) A simple regression method for mapping quantitative trait loci in line
crosses using flanking markers. Heredity 69, 315–324.
Haley, C.S., Knott, S.A. and Elsen, J.-M. (1994) Mapping quantitative trait loci in crosses between outbred
lines using least squares. Genetics 136, 1195–1207.
Halfhill, M.D., Richards, H.A., Mabon, S.A. and Stewart, C.N., Jr (2001) Expression of GFP and Bt transgenes
in Brassica napus and hybridization and introgression with Brassica rapa. Theoretical and
Applied Genetics 103, 362–368.
Halfhill, M.D., Zhu, B., Warwick, S.I., Raymer, P.L., Millwood, R.J., Weissinger, A.K. and Stewart, C.N., Jr
(2004a) Hybridization and backcrossing between transgenic oilseed rape and two related weed species
under field conditions. Environmental Biosafety Research 3, 73–81.
Halfhill, M.D., Millwood, R.J. and Stewart, C.N., Jr (2004b) Green fluorescent protein quantification in whole
plants. In: Peña, L. (ed.) Methods in Molecular Biology, Vol. 286. Transgenic Plants: Methods and
Protocols. Humana Press Inc., Totowa, New Jersey, pp. 215–225.
Hall, J.G., Eis, P.S., Law, S.M., Reynaldo, L.P., Prudent, J.R., Marshall, D.J., Allawi, H.T., Mast, A.L.,
Dahlberg, J.E., Kwiatkowski, R.W., de Arruda, M., Neri, B.P. and Lyamichev, V.I. (2000) Sensitive
detection of DNA polymorphisms by the serial invasive signal amplification reaction. Proceedings of
the National Academy of Sciences of the United States of America 97, 8272–8277.
Hallauer, A.R. (1990) Methods used in developing maize inbreds. Maydica 35, 1–16.
Hallauer, A.R. (2007) History, contribution and future of quantitative genetics in plant breeding: lessons
from maize. Crop Science 47(S3), S4–S19.
Hallauer, A.R. and Miranda, J.B. (1988) Quantitative Genetics in Maize Breeding, 2nd edn. Iowa State
University Press, Ames, Iowa.
Hallauer, A.R., Russell, W.A. and Lamkey, K.R. (1988) Corn breeding. In: Sprague, G.F. and Dudley, J.W.
(eds) Corn and Corn Improvement, 3rd edn. ASA-CSSA-SSSA, Madison, Wisconsin, pp. 463–564.
Hallauer, A.R., Ross, A.J. and Lee, M. (2004) Long-term divergent selection for ear length in maize. Plant
Breeding Reviews 24 (Part 2), 153–168.
Halpin, C. and Boerjan, W. (2003) Stacking transgenes in forest trees. Trends in Plant Science 8,
363–365.
Halpin, C., Barakate, A., Askari, B.M., Abbott, J.C. and Ryan, M.D. (2001) Enabling technologies for manipulating
multiple genes on complex pathways. Plant Molecular Biology 47, 295–310.
Hamilton, C.M. (1997) A binary-BAC system for plant transformation with high-molecular-weight DNA.
Gene 200, 107–116.
Hamilton, C.M., Frary, A., Lewis, C. and Tanksley, S.D. (1996) Stable transfer of intact high molecular weight
DNA into plant chromosomes. Proceedings of the National Academy of Sciences of the United States
of America 93, 9975–9979.
Hammer, G.L., Kropff, M.J., Sinclair, T.R. and Porter, J.R. (2002) Future contribution of crop modeling:
from heuristics and supporting decision making to understanding genetic regulation and aiding crop
improvement. European Journal of Agronomy 18, 15–31.
Hammer, G.L., Chapman, S., van Oosterom, E. and Podlich, D.W. (2005) Trait physiology and crop modeling
as a framework to link phenotypic complexity to underlying genetic systems. Australian Journal
of Agricultural Research 56, 947–960.
Hammond, M.P. and Birney, E. (2004) Genome information resources – developments at Ensembl. Trends
in Genetics 20, 268–272.
Han, B. and Xue, Y. (2003) Genome-wide intraspecific DNA-sequence variations in rice. Current Opinion
in Plant Biology 6, 134–138.
Han, O.K., Kaga, A., Isemura, T., Wang, X.W., Tomooka, N. and Vaughan, D.A. (2005) A genetic linkage
map for azuki bean [Vigna angularis (Willd.) Ohwi & Ohashi]. Theoretical and Applied Genetics 111,
1278–1287.
Han, X., Aslanian, A. and Yates, J.R. III (2008) Mass spectrometry for proteomics. Current Opinion in
Chemical Biology 12, 483–490.
Hanash, S. (2003) Disease proteomics. Nature 422, 226–232.
Hanin, M. and Paszkowski, J. (2003) Plant genome modification by homologous recombination. Current
Opinion in Plant Biology 6, 157–162.
Hanocq, E., Laperche, A., Jaminon, O., Lainé, A.-L. and Le Gouis, J. (2007) Most significant genome regions
involved in the control of earliness traits in bread wheat, as revealed by QTL meta-analysis. Theoretical
and Applied Genetics 114, 569–584.
Hansen, B.G., Halkier, B.A. and Kliebenstein, D.J. (2008) Identifying the molecular basis of QTLs: eQTLs
add a new dimension. Trends in Plant Science 13, 72–77.
Hansen, M., Kraft, T., Ganestam, S., Säll, T. and Nilsson, N.-O. (2001) Linkage disequilibrium mapping of
the bolting gene in sea beet using AFLP markers. Genetical Research 77, 61–66.
Hanson, W.D. (1959) Early generation analysis of lengths of heterozygous chromosome segments around
a locus held heterozygous with backcrossing or selfing. Genetics 44, 833–837.
Harding, K. (2004) Genetic integrity of cryopreserved plant cells: a review. Cryo Letters 25, 3–22.
Harlan, H.V. and Pope, M.N. (1922) The use and value of back-crosses in small grain breeding. Journal of
Heredity 13, 319–322.
Harlan, H.V., Martini, M.L. and Stevens, H. (1940) A study of methods in barley breeding. USDA Technical
Bulletin 720.
Harlan, J. (1965) The possible role of weed races in the evolution of cultivated plants. Euphytica 14,
173–176.
Harlan, J.R. (1971) Agricultural origins: centers and noncenters. Science 174, 468–474.
Harlan, J. (1992) Crops and Man, 2nd edn. Crop Science Society of America, Madison, Wisconsin.
Harlan, J.R. (1987) Gene centers and gene utilization in American agriculture. In: Yeatman, C.W., Kafton, D.
and Wilkes, G. (eds) Plant Genetic Resources: a Conservation Imperative. Westview Press, Boulder,
Colorado, pp. 111–129.
Harlan, J.R. and de Wet, J.M.J. (1971) Towards a rational classification of cultivated plants. Taxon 20,
509–517.
Harper, B.K., Mabon, S.A., Leffel, S.M., Halfhill, M.D., Richards, H.A., Moyer, K.A. and Stewart, C.N., Jr
(1999) Green fluorescent protein as a marker for expression of a second gene in transgenic plants.
Nature Biotechnology 17, 1125–1129.
Harris, S.A. (1999) Molecular approaches to assessing plant diversity. In: Benson, E.E. (ed.) Plant
Conservation Biotechnology. Taylor & Francis Ltd, London, pp. 11–24.
Hart, G.E., Gale, M.D. and McIntosh, R.A. (1993) Linkage maps of Triticum aestivum (Hexaploid wheat, 2n
= 42, genome A, B and D) and T. tauschii (2n = 14, genome D). In: O'Brien, S.J. (ed.) Genetic Maps:
Locus Maps of Complex Genomes. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New
York, pp. 6.204–6.219.
Harushima, Y., Kurata, N., Yano, M., Nagamura, Y., Sasaki, T., Minobe, Y. and Nakagahra, M. (1996)
Detection of segregation distortions in an indica–japonica rice cross using a high-resolution molecular
map. Theoretical and Applied Genetics 92, 145–150.
Harushima, Y., Yano, M., Shomura, A., Sato, M., Shimano, T., Kuboki, Y., Yamamoto, T., Lin, S.Y., Antonio,
B.A., Parco, A., Kajiya, H., Huang, N., Yamamoto, K., Nagamura, Y., Kurata, N., Khush, G.S. and
Sasaki, T. (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population.
Genetics 148, 479–494.
Haseloff, J., Siemering, K.P., Prasher, D. and Hodge, S. (1997) Removal of a cryptic intron and subcellular
localization of green fluorescent protein are required to mark transgenic Arabidopsis plants
brightly. Proceedings of the National Academy of Sciences of the United States of America 94,
2122–2127.
Havey, M.J. (1998) Molecular analyses and heterosis in the vegetables: can we breed them like maize? In:
Lamkey, K.R. and Staub, J.E. (eds) Concepts and Breeding of Heterosis in Crop Plants. Crop Science
Society of America (CSSA), Madison, Wisconsin, pp. 109–116.
Hawtin, G. (1998) Conservation of agrobiodiversity for tropical agriculture. In: Chopra, V.L., Singh, R.B.
and Varma, A. (eds) Crop Productivity and Sustainability – Shaping the Future, Proceedings of the
2nd International Crop Science Congress. Oxford & IBH Publishing Co., New Delhi, pp. 917–925.
Hayes, B. and Goddard, M.E. (2001) The distribution of the effects of genes affecting quantitative traits in
livestock. Genetics Selection Evolution 33, 209–229.
Hazekamp, Th. (2002) The potential role of passport data in the conservation and use of plant genetic
resources. In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing
Plant Genetic Diversity. International Plant Genetic Resources Institute, Rome, pp. 185–194.
Hazekamp, Th., Serwinski, J. and Alercia, A. (1997) Multi-crop passport descriptors. In: Lipmann, E.,
Jongen, M.W.M., Hintum, Th.J.L. van, Gass, T. and Maggioni, L. (compilers) Central Crop Databases:
Tools for Plant Genetic Resources Management. Report of a Workshop, 13–16 October 1996,
Budapest, Hungary. International Plant Genetic Resources Institute, Rome, Italy/CGN, Wageningen,
Netherlands, pp. 35–39.
Hazen, S.P., Pathan, M.S., Sanchez, A., Baxter, I., Dunn, M., Estes, B., Chang, H.-S., Zhu, T., Kreps, J.A.
and Nguyen, H.T. (2005) Expression profiling of rice segregating for drought tolerance QTL using a
rice genome array. Functional and Integrative Genomics 5, 104116.
He, P., Li, J.Z., Zheng, X.W., Shen, L.S., Lu, C.F., Chen, Y. and Zhu, L.H. (2001) Comparison of molecular
linkage maps and agronomic trait loci between DH and RIL populations derived from the same rice
cross. Crop Science 41, 1240–1246.
He, X.H. and Zhang, Y.M. (2008) Mapping epistatic quantitative trait loci underlying endosperm traits using
all markers on the entire genome in a random hybridization design. Heredity 101, 39–47.
He, Y., Chen, C., Tu, J., Zhou, P., Jiang, G., Tan, Y., Xu, C. and Zhang, Q. (2002) Improvement of an elite
rice hybrid, Shanyou 63, by transformation and marker-assisted selection. In: Abstracts of the Fourth
International Symposium on Hybrid Rice, 14–17 May 2002, Hanoi, Vietnam, p. 43.
He, Y., Li, X., Zhang, J., Jiang, G., Liu, S., Chen, S., Tu, J., Xu, C. and Zhang, Q. (2004) Gene pyramid-
ing to improve hybrid rice by molecular marker technique. 4th International Crop Science Congress.
Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17 November 2009).
He, Z., Fu, Y., Si, H., Hu, G., Zhang, S., Yu, Y. and Sun, Z. (2004) Phosphomannose-isomerase (pmi) gene
as a selectable marker for rice transformation via Agrobacterium. Plant Science 166, 17–22.
Heckenberger, M., Bohn, M., Maurer, H.P., Frisch, M. and Melchinger, A.E. (2005a) Identification of
essentially derived varieties with molecular markers: an approach based on statistical test theory and
computer simulations. Theoretical and Applied Genetics 111, 598–608.
Heckenberger, M., Bohn, M., Klein, D. and Melchinger, A.E. (2005b) Identification of essentially derived
varieties obtained from biparental crosses of homozygous lines: II. Morphological distances and
heterosis in comparison with simple sequence repeat and amplified fragment length polymorphism data
in maize. Crop Science 45, 1132–1140.
Heckenberger, M., Muminovic, J., van der Voort, J.R., Peleman, J., Bohn, M. and Melchinger, A.E. (2006)
Identification of essentially derived varieties from biparental crosses of homogenous lines. III.
AFLP data from maize inbreds and comparison with SSR data. Molecular Breeding 17, 111–125.
Heckenberger, M., Maurer, H.P., Melchinger, A.E. and Frisch, M. (2008) The Plabsoft database: a
comprehensive database management system for integrating phenotypic and genomic data in academic and
commercial plant breeding programs. Euphytica 161, 173–179.
Hedden, P. (2003) The genes of the green revolution. Trends in Genetics 19, 5–9.
Hedgecock, D., Lin, J.Z., DeCola, S., Haudenschild, C., Meyer, E., Manahan, D.T. and Bowen, B. (2002)
Analysis of gene expression in hybrid Pacific oysters by massively parallel signature sequencing.
Plant & Animal Genome X Conference Abstract. Available at: http://www.intl-pag.org/pag/10/abstracts/
PAGX_W15.html (accessed 30 June 2007).
Hedges, L.V. and Olkin, I. (1985) Statistical Methods for Meta-analysis. Academic Press, Orlando, Florida.
Heisey, P.W., King, J.L. and Rubenstein, K.D. (2005) Patterns of public-sector and private-sector patenting
in agricultural biotechnology. AgBioForum 8, 73–82.
Heitz, A. (1998) Intellectual property rights and plant variety protection in relation to demands of the world
trade organization and farmers in sub-Saharan Africa. In: Proceedings of the Regional Technical
Meeting on Seed Policy and Programmes for Sub-Saharan Africa, Abidjan, Côte d'Ivoire, 23–27
November 1998. Available at: http://www.fao.org/ag/agp/AGPS/abidjan/tabcont.htm (accessed 17
November 2009).
Helentjaris, T. and Briggs, K. (1998) Are there too many genes in maize? Maize Genetics Cooperation
Newsletter 72, 39–40.
Helentjaris, T., Cushman, M.A.T. and Winkler, R. (1992) Developing a genetic understanding of agronomy
traits with complex inheritance. In: Dattée, Y., Dumas, C. and Gallais, A. (eds) Reproductive Biology
and Plant Breeding. Springer-Verlag, Berlin, pp. 397–406.
Helfer, L.R. (2006) The demise and rebirth of plant variety protection: a comment on obsolescence in
intellectual property regimes. Public Law and Legal Theory (Vanderbilt University Law School), Working
Paper Number 06-28. Vanderbilt University, Nashville, Tennessee.
Hellens, R., Mullineaux, P. and Klee, H. (2000) Technical focus: a guide to Agrobacterium binary Ti vectors.
Trends in Plant Science 5, 446–451.
Hellens, R.P., Edwards, E.A., Leyland, N.R., Bean, S. and Mullineaux, P.M. (2000) pGreen, a versatile and
flexible binary Ti vector for Agrobacterium-mediated plant transformation. Plant Molecular Biology 42,
819–832.
Henderson, C.R. (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics
31, 423–447.
Henikoff, S. and Comai, L. (2003) Single-nucleotide mutations for plant functional genomics. Annual Review
of Plant Biology 54, 375–401.
Henry, Y., De Buyser, J., Agache, S., Parker, B.B. and Snape, J.W. (1988) Comparison of methods of
haploid production and performance of wheat lines produced by doubled haploidy and single seed
descent. In: Miller, T.E. and Koebner, R.M.D. (eds) Proceedings of 7th International Wheat Genetics
Symposium, Cambridge, 13–19 July 1988. Institute of Plant Science Research, Cambridge, UK, pp.
1087–1092.
Henson-Apollonio, V. (2007) Impacts of intellectual property rights on marker-assisted selection research
and application for agriculture in developing countries. In: Guimarães, E.P., Ruane, J., Scherf, B.D.,
Sonnino, A. and Dargie, J.D. (eds) Marker-Assisted Selection, Current Status and Future Perspectives
in Crops, Livestock, Forestry and Fish. Food and Agriculture Organization of the United Nations,
Rome, pp. 405–425.
Herring, R.J. (2008) Opposition to transgenic technologies: ideology, interests and collective action frames.
Nature Reviews Genetics 9, 458–463.
Heun, M., Kennedy, A.E., Anderson, J.A., Lapitan, N.L.V., Sorrells, M.E. and Tanksley, S.D. (1991)
Construction of a restriction fragment length polymorphism map for barley (Hordeum vulgare).
Genome 34, 437–447.
Hiatt, A.C., Cafferkey, R. and Bowdish, K. (1989) Production of antibodies in transgenic plants. Nature 342,
76–78.
Hiei, Y., Ohta, S., Komari, T. and Kumashiro, T. (1994) Efficient transformation of rice (Oryza sativa L.)
mediated by Agrobacterium and sequence analysis of the boundaries of the T-DNA. The Plant Journal 6,
271–282.
Hiei, Y., Komari, T. and Kubo, T. (1997) Transformation of rice mediated by Agrobacterium tumefaciens.
Plant Molecular Biology 35, 205–218.
Hijmans, R.J., Guarino, L., Cruz, M. and Rojas, E. (2001) Computer tools for spatial analysis of plant
genetic resources data. 1. DIVA-GIS. Plant Genetic Resources Newsletter 127, 15–19.
Hillel, D. and Rosenzweig, C. (2005) The role of biodiversity in agronomy. Advances in Agronomy 88,
1–34.
Hillel, J., Avner, R., Baxter-Jones, C., Dunnington, E.A., Cahaner, A. and Siegel, P.B. (1990) DNA
fingerprints from blood mixes in chickens and turkeys. Animal Biotechnology 2, 201–204.
Hillenkamp, F. and Köster, H. (1999) Infrared matrix-assisted laser desorption/ionization mass
spectrometric analysis of macro-molecules. Patent EP 1075545.
Himmelbach, A., Zierold, U., Hensel, G., Riechen, J., Douchkov, D., Schweizer, P. and Kumlehn, J. (2007) A
set of modular binary vectors for transformation of cereals. Plant Physiology 145, 1192–1200.
Hintum, Th.J.L. van (1999) The Core Selector, a system to generate representative selections of
germplasm accessions. Plant Genetic Resources Newsletter 118, 64–67.
Hird, D.L., Paul, W., Hollyoak, J.S. and Scott, R.J. (2000) The restoration of fertility in male sterile tobacco
demonstrates that transgene silencing can be mediated by T-DNA that has no DNA homology to the
silenced transgene. Transgenic Research 9, 91–102.
Hirochika, H. (2003) Insertional mutagenesis in rice using the endogenous retrotransposon. In: Mew, T.W.,
Brar, D.S., Peng, S., Dawe, D. and Hardy, B. (eds) Rice Science: Innovations and Impact for Livelihood,
Proceedings of the International Rice Research Conference, 16–19 September 2002, Beijing, China.
International Rice Research Institute, Chinese Academy of Engineering and Chinese Academy of
Agricultural Sciences, pp. 205–212.
Hirochika, H., Guiderdoni, E., An, G., Hsing, Y.I., Eun, M.Y., Han, C.D., Upadhyaya, N., Ramachandran,
S., Zhang, Q., Pereira, A., Sundaresan, V. and Leung, H. (2004) Rice mutant resources for gene
discovery. Plant Molecular Biology 54, 325–334.
Hittalmani, S., Parco, A., Mew, T.V., Zeigler, R.S. and Huang, N. (2000) Fine mapping and DNA marker-
assisted pyramiding of the three major genes for blast resistance in rice. Theoretical and Applied
Genetics 100, 1121–1128.
Hodgkin, T. and Ramanatha Rao, V. (2002) People, plant and DNA: technical aspects of conserving and
using plant genetic resources. In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson,
M.T. (eds) Managing Plant Genetic Diversity. International Plant Genetic Resources Institute, Rome,
pp. 469–480.
Hodson, D.P. and White, J.W. (2007) Use of spatial analyses for global characterization of wheat-based
production systems. Journal of Agricultural Science 145, 115–125.
Hodson, D.P., Martinez-Romero, E., White, J.W., Corbett, J.D. and Bänziger, M. (2002) Africa Maize
Research Atlas (v. 3.0), CD-ROM Publication. Centro Internacional de Mejoramiento de Maiz y Trigo
(CIMMYT), Mexico, DF.
Hoekema, A., Hirsch, P.R., Hooykaas, P.J.J. and Schilperoort, R.A. (1983) A binary plant vector strategy based
on separation of vir- and T-region of the Agrobacterium tumefaciens Ti-plasmid. Nature 303, 179–180.
Hoeschele, I. and VanRaden, P.M. (1993a) Bayesian analysis of linkage between genetic markers and
quantitative trait loci. I. Prior knowledge. Theoretical and Applied Genetics 85, 953–960.
Hoeschele, I. and VanRaden, P.M. (1993b) Bayesian analysis of linkage between genetic markers and
quantitative trait loci. II. Combining prior knowledge with experimental evidence. Theoretical and
Applied Genetics 85, 946–952.
Hofmann, K., Bucher, P., Falquet, L. and Bairoch, A. (1999) The Prosite database, its status in 1999.
Nucleic Acids Research 27, 215–219.
Hoheisel, J.D. (2006) Microarray technology: beyond transcript profiling and genotype analysis. Nature
Reviews Genetics 7, 200–210.
Hohn, B., Levy, A.A. and Puchta, H. (2001) Elimination of selection markers from transgenic plants. Current
Opinion in Biotechnology 12, 139–143.
Hoisington, D. and Ortiz, R. (2008) Research and field monitoring on transgenic crops by the Centro
Internacional de Mejoramiento de Maiz y Trigo (CIMMYT). Euphytica 164, 893–902.
Holland, J.B. (1998) EPISTACY: a SAS program for detecting two-locus epistasis interactions using genetic
marker information. Journal of Heredity 89, 374–375.
Holland, J.B. (2001) Epistasis and plant breeding. Plant Breeding Reviews 21, 29–32.
Holland, J.B. (2004) Implementation of molecular markers for quantitative traits in breeding programs:
challenges and opportunities. In: New Direction for a Diverse Planet, Proceedings of the 4th International
Crop Science Congress, 26 September–1 October 2004, Brisbane, Australia. Published on CD-ROM.
Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17 November 2009).
Hopkins, C.G. (1899) Improvement in the chemical composition of the corn kernel. Illinois Agricultural
Experiment Station Bulletin 55, 205–240.
Horan, K., Lauricha, J., Bailey-Serres, J., Raikhel, N. and Girke, T. (2005) Genome cluster database.
A sequence family analysis platform for Arabidopsis and rice. Plant Physiology 138, 47–54.
Hori, K., Kobayashi, T., Shimizu, A., Sato, K., Takeda, K. and Kawasaki, S. (2003) Efficient construction
of high-density linkage map and its application to QTL analysis in barley. Theoretical and Applied
Genetics 107, 806–813.
Hori, K., Sato, K. and Takeda, K. (2007) Detection of seed dormancy QTL in multiple mapping
populations derived from crosses involving novel barley germplasm. Theoretical and Applied Genetics 115,
869–876.
Hormaza, J.I., Dollo, L. and Polito, V.S. (1994) Identification of a RAPD marker linked to sex determination
in Pistacia vera using bulked segregant analysis. Theoretical and Applied Genetics 89, 9–13.
Hospital, F. (2001) Size of donor chromosome segments around introgressed loci and reduction of linkage
drag in marker-assisted backcross programs. Genetics 158, 1363–1379.
Hospital, F. (2002) Marker-assisted backcross breeding: a case study in genotype building theory. In: Kang,
M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB International, Wallingford, UK,
pp. 135–141.
Hospital, F. and Charcosset, A. (1997) Marker-assisted introgression of quantitative trait loci. Genetics 147,
1469–1485.
Hospital, F. and Decoux, G. (2002) Popmin: a program for the numerical optimization of population sizes in
marker-assisted backcross breeding programs. Journal of Heredity 93, 383–384.
Hospital, F., Chevalet, C. and Mulsant, P. (1992) Using markers in gene introgression breeding programs.
Genetics 231, 1199–1210.
Hospital, F., Moreau, L., Lacoudre, F., Charcosset, A. and Gallais, A. (1997) More on the efficiency of
marker-assisted selection. Theoretical and Applied Genetics 95, 1181–1189.
Hospital, F., Goldringer, I. and Openshaw, S. (2000) Efficient marker-based recurrent selection for multiple
quantitative trait loci. Genetical Research 75, 1181–1189.
Hoti, F. and Sillanpää, M.J. (2006) Bayesian mapping of genotype × expression interaction in quantitative
and qualitative traits. Heredity 97, 4–18.
Howe, A.R., Gasser, C.S., Brown, S.M., Padgette, S.R., Hart, J., Parker, G.B., Fromm, M.E. and Armstrong,
C.L. (2002) Glyphosate as a selective agent for the production of fertile transgenic maize (Zea mays
L.) plants. Molecular Breeding 10, 153–164.
Howell, W.M., Jobs, M., Gyllensten, U. and Brooks, V. (1999) Dynamic allele-specific hybridization. A new
method for scoring single nucleotide polymorphisms. Nature Biotechnology 17, 87–88.
Hsing, Y.-I., Chern, C.-G., Fan, M.-J., Lu, P.-C., Chen, K.-T., Lo, S.-F., Sun, P.-K., Ho, S.-L., Lee, K.-W.,
Wang, Y.-C., Huang, W.-L., Ko, S.-S., Chen, S., Chen, J.-L., Chung, C.-I., Lin, Y.-C., Hour, A.-L., Wang,
Y.-W., Chang, Y.-C., Tsai, M.-W., Lin, Y.-S., Chen, Y.-C., Yen, H.-M., Li, C.-P., Wey, C.-K., Tseng, C.-S.,
Lai, M.-H., Huang, S.-C., Chen, L.-J. and Yu, S.-M. (2007) A rice gene activation/knockout mutant
resource for high throughput functional genomics. Plant Molecular Biology 63, 351–364.
Hu, J. and Vick, B.A. (2003) Target region amplification polymorphism: a novel marker technique for plant
genotyping. Plant Molecular Biology Reporter 21, 289–294.
Hua, J., Xing, Y., Wu, W., Xu, C., Sun, X., Yu, S. and Zhang, Q. (2003) Single-locus heterotic effects and
dominance by dominance interactions can adequately explain the genetic basis of heterosis in an
elite rice hybrid. Proceedings of the National Academy of Sciences of the United States of America 100,
2574–2579.
Hua, J.P., Xing, Y.Z., Xu, C.G., Sun, X.L., Yu, S.B. and Zhang, Q. (2002) Genetic dissection of an elite rice
hybrid revealed that heterozygotes are not always advantageous for performance. Genetics 162,
1885–1895.
Huamán, Z., Ortiz, R., Zhang, D. and Rodríguez, F. (2000) Isozyme analysis of entire and core collection of
Solanum tuberosum subsp. andigena potato cultivars. Crop Science 40, 273–276.
Huang, L., Brooks, S.H., Li, W., Fellers, J.P., Trick, H.N. and Gill, B.S. (2003) Map-based cloning of leaf rust
resistance gene Lr21 from the large and polyploid genome in bread wheat. Genetics 164, 655–664.
Huang, N., Courtois, B., Khush, G.S., Lin, H., Wang, G., Wu, P. and Zheng, K. (1996) Association of
quantitative trait loci for plant height with major dwarfing genes in rice. Heredity 77, 130–137.
Huang, N., Angeles, E.R., Domingo, J., Magpantay, G., Singh, S., Zhang, G., Kumaravadivel, N., Bennet,
J. and Khush, G.S. (1997) Pyramiding of bacterial blight resistance genes in rice: marker-assisted
selection using RFLP and PCR. Theoretical and Applied Genetics 95, 313–320.
Huang, S., Gilbertson, L.A., Adams, T.H., Malloy, K.P., Reisenbigler, E.K., Birr, D.H., Snyder, M.W., Zhang,
Q. and Luethy, M.H. (2004) Generation of marker-free transgenic maize by regular two-border
Agrobacterium transformation vectors. Transgenic Research 13, 451–461.
Huang, X., Feng, Q., Qian, Q., Zhao, Q., Wang, L., Wang, A., Guan, J., Fan, D., Wang, Q., Huang, T.,
Dong, G., Sang, T. and Han, B. (2009) High-throughput genotyping by whole-genome resequencing.
Genome Research 19, 1068–1076.
Hudson, L.C., Halfhill, M.D. and Stewart, C.N., Jr (2004) Transgene dispersal through pollen. In: Peña, L.
(ed.) Methods in Molecular Biology, Vol. 286. Transgenic Plants: Methods and Protocols. Humana
Press Inc., Totowa, New Jersey, pp. 365–374.
Huelsenbeck, J.P., Ronquist, F., Nielsen, R. and Bollback, J.P. (2001) Bayesian inference of phylogeny and
its impact on evolutionary biology. Science 294, 2310–2314.
Hühn, M. (1996) Nonparametric analysis of genotype × environment interactions by ranks. In: Kang, M.S.
and Gauch, H.G., Jr (eds) Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida,
pp. 235–271.
Hulden, M. (1997) Standardization of central crop databases. In: Lipmann, E., Jongen, M.W.M., Hintum,
Th.J.L. van, Gass, T. and Maggioni, L. (compilers) Central Crop Databases: Tools for Plant Genetic
Resources Management. Report of a Workshop, 13–16 October 1996, Budapest, Hungary. International
Plant Genetic Resources Institute, Rome, Italy/CGN, Wageningen, Netherlands, pp. 26–34.
Hunt, M. (1997) How Science Takes Stock: the Story of Meta Analysis. Russell Sage Foundation,
New York.
Hussein, M.A., Bjornstad, A. and Aastveit, A.H. (2000) SASG × ESTAB: a SAS program for computing
genotype × environment stability statistics. Agronomy Journal 92, 454–459.
Hyne, V. and Kearsey, M.J. (1995) QTL analysis: further uses of marker regression. Theoretical and
Applied Genetics 91, 471–476.
Hyten, D.L., Song, Q., Choi, I.-Y., Yoon, M.-P., Specht, J.E., Matukumalli, L.K., Nelson, R.L., Shoemaker,
R.C., Young, N.D. and Cregan, P.B. (2008) High-throughput genotyping with the GoldenGate assay in
the complex genome of soybean. Theoretical and Applied Genetics 116, 945–952.
IBPGR (International Board for Plant Genetic Resources) (1986) Design, Planning and Operation of In
Vitro Genebanks: Reports of a Subcommittee of the IBPGR Advisory Committee on In Vitro Storage.
IBPGR, Rome.
IBRD/World Bank (The International Bank for Reconstruction and Development/The World Bank) (2006)
Intellectual Property Rights: Designing Regimes to Support Plant Breeding in Developing Countries.
The World Bank, Washington, DC.
Ideta, O., Yoshimura, A. and Iwata, N. (1996) An integrated linkage map of rice. Rice Genetics III. Proceedings
of the Third International Rice Genetics Symposium, 16–20 October 1995, Manila. International Rice
Research Institute (IRRI), Manila, Philippines.
Igartua, E., Casas, A.M., Ciudad, F., Montoya, L. and Romagosa, I. (1999) RFLP markers associated
with major genes controlling heading date evaluated in a barley germ plasm pool. Heredity 83,
551–559.
Igartua, E., Edney, M., Rossnagel, B.G., Spaner, D., Legge, W.G., Scoles, G.L., Ecksteins, P.E., Penner,
G.A., Tinker, N.A., Briggs, K.G., Falk, D.E. and Mather, D.E. (2000) Marker-assisted selection of QTL
affecting grain and malt quality in two-row barley. Crop Science 40, 1426–1433.
Ikeda, A., Ueguchi-Tanaka, M., Sonoda, Y., Kitano, H., Koshioka, M., Futsuhara, Y., Matsuoka, M. and
Yamaguchi, J. (2001) slender rice, a constitutive gibberellin response mutant, is caused by a null
mutation of the SLR1 gene, an ortholog of the height-regulating gene GAI/RGA/RHT/D8. The Plant
Cell 13, 999–1010.
Ilic, K., Kellogg, E.A., Jaiswal, P., Zapata, F., Stevens, P.F., Vincent, L.P., Avraham, S., Reiser, L., Pujar, A.,
Sachs, M.M., Whitman, N.T., McCouch, S.R., Schaeffer, M.L., Ware, D.H., Stein, L.D. and Rhee, S.Y.
(2007) The Plant Structure Ontology, a unified vocabulary of anatomy and morphology of a flowering
plant. Plant Physiology 143, 587–599.
International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human
genome. Nature 409, 860–921.
Ioannidis, J.P., Ntzani, E.E., Trikalinos, T.A. and Contopoulos-Ioannidis, D.G. (2001) Replication validity of
genetic association studies. Nature Genetics 29, 306–309.
IRGSP (International Rice Genome Sequencing Project) (2005) The map-based sequence of the rice
genome. Nature 436, 793–800.
ISF (International Seed Federation) (2004) Protection of Intellectual Property and Access to Plant Genetic
Resources. Proceedings of an International Seminar, 27–28 May, 2004, Berlin, CD-ROM.
ISF (International Seed Federation) (2005) Essential derivation from a not-yet protected variety and
dependency. ISF Position Paper, June 2005. Available at: http://www.worldseed.org/Position_papers/
ED&Dependency.htm (accessed 30 June 2007).
Ishida, Y., Saito, H., Ohta, S., Hiei, Y., Komari, T. and Kumashiro, T. (1996) High efficiency transformation of
maize (Zea mays L.) mediated by Agrobacterium tumefaciens. Nature Biotechnology 14, 745–750.
Ishida, Y., Murai, N., Kuraya, Y., Ohta, S., Saito, H., Hiei, Y. and Komari, T. (2004) Improved co-transformation
of maize with vectors carrying two separate T-DNAs mediated by Agrobacterium tumefaciens. Plant
Biotechnology 21, 57–63.
Ishimaru, K. (2003) Identification of a locus increasing rice yield and physiological analysis of its function.
Plant Physiology 122, 1083–1090.
Ivandic, V., Hackett, C.A., Nevo, E., Keith, R., Thomas, W.T.B. and Forster, B.P. (2002) Analysis of simple
sequence repeats (SSRs) in wild barley from the Fertile Crescent: associations with ecology,
geography and flowering time. Plant Molecular Biology 48, 511–527.
Ivandic, V., Thomas, W.T.B., Nevo, E., Zhang, Z. and Forster, B.P. (2003) Association of SSRs with
quantitative trait variation including biotic and abiotic stress tolerance in Hordeum spontaneum. Plant
Breeding 122, 300–304.
Iwata, H., Uga, Y., Yoshioka, Y., Ebana, K. and Hayashi, T. (2007) Bayesian association mapping of multiple
quantitative trait loci and its application to the analysis of genetic variation among Oryza sativa L.
germplasms. Theoretical and Applied Genetics 114, 1437–1449.
Izawa, T., Takahashi, Y. and Yano, M. (2003) Comparative biology comes into bloom: genomic and genetic
comparison of flowering pathways in rice and Arabidopsis. Current Opinion in Plant Biology 6,
113–120.
Jaccoud, D., Peng, K., Feinstein, D. and Kilian, A. (2001) Diversity arrays: a solid state technology for
sequence information independent genotyping. Nucleic Acids Research 29, e25.
Jack, T., Fox, G.L. and Meyerowitz, E.M. (1994) Arabidopsis homeotic gene APETALA3 ectopic expression:
transcriptional and posttranscriptional regulation determine floral organ identity. Cell 76, 703–716.
Jaffe, G. (2004) Regulating transgenic crops: a comparative analysis of different regulatory processes.
Transgenic Research 13, 5–19.
Jain, S.M., Sopory, S.K. and Veilleux, R.E. (1996–1997) In Vitro Haploid Production in Higher Plants. Kluwer
Academic Publishers, Dordrecht, Netherlands.
James, C. (2006) Global Status of Commercialized Biotech/GM Crops: 2006. ISAAA Briefs No. 35.
International Service for the Acquisition of Agri-biotech Applications (ISAAA), Ithaca, New York.
James, C. (2008) 2007 ISAAA Report on Global Status of Biotech/GM Crops. International Service for
the Acquisition of Agri-biotech Applications (ISAAA). Available at: http://www.isaaa.org (accessed 17
November 2009).
Jander, G., Norris, S.R., Rounsley, S.D., Bush, D.F., Levin, I.M. and Last, R.L. (2002) Arabidopsis
map-based cloning in the post-genome era. Plant Physiology 129, 440–450.
Janick, J. (1988) Horticulture, science and society. HortScience 23, 11–13.
Janick, J. (1998) Hybrids in horticulture crops. In: Lamkey, K.R. and Staub, J.E. (eds) Concepts and
Breeding of Heterosis in Crop Plants. Crop Science Society of America (CSSA), Madison, Wisconsin,
pp. 45–56.
Janis, M.D. and Kesan, J.P. (2002) U.S. plant variety protection: sound and fury...? Houston Law Review
39, 727–778.
Janis, M.D. and Smith, S. (2007) Obsolescence in intellectual property regimes. University of Iowa Legal
Studies Research Paper No. 05-48. Abstract available at: http://papers.ssrn.com/sol3/papers.
cfm?abstract_id=897728 (accessed 17 November 2009).
Jannink, J.L. (2005) Selective phenotyping to accurately map quantitative trait loci. Crop Science 45,
901–908.
Jannink, J.L. and Jansen, R.C. (2000) The diallel mating design for mapping interacting QTLs. In: Quantitative
Genetics and Breeding Methods: the Way Ahead. Institut National de la Recherche Agronomique
(INRA), Paris, pp. 81–88.
Jannink, J.L. and Jansen, R. (2001) Mapping epistatic quantitative trait loci with one-dimensional genome
searches. Genetics 157, 445–454.
Jannink, J.L. and Walsh, B. (2002) Association mapping in plant populations. In: Kang, M.S. (ed.) Quantitative
Genetics, Genomics and Plant Breeding. CAB International, Wallingford, UK, pp. 59–68.
Jannink, J.L., Bink, M. and Jansen, R.C. (2001) Using complex plant pedigrees to map valuable genes.
Trends in Plant Science 6, 337–342.
Jansen, C., Thomas, D.Y. and Pollock, S. (2005) Yeast two-hybrid technologies. In: Sensen, C.W. (ed.)
Handbook of Genome Research, Genomics, Metabolomics, Bioinformatics, Ethical and Legal Issues.
WILEY-VCH Verlag GmbH & Co., KGaA, Weinheim, Germany, pp. 261–272.
Jansen, J.P.A. (1996) Aphid resistance in composites. International application published under the patent
cooperation treaty (PCT) No. WO 97/46080.
Jansen, R.C. (1996) A general Monte Carlo method for mapping multiple quantitative trait loci. Genetics
142, 305–311.
Jansen, R.C. and Beavis, W.D. (2001) MQM mapping using haplotyped putative QTL-alleles: a simple
approach for mapping QTLs in plant breeding populations. Patent EP 1265476.
Jansen, R.C. and Nap, J.P. (2001) Genetical genomics: the added value from segregation. Trends in
Genetics 17, 388–391.
Jansen, R.C. and Stam, P. (1994) High resolution of quantitative traits into multiple loci via interval mapping.
Genetics 136, 1447–1455.
Jansen, R.C., Van-Ooijen, J.W., Stam, P., Lister, C. and Dean, C. (1995) Genotype-by-environment
interaction in genetic mapping of multiple quantitative trait loci. Theoretical and Applied Genetics 91,
33–37.
Jansen, R.C., Jannink, J.-L. and Beavis, W.D. (2003) Mapping quantitative trait loci in plant breeding
populations: use of parental haplotype sharing. Crop Science 43, 829–834.
Jarvis, A., Yeaman, S., Guarino, L. and Tohme, J. (2005) The role of geographic analysis in locating,
understanding and using plant genetic diversity. In: Zimmer, E. (ed.) Molecular Evolution: Producing the
Biochemical Data, Part B. Elsevier, New York, pp. 279–298.
Jarvis, D.I. and Hodgkin, T. (1999) Wild relatives and crop cultivars: detecting natural introgression and
farmer selection of new genetic combinations in agroecosystems. Molecular Ecology 8, S159–S173.
Jayasekara, N.E.M. and Jinks, J.L. (1976) Effect of gene dispersion on estimates of components of
generation means and variances. Heredity 36, 31–40.
Jefferson, R.A. (1987) Assaying chimeric genes in plants: the GUS gene fusion system. Plant Molecular
Biology Reporter 5, 387–405.
Jenkins, H., Johnson, H., Kular, B., Wang, T. and Hardy, N. (2005) Toward supportive data collection tools
for plant metabolomics. Plant Physiology 138, 67–77.
Jenkins, S. and Gibson, N. (2002) High-throughput SNP genotyping. Comparative and Functional Genomics
3, 57–66.
Jenks, M.A. and Feldmann, K. (1996) Cloning genes by insertion mutagenesis. In: Paterson, A.H. (ed.)
Genome Mapping in Plants. R.G. Landes Company, Austin, Texas, pp. 155–168.
Jensen, C.J. (1974) Chromosome doubling techniques in haploids. In: Kasha, K.J. (ed.) Haploids in Higher
Plants: Advances and Potentials. Guelph University Press, Guelph, Canada, pp. 153–190.
Jensen, L.J., Saric, J. and Bork, P. (2006) Literature mining for the biologist: from information retrieval to
biological discovery. Nature Reviews Genetics 7, 119–129.
Jeon, J.-S., Kang, H.-G. and An, G. (2004) Tools for gene tagging and mutagenesis. In: Christou, P. and Klee,
H. (eds) Handbook of Plant Biotechnology. John Wiley & Sons Ltd, Chichester, UK, pp. 103–125.
Jia, H., Pang, Y., Chen, X. and Fang, R. (2006) Removal of the selectable marker gene from transgenic
tobacco plants by expression of Cre recombinase from a tobacco mosaic virus vector through
agroinfection. Transgenic Research 15, 375–384.
Jiang, C. and Zeng, Z.B. (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics
140, 1111–1127.
Jiang, C., Pan, X. and Gu, M. (1994) The use of mixture models to detect effects of major genes on
quantitative characters in plant breeding experiment. Genetics 136, 383–394.
Jiang, C., Edmeades, G.O., Armstead, I., Lafitte, H.R., Hayward, M.D. and Hoisington, D. (1999) Genetic
analysis of adaptation differences between highland and lowland tropical maize using molecular
markers. Theoretical and Applied Genetics 99, 1106–1119.
Jiang, N., Bao, Z., Zhang, X., Hirochika, H., Eddy, S.R., McCouch, S.R. and Wessler, S.R. (2003) An active
DNA transposon family in rice. Nature 421, 163–167.
Jin, C., Lan, H., Attie, A.D., Churchill, G.A., Bulutuglo, D. and Yandell, B.Y. (2004) Selective phenotyping for
increased efficiency in genetic mapping study. Genetics 168, 2285–2293.
Jin, S., Komari, T., Gordon, M.P. and Nester, E.W. (1987) Genes responsible for the supervirulence
phenotype of Agrobacterium tumefaciens A281. Journal of Bacteriology 169, 4417–4425.
Jinks, J.L. and Perkins, J.M. (1969) The detection of linked epistatic genes for a metrical trait. Heredity 24,
465–475.
Jinks, J.L. and Perkins, J.M. (1972) Predicting the range of inbred lines. Heredity 28, 399–403.
Joen, J.-S., Lee, S., Jung, K.-H., Jun, S.-H., Joeng, D.-H., Lee, J., Kim, C., Jang, S., Yang, K., Nam, J., An, K.,
Han, M.J., Sung, R.-J., Choi, H.-S., Yu, J.-H., Choi, J.-H., Cho, S.-S., Cha, S.-S., Kim, S.-I. and An, G.
(2000) T-DNA insertional mutagenesis for functional genomics in rice. The Plant Journal 22, 561–571.
Joersbo, M. and Okkels, F.T. (1996) A novel principle for selection of transgenic plant cells: positive
selection. Plant Cell Reports 16, 219–221.
Joersbo, M., Donaldson, I., Kreiberg, J., Petersen, S.G., Brunstedt, J. and Okkels, F.T. (1998) Analysis of
mannose selection used for transformation of sugar beet. Molecular Breeding 4, 111–117.
Johannes, F. (2007) Mapping temporally varying quantitative trait loci in time-to-failure experiments.
Genetics 175, 855–865.
Johnson, B., Gardner, C.O. and Wrede, K.C. (1988) Application of an optimization model to multi-trait
selection programs. Crop Science 28, 723–728.
Johnson, G.R. (2004) Marker assisted selection. Plant Breeding Reviews 24, 293–310.
Johnson, H.E., Broadhurst, D., Goodacre, R. and Smith, A.R. (2003) Metabolic fingerprinting of
salt-stressed tomatoes. Phytochemistry 62, 919–928.
Johnson, H.W., Robinson, H.F. and Comstock, R.E. (1955) Estimates of genetic and environmental
variability in soybeans. Agronomy Journal 47, 314–318.
Johnson, R. (2001) Marker-assisted sweet corn breeding: a model for special crops. In: Proceedings of
56th Annual Corn and Sorghum Industry Research Conference, Chicago, Illinois, 5–7 December
2001. American Seed Trade Association, Washington, DC, pp. 25–30.
Jones, H. (ed.) (1995) Plant Gene Transfer and Expression Protocols. Humana Press, Totowa, New Jersey.
Jones, H.D., Doherty, A. and Wu, H. (2005) Review of methodologies and a protocol for the Agrobacterium-
mediated transformation of wheat. Plant Methods 2005, 15.
Jorasch, P. (2004) Intellectual property rights in the field of molecular marker analysis. In: Lörz, H. and
Wenzel, G. (eds) Biotechnology in Agriculture and Forestry, Vol. 55. Molecular Marker Systems.
Springer-Verlag, Berlin, pp. 433–471.
Jordaan, J.P., Engelbrecht, S.A., Malan, J.H. and Knobel, H.A. (1999) Wheat and heterosis. In: Coors, J.G.
and Pandey, S. (eds) Genetics and Exploitation of Heterosis in Crops. ASA-CSSA-SSSA, Madison,
Wisconsin, pp. 411–421.
Jordan, D., Tao, Y., Godwin, I., Henzell, R., Cooper, M. and McIntyre, C. (2004) Prediction of hybrid
performance in grain sorghum using RFLP markers. Theoretical and Applied Genetics 106, 559–567.
Jorde, L.B. (2000) Linkage disequilibrium and the search for complex disease genes. Genome Research
10, 14351444.
Joseph, M., Gopalakrishnan, S., Sharma, R.K., Singh, V.P., Singh, A.K., Singh, N.K. and Mohapatra, T.
(2003) Combining bacterial blight resistance and Basmati quality characteristics by phenotypic and
molecular marker-assisted selection in rice. Molecular Breeding 13, 1–11.
Jourjon, M.F., Jasson, S., Marcel, J., Ngom, B. and Mangin, B. (2005) MCQTL: multi-allelic QTL mapping
in multi-cross design. Bioinformatics 21, 128–130.
Jung, K.-H., An, G. and Ronald, P.C. (2008) Towards a better bowl of rice: assigning function to tens of
thousands of rice genes. Nature Reviews Genetics 9, 91–101.
Kahler, A.L., Gardner, C.O. and Allard, R.W. (1984) Nonrandom mating in experimental populations of
maize. Crop Science 24, 350–354.
Kahraman, A., Avramov, A., Nashev, L.G., Popov, D., Ternes, R., Pohlenz, H.-D. and Weiss, B. (2005)
PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics.
Bioinformatics 21, 418–420.
Kahvejian, A., Quackenbush, J. and Thompson, J.F. (2008) What would you do if you could sequence
everything? Nature Biotechnology 26, 1125–1133.
Kamijima, O., Tanisaka, T. and Kinoshita, T. (1996) Gene symbols for dwarfness. Rice Genetics Newsletter
13, 19–25.
Kang, M.S. (1988) A rank-sum method for selecting high-yielding, stable corn genotypes. Cereal Research
Communications 16, 113–115.
Kang, M.S. (1990) Understanding and utilization of genotype–environment interaction in plant breeding.
In: Kang, M.S. (ed.) Genotype-By-Environment Interactions and Plant Breeding. Louisiana State
University Agriculture Center, Baton Rouge, Louisiana, pp. 52–68.
Kang, M.S. (1993) Simultaneous selection for yield and stability in crop performance trials: consequences
for growers. Agronomy Journal 85, 754–757.
Kang, M.S. (2002) Genotype–environment interaction: progress and prospects. In: Kang, M.S. (ed.)
Quantitative Genetics, Genomics and Plant Breeding. CAB International, Wallingford, UK,
pp. 221–243.
Kang, M.S. and Magari, R. (1996) New developments in selecting for phenotypic stability in crop breeding.
In: Kang, M.S. and Gauch, H.G., Jr (eds) Genotype-by-Environment Interaction. CRC Press, Boca
Raton, Florida, pp. 1–14.
Kantety, R.V., Rota, M.L., Mathews, D.E. and Sorrells, M.E. (2002) Data mining for simple-sequence repeats
in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Molecular Biology
48, 501–510.
Kao, C.H. (2004) Multiple-interval mapping for quantitative trait loci controlling endosperm traits. Genetics
167, 1987–2002.
Kao, C.H. (2006) Mapping quantitative trait loci using the experimental designs of recombinant inbred
populations. Genetics 174, 1373–1386.
Kao, C.H. and Zeng, Z.B. (1997) General formulas for obtaining the MLEs and the asymptotic variance–covariance
matrix in mapping quantitative trait loci when using the EM algorithm. Biometrics 53,
653–665.
Kao, C.H., Zeng, Z.B. and Teasdale, R.D. (1999) Multiple interval mapping for quantitative trait loci. Genetics
152, 1203–1216.
Karas, M. and Hillenkamp, F. (1988) Laser desorption ionization of proteins with molecular mass exceeding
10,000 daltons. Analytical Chemistry 60, 2299–2301.
Karimi, M., Bleys, A., Vanderhaeghen, R. and Hilson, P. (2007) Building blocks for plant gene assembly.
Plant Physiology 145, 1183–1191.
Karp, A. and Edwards, J. (1997) DNA markers: a global overview. In: Caetano-Anolles, G. and Gresshoff,
P.M. (eds) DNA Markers: Protocols, Applications and Overviews. Wiley-Liss, Inc., New York,
pp. 1–13.
Kartal, M. (2007) Intellectual property protection in the natural product drug discovery, traditional herbal
medicine and herbal medicinal products. Phytotherapy Research 21, 113–119.
Kasha, K.J. (2005) Chromosome doubling and recovery of doubled haploid plants. In: Palmer, C.E., Keller,
W.A. and Kasha, K.J. (eds) Biotechnology in Agriculture and Forestry, Vol. 56. Haploids in Crop
Improvement II. Springer-Verlag, Berlin, pp. 123–152.
Katari, M.S., Balija, V., Wilson, R.K., Martienssen, R.A. and McCombie, W.R. (2005) Comparing low coverage
random shotgun sequence data from Brassica oleracea and Oryza sativa genome sequence for
their ability to add to the annotation of Arabidopsis thaliana. Genome Research 15, 496–504.
Kato, A. (2002) Chromosome doubling of haploid maize seedlings using nitrous oxide gas at the flower
primordial stage. Plant Breeding 121, 370–377.
Kauffman, S.A. (1993) The Origins of Order: Self-Organization and Selection in Evolution. Oxford University
Press, Oxford, UK.
Kaushik, N., Sirohi, M. and Khanna, V.K. (2004) Influence of age of the embryo and method of hormone
application on haploid embryo formation in wheat x maize crosses. In: New Directions for a Diverse
Planet, Proceedings of the 4th International Crop Science Congress, 26 September–1 October, 2004,
Brisbane, Australia. Published on CD-ROM. Available at: http://www.cropscience.org.au/icsc2004/
(accessed 17 November 2009).
Kearsey, M.J. and Farquhar, A.G.L. (1998) QTL analysis in plants: where are we now? Heredity 80,
137–142.
Kearsey, M.J. and Hyne, V. (1994) QTL analysis: a simple marker regression approach. Theoretical and
Applied Genetics 89, 698–702.
Kearsey, M.J. and Jinks, J.L. (1968) A general method of detecting additive, dominance and epistasis variation
for metrical traits. I. Theory. Heredity 23, 403–409.
Kearsey, M.J., Pooni, H.S. and Syed, N.H. (2003) Genetics of quantitative traits in Arabidopsis thaliana.
Heredity 91, 456–464.
Keating, B.A., Carberry, P.S., Hammer, G.L., Probert, M.E., Robertson, M.J., Holzworth, D., Huth, N.I.,
Hargreaves, J.N.G., Meinke, H., Hochman, Z., McLean, G., Verburg, K., Snow, V., Dimes, J.P., Silburn,
M., Wang, E., Brown, S., Bristow, K.L., Asseng, S., Chapman, S., McCown, R.L., Freebairn, D.M. and
Smith, C.J. (2003) An overview of APSIM, a model designed for farming system simulation. European
Journal of Agronomy 18, 267–288.
Keightley, P.D. (2004) Mutational variation and long-term selection response. Plant Breeding Reviews
24(1), 227–247.
Keightley, P.D. and Bulfield, G. (1993) Detection of quantitative trait loci from frequency changes of marker
alleles under selection. Genetical Research 62, 195–203.
Keller, E.R.J. and Korzun, L. (1996) Haploidy in onion (Allium cepa L.) and other Allium species. In: Jain,
S.M., Sopory, S.M. and Veilleux, R.E. (eds) In Vitro Haploid Production in Higher Plants. Vol. 3:
Important Selected Plants. Kluwer Academic Publisher, Dordrecht, Netherlands, pp. 51–75.
Kempthorne, O. (1957) An Introduction to Genetic Statistics. Wiley, New York.
Kempthorne, O. (1988) An overview of the field of quantitative genetics. In: Weir, B.S., Eisen, E.J., Goodman,
M.M. and Namkoong, G. (eds) Proceedings of the 2nd International Conference on Quantitative
Genetics. Sinauer Associates, Inc., Sunderland, Massachusetts, pp. 47–56.
Kennedy, B.G., Waters, D.L.E. and Henry, R.J. (2006) Screening for the rice blast resistance gene Pi-ta
using LNA displacement probes and real-time PCR. Molecular Breeding 18, 185–193.
Kermicle, J.L. (1969) Androgenesis conditioned by a mutation in maize. Science 166, 1422–1424.
Kerns, M.R., Dudley, J.W. and Rufener, G.K. (1999) QTL for resistance to common rust and smut in maize.
Maydica 44, 37–45.
Kersten, B., Berkle, L., Kuhn, E.J., Giavalisco, P., Konthur, Z., Lueking, A., Walter, G., Eickhoff, H. and
Schneider, U. (2002) Large-scale plant proteomics. Plant Molecular Biology 48, 133–141.
Keurentjes, J.J., Bentsink, L., Alonso-Blanco, C., Hanhart, C.J., Blankestijn-De Vries, H., Effgen, S., Vreugdenhil,
D. and Koornneef, M. (2007a) Development of a near-isogenic line population of Arabidopsis thaliana
and comparison of mapping power with a recombinant inbred line population. Genetics 175, 891–905.
Keurentjes, J.J.B., Jingyuan Fu, L., Terpstra, I.R., Garcia, J.M., Ackerveken, G., Snoek, L.B., Peeters,
A.J.M., Vreugdenhil, D., Koornneef, M. and Jansen, R.C. (2007b) Regulatory network construction in
Arabidopsis by using genome-wide gene expression quantitative trait loci. Proceedings of the National
Academy of Sciences of the United States of America 104, 1708–1713.
Keurentjes, J.J.B., Koornneef, M. and Vreugdenhil, D. (2008) Quantitative genetics in the age of omics.
Current Opinion in Plant Biology 11, 123–128.
Khatkar, M.S., Thomson, P.C., Tammen, I. and Raadsma, H.W. (2004) Quantitative trait loci mapping in
dairy cattle: review and meta-analysis. Genetics Selection Evolution 36, 163–190.
Khatri, P. and Draghici, S. (2005) Ontological analysis of gene expression data: current tools, limitations
and open problems. Bioinformatics 21, 3587–3595.
Khatri, P., Draghici, S., Ostermeier, G.C. and Krawetz, S.A. (2002) Profiling gene expression using
Onto-Express. Genomics 79, 266–270.
Khush, G.S. (1987) List of gene markers maintained in the Rice Genetic Stock Center, IRRI. Rice Genetics
Newsletter 4, 56–62.
Khush, G.S. (1999) Green revolution: preparing for the 21st century. Genome 42, 646–655.
Kiesselbach, T.A. (1926) The immediate effect of gametic relationship and of parental type upon the kernel
weight of corn. Nebraska Agricultural Experiment Station Bulletin 33, 1–69.
Kikuchi, K., Terauchi, K., Wada, M. and Hirano, Y. (2003) The plant MITE mPing is mobilized in anther
culture. Nature 421, 167–170.
Kilian, A., Chen, J., Han, F., Steffenson, B. and Kleinhofs, A. (1997) Towards map-based cloning of the
barley stem rust resistance gene Rpg1 and rpg4 using rice as an intergenomic cloning vehicle. Plant
Molecular Biology 35, 187–195.
Kilian, A., Kudrna, D. and Kleinhofs, A. (1999) Genetic and molecular characterization of barley chromosome
telomeres. Genome 42, 412–419.
Kilpikari, R. and Sillanpää, M.J. (2003) Bayesian analysis of multilocus association in quantitative and
qualitative traits. Genetic Epidemiology 25, 122–135.
Kim, K.-W., Chung, H.-K., Cho, G.-T., Ma, K.-H., Chandrabalan, D., Gwag, J.-G., Kim, T.-S., Cho, E.-G. and
Park, Y.-J. (2007) PowerCore: a program applying the advanced M strategy with a heuristic search for
establishing core mining sets. Bioinformatics 23, 2155–2162.
Kimmel, A. and Oliver, B. (eds) (2006a) DNA Microarrays Part A: Array Platforms and Wet-Bench Protocols.
Elsevier Inc., Amsterdam.
Kimmel, A. and Oliver, B. (eds) (2006b) DNA Microarrays Part B: Databases and Statistics. Elsevier Inc.,
Amsterdam.
Kimmel, B.E., Palazzolo, M.J., Martin, C.H., Boeke, J.D. and Devine, S.E. (1997) Transposon-mediated
DNA sequencing. In: Birren, B., Green, E.D., Klapholz, S., Myers, R.M. and Roskams, J. (eds) Genome
Analysis: a Laboratory Manual, Vol. 1. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New
York, pp. 455–532.
Kimura, M. (1969) The number of heterozygous nucleotide sites maintained in a finite population due to
steady flux of mutations. Genetics 61, 893–903.
King, G.L. (2004) Bioinformatics: harvesting information for plant and crop science. Seminars in Cell and
Developmental Biology 15, 721–731.
King, J., Armstead, I.P., Donnison, I.S., Thomas, H.M., Jones, R.N., Kearsey, M.J., Roberts, L.A., Thomas, A.,
Morgan, W.G. and King, I.P. (2002) Physical and genetic mapping in the grasses Lolium perenne and
Festuca pratensis. Genetics 161, 315–324.
Kinoshita, T. (1995) Report of Committee on Gene Symbolization, Nomenclature and Linkage Groups. Rice
Genetics Newsletter 12, 9–153.
Kinoshita, T. and Takahashi, M. (1991) The one hundredth report of genetical studies on rice plant: linkage
studies and future prospects. Journal of the Faculty of Agriculture, Hokkaido University 65, 1–61.
Kirst, M., Myburg, A.A., De León, J.P.G., Kirst, M.E., Scott, J. and Sederoff, R. (2004) Coordinated genetic
regulation of growth and lignin revealed by quantitative trait locus analysis of cDNA microarray data in
an interspecific backcross of eucalyptus. Plant Physiology 135, 2368–2378.
Kisana, N.S., Nkongolo, K.K., Quick, J.S. and Johnson, D.L. (1993) Production of doubled haploids by
anther culture and wheat × maize method in a wheat breeding programme. Plant Breeding 110,
96–102.
Kiviharju, E., Moisander, S. and Laurila, J. (2005) Improved green plant regeneration rates from oat anther culture
and the agronomic performance of some DH lines. Plant Cell, Tissue and Organ Culture 81, 1–9.
Kjemtrup, S., Boyes, D.C., Christensen, C., McCaskill, A.J., Hylton, M. and Davis, K. (2003) Growth
stage-based phenotypic profiling of plants. In: Grotewold, E. (ed.) Methods in Molecular Biology, Vol.
236. Plant Functional Genomics: Methods and Protocols. Humana Press, Totowa, New Jersey, pp.
427–441.
Klein, P.E., Klein, R.R., Cartinhour, S.W., Ulanch, P.E., Dong, J., Obert, J.A., Morishige, D.T., Schlueter,
S.D., Childs, K.L., Ale, M. and Mullet, J.E. (2000) A high-throughput AFLP-based method for constructing
integrated genetic and physical maps: progress toward a sorghum genome map. Genome
Research 10, 789–807.
Klein, P.E., Klein, R.R., Vrebalov, J. and Mullet, J.E. (2003) Sequence-based alignment of sorghum chromosome
3 and rice chromosome 1 reveals extensive conservation of gene order and one major
chromosomal rearrangement. The Plant Journal 34, 605–621.
Klose, J., Nock, C., Herrmann, M., Stühler, K., Marcus, K., Blüggel, M., Krause, E., Schalkwyk, L.C.,
Rastan, S., Brown, S.D.M., Büssow, K., Himmelbauer, H. and Lehrach, H. (2002) Genetic analysis of
mouse brain proteome. Nature Genetics 30, 385–393.
Knapp, S.J. (1991) Using molecular markers to map multiple quantitative trait loci: models for backcross,
recombinant inbred and doubled haploid progeny. Theoretical and Applied Genetics 81, 333–338.
Knapp, S.J. (1994) Mapping quantitative trait loci. In: Phillips, R.L. and Vasil, I.K. (eds) DNA-Based Markers in
Plants. Kluwer Academic Publishers, Dordrecht, Netherlands, pp. 58–96.
Knapp, S.J. (1998) Marker-assisted selection as a strategy for increasing the probability of selecting superior
genotypes. Crop Science 38, 1164–1174.
Knapp, S.J., Holloway, J.L., Bridges, W.C. and Liu, B.-H. (1995) Mapping dominant markers using F2 matings.
Theoretical and Applied Genetics 91, 74–81.
Knoll, J. and Ejeta, G. (2008) Marker-assisted selection for early-season cold tolerance in sorghum: QTL
validation across populations and environments. Theoretical and Applied Genetics 116, 541–553.
Knox, M.R. and Ellis, T.H. (2001) Stability and inheritance of methylation states at PstI sites in Pisum.
Molecular Genetics and Genomics 265, 497–507.
Kobiljski, B., Quarrie, S., Dencic, S., Kirby, J. and Ivege, M. (2002) Genetic diversity of the Novi Sad wheat
core collection revealed by microsatellites. Cellular and Molecular Biology 7, 685–694.
Koebner, R.M. and Summers, R.W. (2003) 21st century wheat breeding: plot selection or plate detection?
Trends in Biotechnology 21, 59–63.
Koester, R.P., Sisco, P.H. and Stuber, C.W. (1993) Identification of quantitative trait loci controlling days to
flowering and plant height in two near isogenic lines of maize. Crop Science 33, 1209–1216.
Kohli, A., Leech, M., Vain, P., Laurie, D.A. and Christou, P. (1998) Transgene organization in rice engineered
through direct DNA transfer supports a two-phase integration mechanism mediated by the establish-
ment of integration hot spots. Proceedings of the National Academy of Sciences of the United States
of America 95, 7203–7208.
Kohli, A., Xiong, J., Greco, R., Christou, P. and Pereira, A. (2001) Transcriptome Display (TTD) in indica rice
using Ac transposition. Molecular Genetics and Genomics 266, 1–11.
Kohli, A., Twyman, R.M., Abranches, A., Wegel, E., Christou, P. and Stoger, E. (2003) Transgene integration,
organization and interaction in plants. Plant Molecular Biology 52, 247–258.
Kohli, A., Prynne, M.Q., Berta, M., Pereira, A., Cappell, T., Twyman, R.M. and Christou, P. (2004)
Dedifferentiation-mediated changes in transposition behavior make the Activator transposon an ideal
tool for functional genomics in rice. Molecular Breeding 13, 177–191.
Koizuka, N., Imai, R., Fujimoto, H., Hayakawa, T., Kimura, Y., Kohno-Murase, J., Sakai, T., Kawasaki, S. and
Imamura, J. (2003) Genetic characterization of a pentatricopeptide repeat protein gene, orf687, that
restores fertility in the cytoplasmic male-sterile Kosena radish. The Plant Journal 34, 407–415.
Kojima, S., Takahashi, Y., Kobayashi, Y., Monna, L., Sasaki, T., Araki, T. and Yano, M. (2002) Hd3a, a rice
ortholog of the Arabidopsis FT gene, promotes transition to flowering downstream of Hd1 under
short-day conditions. Plant and Cell Physiology 43, 1096–1105.
Koller, A., Washburn, M.P., Lange, B.M., Andon, N.L., Deciu, C., Haynes, P.A., Hays, L., Schieltz, D., Ulaszek,
R., Wei, J., Wolters, D. and Yates, J.R. III (2002) Proteomic survey of metabolic pathways in rice.
Proceedings of the National Academy of Sciences of the United States of America 99, 11969–11974.
Komari, T. (1990) Transformation of cultured cells of Chenopodium quinoa by binary vectors that carry a
fragment of DNA from the virulence region of pTiBo542. Plant Cell Reports 9, 303–306.
Komari, T., Hiei, Y., Saito, Y., Murai, N. and Kumashiro, T. (1996) Vectors carrying two separate T-DNAs for
co-transformation of higher plants mediated by Agrobacterium tumefaciens and segregation of transformants
free from selection markers. The Plant Journal 10, 165–174.
Komari, T., Takakura, Y., Ueki, J., Kato, N., Ishida, Y. and Hiei, Y. (2006) Binary vectors and super-binary
vectors. In: Wang, K. (ed.) Methods in Molecular Biology 343: Agrobacterium Protocols, Vol. 1, 2nd
edn. Humana Press, Totowa, New Jersey, pp. 15–41.
Komori, T., Ohta, S., Murai, N., Takakura, Y., Kuraya, Y., Suzuki, S., Hiei, Y., Imaseki, H. and Nitta, N. (2004)
Map-based cloning of a fertility restorer gene, Rf-1, in rice (Oryza sativa L.). The Plant Journal 37,
315–325.
Komori, T., Imayama, T., Kato, N., Ishida, Y., Ueki, J. and Komari, T. (2007) Current status of binary vectors
and superbinary vectors. Plant Physiology 145, 1155–1160.
Koncz, C. and Schell, J. (1986) The promoter of TL-DNA gene 5 controls the tissue-specific expression
of chimaeric genes carried by a novel type of Agrobacterium binary vector. Molecular and General
Genetics 204, 383–396.
Konieczny, A. and Ausubel, F. (1993) A procedure for mapping Arabidopsis mutations using co-dominant
ecotype-specific PCR based markers. The Plant Journal 4, 403–410.
Konishi, T., Abe, K., Matsuura, S. and Yano, Y. (1990) Distorted segregation of the esterase isozyme genotypes
in barley Hordeum vulgare L. Japanese Journal of Genetics 65, 411–416.
Konishi, T., Yano, Y. and Abe, K. (1992) Geographic distribution of alleles at the ga2 locus for segregation
distortion in barley. Theoretical and Applied Genetics 85, 419–422.
Koonin, E.V. (2005) Orthologs, paralogs and evolutionary genomics. Annual Review of Genetics 39,
309–338.
Koornneef, M., Dellaert, L.W.M. and van der Veen, J.H. (1982) EMS- and radiation-induced mutation frequencies
at individual loci in Arabidopsis thaliana (L.) Heynh. Mutation Research 93, 109–123.
Korbel, J.O., Doerks, T., Jensen, L.J., Perez-Iratxeta, C., Kaczanowski, S., Hooper, S.D., Andrade, M.A.
and Bork, P. (2005) Systematic association of genes to phenotypes by genome and literature mining.
PLoS Biology 3, e134.
Korol, A.B., Ronin, Y.I., Nevo, E. and Hayes, P. (1998) Multi-interval mapping of correlated trait complexes:
simulation analysis and evidence from barley. Heredity 80, 273–284.
Korol, A.B., Ronin, Y.I., Itskovichi, A.M., Peng, J. and Nevo, E. (2001) Enhanced efficiency of quantitative
trait loci mapping analysis based on multivariate complexes of quantitative traits. Genetics 157,
1789–1803.
Kosambi, D.D. (1944) The estimation of map distances from recombination values. Annals of Eugenics 12,
172–175.
Kota, R., Rudd, S., Facius, A., Kolesov, G., Theil, T., Zhang, H., Stein, N., Mayer, K. and Graner, A. (2003)
Snipping polymorphisms from large EST collections in barley (Hordeum vulgare L.). Molecular
Genetics and Genomics 270, 24–33.
Kowalski, S.P. and Kryder, R.D. (2002) Golden rice: a case study in intellectual property management and
international capacity building. RISK: Health, Safety and Environment 13, 47–67.
Kraakman, A.T.W., Niks, R.E., van den Berg, P.M.M.M., Stam, P. and van Eeuwijk, F.A. (2004) Linkage
disequilibrium mapping of yield and yield stability in modern spring barley cultivars. Genetics 168,
435–446.
Kraft, T., Hansen, M. and Nilsson, N.-O. (2000) Linkage disequilibrium and fingerprinting in sugar beet.
Theoretical and Applied Genetics 101, 323–326.
Krapp, A., Morot-Gaudry, J.F., Boutet, S., Bergot, G., Lelarge, C., Prioul, J.L. and Noctor, G. (2007)
Metabolomics. In: Morot-Gaudry, J.F., Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science
Publishers, Enfield, New Hampshire, pp. 311–333.
Krattiger, A., Mahoney, R.T., Nelsen, L., Bennett, A.B., Graff, G.D., Fernandez, C. and Kowalski, S.P. (eds)
(2006) Intellectual Property Management in Health and Agricultural Innovation, a Handbook of Best
Practices. Centre for the Management of Intellectual Property in Health R&D, Oxford, UK and Public
Intellectual Property Resource for Agriculture, Davis, California.
Kresovich, S. and McFerson, J.R. (1992) Assessment and management of plant genetic diversity: consideration
of intra- and interspecific variation. Field Crops Research 29, 185–204.
Kresovich, S., Luongo, A.J. and Schloss, S.J. (2002) Mining the gold: finding allelic variants for improved
crop conservation and use. In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson,
M.T. (eds) Managing Plant Genetic Diversity. International Plant Genetic Resources Institute, Rome,
pp. 379–386.
Kriegner, A., Cervantes, J.C., Burg, K., Mwanga, R.O.M. and Zhang, D.P. (2003) A genetic linkage
map of sweetpotato (Ipomoea batatas (L) Lam) based on AFLP markers. Molecular Breeding 11,
169–185.
Krishnan, P., Kruger, N.J. and Ratcliffe, R.G. (2005) Metabolite fingerprinting and profiling in plants using
NMR. Journal of Experimental Botany 56, 255–265.
Krizkova, L. and Hrouda, M. (1998) Direct repeats of T-DNA integrated in tobacco chromosome: characterization
of junction regions. The Plant Journal 16, 673–680.
Kruglyak, L. (2008) The road to genome-wide association studies. Nature Reviews Genetics 9, 314–318.
Kryder, R.D., Kowalski, S.P. and Krattiger, A.F. (2000) The intellectual and technical property components
of pro-vitamin A rice (GoldenRice™): a preliminary freedom-to-operate review. ISAAA Briefs No. 20.
International Service for the Acquisition of Agri-biotech Applications (ISAAA), Ithaca, New York, 56
pp.
Krysan, P.J., Young, J.C. and Sussman, M.R. (1999) T-DNA as an insertional mutagen in Arabidopsis.
The Plant Cell 11, 2283–2290.
Krysan, P.J., Young, J.C., Jester, P.J., Monson, S., Copenhaver, G., Preuss, D. and Sussman, M.R. (2002)
Characterization of T-DNA insertion sites in Arabidopsis thaliana and the implications for saturation
mutagenesis. OMICS 6, 163–174.
Kuchel, H., Ye, G., Fox, R. and Jefferies, S. (2005) Genetic and economic analysis of a targeted
marker-assisted wheat breeding strategy. Molecular Breeding 16, 67–78.
Kuchel, H., Fox, R., Reinheimer, J., Mosionek, L., Willey, N., Bariana, H. and Jefferies, S. (2008) The successful
application of a marker-assisted wheat breeding strategy. Molecular Breeding 20, 295–308.
Kuiper, H.A., Kok, E.J. and Engel, K.H. (2003) Exploitation of molecular profiling techniques for GM food
safety assessment. Current Opinion in Biotechnology 14, 238–243.
Kuiper, M., Zabeau, M. and Vos, P. (1997) Amplification of simple sequence repeats. Patent EP 0805875.
Kumar, I. and Khush, G.S. (1986) Genetics of amylose content in rice (Oryza sativa L.). Journal of Genetics
65, 1–11.
Kumar, P.V.S. (1993) Biotechnology and biodiversity – a dialectical relationship. Journal of Scientific and
Industrial Research 52, 523–532.
Kumpatla, S.P. and Mukhopadhyay, S. (2005) Mining and survey of simple sequence repeats in expressed
sequence tags of dicotyledonous species. Genome 48, 985–998.
Kurata, N., Moore, G., Nagamura, Y., Foote, T., Yano, M., Minobe, Y. and Gale, M. (1994) Conservation of
genome structure between rice and wheat. Nature Biotechnology 12, 276–278.
Kusterer, B., Piepho, H.P., Utz, H.F., Schön, C.C., Muminovic, J., Meyer, R.C., Altmann, T. and Melchinger,
A.E. (2007) Heterosis for biomass-related traits in Arabidopsis investigated by a novel QTL analysis of
the triple testcross design with recombinant inbred lines. Genetics 177, 1839–1850.
Lagercrantz, U. and Lydiate, D. (1995) RFLP mapping in Brassica nigra indicates different recombination
rates in male and female meiosis. Genome 38, 255–264.
Laird, N.M. and Lange, C. (2006) Family-based designs in the age of large-scale gene-association studies.
Nature Reviews Genetics 7, 385–394.
Lalonde, S., Ehrhardt, D.W., Loqué, D., Chen, J., Rhee, S.Y. and Frommer, W.B. (2008) Molecular and
cellular approaches for the detection of protein–protein interactions: latest techniques and current
limitations. The Plant Journal 53, 610–635.
Lamkey, K.R. and Edwards, J.W. (1999) Quantitative genetics of heterosis. In: Coors, J.G. and Pandey, S.
(eds) The Genetics and Exploitation of Heterosis in Crops. American Society of Agronomy (ASA) and
Crop Science Society of America (CSSA), Madison, Wisconsin, pp. 31–48.
Lamkey, K.R., Schnicker, B.J. and Melchinger, A.E. (1995) Epistasis in an elite maize hybrid and choice of
generation for inbred development. Crop Science 35, 1272–1281.
Lan, H., Chen, M., Flowers, J.B., Yandell, B.S., Stapleton, D.S., Mata, C.M., Mui, E.T.-K., Flowers, M.T.,
Schueler, K.L., Manly, K.F., Williams, R.W., Kendziorski, C. and Attie, A.D. (2006) Combined expres-
sion trait correlations and expression quantitative trait locus mapping. PLoS Genetics 2(1), e6.
Lande, R. and Thompson, R. (1990) Efficiency of marker-assisted selection in the improvement of quantitative
traits. Genetics 124, 743–756.
Landegren, U., Kaiser, R., Sanders, J. and Hood, L. (1988) A ligase-mediated gene detection technique.
Science 241, 1077–1080.
Lander, E. and Kruglyak, L. (1995) Genetic dissection of complex traits: guidelines for interpreting and
reporting linkage results. Nature Genetics 11, 241–247.
Lander, E.S. and Botstein, D. (1989) Mapping Mendelian factors underlying quantitative traits using RFLP
linkage maps. Genetics 121, 185–199.
Lander, E.S. and Green, P. (1987) Construction of multilocus genetic linkage maps in humans. Proceedings
of the National Academy of Sciences of the United States of America 84, 2363–2367.
Lander, E.S., Green, P., Abrahamson, J., Barlow, A., Daly, M.J., Lincoln, S.E. and Newburg, L. (1987)
MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental
and natural populations. Genomics 1, 174–181.
Landy, A. (1989) Dynamic, structural and regulatory aspects of lambda-site-specific recombination. Annual
Review of Biochemistry 58, 913–949.
Lane, M.A., Edwards, J.L. and Nielsen, E.S. (2000) Biodiversity informatics: the challenges of rapid development,
large databases and complex data. In: Proceedings of the 26th International Conference on
Very Large Databases, 10–14 September 2000, Cairo, Egypt. Very Large Data Base Endowment,
Inc., USA.
Lang, N.T., Subudhi, P.K., Virmani, S.S., Brar, D.S., Khush, G.S., Li, Z. and Huang, N. (1999) Development
of PCR-based markers for thermosensitive genetic male sterility gene tms3(t) in rice (Oryza sativa
L.). Hereditas 131, 121–127.
Laperche, A., Brancourt-Hulmel, M., Heumez, E., Gardet, O., Hanocq, E., Devienne-Barret, F. and Le
Gouis, J. (2007) Using genotype × nitrogen interaction variables to evaluate the QTL involved in wheat
tolerance to nitrogen constraints. Theoretical and Applied Genetics 115, 399–415.
Laramie, J.M., Wilk, J.B., DeStefano, A.L. and Myers, R.H. (2007) HaploBuild: an algorithm to construct
non-contiguous associated haplotypes in family based genetic studies. Bioinformatics 23,
2190–2192.
Larkin, P.J. and Scowcroft, W.R. (1981) Somaclonal variation – a novel source of variability from cell cultures
for plant improvement. Theoretical and Applied Genetics 60, 197–214.
Lashermes, P. and Beckert, M. (1988) Genetic control of maternal haploidy in maize (Zea mays L.) and
selection of haploid inducing lines. Theoretical and Applied Genetics 76, 405–410.
Lassner, M.W. and Orton, T.J. (1983) Detection of somatic variation. In: Tanksley, S.D. and Orton, T.J. (eds)
Isozymes in Plant Genetics and Breeding. Vol. 1A. Developments in Plant Genetics and Breeding, 1.
Elsevier, Amsterdam, Netherlands, pp. 209–217.
Laurie, C.C., Chasalow, S.D., LeDeaux, J.R., McCarroll, R., Bush, D., Hauge, B., Lai, C., Clark, D.,
Rocheford, T.R. and Dudley, J.W. (2004) The genetic architecture of response to long-term artificial
selection for oil concentration in the maize kernel. Genetics 168, 2141–2155.
Laurie, D.A. and Bennett, M.D. (1986) Wheat and maize hybridization. Canadian Journal of Genetics and
Cytology 28, 313–316.
Laurie, D.A. and Reymondie, S. (1991) High frequencies of fertilization and haploid seedling production in
crosses between commercial hexaploid wheat varieties and maize. Plant Breeding 106, 182–189.
Laurie, D.A., Pratchett, N., Bezant, J.H. and Snape, J.W. (1994) Genetic analysis of a photoperiod response
gene on the short arm of chromosome 2(2H) on Hordeum vulgare (barley). Heredity 72, 619–627.
Lebowitz, R.L., Soller, M. and Beckmann, J.S. (1987) Trait-based analysis for the detection of linkage
between marker loci and quantitative trait loci in cross between inbred lines. Theoretical and Applied
Genetics 73, 556–562.
Lee, E.A., Ash, M.J. and Good, B. (2007) Re-examining the relationship between degree of relatedness,
genetic effects and heterosis in maize. Crop Science 47, 629–635.
Lee, J.M., Davenport, G.F., Marshall, D., Noel Ellis, T.H., Ambrose, M.J., Dicks, J., van Hintum, T.J.L. and
Flavell, A.J. (2005) GERMINATE: a generic database for integrating genotypic and phenotypic information
for plant genetic resource collections. Plant Physiology 139, 619–631.
Lee, L.-Y., Kononov, M.E., Bassuner, B., Frame, B.R., Wang, K. and Gelvin, S.B. (2007) Novel plant transformation
vectors containing the superpromoter. Plant Physiology 145, 1294–1300.
Lee, M. (1995) DNA markers and plant breeding programs. Advances in Agronomy 55, 265–344.
Lee, M., Godshalk, E.B., Lamkey, K.R. and Woodman, W.L. (1989) Association of restriction length polymorphism
among maize inbreds with agronomic performance of their crosses. Crop Science 29,
1067–1071.
Lee, M., Sharopova, N., Beavis, W.D., Grant, D., Katt, M., Blair, D. and Hallauer, A. (2002) Expanding the
genetic map of maize with the intermated B73 × Mo17 (IBM) population. Plant Molecular Biology 48,
453–461.
Leflon, M., Lecomte, C., Barbottin, A., Jeuffroy, M.-H., Robert, N. and Brancourt-Hulmel, M. (2005)
Characterization of environments and genotypes for analyzing genotype × environment interaction.
Some recent advances in winter wheat and prospects for QTL detection. Journal of Crop Improvement
14, 249–298.
Leister, D.M., Kurth, J., Laurie, D.A., Yano, M., Sasaki, T., Devos, K., Graner, A. and Schulze-Lefert, P.
(1998) Rapid re-organization of resistance gene homologues in cereal genomes. Proceedings of the
National Academy of Sciences of the United States of America 95, 370–375.
Leng, E.R. (1962) Results of long-term selection for chemical composition in maize and their significance
in evaluating breeding systems. Zeitschrift für Pflanzenzüchtung 47, 67–91.
Lerner, I.M. (1950) Population Genetics and Animal Improvement. Cambridge University Press,
Cambridge.
Lerner, I.M. (1954) Genetic Homeostasis. Oliver and Boyd, London.
Lesser, W. (2005) Intellectual property rights in a changing political environment: perspectives on the types
and administration of protection. AgBioForum 8, 64–72.
Lesser, W. and Mutschler, M.A. (2004) Balancing investment incentives and social benefits when
protecting plant varieties: implementing initial systems. Crop Science 44, 1113–1120.
Leung, H., Wu, C., Baraoidan, M., Bordeos, A., Ramos, M., Madamba, S., Cabauatan, P., Vera Cruz, C.,
Portugal, A., Reyes, G., Bruskiewich, R., McLaren, G., Lafitte, R., Gregorio, G., Bennett, J., Brar, D.,
Khush, G., Schnable, P., Wang, G. and Leach, J. (2001) Deletion mutants for functional genomics:
progress in phenotyping, sequence assignment and database development. In: Khush, G.S., Brar,
D.S. and Hardy, B. (eds) Rice Genetics IV. Proceedings of the Fourth International Rice Genetics
Symposium, 22–27 October 2000, Los Baños, Philippines. Science Publishers, Inc., New Delhi and
International Rice Research Institute, Los Baños, Philippines, pp. 239–251.
Levinson, G. and Gutman, G.A. (1987) Slipped-strand mispairing: a major mechanism for DNA sequence
evolution. Molecular Biology and Evolution 4, 203–221.
Lewin, B. (2007) Genes IX. Jones & Bartlett, Sudbury, Massachusetts, 892 pp.
Lewington, A. (2003) Plants for People. Eden Project Books, London.
Lewontin, R.C. (1964) The interaction of selection and linkage. I. General considerations; heterotic models.
Genetics 49, 49–67.
Lewontin, R.C. and Berlan, J.P. (1990) The political economy of agricultural research: the case of hybrid
corn. In: Carroll, C.R., Vandermeer, J.H. and Rosset, P. (eds) Agroecology. McGraw Hill, New York,
pp. 613–628.
Li, C.C. (1955) Population Genetics. University of Chicago Press, Chicago, Illinois.
Li, H., Ye, G. and Wang, J. (2007) A modified algorithm for the improvement of composite interval mapping.
Genetics 175, 361–374.
Li, H., Ribaut, J.M., Li, Z. and Wang, J. (2008) Inclusive composite interval mapping (ICIM) for digenic epistasis
of quantitative traits in biparental populations. Theoretical and Applied Genetics 116, 243–260.
Li, L., Zhou, Y., Cheng, X., Sun, J., Marita, J.M., Ralph, J. and Chiang, V.L. (2003) Combinatorial modifica-
tion of multiple lignin traits in trees through multigene cotransformation. Proceedings of the National
Academy of Sciences of the United States of America 100, 4939–4944.
Li, R., Lyons, M.A., Wittenburg, H., Paigen, B. and Churchill, G.A. (2005) Combining data from multiple
inbred line crosses improves the power and resolution of quantitative trait loci mapping. Genetics 169,
1699–1709.
Li, R., Tsaih, S.W., Shockley, K., Stylianou, I.M., Wergedal, J., Paigen, B. and Churchill, G.A. (2006)
Structural model analysis of multiple quantitative traits. PLoS Genetics 2(7), e114.
Li, X. and Zhang, Y. (2002) Reverse genetics by fast neutron mutagenesis in higher plants. Functional and
Integrative Genomics 2, 254–258.
Li, X., Song, Y., Century, K., Straight, S., Ronald, P.C., Dong, X., Lasser, M. and Zhang, Y. (2001)
Deleagene: a fast neutron mutagenesis-based reverse genetics system for plants. The Plant Journal
27, 235–242.
Li, Y., Shi, Y., Cao, Y. and Wang, T. (2004) Establishment of a core collection for maize germplasm pre-
served in Chinese national gene bank using geographic distribution and characterization data.
Genetic Resources and Crop Evolution 51, 845–852.
Li, Z.K., Pinson, S.R., Stansel, J.W. and Park, W.D. (1995) Identification of quantitative trait loci (QTL) for
heading date and plant height in cultivated rice (Oryza sativa L.). Theoretical and Applied Genetics
91, 374–381.
Li, Z.K., Luo, L.J., Mei, H.W., Wang, D.L., Shu, Q.Y., Tabien, R., Zhong, D.B., Ying, C.S., Stansel,
J.W., Khush, G.S. and Paterson, A.H. (2001) Overdominant epistatic loci are the primary genetic
basis of inbreeding depression and heterosis in rice: I. Biomass and grain yield. Genetics 158,
1737–1753.
Li, Z.K., Fu, B.-Y., Gao, Y.-M., Xu, J.-L., Ali, J., Lafitte, H.R., Jiang, Y.-Z., Rey, J.D., Vijayakumar, C.H.M.,
Maghirang, R., Zheng, T.-Q. and Zhu, L.-H. (2005) Genome-wide introgression lines and their use in
genetic and molecular dissection of complex phenotypes in rice (Oryza sativa L.). Plant Molecular
Biology 59, 33–52.
Liang, C., Jaiswal, P., Hebbard, C., Avraham, S., Buckler, E.S., Casstevens, T., Hurwitz, B., McCouch, S.,
Ni, J., Pujar, A., Ravenscroft, D., Ren, L., Spooner, W., Tecle, I., Thomason, J., Tung, C.-W., Wei, X.,
Yap, I., Youens-Clark, K., Ware, D. and Stein, L. (2008) Gramene: a growing plant comparative genomics
resource. Nucleic Acids Research 36, D947–D953.
Liang, F., Deng, Q., Wang, Y., Xiong, Y., Jin, D., Li, J. and Wang, B. (2004) Molecular marker-assisted
selection for yield-enhancing genes in the progeny of 9311 × O. rufipogon using SSR. Euphytica 139,
159–165.
Liang, G.H. and Skinner, D.Z. (eds) (2004) Genetically Modified Crops: Their Development, Uses and
Risks. Food Products Press, Binghamton, New York.
Lillemo, M., van Ginkel, M., Trethowan, R.M., Hernández, E. and Rajaram, S. (2004) Associations among
international CIMMYT bread wheat yield testing locations in high rainfall areas and their implications
for wheat breeding. Crop Science 44, 1163–1169.
Lilley, J.M., Ludlow, M.M., McCouch, S.R. and O'Toole, J.C. (1996) Locating QTL for osmotic adjustment
and dehydration tolerance in rice. Journal of Experimental Botany 47, 1427–1436.
Lin, C., Fang, J., Xu, X., Zhao, T., Cheng, J., Tu, J., Ye, G. and Shen, Z. (2008) A built-in strategy for con-
tainment of transgenic plants: creation of selectively terminable transgenic rice. PLoS ONE 3, e1818.
Available at: http://www.plosone.org (accessed 17 November 2009).
Lin, C.S. and Binns, M.R. (1988) A method of analyzing cultivar × location × year experiments: a new stability
parameter. Theoretical and Applied Genetics 76, 425–430.
Lin, C.S., Binns, M.R. and Lefkovitch, L.P. (1986) Stability analysis: where do we stand? Crop Science 26,
894–900.
Lin, H.X., Yamamoto, T., Sasaki, T. and Yano, M. (2000) Characterization and detection of epistatic inter-
actions of 3 QTLs, Hd1, Hd2 and Hd3, controlling heading date in rice using nearly isogenic lines.
Theoretical and Applied Genetics 101, 1021–1028.
Lin, Y.R., Schertz, K.F. and Paterson, A.H. (1995) Comparative analysis of QTLs affecting plant height
and maturity across the Poaceae, in reference to an interspecific sorghum population. Genetics 140,
391–411.
Lippman, Z.B. and Zamir, D. (2007) Heterosis: revisiting the magic. Trends in Genetics 23, 60–66.
Liu, B., Zhang, S., Zhu, X., Yang, Q., Wu, S., Mei, M., Mauleon, R., Leach, J., Mew, T. and Leung, H.
(2004) Candidate defense genes as predictors of quantitative blast resistance in rice. Molecular
Plant–Microbe Interactions 17, 1146–1152.
Liu, B.H. (1998) Statistical Genomics: Linkage, Mapping and QTL Analysis. CRC Press, Boca Raton,
Florida, 611 pp.
Liu, G., Zhang, Z., Zhu, H., Zhao, F., Ding, X., Zeng, R., Li, W. and Zhang, G. (2008) Detection of
QTLs with additive effects and additive-by-environment interaction effects on panicle number in
rice (Oryza sativa L.) with single-segment substitution lines. Theoretical and Applied Genetics 116,
923–931.
Liu, J.H., Xu, X.Y. and Deng, X.X. (2005) Intergeneric somatic hybridization and its application to crop
genetic improvement. Plant Cell, Tissue and Organ Culture 82, 19–44.
Liu, J.S., Sabatti, C., Teng, J., Keats, B.J.B. and Risch, N. (2001) Bayesian analysis of haplotypes for linkage
disequilibrium mapping. Genome Research 11, 1716–1724.
Liu, K., Goodman, M., Muse, S., Smith, J.S., Buckler, E.S. and Doebley, J. (2003) Genetic structure
and diversity among maize inbred lines as inferred from DNA microsatellites. Genetics 165,
2117–2128.
Liu, S., Zhou, R., Dong, Y., Li, P. and Jia, J. (2006) Development, utilization of introgression lines using a
synthetic wheat as donor. Theoretical and Applied Genetics 112, 1360–1373.
Liu, S.C., Kowalski, S.P., Lan, T.H., Feldmann, K.A. and Paterson, A.H. (1996) Genome-wide high-
resolution mapping by recurrent intermating using Arabidopsis thaliana as a model. Genetics 142,
247–258.
Liu, X.C. and Wu, J.L. (1998) SSR heterotic patterns of parents for making and predicting heterosis in rice
breeding. Molecular Breeding 4, 263–268.
Liu, X.Q., Wang, L., Chen, S., Lin, F. and Pan, Q.H. (2005) Genetics and physical mapping of Pi36(t), a
novel rice blast resistance gene located on rice chromosome 8. Molecular Genetics and Genomics
274, 394–401.
Liu, X.Z., Peng, Z.B., Fu, J.H., Li, L.C. and Huang, C.L. (1997) Application of RAPD in group classification
studies. Scientia Agricultura Sinica 30, 44–51.
Liu, Y. and Zeng, Z.B. (2000) A general mixture model approach for mapping quantitative trait loci from
diverse cross designs involving multiple inbred lines. Genetical Research 75, 345–355.
Liu, Y.G. and Whittier, R. (1995) Thermal asymmetric interlaced PCR: automatable amplification and
sequencing of insert end fragments from P1 and YAC clones for chromosome walking. Genomics 25,
674–681.
Liu, Y.G., Mitsukawa, N., Oosumi, T. and Whittier, R. (1995) Efficient isolation and mapping of Arabidopsis
thaliana T-DNA insert junctions by thermal asymmetric interlaced PCR. The Plant Journal 8,
457–463.
Liu, Y.-G., Shirano, Y., Fukaki, H., Yanai, Y., Tasaka, M., Tabata, S. and Shibata, D. (1999) Complementation
of plant mutants with large genomic DNA fragments by a transformation-competent artificial chromo-
some vector accelerates positional cloning. Proceedings of the National Academy of Sciences of the
United States of America 96, 6535–6540.
Lloyd, A., Plaisier, C.L., Carroll, D. and Drews, G.N. (2005) Targeted mutagenesis using zinc-finger nucle-
ases in Arabidopsis. Proceedings of the National Academy of Sciences of the United States of
America 102, 2232–2237.
Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C.,
Kobayashi, M., Norton, H. and Brown, E.L. (1996) Expression monitoring by hybridization to high-density
oligonucleotide arrays. Nature Biotechnology 14, 1675–1680.
Löffler, C.M., Wei, J., Fast, T., Gogerty, J., Langton, S., Bergman, M., Merrill, B. and Cooper, M. (2005)
Classification of maize environments using crop simulation and geographic information systems. Crop
Science 45, 1708–1716.
Lolle, S.J., Victoria, J.L., Young, J.M. and Pruitt, R.E. (2005) Genome-wide non-Mendelian inheritance of
extra-genomic information in Arabidopsis. Nature 434, 505–509.
Long, A.D., Mullaney, S.L., Reid, L.A., Fry, J.D., Langley, C.H. and Mackay, T.F. (1995) High resolution map-
ping of genetic factors affecting abdominal bristle number in Drosophila melanogaster. Genetics 139,
1273–1291.
Longin, C.F.H., Utz, H.F., Reif, J.C., Schipprack, W. and Melchinger, A.E. (2006) Hybrid maize breeding with
doubled haploids: I. One stage versus two-stage selection for testcross performance. Theoretical and
Applied Genetics 112, 903–912.
Lonnstedt, I. and Speed, T.P. (2002) Replicated microarray data. Statistica Sinica 12, 31–46.
Lonosky, P.M., Zhang, X., Honavar, V.G., Dobbs, D.L., Fu, A. and Rodermel, S.R. (2004) A proteomic analysis
of maize chloroplast biogenesis. Plant Physiology 134, 560–574.
Lörz, H. and Wenzel, G. (eds) (2005) Molecular Marker Systems in Plant Breeding and Crop Improvement.
Biotechnology in Agriculture and Forestry, Vol. 55. Springer-Verlag, Berlin.
Louwaars, N.P., Visser, B., Eaton, D., Beekwilder, J. and van der Meer, I. (2002) Policy response to techno-
logical developments: the case of GURTs. In: Louwaars, N.P. (ed.) Seed Policy, Legislation and Law:
Widening a Narrow Focus. Food Products Press, Binghamton, New York, pp. 89–102.
Louwaars, N.P., Tripp, R. and Eaton, D. (2006) Public research in plant breeding and intellectual property
rights: a call for new institutional policies. Agricultural and Rural Development Notes Issue 13, p. 4.
World Bank, Washington, DC.
Lu, C., Shen, L., Tan, Z., Xu, Y., He, P., Chen, Y. and Zhu, L. (1996) Comparative mapping of QTL for agro-
nomic traits of rice across environments using a double haploid population. Theoretical and Applied
Genetics 93, 1211–1217.
Lu, H., Romero-Severson, J. and Bernardo, R. (2003) Genetic basis of heterosis explored by simple
sequence repeat markers in a random-mated maize population. Theoretical and Applied Genetics
107, 494–502.
Lu, H., Redus, M.A., Coburn, J.R., Rutger, J.N., McCouch, S.R. and Tai, T.H. (2005) Population structure
and breeding patterns of 145 U.S. rice cultivars based on SSR marker analysis. Crop Science 45,
66–76.
Lu, L., Romero-Severson, J. and Bernardo, R. (2002) Chromosomal regions associated with segregation
distortion in maize. Theoretical and Applied Genetics 105, 622–628.
Lu, X., Niu, T. and Liu, J.S. (2003) Haplotype information and linkage disequilibrium mapping for single
nucleotide polymorphisms. Genome Research 13, 2112–2117.
Lu, X.G., Gu, M.H. and Li, C.Q. (eds) (2001) Theory and Technology of Two-line Hybrid Rice. China Science
Press, Beijing.
Lu, X.G., Mou, T.M., Hoan, N.T. and Virmani, S.S. (2004) Two-line hybrid rice breeding in and outside
of China. In: Virmani, S.S., Mao, C.X. and Hardy, B. (eds) Hybrid Rice for Food Security, Poverty
Alleviation and Environmental Protection. International Rice Research Institute, Manila, Philippines.
Lu, Y., Yan, J., Guimarães, C.T., Taba, S., Hao, Z., Gao, S., Chen, S., Li, J., Zhang, S., Vivek, B.S.,
Magorokosho, C., Mugo, S., Makumbi, D., Parentoni, S.N., Shah, T., Rong, T., Crouch, J.H. and Xu, Y.
(2009) Molecular characterization of global maize breeding germplasm based on genome-wide single
nucleotide polymorphisms. Theoretical and Applied Genetics 120, 93–115.
Lübberstedt, T., Klein, D. and Melchinger, A.E. (1998a) Comparative QTL mapping of resistance to
Ustilago maydis across four populations of European flint-maize. Theoretical and Applied Genetics
97, 1321–1330.
Lübberstedt, T., Melchinger, A.E., Führ, S., Klein, D., Dally, A. and Westhoff, P. (1998b) QTL mapping in
test crosses of flint lines of maize: III. Comparison across populations for forage traits. Crop Science
38, 1278–1289.
Lucca, P., Ye, X.D. and Potrykus, I. (2001) Effective selection and regeneration of transgenic rice plants with
mannose as selective agent. Molecular Breeding 7, 43–49.
Lucken, K.A. (1986) The breeding and production of hybrid wheat. In: Smith, E.L. (ed.) Genetic Improvement
in Yield of Wheat. American Society of Agronomy (ASA) and Crop Science Society of America (CSSA),
Madison, Wisconsin, pp. 87–107.
Lucken, K.A. and Johnson, K.D. (1988) Hybrid wheat status and outlook. In: International Rice Research
Institute (IRRI) (ed.) Hybrid Rice. IRRI, Manila, Philippines, pp. 243–255.
Lucker, J., Schwab, W., van Hautum, B., Blaas, J., van der Plas, L.H., Bouwmeester, H.J. and Verhoeven,
H.A. (2004) Increased and altered fragrance of tobacco plants after metabolic engineering using three
monoterpene synthases from lemon. Plant Physiology 134, 510–519.
Luo, K., Duan, H., Zhao, D., Zheng, X., Deng, W., Chen, Y., Stewart, C.N., Jr, McAvoy, R., Jiang, X., Wu, Y.,
He, A., Pei, Y. and Li, Y. (2007) GM-Gene-deletor: fused loxP-FRT recognition sequences dramati-
cally improve the efficiency of FLP or CRE recombinase on transgene excision from pollen and seed
of tobacco plants. Plant Biotechnology Journal 5, 263–274.
Luo, L.J., Li, Z.K., Mei, H.W., Shu, Q.Y., Tabien, R., Zhong, D.B., Ying, C.S., Stansel, J.W., Khush, G.S. and
Paterson, A.H. (2001) Overdominant epistatic loci are the primary genetic basis of inbreeding depression
and heterosis in rice. II. Grain yield components. Genetics 158, 1755–1771.
Lupas, A. (1996) Prediction and analysis of coiled coil structures. Methods in Enzymology 266, 513–523.
Lush, J.L. (1937) Animal Breeding Plans. Iowa State College Press, Ames, Iowa.
Lush, J.L. (1945) Animal Breeding Plans, 3rd edn. Iowa State College Press, Ames, Iowa.
Lussier, Y.A. and Li, J. (2004) Terminological mapping for high throughput comparative biology of phenotypes.
Proceedings of the Pacific Symposium on Biocomputing, 6–10 January 2004, Hawaii. PSB,
Stanford, California, pp. 202–213.
Lutz, K.A., Azhagiri, A.K., Tungsuchat-Huang, T. and Maliga, P. (2007) A guide to choosing vectors for
transformation of the plastid genome of higher plants. Plant Physiology 145, 1201–1210.
Lyamichev, V., Mast, A.L., Hall, J.G., Prudent, J.R., Kaiser, M.W., Takova, T., Kwiatkowski, R.W., Sander,
T.J., de Arruda, M., Arco, D.A., Neri, B.P. and Brow, M.A.D. (1999) Polymorphism identification and
quantitative detection of genomic DNA by invasive cleavage of oligonucleotide probes. Nature
Biotechnology 17, 292–296.
Lyman, J.M. (1984) Progress and planning for germplasm conservation of major food crops. Plant Genetic
Resources Newsletter 60, 3–21.
Lynch, M. and Walsh, B. (1998) Genetics and Analysis of Quantitative Traits. Sinauer Associates,
Sunderland, Massachusetts, 980 pp.
Ma, C.X., Casella, G. and Wu, R.L. (2002) Functional mapping of quantitative trait loci underlying the character
process: a theoretical framework. Genetics 161, 1751–1762.
Ma, J.K., Hiatt, A., Hein, M., Vine, N.D., Wang, F., Stabila, P., van Dolleweerd, C., Mostov, K. and Lehner, T.
(1995) Generation and assembly of secretory antibodies in plants. Science 268, 716–719.
Ma, J.K.-C., Chikwamba, R., Sparrow, P., Fischer, R., Mahoney, R. and Twyman, R.M. (2005) Plant-derived
pharmaceuticals – the road forward. Trends in Plant Science 10, 580–585.
MacBeath, G. and Schreiber, S.L. (2000) Printing proteins as microarrays for high-throughput function
determination. Science 289, 1760–1763.
MacCoss, M.J., McDonald, W.H., Saraf, A., Sadygov, R., Clark, J.M., Tasto, J.J., Gould, K.L., Wolters, D.,
Washburn, M., Weiss, A., Clark, J.I. and Yates III, J.R. (2002) Shotgun identification of protein modifications
from protein complexes and lens tissue. Proceedings of the National Academy of Sciences of
the United States of America 99, 7900–7905.
MacDonald, J.A., Mackey, A.J., Pearson, W.R. and Haystead, T.A. (2002) A strategy for the rapid identification
of phosphorylation sites in the phosphoproteome. Molecular and Cellular Proteomics 1, 314–322.
Mackay, I. and Powell, W. (2007) Methods for linkage disequilibrium mapping in crops. Trends in Plant
Science 12, 57–63.
Mackay, T.F.C. (1995) The genetic basis of quantitative variation: number of sensory bristles of Drosophila
melanogaster as a model system. Trends in Genetics 11, 464–470.
Mackay, T.F.C., Stone, E.A. and Ayroles, J.F. (2009) The genetics of quantitative traits: challenges and
prospects. Nature Reviews Genetics 10, 565–577.
Mackill, D.J. and McNally, K.L. (2004) A model crop species: molecular markers in rice. In: Lörz, H. and
Wenzel, G. (eds) Molecular Marker Systems in Plant Breeding and Crop Improvement.
Springer-Verlag, Heidelberg, pp. 39–54.
Mackill, D.J., Salam, M.A., Wang, Z.Y. and Tanksley, S.D. (1993) A major photoperiod-sensitivity gene
tagged with RFLP and isozyme markers in rice. Theoretical and Applied Genetics 85, 536–540.
Mackill, D.J., Zhang, Z., Redoña, E.D. and Colowit, P.M. (1996) Level of polymorphism and genetic mapping
of AFLP markers in rice. Genome 39, 969–977.
Macomber, R.S. (1998) A Complete Introduction to Modern NMR Spectroscopy. John Wiley & Sons,
Chichester, UK.
Magnuson, V.L., Ally, D.S., Nylund, S.J., Karanjawala, Z.E., Rayman, J.B., Knapp, J.I., Lowe, A.L., Ghosh, S.
and Collins, F.S. (1996) Substrate nucleotide-determined non-templated addition of adenine by Taq
DNA polymerase: implications for PCR-based genotyping and cloning. Biotechniques 21, 700–709.
Maheswaran, M., Huang, N., Sreerangasamy, S.R. and McCouch, S.R. (2000) Mapping quantitative trait
loci associated with days to flowering and photoperiod sensitivity in rice (Oryza sativa L.). Molecular
Breeding 6, 145–155.
Mailund, T., Schierup, M.H., Pedersen, C.N.S., Madsen, J.N., Hein, J. and Schauser, L. (2006) GeneRecon –
a coalescent based tool for fine-scale association mapping. Bioinformatics 22, 2317–2318.
Malakoff, D. (1999) Bayes offers a new way to make sense of numbers. Science 286, 1460–1464.
Malmberg, R.L. and Mauricio, R. (2005) QTL-based evidence for the role of epistasis in evolution. Genetical
Research 86, 89–95.
Malmberg, R.L., Held, S., Waits, A. and Mauricio, R. (2005) Epistasis for fitness-related quantitative traits in
Arabidopsis thaliana grown in the field and in the greenhouse. Genetics 171, 2013–2027.
Malosetti, M., Voltas, J., Romagosa, I., Ullrich, S.E. and van Eeuwijk, F.A. (2004) Mixed models including
environmental variables for studying QTL by environment interaction. Euphytica 137, 139–145.
Malosetti, M., van der Linden, C.G., Vosman, B. and van Eeuwijk, F.A. (2007) A mixed-model approach to
association mapping using pedigree information with an illustration of resistance to Phytophthora
infestans in potato. Genetics 175, 879–889.
Maluszynski, M., Kasha, K.J., Forster, B.P. and Szarejko, I. (eds) (2003) Doubled Haploid Production in
Crop Plants – a Manual. Kluwer Academic Publishers, Dordrecht, Netherlands.
Mandel, J. (1969) The partitioning of interaction in analysis of variance. Journal of Research of the National
Bureau of Standards, Series B 73, 309–328.
Mandel, J. (1971) A new analysis of variance model for nonadditive data. Technometrics 13, 1–18.
Manenti, G., Galvan, A., Pettinicchio, A., Trincucci, G., Spada, E., Zolin, A., Milani, S., Gonzalez-Neira, A.
and Dragani, T.A. (2009) Mouse genome-wide association mapping needs linkage analysis to avoid
false-positive loci. PLoS Genetics 5(1), e1000331.
Mangelsdorf, P.C. (1974) Corn: Its Origin, Evolution and Improvement. Harvard University Press, Cambridge,
Massachusetts.
Manly, K.F. (1993) A Macintosh program for storage and analysis of experimental genetic mapping data.
Mammalian Genome 4, 303–313.
Mannschreck, S. (2004) Optimierung der Methode zur Chromosomalen Aufdopplung von in-vivo induzierten
Haploiden bei Mais (Zea mays L.). MSc thesis, Universität Hohenheim, Germany.
Maqbool, S.B., Riazuddin, S., Loc, N.T., Gatehouse, A.M.R., Gatehouse, J.A. and Christou, P. (2001)
Expression of multiple insecticidal genes confers broad resistance against a range of different rice
pests. Molecular Breeding 7, 85–93.
Marchini, J., Donnelly, P. and Cardon, L.R. (2005) Genome-wide strategies for detecting multiple loci that
influence complex diseases. Nature Genetics 37, 413–417.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S.,
Chen, Y.-J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen,
S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L.I., Jarvie, T.P., Jirage, K.B., Kim, J.-B., Knight,
J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B.,
McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T.,
Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt,
K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F. and Rothberg, J.M. (2005)
Genome sequencing in open microfabricated high density picoliter reactors. Nature 437, 376–380.
Marillonnet, S., Giritch, A., Gils, M., Kandzia, R., Klimyuk, V. and Gleba, Y. (2004) In planta engineering of
viral RNA replicons: efficient assembly by recombination of DNA modules delivered by Agrobacterium.
Proceedings of the National Academy of Sciences of the United States of America 101, 6852–6857.
Marillonnet, S., Thoeringer, C., Kandzia, R., Klimyuk, V. and Gleba, Y. (2005) Systematic Agrobacterium
tumefaciens-mediated transfection of viral replicons for efficient transient expression in plants. Nature
Biotechnology 23, 718–723.
Martienssen, R.A., Rabinowicz, P.D., O'Shaughnessy, A. and McCombie, W.R. (2004) Sequencing the
maize genome. Current Opinion in Plant Biology 7, 102–107.
Martin, G.B., Williams, J.G.K. and Tanksley, S.D. (1991) Rapid identification of markers linked to a
Pseudomonas resistance gene in tomato by using random primers and near-isogenic lines.
Proceedings of the National Academy of Sciences of the United States of America 88, 2336–2340.
Martin, G.B., Brommonschenkel, S.H., Chunwongse, J., Frary, A., Ganal, M.W., Spivey, R., Wu, T., Earle,
E.D. and Tanksley, S.D. (1993) Map-based cloning of a protein kinase gene conferring disease resistance
in tomato. Science 262, 1432–1436.
Martin, J.M., Talbert, L.E., Lanning, S.P. and Blake, N.K. (1995) Hybrid performance in wheat as related to
parental diversity. Crop Science 35, 104–108.
Martin, O.C. and Hospital, F. (2006) Two- and three-locus tests for linkage analysis using recombinant
inbred lines. Genetics 173, 451–459.
Martinez, L. (2003) In vitro gynogenesis induction and doubled haploid production in onion (Allium cepa L.).
In: Doubled Haploid Production in Crop Plants. Kluwer Academic Publisher, Dordrecht, Netherlands,
pp. 275–281.
Martinez, O. and Curnow, R.N. (1992) Estimating the locations and the sizes of the effects of quantitative
trait loci using flanking markers. Theoretical and Applied Genetics 85, 480–488.
Martinez, V., Thorgaard, G., Robison, B. and Sillanpää, M.J. (2005) An application of Bayesian QTL mapping
to early development in double haploid lines of rainbow trout including environmental effects.
Genetical Research 86, 209–221.
Mascarenhas, M. and Busch, L. (2006) Seeds of change: intellectual property rights, genetically modified
soybeans and seed saving in the United States. Sociologia Ruralis 46, 122–138.
Mather, K. (1949) Biometrical Genetics. Chapman & Hall, London.
Mather, K. and Jinks, J.L. (1982) Biometrical Genetics. Chapman & Hall, London.
Mathesius, U., Imin, N., Natera, S.H.A. and Rolfe, B.G. (2003) Proteomics as a functional genomics tool. In:
Grotewold, E. (ed.) Methods in Molecular Biology, Vol. 236. Plant Functional Genomics: Methods and
Protocols. Humana Press, Totowa, New Jersey, pp. 395–413.
Matsumura, H., Ito, A., Saitoh, H., Winter, P., Kahl, G., Reuter, M., Kruger, D.H. and Terauchi, R. (2005)
SuperSAGE. Cellular Microbiology 7, 11–18.
Matthews, P.R., Wang, M.B., Waterhouse, P.M., Thornton, S., Fieg, S.J., Gubler, F. and Jacobsen, J.V.
(2001) Marker gene elimination from transgenic barley, using co-transformation with adjacent twin
T-DNA on a standard Agrobacterium transformation vector. Molecular Breeding 7, 195–202.
Matus, I., Corey, A., Filichkin, T., Hayes, P.M., Vales, M.I., Kling, J., Riera-Lizarazu, O., Sato, K., Powell, W.
and Waugh, R. (2003) Development and characterization of recombinant chromosome substitution
lines (RCSLs) using Hordeum vulgare subsp. spontaneum as a source of donor alleles in a Hordeum
vulgare subsp. vulgare background. Genome 46, 1010–1023.
Matzke, M.A. and Matzke, A.J.M. (1995) How and why do plants inactivate homologous (Trans) genes?
Plant Physiology 107, 679–685.
Matzke, M.A., Mette, M.F. and Matzke, A.J.M. (2000) Transgene silencing by the host genome defense:
implications for the evolution of epigenetic control mechanisms in plants and vertebrates. Plant
Molecular Biology 43, 401–415.
Maxted, N., Ford-Lloyd, B.V. and Hawkes, J.G. (1997) Complementary conservation strategies. In: Maxted,
N., Ford-Lloyd, B.V. and Hawkes, J.G. (eds) Plant Genetic Resources Conservation. Chapman & Hall,
London, pp. 15–39.
Mayer, J., Sharples, J. and Nottenburg, C. (2004) Resistance to Phosphinothricin. CAMBIA Intellectual
Property, Canberra.
Mayer, J.E., Pfeiffer, W.H. and Beyer, P. (2008) Biofortified crops to alleviate micronutrient malnutrition.
Current Opinion in Plant Biology 11, 166–177.
Mayes, S., Parsley, K., Sylvester-Bradley, R., May, S. and Foulkes, J. (2005) Integrating genetic informa-
tion into plant breeding programmes: how will we produce varieties from molecular variation, using
bioinformatics? Annals of Applied Biology 146, 223–237.
McCallum, C.M., Comai, L., Greene, E.A. and Henikoff, S. (2000) Targeting Induced Local Lesions IN
Genomes (TILLING) for plant functional genomics. Plant Physiology 123, 439–442.
McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P.A. and Hirschhorn,
J.N. (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges.
Nature Reviews Genetics 9, 356–369.
McCouch, S.R., Teytelman, L., Xu, Y., Lobos, K.B., Clare, K., Walton, M., Fu, B., Maghirang, R., Li, Z., Xing, Y.,
Zhang, Q., Kono, I., Yano, M., Fjellstrom, R., DeClerck, G., Schneider, D., Cartinhour, S., Ware, D. and
Stein, L. (2002) Development and mapping of 2240 new SSR markers for rice (Oryza sativa L.). DNA
Research 9, 199–207.
McCouch, S.R., Sweeney, M., Li, J., Jiang, H., Thomson, M., Septiningsih, E., Edwards, J., Moncada, P.,
Xiao, J., Garris, A., Tai, T., Martinez, C., Tohme, J., Sugiono, M., McClung, A., Yuan, L.P. and Ahn, S.N.
(2007) Through the genetic bottleneck: O. rufipogon as a source of trait-enhancing alleles for O. sativa.
Euphytica 154, 317–339.
McElroy, D. (1996) The industrialization of plant transformation. Nature Biotechnology 14, 715–716.
McElroy, D. and Brettell, R.I.S. (1994) Foreign gene expression in transgenic cereals. Trends in Biotechnology
12, 62–68.
McElroy, D., Zhang, W.G., Cao, J. and Wu, R. (1990) Isolation of an efficient actin promoter for use in rice
transformation. The Plant Cell 2, 163–171.
McLaren, C.G., Bruskiewich, R.M., Portugal, A.M. and Cosico, A.B. (2005) The International Rice Information
System. A platform for meta-analysis of rice crop data. Plant Physiology 139, 637–642.
McMullen, M.D., Kresovich, S., Villeda, H.S., Bradbury, P., Li, H., Sun, Q., Flint-Garcia, S., Thornsberry, J.,
Acharya, C., Bottoms, C., Brown, P., Browne, C., Eller, M., Guill, K., Harjes, C., Kroon, D., Lepak, N.,
Mitchell, S.E., Peterson, B., Pressoir, G., Romero, S., Rosas, M.O., Salvo, S., Yates, H., Hanson, M.,
Jones, E., Smith, S., Glaubitz, J.C., Goodman, M., Ware, D., Holland, J.B. and Buckler, E.S. (2009)
Genetic properties of the maize nested association mapping population. Science 325, 737–740.
McNally, K.L., Bruskiewich, R., Mackill, D., Buell, C.R., Leach, J.E. and Leung, H. (2006) Sequencing multi-
ple and diverse rice varieties. Connecting whole-genome variation with phenotypes. Plant Physiology
141, 26–31.
Meaburn, E., Butcher, L.M., Schalkwyk, L.C. and Plomin, R. (2006) Genotyping pooled DNA using 100K
SNP microarrays: a step towards genomewide association scans. Nucleic Acids Research 34, e28.
Meghji, M.R., Dudley, J.W., Lambert, R.J. and Sprague, G.F. (1984) Inbreeding depression, inbred and
hybrid grain yields and other traits of maize genotypes representing three eras. Crop Science 24,
545–549.
Melchinger, A.E. (1993) Use of RFLP markers for analyses of genetic relationships among breeding mate-
rials and prediction of hybrid performance. In: Buxton, D.R., Shibles, R., Forsberg, R.A., Blad, B.L.,
Asay, K.H., Paulson, G.M. and Wilson, R.F. (eds) International Crop Science I. Crop Science Society
of America (CSSA), Madison, Wisconsin, pp. 621–628.
Melchinger, A.E. (1999) Genetic diversity and heterosis. In: Coors, J.G. and Pandey, S. (eds) The Genetics
and Exploitation of Heterosis in Crops. Crop Science Society of America (CSSA), Madison, Wisconsin,
p. 54 (abstract).
Melchinger, A.E. and Gumber, R.K. (1998) Overview of heterosis and heterotic groups in agronomic crops.
In: Lamkey, K.R. and Staub, J.E. (eds) Concepts and Breeding of Heterosis in Crop Plants. Crop
Science Society of America (CSSA), Madison, Wisconsin, pp. 29–44.
Melchinger, A.E., Geiger, H.H. and Schnell, F.W. (1986) Epistasis in maize (Zea mays L.) I. Comparison of
single and three-way cross hybrids among early flint and dent inbred lines. Maydica 31, 179–192.
Melchinger, A.E., Lee, M., Lamkey, K.R. and Woodman, W.L. (1990) Genetic diversity for restriction frag-
ment length polymorphisms: relation to estimated genetic effects in maize inbreds. Crop Science 30,
1033–1040.
Melchinger, A.E., Messmer, M.M., Lee, M., Woodman, W.L. and Lamkey, K.R. (1991) Diversity and relation-
ships among U.S. maize inbreds revealed by restriction fragment length polymorphism. Crop Science
31, 669–678.
Melchinger, A.E., Boppenmaier, J., Dhillon, B.S., Pollmer, W.G. and Herrmann, R.G. (1992) Genetic diver-
sity for RFLPs in European maize inbreds. II. Relation to performance of hybrids within versus between
heterotic groups for forage traits. Theoretical and Applied Genetics 84, 627–681.
Melchinger, A.E., Graner, A., Singh, M. and Messmer, M.M. (1994) Relationships among European germ-
plasm: I. Genetic diversity among winter and spring cultivars revealed by RFLPs. Crop Science 34,
1191–1199.
Melchinger, A.E., Utz, H.F. and Schön, C.C. (1998) Quantitative trait locus (QTL) mapping using different
testers and independent population samples in maize reveals low power of QTL detection and large
bias in estimates of QTL effects. Genetics 149, 383–403.
Melchinger, A.E., Utz, H.F. and Schön, C.C. (2000) From Mendel to Fisher. The power and limits of QTL
mapping for quantitative traits. Vortr Pflanzenzüchtg 48, 132–142.
Melchinger, A.E., Utz, H.F. and Schön, C.C. (2004) QTL analyses of complex traits with cross validation,
bootstrapping and other biometric methods. Euphytica 137, 1–11.
Melchinger, A.E., Longin, C.F., Utz, H.F. and Reif, J.C. (2005) Hybrid maize breeding with doubled haploid
lines: quantitative genetic and selection theory for optimum allocation of resources. In: Proceedings
of the Forty First Annual Illinois Corn Breeders School 7–8 March 2005, Urbana-Champaign, Illinois.
University of Illinois at Urbana-Champaign, pp. 8–21.
Melchinger, A.E., Utz, H.F., Piepho, H.P., Zeng, Z.-B. and Schön, C.C. (2007) The role of epistasis in the
manifestation of heterosis – a system-oriented approach. Genetics 177, 1815–1825.
Melchinger, A.E., Utz, H.F. and Schön, C.C. (2008) Genetic expectations of quantitative trait loci main and
interaction effects obtained with the triple testcross design and their relevance for the analysis of
heterosis. Genetics 178, 2265–2274.
Menkir, A., Melake-Berhan, A., The, C., Ingelbrecht, I. and Adepoju, A. (2004) Grouping of tropical mid-
altitude maize inbred lines on the basis of yield data and molecular markers. Theoretical and Applied
Genetics 108, 1582–1590.
Menz, M.A., Klein, R.R., Mullet, J.E., Obert, J.A., Unruh, N.C. and Klein, P.E. (2002) A high-density genetic
map of Sorghum bicolor (L.) Moench based on 2926 AFLP, RFLP and SSR markers. Plant Molecular
Biology 48, 483–499.
Mertz, E.T., Bates, L.S. and Nelson, O.E. (1964) Mutant gene that changes protein composition and
increases lysine content of maize endosperm. Science 145, 279–280.
Messina, C.D., Jones, J.W., Boote, K.J. and Vallejos, C.E. (2006) A gene-based model to simulate soybean
development and yield response to environment. Crop Science 46, 456–466.
Messmer, M.M., Melchinger, A.E., Herrmann, R.G. and Boppenmaier, J. (1993) Relationships among early
European maize inbreds. II. Comparisons of pedigree and RFLP data. Crop Science 33, 944–950.
Meudt, H.M. and Clarke, A.C. (2007) Almost forgotten or latest practice? AFLP applications, analyses and
advances. Trends in Plant Science 12, 106–117.
Meuwissen, T.H.E., Hayes, B.J. and Goddard, M.E. (2001) Prediction of total genetic value using genome-wide
dense marker maps. Genetics 157, 1819–1829.
Meyer, K., Benning, G. and Grill, E. (1996) Cloning of plant genes based on genetic map location. In:
Paterson, A.H. (ed.) Genome Mapping in Plants. R.G. Landes Company, Austin, Texas, pp. 137–154.
Meyer, S., Nowak, K., Sharma, V.K., Schulze, J., Mendel, R.R. and Hansch, R. (2004) Vectors for RNAi
technology in poplar. Plant Biology 6, 100–103.
Meyers, B.C., Scalabrin, S. and Morgante, M. (2004) Mapping and sequencing complex genomes: let's get
physical! Nature Reviews Genetics 5, 578–588.
Michelmore, R.W. and Shaw, D.V. (1988) Character dissection. Nature 335, 672–673.
Michelmore, R.W., Paran, I. and Kesseli, R.V. (1991) Identification of markers linked to disease resistance
genes by bulked segregant analysis: a rapid method to detect markers in specific genome regions
using segregating populations. Proceedings of the National Academy of Sciences of the United States
of America 88, 9828–9832.
Miernyk, J.A. and Thelen, J.J. (2008) Biochemical approaches for discovering protein–protein interactions.
The Plant Journal 53, 597–609.
Miki, B. and McHugh, S. (2004) Selectable marker genes in transgenic plants: applications, alternatives
and biosafety. Journal of Biotechnology 107, 193–232.
Miki, D. and Shimamoto, K. (2004) Simple RNAi vectors for stable and transient suppression of gene function
in rice. Plant and Cell Physiology 45, 490–495.
Mikkilineni, V. and Rocheford, T.R. (2004) RFLP variant frequency differences among Illinois long-term
selection protein strains. Plant Breeding Reviews 24(1), 111–131.
Miklas, P.N., Kelly, J.D. and Singh, S.P. (2003) Registration of anthracnose-resistant pinto bean germplasm
line USPT-ANT-1. Crop Science 43, 1889–1890.
Miles, J.S. and Guest, J.R. (1984) Nucleotide sequence and transcriptional start point of the phosphomannose
isomerase gene (manA) of Escherichia coli. Gene 32, 41–48.
Miller, W., Makova, K.D., Nekrutenko, A. and Hardison, R.C. (2004) Comparative genomics. Annual Review
of Genomics and Human Genetics 5, 15–56.
Mitchell, A.A. and Chakravarti, A. (2003) Undetected genotyping errors cause apparent overtransmission
of common alleles in the transmission/disequilibrium test. American Journal of Human Genetics 72,
598–610.
Miyahara, K. (1999) Analysis of LGC1, low glutelin mutant of rice. Gamma Field Symposia 38, 43–52.
Miyao, A., Tanaka, K., Murata, K., Sawaki, H., Takeda, S., Abe, K., Shinozuka, Y., Onosato, K. and
Hirochika, H. (2003) Target site specificity of the Tos17 retrotransposon shows a preference for inser-
tion within genes and against insertion in retrotransposon-rich regions of the genome. The Plant Cell
15, 1771–1780.
Miyata, M., Yamamoto, T., Komori, T. and Nitta, N. (2007) Marker-assisted selection and evaluation of the
QTL for stigma exsertion under japonica rice genetic background. Theoretical and Applied Genetics
114, 539–548.
Mlynarova, L., Conner, A.J. and Nap, J.P. (2006) Directed microspore-specific recombination of transgenic
alleles to prevent pollen-mediated transmission. Plant Biotechnology Journal 4, 445–452.
Mo, H. (1988) Genetic expression for endosperm traits. In: Weir, B.S., Eisen, E.J., Goodman, M.M. and
Namkoong, S.N. (eds) Proceedings of the 2nd International Conference of Quantitative Genetics.
Sinauer Associates, Sunderland, Massachusetts, pp. 478–487.
Mo, H. (1993a) Genetic analysis for qualitative–quantitative traits. I. The genetic constitution of generations
and identification of major gene genotypes. Acta Agronomica Sinica 19, 1–6 (in Chinese with English
abstract).
Mo, H. (1993b) Genetic analysis for qualitative–quantitative traits. II. Generation means and genetic variances.
Acta Agronomica Sinica 19, 193–200 (in Chinese with English abstract).
Mockler, T.C. and Ecker, J.R. (2004) Application of DNA tiling arrays for whole-genome analysis. Genomics
85, 1–15.
Mohler, V. and Singrün, C. (2004) General considerations: marker-assisted selection. In: Lörz, H. and
Wenzel, G. (eds) Biotechnology in Agriculture and Forestry, Vol. 55. Molecular Marker Systems in
Plant Breeding and Crop Improvement. Springer-Verlag, Berlin, pp. 305–317.
Mohler, V. and Schwartz, G. (2005) Genotyping tools in plant breeding: from restriction fragment length
polymorphisms to single nucleotide polymorphisms. In: Lörz, H. and Wenzel, G. (eds) Molecular
Marker Systems in Plant Breeding and Crop Improvement. Biotechnology in Agriculture and Forestry,
Vol. 55. Springer-Verlag, Berlin, pp. 23–38.
Moing, A., Deborde, C. and Rolin, D. (2007) Metabolic fingerprinting and profiling by proton NMR. In: Morot-
Gaudry, J.F., Lea, P. and Briat, J.F. (eds) Functional Plant Genomics. Science Publishers, Enfield, New
Hampshire, pp. 335–344.
Molloy, M.P. and Witzmann, F.A. (2002) Proteomics: technologies and applications. Briefings in Functional
Genomics and Proteomics 1, 23–29.
Moncada, P., Martinez, C.P., Borrero, J., Chatel, M., Gauch, H., Guimaraes, E., Tohme, J. and McCouch,
S.R. (2001) Quantitative trait loci for yield and yield components in an Oryza sativa × Oryza rufipogon
BC2F2 population evaluated in an upland environment. Theoretical and Applied Genetics 102, 41–52.
Monna, L., Lin, H.X., Kojima, S., Sasaki, T. and Yano, M. (2002) Genetic dissection of a genomic region for a
quantitative trait locus, Hd3, into two loci, Hd3a and Hd3b, controlling heading date in rice. Theoretical
and Applied Genetics 104, 772–778.
Mooers, C.A. (1921) The agronomic placement of varieties. Journal of American Society of Agronomy 13,
337–352.
Moore, S.K. and Srivastava, V. (2006) Efficient deletion of transgenic DNA from complex integration locus
of rice mediated by Cre/lox recombination system. Crop Science 46, 700–705.
Moreau, L., Charcosset, A., Hospital, F. and Gallais, A. (1998) Marker-assisted selection efficiency in populations
of finite size. Genetics 148, 1353–1365.
Moreau, L., Lemarie, S., Charcosset, A. and Gallais, A. (2000) Economic efficiency of one cycle of marker-assisted
selection. Crop Science 40, 329–337.
Moreau, L., Charcosset, A. and Gallais, A. (2004) Experimental evaluation of several cycles of marker-assisted
selection in maize. Euphytica 137, 111–118.
Moreno-Gonzalez, J., Dudley, J.W. and Lambert, R.J. (1975) A design II study of linkage disequilibrium for
percent oil in maize. Crop Science 15, 840–843.
Morgante, M. and Vogel, J. (1997) Compound microsatellite primers for the detection of genetic polymor-
phisms. Patent EP 0804618.
Morris, M., Dreher, K., Ribaut, J.M. and Khairallah, M. (2003) Money matters (II): cost of maize inbred
line conversion schemes at CIMMYT using conventional and marker-assisted selection. Molecular
Breeding 11, 235–247.
Morton, N.E. (1955) Sequential test for the detection of linkage. American Journal of Human Genetics 7,
277–318.
Moser, H. and Lee, M. (1994) RFLP variation and genealogical distance, multivariate distance, heterosis
and genetic variance in oats. Theoretical and Applied Genetics 87, 947–956.
Mu, J., Zhou, H., Zhao, S., Xu, C., Yu, S. and Zhang, Q. (2004) Development of contiguous introgres-
sion lines covering entire genome of the sequenced japonica rice. In: New Directions for a Diverse
Planet: Proceedings of the 4th International Crop Science Congress, 26 September–1 October 2004,
Brisbane, Australia. Published on CD-ROM. Available at: http://www.cropscience.org.au/icsc2004/
(accessed 17 November 2009).
Muehlbauer, G.J., Specht, J.E., Thomas-Compton, M.A., Staswick, P.E. and Bernard, R.L. (1988)
Near-isogenic lines – a potential resource in the integration of conventional and molecular marker linkage
maps. Crop Science 28, 729–735.
Mueller, M., Goel, A., Thimma, M., Dickens, N.J., Aitman, T.J. and Mangion, J. (2006) eQTL Explorer: integrated
mining of combined genetic linkage and expression experiments. Bioinformatics 22, 509–511.
Mukhambetzhanov, S.K. (1997) Culture of nonfertilized female gametophytes in vitro. Plant Cell, Tissue
and Organ Culture 48, 111–119.
Mullis, K. (1992) Process for amplifying nucleic acid sequences. Patent EP 0201184B1.
Mumm, R.H. and Dudley, J.W. (1994) Classification of 148 U.S. maize inbreds. I. Cluster analysis based on
RFLPs. Crop Science 34, 842–851.
Munafò, M.R. and Flint, J. (2004) Meta-analysis of genetic association studies. Trends in Genetics 20,
439–444.
Muranty, H. (1996) Power of tests for quantitative trait loci detection using full-sib families in different
schemes. Heredity 76, 156–165.
Murigneux, A., Baud, S. and Beckert, M. (1993) Molecular and morphological evaluation of doubled-hap-
loid lines in maize: 2. Comparison with single-seed-descent lines. Theoretical and Applied Genetics
87, 278–287.
Myles, S., Peiffer, J., Brown, P.J., Ersoz, E.S., Zhang, Z., Costich, D.E. and Buckler, E.S. (2009) Association
mapping: critical considerations shift from genotyping to experimental design. The Plant Cell 21,
2194–2202.
Nagaraju, J. (2003) Novel FISSR-PCR primers and method of identifying genotyping diverse genomes of
plant and animal systems including rice varieties, a kit thereof. Patent WO 03085133.
Nakagahra, M. (1972) Genetic mechanism on the distorted segregation of marker gene belonging to the
eleventh linkage group in cultivated rice. Japanese Journal of Breeding 22, 232–238.
Nakazaki, T., Okumoto, Y., Horibata, A., Yamahira, S., Teraishi, M., Nishida, H., Inoue, H. and Tanisaka, T.
(2003) Mobilization of a transposon in the rice genome. Nature 421, 170–172.
Naqvi, S., Zhu, C., Farre, G., Ramessar, K., Bassie, L., Breitenbach, J., Conesa, D.P., Ros, G.,
Sandmann, G., Capell, T. and Christou, P. (2009) Transgenic multivitamin corn through biofortification
of endosperm with three vitamins representing three distinct metabolic pathways. Proceedings of the
National Academy of Sciences of the United States of America 106, 7762–7767.
Narayanan, N.N., Baisakh, N., Oliva, N.P., Vera Cruz, C.M., Gnanamanickam, S.S., Datta, K. and Datta, S.K.
(2004) Molecular breeding: marker-assisted selection combined with biolistic transformation for blast
and bacterial blight resistance in indica rice (cv. CO39). Molecular Breeding 14, 61–71.
Naseem, A., Oehmke, J.F. and Schimmelpfennig, D.E. (2005) Does plant variety intellectual property
protection improve farm productivity? Evidence from cotton varieties. AgBioForum 8, 100–107.
Navarro, R.L., Warrier, G.S. and Maslog, C.C. (2006) Genes Are Gems: Reporting Agri-Biotechnology.
A Sourcebook for Journalists. International Crops Research Institute for the Semi-Arid Tropics,
Andhra Pradesh, India, 136 pp.
Naylor, R.L., Falcon, W.P., Goodman, R.M., Jahn, M.M., Sengooba, T., Tefera, H. and Nelson, R.J. (2004)
Biotechnology in the developing world: a case for increased investments in orphan crops. Food Policy
29, 15–44.
Negrotto, D., Jolley, M., Beer, S., Wenck, A.R. and Hansen, G. (2000) The use of phosphomannose-isomerase
as a selectable marker to recover transgenic maize plants (Zea mays L.) via Agrobacterium
transformation. Plant Cell Reports 19, 798–803.
Nei, M. (1972) Genetic distance between populations. The American Naturalist 106, 283–292.
Nei, M. (1973) Analysis of gene diversity in subdivided populations. Proceedings of the National Academy
of Sciences of the United States of America 70, 3321–3323.
Nei, M., Tajima, F. and Tateno, Y. (1983) Accuracy of estimated phylogenetic trees from molecular data. II.
Gene frequency data. Journal of Molecular Evolution 19, 153–170.
Nelson, O.E. (2001) Maize: the long trail to QPM. In: Reeve, E.C.R. and Black, I. (eds) Encyclopedia of
Genetics. Fitzroy Dearborn, London, pp. 657–660.
Neuffer, M.G., Coe, E.H. and Wessler, S. (1997) Mutants of Maize. Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, New York.
Ngetich, K.A. (2005) Indigenous Knowledge, Alternative Medicine and Intellectual Property Rights
Concerns in Kenya. 11th General Assembly, 6–10 December 2005, Maputo, Mozambique. Egerton
University, Njoro, Kenya.
Nguyen, B.D., Brar, D.S., Bui, B.C., Nguyen, T.V., Pham, L.N. and Nguyen, H.T. (2003) Identification and
mapping of the QTL for aluminum tolerance introgressed from the new source Oryza rufipogon Griff.
into indica rice (Oryza sativa L.). Theoretical and Applied Genetics 106, 583–593.
Nguyen, H.T., Chandra Babu, R. and Blum, A. (1997) Breeding for drought tolerance in rice: physiology and
molecular genetics considerations. Crop Science 37, 1426–1434.
Nguyen, T.T.T., Klueva, N., Chamareck, V., Aarti, A., Magpantay, G., Millena, A.C.M., Pathan, M.S. and
Nguyen, H.T. (2004) Saturation mapping of QTL regions and identification of putative candidate genes
for drought tolerance in rice. Molecular Genetics and Genomics 272, 35–46.
Ni, J.J., Wu, P., Senadhira, D. and Huang, N. (1998) Mapping QTLs for phosphorus deficiency tolerance in
rice (Oryza sativa L.). Theoretical and Applied Genetics 97, 1361–1369.
Ni, Z.F., Sun, Q.X., Liu, Z.Y. and Huang, T.C. (1997) Studies on heterotic grouping in wheat: II. Genetic
diversity among common wheat, Tibet semi-wild wheat and spelt wheat. Journal of Agricultural
Biotechnology (China) 5, 103–111.
Nicholas, F.W. (2006) Discovery, validation and delivery of DNA markers. Australian Journal of Experimental
Agriculture 46, 155–158.
Nicholson, L., Gonzalez-Melendi, P., van Dolleweerd, C., Tuck, H., Perrin, Y., Ma, J.K.-C., Fischer, R.,
Christou, P. and Stoger, E. (2005) A recombinant multimeric immunoglobulin expressed in rice
shows assembly dependent subcellular localization in endosperm cells. Plant Biotechnology Journal 3,
115–127.
Nickson, T.E. (2008) Planning environmental risk assessment for genetically modified crops: problem formulation
for stress-tolerant crops. Plant Physiology 147, 494–502.
Nicolas, P. and Chiapello, H. (2007) Gene prediction. In: Morot-Gaudry, J.F., Lea, P. and Briat, J.F. (eds)
Functional Plant Genomics. Science Publishers, Enfield, New Hampshire, pp. 71–85.
Niebur, W.S., Rafalski, J.A., Smith, O.S. and Cooper, M. (2004) Applications of genomics technologies to
enhance rate of genetic progress for yield of maize within a commercial breeding program. In: Fischer,
T. (ed.) New Directions for a Diverse Planet. Proceedings of the 4th International Crop Science
Congress, Brisbane, Australia. Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17
November 2009).
Nilsson, M., Malmgren, H., Samiotaki, M., Kwiatkowski, M., Chowdhary, B.P. and Landegren, U.
(1994) Padlock probes: circularization oligonucleotides for localized DNA detection. Science 265,
2085–2088.
Nobécourt, P. (1939) Sur la pérennité et l'augmentation de volume des cultures de tissus végétaux.
Comptes Rendus des Séances-Societe Biologie 130, 1270–1271.
Noirot, M., Anthony, F., Dussert, S. and Hamon, S. (2003) A method for building core collections. In: Hamon,
P., Seguin, M., Perrier, X. and Glaszmann, J.C. (eds) Genetic Diversity of Cultivated Tropical Plants.
Science Publishers, Enfield, New Hampshire and CIRAD, Paris, pp. 65–75.
Nordborg, M. (2000) Linkage disequilibrium, gene trees and selfing: an ancestral recombination graph with
partial self-fertilization. Genetics 154, 923–929.
Nordborg, M., Borevitz, J.O., Bergelson, J., Berry, C.C., Chory, J., Hagenblad, J., Kreitman, M., Maloof,
J.N., Noyes, T., Oefner, P.J., Stahl, E.A. and Weigel, D. (2002) The extent of linkage disequilibrium in
Arabidopsis thaliana. Nature Genetics 30, 190–193.
NRC (National Research Council) (2001) Genetically Modified Pest-Protected Plants: Science and
Regulation. National Academy Press, Washington, DC.
NRC (National Research Council) (2002) Environmental Effects of Transgenic Plants: the Scope and
Adequacy of Regulation. National Academy Press, Washington, DC.
Nunberg, A.N., Li, Z. and Thomas, T.L. (1996) Analysis of gene expression and gene isolation by high-
throughput sequencing of plant cDNAs. In: Paterson, A.H. (ed.) Genome Mapping in Plants. R.G.
Landes Company, Austin, Texas, pp. 169–177.
Nyquist, W.E. (1991) Estimation of heritability and prediction of selection response in plant populations.
Critical Review of Plant Science 10, 235–322.
O'Brien, S.J. and Mayr, E. (1991) Bureaucratic mischief: recognizing endangered species and subspecies.
Science 251, 1187–1188.
O'Flanagan, R.A., Paillard, G., Lavery, R. and Sengupta, A.M. (2005) Non-additivity in protein–DNA binding.
Bioinformatics 21, 2254–2263.
Odell, J.T., Nagy, F. and Chua, N.H. (1985) Identification of DNA sequences required for activity of the
cauliflower mosaic virus 35S promoter. Nature 313, 810–812.
Ogawa, Y., Dansako, T., Yano, K., Sakurai, N., Suzuki, H., Aoki, K., Noji, M., Saito, K. and Shibata, D.
(2008) Efficient and high-throughput vector construction and Agrobacterium-mediated transformation
of Arabidopsis thaliana suspension-cultured cells for functional genomics. Plant and Cell Physiology
49, 242–250.
Oka, H.I. (1988) Origin of Cultivated Rice. Japan Scientific Societies Press, Tokyo.
Okkels, T.F. and Whenham, R.J. (1994) Method for the selection of genetically transformed cells and compound
for use in the method. Patent EP 0601092B1.
Olek, A. (1996) Amplification of simple sequence repeats. Patent EP 0870062.
Oleykowski, C.A., Bronson Mullins, C.R., Godwin, A.K. and Yeung, A.T. (1998) Mutation detection using a
novel plant endonuclease. Nucleic Acids Research 26, 4597–4602.
Oliver, S.G., Winson, M.K., Kell, D.B. and Baganz, F. (1998) Systematic functional analysis of the yeast
genome. Trends in Biotechnology 16, 373–378.
Olufowote, J.O., Xu, Y., Chen, X., Park, W.D., Beachell, H.M., Dilday, R.H., Goto, M. and McCouch, S.R.
(1997) Comparative evaluation of within-cultivar variation of rice (Oryza sativa L.) using microsatellite
and RFLP markers. Genome 40, 370–378.
Openshaw, S. and Bruce, W.B. (2001) Marker-assisted identification of a gene associated with a pheno-
typic trait. Patent EP 1230385.
Openshaw, S.J. and Frascaroli, E. (1997) QTL detection and marker-assisted selection for complex traits in
maize. Proceedings of Corn and Sorghum Industrial Research Conference 52, 44–53.
Oraguzie, N.C., Wilcox, P.L., Rikkerink, E.H.A. and De Silva, H.N. (2007) Linkage disequilibrium. In:
Oraguzie, N.C., Rikkerink, E.H.A., Gardiner, S.E. and De Silva, H.N. (eds) Association Mapping in
Plants. Springer, Berlin, pp. 11–39.
Orf, J.H., Chase, K., Jarvik, T., Mansur, L.M., Cregan, P.B., Adler, F.R. and Lark, K.G. (1999) Genetics of
agronomic traits: I. Comparison of three related recombinant inbred populations. Crop Science 39,
1642–1651.
Ortiz, R. and Smale, M. (2007) Transgenic technology: pro-poor or pro-rich? Chronica Horticulturae 47,
9–12.
Ossowski, S., Schwab, R. and Weigel, D. (2008) Gene silencing in plants using artificial microRNAs and
other small RNAs. The Plant Journal 53, 674–690.
Ouyang, Z., Mowers, R.P., Jensen, A., Wang, S. and Zeng, S. (1995) Cluster analysis for genotype ×
environment interaction with unbalanced data. Crop Science 33, 1300–1305.
Ow, D.W. (2001) The right chemistry for marker gene removal? Nature Biotechnology 19, 115–116.
Ow, D.W. (2002) Recombinase-directed plant transformation for the post-genomic era. Plant Molecular
Biology 48, 183–200.
Ow, D.W., Wood, K.V., DeLuca, M., de Wet, J.R., Helinski, D.R. and Howell, S.H. (1986) Transient and stable
expression of the firefly luciferase gene in plant cells and transgenic plants. Science 234, 856–859.
Owen, H.R. (1996) Plant germplasm. In: Hunter-Cevera, J.C. and Belt, A. (eds) Maintaining Cultures for
Biotechnology and Industry. Academic Press, Inc., London, pp. 197–228.
Paine, J.A., Shipton, C.A., Chaggar, S., Howells, R.M., Kennedy, M.J., Vernon, G., Wright, S.Y., Hinchliffe,
E., Adams, J.L., Silverstone, A.L. and Drake, R. (2005) Improving the nutritional value of Golden Rice
through increased pro-vitamin A content. Nature Biotechnology 23, 482–487.
Palmer, C.E. and Keller, W.A. (2005) Overview of haploidy. In: Palmer, C.E., Keller, W.A. and Kasha, K.J.
(eds) Biotechnology in Agriculture and Forestry, Vol. 56. Haploids in Crop Improvement II. Springer-Verlag,
Berlin, pp. 3–9.
Palmer, C.E., Keller, W.A. and Kasha, K.J. (eds) (2005) Biotechnology in Agriculture and Forestry, Vol. 56.
Haploids in Crop Improvement II. Springer-Verlag, Berlin.
Palmer, L.E., Rabinowicz, P.D., O'Shaughnessy, A.L., Balija, V.S., Nascimento, L.U., Dike, S., de la Bastide,
M., Martienssen, R.A. and McCombie, W.R. (2003) Maize genome sequencing by methylation filtration.
Science 302, 2115–2117.
Palmer, R.G. and Shoemaker, R.C. (1998) Soybean genetics. In: Hrustic, M., Vidic, M. and Jackovic, D.
(eds) Soybean Institute of Field and Vegetative Crops. Novi Sad, Yugoslavia, pp. 45–82.
Palmiter, R.D., Norstedt, G., Gelinas, R.E., Hammer, R.E. and Brinster, R.L. (1983) Metallothionein–human
GH fusion genes stimulate growth of mice. Science 222, 809–814.
Pan, Q.L., Liu, Y.S., Budai-Hadrian, O., Sela, M., Carmel-Goren, L., Zamir, D. and Fluhr, R. (2000)
Comparative genetics of nucleotide binding site–leucine-rich repeat resistance gene homologues in
the genomes of two dicotyledons: tomato and Arabidopsis. Genetics 155, 309–322.
Panaud, O., Chen, X. and McCouch, S.R. (1996) Development of microsatellite markers and characteriza-
tion of simple sequence length polymorphism (SSLP) in rice (Oryza sativa L.). Molecular and General
Genetics 252, 597–607.
Pang, S.-Z., DeBoer, D.L., Wan, Y., Ye, G., Layton, J.G., Neher, M.K., Armstrong, C.L., Fry, J.E., Hinchee,
M.A.W. and Fromm, M.E. (1996) An improved green fluorescent protein gene as a vital marker in
plants. Plant Physiology 112, 893–900.
Papa, R., Acosta, J., Delgado-Salinas, A. and Gepts, P. (2005) A genome-wide analysis of differentiation
between wild and domesticated Phaseolus vulgaris from Mesoamerica. Theoretical and Applied
Genetics 111, 1147–1158.
Paran, I. and Michelmore, R.W. (1993) Development of reliable PCR-based markers linked to downy mildew
resistance genes in lettuce. Theoretical and Applied Genetics 85, 985–993.
Paran, I., Kesseli, R.V. and Michelmore, R.W. (1991) Identification of RFLP and RAPD markers linked to
downy mildew resistance genes in lettuce using near-isogenic lines. Genome 34, 1021–1027.
Pardey, P.G., Wright, B.D., Nottenburg, C., Binenbaum, E. and Zambrano, P. (2003) Intellectual prop-
erty and developing countries: freedom to operate in agricultural biotechnology. Biotechnology
and Genetic Resource Policies Brief 3. International Food Policy Research Institute (IFPRI),
Washington, DC.
Parekh, S.R. (ed.) (2004) The GMO Handbook: Genetically Modified Animals, Microbes and Plants in
Biotechnology. Humana Press, Totowa, New Jersey.
Parinov, S. and Sundaresan, V. (2000) Functional genomics in Arabidopsis: large scale insertional mutagenesis
complements the genome sequencing project. Current Opinion in Biotechnology 11, 157–161.
Parisseaux, B. and Bernardo, R. (2004) In silico mapping of quantitative trait loci in maize. Theoretical and
Applied Genetics 109, 508–514.
Park, S.J., Walsh, E.J., Reinbergs, E., Song, L.S.P. and Kasha, K. (1976) Field performance of doubled
haploid barley lines in comparison with lines developed by the pedigree and single seed descent
methods. Canadian Journal of Plant Science 56, 467–474.
Parkin, I.A.P., Gulden, S.M., Sharp, A.G., Lukens, L., Trick, M., Osborn, T.C. and Lydiate, D.J. (2005)
Segmental structure of the Brassica napus genome based on comparative analysis with Arabidopsis
thaliana. Genetics 171, 765–781.
Paterson, A.H. (1996a) Mapping genes responsible for differences in phenotype. In: Paterson, A.H. (ed.)
Genome Mapping in Plants. R.G. Landes Company, Austin, Texas, pp. 41–54.
Paterson, A.H. (1996b) Physical mapping and map-based cloning: bridging the gap between DNA markers
and genes. In: Paterson, A.H. (ed.) Genome Mapping in Plants. R.G. Landes Company, Austin, Texas,
pp. 55–62.
Paterson, A.H. (ed.) (1998) Molecular Dissection of Complex Traits. CRC Press, Boca Raton, Florida,
305 pp.
Paterson, A.H., Lander, E.S., Hewitt, J.D., Peterson, S., Lincoln, S.E. and Tanksley, S.D. (1988) Resolution
of quantitative traits into Mendelian factors, using a complete linkage map of restriction fragment
length polymorphisms. Nature 335, 721–726.
Paterson, A.H., Deverna, J.W., Lanini, B. and Tanksley, S.D. (1990) Fine mapping of quantitative trait loci
using selected overlapping recombinant chromosomes, in an interspecific cross of tomato. Genetics
124, 735–742.
Paterson, A.H., Damon, S., Hewitt, J.D., Zamir, D., Rabinowitch, H.D., Lincoln, S.E., Lander, E.S. and
Tanksley, S.D. (1991) Mendelian factors underlying quantitative traits in tomato: comparison across
species, generation and environments. Genetics 127, 181–197.
Paterson, A.H., Lin, Y.R., Li, Z., Schertz, K.F., Doebley, J.F., Pinson, S.R.M., Liu, S.-C., Stansel, J.W. and
Irvine, J.E. (1995) Convergent domestications of cereal crops by independent mutations at corresponding
genetic loci. Science 269, 1714–1718.
Paterson, A.H., Saranga, Y., Menz, M., Jiang, C.X. and Wright, R.J. (2003) QTL analysis of genotype × environment
interactions affecting cotton fiber quality. Theoretical and Applied Genetics 106, 384–396.
Patwardhan, B. (2005) Ethnopharmacology and drug discovery. Journal of Ethnopharmacology 100,
50–52.
Peacock, J. and Chaudhury, A. (2002) The impact of gene technologies on the use of genetic resources. In:
Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing Plant Genetic
Diversity. International Plant Genetic Resources Institute, Rome, pp. 33–42.
Peakall, R., Gilmore, S., Keys, W., Morgante, M. and Rafalski, A. (1998) Cross-species amplification of
soybean (Glycine max) simple sequence repeats (SSRs) within the genus and other legume genera:
implications for the transferability of SSRs in plants. Molecular Biology and Evolution 15, 1275–1287.
Pearson, J.V., Huentelman, M.J., Halperin, R.F., Tembe, W.D., Melquist, S., Homer, N., Brun, M., Szelinger,
S., Coon, K.D., Zismann, V.L., Webster, J.A., Beach, T., Sando, S.B., Aasly, J.O., Heun, R., Jessen, F.,
Kölsch, H., Tsolaki, M., Daniilidou, M., Reiman, E.M., Papassotiropoulos, A.P., Hutton, M.L., Stephan,
D.A. and Craig, D.W. (2007) Identification of the genetic basis for complex disorders by use of pooling-
based genomewide single-nucleotide-polymorphism association studies. American Journal of Human
Genetics 80, 126–139.
Pearson, W.R., Wood, T., Zhang, Z. and Miller, W. (1997) Comparison of DNA sequences with protein
sequences. Genomics 15, 24–36.
Peleg, Z., Saranga, Y., Suprunova, T., Ronin, Y., Röder, M.S., Kilian, A., Korol, A.B. and Fahima, T. (2008)
High-density genetic map of durum wheat × wild emmer wheat based on SSR and DArT markers.
Theoretical and Applied Genetics 117, 103–115.
Peleman, J.D. and van der Voort, J.R. (2003) Breeding by design. Trends in Plant Science 8, 330–334.
Peleman, J.D., Wye, C., Zethof, J., Sorensen, A.P., Verbakel, H., van Oeveren, J., Gerats, T. and van der
Voort, J.R. (2005) Quantitative trait locus (QTL) isogenic recombinant analysis: a method for high-resolution
mapping of QTL within a single population. Genetics 171, 1341–1352.
Peña, L. (ed.) (2004) Methods in Molecular Biology, Vol. 286: Transgenic Plants: Methods and Protocols.
Humana Press Inc., Totowa, New Jersey.
Peng, J., Richards, D.E., Hartley, N.M., Murphy, G.P., Devos, K.M., Flintham, J.E., Beales, J., Fish, L.J.,
Worland, A.J., Pelica, F., Sudhakar, D., Christou, P., Snape, J.W., Gale, M.D. and Harberd, N.P. (1999)
Green revolution genes encode mutant gibberellin response modulators. Nature 400, 256–261.
Peng, Z.B., Liu, X.Z., Fu, J.H., Li, L.C. and Huang, C.L. (1998) Preliminary studies on the superior inbred
groups and construction of heterosis mode. Acta Agronomica Sinica 24, 711–717.
Pereira, M.G., Lee, M.M. and Rayapati, P.J. (1994) Comparative RFLP and QTL mapping in sorghum and
maize. In: Second Internal Conference on the Plant Genome. Scherago Int., New York, Poster 169.
Pérez, T., Albornoz, J. and Dominguez, A. (1998) An evaluation of RAPD fragment reproducibility and
nature. Molecular Ecology 7, 1347–1358.
Pérez-Enciso, M. (2004) In silico study of transcriptome genetic variation in outbred populations. Genetics
166, 547–554.
Perkins, J.M. and Jinks, J.L. (1973) The assessment and specificity of environmental and genotype–environmental
components of variability. Heredity 30, 111–126.
Perlin, M. (1995) Method and system for genotyping. Patent EP 0714537.
Perumal, R., Krishnaramanujam, R., Menz, M.A., Katil, S., Dahlberg, J., Magill, C.W. and Rooney, W.L.
(2007) Genetic diversity among sorghum races and working groups based on AFLPs and SSRs. Crop
Science 47, 1375–1383.
Pesek, J. and Baker, R.J. (1969) Desired improvement in relation to selection indices. Canadian Journal of
Plant Science 49, 803–804.
Peters, J.L., Cnudde, F. and Gerats, T. (2003) Forward genetics and map-based cloning approaches. Trends
in Plant Science 8, 484–491.
Peterson, D.G., Schulze, S.R., Sciara, E.B., Lee, S.A., Bowers, J.E., Nagel, A., Jiang, N., Tibbitts, D.C.,
Wessler, S.R. and Paterson, A.H. (2002) Integration of Cot analysis, DNA cloning and high throughput
sequencing facilitate genome characterization and gene discovery. Genome Research 12, 795–807.
Peterson, P.A. (1992) Quantitative inheritance in the era of molecular biology. Maydica 37, 718.
Pettersson, F., Morris, A.P., Barnes, M.R. and Cardon, L.R. (2008) Goldsurfer2 (Gs2): a comprehensive tool
for the analysis and visualization of genome wide association studies. BMC Bioinformatics 9, 138.
Phillips, R.L. (2006) Genetic tools from nature and the nature of genetic tools. Crop Science 46,
2245–2252.
Phillips, R.L. (2008) Can genome sequencing of model plants be helpful for crop improvement? Proceedings
of 5th International Crop Science Congress, 13–18 April 2008, Jeju, Korea. International Crop Science
Society, Madison, Wisconsin.
Phillips, R.L., Chen, J., Okediji, R. and Burk, D. (2004) Intellectual property rights and the public good. The
Scientist 18, 8.
Phizicky, E., Bastiaens, P.I.H., Zhu, H., Snyder, M. and Fields, S. (2003) Protein analysis on a proteomic
scale. Nature 422, 208–215.
Pickering, R.A. and Devaux, P. (1992) Haploid production: approaches and use in plant breeding. In: Shewry,
P.R. (ed.) Barley: Genetics, Molecular Biology and Biotechnology. CAB International, Wallingford, UK,
pp. 511–539.
Picoult-Newberg, L., Ideker, T.E., Pohl, M.G., Taylor, S.L., Donaldson, M.A., Nickerson, D.A. and
Boyce-Jacino, M. (1999) Mining SNPs from EST databases. Genome Research 9, 167–174.
Piepho, H.P. (2000) A mixed model approach to mapping quantitative trait loci in barley on the basis of
multiple environment data. Genetics 156, 2043–2050.
Pillen, K., Pineda, O., Lewis, C.B. and Tanksley, S.D. (1996) Status of genome mapping tools in the taxon
Solanaceae. In: Paterson, A.H. (ed.) Genome Mapping in Plants. R.G. Landes Company, Austin, TX,
pp. 281–308.
Pineda, O., Bonierbale, M.W., Plaisted, R.L., Brodie, B.B. and Tanksley, S.D. (1993) Identification of RFLP
markers linked to the H1 gene conferring resistance to the potato cyst nematode Globodera
rostochiensis. Genome 36, 152–156.
Plant Ontology Consortium (2002) The Plant Ontology Consortium and plant ontologies. Comparative
and Functional Genomics 3, 137–142.
Plomion, C., Durel, C.-E. and O'Malley, D.M. (1996) Genetic dissection of height in maritime pine seedlings
raised under accelerated growth conditions. Theoretical and Applied Genetics 93, 849–858.
Plotsky, Y., Cahaner, A., Haberfeld, A., Lavi, U., Lamont, S.J. and Hillel, J. (1993) DNA fingerprint bands
applied to linkage analysis with quantitative trait loci in chickens. Animal Genetics 24, 105–110.
Podlich, D.W. and Cooper, M. (1998) QU-GENE: a platform for quantitative analysis of genetic models.
Bioinformatics 14, 632–653.
Podlich, D.W., Cooper, M.E. and Basford, K.E. (1999) Computer simulation of a selection strategy to
accommodate genotype-environment interactions in a wheat recurrent selection programme. Plant
Breeding 118, 17–28.
Podlich, D.W., Winkler, C.R. and Cooper, M. (2004) Mapping as you go: an effective approach for
marker-assisted selection of complex traits. Crop Science 44, 1560–1571.
Poehlman, J.M. and Quick, J.S. (1983) Crop breeding in a hungry world. In: Wood, D.R., Rawal, K.M.
and Wood, M.N. (eds) Crop Breeding. American Society of Agronomy and Crop Science Society of
America, Madison, Wisconsin, pp. 1–19.
Pollak, L.M., Gardner, C.O., Kahler, A.L. and Thomas-Compton, M. (1984) Further analysis of the mating
system in two mass selected populations of maize. Crop Science 24, 793–796.
Pooni, H.S., Kumar, I. and Khush, G.S. (1992) A comprehensive model for disomically inherited metrical
traits expressed in triploid tissues. Heredity 69, 166–174.
Pooni, H.S., Kumar, I. and Khush, G.S. (1993) Genetical control of amylose content in selected crosses of
indica rice. Heredity 70, 269–280.
Popelka, J.C. and Altpeter, F. (2003) Agrobacterium tumefaciens-mediated genetic transformation of rye
(Secale cereale L.). Molecular Breeding 11, 203–211.
Popelka, J.C., Xu, J. and Altpeter, F. (2003) Generation of rye plants with low copy number after
biolistic gene transfer and production of instantly marker-free transgenic rye. Transgenic Research 12,
587–596.
Porceddu, A., Albertini, E., Barcaccia, G., Marconi, G., Bertoli, F. and Veronesi, F. (2002) Development of
S-SAP markers based on an LTR-like sequence from Medicago sativa L. Molecular Genetics and
Genomics 267, 107–114.
Porta, C. and Lomonossoff, G.P. (2002) Viruses as vectors for the expression of foreign sequences in
plants. Biotechnology and Genetic Engineering Reviews 19, 245–291.
Portyanko, V.A., Hoffman, D.L., Lee, M. and Holland, J.B. (2001) A linkage map of hexaploid oat based on
grass anchor DNA clones and its relationship to other oat maps. Genome 44, 249–265.
Potrykus, I. (2005) Golden Rice, vitamin A and blindness – public responsibility and failure. Available at:
http://www.goldenrice.org/PDFs/Potrykus_Zurich_2005.pdf (accessed 17 November 2009).
Prasanna, B.M., Vasal, S.K., Kassahun, B. and Singh, N.N. (2001) Quality protein maize. Current Science
81, 1308–1319.
Preston, L.R., Harker, N., Holton, T. and Morell, M.K. (1999) Plant cultivar identification using DNA analysis.
Plant Varieties and Seeds 12, 191–205.
Price, A.H. and Tomos, A.D. (1997) Genetic dissection of root growth in rice (Oryza sativa L.): II. Mapping
quantitative trait loci using molecular markers. Theoretical and Applied Genetics 95, 143–152.
Primmer, C.R., Ellegren, H., Saino, N. and Møller, A.P. (1996) Directional evolution in germline
microsatellite mutations. Nature Genetics 13, 391–393.
Primrose, S.B. (1995) Principles of Genome Analysis. Blackwell Science, Oxford, UK, pp. 14–37.
Pritchard, J.K. and Rosenberg, N.A. (1999) Use of unlinked genetic markers to detect population
stratification in association studies. American Journal of Human Genetics 65, 220–228.
Pritchard, J.K., Stephens, M. and Donnelly, P. (2000a) Inference of population structure using multilocus
genotype data. Genetics 155, 945–959.
Pritchard, J.K., Stephens, M., Rosenberg, N.A. and Donnelly, P. (2000b) Association mapping in structured
populations. American Journal of Human Genetics 67, 170–181.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A.R., Bender, D., Maller, J., de Bakker,
P.I.W., Daly, M.J. and Sham, P.C. (2007) PLINK: a toolset for whole-genome association and
population-based linkage analysis. American Journal of Human Genetics 81, 559–575.
Qi, X., Stam, P. and Lindhout, P. (1998) Use of locus-specific AFLP markers to construct a high-density
molecular map in barley. Theoretical and Applied Genetics 96, 376–384.
Qi, X., Pittaway, T.S., Lindup, S., Liu, H., Waterman, E., Padi, F.K., Hash, C.T., Zhu, J., Gale, M.D. and
Devos, K.M. (2004) An integrated genetic map and a new set of simple sequence repeat markers for
pearl millet, Pennisetum glaucum. Theoretical and Applied Genetics 109, 1485–1493.
Qian, W., Sass, O., Meng, J., Li, M., Frauen, M. and Jung, C. (2007) Heterotic patterns in rapeseed (Brassica
napus L.): I. Crosses between spring and Chinese semi-winter lines. Theoretical and Applied Genetics
115, 27–34.
Quarrie, S.A., Lazic-Jancic, V., Kovacevic, D., Steed, A. and Pekic, S. (1999) Bulk segregant analysis with
molecular markers and its use for improving drought resistance in maize. Journal of Experimental
Botany 50, 1299–1306.
Rabinowicz, P.D., Schulz, K., Dedhia, N., Yordan, C., Parnell, L.D., Stein, L., McCombie, R.
and Martienssen, R.A. (1999) Differential methylation of genes and retrotransposons facilitates
shotgun sequencing of maize genome. Nature Genetics 23, 305–308.
Raboin, L.-M., Pauquet, J., Butterfield, M., D'Hont, A. and Glaszmann, J.-C. (2008) Analysis of genome-wide
linkage disequilibrium in the highly polyploid sugarcane. Theoretical and Applied Genetics 116, 701–714.
Rae, S.J., Macaulay, M., Ramsay, L., Leigh, F., Mathews, D., O'Sullivan, D.M., Donini, P., Morris, P.C.,
Powell, W., Marshall, D.F., Waugh, R. and Thomas, W.T.B. (2007) Molecular barley breeding. Euphytica
158, 295–303.
Rafalski, A. (2002) Applications of single nucleotide polymorphisms in crop genetics. Current Opinion in
Plant Biology 5, 94–100.
Ragavan, S. (2006) Of plant variety protection, agricultural subsidies and the WTO. Available at:
http://www.law.ou.edu/faculty/facfiles/OfPlantVarietyProtection.pdf (accessed 17 November 2009).
Ragot, M. and Lee, M. (2007) Marker-assisted selection in maize: current status, potential, limitations
and perspectives from the private and public sectors. In: Guimarães, E.P., Ruane, J., Scherf, B.D.,
Sonnino, A. and Dargie, J.D. (eds) Marker-Assisted Selection, Current Status and Future Perspectives
in Crops, Livestock, Forestry and Fish. Food and Agriculture Organization of the United Nations,
Rome, pp. 117–150.
Ragot, M., Biasiolli, M., Delbut, M.F., Dell'Orco, A., Malgarini, L., Thevenin, P., Vernoy, J., Vivant, J.,
Zimmermann, R. and Gay, G. (1995) Marker-assisted backcrossing: a practical example. In: Bervillé,
A. and Tersac, M. (eds) Les Colloques, No. 72. Techniques et Utilisations des Marqueurs Moléculaires.
Institut National de la Recherche Agronomique (INRA), Paris, pp. 45–56.
Ragot, M., Gay, G., Muller, J.P. and Durovray, J. (2000) Efficient selection for the adaptation to the
environment through QTL mapping and manipulation in maize. In: Ribaut, J.-M. and Poland, D. (eds)
Molecular Approaches for the Genetic Improvement of Cereals for Stable Production in Water-limited
Environments. Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), México, DF,
pp. 128–130.
Rajaram, S., van Ginkel, M. and Fischer, R.A. (1994) CIMMYT's wheat breeding mega-environments (ME).
In: Proceedings of the 8th International Wheat Genetics Symposium, 20–25 July 1993, Beijing, China.
Agricultural Scientech Press, Beijing, pp. 1101–1106.
Ramachandran, S. and Sundaresan, V. (2001) Transposons as tools for functional genomics. Plant
Physiology and Biochemistry 39, 243–252.
Ramage, R.T. (1983) Heterosis and hybrid seed production in barley. In: Frankel, R. (ed.) Monographs on
Theoretical and Applied Genetics, Vol. 6. Heterosis. Springer-Verlag, Berlin, pp. 71–93.
Ramakrishna, W. and Bennetzen, J.L. (2003) Genomic colinearity as a tool for plant gene isolation. In:
Grotewold, E. (ed.) Methods in Molecular Biology, Vol. 236. Plant Functional Genomics: Methods and
Protocols. Humana Press, Inc., Totowa, New Jersey, pp. 109–121.
Ramakrishna, W., Dubcovsky, J., Park, Y.-J., Busso, C., Emberton, J., SanMiguel, P. and Bennetzen, J.L.
(2002) Different types and rates of genome evolution detected by comparative sequence analysis of
orthologous segments from four cereal genomes. Genetics 162, 1389–1400.
Ramessar, K., Peremarti, A., Gómez-Galera, S., Naqvi, S., Moralejo, M., Muñoz, P., Capell, T. and Christou,
P. (2007) Biosafety and risk assessment framework for selectable marker genes in transgenic crop
plants: a case of the science not supporting the politics. Transgenic Research 16, 261–280.
Ramalingam, J., Basharat, H.S. and Zhang, G. (2002) STS and microsatellite marker-assisted selection for
bacterial blight resistance and waxy gene in rice, Oryza sativa L. Euphytica 127, 255–260.
Rao, K.E.P. and Rao, V.R. (1995) The use of characterization data in developing a core collection of
sorghum. In: Hodgkin, T., Brown, A.H.D., van Hintum, Th.J.L. and Morales, E.A.V. (eds) Core Collections
of Plant Genetic Resources. Wiley–Sayce, Chichester, UK, pp. 109–115.
Rappsilber, J., Siniossoglou, S., Hurt, E.C. and Mann, M. (2000) A generic strategy to analyze the spatial
organization of multi-protein complexes by cross-linking and mass spectrometry. Analytical Chemistry
72, 267–275.
Rebai, A. and Goffinet, B. (1993) Power of tests for QTL detection using replicated progenies derived from
a diallel cross. Theoretical and Applied Genetics 86, 1014–1022.
Rebai, A., Goffinet, B., Mangin, B. and Perret, D. (1994) QTL detection with diallel schemes. In: van Ooijen,
J.W. and Jansen, J. (eds) Biometrics in Plant Breeding: Applications of Molecular Markers. Centre for
Plant Breeding and Reproduction Research, Wageningen, Netherlands, pp. 170–177.
Reddy, B.V.S. and Comstock, R.E. (1976) Simulation of the backcross breeding method. I. Effect of
heritability and gene number on fixation of desired alleles. Crop Science 16, 825–830.
Reed, J., Privalle, L., Powell, M.L., Meghji, M., Dawson, J., Dunder, E., Suttie, J., Wenck, A., Launis, K.,
Kramer, C., Chang, Y.-F., Hansen, G. and Wright, M. (2001) Phosphomannose isomerase: an efficient
selectable marker for plant transformation. In Vitro Cellular and Developmental Biology – Plant 37,
127–132.
Reeves, T., Pinstrup-Andersen, P. and Pandya-Lorch, R. (1999) Food security and role of agricultural
research. In: Coors, J.G. and Pandey, S. (eds) The Genetics and Exploitation of Heterosis in Crops.
ASA-CSSA-SSSA, Madison, Wisconsin, pp. 1–5.
Reif, J.C., Melchinger, A.E., Xia, X.C., Warburton, M.L., Hoisington, D.A., Vasal, S.K., Srinivasan, G., Bohn,
M. and Frisch, M. (2003) Genetic distance based on simple sequence repeats and heterosis in
tropical maize populations. Crop Science 43, 1275–1282.
Reif, J.C., Xia, X.C., Melchinger, A.E., Warburton, M.L., Hoisington, D.A., Beck, D., Bohn, M. and Frisch,
M. (2004) Genetic diversity determined within and among CIMMYT maize populations of tropical,
subtropical and temperate germplasm by SSR markers. Crop Science 44, 326–334.
Reif, J.C., Melchinger, A.E. and Frisch, M. (2005) Genetical and mathematical properties of similarity and
dissimilarity coefficients applied in plant breeding and seed bank management. Crop Science 45, 1–7.
Reiter, R. (2001) PCR-based marker systems. In: Phillips, R.L. and Vasil, I.K. (eds) DNA-Based Markers in
Plants. Kluwer Academic Publishers, Dordrecht, Netherlands, pp. 9–29.
Remington, D.L., Thornsberry, J.M., Matsuoka, Y., Wilson, L.M., Whitt, S.R., Doebley, J., Kresovich, S.,
Goodman, M.M. and Buckler IV, E.S. (2001) Structure of linkage disequilibrium and phenotypic
associations in the maize genome. Proceedings of the National Academy of Sciences of the United States
of America 98, 11479–11484.
Repellin, A., Båga, M., Jauhar, P.P. and Chibbar, R.N. (2001) Genetic enrichment of cereal crops via alien
gene transfer: new challenges. Plant Cell, Tissue and Organ Culture 64, 159–183.
Reymond, M., Muller, B., Leonardi, A., Charcosset, A. and Tardieu, F. (2003) Combining quantitative trait
loci analysis and an ecophysiological model to analyze the genetic variability of the responses of
maize leaf growth to temperature and water deficit. Plant Physiology 131, 664–675.
Reyna, N. and Sneller, C.H. (2001) Evaluation of marker-assisted introgression of yield QTL alleles into
adapted soybean. Crop Science 41, 1317–1321.
Reynolds, J., Weir, B.S. and Cockerham, C.C. (1983) Estimation of the coancestry coefficient: basis for a
short-term genetic distance. Genetics 105, 767–769.
Rhee, S.Y. (2005) Bioinformatics: current limitations and insights for the future. Plant Physiology 138,
569–570.
Ribaut, J.-M. and Betrán, J. (1999) Single large-scale marker-assisted selection (SLS-MAS). Molecular
Breeding 5, 531–541.
Ribaut, J.-M. and Ragot, M. (2007) Marker-assisted selection to improve drought adaptations in maize: the
backcross approach, perspectives, limitations and alternatives. Journal of Experimental Botany 58,
351–360.
Ribaut, J.-M., Hoisington, D.A., Deutsch, J.A., Jiang, C. and González-de-León, D. (1996) Identification
of quantitative trait loci under drought conditions in tropical maize. I. Flowering parameters and the
anthesis-silking interval. Theoretical and Applied Genetics 92, 905–914.
Ribaut, J.-M., Hu, X., Hoisington, D. and González-de-León, D. (1997) Use of STSs and SSRs as rapid and
reliable preselection tools in marker-assisted selection backcross scheme. Plant Molecular Biology
Reporter 15, 156–164.
Ribaut, J.-M., Edmeades, G., Perotti, E. and Hoisington, D. (2000) QTL analyses, MAS results and
perspectives for drought-tolerance improvement in tropical maize. In: Ribaut, J.-M. and Poland, D. (eds)
Molecular Approaches for the Genetic Improvement of Cereals for Stable Production in Water-limited
Environments. Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), México, DF,
pp. 131–136.
Ribaut, J.-M., Jiang, C. and Hoisington, D. (2002a) Simulation experiments on efficiencies of gene
introgression by backcrossing. Crop Science 42, 557–565.
Ribaut, J.-M., Bänziger, M., Betrán, J., Jiang, C., Edmeades, G.O., Dreher, K. and Hoisington, D. (2002b)
Use of molecular markers in plant breeding: drought tolerance improvement in tropical maize. In:
Kang, M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB International, Wallingford,
UK, pp. 85–99.
Richardson, K.L., Vales, M.I., Kling, J.G., Mundt, C.C. and Hayes, P.M. (2006) Pyramiding and dissecting
disease resistance QTL to barley stripe rust. Theoretical and Applied Genetics 113, 485–495.
Rick, C.M. (1974) High soluble-solids content in large-fruited tomato lines derived from a wild green-fruited
species. Hilgardia 42, 493–510.
Rick, C.M. (1988) Tomato-like nightshades: affinities, autoecology and breeders' opportunities. Economic
Botany 42, 145–154.
Rickert, A.M., Premstaller, A., Gebhardt, C. and Oefner, P.J. (2002) Genotyping of SNPs in a polyploid
genome by pyrosequencing. Biotechniques 32, 592–603.
Roa-Rodriguez, C. (2003) Promoters Used to Regulate Gene Expression. CAMBIA Intellectual Property,
Canberra.
Roa-Rodriguez, C. and Nottenburg, C. (2003a) Agrobacterium-mediated Transformation of Plants. CAMBIA
Intellectual Property, Canberra.
Roa-Rodriguez, C. and Nottenburg, C. (2003b) Antibiotic Resistance Genes and Their Uses in Genetic
Transformation, Especially in Plants. CAMBIA Intellectual Property, Canberra.
Röber, F.K. (1999) Fortpflanzungsbiologische und genetische Untersuchungen mit RFLP-Markern zur
in-vivo-Haploideninduktion bei Mais. Dissertation, University of Hohenheim. Grauer Verlag, Stuttgart.
Röber, F.K., Gordillo, G.A. and Geiger, H.H. (2005) In vivo haploid induction in maize – performance of new
inducers and significance of doubled haploid lines in hybrid breeding. Maydica 50, 275–284.
Robert, V.J.M., West, M.A.L., Inai, S., Caines, A., Arntzen, L., Smith, J.K. and St-Clair, D.A. (2001)
Marker-assisted introgression of blackmold resistance QTL alleles from wild Lycopersicon cheesmanii to
cultivated tomato (L. esculentum) and evaluation of QTL phenotypic effects. Molecular Breeding 8,
217–233.
Roberts, E.H. (1973) Predicting the viability of seeds. Seed Science and Technology 1, 499–514.
Roberts, J.K. (2002) Proteomics and a future generation of plant molecular biologists. Plant Molecular
Biology 48, 143–154.
Robertson, D.S. (1985) A possible technique for isolating genomic DNA for quantitative traits in plants.
Journal of Theoretical Biology 117, 1–10.
Robertson, D.S. (1989) Understanding the relationship between qualitative and quantitative genetics. In:
Helentjaris, T. and Burr, B. (eds) Development and Application of Molecular Markers to Problems in
Plant Genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, pp. 81–87.
Rockman, M.V. and Kruglyak, L. (2006) Genetics of global gene expression. Nature Reviews Genetics 7,
862–872.
Rockman, M.V. and Kruglyak, L. (2008) Breeding designs for recombinant inbred advanced intercross
lines. Genetics 179, 1069–1078.
Rockman, M.V. and Wray, G.A. (2002) Abundant raw material for cis-regulatory evolution in humans.
Molecular Biology and Evolution 19, 1991–2004.
Röder, M., Plaschke, J. and Ganal, M. (1997) Microsatellite markers for plants of the species Triticum
aestivum and tribe Triticeae and the use of said markers. Patent EP 0835324B1.
Rogers, J.S. (1972) Measures of genetic similarity and genetic distance. In: Studies in Genetics VII, Publ.
7213. University of Texas, Austin, Texas, pp. 145–153.
Romagosa, I. and Fox, P.N. (2003) Genotype × environment interaction and adaptation. In: Hayward, M.D.,
Bosemark, N.O. and Romagosa, I. (eds) Plant Breeding, Principles and Prospects. Chapman & Hall,
London, pp. 373–390.
Romagosa, I., Ullrich, S.E., Han, F. and Hayes, P.M. (1996) Use of the additive main effects and
multiplicative interaction model in QTL mapping for adaptation in barley. Theoretical and Applied
Genetics 93, 30–37.
Romano, A., van der Plas, L.H.W., Witholt, B., Eggink, G. and Mooibroek, H. (2005) Expression of
poly-3-(R)-hydroxyalkanoate (PHA) polymerase and acyl-CoA-transacylase in plastids of transgenic potato
leads to the synthesis of a hydrophobic polymer, presumably medium-chain-length PHAs. Planta 220,
455–464.
Romeis, J., Bartsch, D., Bigler, F., Candolfi, M.P., Gielkens, M.M.C., Hartley, S.E., Hellmich, R.L., Huesing,
J.E., Jepson, P.C., Layton, R., Quemada, H., Raybould, A., Rose, R.I., Schiemann, J., Sears, M.K.,
Shelton, A.M., Sweet, J., Vaituzis, Z. and Wolt, J.D. (2008) Assessment of risk of insect-resistant
transgenic crops to nontarget arthropods. Nature Biotechnology 26, 203–208.
Rommens, C.M., Haring, M.A., Swords, K., Davies, H.V. and Belknap, W.R. (2007) The intragenic approach
as a new extension to traditional plant breeding. Trends in Plant Science 12, 397–403.
Ron Parra, J. and Hallauer, A.R. (1997) Utilization of exotic maize germplasm. Plant Breeding Reviews 14,
165–187.
Rong, J., Feltus, F.A., Waghmare, V.N., Pierce, G.J., Chee, P.W., Draye, X., Saranga, Y., Wright, R.J.,
Wilkins, T.A., May, O.L., Smith, C.W., Gannaway, J.R., Wendel, J.F. and Paterson, A.H. (2007)
Meta-analysis of polyploid cotton QTL shows unequal contributions of subgenomes to a complex network
of genes and gene clusters implicated in lint fiber development. Genetics 176, 2577–2588.
Roos, E.E. (1984) Genetic shifts in mixed bean populations. I. Storage effects. Crop Science 24, 240–244.
Roos, E.E. (1988) Genetic changes in a collection over time. HortScience 23, 86–90.
Rostoks, N., Mudie, S., Cardle, L., Russell, J., Ramsay, L., Booth, A., Svensson, J.T., Wanamaker, S.I.,
Walia, H., Rodriguez, E.M., Hedley, P.E., Liu, H., Morris, J., Close, T.J., Marshall, D.F. and Waugh, R.
(2005) Genome-wide SNP discovery and linkage analysis in barley based on genes responsive to
abiotic stress. Molecular Genetics and Genomics 274, 515–527.
Rudd, S., Schoof, H. and Mayer, K. (2005) PlantMarkers – a database of predicted molecular markers from
plants. Nucleic Acids Research 33, D628–632.
Ruf, S., Karcher, D. and Bock, R. (2007) Determining the transgene containment level provided by
chloroplast transformation. Proceedings of the National Academy of Sciences of the United States of
America 104, 6998–7002.
Sackville Hamilton, N.R. and Chorlton, K.H. (1997) Regeneration of accessions in seed collections: a
decision guide. Handbook for Genebanks No. 5. International Plant Genetic Resources Institute, Rome.
Saghai Maroof, M.A., Yang, G.P., Zhang, Q. and Gravois, K.A. (1997) Correlation between molecular marker
distance and hybrid performance in US southern long grain rice. Crop Science 37, 145–150.
Saha, S., Sparks, A.B., Rago, C., Akmaev, V., Wang, C.J., Vogelstein, B., Kinzler, K.W. and Velculescu, V.E.
(2002) Using the transcriptome to annotate the genome. Nature Biotechnology 20, 508–512.
Saint-Louis, D. and Paquin, B. (2003) Method for genotyping microsatellite DNA markers by mass
spectrometry. Patent WO 03035906.
Sakamoto, T. and Matsuoka, M. (2008) Identifying and exploiting grain yield genes in rice. Current Opinion
in Plant Biology 11, 209–214.
Salathia, N., Lee, H.N., Sangster, T.A., Morneau, K., Landry, C.R., Schellenberg, K., Behere, A.S.,
Gunderson, K.L., Cavalieri, D., Jander, G. and Queitsch, C. (2007) Indel arrays: an affordable
alternative for genotyping. The Plant Journal 51, 727–737.
Salse, J., Piegu, B., Cooke, R. and Delseny, M. (2004) New in silico insight into the synteny between rice
(Oryza sativa L.) and maize (Zea mays L.) highlights reshuffling and identifies new duplications in
the rice genome. The Plant Journal 38, 396–409.
Salvi, S. and Tuberosa, R. (2005) To clone or not to clone plant QTLs: present and future challenges. Trends
in Plant Science 10, 297–304.
Samalova, M., Brzobohaty, B. and Moore, I. (2005) pOp6/LhGR: a stringently regulated and highly
responsive dexamethasone-inducible gene expression system for tobacco. The Plant Journal 41, 919–935.
San Noeum, L.H. (1976) Haploids of Hordeum vulgare L. from in vitro culture of unfertilized ovaries. Annales
de l'Amélioration des Plantes 26, 751–754.
Sánchez-Monge, E. (1993) Introduction. In: Hayward, M.D., Bosemark, N.O. and Romagosa, I. (eds) Plant
Breeding, Principles and Prospects. Chapman & Hall, London, pp. 3–5.
Sanda, S.L. and Amasino, R.M. (1996) Ecotype-specific expression of a flowering mutant phenotype in
Arabidopsis thaliana. Plant Physiology 111, 641–644.
Sano, Y. (1990) The genic nature of gamete eliminator in rice. Genetics 125, 183–191.
Sant, V.J., Patankar, A.G., Sarode, N.D., Mhase, L.B., Sainani, M.N., Deshmukh, R.B., Ranjekar, P.K. and
Gupta, V.S. (1999) Potential of DNA markers in detecting divergence and in analyzing heterosis in
Indian elite chickpea cultivars. Theoretical and Applied Genetics 98, 1217–1225.
Saravanan, R.S., Bashir, S. and Rose, J.K.C. (2004) Plant proteomics. In: Christou, P. and Klee, H. (eds)
Handbook of Plant Biotechnology. John Wiley & Sons Ltd, Chichester, UK, pp. 183–199.
Sari-Gorla, M., Calinski, T., Kaczmarek, Z. and Krajewski, P. (1997) Detection of QTL × environment
interaction in maize by a least squares interval mapping method. Heredity 78, 146–157.
Sarkar, K.R., Pandey, A., Gayen, P., Mandan, J.K., Kumar, R. and Sachan, J.K.S. (1994) Stabilization of
high haploid inducer lines. Maize Genetics Cooperation Newsletter 68, 64–65.
Satagopan, J.M., Yandell, B.S., Newton, M.A. and Osborn, T.G. (1996) A Bayesian approach to detect
quantitative trait loci using Markov chain Monte Carlo. Genetics 144, 805–816.
Sauer, S., Gelfand, D.H., Boussicault, F., Bauer, K., Reichert, F. and Gut, I.G. (2002) Facile method for
automated genotyping of single nucleotide polymorphisms by mass spectrometry. Nucleic Acids Research
30, e22.
Sawkins, M.C., Farmer, A.D., Hoisington, D., Sullivan, J., Tolopko, A., Jiang, Z. and Ribaut, J.M. (2004)
Comparative Map and Trait Viewer (CMTV): an integrated bioinformatic tool to construct
consensus maps and compare QTL and functional genomics data across genomes and experiments. Plant
Molecular Biology 56, 465–480.
Sax, K. (1923) The association of size differences with seed coat pattern and pigmentation in Phaseolus
vulgaris. Genetics 8, 552–560.
Scarascia-Mugnozza, G.T. and Perrino, P. (2002) The history of ex situ conservation and use of plant genetic
resources. In: Engels, J.M.M., Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing
Plant Genetic Diversity. International Plant Genetic Resources Institute (IPGRI), Rome, pp. 1–22.
Schadt, E.E., Monks, S.A., Drake, T.A., Lusis, A.J., Che, N., Colinayo, V., Ruff, T.G., Milligan, S.B., Lamb,
J.R., Cavet, G., Linsley, P.S., Mao, M., Stoughton, R.B. and Friend, S.H. (2003) Genetics of gene
expression surveyed in maize, mouse and man. Nature 422, 297–302.
Schaeffer, M., Byrne, P. and Coe, E.H., Jr (2006) Consensus quantitative trait maps in maize: a database
strategy. Maydica 51, 357–367.
Schauer, N. and Fernie, A.R. (2006) Plant metabolomics: towards biological function and mechanism.
Trends in Plant Science 11, 508–516.
Scheuring, C., Barthelson, R., Galbraith, D., Betran, J., Cothren, J.T., Zeng, Z.-B. and Zhang, H.-B. (2006)
Preliminary analysis of differential gene expression between a maize superior hybrid and its parents
using the 57K maize gene-specific long-oligonucleotide microarray. In: 48th Annual Maize Genetic
Conference, 9–12 March 2006, Pacific Grove, California, 132 pp.
Schmid, K.J., Rosleff Sørensen, T., Stracke, R., Törjék, O., Altmann, T., Mitchell-Olds, T. and Weisshaar, B.
(2003) Large-scale identification and analysis of genome wide single nucleotide polymorphisms for
mapping in Arabidopsis thaliana. Genome Research 13, 1250–1257.
Schmidt, R. (2002) Plant genome evolution: lessons from comparative genomics at the DNA level. Plant
Molecular Biology 48, 21–37.
Schmierer, D.A., Kandemir, N., Kudrna, D.A., Jones, B.L., Ullrich, S.E. and Kleinhofs, A. (2004) Molecular
marker-assisted selection for enhanced yield in malting barley. Molecular Breeding 14, 463–473.
Schön, C.C., Utz, H.F., Groh, S., Truberg, B., Openshaw, S. and Melchinger, A.E. (2004) Quantitative trait
locus mapping based on resampling in a vast maize testcrosses experiment and its relevance to
quantitative genetics for complex traits. Genetics 167, 485–498.
Schranz, M.E., Song, B.-H., Windsor, A.J. and Mitchell-Olds, T. (2007) Comparative genomics in the
Brassicaceae: a family-wide perspective. Current Opinion in Plant Biology 10, 168–175.
Schüller, C., Backes, G., Fischbeck, G. and Jahoor, A. (1992) RFLP markers to identify the alleles on the Mla
locus conferring powdery mildew resistance in barley. Theoretical and Applied Genetics 84, 330–338.
Schuster, S.C. (2008) Next-generation sequencing transforms today's biology. Nature Methods 5, 16–18.
Schwarz, G., Herz, M., Huang, X.Q., Michalek, W., Jahoor, A., Wenzel, G. and Mohler, V. (2000) Application
of fluorescence-based semi-automated AFLP analysis in barley and wheat. Theoretical and Applied
Genetics 100, 545–551.
Scott, K.D. (2001) Microsatellites derived from ESTs and their comparison with those derived by other
methods. In: Henry, R.J. (ed.) Plant Genotyping: the DNA Fingerprinting of Plants. CAB International,
Wallingford, UK, pp. 225–237.
Searle, S.R. (1987) Linear Models for Unbalanced Data. John Wiley & Sons, New York.
Seaton, G., Haley, C.S., Knott, S.A., Kearsey, M. and Visscher, P.M. (2002) QTL Express: mapping
quantitative trait loci in simple and complex pedigrees. Bioinformatics 18, 339–340.
Seitz, C., Vitten, M., Steinbach, P., Hartl, S., Hirsche, J., Rathje, W., Treutter, D. and Forkmann, G. (2007)
Redirection of anthocyanin synthesis in Osteospermum hybrida by a two-enzyme manipulation
strategy. Phytochemistry 68, 824–833.
Seitz, G. (2005) The use of doubled haploids in corn breeding. In: Proceedings of the Forty First Annual
Illinois Corn Breeders School, 7–8 March 2005, Urbana-Champaign, Illinois. University of Illinois at
Urbana-Champaign, pp. 1–8.
Seki, M., Narusaka, M., Satou, M., Fujita, M., Sakurai, T., Oono, Y., Akiyama, T., Yamaguchi-Shinozaki,
K., Iida, K., Carninci, P., Ishisa, J., Kawai, J., Nakajima, M., Hayashizaki, Y., Enju, A. and Shinozaki,
K. (2005) Full-length cDNAs for the discovery and annotation of genes in Arabidopsis thaliana. In:
Leister, D. (ed.) Plant Functional Genomics. Food Products Press, Binghamton, New York, pp. 3–22.
Semagn, K., Bjørnstad, Å., Skinnes, H., Marøy, A.G., Tarkegne, Y. and William, M. (2006) Distribution of
DArT, AFLP and SSR markers in a genetic linkage map of a doubled-haploid hexaploid wheat
population. Genome 49, 545–555.
Sen, S. and Churchill, G.A. (2001) A statistical framework for quantitative trait mapping. Genetics 159,
371–387.
Septiningsih, E.M., Prasetiyono, J., Lubis, E., Tai, T.H., Tjubaryat, T., Moeljopawiro, S. and McCouch, S.R.
(2003) Identification of quantitative trait loci for yield and yield components in an advanced backcross
population derived from the Oryza sativa variety IR64 and the wild relative O. rufipogon. Theoretical
and Applied Genetics 107, 1419–1432.
Service, R.F. (2006) Gene sequencing. The race for the $1000 genome. Science 311, 1544–1546.
Servin, B., Martin, O.C., Mézard, M. and Hospital, F. (2004) Toward a theory of marker-assisted gene
pyramiding. Genetics 168, 513–523.
Sessions, A., Burke, E., Presting, G., Aux, G., McElver, J., Patton, D., Dietrich, B., Ho, P., Bacwaden, J.,
Ko, C., Clarke, J.D., Cotton, D., Bullis, D., Snell, J., Miguel, T., Hutchison, D., Kimmerly, B., Mitzel,
T., Katagiri, F., Glazebrook, J., Law, M. and Goff, S.A. (2002) A high-throughput Arabidopsis reverse
genetics system. The Plant Cell 14, 2985–2994.
Setimela, P., Chitalu, Z., Jonazi, J., Mambo, A., Hodson, D. and Bänziger, M. (2005) Environmental
classification of maize-testing sites in the SADC region and its implication for collaborative maize breeding
strategies in the subcontinent. Euphytica 145, 123–132.
Sham, P., Bader, J.S., Craig, I., O'Donovan, M. and Owen, M. (2002) DNA pooling: a tool for large-scale
association studies. Nature Reviews Genetics 3, 862–871.
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B. and
Ideker, T. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction
networks. Genome Research 13, 2498–2504.
Sharopova, N., McMullen, M.D., Schultz, L., Schroeder, S., Sanchez-Villeda, H., Gardiner, J., Bergstrom,
D., Houchins, K., Melia-Hancock, S., Musket, T., Duru, N., Polacco, M., Edwards, K., Ruff, T., Register,
J.C., Brouwer, C., Thompson, R., Velasco, R., Chin, E., Lee, M., Woodman-Clikeman, W., Long, M.J.,
Liscum, E., Cone, K., Davis, G. and Coe, E.H., Jr (2002) Development and mapping of SSR markers
for maize. Plant Molecular Biology 48, 463–481.
Shatskaya, O.A., Zabirova, E.R., Shcherbak, V.S. and Chumak, M.V. (1994) Mass induction of maternal
haploids in corn. Maize Genetics Cooperation Newsletter 68, 51.
Shen, J.H., Li, M.F., Chen, Y.Q. and Zhang, Z.H. (1982) Breeding by anther culture in rice improvement.
Scientia Agricultura Sinica 2, 15–19.
Shen, L., Courtois, B., McNally, K.L., Robin, S. and Li, Z. (2001) Evaluation of near-isogenic lines of rice
introgressed with QTLs for root depth through marker-aided selection. Theoretical and Applied
Genetics 103, 75–83.
Shen, Y.-J., Jiang, H., Jin, J.-P., Zhang, Z.-B., Xi, B., He, Y.-Y., Wang, G., Wang, C., Qian, L., Li, X., Yu, Q.-B., Liu,
H.-J., Chen, D.-H., Gao, J.-H., Huang, H., Shi, T.-L. and Yang, Z.-N. (2004) Development of genome-wide
DNA polymorphism database for map-based cloning of rice genes. Plant Physiology 135, 1198–1205.
Shendure, J. and Ji, H. (2008) Next-generation DNA sequencing. Nature Biotechnology 26, 1135–1145.
Shi, Y., Wang, T., Li, Y. and Darmency, H. (2008) Impact of transgene inheritance on the mitigation of gene
flow between crops and their wild relatives: the example of foxtail millet. Genetics 180, 969–975.
Shibata, D. and Liu, Y.G. (2000) Agrobacterium-mediated plant transformation with large DNA fragments.
Trends in Plant Science 5, 354–357.
Shimamoto, K. and Kyozuka, J. (2002) Rice as a model for comparative genomics of plants. Annual Review
of Plant Biology 53, 399–419.
Shin, B.K., Wang, H., Yim, A.M., Naour, F.L., Brichory, F., Jang, J.H., Zhao, R., Puravs, E., Tra, J., Michael,
C.W., Misek, D.E. and Hanash, S.M. (2003) Global profiling of the cell surface proteome of cancer
cells uncovers an abundance of proteins with chaperone function. Journal of Biological Chemistry
278, 7607–7616.
Shizuya, H., Birren, B., Kim, U., Mancino, V., Slepak, T., Tachiiri, Y. and Simon, M. (1992) Cloning and stable
maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-
based vector. Proceedings of the National Academy of Sciences of the United States of America 89,
8794–8797.
Shoemaker, J.S., Painter, I.S. and Weir, B.S. (1999) Bayesian statistics in genetics. A guide for the
uninitiated. Trends in Genetics 15, 354–358.
Shrawat, A.K. and Lörz, H. (2006) Agrobacterium-mediated transformation of cereals: a promising approach
crossing barriers. Plant Biotechnology Journal 4, 575–603.
Shuber, A. and Pierceall, W. (2002) Methods for detecting nucleotide insertion or deletion using primer
extension. Patent EP 1203100.
Shull, G.H. (1908) The composition of a field of maize. American Breeders' Association Report 4, 296–301.
Siepel, A., Farmer, A., Tolopko, A., Zhuang, M., Mendes, P., Beavis, W. and Sobral, B. (2001) ISYS: a
decentralized, component-based approach to the integration of heterogeneous bioinformatic
resources. Bioinformatics 17, 83–94.
Sillanpää, M.J. and Arjas, E. (1998) Bayesian mapping of multiple quantitative trait loci from incomplete
inbred line cross data. Genetics 148, 1373–1388.
Sillanpää, M.J. and Arjas, E. (1999) Bayesian mapping of multiple quantitative trait loci from incomplete
outbred offspring data. Genetics 151, 1605–1619.
Sillanpää, M.J. and Bhattacharjee, M. (2005) Bayesian association-based fine mapping in small
chromosomal segments. Genetics 169, 427–439.
Sillanpää, M.J. and Corander, J. (2002) Model choice in gene mapping: what and why. Trends in Genetics
18, 301–307.
Silver, J. (1985) Confidence limits for estimates of gene linkage based on analysis of recombinant inbred
strains. Journal of Heredity 76, 436–440.
Simko, I., Costanzo, S., Haynes, K.G., Christ, B.J. and Jones, R.W. (2004a) Linkage disequilibrium mapping
of a Verticillium dahliae resistance quantitative trait locus in tetraploid potato (Solanum tuberosum)
through a candidate gene approach. Theoretical and Applied Genetics 108, 217–224.
Simko, I., Haynes, K.G., Ewing, E.E., Costanzo, S., Christ, B.J. and Jones, R.W. (2004b) Mapping
genes for resistance to Verticillium albo-atrum in tetraploid and diploid potato populations using
haplotype association tests and genetic linkage analysis. Molecular Genetics and Genomics 271,
522–531.
Simmonds, N.W. (1979) Principles of Crop Improvement. Longman, London.
Simmonds, N.W. (1982) The context of the workshop. In: Withers, L.A. and Williams, J.T. (eds) Crop Genetic
Resources: the Conservation of Difficult Material. IUBS Series B42, International Union of Biological
Sciences/International Board for Plant Genetic Resources/International Genetic Federation, Paris,
pp. 1–3.
Singh, M., Ceccarelli, S. and Grando, S. (1999) Genotype × environment interaction of crossover type:
detecting its presence and estimating the crossover point. Theoretical and Applied Genetics 99,
988–995.
Singh, R.P., Rajaram, S., Miranda, A., Huerta-Espino, J. and Autrique, E. (1998) Comparison of two crossing
and four selection schemes for yield, yield traits and slow rusting resistance to leaf rust in wheat.
Euphytica 100, 35–43.
Singla-Pareek, S.L., Reddy, M.K. and Sopory, S.K. (2003) Genetic engineering of the glyoxalase pathway
in tobacco leads to enhanced salinity tolerance. Proceedings of the National Academy of Sciences of
the United States of America 100, 14672–14677.
Sinha, S.K. and Swaminathan, M.S. (1984) New parameters and selection criteria in plant breeding. In:
Vose, P.B. and Blixt, S.G. (eds) Crop Breeding, a Contemporary Basis. Pergamon Press, Oxford, UK.
Siripoonwiwat, W. (1995) Application of restriction fragment length polymorphism (RFLP) markers in the
analysis of chromosomal regions associated with some quantitative traits for hexaploid oat improve-
ment. MS thesis, Cornell University, Ithaca, New York.
Sivamani, E., Huet, H., Shen, P., Ong, C.A., DeKochko, A., Fauquet, C.M. and Beachy, R.N. (1999) Rice
plants (Oryza sativa L.) containing three rice tungro spherical virus (RTSV) coat protein transgenes
are resistant to virus infection. Molecular Breeding 5, 177–185.
Skinner, D.Z., Muthukrishnan, S. and Liang, G.H. (2004) Transformation: a powerful tool for crop improve-
ment. In: Liang, G.H. and Skinner, D.Z. (eds) Genetically Modified Crops: Their Development, Uses
and Risks. Food Products Press, Binghamton, New York, pp. 1–16.
Skol, A.D., Scott, L.J., Abecasis, G.R. and Boehnke, M. (2006) Joint analysis is more efficient than
replication-based analysis for two-stage genome-wide association studies. Nature Genetics 38, 209–213.
Slater, S., Mitsky, T.A., Houmiel, K.L., Hao, M., Reiser, S.E., Taylor, N.B., Tran, M., Valentin, H.E., Rodriguez,
D.J., Stone, D.A., Padgette, S.R., Kishore, G. and Gruys, K.J. (1999) Metabolic engineering of
Arabidopsis and Brassica for poly(3-hydroxybutyrate-co-3-hydroxyvalerate) copolymer production.
Nature Biotechnology 17, 1011–1016.
Slatkin, M. (1985) Gene flow in natural populations. Annual Review of Ecology and Systematics 16,
393–430.
Smith, D., Yanai, Y., Liu, Y.-G., Ishiguro, S., Okada, K., Shibata, D., Whittier, R.F. and Fedoroff, N.V. (1996)
Characterization and mapping of Ds-GUS-T-DNA lines for targeted insertional mutagenesis. The
Plant Journal 10, 721–732.
Smith, G.D. and Egger, M. (1998) Meta-analysis bias in location and selection of studies. BMJ 317,
625–629.
Smith, H.F. (1936) A discriminant function for plant selection. Annals of Eugenics 7, 240–250.
Smith, J.S.C. (1986) Genetic diversity within the corn belt dent racial complex of maize (Zea mays L.).
Maydica 21, 349–367.
Smith, J.S.C. and Smith, O.S. (1992) Fingerprinting crop varieties. Advances in Agronomy 47, 85–140.
Smith, M.E., Coffman, W.R. and Barker, T.C. (1990) Environmental effects on selection under high and
low input conditions. In: Kang, M.S. (ed.) Genotype-By-Environment Interactions and Plant Breeding.
Louisiana State University Agriculture Center, Baton Rouge, Louisiana, pp. 261–272.
Smith, O.S., Smith, J.S.C., Bowen, S.L., Tenborg, R.A. and Wall, S.J. (1990) Similarities among a group
of elite maize inbreds as measured by pedigree, F1 grain yield, grain yield heterosis and RFLPs.
Theoretical and Applied Genetics 80, 833–840.
Smith, O.S., Smith, J.S.C., Bowen, S.L. and Tenborg, R.A. (1991) Numbers of RFLP probes necessary to
show associations between lines. Maize Genetics Newsletter 65, 66.
Smith, O.S., Hoard, K., Shaw, F. and Shaw, R. (1999) Prediction of single-cross performance. In: Coors,
J.G. and Pandey, S. (eds) The Genetics and Exploitation of Heterosis in Crops. American Society of
Agronomy (ASA) and Crop Science Society of America (CSSA), Madison, Wisconsin, pp. 277–285.
Smith, S. and Beavis, W. (1996) Molecular marker assisted breeding in a company environment. In: Sobral,
B.W.S. (ed.) The Impact of Plant Molecular Genetics. Birkhäuser, Boston, Massachusetts, pp. 259–272.
Smith, S. and Helentjaris, T. (1996) DNA fingerprinting and plant variety protection. In: Paterson, A.H. (ed.)
Genome Mapping in Plants. R.G. Landes Company, Austin, Texas, pp. 95–110.
Sneath, P. and Sokal, R.R. (1973) Numerical Taxonomy, 2nd edn. W.H. Freeman, San Francisco,
California.
Sobral, B.W.S. (2002) The role of bioinformatics in germplasm conservation and use. In: Engels, J.M.M.,
Ramanatha Rao, V., Brown, A.H.D. and Jackson, M.T. (eds) Managing Plant Genetic Diversity.
International Plant Genetic Resources Institute (IPGRI), Rome, pp. 171–178.
Sobral, B.W.S., Waugh, M. and Beavis, W. (2001) Information systems approaches to support discovery
in agricultural genomics. In: Phillips, R.L. and Vasil, I.K. (eds) DNA-based Markers in Plants. Kluwer
Academic Publishers, Dordrecht, Netherlands.
Sobrino, B., Brión, M. and Carracedo, A. (2005) SNPs in forensic genetics: a review on SNP typing
methodologies. Forensic Science International 154, 181–194.
Sobrizal, K., Ikeda, K., Sanchez, P.L., Doi, K., Angeles, E.R., Khush, G.S. and Yoshimura, A. (1999)
Development of Oryza glumaepatula introgression lines in rice, O. sativa L. Rice Genetics Newsletter
16, 107.
Sokal, R.R. (1986) Phenetic taxonomy: theory and methods. Annual Review of Ecology and Systematics 17,
423–442.
Soller, M. and Beckmann, J.S. (1990) Marker-based mapping of quantitative trait loci using replicated
progenies. Theoretical and Applied Genetics 80, 205–208.
Somers, D.J., Isaac, P. and Edwards, K. (2004) High-density microsatellite consensus map for bread wheat
(Triticum aestivum L.). Theoretical and Applied Genetics 109, 1105–1114.
Song, J., Bradeen, J.M., Naess, S.K., Raasch, J.A., Wielgus, S.M., Haberlach, G.T., Liu, J., Austin-Phillips,
S., Buell, C.R., Helgeson, J.P. and Jiang, J. (2003) Gene RB cloned from Solanum bulbocastanum
confers broad spectrum resistance to potato late blight. Proceedings of the National Academy of
Sciences of the United States of America 100, 9128–9133.
Song, R. and Messing, J. (2003) Gene expression of a gene family in maize based on noncollinear
haplotypes. Proceedings of the National Academy of Sciences of the United States of America 100,
9055–9060.
Song, R., Llaca, V. and Messing, J. (2002) Mosaic organization of orthologous sequences in grass genomes.
Genome Research 12, 1549–1555.
Sopory, S. and Munshi, M. (1996) Anther culture. In: Jain, S.M., Sopory, S.K. and Veilleux, R.E. (eds) In
Vitro Haploid Production in Higher Plants, Vol. 1. Kluwer Academic Publishers, Dordrecht, Netherlands,
pp. 145–176.
Sorensen, D. and Gianola, D. (2002) Likelihood, Bayesian and MCMC Methods in Quantitative Genetics.
Springer-Verlag Inc., New York.
Sorrells, M.E. and Wilson, W.A. (1997) Direct classification and selection of superior alleles for crop
improvement. Crop Science 37, 691–697.
Sorrells, M.E., La Rota, M., Bermudez-Kandianis, C.E., Greene, R.A., Kantety, R., Munkvold, J.D.,
Miftahudin, Mahmoud, A., Ma, X.F., Gustafson, P.J., Qi, L.L., Echalier, B., Gill, B.S., Matthews, D.E.,
Lazo, G.R., Chao, S., Anderson, O.D., Edwards, H., Linkiewicz, A.M., Dubcovsky, J., Akhunov, E.D.,
Dvorak, J., Zhang, D., Nguyen, H.T., Peng, J., Lapitan, N.L.V., Gonzalez-Hernandez, J.L., Anderson,
J.A., Hossain, K., Kalavacharla, V., Kianian, S.F., Choi, D.-W., Close, T.J., Dilbirligi, M., Gill, K.S.,
Steber, C., Walker-Simmons, M.K., McGuire, P.E. and Qualset, C.O. (2003) Comparative DNA
sequence analysis of wheat and rice genomes. Genome Research 13, 1818–1827.
Sourdille, P., Singh, S., Cadalen, T., Brown-Guedira, G.L., Gay, G., Qi, L., Gill, B.S., Dufour, P., Murigneux,
A. and Bernard, M. (2004) Microsatellite-based deletion bin system for the establishment of genetic-
physical map relationships in wheat (Triticum aestivum L.). Functional and Integrative Genomics 4,
12–25.
Southern, E.M. (1975) Detection of specific sequences among DNA fragments separated by gel
electrophoresis. Journal of Molecular Biology 98, 503–517.
Spielman, D., Cohen, J. and Zambrano, P. (2006) Will agbiotech applications reach marginalized farmers?
Evidence from developing countries. AgBioForum 9, 23–30.
Spielman, R.S., McGinnis, R.E. and Ewens, W.J. (1993) Transmission test for linkage disequilibrium: the
insulin gene region and insulin-dependent diabetes mellitus (IDDM). American Journal of Human
Genetics 52, 506–516.
Spooner, D., van Treuren, R. and de Vicente, M.C. (2005) Molecular markers for genebank management.
IPGRI Technical Bulletin No. 10. Available at: http://www.ipgri.cgiar.org/publications/pdf/1082.pdf
(accessed 30 June 2007).
Sprague, G.F. and Tatum, L.A. (1942) General vs. specific combining ability in single crosses of corn.
Journal of American Society of Agronomy 34, 923–932.
Sprague, G.F., Russell, W.A., Penny, L.H. and Horner, T.W. (1962) Effects of epistasis on grain yield of
maize. Crop Science 2, 205–208.
Springer, P.S. (2000) Gene traps: tools for plant development and genomics. The Plant Cell 12,
1007–1020.
Stadler, L.J. (1928) Mutations in barley induced by X-rays and radium. Science 68, 186–187.
Stam, P. (1991) Some aspects of QTL analysis. Proceedings of the Eighth Meeting of the Eucarpia Section
Biometrics on Plant Breeding, 1–6 July 1991, Brno, Czechoslovakia, pp. 24–32.
Stam, P. (1993) Construction of integrated genetic linkage maps by means of a new computer package:
JoinMap. The Plant Journal 3, 739–744.
Stam, P. (1995) Marker-assisted breeding. In: Van Ooijen, J.W. and Jansen, J. (eds) Biometrics in Plant
Breeding: Applications of Molecular Markers. Proceedings of the 9th Meeting of EUCARPIA Section
on Biometrics in Plant Breeding (1994). Centre for Plant Breeding and Reproduction Research,
Wageningen, Netherlands, pp. 32–44.
Stam, P. (2003) Marker-assisted introgression: speed at any cost? In: van Hintum, Th.J.L., Lebeda, A.,
Pink, D. and Schut, J.W. (eds) Proceedings of the Eucarpia Meeting on Leafy Vegetables Genetics
and Breeding, 19–21 March 2003, Noordwijkerhout, Netherlands. Centre for Genetic Resources
(CGN), Wageningen, Netherlands, pp. 117–124.
Stam, P. and Zeven, A.C. (1981) The theoretical proportion of the donor genome in near-isogenic lines of
self-fertilizers bred by backcrossing. Euphytica 30, 227–238.
Stamatoyannopoulos, J.A. (2004) The genomics of gene expression. Genomics 84, 449–457.
Stanford, J.C. (2000) The development of the biolistic process. In Vitro Cellular and Developmental
Biology – Plant 36, 303–308.
Stanford, J.C., Klein, T.M., Wolf, E.D. and Allen, N. (1987) Delivery of substances into cells and tissues
using a particle bombardment process. Particulate Science and Technology 5, 27–37.
Staub, J.E. (1999) Intellectual property rights, genetic markers and the hybrid seed production. Journal of
New Seeds 1, 39–64.
Stebbins, G.L. (1957) Self fertilization and population variability in the higher plants. American Naturalist 91,
337–354.
Stebbins, G.L. (1970) Adaptive radiation of reproductive characteristics in angiosperms: I. Pollination
mechanisms. Annual Review of Ecology and Systematics 1, 307–326.
Steele, K.A., Price, A.H., Shashidhar, H.E. and Witcombe, J.R. (2006) Marker-assisted selection to
introgress rice QTL controlling root traits into an Indian upland rice variety. Theoretical and Applied
Genetics 112, 208–221.
Stein, L. (2001) Genome annotation: from sequence to biology. Nature Reviews Genetics 2, 493–503.
Stein, L.D. (2002) Creating a bioinformatics nation. Nature 417, 119–120.
Stein, L.D. (2003) Integrating biological databases. Nature Reviews Genetics 4, 337–345.
Stein, N., Perovic, D., Kumlehn, J., Pellio, B., Stracke, S., Streng, S., Ordon, E. and Graner, A. (2005)
The eukaryotic translation initiation factor 4E confers multiallelic recessive Bymovirus resistance in
Hordeum vulgare (L.). The Plant Journal 42, 912–922.
Stelly, D.M., Lee, J.A. and Rooney, W.L. (1988) Proposed schemes for mass-extraction of doubled haploids
of cotton. Crop Science 28, 885–890.
Sterling, T.D. (1959) Publication decisions and their possible effects on inferences drawn from tests of
significance or vice versa. Journal of the American Statistical Association 54, 30–34.
Stich, B. and Melchinger, A.E. (2009) Comparison of mixed-model approaches for association mapping in
rapeseed, potato, sugar beet, maize, and Arabidopsis. BMC Genomics 10, 94.
Stich, B., Melchinger, A.E., Piepho, H.-P., Heckenberger, M., Maurer, H.P. and Reif, J.C. (2006) A new test
for family-based association mapping using inbred lines from plant breeding programs. Theoretical
and Applied Genetics 113, 1121–1130.
Stich, B., Yu, J., Melchinger, A.E., Piepho, H.P., Utz, H.F., Maurer, H.P. and Buckler, E.S. (2007) Power
to detect higher-order epistatic interactions in a metabolic pathway using a new mapping strategy.
Genetics 176, 563–570.
Stich, B., Möhring, J., Piepho, H.-P., Heckenberger, M., Buckler, E.S. and Melchinger, A.E. (2008)
Comparison of mixed-model approaches for association mapping. Genetics 178, 1745–1754.
Stitt, M. and Fernie, A.R. (2003) From measurements of metabolites to metabolomics: an on the fly
perspective illustrated by recent studies of carbon–nitrogen interactions. Current Opinion in Biotechnology
14, 136–144.
Stoyanova, S.D. (1991) Genetic shifts and variations of gliadins induced by seed aging. Seed Science and
Technology 19, 363–371.
Stratton, D.A. (1998) Reaction norm functions and QTL-environments for flowering time in Arabidopsis
thaliana. Heredity 81, 144–155.
Stuber, C.W. (1992) Biochemical and molecular markers in plant breeding. Plant Breeding Reviews 9,
37–61.
Stuber, C.W. (1994a) Breeding multigenic traits. In: Phillips, R.L. and Vasil, I.K. (eds) DNA Based Markers
in Plants. Kluwer Academic Publishers, Dordrecht, Netherlands, pp. 97–115.
Stuber, C.W. (1994b) Heterosis in plant breeding. Plant Breeding Reviews 12, 227–251.
Stuber, C.W. (1995) Mapping and manipulating quantitative traits in maize. Trends in Genetics 11,
477–481.
Stuber, C.W. (1999) Biochemistry, molecular biology and physiology of heterosis. In: Coors, J.G. and
Pandey, S. (eds) The Genetics and Exploitation of Heterosis in Crops. American Society of Agronomy
(ASA) and Crop Science Society of America (CSSA), Madison, Wisconsin, pp. 173–184.
Stuber, C.W. and Moll, R.H. (1972) Frequency changes of isozyme alleles in a selection experiment for
grain yield in maize (Zea mays L.). Crop Science 12, 337–340.
Stuber, C.W. and Sisco, P.H. (1991) Marker-facilitated transfer of QTL alleles between elite inbred lines and
responses of hybrids. Proceedings of 46th Annual Corn and Sorghum Industry Research Conference
46, 104–113.
Stuber, C.W., Moll, R.H., Goodman, M.M., Schaffer, H.E. and Weir, B.S. (1980) Allozyme frequency
changes associated with selection for increased grain yield in maize (Zea mays). Genetics 95,
225–236.
Stuber, C.W., Goodman, M.M. and Moll, R.H. (1982) Improvement of yield and ear number resulting from
selection at allozyme loci in a maize population. Crop Science 22, 737–740.
Stuber, C.W., Lincoln, S.E., Wolff, D.W., Helentjaris, T. and Lander, E.S. (1992) Identification of genetic
factors contributing to heterosis in a hybrid from two elite maize inbred lines using molecular markers.
Genetics 132, 823–839.
Stuber, C.W., Polacco, M. and Senior, M.L. (1999) Synergy of empirical breeding, marker-assisted
selection and genomics to increase crop yield potential. Crop Science 39, 1571–1583.
Stupar, R.M. and Springer, N.M. (2006) Cis-transcriptional variation in maize inbred lines B73 and Mo17
leads to additive expression patterns in the F1 hybrid. Genetics 173, 2199–2210.
Subrahmanyam, N.C. and Kasha, K.J. (1975) Chromosome doubling of barley haploids by nitrous oxide
and colchicine treatment. Canadian Journal of Genetics and Cytology 17, 573–583.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A.,
Pomeroy, S.L., Golub, T.R., Lander, E.S. and Mesirov, J.P. (2005) Gene set enrichment analysis: a
knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the
National Academy of Sciences of the United States of America 102, 15545–15550.
Sughroue, J.R. and Rocheford, T.R. (1994) Restriction fragment length polymorphism differences among the
Illinois long-term selection oil strains. Theoretical and Applied Genetics 87, 916–924.
Sugita, K., Kasahara, T., Matsunaga, E. and Ebinuma, H. (2000) A transformation vector for the produc-
tion of marker-free transgenic plants containing a single copy transgene at high frequency. The Plant
Journal 22, 461–469.
Sullivan, S.N. (2004) Plant genetic resources and the law: past, present and future. Plant Physiology 135, 10–15.
Sumner, L.W., Mendes, P. and Dixon, R.A. (2003) Plant metabolomics: large-scale phytochemistry in the
functional genomics era. Phytochemistry 62, 817–836.
Sun, D.J., He, Z.H., Xia, X.C., Zhang, L.P., Morris, C., Appels, R., Ma, W. and Wang, H. (2005) A novel STS
marker for polyphenol oxidase activities in bread wheat. Molecular Breeding 16, 209–218.
Sun, Q.X., Huang, T.C., Ni, Z.F. and Procunier, D.J. (1996) Studies on heterotic grouping in wheat: I.
Genetic diversity between varieties revealed by RAPD. Journal of Agricultural Biotechnology (China)
4, 103–110.
Sun, Q.X., Wu, L.M., Ni, Z.F., Meng, F.R., Wang, Z.K. and Lin, Z. (2004) Differential gene expression
patterns in leaves between hybrids and their parental inbreds are correlated with heterosis in a diallelic
cross. Plant Science 166, 651–657.
Sun, Y., Wang, J., Crouch, J.H. and Xu, Y. (2009) Efficiency of selective genotyping for complex traits and
its innovative use in genetics and plant breeding. Molecular Breeding (in press)
Sundaresan, V., Springer, P., Volpe, T., Haward, S., Jones, J.D., Dean, C., Ma, H. and Martienssen, R.
(1995) Patterns of gene action in plant development revealed by enhancer trap and gene trap
transposable elements. Genes and Development 9, 1797–1810.
Suter, B., Kittanakom, S. and Stagljar, I. (2008) Two-hybrid technologies in proteomics research. Current
Opinion in Biotechnology 19, 316–323.
Suzuki, Y., Uemura, S., Saito, Y., Murofushi, N., Schmitz, G., Theres, K. and Yamaguchi, I. (2001) A novel
transposon tagging element for obtaining gain-of-function mutants based on a self-stabilizing Ac
derivative. Plant Molecular Biology 45, 123–131.
Swaminathan, M.S. (2006) An evergreen revolution. Crop Science 46, 2293–2303.
Swaminathan, M.S. (2007) Can science and technology feed the world in 2025? Field Crops Research
104, 3–9.
Swaminathan, M.S. and Singh, M.P. (1958) X-ray induced somatic haploidy in watermelon. Current Science
27, 63–64.
Swanson-Wagner, R.A., Jia, Y., DeCook, R., Borsuk, L.A., Nettleton, D. and Schnable, P.S. (2006) All
possible modes of gene action are observed in a global comparison of gene expression in a maize F1
hybrid and its inbred parents. Proceedings of the National Academy of Sciences of the United States
of America 103, 6805–6810.
Syvänen, A.-C. (1999) From gels to chips: minisequencing primer extension for analysis of point mutations
and single nucleotide polymorphisms. Human Mutation 13, 1–10.
Syvänen, A.-C. (2001) Accessing genetic variation: genotyping single nucleotide polymorphisms. Nature
Reviews Genetics 2, 930–942.
Syvänen, A.-C. (2005) Toward genome-wide SNP genotyping. Nature Genetics 37, S5–S10.
Syvänen, A.-C., Aalto-Setälä, K., Harju, L., Kontula, K. and Söderlund, H. (1990) A primer-guided
nucleotide incorporation assay in the genotyping of apolipoprotein E. Genomics 8, 684–692.
Szalma, S.J., Hostert, B.M., LeDeaux, J.R., Stuber, C.W. and Holland, J.B. (2007) QTL mapping with
near-isogenic lines in maize. Theoretical and Applied Genetics 114, 1211–1228.
Szarejko, I. and Forster, B.P. (2007) Doubled haploidy and induced mutation. Euphytica 158, 359–370.
Tabashnik, B.E., Gassmann, A.J., Crowder, D.W. and Carrière, Y. (2008) Insect resistance to Bt crops:
evidence versus theory. Nature Biotechnology 26, 199–202.
Taberner, A., Dopazo, J. and Castañera, P. (1997) Genetic characterization of populations of a de novo
arisen sugar beet pest, Aubeonymus mariaefranciscae (Coleoptera, Curculionidae), by RAPD
analysis. Journal of Molecular Evolution 45, 24–31.
Tai, G.C.C. (1971) Genotypic stability analysis and its application to potato regional trials. Crop Science
11, 184–190.
Taji, A., Kumar, P.P. and Lakshmanan, P. (2002) In Vitro Plant Breeding. Food Products Press, Binghamton,
New York, 167 pp.
Takahashi, Y., Shomura, A., Sasaki, T. and Yano, M. (2001) Hd6, a rice quantitative trait locus involved in
photoperiod sensitivity, encodes the alpha subunit of protein kinase CK2. Proceedings of the National
Academy of Sciences of the United States of America 98, 7922–7927.
Talbot, C.J., Nicod, A., Cherny, S.S., Fulker, D.W., Collins, A.C. and Flint, J. (1999) High-resolution mapping
of quantitative trait loci in outbred mice. Nature Genetics 21, 305–308.
Tan, Y.F., Li, J.X., Yu, S.B., Xing, Y.Z., Xu, C.G. and Zhang, Q. (1999) The three important traits for cooking
and eating quality of rice grain are controlled by a single locus in an elite rice hybrid, Shanyou 63.
Theoretical and Applied Genetics 99, 642–648.
Tang, G.L., Reinhart, B.J., Bartel, D.P. and Zamore, P.D. (2003) A biochemical framework for RNA silencing
in plants. Genes and Development 17, 49–63.
Tang, H., Bowers, J.E., Wang, X., Ming, R., Alam, M. and Paterson, A.H. (2008) Synteny and collinearity in
plant genomes. Science 320, 486–488.
Tanksley, S.D. (1983) Molecular markers in plant breeding. Plant Molecular Biology Reporter 1, 1–3.
Tanksley, S.D. (1993) Mapping polygenes. Annual Review of Genetics 27, 205–233.
Tanksley, S.D. and McCouch, S.R. (1997) Seed banks and molecular maps: unlocking genetic potential
from the wild. Science 277, 1063–1066.
Tanksley, S.D. and Nelson, J.C. (1996) Advanced backcross QTL analysis: a method for the simultaneous
discovery and transfer of valuable QTLs from unadapted germplasm into elite breeding. Theoretical
and Applied Genetics 92, 191–203.
Tanksley, S.D. and Rick, C.M. (1980) Isozyme gene linkage map of the tomato: applications in genetics and
breeding. Theoretical and Applied Genetics 57, 161–170.
Tanksley, S.D., Miller, J., Paterson, A. and Bernatzky, R. (1988) Molecular mapping of plant chromosomes. In:
Gustafson, J.P. and Appels, R. (eds) Chromosome Structure and Function: Impact of New Concepts.
Proceedings of the 18th Stadler Genetics Symposium. Plenum Press, New York, pp. 157–173.
Tanksley, S.D., Young, N.D., Paterson, A.H. and Bonierbale, M.W. (1989) RFLP mapping in plant breeding:
new tools for an old science. Bio/Technology 7, 257–263.
Tanksley, S.D., Ganal, M.W. and Martin, G.B. (1995) Chromosome landing: a paradigm for map based gene
cloning in plants with large genomes. Trends in Genetics 11, 63–68.
Tanksley, S.D., Grandillo, S., Fulton, T.M., Zamir, D., Eshed, Y., Petiard, V., Lopez, J. and Beck-Bunn, T.
(1996) Advanced backcross QTL analysis in a cross between an elite processing line of tomato and
its wild relative L. pimpinellifolium. Theoretical and Applied Genetics 92, 213–224.
Tao, Q. and Zhang, H.-B. (1998) Cloning and stable maintenance of DNA fragments over 300 kb in
Escherichia coli with conventional plasmid-based vectors. Nucleic Acids Research 26, 4901–4909.
Tarchini, R., Biddle, P., Wineland, R., Tingey, S. and Rafalski, A. (2000) The complete sequence of 340 kb
of DNA around the rice Adh1-Adh2 region reveals interrupted colinearity with maize chromosome 4.
The Plant Cell 12, 381–391.
Tardieu, F. (2003) Virtual plants: modeling as a tool for the genomics of tolerance to water deficit. Trends in
Plant Science 8, 9–14.
Tautz, D. and Renz, M. (1984) Simple sequences are ubiquitous repetitive components of eukaryotic
genomes. Nucleic Acids Research 12, 4127–4138.
Taylor, B.A. (1978) Recombinant inbred strains: use in gene mapping. In: Morse, H.C. (ed.) Origin of Inbred
Mice. Academic Press, New York, pp. 423–438.
Tekeoglu, M., Rajesh, P.N. and Muehlbauer, F.J. (2002) Integration of sequence tagged microsatellites to
the chickpea genetic map. Theoretical and Applied Genetics 105, 847–854.
Temnykh, S., Park, W.D., Ayres, N., Cartinhour, S., Hauck, N., Lipovich, L., Cho, Y.G., Ishii, T. and McCouch,
S.R. (2000) Mapping and genome organization of microsatellite sequences in rice (Oryza sativa L.).
Theoretical and Applied Genetics 100, 697–712.
Tenaillon, M.I., Sawkins, M.C., Long, A.D., Gaut, R.L., Doebley, J.F. and Gaut, B.S. (2001) Patterns of DNA
sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proceedings of the
National Academy of Sciences of the United States of America 98, 9161–9166.
Tenhola-Roininen, T., Immonen, S. and Tanhuanpää, P. (2006) Rye doubled haploids as a research and
breeding tool – a practical point of view. Plant Breeding 125, 584–590.
Terada, R., Urawa, H., Inagaki, Y., Tsugane, K. and Iida, S. (2002) Efficient gene targeting by homologous
recombination in rice. Nature Biotechnology 20, 1030–1034.
Tessier, D.C., Arbour, M., Benoit, F., Hogues, H. and Rigby, T. (2005) A DNA microarray fabrication strat-
egy for research laboratories. In: Sensen, C.W. (ed.) Handbook of Genome Research. Genomics,
Proteomics, Metabolomics, Bioinformatics, Ethical and Legal Issues. WILEY-VCH, Weinheim,
Germany, pp. 223–238.
The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant
Arabidopsis thaliana. Nature 408, 796–815.
Therneau, T.M. and Grambsch, P.M. (2000) Modeling Survival Data: Extending the Cox Model. Springer,
New York.
Thiel, T., Michalek, W., Varshney, R.K. and Graner, A. (2003) Exploiting EST data bases for the develop-
ment and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theoretical
and Applied Genetics 106, 411–422.
Thoday, J.M. (1961) Location of polygenes. Nature 191, 368–370.
Thomas, C.D., Cameron, A., Green, R.E., Bakkenes, M., Beaumont, L.J., Collingham, Y.C., Erasmus,
B.F.N., Ferreira de Siqueira, M., Grainger, A., Hannah, L., Hughes, L., Huntley, B., van Jaarsveld,
A.S., Midgley, G.F., Miles, L., Ortega-Huerta, M.A., Peterson, A.T., Phillips, O.L. and Williams, S.E.
(2004) Extinction risk from climate change. Nature 427, 145–148.
Thompson, J.A., Halewood, M., Engels, J. and Hoogendoorn, C. (2004) Plant genetic resources collections:
a survey of issues concerning their value, accessibility and status as public goods. In: New Directions
for a Diverse Planet: Proceedings of the 4th International Crop Science Congress, 26 September–1
October 2004, Brisbane, Australia. Published on CD-ROM. Available at:
http://www.cropscience.org.au/icsc2004/ (accessed 17 November 2009).
Thomson, M.J., Tai, T.H., McClung, A.M., Hinga, M.E., Lobos, K.B., Xu, Y., Martinez, C. and McCouch,
S.R. (2003) Mapping quantitative trait loci for yield, yield components and morphological traits in an
advanced backcross population between Oryza rufipogon and the Oryza sativa cultivar Jefferson.
Theoretical and Applied Genetics 107, 479–493.
Thomson, M.J., Edwards, J.D., Septiningsih, E.M., Harrington, S.E. and McCouch, S.R. (2006) Substitution
mapping of dth1.1, a flowering-time quantitative trait locus (QTL) associated with transgressive variation
in rice, reveals multiple sub-QTL. Genetics 172, 2501–2514.
Thorisson, G.A., Muilu, J. and Brookes, A.J. (2009) Genotype–phenotype databases: challenges and solutions
for the post-genomic era. Nature Reviews Genetics 10, 9–18.
Thornsberry, J.M., Goodman, M.M., Doebley, J., Kresovich, S., Nielsen, D. and Buckler IV, E.S. (2001)
Dwarf8 polymorphisms associate with variation in flowering time. Nature Genetics 28, 286–289.
Tikhonov, A.P., SanMiguel, P.J., Nakajima, Y., Gorenstein, N.M., Bennetzen, J.L. and Avramova, Z. (1999)
Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proceedings of the
National Academy of Sciences of the United States of America 96, 7409–7414.
Till, B.J., Reynolds, S.H., Greene, E.A., Codomo, C.A., Enns, L.C., Johnson, J.E., Burtner, C., Odden, A.R.,
Young, K., Taylor, N.E., Henikoff, J.G., Comai, L. and Henikoff, S. (2003) Large-scale discovery of
induced point mutations with high-throughput TILLING. Genome Research 13, 524–530.
Till, B.J., Comai, L. and Henikoff, S. (2007) TILLING and EcoTILLING for crop improvement. In: Varshney,
R.K. and Tuberosa, R. (eds) Genomics-Assisted Crop Improvement. Vol. 1: Genomic Approaches and
Platforms. Springer, Dordrecht, Netherlands, pp. 333–350.
Tinker, N.A. and Mather, D.E. (1993) GREGOR: software for genetic simulation. Journal of Heredity 84, 237.
Tirosh, I., Bilu, Y. and Barkai, N. (2007) Comparative biology: beyond sequence analysis. Current Opinion
in Biotechnology 18, 371–377.
Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T.S., Matsuzaki, Y., Miyoshi, F., Saito, K., Tanida, S.,
Yugi, K., Venter, J.C. and Hutchison, C.A. III (1999) E-CELL: software environment for whole-cell
simulation. Bioinformatics 15, 72–84.
Trawick, B.W. and McEntyre, J.R. (2004) Bibliographic databases. In: Sansom, C.E. and Horton, R.M. (eds)
The Internet for Molecular Biologists. Oxford University Press, Oxford, UK, pp. 1–16.
Trethewey, R.N. (2005) Metabolite profiling in plants. In: Leister, D. (ed.) Plant Functional Genomics. Food
Products Press, Binghamton, New York, pp. 85–117.
Tripp, R., Louwaars, N.P. and Eaton, D. (2006) Intellectual property rights for plant breeding and rural
development: challenges for agricultural policymakers. Agricultural and Rural Development Notes
Issue 12.
Truco, M.J., Antonise, R., Lavelle, D., Ochoa, O., Kozik, A., Witsenboer, H., Fort, S.B., Jeuken, M.J.W.,
Kesseli, R.V., Lindhout, P., Michelmore, R.W. and Peleman, J. (2007) A high-density, integrated genetic
linkage map of lettuce (Lactuca spp.). Theoretical and Applied Genetics 115, 735–746.
Tsien, R.Y. (1998) The green fluorescent protein. Annual Review of Biochemistry 67, 509–544.
Tu, J., Datta, K., Alam, M.F., Khush, G.S. and Datta, S.K. (1998a) Expression and function of a hybrid Bt
toxin gene in transgenic rice conferring resistance to insect pests. Plant Biotechnology 15, 183–191.
Tu, J., Ona, I., Zhang, Q., Mew, T.W., Khush, G.S. and Datta, S.K. (1998b) Transgenic rice variety IR72 with
Xa21 is resistant to bacterial blight. Theoretical and Applied Genetics 97, 31–36.
Turcotte, E.L. and Feaster, C.V. (1963) Haploids: high-frequency production from single-embryo seeds in a
line of Pima cotton. Science 140, 1407–1408.
Turcotte, E.L. and Feaster, C.V. (1967) Semigamy in Pima cotton. Journal of Heredity 58, 54–57.
Tuvesson, S., Dayteg, C., Hagberg, P., Manninen, O., Tanhuanpää, P., Tenhola-Roininen, T., Kiviharju, E.,
Weyen, J., Förster, J., Schondelmaier, J., Lafferty, J., Marn, M. and Fleck, A. (2007) Molecular markers
and doubled haploids in European plant breeding programmes. Euphytica 158, 305–312.
Tyo, K.E., Alper, H.S. and Stephanopoulos, G.N. (2007) Expanding the metabolic engineering toolbox:
more options to engineer cells. Trends in Biotechnology 25, 132–137.
Tzfira, T. and Citovsky, V. (2006) Agrobacterium-mediated genetic transformation of plants: biology and
biotechnology. Current Opinion in Biotechnology 17, 147–154.
Tzfira, T., Tian, G.W., Lacroix, B., Vyas, S., Li, J., Leitner-Dagan, Y., Krichevsky, A., Taylor, T., Vainstein, A.
and Citovsky, V. (2005) pSAT vectors: a modular series of plasmids for autofluorescent protein tagging
and expression of multiple genes in plants. Plant Molecular Biology 57, 503–516.
Tzfira, T., Kozlovsky, S.V. and Citovsky, V. (2007) Advanced expression vector systems: new weapons
for plant research and biotechnology. Plant Physiology 145, 1087–1089.
Ufaz, S. and Galili, G. (2008) Improving the content of essential amino acids in crop plants: goals and
opportunities. Plant Physiology 147, 954–961.
Uga, Y., Fukuta, Y., Cai, H.W., Iwata, H., Ohsawa, R., Morishima, H. and Fujimura, T. (2003) Mapping QTLs
influencing rice floral morphology using recombinant inbred lines derived from a cross between Oryza
sativa L. and Oryza rufipogon Griff. Theoretical and Applied Genetics 107, 218–226.
Uga, Y., Nonoue, Y., Liang, Z.W., Lin, H.X., Yamamoto, S., Yamanouchi, U. and Yano, M. (2007) Accumulation
of additive effects generates a strong photoperiod sensitivity in the extremely late-heading rice cultivar
Nona Bokra. Theoretical and Applied Genetics 114, 1457–1466.
Ukai, Y., Osawa, R., Saito, A. and Hayashi, T. (1995) MAPL: a package of computer programs for construc-
tion of DNA polymorphism linkage maps and analysis of QTL (in Japanese). Breeding Science 45,
139–142.
Ulloa, M., Saha, S., Jenkins, J.N., Meredith, W.R., Jr, McCarty, J.C., Jr and Stelly, D.M. (2005) Chromosomal
assignment of RFLP linkage groups harboring important QTLs on an intraspecific cotton (Gossypium
hirsutum L.) joinmap. Journal of Heredity 96, 132–144.
Ungerer, M.C., Halldorsdottir, S.S., Purugganan, M.D. and Mackay, T.F.C. (2003) Genotype–environment
interactions at quantitative trait loci affecting inflorescence development in Arabidopsis thaliana.
Genetics 165, 353–365.
Ünlü, M., Morgan, M.E. and Minden, J.S. (1997) Difference gel electrophoresis: a single gel method for
detecting changes in protein extracts. Electrophoresis 18, 2071–2077.
Upadhyaya, H.D. and Ortiz, R. (2001) A mini core subset for capturing diversity and promoting utiliza-
tion of chickpea genetic resources in crop improvement. Theoretical and Applied Genetics 102,
1292–1298.
Upadhyaya, H.D., Bramel, P.J., Ortiz, R. and Singh, S. (2002) Developing a mini core of peanut for utilization
of genetic resources. Crop Science 42, 2150–2156.
Upadhyaya, H.D., Gowda, C.L.L., Pundir, R.P.S., Reddy, V.G. and Singh, S. (2006a) Development of core
subset of finger millet germplasm using geographical origin and data on 14 quantitative traits. Genetic
Resources and Crop Evolution 53, 679–685.
Upadhyaya, H.D., Reddy, L.J., Gowda, C.L.L., Reddy, K.N. and Singh, S. (2006b) Development of a mini
core subset for enhanced and diversified utilization of pigeonpea germplasm resources. Crop Science
46, 2127–2132.
UPOV (The International Union for the Protection of New Varieties of Plants) (1991) The 1991 Act of
the UPOV Convention. Available at: http://www.upov.int/en/publications/conventions/1991/content.htm
(accessed 17 November 2009).
UPOV (The International Union for the Protection of New Varieties of Plants) (2005) UPOV Report on the Impact
of Plant Variety Protection. UPOV Publication No. 353 (E), UPOV, Geneva, December 2005, 98 pp.
Urwin, P.E., McPheron, M.J. and Atkinson, H.J. (1998) Enhanced transgenic plant resistance to nematodes
by dual proteinase inhibitor constructs. Planta 204, 472–479.
Urwin, P., Yi, L., Martin, H., Atkinson, H. and Gilmartin, P.M. (2000) Functional characterization of the
EMCV IRES in plants. Plant Journal 24, 583–589.
USDA (United States Department of Agriculture) (2002a) Statistical indicators. Agricultural Outlook,
January–February 2002, Economic Research Service, USDA, Washington, DC, pp. 30–59.
USDA (United States Department of Agriculture) (2002b) Genetically engineered crops: US adoption and
impacts. Agricultural Outlook, September 2002, Economic Research Service, USDA, Washington,
DC, pp. 24–27.
Usuka, J., Zhu, W. and Brendel, V. (2000) Optimal spliced alignment of homologous cDNA to a genomic DNA
template. Bioinformatics 16, 203–211.
Utz, H.F. and Melchinger, A.E. (1994) Comparison of different approaches to interval mapping of quantita-
tive trait loci. In: van Ooijen, J.W. and Jansen, J. (eds) Biometrics in Plant Breeding: Applications of
Molecular Markers. Proceedings of the Ninth Meeting of the EUCARPIA Section Biometrics in Plant
Breeding, 6–8 July 1994, Wageningen, Netherlands, pp. 195–204.
Utz, H.F. and Melchinger, A.E. (1996) PLABQTL: a program for composite interval mapping of QTL. Journal
of Agricultural Genomics. Available at:
http://www.cabi-publishing.org/jag/papers96/paper196/indexp196.html (accessed 30 June 2007).
Utz, H.F., Melchinger, A.E. and Schön, C.C. (2000) Bias and sampling error of the estimated proportion of
genotypic variance explained by quantitative trait loci determined from experimental data in maize using
cross validation and validation with independent samples. Genetics 154, 1839–1849.
Vain, P., Afolabi, A.S., Worland, B. and Snape, J.W. (2003) Transgene behaviour in populations of rice
plants transformed using a new dual binary vector system: pGreen/pSoup. Theoretical and Applied
Genetics 107, 210–217.
Vallejos, C.E. and Chase, C.D. (1991) Linkage between isozyme markers and a locus affecting seed size
in Phaseolus vulgaris L. Theoretical and Applied Genetics 81, 413–419.
van Berloo, R. (1999) GGT: software for the display of graphical genotypes. Journal of Heredity 90,
328–329.
van Berloo, R. and Stam, P. (1998) Marker-assisted selection in autogamous RIL populations: a simulation
study. Theoretical and Applied Genetics 96, 147–154.
van Berloo, R. and Stam, P. (1999) Comparison between marker-assisted selection and phenotypical
selection in a set of Arabidopsis thaliana recombinant inbred lines. Theoretical and Applied Genetics
98, 113–118.
van Berloo, R. and Stam, P. (2001) Simultaneous marker-assisted selection for multiple traits in autogamous
crops. Theoretical and Applied Genetics 102, 1107–1112.
van Berloo, R., Aalbers, H., Werkman, A. and Niks, R.E. (2001) Resistance QTL confirmed through development
of QTL-NILs for barley leaf rust resistance. Molecular Breeding 8, 187–195.
van der Fits, L., Hilliou, F. and Memelink, J. (2001) T-DNA activation tagging as a tool to isolate regula-
tors of a metabolic pathway from a generally non-tractable plant species. Transgenic Research 10,
513–521.
van der Wurff, A.W., Chan, Y.L., Van Straalen, N.M. and Schouten, J. (2000) TE-AFLP: combining rapidity
and robustness in DNA fingerprinting. Nucleic Acids Research 28, e105.
van Deynze, A.E., Nelson, J.C., O'Donoughue, L.S., Ahn, S.N., Siripoonwiwat, W., Harrington, S.E.,
Yglesias, E.S., Braga, D.P., McCouch, S.R. and Sorrells, M.E. (1995a) Comparative mapping in
grasses. Oat relationships. Molecular and General Genetics 249, 349–356.
van Deynze, A.E., Nelson, J.C., O'Donoughue, L.S., Ahn, S.N., Siripoonwiwat, W., Harrington, S.E.,
Yglesias, E.S., Braga, D.P., McCouch, S.R. and Sorrells, M.E. (1995b) Comparative mapping in
grasses. Wheat relationships. Molecular and General Genetics 248, 744–754.
van Eeuwijk, F.A., Denis, J.-B. and Kang, M.S. (1996) Incorporating additional information on genotype and
environments in models for two-way genotype by environment tables. In: Kang, M.S. and Gauch, H.G.
(eds) Genotype-by-Environment Interaction. CRC Press, Boca Raton, Florida, pp. 15–50.
van Eeuwijk, F.A., Crossa, J., Vargas, M. and Ribaut, J.M. (2001) Variants of factorial regression for ana-
lysing QTL by environment interaction. In: Gallais, A., Dillmann, C. and Goldringer, I. (eds) Eucarpia,
Quantitative Genetics and Breeding Methods: the Way Ahead. Institut National de la Recherche
Agronomique (INRA) Editions, Versailles. Les colloques 96, 107–116.
van Eeuwijk, F.A., Crossa, J., Vargas, M. and Ribaut, J.-M. (2002) Analysing QTL by environment inter-
action by factorial regression, with an application to the CIMMYT drought and low nitrogen stress
programme in maize. In: Kang, M.S. (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB
International, Wallingford, UK, pp. 245–256.
van Eeuwijk, F.A., Malosetti, M., Yin, X., Struik, P.C. and Stam, P. (2004) Modeling differential pheno-
typic expression. In: New Directions for a Diverse Planet: Proceedings 4th International Crop Science
Congress (ICSC), 26 September–1 October 2004, Brisbane, Australia. ICSC, Brisbane, Australia.
Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17 November 2009).
van Eeuwijk, F.A., Malosetti, M., Yin, X., Struik, P.C. and Stam, P. (2005) Statistical models for genotype
by environment data: from conventional ANOVA models to eco-physiological QTL models. Australian
Journal of Agricultural Research 56, 883–894.
van Eijk, M., Peleman, J. and de Ruiter-Bleeker, M. (2001) Microsatellite-AFLP. Patent EP 1282729.
van Ginkel, M., Trethowan, R., Ammar, K., Wang, J. and Lillemo, M. (2002) Guide to bread wheat breeding
at CIMMYT (rev). Wheat special report No. 5. Centro Internacional de Mejoramiento de Maiz y Trigo
(CIMMYT), Mexico, DF.
van Oeveren, A.J. and Stam, P. (1992) Comparative simulation studies on the effects of selection for quantitative
traits in autogamous crops: early selection versus single seed descent. Heredity 69, 342–351.
van Ooijen, J.W. and Voorrips, R.E. (2001) JoinMap (tm) 3.0: Software for the Calculation of Genetic Linkage
Maps. Plant Research International, Wageningen, Netherlands.
van Ooijen, J.W. (1992) Accuracy of mapping quantitative trait loci in autogamous species. Theoretical and
Applied Genetics 84, 803–811.
van Os, H., Andrzejewski, S., Bakker, E., Barrena, I., Bryan, G.J., Caromel, B., Ghareeb, B., Isidore,
E., de Jong, W., van Koert, P., Lefebvre, V., Milbourne, D., Ritter, E., van der Voort, J.N.A.M.,
Rousselle-Bourgeois, E., van Vliet, J., Waugh, R., Visser, R.G.F., Bakker, J. and van Eck, H.J.
(2006) Construction of a 10,000-marker ultradense genetic recombination map of potato: provid-
ing a framework for accelerated gene isolation and a genomewide physical map. Genetics 173,
1075–1087.
van Treuren, R. (2001) Efficiency of reduced primer selectivity and bulked DNA analysis for the rapid detection
of AFLP polymorphisms in a range of crop species. Euphytica 117, 27–37.
van Wijk, K.J. (2001) Challenges and prospects of plant proteomics. Plant Physiology 126, 301–308.
Vandepoele, K. and Van de Peer, Y. (2005) Exploring the plant transcriptome through phylogenetic profiling.
Plant Physiology 137, 31–42.
Vaneck, J.M., Blowers, A.D. and Earle, E.D. (1995) Stable transformation of tomato cell-cultures after bombardment
with plasmid and YAC DNA. Plant Cell Reports 14, 299–304.
Vane-Wright, R.I., Humphries, D.J. and Williams, P.H. (1991) What to protect? Systematics and the agony
of choice. Biological Conservation 55, 235–254.
Varela, M., Crossa, J., Rane, J., Joshi, A. and Trethowan, R. (2006) Analysis of a three-way interaction
including multi-attributes. Australian Journal of Agricultural Research 57, 1185–1193.
Vargas, M., van Eeuwijk, F.A., Crossa, J. and Ribaut, J.-M. (2006) Mapping QTL and QTL × environment
interaction for CIMMYT maize drought stress program using factorial regression and partial least
squares methods. Theoretical and Applied Genetics 112, 1009–1023.
Varshney, R.K., Graner, A. and Sorrells, M.E. (2005a) Genic microsatellite markers in plants: features and
applications. Trends in Biotechnology 23, 48–55.
Varshney, R.K., Graner, A. and Sorrells, M.E. (2005b) Genomics-assisted breeding for crop improvement.
Trends in Plant Science 10, 621–630.
Varshney, R.K., Nayak, S.N., May, G.D. and Jackson, S.A. (2009) Next-generation sequencing technologies
and their implications for crop genetics and breeding. Trends in Biotechnology 27, 522–530.
Vavilov, N.I. (1926) Studies on the origin of cultivated plants. Bulletin of Applied Botany, Genetics and Plant
Breeding 16, 1–248.
Veena, J.H., Doerge, R.W. and Gelvin, S. (2003) Transfer of T-DNA and Vir proteins to plant cells by
Agrobacterium tumefaciens induces expression of host genes involved in mediating transformation
and suppresses host defense gene expression. The Plant Journal 35, 219–236.
Velculescu, V.E., Zhang, L., Vogelstein, B. and Kinzler, K.W. (1995) Serial analysis of gene expression.
Science 270, 484–487.
Veldboom, L.R., Lee, M. and Woodman, W.L. (1994) Molecular-marker-facilitated studies in an elite maize
population: I. Linkage analysis and determination of QTL for morphological traits. Theoretical and
Applied Genetics 88, 7–16.
Veldboom, L.R., Lee, M. and Woodman, W.L. (1996) Molecular-marker-facilitated studies in an elite maize
population: I. Linkage analysis and determination of QTL for morphological traits. Theoretical and
Applied Genetics 88, 7–16.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J. et al. (2001) The sequence of the human
genome. Science 291, 1304–1351.
Verbyla, A.P., Eckermann, P.J., Thompson, R. and Cullis, B.R. (2003) The analysis of quantitative trait
loci in multi-environment trials using a multiplicative mixed model. Australian Journal of Agricultural
Research 54, 1395–1408.
Verdonk, J.C., De Vos, C.H.R., Verhoeven, H.A., Haring, M.A., van Tunen, A.J. and Schuurink, R.C. (2003)
Regulation of floral scent production in petunia revealed by targeted metabolomics. Phytochemistry
62, 997–1008.
Verhaegen, D., Plomion, C., Gion, J.-M., Poitel, M., Costa, P. and Kremer, A. (1997) Quantitative trait
dissection analysis in Eucalyptus using RAPD markers: 1. Detection of QTL in interspecific hybrid
progeny, stability of QTL expression across different ages. Theoretical and Applied Genetics 95,
597–608.
Verweire, D., Verleyen, K., Buck, S.D., Claeys, M. and Angenon, G. (2007) Marker-free transgenic plants
through genetically programmed auto-excision. Plant Physiology 145, 1220–1231.
Veyrieras, J.-B., Goffinet, B. and Charcosset, A. (2007) MetaQTL: a package of new computational
methods for the meta-analysis of QTL mapping experiments. BMC Bioinformatics 8, 49.
Vickers, C., Xue, G. and Gresshoff, P.M. (2006) A novel cis-acting element, ESP, contributes to high-level
endosperm-specific expression in an oat globulin promoter. Plant Molecular Biology 62, 195–214.
Vigouroux, Y., Mitchell, S., Matsuoka, Y., Hamblin, M., Kresovich, S., Smith, J.S.C., Jaqueth, J., Smith, O.S.
and Doebley, J. (2005) An analysis of genetic diversity across the maize genome using microsatellites.
Genetics 169, 1617–1630.
Villar, M., Lefevre, F., Bradshaw, H.D., Jr and du-Cros, E.T. (1996) Molecular genetics of rust resistance in
poplars (Melampsora larici-populina Kleb/Populus sp.) by bulked segregant analysis in a 2 × 2 factorial
mating design. Genetics 143, 531–536.
Virk, P.S., Ford-Lloyd, B.V., Jackson, M.T., Pooni, H.S., Clemeno, T.P. and Newbury, H.J. (1996) Predicting
quantitative variation within rice germplasm using molecular markers. Heredity 76, 296–304.
Vision, T.J., Brown, D.G., Shmoys, D.B., Durrett, R.T. and Tanksley, S.D. (2000) Selective mapping: a strategy
for optimizing the construction of high-density linkage maps. Genetics 155, 407–420.
Visscher, P.M. and Goddard, M.E. (2004) Prediction of the confidence interval of quantitative trait loci location.
Behavior Genetics 34, 477–482.
Visscher, P.M., Thompson, R. and Haley, C.S. (1996) Confidence intervals in QTL mapping by bootstrapping.
Genetics 143, 1013–1020.
Visscher, P.M., Hill, W.G. and Wray, N.R. (2008) Heritability in the genomics era – concepts and misconceptions.
Nature Reviews Genetics 9, 255–266.
Vogl, C. and Xu, S. (2000) Multipoint mapping of viability and segregation distorting loci using molecular
markers. Genetics 155, 1439–1447.
Vos, P., Hogers, R., Bleeker, M., Reijans, M., van de Lee, T., Hornes, M., Frijters, A., Pot, J., Peleman,
J., Kuiper, M. and Zabeau, M. (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids
Research 23, 4407–4414.
Vuylsteke, M., Kuiper, M. and Stam, P. (2000) Chromosomal regions involved in hybrid performance and
heterosis: their AFLP-based identification and practical uses in prediction models. Heredity 85,
208–218.
Walden, I. (1998) Preserving diversity: the role of property rights. In: Swanson, T.M. (ed.) Intellectual
Property Rights and Biodiversity Conservation. Cambridge University Press, Cambridge, UK,
pp. 176–197.
Walker, D., Boerma, H.R., All, J. and Parrott, W. (2002) Combining cry1Ac with QTL alleles from PI 229358
to improve soybean resistance to lepidopteran pests. Molecular Breeding 9, 43–51.
Walker, D.R., Narvel, J.M., Boerma, H.R., All, J.N. and Parrott, W.A. (2004) A QTL that enhances and
broadens Bt insect resistance in soybean. Theoretical and Applied Genetics 109, 1051–1057.
Wallace, D.H. (1985) Physiological genetics of plant maturity, adaptation and yield. Plant Breeding Reviews
3, 21–158.
Wallace, R.B., Shaffer, J., Murphy, R.F., Bonner, J., Hirose, T. and Itakura, K. (1979) Hybridization of syn-
thetic oligodeoxyribonucleotide to phi 174 DNA: the effect of single base pair mismatch. Nucleic Acids
Research 6, 3543–3557.
Walling, G.A., Visscher, P.M., Andersson, L., Rothschild, M.F., Wang, L., Moser, G., Groenen, A.M.,
Bidanel, J.P., Cepica, S., Archibald, A.L., Geldermann, H., Koning, D.J., Milan, D. and Haley, C.S.
(2000) Combined analysis of data from quantitative trait loci mapping studies: chromosome 4 effects
on porcine growth and fatness. Genetics 155, 1369–1378.
Walløe Tvedt, M.W. (2005) How will a Substantive Patent Law Treaty affect the public domain for genetic
resources and biological material? Journal of World Intellectual Property 8, 311–344.
Walsh, B. (2001) Quantitative genetics in the age of genomics. Theoretical Population Biology 59,
175–184.
Walsh, B. (2004) Population- and quantitative-genetic models of selection limits. Plant Breeding Reviews
24 (Part 1), 177–225.
Wan, S., Wu, J., Zhang, Z., Sun, X., Lv, Y., Gao, C., Ning, Y., Ma, J., Guo, Y., Zhang, Q., Zheng, X., Zhang,
C., Ma, Z. and Lu, T. (2008) Activation tagging, an efficient tool for functional analysis of the rice
genome. Plant Molecular Biology 69, 69–80.
Wang, D.L., Zhu, J., Li, Z.K. and Paterson, A.H. (1999) Mapping QTLs with epistatic effects and QTL ×
environment interactions by mixed linear model approaches. Theoretical and Applied Genetics 99,
1255–1264.
Wang, E., Robertson, M.J., Hammer, G.L., Carberry, P.S., Holzworth, D., Meinke, H., Chapman, S.C.,
Hargreaves, J.N.G., Huth, N.I. and McLean, G. (2002) Development of a generic crop model template
in the cropping system model APSIM. European Journal of Agronomy 18, 121–140.
Wang, G.-L., Mackill, D.J., Bonman, J.M., McCouch, S.R., Champoux, M.C. and Nelson, R.J. (1994) RFLP
mapping of genes conferring complete and partial resistance to blast in a durably resistant rice cultivar.
Genetics 136, 1421–1434.
Wang, G.W., He, Y.Q., Xu, C.G. and Zhang, Q. (2005) Identification and confirmation of three neutral alleles
conferring wide compatibility in inter-subspecific hybrids of rice (Oryza sativa L.) using near-isogenic
lines. Theoretical and Applied Genetics 111, 702–710.
Wang, G.W., He, Y.Q., Xu, C.G. and Zhang, Q. (2006) Fine mapping of f5-Du, a gene conferring wide-
compatibility for pollen fertility in inter-subspecific hybrids of rice (Oryza sativa L.). Theoretical and
Applied Genetics 112, 382–387.
Wang, H., Zhang, Y.M., Li, X., Masinde, G.L., Mohan, S., Baylink, D.J. and Xu, S. (2005) Bayesian shrinkage
estimation of quantitative trait loci parameters. Genetics 170, 465–480.
Wang, J., van Ginkel, M., Podlich, D., Ye, G., Trethowan, R., Pfeiffer, W., DeLacy, I.H., Cooper, M. and Rajaram,
S. (2003) Comparison of two breeding strategies by computer simulation. Crop Science 43, 1764–1773.
Wang, J., van Ginkel, M., Trethowan, R., Ye, G., DeLacy, I., Podlich, D. and Cooper, M. (2004) Simulating the
effects of dominance and epistasis on selection response in the CIMMYT Wheat Breeding Program
using QuCim. Crop Science 44, 2006–2018.
Wang, J., Eagles, H.A., Trethowan, R. and van Ginkel, M. (2005) Using computer simulation of the selec-
tion process and known gene information to assist in parental selection in wheat quality breeding.
Australian Journal of Agricultural Research 56, 465–473.
Wang, J., Chapman, S.C., Bonnett, D.G., Rebetzke, G.J. and Crouch, J. (2007) Application of population
genetic theory and simulation models to efficiently pyramid multiple genes via marker-assisted selection.
Crop Science 47, 582–588.
Wang, J.K. and Bernardo, R. (2000) Variance and marker estimates of parental contribution to F2 and BC1-derived
inbreds. Crop Science 40, 659–665.
Wang, J.K. and Pfeiffer, W.H. (2007) Simulation modeling in plant breeding: principles and applications.
Agricultural Sciences in China 6, 908–921.
Wang, X., Rea, T., Bian, J., Gray, S. and Sun, Y. (1999) Identification of the gene responsive to etoposide-induced
apoptosis: application of DNA chip technology. FEBS Letters 445, 269–273.
Wang, X., Hu, Z., Wang, W., Li, Y., Zhang, Y.M. and Xu, C. (2007) A mixture model approach to the mapping
of QTL controlling endosperm traits with bulked samples. Genetica 132, 59–70.
Wang, X.Y., Chen, P.D. and Zhang, S.Z. (2001) Pyramiding and marker-assisted selection for powdery
mildew resistance genes in common wheat. Acta Genetica Sinica 28, 640–646 (in Chinese; summary
in English).
Wang, Y., Chen, B., Hu, Y., Li, J. and Lin, Z. (2005) Inducible excision of selectable marker gene from transgenic
plants by the Cre/lox site-specific recombination system. Transgenic Research 14, 605–614.
Wang, Y.H., Liu, S.J., Ji, S.L., Zhang, W.W., Wang, C.M., Jiang, L. and Wan, J.M. (2005) Fine mapping and
marker-assisted selection (MAS) of a low glutelin content gene in rice. Cell Research 15, 622–630.
Wang, Z., Zou, Y., Li, X., Zhang, Q., Chen, L., Wu, H., Su, D., Chen, Y., Guo, J., Luo, D., Long, Y., Zhong, Y.
and Liu, Y.G. (2006) Cytoplasmic male sterility of rice with Boro II cytoplasm is caused by a cytotoxic
peptide and is restored by two related PPR motif genes via distinct modes of mRNA silencing. The
Plant Cell 18, 676–687.
Ware, D. and Stein, L. (2003) Comparison of genes among cereals. Current Opinion in Plant Biology 6,
121–127.
Ware, D.H., Jaiswal, P., Ni, J., Yap, I.V., Pan, X., Clark, K.Y., Teytelman, L., Schmidt, S.C., Zhao, W., Chang,
K., Cartinhour, S., Stein, L.D. and McCouch, S.R. (2002) Gramene, a tool for grass genomics. Plant
Physiology 130, 1606–1613.
Warthmann, N., Chen, H., Ossowski, S., Weigel, D. and Hervé, P. (2008) Highly specific gene silencing by
artificial miRNAs in rice. PLoS ONE 3(3), e1829.
Wassom, J.J., Wong, J.C., Martinez, E., King, J.J., DeBaene, J., Hotchkiss, J.R., Mikkilineni, V., Bohn, M.O.
and Rocheford, T.R. (2008) QTL associated with maize kernel oil, protein and starch concentrations;
kernel mass; and grain yield in Illinois High Oil × B73 backcross-derived lines. Crop Science 48,
243–252.
Waugh, R., McLean, K., Flavell, A.J., Pearce, S.R., Kumar, A. and Thomas, B.B.T. (1997) Genetic distribu-
tion of Bare-1 like retrotransposable elements in the barley genome revealed by sequence-specific
amplification polymorphism (S-SAP). Molecular and General Genetics 253, 687–694.
Wayne, M.L. and McIntyre, L.M. (2002) Combining mapping and arraying: an approach to candidate gene
identification. Proceedings of the National Academy of Sciences of the United States of America 99,
14903–14906.
Weber, A.L., Briggs, W.H., Rucker, J., Baltazar, B.M., Sánchez-González, J.D.J., Feng, P., Buckler, E.S. and
Doebley, J. (2008) The genetic architecture of complex traits in teosinte (Zea mays ssp. parviglumis):
new evidence from association mapping. Genetics 180, 1221–1232.
Weckwerth, W. (2003) Metabolomics in systems biology. Annual Review of Plant Biology 54, 669–689.
Weckwerth, W., Wenzel, K. and Fiehn, O. (2004) Process for the integrated extraction, identification and
quantification of metabolites, proteins and RNA to reveal their co-regulation in biochemical networks.
Proteomics 4, 78–83.
Wehrhahn, C. and Allard, R.W. (1965) The detection and measurement of the effects of individual genes
involved in the inheritance of a quantitative character in wheat. Genetics 51, 109–119.
Weigel, D. and Nordborg, M. (2005) Natural variation in Arabidopsis. How do we find the causal genes?
Plant Physiology 138, 567–568.
Weigel, D., Ahn, J.H., Blazquez, M.A., Borevitz, J.O., Christensen, S.K., Fankhauser, C., Ferrandiz, C.,
Kardailsky, I., Malancharuvil, E.J., Neff, M.M., Nguyen, J.T., Sato, S., Wang, Z., Xia, Y., Dixon, R.A.,
Harrison, M.J., Lamb, C.J., Yanofsky, M.F. and Chory, J. (2000) Activation tagging in Arabidopsis. Plant
Physiology 122, 1003–1013.
Weir, B.S. (1990) Genetic Data Analysis, Methods for Discrete Population Genetic Data. Sinauer Associates,
Inc., Sunderland, Massachusetts, pp. 222–260.
Weir, B.S. (1996) Genetic Data Analysis II. Sinauer Associates, Inc., Sunderland, Massachusetts, 376 pp.
Weise, S., Grosse, I., Klukas, C., Koschützki, D., Scholz, U., Schreiber, F. and Junker, B.H. (2006) Meta-All:
a system for managing metabolic pathway information. BMC Bioinformatics 7, 465.
Welch, R.M. and Graham, R.D. (2004) Breeding for micronutrients in staple food crops from a human nutrition
perspective. Journal of Experimental Botany 55, 353–364.
Welch, S.M., Dong, Z. and Roe, J.L. (2004) Modeling gene networks controlling transition to flowering in
Arabidopsis. In: New Directions for a Diverse Planet: Proceedings 4th International Crop Science
Congress (ICSC), 26 September–1 October 2004, Brisbane, Australia. ICSC, Brisbane, Australia.
Available at: http://www.cropscience.org.au/icsc2004/ (accessed 17 November 2009).
Welsh, J. and McClelland, M. (1990) Fingerprinting genomes using PCR with arbitrary primers. Nucleic
Acids Research 18, 7213–7218.
Wenck, A. and Hansen, G. (2004) Positive selection. In: Peña, L. (ed.) Methods in Molecular Biology, Vol. 286.
Transgenic Plants: Methods and Protocols. Humana Press Inc., Totowa, New Jersey, pp. 227–235.
Wenzel, W.G. and Pretorius, A.J. (2000) Heterosis and xenia in sorghum malt quality. South African Journal
of Plant and Soil 17, 66–69.
Wenzl, P., Carling, J., Kudrna, D., Jaccoud, D., Huttner, E., Kleinhofs, A. and Kilian, A. (2004) Diversity
array technology (DArT) for whole-genome profiling of barley. Proceedings of the National Academy
of Sciences of the United States of America 101, 9915–9920.
Wenzl, P., Li, H., Carling, J., Zhou, M., Raman, H., Paul, E., Hearnden, P., Maier, C., Xia, L., Caig, V.,
Ovesná, J., Cakir, M., Poulsen, D., Wang, J., Raman, R., Smith, K.P., Muehlbauer, G.J., Chalmers,
K.J., Kleinhofs, A., Huttner, E. and Kilian, A. (2006) A high-density consensus map of barley linking
DArT markers to SSR, RFLP and STS loci and agricultural traits. BMC Genomics 7, 206.
Werner, K., Friedt, W. and Ordon, F. (2005) Strategies for pyramiding resistance genes against the barley
yellow mosaic virus complex (BaMMV, BaYMV, BaYMV-2). Molecular Breeding 16, 45–55.
Wesley, S.V., Helliwell, C.A., Smith, N.A., Wang, M.B., Rouse, D.T., Liu, Q., Gooding, P.S., Singh, S.P.,
Abbott, D., Stoutjesdijk, P.A., Robinson, S.P., Gleave, A.P., Green, A.G. and Waterhouse, P.M. (2001)
Construct design for efficient, effective and high-throughput gene silencing in plants. The Plant Journal
27, 581–590.
Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio,
M., Edgar, R., Federhen, S., Feolo, M., Geer, L.Y., Helmberg, W., Kapustin, Y., Khovayko, O.,
Landsman, D., Lipman, D.J., Madden, T.L., Maglott, V., Miller, D.R., Ostell, J., Pruitt, K.D., Schuler,
G.D., Shumway, M., Sequeira, E., Sherry, S.T., Sirotkin, K., Souvorov, A., Starchenko, G., Tatusov,
R.L., Tatusova, T.A., Wagner, L. and Yaschenko, E. (2007) Database resources of the National Center
for Biotechnology Information. Nucleic Acids Research 36, D13–D21.
White, J.W. and Hoogenboom, G. (1996) Integrating effects of genes for physiological traits into crop growth
models. Agronomy Journal 88, 416–422.
White, J.W. and Hoogenboom, G. (2003) Gene-based approaches to crop simulation: past experiences
and future opportunities. Agronomy Journal 95, 52–64.
White, P.J. and Broadley, M.R. (2005) Biofortifying crops with essential mineral elements. Trends in Plant
Science 10, 586–593.
White, P.R. (1934) Potentially unlimited growth of excised tomato root tips in a liquid medium. Plant
Physiology 9, 585–600.
Whitelaw, C.A., Barbazuk, W.B., Pertea, G., Chan, A.P., Cheung, F., Lee, Y., Zheng, L., van Heeringen,
S., Karamycheva, S., Bennetzen, J.L., SanMiguel, P., Lakey, N., Bedell, J., Yuan, Y., Budiman, M.A.,
Resnick, A., van Aken, S., Utterback, T., Riedmuller, S., Williams, M., Feldblyum, T., Schubert, K.,
Beachy, R., Fraser, C.M. and Quackenbush, J. (2003) Enrichment of gene-coding sequences in maize
by genome filtration. Science 302, 2118–2120.
Whitelegge, J.P. (2002) Plant proteomics: BLASTing out of a MudPIT. Proceedings of the National Academy
of Sciences of the United States of America 99, 11564–11566.
Whitesides, G.M. (2006) The origins and the future of microfluidics. Nature 442, 368–373.
Whittaker, J.C., Haley, C.S. and Thompson, R. (1997) Optimal weighting of information in marker-assisted
selection. Genetical Research 69, 137–144.
Wiemann, S., Weil, B., Wellenreuther, R., Gassenhuber, J., Glassl, S., Ansorge, W., Bocher, M., Blocker,
H., Bauersachs, S., Blum, H., Lauber, J., Düsterhöft, A., Beyer, A., Köhrer, K., Strack, N., Mewes,
H.-W., Ottenwälder, B., Obermaier, B., Tampe, J., Heubner, D., Wambutt, R., Korn, B., Klein, M. and
Poustka, A. (2001) Toward a catalog of human genes and proteins: sequencing and analysis of 500
novel complete protein coding human cDNAs. Genome Research 11, 422–435.
Wilkes, G. (1993) Germplasm collections: their use, potential, social responsibility and genetic vulnerability.
In: Buxton, D.R., Shibles, R., Forsberg, R.A., Blad, B.L., Asay, K.H., Paulsen, G.M. and Wilson,
R.F. (eds) International Crop Science I. Crop Science Society of America, Madison, Wisconsin,
pp. 445–450.
Wilkinson, M., Schoof, H., Ernst, R. and Haase, D. (2005) BioMOBY successfully integrates distributed
heterogeneous bioinformatics web services. The PlaNet Exemplar case. Plant Physiology
138, 5–7.
Wilkins-Stevens, P., Hall, J.G., Lyamichev, V., Neri, B.P., Lu, M., Wang, L., Smith, L.M. and Kelso, D.M.
(2001) Analysis of single nucleotide polymorphisms with solid phase invasive cleavage reactions.
Nucleic Acids Research 29, e77.
William, H.M., Morris, M., Warburton, M. and Hoisington, D.A. (2007a) Technical, economic and policy
considerations on marker-assisted selection in crops: lessons from the experience at an international
agricultural research center. In: Guimarães, E.P., Ruane, J., Scherf, B.D., Sonnino, A. and Dargie,
J.D. (eds) Marker-Assisted Selection, Current Status and Future Perspectives in Crops, Livestock,
Forestry and Fish. Food and Agriculture Organization of the United Nations, Rome, pp. 381–404.
William, H.M., Trethowan, R. and Crosby-Galvan, E.M. (2007b) Wheat breeding assisted by markers:
CIMMYT's experience. Euphytica 157, 307–319.
Williams, C.E. and St Clair, D.A. (1993) Phenetic relationships and levels of variability detected by restric-
tion fragment length polymorphism and random amplified polymorphic DNA analysis of cultivated and
wild accessions of Lycopersicon esculentum. Genome 36, 619–630.
Williams, E.J. (1952) The interpretation of interactions in factorial experiments. Biometrika 39, 65–81.
Williams, J.G.K., Kubelik, A.R., Livak, K.J., Rafalski, J.A. and Tingey, S.V. (1990) DNA polymorphisms amplified
by arbitrary primers are useful as genetic markers. Nucleic Acids Research 18, 6531–6535.
Williams, J.S. (1962) The evaluation of a selection index. Biometrics 18, 375–393.
Wilson, J.A. (1968) Problems in hybrid wheat breeding. Euphytica 17 (Suppl. 1), 13–33.
Wilson, L.M., Whitt, S.R., Ibanez, A.M., Rocheford, T.R., Goodman, M.M. and Buckler IV, E.S. (2004)
Dissection of maize kernel composition and starch production by candidate gene association. The
Plant Cell 16, 2719–2733.
Wilson, P. and Driscoll, C.J. (1983) Hybrid wheat. In: Frankel, R. (ed.) Monographs on Theoretical and
Applied Genetics, Vol. 6. Heterosis. Springer-Verlag, Berlin, pp. 94–123.
Wilson, W.A., Harrington, S.E., Woodman, W.L., Lee, M., Sorrells, M.E. and McCouch, S.R. (1999)
Inferences on the genome structure of progenitor maize through comparative analysis of rice, maize
and the domesticated panicoids. Genetics 153, 453–473.
Windsor, A.J. and Mitchell-Olds, T. (2006) Comparative genomics as a tool for gene discovery. Current
Opinion in Biotechnology 17, 1–7.
Wingbermuehle, W.J., Gustus, C. and Smith, K.P. (2004) Exploiting selective genotyping to study genetic
diversity of resistance to Fusarium head blight in barley. Theoretical and Applied Genetics 109,
1160–1168.
Wink, M. (1988) Plant breeding: importance of plant secondary metabolites for protection against pathogens
and herbivores. Theoretical and Applied Genetics 75, 225–233.
Winkler, R.G. and Feldman, K.A. (1998) PCR-based identification of T-DNA insertion mutants. Methods in
Molecular Biology 82, 129–136.
Winzeler, E.A., Richards, D.R., Conway, A.R., Goldstein, A.L., Kalman, S., McCullough, M.J., McCusker,
J.H., Stevens, D.A., Wodicka, L., Lockhart, D.J. and Davis, R.W. (1998) Direct allelic variation scanning
of the yeast genome. Science 281, 1194–1197.
Wishart, D.S., Tzur, D., Knox, C., Eisner, R., Guo, A.C., Young, N., Cheng, D., Jewell, K., Arndt, D., Sawhney,
S., Fung, C., Nikolai, L., Lewis, M., Coutouly, M.-A., Forsythe, I., Tang, P., Shrivastava, S., Jeroncic, K.,
Stothard, P., Amegbey, G., Block, D., Hau, D.D., Wagner, J., Miniaci, J., Clements, M., Gebremedhin,
M., Guo, N., Zhang, Y., Duggan, G.E., Macinnis, G.D., Weljie, A.M., Dowlatabadi, R., Bamforth, F.,
Clive, D., Greiner, R., Li, L., Marrie, T., Sykes, B.D., Vogel, H.J. and Querengesser, L. (2007) HMDB:
The Human Metabolome Database. Nucleic Acids Research 35 (Database issue), D521–D526.
Witcombe, J.R. (1996) Participatory approaches to plant breeding and selection. Biotechnology and
Development Monitor 29, 2–6.
Witcombe, J.R. and Hash, C.T. (2000) Resistance gene deployment strategies in cereal hybrids using
marker-assisted selection: gene pyramiding, three-way hybrids and synthetic parent populations.
Euphytica 112, 175–186.
Withers, L.A. (1993) New technologies for the conservation of plant genetic resources. In: Buxton, D.R.,
Shibles, R., Forsberg, R.A., Blad, B.L., Asay, K.H., Paulsen, G.M. and Wilson, R.F. (eds) International
Crop Science I. Crop Science Society of America, Madison, Wisconsin, pp. 429–435.
Withers, L.A. (1995) Collecting in vitro for genetic resources conservation. In: Guarino, L., Ramanatha
Rao, V. and Reid, R. (eds) Collecting Plant Genetic Diversity. CAB International, Wallingford, UK, pp.
511–515.
Wold, B. and Myers, R.M. (2008) Sequence census methods for functional genomics. Nature Methods 5,
19–21.
Wolf, Y.I., Rogozin, I.B., Grishin, N.V. and Koonin, E.V. (2003) Genome-scale phylogenetic trees. In: Frontiers
in Computational Genomics. Caister Academic Press, Wymondham, UK, pp. 241–260.
Wollenweber, B., Porter, J.R. and Lübberstedt, T. (2005) Need for multidisciplinary research towards a
second green revolution. Commentary. Current Opinion in Plant Biology 8, 337–341.
Wong, D.W.S. (1997) The ABCs of Gene Cloning. Chapman & Hall, New York.
Worland, A.J. and Law, C.N. (1986) Genetic analysis of chromosome 2D of wheat. I. The location of genes
affecting height, daylength insensitivity, hybrid dwarfism and yellow-rust resistance. Zeitschrift für
Pflanzenzüchtung 96, 331–345.
Wouters, F.S., Verveer, P.J. and Bastiaens, P.I.H. (2001) Imaging biochemistry inside cells. Trends in Cell
Biology 11, 203–221.
Wright, A.J. and Mowers, R.P. (1994) Multiple regression for molecular-marker, quantitative trait data from
large F2 populations. Theoretical and Applied Genetics 89, 305–312.
Wright, S. (1921a) Correlation and causation. Journal of Agricultural Research 20, 557–585.
Wright, S. (1921b) Systems of mating I. The biometric relations between parent and offspring. Genetics 6,
111–123.
Wright, S. (1978) Evolution and Genetics of Populations, Vol. IV. The University of Chicago Press, Chicago,
Illinois.
Wright, S.I., Bi, I.V., Schroeder, S.G., Yamasaki, M., Doebley, J.F., McMullen, M.D. and Gaut, B.S. (2005)
The effects of artificial selection on the maize genome. Science 308, 1310–1314.
Wu, C., Li, X.J., Yuan, W.Y., Chen, G.X., Kilian, A., Li, J., Xu, C., Li, X.H., Zhou, D.-X., Wang, S. and Zhang,
Q. (2003) Development of enhancer trap lines for functional analysis of the rice genome. The Plant
Journal 35, 418–427.
Wu, H., Sparks, C., Amoah, B. and Jones, H.D. (2003) Factors influencing successful Agrobacterium-mediated
genetic transformation of wheat. Plant Cell Reports 21, 659–668.
Wu, H., Sparks, C. and Jones, H.D. (2006) Characterization of T-DNA loci and vector backbone sequences
in transgenic wheat produced by Agrobacterium-mediated transformation. Molecular Breeding 18,
195–208.
Wu, L., Nandi, S., Chen, L., Rodriguez, R.L. and Huang, N. (2002) Expression and inheritance of nine
transgenes in rice. Transgenic Research 11, 533–541.
Wu, M.S., Wang, S.C. and Dai, J.R. (2000) Application of AFLP markers to heterotic grouping of elite maize
inbred lines. Acta Agronomica Sinica 26, 9–13.
Wu, R., Lou, X.Y., Ma, C.X., Wang, X., Larkins, B.A. and Casella, G. (2002a) An improved genetic model
generates high-resolution mapping of QTL for protein quality in maize endosperm. Proceedings of the
National Academy of Sciences of the United States of America 99, 11281–11286.
Wu, R., Ma, C.-S. and Casella, G. (2002b) Joint linkage and linkage disequilibrium mapping of qualitative
trait loci in natural mapping populations. Genetics 160, 779–792.
Wu, R., Ma, C.X., Gallo-Meagher, M., Littell, R.C. and Casella, G. (2002c) Statistical methods for dis-
secting triploid endosperm traits using molecular markers: an autogamous model. Genetics 162,
875–892.
Wu, R., Ma, C.-X., Lin, M., Wang, Z. and Casella, G. (2004) Functional mapping of quantitative trait loci
underlying growth trajectories using the transform-both-sides of the logistic model. Biometrics 60,
729–738.
Wu, R., Ma, C. and Casella, G. (2007) Statistical Genetics of Quantitative Traits: Linkage, Maps and QTL
(Statistics for Biology and Health). Springer, Berlin.
Wu, R.L. and Lin, M. (2006) Functional mapping: how to map and study the genetic architecture of dynamic
complex traits. Nature Reviews Genetics 7, 229–237.
Wu, R.L. and Zeng, Z.B. (2001) Joint linkage and linkage disequilibrium mapping in natural populations.
Genetics 157, 899–909.
Wu, W., Zhou, Y., Li, W., Mao, D. and Chen, Q. (2002) Mapping of quantitative trait loci based on growth
models. Theoretical and Applied Genetics 105, 1043–1049.
Wu, W.R. and Li, W.M. (1994) A new approach for mapping quantitative trait loci using complete genetic
marker linkage maps. Theoretical and Applied Genetics 89, 535–539.
Wu, W.R. and Li, W.M. (1996) Model fitting and model testing in the method of joint mapping of quantitative
trait loci. Theoretical and Applied Genetics 92, 477–482.
Wu, W.-R., Li, W.-M., Tang, D.-Z., Lu, H.-R. and Worland, A.J. (1999) Time-related mapping of quantitative
trait loci underlying tiller number in rice. Genetics 151, 297–303.
Xi, Z.Y., He, F.H., Zeng, R.Z., Zhang, Z.M., Ding, X.H., Li, W.T. and Zhang, G.Q. (2006) Development of
a wide population of chromosome single segment substitution lines in the genetic background of an
elite cultivar of rice (Oryza sativa L.). Genome 49, 476–484.
Xia, L., Peng, K., Yang, S., Wenzl, P., de Vincente, M.C., Fregene, M. and Kilian, A. (2005) DArT for high-
throughput genotyping of cassava (Manihot esculenta) and its wild relatives. Theoretical and Applied
Genetics 110, 1092–1098.
Xia, X.C., Reif, J.C., Melchinger, A.E., Frisch, M., Hoisington, D.A., Beck, D., Pixley, K. and Warburton,
M.L. (2005) Genetic diversity among CIMMYT maize inbred lines investigated with SSR markers: II.
Subtropical, tropical mid-altitude and highland maize inbred lines and their relationships with elite U.S.
and European maize. Crop Science 45, 2573–2582.
Xiang, C., Han, P., Lutziger, I., Wang, K. and Oliver, D.J. (1999) A mini binary vector series for plant transformation.
Plant Molecular Biology 40, 711–717.
Xiao, J., Li, J., Yuan, L. and Tanksley, S.D. (1995) Dominance is the major genetic basis of heterosis in rice
as revealed by QTL analysis using molecular markers. Genetics 140, 745–754.
Xiao, J., Grandillo, S., Ahn, S.N., McCouch, S.R., Tanksley, S.D., Li, J. and Yuan, L. (1996a) Genes from
wild rice improve yield. Nature 384, 223–224.
Xiao, J., Li, J., Yuan, L., McCouch, S.R. and Tanksley, S.D. (1996b) Genetic diversity and its relationship to
hybrid performance and heterosis in rice as revealed by PCR-based markers. Theoretical and Applied
Genetics 92, 637–643.
Xiao, J., Li, J., Yuan, L. and Tanksley, S.D. (1996c) Identification of QTLs affecting traits of agronomic
importance in a recombinant inbred population derived from subspecific rice cross. Theoretical and
Applied Genetics 92, 230–244.
Xiao, J., Li, L., Grandillo, S., Yuan, L., Tanksley, S.D. and McCouch, S.R. (1998) Identification of trait-improving
quantitative trait loci alleles from a wild rice relative, Oryza rufipogon. Genetics 150, 899–909.
Xiong, Q., Qiu, Y. and Gu, W. (2008) PGMapper: a web-based tool linking phenotype to genes. Bioinformatics
24, 1011–1013.
Xu, C., He, X. and Xu, S. (2003) Mapping quantitative trait loci underlying triploid endosperm traits. Heredity
90, 228–235.
Xu, S. (1996) Mapping quantitative trait loci using four-way crosses. Genetical Research 68, 175–181.
Xu, S. (1998) Mapping quantitative trait loci using multiple families of line crosses. Genetics 148,
517–524.
Xu, S. (2002) QTL analysis in plants. In: Camp, N.J. and Cox, A. (eds) Methods in Molecular Biology,
Vol. 195. Quantitative Trait Loci: Methods and Protocols. Humana Press, Totowa, New Jersey, pp.
283–310.
Xu, S. (2003) Estimating polygenic effects using markers of the entire genome. Genetics 163, 789–801.
Xu, S. (2007) An empirical Bayes method for estimating epistatic effects of quantitative trait-loci. Biometrics
63, 513–521.
Xu, S. and Jia, Z. (2007) Genome-wide analysis of epistatic effects for quantitative traits in barley. Genetics
175, 1955–1963.
Xu, Y. (1994) Application of molecular markers in genetic improvement of quantitative traits in plants. In:
Proceedings of the Third Young Scientists Symposium on Crop Genetics and Breeding. Publishing
House of Agricultural Science and Technology of China, Beijing, pp. 38–49.
Xu, Y. (1997) Quantitative trait loci: separating, pyramiding and cloning. Plant Breeding Reviews 15,
85–139.
Xu, Y. (2002) Global view of QTL: rice as a model. In: Kang, M.S. (ed.) Quantitative Genetics, Genomics
and Plant Breeding. CAB International, Wallingford, UK, pp. 109–134.
Xu, Y. (2003) Developing marker-assisted selection strategies for breeding hybrid rice. Plant Breeding
Reviews 23, 73–174.
Xu, Y. and Crouch, J.H. (2008) Marker-assisted selection in plant breeding: from publications to practice.
Crop Science 48, 391–407.
Xu, Y. and Luo, L. (2002) Biotechnology and germplasm resource management in rice. In: Luo, L., Ying, C.
and Tang, S. (eds) Rice Germplasm Resources. Hubei Science and Technology Publisher, Wuhan,
China, pp. 229–250.
Xu, Y.B. and Shen, Z.T. (1991) Diallel analysis of tiller number at different growth stages in rice (Oryza
sativa L.). Theoretical and Applied Genetics 83, 243–249.
Xu, Y. and Shen, Z. (1992a) Detection and genetic analyses of the gene dispersed crosses: some theoretical
considerations. Acta Agricultura Zhejiangensis 18, 109–117 (in English with Chinese abstract).
Xu, Y. and Shen, Z. (1992b) Detection and genetic analyses of the gene dispersed cross for tiller angle in
rice (Oryza sativa L.). Acta Agricultura Zhejiangensis 4, 54–60.
Xu, Y. and Shen, Z. (1992c) Accumulation of the alleles with similar effects at four loci controlling tiller angle
from gene dispersed crosses in rice (Oryza sativa L.). Journal of Biomathematics (Beijing) 7, 1–10.
Xu, Y.B. and Shen, Z.T. (1992d) Distorted segregation of waxy gene and its characterization in indica
× japonica hybrids. Chinese Journal of Rice Science 6, 89–92 (in Chinese).
Xu, Y. and Zhu, L. (1994) Molecular Quantitative Genetics (in Chinese). China Agriculture Press, Beijing,
China, 291 pp.
Xu, Y., Shen, Z., Chen, Y. and Zhu, L. (1995) A statistical technique and generalized computer software
for interval mapping of quantitative trait loci and its application. Acta Agronomica Sinica 21, 1–8 (in
Chinese with English abstract).
Xu, Y., Zhu, L., Xiao, J., Huang, N. and McCouch, S.R. (1997) Chromosomal regions associated with seg-
regation distortion of molecular markers in F2, backcross, doubled haploid and recombinant inbred
populations of rice (Oryza sativa L.). Molecular and General Genetics 253, 535–545.
Xu, Y., McCouch, S.R. and Shen, Z. (1998) Transgressive segregation of tiller angle in rice caused by complementary
action of genes. Crop Science 38, 12–19.
Xu, Y., Lobos, K.B. and Clare, K.M. (2002) Development of SSR markers for rice molecular breeding. In:
Proceedings of Twenty-Ninth Rice Technical Working Group Meeting, 24–27 February 2002, Little
Rock, Arkansas. Rice Technical Working Group, Little Rock, Arkansas, p. 49.
Xu, Y., Ishii, T. and McCouch, S.R. (2003) Marker-assisted evaluation of germplasm resources for plant
breeding. In: Mew, T.W., Brar, D.S., Peng, S. and Hardy, B. (eds) Rice Science: Innovations and Impact
for Livelihood. Proceedings of the 24th International Rice Research Conference, 16–19 September
2002, Beijing. International Rice Research Institute, Chinese Academy of Engineering and Chinese
Academy of Agricultural Sciences, Beijing, pp. 213–229.
Xu, Y., Beachell, H. and McCouch, S.R. (2004) A marker-based approach to broadening the genetic base
of rice (Oryza sativa L.) in the US. Crop Science 44, 1947–1959.
Xu, Y., McCouch, S.R. and Zhang, Q. (2005) How can we use genomics to improve cereals with rice as a
reference genome? Plant Molecular Biology 59, 7–26.
Xu, Y., Wang, J. and Crouch, J.C. (2008) Selective genotyping and pooled DNA analysis: an innovative
use of an old concept. In: Proceedings of the 5th International Crop Science Congress, 13–18 April
2008, Jeju, Korea. Published on CD-ROM. Available at: http://www.cropscience2008.com (accessed
30 June 2008).
Xu, Y., Babu, R., Skinner, D.J., Vivek, B.S. and Crouch, J.H. (2009a) Maize mutant Opaque2 and the improvement
of protein quality through conventional and molecular approaches. In: Shu, Q.Y. (ed.) Induced
Plant Mutations in the Genomics Era. Food and Agriculture Organization of the United Nations, Rome,
pp. 191–196.
Xu, Y., Lu, Y., Yan, J., Babu, R., Hao, Z., Gao, S., Zhang, S., Li, J., Vivek, B.S., Magorokosho, C., Mugo,
S., Makumbi, D., Taba, S., Palacios, N., Guimarães, C.T., Araus, J.-L., Wang, J., Davenport, G.F.,
Crossa, J. and Crouch, J.H. (2009b) SNP-chip based genomewide scan for germplasm evaluation and
marker–trait association analysis and development of a molecular breeding platform. Proceedings of
14th Australasian Plant Breeding & 11th Society for the Advancement in Breeding Research in Asia &
Oceania Conference, 10–14 August 2009, Cairns, Tropical North Queensland, Australia. Distributed
by CD-ROM.
Xu, Y., Skinner, D.J., Wu, H., Palacios-Rojas, N., Araus, J.L., Yan, J., Gao, S., Warburton, M.L. and Crouch,
J.H. (2009c) Advances in maize genomics and their value for enhancing genetic gains from breed-
ing. International Journal of Plant Genomics Volume 2009, Article ID 957602, 30 pages. Available at:
http://www.hindawi.com/journals/ijpg/2009/957602.html (accessed 21 December 2009).
Xu, Y., This, D., Pausch, R.C., Vonhof, W.M., Coburn, J.R., Comstock, J.P. and McCouch, S.R. (2009d) Water
use efficiency determined by carbon isotope discrimination in rice: genetic variation associated with
population structure and QTL mapping. Theoretical and Applied Genetics 118, 1065–1081.
Xue, W., Xing, Y., Weng, X., Zhao, Y., Tang, W., Wang, L., Zhou, H., Yu, S., Xu, C., Li, X. and Zhang, Q.
(2008) Natural variation in Ghd7 is an important regulator of heading date and yield potential in rice.
Nature Genetics 40, 761–767.
Xue, Y. and Xu, Z. (2002) An introduction to the China Rice Functional Genomics Program. Comparative
and Functional Genomics 3, 161–163.
Yadav, N.S., Vanderleyden, J., Bennett, D.R., Barnes, W.M. and Chilton, M.D. (1982) Short direct repeats
flank the T-DNA on a nopaline Ti plasmid. Proceedings of the National Academy of Sciences of the
United States of America 79, 6322–6326.
Yadav, R.S., Hash, C.T., Bidinger, F.R., Cavan, G.P. and Howarth, C.J. (2002) Quantitative trait loci associated
with traits determining grain and stover yield in pearl millet under terminal drought stress conditions.
Theoretical and Applied Genetics 104, 67–83.
Yamada, K., Lim, J., Dale, J.M., Chen, H., Shinn, P., Palm, C.J., Southwick, A.M., Wu, H.C., Kim, C., Nguyen,
M., Pham, P., Cheuk, R., Karlin-Newmann, G., Liu, S.X., Lam, B., Sakano, H., Wu, T., Yu, G., Miranda,
M., Quach, H.L., Tripp, M., Chang, C.H., Lee, J.M., Toriumi, M., Chan, M.M.H., Tang, C.C., Onodera,
C.S., Deng, J.M., Akiyama, K., Ansari, Y., Arakawa, T., Banh, J., Banno, F., Bowser, L., Brooks, S.,
Carninci, P., Chao, Q., Choy, N., Enju, A., Goldsmith, A.D., Gurjal, M., Hansen, N.F., Hayashizaki, Y.,
Johnson-Hopson, C., Hsuan, V.W., Iida, K., Karnes, M., Khan, S., Koesema, E., Ishida, J., Jiang, P.X.,
Jones, T., Kawai, J., Kamiya, A., Meyers, C., Nakajima, M., Narusaka, M., Seki, M., Sakurai, T., Satou,
M., Tamse, R., Vaysberg, M., Wallender, E.K., Wong, C., Yamamura, Y., Yuan, S., Shinozaki, K., Davis,
R.W., Theologis, A. and Ecker, J.R. (2003) Empirical analysis of transcriptional activity in
the Arabidopsis genome. Science 302, 842–846.
Yamagishi, M., Yano, M., Fukuta, Y., Fukui, K., Otani, M. and Shimada, T. (1996) Distorted segregation
of RFLP markers in regenerated plants derived from anther culture of an F1 hybrid of rice. Genes &
Genetic Systems 71, 37–41.
Yamamoto, T., Takemori, N., Sue, N. and Nitta, N. (2003) QTL analysis of stigma exsertion in rice. Rice
Genetics Newsletter 20, 33–34.
Yamazaki, M., Tsugawa, H., Miyao, A., Yano, M., Wu, J., Yamamoto, S., Matsumoto, T., Sasaki, T. and
Hirochika, H. (2001) The rice retrotransposon Tos17 prefers low-copy-number sequences as integration
targets. Molecular and General Genetics 265, 336–344.
Yan, J., Zhu, J., He, C., Benmoussa, M. and Wu, P. (1998a) Molecular dissection of developmental behavior
of plant height in rice (Oryza sativa L.). Genetics 150, 1257–1265.
Yan, J., Zhu, J., He, C., Benmoussa, M. and Wu, P. (1998b) Quantitative trait loci analysis for the developmental
behavior of tiller number in rice (Oryza sativa L.). Theoretical and Applied Genetics 97, 267–274.
Yan, J., Zhu, J., He, C., Benmoussa, M. and Wu, P. (1999) Molecular marker-assisted dissection of genotype
× environment interaction for plant type traits in rice (Oryza sativa L.). Crop Science 39, 538–544.
Yan, J., Yang, X., Shah, T., Sánchez-Villeda, H., Li, J., Warburton, M., Zhou, Y., Crouch, J.H. and Xu, Y.
(2009) High-throughput SNP genotyping with the GoldenGate assay in maize. Molecular Breeding
(in press).
Yan, W. and Kang, M.S. (2003) GGE Biplot Analysis: a Graphical Tool for Breeders, Geneticists and
Agronomists. CRC Press, Boca Raton, Florida.
Yan, W. and Rajcan, I. (2002) Biplot evaluation of test sites and trait relations of soybean in Ontario. Crop
Science 42, 11–20.
Yan, W. and Tinker, N.A. (2006) Biplot analysis of multi-environment trial data: principles and applications.
Canadian Journal of Plant Science 86, 623–645.
Yan, W., Hunt, L.A., Sheng, Q. and Szlavnics, Z. (2000) Cultivar evaluation and mega-environment investigation
based on GGE biplot. Crop Science 40, 596–605.
Yan, W., Rutger, J.N., Bockelman, H.E. and Tai, T. (2004) Development of a core collection from the USDA
rice germplasm collection. In: Norman, R.J., Meullenet, J.-F. and Moldenhauer, K.A.K. (eds) B. R.
Wells Rice Research Studies 2003. Arkansas Agricultural Experiment Research Station Series No.
517, pp. 88–96. Available at: http://www.uark.edu/depts/agripub/publications/research (accessed 31
December 2007).
Yan, W., Kang, M.S., Ma, B., Woods, S. and Cornelius, P.L. (2007) GGE biplot vs. AMMI analysis of genotype-by-environment
data. Crop Science 47, 643–655.
Yang, H., You, A., Yang, Z., Zhang, F., He, R., Zhu, L. and He, G. (2004) High-resolution genetic mapping
at the Bph5 locus for brown planthopper resistance in rice (Oryza sativa L.). Theoretical and Applied
Genetics 110, 182–191.
Yang, H.-C., Liang, Y.-J., Huang, M.-C., Li, L.-H., Lin, C.H., Wu, J.-Y., Chen, Y.-T. and Fann, C.S.J. (2006a)
A genome-wide study of preferential amplification/hybridization in microarray-based pooled DNA
experiments. Nucleic Acids Research 34, e106.
Yang, H.-C., Pan, C.-C., Lin, C.-Y. and Fann, C.S.J. (2006b) PDA: pooled DNA analyzer. BMC Bioinformatics
7, 233.
Yang, J., Hu, C., Hu, H., Yu, R., Xia, Z., Ye, X. and Zhu, J. (2008) QTLNetwork: mapping and visualizing
genetic architecture of complex traits in experimental populations. Bioinformatics 24, 721–723.
Yang, R. and Xu, S. (2007) Bayesian shrinkage analysis of quantitative trait loci for dynamic traits. Genetics
176, 1169–1185.
Yang, R.-C. (2004) Epistasis of quantitative trait loci under different gene action models. Genetics 167,
1493–1505.
Yang, R.-C. (2007) Mixed model analysis of crossover genotype-environment interactions. Crop Science
47, 1051–1062.
Yang, R.Q., Tan, Q. and Xu, S.Z. (2006) Mapping quantitative trait loci for longitudinal traits in line crosses.
Genetics 173, 2339–2356.
Yang, X., Rupe, M., Bickel, D., Arthur, L., Smith, O. and Guo, M. (2006) Effects of cis–trans-regulation on
allele-specific transcript expression in the meristems of maize hybrids. In: 48th Annual Maize Genetic
Conference, 9–12 March 2006, Pacific Grove, California, 132 pp.
Yang, X.R., Wang, J.R., Li, H.L. and Li, Y.F. (1983) Studies on the general medium for anther culture of
cereals and increasing of the frequency of green pollen-plantlets-induction of Oryza sativa subsp.
hseni. In: Shen, J.H., Zhang, Z.H. and Shi, S.D. (eds) Studies on Anther-Cultured Breeding in Rice.
Agriculture Press, Beijing, pp. 61–69.
Yano, M., Harushima, Y., Nagamura, Y., Kurata, N., Minobe, Y. and Sasaki, T. (1997) Identification of quan-
titative trait loci controlling heading date in rice using a high-density linkage map. Theoretical and
Applied Genetics 95, 1025–1032.
Yano, M., Katayose, Y., Ashikari, M., Yamanouchi, U., Monna, L., Fuse, T., Baba, T., Yamamoto, K., Umehara,
Y., Nagamura, Y. and Sasaki, T. (2000) Hd1, a major photoperiod sensitivity quantitative trait locus
in rice, is closely related to the Arabidopsis flowering time gene CONSTANS. The Plant Cell 12,
2473–2484.
Yano, M., Kojima, S., Takahashi, Y., Lin, H.X. and Sasaki, T. (2001) Genetic control of flowering time in rice,
a short-day plant. Plant Physiology 127, 1425–1429.
Yao, Y., Ni, Z., Zhang, Y., Chen, Y., Ding, Y., Han, Z., Liu, Z. and Sun, Q. (2005) Identification of differentially
expressed genes in leaf and root between wheat hybrid and its parental inbreds using PCR-based
cDNA subtraction. Plant Molecular Biology 58, 367–384.
Yates, F. and Cochran, W.G. (1938) The analysis of groups of experiments. Journal of Agricultural Science
28, 556–580.
Ye, X., Al-Babili, S., Klöti, A., Zhang, J., Lucca, P., Beyer, P. and Potrykus, I. (2000) Engineering the provitamin A
(β-carotene) biosynthetic pathway into (carotenoid-free) rice endosperm. Science 287,
303–305.
Yi, N. (2004) A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci.
Genetics 167, 967–975.
Yi, N. and Shriner, D. (2008) Advances in Bayesian multiple quantitative trait loci mapping in experimental
crosses. Heredity 100, 240–252.
Yi, N. and Xu, S. (2002) Mapping quantitative trait loci with epistatic effects. Genetical Research 79,
185–198.
Yi, N., George, V. and Allison, D.B. (2003) Stochastic search variable selection for identifying multiple quantitative
trait loci. Genetics 164, 1129–1138.
Yi, N., Yandell, B.S., Churchill, G.A., Allison, D.B., Eisen, E.J. and Pomp, D. (2005) Bayesian model selection
for genome-wide epistatic quantitative trait loci analysis. Genetics 170, 1333–1344.
Yi, N., Zinniel, D.K., Kim, K., Eisen, E.J., Bartolucci, A., Allison, D.B. and Pomp, D. (2006) Bayesian analyses
of multiple epistasis QTL models for body weight and body composition in mice. Genetical Research
87, 45–60.
Yi, N., Banerjee, S., Pomp, D. and Yandell, B.S. (2007) Bayesian mapping of genomewide interacting quantitative
trait loci for ordinal traits. Genetics 176, 1855–1864.
Yin, T.M., DiFazio, S.P., Gunter, L.E., Riemenschneider, D. and Tuskan, G.A. (2004) Large-scale heter-
ospecific segregation distortion in Populus revealed by a dense genetic map. Theoretical and Applied
Genetics 109, 451–463.
Yin, X., Kropff, M.J. and Stam, P. (1999) The role of ecophysiological models in QTL analysis: the example
of specific leaf area in barley. Heredity 82, 415–421.
Yin, X., Stam, P., Kropff, M.J. and Schapendonk, A.H.C.M. (2003) Crop modeling, QTL mapping and their
complementary role in plant breeding. Agronomy Journal 95, 90–98.
Yin, X., Struik, P.C. and Kropff, M.J. (2004) Role of crop physiology in predicting gene-to-phenotype relationships.
Trends in Plant Science 9, 426–432.
Yin, X., Struik, P.C., Tang, J., Qi, C. and Liu, T. (2005) Model analysis of flowering phenology in recombinant
inbred lines of barley. Journal of Experimental Botany 56, 959–965.
Yoo, B.H. (1980) Long-term selection for a quantitative character in large replicate populations of Drosophila
melanogaster. I. Response to selection. Genetical Research 35, 1–17.
Yoon, D.-B., Kang, K.-H., Kim, H.-J., Ju, H.-G., Kwon, S.-J., Suh, J.-P., Jeong, O.-Y. and Ahn, S.-N. (2006)
Mapping quantitative trait loci for yield components and morphological traits in an advanced backcross
population between Oryza grandiglumis and the O. japonica cultivar Hwaseongbyeo. Theoretical and
Applied Genetics 112, 1052–1062.
Young, N.D. (1999) A cautiously optimistic vision for marker assisted breeding. Molecular Breeding 5,
505–510.
Young, N.D. and Tanksley, S.D. (1989a) Restriction fragment length polymorphism maps and the concept
of graphical genotypes. Theoretical and Applied Genetics 77, 95–101.
Young, N.D. and Tanksley, S.D. (1989b) RFLP analysis of the size of chromosomal segments retained
around the Tm-2 locus of tomato during backcross breeding. Theoretical and Applied Genetics 77,
353–359.
Young, N.D., Zamir, D., Ganal, M. and Tanksley, S.D. (1988) Use of isogenic lines and simultaneous probing
to identify DNA markers tightly linked to the Tm-2a gene in tomato. Genetics 120, 579585.
Yousef, G.G. and Juvik, J.A. (2001a) Comparison of phenotypic and marker-assisted selection for
quantitative traits in sweet corn. Crop Science 41, 645–655.
Yousef, G.G. and Juvik, J.A. (2001b) Evaluation of breeding utility of a chromosomal segment from
Lycopersicon chmielewskii that enhances cultivated tomato soluble solids. Theoretical and Applied
Genetics 103, 1022–1027.
Yu, G.-X. and Wise, R.P. (2000) An anchored AFLP- and retrotransposon-based map of diploid Avena.
Genome 43, 736–749.
Yu, J., Hu, S., Wang, J., Wong, G.K.S., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., Cao, M., Liu, J.,
Sun, J., Tang, J., Chen, Y., Huang, X., Lin, W., Ye, C., Tong, W., Cong, L., Geng, J., Han, Y., Li, L., Li,
W., Hu, G., Huang, X., Li, W., Li, J., Liu, Z., Li, L., Liu, J., Qi, Q., Liu, J., Li, L., Li, T., Wang, X., Lu, H.,
Wu, T., Zhu, M., Ni, P., Han, H., Dong, W., Ren, X., Feng, X., Cui, P., Li, X., Wang, H., Xu, X., Zhai, W.,
Xu, Z., Zhang, J., He, S., Zhang, J., Xu, J., Zhang, K., Zheng, X., Dong, J., Zeng, W., Tao, L., Ye, J.,
Tan, J., Ren, X., Chen, X., He, J., Liu, D., Tian, W., Tian, C., Xia, H., Bao, Q., Li, G., Gao, H., Cao, T.,
Wang, J., Zhao, W., Li, P., Chen, W., Wang, X., Zhang, Y., Hu, J., Wang, J., Liu, S., Yang, J., Zhang, G.,
Xiong, Y., Li, Z., Mao, L., Zhou, C., Zhu, Z., Chen, R., Hao, B., Zheng, W., Chen, S., Guo, W., Li, G.,
Liu, S., Tao, M., Wang, J., Zhu, L., Yuan, L. and Yang, H. (2002) A draft sequence of the rice genome
(Oryza sativa L. ssp. indica). Science 296, 79–92.
Yu, J., Arbelbide, M. and Bernardo, R. (2005a) Power of in silico QTL mapping from phenotypic,
pedigree and marker data in a hybrid breeding program. Theoretical and Applied Genetics 110,
1061–1067.
Yu, J., Wang, J., Lin, W., Li, S., Li, H., Zhou, J., Ni, P., Dong, W., Hu, S., Zeng, C., Zhang, J., Zhang, Y.,
Li, R., Xu, Z., Li, S., Li, X., Zheng, H., Cong, L., Lin, L., Yin, J., Geng, J., Li, G., Shi, J., Liu, J., Lv,
H., Li, J., Wang, J., Deng, Y., Ran, L., Shi, X., Wang, X., Wu, Q., Li, C., Ren, X., Wang, J., Wang, X.,
Li, D., Liu, D., Zhang, X., Ji, Z., Zhao, W., Sun, Y., Zhang, Z., Bao, J., Han, Y., Dong, L., Ji, J., Chen,
P., Wu, S., Liu, J., Xiao, Y., Bu, D., Tan, J., Yang, L., Ye, C., Zhang, J., Xu, J., Zhou, Y., Yu, Y., Zhang,
B., Zhuang, S., Wei, H., Liu, B., Lei, M., Yu, H., Li, Y., Xu, H., Wei, S., He, X., Fang, L., Zhang, Z.,
Zhang, Y., Huang, X., Su, Z., Tong, W., Li, J., Tong, Z., Li, S., Ye, J., Wang, L., Fang, L., Lei, T., Chen,
C., Chen, H., Xu, Z., Li, H., Huang, H., Zhang, F., Xu, H., Li, N., Zhao, C., Li, S., Dong, L., Huang,
Y., Li, L., Xi, Y., Qi, Q., Li, W., Zhang, B., Hu, W., Zhang, Y., Tian, X., Jiao, Y., Liang, X., Jin, J., Gao,
L., Zheng, W., Hao, B., Liu, S., Wang, W., Yuan, L., Cao, M., McDermott, J., Samudrala, R., Wang,
J., Wong, G.K.-S. and Yang, H. (2005b) The genome of Oryza sativa: a history of duplications. PLoS
Biology 3, E38.
Yu, J., Pressoir, G., Briggs, W., Bi, I.V., Yamasaki, M., Doebley, J.F., McMullen, M.D., Gaut, B.S.,
Nielsen, D.M., Holland, J.B., Kresovich, S. and Buckler, E.S. (2006) A unified mixed-model
method for association mapping that accounts for multiple levels of relatedness. Nature Genetics
38, 203–207.
Yu, J., Holland, J.B., McMullen, M.D. and Buckler, E.S. (2008) Genetic design and statistical power of nested
association mapping in maize. Genetics 178, 539–551.
Yu, J., Zhang, Z., Zhu, C., Tabanao, D.A., Pressoir, G., Tuinstra, M.R., Kresovich, S., Todhunter, R.J. and
Buckler, E.S. (2009) Simulation appraisal of the adequacy of number of background markers for
relationship estimation in association mapping. The Plant Genome 2, 63–77.
Yu, J.K., La Rota, M., Kantety, R.V. and Sorrells, M.E. (2004) EST-derived SSR markers for comparative
mapping in wheat and rice. Molecular Genetics and Genomics 271, 742–751.
Yu, K., Park, S.J. and Poysa, V. (2000) Marker-assisted selection of common beans for resistance to
common bacterial blight: efficacy and economics. Plant Breeding 119, 411–415.
Yu, S.B., Li, J.X., Xu, C.G., Tan, Y.F., Gao, Y.J., Li, X.H., Zhang, Q.F. and Saghai Maroof, M.A. (1997)
Importance of epistasis as the genetic basis of heterosis in an elite rice hybrid. Proceedings of the
National Academy of Sciences of the United States of America 94, 9226–9231.
Yu, W., Andersson, B., Worley, K.C., Muzny, D.M., Ding, Y., Liu, W., Ricafrente, J.Y., Wentland, M.A.,
Lennon, G. and Gibbs, R.A. (1997) Large-scale concatenation cDNA sequencing. Genome Research
7, 353–358.
Yu, W., Han, F., Gao, Z., Vega, J.M. and Birchler, J. (2007) Construction and behavior of engineered mini-
chromosomes in maize. Proceedings of the National Academy of Sciences of the United States of
America 104, 8924–8929.
Yuan, L.P. (1992) Development and prospects of hybrid rice breeding. In: You, C.B. and Chen, Z.L. (eds)
Agricultural Biotechnology. Proceedings of the Asian Pacific Conference on Agricultural Biotechnology.
China Agricultural Press, Beijing, pp. 97–105.
Yuan, L.P. (2002) Future outlook on hybrid rice research and development. In: Abstracts of the Fourth
International Symposium on Hybrid Rice, 14–17 May 2002, Hanoi, Vietnam. International Rice Research
Institute (IRRI), Manila, Philippines, p. 3.
Yuan, L.P. and Chen, H.X. (eds) (1988) Breeding and Cultivation of Hybrid Rice. Hunan Science and
Technology Press, Changsha, China.
Yuan, Y., SanMiguel, P.J. and Bennetzen, J.L. (2003) High Cot sequence analysis of the maize genome.
The Plant Journal 34, 249–255 (erratum: The Plant Journal 36, 430).
Zabeau, M. and Vos, P. (1993) Selective restriction fragment amplification: a general method for DNA
fingerprinting. European Patent Application. 92402629.7 (Publ. Number 0 534 858 A1).
Zale, J.M., Clancy, J.A., Ullrich, S.E., Jones, B.L., Hays, P.M. and the North American Barley Genome
Mapping Project (2000) Summary of barley malting QTL mapped in various mapping populations.
Barley Genetics Newsletter 30, 1–4.
Zamir, D. (2001) Improving plant breeding with exotic genetic libraries. Nature Reviews Genetics 2,
983–989.
Zeng, R., Zhang, Z. and Zhang, G. (2000) Identification of multiple alleles at the Wx locus in rice using
microsatellite class and GT polymorphism. In: Liu, X. (ed.) Theory and Application of Crop Research.
China Science and Technology Press, Beijing, pp. 202–205.
Zeng, Z.-B. (1993) Theoretical basis of separation of multiple linked gene effects on mapping quantita-
tive trait loci. Proceedings of the National Academy of Sciences of the United States of America 90,
10972–10976.
Zeng, Z.-B. (1994) Precision mapping of quantitative trait loci. Genetics 136, 1457–1468.
Zeng, Z.-B. (1998) Mapping quantitative trait loci: interval mapping, composite interval mapping and mul-
tiple interval mapping. Summer Institute for Statistical Genetics, Module 7. Department of Statistics,
North Carolina State University, Raleigh, North Carolina.
Zenkteler, M. and Nitzsche, W. (1984) Wide hybridization experiments in cereals. Theoretical and Applied
Genetics 68, 311–315.
Zhang, H.B. and Wing, R.A. (1997) Physical mapping of the rice genome with BACs. Plant Molecular
Biology 35, 115–127.
Zhang, J., Chandra Babu, R., Pantuwan, G., Kamoshita, A., Blum, A., Wade, L., Sarkarung, S., O'Toole, J.C.
and Nguyen, N.T. (1999) Molecular dissection of drought tolerance in rice: from physio-morphological
traits to field performance. In: Ito, O., O'Toole, J. and Hardy, B. (eds) Genetic Improvement of Rice
for Water-limited Environments. International Rice Research Institute (IRRI), Manila, Philippines,
pp. 331–343.
Zhang, J., Zheng, H.G., Aarti, A., Pantuwan, G., Nguyen, T.T., Tripathi, J.N., Sarial, A.K., Robin, S., Babu,
R.C., Nguyen, B.D., Sarkarung, S., Blum, A. and Nguyen, H.T. (2001) Locating genomic regions asso-
ciated with components of drought resistance in rice: comparative mapping within and across species.
Theoretical and Applied Genetics 103, 19–29.
Zhang, J., Xu, Y., Wu, X. and Zhu, L. (2002) A bentazon and sulfonylurea sensitive mutant: breeding, genet-
ics and potential application in seed production of hybrid rice. Theoretical and Applied Genetics 105,
16–22.
Zhang, J., Li, X., Jiang, G., Xu, Y. and He, Y. (2006) Pyramiding of Xa7 and Xa21 for the improvement of
disease resistance to bacterial blight in hybrid rice. Plant Breeding 125, 600–605.
Zhang, J.F. and Stewart, J.McD. (2004) Semigamy gene is associated with chlorophyll reduction in cotton.
Crop Science 44, 2054–2062.
Zhang, L.P., Lin, G.Y., Niño-Liu, D. and Foolad, M.R. (2003) Mapping QTLs conferring early blight (Alternaria
solani) resistance in a Lycopersicon esculentum × L. hirsutum cross by selective genotyping. Molecular
Breeding 12, 3–19.
Zhang, N., Xu, Y., Akash, M., McCouch, S. and Oard, J.H. (2005) Identification of candidate markers asso-
ciated with agronomic traits in rice using discriminant analysis. Theoretical and Applied Genetics 110,
727–729.
Zhang, Q. (2007) Strategies for developing green super rice. Proceedings of the National Academy of
Sciences of the United States of America 104, 16402–16409.
Zhang, Q. and Huang, N. (1998) Mapping and molecular marker-based genetic analysis for efficient hybrid
rice breeding. In: Virmani, S.S., Siddiq, E.A. and Muralidharan, K. (eds) Advances in Hybrid Rice
Technology. Proceedings of the Third International Symposium on Hybrid Rice, 14–16 November 1996,
Hyderabad, India. International Rice Research Institute (IRRI), Manila, Philippines, pp. 243–256.
Zhang, Q., Gao, Y.J., Yang, S.H., Ragab, R.A., Saghai Maroof, M.A. and Li, Z.B. (1994) A diallel analysis of
heterosis in elite hybrid rice based on RFLPs and microsatellites. Theoretical and Applied Genetics 89, 185–192.
Zhang, Q., Gao, Y.J., Saghai Maroof, M.A., Yang, S.H. and Li, J.X. (1995) Molecular divergence and hybrid
performance in rice. Molecular Breeding 1, 133–142.
Zhang, S., Raina, S., Li, H., Li, J., Dec, E., Ma, H., Huang, H. and Fedoroff, N.V. (2003) Resources for
targeted insertional and deletional mutagenesis in Arabidopsis. Plant Molecular Biology 53, 133–150.
Zhang, W., McElroy, D. and Wu, R. (1991) Analysis of rice Act1 5' region activity in transgenic rice plants.
The Plant Cell 3, 1155–1165.
Zhang, X., Yazaki, J., Sundaresan, A., Cokus, S., Chan, S.W.-L., Chen, H., Henderson, I.R., Shinn, P.,
Pellegrini, M., Jacobsen, S.E. and Ecker, J.R. (2006) Genome-wide high-resolution mapping and
functional analysis of DNA methylation in Arabidopsis. Cell 126, 1189–1201.
Zhang, Y.M. and Xu, S. (2004) Mapping quantitative trait loci in F2 incorporating phenotypes of F3 progeny.
Genetics 166, 1981–1993.
Zhang, Y.M. and Xu, S. (2005) A penalized maximum likelihood method for estimating epistatic effects of
QTL. Heredity 95, 96–104.
Zhang, Z., Bradbury, P.J., Kroon, D.E., Casstevens, T.M. and Buckler, E.S. (2006) TASSEL 2.0: a software
package for association and diversity analyses in plants and animals. Poster presented at Plant and
Animal Genomes XIV Conference, 14–18 January 2006, San Diego, California.
Zhao, J.Z., Cao, J., Li, Y., Collins, H.L., Roush, R.T., Earle, E.D. and Shelton, A.M. (2003) Transgenic plants
expressing two Bacillus thuringiensis toxins delay insect resistance evolution. Nature Biotechnology
21, 1493–1497.
Zhao, M.F., Li, X.H., Yang, J.B., Xu, C.G., Hu, R.Y., Liu, D.J. and Zhang, Q. (1999) Relationships between
molecular marker heterozygosity and hybrid performance in intra- and inter-subspecific crosses of
rice. Plant Breeding 118, 139–144.
Zhao, S. and Bruce, W.B. (2003) Expression profiling using cDNA microarray. In: Grotewold, E. (ed.)
Methods in Molecular Biology, Vol. 236. Plant Functional Genomics: Methods and Protocols. Humana
Press, Totowa, New Jersey, pp. 365–380.
Zhao, W., Li, H., Hou, W. and Wu, R. (2007) Wavelet-based parametric functional mapping of
developmental trajectories with high-dimensional data. Genetics 176, 1879–1892.
Zhao, Z., Wang, C., Jiang, L., Zhu, S., Ikehashi, H. and Wan, J. (2006) Identification of a new hybrid sterility
gene in rice (Oryza sativa L.). Euphytica 151, 331–337.
Zheng, K., Qian, H., Shen, B., Zhuang, J., Liu, H. and Lu, J. (1994) RFLP-based phylogenetic analysis of
wide compatibility varieties in Oryza sativa L. Theoretical and Applied Genetics 88, 65–69.
Zheng, X., Wu, E.J.G., Lou, X.Y., Xu, H.M. and Shi, C.H. (2008) The QTL analysis on maternal and
endosperm genome and their environmental interactions for characters of cooking quality in rice
(Oryza sativa L.). Theoretical and Applied Genetics 116, 335–342.
Zhou, P.H., Tan, Y.F., He, Y.A., Xu, C.G. and Zhang, A. (2003) Simultaneous improvement of four quality traits
of Zhenshan 97, an elite parent of hybrid rice, by molecular marker-assisted selection. Theoretical
and Applied Genetics 106, 326–331.
Zhu, C., Gore, M., Buckler, E.S. and Yu, J. (2008) Status and prospects of association mapping in plants.
The Plant Genome 1, 5–20.
Zhu, H., Bilgin, M. and Snyder, M. (2003) Proteomics. Annual Review of Biochemistry 72, 783–812.
Zhu, J. and Weir, B.S. (1994) Analysis of cytoplasmic and maternal effects. II. Genetic models for triploid
endosperm. Theoretical and Applied Genetics 89, 160–166.
Zhu, L., Xu, J., Chen, Y., Ling, Z., Lu, C. and Xu, Y. (1994) Location of unknown resistance gene to rice blast
using molecular markers (in Chinese). Science in China (Ser. B) 24, 1048–1052.
Zhu, Q., Maher, A., Masoud, S., Dixon, R.A. and Lamb, C.J. (1994) Enhanced protection against fungal
attack by constitutive co-expression of chitinase and glucanase genes in transgenic tobacco.
Bio/Technology 12, 807–812.
Zhu, Y., Nomura, T., Xu, Y., Zhang, Y., Peng, Y., Mao, B., Hanada, A., Zhou, H., Wang, R., Li, P., Zhu, X.,
Mander, L.N., Kamiya, Y., Yamaguchi, S. and He, Z. (2006) ELONGATED UPPERMOST INTERNODE
encodes a cytochrome P450 monooxygenase that epoxidizes gibberellins in a novel deactivation
reaction in rice. The Plant Cell 18, 442–456.
Zhu, Z.F., Sun, C.Q., Jiang, T.B., Fu, Q. and Wang, X.K. (2001) The comparison of genetic divergences
and its relationships to heterosis revealed by SSR and RFLP markers in rice (Oryza sativa L.). Acta
Genetica Sinica 28, 738–745.
Zhuang, J.Y., Lin, H.X., Lu, J., Qian, H.R., Hittalmani, S., Huang, N. and Zheng, K.L. (1997) Analysis of
QTL × environment interaction for yield components and plant height in rice. Theoretical and Applied
Genetics 95, 799–808.
Zimmerli, L. and Somerville, S. (2005) Transcriptomics in plants: from expression to gene function. In: Leister,
D. (ed.) Plant Functional Genomics. Food Products Press, Binghamton, New York, pp. 55–84.
Zivy, M., Joyard, J. and Rossignol, M. (2007) Proteomics. In: Morot-Gaudry, J.F., Lea, P. and Briat, J.F. (eds)
Functional Plant Genomics. Science Publishers, Enfield, New Hampshire, pp. 217–244.
Zobel, R.W., Wright, M.J. and Gauch, H.G., Jr (1988) Statistical analysis of a yield trial. Agronomy Journal
80, 388–393.
Zou, F., Yandell, B.S. and Fine, J.P. (2001) Statistical issues in the analysis of quantitative traits in combined
crosses. Genetics 158, 1339–1346.
Zou, W. and Zeng, Z.-B. (2008) Statistical methods for mapping multiple QTL. International Journal of Plant
Genomics 2008, Article ID 286561. Available at: http://www.hindawi.com/journals/ijpg/2008/286561.html
(accessed 17 November 2009).
Zou, W., Aylor, D.L. and Zeng, Z.-B. (2007) eQTL Viewer: visualizing how sequence variation affects
genome-wide transcription. BMC Bioinformatics 8, 7.
Index

Ab initio gene prediction structure 471


exon prediction 423424 superbinary vector 471
gene modelling 424 transfer DNA (T-DNA) 470
Ac/Ds tagging system 446 vector backbone 470
Adaptor-ligated PCR 453 vector selection 472473
Advanced intercrossing lines (AILs) 251 Biochemical markers see Protein markers
Agrobiodiversity 151 Biolistics 464465
Allele diversity at orthologous candidate Bracket markers 295296
(ADOC) 172 Breeding efforts, public and private sectors 89
Allelic and genotypic frequencies concept 11 Breeding informatics
Amplified fragment length polymorphism bioinformatics
(AFLP) markers 2931, 231, 232 definition 551
Analysis of variance (ANOVA) 200201, 215, plant breeding 552553
231, 270, 281, 282, 394, 396, 397, 405, computer network 550
406, 606, 617 data collection procedures 554555
Average Environment Coordination (AEC) databases
view 399 concept 551552
definition 51
meta-databases 552
Backcross inbred line (BIL) 46, 47, 139, 140, 143 schema 551
Bayesian mapping and tools maintenance 595, 598
advantages 219 types 552
Bayesian shrinkage estimation (BSE) environmental information 561562
method 221222 gene ontology and plant ontology 598
model selection 222 genetic polymorphisms and gene
penalized maximum likelihood (PML) expression profiling 551
method 221223 genotypic information
statistics 219220 expression information 560
Bayesian model selection approach 268 molecular markers 558559
Best linear unbiased prediction (BLUP) sequences 559560
procedure 604 germplasm information
Binary vectors, gene transfer 469470 characterization 556
auxiliary plasmids 471 genealogy 556557
DNA fragments 470 genetic stocks 557558
gateway-based vector 472 Internet resources 556
modular vectors 471 passport data 556

717
718 Index

Breeding informatics (continued) nucleotide sequence


pedigree 556 databases 580583
information integration protein sequence database 581584
controlled vocabulary and rice databases 593594
ontology 564566 species-specific database 590
database integration 567568 transforming information to new
data standardization 563564 cultivars 554
generic databases 564 universal database 598
information system development 562 universal system, information management
interoperable query system 567 and data analysis 553554
molecular data types 562563 user-friendly 595
redundant data condensing 567 Breeding objectives 1719
tool-based information integration 568 Breeding programme
ways 563 adaptation and stability
information management systems adaptive response 412
AGROBASE Generation II 577 broad adaptation 412
breeding information management crossover interactions 411412
systems 574 environmental signals 413
comprehensive statistical analysis genetic basis 411
system 578 resistances/tolerance 412413
cross-connectivity dealing with GEI 410
capabilities 578579 measurement
Ensembl database schema 579 interaction, intermediate growth
GENEFLOW 578 stages 413
GERMINATE 577 multi-environmental testing 413
information infrastructure 572 unbalanced data 413414
international crop information QTL-by-environment interaction, MAS 414
system 574577 resource-limited environments 411
laboratory information management Breeding types 5
systems 573574 Bulked segregant analysis (BSA) 254, 255
pitfalls 572573
Plabsoft database 577
text mining 579 Cascading pedigree-based scheme 312314
information mining Cell totipotency 7, 156, 458
comparative informatics 571572 Chromosome walking technique
data mining 571 closest markers 435436
sequence similarity analysis 572 contig assembly 436
information retrieval difficulties 436
abstracts 569570 DNA markers 437
automated system 569 gene isolation 437
bibliographic databases 569 recombination 436
books and text-rich web sites Cleaved amplified polymorphic sequence
570571 (CAPS) technique 27, 28, 34
document search 568569 Clustering analysis 388389
full text, research articles 570 Composite interval mapping (CIM) 211, 240,
limitation 595 242, 246
phenotypic information 560561 basis 206
plant databases ICIM 208209
Arabidopsis thaliana likelihood analysis 206207
databases 591592 marker selection cofactor 207208
dendrome 595597 model 206
general genomics and proteomics Concatenated cDNA sequencing (CCS)
databases 584586 strategy 434
general plant databases 584, 587590 Conserved-intron scanning primers (CISP) 99
maize 595597 Consultative Group on International Agricultural
molecular biology databases 579, 580 Research (CGIAR) 499, 504, 518,
NCBI 579580 522, 546
Index 719

Conventional breeding manipulations 286 MAPPOP 607


Crop plant domestication MAPQTL 606607
centres of diversity 2 MCQTL 607
examples, plant species 2 QGENE 606
geographic origins 3 QTL CARTOGRAPHER 606
larger and diffuse areas 3 QTLNETWORK 607
selection process 1, 2 requirement 605606
three important steps 2 web-based tools 607
Crop species cultivation 2 linkage-disequilibrium based QTL mapping
Cross-hybridization 458 genome-wide association (GWA)
Cytological markers 2425 mapping 609
Cytoplasmic male sterility (CMS) 297, 350 integrated haplotype and LD
analysis 609610
LD-based QTL mapping 608609
Darwins Theory of Evolution 5 marker-assisted selection (MAS)
Decision support tools data analysis 613
analytical tools 623624 inbred and synthetic creation 614615
biological relationship graphs 625 methodologies and
breeding by design implementation 613614
approach 621 recombinants 613
breeding product prediction 622 microarray analysis 624
concept 621 molecular breeding
parental selection 621622 elements 599
selection method evaluation 622623 forward and reverse genetics
technological tools 621 approach 599, 600
breeding population management MAS and genetic transformation 599
heterotic patterns 603604 software 599600
hybrid performance 604605 ontological analysis 624625
comparative mapping and consensus maps simulation and modelling
Comparative Map and Trait Viewer gene databases 620621
(CMTV) software 611612 genetic models 616618
markertrait association studies 612 importance 615616
Rosetta Syllego System 612613 QUGENE 618619
systematic approach 612 QULINE 619620
eQTL mapping 607608 Degree of heterozygosity 7, 135, 375
genetic map construction 605 Deletion mutagenesis 450451
genotype-by-environment interaction Diversity array technology (DArT) 3941
analysis 610611 DNA amplification fingerprinting
germplasm management and evaluation (DAF) 27
genebank curators 601 see also DNA markers
GENE-MINE 602 DNA libraries 7071
genetic diversity 600601 DNA markers 288289
GIS and DNA marker data 601 amplified fragment length polymorphism
GRAPHICAL GENOTYPES (AFLP)
software 603 assays 30
marker-assisted germplasm evaluation flowchart 29
(MAGE) 601 primer 2930
molecular characterization 601602 selective PCR amplification 2829
POWERMARKER software 603 separation 30
single feature polymorphisms techniques 3031
(SFPs) 602 classification 41
STRUCTURE software 602603 comparison 44
Suite of Nucleotide Analysis Programs diversity array technology (DArT)
(SNAP) 603 advantages 41
linkage-based QTL mapping genotyping 40
Bayesian QTL mapping 607 microarrays 3940
epistasis 607 procedure 40
720 Index

DNA markers (continued) Donor parent (DP) 139141, 143145


functional markers Doubled haploids (DHs)
characterization 43 applications
development 4243 genomics 127129
Dwarf8 43 plant breeding 129130
gene targeted/genic markers (GTMs) DH lines evaluation
EST-SSR marker 42 epigenetic variation, tissue culture 126
versus genomic marker 4142 genetic variation, culture 125126
target region amplification polymorph- randomness 124
ism (TRAP) markers 42 somaclonal variation 124
and genomics 8 source plants, genetic variation 125
molecular mechanism 22, 23 diploidization, haploid plants 123124
molecular techniques 22 haploid production
random amplified polymorphic DNA (RAPD) anther culture/androgenesis 117,
advantages 27 120122
PCR-based markers 2728 chromosome/genome
principle 27 elimination 117119
reproducibility 28 hap allele 116
sequenced characterized amplified inducer-based approach 117, 122123
regions (SCAR) 28 ovary culture/gynogenesis 117, 119120
random markers (RMs) 41 parthenogenesis 117, 122
restriction fragment length polymorphism recalcitrant species 116
(RFLP) issues, future studies 131
analytical steps 2526 limitations 130131
comparative and synteny mapping 27 quantitative genetics
DNA digestion 25, 26 epistatic effect implication 127
molecular probes 26 expected gain, selection 126127
neutral variant site 25 selection pressure 146147
PCR-RFLP 27
restriction fragment size 26
workflow 26 Epistasis
simple sequence repeat (SSR) population strategies
characteristics 33 allelic relationships 269270
disadvantages 3334 QTL-by-genetic-background
expressed sequence tag (EST) 32 interaction tests 269
genotyping method 3233 triple testcross (TTC) design 270
genotyping system 33 statistical methods
libraries 31 Bayesian model selection approach 268
mutation mechanism 31 Markov chain Monte Carlo (MCMC)
screening steps 3132 sampling 268269
single nucleotide polymorphism (SNP)
allele-specific hybridization (ASH) 35
coding sequences 34 Factorial regression model 407408
C/T transitions 34 False discovery rate
detection systems 36 definition 246
discovery 3435 genome-wide error rate (GWER) 247
DNA array 3637 Food security 1819
invader assay 35 Fragaria ananassa 4
light detection 3839
mass spectrometry 37
multiplexed microarray system 39 Gamete eliminator 148
oligonucleotide ligation assay (OLA) 36 GenBank database 19
plate readers 36 Genealogy ontology 557
primer extension 3536 Gene introgression
rice 34 background selection
type 21 application 300
Donor genome content (DGC) 293, 294 carrier chromosome 305306
Index 721

cistrans configuration 299 positional cloning


concept 298299 chromosome walking see Chromosome
development, assumption 299300 walking technique
BC generation, DGC complementation tests 439
genome size effect 304305 fine mapping and candidate gene
markers and target gene 301 identification 439
QTL alleles 300301 map-based cloning, Arabidopsis
foreground selection 437, 438
bracket markers 295297 ORFX 439441
multiple markers 297 postmeiotic progeny 436
single markers 294295 preliminary genetic mapping 439
issues 293294 QTL effects 437438
linkage drag Gene pyramiding schemes
gene transgression, lettuce 303 cascading pedigree-based scheme 312314
recombination events 304 crossing and selection strategies
segment removal 302303 alleles enrichment 315316
traditional BC programme 303 biparental, back- and top-crosses 314
unlinked DNA removal 303 sequential culling 314315
multiple gene introgression 307308 definition 308310
whole genome selection 306307 different traits 316
Gene, isolation and functional analysis gene transmission probability 311
comparative approaches MARS versus genome-wide
experimental procedures 428429 selection 316318
genomic bases 426427 number of pedigrees 310311
QTL cloning 429431 pedigree height 310
definition 417 population size 311312
expressed sequence tags (ESTs) Gene targeted/genic markers (GTMs) 4142
directed EST screens 434 Gene transfer
full-length cDNA generation 432433 Agrobacterium-mediated transformation
full-length cDNA sequencing 433434 Agrobacterium strains 462
generation 431432 cereals 462463
gene expression analysis 454455 host plant genes 463464
genomic sequence 418 scheme 463
homologous probes core plant transformation facilities 461
cDNA library clones 456 electroporation
directed mutagenesis 456457 advantages 467468
functional genomics 456 transfection system 468
nucleotide sequence 456 transient expression 467
proteins and peptides 455456 expression vectors
in silico prediction binary plasmids 469
ab initio gene prediction 423424 binary vectors 470471
computational sequence analysis 419 cellular functions 468
evidence-based gene design and construction 469
prediction 419420 Focus Issue of Plant Physiology 470
homology-based gene gateway-based binary vectors 472
prediction 420423 pBin19 469
integrated methods 424425 superbinary and binary
intrinsic and extrinsic vectors 469470
methods 419420 transformation vector
protein function detection 425426 selection 472473
mutagenesis foreign DNA 461462
insertional mutagenesis 443448 genetic transformation 459461
knock-out mutagenesis 443 genomics 499500
mutagenic agents 442 in planta transformation 467
non-tagging mutagenesis 448451 particle bombardment
RNA interference 451 versus Agrobacterium-mediated
see also Mutagenesis transformation 467
722 Index

Gene transfer (continued) likelihood ratio and linkage test 53


diverse cell types, foreign DNA linkage mapping, genotyping errors 54
delivery 465 mapping populations 4548
gene gun and system 464 backcross (BC1) populations 4647
high molecular weight DNA F2 populations 46
delivery 466 maximum and minimum map
organelle transformation 466 distance 48
particle acceleration 464465 parental lines selection 4546
recalcitrant plant species 465 permanent populations 47
versus stable transformation 467 population size 4748
transgene integration 466467 and relationship 46
vectors 465466 map units 45
yeast artificial chromosome (YAC) maximum likelihood estimation (MLE),
DNA 466 recombinant frequency 5153
plant tissue culture 458459 log-likelihood 51
polyethylene glycol (PEG)-facilitated score 52
protoplast fusion 467 variance 52, 53
selectable marker gene molecular maps, plants
elimination, transgenic representative genetic maps 5456
plants 478480 sophistication 56
functions 473474 multi-point analysis and marker order 54
plant transformation 474477 plants 5456
positive selection 477478 segregation and linkage tests 4951
technology 500 2 tests 5051
transgene genotype and frequency 49, 50
confirmation and gene expression Mendelian segregation 4950
analysis 481484 theoretical segregation ratios 49, 50
expression 481 Genetic markers
inactivation 486487 criteria 21
integration 481 cytological markers
promoters 485486 application 25
reporter genes 484485 chromosome structure 24
stacking 487492 DNA markers
transgenic crop amplified fragment length poly-
commercialization 492499 morphism (AFLP) 2831
Genetic diversity 56, 41, 183, 374, 600 diversity array technology
germplasm (DArT) 3941
dissimilarity coefficients 177 functional markers 4243
factors, impact 175176 genic markers 4142
gene pools 174 molecular mechanism 22, 23
genetic similarity 176, 177 molecular techniques 22
germplasm classification 178180 random amplified polymorphic DNA
heterozygosity 176 (RAPD) 2728
maize 174, 175 restriction fragment length poly-
molecular markers 174 morphism (RFLP) 2527
phylogenetics 180 simple sequence repeat (SSR) 3134
polymorphism 176 single nucleotide polymorphism
taxa 176 (SNP) 3439
maize kernel phenotype 152, 153 type 21
Genetic engineering and gene transfer 78 genetic variation 21
Genetic linkage maps 40, 128 isozyme markers 25
interference and mapping functions 4849 morphological markers
coefficient of coincidence 48 genetic stocks and maps 24
crossover interference 49 Mendelian laws of inheritance
double crossover 48 22, 24
Haldanes map function 49 tissue culture and mutation
recombinant frequency 4849 breeding 24
Index 723

neutral DNA variation 21 Genomics


protein markers 25 cDNA sequencing process
Genetic transformation technologies advantages 7980
Agrobacterium tumefaciens cDNA libraries 80
methodology 524 complex eukaryotic genomes 81
biolistics 524 limitations 81
genes and DNA sequences collinearity
broad patents claims 527 implications 98100
Bt and cry genes 527528 macrocollinearity 9697
foreign genetic material 526 microcollinearity 9798
gene patent 526527 orthology and paralogy 9596
Golden Rice comparative maps 9495
deal 530531 genome organization
freedom-to-operate C-value see Genome size
challenges 529530 eukaryotes 68
Humanitarian Project 531 organisms DNA content 69
product clearance profile 529, 530 sequence complexity 70
regulations 531532 genome sequencing process
SGR2 531 ABI PRISM technology 75
tangible property 529, 531 Caenorhabditis elegans 73
vitamin A deficiency 529 capillary electrophoresis (CE) 75
patents 525 clone-by-clone or hierarchical
plant regeneration 524 sequencing strategies 7677
regulatory elements 528529 deoxynucleotide triphosphates
selectable marker genes 528 (dNTPs) 75
transformation method 525526 genome filtering strategies 7778
Genetic variation high-resolution charge-coupled device
crossover, genetic drift and gene flow 9 sensor 75
mutation 910 modified Sanger sequencing
Genetically modified plants method 74
barley 493 plant genomic sequences 7879
biotech/GM crops 493 shotgun sequencing strategy 7677
contribution 496 Solexa 76
developing countries 494 SOLiD system 76
economic benefits 494–495 metabolomics
innovations 496 capillary electrophoresis (CE) 89
maize 494 definition 88
pesticide use 495 Fourier transform ion cyclotron
regulation system 497 resonance MS (FTICR-MS) 89
regulatory approval 495 gas chromatography 89
risk assessment 496–497 HPLC 90
soybean 494 NMR spectroscopy 90
cereals 462–463 orbitrap 89
maize 493 physical mapping
metabolomics 499 DNA libraries 70–71
organelle transformation 466 five methods 73
pollen movement monitoring 498 high molecular weight DNA
rice 493, 498–499 isolation 72
Setaria viridis 498 insert DNA, ligation 73
tobacco 461 large-insert cloning vectors 71–72
see also Gene transfer proteomics
General combining ability (GCA) 15, 127, 367 post-translational modifications 87–88
General microarray process 101 protein extraction 83–84
Generation Challenge Programme (GCP) 172, protein identification and
556, 576 quantification 84
Genome size 69–71, 79, 94 protein profiling 84–85
Genome-wide error rate (GWER) 247 protein–protein interactions 85–87
Genomics (continued) re-sequencing method 185, 186
trait extrapolation 94 single feature polymorphism (SFP) 185
transcriptomics 20, 81–83 anatomic and quality characteristics 171
Genotype-by-environment interaction (GEI) 6 artificial/synthetic 158–159
breeding classical germplasm
adaptation and stability 411–413 definition 156
dealing ways 410 identification, agricultural
measurement 413–414 ecosystem 158
QTL-by-environment interaction, collection redundancies and gaps 181–182
MAS 414 collections, issues 161–162
resource-limited environments 411 cryopreservation 168–169
see also Breeding programme data collection standardization 190
crop improvement 382 ecosystem diversity, definition 151
E(NK) model 415–416 enhancement
environmental characterization gene introgression 188
average linkage method 389–390 purification, germplasm
categories of sites 387 collections 186–187
clustering analysis 388–389 tissue culture and
cultivar evaluation system 390–391 transformation 187–188
location selection 393–394 generalized concept
mega-environments 387–388 DNA 156
predictable factors 387 gene pools 155
sociological factors 386 genetic information 156
strategies 387 taxa 157
unpredictable factors 387 tissue culture and embryo rescue 155
see also Geographic Information genetic diversity
System (GIS) dissimilarity coefficients 177
genotype performance stability factors, impact 175–176
GGE bi-plot analysis 398–399 gene pools 174
linear–bilinear models 396–398 genetic similarity 176, 177
mixed model 400–402 germplasm classification 178–180
genotype ranking 381–382 heterozygosity 176
molecular dissection maize 174, 175
environment partition 402–404 molecular markers 174
MET and genotypic data 410 phylogenetics 180
QTL mapping see Quantitative trait polymorphism 176
locus (QTL) mapping taxa 176
multi-environment trials genetic drifts/shifts and gene flow
basic data analysis and genetic stability 183
interpretation 384–386 heterozygosity 183
experimental design 383–384 maize 183
Geographic Information System (GIS) population size 182
database view 391 storage effects 183
maize mega-environments, southern Africa genetic evaluation and utilization 171
characteristics 393 information integration and
drought related parameters utilization 190–192
392–393 information system 188–189
germplasm development, SADC in situ and ex situ conservation 159–160
region 392 in vitro evaluation 173–174
Striga-prone areas 392 in vitro storage techniques
map view 391 advantages 167
model view 391 application, examples 168
wheat production environments 391–392 disadvantages 168
Germplasm International Board for Plant Genetic
allele mining Resources (IBPGR) 168
EcoTILLING 185 plant tissue culture,
rate limiting factor 185 flowchart 166–167
marker-assisted germplasm evaluation High-input agriculture 4
(MAGE) 172–173 High-throughput SNP genotyping system 292
rejuvenation and multiplication 170 Homology-based gene prediction
species descriptors 171 EST/cDNA databases 421
synthetic seeds and DNA storage 169–170 homologous genomic sequences 422–423
tissue specificity 171 protein sequence databases 421–422
unique germplasm 184–185 translated genomic sequence versus
Germplasm conservation 56 nucleotide database 422
GGE bi-plot analysis Hygromycin 476
Average Environment Coordination (AEC)
view 399
construction 398–399 Inclusive composite interval mapping
mean versus stability 399 (ICIM) 208–209
Glyphosate 477 Information integration
Green Revolution 16–17 controlled vocabulary and
ontology 564–566
database integration 567–568
Haploid production data standardization 563–564
anther culture/androgenesis generic databases 564
cytokinins 121 information system development 562
japonica cultivars 121 interoperable query system 567
microspore embryogenesis 120 molecular data types 562563
sucrose 121 redundant data condensing 567
temperature and light 121 tool-based information integration 568
chromosome/genome elimination ways 563
bulbosum method 117 Information management systems
maize pollen method 118 AGROBASE Generation II 577
mechanism 118–119 breeding information management
somatic reduction 118 systems 574
wheat × maize technique 118 comprehensive statistical analysis
hap allele 116 system 578
inducer-based approach 117, 122–123 cross-connectivity capabilities 578–579
ovary culture/gynogenesis 117 Ensembl database schema 579
embryogenic frequency 120 GENEFLOW 578
gynogenic haploids 120 GERMINATE 577
2-methyl-4-chlorophenoxyacetic acid information infrastructure 572
(MCPA) 119 international crop information
parthenogenesis 117, 122 system 574–577
recalcitrant species 116 laboratory information management
Hardy–Weinberg equilibrium (HWE) 12 systems 573–574
Heritability 12–13 pitfalls 572–573
Heterosis 6, 20 Plabsoft database 577
gene expression analysis text mining 579
allelic variation 369–370 Information mining
cis-regulatory variation 370 comparative informatics 571–572
expressed sequence tags (ESTs) 369 data mining 571
expression QTL (eQTL) 370 sequence similarity analysis 572
heterotic group construction Information retrieval
future aspects 374 abstracts 569–570
hybrid performance 371–372 automated system 569
molecular marker 372–374 bibliographic databases 569
QTL books and text-rich web sites 570–571
factors affecting 367–368 document search 568–569
overdominant (ODO) effects 368–369 full text, research articles 570
rice, genetic analysis 368 Informative markers 290–291
yield-related heterosis 368 Inserted gene sequence see Transgene
High-density molecular map 289 Insertional mutagenesis see Mutagenesis
Intellectual property rights (IPR) International Convention for the Protection
aspects 502 of New Varieties of Plants see UPOV
criteria 501 convention
effects 501–502 International Crop Information System
industrial property 502 (ICIS) 566, 574–577, 590
literary and artistic property 502 International Wheat Genome Sequencing
modified living organisms 501 Consortium (IWGSC) 79
molecular breeding Interval mapping
genetic transformation assumptions 202–203
technologies 524–532 likelihood approach 203–205
marker-assisted plant breeding 532–534 Inverse polymerase chain reaction (IPCR) 452
product development and Invitrogen 472
commercialization 534–535 Isozyme markers 25
national IPR laws 502
plant breeding
biotechnology techniques 503 Karyotypic markers 183
challenges 503 Kauffmann's landscape concept 415–416
copyright and database protection 502 Kyoto Encyclopedia of Genes and Genomes
international treaties 503–504 (KEGG) 584
public and right holder 502
seed-saving practices 503
technology 546–547 Large-scale breeding activities 4
see also Plant variety protection (PVP) Linkage disequilibrium (LD) mapping
International Agreements, plant breeding allele non-random association 223
The 1983 International Undertaking on applications 231–233
Plant Genetic Resources 514–515 Bayesian methods 231
The 1992 Convention on Biological factors
Diversity (CBD) epistasis 227
objectives 515 founder effect 227
transfer of technology 515–516 genetic drift 228
The 1994 TRIPS Agreement mating patterns 228
obligation, crop cultivars protection 516 migration 228
purpose and objective 516 mutation 227
sui generis system 516–517 population structure 227
The 2001 International Treaty on Plant selection 227–228
Genetic Resources for Food and marker–trait association 224
Agriculture measurement
multilateral system (MS) 517 allele and haplotype frequencies 224
objectives 517 decay plots 225, 227
plant genetic resources for food and polymorphisms 225, 226
agriculture (PGRFA) 518 statistical significance 225
standard material transfer agreement mixed models 230–231
(SMTA) 517–518 principal component analysis 229–230
UPOV convention quantitative inbred pedigree disequilibrium
breeders' exemption 514 test (QIPDT) 231
breeders' rights 513–514 single nucleotide polymorphisms
definition 510–511 (SNPs) 223
distinctness, uniformity and stability structured association 229
(DUS) 511–512 transmission disequilibrium test (TDT)
essentially derived varieties/ and derivatives 228–229
cultivars 512 Long-term selection
farmers' privilege 512–513 divergent selection rice
revisions 510 genetic fixation 334
see also Intellectual property rights (IPR); large-effect QTL 334–335
Plant variety protection (PVP) transgression 334
International Agricultural Research Centres maize
(IARCs) 509, 546 epistasis 331
gene frequency 330–331 inbred and synthetic creation 614–615
gene segregation 331 methodologies and
genetic interpretations 331 implementation 613–614
marker-assisted evaluation 332–334 recombinants 613
mutational variance 331–332 DNA markers 287
plant breeding 334 environment-dependent traits
procedure 328–330 biotic and abiotic stresses 355
selection limits 330 genic male sterility 354
selection responses 327 photoperiod/temperature
sensitivity 353–354
gene introgression
Mapping approaches background selection 297–300
fine mapping barley 363
advanced intercrossing lines BC generation, donor genome
(AILs) 251 content 300–302
population size and mating design 252 carrier chromosome, background
recombinant inbred lines selection 305–306
(RILs) 251–252 drought tolerance 360–361
southern leaf blight (SLB) disease 252 elite germplasm 359–360
minor QTL mapping foreground selection 294–297
heritability 253 genome size effect 304–305
map distance 253 issues 293–294
overshadow effect 252 linkage drag 301–304
QTL threshold 253–254 maize 362–363
sample size 253 multiple gene introgression 307–308
trait locus detection, marker rice 361–362
method 253–254 wheat 362
regional mapping whole genome selection 306–307
bulked segregant analysis (BSA) 254 wild cultivated plants 358–359
NIL versus RIL, QTL see also Gene introgression
mapping 254–255 gene pyramiding
NILs 254 biotic stresses 364–366
rice, candidate gene identification 255 crossing and selection
Marker-assisted plant breeding see strategies 314–316
Marker-assisted selection (MAS) definition 308
Marker-assisted selection (MAS) different traits 316
application, bottlenecks disease resistance breeding 364
breeding programme 339 marker-assisted recurrent
cost-effective and high-throughput selection (MARS) 364, 367
genotyping systems 342–343 MARS versus genome-wide
effective marker–trait association 342 selection 316–318
epistasis 344 see also Gene pyramiding schemes
genotype-by-environment genetic markers and maps
interaction 344 DNA markers 288
maize cultivar development 339–340 foreground and background
phenotyping 343–344 selection 289
released crop cultivars 340 high-density molecular map 289
sample tracking 344 simple sequence repeat (SSR)
cost–benefit analysis markers 288–289
conventional breeding versus single nucleotide polymorphism (SNP)
MAB 346–347 markers 288
DNA markers 347 high-throughput SNP genotyping
MAS versus PS 346 system 292
molecular costs 345 hybrid prediction
data management and delivery 292–293 conclusions and prospects 376–377
decision support tools favourable allele combination 376
data analysis 613 genome-wide heterozygosity 374–375
Marker-assisted selection (MAS) (continued) without testcrossing/progeny
heterosis-associated markers 376 test 337–338
see also Heterosis steps 532
IPR 532 testcrossing traits
long-term selection see Long-term selection cytoplasmic male sterility (CMS) 350
marker characterization fertility restoration 350
allele number 290 heterosis 352–353
informative markers 290–291 outcrossing 350–352
polymorphism information content wide compatibility 352
(PIC) value 290 Markov chain Monte Carlo (MCMC)
marker-assisted breeding method 534 approach 209, 219, 220, 222, 223, 231,
marker–trait associations, validation 268–269
causes 291–292 Material transfer agreements (MTA) 522
genotype-by-environment Mega-environments 387–388
interaction 291 Mendelian genetics 5
monogenic and oligogenic traits 291 Molecular breeding tools see Genetic markers;
multiple allele interaction 291 Molecular maps; Omics and arrays
phenotypic scoring methods 292 Molecular maps
segregating and mapping genes 291 chromosome theory and linkage
microsatellite primers and PCR 532 crossing over 45
opportunities and challenges meiosis 43, 45
crop-specific issues 378–379 recombination frequency 45
developing countries 380 genetic linkage mapping
genetic networks 379 interference and mapping
molecular tools and breeding functions 48–49
systems 378 likelihood ratio and linkage test 53
quantitative traits 379 linkage mapping, genotyping errors 54
overview 533 map units 45
PCR product analysis 532 mapping populations 4548
gel electrophoresis 533 maximum likelihood
mass spectrometry 533, 534 estimation (MLE), recombinant
microarray analysis 533–534 frequency 51–53
properties 288 multi-point analysis and marker
quantitative traits, selection for order 54
genotypic selection 322–323 plants 54–56
index selection 320–322 segregation and linkage tests 49–51
integrated marker-assisted genetic map integration
selection 323–324 conventional and molecular maps 56
marker scores 319–320 genetic and physical maps 57–58
phenotypic value 318–319 multiple molecular maps 56–57
seed and quality traits Morphological markers 22, 24, 56, 148, 188,
hybrid seed traits 356 196, 266
quality traits 356 Multi-auto transformation (MAT) vector
seed traits 355–356 system 479, 480
seed DNA-based genotyping Multiple interval mapping (MIM)
advantages 347 genotypic values and variance
DNA extraction 347–348 components 212–213
genotype selection 348–349 likelihood analysis 209–211
issues, maize 349 pre-model selection 211
selection schemes stepwise selection analysis 211
early breeding stage 338 stopping rules 212
independent environments 338 Multiple quantitative trait locus model
multiple genes and multiple epistasis 218–219
traits 338–339 genetic variance 218
whole genome selection 339 parameters, class selection 217
without laborious field/intensive partitioned heritability 218
laboratory work 338 reality 217
Mutagenesis Omics and arrays
gene isolation biochip or DNA microarray 100
adaptor-ligated PCR 453 comparative genomics
complementation test 453 collinearity 95–100
co-segregation 452 comparative maps 94–95
and functional analysis 454 trait extrapolation 94
gene-indexed mutations 453–454 data acquisition and quantification 106
inverse PCR (IPCR) 452 experimental design 103–104
mutant screening 452 functional genomics
plasmid rescue 452 metabolomics 88–91
reverse genetic screen 453 proteomics 83–88
thermal asymmetric interlaced transcriptomics 81–83
(TAIL)-PCR 452–453 hybridization and post-hybridization
insertional mutagenesis washes 105–106
activation tagging 447–448 in situ hybridization (ISH) 68
advantages 444 labelling process 104–105
chromosome, tagging mass spectrometry (MS)
efficiency 443–444 electrospray ionization 62
entrapment/enhancer/promoter FT-MS instrument 62
tagging 448 integrated liquid-chromatography
limitations 444 systems 62
retrotransposon tagging 446–447 ionization techniques 62
T-DNA tagging 444–445 mass analyser 61, 62
transposon tagging 445–446 mass-to-charge ratio (m/z) 61
knock-out mutagenesis 443 matrix-assisted laser desorption/
mutagenic agents 442 ionization 62
mutant libraries 441–442 multidimensional protein
non-tagging mutagenesis identification technology (MudPIT) 63
chemical agents 449 protein pre-fractionation 63
deletion mutagenesis 450–451 phenomics
ionizing radiation 449 phenotypes, importance 92
point mutations 449–450 plants 92–93
transformation-mediated production of arrays
mutagenesis 448–449 Affymetrix software suite 101
RNA interference 451 array content 103
gene-specific DNA 101
glass microscope slides 103
National Agricultural Research Institute source of arrays 102–103
(NARI) 506, 508–509 spotting pins 103
Near-isogenic lines (NILs) types of arrays 102
application 145 protein microarrays
backcrossing and genetic effects 139–140 carbohydrates 108, 109
chromosome substitution 140 fluorescence detection 108
gene tagging strategy 142–143 microfluidics 109
introgression line (IL) libraries quantitative real-time-PCR (QRT-PCR)
advantages 141 65, 67
Lycopersicon pennellii introgression sample preparation 104
lines 142 serial analysis of gene expression
rice introgression lines 141–142 (SAGE) 65, 66
mutation 140 single feature polymorphism (SFP)
selfing-derived NILs 140 111, 112
theoretical considerations, genetic statistical analysis and data mining
mapping 143–145 Affymetrix 108
whole genome selection, permanent Arabidopsis GeneChip 106
populations 140 co-expressed genes 107
Neutral markers 41 microarray analysis components 107
Non-tagging mutagenesis see Mutagenesis potential pre-processing steps 107
Omics and arrays (continued) seed laws 521
structural genomics specifications 534
cDNA sequencing 79–81 Substantive Patent Law Treaty
genome organization 68–70 (SPLT) 509
genome sequencing 73–79 sui generis PVP legislation 519
physical mapping 70–73 trade secrets 523
suppression subtractive hybridization utility patent 519–520
(SSH) 67–68 Plant genetic resources
two-dimensional gel electrophoresis (2DE) agrobiodiversity 151
cell-surface proteome 61 association/linkage disequilibrium
isoelectric point and mass 60 mapping 193
laser-capture microdissection 61 core collections
lysate 60 accession selection, rice 164, 165
protein analysis 59, 60 examples 164, 165
silver nitrate and Coomassie blue gene-specific markers 166
stains 60 genetic markers 164
two-dimensional polyacrylamide genetic variation 163
gel electrophoresis (2D mini-core collections 164
PAGE) 60–61 molecular markers 164, 166
zoom gels 61 objectives 163
universal chip or microarray 109, 110 POWERCORE program 164
whole-genome tiling microarrays selection, general procedure 163
(WGAs) 110, 111 Generation Challenge Programme 192
yeast two-hybrid system genetic erosion
DNA-binding domain 63, 64 deforestation and land clearance 153
5-FOA 63, 64 genetic drift and selection
open reading frames (ORFs) 65 pressure 153–154
protein–protein interactions 63 Green Revolution technology 153
Saccharomyces cerevisiae 64 rates of extinction 152
sensitivity 63 genetic vulnerability
URA3 reporter gene 64 Irish potato famine 154
Overshadow effect 252 types of uniformity, crop 155
germplasm
allele mining 185–186
Parthenogenesis 117, 122 anatomic and quality
Partial least square (PLS) strategy 408 characteristics 171
Patents artificial/synthetic
agbiotech patent 546 germplasm 158–159
criteria 501 classical germplasm 156, 158
description 519 collection, issues 161–162
DNA sequences 526–527 collection redundancies and
Golden Rice 529 gaps 181–182
industrial property 502 cryopreservation 168–169
intellectual property system 520 ecosystem diversity, definition 151
marker genes 528 enhancement 186–188
marker-assisted selection 532 generalized concept 155–157
members 516 genetic diversity 174–180
microsatellite primers 532–533 genetic drifts/shifts and gene
modified living organisms 501 flow 182–184
national 534 genetic evaluation and utilization
plant breeding 502–504 171
plant transformation method 525 information integration and
product development and utilization 190–192
commercialization 534–535 information system 188–189
protection 520–521 in situ and ex situ
PVP, USA 541–542 conservation 159–160
research exemption 509 in vitro evaluation 173–174
in vitro storage techniques 166–168 seed saving 547–548
marker-assisted germplasm evaluation strategies
(MAGE) 172–173 biological protection 521
rejuvenation and multiplication 170 brands and trademarks 523
species descriptors 171 contract law 522–523
standardization of data collection 190 patents 519–521
synthetic seeds and DNA plant breeders' rights 518–519
storage 169–170 seed laws 521–522
tissue specificity 171 trade secrets 523
unique germplasm 184–185 Point mutations 449–450
Global Crop Diversity Trust 192 Pollen killer see Gamete eliminator
maize kernel phenotype, genetic Polymorphism information content (PIC) 290
diversity 152, 153 Polyploidy 5, 38, 93, 95, 125, 168, 224, 236, 437
species diversity 151 Populations
Plant variety protection (PVP) breeding methods 67
administrative challenges 544–545 mean and variance 12
extension and enforcement 543–544 properties and classification
genetic resources 546 diallel crosses 114–115
herbal medicines 549 genetic background-based
impacts classification 114
agricultural production and trade 507 genetic constitution-based
breeding strategies 508 classification 113–114
genetic diversity 506–507 genetic maintenance-based
international agricultural research 509 classification 114
maintenance breeding 506 inbreeding populations 116
NARI organization 508–509 natural cultivars 114
public breeding 507 North Carolina designs 115
technology 507–508 triple testcross (TTC) and simplified
intellectual property rights see Intellectual TTC (STTC) 115–116
property rights (IPR) Product development and
international agreements see International commercialization 534–535
agreements, plant breeding Protein information resource (PIR) 583–584
molecular techniques Protein markers 25
cultivar identification 539–540
DUS testing 536–537
essentially derived varieties Qualitative–quantitative traits 260–261
(EDV) 537–539 Quantitative genetics 11, 115, 116, 195, 242,
molecular markers 535–536 378, 415, 599, 615, 618
seed certification 540 doubled haploids (DHs)
seed purification 540–541 epistatic effect implication 127
SNP technology 535 expected gain, selection 126–127
SSR 535 genotype-by-environment interaction 6
needs Quantitative/real-time RT-PCR (QRT-PCR) 482
benefits 506 Quantitative trait locus (QTL) mapping 554,
breeders' rights, exercise of 505–506 559, 568, 576, 578, 590
business 504 allele dispersion, screening 255–258
market demands 504–505 Bayesian mapping
patent 504 advantages 219
product development 504, 505 Bayesian shrinkage estimation (BSE)
public and private-sector plant method 221–222
breeding 505 model selection 222
practice penalized maximum likelihood (PML)
Canada 542 method 221–223
developing countries 542–543 statistics 219–220
European Community (EU) 541 bulked DNA analysis
participatory plant breeding (PPB) 543 detection power and mean LOD
USA 541–542 score 280, 281
Quantitative trait locus (QTL) mapping (continued) genetic backgrounds
entire population genotype epistasis 267–270
replacement 282 heterogeneous 265–267
genome-wide association mapping 283 homogeneous 264–265
major gene-controlled traits 278, 279 multiple alleles 270
phenotypic variance 281–282 see also Epistasis
problems 279–280 genome-wide threshold 244, 245
quantitative traits 278–279 growth and developmental stages
selective phenotyping dynamic mapping 271–273
method 283–285 dynamic traits 271
simulation studies 280 hitchhiking effect 197
trait-specific genetic mapping 282–283 in silico mapping
complicated traits mixed-model approach 238
correlated traits 258–259 pros and cons 237–238
qualitative–quantitative traits 260–261 statistical power 238–239
seed traits 261–262 interval mapping
trait components 258 assumptions 202–203
composite interval mapping (CIM) likelihood approach 203–205
basis 206 linkage disequilibrium (LD) mapping
ICIM 208–209 allele non-random association 223
likelihood analysis 206–207 applications 231–233
marker selection, cofactor 207–208 Bayesian methods 231
model 206 factors 226–228
confidence interval 243–244 marker–trait association 224
cross validation (CV) and sample size measurement 224–226
genetic variance 243 mixed models 230–231
heritability 243 principal component analysis 229–230
Pioneer Hi-Bred, maize 242 quantitative inbred pedigree
dynamic mapping disequilibrium test (QIPDT) 231
advantage 272 single nucleotide polymorphisms
biological and statistical (SNPs) 223
advantages 272–273 structured association 229
effect-accumulation analysis 271 transmission disequilibrium test (TDT)
effect–increment analysis 271–272 and derivatives 228–229
high-dimensional data 273 mapping approaches
multivariate analysis 272 fine mapping 251–252
QTL effects 272 minor QTL mapping 252–254
reversely-jump Markov chain Monte regional mapping 254–255
Carlo (RJ-MCMC) 273 see also Mapping approaches
rice IR64/Azucena DHs 272 meta-analysis
time-dependent effects 273 examples 236–237
experimental designs 214 QTL effects 235–236
false discovery rate QTL locations 234
definition 246 QTL maps 234–235
genome-wide error rate (GWER) 247 standard statistical principles 233
GEI multiple crosses
AMMI value assessment 408–409 ANOVA 215
factorial regression model 407–408 cross interaction and epistatic
fibre quality 405 interaction, genotype 214–215
grain yield components 405 disadvantages 215
inconsistent QTL detection 405 identity by descent (IBD) 215
inflorescence development mid-parent heterosis (MPH) 215
patterns 409 triple testcross (TTC) design 216
mixed models 406–407 multiple interval mapping (MIM)
QTL × nitrogen interactions 409–410 genotypic values and variance
QTL-sharing frequencies 404–405 components 212–213
structural equation model 408 likelihood analysis 209–211
test environment 409 pre-model selection 211
stepwise selection analysis 211 56, 57, 128, 129, 136, 164, 165, 173, 181,
stopping rules 212 183, 184, 232, 242, 254, 264, 266, 290,
multiple QTL see multiple quantitative trait 299, 373, 377, 403, 430, 551, 554, 612
locus model Retrotransposon tagging 446–447
multiple traits, gene expression Reversed breeding-to-genetics see Long-term
cis acting eQTL 275 selection
eQTL hotspot detection 275 RM190 289
non-additivity of transcription 274 RNA interference (RNAi) 451, 473
polygenic transcriptional
variation 274, 276
tissue transcript variation 274 Segregation distortion, genetic control 148–149
transcript level, genetic Selectable marker gene
complexity 274 elimination, transgenic plants
permutation and thresholds 244–246 co-transformation 478
pooled analysis 216–217 homologous recombination 480
power and sample size positive markers 480
additive effect 240 recombination 479
dominance effect 241 transposons 479–480
false positive and negative functions 473–474
errors 239–240 plant transformation 475
linkage effect 241–242 antibiotic resistance genes 476
marker effects 240 CaMV 35S transcript 475–476
separation 249–255 classes 474
single marker-based approaches engineering herbicide
analysis of variance detoxification 476–477
(ANOVA) 200–201 5-enol-pyruvylshikimate-3-phosphate
assumptions 197–198 synthase 477
backcross (BC) design 199 glutamine synthase 476
F2 design 199–200 herbicide tolerance genes 476
likelihood approach 201–202 selection 474–475
regression approach 201 3′ signal 476
thresholds, interval mapping 244 positive selection
carbohydrate 477–478
β-glucuronidase (GUS) 477
Random amplified polymorphic DNA (RAPD) phosphomannose isomerase (PMI) 478
markers 27–28 versus negative selection marker 477
Recombinant inbred lines (RILs) 251–252 types 477
genetic map construction 136 Selective breeding 3
inbreeding and genetic effects Selective restriction fragment amplification
continuous inbreeding 132, 133 (SRFA) 29
homozygosity 131, 132 Self-pollinated species 7
mean value and variance 133, 134 Semigamy see Parthenogenesis
intermated RILs 136–137 Sequence tagged site (STS) markers 289
map distance and recombinant Simple sequence repeat (SSR) markers 31–34,
fraction 135–136 288–289
multi-way/nested RIL populations 137–138 Single marker-based approaches
selection pressure 147–148 analysis of variance (ANOVA) 200–201
single seed descent (SSD) method assumptions 197–198
advantages and disadvantages 135 backcross (BC) design 199
multiple-seed procedure 134–135 F2 design 199–200
single-hill procedure 134 likelihood approach 201–202
single-seed procedure 133–134 regression approach 201
Recombination frequency 145–146 Single markers 294–295
Recurrent parent (RP) 139–145, 439, 440 Single nucleotide polymorphism (SNP)
Recurrent selection 15–16 markers 34–39, 288
Recurrent selection method 7 Somaclonal variation 7
Restriction fragment length polymorphism Southern African Development Community
(RFLP) markers 8, 25–27, 29, 30, 34, 45, (SADC) countries 392
Southern leaf blight (SLB) disease 252 β-glucuronidase (GUS) 484–485
Specific combining ability (SCA) 15, 367 gene regulation and function
Standard Material Transfer Agreement (SMTA) 517 analysis 484
Streptomyces hygroscopicus 474, 476 green fluorescent protein (GFP) 485
Structural equation model 408 luciferase 485
Sui generis system 502 properties 484
PVP legislation 519 stacking
TRIPS Agreement 516–517 co-transformation via particle
Swiss-PROT database 583–584 bombardment 490–492
System-wide Information Network for Genetic co-transformation via plasmids 490
Resources (SINGER) 566 multiple gene transfer 487–488
multi-transgene-stacking method 489
sexual crosses 488, 490
Thermal asymmetric interlaced transgenic crop commercialization
(TAIL)-PCR 452–453 biotech/GM crops 493–496
Thermal-sensitive genic male sterility (TGMS) 297 commercial targets 492–493
Trade-Related Aspects of Intellectual Property monitoring transgenes 498–499
Rights (TRIPS) product release and marketing
crop cultivars 501, 503 strategies 498
international trade 507 regulation system 497
obligation, crop cultivars protection 516 risk assessment 496–497
plant variety protection, developing Transgenic breeding 499–500 see also Gene
countries 542–543 transfer
purpose and objective 516 Translation of EMBL nucleotide sequence
sui generis system 516–517 database (TrEMBL) 583–584
Transcriptomics 20 Transposon tagging system
Transformation-mediated mutagenesis 448–449 Ac/Ds tagging system 446
Transgene class II transposons 445–446
confirmation DNA elements 445
Northern blotting 482 maize transposon Ac 446
PCR method 482 retroelements 445
Southern blotting 482 T-DNA insertions 446
Western blotting 482 transposable elements 445
expression 481
gene expression analysis
Agrobacterium-mediated Unweighted pair-group method using
transformation 483–484 arithmetic averages (UPGMA)
full-length cDNA 483 method 389–390
gene silencing 482–483 UPOV convention
Mendelian segregation 482 breeders' exemption 514
variation 482 breeders' rights 513–514
virus-based vectors 483 definition 510–511
inactivation distinctness, uniformity and stability
action 487 (DUS) 511–512
causes 486–487 essentially derived varieties/cultivars 512
minimization 487 farmers' privilege 512–513
signals 487 plant breeders' right 518–519
integration 481 revisions 510
promoters 485–486 rural sector 507
constitutive transgene updated 545–546
expression 485–486
non-constitutive transgene
expression 486 Yeast artificial chromosome (YAC) 466
reporter genes
anthocyanin biosynthetic pathway
genes 485 Zip-code arrays 109, 110
