Professional Documents
Culture Documents
Protein function
in the post-genomic era
David Eisenberg, Edward M. Marcotte, Ioannis Xenarios & Todd O. Yeates
Molecular Biology Institute and UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, Box 951570, University of California at
Los Angeles, Los Angeles, California 90095-1570, USA (e-mail: david@mbi.ucla.edu)
Faced with the avalanche of genomic sequences and data on messenger RNA expression, biological scientists
are confronting a frightening prospect: piles of information but only flakes of knowledge. How can the
thousands of sequences being determined and deposited, and the thousands of expression profiles being
generated by the new array methods, be synthesized into useful knowledge? What form will this knowledge
take? These are questions being addressed by scientists in the field known as functional genomics.
Box 1
The method of phylogenetic profiles
Genomes
P2
P3
P5
P1 P2 P4 P5 P7
P6
B. burgdorferi (BB) P7
S. cerevisiae (SC)
P1
P2
P3
E. coli (EC)
P5
P1
P6
P3
P5
P7
P4
H. pylori (HP)
P6
Profile clusters
P4
Phylogenetic profile
EC
P2
P7
SC BB HP
P1
P2
P3
P4
P5
P6
P7
Conclusion
P1
P3
P6
0
0
1
1
1
1
1
1
0
0
P5
1
1
823
insight progress
RAD7
a
SNF5
Silencing
proteins
Histones
ZDS2
RFC5
RFC4
RFC3
RFC2
YNL218W
H2A1
GTP-binding and
-stimulating proteins
GSP1
HTN1
PSP2
MOG1
SIR3
GSP2
NAP1
SIR4
H3
Nucleosome
formation
H2B1
SIR2
CRM1
YNLO78W
TOF2
SRP1
YOL070C
YER152C
SXM1
H4
HST3
HST2
HST1
Nuclear import
KCC4
RIF1
Septin formation
Protein targeting
Other
Phylogenetic profiles
Experimental
>2 prediction methods
CCT6
Sup45
CCT4
Sup35
CCT7
CCT5
CCT3
CCT2
CSE1
RAP1
HST4
insight progress
Box 2
The Rosetta Stone method for detecting functional linkage
General concept
Rosetta Stone in organism 1
Protein A
in organism 2
Protein B
in organism 2
A
A
B
B
C. elegans
Ade 5,7,8
Yeast Pur2
Yeast Pur3
E. coli TrpC
Yeast TrpG
Yeast TrpF
the same phylogenetic profile (that is, the same pattern of presence or
absence) in all surveyed genomes, it is inferred that the two proteins
have a functional link. That is, why would two proteins always both be
inherited into a new species, or neither inherited, unless the two
function together? The power of the method to detect functional
linkage can be appreciated when the number of possible phylogenetic profiles is considered: because each protein can be either present or
absent in each genome, if there are n fully sequenced genomes, there
are up to 2n phylogenetic profiles. Currently there are about 30 fully
sequenced genomes in the public domain, meaning there are 230
(~109) possible phylogenetic profiles. This number far exceeds the
number of protein families, so that a proteins phylogenetic profile is
a nearly unique characterization of its pattern of distribution among
genomes. Hence any two proteins having identical or similar
phylogenetic profiles are likely to be engaged in a common pathway
or complex.
Functional linkages between proteins have also been detected
by analysing fusion patterns of protein domains (Box 2). Not
infrequently, separate proteins A and B in one organism are expressed
as a fused protein in some other species. When expressed as a fused
protein, the two domains A and B are almost certainly linked in
function. Thus a successful search through other genome sequences
for the corresponding fused protein is powerful evidence that A and B
are linked functionally. Because A and B have unrelated sequences,
this type of functional linkage cannot be detected by a homology
search. Also, because the fused protein has similarity to both A and B,
it is termed a Rosetta Stone sequence12.
A third computational method that reveals functional linkages
from genome sequences is the gene neighbour method13,14. If in
several genomes the genes that encode two proteins are neighbours
on the chromosome, the proteins tend to be functionally linked. This
method can be powerful in uncovering functional linkages in
prokaryotes, where operons are common, but also shows promise for
analysing interacting proteins in eukaryotes (Box 3).
from yeast, centred on the yeast prion protein Sup35. In this network,
most of the links are predicted by phylogenetic profiles, the Rosetta
stone method and mRNA expression patterns. Sup35 is known to
regulate translation, and it is therefore of interest that most of the
predicted linkages are to other proteins involved in protein synthesis,
folding and targeting. This indicates that at least some of the predicted links are meaningful. As methods improve for detecting protein
linkages, it seems likely that most yeast proteins will be included in
expanded versions of the networks of Fig. 1. A central feature of these
networks is that most proteins interact with several other proteins.
Genome 1
Genome 2
Genome 3
. . .
Functional networks
When methods for detecting functional linkages are applied to all the
proteins of an organism11,15, networks of interacting, functionally
linked proteins can be traced out. Two examples from yeast are given
in Fig. 1. Figure 1a shows interactions among histones and related
proteins such as silencing proteins. These were determined mostly by
experiments, but some links were predicted by the Rosetta Stone
method and by phylogenetic profiles. Some of the links are to
proteins known only from their genome sequences, and without
other functional information; their linkage to this network indicates
an intimate functional interaction among proteins involved in gene
silencing, DNA packaging and nuclear transport.
Figure 1b shows a second network of functionally linked proteins
NATURE | VOL 405 | 15 JUNE 2000 | www.nature.com
825
insight progress
B
A
X
S
C
Y
P
V
Figure 2 The evolution of the meaning of protein function. The traditional view is
illustrated on the left, and the post-genomic view on the right.
826
1. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res. 25, 33893402 (1997).
2. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform
for investigating biology. Science 282, 20122018 (1998).
3. Chervitz, S. A. et al. Comparison of the complete protein sets of worm and yeast: orthology and
divergence. Science 282, 20222028 (1998).
4. Tatusov, R. L., Galperin, M. Y., Natale, D. A. & Koonin, E. V. The COG database: a tool for genomescale analysis of protein functions and evolution. Nucleic Acids Res. 28, 3336 (2000).
5. Uetz, P. et al. A comprehensive analysis of proteinprotein interactions in Saccharomyces cerevisiae.
Nature 403, 623627 (2000).
6. Ito, T. et al. Toward a protein-protein interaction map of the budding yeast: a comprehensive system
to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc.
Natl Acad. Sci. USA 97, 11431147 (2000).
7. Alizadeh, A. A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression
profiling. Nature 403, 503511 (2000).
8. Lashkari, D. A. et al. Yeast microarrays for genome wide parallel genetic and gene expression
analysis. Proc. Natl Acad. Sci. USA 94, 1305713062 (1997).
9. Brown, P. O. & Botstein, D. Exploring the new world of the genome with DNA microarrays. Nature
Genet. 21, 3337 (1999).
10. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. Cluster analysis and display of genomewide expression patterns. Proc. Natl Acad. Sci. USA 95, 1486314868 (1998).
11. Marcotte, E. M., Pellegrini, M., Thompson, M. J., Yeates, T. O. & Eisenberg, D. A combined
algorithm for genome-wide prediction of protein function. Nature 402, 8386 (1999).
12. Marcotte, E. M. et al. Detecting protein function and protein-protein interactions from genome
sequences. Science 285, 751753 (1999).
13. Dandekar, T., Snel, B., Huynen, M. & Bork, P. Conservation of gene order: a fingerprint of proteins
that physically interact. Trends Biochem. Sci. 23, 324328 (1998).
14. Overbeek, R., Fonstein, M., DSouza, M., Pusch, G. D. & Maltsev, N. The use of gene clusters to infer
functional coupling. Proc. Natl Acad. Sci. USA 96, 28962901 (1999).
15. Enright, A. J., Iliopoulos, I., Kyrpides, N. C. & Ouzounis, C. A. Protein interaction maps for
complete genomes based on gene fusion events. Nature 402, 8690 (1999).
16. Pellegrini, M., Marcotte, E. M., Thompson, M. J., Eisenberg, D. & Yeates, T. O. Assigning protein
functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl Acad. Sci. USA
96, 42854288 (1999).
17. Andrade, M. A. & Valencia, A. Automatic extraction of keywords from scientific text: application to
the knowledge domain of protein families. Bioinformatics 14, 600607 (1998).
18. Kim, S. H. Structural genomics of microbes: an objective. Curr. Opin. Struct. Biol. (in the press).
19. Xenarios, I. et al. DIP: the Database of Interacting Proteins. Nucleic Acids Res. 28, 289291 (2000).
20. Wickner, R. B. [URE3] as an altered URE2 protein: evidence for a prion analog in Saccharomyces
cerevisiae. Science 264, 566569 (1994).
21. Bork, P. et al. Predicting function: from genes to genomes and back. J. Mol. Biol. 283, 707725 (1998).
22. Huynen, M., Dandekar, T. & Bork, P. Differential genome analysis applied to the species-specific
features of Helicobacter pylori. FEBS Lett. 426, 15 (1998).
23. Hegyi, H. & Gerstein, M. The relationship between protein structure and function: a comprehensive
survey with application to the yeast genome. J. Mol. Biol. 288, 147164 (1999).
24. Ouzounis, C. & Kyrpides, N. The emergence of major cellular processes in evolution. FEBS Lett. 390,
119123 (1996).
25. Gaasterland, T. & Ragan, M. A. Constructing multigenome views of whole microbial genomes.
Microb. Comp. Genomics 3, 177192 (1998).
26. Wu, Q. & Maniatis, T. A striking organization of a large family of human neural cadherin-like cell
adhesion genes. Cell 97, 779790 (1999).