
Genetic Code: Introduction

Kimitsuna Watanabe, University of Tokyo, Tokyo, Japan


The genetic code establishes the relationship between all 64 possible arrangements of the four nucleotide bases contained in either DNA (A, T, G and C) or RNA (A, U, G and C) and the 20 amino acids that are used to construct proteins. The historical events that contributed to the deciphering of the genetic code led to the development of the field of molecular biology.

Introductory article

Article Contents
. Historical Background to Breaking the Code
. Topology of the Code as Revealed by Frameshift Mutations
. The Code Decoded Using Protein Synthesis In Vitro
. Termination and Initiation Codons
. In Vivo Code
. The Universal Genetic Code
. Progress after the Cold Spring Harbor Symposium

Historical Background to Breaking the Code


The DNA double helix model proposed by James Watson and Francis Crick in 1953 demonstrated that a gene consists of a stretch of double-stranded DNA (deoxyribonucleic acid), which can replicate itself according to a complementary base-pairing rule. This landmark discovery immediately raised the question of how DNA, which constitutes the gene, is able to direct the production of proteins: that is, how information stored in nucleic acid is expressed as information contained in proteins. Because this was essentially a coding problem, the correlation between nucleic acid bases and the corresponding amino acids in the protein was named the genetic code. The field of molecular biology can be said to have developed largely as a result of the dedicated work that went into elucidating this code.

Shortly after the structure of the DNA double helix was deduced, a theoretical physicist, George Gamow, arrived at a hypothesis that attempted to resolve the coding problem. He postulated that in protein synthesis each amino acid is inserted into a specific cavity created around a complementary pair of nucleotide bases, and polymerization then takes place along and around the rotational axis of the DNA double helix. Gamow envisaged the cavities for amino acid insertion as being diamond-shaped, bounded by a base at each of the four corners. If we consider the helix to be vertically oriented, then the left and right corners of the diamond comprise a complementary base pair: either adenine (A) and thymine (T), or guanine (G) and cytosine (C). The bases at the top and bottom corners are on the same strand and consist of one base from each of the adjacent base pairs immediately above and below the complementary pair. Different combinations of top and bottom bases with the left and right complementary pairs would give a total of 20 cavity variations, each being specific for the insertion of one of the 20 amino acids used as building blocks for proteins.
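The figure of 20 cavities can be reproduced by a short enumeration. The Python sketch below is not part of the article. It assumes that a diamond is described by its top base, the left base of the complementary pair (which fixes the right base) and its bottom base, and that two diamonds count as the same cavity if one is obtained from the other by swapping top and bottom or by mirroring left and right. These symmetry assumptions, and all names in the code, are illustrative only.

```python
from itertools import product

BASES = "ATGC"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def canonical(diamond):
    """Pick one representative for a diamond and its mirror images (assumed symmetries)."""
    top, left, bottom = diamond
    right = COMPLEMENT[left]  # mirroring left/right replaces the left base by its partner
    variants = [
        (top, left, bottom),
        (bottom, left, top),   # top/bottom swapped
        (top, right, bottom),  # left/right mirrored
        (bottom, right, top),  # both
    ]
    return min(variants)


def count_diamonds():
    """Count distinct diamond-shaped cavities under the assumed symmetries."""
    return len({canonical(d) for d in product(BASES, repeat=3)})


print(count_diamonds())  # 20
```

Under these assumptions the 64 possible (top, pair, bottom) combinations collapse to 20 classes, which matches the number of cavities stated above.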
Although Gamow's hypothesis contributed to the eventual deciphering of the genetic code by introducing the concept that the nucleic acid sequence acts as a template for assembling amino acids into proteins, it was soon disproved on several grounds. First, because DNA has a rotational axis perpendicular to the double helix axis, it has no specific direction, which means the sequence orientation cannot be determined. Second, it seemed highly unlikely that diamond-shaped cavities could structurally differentiate 20 amino acids with various side-chains. Third, experimental evidence already existed that proteins are produced in the cytoplasm, not in the nucleus where DNA is stored. Fourth, a number of scientists suspected that RNA (ribonucleic acid) is a direct mediator of the transfer of genetic information from DNA to proteins. Fifth, adjacent diamond-shaped cavities shared a nucleotide between them. This overlapping arrangement would inevitably limit which amino acids could adjoin each other sequentially in synthesized proteins; however, no such constraints were apparent.

Around the same time, Gamow also founded the RNA Tie Club, the aim of which was to solve the riddle of RNA structure and to understand the way it builds proteins. The club had 20 regular members, each of whom wore an RNA necktie representing one of the 20 amino acids used in proteins (Ala was assigned to Gamow and Tyr to Crick). Members presented unpublished reports, known as Notes for the RNA Tie Club, that examined a variety of ideas and working hypotheses, many of which were later substantiated and greatly influenced experimental work. The systematic study of the genetic code that began with the activities of the RNA Tie Club eventually led to the concept of the triplet codon and its experimental verification.

In 1956, Crick postulated the comma-less code, in which each amino acid is specifically assigned to a corresponding codon and the triplets do not overlap. Such a code is nondegenerate (i.e. each amino acid is encoded by only one codon).
To explain the correlation between the resulting 64 codons and the 20 amino acids used in proteins, Crick differentiated a set of 20 sense codons corresponding to amino acids; the remaining 44 triplets, which were considered not meaningful, were termed nonsense codons. First, the four single-base triplets (UUU, CCC, AAA and GGG) were classified as nonsense codons, because a tandem repeat of such a codon (for example UUUUUU) can be read in more than one frame and so could not code unambiguously for tandem copies of the same amino acid. Among the remaining 60 codons, each unique triplet was classified as a sense codon and the two derivative codons that would occur when the triplet sequence was repeated were deemed nonsense codons. For example, taking UAC as the unique triplet, repeating this sequence gives UACUAC. In this case, the codons ACU and CUA that occur in the sequence are derivative, and thus nonsense, codons. This process results in just 20 sense codons, which matches the total number of amino acids contained in proteins.

Crick published the above proposal in 1958 in a paper entitled 'On protein synthesis', in which two other important hypotheses were also discussed. One was the sequence hypothesis, which asserted that the specificity of nucleic acids is entirely expressed in the nucleotide sequence, and that this sequence serves as the signal to determine the amino acid sequence. This apparent colinearity between nucleotide sequences of DNA and amino acid sequences of proteins was experimentally supported by studies on sickle cell anaemia carried out by Linus Pauling and Vernon Ingram, as well as by Seymour Benzer's research on the detailed gene structure of T4 phage (a bacteriophage that infects the bacterium Escherichia coli) and the relationship between its mutants and their altered proteins. The second hypothesis proposed by Crick was what he termed the Central Dogma of molecular biology, which stated that genetic information flows unidirectionally from DNA to protein via RNA as an intermediary. Thus proteins are the final products of gene expression. This idea served as the guideline for subsequent studies on gene expression, particularly in elucidating the molecular mechanism of protein synthesis.

ENCYCLOPEDIA OF LIFE SCIENCES / © 2002 Macmillan Publishers Ltd, Nature Publishing Group / www.els.net
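The counting argument behind the comma-less code can be checked mechanically. The Python sketch below is an illustration rather than anything taken from Crick's paper: it discards the four single-base triplets and groups the remaining 60 into classes of cyclic rotations (such as UAC, ACU and CUA), of which at most one member per class can be a sense codon. The function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"


def rotations(triplet):
    """Return the cyclic rotations of a triplet, e.g. UAC -> {UAC, ACU, CUA}."""
    return {triplet[i:] + triplet[:i] for i in range(3)}


def count_commaless_sense_codons():
    """Return (all triplets, non-repeat triplets, rotation classes)."""
    triplets = ["".join(p) for p in product(BASES, repeat=3)]
    # UUU, CCC, AAA and GGG are excluded as nonsense from the outset.
    mixed = [t for t in triplets if len(set(t)) > 1]
    # Each rotation class contributes at most one sense codon.
    classes = {frozenset(rotations(t)) for t in mixed}
    return len(triplets), len(mixed), len(classes)


print(count_commaless_sense_codons())  # (64, 60, 20)
```

The 60 mixed triplets fall into 20 classes of three, so the scheme allows at most 20 sense codons, the number the text arrives at.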

Topology of the Code as Revealed by Frameshift Mutations

For several years following Crick's proposal of a comma-less code, which was a product of pure theoretical brainstorming, no experimental evidence was forthcoming to support or refute it. Unexpectedly, the breakthrough in solving the coding problem came from a quite different field: genetic experiments using the T4 bacteriophage. In a study published in 1961, Crick, Sydney Brenner and co-workers provided clear evidence on how the genetic code is read. They had already observed that when E. coli strain B was infected by T4 phage with a mutation in its rII region, a distinctive large plaque (zone of infection) indicative of phage growth was obtained, whereas no such plaque was evident when E. coli strain K12 was infected by the same mutant phage. However, both the wild-type and revertant T4 phages (a revertant is a mutant that has reverted to the wild type) could grow on a medium containing either E. coli strain B or strain K12. By this selective plating technique, a T4 phage mutant and its revertant can easily be distinguished.

In their experiments, Crick and colleagues used proflavin, a mutagen that induces base insertions and/or deletions during DNA replication by intercalating between adjacent bases. They first used proflavin to induce a mutation in the T4 phage rII cistron (a cistron is a region of DNA similar to a gene); the resultant mutant (called the frameshift mutant) was named strain FC0. By further treating the frameshift mutant with proflavin, a revertant (called the suppressor mutant) was obtained. Genetic analysis revealed that this suppressor mutant possessed a second mutation very close to the mutation site of strain FC0 (Figure 1). When the first mutation is an insertion (+) and the second is a deletion (−) at a point very near the first mutation site, the revertant carries the combination (+, −), and the original coding frame is restored by the second mutation. This type of suppressor mutant is referred to as an insertion-suppressed revertant strain. Crick and co-workers isolated various other deletion mutants, such as FC7 and FC9, by recombining the insertion-suppressed revertant strain with the wild type. Similarly, various insertion mutants were obtained by recombining a deletion-suppressed revertant strain with the wild type. These isolated mutant strains were then further recombined in various ways to construct a series of recombinant strains with double and triple mutations, which were observed to have the following characteristic features:

1. No rIIB cistron activity was observed in either the (+) or (−) single-mutation strains. However, recombinant strains with (+, −) or (−, +) mutations recovered the rIIB phenotype, but only when the loci of the two mutations were very close to each other.
2. Recombinant strains with (−, −) or (+, +) mutations were inactive, but triple mutants with mutations of the same type, i.e. (+, +, +) or (−, −, −), recovered the rIIB phenotype. Recombinant strains with triple mutations of different types, for example (+, +, −) or (+, −, −), showed no rIIB activity.

[Figure 1 diagram: gene map of the A and B cistrons of the rII region marking the mutants FC0 (+), FC7 (−) and FC9 (−), with the phenotype (wild type or mutant) of each genotype combination.]

Figure 1 Gene map of frameshift mutants of T4 phage rIIB cistron caused by proflavin. Modified from Crick FHC, Barnett L, Brenner S and Watts-Tobin RJ (1961) Nature 192: 1227-1232.

These results can be interpreted by referring to Figure 2. The messenger RNA (mRNA) is read with the defined reading frame up to the first mutation point, but from there the frame is shifted one base downstream or upstream by a (+) or (−) mutation, respectively. If the second mutation that suppresses the first occurs downstream in the close vicinity of the first mutation point, the correct reading frame is restored from this second mutation point. If the region between the first and second mutations (in which the reading frame is shifted) is short, the mutation combination has a minimal effect on the protein synthesis process and functional proteins are produced; a mutant with this combination shows the wild-type phenotype. This experimental finding immediately disproved Crick's idea of a comma-less code: if it held, a nonsense codon would arise at any insertion (+) or deletion (−) mutation site, leading to the termination of protein synthesis.

Why do the triple mutants (+, +, +) and (−, −, −) manifest the wild-type phenotype? The answer is that the original coding frame is recovered by the insertion or deletion of three bases at different loci. Frameshift mutations in T4 phage thus revealed the following fundamental intrinsic characteristics of the genetic code: the code is read from a defined point with a reading frame made up of three-base combinations, and it is not overlapping but degenerate (i.e. there is more than one codon for most amino acids).
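The effect of single, double and triple frameshift mutations on the reading frame can be illustrated with the repeating CAT sequence used in Figure 2. The Python sketch below is an illustration only; the helper names and the exact positions of the inserted and deleted bases are assumptions chosen to mimic the figure, not data from the original experiments.

```python
def read_frame(seq):
    """Split a sequence into successive triplets from its first base (incomplete tail dropped)."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq, pos, base):
    """Return seq with one base added before index pos: a (+) mutation."""
    return seq[:pos] + base + seq[pos:]


def delete_base(seq, pos):
    """Return seq with the base at index pos removed: a (-) mutation."""
    return seq[:pos] + seq[pos + 1:]


wild = "CAT" * 9
plus = insert_base(wild, 6, "G")   # single (+): frame shifted from the third triplet on
minus = delete_base(wild, 17)      # single (-): frame shifted from the sixth triplet on
double = delete_base(plus, 18)     # (+, -): frame restored after the deletion
triple = wild
for pos in (7, 13, 20):            # (+, +, +): frame restored after the third insertion
    triple = insert_base(triple, pos, "G")

for name, seq in [("wild", wild), ("+", plus), ("-", minus),
                  ("+ -", double), ("+ + +", triple)]:
    print(f"{name:6s}", " ".join(read_frame(seq)))
```

In the double and triple mutants only the triplets between the first and last mutation are misread, and the trailing triplets are CAT again, which is the behaviour the text ascribes to strains with the wild-type phenotype.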

(1) Wild-type cistron
    CAT CAT CAT CAT CAT CAT CAT CAT CAT
(2) One base addition (+)
    CAT CAT GCA TCA TCA TCA TCA TCA TCA
(3) One base deletion (−)
    CAT CAT CAT CAT CAT CAC ATC ATC ATC
(4) Double mutation of (2) and (3): (+) (−)
    CAT CAT GCA TCA TCA TCA CAT CAT CAT
(5) Three base additions: (+) (+) (+)
    CAT CAT CGA TCA TGC ATC ATG CAT CAT

In (4) and (5) the triplets between the first and last mutations are misread; the original reading frame resumes after the last mutation.

Figure 2 Frameshift mutants of T4 phage rIIB cistron caused by proflavin. Additions and deletions in the nucleotide sequence. Redrawn from Crick FHC, Barnett L, Brenner S and Watts-Tobin RJ (1961) Nature 192: 1227-1232.

The Code Decoded Using Protein Synthesis In Vitro

It was also in 1961 that Marshall Nirenberg and Johann Matthaei succeeded for the first time in assigning an element of the genetic code by using a cell-free E. coli protein synthesis system containing ribosomes, S100 fraction, transfer RNA (tRNA), amino acids, adenosine triphosphate (ATP), an energy-recycling system and a template. In this cell-free system, a defined isotope-labelled amino acid and the other 19 unlabelled amino acids were incubated with a template: either native DNA, RNA or synthetic polyribonucleotides. Selective incorporation of an amino acid was observed by isolation and analysis of aggregated proteins, in the form of the acid-insoluble fraction, in the protein synthesis system. As a result, Nirenberg and Matthaei discovered that adding poly(U), a long chain of repeating uridylic acid units, to the cell-free system enhanced the incorporation of Phe, demonstrating that in the genetic code UUU corresponds to the amino acid phenylalanine (Phe) (Table 1). This finding was reported at the International Biochemical Congress held in Moscow in 1961 and the news rapidly spread around the world. By that time, Nirenberg's group had already succeeded in elucidating a second element of the genetic code: CCC codes for proline (Pro).

As soon as news of the breakthrough in deciphering the code reached Severo Ochoa's laboratory, his group began synthesizing various RNA copolymers in which two kinds of nucleotide were mixed at fixed ratios; they then examined the composition of the codon triplets. An example is shown in Table 2. The copolymer poly(5A1C), synthesized by mixing five parts of A and one part of C, contained A and C in a 5 : 1 ratio but in a random order. Ochoa's analysis showed that poly(5A1C) enhanced the incorporation of the amino acids Asp, Gln, His, Lys, Pro and Thr. Statistically, each of the three triplets in the copolymer made up of 2A1C (AAC, ACA and CAA) could be expected to enhance the incorporation of the relevant amino acid by a proportion one-fifth that of AAA (Lys). Similarly, the triplets consisting of 1A2C could be expected to incorporate the relevant amino acids by a proportion 1/25 that of AAA (Lys). As shown in Table 2, good agreement was observed between the theoretical and experimental values, demonstrating that codon triplets consisting of 2A1C are likely to code for Asp, Gln and Thr, while those comprising 1A2C would code for His, Pro and Thr. Nirenberg's group carried out similar experiments, and by 1964 the nucleotide compositions of all the codon triplets corresponding to all 20 amino acids were determined.

However, the problem of matching each specific codon sequence to the corresponding amino acid remained. This matching problem was also solved by Nirenberg's group. They designed a novel experiment utilizing the finding that when a triplet codon sequence is added to a cell-free system, ribosomes specifically bind to the aminoacyl-tRNA possessing an anticodon complementary to the codon sequence added. By trapping the resulting complex (i.e. ribosomes attached to the aminoacyl-tRNA and the codon sequence) on a nitrocellulose membrane and then identifying the amino acid attached to the tRNA, each codon was matched with its respective amino acid. As a result of this second breakthrough, the nucleotide sequences of most triplet codons were unambiguously assigned (Table 3). Nirenberg's group also experimentally demonstrated for the first time that the minimal unit of a codon is a triplet by confirming that specific binding of aminoacyl-tRNA to ribosomes was observed only when a template oligonucleotide at least as long as a trinucleotide was added to the ribosome system.

At almost the same time, Har Gobind Khorana's group developed a different method to determine the nucleotide sequences of codon triplets. They succeeded in synthesizing polyribonucleotides with defined repeated sequences using RNA polymerase and a synthetic DNA template constructed using DNA polymerase and a short, chemically synthesized DNA fragment with a defined sequence. In this approach, copoly(AG) was synthesized with a sequence of alternating A and G nucleotides. Copoly(UC), copoly(AGU) and other sequences were similarly synthesized. It was found that copoly(AG) produced a polypeptide with alternating Arg and Glu, copoly(Arg-Glu), and that copoly(UC) produced a polypeptide with alternating Ser and Leu, copoly(Ser-Leu). Information gained from this experiment, combined with the already established relationships between amino acids and the nucleotide compositions of corresponding codons, enabled AGA, GAG, UCU and CUC to be respectively matched with Arg, Glu, Ser and Leu. Thus, 61 codon triplets were unambiguously assigned to 20 amino acids (Table 4). The remaining three triplets were determined to be termination codons as described below.

Table 1 Poly(U)-dependent poly(Phe) synthesis

Condition                                  Incorporation of [14C]Phe (counts/min)
Complete (a)                               29 500
− poly(U)                                  70
− ribosome                                 52
− S100                                     106
− ATP, − energy regenerating system        83
+ RNAase                                   120
+ DNAase                                   27 600

(a) The complete system comprises ribosome, tRNA, poly(U), [14C]Phe, 19 other unlabelled amino acids, S100 (a mixture of enzymes necessary for protein synthesis), ATP, and the energy regenerating system in the buffer. Phe, phenylalanine. Modified from Nirenberg MW and Matthaei JH (1961) Proceedings of the National Academy of Sciences of the USA 47: 1588-1602.

Termination and Initiation Codons

Termination codons were resolved by molecular genetic studies of T4 phage nonsense mutants. These mutants are unable to grow in E. coli cells because synthesis of T4 proteins is terminated at the nonsense mutation sites and, as a result, only prematurely terminated proteins are synthesized by ribosomes in abortive translation. It was found, however, that T4 phage nonsense mutants could grow in an E. coli strain called suppressor mutant Su+. This Su+ strain was named the amber suppressor after Harris Bernstein, a graduate student in Richard Epstein's laboratory who was involved in the discovery of the T4 phage mutants (Bernstein is the German for amber). Brenner's group treated T4 phage with a mutagen and used the resultant mutant to infect the E. coli wild-type strain (Su−) or the suppressor mutant strain (Su+). By selectively collecting T4 phages that grew in Su+ but not in Su−, they succeeded in isolating the desired mutant T4 phage. When the head protein of the mutant phage grown in wild-type Su− and mutant Su+ E. coli cells was sequenced, the amino acid sequence of the protein obtained from the Su+ cells was identical to that of the wild-type T4 phage-

Table 2 Incorporation of amino acids directed by poly(5A1C) as an artificial mRNA in the in vitro protein synthesis system

Experimental: relative amount of amino acid incorporation (experimentally obtained). Columns 3A, 2A1C, 1A2C and 3C: calculated frequency of occurrence of a triplet. Sum: sum of occurrence frequencies of triplets.

Amino acid   Experimental   Assumed codon   3A     2A1C   1A2C   3C     Sum
Asp          24.2           2A1C            -      20     -      -      20
Gln          23.7           2A1C            -      20     -      -      20
His          6.5            1A2C            -      -      4.0    -      4
Lys          100            3A              100    -      -      -      100
Pro          7.2            1A2C, 3C        -      -      4.0    0.8    4.8
Thr          26.5           2A1C, 1A2C      -      20     4.0    -      24

Modified from Speyer JF, Lengyel P, Basilio C et al. (1963) Cold Spring Harbor Symposia on Quantitative Biology 28: 559-567.
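The calculated frequencies in Table 2 follow from simple probability: in a random copolymer with A and C in a 5 : 1 ratio, each position is A with probability 5/6 and C with probability 1/6. The Python sketch below is an illustration and not part of the cited work; it expresses every triplet frequency relative to AAA taken as 100, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def relative_triplet_frequencies(parts_a=5, parts_c=1):
    """Frequency of each A/C triplet in a random copolymer, scaled so that AAA = 100."""
    total = parts_a + parts_c
    p = {"A": Fraction(parts_a, total), "C": Fraction(parts_c, total)}
    reference = p["A"] ** 3  # probability of AAA
    freqs = {}
    for bases in product("AC", repeat=3):
        triplet = "".join(bases)
        prob = p[bases[0]] * p[bases[1]] * p[bases[2]]
        freqs[triplet] = 100 * prob / reference
    return freqs


freqs = relative_triplet_frequencies()
for triplet in ("AAA", "AAC", "ACC", "CCC"):
    print(triplet, float(freqs[triplet]))  # 100.0, 20.0, 4.0 and 0.8
```

Each 2A1C triplet therefore occurs at 20, each 1A2C triplet at 4 and CCC at 0.8 relative to AAA, so an amino acid read from both a 2A1C and a 1A2C triplet has an expected sum of 24, and one read from a 1A2C triplet and CCC has 4.8, as in the last column of the table.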


Table 3 Binding of aminoacyl-tRNA to ribosomes stimulated by trinucleotides

Trinucleotides                        Aminoacyl-tRNA
UUU, UUC                              Phe
UUA, UUG, CUU, CUC, CUA, CUG          Leu
AUU, AUC, AUA                         Ile
AUG                                   Met
GUU, GUC, GUA, GUG                    Val
UCU, UCC, UCA, UCG                    Ser
CCU, CCC, CCA, CCG                    Pro
AAA, AAG                              Lys
UGU, UGC                              Cys
GAA, GAG                              Glu

infected cells except for the replacement of a single amino acid: Gln in the wild-type T4 phage-infected cells was replaced by Ser in the Su+ protein. The protein from the Su− cells comprised a premature fragment whose sequence was identical to that of the wild type up to the amino acid immediately before Gln (Table 5). As such a change would have resulted from a mutation at a single base, it was very likely that a single base mutation in the Gln codon of the wild-type phage resulted in a termination codon, which is read as such in the Su− cells. Similarly, another single base mutation in the termination codon would have given rise to the Ser codon in Su+ cells. When the T4 phage mutant was used to infect different Su+ strains and the head proteins of the resultant T4 phages were sequenced, several amino acids were found to be inserted into the position originally occupied by Gln in the wild-type protein (Figure 3). Given that a single base mutation is the most plausible mechanism to explain these phenomena, it was concluded that the termination codon found in Su− cells must be UAG. Alan Garen's group also arrived at the same conclusion in experiments using amber mutants of E. coli alkaline phosphatase.

Later, two other nonsense mutations of T4 phage were also isolated, and E. coli mutants that could suppress these mutations were concomitantly found. In a similar manner to that described above, the nonsense codons responsible for these mutations were identified as UAA and UGA, which were named ochre and opal respectively. These three codons (UAG, UAA and UGA) that code for no amino acid but instead cause protein synthesis to terminate were named termination (or stop) codons.

In 1965, Mario Capecchi's group demonstrated that termination codons are also able to function in an in vitro protein synthesis system. They used an R17 phage amber mutant in which the codon coding for the seventh amino acid of the coat protein, Glu, was replaced by the termination codon UAG. When R17 mutant RNA was translated as a template in an in vitro translation system prepared from the E. coli Su− strain, a peptide consisting of six amino acids was produced. On the other hand, when it was translated in a system prepared from the E. coli Su+ strain, a full-length coat protein was produced with Ser replacing Glu at the seventh position. This result demonstrated that the UAG amber codon does indeed serve as a terminator of protein synthesis.

Further investigation of Su− and Su+ E. coli strains also revealed that tRNA is responsible for this suppression mechanism. The tRNA identified with the suppressor function was named suppressor tRNA. Howard Goodman was the first to sequence the amber suppressor tRNA(Tyr) (in 1968). He found that the anticodon of the suppressor tRNA was changed to CUA, allowing base pairing with UAG of the amber codon. For ochre suppressor tRNA(Gln) with the UAA codon and opal

Table 4 Incorporation of amino acids stimulated by artificial mRNAs with alternating sequences

Polynucleotide   Synthesized polypeptides             Assigned codons
poly(UC)         poly(Ser-Leu)                        UCU: Ser, CUC: Leu
poly(AG)         poly(Arg-Glu)                        AGA: Arg, GAG: Glu
poly(UG)         poly(Val-Cys)                        GUG: Val, UGU: Cys
poly(AC)         poly(Thr-His)                        ACA: Thr, CAC: His
poly(UAC)        poly(Tyr), poly(Thr), poly(Leu)      UAC: Tyr, ACU: Thr, CUA: Leu
poly(GUA)        poly(Val), poly(Ser)                 GUA: Val, AGU: Ser, (UAG: stop)
poly(UUG)        poly(Leu), poly(Cys), poly(Val)      UUG: Leu, UGU: Cys, GUU: Val
poly(UAUC)       poly(Tyr-Leu-Ser-Ile)                UAU: Tyr, CUA: Leu, AUC: Ile, UCU: Ser
poly(UUAC)       poly(Leu-Leu-Thr-Tyr)                UUA: Leu, UAC: Tyr, ACU: Thr, CUU: Leu
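Why a repeating dinucleotide yields one alternating polypeptide, a repeating trinucleotide three homopolymers and a repeating tetranucleotide a repeating tetrapeptide can be seen by reading the repeat in each of the three frames. The Python sketch below is an illustration only; the function names are hypothetical and the small codon dictionary contains just a few assignments copied from Table 4.

```python
from math import gcd

# A few codon assignments copied from Table 4 (not the full code).
TABLE4 = {"UCU": "Ser", "CUC": "Leu", "AGA": "Arg", "GAG": "Glu",
          "UAC": "Tyr", "ACU": "Thr", "CUA": "Leu",
          "UAU": "Tyr", "AUC": "Ile"}


def frame_cycles(unit):
    """For each of the three reading frames, return the repeating cycle of codons."""
    period = len(unit) // gcd(len(unit), 3)      # codons per repeating cycle
    seq = unit * (3 * period // len(unit) + 2)   # long enough for every frame
    return [tuple(seq[offset + 3 * k: offset + 3 * k + 3] for k in range(period))
            for offset in range(3)]


def peptide(cycle):
    """Translate one cycle of codons with the Table 4 assignments."""
    return "-".join(TABLE4[codon] for codon in cycle)


for unit in ("UC", "UAC", "UAUC"):
    print(unit, [peptide(cycle) for cycle in frame_cycles(unit)])
```

For poly(UC) every frame gives the same alternation of UCU and CUC; for poly(UAC) the three frames give UAC, ACU and CUA respectively; for poly(UAUC) every frame cycles through the same four codons. This matches the polypeptides listed in the table.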


Table 5 Comparison of amino acid sequences of T4 phage head protein between a wild-type phage and its mutant phage H36 grown in either the Su− or the Su+ E. coli strain

Wild type    Ala-Gly-Val-Phe-Asp-Phe-Gln-Asp-Pro Ile-Asp
H36 Su−      Ala-Gly-Val-Phe-Asp-Phe
H36 Su+      Ala-Gly-Val-Phe-Asp-Phe-Ser-Asp-Pro Ile-Asp

Modified from Brenner S, Stretton AO and Kaplan S (1965) Nature 206: 994-998.

[Figure 3 diagram: the wild-type codon Gln (CAG) is linked to the amber codon Stop (UAG), which in turn is linked to the codons of the amino acids that replace it in the Su+ strains: Lys (AAG), Gln (CAG), Glu (GAG), Ser (UCG), Leu (UUG), Trp (UGG), Tyr (UAU) and Tyr (UAC), and to Stop (UAA).]

Figure 3 Replacement of the amber codon with amino acids in the head protein of T4 phage amber mutant grown in various Escherichia coli Su+ strains.
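The single-base-change reasoning summarized in Figure 3 can be retraced with the modern codon table. The Python sketch below is an illustration, not the original analysis: it assumes the standard genetic code (written with one-letter amino acid symbols, `*` for stop), lists the nine codons one base change away from UAG, and then asks which triplets lie one base change away from a codon for every amino acid shown in the figure. All names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a termination codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def single_base_neighbours(codon):
    """All codons that differ from `codon` at exactly one position."""
    return {codon[:i] + base + codon[i + 1:]
            for i in range(3) for base in BASES if base != codon[i]}


def candidate_codons(observed):
    """Triplets within one base change of a codon for every observed amino acid."""
    return {triplet for triplet in CODON_TABLE
            if observed <= {CODON_TABLE[n] for n in single_base_neighbours(triplet)}}


neighbours = {n: CODON_TABLE[n] for n in sorted(single_base_neighbours("UAG"))}
print(neighbours)

# Gln, Lys, Glu, Ser, Leu, Trp and Tyr, as shown in Figure 3.
print(candidate_codons(set("QKESLWY")))  # {'UAG'}
```

With the full set of amino acids from the figure, UAG is the only triplet that satisfies the constraint, which is the conclusion reached in the text.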

suppressor tRNA(Trp) with the UGA codon, the corresponding anticodons U*UA and U*CA were respectively identified.

The initiation codon was identified mainly through the efforts of two groups. In 1966, Norton Zinder and colleagues found that the N-terminus of proteins synthesized by an in vitro translation system using a phage RNA as a messenger is always formyl-Met. In the same year, Brian Clark and Kjeld Marcker identified two species of tRNA(Met) in E. coli cells, one of which bound to formyl-Met while the other bound only to Met. From these findings, it was deduced that the initiation codon used in the translation was AUG and that it coded for Met.

The initiation site in mRNA was discovered by Joan Steitz in 1969 using RNA of R17 phage. She formed an initiation complex in which 32P-labelled R17 phage RNA was bound to the ribosome under the initiation conditions for protein synthesis (consisting of fMet-tRNA, guanosine triphosphate (GTP) and initiation factors). The 32P-labelled RNA in the complex was digested with pancreatic ribonuclease and the nucleotide sequences of the 32P-labelled RNA fragments, which were protected from RNAase digestion by the ribosome, were determined. As shown in Figure 4, the protected RNA fragments contained the sequences encoding the N-terminal amino acids of all three phage proteins. The result unambiguously demonstrated that the initiation codon was AUG. A similar finding was obtained using Qβ phage RNA.

In Vivo Code
As described above, all the elements of the genetic code were deciphered using in vitro protein synthesis systems. Demonstrating a functioning code in vitro, however, does not guarantee that it functions in vivo. In 1966, George Streisinger's group conducted an experiment to confirm the in vivo validity of the deduced genetic code by analysing the T4 phage lysozyme gene and the amino acid sequence of the corresponding protein. Because the lysozyme is a single protein unit and is relatively easy to purify, it was considered a good model to investigate whether changes in a gene are reflected in the expressed protein. Streisinger's group prepared the double mutant eJ42eJ44 by recombining the genes of two frameshift mutants, eJ42 and eJ44, and compared the amino acid sequence of its lysozyme with that of the wild-type lysozyme. In the mutant amino acid sequence, a stretch of five amino acid residues differed from that of the wild-type lysozyme (Figure 5), indicating that two mutations, a base deletion and a compensating insertion, had occurred very close to each other and that between the two mutations the reading frame had shifted. When the experimental results were evaluated according to the genetic code table, only the mutation combination 'deletion of A followed by addition of G' (see Figure 5) could provide an unambiguous explanation. Heinz Fraenkel-Conrat's and Heinz-Günter Wittmann's groups obtained similar results in their experiments using tobacco mosaic virus RNA.
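The interpretation of the eJ42eJ44 double mutant can be retraced by translating the nucleotide sequences given in Figure 5. The Python sketch below is an illustration under stated assumptions: it uses the standard genetic code with one-letter amino acid symbols, and it replaces the ambiguous positions of the figure by arbitrary concrete bases (AAR is written as AAA and GCN as GCU). The helper names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a termination codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(rna):
    """Translate successive triplets from the first base (incomplete tail ignored)."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def delete_base(seq, pos):
    return seq[:pos] + seq[pos + 1:]


def insert_base(seq, pos, base):
    return seq[:pos] + base + seq[pos:]


# Wild type: AAR AGU CCA UCA CUU AAU GCN, with R -> A and N -> U (assumed).
wild = "AAAAGUCCAUCACUUAAUGCU"
# Deletion of the A that starts AGU, then addition of a G after AAU.
mutant = insert_base(delete_base(wild, 3), 17, "G")

print(translate(wild))    # KSPSLNA = Lys-Ser-Pro-Ser-Leu-Asn-Ala
print(translate(mutant))  # KVHHLMA = Lys-Val-His-His-Leu-Met-Ala
```

Only the residues between the deletion and the insertion change, and the sequence returns to Ala once the frame is restored, in line with the peptide sequences reported for the wild type and the double mutant.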

Coat protein
  AGAGCCUAACCGGGGUUUGAAGC | AUG  GCU UCU AAC UUU
                            fMet Ala Ser Asn Phe

Replicase
  AAACAUGAGGAUUACCC | AUG  UCG AAG ACA ACA AAG
                      fMet Ser Lys Thr Thr Lys

A protein
  CCUAGGAGGUUUGACCU | AUG  CGA GCU UUU AGU
                      fMet Arg Ala Phe Ser

Figure 4 Nucleotide sequences of the translation initiation sites in the coat protein, replicase and A protein of R17 RNA. The Shine-Dalgarno sequences are underlined in the original figure. Redrawn from Steitz (1969) Nature 224: 957-964.


The following intrinsic characteristics of the gene and its expression were revealed by the work discussed in the foregoing sections:

1. The genetic code elements deciphered by in vitro experiments are also used in vivo.
2. Genetic information is read from a defined point in the nucleotide sequence with a reading frame consisting of three-base combinations.
3. The direction of mRNA translation is from the 5′ to the 3′ end (deduced from another experimental study not discussed here).

The Universal Genetic Code

In 1966, a symposium was held at Cold Spring Harbor in the USA to discuss the coding problem. Participants included the leading molecular biologists of the day and other relevant researchers. By combining all the experimental results thus far obtained, a genetic code table was constructed: the 64 codons were assigned to the 20 amino acids after discussion and evaluation of research findings. Crick also contributed to the arrangement of the genetic code table that is used today (Figure 6). As the genetic code had been found to be common to all the organisms examined (bacteria, yeasts, viruses, plants and animals), it was named the universal genetic code. The concept of this universal code led to the hypothesis that all living organisms on earth derive from a common origin.

Amino acid sequences

Wild type          ---Thr-Lys-Ser-Pro-Ser-Leu-Asn-Ala---
eJ42eJ44 mutant    ---Thr-Lys-Val-His-His-Leu-Met-Ala---

Frameshift

Wild type
  Amino acid sequence    ---Lys-Ser-Pro-Ser-Leu-Asn-Ala---
  Nucleotide sequence    ---AAR AGU CCA UCA CUU AAU GCN---

eJ42eJ44 mutant (deletion of the A that begins AGU; insertion of a G after AAU)
  Nucleotide sequence    ---AAR GUC CAU CAC UUA AUG GCN---
  Amino acid sequence    ---Lys-Val-His-His-Leu-Met-Ala---

Figure 5 Comparison between amino acid sequences of lysozyme of wild-type T4 phage and eJ42eJ44 frameshift mutant. Determination of in vivo code. Redrawn from Terzaghi E, Okada Y, Streisinger G et al. (1966) Proceedings of the National Academy of Sciences of the USA 56: 500-507.

Progress after the Cold Spring Harbor Symposium

Active research continued after the Symposium. Using R17 RNA, Steitz confirmed the code by comparing the nucleotide sequence of native RNA and the amino acid sequence synthesized from the RNA (1969). Walter Fiers' group determined the entire nucleotide sequence of MS2 phage RNA (3569 nucleotides) and confirmed the codon assignments in the genetic code table in the process (1976). In another direction, developments in genetic engineering allowed DNA to be sequenced. Sanger's group was the first to determine a complete DNA nucleotide sequence: that of ΦX174 phage (5386 nucleotides; 1978). The DNA nucleotide sequences of the phage fd (6408 nucleotides; 1978) and the mammalian virus SV40 (5243 base pairs; 1978) were subsequently determined. By directly comparing the nucleotide sequence of RNA, or DNA, with the amino acid sequence of the corresponding protein, the accuracy of the universal genetic code table in vivo was confirmed.

Nowadays, DNAs from diverse sources, from viruses and microorganisms to plants and animals, can be analysed easily to determine their nucleotide sequences. The amino acid sequences of proteins encoded by genes can also be readily determined. By comparing the homologies of amino acid sequences of known proteins with those of unknown ones, the structures and functions of unknown proteins can be deduced. Projects to determine the genomes of various organisms, including the human genome, have met with success: the complete DNA sequences of about 100 species have already been determined and the next task is to identify each gene by elucidating its biological function. In pursuing these goals, attention must be paid to nonuniversal genetic codes; some organisms have been found to use unconventional codes, and others as yet unknown may also do so.

                                   Second base
First base   U            C            A              G            Third base

U            UUU phe      UCU ser      UAU tyr        UGU cys      U
             UUC phe      UCC ser      UAC tyr        UGC cys      C
             UUA leu      UCA ser      UAA ochre      UGA opal     A
             UUG leu      UCG ser      UAG amber      UGG trp      G

C            CUU leu      CCU pro      CAU his        CGU arg      U
             CUC leu      CCC pro      CAC his        CGC arg      C
             CUA leu      CCA pro      CAA gln        CGA arg      A
             CUG leu      CCG pro      CAG gln        CGG arg      G

A            AUU ile      ACU thr      AAU asn        AGU ser      U
             AUC ile      ACC thr      AAC asn        AGC ser      C
             AUA ile      ACA thr      AAA lys        AGA arg      A
             AUG met      ACG thr      AAG lys        AGG arg      G

G            GUU val      GCU ala      GAU asp        GGU gly      U
             GUC val      GCC ala      GAC asp        GGC gly      C
             GUA val      GCA ala      GAA glu        GGA gly      A
             GUG val      GCG ala      GAG glu        GGG gly      G

Figure 6 The universal genetic code. At the time of the Cold Spring Harbor Symposium (1966), the UGA opal codon had not yet been identified, not all of the codons had been allocated, and the initiation codon was uncertain.
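The table in Figure 6 can be held as a simple lookup structure, which makes its overall properties easy to check. The following is a minimal sketch; the names `ASSIGNMENTS`, `CODE`, `stop_codons` and `degeneracy` are illustrative and do not come from the article.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"

# Assignments from Figure 6, one row per first base (U, C, A, G). Within a
# row, each group of four covers one second base (U, C, A, G), and within a
# group the third base runs U, C, A, G.
ASSIGNMENTS = """
phe phe leu leu   ser ser ser ser   tyr tyr ochre amber   cys cys opal trp
leu leu leu leu   pro pro pro pro   his his gln gln       arg arg arg arg
ile ile ile met   thr thr thr thr   asn asn lys lys       ser ser arg arg
val val val val   ala ala ala ala   asp asp glu glu       gly gly gly gly
""".split()

# product() varies the last position fastest, matching the order above.
CODE = {
    "".join(codon): name
    for codon, name in zip(product(BASES, repeat=3), ASSIGNMENTS)
}

STOP_NAMES = {"ochre", "amber", "opal"}
stop_codons = sorted(c for c, name in CODE.items() if name in STOP_NAMES)
# Number of codons assigned to each amino acid.
degeneracy = Counter(name for name in CODE.values() if name not in STOP_NAMES)

print(len(CODE))                             # 64
print(stop_codons)                           # ['UAA', 'UAG', 'UGA']
print(len(degeneracy))                       # 20
print(degeneracy["leu"], degeneracy["met"])  # 6 1
```

The counts show that 61 of the 64 triplets specify the 20 amino acids and that most amino acids are specified by more than one codon.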


Further Reading
Beck E, Sommer R, Auerswald EA et al. (1978) Nucleotide sequence of bacteriophage fd DNA. Nucleic Acids Research 5: 4495-4503.
Brenner S, Stretton AO and Kaplan S (1965) Genetic code: the nonsense triplets for chain termination and their suppression. Nature 206: 994-998.
Capecchi MR and Gussin GN (1965) Suppression in vitro: identification of a serine-sRNA as a nonsense suppressor. Science 149: 417-422.
Clark BFC and Marcker KA (1966) The role of N-formyl-methionyl-sRNA in protein synthesis. Journal of Molecular Biology 17: 394-406.
Cold Spring Harbor (1968) The Genetic Code. Cold Spring Harbor Symposia on Quantitative Biology, vol. 31.
Crick FHC (1956) A Note for the RNA Tie Club.
Crick FHC (1958) On protein synthesis. Symposia of the Society for Experimental Biology 12: 138-163.
Crick FHC, Barnett L, Brenner S and Watts-Tobin RJ (1961) General nature of the genetic code for protein synthesis. Nature 192: 1227-1232.
Fiers W, Contreras R, Duerinck F et al. (1976) Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature 260: 500-507.
Fiers W, Contreras R, Haegeman G et al. (1978) Complete nucleotide sequence of SV40 DNA. Nature 273: 113-120.
Gamow G (1954) Possible relation between deoxyribonucleic acid and protein structures. Nature 173: 318.
Goodman HM, Abelson J, Landy A, Brenner S and Smith J (1968) Amber suppression: a nucleotide change in the anticodon of a tyrosine transfer RNA. Nature 217: 1019-1024.
Judson HF (1979) The Eighth Day of Creation: The Makers of the Revolution in Biology. New York: Simon and Schuster.
Last JA, Stanley WM Jr, Salas M et al. (1967) Translation of the genetic message, IV. UAA as a chain termination codon. Proceedings of the National Academy of Sciences of the USA 57: 1062-1067.
Nirenberg MW and Leder P (1964) RNA code words and protein synthesis. The effect of trinucleotides on the binding of sRNA to ribosomes. Science 145: 1399-1407.
Nirenberg MW and Matthaei JH (1961) The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proceedings of the National Academy of Sciences of the USA 47: 1588-1602.
Nishimura S, Jones DS and Khorana HG (1965) Studies on polynucleotides XLVIII. The in vitro synthesis of a co-polypeptide containing two amino acids in alternating sequence dependent upon a DNA-like polymer containing two nucleotides in alternating sequence. Journal of Molecular Biology 13: 302-324.
Osawa S (1995) Evolution of the Genetic Code. Oxford: Oxford Science.
Sanger F, Coulson AR, Friedmann T et al. (1978) The nucleotide sequence of bacteriophage ΦX174. Journal of Molecular Biology 125: 225-246.
Speyer JF, Lengyel P, Basilio C et al. (1963) Synthetic polynucleotides and the amino acid code. Cold Spring Harbor Symposia on Quantitative Biology 28: 559-567.
Steitz JA (1969) Polypeptide chain initiation: nucleotide sequences of the three ribosomal binding sites in bacteriophage R17 RNA. Nature 224: 957-964.
Terzaghi E, Okada Y, Streisinger G et al. (1966) Change of a sequence of amino acids in phage T4 lysozyme by acridine-induced mutations. Proceedings of the National Academy of Sciences of the USA 56: 500-507.
Webster RE, Engelhardt DL and Zinder ND (1966) In vitro protein synthesis: chain initiation. Proceedings of the National Academy of Sciences of the USA 55: 155-161.
Weigert MG and Garen A (1965) Base composition of nonsense codons in E. coli. Evidence from amino-acid substitutions at a tryptophan site in alkaline phosphatase. Nature 206: 992-994.
