Transcription

MOLECULAR BIOLOGY
Transcription
U. N. Dwivedi
Department of Biochemistry
University of Lucknow, Lucknow-226007
and
Smita Rastogi
Department of Biotechnology, Integral University, Lucknow
20-Jul-2006 (Revised 25-Jan-2008)
CONTENTS
Introduction
Transcription in prokaryotes (Synthesis of mRNA/rRNA/tRNA)
Prokaryotic transcription apparatus
RNA polymerase (RNA Pol) or DNA dependent RNA Polymerase
Structure of RNA polymerase
Synthesis of RNA in 5′→3′ direction
Requirement of Mg++
Significance of σ subunit of RNA Pol
Functions of RNA polymerase
Fidelity of RNA synthesis
Promoters
Overall process of prokaryotic transcription
Initiation
Elongation
Termination
Transcription in eukaryotes
Eukaryotic transcription apparatus
RNA polymerase or DNA dependent RNA Polymerase (RNA Pol)
Eukaryotic promoters
Enhancers
Transcription Factors
Elongation factors
Overall process of eukaryotic transcription
Post transcriptional processing
Post transcriptional processing of mRNA (maturation of mRNA)
Post transcriptional processing of mRNA in prokaryotes
Post transcriptional processing of mRNA in Eukaryotes
Alternative mRNA processing
Post transcriptional processing of tRNA and rRNA (maturation of tRNA and rRNA)
Post transcriptional processing of tRNA
Post transcriptional processing of rRNA
Inhibitors of transcription
RNA Pol binding inhibitors
DNA specific inhibitors
Reverse transcriptase (RT) (RNA directed DNA polymerase)
Key words
Synthesis of mRNA, rRNA and tRNA; Prokaryotic and eukaryotic RNA polymerases; Promoters; Transcription
factors; Enhancers; Post transcriptional RNA processing: Capping; Splicing; Polyadenylation; Inhibition of
transcription; Reverse transcriptase
Introduction
DNA stores genetic information in a stable form that can be readily replicated. However, the
expression of this genetic information requires its flow from DNA to RNA to protein. The first
step i.e. the conversion of DNA sequence information into RNA sequence information or more
precisely the process of RNA synthesis according to the instructions of DNA template is called
transcription.
Before studying the details of transcription, a few points need mention:
• The two strands of double stranded DNA are the coding strand and the template strand. The coding
strand of DNA has the same sequence as the RNA transcript except for thymine (T) in
place of uracil (U). The coding strand is also called the sense or (+) strand. The template strand
is also called the antisense or (-) strand. The sequence of the template strand is the complement
of the RNA transcript (Fig. 1).
Fig. 1: Coding and non coding strands in a DNA
(Figure not reproduced. It shows double stranded DNA with the coding, plus (+), sense strand drawn
5' to 3', the template, minus (-), antisense strand drawn 3' to 5', and the promoter marked.)
• The first nucleotide of a transcribed DNA sequence is denoted as +1 and is called the start site.
The sequences towards the 5′ side of the start site are referred to as upstream sequences and are
denoted with a minus sign. The sequences towards the 3′ side are downstream sequences and are
denoted with a plus sign. Thus, the nucleotide immediately downstream of the +1 site is +2 and so
on. The nucleotide preceding the start site is denoted as -1 and so on. There is no 0 (zero)
nucleotide. These designations refer to the coding strand of DNA. The coding strand for a
particular gene may be located in either strand of a given DNA.
• Different parts of the genome can be transcribed to different extents. The choice of which part
to transcribe, and how extensively, can be regulated by regulatory elements.
• RNA synthesis occurs in the 5′→3′ direction.
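The strand relationships and the numbering convention described in the points above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the six-base example sequence are assumptions made for the example and do not come from any particular gene.

```python
# Watson-Crick pairing used when RNA is copied from a DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3to5: str) -> str:
    """Return the RNA (written 5'->3') copied from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_to_rna(coding_5to3: str) -> str:
    """The transcript has the coding-strand sequence with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


def position_label(offset: int) -> str:
    """Label a base by its offset from the start site (offset 0 is +1).

    Negative offsets are upstream positions; there is no position 0.
    """
    return f"+{offset + 1}" if offset >= 0 else str(offset)


# Hypothetical example: coding strand 5'-ATGGCT-3', template strand 3'-TACCGA-5'.
print(transcribe_template("TACCGA"))   # RNA read from the template strand
print(coding_to_rna("ATGGCT"))         # same RNA, derived from the coding strand
print([position_label(i) for i in (-2, -1, 0, 1)])
```

Both routes give the same transcript, which is why the coding strand can be read directly as the RNA sequence once T is replaced by U.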
Transcription in prokaryotes (Synthesis of mRNA/rRNA/tRNA)

RNA synthesis in prokaryotes, like all biological polymerization reactions, takes place in three
stages: initiation, elongation and termination. Transcription is initiated by the binding of
RNA polymerase to a specific DNA sequence called the promoter, in a defined orientation, so that
the same strand is always transcribed from that promoter. In order to study the transcription
process, detailed information on RNA polymerase and the promoter is important. The following
sections deal with the properties and functions of RNA Pol and the prokaryotic promoter.
Prokaryotic transcription apparatus
(1) RNA polymerase (RNA Pol) or DNA dependent RNA polymerase
RNA Pol is present in all prokaryotic cells and was first discovered in 1960 by Samuel Weiss
and Jerard Hurwitz. In E. coli (eubacteria), a single type of RNA Pol appears to be responsible
for almost all the synthesis of RNA such as mRNA, rRNA and tRNA. Various bacteriophages
also encode RNA Pol that synthesizes only phage-specific RNAs.
The RNA Pol moves along the template, synthesizing RNA starting from the promoter
(described below) until it reaches a sequence called terminator. This action defines a
transcription unit that extends from the promoter to the terminator and the immediate product of
transcription is called primary transcript. The primary transcript is, however, almost always
unstable, and is either degraded or cleaved to give the mature products, viz, mRNA / rRNA /
tRNA.
(A) Structure of RNA polymerase

E. coli RNA Pol is a large multisubunit enzyme with a molecular weight of ~500 kD. It is one of
the largest enzymes in the bacterial cell. The dimensions of the enzyme are 90 × 95 × 160 Å.
The core RNA Pol of E. coli contains four types of subunits with a structure consisting of
α2ββ′ω. The properties and functions of the various subunits of RNA Pol are summarized in
Table 1.
Another subunit, called the σ subunit, binds only transiently to the core enzyme, forming a
holoenzyme α2ββ′ωσ. E. coli has several σ factors, which are summarized in Table 2. σ70 is used
for general transcription while other σ factors are activated by specific environmental conditions.
Thus, σ32, σ54 and σ28 (or σF) are induced at the time of heat shock, nitrogen starvation and
flagellar gene expression, respectively; these and σH are listed in Table 2.
E. coli RNA Pol has the overall shape of a crab claw, where the two pincers are made up
predominantly of the two large subunits, namely β and β′ (Fig. 2).
Further structural analysis shows that RNA Pol has channels or grooves that allow DNA,
RNA and ribonucleotides into and out of the enzyme's active center cleft (Fig. 3). The channel
for DNA lies at the interface of the β and β′ subunits. The NTP-uptake channel allows
ribonucleotides to enter the active center. The RNA exit channel allows the growing RNA chain
to leave the enzyme as it is synthesized during elongation. The downstream DNA (i.e. DNA
ahead of the enzyme, yet to be transcribed) enters the active center cleft in double stranded form
through the downstream DNA channel (between the pincers). Within the active center cleft, the
DNA strands separate from position +3. The non-template strand exits the active center cleft
through the non-template strand (NT) channel and travels across the surface of the enzyme. The
template strand, in contrast, follows a path through the active center cleft and exits through the
template strand (T) channel. RNA Pol surrounds the DNA. The groove is long enough to hold 16
bp in the bacterial enzyme and ~25 bp in the eukaryotic enzyme.
Table 1: Properties and functions of subunits of E. coli RNA Pol

S. No. | Gene coding for subunit | Product (subunit) | Size (kD) | Number of amino acid residues | Number of subunits per holoenzyme | Function | Phases of transcription during which the subunit is required
1 | rpoA | α subunit | 40 | 329 | 2 | Function uncertain; probably involved in enzyme assembly, promoter recognition at UP element, binding of some activators | All stages
2 | rpoB | β subunit | 155 | 1342 | 1 | Catalytic center (phosphodiester bond formation) | All stages
3 | rpoC | β′ subunit | 160 | 1407 | 1 | Catalytic center (DNA template binding) | All stages
4 | rpoZ | ω subunit | ~10 | 91 | 1 | Unknown | All stages
5 | rpoD (rpsD) | σ70 subunit | 70 | 613 | 1 | Promoter specificity, recognition and binding; RNA synthesis initiation (increases binding efficiency at promoter, decreases non-specific binding, converts closed promoter complex to open promoter complex) | Only during initiation
Table 2: Types of E. coli σ factors

S. No. | Gene | σ factor | -35 sequence | Distance between -10 and -35 regions (bp) | -10 sequence | Functions
1 | rpoD | σ70 | TTGACA | 16-18 | TATAAT | General
2 | rpoH | σ32 | CCCTTGAA | 13-15 | CCCGATNT | Heat shock
3 | rpoN | σ54 | CTGGNA | 6 | TTGCA | Nitrogen starvation
4 | fliA | σ28 or σF | CTAAA | 15 | GCCGATAA | Flagellar
5 | sigH | σH | AGGANPuPu | 11-12 | GCTGAATCA | Cytochrome biogenesis; generation of potential nutrient sources; transport; cell wall metabolism; important for competence and sporulation initiation
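The promoter elements in Table 2 can be turned into simple search patterns. The sketch below is illustrative only: it treats N as any base and Pu as a purine (A or G), and it demands an exact match to the consensus, which real promoters rarely show. The dictionary keys and function names are assumptions made for the example.

```python
import re

# (-35 sequence, (min, max) spacer in bp, -10 sequence), copied from Table 2.
SIGMA_PROMOTERS = {
    "sigma70": ("TTGACA", (16, 18), "TATAAT"),
    "sigma32": ("CCCTTGAA", (13, 15), "CCCGATNT"),
    "sigma54": ("CTGGNA", (6, 6), "TTGCA"),
    "sigma28": ("CTAAA", (15, 15), "GCCGATAA"),
    "sigmaH": ("AGGANPuPu", (11, 12), "GCTGAATCA"),
}


def consensus_to_regex(consensus: str) -> str:
    """Translate the table notation: Pu = purine (A or G), N = any base."""
    return consensus.replace("Pu", "[AG]").replace("N", "[ACGT]")


def promoter_regex(sigma: str) -> re.Pattern:
    """Build a pattern: -35 element, spacer of allowed length, -10 element."""
    minus35, (low, high), minus10 = SIGMA_PROMOTERS[sigma]
    spacer = f"[ACGT]{{{low},{high}}}"
    return re.compile(consensus_to_regex(minus35) + spacer + consensus_to_regex(minus10))


# Hypothetical test sequence: perfect sigma70 elements separated by 17 bp.
sequence = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
print(bool(promoter_regex("sigma70").search(sequence)))
```

Exact matching is chosen here only to keep the sketch short; a scoring scheme that tolerates mismatches would be closer to how promoters actually vary.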
Fig. 2: Crab claw structure of RNA Pol
(Figure not reproduced. Labels: upstream DNA, direction of RNA Pol movement, RNA exit, rudder,
wall, bridge, nucleotides, DNA enters jaws.)
Fig. 3: Channel structure of RNA Pol
(Figure not reproduced. Labels: active site, RNA exit channel, pincers, flap, downstream DNA,
upstream DNA, positions +1, -10 and -35, T channel, NT channel.)
The map of the E. coli σ70 factor identifies four conserved regions, namely 1-4, which are further
subdivided into sub-regions (Fig. 4). These sub-regions have different functions. The sub-region
2.4 (also called -10 region or unwinding domain) confers specificity by recognizing the -10 region of
the promoter, while sub-region 4.2 (also called -35 region or recognition domain) provides binding
energy by recognizing the -35 region of the promoter. The details of other sub-regions are tabulated
in Table 3.
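Fig. 4: Regions of σ factor and their functions
(Figure not reproduced. It shows regions 4, 3, 2 and 1 arranged from the C-terminus to the
N-terminus, with labels for recognition of the -35 region, the 'extended -10' region and the -10
region, and for the part responsible for melting.)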
Table 3: Functions of regions and sub-regions of σ factor
(S. No.; Region; Function / Properties of some sub-regions)

S. No. 1; Region 1:
  Region 1, comprising sub-regions 1.1 and 1.2, is present at the N-terminal end of σ factor;
  This region is negatively charged and has a regulatory function;
  In the free form of σ factor, sub-region 1.1 plays an autoinhibitory role by occluding its DNA
  binding domains (i.e., 2.4 and 4.2);
  Association of σ factor with core enzyme changes the conformation of σ factor, leading to release
  of autoinhibition;
  In holoenzyme, sub-region 1.1, being negatively charged, occupies the positively charged region in
  the active center cleft of RNA Pol; this sub-region thereby acts as a DNA mimic;
  Upon melting of DNA, sub-region 1.1 shifts by 20-50 Å and hence clears the DNA entry channel,
  allowing DNA entry

S. No. 2; Region 2:
  The sub-regions 2.1 and 2.2 are a highly conserved part of σ factor;
  These are involved in interaction with core enzyme;
  The sub-region 2.3 resembles proteins that bind single stranded nucleic acid and is involved in
  the melting reaction;
  The sub-region 2.4 has an α-helical structure that specifically recognizes the -10 region (i.e.,
  it determines specificity);
  It is also called -10 region or unwinding domain of σ factor

S. No. 3; Region 3:
  The sub-region 3.1 binds the intervening DNA sequence (distance between -10 and -35 regions, i.e.
  ~75 Å);
  When σ factor binds core enzyme, the N-terminal domain of sub-region 3.2 blocks the RNA exit
  channel; it thus acts as a molecular mimic of RNA; it is removed from the RNA exit channel for
  elongation to occur;
  Ejection of this sub-region from the RNA exit channel takes several attempts and leads to abortive
  initiations

S. No. 4; Region 4:
  The sub-region 4.2 has an α-helical structure that specifically recognizes the -35 region of the
  promoter;
  It is also called -35 region or recognition domain of σ factor
(B) Synthesis of RNA in 5′→3′ direction

The results of labeling experiments with γ-32P substrates confirmed that RNA chains, like DNA
chains, grow in the 5′→3′ direction, which involves the movement of the enzyme RNA Pol in a
3′→5′ direction along the antisense DNA strand (template). So, the template DNA strand is
copied in the 3′→5′ direction and the 3′-OH group of the growing RNA chain attacks the α-P of the
incoming rNTP.
For transcription, RNA Pol requires a DNA template, ribonucleoside triphosphates (rNTPs; viz.
rATP, rGTP, rCTP and rUTP) and Mg++. There is no requirement for any primer. The enzyme is most
active when bound to double stranded DNA, but only one of the two strands serves as a template.
The 3′-OH group of the growing RNA chain attacks the α-P of the incoming NTP and releases
pyrophosphate. This reaction is thermodynamically favorable and the subsequent degradation of
the pyrophosphate to orthophosphate locks the reaction in the direction of RNA synthesis. The 5′
triphosphate group of the first residue in a nascent (newly formed) RNA molecule is not cleaved
to release PPi, but remains intact throughout the transcription process. Thus, the reaction is
driven by the release and subsequent hydrolysis of PPi as summarized in Scheme 1.
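\[
\underbrace{(\mathrm{NMP})_{n}}_{\text{RNA}} + \underbrace{\mathrm{NTP}}_{\text{incoming ribonucleotide}}
\longrightarrow
\underbrace{(\mathrm{NMP})_{n+1}}_{\text{lengthened RNA}} + \underbrace{\mathrm{PP_i}}_{\text{pyrophosphate}}
\]
where NMP signifies ribonucleoside monophosphate and n represents the number of NMPs.
\[
\underbrace{\mathrm{PP_i}}_{\text{pyrophosphate}} \xrightarrow{\ \text{pyrophosphatase}\ } \underbrace{2\,\mathrm{P_i}}_{\text{orthophosphate}}
\]
Scheme 1: Reaction catalyzed by RNA Pol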
RNA Pol requires that the initiating NTP be brought into its active site and held stably on its
template while the next NTP is presented with the correct geometry for the chemistry of
polymerization to occur. This is particularly difficult because RNA Pol starts most transcripts
with A, and that ribonucleotide binds the template nucleotide T with only two H-bonds. Thus,
the enzyme has to make specific interactions with the initiating NTP, holding it rigidly in the
correct orientation to allow chemical attack on the incoming NTP. The requirement for such
specific interactions between the enzyme and the initiating NTP probably explains why most
transcripts start with the same nucleotide. The interactions are specific for that nucleotide (A) and
thus only chains beginning with A are held in a manner suitable for efficient initiation. It is
believed that the interactions are provided by various parts of the RNA Pol holoenzyme,
including part of σ. Consistent with this, in experiments using an RNA Pol containing a σ70
derivative lacking this part of σ, initiation requires much higher than normal concentrations of
the initiating NTP.
(C) Requirement of Mg++

The active site of the core enzyme is made up of regions from the β and β′ subunits. It is found at
the base of the pincers within a region called the active center cleft and contains two metal ions
(Mg++) in its active form, consistent with the two metal ion catalytic mechanism for
nucleotide addition proposed for all types of polymerases. One metal ion remains bound to the
enzyme whereas the other appears to come in with the nucleoside triphosphate and leave with
the pyrophosphate. The β and β′ subunits interact extensively with one another, particularly at
the base of the channel where the active site Mg++ ion is located. The β′ subunit binds a Zn++ ion
via four cysteine residues that are invariant in prokaryotes but not in eukaryotes. Three
conserved aspartate residues (Asp) of the enzyme participate in binding these metal ions.
(D) Significance of σ subunit of RNA Pol

The transient binding of σ factor to the core enzyme is concerned specifically with promoter
recognition. σ factor has domains that recognize the promoter sequence. σ alone does not bind to
DNA because the N-terminal region of σ behaves as an autoinhibition domain. When σ is free, this
region occludes the DNA binding domains, thereby suppressing the activities of the DNA binding
regions. When the σ subunit binds to core enzyme (α2ββ′ω), the conformation of σ factor changes so
that the inhibition is released and the DNA binding domains can contact DNA.
Comparisons of the crystal structures of core enzyme and holoenzyme show that σ factor lies
largely on the surface of the core enzyme. It has an elongated structure that extends past the
DNA binding site. The σ subunit binds transiently to the core enzyme and directs the RNA Pol
holoenzyme to specific binding sites on DNA where transcription begins.
σ factor participates in initiation of RNA synthesis by formation of the open complex. In contrast to
the -35 region, which simply provides binding energy to secure polymerase to the promoter, the -10
region has a more elaborate role in transcription initiation, because it is within that element that
DNA melting is initiated in the transition from the closed to the open complex. Thus, the sub-region
2.4 (unwinding domain or -10 domain) of σ, which interacts specifically with the -10 region of
the promoter, is doing more than simply binding DNA, while the specific interactions of sub-region
4.2 (recognition domain or -35 domain) with the -35 sequence of the promoter just provide binding
energy. In keeping with this expectation, the α-helix involved in recognition of the -10 region
contains several aromatic amino acids that can interact with bases on the non-template strand in a
manner that stabilizes the melted DNA. Unwinding increases negative supercoiling of DNA.
When the holoenzyme forms an open complex on DNA, the N-terminal domain of σ is displaced
from the active site. It swings 20-50 Å away and the two DNA binding regions separate by 15 Å,
presumably to acquire a more elongated conformation appropriate for contacting DNA.
The σ factor dissociates from the rest of the RNA Pol when the RNA chain reaches 8-9 nucleotides
in length. It is not necessary for the elongation phase. When σ factor is released, the core enzyme
reverts to a general affinity for all DNA, irrespective of sequence, which suits it to continue
transcription. The released σ factor becomes immediately available for use by another core enzyme.
A change in association between σ and core enzyme changes the binding affinity for DNA so that
core enzyme can move along DNA. RNA Pol encounters a dilemma in reconciling its needs for
initiation with those for elongation. Initiation requires tight binding only to particular sequences
(promoters), while elongation requires close association with all sequences that the enzyme
encounters during transcription. This dilemma is solved by the reversible association between
σ factor and core enzyme. σ factor is either released following initiation or changes its association
with core enzyme so that it no longer participates in DNA binding. The amount of σ factor present
in the cell is only about 30% of the amount of core enzyme. Therefore, only about one-third
of the polymerase complexes can exist as holoenzyme at any one time. Because there are
fewer molecules of σ than of core enzyme, the utilization of core enzyme requires that σ
recycles. This occurs immediately after initiation in about one third of cases; presumably σ and
core dissociate at some later point in the other cases. Irrespective of the exact timing of its
release from core enzyme, σ factor is involved only in initiation.
After the release of σ factor from the RNA Pol, the core enzyme moves along the DNA
synthesizing the growing RNA strand. The σ factor can then associate with another core enzyme
and reinitiate transcription.
(E) Functions of RNA polymerase

RNA Pol performs multiple functions in the process of transcription:
• Binding to DNA and recognition of promoters

All sequence specific contacts that the holoenzyme makes with the DNA (with the -10 and
-35 regions as well as the so-called extended -10 region just upstream of the -10 region)
are mediated by the σ subunit via conserved residues. The binding of σ causes the core
enzyme's pincers to come together so as to narrow the channel between them by ~10 Å.
The outer surface of the holoenzyme is almost uniformly negatively charged, whereas
those surfaces presumed to interact with nucleic acids, particularly the inner walls of the
main channel, are positively charged.
• Melting of DNA
This melting occurs between positions -11 and +3, in relation to the transcription start
site. The double helix reforms at -11 in the upstream DNA behind the enzyme. The β and β′
subunits contact DNA at many points downstream of the active site. They make several
contacts with the coding strand in the region of the transcription bubble, thus stabilizing
the separated single strands. The RNA is contacted largely in the region of the
transcription bubble.
As the enzyme moves along DNA, the base in the template strand at the start of the turn
is flipped to face the nucleotide entry site. The RNA-DNA hybrid is 9 bp long and the
5′ end of the RNA is forced to leave the DNA when it hits a protein element called the rudder.
Once DNA has been melted, the individual strands have a flexible structure in the
transcription bubble. This enables DNA to take its turn in the active site. But before
transcription starts, the DNA double helix is a relatively rigid, straight structure. This
straight structure enters the polymerase without being blocked by the wall because of a
conformational shift that occurs in the enzyme. Adjacent to the wall is a clamp. In the free form
of RNA Pol, this clamp swings away from the wall to allow DNA to follow a straight path
through the enzyme. After DNA has been melted to create the transcription bubble, the
clamp must swing back into position against the wall.
• Selection of correct ribonucleotides
RNA Pol selects the correct ribonucleoside triphosphate and catalyzes the formation of a
phosphodiester bond. This process is repeated many times as the enzyme moves
unidirectionally along the DNA template. RNA Pol is completely processive, i.e., a
transcript is synthesized from start to end by a single RNA Pol molecule.
• Stabilization of single stranded regions

RNA Pol itself stabilizes single stranded regions.
• Elongation
RNA Pol is involved in elongation. When RNA Pol forms the initial elongation complex after the
first 10 nucleotides have been synthesized, it may lose σ factor and lose its contacts with the
region from -35 to -55. At 15-20 nucleotides, the general elongation complex is formed and covers
30-40 bp. The elongating RNA Pol is a processive machine that synthesizes and proofreads RNA. DNA
passes through the elongating enzyme in a manner very similar to its passage through the
open complex. Thus, double stranded DNA enters the front of the enzyme between the
pincers. At the opening of the catalytic cleft, the strands separate to follow different paths
through the enzyme before exiting via their respective channels and reforming a double
helix behind the elongating polymerase. Ribonucleotides enter the active site through their
defined channel and are added to the growing RNA chain under the guidance of the
template DNA strand. Only eight or nine nucleotides of the growing RNA chain remain
base paired to the DNA template at any given time; the remainder of the RNA chain is
peeled off and directed out of the enzyme through the RNA exit channel.
RNA chain elongation requires that the double stranded DNA template be opened up at the
point of RNA synthesis so that the template strand can be transcribed into its complementary
RNA strand. In doing so, the RNA chain only transiently forms a short length of RNA-DNA
hybrid duplex, as is indicated by the observation that transcription leaves the
template duplex intact and yields single stranded RNA. The unpaired bubble of DNA in
the open initiation complex apparently travels along the DNA with RNA Pol. There are
two ways this might occur: (i) If the RNA Pol followed the template strand in its helical
path around the DNA, the DNA would build up little supercoiling because the DNA
duplex would never be unwound by more than about a turn. However, the RNA transcript
would wrap around the DNA once per duplex turn. This model is implausible since it is
unlikely that the DNA and RNA could be readily untangled. The RNA would not
spontaneously unwind from the long and often circular DNA in any reasonable time and
no known topoisomerase can accelerate this process. (ii) If the RNA Pol moves in a
straight line while the DNA rotates, the RNA and DNA will not become entangled. Rather,
the DNA's helical turns are pushed ahead of the advancing transcription bubble so as to
wind the DNA more tightly ahead of the bubble (which promotes positive supercoiling),
while the DNA behind the bubble is correspondingly unwound (which promotes negative
supercoiling); the linking number of the entire DNA remains unchanged. This model is supported
by the observations that the transcription of plasmids in E. coli causes their positive
supercoiling in gyrase mutants (which cannot relax positive supercoils) and their negative
supercoiling in topoisomerase I mutants (which cannot relax negative supercoils). In fact,
by tethering RNA Pol to a glass surface and allowing it to transcribe DNA that had been
fluorescently labeled at one end, Kazuhiko Kinosita demonstrated, through fluorescence
microscopy (using techniques similar to those showing that the F1F0-ATPase is a rotary
engine), that single DNA molecules rotated in the expected direction during transcription.
• Proofreading
In addition, RNA Pol carries out two proofreading functions. The first of these is
called pyrophosphorolytic editing. In this, the enzyme uses its active site, in a simple back
reaction, to catalyze the removal of an incorrectly inserted ribonucleotide by
reincorporation of PPi. The enzyme can then incorporate another ribonucleotide in its place
in the growing RNA chain. Note that the enzyme can remove either correct or incorrect
bases in this manner, but it spends longer hovering over mismatches than matches and so
removes the former more frequently. In the second proofreading mechanism, called
hydrolytic editing, the polymerase backtracks by one or more nucleotides and cleaves the
RNA product, removing the error containing sequence. Hydrolytic editing is stimulated by
Gre factors, which, as well as enhancing the hydrolytic editing function, also serve as
elongation stimulating factors. That is, they ensure that polymerase elongates efficiently
and help overcome arrest at sequences that are difficult to transcribe. This combination
of functions is comparable to that conferred on the eukaryotic RNA Pol II by the
transcription factor TFIIS. Another group of proteins, the Nus proteins, joins polymerase in
the elongation phase and promotes, in still rather undefined ways, the processes of
elongation and termination.
• Termination
It detects termination signals that specify where a transcript ends. The length of RNA-
DNA hybrid is determined by a structure within the enzyme that forces the RNA-DNA
hybrid to separate, allowing the RNA chain to exit from the enzyme and the DNA chain to
rejoin its DNA partner. The RNA product does not remain base paired to the template
DNA strand, rather the enzyme displaces the growing chain only a few nucleotides behind
where each ribonucleotide is added. Because this release follows so closely behind the site
of polymerization, multiple RNA Pol molecules can transcribe the same gene at the same
time, each following closely along behind another. Thus, a cell synthesizes large numbers
of transcripts from a single gene (or other DNA sequence) in a short time.
Thus, RNA Pol has the ability to unwind and rewind DNA, to hold the separated strands of
DNA and the RNA product, to catalyze the addition of ribonucleotides to the growing RNA
chain and to overcome difficulties in progressing by cleaving the RNA product and restarting
RNA synthesis (with the assistance of some accessory factors).
(F) Fidelity of RNA synthesis

Unlike DNA Pol, RNA Pol lacks a separate proofreading 3′→5′ exonuclease active site and
hence its error rate is high. Thus, in contrast with DNA Pol, RNA Pol does not extensively correct
the nascent polynucleotide chain. Consequently, the fidelity of transcription is much lower than
that of replication. The error rate of RNA synthesis is of the order of one mistake per 10^4 or 10^5
nucleotides, about 10^5 times as high as that of DNA synthesis. The much lower fidelity of RNA
synthesis can be tolerated because mistakes are not transmitted to progeny. Moreover, for most
genes, many RNA transcripts are synthesized from a single gene and all RNAs are eventually
degraded and replaced. A few defective transcripts are less likely to be harmful to the cell than a
mistake in the permanent information in DNA.
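To give a feel for what an error rate of one mistake per 10^4 to 10^5 nucleotides means for an individual transcript, the arithmetic can be sketched as follows. The 1000 nucleotide transcript length is a hypothetical value chosen for illustration, and errors are assumed to occur independently at each position.

```python
def expected_errors(length_nt: int, error_rate: float) -> float:
    """Mean number of misincorporated nucleotides in one transcript."""
    return length_nt * error_rate


def fraction_error_free(length_nt: int, error_rate: float) -> float:
    """Probability that a transcript contains no error at all,
    assuming independent errors at each position."""
    return (1.0 - error_rate) ** length_nt


# Hypothetical 1000-nucleotide transcript at the two error rates quoted in the text.
for rate in (1e-4, 1e-5):
    print(rate, expected_errors(1000, rate), round(fraction_error_free(1000, rate), 4))
```

Under these assumptions most transcripts of this length carry no error at all, which fits the argument that occasional defective transcripts can be tolerated.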
(2) Promoters
The promoter is the region of DNA where RNA polymerase binds to initiate transcription. The
information for promoter function is provided directly by the DNA sequence; its structure is the
signal for transcription. The promoter surrounds the first base pair that is transcribed into RNA,
the start point. As promoters are present on the same DNA molecule as the genes being
transcribed or regulated, they are called cis-acting elements. E. coli has about 2000 promoter
sites in its 4.6 × 10^6 bp genome. There are different types of promoters in E. coli, but the most
prevalent one is the σ70 promoter (standard promoter), which is dealt with in detail in the following
discussion.
(A) Consensus sequences

A comparison of many prokaryotic promoter sequences reveals RNA Pol binding sites. An
essential nucleotide sequence, called a conserved sequence, should be present in all the promoters.
However, a conserved sequence need not be conserved at every single position; some
variation is permitted. Putative DNA recognition sites can be defined in terms of an idealized
sequence that represents the base most often present at each position. A consensus sequence is
defined by aligning all known examples so as to maximize their homology. For a sequence to be
accepted as a consensus, each particular base must be reasonably predominant at its position and
most of the actual examples must be related to the consensus by rather few (1-2) substitutions.
The promoter sequences in E. coli lack any extensive conservation over the 60 bp
associated with RNA Pol. The sequence of much of the binding site is irrelevant. But some short
stretches within the promoter are conserved and they are critical for its function.
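The way a consensus sequence is read off from aligned examples can be sketched as below. The five hexamers are made-up illustrative sequences, not real promoter data, and the function name is an assumption for the example.

```python
from collections import Counter


def consensus(aligned):
    """Return the consensus string and, for each position, the percentage
    occurrence of the most frequent base among the aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    bases, percents = [], []
    for column in zip(*aligned):  # one column per aligned position
        base, count = Counter(column).most_common(1)[0]
        bases.append(base)
        percents.append(round(100 * count / len(aligned)))
    return "".join(bases), percents


# Hypothetical aligned -10 hexamers, each differing from the ideal at one position or none.
examples = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]
print(consensus(examples))
```

Each example here is related to the consensus by at most one substitution, which is the kind of relationship the definition above asks for.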
Bacterial promoters have following features:
(i) Start point (+1 position): The initiating (+1) nucleotide is usually (>90% of the time) a
purine nucleotide (A or G; A occurs more often than G). It is common for the start point to be the
central base in the poorly conserved CAT or CGT sequence, but the conservation of the base
triplet is not great enough to regard it as an obligatory signal.
(ii) -10 sequence or Pribnow Box: The most conserved sequence recognizable in almost all
promoters is a 6 bp long AT rich motif centered at ~10 nucleotides upstream of the start site.
Because of its position, it is named the -10 sequence. It is also known as the Pribnow Box (named
after David Pribnow, who pointed out its existence in 1975). The center of the hexamer generally
is close to 10 bp upstream of the start point; the distance varies in known promoters from -18 to
-9. Its consensus is 5′ TATAAT 3′ and its conservation can be summarized in the form
5′ T80 A95 T45 A60 A50 T96 3′, where the number after each base denotes the % occurrence of the
most frequently found base at that position, which in this case varies from 45-96%.
If the frequency of occurrence indicates likely importance in binding RNA Pol, we would expect
the initial highly conserved TA and the final almost completely conserved T in the -10 region to
be the most important bases. The region is AT rich and hence low energy is required for strand
separation at this region. A mutation in this region has been implicated to affect melting reaction.
(iii) -35 sequence: A 6 bp long sequence centered at ~35 nucleotides upstream of the start site.
The consensus is 5′ TTGACA 3′. In more detailed form the conservation is
5′ T82 T84 G78 A65 C54 A45 3′, where the number after each base denotes the % occurrence of the
most frequently found base at that position, which in this case varies from 45-84%.
(iv) Distance between -10 and -35 sequences: The distance between these conserved
sequences (-10 and -35 regions) is also very critical. It is between 16 and 18 bp in 90% of the
promoters (a separation of 17 nucleotides is optimal). In the exceptions it is as little as 15
nucleotides and as large as 20 nucleotides. However, the actual sequence of this intervening
DNA is unimportant. The spacing holds the two motifs at an appropriate separation along the
helix for simultaneous interaction of σ factor with both (-10 and -35 sequences).
The promoters with the -10 and -35 sequences 5′ TATAAT and 5′ TTGACA, respectively, are
called standard promoters. These are recognized by the σ70 subunit of RNA Pol. Individual
promoters usually differ from the consensus at one or more positions. A typical bacterial
promoter is represented in Fig. 5.
5′ TTGACA ........ 16-18 bp ........ TATAAT ........ Purine ........ 3′
   -35 region                        -10 region      +1
   [Recognition Domain]              [Pribnow Box]   [Start site]
                                     [Unwinding Domain]
Fig. 5: Constitution of a typical bacterial promoter
(v) Some other conserved sequences of σ70 promoters: σ70 promoters of some genes have
additional consensus sequences such as:
(a) Upstream promoter elements or UP elements: Richard Gourse discovered that
promoters of certain highly expressed genes (e.g. genes encoding rRNA, the rrn genes)
contain a third AT rich recognition element, called the UP (upstream promoter) element, which
occurs between positions -40 and -60. This UP element binds the C-terminal domain (CTD) of the
RNA Polymerase α subunit. UP elements stimulate transcription at promoters that contain them
by providing an additional specific interaction site between the RNA Pol and DNA. The efficiency
with which an RNA Pol binds to a promoter and initiates transcription is determined in large
measure by these sequences, the spacing between them and their distance from the transcription
start site. The sequence of the UP element is: 5′-NNAAA(A/T)(A/T)T(A/T)TTTTNNAAAANN-3′
(b) Extended -10 element: Another class of σ70 promoters lacks a -35 region and instead has
a so-called extended -10 element. This comprises a standard -10 region with an additional short
sequence element at its upstream end. These elements are recognized by the σ subunit of RNA
Pol. Extra contacts made between polymerase and this additional sequence element compensate
for the absence of a -35 region; e.g. the gal genes of E. coli use such a promoter.
Various combinations of bacterial promoter elements are shown in Fig. 6.
(1)                  -35 ... ~17 bp ... -10 ... +1
(2)  UP element ...  -35 ... ~17 bp ... -10 ... +1
(3)                        Extended -10 ... +1
Fig. 6: Combinations of bacterial promoter elements
(B) Promoter efficiency

Promoters differ markedly in their efficacy. Depending upon how closely their -10 and -35
sequences match the consensus sequences, promoters are classified as strong promoters and
weak promoters. Promoters with sequences closer to the consensus are generally stronger than
those that match less well. The strength of a promoter signifies the number of transcripts it can
initiate in a given time. Genes with strong promoters are transcribed frequently, as often as
every 2 seconds in E. coli. In contrast, genes with very weak promoters are transcribed about
once in 10 minutes.
Mutation of a single base in either -10 or -35 sequences can alter promoter activity. Mutations in
the -35 region usually affect initial binding of RNA Pol and mutations in the -10 region usually
affect the melting reaction.
(C) Supercoiling is an important feature regulating efficiency of promoters

The efficiency of some promoters is influenced by supercoiling. Negative
supercoiling increases the efficiency of some promoters by assisting the melting reaction carried
out by both prokaryotic and eukaryotic RNA Pol. As RNA Pol transcribes DNA, unwinding and
rewinding occur. This requires that either the entire transcription complex rotates about the DNA
or the DNA itself rotates about its helical axis. The twin domain model for transcription illustrates
the consequences of the rotation of the DNA. As RNA Pol pushes forward along the double
helix, it generates positive supercoils (more tightly wound DNA) ahead and leaves negative
supercoils (partially unwound DNA) behind. For each helical turn traversed by RNA Pol, +1 turn
is generated ahead and -1 turn behind. Transcription therefore has a significant effect on the
(local) structure of DNA. As a result, the enzyme gyrase, which introduces negative supercoils,
and topoisomerase I, which removes negative supercoils, are required to rectify the situation in
front of and behind the polymerase, respectively. Inappropriate superhelicity in the DNA being
transcribed halts transcription. Quite possibly the torsional tension in the DNA generated by
negative superhelicity behind the transcription bubble is required to help drive the transcriptional
process, whereas too much such tension prevents the opening and maintenance of the
transcription bubble.
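The bookkeeping of the twin domain model (+1 turn ahead and -1 turn behind for each helical turn traversed) can be sketched as below. The helical repeat of 10.5 bp per turn is an assumed textbook value for B-DNA that is not stated in this section, and the function name is hypothetical. The sketch also assumes that no topoisomerase relieves the strain.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-DNA (not given in the text)


def twin_domain_supercoils(bp_transcribed, bp_per_turn=BP_PER_TURN):
    """Return (turns_ahead, turns_behind) generated by transcription.

    Positive turns accumulate ahead of RNA Pol and an equal number of
    negative turns behind it, assuming neither the complex nor the DNA
    ends can rotate freely and no topoisomerase is acting.
    """
    turns = bp_transcribed / bp_per_turn
    return (turns, -turns)


ahead, behind = twin_domain_supercoils(1050)
print(ahead, behind)
```

The two values always sum to zero, which is why gyrase and topoisomerase I act on opposite sides of the polymerase rather than on the template as a whole.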
The dependence of a promoter on supercoiling is determined by its sequence. This would predict
that some promoters have sequences that are easier to melt and are therefore less dependent on
supercoiling, while others have more difficult sequences and have a greater need to be
supercoiled. An alternative is that the location of the promoter might be important if different
regions of the bacterial chromosome have different degrees of supercoiling.
(D) Functions of promoter regions

The function of the -35 sequence is to provide the signal for recognition by RNA polymerase, while
the -10 sequence allows the promoter-polymerase complex to convert from the closed to the open
form. Thus, the -35 sequence comprises the recognition domain while the -10 sequence comprises
the unwinding domain of the promoter. The consensus sequence of the -10 site consists exclusively
of AT base pairs, which assists the initial melting of DNA into single strands. Because less energy
is needed to disrupt AT base pairs than GC base pairs, a stretch of AT pairs
demands the minimum amount of energy for strand separation.
A typical promoter relies on its -35 and -10 sequences to be recognized by RNA Pol, but one or
the other of these sequences can be absent from some (exceptional) promoters. In at least some
of these cases, RNA Pol alone cannot recognize the promoter, and the reaction also requires
ancillary proteins, which overcome the deficiency in intrinsic interaction between RNA Pol and
the promoter.
(E) Alternative promoter sequences

There are several alternative promoter sequences that are recognized by different σ subunits.
These promoters have sequences that differ from the consensus sequence of a conventional or
standard promoter. Some examples are listed in Fig. 7.
Heat shock            5′ ... CCCTTGAA ... 13-15 bp ... CCCGATNT ... 3′
Nitrogen starvation   5′ ... CTGGNA ... 6 bp ... TTGCA ... 3′
Flagella              5′ ... CTAAA ... 15 bp ... GCCGATAA ... 3′
where N can be any nucleotide
Fig. 7: Alternative promoter sequences
(3) Overall process of prokaryotic transcription
The process of transcription can be divided in three steps namely, Initiation, Elongation and
Termination (Fig. 8).
[Figure: flow of events. Initiation: promoter recognition by RNA Pol; promoter binding (closed
complex); promoter melting (open complex); initial transcription. Elongation: RNA elongation
after abortive initiations and promoter clearance. Termination: release of RNA and RNA Pol
from the DNA.]
Fig. 8: The overall process of transcription in prokaryotes
(A) Initiation
Transcription begins with the insertion of the first ribonucleotide (usually a purine). The end of
initiation is signified by promoter clearance, where the RNA Pol moves ahead (along the DNA
template) from the promoter site without dissociating, freeing the promoter for further initiation
events. Promoter clearance occurs only if the open promoter complex is stable and this usually
follows a number of abortive initiations where short transcripts are generated. This is a general
property of RNA Pol and appears to be required for de novo strand synthesis. Initiation is usually
the rate-limiting step in transcription and is the primary level of gene regulation in both
prokaryotes and eukaryotes.
The pathway of transcription initiation consists of two major parts, binding and initiation, and
each part has multiple steps, which are summarized below. RNA Pol recognizes the promoter
region, leads to local unwinding at the site bound by RNA Pol and causes some abortive
initiations. During this phase the RNA Pol remains stationary at the site of binding (i.e.
promoter) and its conformation remains essentially the same. During this phase, the first ~8-9
nucleotides are added. The initiation phase ends when the enzyme succeeds in extending the
RNA chain and clears the promoter. Regulatory proteins that bind to specific sequences near
promoter sites and interact with RNA polymerase also markedly influence the frequency of
transcription of many genes.
The initiating reaction is simply the coupling of two NTPs in the reaction given below:
pppA + pppN → pppApN + PPi
Bacterial RNAs have 5′-triphosphate groups, as was demonstrated by the incorporation of
radioactive label into RNA when it was synthesized with [γ-32P]ATP. In such a case, only the 5′
terminus of the RNA can retain the label because the internal phosphodiester groups of RNA are
derived from the α-phosphate groups of NTPs.
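The reasoning behind the labeling experiment can be made explicit with a small bookkeeping sketch. It assumes only what the text states: the first nucleotide keeps all three phosphates, and every later nucleotide contributes its α phosphate while its β and γ phosphates leave as PPi. The function names and the example transcript are hypothetical.

```python
def phosphates_in_transcript(sequence):
    """List (base, position, phosphate) for every phosphate kept in the RNA.

    Position 0 keeps its 5' triphosphate (gamma, beta, alpha). Each later
    nucleotide keeps only its alpha phosphate, which forms the
    phosphodiester bond; beta and gamma are released together as PPi.
    """
    kept = []
    for position, base in enumerate(sequence):
        if position == 0:
            kept.extend((base, position, p) for p in ("gamma", "beta", "alpha"))
        else:
            kept.append((base, position, "alpha"))
    return kept


def labelled_positions(sequence, labelled_base, labelled_phosphate):
    """Positions that retain label when one phosphate of one NTP is labelled."""
    return [pos for base, pos, p in phosphates_in_transcript(sequence)
            if base == labelled_base and p == labelled_phosphate]


transcript = "AUGAAC"  # hypothetical transcript starting with A
print(labelled_positions(transcript, "A", "gamma"))
print(labelled_positions(transcript, "A", "alpha"))
```

With a γ-labelled ATP, label can appear only at position 0, and only when the transcript starts with A; with an α-labelled ATP every A in the chain is labelled.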
Initiation in transcription is further divided into discrete phases of DNA binding and initiation of
RNA synthesis, which are described below:
(i) Template and promoter recognition and formation of closed binary complex: The
holoenzyme-promoter reaction starts by forming a closed binary complex. Closed means that
the DNA remains duplex. Initially, the σ subunit of the enzyme RNA Pol (the σ subunit is involved
in promoter selection) binds loosely and reversibly to duplex DNA and searches for the promoter
sequence. This is the closed binary complex or closed promoter complex or closed
promoter-polymerase complex. In E. coli, RNA Pol binding occurs within a region stretching ~50 bp
before the transcription start site to ~20 bp beyond it. Because the formation of the closed binary
complex is reversible, it is usually described by an equilibrium constant (KB). There is a wide range
in values of the equilibrium constant for forming the closed complex. RNA Pol can as easily
dissociate from the promoter as make the transition to the open complex.
(ii) Formation of open binary complex or isomerization: The transition from the closed
promoter complex (in which DNA is double helical) to the open promoter complex (in which
a DNA segment is unwound) is an essential event in transcription. In the bacterial enzyme
bearing σ70, this transition, often termed isomerization, does not require energy derived from
ATP hydrolysis and is instead the result of a spontaneous conformational change in the DNA-
enzyme complex to a more energetically favorable form. Isomerization is essentially irreversible
and once complete, typically guarantees that transcription will subsequently initiate (though
regulation can still be imposed after this point in some cases).
Although RNA Pol can search for promoter sites when bound to double helical DNA, a segment
of the helix must be unwound before synthesis can begin. A region of duplex DNA must be
unpaired so that the nucleotides on one of its strands become accessible for base pairing with
incoming ribonucleotides. When the correct sequence is recognized by RNA Pol holoenzyme,
the DNA at the promoter site remains intact but is locally unwound (DNA melting). The series of
events leading to formation of an open complex is called tight binding. Due to tight binding, the
interaction between the RNA Pol holoenzyme and DNA becomes irreversible and the closed
complex undergoes a transition to the open complex. Thus, the closed complex is converted into an
open complex by melting of a short region of DNA within the sequence bound by the enzyme.
This characterizes the open binary complex, open promoter complex or open
promoter-polymerase complex. Here, DNA strands separate locally over a distance of ~17 bp of DNA
(from within the -10 region to position +2 or +3), which corresponds to 1.6 turns of the B-DNA
helix. This opening frees the template strand to be available for base pairing with
ribonucleotides. Unwinding increases the negative supercoiling of DNA. Negative supercoiling
of circular DNA favors transcription of genes because it facilitates unwinding.
For strong promoters, conversion into an open binary complex is irreversible, so this reaction is
described by a rate constant (k2). This reaction is fast. The σ factor is involved in the DNA
melting reaction.
(iii) Formation of ternary complex (unstable) and Abortive initiations: The next step is to
incorporate the first two nucleotides and then catalyze a phosphodiester bond formation between
them. This generates a ternary complex that contains RNA as well as DNA and enzyme. The
ribonucleotides are aligned on the template strand and joined together. The initiating
ribonucleotide is usually a purine (A or G). RNA Pol makes specific interactions with the
initiating purine, holding it rigidly in correct orientation to allow chemical attack on incoming
NTP. The requirement for such specific interactions between the enzyme and the initiating NTP
probably explains why most transcripts start with the same nucleotide. The interactions are specific
for that nucleotide (an A) and thus only chains beginning with A are held in a manner suitable
for efficient initiation. It is believed that the interactions are provided by various parts of the
RNA Pol holoenzyme, including part of σ. Consistent with this, in experiments using an RNA
Pol containing a σ70 derivative lacking this part of σ, initiation requires much higher than normal
concentrations of the initiating nucleotide. The region containing RNA Pol, DNA and nascent RNA
is called a transcription bubble (so called because it contains a locally melted bubble of DNA) or
transcription complex. Formation of the ternary complex is described by the rate constant ki; this
is even faster than the rate constant k2.
Further nucleotides can be added without any enzyme movement to generate an RNA chain of
up to 9 bases. Thus, RNA Pol forms an unstable ternary complex comprising a DNA-RNA
hybrid helix (i.e. DNA template and short RNA) and RNA Pol holoenzyme. This RNA-DNA
helix is thus ~8 bp long, which corresponds to about one complete turn of the double helix. The
RNA-DNA hybrid also rotates each time a nucleotide is added so that the 3′-OH end of the RNA
stays at the catalytic site of RNA Pol. Incorporation of the first 9-10 ribonucleotides is a rather
inefficient process. After each base is added, there is a certain probability that the enzyme will
release the chain. At this stage the enzyme often releases short transcripts (each having fewer than
~10 ribonucleotides) and then starts synthesis of RNA again. Abortive initiations (i.e. synthesis of
short RNA) probably involve synthesizing an RNA chain that fills the active site. If the RNA is
released, the initiation is aborted and must start again. A cycle of abortive initiation usually
occurs to generate a series of very short oligonucleotides.
Initiation is accomplished when the enzyme manages to move along the template and bring the next
region of the DNA into the active site. The occurrence of a cycle of abortive initiations before
the enzyme moves to the next phase is a general property of RNA Pol and appears to be required
for de novo strand synthesis.
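The statement that there is a certain probability of chain release after each base is added can be turned into a rough estimate of how often abortive cycles occur. The sketch below assumes, purely for illustration, that this probability is the same and independent at every step; the text gives no value for it, so the number used is hypothetical.

```python
def escape_probability(release_prob_per_step, steps=9):
    """Probability that a chain reaches `steps` nucleotides without release."""
    return (1 - release_prob_per_step) ** steps


def expected_abortive_cycles(release_prob_per_step, steps=9):
    """Mean number of aborted attempts before one productive attempt.

    Attempts are treated as independent trials, so the count of failures
    before the first success follows a geometric distribution.
    """
    p = escape_probability(release_prob_per_step, steps)
    return (1 - p) / p


# Hypothetical release probability of 0.2 per added nucleotide.
print(round(escape_probability(0.2), 3))
print(round(expected_abortive_cycles(0.2), 1))
```

Even a modest per-step release probability makes several abortive cycles per productive initiation likely, which fits the description of initiation as the rate-limiting step.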
(iv) Formation of ternary complex (stable) and Promoter clearance: Once an RNA Pol
holoenzyme succeeds in synthesizing a nascent RNA chain of ~9-10 bases, i.e. when initiation
succeeds, σ is no longer necessary. The enzyme makes the transition to the elongation ternary
complex of core polymerase, DNA and nascent RNA. This involves a conformational change in
polymerase that helps it to grip the template more firmly, converting the ternary complex to the
elongation form. This conformational change is followed by movement of the RNA Pol away
from the promoter site, without dissociating, thereby freeing the promoter (i.e. promoter
clearance) for further initiation events. Thus, promoter clearance occurs only if the open complex
is stable (stable ternary complex) and usually follows a number of abortive initiations. This
signifies the end of the initiation phase and the transition to the elongation phase leading to the
extension of RNA chain beyond 10 bases. The efficiency of promoter clearance is modulated by
the nature of the first fifty or so bases in the transcribed region. The minimum value of the
promoter clearance time (i.e. the time taken by the RNA Pol to leave the promoter so that
another RNA Pol can initiate) is 1-2 sec, within which the RNA Pol establishes the maximum
frequency of initiation as <1 event per sec.
(B) Elongation
When the first ~9 nucleotides have been added, the transcribed template strand is scrunched in
the active site. The active site can hold a transcript of 6-9 nucleotides. The transcription bubble
moves along DNA and the RNA chain is extended in the 5′→3′ direction (Fig. 9).
As the RNA Pol holoenzyme clears the initiation site and enters the elongation phase of
transcription, the σ subunit may either dissociate or remain associated with the core enzyme. It
was originally concluded that the σ factor is released after initiation. However, this may not be
strictly true. Direct measurements of elongating RNA Pol complexes show that ~70% of them retain
the σ factor. Since about a third of elongating polymerases lack σ, the original conclusion that it
is not necessary for elongation is certainly correct. In those cases where it remains associated
with the core enzyme, the nature of the association has almost certainly changed.
The core enzyme without σ binds more strongly to the DNA template. From this point onwards,
the core enzyme undertakes RNA chain elongation beyond 10 bases. The core enzyme then
moves along the template strand, opening (or unwinding) the DNA helix ahead of the site of
polymerization (i.e. front or leading edge) so as to expose a new segment of the template in
single stranded condition. During this time, subsequent ribonucleotides are added to the 3′ end of
the growing RNA chain. Elongation involves the movement of the transcription bubble (a
distance of 170 Å / second, corresponding to a rate of elongation of ~50 nucleotides / sec) by a
disruption of DNA structure, in which the template strand of the transiently unwound region is
paired with the nascent RNA at the growing point. As in the initiation phase, about 17 bp of
DNA are unwound at a time throughout the elongation phase. It has been found that the RNA-DNA
hybrid and the unwound region of DNA stay rather constant as RNA Pol moves along the
DNA template, thereby indicating that the unwound DNA reseals (or rewinds) at the same rate
behind (i.e. rear or trailing edge) the RNA Pol. The RNA-DNA hybrid must also rotate each time
a nucleotide is added so that the 3′-OH end of the RNA stays at the catalytic site. When the RNA
chain extends to 15-20 bases, the enzyme makes a further transition to form the complex that
undertakes elongation and now it covers 30-40 bp (depending on the stage in elongation cycle).
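The figures quoted for the transcription bubble are mutually consistent, as a short calculation shows. The sketch assumes standard B-DNA geometry (3.4 Å rise per base pair and 10.5 bp per turn), values that are not stated in this section; the 1500-nucleotide gene length in the usage line is hypothetical.

```python
RISE_PER_BP_ANGSTROM = 3.4  # assumed axial rise per base pair of B-DNA
BP_PER_TURN = 10.5          # assumed helical repeat of B-DNA


def elongation_rate_nt_per_s(speed_angstrom_per_s):
    """Convert bubble movement along the DNA axis into nucleotides per second."""
    return speed_angstrom_per_s / RISE_PER_BP_ANGSTROM


def helical_turns(bp):
    """Number of B-DNA helical turns spanned by a stretch of base pairs."""
    return bp / BP_PER_TURN


def transcription_time_s(gene_length_nt, rate_nt_per_s):
    """Time to elongate through a gene at a constant rate, ignoring pauses."""
    return gene_length_nt / rate_nt_per_s


rate = elongation_rate_nt_per_s(170.0)          # bubble speed from the text
print(round(rate))                               # nucleotides per second
print(round(helical_turns(17), 1))               # turns opened in a 17 bp bubble
print(round(transcription_time_s(1500, rate)))   # seconds for a 1500 nt gene
```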
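[Figure: RNA polymerase moving along double helical DNA with ~17 bp unwound. The coding strand
and template strand are separated; the nascent RNA (5′ ppp end) forms an RNA-DNA hybrid with
the template strand up to the 3′ elongation site, with unwinding ahead of the polymerase and
rewinding behind it.]
Fig. 9: Transcription bubble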
(C) Termination
Termination involves following steps:
• Cessation of formation of phosphodiester bonds
• Dissociation of RNA-DNA hybrid
• Rewinding of melted region of DNA
• Release of RNA Pol from DNA
Sequences called terminators trigger the elongating polymerase to dissociate from the DNA
and release the RNA chain it has made. E. coli has at least two classes of termination signals: one
class relies on a protein factor called ρ (rho) and the other is ρ-independent. Both ρ-dependent
and ρ-independent terminators respond to a functioning signal that lies within the newly
synthesized RNA rather than in the template DNA. In both types of termination, pausing by RNA
Pol is important in order to allow time for the actual termination event to occur.
(i) ρ-independent (intrinsic) termination: Many terminators require a hairpin to form in the
secondary structure of the RNA being transcribed. This indicates that termination depends on the
RNA product and is not determined simply by scrutiny of DNA sequence during transcription.
ρ-independent terminators (intrinsic terminators) have two structural features:
• A hairpin in secondary structure
The first feature is a region that produces an RNA transcript with self-complementary
sequences, permitting the formation of a hairpin structure centered 15-20 nucleotides before
the projected end of the RNA strand. Formation of the hairpin structure in the RNA disrupts
several AU base pairs in the RNA-DNA hybrid segment. It also pauses RNA Pol
immediately after it has synthesized a stretch of RNA that folds into a hairpin and disrupts
important interactions between RNA and the RNA Pol, thereby facilitating dissociation of
the transcript. The hairpin usually contains a GC rich region near the base of the stem. The typical
distance between the hairpin and the U rich region is 7-9 bases. There are ~1100 sequences in the
E. coli genome that fit these criteria, suggesting that about half of the genes have an intrinsic
terminator.
• A region that is rich in U residues at the very end of the unit
The hairpin only works as an efficient terminator when it is followed by a stretch of four or
more AU base pairs. This is because under those circumstances, at the time the
hairpin forms, the growing RNA chain will be held on the template at the active site by only
AU base pairs. AU base pairs, the weakest of all base pairs (weaker even than AT base
pairs), are more easily disrupted by the effects of the stem loop on the transcribing
polymerase and so the RNA will more readily dissociate (Fig. 10).
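The two structural features of an intrinsic terminator lend themselves to a simple pattern search. The following is a deliberately crude sketch, not a real terminator predictor: it requires a perfectly complementary stem of fixed length, ignores G·U pairing and folding energetics, and the function name, default parameters and demonstration sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (unknown bases become N)."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(rna))


def find_intrinsic_terminators(rna, stem=6, loop_range=(3, 8), max_gap=9, min_u=4):
    """Return (stem_start, loop_length, u_run_start) for hairpin + U-run candidates.

    A candidate is a perfect inverted repeat of `stem` bases separated by a
    loop, followed within `max_gap` bases by at least `min_u` consecutive U.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - 2 * stem - loop_range[0] + 1):
        left = rna[i:i + stem]
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop                 # start of the 3' arm of the stem
            right = rna[j:j + stem]
            if len(right) < stem or right != reverse_complement(left):
                continue
            tail = rna[j + stem:j + stem + max_gap + min_u]
            k = tail.find("U" * min_u)          # first U run after the hairpin
            if k != -1:
                hits.append((i, loop, j + stem + k))
    return hits


# Hypothetical transcripts: a GC rich hairpin with and without a U run.
with_u_run = "CC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUUUU" + "AC"
without_u_run = "CC" + "GCCGCC" + "GAAA" + "GGCGGC" + "ACGACGACG"
print(find_intrinsic_terminators(with_u_run))
print(find_intrinsic_terminators(without_u_run))
```

The second sequence contains the same hairpin but is not reported, reflecting the point that the hairpin is an efficient terminator only when the U rich stretch follows it.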
[Figure: mRNA emerging from the template strand with a GC rich region followed by a run of U
residues (UUUUUUU). Formation of a hairpin in the GC rich region releases the mRNA, and the
coding and template strands re-form double stranded DNA.]
Fig. 10: Rho (ρ) independent (intrinsic) termination
(ii) ρ-dependent termination: As already discussed, RNA Pol needs no help to terminate
transcription at a hairpin followed by several U residues. At other sites, however, termination
requires the participation of an additional factor. This discovery was prompted by the observation
that some RNA molecules synthesized in vitro by RNA Pol acting alone are longer than those
made in vivo. The missing factor, a protein that caused the correct termination, was isolated and
named rho (ρ), also called rho transcription terminator factor. Additional information about the
action of ρ was obtained by adding this termination factor to an incubation mixture at
various times after the initiation of RNA synthesis. RNAs with sedimentation coefficients of
10S, 13S and 17S were obtained when ρ was added at initiation, 30 seconds after initiation
and 2 minutes after initiation, respectively. If no ρ was added, transcription yielded a 23S RNA
product. It is evident that the template contains at least three termination sites that respond to ρ
(yielding 10S, 13S and 17S RNA) and one termination site that does not (yielding 23S RNA).
Thus, specific termination at a site producing 23S RNA can occur in the absence of ρ.
However, ρ detects additional termination signals that are not recognized by RNA Pol alone
(Fig. 11).
[Figure: DNA template showing the initiation site, the ρ (Rho) sites (indicated by arrows) and the
site of termination in the absence of ρ. RNA transcripts: no Rho (23S species); Rho present at
start of synthesis (10S species); Rho added 30 sec later (13S species); Rho added 2 min later
(17S species).]
Fig. 11: Effect of Rho protein on the size of the transcript
The ρ-dependent terminators lack the sequence of repeated A residues in the template strand
but usually include a CA rich sequence called a rut (rho utilization) element. Optimally these
sites consist of stretches of about 40 nucleotides that do not fold into a secondary structure, i.e.
they remain largely single stranded. They are also C rich. The second level of specificity is that
ρ fails to bind any transcript that is being translated, i.e. a transcript bound to ribosomes. In
bacteria, transcription and translation are tightly coupled: translation initiates on growing RNA
transcripts as soon as they start exiting the polymerase, while they are still being synthesized.
Thus, ρ typically terminates only those transcripts still being transcribed beyond the end of a
gene or operon.
ρ is a homo-hexameric terminator protein with a size of ~275 kD (each subunit is 419
residues long). The X-ray structure of the ρ protein reveals that the six monomers form an open
ring. The ring is not flat; the sixth subunit is displaced along the ring axis relative to the first.
The first and sixth subunits are separated by a gap of 12 Å and the helical pitch (rise along the
helix axis) between them is 45 Å. The RNA transcript on which ρ acts is believed to bind along the
bottom of each subunit and then thread through the middle of the ring. Each subunit consists of two
domains that can be separated by proteolysis: its N-terminal domain or RNA binding domain
binds single stranded polynucleotides and its C-terminal domain or ATP-hydrolysis domain,
which is homologous to the α and β subunits of the F1-ATPase, binds an NTP. ρ hydrolyzes
ATP in the presence of single stranded RNA, probably through recognition of a specific
structural feature rather than a consensus sequence. The RNA, which is only partially visible in
the structure, binds to the so-called primary RNA binding sites on the N-terminal domains that
face the interior of the helix and to the so-called secondary RNA binding sites on the C-terminal
domains that have been implicated in mRNA translocation and unwinding.
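The quoted size of the hexamer can be checked against the subunit length with a back-of-the-envelope estimate. The average residue mass of 110 Da used below is a common rule of thumb and an assumption here, not a value taken from this text; the function name is hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average mass per amino acid residue


def oligomer_mass_kd(residues_per_subunit, n_subunits,
                     residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Approximate mass in kD of a homo-oligomer from its subunit length."""
    return residues_per_subunit * n_subunits * residue_mass_da / 1000


# Six subunits of 419 residues each, as stated for the hexamer.
print(round(oligomer_mass_kd(419, 6)))
```

The estimate lands close to the ~275 kD stated above, so the subunit length and the hexamer mass agree with each other.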
The ρ protein has an ATP-dependent RNA-DNA helicase activity. It binds to nascent RNA at
specific binding sites or recognition sequences (Fig. 12). It then uses its RNA-dependent ATPase
activity to provide the energy to translocate along the RNA in the 5′→3′ direction to a
sequence that is rich in C and poor in G residues preceding the actual termination site. C is by far
the most common base (41%) and G is the least common base (14%). As a general rule, the
efficiency of ρ-dependent terminators increases with the length of the C-rich or G-poor region.
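The compositional bias of these sites (rich in C, poor in G, over stretches of about 40 nucleotides) suggests a simple sliding-window screen. This is a sketch only: the thresholds of 35% C and 15% G are hypothetical cut-offs loosely based on the 41% and 14% figures above, and the screen does not test whether the region stays single stranded.

```python
def c_rich_windows(rna, window=40, min_c=0.35, max_g=0.15):
    """Return (start, C_fraction, G_fraction) for C-rich, G-poor windows.

    Thresholds are illustrative assumptions, not established criteria.
    """
    rna = rna.upper()
    hits = []
    for start in range(len(rna) - window + 1):
        segment = rna[start:start + window]
        c_frac = segment.count("C") / window
        g_frac = segment.count("G") / window
        if c_frac >= min_c and g_frac <= max_g:
            hits.append((start, round(c_frac, 2), round(g_frac, 2)))
    return hits


# Hypothetical 40-nucleotide transcripts.
c_rich = "CACUCACUCC" * 4
g_rich = "GGGAGGGAGG" * 4
print(c_rich_windows(c_rich))
print(c_rich_windows(g_rich))
```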
[Figure: Rho protein bound to the nascent RNA (5′ ppp end) behind RNA polymerase, hydrolyzing
ATP + H2O to ADP + Pi as it moves along the RNA toward the RNA-DNA hybrid; the coding strand
and template strand are labeled.]
Fig. 12: Mechanism of Rho dependent termination
Proteins, in addition to ρ, mediate and modulate termination. For example, the NusA protein
enables RNA Pol in E. coli to recognize a characteristic class of termination sites. In E. coli,
specialized termination signals called attenuators are regulated to meet the nutritional needs of
the cell.
Transcription in eukaryotes
Robert Roeder and William Rutter discovered that eukaryotic transcription machinery is much
more complex as compared to that of prokaryotes, as large number of polypeptides are
associated with the eukaryotic transcription machinery. The mechanism of eukaryotic
transcription is, however, similar to that in prokaryotes.
Unlike in bacteria, eukaryotic genome is packaged into the chromatin structure (nucleosomal
structure) and therefore is inaccessible to the transcription machinery. Prior to transcription of a
specific gene, its chromatin structure is modified to become more accessible to the transcription
apparatus. The two most well understood mechanisms of chromatin modifications are:
(i) Specific modifying complexes: Many eukaryotic gene activator proteins modify chromatin
structures by recruiting histone acetyltransferases.
(ii) Nucleosome remodeling by chromatin remodeling complexes.
Acetylation and remodeling prepare the gene promoter for the assembly of RNA Pol, other
accessory proteins and gene specific transcription factors, which together initiate the
transcription process.
In transcription, only some regions of the genome are transcribed and the regions chosen vary in
different cells or in the same cell at different times i.e. one to several thousand transcripts can be
made of a given region in a single cell.
Eukaryotic transcription apparatus
Eukaryotic transcription machinery involves three RNA polymerases, a number of general
transcription factors, several elongation factors and a large repertoire of gene specific
transcription factors and activators. Furthermore, the entire transcription machinery is coupled
with an enormously complex signal transduction cascade that integrates the external stimuli with
the transcription machinery.
(1) RNA polymerase or DNA dependent RNA polymerase (RNA Pol)

Eukaryotic cells have three kinds of nuclear RNA polymerases, RNA Pol I, II and III. These are
distinct complexes but have certain subunits in common. Each RNA Pol is large and has 12 or
more different subunits. In S. cerevisiae, RNA Pol I, II and III have 14, 12 and 17 subunits
respectively. While some of these subunits are exclusive for one RNA Pol, others are either
identical or structurally related. Each polymerase has a specific function and is recruited to a
specific promoter sequence. They differ in their template specificity and location in the nucleus.
Although all eukaryotic RNA Pols are homologous to one another and to prokaryotic RNA Pol,
RNA Pol II contains a unique carboxyl terminal domain called the tail. Another major distinction
among the polymerases lies in their responses to the fungal toxin α-amanitin, a cyclic
octapeptide that contains several modified amino acids. The activities of different RNA Pols are
distinguished by their different sensitivities to the toxin. Properties of different eukaryotic RNA
polymerases have been summarized in Table 4.
In addition to these three different nuclear RNA Pols, eukaryotic cells contain separate
polymerases in mitochondria and chloroplast. These small (~100 kD) single subunit RNA Pols,
which resemble those encoded by certain bacteriophages are much simpler than the nuclear RNA
Pols, although they catalyze the same reaction.
Table 4: Properties of different eukaryotic RNA polymerases
S. No. | Properties | RNA Pol I | RNA Pol II | RNA Pol III
1 | Location | Nucleoli | Nucleoplasm | Nucleoplasm
2 | Function (cellular transcripts) | Synthesis of precursors of most rRNA (5.8S, 18S and 28S) | Synthesis of precursors of mRNA and some small nuclear RNAs (snRNAs) | Synthesis of precursors of 5S rRNA and tRNA and small nuclear and cytosolic RNAs
3 | Sensitivity to α-amanitin fungal toxin (cyclic octapeptide) | Insensitive | Very sensitive (strongly inhibited); binds tightly and inhibits elongation phase | Moderately sensitive (inhibited by high concentrations)
4 | Number of subunits | 14 | 12 | 17
5 | Polymerase activity / cell | 50-70% | 20-40% | 10%
6 | Class of genes transcribed | Class I | Class II | Class III
(A) RNA Pol I (RNA Pol A)

RNA Pol I (Pol I or Pol A) is located in the nucleoli. It is responsible for continuous synthesis of
rRNA during interphase. The continuous transcription of multiple gene copies of the RNAs is
essential for sufficient production of the processed rRNAs, which are packaged into ribosomes.
Human cells contain 5 clusters of around 40 copies of rRNA gene situated on different
chromosomes. Each rRNA cluster is known as a nucleolar organizer region, since the nucleolus
contains large loops of DNA corresponding to the gene clusters. After a cell emerges from
mitosis, rRNA synthesis restarts and tiny nucleoli appear at the chromosomal locations of the
rRNA genes. Each rRNA gene produces a 45S rRNA transcript called the pretranscript,
preribosomal RNA or pre-rRNA, which is ~13000 nucleotides long. During active rRNA
synthesis, the pre-rRNA transcripts are packed along the rRNA genes and may be visualized in
the electron microscope as Christmas tree structures. In these structures, the RNA transcripts are
densely packed along the DNA and stick out perpendicularly from the DNA. The 45S
pretranscript is cleaved to give one copy each of 28S, 18S and 5.8S rRNAs, which are 5000, 2000
and 160 nucleotides long respectively.
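The lengths given above imply that a large part of the 45S pretranscript does not end up in mature rRNA. The arithmetic is sketched below using only the approximate lengths quoted in the text; the variable and function names are hypothetical.

```python
PRE_RRNA_NT = 13000  # approximate length of the 45S pre-rRNA
MATURE_RRNA_NT = {"28S": 5000, "18S": 2000, "5.8S": 160}  # approximate lengths


def discarded_spacer(pre_len=PRE_RRNA_NT, mature=MATURE_RRNA_NT):
    """Return (nucleotides removed, fraction retained) on processing."""
    retained = sum(mature.values())
    return pre_len - retained, retained / pre_len


removed, fraction_retained = discarded_spacer()
print(removed)
print(round(fraction_retained, 2))
```

On these figures only a little over half of each pretranscript is retained in the three mature rRNAs, and the remainder is removed during processing.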
(B) RNA Pol II (RNA Pol B)

Among the three RNA polymerases, RNA Pol II (Pol II or Pol B) is functionally the most versatile,
as it transcribes the mRNAs and some specialized RNAs such as most of the small nuclear RNAs
(snRNAs). RNA Pol II is central to eukaryotic gene expression and has been studied extensively.
RNA Pol II is located in the nucleoplasm. This enzyme can recognize thousands of promoters
that vary greatly in sequence. Although RNA Pol II is strikingly more complex than its
bacterial counterpart, the complexity masks a remarkable conservation of structure, function and
mechanism.
(i) Structure: RNA Pol II is somewhat larger than, and has several subunits that have no
counterpart in, Thermus aquaticus (Taq) / bacterial RNA Pol. Pol II is a huge enzyme with a molecular
mass of up to 600 kD. The enzyme contains two nonidentical large (>120 kD) subunits
comprising ~65% of its mass that are homologs of the prokaryotic RNA Pol β and β' subunits
and up to 12 additional small (<50 kD) subunits, two of which are homologs of the prokaryotic
RNA Pol α subunits and one of which is a homolog of the prokaryotic RNA Pol ω subunit. Of these
small subunits, five are identical in all three eukaryotic RNA Pols and two others (the RNA Pol
α homologs) are identical in RNA Pol I and III. Thus, 10 of the 12 RNA Pol II subunits are
either identical or closely similar to subunits of RNA Pol I and III. Moreover, the sequences of
these subunits are highly conserved (~50% identical) across species from yeast to humans (and
to a lesser extent between eukaryotes and bacteria). In fact, in all ten cases tested, a human RNA
Pol II subunit could replace its counterpart in yeast without loss of cell viability.
Roger Kornberg determined the X-ray crystallographic structure of yeast RNA Pol II. Overall,
the shape of the yeast RNA Pol II enzyme resembles a crab claw, similar to that of bacterial Taq
RNA Pol. The subunits of the yeast enzyme have positions and core folds similar to those of their
homologous subunits in bacterial RNA Pol. The two pincers of the crab claw (RNA Pol II) are made up predominantly
of RPB1 and RPB2. The active site, which is made up of regions from both these subunits, is
found at the base of the pincers within a region called the active center cleft. A highly
conserved helical segment of RPB1 called the bridge spans the two pincers that form the enzyme's
cleft. This helix is straight in all X-ray structures of RNA Pol II yet determined, but it is bent in
that of Taq RNA Pol. A massive (~59 kD) portion of RPB1 and RPB2 named the clamp swings
down over the DNA to trap it in the cleft. A portion of RPB2 called the wall directs the
template strand out of the cleft in a ~90° turn. A loop called the rudder extends from the clamp.
There are various channels that allow DNA, RNA and ribonucleotides into and out of the
enzyme's active center cleft.
Various subunits of RNA Pol II are summarized below (Table 5).
(a) RPB1 having C-terminal Domain (CTD) and RPB2: RPB1 is the largest subunit and
exhibits a high degree of homology to the β' subunit of bacterial RNA Pol. It contains the
active site of the enzyme RNA Pol II.
It has an unusual feature, a long carboxyl terminal domain (CTD) called the tail. The tail consists
of many highly conserved repeats of the heptad amino acid sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser
(YSPTSPS). There are 27 repeats in the yeast enzyme (18 exactly matching the consensus),
52 (21 exact) in the mouse enzyme and 53 in the human enzyme. The CTD is separated from the
main body of the enzyme by an unstructured linker sequence. These repeats are essential for
viability. The CTD can be phosphorylated at Ser and Tyr. Five of the 7
residues in these particularly hydrophilic repeats bear OH groups and at least 50 of them,
predominantly those on Ser residues, are subject to reversible phosphorylation by CTD kinases
and CTD phosphatases. In vitro studies have shown that RNA Pol II initiates transcription only
when the CTD is unphosphorylated. Phosphorylation of the CTD occurs during transcription
elongation as RNA Pol leaves the promoter. Charge-charge repulsions between nearby phosphate
groups probably cause a highly phosphorylated CTD to project as far as 500 Å from the globular
portion of RNA Pol II. The phosphorylated CTD provides the binding sites for numerous
auxiliary factors that have essential roles in the transcription process. The CTD has been shown
to be an important target for differential activation of transcription elongation. Such a
tail is absent in the bacterial enzyme.
RPB2 is structurally similar to the bacterial β subunit.
(b) RPB3 and RPB11: These two subunits show some structural homology to the bacterial α
subunits.
(c) RPB4 and RPB7: Genetic studies have demonstrated that some of the Pol II specific
subunits are dispensable. Thus, two subunits, RPB4 and RPB7, are not essential for activity and
are present in RNA Pol II in less than stoichiometric amounts. RPB7 has a 102-residue segment
that is 30% identical to a portion of σ70 of E. coli. These subunits are absent in yeast
(Saccharomyces cerevisiae) RNA Pol II.
(d) RPB6: RPB6 is homologous to the ω subunit of bacterial RNA Pol.
Although Pol II has the smallest number of subunits, it transcribes the largest and most diverse
array of promoters. A number of other proteins, which are not part of the Pol II complex, are
used by RNA Pol II as subsidiary proteins, thereby contributing to its functional diversity.
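The statement that five of the seven residues of each CTD heptad bear OH groups can be checked directly from the consensus sequence. The sketch below is a minimal Python illustration; the short CTD fragment in it is hypothetical (invented for the example, not a real RPB1 sequence), and the function names are assumptions made here.

```python
HEPTAD = "YSPTSPS"              # consensus repeat Tyr-Ser-Pro-Thr-Ser-Pro-Ser
HYDROXYL_RESIDUES = set("STY")  # Ser, Thr and Tyr carry side-chain OH groups


def hydroxyl_sites_per_repeat(heptad=HEPTAD):
    """Number of OH-bearing residues in one heptad repeat."""
    return sum(1 for residue in heptad if residue in HYDROXYL_RESIDUES)


def count_exact_repeats(ctd_fragment, heptad=HEPTAD):
    """Count in-frame 7-residue blocks that match the consensus exactly."""
    blocks = [ctd_fragment[i:i + 7] for i in range(0, len(ctd_fragment), 7)]
    return sum(1 for block in blocks if block == heptad)


# Hypothetical fragment: four exact repeats and one degenerate repeat
fragment = "YSPTSPS" * 3 + "YSPTSPA" + "YSPTSPS"

sites = hydroxyl_sites_per_repeat()
exact = count_exact_repeats(fragment)
print(f"OH-bearing residues per heptad: {sites}")
print(f"Exact consensus repeats in fragment: {exact} of {len(fragment) // 7}")
print(f"Potential OH sites in 52 exact repeats: {sites * 52}")
```

The count of five OH-bearing residues per repeat agrees with the text; the total for 52 repeats is an upper bound that assumes every repeat matches the consensus, which the text says is not the case.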
(ii) Nucleotide addition and RNA Pol II translocation: RNA Pol II binds two Mg++ ions at
its active site in the vicinity of 5 conserved acidic residues, which suggests that RNA Pol
catalyzes RNA elongation via a two-metal ion catalytic mechanism for nucleotide addition similar
to that proposed for all types of polymerases. As is the case with Taq RNA Pol, the surface of
RNA Pol II is almost entirely negatively charged except for the DNA binding cleft and the
region about the active site, which are positively charged.
(C) RNA Pol III (also called as RNA Pol C)

RNA Pol III occurs in the nucleoplasm and synthesizes the precursors of 5S rRNA, tRNA,
U6 snRNA and a variety of other small nuclear and cytosolic RNAs. It has 16 or more subunits.
RNA Pol III transcribes the 5S rRNA component of the large ribosomal subunit. This is the only
rRNA to be transcribed separately. Like the other rRNA genes, which are transcribed by
RNA Pol I, the 5S rRNA genes are tandemly arranged in a gene cluster. In humans, there is a
single cluster of around 2000 genes. Less is known about the signals and ancillary factors involved
in termination for eukaryotic polymerases. Each class of polymerase uses a different mechanism.
Genetic studies have demonstrated that in contrast to Pol I and Pol II, all subunits of Pol III are
essential. Table 5 summarizes various prokaryotic and eukaryotic RNA polymerase subunits.
Table 5: Comparison of prokaryotic and eukaryotic subunits of RNA polymerases
S. No. | Prokaryotic: Bacterial | Prokaryotic: Archaeal | Eukaryotic: RNA Pol I | Eukaryotic: RNA Pol II | Eukaryotic: RNA Pol III
1 | β' | A' / A'' | RPA1 | RPB1 | RPC1
2 | β | B | RPA2 | RPB2 | RPC2
3 | α | D | RPC5 | RPB3 | RPC5
4 | α | L | RPC9 | RPB11 | RPC9
5 | ω | K | RPB6 | RPB6 | RPB6
  |   | [+6 others] | [+9 others] | [+7 others] | [+12 others]
Note: The subunits in each column are listed in order of decreasing molecular weight.
(2) Eukaryotic promoters

Unlike bacterial promoters, which have relatively simple structures, eukaryotic promoters are
highly complex in nature. The various promoters are described in the following sections:
(A) RNA Pol I promoter

Since the numerous rRNA genes in a given eukaryotic cell have essentially identical sequences,
its RNA Pol I recognizes only one promoter. Yet, in contrast to the case for RNA Pol II and III,
RNA Pol I promoters are species-specific, i.e., an RNA Pol I only recognizes its own promoter and those
of closely related species. Pol I promoters vary greatly in sequence from one species to another.
For example, mammalian RNA Pol I has a bipartite promoter consisting of two transcription control
regions:
(i) Core promoter element: It refers to the minimal set of sequence elements required for accurate
transcription initiation. It spans positions -31 to +6. It includes the transcription start site and hence
overlaps the transcribed region. It has a short conserved sequence element, a short AT rich
sequence around the start point called the initiator sequence (Inr). This sequence is essential for
transcription (Fig. 13).
(ii) Upstream control element (UCE) or Upstream promoter element (UPE): It is located
between positions -187 and -107, upstream of the start site (Fig. 13). The element is GC rich.
The UCEs are ~85% identical and ~50-80 bp long. The sequence is bound by specific
transcription factors, which then recruit RNA Pol I to the transcription start site. The UCE is thus
responsible for a 10- to 100-fold increase in the efficiency of transcription compared to that
from the core element alone.
Fig. 13: RNA Pol I promoter. The upstream control element (UCE; ~50-80 bp long, GC rich) lies about 100 bp upstream of the transcription start site (+1); the core promoter spans -31 to +6 and overlaps the start of the pre-rRNA gene.
(B) RNA Pol II Promoter

The promoters recognized by RNA Pol II are considerably longer, more complex and more diverse
than those of prokaryotic genes. Like the RNA Pol I promoter, the RNA Pol II promoter consists of a core promoter
and regulatory regions, which are described below (Fig. 14).
(i) Core promoter (Basal elements): The eukaryotic core promoter refers to the minimal
set of sequence elements required for accurate transcription initiation by the Pol II
machinery. A core promoter is ~40 nucleotides long, extending either upstream or
downstream of the transcription start site. Four elements found in Pol II core promoters
are the TATA box, BRE, Inr and DPE. Typically, a promoter includes only two or three of
these four elements. Many Pol II promoters have a few sequence features in common,
including a TATA box (eukaryotic consensus sequence TATAAA) near base pair -30 and
an Inr sequence (initiator) near the RNA start site at +1. However, some Pol II promoters
lack a TATA box or a consensus Inr element or both. The sequence elements summarized
here are more variable among the Pol II promoters of eukaryotes than among E. coli
promoters.
(a) TATA box or Hogness box: An A/T rich sequence, TATA(A/T)A(A/T), called the TATA
box is located 25 to 30 bp upstream of the transcription start site. The consensus
sequence (homologous segment, TATA box) is T(82) A(97) T(93) A(85) A(63)/T(37) A(83) A(50)/T(37),
where the numbers in parentheses indicate the % occurrence of the corresponding base. This TATA box
resembles the -10 region of prokaryotic promoters (TATAAT), although they differ in
their locations relative to the transcription start site (-27 vs -10). This conserved region
was first discovered by Goldberg and Hogness and is also called the Goldberg-Hogness (GH) box or Hogness box.
The TATA box is the major assembly point for the proteins of the preinitiation
complexes of Pol II. The deletion of the TATA box does not necessarily eliminate
transcription; rather it generates heterogeneities in the transcriptional start site,
thereby indicating that the TATA box participates in selecting this site.
(b) TFIIB recognition element (BRE): Immediately upstream of the TATA box is the
TFIIB recognition element, which is targeted by TFIIB. The consensus sequence is:
(G/C)(G/C)(G/A)CGCCC.
(c) Initiator sequence (Inr): The initiator element (Inr) is located around the
transcription start site (+1). The consensus sequence of Inr is:
(C/T)(C/T)AN(T/A)(C/T)(C/T). Many initiator elements have a C at position -1 and an A at
+1. The DNA is unwound at the initiator sequence and the transcription start site is
usually within or very near this sequence.
(d) Downstream promoter element (DPE): Further downstream, within the transcribed
region, is the downstream promoter element, which has the consensus sequence:
(A/G)G(A/T)CGTG.
Element | Position | Binding site for | Consensus sequence
BRE | -37 to -31 | TFIIB | (G/C)(G/C)(G/A)CGCCC
TATA | -30 to -26 | TBP of TFIID | TATA(A/T)A(A/T)
Inr | -2 to +4 | TFIID | YYAN(T/A)YY
DPE | +28 to +32 | TFIID | (A/G)G(A/T)CGTG
where N represents any nucleotide and Y is a pyrimidine nucleotide
Fig. 14: RNA Pol II promoter
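The degenerate consensus sequences of the core promoter elements can be turned into simple pattern searches. The following Python sketch is only an illustration under stated assumptions: the consensus strings are those given in the text (written with the one-letter codes R = A/G, Y = C/T, W = A/T, N = any base), the test sequence is hypothetical, and the scan looks at sequence only, ignoring the position of a match relative to +1.

```python
import re

# One-letter degenerate codes used to write the consensus sequences
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]",
}

# Consensus sequences as given in the text
CORE_ELEMENTS = {
    "TATA": "TATAWAW",  # TATA(A/T)A(A/T)
    "Inr": "YYANWYY",   # (C/T)(C/T)AN(T/A)(C/T)(C/T)
    "DPE": "RGWCGTG",   # (A/G)G(A/T)CGTG
}


def consensus_to_regex(consensus):
    """Build a lookahead regex so that overlapping matches are also found."""
    pattern = "".join(IUPAC[symbol] for symbol in consensus)
    return re.compile("(?=(" + pattern + "))")


def find_elements(sequence):
    """Return 0-based start indices of every match for each element."""
    sequence = sequence.upper()
    return {
        name: [m.start() for m in consensus_to_regex(cons).finditer(sequence)]
        for name, cons in CORE_ELEMENTS.items()
    }


# Hypothetical, shortened test sequence (not a real promoter)
test_seq = "CCGG" + "TATAAAA" + "GGGCGCGCGG" + "TCAGTCC" + "GGG" + "AGACGTG"
hits = find_elements(test_seq)
print(hits)
```

A real analysis would also have to check that each match lies at the expected distance from the start site, since short degenerate motifs of this kind occur by chance throughout a genome.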
Promoters contain different combinations of conserved elements. No element is
common to all the promoters. The elements found in any individual promoter differ in
number, location and orientation. Some eukaryotic genes contain an initiator element
instead of a TATA box. Other promoters have neither a TATA box nor an initiator
element. These genes are generally transcribed at low rates and initiation may occur
at different start sites over a length of up to 200 bp. These genes often contain a GC
rich 20-50 bp region within the first 100-200 bp upstream from start site (described
below).
(ii) Upstream regulatory elements (URE): The basal elements primarily determine the
location of the start point, but sponsor initiation only at a rather low level. Thus, the
basal elements are not sufficient for strong promoter activity. Additional elements called
upstream regulatory elements located between -40 and -200 bp (present on template
strand) upstream of transcription start site are important in order to increase the low
activity of basal promoters. These sequences are important in regulating Pol II promoters
and vary greatly in type and number. They serve as binding sites for a wide variety of
proteins that affect the activity of Pol II. These elements are found in many genes, which
vary widely in their levels of expression in different tissues. The examples are:
(a) GC box: The structural genes expressed in all tissues, e.g. housekeeping genes or
constitutive genes (genes that are continuously expressed rather than regulated), have
one or more copies of the sequence 5'-GGGCGG-3' located upstream from their
transcription start sites. They are located at about position -90; however, the positions
of these upstream sequences vary from one promoter to another. Often multiple
copies are present in the promoter and they occur in either orientation. The structural
genes that are selectively expressed in one or a few types of cells often lack these GC
rich sequences.
(b) CAAT box: The gene region extending between -50 and -110 also contains promoter
elements. They can occur in either orientation. For instance, many eukaryotic
structural genes, including those encoding the various globins, have a conserved
sequence of consensus 5'-GGNCAATCT-3' (the CAAT box) located between about
-70 and -90 whose alteration greatly reduces the transcription rate of the gene. Globin
genes have, in addition, a conserved CACCC box upstream from the CAAT box that
has also been implicated in transcriptional initiation.
The CAAT and GC boxes in eukaryotes differ from the similar regions in
prokaryotes. The positions of these upstream sequences vary from one promoter to
another, in contrast with the quite constant location of the -35 region in prokaryotes.
The CAAT box and the GC box can be effective when present on the template strand,
unlike the -35 region, which must be present on the coding strand. These differences
between prokaryotes and eukaryotes reflect fundamentally different mechanisms for
the recognition of cis acting elements. The -10 and -35 sequences in prokaryotic
promoters correspond to binding sites for RNA Pol and its associated σ factor. In
contrast, the TATA, CAAT, GC boxes and other cis acting elements in eukaryotic
promoters are recognized by proteins other than RNA Pol itself.
Although the promoter conveys directional information (transcription proceeds only
in the downstream direction), the GC and CAAT boxes seem to be able to function in
either orientation. They can function at distances that vary considerably from the start
point. This implies that the elements function solely as DNA binding sites to bring
transcription factors into the vicinity of the start point; the structure of a factor must
be flexible enough to allow it to make protein-protein contacts with the basal
apparatus irrespective of the way in which its DNA-binding domain is oriented and
its exact distance from the start point. GC and CAAT boxes thus play a strong role in
determining the efficiency of the promoter, but do not influence its specificity.
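Because GC and CAAT boxes can function in either orientation, a search for them has to consider the reverse complement of the motif as well as the motif itself. The Python sketch below illustrates this for the GC box sequence GGGCGG quoted above; the upstream sequence is hypothetical and the function names are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_either_orientation(sequence, motif):
    """List (0-based position, orientation) for each occurrence of the motif.

    'forward' means the motif reads 5'->3' on the given strand;
    'reverse' means it reads 5'->3' on the complementary strand.
    """
    sequence, motif = sequence.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if window == motif:
            hits.append((i, "forward"))
        elif window == rc_motif:
            hits.append((i, "reverse"))
    return hits


GC_BOX = "GGGCGG"
# Hypothetical upstream region with one GC box in each orientation
upstream = "ATTA" + "GGGCGG" + "TTAATT" + "CCGCCC" + "ATAT"
gc_hits = find_either_orientation(upstream, GC_BOX)
print(gc_hits)
```

Both hits would count as GC boxes in the sense used in the text, whereas an orientation-dependent element such as the prokaryotic -35 region would be recognized on only one strand.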
(C) RNA Pol III promoter
The promoters recognized by RNA Pol III are well characterized. Interestingly, some of the
sequences required for the regulated initiation of Pol III are located within the gene itself,
whereas others are in more conventional locations upstream of the RNA start site (Fig. 15).
(i) 5S rRNA genes: The genes for 5S rRNA are organized in a tandem cluster. The
promoters of genes transcribed by RNA Pol III can be located entirely within the
transcribed region (i.e. internal) of the gene. These sequences are therefore conserved
sequences in both 5S rRNA and DNA.
Donald Brown established this through the construction of a series of deletion mutants of
a Xenopus borealis 5S RNA gene. The 5S rRNA promoter contains the following
conserved sequences, which are depicted in Fig. 15.
(a) C box: It is located 81-99 bases downstream from the transcription start site.
(b) A box: It is located at around 50-65 bases downstream of the transcription start site.
The sequence of Box A is: 5'-TGGCNNAGTGG-3'.

Fig. 15: RNA Pol III promoter for 5S rRNA. Box A (conserved sequence TGGCNNAGTGG) begins at about +55 and Box C at about +81, both downstream of the transcription start site (+1).
(ii) tRNA genes: RNA Pol III promoters of tRNA genes contain two highly conserved
sequences within the DNA encoding the tRNA (internal transcription control regions),
namely Box A and Box B. These regions lie downstream from the transcription start site,
i.e. within the transcription unit (Fig. 16).
(a) Box A: It is located around 50-65 bases downstream of the transcription start site. The
sequence of Box A is: 5'-TGGCNNAGTGG-3'.
(b) Box B: It is located downstream of the transcription start site. The sequence of Box B is:
5'-GGTTCGANNCC-3'.
As both of these sequences lie within the gene, they are conserved in both tRNA and
DNA. Thus, these sequences also encode important sequences in the tRNA itself,
called the D-loop and the TψC loop.
Fig. 16: RNA Pol III promoter for tRNA. Box A (conserved sequence TGGCNNAGTGG, at about +55) and Box B (conserved sequence GGTTCGANNCC) both lie downstream of the transcription start site (+1).
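The internal location of Box A and Box B can be illustrated by searching a transcribed sequence for the two consensus sequences and reporting their coordinates relative to the start site. This is a minimal Python sketch; the gene sequence is hypothetical and deliberately shortened, so the coordinates it reports are not those of a real tRNA gene, and the function names are assumptions made here.

```python
import re

BOX_A = "TGGCNNAGTGG"
BOX_B = "GGTTCGANNCC"


def to_regex(consensus):
    """Convert a consensus with N (any base) into a regular expression."""
    return re.compile(consensus.replace("N", "[ACGT]"))


def locate_internal_boxes(gene, start_index=0):
    """Return the coordinate of the first Box A and Box B match.

    gene[start_index] is taken as position +1 (the transcription start
    site); only matches at or downstream of it are considered.
    """
    positions = {}
    for name, consensus in (("Box A", BOX_A), ("Box B", BOX_B)):
        match = to_regex(consensus).search(gene, start_index)
        positions[name] = None if match is None else match.start() - start_index + 1
    return positions


# Hypothetical transcribed region, starting at +1
gene = (
    "GCATTTG"
    + "TGGCTCAGTGG"                 # matches the Box A consensus
    + "TAGAGCGCTTGACTGCAGATCAAGA"
    + "GGTTCGAATCC"                 # matches the Box B consensus
    + "GCA"
)
boxes = locate_internal_boxes(gene)
print(boxes)
```

Both elements are found at positive coordinates, i.e. inside the transcription unit, which is the defining feature of this class of RNA Pol III promoter.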
(iii) Alternative RNA Pol III promoters: A number of RNA Pol III promoters are regulated
by upstream as well as downstream promoter sequences.
Further studies have shown, however, that the promoters of other RNA Pol III-
transcribed genes lie entirely upstream of their start sites. These upstream sites also bind
transcription factors that recruit RNA Pol III. These promoters require only upstream
sequences including the TATA box and other sequences found in RNA Pol II promoters.
Some promoters, such as those of the U6 small nuclear RNA (U6 snRNA) gene and small RNA genes
from the Epstein-Barr virus, use only regulatory sequences upstream from their
transcription start sites. The coding region of the U6 snRNA has a characteristic A box.
However, this sequence is not required for transcription. The U6 snRNA upstream
region contains sequences typical of RNA Pol II promoters, including a TATA box at
bases -30 to -23. These promoters also share several other upstream transcription factor
binding sequences with many U snRNA genes, which are transcribed by RNA Pol II. These
observations suggest that common transcription factors can regulate both RNA Pol II and
RNA Pol III genes.
(3) Enhancers
Promoters are not the only type of cis acting sequence. Transcription from many eukaryotic
promoters can be stimulated by control elements that are located many thousands of base pairs
away from the transcription start site. This was first observed in the genome of the DNA virus
SV40. A sequence of around 100 bp from SV40 DNA can significantly increase transcription
from a basal promoter even when it is placed far upstream or downstream. Such distal sequences
are called enhancers. The enhancer elements thus constitute the distal part of the promoter and
can be located either upstream or downstream of the transcription start site. Enhancers are
common in eukaryotes and rare in prokaryotes (exception: present with the σ54 factor).
Enhancers have the following general characteristics and functions:

• Enhancer sequences are short sequence elements. They are generally a few hundred base
pairs long (100-200 bp) and contain multiple sequence elements, which contribute to the
total activity of the enhancer. They consist of sets of elements similar to those of the upstream
promoter, but the elements are more densely packed, i.e. more compactly organized, than
in the upstream promoter.
• Like promoters, they are cis-acting regulatory elements.
• They are able to function over long distances of more than 1000 bp, whether from an
upstream or downstream position relative to the start site. They are therefore also called long-range
regulatory elements. In contrast, promoters are short-range elements.
• They can modulate (activate) transcription of the cognate genes when placed in either
orientation with respect to linked genes. They are active even when placed in reverse
orientation. They thus contain bidirectional elements and are orientation-independent (Fig.
17).
Fig. 17: Activation of transcription by enhancer is orientation and direction independent. An enhancer (E) activates transcription from the promoter (P) whether it lies upstream or downstream of it.
• Interestingly, the positions of enhancers relative to promoters are not fixed and they can
vary substantially. They can modulate (activate) transcription of the cognate genes even
when moved away from their original location either upstream or downstream of the coding
sequence. Thus, in natural genomes, enhancers can be located within genes also. They are
thus position-independent.
• Enhancers contain the same sequence elements that are found at promoters. The density of
sequence components is greater in the enhancer than in the promoter.
• They may be ubiquitous or tissue / cell type-specific. They may be active in only certain
cells. Enhancers play key roles in regulating gene expression in a specific tissue or
developmental stage.
• A given enhancer binds regulators at a given time and place. Alternative enhancers bind
different groups of regulators and control expression of the same gene at different times
and places in response to different signals.
• They exert strong activation of transcription of a linked gene from the correct start site.
They exert preferential stimulation of the closest of two tandem promoters. These DNA
sequences, although not promoters themselves, can enormously increase the effectiveness of
promoters.
• Enhancer sequences are targeted by a number of sequence-specific DNA binding proteins
called gene specific transcription factors and activators. The assembly or clustered group of
activators at the enhancer region is called enhancons. It is believed that enhancers can regulate
transcription of a specific gene from a distant location by bending or looping out of the
intervening DNA sequence (interstitial DNA between promoter and enhancer regions) so
that the transcription factors bound to the enhancer can directly interact with the RNA Pol II
machinery bound at the promoter and influence its action.
• Activation at a distance raises a problem. When an activator binds at an enhancer, there
may be several genes within its range, yet a given enhancer typically regulates only one
gene. Other regulatory sequences called insulators or boundary elements are found between
enhancers and some promoters. Insulators block activation of the promoter by activators
bound at the enhancer. These elements, although still poorly understood, ensure that activators
do not work indiscriminately.
• Elements analogous to enhancers in yeast are called Upstream Activator Sequences
(UASs). A UAS, however, works only upstream of the promoter and cannot function when
located downstream.
(4) Transcription factors

RNA Pol II requires an array of other proteins, called transcription factors, in order
to form the active transcription complex. In contrast to the somewhat smaller prokaryotic RNA Pol
holoenzymes, eukaryotic RNA Pols do not independently bind their target DNAs. Rather they
are recruited to their target promoters through the mediation of very large and complicated
complexes of transcription factors and their ancillary proteins.
Eukaryotic system requires two types of transcription factors:

(A) General transcription factors
(B) Gene Specific transcription factors
(A) General Transcription factors (GTFs)

These are a set of proteins that bind to RNA Pol II promoters and together initiate transcription.
They are collectively known as general transcription factors. These multisubunit factors are
named as transcription factors TFIIA, TFIIB and TFIIC etc. (TF stands for transcription factor
and II refers to RNA Pol II).
The general transcription factors collectively perform functions similar to those performed by
the σ factor in bacterial transcription. However, these factors do not show any significant sequence
homology to the σ factor. They have been shown to assemble on basal promoters in a specific order
and they may be subject to multiple levels of regulation. They help the polymerase to bind to the
promoter.
The binding of a transcription factor to its cognate DNA sequence enables the RNA Pol to locate
the proper initiation site. Such highly complex assembly of RNA Pols and associated proteins is
absent in prokaryotes. The binding of the TFs to the promoter leads to the melting of DNA
(comparable to the transition from closed to open complex in bacteria). They also help
polymerase escape from the promoter and embark on elongation phase.
The general transcription factors, TFIIs, required at every Pol II promoter are highly conserved
in all eukaryotes. The properties of various GTFs required by RNA Pols are summarized in
Table 6.
Table 6: Properties of the general transcription factors associated with RNA Pol II (yeast) promoters
S. No. | Transcription protein | Number of subunits | Subunit(s) Mr (D) | Properties / Function(s)
1 | TBP (TFIID) | 1 | 38000 | TBP (38 kD) is part of TFIID (700 kD); TFIID also contains TBP associated factors (TAFs); TBP has a saddle-like structure and its concave surface recognizes the TATA box in the minor groove; TBP is regulated by TAFII230, which binds to its concave surface, thereby preventing the binding of TBP to DNA.
2 | TFIIA | 3 | 12000, 19000, 35000 | Stabilizes binding of TFIIB and enhances transcription; Allows binding of TBP (as TFIID) to the promoter; Prevents binding of DR1 and DR2 inhibitors to TFIID; Removes inhibition of TBP by TAFII230
3 | TFIIB | 1 | 35000 | Binds to TBP; Interacts with DNA upstream of the TATA box in the major groove (at BRE) and downstream of the TATA box in the minor groove and allows asymmetric assembly of the complex, thereby allowing unidirectional transcription; Recruits the Pol II-TFIIF complex
4 | TFIIE | 4 | 34000, 57000 | Heterotetramer of two types of subunit; Recruits TFIIH; Has ATPase and helicase activities; Stimulates kinase activity of TFIIH
5 | TFIIF | 2 | 30000, 74000 | Binds tightly to Pol II; Binds to TFIIB and prevents binding of Pol II to nonspecific DNA sequences; Acts as elongation factor later
6 | TFIIH | 12 | 35000-89000 | Largest; Two subunits have ATPase activity; One subunit has protein kinase activity; Unwinds DNA at promoter (helicase activity); Phosphorylates Pol II (within the CTD); Recruits nucleotide excision repair proteins for DNA repair
7 | TFIIJ | Not characterized | Not characterized | Required for transcription (at least in vitro); Probably plays a role in promoter clearance and elongation
Many RNA Pol II promoters that do not contain a TATA box have an initiator element
overlapping their start site. It seems that at these promoters, TBP is recruited to the promoter by
a further DNA binding protein, which binds to the initiator element. TBP then recruits the other
transcription factors and RNA Pol in a manner similar to that which occurs at TATA box
promoters.
Similarly, the transcription factors TFI and TFIII are required to stimulate transcription by RNA
Pol I and III, respectively.
(B) Gene specific transcription factors

Although RNA Pol II and its associated factors (TFIIs) play a major role in initiation of
transcription of various mRNA encoding genes, the extent of their transcription is modulated by
another set of transcription factors called gene specific transcription factors. The term gene
specific transcription factors is used because the combination of such factors may actually
direct the transcription of one gene as opposed to others. Gene specific transcription factors play
a major role in tissue specific gene expression and in eliciting certain responses such as immune
response, apoptosis, cell differentiation etc. Gene specific transcription factors are characterized
by a DNA binding domain, which recognizes specific cis-regulatory sequences located in the
proximal and the distal regions of the promoter. Following binding to the cognate sequence, gene
specific transcription factors mediate their effect on RNA Pol II through another domain called the
transactivation domain. The transactivation domain communicates with the Pol II machinery through
a group of proteins called mediators or coactivators. Coactivators do not bind DNA directly but act as
bridging molecules between Pol II and the gene specific transcription factors. Many eukaryotic
gene specific transcription factors have been characterized to date and a few of them are listed
in Table 7.
Table 7: Gene specific transcription factors and their functions
S. No. | Name | Species | Function
1 | MyoD | Human, Mouse etc. | Skeletal muscle specific gene expression
2 | NF kappa B | Human, Mouse etc. | Immune response, cytokine gene expression
3 | Glucocorticoid receptors | Human, Mouse etc. | Activation of glucocorticoid responsive genes
One of the characteristics of the gene specific transcription factors is that they possess distinct
structural motifs essential for DNA recognition and transactivation function. They are often
classified on the basis of structural features such as the homeodomain, helix turn helix, helix
loop helix and Zn finger. Quite often two gene specific transcription factors belonging to the
same structural family dimerize and bind to the target sequence in a bipartite manner. One such
example is the transcription factor AP-1, which is a dimer of Jun (39 kD) and Fos (65 kD) proteins.
They belong to the leucine zipper family and target the sequence TGACTCA. Gene specific
transcription factors are often targeted by various signal-transducing kinases such as MAP
kinase, which phosphorylate them to induce their activities. Some gene specific transcription
factors are also localized in the cytoplasm in an inactive form and upon activation are
translocated to the nucleus for activity. For example, the transcription factor NF kappa B remains bound to
an inhibitory protein called I kappa B, which retains it in the cytoplasm. Upon receiving an
appropriate signal, I kappa B is ubiquitinated and degraded, resulting in the release of NF
kappa B, which is then translocated to the nucleus.
(5) Elongation factors

During elongation, the activity of the Pol II is greatly enhanced by proteins called
elongation factors. The transition from initiation to elongation phase involves the shedding of
most of the initiation factors and mediator. In their place another set of factors is recruited. This
exchange of initiation factors for those factors required for elongation and RNA processing
involves phosphorylation of the CTD of RNA Pol II. Properties of various elongation factors are
summarized in Table 8.
(6) Overall process of eukaryotic transcription

(A) Synthesis of precursor of mRNA by RNA Pol II
The process of transcription by Pol II can be described in terms of several phases - assembly and
initiation, elongation and termination - each associated with characteristic proteins (Fig. 18).
(i) Assembly and Initiation: Eukaryotic transcription involves the assembly of RNA Pol II
and transcription factors at a promoter. The step-by-step pathway described below leads to active
transcription in vitro. In the cell, many of the proteins may be present in larger, preassembled
complexes, simplifying the pathways for assembly on promoters. The initiation phase of
transcription in eukaryotes differs from that in prokaryotes in two major respects: first, melting
requires ATP hydrolysis and, second, promoter escape occurs after phosphorylation of the
polymerase.
The formation of preinitiation complex or basal transcription apparatus thus involves following
steps:
$ Binding of TBP: In the first step, TBP, a component of the TFIID transcription factor, binds
the TATA box 10^5 times as tightly as it binds noncognate sequences. Both the DPE
and initiator sequences are also targeted by TFIID. TBP bound to the TATA box is the center
point of the initiation complex. This binding induces large conformational changes in the
bound DNA. When TBP binds to the TATA box, it distorts the DNA using a β-sheet inserted
into the minor groove. This distortion generates a binding site for TFIIB, which in turn
provides a platform for the recruitment of Pol II and TFIIF. This complex is distinctly
asymmetric. The asymmetry is crucial for specifying a unique start site and ensuring that
transcription proceeds unidirectionally.
Table 8: Elongation factors involved in eukaryotic transcription
S. No. | Transcription protein | No. of subunits | Subunit(s) Mr | Properties / Function(s)
Elongation factors required for elongation stage
1 | Elongin (S III) | 3 | 15000, 18000, 110000 | Involved in elongation; Enhances the elongation rate (2000 nucleotides per minute); Suppresses the pausing of RNA Pol II
2 | ELL | 1 | 80000 | Name derived from Eleven-nineteen lysine rich leukemia; The gene for ELL is the site of chromosomal recombination events frequently associated with acute myeloid leukemia; Involved in elongation; Enhances the elongation rate (2000 nucleotides per minute); Suppresses the pausing of RNA Pol II
3 | TFIIS (S II) | 1 | 38000 | Involved in elongation; Reduces the length of time for which Pol II pauses at sequences that could slow its progress (Pol II does not transcribe all regions at a constant rate); Stimulates proof reading activity of RNA Pol II
4 | p-TEFb | 2 | 43000, 124000 | Positive Transcription Elongation Factor b; Involved in elongation; Phosphorylates Pol II (within the CTD) at Ser 2; Contains CDK9 protein kinase which also helps in phosphorylation; Recruits elongation factor TAT-SF1; Phosphorylates and activates elongation factor hSPT5; Also involved in RNA processing; Recruits capping enzyme and splicing machinery
5 | TFIIF | 4 (2 of each type) | 30000, 74000 | Binds tightly to Pol II
Elongation factors used in processing
6 | hSPT5 | - | - | Involved in RNA processing; Recruits and stimulates 5' capping enzyme
7 | TAT-SF1 | - | - | Involved in RNA processing; Recruits the components of the splicing machinery
[Figure 18 schematic. Initiation: TBP/TFIID, TFIIA, TFIIB, TFIIF, TFIIE and TFIIH assemble with RNA Pol II (with CTD tail) at the TATA box of the promoter, upstream of +1 (promoter recognition, binding, melting and clearance). Elongation: RNA Pol II with phosphorylated CTD moves along the DNA and synthesizes RNA. Termination: DNA, RNA and RNA Pol II with CTD tail are released.]
Fig. 18: Transcription by eukaryotic RNA Pol II
$ Binding of TFIIA: In the next step, TFIIA binds directly to TBP and stabilizes its
interaction with DNA and thereby enhances transcription. TFIIA binding, although not
always essential, can be important at non-consensus promoters where TBP binding is
relatively weak.
$ Binding of TFIIB: The formation of a closed complex begins when the TBP binds to the
factor TFIIB, which also binds to DNA on either side of TBP.
$ Recruitment of TFIIF-Pol II: The TFIIB-TBP complex is next bound by another complex
consisting of TFIIF and Pol II. TFIIF helps target Pol II to its promoters, both by
interacting with TFIIB and by reducing the binding of the polymerase to nonspecific sites
on the DNA.
$ Binding of TFIIE and TFIIH: Following recruitment of Pol II-TFIIF, two more
transcription factors viz TFIIE and TFIIH are recruited to complete the assembly of the
closed preinitiation complex. They bind upstream of Pol II.
TFIIH is a complex factor with multiple enzymatic activities, including ATPase, helicase,
kinase and DNA repair activities. The DNA helicase activity of TFIIH promotes the unwinding
of DNA near the RNA start site (i.e. Inr), thereby creating an open complex. This process
requires the hydrolysis of ATP. The DNA repair activity presumably couples transcription with
DNA repair to avoid transcription of any faulty gene. TFIIH has an additional function during
the initiation phase: a kinase activity in one of its subunits phosphorylates Pol II at many
places in the CTD. Several other protein kinases, including CDK9, which is part of the complex
p-TEFb, also phosphorylate the CTD. In the preinitiation complex, TFIIE stimulates the kinase
activity of TFIIH, resulting in the hyperphosphorylation of the carboxyl terminal domain (CTD)
of Pol II. During the formation of this complex, the CTD is phosphorylated on serine and
threonine residues, after which Pol II escapes the promoter to begin transcription.
The importance of the CTD is highlighted by the finding that yeast cells containing a mutant
Pol II with fewer than 10 repeats are not viable. Phosphorylation of the CTD causes a
conformational change in the overall complex that weakens the interaction of Pol II with TBP,
thereby aiding the initiation of transcription. Most of the factors are released before Pol II
leaves the promoter and can then participate in another round of initiation.
$ Requirement of additional proteins including mediator complex, nucleosome
modifiers and remodellers: One reason for the additional requirements of mediator
complex, nucleosome modifiers and remodellers is that the DNA template in vivo is
packaged into nucleosomes and chromatin. This condition complicates binding of
polymerase and its associated factors to the promoter.
Transcription regulatory proteins called activators help recruit polymerase to the promoter,
stabilizing its binding there. This recruitment is mediated through interactions between DNA
bound activators and parts of the transcription machinery. Often the interaction is with the
mediator complex, which associates with the CTD tail of the large polymerase subunit through
one surface while presenting other surfaces for interaction with DNA-bound activators. This
explains the need for mediator to achieve significant transcription in vivo. Despite this central
role in transcriptional activation, deletion of individual subunits of mediator (it is made up of
many subunits) often leads to loss of expression of only a small subset of genes, different for
each subunit. This result likely reflects the fact that
different activators are believed to interact with different mediator subunits to bring polymerase
to different genes. In addition, mediator aids initiation by regulating the CTD kinase in TFIIH.
The need of nucleosome modifiers and remodellers also differs at different promoters or even at
the same promoter under different circumstances. When and where required, these complexes are
also recruited by the DNA-bound activators. Nucleosome modifying enzymes include histone
acetyltransferase, histone deacetylase and histone methylase.
$ Promoter melting, abortive initiation, synthesis of nascent RNA, phosphorylation of
CTD and promoter clearance or escape: After promoter melting, synthesis of nascent
RNA is initiated. Just as in the bacterial case, there is a period of abortive initiation
before Pol II escapes the promoter and enters the elongation phase. During abortive
initiation, Pol II synthesizes a series of short transcripts. As Pol II proceeds into
elongation, TFIIB, TFIIE and TFIIH are released from the promoter in a step called
promoter clearance. TFIIF, however, remains associated with Pol II and helps
elongation by suppressing pausing. In contrast to the situation in bacteria, promoter
melting in eukaryotes requires hydrolysis of ATP and is mediated by TFIIH, and
promoter escape involves phosphorylation of the
polymerase. The form of Pol II recruited to the promoter initially contains a largely
unphosphorylated tail, but the species found in the elongation complex bears multiple
phosphoryl groups on its tail. Addition of these phosphate groups helps the polymerase shed
most of the general transcription factors used for initiation, which the enzyme leaves
behind as it escapes the promoter. Indeed, in addition to TFIIH, a number of other kinases
(e.g. p-TEFb) have been identified that act on the CTD, as well as a phosphatase that removes
the phosphates added by those kinases. Regulating the phosphorylation state of the CTD of
Pol II also controls later steps, including those involving processing of the RNA.
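The step-by-step in vitro assembly pathway described above can be sketched as a small prerequisite check. This is only an illustrative model: the dictionary `PREREQUISITES`, the set `REQUIRED` and the function names are hypothetical, and the dependencies are a simplified reading of the steps listed in the text (TFIIA is treated as optional because the text notes it is not always essential).

```python
# Hypothetical prerequisite map for the in vitro pathway described in the text.
# Each factor may join only after the factors it depends on are already bound.
PREREQUISITES = {
    "TFIID (TBP)": set(),
    "TFIIA": {"TFIID (TBP)"},
    "TFIIB": {"TFIID (TBP)"},
    "TFIIF-Pol II": {"TFIIB"},
    "TFIIE": {"TFIIF-Pol II"},
    "TFIIH": {"TFIIF-Pol II"},
}

# Assumption: TFIIA is optional, since the text says it is not always essential.
REQUIRED = set(PREREQUISITES) - {"TFIIA"}


def first_violation(order):
    """Return the first factor added before its prerequisites, or None."""
    bound = set()
    for factor in order:
        if not PREREQUISITES[factor] <= bound:
            return factor
        bound.add(factor)
    return None


def is_complete_pic(order):
    """True if the order is valid and every required factor has joined."""
    return first_violation(order) is None and REQUIRED <= set(order)


if __name__ == "__main__":
    pathway = ["TFIID (TBP)", "TFIIA", "TFIIB", "TFIIF-Pol II", "TFIIE", "TFIIH"]
    print(is_complete_pic(pathway))
    print(first_violation(["TFIID (TBP)", "TFIIF-Pol II"]))
```

The model deliberately ignores mediator, nucleosome modifiers and CTD phosphorylation; it only captures the order of factor binding.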
(ii) Elongation: Once RNA Pol II has initiated transcription, it shifts into the elongation phase.
This transition involves the Pol II enzyme shedding most of its initiation factors, for example the
general transcription factors and mediator. During synthesis of the initial 60-70 nucleotides of RNA,
TFIIE is released. Subsequently, TFIIH is released. However, TFIIF remains associated with Pol
II throughout elongation. In place of the transcription factors and mediator, another set of
factors is recruited. This new set of factors
stimulates Pol II elongation and RNA proof reading. These proteins, which greatly enhance the
activity of Pol II, are called elongation factors. Examples include TFIIS, p-TEFb, hSPT5,
Elongin and ELL. The elongation factors suppress pausing or arrest of transcription by the Pol II-
TFIIF complex and also coordinate interactions between protein complexes involved in
post transcriptional processing of mRNAs. The enzymes involved in all these processes are, like
several of the initiation factors, recruited to the C-terminal tail of the large subunit of Pol II, the
CTD. In this case, however, the factors favor the phosphorylated form of the CTD. Thus,
phosphorylation of the CTD leads to an exchange of initiation factors for those factors required
for elongation and RNA processing. As is evident from the crystal structure of yeast Pol II, the
polymerase CTD lies directly adjacent to the channel through which the newly synthesized RNA
exits the enzyme. This, together with its length (it can extend some 800 Å from the body of the
enzyme), allows the tail to bind several components of the elongation and processing machinery
and to deliver them to the emerging RNA. Some other elongation factors are required for RNA
processing.
(iii) Termination and release: Once the RNA transcript is completed, transcription is
terminated. RNA Pol II does not, however, terminate immediately. Rather, it continues to
move along the template, generating a second RNA molecule that can become as long as several
hundred nucleotides before terminating, i.e. termination of mRNA synthesis is coupled with
polyadenylation (hence the details of the termination step are described after polyadenylation).
Pol II is then dephosphorylated, dissociates from the template and is recycled, ready to initiate
another transcript. In the process, the second RNA is released and may be degraded without ever
leaving the nucleus.
(B) Synthesis of precursors of rRNA by RNA Pol I

The pre-rRNA transcription units contain three sequences that encode the 5.8S, 18S and 28S
rRNAs (Fig. 19). Pre-rRNA transcription units are arranged in clusters in the genome as long
tandem arrays separated by non-transcribed spacer sequences. RNA Pol I synthesizes pre-rRNA
in the nucleolus. The arrays of rRNA genes loop together to form the nucleolus and are
known as nucleolar organizer regions.
[Figure 19 schematic: the pre-rRNA transcript carries the 18S rRNA, 5.8S rRNA and 28S rRNA sequences in that order; the 5S rRNA is transcribed separately.]
Fig. 19: Pre-rRNA transcript in eukaryotes (45S) (~13000 nt)
The synthesis of rRNA (5.8S, 18S and 28S) involves transcription factors and complexes, for
example the upstream binding factor (UBF) and a transcription complex called selectivity factor
(SL-1) (similar complexes in different species are called TIF-IB and Rib1). UBF is a specific DNA
binding protein, which binds to the UCE. It greatly stimulates the transcription rate. In its absence,
only a low rate of basal transcription is seen. SL-1 contains four subunits: one TBP (TATA binding
protein) and three TAFIs (TBP associated factors for RNA Pol I).
The process of transcription of rRNA (5.8S, 18S and 28S) is outlined below and depicted in Fig.
20.
$ UBF binding: One UBF binds to a sequence upstream of the core element, called the
upstream control element (UCE), of the RNA Pol I promoter. Another UBF binds to the
upstream region of the core element (core promoter). The sequences in the two UBF
binding sites have no obvious similarity. One molecule of UBF is thought to bind to
each sequence element. The two UBF molecules then bind each other by protein-protein
interaction, causing the intervening DNA to form a loop between the two binding sites.
(Some are of the view that a single UBF binds to two different sites, viz UCE and the
upstream part of the core element).
[Figure 20 schematic: UBF binds the UCE and the core element upstream of +1; SL1 (TBP plus TAFIs) joins the UBF-DNA complex; RNA Pol I then binds the UBF-SL1 complex.]
Fig. 20: rRNA transcription initiation
$ Selectivity factor binding: Selectivity factor (SL-1) binds to and stabilizes the UBF-DNA
complex. It interacts with the free downstream part of the core element. Binding of UBF
increases transcription initiation activity by SL-1. Acanthamoeba has a simple
transcription control system. This has a single control element and a single factor TIF-1,
which are required for RNA Pol I binding and initiation at the rRNA promoter.
$ RNA Pol I binding: SL-1 binding allows RNA Pol I to bind the complex and initiate
transcription and is essential for rRNA transcription.
(C) Synthesis of precursors of tRNA and 5S rRNA by RNA Pol III

(i) tRNA: The promoter of tRNA genes has two consensus sequences downstream of the
transcription start site, namely Box A and Box B, as described in an earlier section. Two
complex DNA binding factors have been identified which are required for transcription initiation
by RNA Pol III. These are the transcription factors TFIIIC and TFIIIB (Fig. 21). TFIIIC is a large
protein complex having six subunits and a size of >500 kD. It is the assembly factor for
positioning TFIIIB at the right location. TFIIIB is the true initiation factor for Pol III. It has three
subunits: TBP, B'' and BRF (TFIIB related factor; it has homology to TFIIB, the RNA Pol II
initiation factor). B'' is comparable to the sigma factor of prokaryotes and functions to initiate the
transcription bubble. TFIIIB has no sequence specificity and therefore its binding site appears to
be determined by the position of TFIIIC binding to DNA. Once TFIIIB has bound, TFIIIC
can be removed without affecting transcription. TFIIIC is therefore an assembly factor for the
positioning of the initiation factor TFIIIB.
The process of transcription, which involves the following steps, is outlined in Fig. 21.
[Figure 21 schematic: TFIIIC binds the A Box and B Box downstream of +1; TFIIIB (TBP, B'' and BRF) joins upstream of TFIIIC; RNA Pol III then binds the complex.]
Fig. 21: Transcription initiation by eukaryotic tRNA promoter
$ TFIIIC binding: TFIIIC binds to both Box A and Box B of the tRNA promoter.
$ TFIIIB binding: TFIIIB binds TFIIIC-DNA complex and interacts with DNA upstream
from TFIIIC binding site (TFIIIB binds 50 bp upstream from A box).
$ RNA Pol III binding: TFIIIB helps in recruitment of RNA Pol III. The enzyme RNA Pol
III then initiates transcription, presumably displacing TFIIIC from DNA template as it
goes.
Termination of transcription occurs without accessory factors. A cluster of dA residues is often
sufficient for termination, and the termination efficiency depends on the surrounding sequence.
An example of an efficient termination signal in somatic 5S rRNA genes of Xenopus borealis is
5'-GCAAAAGC-3'.
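As a rough illustration of what "a cluster of dA residues" means in sequence terms, the sketch below scans a DNA strand, written as in the text, for runs of consecutive A residues. The function name `da_clusters` and the minimum run length of four (taken from the AAAA in the Xenopus example) are assumptions for illustration; the text itself notes that termination efficiency also depends on the surrounding sequence, which this sketch does not model.

```python
import re


def da_clusters(dna, min_run=4):
    """Return (start_index, run) for every run of at least min_run A residues.

    min_run=4 is an illustrative assumption based on the AAAA in the example
    signal; it is not a measured threshold.
    """
    pattern = re.compile("A{%d,}" % min_run)
    return [(m.start(), m.group()) for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    # Termination signal quoted in the text for Xenopus borealis 5S rRNA genes
    print(da_clusters("GCAAAAGC"))
```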
(ii) 5S rRNA: The promoter of 5S rRNA genes has two consensus sequences downstream of the
transcription start site, namely Box A and Box C, as described in an earlier section. The process
of transcription of 5S rRNA genes involves the transcription factors TFIIIA, TFIIIC and TFIIIB.
TFIIIA is an assembly factor for positioning TFIIIB at the right location. TFIIIB is the true
initiation factor for Pol III. TFIIIB has no sequence specificity and therefore its binding site
appears to be determined by the position of TFIIIC binding to DNA.
The process of transcription involves following steps.
$ TFIIIA binding: TFIIIA binds strongly to Box C promoter sequence.
$ TFIIIC binding: TFIIIC then binds to TFIIIA-DNA complex interacting also with Box
A sequence.
$ TFIIIB binding: Once TFIIIC has bound, TFIIIB can interact with the complex.
$ RNA Pol III binding: TFIIIB then recruits RNA Pol III to initiate transcription.
Fig. 22 depicts the schematic representation of the process of transcription of 5S rRNA.
Post transcriptional RNA processing

Transcription products of all three eukaryotic RNA polymerases undergo various alterations to
yield the mature product. RNA processing is the collective term used to describe these alterations
to the primary transcript. The various post transcriptional processing events occurring to RNA are
summarized in Table 9.
(A) Post transcriptional processing of mRNA

In prokaryotes, there is little or no processing of mRNA after synthesis by RNA Pol.
Indeed, many mRNA molecules are translated while they are being transcribed, i.e. before being
completely synthesized. Prokaryotic mRNA is degraded rapidly from the 5' end, and the first
cistron (protein-coding region) can therefore only be translated for a limited amount of time.
In eukaryotes, RNA Pol II synthesizes mRNA as longer precursors (pre-mRNA), the population
of different pre-mRNAs being called heterogeneous nuclear RNA (hnRNA). Once transcribed,
eukaryotic precursor mRNA has to be processed in various ways before being exported from the
nucleus to the cytoplasm, where it can be translated (Table 7).
[Figure 22 schematic: TFIIIA binds the C Box downstream of +1; TFIIIC joins the TFIIIA-DNA complex; TFIIIB (TBP, B'' and BRF) binds next; RNA Pol III then binds the complex.]
Fig. 22: Transcription initiation at eukaryotic 5S rRNA promoter
Table 9: Various post transcriptional processing events occurring to RNA
S. No. | Processing event | Description
1 | End modification | It occurs during the synthesis of eukaryotic and archaeal mRNAs. This involves addition of nucleotides to the 5' or 3' ends of the primary transcripts or their cleavage products. Such events do not occur in prokaryotes. These include: (i) Capping of 5' end of mRNA; (ii) Polyadenylation of 3' end of mRNA
2 | Splicing | It is the removal of introns (non-coding sequences in the genes) from the precursor RNAs (i.e. eukaryotic mRNAs, and some eukaryotic rRNAs and tRNAs). It leads to a physical change in the length of the transcript.
3 | Cutting events | These involve cutting of primary transcripts (or removal of nucleotides) of rRNA and tRNA with endonucleases or exonucleases to produce mature transcripts in both prokaryotes and eukaryotes. They lead to a physical change in the length of the transcript.
4 | Chemical modifications | These modifications are made within the rRNAs, tRNAs and mRNAs. The rRNAs and tRNAs of all organisms are modified by addition of new chemical groups. These groups are added on either the base or the sugar moiety of specific nucleotides in RNAs. Such modification occurs to a much lesser extent with pre-mRNA in eukaryotes. Equivalent events in archaea are poorly understood. Chemical modification of mRNA, called RNA editing, is seen in a diverse group of eukaryotes.
Strikingly, there is an overlap in proteins involved in elongation and those required for RNA
processing. As mentioned earlier, the transcription elongation factor, pTEFb activates another
elongation factor hSPT5, which helps in recruitment and stimulation of the 5' capping enzyme.
Another example is the pTEFb-induced recruitment of the elongation factor TAT-SF1, which
further recruits the components of the splicing machinery. Thus, it seems that transcription and
RNA processing are interconnected, presumably to ensure their proper coordination, allowing
cotranscriptional processing of primary transcript.
(a) Capping of 5' end: Capping involves the addition of a modified G base (m7G) to the 5' end
of mRNA. Specifically, it is a methylated G and it is joined to the RNA transcript by an unusual
5'-5' linkage involving three phosphates. The cap is added in reverse polarity (5' to 5'), thus
acting as a barrier to 5' exonuclease attack, but it also promotes splicing, transport and
translation.
The 5' cap is created in three enzymatic steps:

$ Removal of a phosphate group from 5' end of transcript: RNA triphosphatase removes
the γ-phosphate at the 5' end of RNA by hydrolysis, leading to the formation of a
diphosphate at the 5' end (the initiating nucleotide of a transcript initially retains its α-, β-
and γ-phosphates).
$ Addition of GTP: In the next step, the enzyme guanylyl transferase catalyzes the
nucleophilic attack of the resulting terminal β-phosphate on the α-phosphoryl group of a
molecule of GTP, with the β- and γ-phosphates of the GTP serving as a pyrophosphate
leaving group. The reaction leads to the formation of an unusual 5'-5' triphosphate
linkage. This distinctive terminus is called a Cap.
$ Modification of terminal G residue by the addition of the methyl group: Once this
linkage is made, the newly added G and the purine at the original 5' end of the mRNA are
further modified. The N-7 nitrogen of the terminal guanine is methylated by a methyltransferase
using S-adenosylmethionine as the methyl group donor, forming Cap 0. The adjacent riboses
may also be methylated to form Cap 1 or Cap 2.
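The phosphate bookkeeping in the three capping steps above can be followed with a small sketch. The class `FivePrimeEnd` and the three function names are hypothetical labels for illustration only; the model tracks just the cap nucleotide and the number of phosphates at (or bridging) the 5' end, and ignores the ribose methylations that give Cap 1 and Cap 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FivePrimeEnd:
    cap: str         # "" (uncapped), "G" or "m7G"
    phosphates: int  # phosphates at, or bridging, the 5' end


def rna_triphosphatase(end):
    """Step 1: hydrolyse the gamma-phosphate of the initiating nucleotide."""
    if end.cap or end.phosphates != 3:
        raise ValueError("expects an uncapped 5' triphosphate end")
    return FivePrimeEnd(cap="", phosphates=2)


def guanylyl_transferase(end):
    """Step 2: add GMP from GTP; the beta and gamma phosphates leave as PPi."""
    if end.cap or end.phosphates != 2:
        raise ValueError("expects an uncapped 5' diphosphate end")
    # Two phosphates retained by the transcript plus the alpha-phosphate of GTP
    return FivePrimeEnd(cap="G", phosphates=end.phosphates + 1)


def guanine_n7_methyltransferase(end):
    """Step 3: methylate N-7 of the terminal guanine to give Cap 0."""
    if end.cap != "G":
        raise ValueError("expects an unmethylated G cap")
    return FivePrimeEnd(cap="m7G", phosphates=end.phosphates)


if __name__ == "__main__":
    nascent = FivePrimeEnd(cap="", phosphates=3)
    cap0 = guanine_n7_methyltransferase(guanylyl_transferase(rna_triphosphatase(nascent)))
    print(cap0)
```

The count of three bridging phosphates in the final state corresponds to the 5'-5' triphosphate linkage named in the text.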
(b) Splicing: In eukaryotes, genes are characterized by non-coding sequences interspersed
between stretches of coding sequences. These intervening non-coding sequences are called
introns and the coding sequences are called exons. Introns can be as small as 10 nucleotides and
as long as hundreds of kilobases. The introns are copied when the gene is transcribed and are
removed from the precursor mRNA by cutting and joining reactions. This removal of introns
occurs in the nucleus and is called splicing (Scheme 2). Thus, the precursor mRNAs, which form
the nuclear RNA fraction called heterogeneous nuclear RNA (hnRNA) (unspliced transcripts),
become mature mRNAs after splicing. The splicing of nuclear pre-mRNA is the most
complicated post transcriptional processing event.
In eukaryotes, each intron in nuclear pre-mRNA, also called a GU-AG intron, is characterized by
signature sequences at both its 5' and 3' ends that are recognized by the spliceosome. The borders
between introns and exons are thus marked by specific nucleotide sequences within the pre-
mRNAs. For splicing, an intron should have a 5'-GU, an AG-3' and a branch point
sequence. Thus, the sequences within the RNA delineate where splicing will occur (Scheme 3).
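The three sequence requirements just listed (5'-GU, AG-3' and a branch point sequence) can be checked on a toy transcript with a few lines of code. This is a minimal sketch, not a splice-site predictor: the function names are hypothetical, the branch-site motif `UACUAAC` is simply the one drawn in Scheme 3 (real branch sites vary), and intron positions are supplied by the caller rather than discovered.

```python
BRANCH_SITE = "UACUAAC"  # motif as drawn in Scheme 3; an illustrative assumption


def is_gu_ag_intron(intron):
    """True if the intron starts with GU, ends with AG and contains the branch motif."""
    intron = intron.upper()
    if not (intron.startswith("GU") and intron.endswith("AG")):
        return False
    # The branch site must lie between the two splice-site dinucleotides
    return BRANCH_SITE in intron[2:-2]


def splice(pre_mrna, introns):
    """Remove introns given as (start, end) 0-based half-open intervals and join exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if not is_gu_ag_intron(pre_mrna[start:end]):
            raise ValueError("interval %d-%d is not a GU-AG intron" % (start, end))
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


if __name__ == "__main__":
    exon1 = "AUGGCCAAG"
    intron = "GUAAGUCCCUACUAACUUUUUUUUCAG"
    exon2 = "GAUUGA"
    pre_mrna = exon1 + intron + exon2
    print(splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))]))
```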
Pre-mRNA: 5' Exon 1 - Intron 1 - Exon 2 - Intron 2 - Exon 3 - Intron 3 - Exon 4 3'
Splicing
mRNA: 5' Exon 1 - Exon 2 - Exon 3 - Exon 4 3'
Scheme 2: Schematic representation of splicing in eukaryotes
Upstream Exon ...AG | GUAAGU ... UACUAAC ... (Py)nNCAG | G... Downstream Exon
5' splice site: AG | GUAAGU; Branch site: UACUAAC; 3' splice site: (Py)nNCAG | G
Scheme 3: Splice sites in most of the vertebrates
Splicing requires the cooperation of several small nuclear ribonucleoproteins (snRNPs,
pronounced snurps). The snRNPs are formed when uracil-rich snRNAs made by RNA Pol II
are complexed with specific proteins. The snRNPs interact with one another, forming a complex
called the spliceosome, which helps to hold the upstream and downstream exons close together
while looping out the intron. This folding of the pre-mRNA into the correct conformation is
essential for splicing. After the spliceosome forms, a rearrangement takes place before the
two-step splicing reaction (transesterification reactions) can occur with release of the intron as a
lariat. The spliceosome is made up of 5 snRNPs, but the exact makeup differs at different stages of
the splicing reaction: different snRNPs come and go at different times, each carrying out
particular functions in the reaction. There are also many proteins within the spliceosome that are
not part of the snRNPs. The spliceosome thus comprises about 150 proteins and 5 snRNAs and is
similar in size to a ribosome.
Various snRNPs involved in the process and their sizes and functions are summarized in the
Table 10.
Table 10: Small nuclear ribonucleoprotein particles (snRNPs) in the splicing of nuclear
mRNA precursors
S. No. | snRNPs | Size (nucleotides) | Role
1 | U1 snRNP | 165 | Binds to 5' splice site and then the 3' splice site; Promotes binding of U2 snRNP
2 | U2 snRNP | 185 | Binds the branch site (aided by U2AF); Forms part of catalytic center after binding with U6
3 | U5 snRNP | 116 | Binds 5' splice site (a loop in U5 snRNA is immediately adjacent to the first base positions in both exons)
4 | U4 snRNP | 145 | Masks the catalytic activity of U6
5 | U6 snRNP | 106 | Catalyzes splicing
Splicing of nuclear pre-mRNA consists of two successive transesterification reactions, in which
phosphodiester bonds within the pre-mRNA are broken and new ones are formed. Thus, the
number of phosphodiester bonds remains constant during the reactions. The chemistry of the
splicing process is simple. The intron is removed in a form called a lariat as the flanking exons
are joined (Fig. 23).
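The statement that the number of phosphodiester bonds remains constant can be verified by simple counting. The sketch below is an illustrative bookkeeping exercise, with hypothetical function names: a linear RNA of n nucleotides has n - 1 backbone bonds, and the lariat carries one additional 2'-5' bond at the branch site. Exon and intron lengths are arbitrary inputs.

```python
def linear_bonds(n_nucleotides):
    """Phosphodiester bonds in a linear RNA of n nucleotides."""
    return max(n_nucleotides - 1, 0)


def bonds_in_pre_mrna(exon1, intron, exon2):
    """Bonds before splicing: one continuous linear molecule."""
    return linear_bonds(exon1 + intron + exon2)


def bonds_after_first_step(exon1, intron, exon2):
    """Free exon 1 plus the lariat intermediate (intron still joined to exon 2)."""
    free_exon1 = linear_bonds(exon1)
    # Backbone of intron + exon 2, plus the new 2'-5' bond at the branch site
    lariat_intermediate = linear_bonds(intron + exon2) + 1
    return free_exon1 + lariat_intermediate


def bonds_after_second_step(exon1, intron, exon2):
    """Spliced mRNA plus the released lariat form of the intron."""
    spliced_mrna = linear_bonds(exon1 + exon2)
    lariat_intron = linear_bonds(intron) + 1
    return spliced_mrna + lariat_intron


if __name__ == "__main__":
    lengths = (9, 27, 6)  # arbitrary exon 1, intron and exon 2 lengths
    print(bonds_in_pre_mrna(*lengths),
          bonds_after_first_step(*lengths),
          bonds_after_second_step(*lengths))
```

Each transesterification breaks one bond and forms one, so the three totals agree for any choice of lengths.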
Thus, the snRNPs have three roles in the splicing reaction:
& To recognize the 5' splice site and the branch site
& To bring those sites together as required
& To catalyze (or help to catalyze) the RNA cleavage and joining reactions
Various steps of splicing pathway can be summarized as follows:

$ Formation of Early (E) complex: Initially, the 5' splice site is recognized by the U1
snRNP (using base pairing between its snRNA and the pre-mRNA). One subunit of
U2 auxiliary factor (U2AF) binds to the pyrimidine tract and the other to the 3' splice
site. The former subunit interacts with branch point binding protein (BBP) and helps
that protein to bind to the branch site. This arrangement of proteins and RNA is called
the Early (E) complex.
$ Formation of A complex: U2 snRNP then binds to the branch site, aided by U2AF
and displacing BBP. This arrangement is called the A complex. The base pairing
between the U2 snRNA and the branch site is such that the branch site A residue is
extruded from the resulting stretch of double helical RNA as a single nucleotide
bulge. This A residue is thus unpaired and available to react with the 5' splice site.
[Figure 23 schematic, three panels. Precursor mRNA: exon 1, 5' splice site (GU), intron with the branch site A bearing a 2'-OH, 3' splice site (AG), exon 2. Lariat intermediate, after the first transesterification reaction: exon 1 with a free 3'-OH, and the intron joined to exon 2 with its 5' end linked to the 2' position of the branch site A. Spliced product plus lariat form of intron, after the second transesterification reaction: exon 1 joined to exon 2, and the released intron lariat.]
Fig. 23: Splicing of nuclear pre-mRNA
$ Formation of B complex: The next step is a rearrangement of the A complex to
bring together all three splice sites. This is achieved as follows: the U4 snRNP and
U6 snRNP, along with the U5 snRNP, join the complex. Together these three snRNPs
are called the tri-snRNP particle, within which the U4 snRNP and U6 snRNP are held
together by complementary base pairing between their RNA components and the U5
snRNP is more loosely associated through protein-protein interactions. With the entry
of the tri-snRNP, the A complex is converted into the B complex.
$ Formation of C complex: In the next step, U1 snRNP leaves the complex and U6
snRNP replaces it at the 5' splice site. This requires that the base pairing between the
U1 snRNA and the pre-mRNA be broken, allowing the U6 snRNA to anneal with the
same region (in fact, to an overlapping sequence). These steps complete the assembly
pathway. The next rearrangement triggers catalysis and occurs as follows: U4 snRNP
is released from the complex, allowing U6 snRNP to interact with U2 snRNP
(through RNA:RNA base pairing). The resulting arrangement is called the C complex. This
rearrangement has the following consequences:
& It produces the active site, by bringing together the components (believed to be solely
regions of the U2 snRNA and U6 snRNA) that form the active site within the spliceosome.
& The same rearrangement also ensures the proper positioning of the substrate RNA to be
acted upon.
& Formation of the active site juxtaposes the 5' splice site of the pre-mRNA and the branch site.
It is striking not only that the active site is primarily formed of RNA, but also that it is formed only at this
stage of spliceosome assembly. Presumably this strategy lessens the chance of aberrant splicing.
Linking the formation of the active site to the successful completion of the earlier steps in
spliceosome assembly makes it highly likely that the active site is available only at legitimate
splice sites.
$ Joining of exons and release of mature mRNA: The juxtaposition of the 5' splice
site of the pre-mRNA and the branch site facilitates the first transesterification reaction. The
second reaction, between the 5' and 3' splice sites, is aided by the U5 snRNP, which
helps to bring the two exons together. The final steps involve release of the mRNA
product and the snRNPs. The snRNPs are initially bound to the lariat, but get recycled
after rapid degradation of that piece of RNA.
Components of the splicing machinery arrive at or leave the complex at each step due to changes
associated with structural rearrangements necessary for the splicing reaction to proceed. There is
evidence to suggest that some of the components shown do not arrive or leave precisely when
indicated in the figure; they may, for example, remain present but weaken their association with the
complex rather than dissociating completely. It is also not possible to be sure of the order of
some changes shown, particularly the two steps involving changes in U6 pairing: when it takes
over from U1 snRNP at the 5' splice site, compared to when it takes over from U4 snRNP in
binding U2 snRNP. Despite these uncertainties, the critical involvement of different components
of the machinery at different stages of the splicing reaction and the general dynamic nature of the
spliceosome are as shown in Fig. 24.
Some eukaryotic pre-mRNAs do not fall into the GU-AG intron category. They have different
consensus sequences at their splice sites. These are AU-AC introns, which have been found in
approximately 20 genes in organisms as diverse as humans, plants and Drosophila. These introns
require U11 / U12 snRNPs.
(c) Polyadenylation of 3' end (followed by termination of transcription): The final RNA
processing event, polyadenylation of the 3' end of pre-mRNA, is intimately linked with the
termination of transcription. Just as with capping and splicing, the polymerase CTD tail is
involved in recruiting the enzymes necessary for polyadenylation. Once polymerase has reached
the end of a gene, it encounters specific sequences that, after being transcribed into RNA, trigger
the transfer of polyadenylation enzymes to that RNA, leading to three events (Fig. 25):
& Cleavage of the message
& Addition of many adenine residues to its 3' end by Poly A polymerase
& Termination of transcription by polymerase
Eukaryotic primary transcripts extend beyond the 3' end of the mature mRNA. Indeed, the
nucleotide preceding the poly (A) is not the last nucleotide to be transcribed.
Polyadenylation was once looked on as a post transcriptional event, but it is now recognized
that the process is an inherent part of the mechanism for termination of transcription by RNA Pol
II.
[Figure 24 schematic, spliceosome mediated splicing reaction: U1 snRNP, BBP and U2AF (65 and 35 subunits) bind the pre-mRNA; U2 snRNP replaces BBP at the branch site A; the tri-snRNP particle (U4, U5 and U6 snRNPs) joins; U1 snRNP and U4 snRNP leave; the products are the lariat form of the intron and the spliced exons.]
Fig. 24: Spliceosome mediated splicing
In mammals, polyadenylation is directed by a signal sequence in the mRNA, almost invariably
5'-AAUAAA-3'. Transcripts carrying this signal are cleaved by a specific endonuclease that
recognizes the sequence AAUAAA. Cleavage does not occur if this sequence or a segment of some
20 nucleotides on its 3' side is deleted. The presence of internal AAUAAA sequences in some
mature mRNAs indicates that AAUAAA is only part of the cleavage signal. This sequence is located
between 10 and 30 nucleotides upstream of the dinucleotide 5'-CA-3' and is followed 10-20 nucleotides later
by a GU rich region. Both the poly (A) signal sequence and the GU rich region are binding sites
for multisubunit protein complexes.
& Cleavage and polyadenylation specificity factor (CPSF) binds poly (A) signal sequence.
& Cleavage stimulation factor (CstF) binds GU rich region.
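The layout of the signal described above (AAUAAA, then CA 10-30 nucleotides downstream, then a GU rich region 10-20 nucleotides after CA) can be illustrated with a short Python sketch. It is only an illustration: the function names, the 8-nucleotide window and the 75% G+U threshold used to call a region "GU rich" are assumptions made here, not values from the text, and real poly (A) site prediction is considerably more involved.

```python
import re

def is_gu_rich(segment, min_fraction=0.75):
    """Assumed definition: at least min_fraction of the bases are G or U."""
    if not segment:
        return False
    return sum(base in "GU" for base in segment) / len(segment) >= min_fraction

def find_polya_sites(rna, window=8):
    """Return (signal_start, ca_position, gu_region_start) for each candidate site.

    Positions are 0-based. Distances follow the text: CA lies 10-30 nt after
    the AAUAAA signal, and the GU rich region starts 10-20 nt after CA.
    """
    hits = []
    for match in re.finditer(r"(?=AAUAAA)", rna):   # lookahead allows overlaps
        signal_end = match.start() + 6
        for gap in range(10, 31):
            ca = signal_end + gap
            if rna[ca:ca + 2] != "CA":
                continue
            after_ca = ca + 2
            for offset in range(10, 21):
                start = after_ca + offset
                segment = rna[start:start + window]
                if len(segment) == window and is_gu_rich(segment):
                    hits.append((match.start(), ca, start))
                    break                           # one GU region per CA is enough
    return hits

if __name__ == "__main__":
    # Hypothetical toy transcript built to contain one complete signal.
    toy = "GGC" + "AAUAAA" + "C" * 15 + "CA" + "C" * 12 + "GUGUUGUG" + "CC"
    print(find_polya_sites(toy))
```

A transcript that carries AAUAAA but lacks the downstream CA or GU rich element returns an empty list, which mirrors the observation that AAUAAA alone is only part of the cleavage signal.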
[Figure: the transcript (5' to 3') is cut at the cleavage site while transcription by Pol is still
ongoing, and poly (A) is added to the cleaved transcript]
Fig. 25: Termination step involves cleavage followed by polyadenylation of transcript
In addition, poly (A) polymerase and at least two other protein factors must associate with bound
CPSF and CstF in order for polyadenylation to occur.
After cleavage by the endonuclease, a template-independent RNA polymerase called poly (A)
polymerase adds about 250 adenylate residues to the 3' end of the transcript. Virtually all
eukaryotic mRNAs have a series of up to 250 adenosines at their 3' ends. This enzyme uses ATP
as a precursor and adds A residues using the same chemistry as RNA polymerase. These A
residues are not specified by the DNA sequence, i.e. they are added without a template. Thus,
the long tail of A(s) is found in the RNA but not the DNA. It is not clear what determines the
length of the poly (A) tail, but that process involves other proteins that bind specifically to the
poly (A) sequence (described later). Poly (A) polymerase does not act at the extreme 3' end of the
primary transcript, but at an internal site, which is cleaved to create a new 3' end to which the
poly (A) tail is added.
The reaction catalyzed is as follows:
RNA + n ATP → RNA-(AMP)n + n PPi
The additional factors required include polyadenylate-binding protein (PABP). PABP is thought to
perform the following functions:
• Helps the polymerase to add the adenosines
• Possibly influences the length of the poly (A) tail that is synthesized
• Appears to play a role in maintenance of the tail after synthesis
• Also plays a role in translation
In yeast, the signal sequences in the transcript are slightly different, but the protein complexes
are similar to those in mammals and polyadenylation is thought to occur by more or less the
same mechanism.
CPSF is known to interact with TFIID and is recruited into the polymerase complex during the
initiation stage. By riding along the template with RNA Pol II, CPSF is able to bind to the poly
(A) signal sequence as soon as it is transcribed, initiating the polyadenylation reaction. Both
CPSF and CstF make contact with the CTD of the polymerase. It has been suggested that the nature of
these contacts changes when the poly (A) signal sequence is located and that this change alters
the properties of the elongation complex so that termination becomes favored over continued
RNA synthesis. As a result, transcription stops soon after the poly (A) signal sequence has been
transcribed. The details of the termination step linking cleavage and polyadenylation to
termination of transcription are outlined in Fig. 26.
It is noteworthy that the long tail of A(s) is unique to transcripts made by RNA Pol II, a feature
that allows experimental isolation of protein coding mRNAs by affinity chromatography. The
mature mRNA is then transported from the nucleus.
It is not known what links polyadenylation to termination, but it is clear that the polyadenylation
signal is required for termination (interestingly, RNA cleavage is not). Two basic models have
been proposed to explain the link between polyadenylation and termination:
• The first model proposes that the transfer of 3' processing enzymes from the polymerase CTD
tail to the RNA triggers a conformational change in the polymerase that reduces processivity of
the enzyme, leading to spontaneous termination soon afterward.
• The second model proposes that the absence of a 5' cap on the second RNA molecule (the
transcript that continues to be made downstream of the cleavage site) is
sensed by the polymerase, which, as a result, recognizes the transcript as improper and
terminates. The absence of the cap reflects the absence of the capping enzymes on the CTD
at this stage of the transcription cycle (these enzymes are loaded onto the CTD at the point
where initiation turns to elongation and are then displaced in favor of the splicing
machinery).
The role of the poly (A) tail is still not firmly established despite much effort. Even though
polyadenylation can be identified as an inherent part of the termination process, this does not
explain the necessity to add a poly (A) tail to the transcript. Evidence that it enhances translation
efficiency and the stability of mRNA is accumulating. The poly (A) tail on pre-mRNA is thought
to help stabilize the molecule, since a poly (A)-binding protein binds to it, which should act to
resist 3' exonuclease action. In addition, the poly (A) tail may help in the translation of the
mature mRNA in the cytoplasm. Blocking the synthesis of the poly (A) tail by exposure to
3'-deoxyadenosine (cordycepin) does not interfere with the synthesis of the primary transcript, and
mRNA devoid of a poly (A) tail can be transported out of the nucleus. However, an mRNA
molecule devoid of a poly (A) tail is usually a much less effective template for protein synthesis
than is one with a poly (A) tail. Thus, the poly (A) tail appears to have a role in initiation of
translation. This is further supported by research showing that poly (A) polymerase is repressed
during those periods of the cell cycle when relatively little protein synthesis occurs. Indeed, some
mRNAs are stored in an unadenylated form and receive the poly (A) tail only when translation is
imminent. The half-life of an mRNA molecule may also be determined in part by the rate of
degradation of its poly (A) tail. Histone pre-mRNAs do not get polyadenylated, but are cleaved at a
special sequence to generate their mature 3' ends.
[Figure: the pre-mRNA carries the elements 5'..AAUAAA..CA..GU rich region..3' (AAUAAA lies 10-30
nucleotides upstream of CA, and the GU rich region lies 10-20 nucleotides downstream of CA). CPSF
travelling with RNA Pol II attaches to the polyadenylation signal sequence (AAUAAA), CstF binds
downstream, and cleavage and polyadenylation (with polyadenylate binding protein) give the
polyadenylated mRNA 5'..AAUAAA..CAAAAAAAAAAAAAA3'; termination is then favored over elongation]
CPSF is shown attached to the RNA Pol II elongation complex that is synthesizing RNA. CPSF
binds to the polyadenylation signal sequence AAUAAA as soon as it is transcribed. This changes
the interaction between CPSF and the CTD of RNA Pol II so that termination of transcription is
now favored over continued elongation. CstF probably attaches to the GU rich region downstream of
AAUAAA. The CPSF is shown to leave the complex in order to bind to the polyadenylation signal,
when in reality it may maintain its attachment to RNA Pol II during the polyadenylation process.
Fig. 26: Termination signal and the link between polyadenylation and termination of transcription by
RNA Pol II
(d) Pre-mRNA methylation: The final modification or processing event that many pre-mRNAs
undergo is specific methylation of certain bases. In vertebrates, the most common
methylation event is on the N6 position of A residues, particularly when these A residues occur
in the sequence 5'-RRACX-3', where X is rarely G. Up to 0.1% of pre-mRNA A residues are
methylated and the methylations seem to be largely conserved in the mature mRNA, though their
function is unknown.
Alternative mRNA processing

Alternative mRNA processing is the conversion of pre-mRNA species into more than one type of
mature mRNA.
Alternative processing can be achieved in four different ways:

• By using different poly (A) sites
• By using different promoters
• By retaining certain introns / by retaining or removing certain exons
• By RNA editing
(a) Alternative poly (A) sites: Some pre-mRNAs contain more than one poly (A) site and these
may be used under different circumstances (e.g. in different cell types) to generate different
mature mRNAs. If the upstream site is used, sequences that control mRNA stability or location
may be removed in the portion that is cleaved off. Thus mature mRNAs with the same coding
region, but differing stabilities or locations, can be produced. In some situations, both sites are
used in the same cell at a frequency that reflects their relative efficiencies (strengths), and the
cell contains both types of mRNA. The efficiency of a poly (A) site may reflect how well it matches
the consensus sequences. In other situations, one cell may exclusively use one poly (A) site, while
a different cell uses another. The most likely explanation is that in one cell the stronger site is
used by default, but in the other cell a factor is present that either activates the weaker site, so
that it is used exclusively, or prevents the stronger site from being used. In some cases, the use
of alternative poly (A) sites causes different patterns of splicing to occur. In some cases, factors
will bind near to and activate or repress a particular site.
(b) Alternative promoters: The use of different promoters in different cell types and at different
developmental stages leads to the generation of different mature mRNAs.
(c) Alternative splicing: In many cases, the generation of different mature mRNAs from a
particular type of gene transcript can occur by varying the use of 5'- and 3'-splice sites. This is
called alternative splicing. Hence, a single transcript can be spliced in multiple ways resulting in
a number of protein coding sequences.
The alternative splicing events as depicted in Fig. 27 are:

• Exon skipped
• Intron retained
• Exon extended using cryptic splice sites
• Alternative exons
[Figure: a gene with exons 1-3 and introns 1-2 (DNA, 5' to 3') is transcribed into the primary RNA
transcript, which is spliced into alternative mRNAs. Panels: Normal (exon 1, exon 2, exon 3);
Exon skipped (exon 1, exon 3); Exon extended; Intron retained; Alternative exons]
Fig. 27: Types of splicing
By this strategy, which is more common in higher eukaryotes, a gene can give rise to more than one
polypeptide product with partially overlapping sequences. It is estimated that 30% of the
genes in the human genome are spliced in alternative ways to generate more than one protein per
gene. Some examples of alternatively spliced pre-mRNAs are: troponin, tropomyosin, myosin,
actin, fibronectin, fibrinogen, nerve growth factor, aldolase, alcohol dehydrogenase, calcitonin,
SV40 T-antigen and the Drosophila sxl, tra and dsx pre-mRNAs for sex determination.
As these splicing events occur differently in different cell types, it is likely that cell type-specific
factors are responsible for activating or repressing the use of processing sites near to where they
bind. Thus, a role for SR (serine-arginine rich) proteins and hnRNPs in guiding alternative
splicing has been suggested.
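A toy Python sketch may help to show how a single primary transcript yields several mature mRNAs. The segment names and sequences below are hypothetical and far shorter than real exons and introns; only three of the splicing patterns listed above (normal, exon skipped, intron retained) are modelled, because the others need splice-site positions that the text does not specify.

```python
# Hypothetical primary transcript, written as ordered (name, sequence) segments.
PRE_MRNA = [
    ("exon1", "AUGGCU"),
    ("intron1", "GUAAGUCCAG"),
    ("exon2", "GAUCCG"),
    ("intron2", "GUGAGUUUAG"),
    ("exon3", "UUCUAA"),
]

# Which segments survive in each splicing pattern.
PATTERNS = {
    "normal": {"exon1", "exon2", "exon3"},
    "exon skipped": {"exon1", "exon3"},
    "intron retained": {"exon1", "intron1", "exon2", "exon3"},
}

def splice(pre_mrna, retained):
    """Join the retained segments in their original 5' to 3' order."""
    return "".join(seq for name, seq in pre_mrna if name in retained)

def all_isoforms(pre_mrna):
    """Return a dict mapping each splicing pattern to its mature mRNA."""
    return {label: splice(pre_mrna, keep) for label, keep in PATTERNS.items()}

if __name__ == "__main__":
    for label, mrna in all_isoforms(PRE_MRNA).items():
        print(f"{label:16s} {mrna}")
```

Each pattern keeps the segments in transcript order, so the isoforms share exon 1 and exon 3 but differ in what lies between them, which is the sense in which the products have partially overlapping sequences.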
(d) RNA editing: RNA editing is an unusual form of RNA processing in which the nucleotide
sequence of the primary transcript is altered, during or after transcription, by changing,
inserting or deleting residues at specific points along the molecule. As a result, the coding
sequence in the RNA differs from the sequence of the DNA from which it was transcribed, and the
protein produced upon translation is different from that predicted from the gene sequence. Like
alternative splicing, RNA editing is thus a method for increasing protein diversity.
There are two major mechanisms that mediate editing:

• Site-specific deamination: In mammalian cells, there are cases in which a substitution
occurs in an individual base in mRNA, causing a change in the sequence of the protein that
is coded. Examples include the apolipoprotein B mRNA in mammalian intestine and liver and
the glutamate receptor mRNAs in rat brain.
The mammalian genome contains a single (interrupted) apolipoprotein B gene whose sequence is
identical in all tissues, with a coding region of 4563 codons. In the liver, this gene is transcribed
into an mRNA that is translated into a protein of 512 kD, called apo B100, representing the full
coding sequence. A shorter form of the protein, called apo B48 (~250 kD), is synthesized in the
intestine. This protein consists of the N-terminal half of the full-length protein. It is translated
from an mRNA whose sequence is identical with that of liver except for a change (deamination
by cytidine deaminase) from C to U at codon 2153 in the 26th exon. This substitution changes the
codon CAA for glutamine into the ochre codon UAA for termination. The two proteins, though
translated from the same gene, have different functions. Apo B48, which is formed only in the small
intestine, functions in chylomicrons to transport triacylglycerols from the intestine to the liver. On
the other hand, apo B100, which is formed only in the liver, functions in VLDL, IDL and LDL to
transport cholesterol from the liver to peripheral tissues.
Another example is provided by glutamate receptors in rat brain. Editing at one position changes
a glutamine codon in DNA into a codon for arginine in RNA; the change affects the conductivity
of the channel and therefore has an important effect on controlling ion flow through the
receptor channel. At another position in the receptor, an arginine codon is converted to a glycine
codon.
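The apolipoprotein B example can be sketched in Python on a toy scale. The six-codon mRNA and the partial codon table below are hypothetical stand-ins (the real coding region has 4563 codons and the edit is at codon 2153); the sketch only shows how a single C to U change in a CAA codon creates UAA and truncates the product.

```python
# Partial codon table: only the codons used in the toy mRNA below.
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "CAA": "Q", "AAA": "K", "GAU": "D", "UAA": "*",
}

def deaminate_c_to_u(mrna, codon_number):
    """Change the first base of the given codon (1-based) from C to U."""
    pos = (codon_number - 1) * 3
    if mrna[pos] != "C":
        raise ValueError("the first base of this codon is not C")
    return mrna[:pos] + "U" + mrna[pos + 1:]

def translate(mrna):
    """Translate from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

if __name__ == "__main__":
    unedited = "AUGGCUCAAAAAGAUUAA"            # hypothetical "liver" mRNA
    edited = deaminate_c_to_u(unedited, 3)     # hypothetical "intestine" mRNA
    print(translate(unedited), translate(edited))
```

On the real mRNA the same arithmetic applies: a stop at codon 2153 leaves 2152 of the 4563 codons to be translated, which is a little under half of the coding region and is consistent with apo B48 being the N-terminal half of apo B100.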
• Guide RNA-directed (gRNA-directed) uridine insertion or deletion: In the mitochondria of
trypanosomes and Leishmania, more widespread changes occur in the transcripts of several
genes, in which bases (most usually uridines) are systematically added or deleted. Editing is so
extensive in trypanosomes that as many as half of the bases in an mRNA may be derived from it.
The editing reaction uses a template consisting of a guide RNA that is complementary to the
edited mRNA sequence. An enzyme complex including an endonuclease, terminal
uridylyltransferase and RNA ligase catalyzes the reaction, and free nucleotide (UTP) is used as
the source of the added residues. This type of editing is also called pan-editing.
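The idea that the guide RNA acts as a template for U insertion and deletion can be illustrated with a simplified Python sketch. Several assumptions are made here that are not in the text: the guide is treated as perfectly Watson-Crick complementary to the edited mRNA (real guide RNAs also use G-U pairs), the whole mRNA is covered by one guide, and the function names are hypothetical. The enzymatic steps (cleavage, U addition or removal, ligation) are collapsed into a single comparison.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the sequence that pairs with rna, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]

def edit_with_guide(pre_mrna, guide):
    """Insert or delete U residues so that the mRNA pairs fully with the guide."""
    target = reverse_complement(guide)      # the edited sequence the guide specifies
    edited = []
    i = 0                                   # position in the pre-edited mRNA
    for base in target:
        # Delete surplus U residues that the guide does not specify here.
        while i < len(pre_mrna) and pre_mrna[i] == "U" and base != "U":
            i += 1
        if i < len(pre_mrna) and pre_mrna[i] == base:
            i += 1                          # base already present in the transcript
        elif base != "U":
            raise ValueError("mRNA and guide differ at a non-U position")
        # Otherwise the guide specifies a U that is missing: it is inserted.
        edited.append(base)
    if pre_mrna[i:].strip("U"):
        raise ValueError("mRNA extends beyond the region covered by the guide")
    return "".join(edited)

if __name__ == "__main__":
    # Hypothetical pre-edited mRNA and guide RNA (both written 5' to 3').
    print(edit_with_guide("AGUUC", "GACAAU"))
```

Only U residues are ever added or removed; any other difference between transcript and guide raises an error, reflecting the fact that the non-U bases of the transcript are fixed by the gene.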
Besides the above-mentioned types, two other terms are associated with RNA editing:
• Insertional editing: This type of editing occurs with some RNAs, for example that of the
paramyxovirus P gene, which gives rise to at least two different proteins because of the
insertion of G residues at specific positions in the mRNA. Guide RNAs do not specify these
insertions; instead, the Gs are added by the RNA Pol as the mRNA is being synthesized.
• Polyadenylation editing: This type of editing is seen in many animal mitochondrial
mRNAs. Five of the mRNAs transcribed from the human mitochondrial genome end with
just a U or UA, rather than with one of the three termination codons. Polyadenylation
converts the terminal U or UA into UAAAA..., and so creates a termination codon. This is one of
several features that appear to have evolved in order to make the vertebrate mitochondrial
genome as small as possible.
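Polyadenylation editing is easy to check on a toy sequence. In the Python sketch below the two transcripts are hypothetical, the tail length of 20 is arbitrary, and only the UAA codon is searched for, since that is the codon the text describes being completed by the tail.

```python
STOP_CODON = "UAA"

def polyadenylate(transcript, tail_length=20):
    """Append a template-independent run of A residues to the 3' end."""
    return transcript + "A" * tail_length

def first_inframe_uaa(mrna):
    """Return the 0-based index of the first in-frame UAA, or None."""
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] == STOP_CODON:
            return i
    return None

if __name__ == "__main__":
    for transcript in ("AUGGCUGAUU", "AUGGCUGAUUA"):   # end in U and in UA
        before = first_inframe_uaa(transcript)
        after = first_inframe_uaa(polyadenylate(transcript))
        print(transcript, before, after)
```

In both toy transcripts the reading frame has no termination codon until the tail is added, after which the terminal U or UA is completed to UAA.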
(B) Post transcriptional processing of tRNA and rRNA (maturation of tRNA and rRNA)
tRNA and rRNA are not translated. They are synthesized as precursors that undergo processing, i.e.
cleavage by nucleases. In E. coli, three kinds of rRNA molecules and a tRNA
molecule are excised from a single primary RNA transcript that also contains spacer regions.
Other transcripts contain arrays of several kinds of tRNA or of several copies of the same tRNA.
Mature rRNAs and tRNAs are generated by cleavage and other modifications of nascent RNA
chains.
In eukaryotes, rRNA and tRNA molecules, in contrast with mRNAs and small RNAs that
participate in splicing, do not have caps. Because rRNAs and tRNAs are non-coding, chemical
modifications to their nucleotides affect only the structural features and possibly, catalytic
activities of the molecules.
Post transcriptional processing of tRNA

The rRNA operons of E. coli contain coding sequences for tRNAs. In addition, there are other
operons in E. coli that contain up to seven tRNA genes separated by spacer sequences. Mature
tRNA molecules are processed from precursor transcripts of both of these types of operon by
nucleases. The nucleases that cleave and trim the precursors of tRNA are highly precise. In
prokaryotes, the generation of mature tRNA involves cleavage of a 5' leader sequence, removal
of 3' terminal extra residues and chemical modifications of several bases and ribose units.
Similar to prokaryotes, eukaryotic pre-tRNA contains extra nucleotides at the 5' and 3' ends and
also modified bases. Besides, some eukaryotic pre-tRNAs and archaeal transcripts also contain
introns, which are different from pre-mRNA introns. Such introns are rare in bacteria. The
primary transcript forms a secondary structure with characteristic stems and loops, which allow
endonucleases to recognize and cleave off the 5' leader and the extra 3' nucleotides. Unlike in
prokaryotes, the 5'-CCA-3' at the 3' end of mature eukaryotic tRNAs is added by a separate
enzymatic reaction and is not encoded by the genes.
Various pre-tRNA processing events are summarized below:

(a) Removal of the 3' terminal residues: Once the primary transcript has folded, it has
characteristic stems and loops and extra nucleotides at the 3' end in both prokaryotes and
eukaryotes. The extra flanking nucleotides at the 3' end are cleaved by an endonuclease, RNase
E or F, at the base of the stem, so that the precursor tRNA still has extra nucleotides. The
exonuclease RNase D then removes the remaining extra nucleotides at the 3' end, one at a time.
(b) Cleavage of a 5'-leader sequence: The primary transcript of tRNA contains extra
nucleotides at the 5' end in both prokaryotes and eukaryotes. These nucleotides are cleaved by an
endonuclease, RNase P. This generates the correct 5' terminus of all tRNA molecules in E. coli.
This enzyme is a ribozyme containing a catalytically active RNA molecule, capable of catalyzing
a chemical reaction in the absence of protein. It is therefore a very simple ribonucleoprotein
(RNP). RNase P enzymes are found in both prokaryotes and eukaryotes, being located in the
nucleus of the latter. They are therefore small nuclear RNPs (snRNPs). In E. coli, the
endonuclease is composed of a 377 nucleotide RNA and a small basic protein of 13.7 kD. The in
vitro RNase P ribozyme reaction requires a higher Mg++ concentration than occurs in vivo, so the
protein component probably helps to catalyze the reaction in cells.
(c) Attachment of CCA at the 3' end: The enzyme tRNA nucleotidyl transferase then adds CCA at
the 3' end in eukaryotes. tRNA nucleotidyl transferase is an unusual enzyme that binds three
ribonucleoside triphosphate precursors in separate active sites and catalyzes the formation of
phosphodiester bonds to produce the 3' CCA sequence. Synthesis of this sequence is therefore not
directed by a DNA or RNA template; the binding sites of the enzyme act as the template. A major
difference between prokaryotes and eukaryotes is that, in the former, the 5'-CCA-3' at the 3' end
of the mature tRNAs is encoded by the genes. In eukaryotic nuclear-encoded tRNAs, this is not the
case.
(d) Chemical modifications of several bases and ribose units: Another processing event is
the modification of bases and ribose units of tRNAs in both prokaryotes and eukaryotes. Such
unusual bases are found in all tRNA molecules. They are formed by the enzymic modification of
a standard ribonucleotide in a tRNA precursor. Modifications include methylation, acetylation,
deamination, reduction, rearrangement and the attachment of isopentenyl or SH groups to bases.
Many of these modifications were first identified in tRNAs, within which approximately one in ten
nucleotides becomes altered. For example, uridylate residues are modified after transcription to
form ribothymidylate and pseudouridylate. These modifications generate diversity, allowing greater
structural and functional versatility. They are thought to mediate the recognition
of individual tRNAs by the enzymes that attach amino acids to these molecules and to increase
the range of the interactions that can occur between tRNAs and codons during translation,
enabling a single tRNA to recognize more than one codon.
Most of these modifications are carried out directly on an existing nucleotide within the
transcript, but two modified nucleotides, queuosine and wyosine, are put in place by cutting out an
entire nucleotide and replacing it with the modified version.
Different pre-tRNAs are processed in a similar way, but the base modifications are unique to
each particular tRNA type.
(e) Removal of introns: Some eukaryotic pre-tRNAs and archaeal transcripts also contain
introns. In eukaryotes and archaea, therefore, the next step in tRNA processing is the removal of
the intron, which occurs by endonucleolytic cleavage at each end of the intron followed by
ligation of the half molecules of tRNA. The introns of yeast pre-tRNA can be processed in
vertebrates and therefore the eukaryotic tRNA processing machinery seems to have been highly
conserved during evolution. Fig. 28 shows various processing events of pre-tRNA in E. coli.
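The end-processing steps (a) to (c) above can be strung together in a small Python sketch. It treats the pre-tRNA as a plain string and takes the lengths of the 5' leader and 3' trailer as given, which is an assumption of the sketch: in the cell the cut sites are recognized from the folded structure of the precursor. The function name, parameters and sequences are hypothetical, and base modification and intron removal are not modelled.

```python
def process_pre_trna(precursor, leader_len, trailer_len, endo_cut, cca_encoded):
    """Return the tRNA after 5' and 3' end processing.

    leader_len   5' leader removed in one endonucleolytic cut (RNase P)
    trailer_len  total number of extra 3' nucleotides
    endo_cut     number of 3' nucleotides removed by the endonuclease (RNase E / F)
    cca_encoded  True if CCA is gene-encoded (prokaryotes), False if it must
                 be added by tRNA nucleotidyl transferase (eukaryotes)
    """
    if not 0 <= endo_cut <= trailer_len:
        raise ValueError("endo_cut must lie between 0 and trailer_len")
    rna = precursor[leader_len:]                 # RNase P removes the 5' leader
    rna = rna[:len(rna) - endo_cut]              # endonuclease cut at the 3' side
    for _ in range(trailer_len - endo_cut):      # RNase D trims one at a time
        rna = rna[:-1]
    if not cca_encoded:
        rna += "CCA"                             # template-independent addition
    return rna

if __name__ == "__main__":
    body = "GCGGAUUUAGCUCAGUUGGG"                # hypothetical tRNA body
    eukaryotic = "AUAUG" + body + "UUUUU"
    prokaryotic = "AUAUG" + body + "CCA" + "UCAGU"
    print(process_pre_trna(eukaryotic, 5, 5, 3, cca_encoded=False))
    print(process_pre_trna(prokaryotic, 5, 5, 2, cca_encoded=True))
```

Both toy precursors end up with the same 3' CCA terminus, reached by different routes, which is the prokaryote and eukaryote difference noted in (c).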
Post transcriptional processing of rRNA

In the prokaryote, E. coli, there are seven different operons for rRNA that are dispersed
throughout the genome and which are called rrnH, rrnE, etc. Each operon contains one copy of
each of the 5S, the 16S and the 23S rRNA sequences (Fig. 29). Within each operon, coding
sequences for tRNA molecules are also present and these primary transcripts are processed to
give both rRNA and tRNA molecules. The initial transcript has a sedimentation coefficient of
30S (~6500 nucleotides) and is normally quite short-lived.
[Figure: cloverleaf pre-tRNA (D loop, anticodon loop, variable arm, T loop, 3' CCA) showing the
cleavage site of RNase P at the 5' end and those of the endonuclease (RNase E / F) and RNase D
at the 3' end]
Fig. 28: Pre-tRNA processing in E. coli
[Figure: order of coding regions in the transcript: 16S rRNA, 4S tRNA, 23S rRNA, 5S rRNA]
Fig. 29: Pre-rRNA transcript in prokaryotes (30S) (~6500 nt)
In many eukaryotes, the precursor rRNA contains one copy of the 18S coding region and one
copy each of the 5.8S and 28S coding regions, which together are the equivalent of the 23S
rRNA in prokaryotes (Fig. 30). The eukaryotic 5S rRNA is transcribed by RNA Pol III from
unlinked genes to give a 121-nucleotide transcript, which undergoes little or no processing.
The post-transcriptional processing of rRNA takes place in a defined series of steps:

(a) Modification of bases and ribose units: The primary rRNA transcript folds up into a
number of stem-loop structures by base pairing between complementary sequences in the
transcript. The formation of this secondary structure of stems and loops allows some proteins to
bind to form a ribonucleoprotein (RNP) complex. Many of these proteins remain attached to the
RNA and become part of the ribosome. The same modifications occur at the same positions on
all copies of an rRNA and these modified positions are, to a certain extent, the same in different
species. Functions for the modifications have not been identified, although most occur within
those parts of rRNAs thought to be most critical for the activity of these molecules in ribosomes.
Modified nucleotides might, for example, be involved in rRNA catalyzed reactions such as synthesis
of peptide bonds.
[Figure: order of coding regions in the transcript: 18S rRNA, 5.8S rRNA, 28S rRNA; the 5S rRNA
is transcribed separately]
Fig. 30: Pre-rRNA transcript in eukaryotes (45S) (~13000 nt)
After the binding of proteins, modifications such as base and sugar (usually adenosine)
methylations take place, using S-adenosyl methionine (SAM) as the methylating agent. Bacterial
rRNAs are less heavily modified than eukaryotic ones, and their modifications are carried out by
enzymes that directly recognize the sequences and / or structures of the regions of RNA containing
the nucleotides to be modified. In contrast, methylation in eukaryotes requires small nucleolar
RNPs (snoRNPs). The snoRNAs are 70-100 nucleotides in length and are located in the nucleolus.
They contain segments of 10-21 nucleotides that are precisely complementary to segments of mature
rRNAs containing 2'-O-methylation sites. These snoRNA segments are located between the conserved
sequence motifs known as box C (RUGAUGA) and box D (CUGA), which are respectively located on
the 5' and 3' sides of the complementary segments. The site of methylation in the rRNA is the
nucleotide paired with exactly the 5th snoRNA position upstream of box D. Methylation is mediated
by a complex of nucleolar proteins including a methyltransferase. For conversion of uridine to
pseudouridine, snoRNAs having the conserved motifs box H / ACA are involved. These snoRNAs
contain the sequence motif ACANNN at the 3' end and box H (conserved sequence ANANNA) at the
5' end. The conserved motifs of such snoRNAs form a specific base paired interaction with the
target site containing U, which is then recognized by the modifying enzyme.
The chemical modifications occurring during maturation of rRNA and tRNA are listed in Table
11.
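The rule that the methylated rRNA nucleotide pairs with the 5th snoRNA position upstream of box D can be turned into a small Python sketch. The sketch assumes, for simplicity, that the guide segment lies immediately upstream of box D, that its length is known (12 nucleotides here) and that pairing is strictly Watson-Crick; the sequences and the function name are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")
BOX_D = "CUGA"

def reverse_complement(rna):
    """Return the sequence that pairs with rna, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]

def predict_methylation_site(snorna, rrna, guide_len=12):
    """Return the 0-based rRNA position predicted to be 2'-O-methylated, or None."""
    box_d = snorna.rfind(BOX_D)               # box D lies towards the 3' end
    if box_d < guide_len:                     # also covers "box D not found" (-1)
        return None
    guide = snorna[box_d - guide_len:box_d]   # segment directly upstream of box D
    target = reverse_complement(guide)        # rRNA segment that pairs with it
    start = rrna.find(target)
    if start == -1:
        return None
    # The duplex is antiparallel: the snoRNA base next to box D pairs with the
    # first base of the rRNA target, so the 5th base upstream of box D pairs
    # with the 5th base of the target (offset 4).
    return start + 4

if __name__ == "__main__":
    guide = "ACGUACGGAUCA"                                      # hypothetical
    snorna = "GG" + "AUGAUGA" + "UU" + guide + BOX_D + "CC"     # box C = AUGAUGA
    rrna = "GGGCC" + reverse_complement(guide) + "AAGG"
    site = predict_methylation_site(snorna, rrna)
    print(site, rrna[site])
```

The offset of 4 is the only place where the "5th position" rule enters; everything else is ordinary complementarity between the guide segment and the rRNA.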
(b) Cleavage of precursor rRNA by nucleases: In E. coli, cleavage takes place in two steps. The
primary cleavage event, which is mainly carried out by RNase III, releases precursors of the 5S,
16S and 23S molecules. In the secondary cleavage step, the 5'- and 3'-ends of each of these
precursors are trimmed by RNases M5, M16 and M23, respectively, leading to release of the mature
rRNAs (Fig. 31).
For mammalian pre-rRNA, the 47S precursor (13500 nucleotides) undergoes a number of
cleavages, firstly in the external transcribed spacers (ETSs) 1 and 2. Cleavages in the internal
transcribed spacers (ITSs) then release the 20S pre-rRNA from the 32S pre-rRNA (Fig. 32). Both
of these precursors must be trimmed further and the 5.8S region must base pair to the 28S rRNA
before the mature molecules are produced. As with prokaryotic pre-rRNA, the precursor folds
and complexes with proteins as it is being transcribed. This takes place in the nucleolus.
Table 11: Examples of chemical modifications of nucleotides during rRNA and tRNA processing

S. No. | Modification | Details | Examples
1 | Methylation | Addition of one or more methyl (-CH3) groups to the base or sugar | Methylation of guanosine gives 1-methylguanosine; other examples are 1-methyladenosine, N2,N2-dimethylguanosine, N7-methylguanosine, 3-methylcytidine and ribothymidine
2 | Deamination | Removal of an amino (-NH2) group from the base | Deamination of adenosine gives inosine
3 | Sulphur substitution | Replacement of oxygen with sulphur | Formation of 4-thiouridine
4 | Base isomerization (Rearrangement) | Changing the positions of atoms in the ring component of the base | Isomerization of uridine gives pseudouridine
5 | Double bond saturation (Reduction) | Converting a double bond to a single bond | Double bond saturation converts uridine to dihydrouridine
6 | Acetylation | Addition of acetyl group to the base | N4-Acetylcytidine
7 | Nucleotide replacement | Replacement of an existing nucleotide with a new one | Incorporation of queuosine and wyosine
[Figure: the primary transcript (pre-16S rRNA, pre-tRNA, pre-23S rRNA, pre-5S rRNA, pre-tRNA) is
cut by RNases III, P, F and E to give precursors, which are then trimmed by RNases M16, M23 and
M5 to give the mature 16S, 23S and 5S rRNAs]
Fig. 31: Processing of E. coli primary rRNA transcript
[Figure: successive RNase cleavages convert the 47S pre-rRNA primary transcript (ETS1, pre-18S
rRNA, ITS1, pre-5.8S rRNA, ITS2, pre-28S rRNA, ETS2) into 45S and 41S pre-rRNAs, then into the
20S and 32S pre-rRNA precursors, and finally into the mature 18S, 5.8S and 28S rRNAs]
Fig. 32: Processing of mammalian primary rRNA transcript
(c) Removal of introns: Some eukaryotic and archaeal pre-rRNA transcripts, for example that of
Tetrahymena thermophila, contain an intron in the precursor for the largest rRNA. Such introns
in pre-rRNA are extremely rare in bacteria. These pre-rRNAs undergo an unusual form of
processing before they can function. The RNA folds into an enzymatically active form, or ribozyme,
and splices out the intron itself. Although this process occurs in vivo in the presence of protein,
it has been shown that the intron can actually excise itself in the test tube in the complete
absence of protein.
Inhibitors of transcription
There are two types of inhibitors of transcription:
• RNA Pol binding inhibitors
• DNA specific inhibitors
RNA Pol binding inhibitors

Inhibitors that inhibit RNA Pol by binding noncovalently to the enzyme are called RNA Pol
binding inhibitors. Examples include rifamycin, rifampicin, streptolydigin and α-amanitin (Fig. 33).
Two related antibiotics, rifamycin B, which is produced by Streptomyces mediterranei, and its
semisynthetic derivative rifampicin, specifically inhibit transcription by prokaryotic, but not
eukaryotic, RNA polymerases. This selectivity and their high potency (bacterial RNA Pol is 50%
inhibited by 2 x 10^-8 M rifamycin) have made them medically useful bactericidal agents against
Gram-positive bacteria and tuberculosis. Rifamycins inhibit neither the binding of RNA Pol to the
promoter nor the formation of the first phosphodiester bond, but they prevent further chain
elongation. The inactivated RNA Pol remains bound to the promoter, thereby blocking
initiation by uninhibited enzyme molecules.
[Figure: chemical structures of rifampicin, rifamycin B and α-amanitin]
Fig. 33: RNA Pol binding inhibitors
The poisonous mushroom Amanita phalloides contains a series of unusual bicyclic octapeptides
such as α-amanitin, which disrupts mRNA formation in animal cells by blocking Pol II and, at
higher concentrations, Pol III. Neither Pol I nor bacterial RNA Pol is sensitive to α-amanitin, nor
is the RNA Pol II of Amanita phalloides itself.
DNA specific inhibitors

Inhibitors that bind noncovalently to the DNA template are called DNA specific inhibitors. Examples
include actinomycin D, ethidium bromide, acridine, aflatoxin and 2-acetylaminofluorene (Fig. 34).
Actinomycin D, produced by Streptomyces antibioticus, is a DNA specific inhibitor. It binds
tightly and specifically to duplex DNA and in doing so, strongly inhibits both transcription and
DNA replication in both prokaryotes and eukaryotes, presumably by interfering with the passage
of RNA and DNA polymerases. At low concentration, it inhibits transcription without
significantly affecting DNA replication. It has no effect on the binding of RNA Pol to DNA.
Several other intercalating agents, including ethidium bromide and proflavin, also inhibit nucleic
acid synthesis, presumably by similar mechanisms. Acridine inhibits RNA synthesis in a fashion
similar to actinomycin D, i.e. by intercalation and deformation of DNA. Ethidium bromide is a
DNA specific dye, which intercalates between the base pairs of DNA and binds preferentially to
supercoiled DNA. Aflatoxin (Fig. 35), obtained from the fungus Aspergillus flavus, inhibits both
replication and transcription. 2-Acetylaminofluorene is a synthetic carcinogen and inhibits both
replication and transcription.
Reverse transcriptase (RT) (RNA directed DNA polymerase)

The existence of RT in RNA viruses was predicted by Howard Temin in 1962 and the enzymes
were ultimately detected by Temin and independently by David Baltimore in 1970.
• Source: Genes of all cellular organisms are made of DNA. The same is true for some viruses,
but for others the genetic material is RNA. Viruses are genetic elements enclosed in protein coats
that can move from one cell to another but are not capable of independent growth. One well-
studied example of an RNA virus is TMV, which infects the leaves of tobacco plants. This virus
consists of a single strand of RNA (6390 nucleotides) surrounded by a protein coat of 2130
identical subunits. RNA directed RNA polymerase catalyzes the replication of this viral RNA.
Another important class of RNA virus comprises the retroviruses, so called because the genetic
information flows from RNA to DNA rather than from DNA to RNA. This class includes HIV-1
as well as a number of RNA viruses that produce tumors in susceptible animals. Retrovirus
particles contain two copies of single stranded RNA molecule.
[Figure: structures of ethidium bromide, aflatoxin B1, acridine and Actinomycin D (a phenoxazone ring system carrying two cyclic peptides built from Thr, D-Val, Pro, sarcosine and methyl-Val)]
Fig. 34: DNA specific inhibitors
Reverse transcriptases have been isolated and purified from several different RNA tumor viruses;
they have molecular weights ranging from 70000 to 160000. The RNA viruses containing RTs
are known as retroviruses (retro is the Latin prefix for backward). RTs that closely resemble the
reverse transcriptases of some RNA tumor viruses have also been isolated from malignant cells of
some animals and from human patients with leukemia. RTs, however, have also been found in
cells of animals and people thought to be normal and not infected by tumor viruses; they have
also been found in wild type E. coli. Telomerase is also a specialized RT.
$ Reaction catalyzed and properties: On infection by a retrovirus, the single stranded
RNA viral genome (~10000 nucleotides) and the enzyme enter the host cell. The RT first
catalyzes the synthesis of a DNA strand complementary to the viral RNA, then degrades the RNA
strand of the viral RNA-DNA hybrid and replaces it with DNA. The resulting duplex DNA often
becomes incorporated into the genome of the eukaryotic host cell. These integrated (and dormant)
viral genes can be activated and transcribed, and the gene products, viz. the viral proteins and the
viral RNA genome itself, are packaged as new viruses (Scheme 4).
RNA
   ↓ RT (RNA dependent DNA Pol)
Formation of RNA-DNA hybrid
   ↓ RT (RNaseH)
Degradation of RNA strand
   ↓ RT (DNA dependent DNA Pol)
Synthesis of second strand of DNA
   ↓ Integrase
Integration into host genome
   ↓ Transcription and translation
Formation of viral proteins and viral RNA
   ↓
Packaging of new virus
Scheme 4: Flow diagram indicating role of reverse transcriptase in retroviral cycle
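The three RT-catalyzed steps of the scheme can be illustrated with a minimal Python sketch. It treats nucleic acids as plain strings written 5' to 3' and ignores primers, kinetics and integration. The function names and the toy sequence are hypothetical and are not taken from any source or library.

```python
# Toy model of reverse transcription; all names and the sequence are illustrative assumptions.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand(rna: str) -> str:
    """RNA dependent DNA synthesis: DNA complementary to the RNA, written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def second_strand(dna: str) -> str:
    """DNA dependent DNA synthesis: DNA complementary to the first strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna))


def reverse_transcribe(rna: str) -> tuple[str, str]:
    """Return (second strand, first strand) of the duplex DNA made from an RNA template."""
    minus = first_strand(rna)    # step 1: RNA-DNA hybrid is formed
    # step 2: RNaseH degrades the RNA strand (modelled simply by no longer using `rna`)
    plus = second_strand(minus)  # step 3: second DNA strand replaces the RNA
    return plus, minus


viral_rna = "AUGGCUUACG"  # made-up 10-nucleotide template
plus, minus = reverse_transcribe(viral_rna)
print("first strand :", minus)
print("second strand:", plus)
```

In this sketch the second strand has the same sequence as the RNA template with T in place of U, which is the sense in which the duplex DNA is a copy of the viral genome.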
Like many DNA and RNA Pols, RT contains Zn++. Each RT is most active with its own virus,
but each can be used experimentally to make DNA complementary to a variety of RNAs. The
DNA synthesis and RNA degradation activities use separate active sites on the protein. The
reverse transcriptases closely resemble the DNA directed DNA polymerases in that they make
DNA in the 5'→3' direction (the same polarity used by RNA polymerases), utilize
deoxyribonucleoside triphosphates as precursors and require both a template and a primer strand,
which must have a free 3'-OH terminus. RTs prefer an RNA template for nucleic acid synthesis;
they can also utilize DNA templates, but the latter are less effective than RNAs. The RTs are very
active on natural RNA templates, including the very large RNAs present in the viral particles. The
DNAs produced hybridize with their RNA templates. RTs, like RNA Pols, do not have 3'→5'
proofreading exonucleases. They generally have an error rate of about 1 per 20000 nucleotides
added. An error rate this high is extremely unusual in DNA replication and appears to be a feature
of most enzymes that replicate the genomes of RNA viruses. A consequence is a high mutation
rate and a faster rate of viral evolution, which is a factor in the frequent appearance of new strains
of disease causing retroviruses.
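To give a feel for what this error rate means for a genome of about 10000 nucleotides, the short Python sketch below estimates the errors introduced in one copying pass. It assumes, as a simplification not stated in the text, that errors occur independently and with equal probability at every position. The function names are hypothetical.

```python
# Back-of-the-envelope estimate; independence of errors is an assumption, not a claim of the text.
ERROR_RATE = 1 / 20000   # errors per nucleotide added (figure quoted in the text)
GENOME_LENGTH = 10000    # approximate retroviral genome length in nucleotides (from the text)


def expected_errors(length: int, rate: float = ERROR_RATE) -> float:
    """Mean number of misincorporations when `length` nucleotides are copied once."""
    return length * rate


def prob_at_least_one_error(length: int, rate: float = ERROR_RATE) -> float:
    """Probability that a single copy of `length` nucleotides contains one or more errors."""
    return 1 - (1 - rate) ** length


print(expected_errors(GENOME_LENGTH))
print(round(prob_at_least_one_error(GENOME_LENGTH), 2))
```

Under these assumptions a single copy of the genome carries 0.5 errors on average, and roughly 39% of copies contain at least one error.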
RT catalyzes three different reactions:

& RNA dependent DNA synthesis or Reverse transcriptase activity
& RNA degradation or RNaseH activity
& DNA dependent DNA synthesis or DNA dependent DNA polymerase activity
$ Functions and applications: The function of RTs in normal cells is not understood; their
presence suggests that the transcription of messages from RNA into DNA is a normal process,
e.g. in the synthesis of multiple copies of certain genes. The recognition of RTs has thus opened
some new avenues of research in biochemical genetics. RTs have become important reagents in
the study of DNA-RNA relationships and in DNA cloning techniques. They make possible the
synthesis of DNA complementary to an mRNA template; synthetic DNA prepared in this
manner, called complementary DNA (cDNA), can be used to clone cellular genes.