Professional Documents
Culture Documents
The isochore organization and the compositional distribution of homologous coding sequences in
the nuclear genome of plants
ABSTRACT
I R L Press
5273
METHODS
DNAs.
Ethiolated
5274
5275
aestivum; Z. mays.
RESULTS
cC
Buoyant density
g/cm3
4
2
Actin
2
0
m, *
Adh
Cab
c2
Chs
GAPDH
O 0G~~~~~Histones
f
E2
6 ~~
A
~Ru bisco
40
50
60
GC ,.%
70
80
0
2
Ad h
0
Cob
4)
c
4)
0)
L
-o
E
z
4
2
0
2
0
2
0
2
0
2
0
8
6
4
2
0
r-
Chs
nF l-
-GAPDH
His tones
R u b sc o
40
A
:~~~~~~~~~~~~~~~~~~~~~~60
50
70e,%
GC, %
Tyha
5278
4
2
t0
U,,O
2
0 o
E04
2
0
8
6
4
2
0
GC ,%
Compositional distribution
5279
A c t in
Ad h
C~~~
,0 50
2
E 0
L
position
4
2
Cab
mnn*
GA PD H
z2
Phy tochrome
6
o30 n, l,50
b
~~~~
|
g
~~~~Ru
2 2
30
40
50
C60/7
80
sco
90 100
usage
CoTearison
of
GC
levels
of
exons
and
introns
with
GC
levels
of
Phe TST
TIC
Zs
As
Cp
Phyt. Phyt. Actl
GM
Actl
Zm
Ps
Adhl
AdhI
CAPDIO
SO.
CAPDH
1.14
0.86
0.91
1.09
0.50
1.50
0.72
1.28
0.43
1.57
0.57
1.43
0.20
1.24
1.86
0.30
0.50
0.30
1.80
0.90
1.00
2.75
1.00
0.30
1.S0
1.43
1.72
3.00
1.66
2.70
0.75
1.20
2.04
0.99
1.50
0.82
0.72
U.24
0.73
2.20
0.13
1.24
0.73
0.76
1.27
1.15
1.39
1.85
0.46
1.15
CTG
1.21
1.38
2.05
1.14
0.29
0.62
0.52
Ile ATT
ATC
ATA
1.07
1.34
0.59
1.48
0.87
0.65
1.50
1.37
0.13
Val GTT
GTC
CTA
GTG
1.40
0.77
0.72
1.11
1.43
0.64
0.84
1.09
0.39
0.64
2.06
0.52
0.39
1.03
0.92
0.92
0. 31
1.85
Ser TCT
TCC
TCA
1.28
1.21
0.94
0.40
1.38
0.48
0.64
1.28
1.28
0.86
0.35
0.35
Leu TTA
TTG
CTT
0.34
1.11
1.60
CTC
CTA
0.82
0.92
TCG
1.56
0.52
1.65
1.S5
0.41
1.42
1.55
0.43
0.83
0.20
2.18
Zs
13
1.29
0.86
0.38
1.71 3.00
m
2.14
2.62
0.50
0.57
1.99
1.99
1.14
0 .49
1. 72
2.14
0.86
0.43
4.15
0.86
J.08
0.16
2.75
1.00
2.00
3.00
1.46
0. 36
0.36
1.82
1.89
1. 20
0 .40
2.81
1.47
1.00
2.50
1.00
3.14
1.11
1.11
.67
1.11
0.29
1.33
1.33
U.:3
0 3
2.50
0.50
2.00
1.33
2.00
1.00
2.00
0.24
2.44
1.00
0 .40
2. 18
0.67
2.00
1.00
1.56
0.80
1.82
0.50
1.75
0.25
1.00
1.36
1.09
1.06
1.76
1.14
1.43
1.14
0.29
2.33
2.21
0.95
0.63
0.32
2.25
1.90
0.21
1.47
0.42
1.20
0.60
0.80
1.40
1.20
0.40
1.60
0.80
1.33
0.67
0.67
1.33
0.44
2.22
1.67
1.82
1.80
1.80
0.20
0.20
1.40
1.00
0.8U
1.09
2.00
0.73
0.18
1.64
1.46
0.54
O.J6
1.41
0.94
0.82
0.82
2.34
0.28
1.10
0.28
1.93
1.79
0.28
2.52
1.04
0.44
1.58
1.33
0.82
3.00
2.40
2.40
1.09
1.09
0.67
2.67
1.33
0.67
1.33
1.33
4.00
2.40
1.20
2.00
0.80
4.00
1.24
0.76
1.52
0.48
0.67
1.29
0.71
0.40
1.60
0.33
1.67
0.67
0.50
1.33
1.S0
2.00
1.00
1.00
0.80
1.62
0.50
1.18
0.12
1.00
1.00
2.00
2.00
2.00
1.43
2.00
1.00
1.00
2.00
2.00
2.00
0.67
1.33
2.00
0.75
1.25
1.56
0.75
0.44
0.91
1.09
1.33
1.20
0.67
1.25
1.09
0.40
1.60
0.57
1.43
2.00
Asn AAT
AAC
0.93
1.50
0.50
0.22
1.78
1.43
0.44
1.56
1.07
0.93
0.46
1.07
1.54
1.60
Lys AAA
8.50
1.50
0.86
1.14
0.21
1.79
0.42
1.58
0.38
1.62
0.92
1.08
0.29
1.71
0.32
1.68
2.00
1.29
0.71
1.37
0.63
1.39
1.36
0.64
0.77
1.23
1.68
0.40
0.S4
0.68
0.61
1.46
1.12
0.93
1.07
0.27
1.73
0.24
1.76
2.00
0.55
1.43
0.64
1.36
Cys TGT
1.13
0.87
1.40
0.60
0.80
1.20
1.33
0.67
0.62
1.38
1.64
0.46
Arg CGT
CGC
CGA
CGG
0.68
1.11
0.22
0.89
0.34
1.33
0.34
0.34
0.34
4.24
2.57
0.92
0.46
0.92
0.35
0.86
Set ACT
AGC
1.10
1.38
0.76
0.64
2.36
0.43
1.93
0.35
0.29
2.12
1.72
2.00
2.00
2.00
2.00
1.75
0.21
1.90
1.26
0.63
1.47
0.63
2.77
0.92
1.23
0.21
1.68
0.77
0.33
3.33
1.23
2.04
1.96
0.57
1.6U
0.92
0.92
2.00
2.00
2.00
0.67
1.33
2.00
2.00
2.00
0.40
1.60
2.00
1.14
0.86
2.00
0.91
1.09
0.33
2.00
2.00
1.08
2.15
0.92
0.92
0.57
2.00
0.40
0.91
2.00
2.00
2.00
1.09
2.00
0.29
1.71
2.00
1.09
0.91
1.29
2.00
0.20
1.80
2.00
0.29
1.71
0.20
1.80
2.00
0.10
1.90
0.38
1.62
0.50o
1.50o
1.S0
0.50
2.00
1.33
0.67
2.00
0.91
1.09
2.00
1.14
0.86
2.10
0.83
1.17
0.13
2.00
1.14
0.86
2.00
2.00
0.57
1.43
2.00
0.67
1.33
2.00
0.45
1.55
2.00
2.00
2.00
_
_m
2.00
2.00
1.00
1.50
0.35
4.59
1.76
0.35
4.00
1.99
1.33
1.09
2.OU
0.71
_ _
1.09
1.09
1.87
2.00
2.00
0.40
2.00
2.57
2.40
8
0.80
0.35
2.50
0.24
2.70
2.67
1.33
0.44
1.56
0.57
0.80
0.80
2.22
2.28
1.14
1.09
2.18
0.30
2.40
0 .30
0 .93
1.72
0.73
2.40
1.78
0.57
0.80
1.14
0.86
2.28
1.71
1.50
2.33
1.29
0.71
0.99
1.01
1.05
0.42
2.53
2.28
1.52
0.48
Glu GAA
GAG
4.00
1.20
1.60
0.20
1.80
0.73
1.27
0.47
1.53
1.00
0.57
1.14
1.29
0.71
0.97
3.00
1.80
1.20
0.60
1.00
0.40
His CAT
CAC
2.00
3.00
1.60
Gln CAA
CAG
0.91
1.28
1.20
0.55
0.49
1 .30
1.78
0.65
1.30
1.85
1.33
Tyr TAT
TAC
3rd
0.77
1.23
2.67
1.16
0.42
G+CS
2.00
0.65
0.58
1.39
2.00
0.44
0.29
2.15
M1
Chs
2.16
1.89
0.75
1.21
AGG
Zm
Chs
0.45
1.55
1.89
1.52
0.96
1.52
Gly GGT
Rub.
1.67
1.67
1.67
0.17
8.50
1.58
Gm
Rub.
1.89
0.76
1.84
Arg AGA
Cm
1.94
1.75
1.10
2.08
0.43
2.57
Ala GCT
GCC
GCA
GCG
8.56
0.91
0.75
1.50
0.38
0.11
3.23
1.00
0.91
TGC
3.00
2.00
0.95
1.33
1.33
CAC
0.23
0.46
0.68
2.32
0.73
Asp GAT
1.50
0.75
4.50
1.72
0.48
1.52
0.28
AAG
2.00
1.06
0.94
4.00
1.30
1.02
1.40
0.28
1.33
Ph
Cab
1.50
1.99
1.00
0.50
Tht ACT
ACC
ACA
ACC
1.09
1.27
Cab
2.87
0.29
0.29
0.26
1.14
0.38
Zs
114
1.00
1.00
1.60
0.57
CCC
CCA
CCG
At
114
1.00
1.00
2.0U
2.24
0.49
0.98
0.29
Pro CCT
Z2
At
HJ
0.40
0 11
0.64
1.90
Zs
0.26
3.65
2.63
130
1.13
0.75
_
2.25
_
1.50
1.14
1.55
1.72
0. 86
0.67
0. 67
0.67
_
0. 78
0.75
0. 75
2.86
0.34
0. 80
0 .78
0.78
5.25
_
4.00
0.75
0786
2.66
0.63
1.27
3.00
0.67
1.20
3.60
1.76
2.12
1.20
1.20
1.99
0.57
3.43
0.47
2.59
0.24
0.78
1.65
0.24
2.12
0.12
3.12
0 12
0.62
1.88
0.38
1. 50
0. 25
3.27
0.36
0. 36
_
_
1.00
0.44
1.99
1.45
1.99
1.67
0.71
0.43
0.71
2.14
0.92
3.27
1.60
1.07
0.53
0.80
1.62
0.50
1.62
1.84
1.87
1.47
0.40
0.27
0.28
0.26
0.38
0.97
0.54
8.86
0.32
0.86
0.97
45.9
54.0
48.6
68.6
51.1
61.5
51.9
69.4
50.1
64.6
50.8
69.3
53.7
41.7
68.7
56.0
93.4
50.4
94.2
53.9
95.6
45.7
97.0
61.4
97.7
62.1
GGC
0.64
GGA
GGG
1.12
1.01
0.64
1.39
0.75
0.96
1.Su
47.1
41.9
51.4
47.0
54.9
46.8
37.7
59.6
42.9
66.6
2.77
1.09
2.50
1.99
0.71
1.86
0.57
2.86
0.57
1.99
1.14
1.43
1.43
1.29
1.16
positions.
5281
20
5
I ntrons
80
60_
20
50
40
30
Flanking sequences,GCY%
60
position
Ist codon
80
0~~*0~
60
O00
40
20
position
2nd codon
80
60
0
0
00
v40
20
3rd codon
80
posi t ion
60
_
0
40
4
20
30
40
60
50
Flanking sequences,GC/.
Co2Eositional
distributions of DNA
fraqments
monocots
(1).
When third codon positions of homologous genes are
examined, the GC range covered in the case of dicots was 36-64%,
which again, expectedly, corresponds to the center of
distribution of third codon positions of all available sequences
for dicots (1). In the case of Gramineae, the range was 46-100%;
this covered both the low and the high GC range of all third
codon positions. Expectedly again, the lowest and the highest GC
values corresponded to the coding sequences which were lowest
and highest in GC, respectively.
Second codon positions showed GC levels which were very
close in dicots and Gramineae. The latter exhibited, however,
slightly higher values in two cases and a slightly lower value
in one case.
Finally, the case of first codon positions is similar to
that of third codon positions, but values from Gramineae were
not as much higher than dicot values; in one case, that of the
actin gene, the value was even slightly lower.
A conclusion to be drawn from the data of Figs. 2-5 is that
the differences in GC levels previously found in coding
sequences from dicots and Gramineae (1) are also found in the
set of homologous coding sequences under consideration (and in
their codon positions). In other words, the differences were not
due to the gene samples investigated, but were real differences,
5285
direction.
Codon_usaqe
A detailed statistical analysis of codon usage patterns in
plant genes is being carried out and will be presented elsewhere
in due time. Some conclusions can, however, be already drawn on
the basis of the results obtained in this work.
First of all, as expected, all coding sequences exhibiting
very high GC levels in codon third positions are characterized
by the absence of a very large number (23 to 31) of codons and
by very low levels (RSCU < 0.30) of some additional codons. This
applies to all coding sequences higher than 90t GC in third
codon position. The effect is, however, very evident also in the
80-90% GC range and perhaps already above 70% GC (this cannot be
decided, however, since only one value in the 70-80% GC range is
available). One should expect a similar phenomenon of codon
absence also for very low GC values in third positions. Indeed,
the only GC value in third codon position lower than 30% (18.4%,
in the lignin-forming peroxidase gene from tobacco) shows an
important level of absent or strongly underrepresented codons.
Coding sequences lower than 70% GC in third codon positions
showed a different extent of codon avoidance which was, however,
markedly different for different genes. Indeed, as shown by the
data of Table 1, some genes with 50-60% GC in third codon
position may show a strong codon avoidance (this is the case of
the genes for histone H3, histone H4 and rubisco from dicots),
5286
5288
5289
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
and Bernardi G.
J., Matassi G., Montero L.M.
(1988) Nucl. Acids Res. 16:4269-42852.
Bernardi G, Olofsson B., Filipski J., Zerial M., Salinas J.,
(1985)
and Rodier F.
Cuny G, Meunier-Rotival M.
Science 228:953-958.
Gouy M., Milleret F., Mugnier C., Jacobzone M. and Gautier C.
(1984) Nucl. Acids Res. 12:121-127.
Sharp P.M., Tuohy T.M.F. and Mosurcki K.R. (1986)
Nucl. Acids Res. 14:5125-5143.
Ingle J., Pearson G.G. and Sinclair J. (1973) Nature
242:193-197.
Shapiro, H.S. (1976). In: Fasman, G.D. (ed.) Handbook of
Biochemistry and Molecular Biology, 3rd edn, vol. II,
pp. 241-281, CRC Press Inc.
Perrin P. and Bernardi G. (1987) J. Mol. Evol. 26:301-310.
Bernardi G., Mouchiroud D., Gautier C. and Bernardi G. (1988)
J. Mol. Evol. 28: 7-18.
Niesbach-Klosgen, U., Barzen, E., Bernhardt, J., Rohde, W.,
Schwarz-Sommer, Zs., Reif, H.J., Wienand, U. and Saedler, H.
(1987) J. Mol. Evol. 26, 213-225.
Ikemura, T. (1985) Mol. Biol. Evol. 2:13-34.
Murray E.E., Lotzer J. and Eberle M. (1989) Nucl. Acids
Res. 17:477-498.
5290