
Code Generation Ⅱ



A Simple Code Generator
• One of the primary issues: deciding how to use registers to best advantage.
• Four principal uses:
– In most machine architectures, some or all of the operands of an operation must be in registers in order to perform the operation.
– Registers make good temporaries to hold the result of a subexpression or a variable that is used only within a single basic block.
– Registers are used to hold (global) values that are computed in one basic block and used in
other blocks.
– Registers are often used to help with run-time storage management.



A Simple Code Generator
• Assumptions of the code-generation algorithm in this section:
– Some set of registers is available to hold the values that are used within the block.
– The basic block has already been transformed into a preferred sequence of three-address instructions.
– For each operator, there is exactly one machine instruction that takes the necessary operands in registers and performs that operation, leaving the result in a register.



Register and Address Descriptors
• Descriptors are needed to decide when variables must be loaded and stored.
• Register descriptor
– For each available register
– Keeping track of the variable names whose current value is in that register.
– Initially, all register descriptors are empty
• Address descriptor
– For each program variable
– Keeping track of the location(s) where the current value of that variable can be found.
– Stored in the symbol-table entry for that variable name.
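
To make this bookkeeping concrete, here is a minimal Python sketch of the two descriptors; the class and method names are illustrative, not part of the slides.

    class Descriptors:
        """Register and address descriptors for one basic block (illustrative)."""

        def __init__(self, registers):
            # Register descriptor: register -> set of variable names whose
            # current value is held in that register (initially empty).
            self.reg = {r: set() for r in registers}
            # Address descriptor: variable -> set of locations (its own memory
            # location and/or registers) where its current value can be found.
            self.addr = {}

        def locations_of(self, var):
            # By default a variable's current value is in its memory location.
            return self.addr.setdefault(var, {var})

        def var_in_register(self, var, r):
            return var in self.reg[r]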



The Code-Generation Algorithm
• Function getReg(I)
– Selecting registers for each memory location associated with the three-address instruction I.
• Machine Instructions for Operations
– For a three-address instruction such as x = y + z, do the following:
1. Use getReg(x = y + z) to select registers for x, y, and z. Call these Rx, Ry, and Rz.
2. If y is not in Ry (according to the register descriptor for Ry), then issue an instruction LD Ry, y', where y' is one of the memory locations for y (according to the address descriptor for y).
3. Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z.
4. Issue the instruction ADD Rx, Ry, Rz.
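
As a rough illustration only (not the slides' own code), a Python sketch of these four steps for x = y + z; getReg and the Descriptors sketch above are assumed, and emit collects instruction strings.

    def gen_add(x, y, z, d, get_reg, emit):
        """Emit target code for x = y + z (simplified sketch)."""
        rx, ry, rz = get_reg(x, y, z)              # step 1: choose registers
        if not d.var_in_register(y, ry):           # step 2: load y if needed
            emit(f"LD {ry}, {sorted(d.locations_of(y))[0]}")
        if not d.var_in_register(z, rz):           # step 3: load z if needed
            emit(f"LD {rz}, {sorted(d.locations_of(z))[0]}")
        emit(f"ADD {rx}, {ry}, {rz}")              # step 4: the operation itself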



The Code-Generation Algorithm
• Machine Instructions for Copy Statements
– For x = y, getReg will always choose the same register for both x and y.
– If y is not already in that register Ry, generate the instruction LD Ry, y.
– If y was already in Ry, do nothing.
– In either case, adjust the register descriptor for Ry so that it includes x as one of the values it holds.

• Ending the Basic Block
– If x is live on exit from the block, generate the instruction ST x, R, where R is a register in which x's value exists at the end of the block.



The Code-Generation Algorithm
• Managing Register and Address Descriptors
1. For the instruction LD R, x:
– (a) Change the register descriptor for register R so it holds only x.
– (b) Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R, change the address descriptor for x to include its own memory location.
3. For an operation such as ADD Rx, Ry, Rz implementing x = y + z:
– (a) Change the register descriptor for Rx so that it holds only x.
– (b) Change the address descriptor for x so that its only location is Rx.
– Note that the memory location for x is no longer in the address descriptor for x.
– (c) Remove Rx from the address descriptor of any variable other than x.
4. When processing a copy statement x = y, after generating the load of y into register Ry (if needed) and after managing descriptors as for all load statements (per rule 1):
– (a) Add x to the register descriptor for Ry.
– (b) Change the address descriptor for x so that its only location is Ry.
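
A compact Python sketch of these four descriptor-update rules, building on the hypothetical Descriptors class from the earlier sketch.

    def on_load(d, r, x):
        # Rule 1: LD r, x -- r now holds only x; r becomes an extra location for x.
        d.reg[r] = {x}
        d.locations_of(x).add(r)

    def on_store(d, x, r):
        # Rule 2: ST x, r -- x's own memory location now also holds its value.
        d.locations_of(x).add(x)

    def on_op(d, rx, x):
        # Rule 3: OP Rx, Ry, Rz for x = y + z.
        d.reg[rx] = {x}                      # (a) Rx holds only x
        d.addr[x] = {rx}                     # (b) Rx is x's only location
        for v, locs in d.addr.items():       # (c) no other variable is in Rx
            if v != x:
                locs.discard(rx)

    def on_copy(d, ry, x):
        # Rule 4: after x = y, Ry also holds x, and Ry is x's only location.
        d.reg[ry].add(x)
        d.addr[x] = {ry}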



Design of the Function getReg
• Pick a register Ry for y in x = y + z
– 1. If y is currently in a register, pick that register.
– 2. If y is not in a register but there is an empty register, pick an empty register.
– 3. If y is not in a register and there is no empty register:
• Let R be a candidate register, and suppose v is one of the variables in the register descriptor for R.
• We need to make sure that v's value either is not needed, or that there is somewhere else we can go to get the value of v (see the sketch after this list):
(a) OK if the address descriptor for v says that v is available somewhere besides R.
(b) OK if v is x, the value being computed by the instruction, and x is not also one of the other operands (z in this example).
(c) OK if v is not used later.
(d) Otherwise, generate the store instruction ST v, R to place a copy of v in its own memory location. This operation is called a spill.
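
Below is a simplified Python sketch of this selection logic for one operand; the helpers (is_live_after, emit) and the scoring are illustrative assumptions, not the slides' definition of getReg.

    def pick_register_for(y, x, z, d, registers, is_live_after, emit):
        """Choose a register Ry for operand y of x = y + z (simplified sketch)."""
        # Case 1: y already sits in some register -> reuse it.
        for r in registers:
            if y in d.reg[r]:
                return r
        # Case 2: some register is empty -> take it.
        for r in registers:
            if not d.reg[r]:
                return r
        # Case 3: all registers are occupied.  A variable v held in candidate r
        # is "safe" if (a) its value is also somewhere else, (b) it is x itself
        # and x is not the other operand z, or (c) it is not live afterwards.
        def needs_spill(v, r):
            if d.locations_of(v) - {r}:
                return False
            if v == x and x != z:
                return False
            if not is_live_after(v):
                return False
            return True
        # Pick the register whose occupants need the fewest spill stores (d).
        r_best = min(registers, key=lambda r: sum(needs_spill(v, r) for v in d.reg[r]))
        for v in list(d.reg[r_best]):
            if needs_spill(v, r_best):
                emit(f"ST {v}, {r_best}")      # spill: copy v back to its memory slot
                d.locations_of(v).add(v)
        return r_best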



Design of the Function getReg
• Pick a register Rx for x in x = y + z
– Almost the same as for y, with these differences:
1. Since a new value of x is being computed, a register that holds only x is always an acceptable choice for Rx.
2. If y is not used after the instruction, and Ry holds only y after being loaded, then Ry can be reused as Rx. A similar option holds regarding z and Rz.



Test yourself
• Exercise 8.6.1
• Exercise 8.6.3



Peephole Optimization
• The peephole is a small, sliding window on a program.
• Peephole optimization is done by examining a sliding window of target instructions and replacing instruction sequences within the peephole by a shorter or faster sequence, whenever possible.
• Peephole optimization can be applied directly after intermediate code generation to improve the intermediate representation.



Eliminating Redundant Loads and Stores
• A redundant load or store can be deleted: in a sequence where a value is loaded from a memory location and immediately stored back to it (or stored and immediately reloaded), the second instruction is redundant, because the register and the memory location already hold the same value.
• Exception:
– The second instruction has a label; it could then be the target of a jump, so we cannot be sure the first instruction always executes immediately before it.
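
For illustration (this is not from the slides), a small Python sketch of a peephole pass that drops the redundant second instruction of such a pair, assuming instructions are strings like "LD R0, a" or "ST a, R0" and that jump targets are written with a leading "label:".

    def drop_redundant_load_store(code):
        """Remove the second instruction of adjacent LD/ST pairs on the same
        register and address, unless that instruction is a jump target."""
        def parse(instr):
            # "L1: LD R0, a" -> (label or None, opcode, [operands])
            label, _, rest = instr.rpartition(":")
            op, _, args = rest.strip().partition(" ")
            return (label or None, op, [a.strip() for a in args.split(",")])

        out = []
        for instr in code:
            if out:
                _, pop, pargs = parse(out[-1])
                label, op, args = parse(instr)
                same_pair = (
                    label is None                      # exception: labeled instruction
                    and {pop, op} == {"LD", "ST"}
                    and set(pargs) == set(args)        # same register and address
                )
                if same_pair:
                    continue                           # second instruction is redundant
            out.append(instr)
        return out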



Eliminating Unreachable Code
• An unlabeled instruction immediately following an unconditional jump may be removed.
• This operation can be repeated to eliminate a sequence of instructions.



Flow-of-Control Optimizations
• Unnecessary jumps can be eliminated in either the intermediate code or the target code by peephole optimizations.

Example: a jump to L1, where L1 itself is goto L2, can be re-targeted directly to L2. Suppose there is only one jump to L1; then the statement L1: goto L2 becomes unreachable and can be removed.



Algebraic Simplification and Reduction in Strength
• Algebraic identities can be used to eliminate three-address statements
– e.g., x = x + 0 and x = x * 1 can be eliminated.

• Reduction-in-strength transformations can be applied to replace expensive operations by cheaper ones (a small sketch follows):
– x² via power(x, 2) can be replaced by x * x.
– Fixed-point multiplication or division by a power of two can be replaced by a shift.
– Floating-point division by a constant can be approximated as multiplication by a constant.
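
For illustration only, a tiny Python sketch of such rewrites on three-address statements represented as (dst, op, arg1, arg2) tuples; the representation is an assumption, not the slides'.

    def simplify(stmt):
        """Apply algebraic identities / strength reduction to one statement.
        Returns None if the statement can be eliminated entirely."""
        dst, op, a, b = stmt
        if op == "+" and b == 0:                                   # x = x + 0
            return None if dst == a else (dst, "copy", a, None)
        if op == "*" and b == 1:                                   # x = x * 1
            return None if dst == a else (dst, "copy", a, None)
        if op == "**" and b == 2:                                  # power(x, 2) -> x * x
            return (dst, "*", a, a)
        if op == "*" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
            return (dst, "<<", a, b.bit_length() - 1)              # multiply by 2^k -> shift
        return stmt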



Use of Machine Idioms
• The target machine may have hardware instructions to implement certain specific operations efficiently.
• Using these instructions can reduce execution time significantly.
• Example:
– some machines have auto-increment and auto-decrement addressing modes.
– The use of the modes greatly improves the quality of code when pushing or popping a stack
as in parameter passing.
– These modes can also be used in code for statements like x = x + 1 .



Register Allocation and Assignment
• Efficient utilization of registers is vitally important in generating good code.
• This section presents various strategies for deciding, at each point in a program:
– what values should reside in registers (register allocation), and
– in which register each value should reside (register assignment).



Global Register Allocation
• A natural approach to global register assignment is to try to keep a frequently used value in a fixed register throughout a loop.
• One strategy for global register allocation is to assign some fixed number of registers to hold the most active values in each inner loop.



Usage Counts
• Benefit of keeping a variable x in a register for the duration of a loop L:
– save one unit of cost for each use of x in a block, and
– save two units if we can avoid a store of x at the end of a block.

• An approximate formula for the benefit to be realized from allocating a register to x within loop L is

    sum over blocks B in L of ( use(x, B) + 2 * live(x, B) )

where use(x, B) is the number of times x is used in B prior to any definition of x, live(x, B) is 1 if x is live on exit from B and is assigned a value in B, and live(x, B) is 0 otherwise.
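
A minimal Python sketch of this benefit computation, assuming the per-block use and live information has already been computed (the dictionaries below are hypothetical inputs).

    def register_benefit(x, blocks_in_loop, use, live):
        """Approximate savings from keeping x in a register throughout the loop.

        use[(x, B)]  : number of uses of x in block B before any definition of x
        live[(x, B)] : 1 if x is live on exit from B and assigned in B, else 0
        """
        return sum(use[(x, B)] + 2 * live[(x, B)] for B in blocks_in_loop)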



Test yourself
• Usage counts of the variables



Discuss: Register Assignment for Outer Loops



Register Allocation by Graph Coloring
• Graph coloring is a systematic technique for allocating registers and managing register spills.
• Two steps:
(1) Target-machine instructions are selected as though there are an infinite number of symbolic registers;
(2) Construct the register-interference graph, and color it using k colors, where k is the number of assignable registers.

• Note that deciding whether a graph is k-colorable is NP-complete.



Register Allocation by Graph Coloring
• Heuristic technique:
– Suppose a node n in a graph G has fewer than k neighbors.
– Remove n and its edges from G to obtain a graph G' .
– A k-coloring of G' can be extended to a k-coloring of G by assigning n a color not assigned to any of its neighbors.
– By repeatedly eliminating nodes having fewer than k edges from G,
– 1) either we obtain the empty graph, in which case a k-coloring for G can be produced by re-inserting the nodes in reverse order and giving each a color not used by its neighbors,
– 2) or we obtain a graph in which each node has k or more adjacent nodes; then this heuristic cannot find a k-coloring. (At this point a node is spilled by introducing code to store and reload its register.)
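
A rough Python sketch of this simplify-and-color heuristic, operating on an interference graph given as an adjacency dictionary; spill handling is only signalled, not implemented.

    def color_graph(adj, k):
        """Try to k-color an interference graph {node: set(neighbors)}.

        spilled lists nodes chosen as spill candidates when no node of degree
        < k was available; a node colored None must actually be spilled."""
        graph = {n: set(nbrs) for n, nbrs in adj.items()}
        stack, spilled = [], []
        while graph:
            node = next((n for n in graph if len(graph[n]) < k), None)
            if node is None:                           # no low-degree node left
                node = max(graph, key=lambda n: len(graph[n]))
                spilled.append(node)
            stack.append(node)
            for nbr in graph.pop(node):                # remove node and its edges
                graph[nbr].discard(node)
        coloring = {}
        for node in reversed(stack):                   # re-insert in reverse order
            used = {coloring[nbr] for nbr in adj[node] if nbr in coloring}
            free = [c for c in range(k) if c not in used]
            coloring[node] = free[0] if free else None
        return coloring, spilled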



Instruction Selection by Tree Rewriting
• Instruction selection
– selecting target-language instructions to implement the operators in the intermediate
representation
– a large combinatorial task, especially for CISC machines

• In this section, we treat instruction selection as a tree-rewriting problem.



Tree-Translation Schemes
• Example:
– a tree for the assignment statement a[i] = b + 1, where the array a is stored on the run-time stack and the variable b is a global in memory location Mb.

– The ind operator treats its argument as a memory address.



Tree-Translation Schemes
• The target code is generated by applying a sequence of tree-rewriting rules to reduce the input tree to a single node.
• Each tree-rewriting rule has the form

    replacement <- template { action }

where replacement is a single node, template is a tree, and action is a code fragment, as in a syntax-directed translation scheme.

• Example:





Code Generation by Tiling an Input Tree
• What if we use the tree-translation scheme above on the tree



Code Generation by Tiling an Input Tree
• To implement the tree-reduction process, we must address some issues related to tree-pattern matching:
– How is tree-pattern matching to be done?
– What do we do if more than one template matches at a given time?



Pattern Matching by Parsing
• Uses an LR parser to do the pattern matching
• The input tree can be treated as a string by using its prefix
representation.
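
To make the idea concrete, a small Python sketch that produces the prefix representation of an expression tree by a preorder walk; the tuple-based tree format is an assumption for illustration.

    def prefix(tree):
        """Prefix (preorder) string of a tree given as (label, child, child, ...)."""
        if not isinstance(tree, tuple):        # a leaf such as an operand name
            return str(tree)
        label, *children = tree
        return " ".join([str(label)] + [prefix(c) for c in children])

    # Example: + (- a b) (* e (+ c d))  ->  "+ - a b * e + c d"
    print(prefix(("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))))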



Pattern Matching by Parsing
• The tree-translation scheme can be converted into a syntax-directed translation scheme.



Routines for Semantic Checking
• Restrictions on attribute values
– Generic templates can be used to represent classes of instructions and the semantic actions
can then be used to pick instructions for specific cases.

• Parsing-action conflicts can be resolved by disambiguating predicates that can allow different selection strategies to be used in different contexts.



General Tree Matching
• The LR-parsing approach to pattern matching based on prefix representations favors the left operand of a binary operator.
• Postfix representation
– An LR-parsing approach to pattern matching based on postfix representations would favor the right operand.
• Hand-written code generator
– An ad-hoc matcher can be written.
• Code-generator generator
– A general tree-matching algorithm is needed.
– An efficient top-down algorithm can be developed by extending string pattern-matching techniques.



Test yourself
• Exercise 8.9.1 b)
• Exercise 8.9.2



Optimal Code Generation for Expressions
• Objective: generate optimal code for an expression tree when there is a fixed number of registers with which to evaluate the expression.



Ershov Numbers
• Rules for assigning a number to each node of an expression tree:
– The number tells how many registers are needed to evaluate that node without storing any temporaries.

1. Label any leaf 1.
2. The label of an interior node with one child is the label of its child.
3. The label of an interior node with two children is
(a) the larger of the labels of its children, if those labels are different;
(b) one plus the label of its children if the labels are the same.
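
A short Python sketch of these labeling rules, using the same illustrative tuple-based tree format as the earlier sketch.

    def ershov(tree):
        """Ershov number of a tree given as (op, child, ...) or a leaf."""
        if not isinstance(tree, tuple):
            return 1                                   # rule 1: leaves get label 1
        _, *children = tree
        labels = [ershov(c) for c in children]
        if len(labels) == 1:
            return labels[0]                           # rule 2: one child
        l, r = labels
        return max(l, r) if l != r else l + 1          # rule 3: two children

    # (a - b) + e * (c + d)  ->  3
    print(ershov(("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))))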



Ershov Numbers
• Example: (a - b) + e * (c + d)
– The leaves are labeled 1; a - b, c + d, and e * (c + d) are labeled 2; the root is labeled 3.



Generating Code From Labeled Expression Trees
• Algorithm: Generating code from a labeled expression tree.
– INPUT: A labeled tree with each operand appearing once (no common subexpressions).
– OUTPUT: An optimal sequence of machine instructions to evaluate the root into a register.
– METHOD: Start at the root of the tree. If the algorithm is applied to a node with label k, then only k registers will be used. However, there is a "base" b >= 1 for the registers used, so that the actual registers used are Rb, Rb+1, ..., Rb+k-1.

1. To generate machine code for an interior node with label k and two children with equal labels (each must be k - 1), do the following:
(a) Recursively generate code for the right child, using base b + 1. The result of the right child appears in register Rb+k-1.
(b) Recursively generate code for the left child, using base b; the result appears in Rb+k-2.
(c) Generate the instruction OP Rb+k-1, Rb+k-2, Rb+k-1, where OP is the appropriate operation for the interior node in question.



Generating Code From Labeled Expression Trees
• Algorithm : Generating code from a labeled expression tree. (cont.)
2. Suppose we have an interior node with label k and children with unequal labels. Then one of the children has label k, and the other has label m < k. Do the following, using base b:
(a) Recursively generate code for the big child, using base b; the result appears in register Rb+k-1.
(b) Recursively generate code for the small child, using base b; the result appears in register Rb+m-1. Note that since m < k, neither Rb+k-1 nor any higher-numbered register is used.
(c) Generate the instruction OP Rb+k-1, Rb+m-1, Rb+k-1 or the instruction OP Rb+k-1, Rb+k-1, Rb+m-1, depending on whether the big child is the right or left child, respectively.

3. For a leaf representing operand x, if the base is b, generate the instruction LD Rb, x.
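
Putting the three cases together, here is a hedged Python sketch of the whole algorithm; it reuses the illustrative tuple tree format and the ershov function from the earlier sketches and emits instructions as strings.

    OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

    def gen_from_labels(tree, b, emit):
        """Generate code for a labeled tree with base b; returns the result register number."""
        if not isinstance(tree, tuple):
            emit(f"LD R{b}, {tree}")                      # rule 3: leaf
            return b
        op, left, right = tree
        kl, kr = ershov(left), ershov(right)
        k = kl + 1 if kl == kr else max(kl, kr)           # label of this node
        if kl == kr:                                      # rule 1: equal labels
            gen_from_labels(right, b + 1, emit)           # right -> R(b+k-1)
            gen_from_labels(left, b, emit)                # left  -> R(b+k-2)
            emit(f"{OPS[op]} R{b + k - 1}, R{b + k - 2}, R{b + k - 1}")
        else:                                             # rule 2: unequal labels
            big, small, m = (left, right, kr) if kl > kr else (right, left, kl)
            gen_from_labels(big, b, emit)                 # big   -> R(b+k-1)
            gen_from_labels(small, b, emit)               # small -> R(b+m-1)
            if big is right:
                emit(f"{OPS[op]} R{b + k - 1}, R{b + m - 1}, R{b + k - 1}")
            else:
                emit(f"{OPS[op]} R{b + k - 1}, R{b + k - 1}, R{b + m - 1}")
        return b + k - 1

    # Example: code for (a - b) + e * (c + d) with base 1
    code = []
    gen_from_labels(("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d"))), 1, code.append)
    # code == ["LD R3, d", "LD R2, c", "ADD R3, R2, R3", "LD R2, e",
    #          "MUL R3, R2, R3", "LD R2, b", "LD R1, a", "SUB R2, R1, R2",
    #          "ADD R3, R2, R3"]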



Example

(a - b) + e * (c + d)
Code for t2:

Complete sequence of instructions:
The label of the root is 3, the result will appear in R3, and only R1, R2, and R3 will be used.
The base for the root is b = 1.
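
The figure with the instruction sequence did not survive extraction; applying the algorithm (for example with the sketch above) to (a - b) + e * (c + d) with base b = 1 gives the following sequence, which should correspond to it:

    LD  R3, d
    LD  R2, c
    ADD R3, R2, R3
    LD  R2, e
    MUL R3, R2, R3
    LD  R2, b
    LD  R1, a
    SUB R2, R1, R2
    ADD R3, R2, R3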



Evaluating Expressions with an Insufficient Supply
of Registers
• Algorithm: Generating code from a labeled expression tree.
INPUT: A labeled tree and a number of registers r >= 2.
OUTPUT: An optimal sequence of machine instructions, using no more than r registers.
METHOD: Start at the root of the tree, with base b = 1. For a node N with label r or less, the algorithm is exactly the same as the above algorithm. For an interior node N labeled k > r:
1. N has at least one child with label r or greater. Pick the larger child to be the "big" child and let the other child be the "little" child.
2. For the big child, use base b = 1. The result of this evaluation will appear in register Rr.
3. Generate the machine instruction ST tk, Rr, where tk is a temporary variable.
4. For the little child: If it has label r or greater, pick base b = 1. If its label is j < r, pick base b = r - j. Then recursively apply this algorithm to the little child; the result appears in Rr.
5. Generate the instruction LD Rr-1, tk.
6. If the big child is the right child of N, then generate the instruction OP Rr, Rr, Rr-1. If the big child is the left child, generate OP Rr, Rr-1, Rr.



Example

For t3, using the original algorithm, the output is:

Final output:

We then need both registers for the left child of the root, so we need to generate a store instruction for t3 first.
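
The instruction listings on this slide are figures that did not survive extraction; assuming r = 2 registers and the instruction forms used earlier, working the algorithm by hand gives this plausible reconstruction:

Code for t3 = e * (c + d), by the original algorithm with base 1:
    LD  R2, d
    LD  R1, c
    ADD R2, R1, R2
    LD  R1, e
    MUL R2, R1, R2
Store it, evaluate the left child, reload, and combine:
    ST  t3, R2
    LD  R2, b
    LD  R1, a
    SUB R2, R1, R2
    LD  R1, t3
    ADD R2, R2, R1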



Test yourself
• Exercise 8.10.1 a)
• Exercise 8.10.3
• Exercise 8.10.2



Dynamic Programming Code-Generation
• The dynamic programming algorithm can be used to generate code for any machine with r interchangeable registers and load, store, and add instructions.



Contiguous Evaluation
• The dynamic programming algorithm partitions the problem of generating optimal code for an expression into the subproblems of generating optimal code for the subexpressions of the given expression.

• Contiguous evaluation:
– Complete the evaluations of T1 and T2, then evaluate the root.

• Noncontiguous evaluation:
– First evaluate part of T1, leaving the value in a register; next evaluate T2; then return to evaluate the rest of T1.
• The dynamic programming algorithm uses contiguous evaluation.
Contiguous Evaluation
• For the register machine in this section, we can prove that given any machine-language program P to evaluate an expression tree T, we can find an equivalent program P' such that:

1. P' is of no higher cost than P,
2. P' uses no more registers than P, and
3. P' evaluates the tree contiguously.

• This implies that every expression tree can be evaluated optimally by a contiguous program.



The Dynamic Programming Algorithm
• The dynamic programming algorithm proceeds in three phases (suppose the target machine has r registers):
1. Compute bottom-up for each node n of the expression tree T an array C of costs, in which
the ith component C[i] is the optimal cost of computing the subtree S rooted at n into a
register, assuming i registers are available for the computation, for 1<=i<=r.
2. Traverse T, using the cost vectors to determine which subtrees of T must be computed into
memory.
3. Traverse each tree using the cost vectors and associated instructions to generate the final
target code. The code for the subtrees computed into memory locations is generated first.



Example

(a - b) + c * (d / e)
Final output:

