
Bottom Up Parsing

November 2016

Bottom Up Parsing

Bottom up parsing algorithms will begin with an empty stack. One or more input symbols are moved onto the stack, and these are then replaced by nonterminals according to the grammar rules. When all the input symbols have been read, the algorithm terminates with the starting nonterminal alone on the stack, if the input string is acceptable.

Shift Reduce Parsing

Bottom up parsing involves two fundamental operations. The process of moving an input symbol to the stack is called a shift operation, and the process of replacing symbols on the top of the stack with a nonterminal is called a reduce operation.
For the following grammar, a derivation tree for the string caabaab could look like:

The shift reduce parser will proceed as follows:
1) Each step will be either a shift (shift an input symbol to the stack) or a reduce (reduce symbols on the stack to a nonterminal).
2) The sequence of steps is pictured horizontally to show, more clearly, the shifting of input characters onto the stack and the sentential forms corresponding to this parse.

The algorithm accepts the input if the stack can be reduced to the starting nonterminal when all of the input string has been read.
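The following minimal Python sketch illustrates this shift-reduce loop. It is an illustration only, not the slides' algorithm; the grammar rules are an assumption, chosen so that they generate the example string caabaab, since the slides' actual rules are not reproduced here. The sketch reduces whenever the top of the stack matches the right side of a rule and shifts otherwise, which works for conflict-free examples like this one; the conflicts discussed later show where this naive strategy breaks down.

    # Naive shift-reduce sketch (illustrative only).
    # Assumed grammar, chosen so that it derives "caabaab":
    #   1. S -> S a B     2. S -> c     3. B -> a b
    RULES = [("S", ["S", "a", "B"]),
             ("S", ["c"]),
             ("B", ["a", "b"])]

    def shift_reduce(tokens, rules=RULES, start="S"):
        stack, pos = [], 0
        while True:
            # Reduce: if the top of the stack matches a rule's right side,
            # replace those symbols with the rule's nonterminal.
            for lhs, rhs in rules:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    break
            else:
                # No reduction applies: shift the next input symbol, or stop.
                if pos < len(tokens):
                    stack.append(tokens[pos])
                    pos += 1
                else:
                    break
        return stack == [start]   # accept iff only the start symbol remains

    print(shift_reduce(list("caabaab")))   # True
    print(shift_reduce(list("ab")))        # False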

Shift reduce parsing will always correspond to a right-most derivation, traced in reverse.

Whenever a reduce operation is performed, the symbols being reduced are always on top of the stack. The string of symbols being reduced is called a handle. It is imperative in bottom up parsing that the algorithm be able to find a handle whenever possible.

If the parser for a particular grammar can be implemented with a shift reduce algorithm, we say the grammar is LR. The L indicates we are reading input from the left, and the R indicates we are finding a right-most derivation. The shift reduce parsing algorithm always performs a reduce operation when the top of the stack corresponds to the right side of a rule.

However, if the grammar is not LR, there may be instances where this is not the correct operation, or there may be instances where it is not clear which reduce operation should be performed. For instance, when parsing the input string aaab using the grammar below:

We reach a point where it appears that we have a handle on top of the stack (the terminal a), but reducing that handle does not lead to a correct bottom up parse. This is called a shift/reduce conflict, because the parser does not know whether to shift an input symbol or reduce the handle on the stack. This means that the grammar is not LR, and we must either rewrite the grammar or use a different parsing algorithm.
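The slides' grammar for this example is not reproduced above, but a hypothetical grammar with the same kind of problem (an assumption, not the slides' example) makes the conflict concrete:

    Assumed grammar:   1. S -> a b     2. S -> a
    Input:             a b

    Stack: a     the terminal a matches the right side of rule 2
        - reduce by rule 2 gives S, after which the b can never be attached
        - shifting b gives a b, which reduces by rule 1 to S (correct)

With only a on the stack, the simple shift reduce algorithm cannot tell whether to shift or to reduce, so this is a shift/reduce conflict.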

Another problem in shift reduce parsing occurs when it is clear that a reduce operation should be performed, but there is more than one grammar rule whose right hand side matches the top of the stack, and it is not clear which rule should be used. This is called a reduce/reduce conflict.

For instance, for the following grammar, an attempt to parse the input string aa with the shift reduce algorithm will result in a reduce/reduce conflict. We encounter the conflict when the handle a is on the stack, because we don't know whether to reduce using rule 2 (correct) or rule 3 (incorrect).
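Again the slides' grammar is not reproduced above; a hypothetical grammar with the same shape of conflict (an assumption) is:

    Assumed grammar:   1. S -> A B     2. A -> a     3. B -> a
    Input:             a a

    Stack: a     the handle a matches the right side of both rule 2 and rule 3
        - reduce by rule 2 (A) is correct:  a a => A a => A B => S
        - reduce by rule 3 (B) gives B a, which can never be reduced to S

Two rules match the same handle, so this is a reduce/reduce conflict.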

In the examples presented, it is possible to avoid conflicts by looking ahead at additional input characters. An LR algorithm that looks ahead k input symbols is called LR(k).
Read about: LR parsing with tables

Code Generation

Code Generation

The primary objective of the code generator is to convert atoms or syntax trees to instructions. In the process, it is also necessary to handle register allocation for machines that have several general purpose CPU registers. Label atoms must be converted to memory addresses. For some languages, the compiler has to check data types and call the appropriate type conversion routines if the programmer has mixed data types in an expression or assignment.

Code Generation

Many designers view the construction of compilers as made up of two logical parts: the front end and the back end. The front end consists of lexical and syntax analysis and is machine-independent. The back end consists of code generation and optimization and is very machine-dependent.

Converting Atoms to Instructions

Each atom class would result in a different instruction or sequence of instructions. If the CPU of the target machine requires that all arithmetic be done in registers, then an example of a translation of an ADD atom would be as shown below.
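The slides' figure is not reproduced here. A plausible sketch, assuming a register machine with LOD/ADD/STO-style instructions and an atom written as (class, left operand, right operand, result), would be:

    Atom:          (ADD, a, b, T1)        meaning  T1 = a + b

    Instructions:  LOD  r1, a             load the first operand into a register
                   ADD  r1, b             add the second operand
                   STO  r1, T1            store the result into the temporary T1

The mnemonics, register name, and atom layout are assumptions for illustration, not the slides' exact notation.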

Single Pass Vs. Multiple Passes

There are several different ways of approaching the design of the code generation phase. The difference between these approaches is generally characterized by the number of passes which are made over the input file. A code generator which scans a file of atoms once is called a single pass code generator. A code generator which scans it more than once is called a multiple pass code generator.

Code Optimization

Code Optimization

Optimization is the process of improving generated code so as to reduce its potential running time and/or reduce the space required to store it in memory. Software designers are often faced with decisions which involve a space-time tradeoff. However, many optimization techniques are capable of improving the object program in both time and space.

Optimization techniques can be separated into two general classes: local and global. Local optimization techniques are normally concerned with transformations on small sections of code (involving only a few instructions) and generally operate on the machine language code produced by the code generator. Global optimization techniques are generally concerned with larger blocks of code, or even multiple blocks or modules, and will be applied to the intermediate form, atom strings, or syntax trees put out by the parser.

Local and Global Optimization

Both local and global optimization phases are optional. The output of the parser is the input to the global optimization phase. The output of the global optimization phase is the input to the code generator. The output of the code generator is the input to the local optimization phase, and the output of the local optimization phase is the final output of the compiler.

Local and Global Optimization

A fundamental question of philosophy is inevitable in the design of the optimization phases: should the compiler make extensive transformations and improvements to the source program, or should it respect the programmer's decision to do things that are inefficient or unnecessary? Most compilers tend to assume that the average programmer does not intentionally write inefficient code, and will perform the optimizing transformations.

Global Optimization

For the following C++ statement, the sequence of atoms put out by the parser could be:
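The slides' statement and atom listing are not reproduced above. A statement consistent with the discussion that follows would be a = (b + c) * (b + c); for which a parser might emit something like the following (an assumed listing, not the slides' figure):

    (ADD, b,  c,  T1)       T1 = b + c
    (ADD, b,  c,  T2)       T2 = b + c        the same sum, computed again
    (MUL, T1, T2, T3)       T3 = T1 * T2
    (MOV, T3,   , a)        a  = T3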

Global Optimization

In the above example, it is clearly not necessary to evaluate the sum b + c twice. In addition, the MOV atom is not necessary, because the MUL atom could store its result directly into the variable a.
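Under the same assumed example, the globally optimized output could be reduced to two atoms:

    (ADD, b,  c,  T1)       T1 = b + c        the common subexpression, computed once
    (MUL, T1, T1, a)        a  = T1 * T1      the result stored directly into a, no MOV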

The atom sequence shown is equivalent to the one given before, but requires only two atoms, because it makes use of common subexpressions and it stores the result in the variable a rather than a temporary location.

Local Optimization

If the compiler is given the following expression, the parser would translate the expression into the following stream of atoms:
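The slides' expression is not reproduced here. An expression with the right shape for the discussion that follows would be c * d + b, for which the parser might emit the following atoms (an assumed listing):

    (MUL, c,  d,  T1)       T1 = c * d
    (ADD, T1, b,  T2)       T2 = T1 + b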

Local Optimization

The simplest code generator would generate three instructions corresponding to each atom:
Load the first operand into a register (LOD),
Perform the operation, and
Store the result back to memory (STO).
The code generator would then produce the following instructions from the atoms:
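Continuing the assumed example, generating three instructions for each of the two atoms gives six instructions:

    LOD  r1, c         load the first operand of the MUL atom
    MUL  r1, d         perform the operation
    STO  r1, T1        store the result into T1
    LOD  r1, T1        load T1 straight back again
    ADD  r1, b         perform the operation
    STO  r1, T2        store the result into T2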

Local Optimization

Notice that the third and fourth instructions in this sequence are entirely unnecessary, since the value being stored and loaded is already at its destination. It is optimized to the following sequence of four instructions by eliminating the intermediate Load and Store instructions:
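For the assumed example, the locally optimized sequence would be:

    LOD  r1, c
    MUL  r1, d
    ADD  r1, b         the intermediate STO/LOD pair on T1 has been removed
    STO  r1, T2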

End
