
XCS-602: Principles of Compiler Design

Assignment No 1
Report
Submitted by,
J Raffiq Ahmed Khan
III-CSE-A
Reg.no:113012012537
in partial fulfilment for the award of the degree of
Bachelor of Technology in Computer Science and Engineering

Introduction:
A compiler is a computer program (or a set of programs) that transforms source code written in a
programming language (the source language) into another computer language (the target language), with
the latter often having a binary form known as object code. The most common reason for converting
source code is to create an executable program.
The name "compiler" is primarily used for programs that translate source code from a high-level
programming language to a lower-level language (e.g., assembly language or machine code). If the
compiled program can run on a computer whose CPU or operating system is different from the one on
which the compiler runs, the compiler is known as a cross-compiler. More generally, compilers are a
specific type of translator.
A program that translates from a low-level language to a higher-level one is a decompiler. A program that
translates between high-level languages is usually called a source-to-source compiler or transpiler. A
language rewriter is usually a program that translates the form of expressions without a change of
language. The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used
to help create the lexer and parser.

Structure of Compiler:
Compilers bridge source programs in high-level languages with the underlying hardware. A compiler
verifies code syntax, generates efficient object code, performs run-time organization, and formats the
output according to assembler and linker conventions. A compiler consists of:
The front end: Verifies syntax and semantics, and generates an intermediate representation or IR of the
source code for processing by the middle-end. Performs type checking by collecting type information.
Generates errors and warnings, if any, in a useful way. Aspects of the front end include lexical analysis,
syntax analysis, and semantic analysis.
The middle end: Performs optimizations, including removal of useless or unreachable code, discovery
and propagation of constant values, relocation of computation to a less frequently executed place (e.g.,
out of a loop), or specialization of computation based on the context. Generates another IR for the
backend.
The back end: Generates the assembly code, performing register allocation in the process (assigning
processor registers to program variables where possible). Optimizes the target code's use of the hardware,
e.g., by figuring out how to keep parallel execution units busy and by filling delay slots. Although many
optimization problems are NP-hard, heuristic techniques for them are well developed.

Phases of Compiler:
Since writing a compiler is a nontrivial task, it is a good idea to structure the work. A typical way of
doing this is to split the compilation into several phases with well-defined interfaces. Conceptually, these
phases operate in sequence (though in practice, they are often interleaved), each phase (except the first)
taking the output from the previous phase as its input. It is common to let each phase be handled by a
separate module. Some of these modules are written by hand, while others may be generated from
specifications. Often, some of the modules can be shared between several compilers.

A common division into phases is described below. In some compilers the ordering of phases may differ
slightly, some phases may be combined or split into several phases, or extra phases may be inserted
between those mentioned below.
Lexical analysis: This is the initial part of reading and analysing the program text: The text is read and
divided into tokens, each of which corresponds to a symbol in the programming language, e.g., a variable
name, keyword or number.
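As a sketch, a lexical analyser can be built from a handful of regular expressions; the token names and patterns below are invented for illustration and do not correspond to any particular language:

```python
import re

# Illustrative token set: names and patterns are assumptions, not a real language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, lexeme) pairs, dropping whitespace."""
    # Note: finditer silently skips characters no pattern matches;
    # a real lexer would report them as errors.
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = 42 + y")))
# → [('NAME', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('NAME', 'y')]
```

Real lexer generators compile all the patterns into a single finite automaton, but the input/output behaviour is the same as this sketch.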
Syntax analysis: This phase takes the list of tokens produced by the lexical analysis and arranges these
in a tree-structure (called the syntax tree) that reflects the structure of the program. This phase is often
called parsing.
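A minimal sketch of parsing is a recursive-descent routine; this one handles only chains of "+" over (kind, lexeme) token pairs, building nested tuples as the syntax tree (all shapes here are illustrative assumptions):

```python
# Parses chains of "+" over (kind, lexeme) tokens into a left-leaning tree.
def parse_expr(tokens):
    pos = 0

    def atom():
        nonlocal pos
        token = tokens[pos]           # a (kind, lexeme) pair is a leaf node
        pos += 1
        return token

    tree = atom()
    while pos < len(tokens) and tokens[pos] == ("OP", "+"):
        pos += 1                      # consume the "+"
        tree = ("ADD", tree, atom())  # left-associative: (a + b) + c
    return tree

print(parse_expr([("NUMBER", "1"), ("OP", "+"), ("NAME", "x")]))
# → ('ADD', ('NUMBER', '1'), ('NAME', 'x'))
```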
Type checking: This phase analyses the syntax tree to determine if the program violates certain
consistency requirements, e.g., if a variable is used but not declared or if it is used in a context that does
not make sense given the type of the variable, such as trying to use a Boolean value as a function pointer.
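A toy version of such a check can walk a nested-tuple syntax tree such as ("ADD", left, right), with an environment mapping declared variables to their types; the type rules here are invented for illustration:

```python
# Toy type rules: numbers are "int"; names look up their declared type;
# "+" requires two ints. All rules are invented for illustration.
def check(tree, env):
    kind = tree[0]
    if kind == "NUMBER":
        return "int"
    if kind == "NAME":
        if tree[1] not in env:
            raise TypeError(f"variable {tree[1]!r} used but not declared")
        return env[tree[1]]
    if kind == "ADD":
        if (check(tree[1], env), check(tree[2], env)) != ("int", "int"):
            raise TypeError("'+' applied to non-int operands")
        return "int"
    raise TypeError(f"unknown node kind {kind!r}")

print(check(("ADD", ("NUMBER", "1"), ("NAME", "x")), {"x": "int"}))  # → int
```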
Intermediate code generation: The program is translated to a simple machine independent intermediate
language.
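Three-address code is a common choice of intermediate language; a sketch of generating it from a nested-tuple syntax tree such as ("ADD", left, right) might look like this (the tuple shapes and temporary-name scheme are assumptions):

```python
import itertools

# Each instruction is (target, "=", left, "+", right); temporaries t0, t1, ...
# hold intermediate results.
def gen_ir(tree, code, temps):
    kind = tree[0]
    if kind in ("NUMBER", "NAME"):
        return tree[1]                # leaves are already simple operands
    left = gen_ir(tree[1], code, temps)
    right = gen_ir(tree[2], code, temps)
    target = f"t{next(temps)}"
    code.append((target, "=", left, "+", right))
    return target

code = []
gen_ir(("ADD", ("ADD", ("NUMBER", "1"), ("NAME", "x")), ("NUMBER", "2")),
       code, itertools.count())
for instr in code:
    print(" ".join(instr))            # prints "t0 = 1 + x" then "t1 = t0 + 2"
```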
Register allocation: The symbolic variable names used in the intermediate code are translated to
numbers, each of which corresponds to a register in the target machine code.
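A deliberately naive sketch of this translation assigns each distinct name the next free register and simply gives up when registers run out (a real allocator would analyse live ranges and spill values to memory):

```python
# Naive scheme: constants become immediates, each distinct name gets the
# next free register, and there is no spilling. Purely illustrative.
def allocate(code, num_regs=4):
    regs = {}

    def reg(name):
        if name.isdigit():
            return f"#{name}"                     # constant: immediate operand
        if name not in regs:
            if len(regs) >= num_regs:
                raise RuntimeError("out of registers; a real allocator spills")
            regs[name] = f"R{len(regs)}"
        return regs[name]

    return [(reg(dst), "=", reg(a), "+", reg(b))
            for dst, _, a, _, b in code]

print(allocate([("t0", "=", "1", "+", "x"), ("t1", "=", "t0", "+", "2")]))
# → [('R0', '=', '#1', '+', 'R1'), ('R2', '=', 'R0', '+', '#2')]
```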
Machine code generation: The intermediate language is translated to assembly language (a textual
representation of machine code) for a specific machine architecture.
Assembly and linking: The assembly-language code is translated into binary representation and
addresses of variables, functions, etc., are determined.
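The link-editing step can be sketched as a two-pass process over toy "object files": first lay the files out and record where each defined symbol lands, then patch every reference with the resolved address. The object-file format here is entirely invented:

```python
# Each "object file" carries relocatable code (with None holes), the symbols
# it defines (name -> offset), and its unresolved references (offset, name).
def link(objects, base=0x1000):
    symbols = {}
    addr = base
    for obj in objects:                  # pass 1: lay out files, record symbols
        for name, offset in obj["defines"].items():
            symbols[name] = addr + offset
        addr += len(obj["code"])
    image = []
    for obj in objects:                  # pass 2: patch unresolved references
        code = list(obj["code"])
        for offset, name in obj["refs"]:
            code[offset] = symbols[name]
        image.extend(code)
    return image

objs = [
    {"code": [1, None], "defines": {"main": 0},   "refs": [(1, "helper")]},
    {"code": [2, 3],    "defines": {"helper": 0}, "refs": []},
]
print(link(objs))  # → [1, 4098, 2, 3]: "helper" resolved to 0x1002
```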

Figure 1: Phases of a Compiler

Interpreters
An interpreter is another way of implementing a programming language. Interpretation shares many
aspects with compiling: lexing, parsing and type checking are done in an interpreter just as in a
compiler. But instead of generating code from the syntax tree, the interpreter processes the tree directly
to evaluate expressions, execute statements, and so on. An interpreter may need to process the same
piece of the syntax tree (for example, the body of a loop) many times; hence, interpretation is
typically slower than executing a compiled program. But writing an interpreter is often simpler than
writing a compiler, and the interpreter is easier to port to a different machine, so interpreters are often
used for applications where speed is not of the essence.
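A sketch of such direct processing is a tree-walking evaluator over nested-tuple syntax trees such as ("ADD", left, right); the node shapes are illustrative assumptions:

```python
# A tree-walking interpreter sketch: the syntax tree is evaluated
# directly instead of being translated to code first.
def evaluate(tree, env):
    kind = tree[0]
    if kind == "NUMBER":
        return int(tree[1])
    if kind == "NAME":
        return env[tree[1]]            # look the variable up at run time
    if kind == "ADD":
        return evaluate(tree[1], env) + evaluate(tree[2], env)
    raise ValueError(f"unknown node kind {kind!r}")

print(evaluate(("ADD", ("NUMBER", "1"), ("NAME", "x")), {"x": 41}))  # → 42
```

Note that if this expression sat inside a loop body, `evaluate` would re-walk the same subtree on every iteration, which is exactly why interpretation tends to be slower than running compiled code.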
Compilation and interpretation may be combined to implement a programming language: The compiler
may produce intermediate-level code which is then interpreted rather than compiled to machine code. In
some systems, there may even be parts of a program that are compiled to machine code, some parts that
are compiled to intermediate code, which is interpreted at runtime, while other parts may be kept as a
syntax tree and interpreted directly. Each choice is a compromise between speed and space: Compiled
code tends to be bigger than intermediate code, which in turn tends to be bigger than a syntax tree, but
each step of translation improves running speed.

Cousins of Compilers:
I Pre-processors:
Pre-processors produce input to compilers. They may perform the following functions.
Macro Processing: A pre-processor may allow a user to define macros that are shorthands for longer
constructs.
File Inclusion: A pre-processor may include header files, such as <stdio.h>, into the program text.
Rational Pre-processors: These processors augment older languages with more modern flow-of-control
and data-structuring facilities.
Language extensions: These pre-processors attempt to add capabilities to the language by what amounts
to built-in macros.
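The macro-processing function can be sketched as repeated whole-word textual substitution; unlike real C macros, these toy macros take no arguments:

```python
import re

def expand(text, macros):
    """Repeatedly replace whole-word macro names with their definitions."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, macros)) + r")\b")
    prev = None
    while prev != text:                  # loop until no macro name remains
        prev, text = text, pattern.sub(lambda m: macros[m.group()], text)
    return text

print(expand("area = PI * r * r", {"PI": "3.14159"}))
# → area = 3.14159 * r * r
```

A real pre-processor additionally handles parameterized macros, conditional compilation, and recursion limits; this sketch only shows the core idea of expansion before compilation proper.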
II Assemblers:
Assembly code is a mnemonic version of machine code, in which names are used instead of binary codes
for operations and names are also given to memory addresses.
For example, the statement b = a + 2 might be written in assembly code as:
MOV a,R1
ADD #2,R1
MOV R1,b
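A toy assembler for such code might map mnemonics, registers, and names to numbers as below; the opcode values and the addresses assigned to a and b are invented purely for illustration:

```python
# Invented encoding: [opcode, operand, operand]. A real assembler also
# handles labels, addressing modes, and relocation information.
OPCODES = {"MOV": 0x01, "ADD": 0x02}
ADDRESSES = {"a": 0x10, "b": 0x14}   # assume these were assigned to the names

def assemble(line):
    op, operands = line.split(None, 1)
    encoded = [OPCODES[op]]
    for field in operands.split(","):
        field = field.strip()
        if field.startswith("R"):
            encoded.append(int(field[1:]))        # register number
        elif field.startswith("#"):
            encoded.append(int(field[1:]))        # immediate constant
        else:
            encoded.append(ADDRESSES[field])      # memory address of a name
    return encoded

print([assemble(l) for l in ("MOV a,R1", "ADD #2,R1", "MOV R1,b")])
# → [[1, 16, 1], [2, 2, 1], [1, 1, 20]]
```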
III Loaders and Link-Editors:
Usually, a program called a loader performs the two functions of loading and link-editing. Loading
consists of taking relocatable machine code, altering the relocatable addresses, and placing the altered
instructions and data in memory at the proper locations. The link-editor allows us to make a single
program from several files of relocatable machine code.

Compiler Construction Tools


Lex and Yacc:
The classic Unix tools for compiler construction.

Lex is a "tokenizer," helping to generate programs whose control flow is directed by instances of regular
expressions in the input stream. It is often used to segment input in preparation for further parsing (as
with Yacc).
Yacc provides a more general parsing tool for describing the input to a computer program. The Yacc user
specifies the grammar of the input along with code to be invoked as each structure in that grammar is
recognized. Yacc turns that specification into a subroutine to process the input.
If you are writing a compiler, that "process" involves generating code that is then assembled into the
object code. Alternatively, if you are writing an interpreter, the "code to be invoked" will be code
controlling the flow of the user's application.
Lemon
A LALR(1) parser generator that claims to be faster and easier to program than Bison or Yacc.
GCC - RTL Representation
Most of the work of the compiler is done on an intermediate representation called register transfer
language. In this language, the instructions to be output are described, pretty much one by one, in an
algebraic form that describes what the instruction does.
People frequently have the idea of using RTL stored as text in a file as an interface between a language
front end and the bulk of GNU CC. This idea is not feasible. GNU CC was designed to use RTL
internally only. Correct RTL for a given program is very dependent on the particular target machine, and
the RTL does not contain all the information about the program.


References:

Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques,
& Tools, Second Edition, Pearson.

https://en.wikipedia.org/wiki/Compiler
