
CS 434

Compilers Design
Dr. Ayman Hamarsheh

Lecture 1

Introduction: Programs, Interpreters and Translators

Programming languages are notations for describing computations to people and to machines. All the software running on all the computers was written in some programming language. Before a program can be run, it must first be translated into a form in which it can be executed by a computer. The software systems that do this translation are called compilers.

A compiler is a program that can read a program in one language, the source language, and translate it into an equivalent program in another language, the target language. An important role of the compiler is to report any errors in the source program that it detects during the translation process. If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.

An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user. The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs. An interpreter, however, can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.
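The difference between the two kinds of language processor can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple mini-language, the stack-machine instruction names (PUSH, LOAD, ADD, MUL) and the function names are all hypothetical and do not come from the lecture.

```python
# Hypothetical mini-language: expressions are nested tuples such as
# ("add", ("var", "x"), ("num", 2)). None of these names come from the lecture.

def interpret(node, env):
    """Interpreter: directly execute the operations in the source tree."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "add":
        return interpret(node[1], env) + interpret(node[2], env)
    if kind == "mul":
        return interpret(node[1], env) * interpret(node[2], env)
    raise ValueError(f"unknown node kind: {kind!r}")


def compile_expr(node):
    """Compiler: translate the tree once into stack-machine instructions."""
    kind = node[0]
    if kind == "num":
        return [("PUSH", node[1])]
    if kind == "var":
        return [("LOAD", node[1])]
    if kind in ("add", "mul"):
        return compile_expr(node[1]) + compile_expr(node[2]) + [(kind.upper(), None)]
    raise ValueError(f"unknown node kind: {kind!r}")


def run(code, env):
    """Execute the target program produced by compile_expr on the inputs in env."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        else:  # ADD or MUL: pop two operands, push the result
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
    return stack.pop()


source = ("add", ("var", "x"), ("mul", ("num", 2), ("var", "y")))
target = compile_expr(source)                       # translation happens once
result_compiled = run(target, {"x": 1, "y": 3})     # the target program is then run on inputs
result_interpreted = interpret(source, {"x": 1, "y": 3})
```

Both paths compute the same value. The compiled path separates translation from execution, so the same target program can be run on many inputs without revisiting the source.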

The main advantages of compilers

They produce programs which run quickly.
They can spot syntax errors while the program is being compiled (i.e. you are informed of any grammatical errors before you try to run the program).

The main advantages of interpreters

There is no lengthy "compile time", i.e. you do not have to wait for a program to compile between writing it and running it.
Interpreted programs tend to be more "portable", which means that they will run on a greater variety of machines.

In addition to a compiler, several other programs may be required to create an executable target program. A source program may be divided into modules stored in separate files. The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor. The preprocessor may also expand shorthands, called macros, into source language statements.

The compiler may produce an assembly language program as its output, because assembly language is easier to produce as output and is easier to debug. The assembly language is then processed by a program called an assembler that produces relocatable machine code as its output.

Large programs are often compiled in pieces, so the relocatable machine code may have to be linked together with other relocatable object files and library files into the code that actually runs on the machine. The linker resolves external memory addresses, where the code in one file may refer to a location in another file. The loader then puts together all of the executable object files into memory for execution.

[Figure: Source Program -> Translators (Compilers, Interpreters) -> Target Program]

Lecture 2

The Structure of a Compiler


Analysis-Synthesis Model of Translation (Compilation)

There are two parts of compilation: the analysis part and the synthesis part.

[Figure: Source code -> Front End -> Intermediate Representation -> Back End -> Machine code; Errors]

The Analysis Part:


It is often called the front end of the compiler. It breaks up the source program into constituent pieces and imposes a grammatical structure on these pieces. It creates an intermediate representation of the source program. If the analysis part detects that the source program is either syntactically ill formed or semantically unsound, then it must provide informative messages, so the user can take corrective action.

It collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part. During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.
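As a rough illustration of what the analysis part produces, the sketch below uses Python's standard ast module as a stand-in front end. It is an assumption made for demonstration, not the lecture's own method: the function name build_symbol_table and the choice to record only the line of each variable's first assignment are hypothetical simplifications of what a real symbol table holds.

```python
import ast


def build_symbol_table(source):
    """Return the syntax tree and a table mapping each assigned name to its first line."""
    tree = ast.parse(source)  # hierarchical representation of the program
    table = {}
    for node in ast.walk(tree):
        # A Name node in Store context is a variable being assigned.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            table[node.id] = min(table.get(node.id, node.lineno), node.lineno)
    return tree, table


source = "rate = 60\nposition = initial + rate * 2\n"
tree, symbols = build_symbol_table(source)

# A syntactically ill-formed program must produce an informative message.
try:
    ast.parse("position = = 60")
    message = None
except SyntaxError as err:
    message = f"line {err.lineno}: {err.msg}"
```

Here symbols holds the two assigned names, and message carries the line number and description reported for the malformed input.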

The Synthesis Part:


It is often called the back end of the compiler. It constructs the desired target program from the intermediate representation and the information in the symbol table.
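The front end / back end split can be observed in CPython itself, which exposes both halves through the standard library. This is a sketch by analogy only: the intermediate representation here is Python's AST and the "target program" is CPython bytecode rather than machine code. The variable names are illustrative.

```python
import ast

source = "area = width * height\n"

# Front end (analysis): source text -> intermediate representation (an AST).
tree = ast.parse(source, "<demo>")

# Back end (synthesis): intermediate representation -> target program.
# The target here is a CPython code object (bytecode), not machine code.
code = compile(tree, "<demo>", "exec")

# Run the target program on some inputs.
namespace = {"width": 3, "height": 4}
exec(code, namespace)
area = namespace["area"]
```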

Phases of compilation process:


A compiler operates as a sequence of phases, each of which transforms one representation of the source program to another. In practice, several phases may be grouped together, and the intermediate representations between the grouped phases need not be constructed explicitly. The symbol table, which stores information about the entire source program, is used by all phases of the compiler.

The phases are, in order:

Lexical Analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Machine-Independent Code Optimization
Code Generation
Machine-Dependent Code Optimization

Issues in compiler design


The compiler deals with many big-picture issues. Compiler construction brings together techniques from disparate parts of Computer Science. Compilers are engineered objects: software systems built with distinct goals in mind. In building a compiler, the compiler writer makes myriad design decisions, and each decision has an impact on the resulting compiler. The principle that a well-designed compiler must observe is inviolable.

Lecture 3

Programming Language Specifications

Definition of Syntax
In computer science, the syntax of a programming language is the set of rules that define the combinations of symbols that are considered to be correctly structured programs in that language. The syntax of a language defines its surface form. Text-based programming languages are based on sequences of characters. Visual programming languages are based on the spatial layout and connections between symbols (which may be textual or graphical).

The syntax of a programming language describes the proper form of its programs. The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for grammatical structure) to inductively specify syntactic categories (nonterminals) and terminal symbols.

The syntax of a language describes the form of a valid program, but does not provide any information about the meaning of the program or the results of executing that program. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., a context-free grammar.
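A small example may help show what a context-free grammar adds. The grammar below, written in BNF, describes balanced parentheses, a standard example of nesting that regular expressions alone cannot describe. The grammar and the function name balanced are chosen here for illustration and are not taken from the lecture. The recognizer is a minimal recursive-descent sketch.

```python
def balanced(text):
    """Recognize the language of the BNF grammar
           <S> ::= "(" <S> ")" <S> | ""
       by recursive descent. Returns True if text is in the language."""

    def parse_s(pos):
        # First alternative: "(" <S> ")" <S>
        if pos < len(text) and text[pos] == "(":
            inner = parse_s(pos + 1)
            if inner is not None and inner < len(text) and text[inner] == ")":
                return parse_s(inner + 1)
            return None  # "(" with no matching ")"
        # Second alternative: the empty string, which consumes nothing.
        return pos

    # The whole input must be consumed for the text to be accepted.
    return parse_s(0) == len(text)


ok = balanced("(()())")
unclosed = balanced("(()")
wrong_order = balanced(")(")
```

Each nonterminal becomes one function, and each alternative in the production becomes one branch, which is why BNF maps so directly onto this style of parser.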

Semantics and Pragmatics


The two stages of analysis, semantics and pragmatics, are concerned with getting at the meaning of a sentence. In the first stage (semantics), a partial representation of the meaning is obtained based on the possible syntactic structure(s) of the sentence, and on the meanings of the words in that sentence. In the second stage (pragmatics), the meaning is elaborated based on contextual and world knowledge.

Semantics
In general, the input to the semantic stage of analysis may be viewed as being a set of possible parses of the sentence, and information about the possible word meanings.

Lecture 4
In-depth Study of Syntactic Specifications

Syntactic
The syntactic analysis of source code usually entails the transformation of the linear sequence of tokens into a hierarchical syntax tree (abstract syntax trees are one convenient form of syntax tree).
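The transformation from a linear token sequence to a tree can be sketched as follows. The grammar, the nested-tuple tree shape and the function name parse are assumptions made for this example. A real parser would also carry token types and source positions.

```python
def parse(tokens):
    """Turn a flat token list into a nested-tuple syntax tree.

    Assumed grammar (illustrative, not from the lecture):
        expr   ::= term   { "+" term }
        term   ::= factor { "*" factor }
        factor ::= NUMBER | NAME | "(" expr ")"
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if token == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return ("leaf", token)

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse(["a", "+", "b", "*", "2"])
```

In the resulting tree the multiplication sits below the addition, so the hierarchy records operator precedence that the flat token list only implies.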

Syntax definition
The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for grammatical structure) to inductively specify syntactic categories (nonterminals) and terminal symbols.

The syntax of a language describes the form of a valid program, but does not provide any information about the meaning of the program or the results of executing that program. The meaning given to a combination of symbols is handled by semantics. Not all syntactically correct programs are semantically correct.

Using natural language as an example, it may not be possible to assign a meaning to a grammatically correct sentence, or the sentence may be false: "John is a married bachelor." is grammatically well-formed but has no generally accepted meaning.
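The same gap between form and meaning shows up in programs. The sketch below uses Python as the example language, an assumption made for illustration: the one-line program is well-formed, so parsing succeeds, yet it has no defined meaning because a string cannot be added to an integer. Python reports this only when the statement runs. A compiler for a statically typed language would typically report the equivalent error during semantic analysis.

```python
import ast

source = "total = 'three' + 4\n"

# Syntax: the text is a well-formed Python program, so parsing succeeds.
tree = ast.parse(source)
syntactically_valid = isinstance(tree, ast.Module)

# Semantics: adding a string to an integer has no defined meaning in Python,
# so executing the program fails even though its form is correct.
try:
    exec(compile(tree, "<demo>", "exec"), {})
    semantically_valid = True
except TypeError:
    semantically_valid = False
```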

No ambiguity is allowed in programming languages, in either form (syntax) or meaning (semantics). Distinction between syntax and semantics: many programming languages have features that mean the same thing (shared semantics) but are expressed differently. Identifying which is which eases the learning curve.

Syntax Specification
Formalism: set of production rules.
Microsyntax rules: concatenation, alternation (choice among finite alternatives), Kleene closure.
- The set of strings produced by these three rules is a regular set or regular language.
- The rules are specified by regular expressions; they generate the regular language.
- Strings in the regular language are recognized by scanners.
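A scanner driven by regular expressions can be sketched with Python's standard re module. The token classes below (NUMBER, IDENT, OP, SKIP) and the function name scan describe a hypothetical toy language and are not part of the lecture. Each pattern is built from the three microsyntax rules named above.

```python
import re

# Hypothetical token classes for a toy language; the names are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),                  # closure: a digit followed by zero or more digits
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),   # concatenation of a first character and a closure
    ("OP", r"[+\-*/=]"),                    # alternation among a finite set of symbols
    ("SKIP", r"[ \t]+"),                    # whitespace, recognized but discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Split text into (token_class, lexeme) pairs, rejecting illegal characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = scan("count = count + 42")
```

The scanner's output is the linear sequence of tokens that syntax analysis then arranges into a tree.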
