You are on page 1of 11

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

9,633,660 members (43,378 online)

Sign in

home features

articles

quick answers help

discussions
Search for articles, questions, tips

community

Articles Languages C / C++ Language Utilities

Next

Article Browse Code Stats Revisions Alternatives Comments & Discussions (13)

Designing a Compiler
By masaniparesh , 22 Oct 2008
4.27 (13 votes)

About Article
this article talks about compiler design and implementation Type Licence First Posted Views Downloads Bookmarked Article CPOL 22 Oct 2008 39,080 2,737 38 times

Download mycompiler - 175.75 KB

Introduction
This is my project during my bechlor degree program. I have designed the partial C-Compiler. Though it is C-compiler the concept of all the compilers will be almost same. I have used LEX and YACC tools to generate the Lexical and Syntax analysis. I did the code optimized face as well. And I have generated final code in X86 machine and used MASM assembler to run the code. It was very successful. Compiler is a program that reads a program written in one language, called source language, and translated it in to an equivalent program in another language, called target language. It reports errors detected during the translation of source code to target code.Source program can be of any programming language. Here our source program is ANSI C program. Target program can be either Assembly language code or machine code. Here we have produced 8086 Assembly level code for the given source code. The first phase of the compiler, Lexical Analyzer, is being implemented by using LEX tool provided with Linux. The second phase, Syntax Analyzer, is being implemented by using Yacc tool provided by Linux. The third phase, Semantic Analyzer, and the fourth phase, Intermediate Code Generation, are carried out as the part of action corresponding to the production rules in the parser. The intermediate code so produced is 8086 Assembly code, which is converted into an executable by using MASM.

C Windows Dev Design Advanced

1 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

Phases of Compiler
There are mainly SIX phases of a compiler. The first four Lexical analysis, Syntax analysis, Semantic analysis and Intermediate Code Generation are part of Analysis phases, which are machine independent phases. While the other two Code Optimization and Code Generation are part of Synthesis phases, which are highly machine dependent phases. The phases of a compiler are:

I)Lexical Analysis
Lexical Analysis or Linear Analysis or Scanning, in which the stream of characters making up the source program is read from left-to-right and grouped in to tokens, sequence of characters having a collective meaning. The blanks separating the characters of the tokens and the comments statements appearing within the program would normally be eliminated during lexical analysis. There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing. 1)Simpler design is perhaps the most important consideration. The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases. For example, a parser embodying the conventions for comments and white spaces is significantly more complex than one that can assume comments and white space have already been removed by a lexical analyzer. if we are designing a new language, separating the lexical and syntactic conventions can lead to a cleaner overall language design. 2)Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized and potentially more efficient processor for the task. A large amount of time is spent reading the source program and partitioning it into tokens. Specialized buffering techniques for reading input characters and processing tokens can significantly speed up the performance of a compiler. 3)Compiler portability is enhanced. Input alphabets peculiarities and other device-specific anomalies can be restricted to the lexical analyzer. The representation of special or non-standard symbols can be isolated in the lexical analyzer.

Top News
How Newegg crushed the shopping cart patent and saved online retail
Get the Insider News free each morning.

Related Articles
XamlVerifier - check or auto correct binding path at compile and design time Design Patterns 2 of 3 Structural Design Patterns Form Designer Design Patterns 1 of 3 Creational Design Patterns

2 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

Specialized tools have been designed to help automate the construction of lexical analyzers and parsers when they are separated. LEX is a widely used tool to specify lexical analyzers for a variety of languages. We refer to the tool as the LEX compiler, and to its input specification as the LEX language. LEX is generally used in the manner of a lexical analyzer, is prepared by creating a program lex.l in the LEX language. Then, lex.l is run through the LEX compiler to produce a C program lex.yy.c. The program lex.yy.c consists of a tabular representation of a transition diagram constructed from the regular expressions of lex.l, together with a standard routine that uses the table to recognize lexemes. The actions associated with regular expressions in lex.l are pieces of C code and are carried over directly to lex.yy.c. Finally, lex.yy.c is run through the compiler to produce an object program a.out, which is the lexical analyzer that transforms an input stream into sequence of tokens. Please refer figure: interaction_with_lex and lexical_analyzer

Design Patterns 3 of 3 Behavioral Design Patterns Compiler Patterns WPF Diagram Designer - Part 4 Screen Designer Classes A designable PropertyTree for VS.NET WPF Diagram Designer: Part 1 WPF Diagram Designer - Part 3 Object Design for the Perplexed Designer centric Wizard control How a C++ compiler implements exception handling MVVM Diagram Designer NoSpamEmailHyperlink: 1. Design An XML Compiler Design by Contract Framework Designing Nested Controls What are Online Compilers & Online IDEs?

LEX Specifications
A LEX program consists of three parts: declarations %% transition rules %% auxiliary procedures The declaration section includes declarations section includes declarations of variables, manifest constants, and regular definitions. (A manifest constant is an identifier that is declared to represent a constant.) The regular definitions are used as components of the regular expressions appearing in the translation rules. The transition rules of a LEX program are statements of the form P1 {action1} P2 {action2} . .. Pn {actionn} Where each P1 is a regular expression and each action1 is a program fragment describing what action the lexical analyzer should take when pattern P1 matches lexeme. In LeX, the actions are written in C; in general, however, they can be in any implementation language.

3 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

II)Syntax Analysis (I have attached our Syntax analyzer rules and YACC usage)
Syntax Analysis or Hierarchical Analysis, in which characters or tokens are grouped hierarchically into nested collections with collective meaning. Hierarchical analysis also termed as Parsing, involves grouping the tokens of the source program into grammatical phrases that are used by the compiler to synthesize output. Usually, the grammatical phrases of the source program are represented by a Parse tree.

III)Semantic Analysis
Semantic Analysis, in which certain checks are performed to ensure that the components of a program fit together meaningfully. The semantic analysis phase checks the source program for semantic errors and gathers type information for the subsequent code-generation phase. It uses hierarchical structure determined by the syntaxanalysis phase to identify the operators and operands of expressions and statements. An important component of semantic analysis is type checking. Here the compiler checks that each operator has operands that are permitted by the source language specification. Moreover, a compiler must check that the source program follows both the syntactic and semantic conventions of the source language. This checking, called static checking ensures that certain kinds of programming errors will be detected and reported. The checking done when the target program runs is termed as dynamic checking. In principle, any check can be done dynamically, if the target code carries the type of an element along with the value of that element. A sound system eliminates the need for dynamic checking for type errors because it allows us to determine statistically that these errors cannot occur when the target program runs. That is, if a sound system assigns a type other than type_error to a program part, then type errors cannot occur when the target code for the program part is run. A language is strongly typed if its compiler can guarantee that the programs it accepts will execute without type errors.

4 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

Static checks include:


1. Type checks: A compiler should report an error if an error if an operator is applied to an incompatible operand. 2. Flow-of-control checks: Statements that cause flow of control to leave a construct must have some place to which hto transfer the flow of control. 3. Uniqueness checks: There are situations in which an object must be defined exactly once. 4. Name-related checks: Sometimes, the same name must appear two or more times. The compiler must check that the same name is used at both places.

IV)Intermediate Code Generation


After the syntax and semantic analysis, some compilers generate as explicit intermediate representation of the source program. We can think of this intermediate representation as a program for an abstract machine. This intermediate representation should have two important properties: it should be easy to produce and easy to translate into the target program. The intermediate representation can have a variety of forms. We consider an intermediate form of 8086 Assembly language, which is like three address code. This intermediate form has several properties: 1)Each instruction has at most one operator in addition to the assignment. Thus, generating these instructions, the compiler has to decide on the order in which operations are to be done. 2)The compiler must generate a temporary name to hold the value computed by each instruction. 3)Some instructions may have fewer than three operands. In the analysissynthesis model of a compiler, the front end translates a source program into an intermediate representation from which the back end generates target code. Details of the target language are confined to the back end, as far as possible. Although a source program can be translated directly into the target language, some benefits of using a machineindependent intermediate form are: 1.Retargeting is facilitated; a compiler for a different machine can be created by attacking a back end for the new machine to an existing front end.

5 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

2.A machineindependent code optimizer can be applied to the intermediate representation.

V)Code Optimization
The code optimization phase attempts to improve intermediate code, so that faster-running machine code will result. There is a great variation in the amount of code optimization different compilers perform. In those that do the most, called optimizing compilers, a significant amount of the time of the compiler is spent on this phase. However, there are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much.

VI)Machine Code Generation


The final phase of the compiler is the generation of target code, consisting normally relocatable machine code or assembly code. Memory locations are selected for each of the variables used by the program. Then, intermediate instructions are each translated into a sequence of machine instructions that perform the same task. A crucial aspect is the assignment of variables to registers.

Symbol Table Management


An essential function of a compiler is to record the identifiers used in the source program and collect information about various attributes of each identifier. These attributes may provide information about the storage allocated for an identifier (if applicable), its type, its scope and in the case of procedure names, such things as the number and types of arguments, the method of passing each argument and the type of return value if the functions returns any value. A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. When an identifier in the source program is detected by the lexical analyzer, the identifier is entered into the symbol table. However, the attributes of an identifier

6 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

cannot normally be determined during lexical analysis. The remaining phases enter information about identifiers into the symbol table and then use this information in various ways.

Error Detection and Reporting:


Each phase can encounter errors. However, after detecting an error, a phase must somehow deal with that error, so that compilation can proceed, allowing further errors in the source program to be detected. A compiler that stops when it finds error is not as helpful as it could be. The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler. The lexical phase can detect errors where the characters remaining in the input do not form any token of the language. Errors where the token stream violates the structure rules (syntax) of the language are determined by the syntax analysis phase. During semantic analysis the compiler tries to detect constructs then have the right syntactic structure but no meaning to the operation involved.

Sources:
1. Compilers: Principles, Techniques and Tools By, Alfred V. Aho Ravi Sethi Jeffery D. Ullman 2. Compiler Design in C By Allen I. Holub 3. Linux manual pages for LEX and Yacc

License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

7 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

masaniparesh

System Programming and Research oriented work is my passion.

Article Top
Like 1 0 Tweet 0

Sign Up to vote Poor

Excellent

Vote

Comments and Discussions

You must Sign In to use this message board. Search this forum Profile popups Spacing Relaxed
Open All

Go Noise Very High Layout Per page 10 Update

First Prev Next

My vote of 4

sharmila sivamany

7 Jan '13 - 3:36

this is helpful
Sign In View Thread Permalink

8 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

regarding my project

Afrab

2 Apr '12 - 4:18

i have taken up the project "Syntax Analyzer" in which i want to show the generation of parse tree for the program... how can i do it? which language can i use..? plz comment
Sign In View Thread Permalink

help needed with ur code...


sir,

megh16

30 Aug '10 - 13:23

i was wondering what all tools are needed to compile and run your code. i mean we will need a UNIX installation first of all that can run lex and yacc. can we use UWIN for this? also you said that we need MASM ( microsoft assembler ) to convert the asm to binary. so should i use the 32 bit MASM32, or some other 16 bit version. the question is whether the assembly u r generating for x86, is 32 bit or 16bit?
Sign In View Thread Permalink

problem with code

vddiaz

17 Nov '09 - 18:43

hi, i try to use your code but i can't because when i try to compile with gcc the files lex.yy.c and y.tab.c some error occur and the files don't compile I'm using cygwin and the flex and bison of cygwin... thanks for your help Vctor Daz
Sign In View Thread Permalink

9 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

Re: problem with code

masaniparesh

17 Nov '09 - 19:47

Sorry...I am not able to spend time on this further but I am sure all the required information I have given. Please follow the procedure given in docs and you will able to get rid of it. I am not sure but you might need to customize few items based on the plateform you are using. Cheers, Paresh
Sign In View Thread Permalink

problem wid ur code

namefull

8 May '09 - 20:48

your project doesnt seem to be running for the test file you have provided
Sign In View Thread Permalink

Re: problem wid ur code

masaniparesh

17 Nov '09 - 19:45

Sorry...I am not able to spend time on this further but I am sure all test files were running absolutely fine.
Sign In View Thread Permalink

Re: problem wid ur code

Aabhas Sharma

9 Dec '11 - 9:49

I've tried on multiple platforms and it still doesn't compile.. Maybe the versions of flex and bison that you are using are outdated.
Sign In View Thread Permalink

10 of 11

2013-02-05 08:46

Designing a Compiler - CodeProject

http://www.codeproject.com/Articles/30353/Designing-a-Compiler

Nitpicking?

f0dder

27 Oct '08 - 21:22

This might be nitpicking, but the tool is called 'lex' and not 'LeX' - yes, it ends in 'ex', but it's not related to the TeX typesetting system and shouldn't really be funky-cased like that
Sign In View Thread Permalink

Re: Nitpicking?

masaniparesh

28 Oct '08 - 2:58

You are right. I have changed it to LEX. Thanks for comment. Regards, Paresh
Sign In View Thread Permalink

Last Visit: 31 Dec '99 - 19:00 Update: 4 Feb '13 - 16:43

Last

Refresh

1 2

Next

General Answer

News Joke

Suggestion Question Rant Admin

Bug

Permalink | Advertise | Privacy | Mobile Web03 | 2.6.130204.1 | Last Updated 23 Oct 2008

Layout: fixed | fluid

Article Copyright 2008 by masaniparesh Everything else Copyright CodeProject, 1999-2013 Terms of Use

11 of 11

2013-02-05 08:46

You might also like