http://www.codeproject.com/Articles/30353/Designing-a-Compiler
Designing a Compiler
By masaniparesh , 22 Oct 2008
About Article
This article talks about compiler design and implementation.
Type: Article. Licence: CPOL. First posted: 22 Oct 2008. Views: 39,080. Downloads: 2,737. Bookmarked: 38 times.
Introduction
This was my project during my bachelor's degree program. I designed a partial C compiler. Although it is a C compiler, the concepts are much the same for any compiler. I used the LEX and YACC tools to generate the lexical and syntax analyzers. I also implemented the code optimization phase. The final code is generated for the x86 machine, and I used the MASM assembler to assemble and run it. It was very successful.
A compiler is a program that reads a program written in one language, called the source language, and translates it into an equivalent program in another language, called the target language. It reports errors detected during the translation of source code to target code. The source program can be in any programming language; here it is an ANSI C program. The target program can be either assembly language code or machine code; here we produce 8086 assembly code for the given source code.
The first phase of the compiler, the lexical analyzer, is implemented using the LEX tool provided with Linux. The second phase, the syntax analyzer, is implemented using the Yacc tool provided with Linux. The third phase, the semantic analyzer, and the fourth phase, intermediate code generation, are carried out as part of the actions attached to the production rules in the parser. The intermediate code so produced is 8086 assembly code, which is converted into an executable using MASM.
Phases of Compiler
A compiler has six main phases. The first four (lexical analysis, syntax analysis, semantic analysis and intermediate code generation) form the analysis part and are machine-independent. The other two (code optimization and code generation) form the synthesis part and are highly machine-dependent. The phases of a compiler are:
I)Lexical Analysis
Lexical analysis, also called linear analysis or scanning, reads the stream of characters making up the source program from left to right and groups them into tokens: sequences of characters having a collective meaning. The blanks separating the tokens and the comments appearing within the program are normally eliminated during lexical analysis.
There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing.
1) Simpler design is perhaps the most important consideration. The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases. For example, a parser embodying the conventions for comments and white space is significantly more complex than one that can assume comments and white space have already been removed by a lexical analyzer. If we are designing a new language, separating the lexical and syntactic conventions can lead to a cleaner overall language design.
2) Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized and potentially more efficient processor for the task. A large amount of time is spent reading the source program and partitioning it into tokens. Specialized buffering techniques for reading input characters and processing tokens can significantly speed up a compiler.
3) Compiler portability is enhanced. Peculiarities of the input alphabet and other device-specific anomalies can be restricted to the lexical analyzer. The representation of special or non-standard symbols can be isolated in the lexical analyzer.
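As a rough illustration of what scanning does, here is a minimal Python sketch (not the LEX source used in this project) that groups characters into tokens and discards white space and comments. The token names and the small pattern set are assumptions chosen only for the example.

```python
import re

# Hypothetical token classes for a tiny C-like subset (illustrative only).
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),       # C-style comment, discarded
    ("SKIP",    r"\s+"),             # white space, discarded
    ("NUMBER",  r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("OP",      r"[-+*/=;(){}]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.DOTALL
)

def tokenize(source):
    """Return a list of (kind, lexeme) pairs, dropping comments and white space."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup not in ("COMMENT", "SKIP"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = 42; /* answer */ y = x + 1;"))
```

The parser that consumes this list never sees the comment or the blanks, which is the simplification described in point 1 above.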
Specialized tools have been designed to help automate the construction of lexical analyzers and parsers when they are separated. LEX is a widely used tool for specifying lexical analyzers for a variety of languages. We refer to the tool as the LEX compiler, and to its input specification as the LEX language.
LEX is generally used as follows. First, a specification of the lexical analyzer is prepared by creating a program lex.l in the LEX language. Then lex.l is run through the LEX compiler to produce a C program lex.yy.c. The program lex.yy.c consists of a tabular representation of a transition diagram constructed from the regular expressions of lex.l, together with a standard routine that uses the table to recognize lexemes. The actions associated with regular expressions in lex.l are pieces of C code and are carried over directly to lex.yy.c. Finally, lex.yy.c is run through the C compiler to produce an object program a.out, which is the lexical analyzer that transforms an input stream into a sequence of tokens. Please refer to the figures interaction_with_lex and lexical_analyzer.
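The idea of "a table plus a standard routine" can be sketched in a few lines of Python. This is only an illustration of the principle, not the contents of a real lex.yy.c: the states, character classes and token names below are assumptions made up for the example.

```python
# Character classes used as table columns (assumed for this sketch).
def char_class(ch):
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Transition table: state -> {character class -> next state}.
# State 0 is the start state; 1 recognises identifiers, 2 recognises integers.
TRANSITIONS = {
    0: {"letter": 1, "digit": 2},
    1: {"letter": 1, "digit": 1},
    2: {"digit": 2},
}
ACCEPTING = {1: "ID", 2: "NUMBER"}

def next_lexeme(text, start):
    """Run the table from `start`; return (kind, lexeme) for the longest match, or None."""
    state, pos = 0, start
    last_accept = None
    while pos < len(text):
        nxt = TRANSITIONS[state].get(char_class(text[pos]))
        if nxt is None:
            break                      # no transition: stop and report the last accept
        state, pos = nxt, pos + 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[start:pos])
    return last_accept

print(next_lexeme("count1 = 7", 0))   # ('ID', 'count1')
print(next_lexeme("count1 = 7", 9))   # ('NUMBER', '7')
```

The routine itself never changes; recognising a different set of lexemes only means supplying a different table, which is what the LEX compiler generates from the regular expressions.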
LEX Specifications
A LEX program consists of three parts:
declarations
%%
translation rules
%%
auxiliary procedures
The declarations section includes declarations of variables, manifest constants, and regular definitions. (A manifest constant is an identifier that is declared to represent a constant.) The regular definitions are used as components of the regular expressions appearing in the translation rules.
The translation rules of a LEX program are statements of the form
P1 {action1}
P2 {action2}
...
Pn {actionn}
where each Pi is a regular expression and each actioni is a program fragment describing what action the lexical analyzer should take when pattern Pi matches a lexeme. In LEX, the actions are written in C; in general, however, they can be in any implementation language.
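To make the three-part layout concrete, the following Python sketch mimics it with a list of (pattern, action) pairs. It is an analogy rather than LEX syntax: the rule set, the helper names make_token and ignore, and the token kinds are all hypothetical. It does follow the usual LEX convention that the longest match wins and that the earlier rule is chosen when two rules match the same length.

```python
import re

# --- declarations: regular definitions (names are illustrative assumptions) ---
DIGIT = r"[0-9]"
LETTER = r"[A-Za-z_]"

# --- auxiliary procedures: helpers called from the actions ---
def make_token(kind):
    return lambda lexeme: (kind, lexeme)

def ignore(lexeme):
    return None  # e.g. white space produces no token

# --- translation rules: Pi {actioni}, tried in order ---
RULES = [
    (re.compile(r"[ \t\n]+"), ignore),
    (re.compile(r"if|else|while"), make_token("KEYWORD")),
    (re.compile(f"{LETTER}({LETTER}|{DIGIT})*"), make_token("ID")),
    (re.compile(f"{DIGIT}+"), make_token("NUMBER")),
    (re.compile(r"[-+*/=<>;(){}]"), make_token("OP")),
]

def scan(text):
    """Apply the rules LEX-style: longest match wins, earlier rule breaks ties."""
    pos, out = 0, []
    while pos < len(text):
        best_len, best_action = 0, None
        for pattern, action in RULES:
            m = pattern.match(text, pos)
            if m and m.end() - pos > best_len:   # strictly longer, so ties keep the earlier rule
                best_len, best_action = m.end() - pos, action
        if best_action is None:
            raise SyntaxError(f"no rule matches at offset {pos}")
        token = best_action(text[pos:pos + best_len])
        if token is not None:
            out.append(token)
        pos += best_len
    return out

print(scan("if x1 < 10 ifx = 2;"))
```

In this input, "if" is reported as a keyword because the keyword rule comes first, while "ifx" is reported as an identifier because the identifier rule matches a longer lexeme.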
II)Syntax Analysis (I have attached our Syntax analyzer rules and YACC usage)
Syntax analysis, or hierarchical analysis, groups the tokens of the source program hierarchically into nested collections with collective meaning. Hierarchical analysis, also termed parsing, involves grouping the tokens into grammatical phrases that the compiler uses to synthesize output. Usually, the grammatical phrases of the source program are represented by a parse tree.
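The grouping of tokens into nested phrases can be sketched with a small hand-written parser. Note that this uses recursive descent, which is a different technique from the bottom-up parser that YACC generates; it is shown only because it fits in a few lines. The grammar, class name and tuple-based tree shape are assumptions for the example, and the tree it builds is a simplified syntax tree rather than a full parse tree.

```python
import re

def tokenize(text):
    """Split an arithmetic expression into number, identifier and operator lexemes.
    Characters that match none of these are silently skipped in this sketch."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)

class Parser:
    """Recursive-descent parser for:
         expr   -> term   (('+' | '-') term)*
         term   -> factor (('*' | '/') factor)*
         factor -> NUMBER | ID | '(' expr ')'
    """
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok   # a NUMBER or ID leaf

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("a + 2 * (b - 1)"))   # ('+', 'a', ('*', '2', ('-', 'b', '1')))
```

The nesting of the tuples reflects the hierarchy: the multiplication is grouped below the addition, and the parenthesised subtraction is grouped below the multiplication.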
III)Semantic Analysis
Semantic analysis performs certain checks to ensure that the components of a program fit together meaningfully. The semantic analysis phase checks the source program for semantic errors and gathers type information for the subsequent code-generation phase. It uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements.
An important component of semantic analysis is type checking. Here the compiler checks that each operator has operands that are permitted by the source language specification. More generally, a compiler must check that the source program follows both the syntactic and semantic conventions of the source language. This checking, called static checking, ensures that certain kinds of programming errors will be detected and reported. Checking done when the target program runs is termed dynamic checking. In principle, any check can be done dynamically, if the target code carries the type of an element along with the value of that element.
A sound type system eliminates the need for dynamic checking for type errors because it allows us to determine statically that these errors cannot occur when the target program runs. That is, if a sound type system assigns a type other than type_error to a program part, then type errors cannot occur when the target code for the program part is run. A language is strongly typed if its compiler can guarantee that the programs it accepts will execute without type errors.
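A minimal static type checker along these lines might look as follows in Python. The two-type system (int and float), the rule that '%' needs integer operands, and the tuple representation of expressions are assumptions made for this sketch; only the name type_error comes from the text above.

```python
TYPE_ERROR = "type_error"

def type_of(node, symbols):
    """Return the static type of an expression tree, or TYPE_ERROR.

    Trees are nested tuples (op, left, right); leaves are int/float literals
    or variable names looked up in `symbols` (name -> declared type).
    """
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):
        return symbols.get(node, TYPE_ERROR)      # undeclared name
    op, left, right = node
    lt, rt = type_of(left, symbols), type_of(right, symbols)
    if TYPE_ERROR in (lt, rt):
        return TYPE_ERROR                          # errors propagate upwards
    if op == "%":
        return "int" if lt == rt == "int" else TYPE_ERROR
    if op in ("+", "-", "*", "/"):
        return "float" if "float" in (lt, rt) else "int"
    return TYPE_ERROR                              # unknown operator

symbols = {"i": "int", "x": "float"}
print(type_of(("+", "i", ("*", "x", 2)), symbols))   # float
print(type_of(("%", "i", "x"), symbols))             # type_error
```

Both results are computed without running the program being checked, which is what makes the check static.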
V)Code Optimization
The code optimization phase attempts to improve intermediate code, so that faster-running machine code will result. There is a great variation in the amount of code optimization different compilers perform. In those that do the most, called optimizing compilers, a significant amount of the time of the compiler is spent on this phase. However, there are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much.
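One example of a simple optimization of this kind is constant folding combined with constant propagation. The Python sketch below applies it to intermediate code written as (dest, op, arg1, arg2) quadruples. The quadruple format and the restriction to straight-line code are assumptions for the example, and it is not necessarily the optimization performed by the compiler described in this article.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Fold operations whose operands are known integer constants.

    `code` is a list of (dest, op, arg1, arg2) quadruples; op "=" copies arg1.
    Assumes straight-line code (a single basic block, no jumps).
    """
    known = {}      # variable -> constant value, valid at the current instruction
    result = []
    for dest, op, a, b in code:
        a = known.get(a, a)          # replace operands by their known constants
        b = known.get(b, b)
        if op == "=" and isinstance(a, int):
            known[dest] = a
            result.append((dest, "=", a, None))
        elif op in OPS and isinstance(a, int) and isinstance(b, int):
            value = OPS[op](a, b)    # evaluate at compile time
            known[dest] = value
            result.append((dest, "=", value, None))
        else:
            known.pop(dest, None)    # dest is no longer a known constant
            result.append((dest, op, a, b))
    return result

code = [
    ("t1", "*", 4, 2),
    ("t2", "+", "t1", "x"),
    ("t3", "=", "t1", None),
]
print(fold_constants(code))
# [('t1', '=', 8, None), ('t2', '+', 8, 'x'), ('t3', '=', 8, None)]
```

The multiplication disappears from the generated code entirely, at the cost of a single linear pass over the instructions.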
cannot normally be determined during lexical analysis. The remaining phases enter information about identifiers into the symbol table and then use this information in various ways.
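A symbol table that is filled in by one phase and consulted by later ones can be sketched as a stack of dictionaries. The class below is a hypothetical illustration: its method names, the use of nested scopes, and the attribute names are assumptions, not the data structure used in this project.

```python
class SymbolTable:
    """Scoped symbol table: a stack of dictionaries, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]           # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attrs):
        """Enter a name in the current scope; attributes can be added later."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"redeclaration of {name!r}")
        scope[name] = dict(attrs)

    def lookup(self, name):
        """Return the attribute record of the innermost declaration, or None."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.declare("count")                     # name is seen first, with no attributes yet
table.lookup("count")["type"] = "int"      # a later phase records the type
table.enter_scope()
table.declare("count", type="float")       # an inner declaration hides the outer one
print(table.lookup("count"))               # {'type': 'float'}
table.exit_scope()
print(table.lookup("count"))               # {'type': 'int'}
```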
Sources:
1. Compilers: Principles, Techniques, and Tools, by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman
2. Compiler Design in C, by Allen I. Holub
3. Linux manual pages for LEX and Yacc
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
My vote of 4
sharmila sivamany
this is helpful
regarding my project
Afrab
i have taken up the project "Syntax Analyzer" in which i want to show the generation of parse tree for the program... how can i do it? which language can i use..? plz comment
megh16
i was wondering what all tools are needed to compile and run your code. i mean we will need a UNIX installation first of all that can run lex and yacc. can we use UWIN for this? also you said that we need MASM ( microsoft assembler ) to convert the asm to binary. so should i use the 32 bit MASM32, or some other 16 bit version. the question is whether the assembly u r generating for x86, is 32 bit or 16bit?
vddiaz
hi, i try to use your code but i can't because when i try to compile with gcc the files lex.yy.c and y.tab.c some error occur and the files don't compile I'm using cygwin and the flex and bison of cygwin... thanks for your help Víctor Díaz
masaniparesh
Sorry...I am not able to spend time on this further but I am sure all the required information I have given. Please follow the procedure given in docs and you will able to get rid of it. I am not sure but you might need to customize few items based on the plateform you are using. Cheers, Paresh
namefull
your project doesnt seem to be running for the test file you have provided
masaniparesh
Sorry...I am not able to spend time on this further but I am sure all test files were running absolutely fine.
Aabhas Sharma
I've tried on multiple platforms and it still doesn't compile.. Maybe the versions of flex and bison that you are using are outdated.
Nitpicking?
f0dder
This might be nitpicking, but the tool is called 'lex' and not 'LeX' - yes, it ends in 'ex', but it's not related to the TeX typesetting system and shouldn't really be funky-cased like that
Re: Nitpicking?
masaniparesh
You are right. I have changed it to LEX. Thanks for comment. Regards, Paresh
Last Updated 23 Oct 2008
Article Copyright 2008 by masaniparesh. Everything else Copyright CodeProject, 1999-2013.