
Compiler Design

Laboratory Manual (V1.X)


Subject Code: CS60X
Semester: VI
Year: 2014-15 (Spring Semester)

January 30, 2015

Faculty:
Mr. Bhaskar Mondal

Department of Computer Science and Engineering


National Institute of Technology Jamshedpur
Jamshedpur, Jharkhand, India - 831014

Faculty: Bhaskar Mondal, Email: bm6779@gmail.com

Compiler Design Lab. Manual

Contents

1 Design of Lexical Analyzer
1.1 Introduction
1.2 Day 1: Using C
1.3 Day 2: Introduction to LEX
1.4 Day 3-4: LEX Exercise

2 Parser
2.1 Day 5: Exercise Using C
2.2 Introduction to YACC
2.3 Day 6: Exercise Using YACC
2.4 Day 7: Exercise Using C

3 Syntax Directed Definition
3.1 Day 8: Exercise Using C

4 Generation of Intermediate Code
4.1 Day 9: Exercise Using C

Advanced Problems

Books to Follow

Version: 1.0 Page 1


Instructions
1. Maintain the index/contents of your lab record properly.
2. For each exercise, give a brief description of the work you did, including the algorithm used and a flowchart.
3. Include copies of the main C, Lex and Yacc sources you used for the exercises, along with the numerical results.
4. Calculate and state the computational complexity of each experiment.
5. Provide test cases (sample input and output) at the end of each exercise, and print the plot files (.jpg) corresponding to the exercises, if any.
6. Explain anything unusual or interesting, and any points of confusion that you were unable to resolve outside the lab.
7. If you believe there is an error in a lab, please inform me of it. Explain why you think it is an error and, if you like, suggest a correction.

1 Design of Lexical Analyzer

1.1 Introduction

Figure 1
The patterns in the input are written using an extended set of regular expressions. These are:

x               match the character 'x'
.               any character (byte) except newline
[xyz]           a "character class"; in this case, the pattern matches either
                an 'x', a 'y', or a 'z'
[abj-oZ]        a "character class" with a range in it; matches an 'a', a 'b',
                any letter from 'j' through 'o', or a 'Z'
[^A-Z]          a "negated character class", i.e., any character but those in
                the class. In this case, any character EXCEPT an uppercase letter.
[^A-Z\n]        any character EXCEPT an uppercase letter or a newline
r*              zero or more r's, where r is any regular expression
r+              one or more r's
r?              zero or one r's (that is, "an optional r")
r{2,5}          anywhere from two to five r's
r{2,}           two or more r's
r{4}            exactly 4 r's
{name}          the expansion of the "name" definition given in the
                declarations section
"[xyz]\"foo"    the literal string: [xyz]"foo
\x              if x is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', then the
                ANSI-C interpretation of \x. Otherwise, a literal 'x' (used to
                escape operators such as '*')
\0              a NUL character (ASCII code 0)
\123            the character with octal value 123
\x2a            the character with hexadecimal value 2a
(r)             match an r; parentheses are used to override precedence
rs              the regular expression r followed by the regular expression s;
                called "concatenation"
r|s             either an r or an s
r/s             an r, but only if it is followed by an s. The text matched by s
                is included when determining whether this rule is the longest
                match, but is then returned to the input before the action is
                executed, so the action only sees the text matched by r. This
                type of pattern is called trailing context. (There are some
                combinations of r/s that flex cannot match correctly; see the
                notes on "dangerous trailing context" in the Deficiencies / Bugs
                section of the flex manual.)
^r              an r, but only at the beginning of a line (i.e., when just
                starting to scan, or right after a newline has been scanned)
r$              an r, but only at the end of a line (i.e., just before a
                newline). Equivalent to "r/\n". Note that flex's notion of
                "newline" is exactly whatever the C compiler used to compile
                flex interprets '\n' as; in particular, on some DOS systems you
                must either filter out \r's in the input yourself, or
                explicitly use r/\r\n for "r$".
<s>r            an r, but only in start condition s
<s1,s2,s3>r     same, but in any of start conditions s1, s2, or s3
<*>r            an r in any start condition, even an exclusive one
<<EOF>>         an end-of-file
<s1,s2><<EOF>>  an end-of-file when in start condition s1 or s2

Note that inside a character class, all regular expression operators lose their special meaning
except escape ('\') and the character class operators '-', ']', and, at the beginning of the class, '^'.

What is a token
A lexeme is a sequence of characters in the source program; a token is the class that the lexical
analyzer assigns to it. For example, for the statement sum = 3 + 2; the lexemes and tokens are:

lexeme    token
sum       IDENT
=         ASSIGN_OP
3         NUMBER
+         ADD_OP
2         NUMBER
;         SEMICOLON

1.2 Day 1: Using C

1. Write a C program to design a lexical analyzer that identifies keywords, identifiers, sentinels, special characters and operators, and counts the number of lines in the code.
2. Write a C program that classifies its input as identifier, keyword or number. The program should be able
to take a line of code (e.g. int rate = 50;) and recognize each of its parts.
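
As a starting point for these exercises, the lexeme/token split can be sketched in plain C. This is only a minimal sketch under stated assumptions: the token class names (TOK_KEYWORD and so on), the short keyword list, the operator set "+-*/=<>" and the function names next_token and print_tokens are all illustrative choices, not something fixed by this manual. Sentinels, multi-character operators and line counting are left out.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Token classes; the names are illustrative assumptions. */
typedef enum { TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_OPERATOR, TOK_SPECIAL, TOK_END } TokenType;

typedef struct {
    TokenType type;
    char text[32];   /* lexemes longer than 31 characters are split in this sketch */
} Token;

/* A deliberately small keyword list (assumption); extend it for real C input. */
static const char *KEYWORDS[] = { "int", "float", "char", "if", "else", "while", "return" };

static int is_keyword(const char *s)
{
    size_t n = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(s, KEYWORDS[i]) == 0)
            return 1;
    return 0;
}

/* Read one token starting at *p, advance *p past it, and return it. */
Token next_token(const char **p)
{
    Token t = { TOK_END, "" };
    const char *s = *p;
    size_t len = 0;

    while (isspace((unsigned char)*s))          /* skip white space */
        s++;
    if (*s == '\0') {                           /* end of the line */
        *p = s;
        return t;
    }
    if (isalpha((unsigned char)*s) || *s == '_') {
        /* letter (letter | digit | _)* : keyword or identifier */
        while ((isalnum((unsigned char)*s) || *s == '_') && len < sizeof t.text - 1)
            t.text[len++] = *s++;
        t.text[len] = '\0';
        t.type = is_keyword(t.text) ? TOK_KEYWORD : TOK_IDENT;
    } else if (isdigit((unsigned char)*s)) {
        /* digit+ : unsigned integer */
        while (isdigit((unsigned char)*s) && len < sizeof t.text - 1)
            t.text[len++] = *s++;
        t.text[len] = '\0';
        t.type = TOK_NUMBER;
    } else {
        /* any other single character: operator or special character */
        t.text[0] = *s;
        t.text[1] = '\0';
        t.type = strchr("+-*/=<>", *s) ? TOK_OPERATOR : TOK_SPECIAL;
        s++;
    }
    *p = s;
    return t;
}

/* Print every token of one line, one token per output line. */
void print_tokens(const char *line)
{
    static const char *NAMES[] = { "KEYWORD", "IDENT", "NUMBER", "OPERATOR", "SPECIAL" };
    for (Token t = next_token(&line); t.type != TOK_END; t = next_token(&line))
        printf("%-8s %s\n", NAMES[t.type], t.text);
}
```

Calling print_tokens("int rate = 50;") from a main function lists int as a keyword, rate as an identifier, = as an operator, 50 as a number and ; as a special character. Keeping the scanner as a function that returns one token per call mirrors the way yylex() is used later in this manual.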

1.3 Day 2: Introduction to LEX

Installation of flex on Linux
On Ubuntu:
% sudo apt-get install flex
RPM packages can be found at:
http://rpmfind.net/linux/rpm2html/search.php?query=flex

Installation of flex on Windows (using Cygwin)
Download and install Cygwin from https://www.cygwin.com/
Cygwin is:
1. a large collection of GNU and Open Source tools which provide functionality similar to a Linux
distribution on Windows.
2. a DLL (cygwin1.dll) which provides substantial POSIX API functionality.
Install the gcc, make, gdb, flex ... packages.

Installation of flex on Windows (using GnuWin32 and Dev-C++)
Step 1: Download FLEX
Step 2: Download DevC++
Step 3: Install FLEX in "C:\GnuWin32"
Step 4: Install DevC++ in "C:\Dev-Cpp"
Step 5: Open Environment Variables (the steps to get there are given below)
Step 6: Add "C:\GnuWin32\bin;C:\Dev-Cpp\bin;" to PATH.

How to set Environment Variables in Windows
Step 1: Click Start
Step 2: Right-click "Computer"
Step 3: Click Properties
Step 4: When the window opens, click Advanced Settings in the left pane
Step 5: Click Environment Variables at the bottom
Step 6: Select Path in the second window, click Edit and add the paths mentioned above

Compiling lex programs
Let us assume that you have a lex program saved under the name first.l
Step 1: Open Command Prompt
Step 2: Type "flex first.l"
Step 3: Type "cc lex.yy.c"
Step 4: Type "a"

Figure 2: The Lex source (mylex.l) is translated by the Lex compiler into lex.yy.c; the C/C++ compiler turns lex.yy.c into a.out; a.out reads the input stream and produces tokens.
Compilation
Here is the step-by-step method of compiling a LEX program.
% flex example.l        (output file: lex.yy.c)
% gcc lex.yy.c -lfl     (-lfl: link the flex library)
or
% gcc lex.yy.c -ll      (-ll: link the lex library)

Execution:
% cat input | ./a.out   (in Linux)
$ cat input | ./a.exe   (in Windows)

You can also read the input file directly by setting yyin before calling yylex():

yyin = fopen("input", "r");

Writing Lex Program: The Structure of a Lex Program


(Declarations)
%%
(Regular expression rules)
%%
(Subroutines definitions)

Declarations
Lex copies the material between %{ and %} directly to the generated C file, so
you may write any valid C code there. Name definitions such as WS or digit follow the %} line. E.g.
%{
#define A 100
%}
WS [ \t]+
letter [A-Za-z]
digit [0-9]
op_plus "+"

Regular expression rules
Each rule is made up of two parts:
1. A pattern (regular expression)
2. An action
Lex has a set of simple disambiguating rules:
1. Lex patterns only match a given input character or string once.
2. Lex executes the action for the longest possible match for the current input.

[\t ]+      /* ignore white space */ ;
{op_plus}   return OP_PLUS;
[a-zA-Z]+   { printf("%s: is alpha\n", yytext); }
.|\n        { ECHO; /* normal default anyway */ }
%%

Subroutines definitions
The main function and other C functions, if required. This section can consist of any legal C code;
Lex copies it to the C file after the end of the Lex-generated code.

int main(void)
{
    yylex();
    return 0;
}
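
The longest-match rule stated above can be illustrated outside of Lex with a few lines of C. This is a sketch, not how Lex is implemented (Lex compiles all patterns into a single automaton): the three hand-written matchers and the names pick_rule, RULE_IF, RULE_IDENT and RULE_NUMBER are hypothetical. The tie-break shown, where the rule listed first wins when two matches have the same length, is the standard lex/flex behaviour.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule reports how many leading characters of s it
 * matches (0 means no match). */
typedef size_t (*Matcher)(const char *s);

static size_t match_if(const char *s)       /* the keyword "if" */
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *s)    /* [a-z]+ */
{
    size_t n = 0;
    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

static size_t match_number(const char *s)   /* [0-9]+ */
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in the order they would appear in the rules section. */
enum { RULE_IF, RULE_IDENT, RULE_NUMBER, RULE_COUNT };
static const Matcher RULES[RULE_COUNT] = { match_if, match_ident, match_number };

/*
 * Return the index of the winning rule and store the match length in *len.
 * The longest match wins; on a tie the earlier rule wins, because a later
 * rule must be strictly longer to replace it. Returns -1 if nothing matches.
 */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    *len = 0;
    for (int i = 0; i < RULE_COUNT; i++) {
        size_t n = RULES[i](s);
        if (n > *len) {
            *len = n;
            best = i;
        }
    }
    return best;
}
```

With these rules, the input "if" selects RULE_IF (both the keyword and the identifier rule match two characters, and the keyword rule is listed first), while "iffy" selects RULE_IDENT because the identifier rule matches four characters. This is why keyword rules are usually written before the general identifier rule in a Lex file.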

Special Variables/Procedures:
yytext             where the token text is stored
yyleng             length of the token text
yylineno           the current line number
yywrap             a user function; it returns 1 when there is no more input to process, otherwise 0
int yywrap(void)   wrap-up; return 1 if done, 0 if not done
int yylex(void)    call to invoke the lexer; returns a token
char *yytext       pointer to the matched string
yyleng             length of the matched string
yylval             value associated with the token
FILE *yyout        output file
FILE *yyin         input file
INITIAL            initial start condition
BEGIN              switch start condition
ECHO               write the matched string

Create a Lex file, say "exp.l", and open it in your favorite text editor (e.g. Notepad++):
%{
#include <stdio.h>
#define A 100
%}
WS [ \t]+
letter [A-Za-z]
digit [0-9]
op_plus "+"
%%
[0-9]+ { printf("digit"); }
%%
int main(void)
{
    yylex();
    return 0;
}
int yywrap(void)
{
    return 1; /* 1 = no more input; returning 0 would make the scanner keep reading */
}


Example 2:
%{
int num_lines = 0, num_chars = 0;
%}
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void)
{
    yylex();
    printf( "# of lines = %d, # of chars = %d\n", num_lines, num_chars );
    return 0;
}

Scanner for a Pascal-like language:

%{
/* need this for the calls to atoi() and atof() below */
#include <stdlib.h>
%}
DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%
{DIGIT}+    {
            printf( "An integer: %s (%d)\n", yytext,
                    atoi( yytext ) );
            }

{DIGIT}+"."{DIGIT}*    {
            printf( "A float: %s (%g)\n", yytext,
                    atof( yytext ) );
            }

if|then|begin|end|procedure|function    {
            printf( "A keyword: %s\n", yytext );
            }

{ID}                printf( "An identifier: %s\n", yytext );

"+"|"-"|"*"|"/"     printf( "An operator: %s\n", yytext );

"{"[^}\n]*"}"       /* eat up one-line comments */

[ \t\n]+            /* eat up whitespace */

.                   printf( "Unrecognized character: %s\n", yytext );

%%
int main( int argc, char **argv )
{
    ++argv, --argc; /* skip over program name */
    if ( argc > 0 )
        yyin = fopen( argv[0], "r" );
    else
        yyin = stdin;
    yylex();
    return 0;
}

1.4 Day 3-4: LEX Exercise

1. Program using LEX to recognize digit, number, words, operators, command lines, spaces.
2. Program using LEX to count the number of characters, words, spaces and lines in a given input
file.
3. Program using LEX to count the number of comment lines in a given C program. Also eliminate
them and copy the resulting program into a separate file.
4. Program using LEX to recognize a valid arithmetic expression and to recognize the identifiers
and operators present. Print them separately.
5. Program using LEX to recognize whether a given sentence is simple or compound.
6. Program using LEX to recognize and count the number of identifiers in a given input file.

2 Parser

2.1 Day 5: Exercise Using C

1. Write a C program to compute the FIRST and FOLLOW sets of a given grammar.
2. Write a C program to compute the FIRST, FOLLOW and look-ahead sets of a given grammar.
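
One possible way to approach the FIRST part of these exercises is a fixed-point iteration over the productions. The sketch below is an illustration under stated assumptions, not a prescribed solution: the grammar encoding ("A=body", upper-case letters as non-terminals, '#' as epsilon), the hard-coded expression grammar, and the names compute_first and first_of are all hypothetical. FOLLOW and look-ahead sets are left out.

```c
#include <ctype.h>
#include <string.h>

#define EPS '#'        /* assumed marker for the empty string */
#define MAX_SET 32

/* Hypothetical grammar encoding: "A=body"; upper-case letters are
 * non-terminals, every other character is a terminal, '#' is epsilon.
 * Expression grammar with left recursion removed:
 *   E -> T R   R -> + T R | eps   T -> F Y   Y -> * F Y | eps   F -> ( E ) | i */
static const char *PRODS[] = {
    "E=TR", "R=+TR", "R=#", "T=FY", "Y=*FY", "Y=#", "F=(E)", "F=i"
};
enum { NPRODS = sizeof PRODS / sizeof PRODS[0] };

static char first[26][MAX_SET];   /* first[X - 'A'] is a NUL-terminated set */

/* Add c to set; return 1 if the set grew, 0 if c was already present. */
static int add(char *set, char c)
{
    if (strchr(set, c))
        return 0;
    size_t n = strlen(set);
    set[n] = c;
    set[n + 1] = '\0';
    return 1;
}

/* Repeat over all productions until no FIRST set changes. */
void compute_first(void)
{
    int changed = 1;
    memset(first, 0, sizeof first);
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPRODS; p++) {
            char *lhs = first[PRODS[p][0] - 'A'];
            int nullable = 1;   /* can everything scanned so far derive epsilon? */
            for (const char *s = PRODS[p] + 2; *s && nullable; s++) {
                nullable = 0;
                if (*s == EPS) {
                    nullable = 1;                       /* explicit epsilon body */
                } else if (!isupper((unsigned char)*s)) {
                    changed |= add(lhs, *s);            /* terminal: add it and stop */
                } else {
                    /* non-terminal: copy its FIRST set except epsilon */
                    for (const char *f = first[*s - 'A']; *f; f++) {
                        if (*f == EPS)
                            nullable = 1;               /* look at the next symbol too */
                        else
                            changed |= add(lhs, *f);
                    }
                }
            }
            if (nullable)
                changed |= add(lhs, EPS);   /* the whole body can vanish */
        }
    }
}

/* FIRST set of non-terminal nt as a string of terminals ('#' = epsilon). */
const char *first_of(char nt)
{
    return first[nt - 'A'];
}
```

After compute_first(), first_of('E') contains ( and i, and first_of('R') contains + and #. The same loop structure (iterate until nothing changes) can be reused for FOLLOW, which additionally needs the start symbol and an end-of-input marker.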

2.2 Introduction to YACC

1. Install YACC (Bison).
2. Study LALR parser generation by Yacc.
3. Generate an SLR parser using Bison.

Figure 3: Lex turns the lexical rules into yylex; Yacc turns the grammar rules into yyparse; yyparse calls yylex to read the input and produces the parsed input.

2.3 Day 6: Exercise Using YACC

1. Convert the BNF rules into Yacc form and write code to generate an abstract syntax tree.
2. YACC program to recognize a valid arithmetic expression that uses the operators +, -, * and /.
3. YACC program to recognize a valid variable, which starts with a letter, followed by any number
of letters or digits.
4. YACC program to evaluate an arithmetic expression involving the operators +, -, * and /.
5. YACC program to recognize the strings aaab, abbb, ab and a using the grammar (a^n b^n, n ≥ 0).
6. Program to recognize the grammar (a^n b, n ≥ 10).

2.4 Day 7: Exercise Using C

1. Implementation of Predictive Parser.

3 Syntax Directed Definition

3.1 Day 8: Exercise Using C

1. Write a C program to implement the syntax-directed definition of "if E then S1" and "if E then
S1 else S2".
2. Write a yacc program that accepts a regular expression as input and produces its parse tree as
output.

4 Generation of Intermediate Code

4.1 Day 9: Exercise Using C

1. Write a program to generate machine code, and a program to generate the following intermediate code forms:
(a) Three-address code
(b) Quadruple
2. Write a program to generate the intermediate code in the form of Polish notation.
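
For the three-address code exercise, one common approach is a small recursive descent translator that emits one instruction per operator. The following is a minimal sketch under assumptions of my own: operands are single letters or digits, the input contains no white space, temporaries are named t1, t2, ..., and the function name gen_tac is hypothetical. There is no real error handling.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *src;   /* next unread character of the expression */
static char out[512];     /* generated code, one instruction per line */
static int ntemp;         /* number of temporaries allocated so far */

static void expr(char *place);

/* factor -> letter | digit | ( expr ); place receives the operand name */
static void factor(char *place)
{
    if (*src == '(') {
        src++;                      /* consume '(' */
        expr(place);
        if (*src == ')')
            src++;                  /* consume ')' */
    } else if (isalnum((unsigned char)*src)) {
        place[0] = *src++;
        place[1] = '\0';
    } else {
        strcpy(place, "?");         /* malformed input; not handled further */
    }
}

/* Append "tN = left op right" to out and leave tN in left. */
static void emit(char *left, char op, const char *right)
{
    char temp[16];
    size_t used = strlen(out);
    snprintf(temp, sizeof temp, "t%d", ++ntemp);
    snprintf(out + used, sizeof out - used, "%s = %s %c %s\n", temp, left, op, right);
    strcpy(left, temp);
}

/* term -> factor { (*|/) factor } */
static void term(char *place)
{
    char right[16];
    factor(place);
    while (*src == '*' || *src == '/') {
        char op = *src++;
        factor(right);
        emit(place, op, right);
    }
}

/* expr -> term { (+|-) term } */
static void expr(char *place)
{
    char right[16];
    term(place);
    while (*src == '+' || *src == '-') {
        char op = *src++;
        term(right);
        emit(place, op, right);
    }
}

/* Translate one expression and return the generated three-address code. */
const char *gen_tac(const char *expression)
{
    char result[16];
    src = expression;
    out[0] = '\0';
    ntemp = 0;
    expr(result);
    return out;
}
```

For the input a+b*c, gen_tac returns the two lines "t1 = b * c" and "t2 = a + t1". Each emitted line corresponds directly to a quadruple (operator, argument 1, argument 2, result), so the same traversal can fill a quadruple table instead of a text buffer.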

Advanced Problems
1. Develop a recursive descent parser.
2. Write a program to simulate the heap storage allocation strategy.
3. Generate a lexical analyzer using LEX.
4. Generate YACC specifications for a few syntactic categories:
(a) Program to recognize a valid arithmetic expression that uses the operators +, -, * and /.
(b) Program to recognize a valid variable, which starts with a letter followed by any number of
letters or digits.
(c) Program to recognize the grammar (a^n b, where n ≥ 10).
(d) Implementation of a calculator using LEX and YACC.

Books to Follow

1. John R. Levine, Tony Mason, Doug Brown; Lex & Yacc, O'Reilly & Associates, 1992. ISBN:
9781565920002, Online: https://books.google.co.in/books?id=fMPxfWfe67EC
2. Charles N. Fischer, Richard J. LeBlanc Jr., Ron K. Cytron; Crafting a Compiler, Pearson Education, 2011. ISBN: 9780133001570, Online: https://books.google.co.in/books?id=GSYrAAAAQBAJ

Evaluation Scheme

EC No.    Evaluation Component    Duration    Weightage    Date & Time    Nature of Component

You may meet me: every day at 5:00 pm.

You may mail me at bm.6779@gmail.com (always mention your Roll Number followed by the Subject in
the subject field).
