Dinesh Compiler Design Lab Work-3

OBJECTIVE 1: STUDY OF DIFFERENT PHASES OF COMPILER.
STUDY:
NAME: DINESH KISHNANI Roll No. :0114CS151044

Lexical Analysis
The first phase of the compiler works as a text scanner. This phase scans the source code as
a stream of characters and groups it into meaningful lexemes. The lexical analyzer
represents these lexemes in the form of tokens:
<token-name, attribute-value>
Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by
lexical analysis as input and generates a parse tree (or syntax tree). In this phase,
token arrangements are checked against the source code grammar, i.e. the parser
checks whether the expression made by the tokens is syntactically correct.
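The check described above can be sketched as a small recursive-descent recogniser. This is only an illustration under stated assumptions: the grammar in the comment, the class name `Parser` and the use of single letters as identifiers are all hypothetical, and whitespace is not handled.

```cpp
#include <cctype>
#include <string>

// Recursive-descent recogniser for the hypothetical grammar
//   expr   -> term   { '+' term }
//   term   -> factor { '*' factor }
//   factor -> letter | '(' expr ')'
class Parser {
public:
    explicit Parser(const std::string& text) : text_(text), pos_(0) {}

    // True when the whole input is one syntactically correct expression.
    bool accepts() { return expr() && pos_ == text_.size(); }

private:
    bool expr()
    {
        if (!term()) return false;
        while (peek() == '+') {
            ++pos_;
            if (!term()) return false;
        }
        return true;
    }

    bool term()
    {
        if (!factor()) return false;
        while (peek() == '*') {
            ++pos_;
            if (!factor()) return false;
        }
        return true;
    }

    bool factor()
    {
        if (std::isalpha(static_cast<unsigned char>(peek()))) {
            ++pos_;
            return true;
        }
        if (peek() == '(') {
            ++pos_;
            if (!expr() || peek() != ')') return false;
            ++pos_;
            return true;
        }
        return false;
    }

    // Current character, or '\0' once the input is used up.
    char peek() const { return pos_ < text_.size() ? text_[pos_] : '\0'; }

    std::string text_;
    std::size_t pos_;
};
```

For instance, `Parser("a+b*c").accepts()` holds, while `Parser("a+*b").accepts()` does not, because `*` cannot start a term.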
Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the
language. For example, it checks that values are assigned between compatible data types
and reports errors such as adding a string to an integer. The semantic analyzer also keeps
track of identifiers, their types and expressions, and checks whether identifiers are
declared before use.
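A type check of this kind might look like the sketch below. The type set and the compatibility rules (an int may widen to a float, a string may not be added to a number) are assumptions made for illustration and do not describe any particular language.

```cpp
// Hypothetical type set for a toy language.
enum class Type { Int, Float, String, Error };

// Result type of `left + right`, or Type::Error when the operands are incompatible.
Type typeOfAddition(Type left, Type right)
{
    if (left == Type::Error || right == Type::Error) return Type::Error;
    if (left == Type::String || right == Type::String)
        return (left == right) ? Type::String : Type::Error;  // string + int is rejected
    return (left == Type::Float || right == Type::Float) ? Type::Float : Type::Int;
}

// An assignment is allowed when the types match, or when an int widens to a float.
bool assignable(Type target, Type value)
{
    if (target == Type::Error || value == Type::Error) return false;
    return target == value || (target == Type::Float && value == Type::Int);
}
```

Under these assumed rules, `typeOfAddition(Type::String, Type::Int)` yields `Type::Error`, which is the point at which a semantic analyzer would report the mismatch.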
Intermediate Code Generation
After semantic analysis the compiler generates an intermediate representation of the
source code. It represents a program for some abstract machine and lies in between the
high-level language and the machine language.
Code Optimization
The next phase optimizes the intermediate code. Optimization can be thought of as
removing unnecessary lines of code and rearranging the sequence of statements so that
the program runs faster without wasting resources (CPU, memory).
Code Generation
In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language. The code generator
translates the intermediate code into a sequence of (generally) relocatable machine
instructions.
Symbol Table
It is a data structure maintained throughout all the phases of a compiler. All the
identifiers' names, along with their types, are stored here.
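A minimal sketch of such a table is given below, assuming a single scope and storing only the type of each identifier. The names `SymbolInfo`, `SymbolTable`, `declare` and `lookup` are illustrative and do not come from any compiler.

```cpp
#include <string>
#include <unordered_map>

// One symbol-table entry; a real compiler would also keep scope, size, address, etc.
struct SymbolInfo {
    std::string type;
};

class SymbolTable {
public:
    // Record a declaration; returns false if the name is already declared.
    bool declare(const std::string& name, const std::string& type)
    {
        return table_.emplace(name, SymbolInfo{type}).second;
    }

    // Look up a name; returns nullptr when the identifier was never declared.
    const SymbolInfo* lookup(const std::string& name) const
    {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, SymbolInfo> table_;
};
```

A hash map is used here because every phase needs fast lookup by name; a `nullptr` result from `lookup` corresponds to the "used before declaration" error mentioned under semantic analysis.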
OBJECTIVE 2: DEVELOP A LEXICAL ANALYZER TO RECOGNIZE A DIFFERENT
DIGIT IN C++. (EX. NUMBER, CHARACTER, BLANK, NEW LINE ETC.)
PROGRAM:
#include <iostream>
#include <cctype>
using namespace std;

int main()
{
    int lcount = 1, bcount = 0, ccount = 0, dcount = 0, scount = 0;
    char i;
    cout << "\n Enter the sentence, add $ at the end \n";
    // Read one character at a time until '$' (or end of input) and classify it.
    while (cin.get(i) && i != '$')
    {
        if (i == ' ')
            bcount++;                          // blank
        else if (i == '\n')
            lcount++;                          // new line
        else if (isdigit((unsigned char)i))
            dcount++;                          // digit
        else if (isalpha((unsigned char)i))
            ccount++;                          // letter
        else
            scount++;                          // anything else is a special character
    }
    cout << "\n No of Characters = " << ccount;
    cout << "\n No of Digit = " << dcount;
    cout << "\n No of Special Char = " << scount;
    cout << "\n No of Blank = " << bcount;
    cout << "\n No of Line = " << lcount;
    cout << "\n";
    return 0;
}
Output:
Enter the sentence, add $ at the end

Compiler Design
No of Characters = 16
No of Digit = 3
OBJECTIVE 3: DEVELOP A LEXICAL ANALYZER TO RECOGNIZE A
DIFFERENT PATTERNS IN C++. (EX. IDENTIFIERS, CONSTANTS,
COMMENTS, OPERATORS ETC.)
STUDY:
PROGRAM:
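#include <iostream>
#include <cstring>
#include <cctype>
using namespace std;

int main()
{
    char str[20] = "";
    size_t count = 1;    // the first character is tested separately below
    cout << "Enter the input string \n";
    cin.width(20);       // read at most 19 characters so str cannot overflow
    cin >> str;
    if (isdigit((unsigned char)str[0]))
    {
        // Constant: every remaining character must also be a digit.
        for (size_t i = 1; i < strlen(str); i++)
        {
            if (isdigit((unsigned char)str[i]))
                count++;
        }
        if (count == strlen(str))
            cout << "string is constant \n";
        else
            cout << "string is neither constant nor identifier \n";
    }
    else if (isalpha((unsigned char)str[0]))
    {
        // Identifier: a letter followed by letters or digits only.
        for (size_t j = 1; j < strlen(str); j++)
        {
            if (isalnum((unsigned char)str[j]))
                count++;
        }
        if (count == strlen(str))
            cout << "string is identifier \n";
        else
            cout << "string is neither constant nor identifier \n";
    }
    else
        cout << "string is neither constant nor identifier \n";
    return 0;
}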
-------------------------------------------------------------------------------------------------------------------
Output1:
Enter the input string
Truba
string is identifier

Output2:
Enter the input string
234
string is constant

Output3:
Enter the input string
23Truba
string is neither constant nor identifier
-------------------------------------------------------------------------------------------------------------------
Consider the example below.
c = a + b;
After lexical analysis, a table of tokens is generated as given below.
-------------------------------------------------------------------------------------------------------------------
Token Type
c identifier
= operator
a identifier
+ operator
b identifier
; separator
-------------------------------------------------------------------------------------------------------------------
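The table above can be reproduced with a small hand-written scanner. The following is a minimal sketch, not the program of this objective: it assumes single-character operators, treats only `;` and `,` as separators, and the names `Token` and `tokenize` are illustrative.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// A token is a (lexeme, type) pair, mirroring the two columns of the table.
using Token = std::pair<std::string, std::string>;

// Split a statement into identifiers, constants, operators and separators.
// Assumption: operators and separators are single characters.
std::vector<Token> tokenize(const std::string& src)
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char ch = static_cast<unsigned char>(src[i]);
        if (std::isspace(ch)) {                      // whitespace only separates tokens
            ++i;
        } else if (std::isalpha(ch) || ch == '_') {  // identifier: letter, then letters/digits
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back({src.substr(start, i - start), "identifier"});
        } else if (std::isdigit(ch)) {               // constant: a run of digits
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            tokens.push_back({src.substr(start, i - start), "constant"});
        } else if (ch == ';' || ch == ',') {
            tokens.push_back({std::string(1, src[i++]), "separator"});
        } else {                                     // everything else counts as an operator
            tokens.push_back({std::string(1, src[i++]), "operator"});
        }
    }
    return tokens;
}
```

Calling `tokenize("c = a + b;")` yields the six rows of the table in order, from `c` (identifier) to `;` (separator).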

OBJECTIVE 4: WRITE A C PROGRAM TO RECOGNIZE STRINGS UNDER 'A*',
'A*B+', 'ABB'.
PROGRAM:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[20], c;
    int state = 0, i = 0;
    printf("\n Enter a string:");
    /* fgets() replaces the unsafe gets() and never writes past the end of s */
    if (fgets(s, sizeof s, stdin) == NULL)
        return 1;
    s[strcspn(s, "\n")] = '\0';    /* remove the newline that fgets() keeps */
    /* state 6 is the dead state: stop scanning as soon as it is reached */
    while (s[i] != '\0' && state != 6)
    {
        c = s[i++];
        switch (state)
        {
        case 0:
            if (c == 'a')
                state = 1;
            else if (c == 'b')
                state = 2;
            else
                state = 6;
            break;
        case 1:
            if (c == 'a')
                state = 3;
            else if (c == 'b')
                state = 4;
            else
                state = 6;
            break;
        case 3:
            if (c == 'a')
                state = 3;
            else if (c == 'b')
                state = 2;
            else
                state = 6;
            break;
        case 4:
            if (c == 'b')
                state = 5;
            else
                state = 6;
            break;
        case 2:
        case 5:
            /* only further b's are allowed once the string is in a*b+ */
            if (c == 'b')
                state = 2;
            else
                state = 6;
            break;
        }
    }
    if (state == 6)
        printf("\n %s is not recognised.", s);
    else if ((state == 0) || (state == 1) || (state == 3))   /* the empty string also matches a* */
        printf("\n %s is accepted under rule 'a*'", s);
    else if ((state == 2) || (state == 4))
        printf("\n %s is accepted under rule 'a*b+'", s);
    else
        printf("\n %s is accepted under rule 'abb'", s);
    printf("\n");
    return 0;
}
-------------------------------------------------------------------------------------------------------------------
Output1:
Enter the input string
aaa
aaa is accepted under rule 'a*'

Output2:
Enter the input string
aaab
aaab is accepted under rule 'a*b+'

Output3:
Enter the input string
abb
abb is accepted under rule 'abb'
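The automaton used in the program for this objective can also be written as a transition table. The sketch below uses the same seven states, with 6 as the dead state; the function name `classify` and the choice to return an empty string for rejected input are assumptions of this example. It treats the empty string as a match for 'a*'.

```cpp
#include <string>

// Classify a string over {a, b} with a table-driven DFA.
// Returns the matching rule, or an empty string if the input is rejected.
std::string classify(const std::string& input)
{
    // Rows are states 0..6; column 0 is input 'a', column 1 is input 'b'.
    static const int table[7][2] = {
        {1, 2},  // 0: start
        {3, 4},  // 1: "a"
        {6, 2},  // 2: a*b+ (at least one b seen)
        {3, 2},  // 3: "aa", "aaa", ...
        {6, 5},  // 4: "ab"
        {6, 2},  // 5: "abb"
        {6, 6}   // 6: dead state
    };
    int state = 0;
    for (char ch : input) {
        if (ch != 'a' && ch != 'b') return "";  // symbol outside the alphabet
        state = table[state][ch == 'a' ? 0 : 1];
    }
    switch (state) {
    case 0: case 1: case 3: return "a*";
    case 2: case 4:         return "a*b+";
    case 5:                 return "abb";
    default:                return "";
    }
}
```

With a table, changing the recognised language only means editing rows of `table`, whereas the switch version needs a new `case` per state.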

OBJECTIVE 5: STUDY OF LEX TOOLS
STUDY:
During the first phase the compiler reads the input and converts strings in the source to
tokens. With regular expressions we can specify patterns to lex so it can generate code
that will allow it to scan and match strings in the input. Each pattern specified in the
input to lex has an associated action. Typically an action returns a token that represents
the matched string for subsequent use by the parser. Initially we will simply print the
matched string rather than return a token value.
Now we can easily understand some of lex’s limitations.
For example, lex cannot be used to recognize nested structures such as parentheses.
Nested structures are handled by incorporating a stack. Whenever we encounter a “(”
we push it on the stack. When a “)” is encountered we match it with the top of the stack
and pop the stack. However lex only has states and transitions between states. Since it
has no stack it is not well suited for parsing nested structures.
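The stack technique described above can be sketched in a few lines. This is an illustration of the idea only, not part of lex; the function name `parenthesesBalanced` is made up for the example.

```cpp
#include <stack>
#include <string>

// Returns true when every ')' closes an earlier '(' and nothing is left open.
bool parenthesesBalanced(const std::string& text)
{
    std::stack<char> open;
    for (char ch : text) {
        if (ch == '(') {
            open.push(ch);                   // remember the unmatched '('
        } else if (ch == ')') {
            if (open.empty()) return false;  // ')' with nothing to match
            open.pop();                      // match it with the top of the stack
        }
    }
    return open.empty();                     // leftover '(' means unbalanced
}
```

The stack can grow without bound as nesting deepens, which is exactly what a fixed set of states and transitions cannot model.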
Regular expressions in lex are composed of metacharacters (Table 1). Pattern-matching
examples are shown in Table 2. Within a character class normal operators lose their
meaning. Two operators allowed in a character class are the hyphen (“-”) and
circumflex (“^”). When used between two characters the hyphen represents a range of
characters. The circumflex, when used as the first character, negates the expression. If
two patterns match the same string, the longest match wins. In case both matches are
the same length, then the first pattern listed is used.
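The two disambiguation rules (longest match first, then the earliest pattern) can be illustrated with the sketch below. It does not show how lex is implemented; `Rule`, `chooseRule` and the two sample rules are hypothetical names introduced for this example.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// A rule pairs a name with a matcher that returns how many leading
// characters of the input it matches (0 means no match).
struct Rule {
    std::string name;
    std::function<std::size_t(const std::string&)> match;
};

// Pick the rule with the longest match; on a tie the earlier rule wins.
std::string chooseRule(const std::vector<Rule>& rules, const std::string& input)
{
    std::string winner = "none";
    std::size_t best = 0;
    for (const Rule& r : rules) {
        std::size_t len = r.match(input);
        if (len > best) {  // strictly greater keeps the first rule on ties
            best = len;
            winner = r.name;
        }
    }
    return winner;
}

// Hypothetical rule set: the keyword "if" listed before a lowercase identifier.
std::vector<Rule> sampleRules()
{
    return {
        {"keyword", [](const std::string& s) -> std::size_t {
             return s.compare(0, 2, "if") == 0 ? 2 : 0;
         }},
        {"identifier", [](const std::string& s) -> std::size_t {
             std::size_t n = 0;
             while (n < s.size() && std::islower(static_cast<unsigned char>(s[n]))) ++n;
             return n;
         }},
    };
}
```

With these rules, the input `iffy` is chosen as an identifier because that match is longer, while `if` is chosen as a keyword because both rules match two characters and the keyword rule is listed first.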
... definitions ...
%%
... rules ...
%%
... subroutines ...
Input to Lex is divided into three sections with %% dividing the sections.

Variable yytext is a pointer to the matched string (NUL-terminated) and yyleng is the
length of the matched string. Variable yyout is the output file and defaults to stdout.
Function yywrap is called by lex when input is exhausted; it returns 1 if you are done or 0
if more processing is required. Every C program requires a main function. In this case
we simply call yylex, which is the main entry point for lex. Some implementations of lex
include copies of main and yywrap in a library, thus eliminating the need to code them
explicitly. This is why our first example, the shortest lex program, functioned properly.

EXAMPLE 1
EXAMPLE 2

EXAMPLE 3
>> gedit snazzle.lex
This example can be compiled by running this:
% lex snazzle.lex
This will produce the file "lex.yy.c", which we can then compile with g++:
% g++ lex.yy.c -lfl -o snazzle
Notice the "-lfl", which links in the dynamic lex libraries (actually flex libraries, hence the "fl"). On some
systems you might have to use "-ll" instead. If that doesn't work, ask someone who'll know where on your
system the lex/flex libraries are kept.
% ./snazzle
90
Found an integer:90
23.4
Found a floating-point number:23.4
456

OBJECTIVE 6: WRITE A LEX PROGRAM TO RECOGNIZE A VALID ARITHMETIC
EXPRESSION AND TO RECOGNIZE THE IDENTIFIERS AND OPERATORS PRESENT.
PRINT THEM SEPARATELY.
PROGRAM:
%{
#include<stdio.h>
int a=0,s=0,m=0,d=0,ob=0,cb=0;
int flaga=0, flags=0, flagm=0, flagd=0;
%}
id [a-zA-Z]+
%%
{id} {printf("\n %s is an identifier\n",yytext);}
[+] {a++;flaga=1;}
[-] {s++;flags=1;}
[*] {m++;flagm=1;}
[/] {d++;flagd=1;}
[(] {ob++;}
[)] {cb++;}
%%
int main()
{
printf("Enter the expression\n");
yylex();
if(ob-cb==0)
{
printf("Valid expression\n");
}
else
{printf("Invalid expression");}
printf("\nAdd=%d\nSub=%d\nMul=%d\nDiv=%d\n",a,s,m,d);
printf("Operators are: \n");
if(flaga)
printf("+\n");
if(flags)
printf("-\n");
if(flagm)
printf("*\n");
if(flagd)
printf("/\n");
return 0;
}
OUTPUT:
Enter the expression
(a+b*c)
 a is an identifier
 b is an identifier
 c is an identifier
[Ctrl-d]
Valid expression
Add=1
Sub=0
Mul=1
Div=0
Operators are:
+
*

OBJECTIVE 7: WRITE A PROGRAM TO CHECK WHETHER A STRING BELONGS
TO THE GRAMMAR OR NOT.
PROGRAM:
#include <iostream>
#include <string>
using namespace std;

// Checks whether the input belongs to the grammar
//   E -> E + i | E * i | i
// by repeatedly reducing a leading "i op i" to "i".
int main()
{
    string str, tok;
    cout << "\n enter the string \n";
    cin >> str;

    // Step 1: map each input character to a token ('i' becomes '1').
    for (size_t a = 0; a < str.size(); a++)
    {
        if (str[a] == 'i')
            tok += '1';
        else if (str[a] == '*' || str[a] == '+')
            tok += str[a];
        else
        {
            cout << "\n invalid string 1 \n";
            return 1;
        }
    }
    cout << "\n" << tok;

    // Step 2: reduce "1*1" or "1+1" at the front to "1" until one token is left.
    while (tok.size() > 1)
    {
        if (tok.size() >= 3 && tok[0] == '1' &&
            (tok[1] == '*' || tok[1] == '+') && tok[2] == '1')
        {
            tok.erase(1, 2);    // drop the operator and the second operand
        }
        else
        {
            cout << "\n invalid string 2 \n";
            break;
        }
        cout << "\n" << tok;
    }

    if (tok == "1")
        cout << "\n valid string \n";
    else
        cout << "\n invalid string 3 \n";
    return 0;
}
OBJECTIVE 8: WRITE A PROGRAM TO COMPUTE FIRST OF NON-TERMINALS.
PROGRAM:
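#include <iostream>
#include <cstring>
using namespace std;

const int MAXSYM = 10;    // maximum number of terminals or non-terminals
const int MAXLEN = 20;    // maximum length of a production body or FIRST set

// Return the position of c in arr[0..n-1], or -1 if it is not there.
int indexOf(const char arr[], int n, char c)
{
    for (int k = 0; k < n; k++)
        if (arr[k] == c)
            return k;
    return -1;
}

// Add c to the set held in the string s; return true if s changed.
bool addSymbol(char s[], char c)
{
    size_t len = strlen(s);
    if (strchr(s, c) != NULL || len >= MAXLEN - 1)
        return false;
    s[len] = c;
    s[len + 1] = '\0';
    return true;
}

int main()
{
    char t[MAXSYM], nt[MAXSYM];
    char p[MAXSYM][MAXLEN] = {};        // p[i] is the body of the production for nt[i]
    char first[MAXSYM][MAXLEN] = {};    // first[i] is FIRST(nt[i]) stored as a string
    int i, j, nterm = 0, nont = 0, nprod = 0;   // "not" is a reserved word in C++, hence nterm

    cout << "Enter the no. of Non-terminals in the grammar:";
    cin >> nont;
    if (nont < 1 || nont > MAXSYM)
        return 1;
    cout << "\nEnter the Non-terminals in the grammar:";
    for (i = 0; i < nont; i++)
        cin >> nt[i];
    cout << "\nEnter the no. of Terminals in the grammar: ( Enter e for epsilon ) ";
    cin >> nterm;
    if (nterm < 1 || nterm > MAXSYM)
        return 1;
    cout << "Enter the Terminals in the grammar:";
    for (i = 0; i < nterm; i++)
        cin >> t[i];
    cout << "\nEnter the no of productions :\n";
    cin >> nprod;
    if (nprod > nont)
        nprod = nont;                   // one production per non-terminal
    for (i = 0; i < nprod; i++)
    {
        cout << "\nEnter the production for " << nt[i] << " ( End the production with '$' sign ) :";
        char ch;
        j = 0;
        while (cin >> ch && ch != '$')
        {
            if (indexOf(nt, nont, ch) < 0 && indexOf(t, nterm, ch) < 0)
                cout << "\n warning: " << ch << " is not a declared symbol \n";
            if (j < MAXLEN - 1)
                p[i][j++] = ch;
        }
    }
    for (i = 0; i < nprod; i++)
        cout << "\n The production for " << nt[i] << " -> " << p[i];

    // Repeat until no FIRST set grows any further.
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (i = 0; i < nprod; i++)
        {
            bool nullable = true;       // true while every symbol scanned so far can vanish
            for (j = 0; p[i][j] != '\0' && nullable; j++)
            {
                int k = indexOf(nt, nont, p[i][j]);
                if (k < 0)
                {
                    // A terminal (or e) begins the string: add it and stop scanning.
                    if (addSymbol(first[i], p[i][j]))
                        changed = true;
                    nullable = false;
                }
                else
                {
                    // A non-terminal contributes its FIRST set, except e.
                    for (const char *q = first[k]; *q != '\0'; q++)
                        if (*q != 'e' && addSymbol(first[i], *q))
                            changed = true;
                    if (strchr(first[k], 'e') == NULL)
                        nullable = false;
                }
            }
            if (nullable && addSymbol(first[i], 'e'))
                changed = true;
        }
    }
    for (i = 0; i < nprod; i++)
        cout << "\n\nThe first of " << nt[i] << " -> " << first[i];
    cout << "\n";
    return 0;
}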
OUTPUT:
Enter the no. of Non-terminals in the grammar: 2
Enter the Non-terminals in the grammar: S A
Enter the no. of Terminals in the grammar: ( Enter e for epsilon ) : 3
Enter the Terminals in the grammar: e f g
Enter the no of productions : 2
Enter the production for S ( End the production with '$' sign ) : eS
Enter the production for A ( End the production with '$' sign ) : fg
The production for S -> eS
The production for A -> fg
The first of S -> e
The first of A -> f

OBJECTIVE 9: REPRESENTATION OF THE INTERMEDIATE CODE IN
(A) POST FIX NOTATION (B) 3 ADDRESS CODE
STUDY:
Rather than generate assembler directly from parse trees, many approaches generate an
intermediate representation.
• The intermediate representation is machine-independent, and a second step translates it to
machine-specific assembler.
• This means that work to port to a new machine is reduced.
• Optimisation is done on intermediate code, not on object code.
• Two common formats:
  o Postfix notation
  o Three address code
Postfix Notation
• Also called suffix notation or reverse Polish notation.
• Used as an intermediate representation between the parse tree and the generated assembler
code.
Mapping program forms to Postfix
Mathematical and boolean expressions
• a + b => a b +
• a * (b + c) => a b c + *
• a == b => a b ==
Unary operators
• -a => a -
Assignment
• a = a + b => a a b + =
Goto statements
• goto L1 => L1 jump_to
If statements
• if <p> then <inst1> else <inst2> => <p> L1 jump_if_false <inst1> L2 jump_to L1: <inst2> L2:
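The expression mappings listed above can be produced mechanically with an operator stack (the shunting-yard method). The sketch below is a minimal version under stated assumptions: single-character operands, only the left-associative binary operators + - * /, and well-formed input. The names `precedence` and `toPostfix` are illustrative.

```cpp
#include <cctype>
#include <stack>
#include <string>

// Operator precedence; higher binds tighter.
static int precedence(char op)
{
    return (op == '*' || op == '/') ? 2 : (op == '+' || op == '-') ? 1 : 0;
}

// Convert an infix expression to space-separated postfix.
std::string toPostfix(const std::string& infix)
{
    std::string out;
    std::stack<char> ops;
    auto emit = [&out](char c) {
        if (!out.empty()) out += ' ';
        out += c;
    };
    for (char ch : infix) {
        if (std::isspace(static_cast<unsigned char>(ch))) continue;
        if (std::isalnum(static_cast<unsigned char>(ch))) {
            emit(ch);                            // operands go straight to the output
        } else if (ch == '(') {
            ops.push(ch);
        } else if (ch == ')') {
            while (!ops.empty() && ops.top() != '(') {
                emit(ops.top());
                ops.pop();
            }
            if (!ops.empty()) ops.pop();         // discard the '('
        } else {
            // Pop operators of equal or higher precedence before pushing this one.
            while (!ops.empty() && ops.top() != '(' &&
                   precedence(ops.top()) >= precedence(ch)) {
                emit(ops.top());
                ops.pop();
            }
            ops.push(ch);
        }
    }
    while (!ops.empty()) {
        emit(ops.top());
        ops.pop();
    }
    return out;
}
```

For example, `toPostfix("a * (b + c)")` gives `a b c + *`, matching the second mapping in the list.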
-------------------------------------------------------------------------------------------------------------------
Three-Address Code
The intermediate code generator receives input from its predecessor phase, the semantic analyzer, in the form of
an annotated syntax tree. That syntax tree can then be converted into a linear representation, e.g.,
postfix notation. Intermediate code tends to be machine-independent. Therefore, the code generator
assumes an unlimited number of storage locations (registers) when generating code.
For example:
a = b + c * d;
The intermediate code generator will try to divide this expression into sub-expressions and then generate
the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2;
Here r1 and r2 are used as registers in the target program.
A three-address instruction refers to at most three addresses: two for the operands and one for the result.
A three-address code can be represented in two forms:
quadruples and triples.
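The division into sub-expressions shown in the example can be sketched by evaluating a postfix string with a stack of operand names. This is an illustrative sketch: the function name `toThreeAddress`, the `r1`, `r2`, ... naming of temporaries and the space-separated postfix input are assumptions of this example.

```cpp
#include <sstream>
#include <stack>
#include <string>
#include <vector>

// Translate a space-separated postfix expression into three-address
// instructions, then assign the final value to `target`.
// Returns an empty list if the postfix string is malformed.
std::vector<std::string> toThreeAddress(const std::string& target,
                                        const std::string& postfix)
{
    std::vector<std::string> code;
    std::stack<std::string> operands;
    std::istringstream in(postfix);
    std::string tok;
    int temp = 0;
    while (in >> tok) {
        bool isOp = tok == "+" || tok == "-" || tok == "*" || tok == "/";
        if (!isOp) {
            operands.push(tok);               // names are kept until an operator needs them
            continue;
        }
        if (operands.size() < 2) return {};   // operator without two operands
        std::string rhs = operands.top(); operands.pop();
        std::string lhs = operands.top(); operands.pop();
        std::string r = "r" + std::to_string(++temp);
        code.push_back(r + " = " + lhs + " " + tok + " " + rhs);
        operands.push(r);                     // the temporary stands for the sub-expression
    }
    if (operands.size() != 1) return {};
    code.push_back(target + " = " + operands.top());
    return code;
}
```

Calling `toThreeAddress("a", "b c d * +")` produces the three instructions `r1 = c * d`, `r2 = b + r1` and `a = r2`. Each instruction has one operator, two operand addresses and one result address, which is the information a quadruple stores.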
-------------------------------------------------------------------------------------------------------------------

INDEX
Sr.No.  Experiment Description                                    Experiment Date  Submission Date  Remarks/Signature
1       STUDY OF DIFFERENT PHASES OF COMPILER.
2       DEVELOP A LEXICAL ANALYZER TO RECOGNIZE A DIFFERENT DIGIT
        IN C++. (EX. NUMBER, CHARACTER, BLANK, NEW LINE ETC.)
3       DEVELOP A LEXICAL ANALYZER TO RECOGNIZE A DIFFERENT
        PATTERNS IN C++. (EX. IDENTIFIERS, CONSTANTS, COMMENTS,
        OPERATORS ETC.)
4       WRITE A C PROGRAM TO RECOGNIZE STRINGS UNDER 'A*',
        'A*B+', 'ABB'.
5       STUDY OF LEX TOOLS
6       WRITE A LEX PROGRAM TO RECOGNIZE A VALID ARITHMETIC
        EXPRESSION AND TO RECOGNIZE THE IDENTIFIERS AND
        OPERATORS PRESENT. PRINT THEM SEPARATELY.
7       WRITE A PROGRAM TO CHECK WHETHER A STRING BELONGS TO
        THE GRAMMAR OR NOT.
8       WRITE A PROGRAM TO COMPUTE FIRST OF NON-TERMINALS.
9       REPRESENTATION OF THE INTERMEDIATE CODE IN
        (A) POST FIX NOTATION
        (B) 3 ADDRESS CODE
