
OBJECTIVE 1: STUDY OF DIFFERENT PHASES OF COMPILER.

STUDY:

NAME: DINESH KISHNANI Roll No.: 0114CS151044


Lexical Analysis
The first phase of the compiler works as a text scanner. This phase scans the source code as
a stream of characters and groups them into meaningful lexemes. The lexical analyzer
represents these lexemes in the form of tokens as:

<token-name, attribute-value>

Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by
lexical analysis as input and generates a parse tree (or syntax tree). In this phase,
token arrangements are checked against the source code grammar, i.e. the parser
checks whether the expression formed by the tokens is syntactically correct.

Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the
language. For example, it checks that values are assigned between compatible data types
and reports errors such as adding a string to an integer. The semantic analyzer also keeps
track of identifiers, their types and expressions, and checks whether identifiers are
declared before use.

Intermediate Code Generation


After semantic analysis, the compiler generates an intermediate code of the source code
for the target machine. It represents a program for some abstract machine and lies
between the high-level language and the machine language.

Code Optimization
The next phase optimizes the intermediate code. Optimization removes unnecessary lines of
code and rearranges the sequence of statements to speed up program execution without
wasting resources (CPU, memory).

Code Generation
In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language. The code generator
translates the intermediate code into a sequence of (generally) relocatable machine
code.

Symbol Table
It is a data structure maintained throughout all the phases of a compiler. All
identifier names, along with their types, are stored here.
OBJECTIVE 2: DEVELOP A LEXICAL ANALYZER TO RECOGNIZE A DIFFERENT
DIGIT IN C++. (EX. NUMBER, CHARACTER, BLANK, NEW LINE ETC.)

PROGRAM:

#include <cctype>
#include <iostream>
using namespace std;

int main()
{
    // lcount starts at 1 because the first line has no newline before it.
    int lcount = 1, bcount = 0, ccount = 0, dcount = 0, scount = 0;
    char i;
    cout << "\n Enter the sentence, add $ at the end \n";
    // Read one character at a time until '$' (or end of input).
    while (cin.get(i) && i != '$')
    {
        if (i == ' ')
            bcount++;
        else if (i == '\n')
            lcount++;
        else if (isdigit(static_cast<unsigned char>(i)))
            dcount++;
        else if (isalpha(static_cast<unsigned char>(i)))
            ccount++;
        else
            scount++;
    }
    cout << "\n No of Characters = " << ccount;
    cout << "\n No of Digit = " << dcount;
    cout << "\n No of Special Char = " << scount;
    cout << "\n No of Blank = " << bcount;
    cout << "\n No of Line = " << lcount;
    cout << endl;
    return 0;
}

Output:

Enter the sentence, add $ at the end


Compiler Design
No of Characters = 16
No of Digit = 3
OBJECTIVE 3: DEVELOP A LEXICAL ANALYZER TO RECOGNIZE A
DIFFERENT PATTERNS IN C++. (EX. IDENTIFIERS, CONSTANTS,
COMMENTS, OPERATORS ETC.)

STUDY:

PROGRAM:

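#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string str;
    cout << "Enter the input string \n";
    cin >> str;
    if (str.empty())
        return 0;

    // count = number of characters that fit the pattern; str[0] is tested below.
    size_t count = 1;
    if (isdigit(static_cast<unsigned char>(str[0])))
    {
        // Constant: every character must be a digit.
        for (size_t i = 1; i < str.size(); i++)
        {
            if (isdigit(static_cast<unsigned char>(str[i])))
                count++;
        }
        if (count == str.size())
            cout << "string is constant \n";
        else
            cout << "string is neither constant nor identifier \n";
    }
    else if (isalpha(static_cast<unsigned char>(str[0])))
    {
        // Identifier: a letter followed by letters or digits.
        for (size_t j = 1; j < str.size(); j++)
        {
            if (isalnum(static_cast<unsigned char>(str[j])))
                count++;
        }
        if (count == str.size())
            cout << "string is identifier \n";
        else
            cout << "string is neither constant nor identifier \n";
    }
    else
        cout << "string is neither constant nor identifier \n";
    return 0;
}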
-------------------------------------------------------------------------------------------------------------------

Output1:
Enter the input string
Truba
string is identifier

Output2:
Enter the input string
234
string is constant

Output3:
Enter the input string
23Truba
string is neither constant nor identifier
-------------------------------------------------------------------------------------------------------------------
Consider the example below.

c = a + b;
After lexical analysis, a table of tokens is generated as given below.

-------------------------------------------------------------------------------------------------------------------

Token   Type

c       identifier
=       operator
a       identifier
+       operator
b       identifier
;       separator

-------------------------------------------------------------------------------------------------------------------



OBJECTIVE 4: WRITE A C PROGRAM TO RECOGNIZE STRINGS UNDER 'A*',
'A*B+', 'ABB'.

PROGRAM:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[20], c;
    int state = 0, i = 0;
    printf("\n Enter a string:");
    /* fgets replaces the unsafe gets and cannot overflow s */
    if (fgets(s, sizeof s, stdin) == NULL)
        return 1;
    s[strcspn(s, "\n")] = '\0'; /* drop the newline kept by fgets */

    /* States: 1 = "a", 3 = "aa...", 4 = "ab", 5 = "abb",
       2 = any other string of a*b+, 6 = rejected */
    while (s[i] != '\0' && state != 6)
    {
        c = s[i++];
        switch (state)
        {
        case 0:
            if (c == 'a')
                state = 1;
            else if (c == 'b')
                state = 2;
            else
                state = 6;
            break;

        case 1:
            if (c == 'a')
                state = 3;
            else if (c == 'b')
                state = 4;
            else
                state = 6;
            break;

        case 2:
            if (c == 'b')
                state = 2;
            else
                state = 6;
            break;

        case 3:
            if (c == 'a')
                state = 3;
            else if (c == 'b')
                state = 2;
            else
                state = 6;
            break;

        case 4:
            if (c == 'b')
                state = 5;
            else
                state = 6;
            break;

        case 5:
            if (c == 'b')
                state = 2;
            else
                state = 6;
            break;
        }
    }

    /* state 0 means the empty string, which also matches a* */
    if ((state == 0) || (state == 1) || (state == 3))
        printf("\n %s is accepted under rule 'a*'", s);
    else if ((state == 2) || (state == 4))
        printf("\n %s is accepted under rule 'a*b+'", s);
    else if (state == 5)
        printf("\n %s is accepted under rule 'abb'", s);
    else
        printf("\n %s is not recognised.", s);
    printf("\n");
    return 0;
}
-------------------------------------------------------------------------------------------------------------------
Output1:
Enter a string:aaa
aaa is accepted under rule 'a*'

Output2:
Enter a string:aaab
aaab is accepted under rule 'a*b+'

Output3:
Enter a string:abb
abb is accepted under rule 'abb'



OBJECTIVE 5: STUDY OF LEX TOOLS

STUDY:

During the first phase the compiler reads the input and converts strings in the source to
tokens. With regular expressions we can specify patterns to lex so it can generate code
that will allow it to scan and match strings in the input. Each pattern specified in the
input to lex has an associated action. Typically an action returns a token that represents
the matched string for subsequent use by the parser. Initially we will simply print the
matched string rather than return a token value.
Now we can easily understand some of lex’s limitations.
For example, lex cannot be used to recognize nested structures such as parentheses.
Nested structures are handled by incorporating a stack. Whenever we encounter a “(”
we push it on the stack. When a “)” is encountered we match it with the top of the stack
and pop the stack. However lex only has states and transitions between states. Since it
has no stack it is not well suited for parsing nested structures.
Regular expressions in lex are composed of metacharacters (Table 1). Pattern-matching
examples are shown in Table 2. Within a character class normal operators lose their
meaning. Two operators allowed in a character class are the hyphen (“-”) and
circumflex (“^”). When used between two characters the hyphen represents a range of
characters. The circumflex, when used as the first character, negates the expression. If
two patterns match the same string, the longest match wins. In case both matches are
the same length, then the first pattern listed is used.

... definitions ...

%%

... rules ...


%%
... subroutines ...

Input to Lex is divided into three sections with %% dividing the sections.



Variable yytext is a pointer to the matched string (NUL-terminated) and yyleng is the
length of the matched string. Variable yyout is the output file and defaults to stdout.
Function yywrap is called by lex when input is exhausted; it returns 1 if processing is
done or 0 if more processing is required. Every C program requires a main function. In
this case we simply call yylex, which is the main entry point for lex. Some
implementations of lex include copies of main and yywrap in a library, thus eliminating
the need to code them explicitly. This is why our first example, the shortest lex
program, functioned properly.



EXAMPLE 1

EXAMPLE 2



EXAMPLE 3

>> gedit snazzle.lex

This example can be compiled by running this:

% lex snazzle.lex

This will produce the file "lex.yy.c", which we can then compile with g++:

% g++ lex.yy.c -lfl -o snazzle

Notice the "-lfl", which links in the dynamic lex libraries (actually flex libraries, hence the "fl"). On some
systems you might have to use "-ll" instead. If that doesn't work, ask someone who'll know where on your
system the lex/flex libraries are kept.

% ./snazzle

90

Found an integer:90

23.4

Found a floating-point number:23.4

456



OBJECTIVE 6: WRITE A LEX PROGRAM TO RECOGNIZE A VALID ARITHMETIC
EXPRESSION AND TO RECOGNIZE THE IDENTIFIERS AND OPERATORS PRESENT.
PRINT THEM SEPARATELY.

PROGRAM:

%{
#include<stdio.h>
int a=0,s=0,m=0,d=0,ob=0,cb=0;
int flaga=0, flags=0, flagm=0, flagd=0;
%}
id [a-zA-Z]+
%%
{id} {printf("\n %s is an identifier\n",yytext);}
[+] {a++;flaga=1;}
[-] {s++;flags=1;}
[*] {m++;flagm=1;}
[/] {d++;flagd=1;}
[(] {ob++;}
[)] {cb++;}
%%
int main()
{
printf("Enter the expression\n");
yylex();
if(ob-cb==0)
{
printf("Valid expression\n");
}
else
{printf("Invalid expression");}
printf("\nAdd=%d\nSub=%d\nMul=%d\nDiv=%d\n",a,s,m,d);
printf("Operators are: \n");
if(flaga)
printf("+\n");
if(flags)
printf("-\n");
if(flagm)
printf("*\n");
if(flagd)
printf("/\n");
return 0;
}

OUTPUT:
Enter the expression (a+b*c)
a is an identifier
b is an identifier
c is an identifier
[Ctrl-d] Valid expression
Add=1
Sub=0
Mul=1
Div=0
Operators are:
+
*



OBJECTIVE 7: WRITE A PROGRAM TO CHECK WHETHER A STRING BELONGS
TO THE GRAMMAR OR NOT.

PROGRAM:
#include <iostream>
#include <string>
using namespace std;

// Accepts strings such as i, i+i, i*i+i: an 'i' followed by any number of
// (operator, 'i') pairs, where the operator is '+' or '*'.
int main()
{
    string str, tok;
    cout << "\n enter the string \n";
    cin >> str;

    // Step 1: turn the input into tokens; every 'i' becomes '1'.
    for (char ch : str)
    {
        if (ch == 'i')
            tok += '1';
        else if (ch == '*' || ch == '+')
            tok += ch;
        else
        {
            cout << "\n invalid string 1 \n";
            return 0;
        }
    }
    cout << "\n" << tok;

    // Step 2: repeatedly reduce a leading "1*1" or "1+1" to "1".
    while (tok.size() > 1)
    {
        if (tok.size() >= 3 && tok[0] == '1' && (tok[1] == '*' || tok[1] == '+') && tok[2] == '1')
            tok.replace(0, 3, "1");
        else
        {
            cout << "\n invalid string 2 \n";
            break;
        }
        cout << "\n" << tok;
    }

    // Step 3: the string is valid only if everything reduced to a single '1'.
    if (tok == "1")
        cout << "\n valid string \n";
    else
        cout << "\n invalid string 3 \n";
    return 0;
}
OBJECTIVE 8: WRITE A PROGRAM TO COMPUTE FIRST OF NON-TERMINALS.
PROGRAM:

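#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    int i, nterm, nont;
    cout << "Enter the no. of Non-terminals in the grammar:";
    cin >> nont;
    string nt(nont, ' ');
    cout << "\nEnter the Non-terminals in the grammar:";
    for (i = 0; i < nont; i++)
        cin >> nt[i];
    cout << "\nEnter the no. of Terminals in the grammar: ( Enter e for epsilon ) ";
    cin >> nterm;
    // The terminals are read for completeness; below, any symbol that is not
    // a non-terminal is treated as a terminal.
    string t(nterm, ' ');
    cout << "Enter the Terminals in the grammar:";
    for (i = 0; i < nterm; i++)
        cin >> t[i];

    // One production per non-terminal, ended with '$'.
    vector<string> p(nont), first(nont);
    cout << "\nEnter the productions :\n";
    for (i = 0; i < nont; i++)
    {
        cout << "\nEnter the production for " << nt[i] << " ( End the production with '$' sign ) :";
        char sym;
        while (cin >> sym && sym != '$')
            p[i] += sym;
    }
    for (i = 0; i < nont; i++)
        cout << "\n The production for " << nt[i] << " -> " << p[i];

    // Recompute every FIRST set until none of them grows any more, so that a
    // non-terminal listed later is resolved before the result is printed.
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (i = 0; i < nont; i++)
        {
            string f;             // FIRST(nt[i]), rebuilt from scratch
            bool nullable = true; // true while all symbols seen so far can derive e
            for (size_t j = 0; j < p[i].size() && nullable; j++)
            {
                size_t k = nt.find(p[i][j]);
                // A terminal (or e) is its own FIRST; otherwise use FIRST of that non-terminal.
                string add = (k == string::npos) ? string(1, p[i][j]) : first[k];
                nullable = (k != string::npos && add.find('e') != string::npos);
                for (char c : add)
                    if ((c != 'e' || k == string::npos) && f.find(c) == string::npos)
                        f += c;
            }
            if (nullable && f.find('e') == string::npos)
                f += 'e'; // every symbol can vanish, so e belongs to FIRST
            if (f.size() != first[i].size())
            {
                first[i] = f;
                changed = true;
            }
        }
    }

    for (i = 0; i < nont; i++)
        cout << "\n\nThe first of " << nt[i] << " -> " << first[i];
    cout << endl;
    return 0;
}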

OUTPUT:
Enter the no. of Non-terminals in the grammar: 2

Enter the Non-terminals in the grammar: S A

Enter the no. of Terminals in the grammar: ( Enter e for epsilon ) 3

Enter the Terminals in the grammar: e f g

Enter the productions :

Enter the production for S ( End the production with '$' sign ) : eS$

Enter the production for A ( End the production with '$' sign ) : fg$

The production for S -> eS

The production for A -> fg

The first of S -> e

The first of A -> f



OBJECTIVE 9: REPRESENTATION OF THE INTERMEDIATE CODE IN
(A) POST FIX NOTATION (B) 3 ADDRESS CODE

STUDY:
Rather than generating assembler directly from parse trees, many compilers first generate an
intermediate representation.

• The intermediate representation is machine-independent, and a second step translates it to
machine-specific assembler.

• This reduces the work needed to port the compiler to a new machine.

• Optimisation is performed on the intermediate code, not on the object code.

• Two common formats:

o Postfix notation

o Three address code

Postfix Notation
• Also called suffix notation or reverse Polish notation.

• Used as an intermediate representation between the parse tree and the generated assembler
code.

Mapping program forms to Postfix

Mathematical and boolean expressions

• a + b => a b +

• a * (b + c) => a b c + *

• a == b => a b ==

Unary operators

• -a => a -

Assignment

• a = a + b => a a b + =

Goto statements

• goto L1 => L1 jump_to



If statements

• if <p> then <inst1> else <inst2> => <p> L1 jump_if_false <inst1> L2 jump_to L1: <inst2> L2:

-------------------------------------------------------------------------------------------------------------------

Three-Address Code
The intermediate code generator receives input from its predecessor phase, the semantic analyzer, in the
form of an annotated syntax tree. That syntax tree can then be converted into a linear representation, e.g.
postfix notation. Intermediate code tends to be machine-independent, so the generator assumes that an
unlimited number of storage locations (registers) is available.

For example:

a = b + c * d;

The intermediate code generator will try to divide this expression into sub-expressions and then generate
the corresponding code.

r1 = c * d;
r2 = b + r1;
a = r2;

Here r1 and r2 stand for registers in the target program.

Each three-address instruction refers to at most three addresses: two for the operands and one for the
result. Three-address code can be represented in two forms: quadruples and triples.

-------------------------------------------------------------------------------------------------------------------



INDEX

Sr.No.  Experiment Description                                                  Experiment Date  Submission Date  Remarks/Signature

1       STUDY OF DIFFERENT PHASES OF COMPILER.

2       DEVELOP A LEXICAL ANALYZER TO RECOGNIZE A DIFFERENT DIGIT IN C++.
        (EX. NUMBER, CHARACTER, BLANK, NEW LINE ETC.)

3       DEVELOP A LEXICAL ANALYZER TO RECOGNIZE A DIFFERENT PATTERNS IN C++.
        (EX. IDENTIFIERS, CONSTANTS, COMMENTS, OPERATORS ETC.)

4       WRITE A C PROGRAM TO RECOGNIZE STRINGS UNDER 'A*', 'A*B+', 'ABB'.

5       STUDY OF LEX TOOLS

6       WRITE A LEX PROGRAM TO RECOGNIZE A VALID ARITHMETIC EXPRESSION AND TO
        RECOGNIZE THE IDENTIFIERS AND OPERATORS PRESENT. PRINT THEM SEPARATELY.

7       WRITE A PROGRAM TO CHECK WHETHER A STRING BELONGS TO THE GRAMMAR OR NOT.

8       WRITE A PROGRAM TO COMPUTE FIRST OF NON-TERMINALS.

9       REPRESENTATION OF THE INTERMEDIATE CODE IN
        (A) POST FIX NOTATION
        (B) 3 ADDRESS CODE
