STUDY:
<token-name, attribute-value>
Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by
lexical analysis as input and generates a parse tree (or syntax tree). In this phase,
token arrangements are checked against the source code grammar, i.e. the parser
checks whether the expression made by the tokens is syntactically correct.
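As a rough illustration of checking tokens against a grammar, the sketch below is a small recursive-descent recognizer. The grammar, the class name SyntaxChecker and the convention that each character is one token ('i' for an identifier) are assumptions made for this example only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical grammar used only for this sketch:
//   E -> T { '+' T }
//   T -> F { '*' F }
//   F -> 'i' | '(' E ')'
// Each character of the input stands for one token; 'i' is an identifier.
class SyntaxChecker {
public:
    explicit SyntaxChecker(const std::string& tokens) : toks_(tokens), pos_(0) {}

    // True when the whole token string can be derived from E.
    bool accepts() {
        pos_ = 0;
        return expr() && pos_ == toks_.size();
    }

private:
    // Consume the next token if it equals `expected`.
    bool match(char expected) {
        if (pos_ < toks_.size() && toks_[pos_] == expected) {
            ++pos_;
            return true;
        }
        return false;
    }
    bool expr() {
        if (!term()) return false;
        while (match('+')) {
            if (!term()) return false;
        }
        return true;
    }
    bool term() {
        if (!factor()) return false;
        while (match('*')) {
            if (!factor()) return false;
        }
        return true;
    }
    bool factor() {
        if (match('i')) return true;
        if (match('(')) return expr() && match(')');
        return false;
    }

    std::string toks_;
    std::size_t pos_;
};
```

Calling SyntaxChecker("i+i*i").accepts() reports a syntactically correct token string, while "i+*i" is rejected because no factor follows the '+'.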
Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of the
language. For example, it checks that values are assigned between compatible data types
and reports errors such as adding a string to an integer. The semantic analyzer also keeps
track of identifiers, their types and expressions, and checks whether identifiers are
declared before use.
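The two checks mentioned here (declaration before use and type compatibility) can be sketched as follows. The Type and Check enums and the function checkAddAssign are hypothetical names, and the rule that both operands and the target must share one type is a simplifying assumption.

```cpp
#include <map>
#include <string>

enum class Type { Int, String };

// Outcome of checking one statement of the form  target = lhs + rhs
enum class Check { Ok, Undeclared, TypeMismatch };

// `declared` maps each declared identifier to its type (a toy symbol table).
Check checkAddAssign(const std::map<std::string, Type>& declared,
                     const std::string& target,
                     const std::string& lhs,
                     const std::string& rhs) {
    auto t = declared.find(target);
    auto l = declared.find(lhs);
    auto r = declared.find(rhs);
    // Rule 1: every identifier must be declared before it is used.
    if (t == declared.end() || l == declared.end() || r == declared.end())
        return Check::Undeclared;
    // Rule 2 (assumed for this sketch): operands and target share one type,
    // so adding a string to an integer is reported as a mismatch.
    if (l->second != r->second || t->second != l->second)
        return Check::TypeMismatch;
    return Check::Ok;
}
```

With a and b declared as Int and s as String, "a = a + b" passes, "a = a + s" is a type mismatch, and "a = x + b" fails because x was never declared.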
Code Optimization
The next phase optimizes the intermediate code. Optimization can be
thought of as removing unnecessary lines of code and rearranging the
sequence of statements so that the program runs faster without wasting
resources (CPU, memory).
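One simple way to remove unnecessary code is to drop instructions whose results are never read. The sketch below does this for straight-line code only; the Instr layout and the function name removeDeadCode are assumptions for illustration, not part of any particular compiler.

```cpp
#include <set>
#include <string>
#include <vector>

// One straight-line instruction: dst = src1 op src2 (hypothetical format).
struct Instr {
    std::string dst, src1, src2;
    char op;
};

// Remove instructions whose result is never read afterwards.
// `live` initially lists the variables that must hold their values at the end.
std::vector<Instr> removeDeadCode(const std::vector<Instr>& code,
                                  std::set<std::string> live) {
    std::vector<Instr> kept;
    // Walk backwards so that later uses are known before earlier definitions.
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
        if (live.count(it->dst) == 0) continue;  // result unused: drop it
        live.erase(it->dst);      // this definition satisfies the later use
        live.insert(it->src1);    // its operands are now needed
        live.insert(it->src2);
        kept.push_back(*it);
    }
    // kept was built back to front, so reverse it into program order.
    return std::vector<Instr>(kept.rbegin(), kept.rend());
}
```

For the sequence t1 = c * d, t2 = x + y, a = b + t1 with only a needed at the end, the instruction computing t2 is removed and the other two are kept in order.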
Code Generation
In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language. The code generator
translates the intermediate code into a sequence of (generally) relocatable machine
instructions.
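To make the mapping concrete, the sketch below translates one intermediate statement of the form dst = lhs op rhs into instructions for a made-up one-register machine. The mnemonics LOAD, ADD, SUB, MUL, DIV and STORE and the function name emit are hypothetical and do not correspond to a real instruction set.

```cpp
#include <string>
#include <vector>

// Translate  dst = lhs op rhs  into instructions for an imaginary
// accumulator machine. All mnemonics here are invented for the sketch.
std::vector<std::string> emit(const std::string& dst, const std::string& lhs,
                              char op, const std::string& rhs) {
    std::string mnemonic;
    switch (op) {
        case '+': mnemonic = "ADD"; break;
        case '-': mnemonic = "SUB"; break;
        case '*': mnemonic = "MUL"; break;
        case '/': mnemonic = "DIV"; break;
        default:  return {};  // unsupported operator: emit nothing
    }
    // Load the left operand, combine it with the right one, store the result.
    return {"LOAD " + lhs, mnemonic + " " + rhs, "STORE " + dst};
}
```

For example, emit("r1", "c", '*', "d") yields the three lines LOAD c, MUL d and STORE r1.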
Symbol Table
It is a data structure maintained throughout all the phases of a compiler. All the
identifiers' names along with their types are stored here.
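A minimal sketch of such a table, assuming that only a name and a type string are recorded per identifier, could look like this. The class name SymbolTable and its two methods are illustrative; real compilers also store scope, storage location and other attributes.

```cpp
#include <string>
#include <unordered_map>

// A toy symbol table: identifier name -> type name.
class SymbolTable {
public:
    // Record a new identifier. Returns false if the name already exists.
    bool declare(const std::string& name, const std::string& type) {
        return entries_.emplace(name, type).second;
    }
    // Return the recorded type, or an empty string for an unknown name.
    std::string lookup(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? std::string() : it->second;
    }

private:
    std::unordered_map<std::string, std::string> entries_;
};
```

Every phase can then call lookup() on the same table, for instance to find that count was declared as int.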
NAME: DINESH KISHNANI Roll No. :0114CS151044
OBJECTIVE 2: DEVELOP A LEXICAL ANALYZER IN C++ TO RECOGNIZE DIFFERENT KINDS OF
CHARACTERS (E.G. DIGIT, LETTER, BLANK, NEW LINE ETC.)
PROGRAM:
#include <iostream>
#include <cstdio>
#include <cctype>
using namespace std;

int main()
{
    // lcount starts at 1 because the first line has no newline before it
    int lcount = 1, bcount = 0, ccount = 0, dcount = 0, scount = 0;
    int ch;  // int rather than char, so that EOF can be detected
    cout << "\n Enter the sentence, add $ at the end \n";
    while ((ch = cin.get()) != '$' && ch != EOF)
    {
        if (ch == ' ')
            bcount++;       // blank
        else if (ch == '\n')
            lcount++;       // new line
        else if (isdigit(ch))
            dcount++;       // digit
        else if (isalpha(ch))
            ccount++;       // letter
        else
            scount++;       // any other character
    }
    cout << "\n No of Characters = " << ccount;
    cout << "\n No of Digit = " << dcount;
    cout << "\n No of Special Char = " << scount;
    cout << "\n No of Blank = " << bcount;
    cout << "\n No of Line = " << lcount;
    cout << endl;
    return 0;
}
Output:
STUDY:
PROGRAM:
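#include <iostream>
#include <string>
#include <cctype>
using namespace std;

int main()
{
    string str;  // std::string cannot overflow like a fixed 20-character array
    cout << "Enter the input string \n";
    cin >> str;
    if (str.empty())
        return 0;  // nothing was read

    // count starts at 1 because the first character is tested separately
    size_t count = 1;
    if (isdigit(static_cast<unsigned char>(str[0])))
    {
        // constant: every remaining character must be a digit
        for (size_t i = 1; i < str.length(); i++)
        {
            if (isdigit(static_cast<unsigned char>(str[i])))
                count++;
        }
        if (count == str.length())
            cout << "string is constant \n";
        else
            cout << "string is neither constant nor identifier \n";
    }
    else if (isalpha(static_cast<unsigned char>(str[0])))
    {
        // identifier: every remaining character must be a letter or a digit
        for (size_t j = 1; j < str.length(); j++)
        {
            if (isalnum(static_cast<unsigned char>(str[j])))
                count++;
        }
        if (count == str.length())
            cout << "string is identifier \n";
        else
            cout << "string is neither constant nor identifier \n";
    }
    else
        cout << "string is neither constant nor identifier \n";
    return 0;
}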
-------------------------------------------------------------------------------------------------------------------
Output1:
Enter the input string
Truba
Output2:
Enter the input string
234
Output3:
Enter the input string
23Truba
c = a + b;
After lexical analysis a symbol table is generated as given below.
-------------------------------------------------------------------------------------------------------------------
Token    Type
c        identifier
=        operator
a        identifier
+        operator
b        identifier
;        separator
-------------------------------------------------------------------------------------------------------------------
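A table like the one above could be produced by a small hand-written scanner. The sketch below is one possible version; the function name tokenize and the extra type names "constant" and "unknown" are assumptions added for completeness and do not appear in the table.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Split a statement into (token, type) pairs.
std::vector<std::pair<std::string, std::string>> tokenize(const std::string& src) {
    std::vector<std::pair<std::string, std::string>> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char ch = static_cast<unsigned char>(src[i]);
        if (std::isspace(ch)) {            // blanks only separate tokens
            ++i;
        } else if (std::isalpha(ch)) {     // letter followed by letters/digits
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            tokens.emplace_back(src.substr(start, i - start), "identifier");
        } else if (std::isdigit(ch)) {     // run of digits
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.emplace_back(src.substr(start, i - start), "constant");
        } else if (std::string("=+-*/").find(src[i]) != std::string::npos) {
            tokens.emplace_back(std::string(1, src[i]), "operator");
            ++i;
        } else if (src[i] == ';') {
            tokens.emplace_back(";", "separator");
            ++i;
        } else {                           // anything this sketch does not know
            tokens.emplace_back(std::string(1, src[i]), "unknown");
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenize("c = a + b;") returns six pairs in the same order as the rows of the table.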
PROGRAM:
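#include<stdio.h>
#include<string.h>
int main()
{
    char s[20],c;
    int state=0,i=0;
    printf("\n Enter a string:");
    /* fgets() limits the input to the buffer size, unlike gets() */
    if(fgets(s,sizeof(s),stdin)==NULL)
        return 1;
    s[strcspn(s,"\n")]='\0'; /* remove the trailing newline */
    /* state 6 is the dead state, so scanning stops once it is reached */
    while(s[i]!='\0' && state!=6)
    {
        switch(state)
        {
        case 0: c=s[i++]; /* start state */
            if(c=='a')
                state=1;
            else if(c=='b')
                state=2;
            else
                state=6;
            break;
        case 1: c=s[i++]; /* read "a" */
            if(c=='a')
                state=3;
            else if(c=='b')
                state=4;
            else
                state=6;
            break;
        case 2: c=s[i++]; /* inside the b's of a*b+ */
            if(c=='a')
                state=6;
            else if(c=='b')
                state=2;
            else
                state=6;
            break;
        case 3: c=s[i++]; /* read two or more a's; this case is absent from the
                             listing and is inferred from the accepting rules */
            if(c=='a')
                state=3;
            else if(c=='b')
                state=2;
            else
                state=6;
            break;
        case 4: c=s[i++]; /* read "ab" */
            if(c=='a')
                state=6;
            else if(c=='b')
                state=5;
            else
                state=6;
            break;
        case 5: c=s[i++]; /* read "abb" */
            if(c=='a')
                state=6;
            else if(c=='b')
                state=2;
            else
                state=6;
            break;
        }
    }
    if((state==1)||(state==3))
        printf("\n %s is accepted under rule 'a*'",s);
    else if((state==2)||(state==4))
        printf("\n %s is accepted under rule 'a*b+'",s);
    else if(state==5)
        printf("\n %s is accepted under rule 'abb'",s);
    else
        printf("\n %s is not recognised",s);
    return 0;
}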
-------------------------------------------------------------------------------------------------------------------
Output1:
Enter the input string
aaa
aaa is accepted under rule 'a*'
Output2:
Enter the input string
aaab
aaab is accepted under rule 'a*b+'
Output3:
Enter the input string
abb
abb is accepted under rule 'abb'
STUDY:
During the first phase the compiler reads the input and converts strings in the source to
tokens. With regular expressions we can specify patterns to lex so it can generate code
that will allow it to scan and match strings in the input. Each pattern specified in the
input to lex has an associated action. Typically an action returns a token that represents
the matched string for subsequent use by the parser. Initially we will simply print the
matched string rather than return a token value.
Now we can easily understand some of lex’s limitations.
For example, lex cannot be used to recognize nested structures such as parentheses.
Nested structures are handled by incorporating a stack. Whenever we encounter a “(”
we push it on the stack. When a “)” is encountered we match it with the top of the stack
and pop the stack. However lex only has states and transitions between states. Since it
has no stack it is not well suited for parsing nested structures.
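The stack-based matching described above can be sketched in a few lines. The function name parenthesesBalanced is an assumption for this example, and all characters other than "(" and ")" are simply ignored.

```cpp
#include <stack>
#include <string>

// Returns true when every "(" has a matching ")" in the right order.
bool parenthesesBalanced(const std::string& text) {
    std::stack<char> open;
    for (char ch : text) {
        if (ch == '(') {
            open.push(ch);                   // remember the unmatched "("
        } else if (ch == ')') {
            if (open.empty()) return false;  // ")" with nothing to match
            open.pop();                      // matched the most recent "("
        }
    }
    return open.empty();                     // leftovers are unmatched "("
}
```

Note that comparing only the number of opening and closing brackets is weaker than this check: the input ")(" has equal counts but is rejected here.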
Regular expressions in lex are composed of metacharacters (Table 1). Pattern-matching
examples are shown in Table 2. Within a character class normal operators lose their
meaning. Two operators allowed in a character class are the hyphen (“-”) and
circumflex (“^”). When used between two characters the hyphen represents a range of
characters. The circumflex, when used as the first character, negates the expression. If
two patterns match the same string, the longest match wins. If both matches are
the same length, the first pattern listed is used.
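The longest-match and first-listed rules can be imitated outside lex. The sketch below uses std::regex purely for illustration; it is not how lex is implemented, the function name pickRule is hypothetical, and it assumes each rule is a simple pattern whose greedy match is also its longest match.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Index of the rule that would be chosen at the start of `input`,
// or -1 when no rule matches at least one character.
int pickRule(const std::vector<std::regex>& rules, const std::string& input) {
    int best = -1;
    std::size_t bestLen = 0;
    for (std::size_t r = 0; r < rules.size(); ++r) {
        std::smatch m;
        // match_continuous forces the match to begin at the first character.
        if (std::regex_search(input, m, rules[r],
                              std::regex_constants::match_continuous)) {
            std::size_t len = static_cast<std::size_t>(m.length(0));
            // A later rule wins only with a strictly longer match,
            // so ties go to the rule listed first.
            if (len > bestLen) {
                best = static_cast<int>(r);
                bestLen = len;
            }
        }
    }
    return best;
}
```

With the rules "if" and "[a-z]+" in that order, the input "ifx" selects the second rule (longer match), while the input "if" selects the first rule (equal length, listed first).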
Input to Lex is divided into three sections, with %% dividing the sections:
... definitions ...
%%
... rules ...
%%
... subroutines ...
EXAMPLE 2
>> gedit snazzle.lex
% lex snazzle.lex
This will produce the file "lex.yy.c", which we can then compile with g++:
% g++ lex.yy.c -lfl -o snazzle
Notice the "-lfl", which links in the dynamic lex libraries (actually flex libraries, hence the "fl"). On some
systems you might have to use "-ll" instead. If that doesn't work, ask someone who'll know where on your
system the lex/flex libraries are kept.
% ./snazzle
90
Found an integer:90
23.4
456
PROGRAM:
%{
#include<stdio.h>
/* counters: additions, subtractions, multiplications, divisions,
   opening brackets and closing brackets */
int a=0,s=0,m=0,d=0,ob=0,cb=0;
/* flags recording which operators were seen at least once */
int flaga=0, flags=0, flagm=0, flagd=0;
%}
id [a-zA-Z]+
%%
{id} {printf("\n %s is an identifier\n",yytext);}
[+] {a++;flaga=1;}
[-] {s++;flags=1;}
[*] {m++;flagm=1;}
[/] {d++;flagd=1;}
[(] {ob++;}
[)] {cb++;}
%%
int main()
{
    printf("Enter the expression\n");
    yylex();
    /* equal numbers of "(" and ")" are treated as a valid expression */
    if(ob-cb==0)
    {
        printf("Valid expression\n");
    }
    else
    {
        printf("Invalid expression");
    }
    printf("\nAdd=%d\nSub=%d\nMul=%d\nDiv=%d\n",a,s,m,d);
    printf("Operators are: \n");
    if(flaga)
        printf("+\n");
    if(flags)
        printf("-\n");
    if(flagm)
        printf("*\n");
    if(flagd)
        printf("/\n");
    return 0;
}
OUTPUT:
Enter the expression (a+b*c)
a is an identifier
b is an identifier
c is an identifier
[Ctrl-d] Valid expression
Add=1 Sub=0 Mul=1 Div=0
Operators are:+*
PROGRAM:
#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    char str[20], tok[20] = {0};
    int a = 0, b = 0, c, x = 0;
    cout << "\n enter the string \n";
    cin >> setw(20) >> str;  // setw keeps the input within the buffer

    // Lexical pass: copy the input into tok, writing each identifier 'i' as '1'
    while (str[a] != '\0')
    {
        if (str[a] == 'i')
            tok[b] = '1';
        else if (str[a] == '*')
            tok[b] = '*';
        else if (str[a] == '+')
            tok[b] = '+';
        else
        {
            cout << "\n invalid string 1 \n";
            break;
        }
        a++;
        b++;
    }
    tok[b] = '$';  // end marker
    cout << "\n";
    while (tok[x] != '$')
    {
        cout << tok[x];
        x++;
    }

    // Reduction pass: repeatedly replace a leading "1*1" or "1+1" by "1"
    while (tok[1] != '$')
    {
        b = 0;
        if ((tok[b] == '1' && tok[b+1] == '*' && tok[b+2] == '1') ||
            (tok[b] == '1' && tok[b+1] == '+' && tok[b+2] == '1'))
        {
            tok[b] = '1';
            b++;
            c = b + 2;
            while (tok[c] != '$')  // shift the rest of the string left
            {
                tok[b] = tok[c];
                b++;
                c++;
            }
            tok[b] = '$';
        }
        else
        {
            cout << "\n invalid string 2 \n";
            break;
        }
        x = 0;
        cout << "\n";
        while (tok[x] != '$')  // print the string after this reduction
        {
            cout << tok[x];
            x++;
        }
    }
    if (tok[0] == '1' && tok[1] == '$')
        cout << "\n valid string \n";
    else
        cout << "\n invalid string 3 \n";
    return 0;
}
OBJECTIVE 8: WRITE A PROGRAM TO COMPUTE FIRST OF NON-TERMINALS.
PROGRAM:
#include <iostream>
using namespace std;

int main()
{
    // t: terminals, nt: non-terminals, p: productions, first: FIRST sets
    // each production holds at most three symbols followed by '$'
    char t[5], nt[10], p[5][5], first[5][5];
    // nont: number of non-terminals, nterm: number of terminals
    int i, j, nterm, nont, k = 0, f = 0;
    cout << "Enter the no. of Non-terminals in the grammar:";
    cin >> nont;
    cout << "\nEnter the Non-terminals in the grammar:";
    for (i = 0; i < nont; i++)
    {
        cin >> nt[i];
    }
    cout << "\nEnter the no. of Terminals in the grammar: ( Enter e for epsilon ) ";
    cin >> nterm;
    cout << "Enter the Terminals in the grammar:";
    for (i = 0; i < nterm; i++)
    {
        cin >> t[i];
    }
    // the first column of p and first holds the non-terminal itself
    for (i = 0; i < nont; i++)
    {
        p[i][0] = nt[i];
        first[i][0] = nt[i];
    }
    cout << "\nEnter the no of productions :\n";
    for (i = 0; i < nont; i++)
    {
        cout << "\nEnter the production for" << p[i][0] << "( End the production with '$' sign ) :";
        for (j = 0; p[i][j] != '$';)
        {
            j += 1;
            cin >> p[i][j];
        }
    }
    for (i = 0; i < nont; i++)
    {
        cout << "\n The production for" << p[i][0] << " -> ";
        for (j = 1; p[i][j] != '$'; j++)
        {
            cout << p[i][j];
        }
    }
    for (i = 0; i < nont; i++)
    {
        f = 0;
        for (j = 1; p[i][j] != '$'; j++)
        {
            for (k = 0; k < nterm; k++) {
OUTPUT:
Enter the no. of Non-terminals in the grammar: 2
Enter the production for S ( End the production with '$' sign ) : Es
Enter the production for A ( End the production with '$' sign ) : fg
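For comparison, FIRST sets can also be computed with a fixed-point iteration over the whole grammar. The sketch below is an independent illustration, not a completion of the listing above. It assumes that every symbol with at least one production is a non-terminal, that 'e' stands for epsilon (as in the prompt of the program), and that each symbol is a single character; the names Grammar and computeFirst are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Grammar: non-terminal -> list of alternatives, one character per symbol.
using Grammar = std::map<char, std::vector<std::string>>;

std::map<char, std::set<char>> computeFirst(const Grammar& g) {
    std::map<char, std::set<char>> first;
    for (const auto& rule : g) first[rule.first];   // start with empty sets
    bool changed = true;
    while (changed) {                               // repeat until stable
        changed = false;
        for (const auto& rule : g) {
            std::set<char>& target = first[rule.first];
            const std::size_t before = target.size();
            for (const std::string& alt : rule.second) {
                bool nullable = true;   // all symbols so far can derive epsilon
                for (char sym : alt) {
                    if (sym == 'e') continue;        // epsilon contributes nothing
                    if (g.count(sym) == 0) {         // terminal: it starts the string
                        target.insert(sym);
                        nullable = false;
                        break;
                    }
                    const std::set<char>& inner = first[sym];
                    for (char x : inner)
                        if (x != 'e') target.insert(x);
                    if (inner.count('e') == 0) {     // sym cannot vanish: stop here
                        nullable = false;
                        break;
                    }
                }
                if (nullable) target.insert('e');
            }
            if (target.size() != before) changed = true;
        }
    }
    return first;
}
```

For the grammar E -> TR, R -> +TR | e, T -> FY, Y -> *FY | e, F -> (E) | i, this gives FIRST(E) = FIRST(T) = FIRST(F) = { (, i }, FIRST(R) = { +, e } and FIRST(Y) = { *, e }.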
STUDY:
Rather than generating assembler directly from parse trees, many approaches generate an
intermediate representation:
o Postfix notation
Postfix Notation
• Also called suffix notation or reverse Polish notation
• Used as an intermediate representation between the parse tree and the generated assembler
code.
• a + b => a b +
• a * (b + c) => a b c + *
• a == b => a b ==
Unary operators
• -a => a -
Assignment
• a = a + b => a a b + =
Goto statements
• if <p> then <inst1> else <inst2> => <p> L1 jump_if_false <inst1> L2 jump_to L1: <inst2> L2:
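The conversions in the list above can be reproduced with the classic operator-stack method. The sketch below is a minimal version that assumes single-character operands, only the binary operators + - * /, and well-formed input; the function names precedence and toPostfix are chosen for this example.

```cpp
#include <cctype>
#include <stack>
#include <string>

// Precedence of the binary operators handled by this sketch.
static int precedence(char op) {
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;  // "(" and anything else
}

// Convert an infix expression to postfix, tokens separated by single spaces.
std::string toPostfix(const std::string& infix) {
    std::string out;
    std::stack<char> ops;
    auto emit = [&out](char c) {
        if (!out.empty()) out += ' ';
        out += c;
    };
    for (char ch : infix) {
        unsigned char u = static_cast<unsigned char>(ch);
        if (std::isspace(u)) continue;
        if (std::isalnum(u)) {
            emit(ch);                       // operands go straight to the output
        } else if (ch == '(') {
            ops.push(ch);
        } else if (ch == ')') {
            while (!ops.empty() && ops.top() != '(') {
                emit(ops.top());
                ops.pop();
            }
            if (!ops.empty()) ops.pop();    // discard the "("
        } else if (precedence(ch) > 0) {
            // pop operators of higher or equal precedence (left associative)
            while (!ops.empty() && precedence(ops.top()) >= precedence(ch)) {
                emit(ops.top());
                ops.pop();
            }
            ops.push(ch);
        }
    }
    while (!ops.empty()) {                  // flush the remaining operators
        emit(ops.top());
        ops.pop();
    }
    return out;
}
```

For instance, toPostfix("a * (b + c)") produces the string "a b c + *", matching the second example in the list.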
-------------------------------------------------------------------------------------------------------------------
Three-Address Code
The intermediate code generator receives input from its predecessor phase, the semantic analyzer, in the form of
an annotated syntax tree. That syntax tree can then be converted into a linear representation, e.g.,
postfix notation. Intermediate code tends to be machine-independent code. Therefore, the code generator
assumes an unlimited number of storage locations (registers) when generating code.
For example:
a = b + c * d;
The intermediate code generator will try to divide this expression into sub-expressions and then generate
the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2;
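The splitting into sub-expressions can be sketched by evaluating the postfix form of the expression with a stack and emitting one instruction per operator. The function name toThreeAddress is hypothetical, operands are assumed to be single characters, and temporaries are named r1, r2, ... as in the example above.

```cpp
#include <cctype>
#include <stack>
#include <string>
#include <vector>

// Generate three-address code for  target = <postfix expression>.
// Returns an empty list when the postfix input is malformed.
std::vector<std::string> toThreeAddress(const std::string& target,
                                        const std::string& postfix) {
    std::vector<std::string> code;
    std::stack<std::string> operands;
    int next = 1;                           // number of the next temporary
    for (char ch : postfix) {
        unsigned char u = static_cast<unsigned char>(ch);
        if (std::isspace(u)) continue;
        if (std::isalnum(u)) {
            operands.push(std::string(1, ch));
            continue;
        }
        // Any other character is treated as a binary operator.
        if (operands.size() < 2) return {};
        std::string rhs = operands.top(); operands.pop();
        std::string lhs = operands.top(); operands.pop();
        std::string temp = "r" + std::to_string(next++);
        code.push_back(temp + " = " + lhs + " " + ch + " " + rhs);
        operands.push(temp);                // the result is an operand later on
    }
    if (operands.size() != 1) return {};
    code.push_back(target + " = " + operands.top());
    return code;
}
```

Given the target a and the postfix string "b c d * +" (the postfix form of b + c * d), the function returns the lines r1 = c * d, r2 = b + r1 and a = r2.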
A three-address code has at most three address locations to calculate the expression. A three-address
code can be represented in two forms:
-------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------