
Programming and Data Structures

Algorithms
Compiler Design

Programming and Data Structures


Here is the list of topics covered under GATE 2016 Computer Science programming
and data structures:
Programming in C : C programming is a popular computer programming language which is
widely used for system and application software. Despite being a fairly old programming
language, C is widely used because of its efficiency and control. The C Programming
Language is a very important topic asked in the GATE 2016 Computer Science Exam.
Here is complete notes on Programming in C for GATE 2016 Computer Science Exam.
Recursion: A function that calls itself directly or indirectly is called a recursive function. The
recursive factorial function uses more memory than its non-recursive counterpart. A recursive
function requires stack support to save the recursive function calls. Here is complete notes on
Recursive functions for GATE 2016 Computer Science Exam.
Arrays: It is a collection of similar elements (having the same data type). Array elements occupy
contiguous memory locations. Here is complete notes on Arrays & Pointers for GATE 2016
Computer Science Exam.

GATE 2016 Exam : Data Structure


A data structure is a specialised way for organising and storing data in memory, so that one
can perform operations on it. Here are the important topics covered under Data Structure
for GATE 2016 Computer Science Exam.
Stacks: A stack is an ordered collection of items into which new items may be inserted and
from which items may be deleted at one end, called the TOP of the stack. It is a LIFO (Last
In First Out) kind of data structure. Read complete information on Stacks here.
Queues: It is a non-primitive, linear data structure in which elements are added/inserted at
one end (called the REAR) and elements are removed/deleted from the other end (called the
FRONT). A queue is logically a FIFO (First in First Out) type of list. For complete details on
Queues, click here.
Linked Lists: Linked list is a special data structure in which data elements are linked to one
another. Here, each element is called a node which has two parts

Info part which stores the information.


Address or pointer part which holds the address of next element of same type.

Linked list is also known as self-referential structure. Get complete notes on Linked List
Here.

Trees: Trees are used to represent data containing a hierarchical relationship between
elements e. g., records, family trees and table contents. For complete details on Trees, click
here.
Binary search trees: A binary tree T, is called binary search tree (or binary sorted tree), if
each node N of T has the following property. The value at N is greater than every value in the
left subtree of N and is less than or equal to every value in the right subtree of N. Click here
to get detailed notes on Binary Search Trees.
Binary heaps: The binary heap data structure is an array that can be viewed as a complete
binary tree. Each node of the binary tree corresponds to an element of the array. The array is
completely filled on all levels except possibly lowest (lowest level is filled in left to right
order and need not be complete). For complete study notes on Binary heaps, click here.
Graphs: A graph is a collection of nodes called vertices, and the connections between them,
called edges. Get complete Study notes on Graphs here.

Programming in C
One of the Important topics of GATE 2016 Computer Science Exam is Programming &
Data structure. The programming language asked in GATE 2016 Exam is C Programming
Language.
Programming in C : GATE 2016 Exam

All C programs must have a function in it called main


Execution starts in function main
C is case sensitive
Comments start with /* and end with */. Comments may span over many lines.
C is a free format language
The #include <stdio.h> statement instructs the C compiler to insert the entire
contents of file stdio.h in its place and compile the resulting file.

Character Set: The characters that can be used to form words, numbers and expressions
depend upon the computer on which the program runs. The characters in C are grouped into
the following categories: Letters, Digits, Special characters and White spaces.
C Tokens: The smallest individual units are known as C tokens. C has six types of tokens.
They are: Keywords, Identifiers, Constants, Operators, String, and Special symbols.
Keywords: All keywords are basically the sequences of characters that have one or more
fixed meanings. All C keywords must be written in lowercase letters. e.g., break, char, int,
continue, default, do etc.
Identifiers: A C identifier is a name used to identify a variable, function, or any other user-defined item. An identifier starts with a letter A to Z, a to z, or an underscore _ followed by
zero or more letters, underscores, and digits (0 to 9).
Constants: Fixed values that do not change during the execution of a C program.
Backslash character constants are used in output functions. e.g., \b used for backspace and
\n used for new line etc.
Operator: It is a symbol that tells the computer to perform certain mathematical or logical
manipulations, e.g., arithmetic operators (+, -, *, /) etc.
String: A string is nothing but an array of characters (printable ASCII characters).
Delimiters / Separators: These are used to separate constants, variables and statements e.g.,
comma, semicolon, apostrophes, double quotes and blank space etc.
Variable:

A variable is nothing but a name given to a storage area that our programs can
manipulate.

Each variable in C has a specific type, which determines the size and layout of the
variable's memory, the range of values that can be stored within that memory, and the
set of operations that can be applied to the variable.

Data Types
Different Types of Modifier with their Range:
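The table of data types and modifier ranges from the original notes is not reproduced here. As a minimal substitute (a sketch, not from the original), the following C program prints the host machine's ranges from <limits.h>; the exact sizes are implementation-defined:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* Ranges are implementation-defined; <limits.h> reports the host's values. */
    printf("char:           %d to %d\n", CHAR_MIN, CHAR_MAX);
    printf("short:          %d to %d\n", SHRT_MIN, SHRT_MAX);
    printf("int:            %d to %d\n", INT_MIN, INT_MAX);
    printf("long:           %ld to %ld\n", LONG_MIN, LONG_MAX);
    printf("unsigned char:  0 to %u\n", (unsigned)UCHAR_MAX);
    printf("unsigned short: 0 to %u\n", (unsigned)USHRT_MAX);
    printf("unsigned int:   0 to %u\n", UINT_MAX);
    printf("unsigned long:  0 to %lu\n", ULONG_MAX);
    return 0;
}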
Type Conversions

Implicit Type Conversion: There are certain cases in which data will get
automatically converted from one type to another:
o When data is being stored in a variable and the data does not match the type of
the variable, the data being stored is converted to match the type of the storage
variable.
o When an operation is performed on data of two different types, the smaller
data type is converted to match the larger type.
The following example converts the value of x to a double precision
value before performing the division. Note that if the 3.0 were changed
to a simple 3, then integer division would be performed, losing any
fractional values in the result.
average = x / 3.0;
o When data is passed to or returned from functions.
Explicit Type Conversion: Data may also be expressly converted, using the typecast
operator.
o The following example converts the value of x to a double precision value
before performing the division. ( y will then be implicitly promoted, following
the guidelines listed above. )
average = ( double ) x / y;
Note that x itself is unaffected by this conversion.

Expression:

lvalue:
o Expressions that refer to a memory location are called lvalue expressions.
o An lvalue may appear as either the left-hand or right-hand side of an
assignment.
o Variables are lvalues, and so they may appear on the left-hand side of an
assignment.
rvalue:
o The term rvalue refers to a data value that is stored at some address in
memory.
o An rvalue is an expression that cannot have a value assigned to it, which means
an rvalue may appear on the right-hand side but not on the left-hand side of an
assignment.
o Numeric literals are rvalues, and so they may not be assigned and cannot
appear on the left-hand side.

C Flow Control Statements


A control statement is an instruction, statement, or group of statements in a
programming language which determines the sequence of execution of other instructions or
statements. C provides two styles of flow control.
1. Branching (deciding what action to take)
2. Looping (deciding how many times to take a certain action)
If Statement: It takes an expression in parentheses and a statement or block of statements.
An expression is treated as true if it evaluates to a non-zero value.
The switch Statement: The switch statement tests the value of a given variable (or
expression) against a list of case values and when a match is found, a block of statements
associated with that case is executed:
The Conditional Operator (? :): The ? : operator is just like an if-else statement, except
that because it is an operator it can be used within expressions. ? : is a ternary operator in
that it takes three operands, and it is the only ternary operator in the C language.
flag = (x < 0) ? 0 : 1;
This conditional statement is equivalent to the following if-else statement.
if (x < 0) flag = 0;
else flag = 1;
Loop Control Structure
Loops provide a way to repeat commands and control. This involves repeating some portion
of the program either a specified numbers of times until a particular condition is satisfied.
while Loop:
initialize loop counter;
while (test loop counter using a condition/expression)
{
<Statement1>
<Statement2>
...
<increment/decrement loop counter>
}

for Loop:
for (initialize counter; test counter; increment/decrement counter)
{
<Statement1>
<Statement2>

}
do while Loop:
initialize loop counter;
do
{
<Statement1>
<Statement2>

}
while (this condition is true);

The break Statement: The break statement is used to jump out of a loop instantly, without
waiting to get back to the conditional test.
The continue Statement: The continue statement is used to take the control to the
beginning of the loop, bypassing the statements inside the loop which have not yet been
executed.
goto Statement: C supports an unconditional control statement, goto, to transfer the control
from one point to another in a C program.
C Variable Types
A variable is just a named area of storage that can hold a single value. There are two main
variable types: Local variable and Global variable.

Local Variable: Scope of a local variable is confined within the block or function, where it is
defined.
Global Variable: A global variable is defined at the top of the program file and is
visible to, and can be modified by, any function that may reference it. Global variables are
initialized automatically by the system when we define them. If the same variable name is
used for a global and a local variable, then the local variable takes precedence within its
scope.
Storage Classes in C
A variable name identifies some physical location within the computer, where the string of
bits representing the variable's value is stored.
There are basically two kinds of locations in a computer where such a value may be kept:
memory and CPU registers.
It is the variable's storage class that determines in which of the above two types of locations
the value should be stored.
We have four types of storage classes in C: Auto, Register, Static and Extern.
Auto Storage Class: Features of this class are given below.

Storage location: Memory
Default initial value: Garbage value
Scope: Local to the block in which the variable is defined.
Life: Till the control remains within the block in which the variable is defined.

Auto is the default storage class for all local variables.


Register Storage Class: Register is used to define local variables that should be stored in a
register instead of RAM. Register should only be used for variables that require quick access
such as counters. Features of register storage class are given below.

Storage location: CPU register
Default initial value: Garbage value
Scope: Local to the block in which the variable is defined.
Life: Till the control remains within the block in which the variable is defined.

Static Storage Class: Static is the default storage class for global variables. Features of static
storage class are given below.

Storage location: Memory
Default initial value: Zero
Scope: Local to the block in which the variable is defined. In case of a global variable,
the scope is throughout the program.
Life: The value of the variable persists between different function calls.

Extern Storage Class: Extern is used to give a reference to a global variable that is
visible to all the program files. When we use extern, the variable cannot be initialized; all it
does is point the variable name at a storage location that has been previously defined.

Storage Location: Memory


Default Initial Value: Zero
Scope: Global
Life: As long as the program's execution does not come to an end.
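A minimal C sketch (not from the original notes) illustrating the auto and static storage classes; extern needs a second source file, so it is only noted in a comment:

#include <stdio.h>

void counter(void)
{
    auto int a = 0;      /* auto: re-initialized on every call (garbage if uninitialized) */
    static int s = 0;    /* static: initialized once to 0, persists between calls */
    /* extern int g;        would refer to a global defined in another file */
    a++;
    s++;
    printf("auto a = %d, static s = %d\n", a, s);
}

int main(void)
{
    counter();  /* prints: auto a = 1, static s = 1 */
    counter();  /* prints: auto a = 1, static s = 2 */
    counter();  /* prints: auto a = 1, static s = 3 */
    return 0;
}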

Operator Precedence Relations: The original notes list the C operator precedence table here,
from highest to lowest order; the table is not reproduced.
Functions

A function is a self-contained block of statements that performs a coherent task of
some kind.
Making a function is a way of isolating one block of code from other independent
blocks of code.
A function can take a number of arguments or parameters, and a function can be
called any number of times.
A function can call itself; such a process is called recursion.
Functions can be of two types: (i) library functions, and (ii) user-defined functions.

Call by Value: If we pass values of variables to the function as parameters, such kind of
function calling is known as call by value.
Call by Reference: Variables are stored somewhere in memory. So, instead of passing the
value of a variable, if we pass the location number / address of the variable to the function,
then it would become a call by reference.
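A minimal C sketch (illustrative, not from the original notes) contrasting the two calling styles with the classic swap example:

#include <stdio.h>

/* Call by value: the function gets copies, so the caller's variables are unchanged. */
void swap_by_value(int a, int b)
{
    int t = a; a = b; b = t;
}

/* Call by reference: the function gets addresses, so it can modify the originals. */
void swap_by_reference(int *a, int *b)
{
    int t = *a; *a = *b; *b = t;
}

int main(void)
{
    int x = 1, y = 2;
    swap_by_value(x, y);
    printf("after swap_by_value:     x = %d, y = %d\n", x, y);  /* x = 1, y = 2 */
    swap_by_reference(&x, &y);
    printf("after swap_by_reference: x = %d, y = %d\n", x, y);  /* x = 2, y = 1 */
    return 0;
}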
Pointers
A pointer is a variable that stores a memory address. Like all other variables, it also has a
name, has to be declared, and occupies some space in memory. It is called a pointer because it
points to a particular location.

& = address-of operator

* = value-at-address operator, or indirection operator
&i returns the address of the variable i.
*(&i) returns the value stored at that address, so printing the value of *(&i) is the
same as printing the value of i.

NULL Pointers

Uninitialized pointers start out with random unknown values, just like any other
variable type.

Accidentally using a pointer containing a random address is one of the most common
errors encountered when using pointers, and potentially one of the hardest to
diagnose, since the errors encountered are generally not repeatable.

Combinations of * and ++

*p++ accesses the thing pointed to by p and increments p


(*p)++ accesses the thing pointed to by p and increments the thing pointed to by p
*++p increments p first, and then accesses the thing pointed to by p
++*p increments the thing pointed to by p first, and then uses it in a larger expression.
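The four combinations above can be checked with a short program; this is an illustrative sketch:

#include <stdio.h>

int main(void)
{
    int a[] = {10, 20, 30};
    int *p = a;

    printf("%d\n", *p++);   /* prints 10; then p moves to a[1] */
    printf("%d\n", (*p)++); /* prints 20; a[1] becomes 21, p unchanged */
    printf("%d\n", *++p);   /* p moves to a[2] first; prints 30 */
    printf("%d\n", ++*p);   /* a[2] becomes 31 first; prints 31 */
    return 0;
}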

Pointer Operations:

Assignment: You can assign an address to a pointer. Typically, you do this by using
an array name or by using the address operator (&).
Value finding (dereferencing): The * operator gives the value stored in the pointed-to location.
Taking a pointer address: Like all variables, pointer variables have an address and a
value. The & operator tells you where the pointer itself is stored.
Adding an integer to a pointer: You can use the + operator to add an integer to a
pointer or a pointer to an integer. In either case, the integer is multiplied by the
number of bytes in the pointed-to type, and the result is added to the original address.
Incrementing a pointer: Incrementing a pointer to an array element makes it move
to the next element of the array.
Subtracting an integer from a pointer: You can use the - operator to subtract an
integer from a pointer; the pointer has to be the first operand and the integer the second.
The integer is multiplied by the number of bytes in the pointed-to type, and the result
is subtracted from the original address.
Note that there are two forms of subtraction. You can subtract one pointer from
another to get an integer, and you can subtract an integer from a pointer and get a
pointer.
Decrementing a pointer: You can also decrement a pointer. In this example,
decrementing ptr2 makes it point to the second array element instead of the third.
Note that you can use both the prefix and postfix forms of the increment and
decrement operators.
Differencing: You can find the difference between two pointers. Normally, you do
this for two pointers to elements that are in the same array to find out how far apart
the elements are. The result is in the same units as the type size.
Comparisons: You can use the relational operators to compare the values of two
pointers, provided the pointers are of the same type.

Recursion
A function that calls itself directly or indirectly is called a recursive function. The recursive factorial
function uses more memory than its non-recursive counterpart. A recursive function requires stack
support to save the recursive function calls.

Factorial Recursive Function

GCD Recursive Function

Fibonacci Sequence Recursive Function

Power Recursive Function (x^y)
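The code figures for these four functions are not reproduced in the source; below is a minimal C sketch of each, in the naive recursive form matching the definitions above:

/* Minimal recursive sketches of the four functions listed above. */
long factorial(int n)          /* n! = n * (n-1)!, with 0! = 1 */
{
    return (n <= 1) ? 1 : n * factorial(n - 1);
}

int gcd(int a, int b)          /* Euclid: gcd(a, b) = gcd(b, a mod b) */
{
    return (b == 0) ? a : gcd(b, a % b);
}

int fib(int n)                 /* fib(n) = fib(n-1) + fib(n-2) */
{
    return (n <= 1) ? n : fib(n - 1) + fib(n - 2);
}

double power(double x, int y)  /* x^y for y >= 0 */
{
    return (y == 0) ? 1.0 : x * power(x, y - 1);
}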

Arrays

It is a collection of similar elements (having the same data type).

Array elements occupy contiguous memory locations.

Example: a[i]: The name a of the array is a constant expression, whose value is the
address of the 0th location.

a is equivalent to a + 0, which is equivalent to &a[0]
a + 1 is equivalent to &a[1]
...
a + i is equivalent to &a[i]
&(*(a + i)) is equivalent to &a[i] and to a + i
*(&a[i]) is equivalent to *(a + i) and to a[i]

Address of an Array Element: Address of a[i] = a + i * sizeof(element)

Multi-Dimensional Array
In C language, one can have arrays of any dimension. Let us consider a 3 × 3 matrix.

[Figure: a 3 × 3 matrix stored as a two-dimensional array]


To access the particular element from the array, we have to use two subscripts; one for row
number and other for column number. The notation is of the form a [i] [j], where i stands for
row subscripts and j stands for column subscripts.
We can also define and initialize the array as follows (see the example below).
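The original initialization example is not reproduced; a minimal sketch with illustrative values:

int a[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };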

Note: Two-dimensional array b[i][j]

For row-major order: Address of b[i][j] = b + (number of columns * i + j) * sizeof(element)

For column-major order: Address of b[i][j] = b + (number of rows * j + i) * sizeof(element)

*(*(b + i) + j) is equivalent to b[i][j]

*(b + i) + j is equivalent to &b[i][j]

*(b[i] + j) is equivalent to b[i][j]

b[i] + j is equivalent to &b[i][j]

(*(b+i))[j] is equivalent to b[i][j]

Strings
In C language, strings are stored in an array of character (char) type along with the null
terminating character \0 at the end.

Example: char name[ ] = { 'G', 'A', 'T', 'E', 'T', 'O', 'P', '\0' };

'\0' = null character, whose ASCII value is 0.

'0' = character zero, whose ASCII value is 48.
In the above character-by-character initialization, the '\0' must be written explicitly;
when a string literal is used instead (char name[ ] = "GATETOP";), C inserts the null
character automatically.

%s is used in printf( ) as a format specification for printing out a string.


All the following notations refer to the same element: name[i], *(name + i), *(i + name),
i[name]

Stacks
A stack is an ordered collection of items into which new items may be inserted and from
which items may be deleted at one end, called the TOP of the stack. It is a LIFO (Last In First
Out) kind of data structure.

Operations on Stack

Push: Adds an item onto the stack. PUSH (s, i); Adds the item i to the top of stack.

Pop: Removes the most-recently-pushed item from the stack. POP (s); Removes the
top element and returns it as a function value.

Implementation of Stack
A stack can be implemented in two ways: using an array or using a linked list.

But since an array's size is defined at compile time, it can't grow dynamically. Therefore, an
attempt to insert/push an element into a stack implemented through an array can cause a
stack overflow situation if the stack is already full.

So, to avoid the above-mentioned problem, we use a linked list to implement a stack,
because a linked list can grow and shrink dynamically at run time.
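A minimal array-based stack sketch in C (illustrative; MAX and the function names are assumptions, not from the original notes), with overflow and underflow checks:

#include <stdio.h>
#define MAX 100

int stack[MAX];
int top = -1;          /* top == -1 means the stack is empty */

int push(int s[], int i)
{
    if (top == MAX - 1) return 0;   /* overflow: array is full */
    s[++top] = i;
    return 1;
}

int pop(int s[], int *out)
{
    if (top == -1) return 0;        /* underflow: stack is empty */
    *out = s[top--];
    return 1;
}

int main(void)
{
    int v;
    push(stack, 10);
    push(stack, 20);
    while (pop(stack, &v))
        printf("%d\n", v);          /* prints 20 then 10 (LIFO) */
    return 0;
}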

Applications of Stack
There are many applications of stack some of the important applications are given below.

Backtracking. This is a process when you need to access the most recent data
element in a series of elements.

Depth first Search can be implemented.

Function Calls: The return addresses and local data of function calls are kept on a stack.

Simulation of Recursive calls: The compiler uses one such data structure called
stack for implementing normal as well as recursive function calls.

Parsing: Syntax analysis of compiler uses stack in parsing the program.

Expression Evaluation: A stack can be used for evaluating an expression and
checking its syntax.

Infix expression: It is the one where the binary operator comes between the
operands.
e.g., A + B * C

Postfix expression: Here, the binary operator comes after the operands.
e.g., A B C * +

Prefix expression: Here, the binary operator precedes the operands.
e.g., + A * B C

This prefix expression is equivalent to the infix expression A + (B * C). Prefix notation is also
known as Polish notation. Postfix notation is also known as suffix or Reverse Polish notation.

Reversing a List: First push all the elements of the string onto the stack and then pop them.

Expression conversion: Infix to Postfix, Infix to Prefix, Postfix to Infix, and Prefix
to Infix

Implementation of Towers of Hanoi

Computation of a cycle in the graph

Queues
It is a non-primitive, linear data structure in which elements are added/inserted at one end
(called the REAR) and elements are removed/deleted from the other end (called the FRONT).
A queue is logically a FIFO (First in First Out) type of list.
Operations on Queue

Enqueue: Adds an item onto the end of the queue ENQUEUE(Q, i); Adds the item i
onto the end of queue.

Dequeue: Removes the item from the front of the queue. DEQUEUE (Q); Removes
the first element and returns it as a function value.

Queue Implementation: A queue can be implemented in two ways.

Static implementation (using arrays)

Dynamic implementation (using pointers)

Circular Queue: In a circular queue, the first element comes just after the last element; that is,
the insertion of a new element is done at the very first location of the queue if the last
location of the queue is full and the first location is empty.
Note: A circular queue overcomes the problem of unutilised space in linear queues
implemented as arrays.
We can make the following assumptions for a circular queue (a minimal array sketch follows the list below).

Front will always be pointing to the first element (as in linear queue).

If Front = Rear, the queue will be empty.

Each time a new element is inserted into the queue, the Rear is incremented by 1.
Rear = Rear + 1

Each time, an element is deleted from the queue, the value of Front is incremented by
one.
Front = Front + 1
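A minimal circular-queue sketch in C. It keeps one slot unused so that a full queue can be distinguished from an empty one; that convention is an assumption of this sketch, while the note's Front = Rear test for emptiness is preserved:

#define SIZE 5

int q[SIZE];
int front = 0, rear = 0;   /* front == rear means the queue is empty */

int enqueue(int i)
{
    if ((rear + 1) % SIZE == front) return 0;  /* full: one slot kept unused */
    q[rear] = i;
    rear = (rear + 1) % SIZE;                  /* Rear wraps past the array end */
    return 1;
}

int dequeue(int *out)
{
    if (front == rear) return 0;               /* empty */
    *out = q[front];
    front = (front + 1) % SIZE;                /* Front wraps past the array end */
    return 1;
}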

Double Ended Queue (DEQUE): It is a list of elements in which insertion and deletion
operations are performed from both the ends. That is why it is called double-ended queue or
DEQUE.
Priority Queues: This type of queue enables us to retrieve data items on the basis of the priority
associated with them. One basic priority-queue choice is given below.
Sorted Array or List
It is very efficient to find and delete the smallest element. Maintaining sortedness makes the
insertion of new elements slow.
Applications of Queue:

Breadth first Search can be implemented.

CPU Scheduling

Handling of interrupts in real-time systems

Routing Algorithms

Computation of shortest paths

Computing a cycle in the graph

Linked Lists
Linked list is a special data structure in which data elements are linked to one another. Here,
each element is called a node which has two parts

Info part which stores the information.

Address or pointer part which holds the address of next element of same type. Linked
list is also known as self-referential structure.

The declaration of a node contains two fields: one for storing information and
another for storing the address of the next node, so that one can traverse the list (a minimal
sketch follows).
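A minimal sketch of such a node declaration in C (the original code figure is not reproduced; the field names are illustrative):

/* A self-referential structure: the node declaration the text describes. */
struct node {
    int info;            /* info part: stores the data */
    struct node *next;   /* address part: points to the next node of the same type */
};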

Advantages of Linked List: Linked lists are dynamic data structure as they can grow and
shrink during the execution time.

Efficient memory utilisation because here memory is not pre-allocated.

Insertions and deletions can be done very easily at the desired position.

Disadvantages of Linked List: More memory is required, since each node also stores one or
more link fields.

Access to an arbitrary data item is time consuming.

Singly Linked List: In this type of linked list, each node has only one address field, which
points to the next node. So, the main disadvantage of this type of list is that we can't access
the predecessor of a node from the current node.
Doubly Linked List: Each node of linked list is having two address fields (or links) which
help in accessing both the successor node (next node) and predecessor node (previous node).
Circular Linked List: It has address of first node in the link (or address) field of last node.
Circular Doubly Linked List: It has both the previous and next pointer in circular manner.
Operations on Linked Lists: The following operations are performed on linked lists, as given
below.
Creation: Used to create a linked list.

Insertion: Used to insert a new node in linked list at the specified position. A new node may
be inserted

At the beginning of a linked list

At the end of a linked list

At the specified position in a linked list

In case of empty list, a new node is inserted as a first node.

Deletion: This operation is basically used to delete an item (a node). A node may be deleted
from the

Beginning of a linked list.

End of a linked list.

Specified position in the list.

Traversing: It is a process of going through (accessing) all the nodes of a linked list from
one end to the other end.

Trees
Tree (Non-linear Data Structures): Trees are used to represent data containing a
hierarchical relationship between elements e. g., records, family trees and table contents. A
tree is the data structure that is based on hierarchical tree structure with set of nodes.

[Figure: an example tree rooted at A. For this tree, number of levels = 4 and maximum
level number = 3.]

Node: Each data item in a tree.

Root: First or top data item in hierarchical arrangement.

Degree of a Node: Number of subtrees of a given node.

e.g., degree of A = 3, degree of E = 2

Degree of a Tree: Maximum degree of a node in a tree.

e.g., degree of the above tree = 3

Depth or Height: Maximum level number of a node + 1 (i.e., level number of the farthest
leaf node of a tree + 1).

e.g., depth of the above tree = 3 + 1 = 4

Non-terminal Node: Any node, except the root node, whose degree is not zero.

Forest: Set of disjoint trees.

Siblings: D and G are siblings of parent Node B.

Path: Sequence of consecutive edges from the source node to the destination node.

Internal nodes: All nodes that have children are called internal nodes.

Leaf nodes: Those nodes, which have no child, are called leaf nodes.

The depth of a node is the number of edges from the root to the node.

The height of a node is the number of edges from the node to the deepest leaf.

The height of a tree is the height of the root.

Binary Tree: A binary tree is a tree like structure that is rooted and in which each node has
at most two children and each child of a node is designated as its left or right child. In this
kind of tree, the maximum degree of any node is at most 2.

A binary tree T is defined as a finite set of elements such that

T is empty (called NULL tree or empty tree).

T contains a distinguished Node R called the root of T and the remaining nodes of T
form an ordered pair of disjoint binary trees T1 and T2.

Any node N in a binary tree T has either 0, 1 or 2 successors. Level l of a binary tree T can
have at most 2^l nodes.

Extended Binary Trees: 2-Trees or Strictly Binary Trees

If every non-terminal node in a binary tree consists of a non-empty left subtree and right
subtree — in other words, if every node of a binary tree has either 0 or 2 child nodes — then such a
tree is known as a strictly binary tree, extended binary tree, or 2-tree.

Complete Binary Tree: A complete binary tree is a tree in which every level, except
possibly the last, is completely filled.

A complete binary tree is one which has the following properties:

Each node can have 0, 1 or 2 children.

Within a level, the left node is filled first, then the right node.

Data items can be placed in the next level only when the previous level is
completely filled.


Tree Traversal: Three types of tree traversal are given below (recursive C sketches follow the list).

Preorder
o Process the root R.
o Traverse the left subtree of R in preorder.
o Traverse the right subtree of R in preorder.

Inorder
o Traverse the left subtree of R in inorder.
o Process the root R.
o Traverse the right subtree of R in inorder.

Postorder
o Traverse the left subtree of R in postorder.
o Traverse the right subtree of R in postorder.
o Process the root R.
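Minimal recursive C sketches of the three traversals (illustrative; the struct and function names are assumptions, not from the original notes):

#include <stdio.h>

struct tnode {
    int key;
    struct tnode *left, *right;
};

void preorder(struct tnode *r)
{
    if (r == NULL) return;
    printf("%d ", r->key);   /* process root first */
    preorder(r->left);
    preorder(r->right);
}

void inorder(struct tnode *r)
{
    if (r == NULL) return;
    inorder(r->left);
    printf("%d ", r->key);   /* root between the two subtrees */
    inorder(r->right);
}

void postorder(struct tnode *r)
{
    if (r == NULL) return;
    postorder(r->left);
    postorder(r->right);
    printf("%d ", r->key);   /* root last */
}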

Breadth First Traversal (BFT): The breadth first traversal of a tree visits the nodes in the
order of their depth in the tree.
BFT first visits all the nodes at depth zero (i.e., root), then all the nodes at depth 1 and so on.
At each depth, the nodes are visited from left to right.

Depth First Traversal (DFT): In DFT, one starts from root and explores as far as possible
along each branch before backtracking.

Perfect Binary Tree or Full Binary Tree: A binary tree in which all leaves are at the same
level or at the same depth and in which every parent has 2 children.

Here, all leaves (D, E, F, G) are at depth 3 or level 2 and every parent is having exactly 2
children.

Maximum number of nodes in a binary tree: Let a binary tree contain MAX, the
maximum number of nodes possible for its height h. Then h = log2(MAX + 1) - 1.

The number n of nodes in a binary tree of height h is at least n = h + 1 and at most
n = 2^(h+1) - 1, where h is the depth of the tree.

The height of the binary search tree equals the number of links from the root node to
the deepest node.

Binary Search Trees


A binary tree T, is called binary search tree (or binary sorted tree), if each node N of T has
the following property. The value at N is greater than every value in the left subtree of N and
is less than or equal to every value in the right subtree of N. A BST holds the following
properties:

Each node can have up to two child nodes.

The left subtree of a node contains only nodes with keys less than the node's key.

The right subtree of a node contains only nodes with keys greater than the node's key.

The left and right subtree each must also be a binary search tree.

A unique path exists from the root to every other node.

Traversals of Binary Search Tree


Inorder Tree Walk: During this type of walk, we visit the root of a subtree between the left
subtree visit and right subtree visit.
Inorder(x): if x ≠ NIL then { Inorder(left[x]); print key[x]; Inorder(right[x]) }

Preorder Tree Walk: In which we visit the root node before the nodes in either subtree.
Preorder(x): if x ≠ NIL then { print key[x]; Preorder(left[x]); Preorder(right[x]) }
Postorder Tree Walk: In which we visit the root node after the nodes in its subtrees.
Postorder(x): if x ≠ NIL then { Postorder(left[x]); Postorder(right[x]); print key[x] }
Search an element in BST: The most basic operation is search, which can be written as a recursive or
an iterative function. A search can start from any node. If the node is NULL (i.e., the tree is
empty), then return NULL, which means the key does not exist in the tree. Otherwise, if the
key equals that of the node, the search is successful and we return the node. If the key is less
than that of the node, we search its left subtree. Similarly, if the key is greater than that of the
node, we search its right subtree. This process is repeated until the key is found or the
remaining subtree is null. To search for the key in the whole BST, just call the method from the root
node.
Insertion of an element: Insertion begins as a search would begin; We examine the root and
recursively insert the new node to the left subtree if its key is less than that of the root, or the
right subtree if its key is greater than the root. If the key exists, we can either replace the
value by the new value or just return without doing anything.
Deletion of an element: The deletion is a little complex. Basically, to delete a node by a
given key, we need to find the node with the key, and remove it from the tree. There are three
possible cases to consider:

Deleting a leaf: we can simply remove it from the tree.

Deleting a node with one child: remove the node and replace it with its child.

Deleting a node with two children: find its in-order successor (the left-most node in its right subtree), let's say R. Then copy R's key and value to the node, and remove R from the right subtree. (A C sketch of search and insert follows; deletion is omitted for brevity.)
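A minimal C sketch of BST search and insertion as described above (illustrative names, not from the original notes):

#include <stdlib.h>

struct bstnode {
    int key;
    struct bstnode *left, *right;
};

/* Recursive search: returns the node with the key, or NULL if the key is absent. */
struct bstnode *search(struct bstnode *root, int key)
{
    if (root == NULL || root->key == key) return root;
    if (key < root->key) return search(root->left, key);
    return search(root->right, key);
}

/* Recursive insert: returns the (possibly new) subtree root; duplicates are ignored. */
struct bstnode *insert(struct bstnode *root, int key)
{
    if (root == NULL) {
        struct bstnode *n = malloc(sizeof *n);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = insert(root->left, key);
    else if (key > root->key)
        root->right = insert(root->right, key);
    return root;
}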

Key Points of BST

It takes Θ(n) time to walk (inorder, preorder or postorder) a tree of n nodes.

On a binary search tree of height h, Search, Minimum, Maximum, Successor, Predecessor,
Insert, and Delete can be made to run in O(h) time.

The height of the Binary Search Tree equals the number of links from the root node to the
deepest node.

The disadvantage of a BST is that if every newly inserted item is greater than the previous
item, we get a right-skewed BST, and if every newly inserted item is less than the previous
item, we get a left-skewed BST.

So, to overcome the skewness problem in a BST, the concept of the AVL tree or height-balanced
tree came into existence.
Balanced Binary Trees: Balancing ensures that the internal path lengths are close to the
optimal n log n. A balanced tree will have the lowest possible overall height. AVL trees and
B-trees are balanced search trees.
AVL Trees: An AVL tree (Adelson-Velskii and Landis) is a binary tree with the following
properties.

For any node in the tree, the heights of the left and right subtrees can differ by at most 1.

The height of an empty subtree is -1.

Every node of an AVL tree is associated with a balance factor.

Balance factor of a node = height of left subtree - height of right subtree

A node with balance factor -1, 0 or 1 is considered balanced.

AVL Tree is height balanced binary tree.

The objective is to keep the structure of the binary tree always balanced with n given nodes so
that the height never exceeds O(log n).

After every insert or delete we must ensure that the tree is balanced.

A search of the balanced binary tree is equivalent to a binary search of an ordered list.

In both cases, each check eliminates half of the remaining items. Hence searching is O(log n).

Rotations: A tree rotation is required when we have inserted or deleted a node which leaves
the tree in an unbalanced form.

Left rotation (L-rotation): [figure not reproduced]

Right rotation (R-rotation): [figure not reproduced]

Double right-left rotation (R-L rotation): [figure not reproduced]

A non-empty binary tree T is an AVL tree if and only if

|h(TL) - h(TR)| <= 1

where h(TL) = height of the left subtree TL of tree T, and

h(TR) = height of the right subtree TR of tree T.

h(TL) - h(TR) is also known as the Balance Factor (BF). For an AVL (or height-balanced) tree,
the balance factor of each node can be -1, 0 or 1. An AVL search tree is a binary search tree which is
an AVL tree.

Binary Heaps
The binary heap data structure is an array that can be viewed as a complete binary tree. Each
node of the binary tree corresponds to an element of the array. The array is completely filled
on all levels except possibly lowest (lowest level is filled in left to right order and need not be
complete).
There are two types of heap trees: Max heap tree and Min heap tree.
Max heap: In a max heap, for every node i other than the root, the value of the node is at most
the value of its parent: A[PARENT(i)] >= A[i]. Thus, the largest element
in a max heap is stored at the root.
Min heap: In a min heap, for every node i other than the root, the value of the node is at least
the value of its parent: A[PARENT(i)] <= A[i]. Thus, the smallest element in
a min heap is stored at the root.
The root of the tree is A[1], and given the index i of a node, the indices of its parent, left child and
right child can be computed as follows:
PARENT(i) = floor(i/2)
LEFT(i) = 2i
RIGHT(i) = 2i + 1
Heapify: Heapify is a procedure for manipulating heap data structures. It is given an array A
and an index i into the array. The subtrees rooted at the children of A[i] are heaps, but node A[i]
itself may violate the (max-)heap property:
A[i] < A[2i] or A[i] < A[2i + 1].
The procedure Heapify manipulates the tree rooted at A[i] so that it becomes a heap.
Heapify (A, i)
1. l ← left[i]
2. r ← right[i]
3. if l ≤ heap-size[A] and A[l] > A[i]
4.     then largest ← l
5.     else largest ← i
6. if r ≤ heap-size[A] and A[r] > A[largest]
7.     then largest ← r
8. if largest ≠ i
9.     then exchange A[i] ↔ A[largest]
10.        Heapify (A, largest)

Time complexity of Heapify algorithm is: O(log n)


Building a Heap: The Heapify procedure can be used in a bottom-up fashion to convert an array
A[1..n] into a heap. Since the elements in the subarray A[floor(n/2)+1..n] are all leaves, the
procedure Build_Heap goes through the remaining nodes of the tree and runs Heapify on
each one. The bottom-up order of processing nodes guarantees that the subtrees rooted at the
children are heaps before Heapify is run at their parent.
Build_Heap (A)
1. heap-size[A] ← length[A]
2. for i ← floor(length[A]/2) downto 1 do
3.     Heapify (A, i)
Time complexity of Build_Heap algorithm is: O(n)
A heap of height h has the minimum number of elements when it has just one node at the
lowest level.
Minimum nodes of a heap of height h: The levels above the lowest level form a complete
binary tree of height h - 1 with 2^h - 1 nodes. Hence the minimum number of nodes possible in a
heap of height h is 2^h nodes.
Maximum nodes of a heap of height h: A heap of height h has the maximum number of
elements when its lowest level is completely filled. In this case the heap is a complete binary
tree of height h and hence has 2^(h+1) - 1 nodes.
For Min heap tree of n-elements:
Insertion of an element: O(log n)
Delete minimum element: O(log n)
Remove an element: O(log n)
Find minimum element: O(1)

Graphs
A graph is a collection of nodes called vertices, and the connections between them, called
edges.
Directed Graph: When the edges in a graph have a direction, the graph is called a directed
graph or digraph and the edges are called directed edges or arcs.
Adjacency: If (u,v) is in the edge set we say u is adjacent to v.
Path: Sequence of edges where every edge is connected by two vertices.
Cycle (loop): A path with the same start and end node.
Connected Graph: There exists a path between every pair of nodes, no node is disconnected.
Acyclic Graph: A graph with no cycles.
There are many ways of representing a graph:

Adjacency List

Adjacency Matrix

Incidence Matrix

Incidence List

Graph Traversals: Visits all the vertices that it can reach starting at some vertex. Visits all
vertices of the graph if and only if the graph is connected (effectively computing Connected
Components). Traversal never visits a vertex more than once.
The breadth first search (BFS) and the depth first search (DFS) are the two algorithms used
for traversing and searching a node in a graph.
Depth first search (DFS) Algorithm
Step 1: Visit the first vertex; you can choose any vertex as the first vertex (if one is not explicitly
mentioned). Push it onto the stack.
Step 2: Look at the undiscovered adjacent vertices of the top element of the stack and visit
one of them (in any particular order).
Step 3: Repeat Step 2 till there is no undiscovered vertex left.
Step 4: Pop the element from the top of the stack and repeat Steps 2, 3 and 4 while the stack is
not empty.
Applications of DFS

Minimum spanning tree

To check if graph has a cycle

Topological sorting

To find strongly connected components of graph

To find bridges in graph

Analysis of the DFS: The running time of the algorithm would be O(|V|+|E|).
Breadth First Search Algorithm
Step 1: Visit the first vertex; you can choose any node as the first node. Add it to the
queue.
Step 2: Repeat the steps below while the queue is not empty.
Step 3: Remove the head of the queue and, while staying at that vertex, visit all connected
vertices and add them to the queue one by one (you can choose any order to visit all the
connected vertices).
Step 4: When all the connected vertices are visited, repeat Step 3.
Applications of BFS

To find shortest path between two nodes u and v

To test bipartite-ness of a graph

To find all nodes within one connected component

To check if graph has a cycle

Analysis of BFS: The running time complexity will be O(|V| +|E|).

Algorithms
The following topics covered under Algorithm:

Introduction of Algorithms: Algorithm can be classified by the amount of time they need to
complete compared to their input size. The analysis of an algorithm focuses on the
complexity of algorithm which depends on time or space. For more details on Introduction of
Algorithms, click here

Searching: Searching is majorly of two types: Sequential and binary search. To know more
on sequential and binary search, click here

Sorting: Sorting can be of two types, namely, in-place sorting and stable sorting. An in-place sorting
algorithm does not use extra storage to sort the elements of a list. A stable sorting
algorithm maintains the relative order of records with equal values during sorting. Get detailed
study notes on Sorting here

Hashing: Hashing is a common method of accessing data records. A hash system that stores
records in an array, called a hash table. Hash function primarily is responsible to map
between the original data items and the smaller table. For complete Notes on Hashing, click
here

Space and time complexities: Space and Time complexities covers Asymptotic Notations
and Analysis of Algorithms. To get complete study notes on Space and time complexities,
click here

Algorithm design techniques: The idea of the technique is to divide the problem into
smaller but similar sub problems (divide), solve it (conquer), and (combine) these solutions to
create a solution to the original problem. To know more on Algorithm design technique, click
here

Minimum Spanning Trees: Minimum spanning trees can be found using Prim's
algorithm and Kruskal's algorithm. To know more on Minimum Spanning Trees, Click
Here

Shortest Paths: The shortest path problem is to determine one or more shortest paths between a
source vertex and a target vertex, where a set of edges is given. To know more on Shortest
Path, Click here

Introduction of Algorithm
An algorithm is a well-defined computational procedure that transforms inputs into outputs,
achieving the desired input-output relationship.

An algorithm is a set of rules for carrying out calculation either by hand or on a


machine.

An algorithm is a finite step-by-step procedure to achieve a required result.

An algorithm is a sequence of computational steps that transform the input into the
output.

An algorithm is a sequence of operations performed on data that have to be organized


in data structures.

An algorithm is an abstraction of a program to be executed on a physical machine.

Searching
Sequential Search (Linear Search)
Pseudo code of Sequential search:
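The pseudocode figure is not reproduced in the source; a minimal C sketch of sequential search (illustrative names):

/* Returns the index of key in a[0..n-1], or -1 if it is not found. */
int sequential_search(int a[], int n, int key)
{
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;   /* found: best case is the first position */
    return -1;          /* worst case: all n elements scanned */
}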

Analysis of Sequential Search: The time complexity in sequential search in all three cases is
given below.

Best case: When we find the key on the first location of array, then the complexity is
O(1).
Worst case: When the key is not found in the array and we have to scan the complete
array, then the complexity is O(n).
Average case: When the key is equally likely to appear at any of the n positions, a successful
search makes (1 + 2 + 3 + ... + n)/n = (n + 1)/2 comparisons on average.

So, the time complexity = O(n)


Binary Search
In each step, binary search algorithm divides the array into three sections.

Middle element
Elements on left side of the middle element
Elements on right side of the middle element

Pseudo Code of Binary Search:
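The pseudocode figure is not reproduced in the source; a minimal iterative C sketch of binary search (illustrative names):

/* Binary search on a sorted array a[0..n-1]; returns the index of key, or -1. */
int binary_search(int a[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* middle element */
        if (a[mid] == key)
            return mid;
        else if (key < a[mid])
            high = mid - 1;                /* search the left half */
        else
            low = mid + 1;                 /* search the right half */
    }
    return -1;
}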

Analysis of Binary Search: The time complexity of binary search in all three cases is given
below

Best case: The best case complexity is O(1).

Average case: T(n) = O(log2 n)

Worst case: In the worst case, the complexity of binary search is O(log2 n).

o The number of comparisons performed by binary search on a sorted
array of size n is at most floor(log2 n) + 1.

Sorting
Sorting can be of two types: In-place sorting and Stable Sorting.

In-place Sorting: The in-place sorting algorithm does not use extra storage to sort the
elements of a list.
Stable Sorting: A stable sorting algorithm maintains the relative order of records with equal
values during sorting.

Bubble Sorting

Pseudo Code for Bubble Sorting:
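The pseudocode figure is not reproduced in the source; a minimal C sketch of bubble sort with an early-exit flag, which gives the O(n) best case cited below (illustrative names):

/* Bubble sort: repeatedly swap adjacent out-of-order pairs. */
void bubble_sort(int a[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int swapped = 0;
        for (int j = 0; j < n - 1 - i; j++) {
            if (a[j] > a[j + 1]) {
                int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                swapped = 1;
            }
        }
        if (!swapped) break;   /* no swaps: the array is already sorted */
    }
}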

Analysis of Bubble Sort: The time complexity of bubble sort in all three cases is given
below.

Best case: O(n)
Average case: O(n^2)
Worst case: O(n^2)

Insertion Sorting

Insertion sort makes a single left-to-right pass over the array, inserting each element into its
correct place within the already-sorted prefix. It is fast and efficient for small arrays, and is
useful only for small files or very nearly sorted files.
Pseudo Code of Insertion Sort
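The pseudocode figure is not reproduced in the source; a minimal C sketch of insertion sort (illustrative names):

/* Insertion sort: grows a sorted prefix a[0..i-1] by inserting a[i] into place. */
void insertion_sort(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];   /* shift larger elements one slot right */
            j--;
        }
        a[j + 1] = key;
    }
}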

Analysis of Insertion Sort

Worst case: Input is in reverse sorted order. T(n) = O(n^2)

Average case: All permutations are equally likely. T(n) = O(n^2)

Best case: Array is already sorted. T(n) = O(n)

Selection Sorting

A simple and straightforward algorithm to sort the entries in array A.


First, we find the minimum element and store it in A[1]. Next, we find the minimum of the
remaining n - 1 elements and store it in A[2]. We continue this way until the second largest
element is stored in A[n-1].
Algorithm: Selection sort
Input: An array A[1..n] of n elements.
Output: A[1..n] sorted in non-decreasing order.
for (i = 1; i <= n - 1; i++)
{
    k = i;
    for (j = i + 1; j <= n; j++)
    {
        if (A[j] < A[k]) k = j;
    }
    if (k != i) swap(A[i], A[k]);
}
Analysis of Selection Sort

Worst case: T(n) = O(n^2)

Best case: T(n) = O(n^2)
Average case: T(n) = O(n^2)

Heap Sort

Heap sort is simple to implement and is a comparison-based sort. It is an in-place sort but
not a stable sort.
Max heap: A heap in which each parent has a larger key than its children is called a max heap.
Min heap: A heap in which each parent has a smaller key than its children is called a min heap.
Heapify:
Heapify (A, i)
1. l ← left[i]
2. r ← right[i]
3. if l ≤ heap-size[A] and A[l] > A[i]
4.     then largest ← l
5.     else largest ← i
6. if r ≤ heap-size[A] and A[r] > A[largest]
7.     then largest ← r
8. if largest ≠ i
9.     then exchange A[i] ↔ A[largest]
10.        Heapify (A, largest)

Build Heap

BUILD_HEAP (A)
1. heap-size[A] ← length[A]
2. for i ← floor(length[A]/2) downto 1 do
3.     Heapify (A, i)

Heap Sort

HEAPSORT (A)
1. BUILD_HEAP (A)
2. for i ← length[A] downto 2 do
       exchange A[1] ↔ A[i]
       heap-size[A] ← heap-size[A] - 1
       Heapify (A, 1)

Analysis of Heap Sort: The total time for heap sort is O (n log n) in all three cases (best,
worst and average).

Heapify: runs in O(log n) time.

Build-Heap: runs in linear time, O(n).
Heap Sort: runs in O(n log n) time.
Extract-Max: runs in O(log n) time.

Note:

Merge sort is a fast sorting algorithm whose best, worst, and average case complexities are all
in O(n log n), but unfortunately it uses O(n) extra space to do its work.
Quicksort has best and average case complexity in O(n log n), but unfortunately its worst
case complexity is in O(n^2).

Hashing
Hashing is a common method of accessing data records.
Hash Table: A hash system that stores records in an array, called a hash table.
Hash Function: Hash function primarily is responsible to map between the original data
items and the smaller table. There are many hash functions approaches as follows
Division Method: Mapping a key K into one of m slots by taking the remainder of K divided
by m.
h (K) = K mod m

Mid-Square Method: Mapping a key K into one of m slots by taking some middle
digits from the value K^2:
h(K) = the middle log10(m) digits of K^2
Folding Method: Divide the key K into sections of the same length (except possibly the
last section), then add these sections together. Two variants are used:

Shift folding: the sections are added directly.
Folding at the boundaries: alternate sections are reversed before being added.

Collision: No matter what the hash function, there is the possibility that two different keys
could resolve to the same hash address. This situation is known as a collision.
Chaining: A chain is simply a linked list of all the elements with the same hash key:
h(K) = K mod (number of table slots)
Hashing with Linear Probing: When using the linear probing method, the item will be stored
in the next available slot in the table, assuming that the table is not already full. This is
implemented via a linear search for an empty slot, starting from the point of collision. If the end of
the table is reached during the linear search, the search will wrap around to the beginning of the
table and continue from there.
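A minimal C sketch of insertion with linear probing (illustrative; TABLE_SIZE and the sentinel EMPTY are assumptions, not from the original notes):

#define TABLE_SIZE 11
#define EMPTY      -1

int table[TABLE_SIZE];   /* initialize all slots to EMPTY before use */

/* Insert with linear probing: on collision, scan forward (wrapping) for a free slot.
   Returns the slot used, or -1 if the table is full. */
int insert_linear_probe(int key)
{
    int start = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (start + i) % TABLE_SIZE;  /* wrap around the table end */
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;  /* table full */
}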

Space and Time Complexity


Asymptotic Notations: The notations we use to describe the asymptotic running time of an algorithm
are defined in terms of functions whose domains are the set of natural numbers N = {0, 1, 2, ...}. The
asymptotic notations consist of the following useful notations.

Big Oh (O): If we write f(n) = O(g(n)), then there exists a positive constant c such that
f(n) <= c*g(n) for all sufficiently large n. We say g(n) is an asymptotic upper bound for f(n).
o Example: 2n = O(n), 2n = O(n^2), 2n = O(n^3)
Big Omega (Ω): If we write f(n) = Ω(g(n)), then there exists a positive constant c such that
f(n) >= c*g(n) for all sufficiently large n. We say g(n) is an asymptotic lower bound for f(n).
o Example: n = Ω(log2 n) with constant c = 1
Big Theta (Θ): If we write f(n) = Θ(g(n)), then there exist positive constants c1 and c2 such that
c1*g(n) <= f(n) <= c2*g(n) for all sufficiently large n. We say g(n) is an
asymptotically tight bound for f(n).
o f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).
Small Oh (o): If we write f(n) = o(g(n)), then for every positive constant c, f(n) <
c*g(n) for all sufficiently large n. We say g(n) is an upper bound for f(n) that is not
asymptotically tight.
o Example: n^1.99 = o(n^2)
Small Omega (ω): If we write f(n) = ω(g(n)), then for every positive constant c, f(n) > c*g(n)
for all sufficiently large n. We say g(n) is a lower bound for f(n) that is not asymptotically
tight.
o Example: n^2.00001 = ω(n^2), but n^2 ≠ ω(n^2)

Analysis of Algorithms

Algorithm can be classified by the amount of time they need to complete compared to their
input size. The analysis of an algorithm focuses on the complexity of algorithm which
depends on time or space.

Time Complexity: The time complexity is a function that gives the amount of time required
by an algorithm to run to completion.
o Worst case time complexity: It is the function defined by the maximum amount of
time needed by an algorithm for an input of size n.
o Average case time complexity: It is the execution of an algorithm having typical
input data of size n.
o Best case time complexity: It is the minimum amount of time that an algorithm
requires for an input of size n.
Space Complexity: The space complexity is a function that gives the amount of space
required by an algorithm to run to completion.

Recurrence Relations

A recurrence is a function defined in terms of One or more base cases and Itself with smaller
arguments.
Example: T(n) = T(n - 1) + n, with T(1) = 1 (the original equation image is not reproduced;
this is the standard example). The above recurrence relation can be computed asymptotically:
T(n) = O(n^2).


In algorithm analysis, we usually express both the recurrence and its solution using
asymptotic notation.

Methods to Solve the Recurrence Relation

There are two methods to solve the recurrence relation given as: Substitution method
and Master method.
1. Substitution Method: There are two steps in this method

Guess the solution


Use induction to find the constants and show that the solution works.

2. Master Method: The master method gives us a quick way to find solutions to recurrence
relations of the form T(n) = aT(n/b) + f(n), where a >= 1 and b > 1 are constants.

T(n) = aT(n/b) + f(n), where f(n) ∈ Θ(n^d), d >= 0

o Case 1: If a < b^d, then T(n) ∈ Θ(n^d)
o Case 2: If a = b^d, then T(n) ∈ Θ(n^d log n)
o Case 3: If a > b^d, then T(n) ∈ Θ(n^(log_b a))
Examples:
o T(n) = 4T(n/2) + n, so T(n) ∈ Θ(n^2)
o T(n) = 4T(n/2) + n^2, so T(n) ∈ Θ(n^2 log n)
o T(n) = 4T(n/2) + n^3, so T(n) ∈ Θ(n^3)

Algorithm Design Techniques


Divide and Conquer: (Divide, Conquer, Combine)
The idea is to divide the problem into smaller but similar sub problems (divide), solve it
(conquer), and (combine) these solutions to create a solution to the original problem.
For Divide-and-Conquer algorithms the running time is mainly affected by 3 criteria:

Breaking the problem into several sub-problems that are similar to the original
problem but smaller in size,
Solve the sub-problem recursively (successively and independently), and then
Combine these solutions to sub-problems to create a solution to the original problem.

Divide-and-Conquer Examples:

Mergesort
Quicksort
Binary tree traversals
Binary search
Multiplication of large integers
Matrix multiplication: Strassen's algorithm
Closest-pair and convex-hull algorithms

Merge Sort
Merge sort is a comparison based sorting algorithm. Merge sort is a stable sort.
Algorithm MERGE
Input: An array A[1..m] of elements and three indices p, q and r, with 1 <= p <= q < r <= m, such
that both the subarrays A[p..q] and A[q+1..r] are sorted individually in non-decreasing
order.
Output: A[p..r] contains the result of merging the two subarrays A[p..q] and A[q+1..r].
// B[] is an auxiliary array.
s = p;
t = q + 1;
k = p;
while (s <= q && t <= r)
{
    if (A[s] <= A[t])
    {
        B[k] = A[s];
        s = s + 1;
    }
    else
    {
        B[k] = A[t];
        t = t + 1;
    }
    k = k + 1;
}
if (s == q + 1) { B[k..r] = A[t..r]; }
else { B[k..r] = A[s..q]; }
A[p..r] = B[p..r]
Analysis of Merge Sort
The merge sort algorithm always divides the array into two balanced lists.
So, the recurrence relation for merge sort is:
T(n) = 1, if n <= 1

T(n) = 2T(n/2) + 4n, otherwise

Best case: O(n log n)
Average case: O(n log n)
Worst case: O(n log n)

Quick Sort
It is in-place sorting. It is also known as partition exchange sort.
The elements A[low..high] to be sorted are rearranged using Algorithm split so that the pivot
element, which is always A[low], occupies its correct position A[w], and all elements that are
less than or equal to A[w] occupy the positions A[low..w-1], while all elements that are
greater than A[w] occupy the positions A[w+1..high]. The subarrays A[low..w-1] and
A[w+1..high] are then recursively sorted to produce the entire sorted array. The formal
algorithm is shown as Algorithm quicksort.
Algorithm: Quicksort
Input: An array A[1n] of n elements.
Output: The elements in A sorted in non-decreasing order.
1. quicksort(A,1,n)
Procedure quicksort(A, low, high)
{
if (low < high)
{
split(A[low..high], w); // w is the new position of A[low]
quicksort(A, low, w-1);
quicksort(A,w+1,high);
}
}
Analysis of Quick Sort

Worst case: O(n^2). This happens when the pivot is the smallest (or the largest)
element.
Best case: O(n log n). The pivot is in the middle, and the subarrays divide into balanced
partitions every time.
Average case: O(n log n).

Advantages of Quick Sort Method

One of the fastest algorithms on average.


It does not need additional memory, so this is called in-place sorting.

Greedy Algorithms
A greedy algorithm is an algorithm that uses the heuristic of making the locally optimal
choice at each stage of problem solving, with the hope of finding a globally optimal solution.

Minimum Spanning Trees


Prim's algorithm will use a priority queue that can be implemented with red-black trees or
heaps, and Kruskal's algorithm can be implemented using a data structure called the
Union-Find data structure, which is composed of trees and linked lists.
Both algorithms use a greedy strategy. This means that the overall problem is solved by repeatedly
making the choice that is best locally, and hoping that the combination is best globally.

Prim's algorithm
The basic idea is to start at some arbitrary node and grow the tree one edge at a time, always adding
the smallest edge that does not create a cycle. What makes Prim's algorithm distinct from Kruskal's is
that the spanning tree grows connected from the start node. We need to do this n-1 times to make sure
that every node in the graph is spanned. The algorithm is implemented with a priority queue. The
output will be a tree represented by a parent array whose indices are nodes.
We keep a priority queue filled with all the nodes that have not yet been spanned. The value of each
of these nodes is equal to the smallest weight of the edges that connect it to the partial spanning tree.
1. Initialize the Pqueue with all the nodes and set their values to a number larger than any edge,
set the value of the root to 0, and the parent of the root to nil.
2. While the Pqueue is not empty do { Let x be the minimum value node in the Pqueue; For every
node y in x's adjacency list do { if y is in the Pqueue and the weight on the edge (x,y) is less than
value(y) { Set value(y) to the weight of (x,y); Set the parent of y to x; } } }

Analysis of Prim's Algorithm

The time complexity of Prim's algorithm is O(e log n).
The analysis of Prim's algorithm shows an O(e log n) time algorithm, if we use a heap to
implement the priority queue. Step 1 runs in O(n) time, because it is just building a heap. Step 2 has a
loop that runs O(n) times because we add a new node to the spanning tree with each iteration. Each
iteration requires us to find the minimum value node in the Pqueue. If we use a simple binary heap,
this takes O(log n). The total number of times we execute the last two lines in step 2 is O(e) because
we never look at an edge more than twice, once from each end. Each time we execute these two lines,

there may be a need to change a value in the heap, which takes O(lg n). Hence the total time
complexity is O(n) + O(n log n) + O(e log n), and the last term dominates.

Kruskal's Algorithm
Kruskal's algorithm also works by growing the tree one edge at a time, adding the smallest edge that does not create a cycle.
We start with n distinct single-node trees, and the spanning tree is empty. At each step, we add the smallest edge that connects two nodes in different trees.
In order to do this, we sort the edges and add them in ascending order, skipping any edge whose endpoints are already in the same tree:
For each edge (u, v) in the sorted list, in ascending order: if u and v are in different trees, then add (u, v) to the spanning tree, and union the trees that contain u and v.
Hence, we need some data structure to store sets of nodes, where each set represents a tree and the collection of sets represents the current spanning forest. The data structure must support the following operations: Union(s, t), which merges two trees into a new tree, and Find-Set(x), which returns the tree containing node x.

Union-Find Data Structure


The Union-Find data structure is useful for managing equivalence classes, and is indispensable for Kruskal's algorithm. It is a data structure that helps us store and manipulate equivalence classes. An equivalence class is simply a set of things that are considered equivalent (the relation satisfies the reflexive, symmetric and transitive properties). Each equivalence class has a representative element. We can union two equivalence classes together, create a new equivalence class, or find the representative of the class which contains a particular element. The data structure therefore supports the operations Makeset, Union and Find. Union-Find can also be thought of as a way to manipulate disjoint sets, which is just a more general view of equivalence classes.
Makeset(x) initializes a set with element x. Union(x, y) unions two sets together. Find(x) returns the set containing x. One nice and simple implementation of this data structure uses a tree defined by a parent array. A set is stored as a tree where the root represents the set, and all the elements of the set are descendants of the root. Find(x) works by following the parent pointers back until we reach nil (the root's parent). Makeset(x) just initializes an array entry with parent equal to nil and data value x. Union(x, y) is done by pointing the parent of x to y. Makeset and Union are O(1) operations, but Find is an O(n) operation, because the tree can get long and thin, depending on the order of the parameters in the calls to Union. In particular, it is bad to point the taller tree to the root of the shorter tree.

We can fix this by changing Union. Union(x, y) will not just set the parent of x to y. Instead, it will first calculate which tree, x or y, has the greater number of nodes. Then it points the root of the tree with fewer nodes to the root of the tree with more nodes. This simple idea guarantees that the height of a tree is at most lg n, which means that the Find operation becomes O(log n).
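A minimal C sketch of Makeset, Find and the size-based Union just described (the array bound and names are ours):

#define MAXN 100

int uf_parent[MAXN];   /* -1 marks a root (nil parent)           */
int uf_size[MAXN];     /* number of nodes in the tree (at roots) */

void makeset(int x)    /* initialise a singleton set: O(1) */
{
    uf_parent[x] = -1;
    uf_size[x] = 1;
}

int find(int x)        /* follow parent pointers to the root: O(log n) */
{
    while (uf_parent[x] != -1)
        x = uf_parent[x];
    return x;
}

void union_sets(int x, int y)   /* union by size: O(1) after the finds */
{
    int rx = find(x), ry = find(y);
    if (rx == ry) return;               /* already in the same tree */
    if (uf_size[rx] < uf_size[ry]) {    /* make rx the larger root  */
        int t = rx; rx = ry; ry = t;
    }
    uf_parent[ry] = rx;                 /* smaller tree points to larger */
    uf_size[rx] += uf_size[ry];
}

Kruskal's algorithm then sorts the edges and, for each edge (u, v) in ascending order, adds the edge and calls union_sets(u, v) whenever find(u) != find(v).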

Analysis of Kruskal's Algorithm

Sorting the edges takes O(E log E) = O(E log n) time. Each edge then requires two Find operations and possibly one Union, which is O(E log n) in total. Hence, the total time taken by this algorithm to find the minimum spanning tree is O(E log E); if the edges are already sorted, the time complexity is O(E log n) for the Union-Find operations alone.

Compiler Design
Here is the list of topics covered under GATE 2016 Compiler Design Chapter:
Lexical Analysis: Lexical analyzer reads the source program character by character and
returns the tokens of the source program. It puts information about identifiers into the symbol
table. To read more on Lexical analysis, Click Here
Parsing: Syntax analyzer creates the syntactic structure of the given source program. This
syntactic structure is mostly a parse tree. The syntax of a programming language is described
by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form) notation in the
description of CFGs. To read more on Parsing, Click Here
Syntax Directed Translation: Grammar symbols are associated with attributes to associate
information with the programming language constructs that they represent. Values of these
attributes are evaluated by the semantic rules associated with the production rules. For
detailed notes on Syntax Directed Translation, Click Here
Runtime Environments: This topic deals with how we allocate space for the generated target
code and the data objects of our source programs. The locations of data objects that can be
determined at compile time are allocated statically, but the locations of some data objects
must be allocated at run-time. For detailed notes on Runtime Environments, Click
Here
Intermediate Code Generation: Intermediate codes are machine independent codes, but
they are close to machine instructions. The given program in a source language is converted
to an equivalent program in an intermediate language by the intermediate code generator. For
more on Intermediate Code Generation, Click Here

Lexical Analysis
Lexical analyzer reads the source program character by character and returns the tokens of
the source program. It puts information about identifiers into the symbol table.
The Role of Lexical Analyzer:

It is the first phase of a compiler

It reads the input characters and produces a sequence of tokens that the parser
uses for syntax analysis.

It can work either as a separate module or as a submodule.

The lexical analyzer is also responsible for eliminating comments and white space from
the source program.

It also reports lexical errors.


Tokens, Lexemes and Patterns

A token describes a pattern of characters having the same meaning in the source program,
such as identifiers, operators, keywords, numbers, delimiters and so on. A token may
have a single attribute which holds the required information for that token. For
identifiers, this attribute is a pointer to the symbol table, and the symbol table holds the
actual attributes for that token.

Token type and its attribute uniquely identify a lexeme.

Regular expressions are widely used to specify patterns.

Lexeme: A sequence of characters in the source program that is matched against the pattern for a token.
Pattern: The rule associated with each set of strings is called a pattern. A lexeme is matched against a pattern to generate a token.
Token: A token is a word that describes the lexeme in the source program. It is generated when a lexeme matches a pattern.

Example:
Lexeme: A1, Sum, Total
Pattern: starting with a letter and followed by letters or digits, but not a keyword.
Token: ID

Lexeme: If | Then | Else
Pattern: If | Then | Else
Token: IF | THEN | ELSE

Lexeme: 123.45
Pattern: starting with a digit, followed by digits, with an optional fraction and an optional exponent.
Token: NUM
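These ID and NUM patterns are simple enough to hand-code; a toy C scanner sketch (the token set and interface are illustrative, and the keyword check is omitted):

#include <ctype.h>
#include <stdio.h>

enum token { ID, NUM, END };

/* Scan one token from *s, advancing the pointer past the lexeme. */
enum token next_token(const char **s)
{
    while (isspace((unsigned char)**s)) (*s)++;
    if (**s == '\0') return END;
    if (isalpha((unsigned char)**s)) {            /* pattern: letter (letter|digit)* */
        while (isalnum((unsigned char)**s)) (*s)++;
        return ID;
    }
    if (isdigit((unsigned char)**s)) {            /* pattern: digit+ (. digit+)?     */
        while (isdigit((unsigned char)**s)) (*s)++;
        if (**s == '.') { (*s)++; while (isdigit((unsigned char)**s)) (*s)++; }
        return NUM;
    }
    (*s)++;                                       /* skip anything else */
    return next_token(s);
}

int main(void)
{
    const char *src = "Sum 123.45 A1";
    enum token t;
    while ((t = next_token(&src)) != END)
        printf("%s\n", t == ID ? "ID" : "NUM");   /* prints ID NUM ID */
    return 0;
}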

Parsing
Syntax Analyzer (Parser): Syntax analyzer creates the syntactic structure of the given
source program. This syntactic structure is mostly a parse tree. The syntax of a programming
language is described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form)
notation in the description of CFGs.

The syntax analyzer (parser) checks whether a given source program satisfies the rules
implied by a context-free grammar or not. If it does, the parser creates the parse tree of
that program; otherwise, the parser gives error messages.

What syntax analysis cannot do!

To check whether variables are of types on which operations are allowed

To check whether a variable has been declared before use

To check whether a variable has been initialized

These issues will be handled in semantic analysis

We categorise parsers into two groups:

1. Top-down parser (starts from the root).
2. Bottom-up parser (starts from the leaves).

Efficient top-down and bottom-up parsers can be implemented only for subclasses of
context-free grammars.

Both top-down and bottom-up parsers scan the input from left to right (one symbol at
a time):
1. LL for top-down parsing
2. LR for bottom-up parsing

Example: Consider the following grammar


E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= num | id
For the input string: id + num * id
Analysis of the top-down parsing:

E => E + T
=> T + T
=> F + T
=> id + T
=> id + T * F
=> id + F * F
=> id + num * F
=> id + num * id
Top-down parsing uses the leftmost derivation to derive the string and uses substitutions
during the derivation process.
Analysis of the bottom-up parsing:
id + num * id
=> F + num * id
=> T + num * id
=> E + num * id
=> E + F * id
=> E + T * id
=> E + T * F
=> E + T
=> E

Bottom-up parsing uses the reverse of the rightmost derivation to verify the string and uses
reductions during the process.
Context-Free Grammars: Inherently recursive structures of a programming language are
defined by a CFG. In a CFG, we have a start symbol (one of the non-terminals), a finite set
of terminals (in our case, this will be the set of tokens), a set of non-terminals (syntactic
variables), and a finite set of production rules of the form
A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string).
Parse Trees: A parse tree is a graphical representation of a derivation that filters out the order of choosing
non-terminals for rewriting. The root node represents the start symbol, and the inner nodes of a
parse tree are non-terminal symbols.
Ambiguity: A grammar that produces more than one parse tree for some sentence is called an
ambiguous grammar. An unambiguous grammar yields a unique parse tree for each
sentence.
Ambiguity elimination:
Ambiguity is problematic because the meaning of a program can be incorrect.
Ambiguity can be handled in several ways:
Enforce associativity and precedence
Rewrite the grammar (cleanest way)
There are no general techniques for handling ambiguity.
It is impossible to automatically convert an ambiguous grammar to an unambiguous one.
Left Recursion: A grammar is left recursive if it has a non-terminal A such that there is a
derivation
A ⇒+ Aα for some string α.

The left recursion may appear in a single step of the derivation (immediate left recursion) or
may appear in more than one step of the derivation. Left recursion is an issue of concern in
top-down parsers: a top-down parser with production A → Aα may loop forever, since these
derivations lead to an infinite loop. Top-down parsing techniques cannot handle left-recursive
grammars, so we have to convert a left-recursive grammar into an equivalent grammar which
is not left recursive.
From the grammar A → Aα | β, left recursion may be eliminated by transforming the
grammar to
A → βR
R → αR | ε
Removal of left recursion
In general,
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
transforms to
A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε
Left Factoring: A predictive parser (a top-down parser without backtracking) insists that the
grammar be left factored, i.e., rewritten into a new equivalent grammar suitable for predictive parsing.

stmt → if expr then stmt else stmt | if expr then stmt
When we see if, we cannot know which production rule to choose to rewrite stmt in the
derivation.
In general, for
A → αβ1 | αβ2
where α is not empty and the first symbols of β1 and β2 (if they have one) are different:
when processing α, we cannot know whether to expand
A to αβ1 or to αβ2.
But, if we rewrite the grammar as follows
A → αA′
A′ → β1 | β2, we can immediately expand A to αA′.
The dangling-else problem can be handled by left factoring:
stmt → if expr then stmt else stmt | if expr then stmt
can be transformed to
stmt → if expr then stmt S′
S′ → else stmt | ε
Top-down Parsing: There are two main techniques to construct a top-down parse tree
1. Recursive descent parsing
2. Predictive parsing

Recursive Descent Parsing (uses backtracking): Backtracking is needed (if a choice of a
production rule does not work, we backtrack to try other alternatives). It tries to find the
leftmost derivation. It is not efficient.
e.g., the grammar is
S → aBc
B → bc | b
and the input is abc.
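A C sketch of this example, with the backtracking made explicit by re-trying B's alternatives from a saved input position (all function names are ours):

#include <stdio.h>

static const char *input;
static int pos;

static int match(char c) { if (input[pos] == c) { pos++; return 1; } return 0; }

/* B -> bc | b : alternative 0 is "bc", alternative 1 is "b" */
static int B(int alt)
{
    if (alt == 0) return match('b') && match('c');
    return match('b');
}

/* S -> aBc : if one choice for B fails to yield a full parse,
   restore the input position and try the next (backtracking). */
static int S(void)
{
    for (int alt = 0; alt < 2; alt++) {
        pos = 0;                                  /* backtrack to the start */
        if (match('a') && B(alt) && match('c') && input[pos] == '\0')
            return 1;
    }
    return 0;
}

int main(void)
{
    input = "abc";
    printf("%s\n", S() ? "accepted" : "rejected");   /* accepted via B -> b */
    return 0;
}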

Predictive Parser
A non-recursive top-down parsing method.
The parser predicts which production to use.
It removes backtracking by fixing one production for every non-terminal and input token(s).
Predictive parsers accept LL(k) languages:
the first L stands for a left-to-right scan of the input,
the second L stands for leftmost derivation,
k stands for the number of lookahead tokens.
In practice, LL(1) is used.

Functions used in Constructing LL(1) Parsing Tables

Two functions are used in the construction of LL(1) parsing tables: FIRST and
FOLLOW.

FIRST(α) is the set of the terminal symbols which occur as first symbols in strings
derived from α, where α is any string of grammar symbols. If α derives ε, then ε is
also in FIRST(α).

FOLLOW(A) is the set of the terminals which occur immediately after (follow)
the non-terminal A in strings derived from the start symbol.

FIRST sets are computed for all non-terminals; FOLLOW sets are needed only for
those non-terminals whose FIRST set contains ε.

For every terminal x in FIRST(α), where A → α is a production, there is an entry
A → α in the LL(1) table at M[A, x], when x is not ε.

For every terminal y in FOLLOW(A), where A has an ε-production, there is an entry
(the ε-production) in the table at M[A, y].

To Compute FIRST of any String X

If X is a terminal symbol, then FIRST(X) = {X}.

If X is a non-terminal symbol and X → ε is a production rule, then ε is in FIRST(X).

If X is a non-terminal symbol and X → Y1 Y2 … Yn is a production rule:
if a terminal a is in FIRST(Yi) and ε is in FIRST(Yj) for all j = 1, …, i-1, then a is in
FIRST(X);
if ε is in FIRST(Yj) for all j = 1, …, n, then ε is in FIRST(X).

If X is ε, then FIRST(X) = {ε}.

Example:
For the expression grammar
E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E′) = { +, ε }
FIRST(T′) = { *, ε }
To compute FOLLOW (for non-terminals):
If S is the start symbol, then $ is in FOLLOW(S).

If A → αBβ is a production rule, then everything in FIRST(β) is in FOLLOW(B),
except ε.

If A → αB is a production rule, or A → αBβ is a production rule and ε is in FIRST(β),
then everything in FOLLOW(A) is in FOLLOW(B).

Apply these rules until nothing more can be added to any FOLLOW set.
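Worked on the expression grammar above, these rules give:

FOLLOW(E)  = { ), $ }        ($ because E is the start symbol; ) from F → ( E ))
FOLLOW(E′) = { ), $ }        (E′ appears rightmost, so it inherits FOLLOW(E))
FOLLOW(T)  = { +, ), $ }     (+ from FIRST(E′); the rest because E′ ⇒ ε)
FOLLOW(T′) = { +, ), $ }     (T′ appears rightmost, so it inherits FOLLOW(T))
FOLLOW(F)  = { *, +, ), $ }  (* from FIRST(T′); the rest because T′ ⇒ ε)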

LL(1) Parsing algorithm

The parsing table is a two-dimensional array M[X, a], where X is a non-terminal and
a is a terminal or the symbol $.

The parser considers X, the symbol on top of the stack, and a, the current input symbol.

These two symbols determine the action to be taken by the parser.

Assume that $ is a special token that is at the bottom of the stack and terminates the
input string.

1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next
input symbol.
3. If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This
entry will be either an X-production of the grammar or an error entry. If, for example,
M[X, a] = {X → UVW}, the parser replaces X on top of the stack by UVW (with U
on the top). If M[X, a] = error, the parser calls an error recovery routine.
Example: Consider the grammar
E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → ( E ) | id
Parse table for the grammar is given below:
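From the FIRST and FOLLOW sets computed earlier, the table works out as follows (rows are non-terminals, columns are input symbols; blank entries are errors):

        id          +            *            (           )           $
E   E → TE′                               E → TE′
E′              E′ → +TE′                             E′ → ε      E′ → ε
T   T → FT′                               T → FT′
T′              T′ → ε       T′ → *FT′                T′ → ε      T′ → ε
F   F → id                                F → ( E )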

For the above grammar and parsing table, we verify the string id + id * id in the following
way with the help of the parsing algorithm.
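The moves, shown as stack / remaining input / action (top of stack on the right):

Stack        Input          Action
$E           id+id*id$      E → TE′
$E′T         id+id*id$      T → FT′
$E′T′F       id+id*id$      F → id
$E′T′id      id+id*id$      match id
$E′T′        +id*id$        T′ → ε
$E′          +id*id$        E′ → +TE′
$E′T+        +id*id$        match +
$E′T         id*id$         T → FT′
$E′T′F       id*id$         F → id
$E′T′id      id*id$         match id
$E′T′        *id$           T′ → *FT′
$E′T′F*      *id$           match *
$E′T′F       id$            F → id
$E′T′id      id$            match id
$E′T′        $              T′ → ε
$E′          $              E′ → ε
$            $              accept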

Bottom-up Parsing Techniques: A bottom-up parser creates the parse tree of the given
input string from the leaves towards the root. A bottom-up parser tries to find the rightmost
derivation of the given input in reverse order.
Bottom-up parsing is also known as shift-reduce parsing.

A more powerful parsing technique

LR grammars are more expressive than LL grammars

Can handle left recursive grammars

Can handle virtually all the programming languages

Natural expression of programming language syntax

Automatic generation of parsers (Yacc, Bison etc.)

Detects errors as soon as possible

Allows better error recovery

Shift Reduce Parsing: A shift-reduce parser tries to reduce the given input string to the
starting symbol. At each reduction step, a substring of the input matching the right side of
a production rule is replaced by the non-terminal on the left side of that production rule.
Handle: A handle of a string is a substring that matches the right side of a production rule.

Handles always appear at the top of the stack and never inside it.

This makes a stack a suitable data structure.

Actions: There are four possible actions of a shift reduce parser

Shift: The next input symbol is shifted onto the top of the stack.

Reduce: Replace the handle on the top of the stack by the non-terminal.

Accept: Successful completion of parsing.

Error: Parser discovers a syntax error and calls an error recovery routine.

Conflicts During Shift Reduce Parsing


There are CFGs for which a shift-reduce parser cannot be used: the stack contents and the next
input symbol may not decide the action.
The general shift-reduce technique is:

if there is no handle on the stack, then shift;

if there is a handle, then reduce.


However, what happens when there is a choice?

What action is to be taken in case both shift and reduce are valid?

Shift/Reduce Conflict: whether to make a shift operation or a reduction.


Reduce/Reduce Conflict: the parser cannot decide which of several reductions to make.
Types of Shift Reduce Parsing: There are two main categories of shift reduce parsers
Operator Precedence Parser: Simple, but supports only a small class of grammars.
LR Parsers:

LR parsers accept LR(k) languages


L stands for left to right scan of input
R stands for rightmost derivation
k stands for number of lookahead tokens

Types of LR Parsers:

SLR (Simple) LR parser

CLR (general) LR parser (canonical LR)

LALR (Intermediate) LR parser (look-ahead LR)

SLR, CLR and LALR work in the same way, but their parsing tables may differ.
Relative power of the various classes:
SLR(1) < LALR(1) < LR(1)
SLR(k) < LALR(k) < LR(k)
LL(k) < LR(k)
LR parsing: LR parsing is the most general non-backtracking shift-reduce parsing method. The class of
grammars that can be parsed using LR methods is a proper superset of the class of grammars
that can be parsed with predictive parsers:
LL(1) grammars ⊂ LR(1) grammars
An LR parser can detect a syntactic error as soon as it is possible to do so.

A configuration of an LR parser is
(S0 X1 S1 … Xm Sm, ai ai+1 … an $)
where the first component is the stack and the second is the rest of the input.

Sm and ai decide the parser action by consulting the parsing action table (the initial stack
contains just S0).

A configuration of an LR parser represents the right sentential form

X1 … Xm ai ai+1 … an

LR Parser Actions
Shift s: shift the next input symbol and the state s onto the stack:
(S0 X1 S1 … Xm Sm, ai ai+1 … an $) becomes (S0 X1 S1 … Xm Sm ai s, ai+1 … an $)
Reduce A → β: pop 2|β| (= 2r) items from the stack, where β = Y1 Y2 … Yr;
then push A and s, where s = goto[Sm-r, A]:
(S0 X1 S1 … Xm Sm, ai ai+1 … an $) becomes (S0 X1 S1 … Xm-r Sm-r A s, ai ai+1 … an $)
Accept: parsing successfully completed.
Error: the parser detected an error (an empty entry in the action table).

Example:
Consider the grammar and its parse table:
E → E + T | T
T → T * F | F
F → ( E ) | id

Parse id + id * id using the given grammar and the bottom-up parsing table.


Answer:
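Using the standard SLR table for this grammar, the shift-reduce moves are (state numbers omitted for readability):

Stack        Input          Action
$            id+id*id$      shift id
$id          +id*id$        reduce F → id
$F           +id*id$        reduce T → F
$T           +id*id$        reduce E → T
$E           +id*id$        shift +
$E+          id*id$         shift id
$E+id        *id$           reduce F → id
$E+F         *id$           reduce T → F
$E+T         *id$           shift * (* has higher precedence than +)
$E+T*        id$            shift id
$E+T*id      $              reduce F → id
$E+T*F       $              reduce T → T * F
$E+T         $              reduce E → E + T
$E           $              accept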

Operator Precedence Parsing: In an operator grammar, no production rule can have ε on
the right side, nor two adjacent non-terminals on the right side.
Precedence Relations: In operator precedence parsing, we define three disjoint precedence
relations between certain pairs of terminals:
a <· b means b has higher precedence than a (a yields precedence to b).
a ≐ b means b has the same precedence as a.
a ·> b means b has lower precedence than a (a takes precedence over b).
The determination of the correct precedence relations between terminals is based on the
traditional notions of associativity and precedence of operators.

Syntax Directed Translation


Grammar symbols are associated with attributes to associate information with the
programming language constructs that they represent. Values of these attributes are evaluated
by the semantic rules associated with the production rules.

Attribute Grammar Framework

Generalization of CFG where each grammar symbol has an associated set of attributes

Values of attributes are computed by semantic rules

Two notations for associating semantic rules with productions

Syntax directed definition

high level specifications

hides implementation details

explicit order of evaluation is not specified

Translation schemes

indicate order in which semantic rules are to be evaluated

allow some implementation details to be shown

Evaluation of the semantic rules may have the following effects:


1. May generate intermediate codes
2. May put information into the symbol table
3. May perform type checking
4. May issue error messages
5. May perform some other activities

An attribute may hold a string, a number, a memory location, a complex record etc.

Evaluation of a semantic rule defines the value of an attribute, but a semantic rule
may also have some side effects such as printing a value.

Attributes

Attributes fall into two classes: synthesized and inherited

The value of a synthesized attribute is computed from the values of its children in the parse tree.

The value of an inherited attribute is computed from its sibling and parent nodes.

S-attributed grammar: A syntax-directed definition that uses only synthesized attributes is
said to be an S-attributed definition. A parse tree for an S-attributed definition can be
annotated by evaluating the semantic rules for the attributes.
Translations are appended only at the end of the right-hand side.
It uses bottom-up parsing for evaluation.
L-attributed grammar: When translation takes place during parsing, the order of evaluation is
linked to the order in which nodes are created. An L-attributed definition is one whose attributes can be
evaluated in depth-first order.
This definition can use synthesized attributes and also restricted inherited attributes (the value
can be taken from the parent and left siblings only).
Translations can appear anywhere in the right-hand side of a production.
The natural evaluation order, in both top-down and bottom-up parsing, is depth-first order.
Evaluation order of SDTs

Using definitions: L-attributed definition and S-attributed definition

Using parsing: Top-down evaluation and Bottom up evaluation

Dependency Graph : Directed graph indicating interdependencies among the synthesized and
inherited attributes of various nodes in a parse tree.
Syntax-directed translation table: the symbols E, T and F are associated with an attribute
value.
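For illustration, a standard S-attributed definition over such an expression grammar, where every symbol carries a synthesized attribute val (the digit terminal and lexval are conventional, not from these notes):

Production        Semantic rule
E → E1 + T        E.val = E1.val + T.val
E → T             E.val = T.val
T → T1 * F        T.val = T1.val * F.val
T → F             T.val = F.val
F → ( E )         F.val = E.val
F → digit         F.val = digit.lexval

Evaluated bottom-up during parsing, the input 3 + 4 * 5 yields E.val = 23.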

Runtime Environments
This topic deals with how we allocate space for the generated target code and the data objects of our
source programs. The locations of data objects that can be determined at compile time are
allocated statically, but the locations of some data objects must be allocated at run-time.
The allocation and de-allocation of the data objects is managed by the run-time support
package, which is loaded together with the generated target code. The structure of the
run-time support package depends on the semantics of the programming language
(especially the semantics of procedures in that language).
Symbol Table

Compiler uses a symbol table to keep track of scope and binding information about
names.

The symbol table is changed every time a name is encountered in the source; changes to
the table occur
if a new name is discovered, or
if new information about an existing name is discovered.

The symbol table must have mechanisms to:
add new entries, and
find existing information efficiently.

Two common mechanisms:
linear lists: simple to implement, poor performance;
hash tables: greater programming/space overhead, good performance.

The compiler should be able to grow the symbol table dynamically:
if the size is fixed, it must be large enough for the largest program.
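A minimal chained hash table in C showing these two operations (the bucket count and the type field are illustrative):

#include <stdlib.h>
#include <string.h>

#define BUCKETS 211    /* table size (illustrative) */

struct entry {
    char *name;            /* the identifier's lexeme     */
    int   type;            /* example binding information */
    struct entry *next;    /* collision chain             */
};

static struct entry *table[BUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* find existing information efficiently */
struct entry *lookup(const char *name)
{
    for (struct entry *e = table[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* add a new entry when a new name is discovered */
struct entry *insert(const char *name, int type)
{
    unsigned h = hash(name);
    struct entry *e = malloc(sizeof *e);
    e->name = malloc(strlen(name) + 1);
    strcpy(e->name, name);
    e->type = type;
    e->next = table[h];    /* chain in front of the bucket */
    table[h] = e;
    return e;
}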

Procedure Activation
An execution of a procedure starts at the beginning of the procedure body. When the
procedure is completed, it returns control to the point immediately after the place where
that procedure was called. Each execution of a procedure is called an activation of that
procedure.

The lifetime of an activation lasts from the start to the end of that execution (including
the time spent in other procedures called by that procedure).

If a and b are procedure activations, then their lifetimes are either non-overlapping or
nested.

If a procedure is recursive, a new activation can begin before an earlier activation of
the same procedure has ended.

Activation Tree
We can create a tree (known as activation tree) to show the way control enters and leaves
activations. In an activation tree

Each node represents an activation of a procedure.

The root represents the activation of the main program.

The node a is a parent of the node b if and only if the control flows from a to b.

The node a is left to the node b if the lifetime of a occurs before the lifetime of b.

Example:
Program main;
  Procedure s;
  Begin end;
  Procedure p;
    Procedure q;
    Begin end;
  Begin q; s; end;
Begin p; s; end;

Control enters and leaves activations in the order:
enter main, enter p, enter q, exit q, enter s, exit s, exit p, enter s, exit s, exit main.

Control Stack
The flow of the control in a program corresponds to a depth first traversal of the activation
tree that
1. Starts at the root.
2. Visits a node before its children.
3. Recursively visits the children of each node in left-to-right order.

A stack called control stack can be used to keep track of live procedure activations.

1. An activation record is pushed onto the control stack as the activation starts.
2. That activation record is popped when that activation ends.

When node n is at the top of the control stack, the stack contains the nodes along the
path from n to the root.

Variable Scope
The scope rules of the language determine, which declaration of a name applies when the
name appears in the program.
An occurrence of a variable is local, if that occurrence is in the same procedure in which that
name is declared and the variable is non-local, if it is declared outside of that procedure.
Example:
Procedure q;
Var a: real;
Procedure r;
Var b: integer;
Begin b=1; a=2; end;
Begin end;
Variable b is local to procedure r and variable a is non-local to procedure r.
Storage Organisation

CODE: memory locations for code are determined at compile time.

DATA: locations of static data can also be determined at compile time.

STACK: data objects allocated at run-time; supports recursion.

HEAP: dynamically allocated objects at run-time; supports explicit allocation and de-allocation
of memory.

Storage Allocation Strategies

Static allocation: lays out storage at compile time for all data objects

Stack allocation: manages the runtime storage as a stack

Heap allocation: allocates and de-allocates storage as needed at run-time from the heap

Static allocation:

Names are bound to storage as the program is compiled.

No run-time support is required.

Bindings do not change at run-time.

On every invocation of a procedure, its names are bound to the same storage.

Values of local names are retained across activations of a procedure.

The type of a name determines the amount of storage to be set aside.

The address of a storage location consists of an offset from the end of an activation record.

The compiler decides the location of each activation record.

All the addresses can be filled in at compile time.

Constraints:
The size of all data objects must be known at compile time.
Recursive procedures are not allowed.
Data structures cannot be created dynamically.

Stack Allocation:

Addresses can be bound at run-time.

Recursion is supported.

Run-time allocation is supported, but cannot be managed explicitly.

Heap Allocation:

Stack allocation cannot be used if:
the values of the local variables must be retained when an activation ends, or
a called activation outlives the caller.

In such cases, the de-allocation of activation records cannot occur in last-in first-out
fashion.

Heap allocation gives out pieces of contiguous storage for activation records.

There are two aspects of dynamic allocation: run-time allocation and de-allocation of
data structures.

Languages like Algol have dynamic data structures, and the implementation reserves some
part of memory for them.

Activation Record
Information needed by a single execution of a procedure is managed using a contiguous block
of storage called an activation record. When a procedure is entered, an activation record is
allocated, and it is de-allocated when that procedure exits. The size of each field can be determined
at compile time, although the actual location of the activation record is determined at run-time.
Key Points

If a procedure has a local variable whose size depends on a parameter, its size is
determined at run-time.

Some part of the activation record of a procedure is created by that procedure
immediately after the procedure is entered, and some part is created by the caller of
that procedure before the procedure is entered.

Return Value: The returned value of the called procedure is returned in this field to the
calling procedure. We can use a machine register for the return value.

Actual parameters: The field for actual parameters is used by the calling procedure to
supply parameter to the called procedure.

Optional control link: The optional control link points to the activation record of the caller.

Optional access link: It is used to refer to the non-local data held in the other activation
record.
Saved machine status: The field for saved machine status holds information about the state
of the machine just before the procedure is called.
Local data: Local data field holds data that is local to an execution of a procedure.
Temporaries: Temporary variables are stored in field of temporaries.
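The layout can be pictured as a C struct (purely illustrative: real activation records are laid out by the compiler, and the field sizes vary):

struct activation_record {
    int return_value;          /* or passed back in a machine register */
    int actual_params[4];      /* set up by the calling procedure      */
    struct activation_record *control_link;   /* caller's record       */
    struct activation_record *access_link;    /* enclosing scope       */
    int saved_machine_status[8];   /* return address, saved registers  */
    int local_data[16];        /* the procedure's local variables      */
    int temporaries[8];        /* compiler-generated temporaries       */
};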

Intermediate Code Generation


Intermediate codes are machine independent codes, but they are close to machine
instructions. The given program in a source language is converted to an equivalent program
in an intermediate language by the intermediate code generator.

The designer of the compiler decides the intermediate language.

Syntax trees can be used as an intermediate language.

Postfix notation and three-address code (quadruples) can also be used as an intermediate
language.

Syntax Tree
A syntax tree is a variant of the parse tree, where each leaf represents an operand and each
interior node represents an operator.

A sentence a * (b + d) would have the following syntax tree:
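        *
       / \
      a   +
         / \
        b   d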

Three-Address Code

Each statement contains at most three addresses (two for the operands and one for the result).
The most general kind of three-address code is
x = y op z
where x, y and z are names, constants or compiler-generated temporaries, and op is any
operator.
But we can also use the following notation for quadruples (a much better notation, because it
looks like a machine-code instruction):
op y, z, x
Apply the operator op to y and z, and store the result in x.

Representation of Three-Address Code


Three-address code can be represented in various forms, i.e., quadruples, triples and indirect
triples. These forms are demonstrated by way of the example below.
e.g., A = B * (C + D)
The three-address code is as follows:
T1 = B
T2 = C + D
T3 = T1 * T2
A = T3
Its quadruple, triple and indirect-triple representations are shown below.
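In the usual layouts, these take the following form for the code above (the statement numbers are illustrative):

Quadruples (each temporary is named explicitly in a result field):

     op    arg1    arg2    result
(0)  =     B               T1
(1)  +     C       D       T2
(2)  *     T1      T2      T3
(3)  =     T3              A

Triples (a statement refers to an earlier result by its number, so explicit temporaries, including the copy T1 = B, are unnecessary):

     op    arg1    arg2
(0)  +     C       D
(1)  *     B       (0)
(2)  =     A       (1)

Indirect triples (a separate statement list points at the triples, so the optimizer can reorder statements without renumbering the triples):

     statement    points to
     (35)         (0)
     (36)         (1)
     (37)         (2)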
