
Data Structures-1

05/20/16

PDM MCA

What is a Data Structure?

In computer science, a data structure is a particular
way of storing and organizing data in a computer so
that it can be used efficiently.
Data may be organized in many different ways; the
logical or mathematical model of a particular
organization of data in memory or on disk is called a
data structure. Algorithms are used to manipulate
the data.

Data Structure Operations

The data appearing in our data structures is processed by means of certain
operations. The following four operations play a major role:
Traversing
Accessing each record exactly once so that certain items in the record may
be processed. (This accessing or processing is sometimes called "visiting" the
records.)
Searching
Finding the location of the record with a given key value, or finding the
locations of all records that satisfy one or more conditions.
Inserting
Adding new records to the structure.
Deleting
Removing a record from the structure.
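As a minimal sketch, here are the four operations on the simplest linear structure: an int array with a separate count n of used slots. The function names, the capacity of 10 and the 0-based positions are assumptions for illustration, not part of these notes.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t CAPACITY = 10;  // assumed fixed capacity of the array

// Traversing: visit each record exactly once (here the visit adds it to a sum).
int traverse(const int a[], std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i];
    return sum;
}

// Searching: index of the first record equal to key, or -1 if there is none.
int search(const int a[], std::size_t n, int key) {
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] == key) return static_cast<int>(i);
    return -1;
}

// Inserting: shift records right to open a gap at pos, then store item.
bool insertAt(int a[], std::size_t& n, std::size_t pos, int item) {
    if (n == CAPACITY || pos > n) return false;  // array full or bad position
    for (std::size_t i = n; i > pos; --i) a[i] = a[i - 1];
    a[pos] = item;
    ++n;
    return true;
}

// Deleting: shift records left over the removed slot.
bool deleteAt(int a[], std::size_t& n, std::size_t pos) {
    if (pos >= n) return false;                  // no record at pos
    for (std::size_t i = pos; i + 1 < n; ++i) a[i] = a[i + 1];
    --n;
    return true;
}
```

Insertion and deletion cost O(n) here because of the shifting; the linked lists later in these notes avoid that cost.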

Types of Data structure


There are two types of data structures:
1. Linear Data Structure
A data structure is said to be linear if its elements form a sequence, or in other
words a linear list.
I. Array
II. Stack
III. Queue
IV. Linked List
2. Non-Linear Data Structure
A non-linear structure is mainly used to represent data containing a hierarchical
relationship between elements.
I. Tree
II. Graph

ALGORITHM

An algorithm is a step-by-step procedure for calculations: a finite
sequence of steps that solves a problem in a finite amount of time. Some
other definitions of an algorithm are:
An algorithm is a set of rules for carrying out calculations either by
hand or on a machine.
An algorithm is a finite step-by-step procedure to achieve a required
result.
An algorithm is a sequence of computational steps that transform
the input into the output.
An algorithm is a sequence of operations performed on data that
have to be organized in data structures.

Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives the
upper bound of the number of operations (or running time)
performed by an algorithm when the input size is n.
There are two interpretations of upper bound.
Worst-case Complexity
The running time for any given size input will be lower than the
upper bound except possibly for some values of the input where
the maximum is reached.
Average-case Complexity
The running time for any given size input will be the average
number of operations over all problem instances for a given size.

Algorithm Analysis
Because it is quite difficult to estimate the statistical behavior
of the input, most of the time we content ourselves with the
worst-case behavior. Most of the time, the complexity g(n) is
approximated by its family O(f(n)), where f(n) is one of the
following functions: $n$ (linear complexity), $\log n$ (logarithmic
complexity), $n^a$ where $a \ge 2$ (polynomial complexity), $a^n$
(exponential complexity).
Optimality
Once the complexity of an algorithm has been estimated, the
question arises whether this algorithm is optimal. An algorithm
for a given problem is optimal if its complexity reaches the
lower bound over all the algorithms solving this problem. For
example, any algorithm solving the intersection of n segments
problem will execute at least $n^2$ operations in the worst case,
even if it does nothing but print the output. This is abbreviated
by saying that the problem has $\Omega(n^2)$ complexity. If one finds an
$O(n^2)$ algorithm that solves this problem, it will be optimal and of
complexity $\Theta(n^2)$.
Reduction
Another technique for estimating the complexity of a problem is
the transformation of problems, also called problem reduction.
As an example, suppose we know a lower bound for
a problem A, and that we would like to estimate a lower bound
for a problem B. If we can transform A into B by a
transformation step whose cost is less than that for solving A,
then B has the same bound as A.


Complexity measurement
Time complexity of an algorithm quantifies the amount of time
taken by an algorithm to run, as a function of the input size n.
Space complexity is also important: this is essentially the
number of memory cells that an algorithm needs.


Big O Notation
O notation approximates the cost function of an algorithm.
The approximation is usually good enough, especially when
considering the efficiency of an algorithm as n gets very large.
It allows us to estimate the rate of growth of a function.
Instead of computing the entire cost function, we only need to count
the number of times that an algorithm executes its barometer
instruction(s):
the instruction that is executed the most number of times in an
algorithm (the highest-order term).
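To see barometer counting in practice, the hypothetical function below counts how often the innermost statement of a doubly nested loop runs. That count is n(n-1)/2 = n²/2 - n/2, whose highest-order term is n², so the loop is O(n²). This is only an illustrative sketch; the function name is made up.

```cpp
#include <cassert>
#include <cstdint>

// Counts executions of the barometer instruction (the innermost statement)
// of a loop that looks at every pair (i, j) with i < j among n items.
std::uint64_t countPairs(std::uint64_t n) {
    std::uint64_t count = 0;
    for (std::uint64_t i = 0; i < n; ++i)
        for (std::uint64_t j = i + 1; j < n; ++j)
            ++count;  // barometer instruction: runs n(n-1)/2 times in total
    return count;
}
```

Doubling n roughly quadruples the count (countPairs(10) is 45, countPairs(100) is 4950), which is the n² growth that O(n²) describes.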

PDM IT


Big O Notation
The cost function of an algorithm A, tA(n), can be approximated by
another, simpler function g(n), which is also a function of only one
variable, the data size n.
The function g(n) is selected such that it represents an upper bound
on the efficiency of the algorithm A (i.e. an upper bound on the
value of tA(n)).
This is expressed using the big-O notation: O(g(n)).
For example, if we consider the time efficiency of algorithm A, then
"tA(n) is O(g(n))" would mean that
A cannot take more time than O(g(n)) to execute (that is, more than
c·g(n) for some constant c), or equivalently that
the cost function tA(n) grows at most as fast as g(n).
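The informal statement above is usually made precise as follows. The threshold n_0 is part of the standard textbook definition rather than these notes, and the numeric example is added only for illustration.

```latex
t_A(n) \text{ is } O(g(n)) \iff \exists\, c > 0,\ \exists\, n_0 :\ t_A(n) \le c \cdot g(n) \ \text{ for all } n \ge n_0

% Example: t_A(n) = 3n^2 + 5n + 2, choosing c = 4
3n^2 + 5n + 2 \le 4n^2 \iff n^2 - 5n - 2 \ge 0,
\ \text{which holds for all } n \ge 6 \quad (6^2 - 5 \cdot 6 - 2 = 4 \ge 0),
\ \text{so } t_A(n) \text{ is } O(n^2) \text{ with } c = 4,\ n_0 = 6.
```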


Top down approach


A top-down approach is essentially the breaking down of a
system to gain insight into its compositional sub-systems. In a
top-down approach an overview of the system is formulated,
specifying but not detailing any first-level subsystems. Each
subsystem is then refined in yet greater detail, sometimes in
many additional subsystem levels, until the entire specification is
reduced to base elements. A top-down model is often specified
with the assistance of "black boxes", which make it easier to
manipulate. However, black boxes may fail to elucidate
elementary mechanisms or be detailed enough to realistically
validate the model. The top-down approach starts with the big
picture and breaks it down from there into smaller segments.

Bottom up approach
A bottom-up approach is the piecing together of systems to
give rise to grander systems, thus making the original systems
sub-systems of the emergent system. Bottom-up processing is
a type of information processing based on incoming data from
the environment to form a perception. Information enters the
eyes in one direction (input), and is then turned into an image
by the brain that can be interpreted and recognized as a
perception (output). In a bottom-up approach the individual
base elements of the system are first specified in great detail


String processing
String: A finite sequence S of zero or more
characters is called a string.
A string with zero characters is called the empty
string.
Strings will be denoted by enclosing their
characters in single quotation marks,
e.g. 'TO BE'.

Storing Strings
Strings are stored in three types of structures:
1. Fixed-length structures
2. Variable-length structures with a fixed maximum
3. Linked structures


String Operations
1. Substring: accessing a substring from a given string.
SUBSTRING(string,initial,length)
2. Indexing: also called pattern matching, i.e. finding the position where a string
pattern P first appears in a given string T.
INDEX(text,pattern)
3. Concatenation: the string consisting of the characters of the first string followed by
the characters of the second string.
S1//S2
4. Length: the number of characters in a string.
LENGTH(string)
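A sketch of the four string operations on std::string, assuming 1-based positions as in the SUBSTRING/INDEX notation and INDEX returning 0 when the pattern does not occur. The function names are hypothetical; concatenation (S1//S2) is simply operator+.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// SUBSTRING(string, initial, length), with initial counted from 1.
std::string substring(const std::string& s, std::size_t initial, std::size_t length) {
    assert(initial >= 1 && initial <= s.size() + 1);
    return s.substr(initial - 1, length);  // substr stops at the end of s
}

// INDEX(text, pattern): 1-based position of the first occurrence, 0 if absent.
std::size_t indexOf(const std::string& text, const std::string& pattern) {
    std::size_t pos = text.find(pattern);
    return pos == std::string::npos ? 0 : pos + 1;
}

// S1 // S2: concatenation.
std::string concat(const std::string& s1, const std::string& s2) { return s1 + s2; }

// LENGTH(string): number of characters.
std::size_t length(const std::string& s) { return s.size(); }
```

For example, substring("TO BE OR NOT TO BE", 4, 7) returns "BE OR N", and indexOf("HIS FATHER IS THE PROFESSOR", "THE") returns 7, because the first match is inside "FATHER".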


Word Processing
The following operations are used:
Replacement: replacing one string in the text by
another.
REPLACE(text,pattern1,pattern2)
Insertion: inserting a string in the middle of the
text.
INSERT(text,position,string)
Deletion: deleting a string from the text.
DELETE(text,position,length)
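The three word-processing operations can be sketched with std::string's insert, erase, find and replace members. Positions are again assumed to be 1-based, and REPLACE is assumed to replace only the first occurrence of pattern1; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// INSERT(text, position, string): the inserted string starts at `position`.
std::string insertStr(std::string text, std::size_t position, const std::string& s) {
    assert(position >= 1 && position <= text.size() + 1);
    text.insert(position - 1, s);
    return text;
}

// DELETE(text, position, length): remove `length` characters from `position`.
std::string deleteStr(std::string text, std::size_t position, std::size_t length) {
    assert(position >= 1 && position <= text.size() + 1);
    text.erase(position - 1, length);
    return text;
}

// REPLACE(text, pattern1, pattern2): replace the first occurrence of pattern1;
// the text is returned unchanged if pattern1 does not occur.
std::string replaceStr(std::string text, const std::string& p1, const std::string& p2) {
    std::size_t pos = text.find(p1);
    if (pos != std::string::npos) text.replace(pos, p1.size(), p2);
    return text;
}
```

For instance, inserting "XYZ" at position 3 of "ABCDEFG" gives "ABXYZCDEFG", and deleting 2 characters from position 4 gives "ABCFG".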

Arrays
Array - a collection of a fixed number of components
wherein all of the components have the same data type
One-dimensional array - an array in which the components
are arranged in a list form
The general form of declaring a one-dimensional array is:
dataType arrayName[intExp];
where intExp is any expression that evaluates to a
positive integer

Declaring an array
The statement
int num[5];
declares an array num of 5 components of the
type int
The components are num[0], num[1],
num[2], num[3], and num[4]

Accessing Array Components
The general form (syntax) of accessing an array component is:
arrayName[indexExp]
where indexExp, called index, is any expression whose
value is a nonnegative integer
Index value specifies the position of the component in the array
The [] operator is called the array subscripting operator
The array index always starts at 0
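A short sketch of declaring and accessing array components. The helper sumArray is hypothetical; the point is that the valid indices of an array of size components run from 0 to size - 1.

```cpp
#include <cassert>

// Sums the first `size` components of an int array.
int sumArray(const int arr[], int size) {
    assert(size >= 0);
    int sum = 0;
    for (int i = 0; i < size; ++i)  // valid indices are 0 .. size-1
        sum += arr[i];              // arr[i] uses the [] subscripting operator
    return sum;
}
```

With int num[5] = {10, 20, 30, 40, 50};, num[0] is 10, num[4] is 50, and sumArray(num, 5) is 150. num[5] would be outside the array.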


Linked List

Definition of Linked Lists


Examples of Linked Lists
Operations on Linked Lists
Linked List as a Class
Linked Lists as Implementations of Stacks, Sets, etc.


Definition of Linked Lists
A linked list is a sequence of items (objects) where every item is
linked to the next.
Graphically:
[Figure: four nodes, each holding data and a link to the next node; head_ptr points to the first node and tail_ptr to the last.]


Definition Details
Each item has a data part (one or more data members), and a
link that points to the next item
One natural way to implement the link is as a pointer; that is, the
link is the address of the next item in the list
It makes good sense to view each item as an object, that is, as an
instance of a class.
We call that class: Node
The last item does not point to anything. We set its link member
to NULL. This is denoted graphically by a self-loop
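A minimal sketch of the Node class, written as a C++ struct (a class whose members are public), with a traversal that follows the links until it reaches the NULL link of the last item. The string data part and the countNodes helper are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// One item of a singly linked list: a data part and a link to the next item.
struct Node {
    std::string data;
    Node* link;  // nullptr (NULL) in the last item
};

// Visit every item from head_ptr until the NULL link, counting the items.
int countNodes(const Node* head_ptr) {
    int count = 0;
    for (const Node* p = head_ptr; p != nullptr; p = p->link) ++count;
    return count;
}
```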


Example of Linked List (A Waiting Line)
A waiting line of customers: John, Mary, Dan, Sue (from the head to
the tail of the line)
A linked list of strings can represent this line:
[Figure: head_ptr -> John -> Mary -> Dan -> Sue <- tail_ptr]

Illustration of a linked list in memory:
[Figure: a chain of nodes; each node holds a pointer to an element and a pointer to the next node.]

Singly Linked Lists and Arrays

Operations on Linked Lists
Insert a new item
At the head of the list, or
At the tail of the list, or
Inside the list, in some designated position
Search for an item in the list
The item can be specified by position, or by some value
Delete an item from the list
Search for and locate the item, then remove the item, and
finally adjust the surrounding pointers
size( )
isEmpty( )


Insert At the Head
Insert a new data item A. Call new to create a node for A, pointed to by newPtr.
[Figure: before insertion, head_ptr -> data -> data -> data -> data <- tail_ptr, with the new node A at newPtr; after insertion to the head, head_ptr -> A -> data -> data -> data -> data <- tail_ptr.]
The link value in the new item = old head_ptr
The new value of head_ptr = newPtr
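The two assignments above can be sketched as follows, assuming int data and a list that keeps both head_ptr and tail_ptr; insertAtHead is a hypothetical name. Setting tail_ptr when the list was empty is an extra case the slide does not show.

```cpp
#include <cassert>

struct Node {
    int data;
    Node* link;
};

// Insert a new item at the head of the list.
void insertAtHead(Node*& head_ptr, Node*& tail_ptr, int value) {
    Node* newPtr = new Node{value, head_ptr};    // link of new item = old head_ptr
    head_ptr = newPtr;                           // new value of head_ptr = newPtr
    if (tail_ptr == nullptr) tail_ptr = newPtr;  // empty list: new item is also the last
}
```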

Insert at the Tail
Insert a new data item A. Call new to create a node for A, pointed to by newPtr.
[Figure: before insertion, head_ptr -> data -> data -> data -> data <- tail_ptr; after insertion to the tail, head_ptr -> data -> data -> data -> data -> A <- tail_ptr.]
The link value in the new item = NULL
The link value of the old last item = newPtr
The new value of tail_ptr = newPtr
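A sketch of insertion at the tail under the same assumptions (int data, both head_ptr and tail_ptr kept; insertAtTail is a hypothetical name). Keeping tail_ptr is what makes this step O(1): no walk through the list is needed.

```cpp
#include <cassert>

struct Node {
    int data;
    Node* link;
};

// Insert a new item at the tail of the list.
void insertAtTail(Node*& head_ptr, Node*& tail_ptr, int value) {
    Node* newPtr = new Node{value, nullptr};  // link of new item = NULL
    if (tail_ptr == nullptr) {                // empty list
        head_ptr = tail_ptr = newPtr;
    } else {
        tail_ptr->link = newPtr;              // old last item now links to newPtr
        tail_ptr = newPtr;                    // new item becomes the last
    }
}
```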

Insert inside the List
Insert a new data item A. Call new to create a node for A, pointed to by newPtr.
[Figure: before insertion, head_ptr -> data -> data -> data -> data <- tail_ptr; after insertion in the 3rd position, the new node A sits between the 2nd item and the old 3rd item.]
The link-value in the new item = link-value of 2nd item
The new link-value of 2nd item = newPtr
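A sketch of insertion inside the list, assuming the caller already holds a pointer prev to the item after which the new item goes (the 2nd item in the example above); insertAfter is a hypothetical name.

```cpp
#include <cassert>

struct Node {
    int data;
    Node* link;
};

// Insert a new item right after prev; tail_ptr moves if prev was the last item.
void insertAfter(Node* prev, int value, Node*& tail_ptr) {
    assert(prev != nullptr);
    Node* newPtr = new Node{value, prev->link};  // link of new item = link of prev
    prev->link = newPtr;                         // new link of prev = newPtr
    if (tail_ptr == prev) tail_ptr = newPtr;
}
```

The order of the two assignments matters: overwriting prev->link first would lose the rest of the list.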

Delete the Head Item
[Figure: before deletion, head_ptr -> data -> data -> data -> data -> data <- tail_ptr; after deletion of the head item, head_ptr points to the former 2nd item.]
The new value of head_ptr = link-value of the old head item
The old head item is deleted and its memory returned
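A sketch of deleting the head item, assuming int data and both end pointers; deleteHead is a hypothetical name. It returns false instead of failing when the list is already empty.

```cpp
#include <cassert>

struct Node {
    int data;
    Node* link;
};

// Remove the head item and return its memory; false if the list is empty.
bool deleteHead(Node*& head_ptr, Node*& tail_ptr) {
    if (head_ptr == nullptr) return false;
    Node* old = head_ptr;
    head_ptr = old->link;                         // new head_ptr = link of old head
    if (head_ptr == nullptr) tail_ptr = nullptr;  // the list became empty
    delete old;                                   // memory returned
    return true;
}
```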

Delete the Tail Item
[Figure: before deletion, head_ptr -> data -> data -> data -> data -> data <- tail_ptr; after deletion of the tail item, tail_ptr points to the former 2nd-from-last item.]
New value of tail_ptr = link-value of the 3rd-from-last item (that is, the address of the 2nd-from-last item, which becomes the new last item)
New link-value of the new last item = NULL
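A sketch of deleting the tail item. A singly linked list has no backward links, so the item before the tail must be found by walking from the head, which makes this operation O(n). deleteTail is a hypothetical name.

```cpp
#include <cassert>

struct Node {
    int data;
    Node* link;
};

// Remove the tail item; false if the list is empty.
bool deleteTail(Node*& head_ptr, Node*& tail_ptr) {
    if (head_ptr == nullptr) return false;
    if (head_ptr == tail_ptr) {                   // only one item
        delete head_ptr;
        head_ptr = tail_ptr = nullptr;
        return true;
    }
    Node* prev = head_ptr;
    while (prev->link != tail_ptr) prev = prev->link;  // find the new last item
    delete tail_ptr;
    prev->link = nullptr;                         // new last item's link = NULL
    tail_ptr = prev;                              // new value of tail_ptr
    return true;
}
```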

Delete an inside Item
[Figure: before deletion, head_ptr -> data -> data -> data -> data -> data <- tail_ptr; after deletion of the 2nd item, the 1st item links directly to the former 3rd item.]
New link-value of the item located before the deleted one =
the link-value of the deleted item
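A sketch of deleting an inside item, assuming the caller holds a pointer prev to the item located before the one to delete; deleteAfter is a hypothetical name.

```cpp
#include <cassert>

struct Node {
    int data;
    Node* link;
};

// Remove the item that follows prev; false if there is no such item.
bool deleteAfter(Node* prev, Node*& tail_ptr) {
    if (prev == nullptr || prev->link == nullptr) return false;
    Node* victim = prev->link;
    prev->link = victim->link;                // prev's link = link of deleted item
    if (tail_ptr == victim) tail_ptr = prev;  // deleted the last item
    delete victim;
    return true;
}
```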

Implementation of Linked List
A linked list is a collection of Node objects, and must support a
number of operations
Therefore, it is sensible to implement a linked list as a class
The class name for it is List


Circular Linked List


In linear linked lists, if a list is traversed (all the elements visited), an
external pointer to the list must be preserved in order to be able to
reference the list again.
Circular linked lists can be used to traverse the same list
again and again if needed. A circular list is very similar to the linear
list, except that the pointer in the last node points not to
NULL but to the first node.
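A sketch of traversing a circular list: there is no NULL link to stop at, so the loop stops when it comes back around to its starting node. The traverseCircular helper and the int data are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int data;
    Node* link;  // in a circular list the last node links back to the first
};

// Visit every node exactly once, starting at `start`.
std::vector<int> traverseCircular(const Node* start) {
    std::vector<int> out;
    if (start == nullptr) return out;  // empty list
    const Node* p = start;
    do {
        out.push_back(p->data);
        p = p->link;
    } while (p != start);              // back at the start: every node seen
    return out;
}
```

Because any node can serve as the starting point, the same list can be walked again from wherever the previous traversal stopped.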


Contd.

[Figure: A Linear Linked List]

Circular Linked List


In a circular linked list there are two methods to know
whether a node is the first node or not:
Either an external pointer, list, points to the first node,
or
a header node is placed as the first node of the
circular list.
The header node can be distinguished from the others by
either having a sentinel value as the info part or
having a dedicated flag variable to specify whether the node
is a header node or not.

Doubly Linked List


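In a doubly linked list, each node holds two links: one to the next
node and one to the previous node. The list can therefore be traversed
in both directions, and insertion and deletion can be done at both the
front and rear ends.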
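A minimal sketch of a doubly linked list with insertion and deletion at both ends. The prev link is what lets popBack run in O(1), unlike deletion at the tail of a singly linked list. DNode, DList and the member names are hypothetical.

```cpp
#include <cassert>

struct DNode {
    int data;
    DNode* prev;  // link to the previous node (nullptr at the front)
    DNode* next;  // link to the next node (nullptr at the rear)
};

struct DList {
    DNode* head = nullptr;
    DNode* tail = nullptr;

    void pushFront(int v) {
        DNode* n = new DNode{v, nullptr, head};
        if (head) head->prev = n; else tail = n;  // empty list: n is also the tail
        head = n;
    }
    void pushBack(int v) {
        DNode* n = new DNode{v, tail, nullptr};
        if (tail) tail->next = n; else head = n;  // empty list: n is also the head
        tail = n;
    }
    int popFront() {
        assert(head != nullptr);
        DNode* n = head;
        int v = n->data;
        head = n->next;
        if (head) head->prev = nullptr; else tail = nullptr;
        delete n;
        return v;
    }
    int popBack() {
        assert(tail != nullptr);
        DNode* n = tail;
        int v = n->data;
        tail = n->prev;  // no walk from the head is needed
        if (tail) tail->next = nullptr; else head = nullptr;
        delete n;
        return v;
    }
};
```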


Stacks
A stack is a list of elements in which an
element may be inserted or deleted only at one
end, called the top of the stack.
The elements are removed from a stack in the
reverse order of that in which they were
inserted into the stack.
A stack is also known as a LIFO (Last In First Out)
list or push-down list.

Basic Stack Operations


PUSH: It is the term used to insert an element
into a stack.

[Figure: PUSH operations on a stack]

Basic Stack Operations


POP: It is the term used to delete an element
from a stack.

[Figure: POP operation from a stack]

Standard Error Messages in Stack
Two standard error messages of a stack are:
Stack Overflow: If we attempt to add new element
beyond the maximum size, we will encounter a
stack overflow condition.
Stack Underflow: If we attempt to remove
elements beyond the base of the stack, we will
encounter a stack underflow condition.

Stack Operations
PUSH (STACK, TOP, MAXSIZE, ITEM): This procedure pushes
an ITEM onto a stack.
1. If TOP = MAXSIZE, then Print: OVERFLOW, and Return.
2. Set TOP := TOP + 1. [Increases TOP by 1]
3. Set STACK[TOP] := ITEM. [Inserts ITEM in the TOP position]
4. Return.
POP (STACK, TOP, ITEM): This procedure deletes the top
element of STACK and assigns it to the variable ITEM.
1. If TOP = 0, then Print: UNDERFLOW, and Return.
2. Set ITEM := STACK[TOP].
3. Set TOP := TOP - 1. [Decreases TOP by 1]
4. Return.
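The two procedures translate almost line for line into C++. In this sketch MAXSIZE = 5 is an assumption, slot 0 of the array is left unused so that TOP follows the 1-based steps, and OVERFLOW/UNDERFLOW are reported by returning false instead of printing. Stack, push and pop are hypothetical names.

```cpp
#include <cassert>

const int MAXSIZE = 5;  // assumed capacity for the sketch

struct Stack {
    int items[MAXSIZE + 1];  // slot 0 unused so TOP matches the 1-based steps
    int top = 0;             // TOP = 0 means the stack is empty
};

// PUSH: returns false on OVERFLOW.
bool push(Stack& s, int item) {
    if (s.top == MAXSIZE) return false;  // step 1: OVERFLOW
    s.top = s.top + 1;                   // step 2: increase TOP by 1
    s.items[s.top] = item;               // step 3: insert ITEM at TOP
    return true;
}

// POP: returns false on UNDERFLOW; otherwise the old top element goes to item.
bool pop(Stack& s, int& item) {
    if (s.top == 0) return false;        // step 1: UNDERFLOW
    item = s.items[s.top];               // step 2
    s.top = s.top - 1;                   // step 3: decrease TOP by 1
    return true;
}
```

Pushing 10, 20, 30, 40, 50 fills the stack, a sixth push reports overflow, and the first pop then returns 50, the last element pushed.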

Applications of Stack
Converting algebraic expressions from one
form to another. E.g. Infix to Postfix, Infix to
Prefix, Prefix to Infix, Prefix to Postfix,
Postfix to Infix and Postfix to prefix.
Evaluation of Postfix expression.
Parenthesis Balancing in Compilers.
Depth First Search Traversal of Graph.
Recursive Applications.

Algebraic Expressions
Infix: It is the form of an arithmetic expression in which
we fix (place) the arithmetic operator in between the two
operands. E.g.: (A + B) * (C - D)
Prefix: It is the form of an arithmetic notation in which
we fix (place) the arithmetic operator before (pre) its
two operands. The prefix notation is called Polish
notation. E.g.: * + A B - C D
Postfix: It is the form of an arithmetic expression in
which we fix (place) the arithmetic operator after (post)
its two operands. The postfix notation is also called suffix
notation or reverse Polish notation.
E.g.: A B + C D - *

Conversion from Infix to Postfix
Convert the following infix expression A + B * C - D / E * H into its equivalent
postfix expression.
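The conversion is usually done with an operator stack. Below is a sketch that assumes single-character operands, the four left-associative operators + - * / (with * and / binding tighter) and parentheses; the function names are hypothetical. Applied to the expression above, read as A + B * C - D / E * H, it yields A B C * + D E / H * -.

```cpp
#include <cassert>
#include <cctype>
#include <stack>
#include <string>

// Precedence of the binary operators; * and / bind tighter than + and -.
int precedence(char op) { return (op == '*' || op == '/') ? 2 : 1; }

// Converts an infix expression to postfix using an operator stack.
// Blanks are ignored; output tokens are separated by single spaces.
std::string infixToPostfix(const std::string& infix) {
    std::stack<char> ops;
    std::string out;
    auto emit = [&out](char c) {
        if (!out.empty()) out += ' ';
        out += c;
    };
    for (char c : infix) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isspace(u)) continue;
        if (std::isalnum(u)) {
            emit(c);                              // operands go straight to the output
        } else if (c == '(') {
            ops.push(c);
        } else if (c == ')') {
            while (!ops.empty() && ops.top() != '(') { emit(ops.top()); ops.pop(); }
            assert(!ops.empty() && "unbalanced parentheses");
            ops.pop();                            // discard the '('
        } else {
            // Left associativity: first pop operators of equal or higher precedence.
            while (!ops.empty() && ops.top() != '(' &&
                   precedence(ops.top()) >= precedence(c)) {
                emit(ops.top());
                ops.pop();
            }
            ops.push(c);
        }
    }
    while (!ops.empty()) { emit(ops.top()); ops.pop(); }  // flush remaining operators
    return out;
}
```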

Evaluation of Postfix Expression
Postfix expression: 6 5 2 3 + 8 * + 3 + *
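Postfix evaluation uses a stack of operands: push each number, and on an operator pop the right operand, then the left one, and push the result. This sketch assumes integer operands and blank-separated tokens; evalPostfix is a hypothetical name. For the expression above it returns 288: 2 3 + gives 5, 5 8 * gives 40, 5 40 + gives 45, 45 3 + gives 48, and 6 48 * gives 288.

```cpp
#include <cassert>
#include <sstream>
#include <stack>
#include <string>

// Evaluates a postfix expression whose tokens are separated by blanks.
long evalPostfix(const std::string& expr) {
    std::stack<long> st;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            assert(st.size() >= 2);           // otherwise the expression is malformed
            long right = st.top(); st.pop();  // popped first = right operand
            long left = st.top(); st.pop();
            switch (tok[0]) {
                case '+': st.push(left + right); break;
                case '-': st.push(left - right); break;
                case '*': st.push(left * right); break;
                default:  st.push(left / right); break;  // integer division
            }
        } else {
            st.push(std::stol(tok));          // operand: push its value
        }
    }
    assert(st.size() == 1);                   // exactly one result must remain
    return st.top();
}
```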

Queue
A queue is a data structure where items are
inserted at one end called the rear and deleted
at the other end called the front.
Another name for a queue is a FIFO or
First-in-first-out list.
Operations of a Queue:
enqueue: which inserts an element at the end of
the queue.
dequeue: which deletes an element at the front of
the queue.

Representation of Queue

Initially the queue is empty.

Now, insert 11 to the queue. Then the queue status will be:

Next, insert 22 to the queue. Then the queue status is:

Representation of Queue
Now, delete an element 11.

Next, insert another element, say 66, to the queue. We cannot insert 66 into the
queue, as it signals that the queue is full. The queue status is as follows:

Queue Operations using Array
Various operations of Queue are:
insertQ(): inserts an element at the end of queue Q.
deleteQ(): deletes the first element of Q.
displayQ(): displays the elements in the queue.

There are two problems associated with a linear queue. They are:
Time consuming: linear time is spent shifting
the elements to the beginning of the queue.
Signaling queue full: the queue signals that it is full even when it has
vacant positions.
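A sketch of insertQ, deleteQ and displayQ on a plain array, assuming a capacity of 5 and front/rear indices that only move forward (no shifting). It reproduces the second problem above: after 11, 22, 33, 44, 55 are inserted and 11 is deleted, inserting 66 fails although slot 0 is vacant. The struct and MAX are hypothetical names.

```cpp
#include <cassert>
#include <vector>

const int MAX = 5;  // assumed capacity

struct LinearQueue {
    int items[MAX];
    int front = 0;  // index of the first element
    int rear = 0;   // index where the next element will be inserted
};

// insertQ: add at the rear. Fails once rear reaches MAX, even if
// deletions have freed slots at the front.
bool insertQ(LinearQueue& q, int item) {
    if (q.rear == MAX) return false;
    q.items[q.rear++] = item;
    return true;
}

// deleteQ: remove the first element; false if the queue is empty.
bool deleteQ(LinearQueue& q, int& item) {
    if (q.front == q.rear) return false;
    item = q.items[q.front++];
    return true;
}

// displayQ: the elements from front to rear.
std::vector<int> displayQ(const LinearQueue& q) {
    return std::vector<int>(q.items + q.front, q.items + q.rear);
}
```

Shifting the elements back to index 0 on every deletion would fix the false "full" signal at the cost of the linear-time problem; the circular queue below avoids both.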

Applications of Queue
It is used to schedule the jobs to be processed
by the CPU.
When multiple users send print jobs to a
printer, each printing job is kept in the printing
queue. Then the printer prints those jobs
according to first in first out (FIFO) basis.
Breadth first search uses a queue data structure
to find an element from a graph.

Circular Queue
A circular queue is one in which a new element is inserted
at the very first location of the queue if the last location of the
queue is full.
Suppose we have a queue of n elements. After adding an element
at the last index, i.e. the (n-1)th (as the queue starts at index 0),
the next element will be inserted at the very
first location of the queue, which was not
possible in the simple linear queue.

Circular Queue operations
The basic operations of a circular queue are:
InsertionCQ: Inserting an element into a circular
queue results in Rear = (Rear + 1) % MAX, where
MAX is the maximum size of the array.
DeletionCQ: Deleting an element from a circular
queue results in Front = (Front + 1) % MAX,
where MAX is the maximum size of the array.
TraversCQ: Displaying the elements of a circular
queue.

Circular Queue Empty: Front = Rear = 0.
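A sketch of InsertionCQ, DeletionCQ and TraversCQ with MAX = 6, as in the example that follows. The separate element counter is an assumption of this sketch: with Front and Rear alone, a full queue and an empty one both have Front = Rear. The function names are hypothetical lowercase versions of the names above.

```cpp
#include <cassert>
#include <vector>

const int MAX = 6;  // capacity used in the example

struct CircularQueue {
    int items[MAX];
    int front = 0;  // index of the first element
    int rear = 0;   // index of the next free slot
    int count = 0;  // assumption: distinguishes full from empty
};

bool insertionCQ(CircularQueue& q, int item) {
    if (q.count == MAX) return false;  // queue full
    q.items[q.rear] = item;
    q.rear = (q.rear + 1) % MAX;       // wraps from MAX-1 back to 0
    ++q.count;
    return true;
}

bool deletionCQ(CircularQueue& q, int& item) {
    if (q.count == 0) return false;    // queue empty
    item = q.items[q.front];
    q.front = (q.front + 1) % MAX;
    --q.count;
    return true;
}

// Elements from front to rear, following the wrap-around.
std::vector<int> traversCQ(const CircularQueue& q) {
    std::vector<int> out;
    for (int i = 0; i < q.count; ++i) out.push_back(q.items[(q.front + i) % MAX]);
    return out;
}
```

Replaying the example below (insert 11 to 55, delete 11 and 22, insert 66, 77 and 88) leaves the queue holding 33, 44, 55, 66, 77, 88, with 77 and 88 stored in the freed slots 0 and 1.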

Circular Queue Representation using Arrays
Let us consider a circular queue, which can hold a maximum (MAX) of six
elements. Initially the queue is empty.

Insertion and Deletion operations on a Circular Queue
Insert new elements 11, 22, 33, 44 and 55 into the circular queue. The circular
queue status is:

Now, delete two elements 11, 22 from the circular queue. The circular queue
status is as follows:

Insertion and Deletion operations on a Circular Queue
Again, insert another element 66 to the circular queue. The status of the circular
queue is:

Again, insert 77 and 88 to the circular queue. The status of the circular queue
is:

Double Ended Queue (DEQUE)
It is a special queue-like data structure that
supports insertion and deletion at both the front
and the rear of the queue.
Such an extension of a queue is called a
double-ended queue, or deque, which is
usually pronounced "deck" to avoid confusion
with the dequeue method of the regular queue,
which is pronounced like the abbreviation
"D.Q."
It is also often called a head-tail linked list.
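A sketch of a deque stored in a circular array, assuming a capacity of 5; the struct and member names are hypothetical. Stepping front backwards with (front - 1 + MAX) % MAX is what allows insertion at the front without shifting.

```cpp
#include <cassert>

const int MAX = 5;  // assumed capacity

// Deque in a circular array: insertion and deletion at both ends.
struct Deque {
    int items[MAX];
    int front = 0;  // index of the front element
    int count = 0;  // number of stored elements

    int rearIndex() const { return (front + count - 1) % MAX; }

    bool insertFront(int v) {
        if (count == MAX) return false;
        front = (front - 1 + MAX) % MAX;  // step back, wrapping to MAX-1
        items[front] = v;
        ++count;
        return true;
    }
    bool insertRear(int v) {
        if (count == MAX) return false;
        items[(front + count) % MAX] = v;  // first free slot after the rear
        ++count;
        return true;
    }
    int deleteFront() {
        assert(count > 0);
        int v = items[front];
        front = (front + 1) % MAX;
        --count;
        return v;
    }
    int deleteRear() {
        assert(count > 0);
        int v = items[rearIndex()];
        --count;
        return v;
    }
};
```

An input restricted deque would expose only insertRear (plus both deletions), and an output restricted deque only deleteFront (plus both insertions). The C++ standard library also provides a ready-made std::deque.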

DEQUE Representation using arrays

Types of DEQUE
There are two variations of deque. They are:
Input restricted deque (IRD)
Output restricted deque (ORD)

An input restricted deque is a deque which
allows insertions at one end but allows
deletions at both ends of the list.
An output restricted deque is a deque which
allows deletions at one end but allows
insertions at both ends of the list.

Priority Queue
A priority queue is a collection of elements
such that each element has been assigned a priority
and such that the order in which elements are
deleted and processed comes from the
following rules:
An element of higher priority is processed before
any element of lower priority.
Two elements with the same priority are processed
according to the order in which they were added to
the queue.
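A sketch built on std::priority_queue, which always removes its largest element according to the comparator. Each element carries its arrival number, so that for equal priorities the one added first is processed first, as the second rule requires. Treating a larger number as a higher priority, and the class and member names, are assumptions.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

struct Job {
    int priority;      // larger number = higher priority (assumption)
    long seq;          // arrival order, used to break ties first-come first-served
    std::string name;
};

// "a is served after b": lower priority, or same priority but arrived later.
struct ServeLater {
    bool operator()(const Job& a, const Job& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;
    }
};

class PriorityQueue {
    std::priority_queue<Job, std::vector<Job>, ServeLater> heap_;
    long next_seq_ = 0;
public:
    void insert(int priority, const std::string& name) {
        heap_.push(Job{priority, next_seq_++, name});
    }
    std::string removeHighest() {
        assert(!heap_.empty());
        std::string name = heap_.top().name;
        heap_.pop();
        return name;
    }
    bool empty() const { return heap_.empty(); }
};
```

std::priority_queue is a binary heap, so insert and removeHighest are O(log n).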

Priority Queue Operations and Usage


Inserting new elements.
Removing the largest or smallest element.
Priority Queue Usages are:
Simulations: Events are ordered by the time at
which they should be executed.
Job scheduling in computer systems: Higher
priority jobs should be executed first.
Constraint systems: Higher priority constraints
should be satisfied before lower priority constraints.

Tree terminology and definition
Composed of nodes and edges
Node
Any distinguishable object
Edge
An ordered pair <u,v> of nodes
Illustrated by an arrow with tail at u and head at v
u = tail of <u,v>
v = head of <u,v>
Root
A distinguished node, depicted at the top of the tree

Contd.
1. A single node, with no edges, is a tree. The
root of the tree is its unique node.
2. Let T1, ..., Tk (k >= 1) be trees with no nodes
in common, and let r1, ..., rk be the roots of
those trees, respectively. Let r be a new node.
Then there is a tree T consisting of the nodes
and edges of T1, ..., Tk, the new node r and
the edges <r,r1>, ..., <r,rk>. The root of T is r
and T1, ..., Tk are called the subtrees of T.

Contd.
r is called the parent of r1, ..., rk
r1, ..., rk are the children of r
r1, ..., rk are the siblings of one another
Node v is a descendant of u if
u = v
or
v is a descendant of a child of u
A path exists from u to v.

Tree Terminology
Every node is a descendant of itself.
If v is a descendant of u, then u is an ancestor
of v.
Proper descendants:
The descendants of a node, other than the node
itself.

Proper ancestors
The ancestors of a node, other than the node itself.

Properties
Leaf
a node with no children

Num_nodes = Num_edges + 1
Every tree has exactly one more node than edges
Every node, except the root, is the head of exactly
one of the edges
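A sketch that checks Num_nodes = Num_edges + 1 on a small general tree, counting each edge once at its tail (the parent). The TreeNode struct and the helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// General tree node: any number of children.
struct TreeNode {
    std::vector<TreeNode*> children;
};

int countNodes(const TreeNode* r) {
    int n = 1;  // r itself
    for (const TreeNode* c : r->children) n += countNodes(c);
    return n;
}

// Each edge <parent, child> is counted once, at its tail (the parent).
int countEdges(const TreeNode* r) {
    int e = static_cast<int>(r->children.size());
    for (const TreeNode* c : r->children) e += countEdges(c);
    return e;
}

bool isLeaf(const TreeNode* r) { return r->children.empty(); }
```

The identity holds because every node except the root is the head of exactly one edge, so there is one edge per non-root node.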


Tree Terminology
Height of a node in a tree
Length of the longest path from that node to a leaf

Depth of a node in a tree
Length of the path from the root of the tree to the
node


Tree terminology

[Figure: an example tree in which w has depth 2, u has height 3, and the tree has height 4.]


Trees
A tree is a collection of nodes
The collection can be empty
(recursive definition) If not empty, a tree consists
of a distinguished node r (the root), and zero or
more nonempty subtrees T1, T2, ...., Tk, each of
whose roots are connected by a directed edge from
r


Some Terminologies

Child and parent
Every node except the root has one parent
A node can have an arbitrary number of children

Leaves
Nodes with no children

Sibling
Nodes with the same parent

Some Terminologies
Path
Length: the number of edges on the path
Depth of a node
length of the unique path from the root to that node
The depth of a tree is equal to the depth of the deepest leaf

Height of a node
length of the longest path from that node to a leaf
all leaves are at height 0
The height of a tree is equal to the height of the root

Ancestor and descendant
Proper ancestor and proper descendant
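A sketch of height and depth, both measured in edges as defined above. It assumes each node stores a parent pointer, which is an assumption of this sketch since the notes do not fix a representation; the struct and helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct TreeNode {
    TreeNode* parent = nullptr;  // nullptr for the root
    std::vector<TreeNode*> children;
};

// Height: length of the longest path from the node down to a leaf.
int height(const TreeNode* n) {
    int h = 0;  // a leaf has height 0
    for (const TreeNode* c : n->children) h = std::max(h, 1 + height(c));
    return h;
}

// Depth: length of the unique path from the root to the node.
int depth(const TreeNode* n) {
    int d = 0;
    for (const TreeNode* p = n->parent; p != nullptr; p = p->parent) ++d;
    return d;
}

// Links child under parent, keeping both directions consistent.
void addChild(TreeNode& parent, TreeNode& child) {
    child.parent = &parent;
    parent.children.push_back(&child);
}
```

The height of the whole tree is height(root), and it also equals the largest depth of any leaf.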

Example: UNIX Directory


Binary Trees

A tree in which no node can have more than two children

The depth of an average binary tree is considerably smaller than N,
even though in the worst case the depth can be as large as N - 1.


Contd.
An ordered tree with at most two children for
each node
If a node has two children, the 1st is called the left
child and the 2nd is called the right child
If it has only one child, it is either the right child or
the left child


Example: Expression Trees

Leaves are operands (constants or variables)
The other nodes (internal nodes) contain operators
Will not be a binary tree if some operators are not binary

Tree traversal
Used to print out the data in a tree in a certain
order
Pre-order traversal
Print the data at the root
Recursively print out all data in the left subtree
Recursively print out all data in the right subtree


Preorder, Postorder and Inorder
Preorder traversal
node, left, right
prefix expression
++a*bc*+*defg


Preorder, Postorder and Inorder
Postorder traversal
left, right, node
postfix expression
abc*+de*f+g*+

Inorder traversal
left, node, right.
infix expression
a+b*c+d*e+f*g
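A sketch of the three traversals on the expression tree for (a + b*c) + ((d*e + f) * g). This is assumed to be the tree behind the three strings above, since the figure itself is not reproduced here; BNode and the function names are hypothetical. The inorder result has no parentheses, so it no longer records the grouping.

```cpp
#include <cassert>
#include <string>

struct BNode {
    char symbol;             // operator or operand
    BNode* left = nullptr;
    BNode* right = nullptr;
};

void preorder(const BNode* t, std::string& out) {   // node, left, right
    if (!t) return;
    out += t->symbol;
    preorder(t->left, out);
    preorder(t->right, out);
}

void inorder(const BNode* t, std::string& out) {    // left, node, right
    if (!t) return;
    inorder(t->left, out);
    out += t->symbol;
    inorder(t->right, out);
}

void postorder(const BNode* t, std::string& out) {  // left, right, node
    if (!t) return;
    postorder(t->left, out);
    postorder(t->right, out);
    out += t->symbol;
}
```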

Binary Trees
Possible operations on the Binary Tree ADT

parent
left_child, right_child
sibling
root, etc

Implementation
Because a binary tree has at most two children, we can keep direct
pointers to them


Graph
A graph G is defined as follows:
G=(V,E)
V(G): a finite, nonempty set of vertices
E(G): a set of edges (pairs of vertices)

Directed vs. undirected graphs
When the edges in a graph have no
direction, the graph is called undirected

Directed vs. undirected graphs (cont.)
When the edges in a graph have a direction,
the graph is called directed (or digraph)

Warning: if the graph is directed, the order of the
vertices in each edge is important!
E(Graph2) = {(1,3) (3,1) (5,9) (9,11) (5,7)}

Trees vs graphs
Trees are special cases of graphs!

Graph terminology
Adjacent nodes: two nodes are adjacent if they are
connected by an edge
5 is adjacent to 7
7 is adjacent from 5

Path: a sequence of vertices that connect two
nodes in a graph
Complete graph: a graph in which every vertex is
directly connected to every other vertex

Graph Properties -- Paths and Connectivity

Paths
A path from vertex u to v of a graph G is defined as a sequence of
adjacent (connected by an edge) vertices that starts with u and ends with
v.
Simple paths: All edges of a path are distinct.
Path lengths: the number of edges, or the number of vertices minus 1.
Connected graphs
A graph is said to be connected if for every pair of its vertices u and v
there is a path from u to v.
Connected component
A maximal connected subgraph of a given graph.
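A sketch that tests connectivity and counts connected components with breadth-first search (which, as the queue section notes, uses a queue). It assumes an undirected graph stored as adjacency lists of vertex indices, with each edge listed at both endpoints; the Graph alias and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// adj[v] lists the vertices adjacent to v.
using Graph = std::vector<std::vector<int>>;

// Breadth-first search from start; marks every vertex reachable from it.
std::vector<bool> reachable(const Graph& adj, int start) {
    assert(start >= 0 && start < static_cast<int>(adj.size()));
    std::vector<bool> seen(adj.size(), false);
    std::queue<int> q;
    seen[start] = true;
    q.push(start);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u])
            if (!seen[v]) { seen[v] = true; q.push(v); }
    }
    return seen;
}

// Connected: there is a path from vertex 0 to every other vertex.
bool isConnected(const Graph& adj) {
    if (adj.empty()) return true;
    for (bool s : reachable(adj, 0))
        if (!s) return false;
    return true;
}

// Each search from a not-yet-visited vertex finds one maximal connected subgraph.
int countComponents(const Graph& adj) {
    std::vector<bool> done(adj.size(), false);
    int components = 0;
    for (std::size_t s = 0; s < adj.size(); ++s) {
        if (done[s]) continue;
        ++components;
        std::vector<bool> r = reachable(adj, static_cast<int>(s));
        for (std::size_t v = 0; v < r.size(); ++v)
            if (r[v]) done[v] = true;
    }
    return components;
}
```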


Graph terminology (cont.)
What is the number of edges in a complete
directed graph with N vertices?
N * (N-1)

$O(N^2)$

Graph terminology (cont.)
What is the number of edges in a complete
undirected graph with N vertices?
N * (N-1) / 2

$O(N^2)$

Graph terminology (cont.)
Weighted graph: a graph in which each edge
carries a value

Graph implementation
Array-based implementation
A 1D array is used to represent the vertices
A 2D array (adjacency matrix) is used to
represent the edges
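A sketch of the array-based representation: a 1D array of vertex names and a 2D adjacency matrix whose entries also hold edge weights, with 0 meaning "no edge" (an assumption that only works for nonzero weights). MatrixGraph and its members are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct MatrixGraph {
    std::vector<std::string> vertices;     // 1D array of vertices
    std::vector<std::vector<int>> weight;  // adjacency matrix; 0 = no edge

    explicit MatrixGraph(std::vector<std::string> names)
        : vertices(std::move(names)),
          weight(vertices.size(), std::vector<int>(vertices.size(), 0)) {}

    // Directed edge from -> to; add both directions for an undirected edge.
    void addEdge(int from, int to, int w = 1) { weight[from][to] = w; }

    // Testing whether two vertices are connected is a single lookup.
    bool hasEdge(int from, int to) const { return weight[from][to] != 0; }
};
```

The matrix always takes |V|² cells, however few edges the graph has.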

Array-based implementation

Graph implementation
(cont.)
Linked-list implementation
A 1D array is used to represent the vertices
A list is used for each vertex v which contains the
vertices which are adjacent from v (adjacency list)
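A sketch of the linked-list representation: a 1D array of vertex names plus, for each vertex v, a std::list of the vertices adjacent from v. ListGraph and its members are hypothetical names. Listing the neighbours of v only walks v's own list, while testing for a single edge has to scan that list.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>
#include <vector>

struct ListGraph {
    std::vector<std::string> vertices;  // 1D array of vertices
    std::vector<std::list<int>> adj;    // adj[v] = indices adjacent from v

    explicit ListGraph(std::vector<std::string> names)
        : vertices(std::move(names)), adj(vertices.size()) {}

    // Directed edge from -> to.
    void addEdge(int from, int to) { adj[from].push_back(to); }

    // Vertices adjacent from v: just v's own list.
    const std::list<int>& neighbors(int v) const { return adj[v]; }

    // Edge test scans v's list, unlike the matrix's single lookup.
    bool hasEdge(int from, int to) const {
        for (int v : adj[from])
            if (v == to) return true;
        return false;
    }
};
```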

Linked-list implementation

Adjacency matrix vs. adjacency list representation
Adjacency matrix
Good for dense graphs -- $|E| \sim O(|V|^2)$
Memory requirements: $O(|V| + |E|) = O(|V|^2)$
Connectivity between two vertices can be tested
quickly

Adjacency list
Good for sparse graphs -- $|E| \sim O(|V|)$
Memory requirements: $O(|V| + |E|) = O(|V|)$
Vertices adjacent to another vertex can be found
quickly
