Algorithmic Problem
Specification of input
Specification of output as a function of input

Algorithmic Solution
Input instance, adhering to the specification
Algorithm
Output related to the input as required

Experimental Study
[Plot: measured running time against input size n]
Pseudo-Code
A mixture of natural language and high-level
programming concepts that describes the main
ideas behind a generic implementation of a data
structure or algorithm.
Eg: Algorithm arrayMax(A, n):
Input: An array A storing n integers.
Output: The maximum element in A.
currentMax ← A[0]
for i ← 1 to n-1 do
    if currentMax < A[i] then currentMax ← A[i]
return currentMax
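The pseudo-code above translates almost line for line into Java; a minimal sketch (the class name is our choice, not from the slides):

```java
public class ArrayMax {
    // Returns the maximum element of A[0..n-1]; assumes n >= 1.
    public static int arrayMax(int[] A, int n) {
        int currentMax = A[0];
        for (int i = 1; i <= n - 1; i++) {
            if (currentMax < A[i]) currentMax = A[i];
        }
        return currentMax;
    }
}
```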
Pseudo-Code
It is more structured than usual prose but
less formal than a programming language.
Expressions:
use standard mathematical symbols to
describe numeric and boolean expressions
use ← for assignment (= in Java)
use = for the equality relationship (== in
Java)
Method Declarations:
Algorithm name(param1, param2)
Pseudo-Code
Programming Constructs:
decision structures: if ... then ... [else ... ]
while-loops: while ... do
repeat-loops: repeat ... until ...
for-loop: for ... do
array indexing: A[i], A[i,j]
Methods:
calls: object.method(args)
returns: return value
Analysis of Algorithms
Primitive Operation: Low-level operation
independent of programming language.
Can be identified in pseudo-code. For eg:
Data movement (assign)
Control (branch, subroutine call, return)
arithmetic and logical operations (e.g. addition,
comparison)
Example: Sorting
INPUT: a sequence of numbers a1, a2, a3, ..., an
OUTPUT: a permutation b1, b2, b3, ..., bn of the
sequence of numbers
Sort
Correctness (requirements for the output)
For any given input the algorithm halts with the output:
b1 ≤ b2 ≤ b3 ≤ ... ≤ bn, where
b1, b2, b3, ..., bn is a permutation of a1, a2, a3, ..., an
Running time
Depends on
number of elements (n)
how (partially) sorted they are
the algorithm
Insertion Sort
[Figure: an array A with sorted prefix A[1..j-1], the current
element A[j], and indices i and j]
Strategy
Start empty handed
Insert a card in the right
position of the already sorted
hand
Continue until all cards are
inserted/sorted
INPUT: A[1..n] an array of integers
OUTPUT: a permutation of A such that A[1] ≤ A[2] ≤ ... ≤ A[n]

                                          cost  times
for j ← 2 to n do                         c1    n
    key ← A[j]                            c2    n-1
    // Insert A[j] into the sorted
    // sequence A[1..j-1]                 0     n-1
    i ← j-1                               c3    n-1
    while i > 0 and A[i] > key do         c4    Σ_{j=2..n} t_j
        A[i+1] ← A[i]                     c5    Σ_{j=2..n} (t_j - 1)
        i ← i-1                           c6    Σ_{j=2..n} (t_j - 1)
    A[i+1] ← key                          c7    n-1
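A direct Java transcription of this pseudo-code, using 0-based array indices (class and method names are ours):

```java
public class InsertionSort {
    // In-place insertion sort, mirroring the pseudo-code above.
    public static void sort(int[] A) {
        for (int j = 1; j < A.length; j++) {
            int key = A[j];                  // element to insert
            int i = j - 1;
            while (i >= 0 && A[i] > key) {   // shift larger elements right
                A[i + 1] = A[i];
                i = i - 1;
            }
            A[i + 1] = key;                  // drop key into its slot
        }
    }
}
```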
Best/Worst/Average Case
Total time = n(c1+c2+c3+c7) + Σ_{j=2..n} t_j (c4+c5+c6)
             − (c2+c3+c5+c6+c7)
[Plot: running time (1n .. 6n) against input size n = 1, 2, 3, ...;
the best case, an already-sorted input, grows linearly in n]
Asymptotic Analysis
Goal: to simplify analysis of running time by
getting rid of details, which may be affected by
specific implementation and hardware
like "rounding": 1,000,001 ≈ 1,000,000
3n² ≈ n²
Asymptotic Notation
The big-Oh O-notation
f(n) = O(g(n)) if for functions f(n) and g(n) there are positive
constants c and n0 such that: f(n) ≤ c·g(n) for n ≥ n0
[Plot: running time against input size; beyond n0, f(n) stays
below c·g(n)]

Example
f(n) = 2n + 6
g(n) = n
c·g(n) = 4n
conclusion:
2n+6 is O(n).
Another Example
Asymptotic Notation
Simple Rule: Drop lower order terms and
constant factors.
50 n log n is O(n log n)
7n - 3 is O(n)
8n² log n + 5n² + n is O(n² log n)
Algorithm prefixAverages1(X):
a ← 0
for i ← 0 to n-1 do         { n iterations }
    a ← 0
    for j ← 0 to i do       { i iterations, with i = 0, 1, 2, ..., n-1 }
        a ← a + X[j]        { 1 step }
    A[i] ← a/(i+1)
return array A
Analysis: running time is O(n²)
A Better Algorithm
Algorithm prefixAverages2(X):
Input: An n-element array X of numbers.
Output: An n-element array A of numbers such
that A[i] is the average of elements X[0], ... , X[i].
s ← 0
for i ← 0 to n-1 do
    s ← s + X[i]
    A[i] ← s/(i+1)
return array A
Analysis: Running time is O(n)
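A Java sketch of the linear-time version (the class name is ours; n is taken from the array length):

```java
public class PrefixAverages {
    // Running-sum version: each element is read once, so O(n) time.
    public static double[] prefixAverages2(double[] X) {
        int n = X.length;
        double[] A = new double[n];
        double s = 0;
        for (int i = 0; i < n; i++) {
            s = s + X[i];        // s = X[0] + ... + X[i]
            A[i] = s / (i + 1);  // average of the first i+1 elements
        }
        return A;
    }
}
```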
Asymptotic Notation
The big-Omega Ω-notation
f(n) = Ω(g(n)) if there are positive constants c and n0
such that f(n) ≥ c·g(n) for n ≥ n0
E.g., the lower bound of
searching in an unsorted
array is Ω(n).
[Plot: beyond n0, f(n) stays above c·g(n)]

Asymptotic Notation
The big-Theta Θ-notation
asymptotically tight bound
f(n) = Θ(g(n)) if there exist
constants c1, c2, and n0, s.t.
c1·g(n) ≤ f(n) ≤ c2·g(n) for n ≥ n0
[Plot: beyond n0, f(n) lies between c1·g(n) and c2·g(n)]
Asymptotic Notation
Two more asymptotic notations
"Little-oh" notation f(n) = o(g(n))
non-tight analogue of big-Oh
For every c > 0, there should exist n0, s.t. f(n) ≤
c·g(n) for n ≥ n0
Used for comparisons of running times.
If f(n) = o(g(n)), it is said that g(n)
dominates f(n).
Asymptotic Notation
Analogy with real numbers
f(n) = O(g(n))  ≈  f ≤ g
f(n) = Ω(g(n))  ≈  f ≥ g
f(n) = Θ(g(n))  ≈  f = g
f(n) = o(g(n))  ≈  f < g
f(n) = ω(g(n))  ≈  f > g
Maximum problem size solvable in a given amount of time,
assuming one million simple operations per second:

T(n)    1 second    1 minute    1 hour
400n    2,500       150,000     9,000,000
2n²     707         5,477       42,426
n⁴      31          88          244
2ⁿ      19          25          31
Correctness of Algorithms
The algorithm is correct if for any legal
input it terminates and produces the
desired output.
Automatic proof of correctness is not
possible
But there are practical techniques and
rigorous formalisms that help to reason
about the correctness of algorithms
Total correctness: for any legal input, the execution of the
algorithm INDEED reaches the output point, AND this is the
desired output.
Assertions
To prove correctness we associate a number of
assertions (statements about the state of the
execution) with specific checkpoints in the
algorithm.
E.g., A[1], ..., A[k] form an increasing sequence
Loop Invariants
Invariants: assertions that are valid any
time they are reached (many times during
the execution of an algorithm, e.g., in
loops)
We must show three things about loop
invariants:
Initialization: it is true prior to the first
iteration
Maintenance: if it is true before an iteration,
it remains true before the next iteration
Termination: when the loop terminates, the
invariant gives a useful property to show the
correctness of the algorithm
for j ← 2 to length(A) do
    key ← A[j]
    i ← j-1
    while i > 0 and A[i] > key do
        A[i+1] ← A[i]
        i ← i-1
    A[i+1] ← key
Math Review
Geometric progression
given an integer n ≥ 0 and a real number 0 < a ≠ 1

Σ_{i=0..n} a^i = 1 + a + a² + ... + a^n = (1 − a^{n+1}) / (1 − a)

Arithmetic progression

Σ_{i=0..n} i = 1 + 2 + 3 + ... + n = (n² + n) / 2
Summations
The running time of insertion sort is
determined by a nested loop

for j ← 2 to length(A)
    key ← A[j]
    i ← j-1
    while i > 0 and A[i] > key
        A[i+1] ← A[i]
        i ← i-1
    A[i+1] ← key

In the best case (the input already sorted) the while loop falls
through immediately for every j, so the total work is
Σ_{j=2..n} 1 = n − 1 = O(n)
Proof by Induction
We want to show that property P is true for
all integers n ≥ n0
Basis: prove that P is true for n0
Inductive step: prove that if P is true for
all k such that n0 ≤ k ≤ n−1 then P is also
true for n

Example
S(n) = Σ_{i=0..n} i = n(n+1)/2 for n ≥ 1

Basis
S(1) = Σ_{i=0..1} i = 1 = 1(1+1)/2

Inductive step
S(n) = Σ_{i=0..n} i = Σ_{i=0..n−1} i + n = S(n−1) + n
     = (n−1)(n−1+1)/2 + n = (n² − n + 2n)/2 = n(n+1)/2
Stacks
Abstract Data Types (ADTs)
Stacks
Interfaces and exceptions
Java implementation of a stack
Application to the analysis of a
time series
Growable stacks
Stacks in the Java virtual
machine
New():ADT
Insert(S:ADT, v:element):ADT
Delete(S:ADT, v:element):ADT
IsIn(S:ADT, v:element):boolean
Other Examples
Simple ADTs:
Queue
Deque
Stack
Stacks
A stack is a container of objects that are
inserted and removed according to the
last-in-first-out (LIFO) principle.
Objects can be inserted at any time, but
only the last (the most-recently inserted)
object can be removed.
Inserting an item is known as pushing
onto the stack. Popping off the stack is
synonymous with removing an item.
Stacks (2)
The stack ADT:
push(S:ADT, o:element):ADT - Inserts object o on top of stack S
pop(S:ADT):ADT - Removes the top object of stack S
top(S:ADT):element - Returns the top object of stack S
size(S:ADT):integer - Returns the number of
objects in stack S
isEmpty(S:ADT): boolean - Indicates if stack
S is empty

Axioms
Pop(Push(S, v)) = S
Top(Push(S, v)) = v
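A minimal fixed-capacity, array-based Java stack illustrating the ADT; the class name and the use of RuntimeException for the error cases are our choices, not the course's interface:

```java
public class ArrayStack {
    private Object[] data;
    private int topIndex = -1;  // index of the top element; -1 when empty

    public ArrayStack(int capacity) { data = new Object[capacity]; }

    public boolean isEmpty() { return topIndex < 0; }
    public int size() { return topIndex + 1; }

    public void push(Object o) {
        if (topIndex + 1 == data.length) throw new RuntimeException("stack full");
        data[++topIndex] = o;
    }
    public Object top() {
        if (isEmpty()) throw new RuntimeException("stack empty");
        return data[topIndex];
    }
    public Object pop() {
        if (isEmpty()) throw new RuntimeException("stack empty");
        Object o = data[topIndex];
        data[topIndex--] = null;  // clear the slot for garbage collection
        return o;
    }
}
```

Note how the axioms hold: pushing v and then popping returns v and restores the old stack.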
Java Stuff
Given the stack ADT, we need to code the ADT in
order to use it in the programs. We need two
constructs: interfaces and exceptions.
An interface is a way to declare what a class is to
do. It does not mention how to do it.
Exceptions
Exceptions are yet another programming
construct, useful for handling errors. When we
find an error (or an exceptional case), we just
throw an exception.
As soon as the exception is thrown, the flow of
control exits from the current method and goes to
where that method was called from.
What is the point of using exceptions? We
can delegate upwards the responsibility of
handling an error. Delegating upwards means
letting the code that called the current code deal
with the problem.
More Exceptions
public void eatPizza() throws StomachAcheException {
    ...
    if (ateTooMuch)
        throw new StomachAcheException("Ouch");
    ...
}

private void simulateMeeting() {
    ...
    try {
        TA.eatPizza();
    } catch (StomachAcheException e) {
        System.out.println("somebody has a stomach ache");
    }
    ...
}

So when StomachAcheException is thrown, we exit from
method eatPizza() and control returns to the call
TA.eatPizza() in simulateMeeting(), where the catch
block handles the exception.
Exceptions (finally)
An Inefficient Algorithm
Algorithm computeSpans1(P):
Input: an n-element array P of numbers such that P[i] is
the price of the stock on day i
Output: an n-element array S of numbers such that S[i]
is the span of the stock on day i
for i ← 0 to n - 1 do
    k ← 0; done ← false
    repeat
        if P[i - k] ≤ P[i] then k ← k + 1
        else done ← true
    until (k = i) or done
    S[i] ← k
return S
We can do better using a stack.

An Efficient Algorithm
Algorithm computeSpans2(P):
Let D be an empty stack
for i ← 0 to n - 1 do
    done ← false
    while not (D.isEmpty() or done) do
        if P[i] ≥ P[D.top()] then D.pop()
        else done ← true
    if D.isEmpty() then h ← -1
    else h ← D.top()
    S[i] ← i - h
    D.push(i)
return S
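The same stack-based span computation can be sketched in Java, substituting java.util.ArrayDeque for the course's stack ADT:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StockSpans {
    // S[i] = number of consecutive days up to day i with price <= P[i].
    // Each index is pushed and popped at most once, so the total is O(n).
    public static int[] computeSpans(int[] P) {
        int n = P.length;
        int[] S = new int[n];
        Deque<Integer> D = new ArrayDeque<>();  // indices of days still "visible"
        for (int i = 0; i < n; i++) {
            while (!D.isEmpty() && P[i] >= P[D.peek()]) D.pop();
            int h = D.isEmpty() ? -1 : D.peek();
            S[i] = i - h;
            D.push(i);
        }
        return S;
    }
}
```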
Growth Strategy
When a push finds the array full, allocate a larger array, copy the
elements over, and then insert: a "special push".

Incremental strategy: grow the array by a fixed amount (here 4).
The special pushes cost 4+1, 8+4+1, 12+8+1, 16+12+1, ...
(new size + elements copied + the insertion), one per phase,
so n pushes cost O(n²) overall.

Doubling strategy: start with an array of size 0 and double on
overflow; when the array holds N elements a special push costs
3N+1 (2N for the new array, N copies, 1 insertion). The special
pushes cost 1+0+1, 2+1+1, 4+2+1, 8+4+1, 16+8+1, ...; phase i holds
up to 2^i elements, so n pushes cost O(n) in total, i.e. O(1)
amortized per push.
Queues
A queue differs from a stack in that its insertion
and removal routines follow the first-in-first-out
(FIFO) principle.
Elements may be inserted at any time, but only
the element which has been in the queue the
longest may be removed.
Elements are inserted at the rear (enqueued) and
removed from the front (dequeued)
[Figure: a queue; elements enter at the rear and leave at the front]

Queues (2)
Axioms:
Front(Enqueue(New(), v)) = v
Dequeue(Enqueue(New(), v)) = New()
Front(Enqueue(Enqueue(Q, w), v)) = Front(Enqueue(Q, w))
Dequeue(Enqueue(Enqueue(Q, w), v)) = Enqueue(Dequeue(Enqueue(Q, w)), v)
An Array Implementation
Create a queue using an array in a circular
fashion
A maximum size N is specified.
The queue consists of an N-element array
Q and two integer variables:
f, the index of the front element
r, the index immediately past the rear element

Pseudo code
Algorithm size()
    return (N - f + r) mod N

Algorithm isEmpty()
    return (f = r)

Algorithm front()
    if isEmpty() then
        throw QueueEmptyException
    return Q[f]

Algorithm dequeue()
    if isEmpty() then
        throw QueueEmptyException
    e ← Q[f]
    Q[f] ← null
    f ← (f+1) mod N
    return e

Algorithm enqueue(o)
    if size() = N - 1 then
        throw QueueFullException
    Q[r] ← o
    r ← (r+1) mod N
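A Java sketch of the circular-array queue; it allocates one extra slot so that f = r unambiguously means "empty" (class name is ours):

```java
public class ArrayQueue {
    private Object[] Q;
    private int f = 0, r = 0;  // front index, and index past the rear
    private final int N;       // array length = capacity + 1

    public ArrayQueue(int capacity) { N = capacity + 1; Q = new Object[N]; }

    public int size() { return (N - f + r) % N; }
    public boolean isEmpty() { return f == r; }

    public void enqueue(Object o) {
        if (size() == N - 1) throw new RuntimeException("queue full");
        Q[r] = o;
        r = (r + 1) % N;  // wrap around the end of the array
    }
    public Object dequeue() {
        if (isEmpty()) throw new RuntimeException("queue empty");
        Object o = Q[f];
        Q[f] = null;
        f = (f + 1) % N;
        return o;
    }
    public Object front() {
        if (isEmpty()) throw new RuntimeException("queue empty");
        return Q[f];
    }
}
```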
Double-Ended Queue
A double-ended queue, or deque, supports
insertion and deletion from the front and back
The deque supports six fundamental methods:
first(), last(), insertFirst(o), insertLast(o),
removeFirst(), removeLast()
(plus size() and isEmpty())

Implementing a stack with a deque:
size() → size()
isEmpty() → isEmpty()
top() → last()
push(o) → insertLast(o)
pop() → removeLast()

Implementing a queue with a deque:
size() → size()
isEmpty() → isEmpty()
front() → first()
enqueue(o) → insertLast(o)
dequeue() → removeFirst()
Circular Lists
Sequences
Vectors
Positions
Lists
General Sequences
Array-Based Implementation
Algorithm insertAtRank(r, e):
    for i = n-1 downto r do
        S[i+1] ← S[i]
    S[r] ← e
    n ← n+1

Algorithm removeAtRank(r):
    e ← S[r]
    for i = r to n-2 do
        S[i] ← S[i+1]
    n ← n-1
    return e
Java Implementation
public void insertAtRank (int rank, Object element)
    throws BoundaryViolationException {
  if (rank < 0 || rank > size())
    throw new BoundaryViolationException("invalid rank");
  DLNode next = nodeAtRank(rank);
  // the new node will be right before this
  DLNode prev = next.getPrev();
  // the new node will be right after this
  DLNode node = new DLNode(element, prev, next);
  // new node knows about its next & prev.
  // Now we tell next & prev about the new node.
  next.setPrev(node);
  prev.setNext(node);
  size++;
}
[Figure: the list before and after deleting a node]

Java Implementation
public Object removeAtRank (int rank)
    throws BoundaryViolationException {
  if (rank < 0 || rank > size()-1)
    throw new BoundaryViolationException("Invalid rank.");
  DLNode node = nodeAtRank(rank); // node to be removed
  DLNode next = node.getNext();
  // the node after it
  DLNode prev = node.getPrev();
  // the node before it
  prev.setNext(next);
  next.setPrev(prev);
  size--;
  return node.getElement();
  // returns the element of the deleted node
}
Nodes
Linked lists support the efficient execution of
node-based operations:
removeAtNode(Node v) and
insertAfterNode(Node v, Object e), would be
O(1).
However, node-based operations are not
meaningful in an array-based implementation
because there are no nodes in an array.
Nodes are implementation-specific.
Dilemma:
we would like an implementation-independent way to refer to
an element's place;
atRank(r) returns a position,
rankOf(p) returns an integer rank
Comparison of Sequence
Implementations
Iterators
Abstraction of the process of scanning through a
collection of elements
Encapsulates the notions of place and next
Extends the position ADT
Generic and specialized iterators
ObjectIterator: hasNext(), nextObject(), object()
PositionIterator: nextPosition()
Useful methods that return iterators: elements(),
positions()
Dictionaries
the dictionary ADT
binary search
Hashing

Searching
INPUT: a sequence of numbers (the database)
a1, a2, a3, ..., an, and a single number (the query) q
OUTPUT: the index of the found number,
or NIL if q does not occur in the sequence
Binary Search
Idea: Divide and conquer, a key design technique;
narrow down the search range in stages, e.g. findElement(22)

A recursive procedure
Algorithm BinarySearch(A, k, low, high)
    if low > high then return NIL
    else mid ← ⌊(low+high) / 2⌋
        if k = A[mid] then return mid
        else if k < A[mid] then
            return BinarySearch(A, k, low, mid-1)
        else return BinarySearch(A, k, mid+1, high)
[Figure: binary search for 22 in the sorted array
2 12 14 17 19 22 25 27 28 33 37, showing low, mid and high
narrowing in on position 6 step by step]
An iterative procedure
INPUT: A[1..n] a sorted (non-decreasing) array of
integers, k an integer.
OUTPUT: an index j such that A[j] = k;
NIL, if no j (1 ≤ j ≤ n) has A[j] = k

low ← 1
high ← n
do
    mid ← ⌊(low+high)/2⌋
    if A[mid] = k then return mid
    else if A[mid] > k then high ← mid-1
    else low ← mid+1
while low ≤ high
return NIL
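A 0-based Java version of the iterative procedure, returning -1 in place of NIL (class name is ours):

```java
public class BinarySearch {
    // Searches a sorted (non-decreasing) array for key k.
    // Returns an index j with A[j] == k, or -1 if k is absent.
    public static int search(int[] A, int k) {
        int low = 0, high = A.length - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            if (A[mid] == k) return mid;
            else if (A[mid] > k) high = mid - 1;  // discard upper half
            else low = mid + 1;                   // discard lower half
        }
        return -1;
    }
}
```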
The Problem
T&T is a large phone company, and they
want to provide caller ID capability:
given a phone number, return the caller's
name
phone numbers range from 0 to r = 10⁸ - 1
There are n phone numbers, n << r.
want to do this as efficiently as possible
One solution: keep the numbers in a sorted array;
binary search makes searching fast (O(log n) per
search), but
inserting and removing takes O(n) time
application to look-up tables (frequent
searches, rare insertions and removals)
[Figure: a mostly-null array of slots, with "Ankur" stored in
one slot]
Another Solution
Can do better, with a hash table: O(1) expected
time, O(n+m) space, where m is the table size
Like an array, but come up with a function to map
the large range into one which we can manage
e.g., take the original key, modulo the (relatively
small) size of the array, and use that as an index
Insert (9635-8904, Ankur) into a hashed array
with, say, five slots: 96358904 mod 5 = 4
[Figure: five slots, all null except slot 4, which stores "Ankur"]
An Example
Let keys be entry numbers of students in
CSL201, e.g. 2004CS10110.
There are 100 students in the class. We
create a hash table of size, say 100.
Hash function is, say, last two digits.
Then 2004CS10110 goes to location 10.
Where does 2004CS50310 go?
Also to location 10. We have a collision!!
Collision Resolution
How to deal with two keys which hash to the
same spot in the array?
Use chaining
Set up an array of links (a table), indexed by
the keys, to lists of items with the same key
To find/insert/delete an element:
using h, look up its position in table T
Search/insert/delete the element in the
linked list of the hashed slot
Analysis of Hashing
A good hash function is one which distributes
keys evenly amongst the slots.
An ideal hash function would pick a slot
uniformly at random and hash the key to it.
However, this is not a hash function since we
would not know which slot to look into when
searching for a key.
For our analysis we will use this simple uniform
hashing assumption.
Given a hash table T with m slots holding n
elements, the load factor is defined as α = n/m
Analysis of Hashing (3)
Unsuccessful search:
the element is not in the linked list
Simple uniform hashing yields an average
list length α = n/m
expected number of elements to be
examined: α
search time O(1+α) (includes computing
the hash value)

Successful search:
expected number of elements examined is

1 + (1/n) Σ_{i=1..n} (i−1)/m = 1 + (n−1)n/(2nm)
  = 1 + (n−1)/(2m) = 1 + n/(2m) − 1/(2m)
  = 1 + α/2 − 1/(2m) ≤ 1 + α/2

so a successful search also takes expected time O(1+α).
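A small chained hash table in Java illustrating the scheme, with integer keys and h(k) = k mod m (class and method names are ours):

```java
import java.util.LinkedList;

public class ChainedHashTable {
    private LinkedList<int[]>[] table;  // each list entry is a {key, value} pair
    private final int m;                // number of slots

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int slots) {
        m = slots;
        table = new LinkedList[m];
        for (int i = 0; i < m; i++) table[i] = new LinkedList<>();
    }

    private int h(int k) { return Math.floorMod(k, m); }

    public void insert(int k, int v) {
        remove(k);  // keep at most one pair per key
        table[h(k)].addFirst(new int[]{k, v});
    }
    public Integer search(int k) {
        for (int[] pair : table[h(k)])
            if (pair[0] == k) return pair[1];
        return null;  // not found
    }
    public void remove(int k) {
        table[h(k)].removeIf(pair -> pair[0] == k);
    }
}
```

With m = 100 and last-two-digits-style hashing, keys ending in the same digits land in the same chain, exactly the collision scenario described above.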
Hashing
Hash functions
Hash-code maps
Compression maps
Open addressing
Linear probing
Double hashing

Hash Functions
Need a function that is
quick to compute and
distributes keys uniformly throughout the table
good hash functions are very rare (cf. the birthday
paradox)
How do we find one? A common recipe is a hash-code
map followed by a compression map, e.g.:
Compression Maps
Use the remainder
h(k) = k mod m, k is the key, m the size of the
table
Need to choose m
m = b^e (bad)
if m is a power of 2, h(k) gives the e least
significant bits of k
all keys with the same ending go to the same
place
m prime (good)
helps ensure uniform distribution
primes not too close to exact powers of 2

Use the multiplication method
h(k) = ⌊m (k·A mod 1)⌋
k is the key, m the size of the table, and A
is a constant, 0 < A < 1
The steps involved: multiply k by A, keep the
fractional part, scale by m and take the floor
Universal Hashing

More on Collisions
A key is always hashed to the same slot, so what
to do when the table itself must absorb the
collisions?
Use open addressing:

Open Addressing
All elements are stored in the hash table itself.
The hash function determines, for each key, the
sequence of slots to probe when inserting or
searching.
Linear Probing
h(k) = k mod 13; on a collision try the next slot,
h(k)+1, h(k)+2, ... (mod 13)
insert keys: 18 41 22 44 59 32 31 73
[Figure: the 13-slot table (indices 0..12) after the insertions:
41 at 2, 18 at 5, 44 at 6, 59 at 7, 32 at 8, 22 at 9, 31 at 10,
73 at 11; colliding keys such as 44, 32, 31 and 73 slide right
to the first free slot]

Deletion (2)
A deleted key's slot is marked X rather than emptied, so the
probe sequences of other keys are not cut short:
[Figure: the table after deleting 32: 41 at 2, then
18 44 59 X 22 31 73]
Double Hashing
Uses two hash functions, h1, h2
h1(k) is the position in the table where we first
check for key k
h2(k) determines the offset we use when
searching for k
In linear probing h2(k) is always 1.
DoubleHashingInsert(k)
    if (table is full) error
    probe ← h1(k); offset ← h2(k)
    while (table[probe] occupied)
        probe ← (probe + offset) mod m
    table[probe] ← k
Double Hashing(2)
If m is prime, we will eventually examine
every position in the table
Many of the same (dis)advantages as
linear probing
Distributes keys more uniformly than linear
probing
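A Java sketch of double hashing with h1(k) = k mod m and h2(k) = 1 + (k mod (m-1)), a common choice that is never 0 (class name is ours):

```java
public class DoubleHashTable {
    private Integer[] table;  // null marks an empty slot
    private final int m;      // table size, ideally prime

    public DoubleHashTable(int size) { m = size; table = new Integer[m]; }

    private int h1(int k) { return Math.floorMod(k, m); }
    private int h2(int k) { return 1 + Math.floorMod(k, m - 1); }  // offset, never 0

    public void insert(int k) {
        int probe = h1(k), offset = h2(k);
        for (int tries = 0; tries < m; tries++) {
            if (table[probe] == null) { table[probe] = k; return; }
            probe = (probe + offset) % m;
        }
        throw new RuntimeException("table full");
    }
    public boolean contains(int k) {
        int probe = h1(k), offset = h2(k);
        for (int tries = 0; tries < m; tries++) {
            if (table[probe] == null) return false;  // probe chain ends
            if (table[probe] == k) return true;
            probe = (probe + offset) % m;
        }
        return false;
    }
}
```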
Analysis (2)
The average number of probes per insertion grows as the
table fills: while the table is at most 1/2 full each insertion
takes ≤ 2 probes on average; at most 3/4 full, ≤ 4 probes;
at most 7/8 full, ≤ 8 probes; so filling each successive half
of the remaining space costs at most m probes in total.

Analysis (3)
No of probes required to insert
m/2 + m/4 + m/8 + ... + m/2^i elements = number of
probes required to leave a 2^{-i} fraction of the table
empty = m · i.
No of probes required to leave a 1−α fraction of
the table empty = −m log (1−α)
Average no. of probes required to insert n
elements is −(m/n) log (1−α) = −(1/α) log (1−α)

Expected cost summary (load factor α):
              unsuccessful search     successful search
chaining      O(1 + α)                O(1 + α)
probing       O(1/(1−α))              O((1/α) ln (1/(1−α)))
Trees
trees
binary trees
data structures for trees
Trees: Definitions
[Figure: a tree drawn level by level, levels 0 through 3]
Trees
A tree represents a
hierarchy, e.g. the
organization structure of a
corporation,
or the table of contents
of a book.
Another Example
Unix or DOS/Windows file system
Binary Tree
Each node has at most two children; a node with
no children is a leaf, or external node.
[Figures: example binary trees]

Tree ADT update methods include
expandExternal(p),
removeAboveExternal(p)
other application specific methods

Representing a node x:
key
left[x]: pointer to left child
right[x]: pointer to right child
p[x]: pointer to parent node

BinaryTree:
Root
Parent: BinaryTree
LeftChild: BinaryTree
RightChild: BinaryTree
19
Unbounded Branching
UnboundedTree:
Root
Parent:
UnboundedTree
LeftChild: UnboundedTree
RightSibling: UnboundedTree
20
Tree Walks/Traversals
A traversal visits every node of a tree in a systematic order.

preorder traversal
Algorithm preOrder(v)
    visit node v
    for each child w of v do
        recursively perform preOrder(w)

postorder traversal
Algorithm postOrder(v)
    for each child w of v do
        recursively perform postOrder(w)
    visit node v

[Figure: a tree with nodes a, b, c, d, e, f, g]
preorder visits a b c d f g e; postorder visits c f g d b e a

The evaluation of an arithmetic-expression tree is a
specialization of a postorder traversal
Algorithm evaluate(v)
    if v is a leaf
        return the variable stored at v
    else
        let o be the operator stored at v
        x ← evaluate(v.leftChild())
        y ← evaluate(v.rightChild())
        return x o y
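The evaluate(v) pseudo-code can be sketched in Java with a small node class (the class and its fields are ours, not the course's ADT):

```java
public class ExprTree {
    // Leaves hold a value; internal nodes hold an operator and two children.
    char op;
    double value;
    ExprTree left, right;

    ExprTree(double v) { value = v; }                       // leaf
    ExprTree(char o, ExprTree l, ExprTree r) {              // internal node
        op = o; left = l; right = r;
    }

    // Postorder evaluation: children first, then apply the operator.
    double evaluate() {
        if (left == null) return value;  // leaf
        double x = left.evaluate();
        double y = right.evaluate();
        switch (op) {
            case '+': return x + y;
            case '-': return x - y;
            case '*': return x * y;
            default:  return x / y;
        }
    }
}
```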
Traversing Trees
inorder traversal
Algorithm inOrder(v)
    if (v == null) then return
    else inOrder(v.leftChild())
        visit(v)
        inOrder(v.rightChild())

Inorder Example
[Figure: a tree with nodes a, b, c, d, f, g, e]
Inorder visits c b f d g a e

Euler walk: walk around the tree, passing each node
on the left,
from below, and
on the right.
Printing an arithmetic
expression, the so-called
Euler walk:
Print "(" before
traversing the left
subtree, traverse it
Print the value of a
node
Traverse the right
subtree, print ")" after
traversing it
The Euler walk is a
generic computation
mechanism that can be
specialized by redefining
certain steps
implemented by means of
an abstract Java class
with methods that can be
redefined by its
subclasses
Can the tree be reconstructed from its traversals?
[Figure: the tree with preorder a b c d f g e and inorder
c b f d g a e; the subtree rooted at b has preorder b c d f g
and inorder c b f d g, and the subtree rooted at d has preorder
d f g and inorder f d g]
Preorder and inorder together determine the tree.
Preorder and postorder alone do not:
Preorder: a b c
Postorder: c b a
is consistent with several different trees (b and c may each
be a left or a right child).
A special case
If each internal node of the
binary tree has at least two
children then the tree can
be determined from the pre
and post order traversals.
[Figure: the tree with preorder a b c d f g e and postorder
c f g d b e a; the subtree rooted at b has preorder b c d f g
and postorder c f g d b, and the subtree rooted at d has
preorder d f g and postorder f g d]
Ordered Dictionaries
In addition to the basic dictionary operations, an ordered
dictionary supports queries relative to the key ordering,
such as closestKeyBefore(k) and
closestKeyAfter(k).
For these, binary search over a sorted sequence is the
basic tool.

A List-Based Implementation
Unordered list:
searching takes O(n) time
Ordered list:
searching takes O(log n) time with binary search

Binary Search
Narrow down the search range in stages, e.g. findElement(22)

Running Time
The range of candidate items to be searched is
halved after comparing the key with the middle
element
Binary search runs in O(lg n) time (remember
the recurrence...)
What about insertion and deletion?
Searching a BST
To find a key k in a BST:
compare k with key[root[T]]
if k < key[root[T]], search for k in left[root[T]]
otherwise, search for k in right[root[T]]

Search Examples
[Figure: Search(T, 11) and Search(T, 6) traced on an example tree]
Recursive version
Search(T,k)
    x ← root[T]
    if x = NIL then return NIL
    if k = key[x] then return x
    if k < key[x]
        then return Search(left[x], k)
        else return Search(right[x], k)

Iterative version
Search(T,k)
    x ← root[T]
    while x ≠ NIL and k ≠ key[x] do
        if k < key[x]
            then x ← left[x]
            else x ← right[x]
    return x
Analysis of Search
Running time is O(h), where h is the height of the tree;
on a degenerate (chain-like) tree this is O(n).

Successor
Given x, find the node with the smallest key
greater than key[x]
We can distinguish two cases, depending on the
right subtree of x
Case 1: the right subtree of x is nonempty; the
successor is the leftmost node in it.
Case 2: the right subtree of x is empty; the
successor is the lowest ancestor of x whose left
subtree contains x.
Successor Pseudocode
TreeSuccessor(x)
    if right[x] ≠ NIL
        then return TreeMinimum(right[x])
    y ← p[x]
    while y ≠ NIL and x = right[y]
        x ← y
        y ← p[y]
    return y
BST Insertion
The basic idea: search for z's key; the search ends
at a NIL pointer, whose place z can take.

TreeInsert(T, z)
    y ← NIL
    x ← root[T]
    while x ≠ NIL
        y ← x
        if key[z] < key[x]
            then x ← left[x]
            else x ← right[x]
    p[z] ← y
    if y = NIL
        then root[T] ← z
        else if key[z] < key[y]
            then left[y] ← z
            else right[y] ← z
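The search and insert procedures carry over to Java directly; a sketch with NIL represented by null (class names are ours):

```java
public class BST {
    static class Node {
        int key;
        Node left, right, parent;
        Node(int k) { key = k; }
    }

    Node root;

    // Iterative TreeInsert: walk down to a null link, then attach z there.
    void insert(int k) {
        Node z = new Node(k), y = null, x = root;
        while (x != null) {
            y = x;
            x = (k < x.key) ? x.left : x.right;
        }
        z.parent = y;
        if (y == null) root = z;          // tree was empty
        else if (k < y.key) y.left = z;
        else y.right = z;
    }

    // Iterative search, as in the Search(T,k) pseudo-code.
    boolean search(int k) {
        Node x = root;
        while (x != null && x.key != k)
            x = (k < x.key) ? x.left : x.right;
        return x != null;
    }
}
```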
Deletion
Delete node x with key k; three cases:
x has no children
x has one child
x has two children
Deletion Case 1
If x has no children, simply remove it.
Deletion Case 2
If x has one child, splice x out, linking its parent
directly to its child.
Deletion Case 3
If x has two children, swap its key with that of its
successor y (which has at most one child), and then
delete y as in case 1 or 2.

Delete Pseudocode
TreeDelete(T, z)
    if left[z] = NIL or right[z] = NIL
        then y ← z
        else y ← TreeSuccessor(z)
    if left[y] ≠ NIL
        then x ← left[y]
        else x ← right[y]
    if x ≠ NIL
        then p[x] ← p[y]
    if p[y] = NIL
        then root[T] ← x
        else if y = left[p[y]]
            then left[p[y]] ← x
            else right[p[y]] ← x
    if y ≠ z
        then key[z] ← key[y]
    return y
BST Sorting
Use TreeInsert to insert the input keys into a BST, then
Call InorderTreeWalk to read them off in sorted order,
e.g. 1 1 3 5 7 8 10.
[Figure: the BST built from the input, and the average-case
analysis bound comparing 1/2 + 1/3 + ... + 1/n with the area
under the curve 1/x]
Quick Sort
Characteristics
sorts in place; worst case O(n²), but expected
O(n log n) time with small constants
Partitioning
Linear-time, in-place rearrangement of A[p..r]
around a pivot x:

Partition(A,p,r)
    x ← A[p]
    i ← p-1
    j ← r+1
    while TRUE
        repeat j ← j-1
        until A[j] ≤ x
        repeat i ← i+1
        until A[i] ≥ x
        if i < j
            then exchange A[i] ↔ A[j]
        else return j

[Figure: Partition traced with pivot x = 10 on an example array;
i and j sweep inward, exchanging out-of-place pairs until they
cross, at which point j is returned]
Quicksort(A,p,r)
    if p < r
        then q ← Partition(A,p,r)
        Quicksort(A,p,q)
        Quicksort(A,q+1,r)
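The partition and driver in Java; a sketch using the Hoare scheme above with pivot A[p] (class name is ours):

```java
public class QuickSort {
    // Hoare partition: afterwards A[p..q] <= pivot <= A[q+1..r].
    static int partition(int[] A, int p, int r) {
        int x = A[p], i = p - 1, j = r + 1;
        while (true) {
            do { j--; } while (A[j] > x);   // find element <= pivot from right
            do { i++; } while (A[i] < x);   // find element >= pivot from left
            if (i < j) { int t = A[i]; A[i] = A[j]; A[j] = t; }
            else return j;
        }
    }

    static void quicksort(int[] A, int p, int r) {
        if (p < r) {
            int q = partition(A, p, r);
            quicksort(A, p, q);
            quicksort(A, q + 1, r);
        }
    }

    public static void sort(int[] A) {
        if (A.length > 0) quicksort(A, 0, A.length - 1);
    }
}
```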
Analysis of Quicksort
Assume all elements are distinct.
Best Case: the split is perfectly balanced every time:
T(n) = 2T(n/2) + Θ(n) = Θ(n log n)
Worst Case
What if the split always puts a single element on one side?
T(n) = T(n−1) + Θ(n), so
T(n) = Σ_{k=1..n} Θ(k) = Θ(Σ_{k=1..n} k) = Θ(n²)
This happens when the input is sorted or the input is
reverse sorted: the same inputs on which insertion sort is
at its best and worst.

Suppose, we alternate
lucky and unlucky
cases to get an
average behavior:
an unlucky Θ(n) split followed by a lucky one still costs
only Θ(n) per pair of levels, so the result remains Θ(n log n).
[Figure: recursion trees comparing the lucky split into
(n−1)/2 and (n−1)/2 with the unlucky split into 1 and n−1]
Randomized algorithm
running time is independent of the input ordering
no specific input triggers worst-case behavior
the worst-case is only determined by the output of the
random-number generator

Randomized Quicksort
Assume all elements are distinct; the pivot is chosen
uniformly at random.
Worst case running time of quicksort is O(n²)
Best case running time of quicksort is
O(n log₂ n)
Expected running time of quicksort is
O(n log₂ n)
AVL Trees
AVL trees are
balanced.
An AVL Tree is a
binary search tree
such that for every
internal node v of T,
the heights of the
children of v can differ
by at most 1.
[Figure: an AVL tree with keys 44, 17, 78, 32, 88, 50, 48, 62,
each node labeled with the height of its subtree]

Height of an AVL tree
Let n(h) be the minimum number of internal nodes in an AVL
tree of height h; then n(h) > 2·n(h−2), and repeating the
argument,
n(h) > 2^i n(h−2i)
When i = h/2−1 we get: n(h) > 2^{h/2−1} n(2) = 2^{h/2}
Taking logarithms: h < 2 log n(h)
Thus the height of an AVL tree is O(log n)

A sharper bound
[Figure: the minimal AVL tree of height k, combining the minimal
trees of heights k−1 and k−2 under nodes u and v]
Insertion
A binary tree T is called height-balanced if for
every node v, the heights of v's children differ by
at most one.
Inserting a node into an AVL tree changes the
heights of some of the nodes in T.
If insertion causes T to become unbalanced, we
travel up the tree from the newly created node until
we find the first node x such that its grandparent z
is an unbalanced node.
Let y be the parent of node x.
Insertion (2)
[Figure: inserting 54 into the AVL tree with keys 44, 17, 78, 32,
50, 88, 48, 62: the new node goes below 50, and the node z = 78
becomes unbalanced]
AVL Trees: Insertion
Inserting a node, v, into an AVL tree changes the
heights of some of the nodes in T.
The only nodes whose heights can increase are
the ancestors of node v.
If insertion causes T to become unbalanced, then
some ancestor of v would have a height imbalance.
We travel up the tree from v until we find the first
node x such that its grandparent z is unbalanced.
Let y be the parent of node x.
Insertion (2)
To rebalance the subtree rooted at z, we must
perform a rotation.
[Figure: in the example, with z = 78, y = 50 and x = 62, the
rotation makes 62 the root of the subtree, with 50 and 78 as
its children; the tree is balanced again]
3
Rotations
Rotation is a way of locally reorganizing a BST.
Let u, v be two nodes such that u = parent(v), and let
T1, T2, T3 be the three subtrees hanging off u and v in
left-to-right order, so that
Keys(T1) < key(v) < keys(T2) < key(u) < keys(T3)
A rotation makes v the parent of u; the key ordering above
dictates the unique way to reattach T1, T2 and T3.
Insertion
Insertion happens in subtree T1.
ht(T1) increases from h to h+1.
Since x remains balanced, ht(T2)
is h or h+1 or h+2.
The height of y then grows from h+1 to h+2 and the height
of z from h+2 to h+3, making z unbalanced.

Single rotation
[Figure: the insertion is in the outer subtree T1; rotation(y,z)
makes y the subtree root at height h+2 with children of height
h+1, and every node is balanced again]

Double rotation
[Figure: the insertion is below x, in T2 or T3; rotation(x,y)
followed by rotation(x,z) makes x the subtree root at height
h+2, and every node is balanced again]
Restructuring
The four rotation cases, naming the three nodes a, b, c by
inorder rank (a < b < c), with subtrees T0..T3:
single rotation, a=z, b=y, c=x: b becomes the subtree root,
with a and c as its children.
single rotation (mirror image), c=z, b=y, a=x.

Restructuring (contd.)
double rotations:
double rotation, a=z, c=y, b=x: b becomes the subtree root,
with a and c as its children.
double rotation (mirror image), c=z, a=y, b=x.
[Figures: the four cases with the subtrees T0, T1, T2, T3
reattached in inorder]
10
Deletion
When a node is deleted from an AVL tree (as from a BST),
the heights of its ancestors can change, and the tree can
become unbalanced.

Deletion (2)
Let w be the node deleted.
Let z be the first unbalanced node encountered
while travelling up the tree from w. Also, let y be the
child of z with larger height, and let x be the child of
y with larger height.
We perform rotations to restore balance at the
subtree rooted at z.
As this restructuring may upset the balance of
another node higher in the tree, we must continue
checking for balance until the root of T is reached

Deletion (3)
Suppose the deletion
happens in subtree T4,
and its height reduces from
h to h-1.
Since z was balanced
but is now
unbalanced,
ht(y) = h+1.
x has larger height than T3,
and so ht(x) = h.
Since y is balanced,
ht(T3) = h or h-1.

Deletion (4)
Since ht(x) = h, and x is
balanced, ht(T1) and ht(T2)
are h-1 or h-2.
However, both T1 and
T2 cannot have height h-2.
[Figure: after the deletion in T4, a single rotation(y,z) makes
y the subtree root; the children end at height h or h+1 and y at
height h+1 or h+2, so every node in the subtree is balanced again]

Double rotation
[Figure: when x (of height h) is taller than T3, a double
rotation, rotation(x,y) followed by rotation(x,z), makes x the
subtree root at height h+1, with y and z as its children; again
every node is balanced]
(2,4) Trees
Multi-way Searching
[Figure: searching for s = 8: at the root compare with keys 5
and 10 and descend between them, then find 8 in the node with
keys 6 and 8. Searching for s = 12: descend past the keys 11
and 13 to a leaf. Not found!]

(2,4) Trees
Properties:
At most 4 children
All leaf nodes are at
the same level.
Height h of a (2,4) tree
is at least log₄ n and
at most log₂ n
[Figure: a (2,4) tree with root key 12, internal keys 5 10
and 15, and leaves 3 4, 6 8, 11, 13 14, 17]
Insertion
A new key is inserted into a leaf. If the leaf already has 3
keys it overflows and is split, and one of its middle keys
moves up into the parent node.
[Figures: inserting 21 and then 29 into an example (2,4) tree;
inserting 29 into the full leaf 26 28 30 33 splits it]
Insertion (4)
If the parent node does not have sufficient space
then it is split too.
In this manner splits can cascade.
[Figure: a cascading split in the example tree]

Insertion (5)
Eventually we may have to create a new root.
This increases the height of the tree
[Figure: the root 13 22 32 splits and a new root is created
above it]
Deletion
No problem if the key to be deleted is in a leaf with at
least 2 keys, e.g. Delete 21.
[Figure: the example tree before and after deleting 21]
Deletion (2)
If the key to be deleted is in an internal node then
we swap it with its predecessor (which is in a
leaf) and then delete it.
Delete 25
[Figure: 25 is swapped with its predecessor 24 and then
removed from the leaf]
Deletion (3)
If after deleting a key a node becomes empty
then we borrow a key from its sibling.
Delete 20
[Figure: deleting 20 empties its leaf; a key is borrowed from
the sibling 23 24 through the parent]
Deletion (4)
If the sibling has only one key then we merge with it.
The key in the parent node separating these two
siblings moves down into the merged node.
Delete 23
[Figure: deleting 23 merges its node with its sibling; the
separating key 24 moves down from the parent]
Delete (5)
Moving a key down from the parent corresponds
to deletion in the parent node.
The procedure is the same as for a leaf node.
Can lead to a cascade.
Delete 18
[Figure: deleting 18 cascades: borrowing and merging propagate
up the tree]

(2,4) Conclusion
The height of a (2,4) tree is O(log n), and search,
insertion and deletion each run in O(log n) time.
Red-Black Trees
What are they?
Their relation to (2,4) trees
Deletion in red-black trees

Red-Black Trees
A red-black tree is a binary search tree in which
each node is colored red or black.
The root is colored black.
A red node can have only black children
(two reds in a row are a "double red", which is forbidden).
If a node of the BST does not have a left and/or
right child then we add an external node.
External nodes are not colored.
The black depth of an external node is defined
as the number of black ancestors it has.
In a red-black tree every external node has the
same black depth, the black height of the tree.

Example
[Figure: a red-black tree with root 9; its black height is 2.
Merging each red node into its black parent yields the
corresponding (2,4) tree, with root 4 9 13 and leaves such as
1 2 3, 5 7, 11, 17 19]

Example
[Figure: a (2,4) tree with root 13 and its red-black
representation: 13 black at the root, with 8 and 18 below, and
so on]
Deletion: preliminaries
To delete an internal node we first swap it with its inorder
successor, so we may assume the deleted node has at most one
child; and if the deleted node or its child is red, recoloring
the child black restores all the properties.
Hence we can assume that the node deleted is a
black leaf.
Removing it reduces the black depth of an
external node by 1.
Hence, in a general step, we consider how to
reorganize the tree when the black height of
some subtree goes down from h to h-1.
The case analysis for restoring the black height of the
deficient subtree looks at its parent a, its sibling b, and
the sibling's children c (and, one level deeper, d):
a is red or a is black
b is red or b is black
both c are black, or some c is red
both d are black, or some d is red

Deletion: case 1.1
If the parent is a red node (a),
then it has a child (b) which must be black.
If b has a red child (c):
[Figures: for each case, the rotation and recoloring that
restores the black height of the deficient subtree from h-1
back to h; the annotations track subtree black heights h-1, h
and "h to h-1". In all cases but one the repair is local; in
the remaining case the deficiency moves one level up the tree
and the step is repeated.]
Deletion: Summary
In
Insertion(2)
Since
No problem
Insertion: case 1
The parent of the inserted node (a) is red.
The parent of (a) must be black (b).
The other child of (b) is black (c).
[Figure: restructuring for case 1]
Insertion: Case 2
The parent of the inserted node (a) is red.
The parent of (a) must be black (b).
The other child of (b) is also red (c).
[Figure: recoloring for case 2]
(a,b) Trees
A multiway search tree.
Each node has at least a and at most b children.
The root may have fewer than a children, but it has at least 2 children.
All leaf nodes are at the same level.
The height h of an (a,b) tree on n keys is at least log_b n and at most log_a n.
[Figure: an (a,b) tree storing 3, 4, 5, 6, 8, 10, 11, 12, 13, 14, 15, 17]
Insertion
[Figure: (a,b) tree before the insertion]
Insertion(2)
[Figure: (a,b) tree after the insertion; the overfull node splits]
Insertion(3)
A
Deletion
If after deleting a key a node becomes empty
then we borrow a key from its sibling.
Delete 20
[Figure: deleting 20 — the emptied node borrows a key from its sibling]
Deletion(2)
If sibling has only one key then we merge with it.
The key in the parent node separating these two
siblings moves down into the merged node.
Delete 23
[Figure: deleting 23 — the node merges with its sibling, and the separating key moves down from the parent]
Deletion(3)
In
Conclusion
The
Hard Disks
Large amounts of
storage, but slow
access!
Identifying a page takes a long time (seek time plus rotational delay, 5-10 ms); reading it is fast.
Algorithm analysis
sequential reads
random reads
Principles
Pointers
Principles (2)
A
01 x ← a pointer to some object
02 DiskRead(x)
03 // operations that access and/or modify x
04 DiskWrite(x) // omitted if nothing changed
05 // other operations that only access, and do not modify, x
Operations:
DiskRead(x:pointer_to_a_node)
DiskWrite(x:pointer_to_a_node)
AllocateNode():pointer_to_a_node
Example: with 1000 keys per node (so 1001 children), the root alone is 1 node with 1000 keys; one level down there are 1001 nodes holding 1,001,000 keys; one more level down, 1,002,001 nodes holding 1,002,001,000 keys.
B-tree Definitions
Keys separate the ranges of keys in the subtrees: if k_i is an arbitrary key stored in the subtree c_i[x], then k_i ≤ key_i[x] ≤ k_{i+1}.
Height of a B-tree
Every node other than the root holds at least t−1 keys, and level i (for i ≥ 1) contains at least 2t^{i−1} nodes, so for a tree of height h:
n ≥ 1 + (t − 1) Σ_{i=1}^{h} 2t^{i−1} = 2t^h − 1
and hence h ≤ log_t((n + 1)/2).
B-tree Operations
Searching (simple)
Creating an empty tree (trivial)
Insertion (complex)
Deletion (complex)
Searching
A straightforward generalization of binary search tree search.
BTreeSearch(x,k)
01 i ← 1
02 while i ≤ n[x] and k > key_i[x]
03     i ← i+1
04 if i ≤ n[x] and k = key_i[x] then
05     return (x,i)
06 if leaf[x] then
07     return NIL
08 else DiskRead(c_i[x])
09     return BTreeSearch(c_i[x],k)
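As a sketch of how BTreeSearch maps to executable code — assuming nodes are plain dicts with 'keys' and 'children' fields (my own representation), and disk_read as a stand-in for the DiskRead primitive:

```python
def disk_read(x):
    # Stand-in for DiskRead: on a real system this would fetch the node's
    # page from disk; here every node is already in memory.
    return x

def btree_search(x, k):
    """Mirror of BTreeSearch(x,k). x is a node dict with a sorted list
    'keys' and a list 'children' (empty for a leaf)."""
    i = 0
    while i < len(x["keys"]) and k > x["keys"][i]:
        i += 1
    if i < len(x["keys"]) and k == x["keys"][i]:
        return (x, i)                 # found: the node and the key's index
    if not x["children"]:             # leaf[x]: nowhere left to descend
        return None                   # NIL
    child = disk_read(x["children"][i])
    return btree_search(child, k)
```

Each recursive call touches one node, so a search costs O(h) disk reads for a tree of height h.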
BTreeCreate(T)
01 x ← AllocateNode()
02 leaf[x] ← TRUE
03 n[x] ← 0
04 DiskWrite(x)
05 root[T] ← x
Splitting Nodes
A full node y = c_i[x] holding the keys P Q R S T V W is split around its median key S: the left half P Q R stays in y, the right half T V W moves to a new node z = c_{i+1}[x], and S moves up into the parent x (so ... N W ... in x becomes ... N S W ...).
x: parent node
y: node to be split and child of x
i: index in x
z: new node
[Figure: a node split, before and after]
Inserting Keys
Done
BTreeInsert(T,k)
01 r ← root[T]
02 if n[r] = 2t − 1 then
03     s ← AllocateNode()
04     root[T] ← s
05     leaf[s] ← FALSE
06     n[s] ← 0
07     c_1[s] ← r
08     BTreeSplitChild(s,1,r)
09     BTreeInsertNonFull(s,k)
10 else BTreeInsertNonFull(r,k)
[Figure: splitting a full root A D F H L N P — H moves into a new root, leaving children A D F and L N P]
Inserting Keys
BTreeInsertNonFull: at a leaf, insert the key directly; at an internal node, descend toward the correct child while traversing the tree (splitting any full child on the way down).
Insertion: Example
initial tree (t = 3): [Figure]
B inserted: [Figure]
Q inserted: [Figure]
L inserted: [Figure]
F inserted: [Figure]
Deleting Keys
Done recursively, by starting from the root and
recursively traversing down the tree to the leaf
level
Before descending to a lower level in the tree, make sure that the node contains at least t keys (cf. insertion: fewer than 2t − 1 keys).
BtreeDelete distinguishes three different
stages/scenarios for deletion
initial tree: [Figure]
F deleted: case 1 [Figure]
M deleted: case 2a [Figure]
G deleted: case 2c — the two children are merged (y = y + k + z) [Figure]
delete B — a key rotates through the parent from the sibling [Figure]
B deleted: [Figure]
delete D — the node is merged with its sibling [Figure]
D deleted: [Figure]
Two-pass Operations
Simpler,
Applications of string matching: text editing, term rewriting, lexical analysis, information retrieval, and bioinformatics.
KMP (2)
The key idea is this: if we have successfully matched the prefix P[1..i−1] of the pattern against the substring T[j−i+1,...,j−1] of the input string, and P(i) ≠ T(j), then we do not need to reprocess any of T[j−i+1,...,j−1], since we already know this portion of the text string is exactly the prefix of the pattern that we have just matched.
The failure function h(i) is the length of the longest proper prefix of P[1..i] that is also a suffix of P[1..i]. For the pattern
 i: 1 2 3 4 5 6 7 8 9 10 11 12 13
 P: a b a a b a b a a b  a  a  b
the prefix a b a a b a of length 6 is also a suffix of P[1..11]. Hence, h(11) = 6.
Shift P to the right so that P[1,...,h(i)] aligns with the suffix T[k−h(i),...,k−1].
They must match because of the definition of h.
In other words, shift P exactly i − h(i) places to the right.
If there is no mismatch, then shift P by m − h(m) places to the right.
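A hedged Python sketch of the whole algorithm — failure_function computes h and kmp_search applies the shift rule above; the function names are mine:

```python
def failure_function(P):
    # h[i] = length of the longest proper prefix of P[1..i] that is also
    # a suffix of P[1..i] (1-indexed i, as in the slides).
    m = len(P)
    h = [0] * (m + 1)
    k = 0
    for i in range(2, m + 1):
        while k > 0 and P[k] != P[i - 1]:
            k = h[k]                  # fall back to the next shorter border
        if P[k] == P[i - 1]:
            k += 1
        h[i] = k
    return h

def kmp_search(T, P):
    """Return the 0-based start positions of all occurrences of P in T."""
    h, hits, q = failure_function(P), [], 0
    for j, c in enumerate(T):
        while q > 0 and P[q] != c:
            q = h[q]                  # mismatch: shift P by q - h(q) places
        if P[q] == c:
            q += 1
        if q == len(P):
            hits.append(j - len(P) + 1)
            q = h[q]                  # full match: shift by m - h(m) places
    return hits
```

On the slides' pattern, failure_function("abaababaabaab") gives h(11) = 6 and h(6) = 3, matching the worked example.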
Correctness of KMP.
Suppose the first mismatch is at position k of T and at position i+1 of P. Suppose, for contradiction, that shifting P by i − h(i) places skips an occurrence of P. Then the overlap between that missed occurrence and the old position of P would be a prefix of P that is also a suffix of P[1..i] with length |β| > |α| = h(i). This contradicts the definition of h.
10
An Example
Given the pattern and its failure function h:
 i: 1 2 3 4 5 6 7 8 9 10 11 12 13
 P: a b a a b a b a a b  a  a  b
 h: 0 0 1 1 2 3 2 3 4  5  6  4  5
Input string: abaababaabacabaababaabaab
Scenario 1: the first mismatch is at position k = 12 of T and position i+1 = 12 of P.
a b a a b a b a a b a a b
a b a a b a b a a b a c a b a a b a b a a b a a b
Since h(11) = 6, shift P by i − h(i) = 11 − 6 = 5 places.
Scenario 2: now i = 6, h(6) = 3; shift by 6 − 3 = 3 places.
a b a a b a b a a b a a b
a b a a b a b a a b a c a b a a b a b a a b a a b
An Example
Scenario 3: i = 3, h(3) = 1; shift by 3 − 1 = 2 places.
a b a a b a b a a b a a b
a b a a b a b a a b a c a b a a b a b a a b a a b
Subsequently i = 2, 1, 0, after which P has been shifted past the mismatch and matching resumes at position k.
a b a a b a b a a b a a b
a b a a b a b a a b a c a b a a b a b a a b a a b
Complexity of KMP
In the KMP algorithm, the number of character comparisons is at most 2n.
After a shift, no character to the left of the current position in T is ever compared again: there is no backtracking in the text.
Computing h: an example
The failure function is computed by matching the pattern against its own shifts; for the pattern above this yields, e.g., h(11) = 6 and h(6) = 3.
KMP - Analysis
The
searching
16
Tries
Standard Tries
Compressed Tries
Suffix Tries
Text Processing
We have seen that preprocessing the pattern
speeds up pattern matching queries
After preprocessing the pattern in time
proportional to the pattern length, the KMP
algorithm searches an arbitrary English text in
time proportional to the text length
If the text is large, immutable and searched for
often (e.g., works by Shakespeare), we may want
to preprocess the text instead of the pattern in
order to perform pattern matching queries in time
proportional to the pattern length.
Standard Tries
The standard trie for a set of strings S is an ordered tree such that:
each node but the root is labeled with a character
the children of a node are alphabetically ordered
the paths from the root to the external nodes yield the strings of S
Eg. S = { bear, bell, bid, bull, buy, sell, stock, stop }
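A minimal standard-trie sketch in Python (the class and function names are my own); children live in a dict and can be visited in alphabetical order by sorting the keys:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode; sort keys for ordered traversal
        self.is_end = False  # marks that a string of S ends here

def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True

def trie_contains(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.is_end

# Build the trie for the example set S from the slide.
root = TrieNode()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    trie_insert(root, w)
```

Both insertion and lookup take time proportional to the length of the word, independent of |S|.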
Applications of Tries
A
Compressed Tries
Trie with nodes of degree at least 2
Obtained from standard trie by compressing
chains of redundant nodes.
[Figures: a standard trie and the corresponding compressed trie]
Suffix Tree
A suffix tree T for string S is a rooted directed tree whose
edges are labeled with nonempty substrings of S.
Each leaf corresponds to a suffix of S in the sense that
the concatenation of the edge-labels on the unique path
from the root to the leaf spells out this suffix.
Each internal node, other than the root, has at least two
children.
No two out-edges of a node can have edge-labels with the same first character.
[Figure: suffix tree for the string xabxac]
Compact Representation
[Figure: compact representation of the suffix tree for the string xabxac]
18
Example
[Figure: searching the suffix tree for xabxac]
Matching continues until P is exhausted or no more matches are possible.
Example
[Figure: searching the suffix tree for awyawxawxz]
24
Time complexity
Preprocessing : O(|T| )
Searching: O(|P| + k),
where k is the number of occurrences of P in T
Space complexity
O(|P| + |T| )
25
Data Compression
File Compression
Huffman Tries
[Figure: encoding trie over the characters A, B, C, D, R]
ABRACADABRA
01011011010000101001011011010
1
File Compression
Encoding Trie
Encoding Trie(2)
We use an encoding trie to satisfy this prefix rule:
the characters are stored at the external nodes
a left child (edge) means 0
a right child (edge) means 1
A = 010, B = 11, C = 00, D = 10, R = 011
[Figure: the encoding trie]
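Decoding against this code table can be sketched as follows; the prefix rule guarantees that scanning bit by bit and emitting a character whenever the buffer matches a codeword is unambiguous:

```python
# Prefix-free code from the slide's encoding trie.
code = {"A": "010", "B": "11", "C": "00", "D": "10", "R": "011"}

def decode(bits, code):
    # Invert the table and consume bits greedily: because no codeword is
    # a prefix of another, at most one codeword matches the buffer.
    inv = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:
            out.append(inv[buf])
            buf = ""
    return "".join(out)
```

Running decode on the slide's 29-bit string recovers ABRACADABRA.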
Example of Decoding
Using the trie with A = 010, B = 11, C = 00, D = 10, R = 011:
encoded text: 01011011010000101001011011010
text: ABRACADABRA
Trie this!
Decode 10000111110010011000111011110001010100110 using the trie over the letters B, C, E, K, N, R, S, T, W.
[Figure: the decoding trie]
Optimal Compression
With the first trie, ABRACADABRA encodes to 01011011010000101001011011010 (29 bits).
With a better trie, ABRACADABRA encodes to 001011000100001100101100 (24 bits).
[Figures: the two encoding tries]
Construction algorithm
Given the character frequencies (for ABRACADABRA: A: 5, B: 2, R: 2, C: 1, D: 1), repeatedly merge the two subtrees of smallest total frequency until a single trie of total weight 11 remains.
[Figures: intermediate tries of weights 2, 4, 6, and 11]
ABRACADABRA → 0 100 101 0 110 0 111 0 100 101 0 (23 bits)
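The greedy construction can be sketched with a priority queue. This is a hedged Python version (huffman_codes is my own name; the exact codewords depend on how ties are broken, but the total encoded length does not):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Build the trie bottom-up: repeatedly merge the two least-frequent
    # subtrees. The counter field breaks ties so heapq never has to
    # compare the tree payloads themselves.
    freq = Counter(text)
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    if n == 1:                       # degenerate single-character text
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, n, (t1, t2)))
        n += 1
    codes = {}
    def walk(t, prefix):             # left edge = 0, right edge = 1
        if isinstance(t, tuple):
            walk(t[0], prefix + "0")
            walk(t[1], prefix + "1")
        else:
            codes[t] = prefix
    walk(heap[0][2], "")
    return codes
```

For ABRACADABRA the resulting code always totals 23 bits, matching the slide, and A (the most frequent character) always gets a 1-bit codeword.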
13
[Figures: step-by-step Huffman trie construction from the frequency table A: 5, B: 2, R: 2, C: 1, D: 1, ending in a 23-bit encoding of ABRACADABRA]
Priority Queues
Scheduling example
The priority queue ADT
Implementing a priority queue with a sequence
Binary Heaps
Insertion in a Heaps and Heapify
Scheduling
In a multi-user computer system, multiple users
submit jobs to run on a single processor.
We assume that the time required by each job is
known in advance. Further, jobs can be
preempted (stopped and resumed later)
One policy which minimizes the average waiting
time is SRPT (shortest remaining processing
time).
The processor schedules the job with the
smallest remaining processing time.
If, while a job is running, a new job arrives with processing time less than the remaining time of the current job, the current job is preempted.
Priority Queues
A priority queue is an ADT (abstract data type) for maintaining a set S of elements, each with an associated value called its priority.
A PQ supports the following operations
Comparators
The most general and reusable form of a priority
queue makes use of comparator objects.
Comparator objects are external to the keys that
are to be compared and compare two objects.
When the priority queue needs to compare two
keys, it uses the comparator it was given to do the
comparison.
Thus a priority queue can be general enough to
store any object.
The comparator ADT includes:
Priority Queues
Applications:
job
10
(Binary) Heaps
A binary tree that stores priorities (or (priority, element) pairs) at nodes.
Structural property: all levels except the last are full; the last level is left-filled.
Heap property: the priority of a node is at least as large as that of its parent.
[Figure: a heap with 11 at the root]
Examples of non-Heaps
Heap property violated.
[Figure: tree in which a node's priority is smaller than its parent's]
Example of non-heap
Last level is not left-filled.
[Figure: heap-ordered tree with a gap in the last level]
Height of a heap
Since all levels except the last are full, a heap on n nodes has height O(log n).
Implementing Heaps
The heap is stored in an array, level by level:
Parent(i): return ⌊i/2⌋
Left(i): return 2i
Right(i): return 2i+1
index: 1  2  3  4  5  6  7  8  9  10
value: 11 17 13 18 21 19 17 43 23 26
In binary representation, a multiplication/division by two is a left/right shift, and adding 1 just sets the lowest bit.
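The array layout above translates directly into code. A minimal 1-indexed min-heap sketch in Python (class name mine), combining insertion by shifting elements down the root-to-leaf path with Heapify and delete-min:

```python
class MinHeap:
    # 1-indexed array heap: Parent(i)=i//2, Left(i)=2i, Right(i)=2i+1.
    def __init__(self):
        self.a = [None]              # index 0 is unused

    def insert(self, key):
        # Walk up from the new leaf, shifting larger keys down the path.
        self.a.append(key)
        i = len(self.a) - 1
        while i > 1 and self.a[i // 2] > key:
            self.a[i] = self.a[i // 2]
            i //= 2
        self.a[i] = key

    def heapify(self, i):
        # Subtrees of i are heaps; float a[i] down to restore the property.
        n = len(self.a) - 1
        smallest, l, r = i, 2 * i, 2 * i + 1
        if l <= n and self.a[l] < self.a[smallest]:
            smallest = l
        if r <= n and self.a[r] < self.a[smallest]:
            smallest = r
        if smallest != i:
            self.a[i], self.a[smallest] = self.a[smallest], self.a[i]
            self.heapify(smallest)

    def delete_min(self):
        # Replace the root with the last element, then Heapify(1).
        m = self.a[1]
        last = self.a.pop()
        if len(self.a) > 1:
            self.a[1] = last
            self.heapify(1)
        return m
```

Insertion and delete-min each touch one root-to-leaf path, so both run in O(log n).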
Insertion in a Heap
Insert 12.
[Figure: the heap before insertion, with the new leaf position added]
Enlarge the heap by one node.
Consider the path from the root to the inserted node.
Find the topmost element on this path with priority higher than that of the inserted element.
Insert the new element at this location by shifting down the other elements on the path.
[Figure: 12 inserted along the path]
19
Correctness of Insertion
The only nodes whose contents change are the ones on the path.
The heap property may be violated only for children of these nodes.
But the new contents of these nodes only have lower priority, so the heap property is not violated.
[Figure: the heap after inserting 12]
20
Heapify
Heapify(i): the heap property is violated at node i, but the subtrees rooted at its children are heaps; float the element at i down until the property is restored.
Example: heapify(1) — the heap property is violated at the node with index 1, but the subtrees rooted at 2 and 3 are heaps.
[Figures: heapify(1) step by step]
Binary Heaps
Delete-min
Building a heap
Delete-min in a Heap
The minimum is at the root. Move the last element to the root of the heap, shrink the heap by one, and call Heapify(1).
[Figures: delete-min step by step]
Building a heap
We start from the bottom and move up
All leaves are heaps to begin with
[Figure: building a heap from the array 23 43 26 21 10 16 12 13 29 11 31 19 17]
At most ⌈(n+1)/4⌉ nodes sit at height 1, ⌈(n+1)/8⌉ at height 2, and so on, and heapify from height i costs O(i), so

T(n) = O( (n+1)/4 · 1 + (n+1)/8 · 2 + (n+1)/16 · 3 + ... )
     = O( (n+1) Σ_{i=1}^{lg n} i/2^i )
     = O(n)

since Σ_{i=1}^{lg n} i/2^i ≤ Σ_{i=1}^{∞} i/2^i = 2.

Derivation: Σ_{i=0}^{∞} x^i = 1/(1−x) for x < 1; differentiating, Σ_{i=1}^{∞} i·x^{i−1} = 1/(1−x)^2; multiplying by x, Σ_{i=1}^{∞} i·x^i = x/(1−x)^2; plugging in x = 1/2 gives (1/2)/(1/4) = 2.
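The O(n) bound can be exercised directly. A hedged Python sketch of bottom-up heap construction (0-indexed here, so parent(i) = (i−1)//2):

```python
def build_heap(a):
    """Turn list a (0-indexed) into a min-heap in O(n) time by calling
    heapify on every internal node, from the last one up to the root."""
    n = len(a)
    def heapify(i):
        smallest, l, r = i, 2 * i + 1, 2 * i + 2
        if l < n and a[l] < a[smallest]:
            smallest = l
        if r < n and a[r] < a[smallest]:
            smallest = r
        if smallest != i:
            a[i], a[smallest] = a[smallest], a[i]
            heapify(smallest)
    for i in range(n // 2 - 1, -1, -1):   # the leaves are already heaps
        heapify(i)
    return a
```

Most heapify calls happen near the leaves where subtrees are tiny, which is exactly why the total work telescopes to O(n) rather than O(n log n).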
Heap Sort
Create a heap. Do delete-min repeatedly till the heap becomes empty. To do an in-place sort, move each deleted element to the end of the heap.
[Figure: heap sort in progress]
Insert: O(log n)
Heapify: O(log n)
Find minimum: O(1)
Delete-min: O(log n)
Building a heap: O(n)
Heap Sort: O(n log n)
11
Why Sorting?
When
Worst-case
Heap
sort
Worst-case
Merge(A, p, q, r)
Take the smallest of the two topmost elements of the sequences A[p..q] and A[q+1..r] and put it into the resulting sequence. Repeat this until both sequences are empty. Copy the resulting sequence into A[p..r].
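The merge step and the recursion around it can be sketched in Python as follows (function names are mine; slicing copies out the two runs before merging them back):

```python
def merge(A, p, q, r):
    # Merge the sorted runs A[p..q] and A[q+1..r] (inclusive) into A[p..r].
    left, right = A[p:q + 1], A[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        # Take from the left run while it still holds the smaller front
        # element (<= keeps the sort stable); otherwise take from the right.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]; i += 1
        else:
            A[k] = right[j]; j += 1

def merge_sort(A, p=0, r=None):
    if r is None:
        r = len(A) - 1
    if p < r:
        q = (p + r) // 2
        merge_sort(A, p, q)       # sort the left half
        merge_sort(A, q + 1, r)   # sort the right half
        merge(A, p, q, r)         # combine in Θ(n) time
    return A
```

Each level of the recursion does Θ(n) merging work across Θ(log n) levels, giving the familiar Θ(n log n) total.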
MergeSort (Example)
[Figures: merge sort on an example array, step by step]
To sort n numbers:
if n = 1, done!
recursively sort 2 lists of ⌈n/2⌉ and ⌊n/2⌋ elements
merge the 2 sorted lists in Θ(n) time
Strategy:
break the problem into similar (smaller) subproblems
recursively solve the subproblems
combine the solutions to answer the original problem
Recurrences
Running times of algorithms with recursive calls can be described using recurrences. A recurrence is an equation or inequality that describes a function in terms of its value on smaller inputs.

T(n) = solving_trivial_problem, if n = 1
T(n) = num_pieces · T(n / subproblem_size_factor) + dividing + combining, if n > 1

For merge sort:
T(n) = Θ(1), if n = 1
T(n) = 2T(n/2) + Θ(n), if n > 1
Solving Recurrences
Substitution method
guessing the solutions
verifying the solution by the mathematical induction
Recursion-trees
Master method
T(n) = Θ(1) if n = 1; T(n) = 2T(n/2) + n if n > 1

T(n) = 2T(n/2) + n                 (substitute)
     = 2(2T(n/4) + n/2) + n        (expand)
     = 2²T(n/4) + 2n               (substitute)
     = 2²(2T(n/8) + n/4) + 2n      (expand)
     = 2³T(n/8) + 3n
In general, T(n) = 2^i T(n/2^i) + i·n
             = 2^{lg n} T(n/n) + n lg n = n + n lg n
More Sorting
radix sort
bucket sort
in-place sorting
how fast can we sort?
Radix Sort
3. Recursion:
recursively sort the top subarray, ignoring the leftmost bit
recursively sort the bottom subarray, ignoring the leftmost bit
Time to sort n b-bit numbers: O(b·n)
3
At step k the two keys are put in the correct relative order, and because of stability, the successive steps do not change their relative order.
9
For instance, consider a sort on an array with these two keys. Total time: O(b·n).
As you might have guessed, we can perform a stable sort based on the key's kth digit in O(n) time. The method, you ask? Why, it's Bucket Sort, of course.
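Putting the two ideas together — one stable O(n) bucket pass per digit, least-significant digit first — gives LSD radix sort. A hedged Python sketch (names mine):

```python
def radix_sort(nums, base=10):
    """LSD radix sort of non-negative integers: repeatedly bucket-sort on
    the kth digit. Each pass is stable, so the ordering established by the
    less significant digits survives the later passes."""
    if not nums:
        return nums
    digits, m = 1, max(nums)
    while m >= base:                       # how many digit passes we need
        m //= base
        digits += 1
    for k in range(digits):
        buckets = [[] for _ in range(base)]
        for x in nums:                     # stable: preserves arrival order
            buckets[(x // base ** k) % base].append(x)
        nums = [x for b in buckets for x in b]
    return nums
```

With b digits and n numbers this is O(b·n), matching the bound above.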
12
Bucket Sort
BASICS:
n numbers
Each number ∈ {1, 2, 3, ..., m}
Stable
Time: O(n + m)
For example, m = 3 and our array is as shown (note that there are two 2s and two 1s).
First, we create m buckets.
Each element of the array is put in one of the m buckets.
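A minimal bucket-sort sketch matching the description above (the example array [2, 1, 3, 2, 1] is my own illustration of "two 2s and two 1s"):

```python
def bucket_sort(a, m):
    """Stable O(n + m) sort of n integers drawn from {1, ..., m}."""
    buckets = [[] for _ in range(m + 1)]   # one bucket per possible key
    for x in a:
        buckets[x].append(x)               # equal keys keep their input order
    # Concatenating the buckets in key order yields the sorted array.
    return [x for b in buckets for x in b]
```

Because each element is appended to its bucket in arrival order, equal keys come out in the order they went in — exactly the stability that radix sort depends on.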
In-Place Sorting
Graphs
Definitions
Examples
The Graph ADT
What is a Graph?
V = {a, b, c, d, e}
E = {(a,b), (a,c), (a,d), (b,e), (c,d), (c,e), (d,e)}
[Figure: the graph drawn on vertices a-e]
Applications
electronic circuits
CS201
start
more examples
scheduling (project planning)
A typical student day
wake up
eat
cs201 meditation
work
more cs201
play
cs201 program
sleep
dream of cs201
4
Graph Terminology
adjacent vertices: vertices connected by an edge
degree (of a vertex): # of adjacent vertices
Graph Terminology(2)
path: a sequence of vertices in which each consecutive pair is adjacent, e.g. a, b, e, d, c
simple path: a path with no repeated vertices, e.g. b, e, d, c or b, e, c
cycle: a simple path whose first and last vertices coincide, e.g. a, c, d, a
[Figure: paths and a cycle in the example graph]
More Terminology
connected graph: any two vertices are
connected by some path
connected
not connected
More Terminology(2)
[Figure: a graph G and a subgraph of G]
More Terminology(3)
tree: a connected graph without cycles; forest: a collection of trees
[Figure: a forest made up of trees]
Connectivity
Let n = #vertices, and m = #edges
Complete graph: one in which all pairs of
vertices are adjacent
How many edges does a complete graph have?
There are n(n−1)/2 pairs of vertices, and so
m = n(n−1)/2.
Therefore, if a graph is not complete,
m < n(n−1)/2.
For n = 5: m = (5·4)/2 = 10.
12
More Connectivity
n = #vertices, m = #edges
For a tree, m = n − 1.
If m < n − 1, G is not connected.
[Figure: n = 5, m = 4 (a tree) and n = 5, m = 3 (not connected)]
13
Spanning Tree
A spanning tree of G is a subgraph that is a tree and contains every vertex of G.
[Figure: G and a spanning tree of G]
Failure on any edge disconnects the system (least fault tolerant).
14
Gilligan's Isle?
[Figure: the Pregel River with land masses A, B, C, D and the bridges between them]
size()
Return the number of vertices + number of edges of G.
isEmpty()
elements()
positions()
swap()
replaceElement()
numVertices()
numEdges()
vertices()
edges()
17
Update Methods
[Figure: directed graph of airline flights among BOS, LAX, DFW, JFK, MIA, ORD, SFO]
Edge List
The edge list structure simply stores the vertices and the
edges into unsorted sequences.
Easy to implement.
Finding the edges incident on a given vertex is inefficient
since it requires examining the entire edge sequence
[Figure: edge-list structure for the flight graph]
[Table: edge-list running times — numVertices/numEdges O(1); vertices O(n); edges O(m); elements/positions O(n+m); insertions O(1); operations that must scan the edge sequence, such as removeVertex, O(m)]
Adjacency List
[Figure: adjacency-list structure for the graph on vertices a-e]
Space = Θ(N + Σ_v deg(v)) = Θ(N + M)
[Figure: adjacency-list structure for the flight graph, with in- and out-edge lists per airport]
[Table: adjacency-list running times — numVertices/numEdges O(1); vertices O(n); edges O(m); elements/positions O(n+m); insertions O(1); areAdjacent(u,v) O(min(deg(u), deg(v))); removeVertex(v) O(deg(v))]
Adjacency Matrix
[Figure: boolean adjacency matrix for the graph on vertices a, b, c, d, e]
The adjacency matrix structure augments the edge list structure with a matrix in which each row and column corresponds to a vertex.
10
Compilers
Graphics
Maze-solving
Mapping
Networks: routing, searching, clustering, etc.
11
BFS Example
[Figures: BFS from s on the example graph, showing the queue contents and each vertex's distance at every step, until Q is empty]
More BFS
[Figures: BFS distance levels, panels a)-d)]
BFS Algorithm
BFS(G,s)
01 for each vertex u ∈ V[G] − {s}
02     color[u] ← white
03     d[u] ← ∞
04     π[u] ← NIL
05 color[s] ← gray
06 d[s] ← 0
07 π[s] ← NIL
08 Q ← {s}
09 while Q ≠ ∅ do
10     u ← head[Q]
11     for each v ∈ Adj[u] do
12         if color[v] = white then
13             color[v] ← gray
14             d[v] ← d[u] + 1
15             π[v] ← u
16             Enqueue(Q,v)
17     Dequeue(Q)
18     color[u] ← black
Lines 01-04 initialize all vertices; lines 05-08 initialize the BFS with s; the queue ensures all of u's children are handled before any children of children.
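The pseudocode maps naturally onto Python with collections.deque as the FIFO queue. A hedged sketch (names mine); colors are implicit here, since a vertex is white exactly while its distance is still infinite:

```python
from collections import deque

def bfs(adj, s):
    """BFS from s over an adjacency dict adj; returns (d, parent) maps.
    Unreachable vertices keep d = infinity and parent = None."""
    d = {u: float("inf") for u in adj}     # d[u] <- infinity
    parent = {u: None for u in adj}        # pi[u] <- NIL
    d[s] = 0
    Q = deque([s])
    while Q:
        u = Q.popleft()                    # u <- head[Q]; Dequeue(Q)
        for v in adj[u]:
            if d[v] == float("inf"):       # white vertex: first discovery
                d[v] = d[u] + 1
                parent[v] = u
                Q.append(v)                # Enqueue(Q, v)
    return d, parent
```

Each vertex is enqueued at most once and each edge examined a constant number of times, so the running time is O(V + E), and d holds the shortest distances to all reachable vertices.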
BFS Properties
Given an undirected graph G = (V,E), BFS
discovers all vertices reachable from a
source vertex s
For each vertex v at level i, the path of the BFS
tree T between s and v has i edges, and any
other path of G between s and v has at least i
edges.
If (u, v) is an edge then the level numbers of u
and v differ by at most one.
It computes the shortest distance to all
reachable vertices
21
Predecessor subgraph of G:
Gπ = (Vπ, Eπ), where Eπ = {(π[v], v) ∈ E : v ∈ Vπ − {s}}
Gπ is a breadth-first tree.