
Data Structures and Algorithms

Textbook:
Data Structures and Algorithm Analysis in C
by Mark Allen Weiss
Florida International University
http://www.cs.fiu.edu/~weiss
Source code in the textbook can be found by following the link.

Data structure: methods of organising (large amounts of) data.
Algorithm: the way to process data.

What is a good program and what is a bad program?

Three criteria:
1. Correct - testing, verification
2. Efficient - what we are going to study in this course is mainly concerned with efficiency.
3. Simple.

Programs, including the ones you are going to write, will be judged by all these criteria.

Question: why do we need to worry about efficiency, since computers have become more and more powerful?

Answer: there are problems which are difficult or very difficult, and even on the fastest computers they still take a lot of time or space to solve. Moreover, some problems can be solved efficiently, but bad designs will take too much time or space. Therefore, we must study methods that lead to good solutions.


Complexity of Algorithms

Undecidable Problems

These are the hardest problems: impossible to solve by computers.

Halting problem: a classic undecidable problem. Is it possible to check if an arbitrary program terminates on an input?

We next prove that no program can be written to solve the halting problem.

Proof: we prove by contradiction. Suppose a program can be written to solve the halting problem; call it Check(P,X), for an arbitrary program P and input X. Construct the following program:

Loop(P)
{
    if Check(P,P)==No
        /* Check decides that P does not terminate */
        /* with P as input */
        return yes;
    else
        loop forever;
}

Consider P=Loop(X), i.e., take P to be the program Loop itself. Loop(P) terminates iff Check(P,P) returns No, that is, iff Loop does not terminate with P as input, which is to say Loop(P) does not terminate. Contradiction. Loop(P) cannot exist, and therefore Check(P,X) cannot exist.
Different Running Times (assume the computer executes 10 million instructions per second)

N      N^2          N^5        2^N                     N^N
10     1/100000 s   1/100 s    1/10000 s               0.28 h
20     1/25000 s    0.32 s     0.1 s                   0.33 trillion centuries
50     1/4000 s     0.52 m     3.6 y                   a 69-digit number of centuries
100    1/1000 s     0.28 h     40 trillion centuries   a 184-digit number of centuries

For comparison: the 'Big Bang' was approximately 15 billion years ago.

Tractable and Intractable Problems

Among decidable problems, there is no clear line separating hard from easy problems. Usually, we expect tractable problems to be solvable in polynomial time. Problems that have only exponential solutions are considered intractable.

Note this is not absolute, e.g., 1.0000001^N is smaller than N^1000 even for very large N. However, in real life, a polynomial is more likely N^2 and an exponential is more likely 2^N.

There are many important problems that we do not know if they are tractable or not: nobody can find polynomial algorithms for them, but nobody can prove that no polynomial algorithms exist either.

For a weighted directed graph, which can model, e.g., city maps, there are two well-known problems which are similar but in fact very different in nature.

Traveling Salesman Problem
Find a tour (a simple cycle that includes all vertices) such that the total cost is the smallest. Nobody knows whether the Traveling Salesman Problem has a polynomial algorithm or not (an NP problem).

Shortest Path Problem
Find the shortest paths from one vertex to all other vertices. This problem can be solved by an algorithm with O(N^2) running time (we will define the O soon), where N is the number of vertices. Therefore, this is an easy problem. The algorithm is in Chapter 9.

An Example of Efficiency

Fibonacci numbers:
Fib(0) = Fib(1) = 1
Fib(N) = Fib(N-1) + Fib(N-2)

We next consider two programs that calculate the N-th Fibonacci number. The direct implementation using recursion is inefficient. Question: why is this implementation inefficient? An efficient implementation uses iteration instead. As the input gets bigger, the difference in the running times of the two programs becomes significant; see the sketch below.
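As a minimal sketch (our code, not the textbook's; the names RecFib and IterFib are ours), the two programs look like this:

long RecFib(int N)
{
    /* Direct recursion: the calls for N-1 and N-2 recompute */
    /* the same smaller Fibonacci numbers again and again. */
    if (N <= 1)
        return 1;
    else
        return RecFib(N - 1) + RecFib(N - 2);
}

long IterFib(int N)
{
    /* Iteration: each Fibonacci number is computed exactly once. */
    long Last = 1, NextToLast = 1, Answer = 1;
    int i;

    for (i = 2; i <= N; i++)
    {
        Answer = Last + NextToLast;
        NextToLast = Last;
        Last = Answer;
    }
    return Answer;
}

IterFib runs in O(N) time; the analysis of RecFib follows.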
Running time of the recursive Fibonacci program

Let the running time be T(N), where N is the input.

T(0) = T(1) = 1
T(N) = T(N-1) + T(N-2) + 2

When N=0 or N=1, the program just executes a return statement, and we assume this takes one unit of time. When N>1, the program first calculates Fib(N-1) and Fib(N-2), which take T(N-1) and T(N-2) time respectively; it then adds the two values and returns, and we assume these two operations take two units of time.

We can easily prove (how?) that T(N) ≥ Fib(N). It can be shown that Fib(N) ≥ 1.5^N (for sufficiently large N). Therefore, the program has exponential running time.

Algorithm Analysis Method

Use positive functions to denote the running time of an algorithm.

Definitions
• T(N)=O(f(N)) if there are positive constants c and n such that T(N) ≤ c f(N) when N ≥ n.
• T(N)=Ω(g(N)) if there are positive constants c and n such that T(N) ≥ c g(N) when N ≥ n.
• T(N)=Θ(h(N)) if and only if T(N)=O(h(N)) and T(N)=Ω(h(N)).
• T(N)=o(p(N)) if and only if T(N)=O(p(N)) and T(N)≠Θ(p(N)).


Big-Oh, Big-Omega, Big-Theta, Little-Oh capture growth rate.

Examples:
N^2 = O(N^3), N^3 = Ω(N^2)
2N^2 = o(N^3), 2N^2 = Θ(N^2).

Example: 1000N and N^2
1000N is larger than N^2 for small values of N, but N^2 grows at a faster rate and is eventually larger than 1000N. Let c=1, n=1000 (or c=100, n=10); it follows from the definitions that 1000N = O(N^2) and 1000N = o(N^2).

Do not include constants or lower-order terms inside a Big-Oh. For example, do not say T(N)=O(2N^2) or T(N)=O(N^2+N), just T(N)=O(N^2).

It is easy to show (how?) that T(N)=O(f(N)) if and only if f(N)=Ω(T(N)).

Rule 1
If T1(N)=O(f(N)) and T2(N)=O(g(N)), then
(a) T1(N)+T2(N)=O(f(N)+g(N))
(b) T1(N)*T2(N)=O(f(N)*g(N))

Proof: There exist positive constants c1, n1, c2, n2 such that T1(N) ≤ c1 f(N) when N ≥ n1, and T2(N) ≤ c2 g(N) when N ≥ n2. Therefore,

T1(N) + T2(N) ≤ max(c1, c2)(f(N) + g(N))
T1(N) T2(N) ≤ c1 c2 f(N) g(N)

when N ≥ max(n1, n2).

Similar rules can be found.
Rule 1′
If T1(N)=O(f(N)), T2(N)=O(g(N)) and f(N)=O(g(N)), then T1(N)+T2(N)=O(g(N)).

Proof: There exist positive constants c1, n1, c2, n2, c3, n3 such that T1(N) ≤ c1 f(N) when N ≥ n1, T2(N) ≤ c2 g(N) when N ≥ n2, and f(N) ≤ c3 g(N) when N ≥ n3. Therefore,

T1(N) + T2(N) ≤ c1 f(N) + c2 g(N)
              ≤ c1 c3 g(N) + c2 g(N)
              = (c1 c3 + c2) g(N)

when N ≥ max(n1, n2, n3).

Rule 2
• If T(N) is a polynomial of degree k, then T(N)=Θ(N^k).
• log^k N = O(N)

Rule 3: Growth Rate of Common Functions (in increasing order)
c
logN
log^2 N
N
NlogN
N^2
N^3
2^N


Calculate Growth Rate by Limit

Calculate lim_{N→∞} f(N)/g(N), and decide the growth rate according to the following table:
• 0: f(N)=o(g(N))
• c≠0: f(N)=Θ(g(N))
• ∞: g(N)=o(f(N))

Example: It follows from
lim_{N→∞} NlogN / N^1.5 = 0
that NlogN = o(N^1.5).

Calculate Growth Rate by Simple Algebra

Example:
log^2 N = O(N)
logN = O(N^0.5)
It follows that NlogN = O(N · N^0.5) = O(N^1.5).
Calculation of Running Time

Computation Model
Assume any single statement takes one unit of time to execute. Let the input size be N; Tavg(N) and Tworst(N) are the average and worst-case running times. Obviously, Tavg(N) ≤ Tworst(N). Usually, we only study the worst-case running time, because
• it provides an upper bound
• it is easier to analyse than the average case
Sometimes, we also study the best-case running time.

Direct Calculation
A simple example: calculate Σ_{i=1}^{N} i^3

int Sum(int N)
{
    int i, PartialSum;
    PartialSum=0;           // 1
    for (i=1;i<=N;i++)      // 2
        PartialSum+=i*i*i;  // 3
    return PartialSum;      // 4
}

Line 1 and 4: one unit each
Line 3: two *, one +, one assignment, together 4N
Line 2: one initialization, N+1 tests, N increments, together 2N+2
Total: 6N+4=O(N)


Direct calculation is complicated. For Big-Oh, using the following rules is easier.

Rule 1: Loops
The running time of a loop is the running time of the statements inside the loop (including tests) times the number of iterations.

Example
for (i=0;i<N;i++)
    k++;
O(N)

Rule 2: Nested Loops
Analyse the loops inside out. The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the loops.

Example
for (i=0;i<N;i++)
    for (j=0;j<N;j++)
        k++;
O(N^2)

Rule 3: Consecutive Statements
They just add up (the maximum is the one that counts, why?).

Example
for (i=0;i<N;i++)
    A[i]=0;
for (i=0;i<N;i++)
    for (j=0;j<N;j++)
        A[i]+=A[j]+i+j;
O(N^2)

Rule 4: if/else
The running time of an if/else statement is not more than the running time of the test plus the larger of the running times of the two branching statements (a small illustration follows).
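Example (ours, not from the slides; PrintArray is a hypothetical helper that prints all N elements, so the call is O(N)):

if (A[0] == X)          /* the test: O(1) */
    PrintArray(A, N);   /* this branch: O(N) */
else
    k++;                /* this branch: O(1) */
/* total: O(1) + max(O(N), O(1)) = O(N) */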
Basic Strategy: analyse from inside to outside.


Analysis of Recursion

Use recurrence equations.

long int Factorial(int N)
{
    if (N<=1)
        return 1;
    else
        return N*Factorial(N-1);
}

T(1) = 1
T(N) = 2 + T(N-1)

T(N) = 2 + 2 + ... + 2 + T(1)    (N-1 twos)
     = O(N)

More Examples:

Search
Given an array of integers A[0],A[1],...,A[N-1] and an integer X, find i such that A[i]=X, or return i=-1 if X is not in the array.

Linear Search
Search the array one by one. O(N)

Binary Search
If the array is sorted, A[0]≤A[1]≤···≤A[N-1], then check the middle element: if it equals X, the job is done; if it does not equal X, then search either the first half or the second half of the array.

Code: Binary search, Figure 2.9 (a sketch follows below).
Running time O(logN). How to calculate this?
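A minimal sketch in the spirit of Figure 2.9 (our rendering, not necessarily the textbook's exact code):

int BinarySearch(const int A[], int X, int N)
{
    int Low = 0, High = N - 1;

    while (Low <= High)
    {
        int Mid = (Low + High) / 2;   /* the middle element */
        if (A[Mid] < X)
            Low = Mid + 1;            /* X can only be in the second half */
        else if (A[Mid] > X)
            High = Mid - 1;           /* X can only be in the first half */
        else
            return Mid;               /* found */
    }
    return -1;                        /* X is not in the array */
}

Each iteration does O(1) work and halves the range Low..High, which is where the O(logN) comes from (see the rule that follows).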
An algorithm is
• O(logN) if it takes constant (i.e., O(1)) time to cut the problem size by a fraction (usually 1/2).
• O(N) if it takes constant time to cut the problem size by a constant (e.g., N→N-1).

Exponentiation: X^N

Obvious solution: N-1 multiplications, O(N); inefficient when N is large.

An O(logN) solution:

X^62 = (X^2)^31
     = (X^4)^15 X^2
     = (X^8)^7 X^4 X^2
     = (X^16)^3 X^8 X^4 X^2
     = (X^32) X^16 X^8 X^4 X^2

Code: Efficient exponentiation, Figure 2.11 (a sketch follows below). Calculate Pow(X,62) following the execution of the algorithm.
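A minimal sketch in the spirit of Figure 2.11 (our rendering, not necessarily the textbook's exact code):

long Pow(long X, int N)
{
    if (N == 0)
        return 1;
    if (N == 1)
        return X;
    if (N % 2 == 0)
        return Pow(X * X, N / 2);      /* N even: X^N = (X^2)^(N/2) */
    else
        return Pow(X * X, N / 2) * X;  /* N odd: X^N = (X^2)^((N-1)/2) * X */
}

Each call halves N at the cost of O(1) multiplications, hence O(logN).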
Bad alternatives:
return Pow(Pow(X,2),N/2);
return Pow(Pow(X,N/2),2);
(each of these loops forever when N=2)
return Pow(X,N/2)*Pow(X,N/2);
(less efficient; why? what is the running time?)


Checking Your Analysis

After the theoretical analysis, you should run the program and observe the actual running time.

For example, let N→2N:
• O(N^2): running time ×4
• O(N^3): running time ×8
• O(logN): running time increases by log2 (a small increase)
• O(NlogN): running time slightly more than ×2

To verify O(f(N)), compute T(N)/f(N):
• if f(N) is a tight estimate, T(N)/f(N) → a constant
• if f(N) is an overestimate, T(N)/f(N) → 0
• if f(N) is an underestimate, T(N)/f(N) diverges

A sketch of such a check follows.
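A minimal sketch of the empirical check (our illustration; Run is a hypothetical workload believed to be O(N^2), and we test the guess f(N)=N^2 by watching T(N)/N^2):

#include <stdio.h>
#include <time.h>

/* Hypothetical workload believed to be O(N^2). */
void Run(int N)
{
    volatile long k = 0;   /* volatile: keep the loop from being optimized away */
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            k++;
}

int main(void)
{
    int N;
    for (N = 1000; N <= 16000; N *= 2)
    {
        clock_t Start = clock();
        Run(N);
        double T = (double)(clock() - Start) / CLOCKS_PER_SEC;
        /* If O(N^2) is tight, T/N^2 should approach a constant. */
        printf("N=%6d  T=%8.4f s  T/N^2=%g\n", N, T, T / ((double)N * N));
    }
    return 0;
}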
