
Systolic Algorithms

Part 3
Analysis of Algorithms - Big-O Notation
Let f,g: ℕ → ℝ, with g(n) > 0 for all large enough n.

● f(n) = Ω(g(n)): f(n) is of order “at least” g(n)
∃c, n0: f(n) ≥ c·g(n), ∀n ≥ n0
● f(n) = O(g(n)): f(n) is of order “at most” g(n)
∃c, n0: f(n) ≤ c·g(n), ∀n ≥ n0
● f(n) = Θ(g(n)): f(n) is of order “exactly” g(n)
∃c1, c2, n0: c1·g(n) ≤ f(n) ≤ c2·g(n), ∀n ≥ n0
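The definitions above can be sanity-checked numerically. As a sketch (not a proof), for f(n) = 3n² + 5n we expect f(n) = Θ(n²); the constants c1, c2 and the threshold n0 below are witnesses chosen by hand:

```python
# Check c1*g(n) <= f(n) <= c2*g(n) for all n >= n0 over a sample range.

def f(n):
    return 3 * n**2 + 5 * n

def g(n):
    return n**2

# Witness constants: 3n^2 <= 3n^2 + 5n <= 4n^2 holds once n >= 5.
c1, c2, n0 = 3, 4, 5

assert all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, 10_000))
```

Checking a finite range does not prove the asymptotic claim, but it catches wrongly chosen constants quickly.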
Analysis of Algorithms - Lower and Upper Bound
1. Is it the best algorithm?
○ Lower bound: in some cases it is easy to see that an algorithm cannot perform better than a
threshold.
○ An algorithm is optimal if it needs the same order of operations as the lower bound in the worst-case
scenario.
2. Is it better than any known algorithm?
○ If it is faster than any known algorithm, it has defined an “upper bound”.

Example: matrix multiplication A, B, C ∈ Mn×n(ℤ), C = A·B

C has n² elements, so any algorithm needs at least n² basic operations; Ω(n²) is a lower bound. A larger
lower bound of Ω(n² log n) is known (for bounded-coefficient arithmetic circuits).
Fast matrix multiplication algorithms based on divide & conquer achieve O(n^2.3728639).
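For contrast with those bounds, the schoolbook algorithm performs Θ(n³) multiply-add steps — a minimal sketch:

```python
# Naive (schoolbook) matrix multiplication: three nested loops over n,
# i.e. Theta(n^3) basic operations, well above the Omega(n^2) lower bound.

def matmul(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):              # n^3 multiply-adds in total
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul(A, B) == [[19, 22], [43, 50]]
```

Divide & conquer schemes such as Strassen's reduce the exponent below 3 by trading multiplications for additions on submatrices.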
Analysis of Parallel Algorithms
Criteria:

● Running time t(n)


● Number of processors p(n)
● Cost c(n)
● Other criteria specific to the computer architecture

Lower and upper bounds should take into account the computation model, the
number of processors, and the architecture.
Analysis of Parallel Algorithms - Running Time
● The most important criterion
● The time elapsed from the start of the algorithm until it finishes.
● If the processors do not start at the same time, it is the time elapsed since the
first processor start until the last one finishes.
● For MIMD it is the only method to evaluate an algorithm
Analysis of Parallel Algorithms - Running Time
● Usually the estimation is based on counting the number of basic operations
(additions, comparisons, swaps). What a “basic operation” is depends on the
architecture.
● Number of operations on average or in the worst case scenario.
● For SIMD or MIMD: computation steps and communication steps.
● On a parallel architecture, the running time also depends on the number of
processors. If the number of processors depends on n, the running time can be
written as t(n). If it is constant, it does not change the result in big-O notation.
● t(n) is the number of steps in the worst case scenario.
● Speedup = tseq/tpar
Analysis of Parallel Algorithms - Processors & Cost
● Denoted p(n), where n is the size of the problem
● Importance:
○ Acquisition cost
○ Maintenance cost
○ Architecture constraints
● In some cases it does not depend on n
● The cost of the algorithm c(n) = t(n)*p(n)
● Efficiency: E(n) = tseq(n)/c(n)
● A parallel algorithm has an optimal cost if its cost is of the same order as
the cost of an optimal sequential algorithm.
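The cost, speedup, and efficiency formulas above can be sketched directly; the timings and processor count below are made-up numbers chosen for illustration:

```python
# Cost, speedup, and efficiency as defined on this slide.

def cost(t_par, p):
    return t_par * p                    # c(n) = t(n) * p(n)

def speedup(t_seq, t_par):
    return t_seq / t_par                # S = t_seq / t_par

def efficiency(t_seq, t_par, p):
    return t_seq / cost(t_par, p)       # E(n) = t_seq(n) / c(n)

t_seq, t_par, p = 1000, 125, 8          # hypothetical timings, 8 processors
assert speedup(t_seq, t_par) == 8.0
assert cost(t_par, p) == 1000
assert efficiency(t_seq, t_par, p) == 1.0   # cost matches t_seq: cost-optimal
```

Efficiency 1 means no work is wasted relative to the sequential algorithm; in practice communication overhead pushes it below 1.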
Analysis of Parallel Algorithms - Example 1
Problem: Search a value in a file with n records using N processors on EREW
SIMD.

1. Broadcast the search value to every processor.


2. Split the n records to the N processors and each processor compares its
share of records with the searched value.
Analysis of Parallel Algorithms - Example 1
1. Broadcast

D is the location holding the searched value, but EREW means exclusive read, so all
processors cannot read D at once. We use a shared array A of length N to hold the
values being broadcast.

1. P1 reads D, stores it in its register and writes it to A[1]

2. for i = 0 to log N − 1
       parallel: for j = 2^i + 1 to 2^(i+1)
           Pj reads A[j − 2^i], stores it in its register and writes it to A[j]

The number of processors holding the value doubles at every step, therefore the broadcast takes O(log N) steps.
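A sequential simulation of this broadcast can make the doubling explicit. This is an illustrative sketch: real SIMD code would execute each inner loop body in lock-step, whereas here the rounds are simulated one processor at a time.

```python
import math

def broadcast(D, N):
    """Simulate the EREW broadcast of D to N processors (N a power of 2)."""
    A = [None] * (N + 1)          # 1-indexed shared broadcast array
    reg = [None] * (N + 1)        # processor registers, reg[j] belongs to Pj
    reg[1] = D                    # step 1: P1 reads D ...
    A[1] = D                      # ... and writes it to A[1]
    for i in range(int(math.log2(N))):           # step 2: log N rounds
        for j in range(2**i + 1, 2**(i + 1) + 1):    # Pj, "in parallel"
            reg[j] = A[j - 2**i]                 # exclusive read of A[j - 2^i]
            A[j] = reg[j]                        # exclusive write of A[j]
    return reg[1:]

assert broadcast(5, 8) == [5] * 8
```

In each round the reads hit distinct locations A[1..2^i] and the writes hit distinct locations A[2^i+1..2^(i+1)], so the exclusive-read/exclusive-write constraint is respected.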
Analysis of Parallel Algorithms - Example 1 (N=8, D=5)
D = 5 throughout; A is the shared broadcast array, P the processor registers.

Step          A (shared array)    Processors holding 5
Step 1        5 - - - - - - -     P1
Step 2, i=0   5 5 - - - - - -     P1..P2
Step 2, i=1   5 5 5 5 - - - -     P1..P4
Step 2, i=2   5 5 5 5 5 5 5 5     P1..P8
Analysis of Parallel Algorithms - Example 1
t(n) = log N + n/N = O(n/N), assuming n/N dominates log N (i.e., n ≥ N log N)

tseq = O(n)

Speedup = tseq/t(n) = O(N)

c(n) = N * t(n) = O(n)

Efficiency = tseq/c(n) = 1

The algorithm is optimal.
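Plugging concrete numbers into this analysis makes it easier to check; n and N below are assumptions chosen so that n/N dominates log N:

```python
import math

n, N = 4096, 64
t_par = math.log2(N) + n / N    # broadcast rounds + local comparison scan
t_seq = n                       # sequential search reads every record
c = N * t_par                   # cost = processors * parallel time

assert t_par == 6 + 64          # log2(64) = 6, 4096/64 = 64
assert c == 64 * 70             # c(n) = N * t(n) = 4480, same order as n
```

The speedup here is 4096/70 ≈ 58.5, slightly below N = 64 because of the log N broadcast term; asymptotically (with n/N ≥ log N) it is O(N).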


Analysis of Parallel Algorithms - Example 2
Problem: We have N processors Pi, each containing a value ai, 1 ≤ i ≤ N.
We have to find all sums Si = a1+ a2+...+ai, ∀i, 1 ≤ i ≤ N in O(log N).

Let Sij = ai+ ai+1+...+ aj, ∀i,j, 1 ≤ i ≤ j ≤ N.


Algorithm:
for i = 0 to log N − 1
    parallel: for j = 2^i + 1 to N
        Pj obtains the value stored in P(j − 2^i) using shared memory
        Pj adds it to its own value
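The rounds above can be simulated sequentially; the snapshot copy below models the fact that, within one parallel round, every processor reads the value its neighbour held before the round started:

```python
import math

def prefix_sums(a):
    """Simulate the parallel prefix-sum algorithm (len(a) a power of 2)."""
    N = len(a)
    s = list(a)                       # s[j] is the running value of P(j+1)
    for i in range(int(math.log2(N))):    # log N parallel rounds
        step = 2**i
        old = list(s)                 # snapshot: values before this round
        for j in range(step, N):      # Pj with index > 2^i (0-indexed here)
            s[j] = old[j - step] + old[j]   # add value from P(j - 2^i)
    return s

assert prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]) == [1, 3, 6, 10, 15, 21, 28, 36]
```

After round i, processor j holds the sum of the last min(2^(i+1), j+1) inputs ending at position j, so log N rounds suffice: t(N) = O(log N) with p(N) = N processors.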
Analysis of Parallel Algorithms - Example 2 (N=8)
Initial setup: Sii = ai. At every stage the shared array A mirrors the processor
registers P, so only one row per stage is shown.

Stage     P1  P2  P3  P4  P5  P6  P7  P8
Initial   S11 S22 S33 S44 S55 S66 S77 S88
i=0       S11 S12 S23 S34 S45 S56 S67 S78
i=1       S11 S12 S13 S14 S25 S36 S47 S58
i=2       S11 S12 S13 S14 S15 S16 S17 S18
