INTRODUCTION
A simple example:
Job: put on socks and shoes.
Processor: a pair of hands.
Sequential algorithm: put on right sock, right shoe, left sock, left shoe. Needs 4 time units.
Parallel algorithm: two processors, one for the left foot and another for the right foot. Needs 2 time units.
Question: Can we use four processors to speed up further to, say, 1 time unit?
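One way to reason about the question is to model the job as a task graph and compare the total work with the longest dependency chain. The sketch below assumes (this is not stated on the slide) that every task takes one time unit and that a sock must go on before the shoe of the same foot. The names `DEPENDS_ON`, `critical_path_length` and `time_lower_bound` are made up for illustration.

```python
# Hypothetical task graph for the socks-and-shoes job. Assumption: each task
# takes one time unit, and a sock must be on before the shoe of the same foot.
DEPENDS_ON = {
    "right sock": [],
    "left sock": [],
    "right shoe": ["right sock"],
    "left shoe": ["left sock"],
}


def critical_path_length(depends_on):
    """Length, in time units, of the longest chain of dependent tasks."""
    memo = {}

    def finish_time(task):
        if task not in memo:
            memo[task] = 1 + max(
                (finish_time(d) for d in depends_on[task]), default=0
            )
        return memo[task]

    return max(finish_time(t) for t in depends_on)


def time_lower_bound(depends_on, processors):
    """Lower bound on the running time with the given number of processors."""
    work = len(depends_on)
    by_work = -(-work // processors)  # ceiling of work / processors
    return max(by_work, critical_path_length(depends_on))


for p in (1, 2, 4):
    print(p, "processor(s): at least", time_lower_bound(DEPENDS_ON, p), "time units")
```

Under these assumptions the bound stays at 2 time units for four processors, because the sock-then-shoe chain has length 2 no matter how many hands are available.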
Multicomputers (distributed memory):
Hypercube architecture
Mesh-connected architecture
Networks of workstations (NOW): an inexpensive way to build parallel computers.
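Review of time and space complexity
Time complexity: a function of the problem size $n$.
A function $f(n)$ is said to be $O(g(n))$ if there exist constants $c > 0$ and $n_0$ such that $f(n) \le c\,g(n)$ for all $n \ge n_0$.
Examples: $O(n^k)$ for a constant $k$ is polynomial; $O(2^n)$ is exponential.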
NP-class: Traveling salesman problem (travel all cities with minimum cost):
An example:
putting on socks and shoes
ERCW-PRAM: exclusive read, concurrent write.
CRCW-PRAM: concurrent read, concurrent write.
How to resolve the write conflicts:
Common: all simultaneous writes store the same value to that memory location.
Arbitrary: choose one value and ignore the others.
Minimum: store the value of the processor with the minimum index.
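The three policies can be illustrated with a small simulation of one memory cell receiving several writes in the same step. This is only a sketch: the function name `resolve_write` and the representation of a write as an `(index, value)` pair are assumptions made for the example.

```python
def resolve_write(requests, policy):
    """Return the value stored in one memory cell after a concurrent write.

    requests: list of (processor_index, value) pairs aimed at the same cell.
    policy:   "common", "arbitrary" or "minimum".
    """
    if not requests:
        raise ValueError("no write requests")
    if policy == "common":
        # All writers must agree on the value; otherwise the write is illegal.
        values = {value for _, value in requests}
        if len(values) != 1:
            raise ValueError("common CRCW requires identical values")
        return values.pop()
    if policy == "arbitrary":
        # Any single writer may win; this sketch picks the first one listed.
        return requests[0][1]
    if policy == "minimum":
        # The writer with the smallest processor index wins.
        return min(requests, key=lambda r: r[0])[1]
    raise ValueError("unknown policy: " + policy)


writes = [(3, "x"), (1, "y"), (2, "z")]
print(resolve_write(writes, "minimum"))
print(resolve_write([(0, 5), (1, 5)], "common"))
```

An algorithm written for the arbitrary policy must be correct whichever writer wins, so picking the first request here is just one admissible outcome.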
Example:
An algorithm on a PRAM:
Multiplication of two matrices in time on a PRAM (CREW) with processors.
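A sequential simulation can make the two-step structure concrete. The sketch below assumes the usual scheme for this kind of algorithm: one virtual processor per triple (i, j, k) forms a single product in the first parallel step, and the products are then added pairwise in a tree, one level per parallel step. The function name `pram_matmul` and the step counter are illustrative, not taken from the slides.

```python
def pram_matmul(A, B):
    """Simulate a CREW-PRAM style product of two n x n matrices.

    Returns (C, steps), where steps counts the simulated parallel steps.
    """
    n = len(A)
    # Step 1 (one parallel step): virtual processor (i, j, k) computes
    # A[i][k] * B[k][j]. Several processors read the same entry of A or B
    # (concurrent read), but each writes its own cell (exclusive write).
    P = [[[A[i][k] * B[k][j] for k in range(n)] for j in range(n)]
         for i in range(n)]
    steps = 1
    # Step 2: pairwise summation over k. Each pass of the while loop stands
    # for one parallel step that roughly halves the number of partial sums.
    width = n
    while width > 1:
        half = (width + 1) // 2
        for i in range(n):
            for j in range(n):
                for k in range(width // 2):
                    P[i][j][k] += P[i][j][k + half]
        width = half
        steps += 1
    C = [[P[i][j][0] for j in range(n)] for i in range(n)]
    return C, steps


C, steps = pram_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, steps)
```

In this simulation the number of parallel steps is 1 plus the ceiling of log2(n), while the total number of virtual processors is n cubed.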
Standard algorithm:
for
Step 1:
Step 2:
Now look at
processors.
Step 1:
. .
Step 2:
to
VLSI complexity model ($AT^2$ model)
Sets limits on memory, I/O, and communication for implementing parallel algorithms with VLSI chips.
A: chip area (chip complexity)
T: time for completing a given computation
s: problem size
There exists a lower bound $f(s)$ such that $A \times T^2 \ge O(f(s))$.
The memory requirement sets a lower bound on the chip area A.
Bisection (usually use $\sqrt{A}\,T$): the maximum information exchange between the two halves of the chip during time T.
Example: Matrix multiplication.
matrices, 2-D mesh architecture, PEs
broadcast bus for inter-PE communication
chip area complexity:
time complexity:
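How to solve a typical computation task, sorting, using different types of computation models.
Problem description: given a sequence $S = \{s_1, s_2, \ldots, s_n\}$ on which a linear order $<$ is defined, find a rearrangement $S' = \{s'_1, s'_2, \ldots, s'_n\}$ of $S$ such that $s'_i \le s'_{i+1}$ for $i = 1, 2, \ldots, n-1$.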
Sequential algorithm.
Time
Sorting by enumeration:
stores .
If
and
then
in the sorted
list.
Each compares
position
of .
If
optimal.
Step 2: for
stores
to doall
in position
of
end for
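Sorting by enumeration can be sketched sequentially as follows. The code assumes the standard form of the method: the final position of each element is the number of elements that must precede it, with ties broken by original index so that equal elements get distinct positions. The name `enumeration_sort` and the 0-based positions are choices made for this example. On a parallel machine each comparison in the first loop nest would be done by its own processor.

```python
def enumeration_sort(s):
    """Sort by counting, for each element, how many elements precede it."""
    n = len(s)
    # Step 1: conceptually, processor (i, j) compares s[i] with s[j].
    # c[i] ends up as the number of elements that must come before s[i].
    # Equal elements are ordered by index so that all counts are distinct.
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if s[j] < s[i] or (s[j] == s[i] and j < i):
                c[i] += 1
    # Step 2: each element is stored directly at its final position.
    out = [None] * n
    for i in range(n):
        out[c[i]] = s[i]
    return out


print(enumeration_sort([3, 1, 2, 1]))
```

The sequential version does n squared comparisons, which is why the parallel variant is fast but not cost-efficient when one processor is assigned to every pair.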
Parallel algorithm on CREW model. Divide into subsets and one processor sorts a subset.
Optimal algorithm.
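The divide-and-sort idea can be sketched as below. The slide only states that the input is divided into subsets and that each processor sorts one subset; how the sorted subsets are combined is not shown, so the final merge with `heapq.merge` is an assumption of this sketch, as are the name `chunked_sort` and the parameter `p` for the number of processors. The per-subset sorts are run one after another here, whereas on a parallel machine they would run at the same time.

```python
import heapq


def chunked_sort(s, p):
    """Split s into at most p subsets, sort each one, then merge the results.

    p is the (hypothetical) number of processors and must be at least 1.
    """
    n = len(s)
    size = -(-n // p) if n else 1  # ceiling of n / p, at least 1
    # Each slice is the subset handled by one processor.
    chunks = [sorted(s[k:k + size]) for k in range(0, n, size)]
    # Assumed combining step: a multiway merge of the sorted subsets.
    return list(heapq.merge(*chunks))


print(chunked_sort([5, 2, 9, 1, 7, 3], 3))
```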
custom-designed
inter-
comparator
Merging network: merges two length-n sorted lists into one length-2n sorted list.
[Figure: merging network for two length-2 lists. Inputs a1, a2 and b1, b2; comparators P1, P2, P3; outputs c1, c2, c3, c4.]
and
merging networks
connected to the
first merger
second merger
Additional comparators
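[Figure: the length-2 merging network (comparators P1, P2, P3; outputs c1 to c4) and the general merging network with inputs a1, a2, ..., an and b1, b2, ..., bn, in which the final comparators produce outputs c2i and c2i+1, and the outputs run from c1 to c2n.]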
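The recursive structure of the merging network can be sketched in software. The code below assumes the network is the classic odd-even merge: the odd-indexed elements of both lists go to one smaller merger, the even-indexed elements to another, and a final row of comparators combines the two outputs into c2i and c2i+1. The function name `odd_even_merge` is illustrative, and the list length is assumed to be a power of two.

```python
def odd_even_merge(a, b):
    """Merge two sorted lists of equal length n (a power of two) into one."""
    n = len(a)
    if n != len(b) or n == 0 or n & (n - 1):
        raise ValueError("inputs must have equal power-of-two length")
    if n == 1:
        # A single comparator.
        return [min(a[0], b[0]), max(a[0], b[0])]
    # First merger: odd-indexed elements a1, a3, ... and b1, b3, ...
    d = odd_even_merge(a[0::2], b[0::2])
    # Second merger: even-indexed elements a2, a4, ... and b2, b4, ...
    e = odd_even_merge(a[1::2], b[1::2])
    # Additional comparators: c1 = d1, c2n = en, and each remaining pair
    # (d[i+1], e[i]) yields c2i (smaller) and c2i+1 (larger) in 1-based terms.
    c = [d[0]]
    for i in range(n - 1):
        lo, hi = sorted((d[i + 1], e[i]))
        c += [lo, hi]
    c.append(e[n - 1])
    return c


print(odd_even_merge([1, 4, 6, 9], [2, 3, 7, 8]))
```

For n = 2 the recursion uses exactly three comparators, matching P1, P2 and P3 in the small network: two for the sub-mergers and one for the middle pair of outputs.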
Proof of correctness.
and
Consider sequence
Suppose elements of
are in
elements
Then
elements in
elements
,
. These
Plug in
is greater than
Similarly,
is greater than
Then we have
Similarly, consider
So
of
are in
of
are in s, and
is greater than
is greater than
s.
We have
for
Now let
, we have
Since
Then
For
Then
Processors:
Cost:
Time:
Processors:
Cost:
Optimal. The best parallel algorithm: AKS sorting network (CREW model)