
CS-421 Parallel Processing

BE (CIS) Batch 2004-05

Shared Memory Multiprocessor Programming


Section 4.1 is your reading assignment. This section covers a comparison of shared memory and message passing programming. However, some important points of comparison between the two paradigms are emphasized here.
Interface to Communication
o Shared Memory Systems: Communication between processors is implicit and transparent. Processors access memory through the shared bus.
o Message Passing Systems: Processors must explicitly communicate with each other through messages.

Complexity of the Architecture
o Shared Memory Systems: Leverage conventional architecture well, since existing processors can be added to the shared bus system easily.
o Message Passing Systems: Since there are fewer assumptions in the model, the multiprocessing architecture is simpler overall. The catch is that code must be rewritten for new platforms because the interface to communication is explicit.

Convenience
o Shared Memory Systems: Serial code runs without modification.
o Message Passing Systems: Message passing libraries (e.g. MPI, PVM) are available for a wide variety of platforms.

Protocols
o Shared Memory Systems: Processors do not explicitly communicate with each other, so communication protocols are hidden within the system. This means communication can be implemented close to the hardware (or built into the processor), with the shared bus system deciding how to manage communication most efficiently. Since communication occurs as part of the memory system, a smart shared memory architecture can make communication faster by taking advantage of the memory hierarchy.
o Message Passing Systems: Communication protocols are fully under user control. Because these protocols are complex for the programmer, communication is often treated as an I/O call for portability reasons, which can be expensive and slow.
Linear Recurrence Solver

Here's a preliminary version of the MIMD program for the linear recurrence solver:
shared n, a[n, n], x[n], c[n];
private i, j;
for i := 1 step 1 until n-1
    fork DOROW;
i := n;    /* Initial process, i.e. the one that creates other processes, handles i = n */
DOROW: x[i] := c[i];
    for j := 1 step 1 until i-1
        x[i] := x[i] + a[i, j] * x[j];
    join n;

o A private copy of i is received by every spawned process at the time of forking.
o All processes must proceed in such a way that x[k] is available for i > k -------------(A)
o There may be several processes per processor.
o Please note the difference between the SIMD version of the recurrence solver (studied in chapter 3) and the MIMD version above.

Data Sharing & Process Management

o join waits for the forked processes to finish and allows one of them to proceed.
o Pass i by value: the private value of i (of the new process) is set to the current value of i (of the forker).
o Pass shared variables by reference.
o Parameter passing can be made clearer by replacing fork with create to instantiate processes.
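The fork/join mechanics above can be sketched with Python threads standing in for processes; the function names and the squaring stand-in for per-row work are illustrative assumptions, not from the text. Passing i as a thread argument models the private copy of i received at fork time:

```python
import threading

def sketch_fork_join(n):
    """Fork n-1 'processes', each with a private copy of i; the
    initial process handles i = n itself, then joins the rest."""
    results = {}

    def dorow(i):
        # i is this thread's private copy, fixed at fork time
        results[i] = i * i  # stand-in for the real per-row work

    workers = [threading.Thread(target=dorow, args=(i,))
               for i in range(1, n)]
    for t in workers:
        t.start()           # fork DOROW, with i passed by value
    dorow(n)                # the forking process handles i = n
    for t in workers:
        t.join()            # join: wait for all forked processes
    return results
```

With n = 4 this returns {1: 1, 2: 4, 3: 9, 4: 16}: every spawned thread saw a distinct i, which is what passing i by value at fork time guarantees.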
create procedure_name(parameter list)

create is similar to a procedure call except that:

o The created procedure (process) runs in parallel with the creator.
o The creator continues its execution without waiting for the return of the created process.
o A return in a procedure that can be invoked by either a call or a create is interpreted as quit (process termination) if the procedure instance resulted from a create, and as a sequential return if it was called.
o This necessitates an explicit counter, as opposed to the implicit counter in join.

Here's the next version of the MIMD recurrence solver with the above changes incorporated, i.e. (fork, join) replaced by create:
procedure dorow(value i, done, n, a, x, c)
    shared n, a[n, n], x[n], c[n], done;
    private i, j;
    x[i] := c[i];
    for j := 1 step 1 until i-1
        x[i] := x[i] + a[i, j] * x[j];
    done := done - 1;
    return;
end procedure

shared n, a[n, n], x[n], c[n], done;
private i;
done := n;
for i := 1 step 1 until n-1
    create dorow(i, done, n, a, x, c);    /* create n - 1 procedures */
i := n;
call dorow(i, done, n, a, x, c);          /* call the nth one */
while (done > 0) ;                        /* loop until all procedure instances finished */
< code to use x[ ] >

Synchronization

There's no guarantee of (A) in the above programs. For this we need explicit synchronization operations. A synchronization operation delays the progress of one or more processes until some condition is satisfied. This condition may be the:
o Progress of some other process(es) (control-based synchronization)
o Status of some variable (data-based synchronization)

Here we present the next version of the MIMD recurrence solver with a popular form of data-based synchronization called producer-consumer synchronization. But first, we describe producer-consumer synchronization itself.

State
o A 1-bit state (FULL/empty) is associated with the synchronizing variable.

Operations
o produce: waits for the variable to be empty, writes it, and sets its state to FULL.
o consume: waits for the variable to be FULL, reads its value, and sets its state to empty.
o void: initializes the state to empty.
o copy: waits for the variable to be FULL, reads its value, but does not set its state to empty.

Pseudocode Syntax

produce <shared variable> := <expression>
consume <shared variable> into <private variable>
copy <shared variable> into <private variable>
void <shared variable>
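The four operations can be sketched in Python with a condition variable. The class name SyncCell is an assumption; the method names mirror the pseudocode above:

```python
import threading

class SyncCell:
    """A variable with a 1-bit FULL/empty state: a sketch of the
    produce/consume/copy/void operations described above."""
    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def void(self):
        # initialize the state to empty
        with self._cond:
            self._full = False
            self._cond.notify_all()

    def produce(self, value):
        # wait for empty, write the value, set the state to FULL
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def consume(self):
        # wait for FULL, read the value, set the state to empty
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value

    def copy(self):
        # wait for FULL, read the value, but leave the state FULL
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            return self._value
```

Note that wait_for releases the condition's lock while blocked, so a producer and a consumer can rendezvous on the same cell without spinning.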


With the producer-consumer synchronization, the MIMD recurrence solver is presented below.
procedure dorow(value i, done, n, a, x, c)
    shared n, a[n, n], x[n], c[n], done;
    private i, j, sum, priv;
    sum := c[i];
    for j := 1 step 1 until i-1 {
        copy x[j] into priv;              /* Get x[j] when available */
        sum := sum + a[i, j] * priv;
    }
    produce x[i] := sum;                  /* Make x[i] available to others */
    done := done - 1;
    return;
end procedure

shared n, a[n, n], x[n], c[n], done;
private i;
done := n;
for i := 1 step 1 until n-1 {
    void x[i];
    create dorow(i, done, n, a, x, c);
}
i := n;
void x[i];
call dorow(i, done, n, a, x, c);
while (done > 0) ;
< code to use x[ ] >

Atomicity & Synchronization

An atomic operation runs indivisibly to completion. Simultaneous access to done (i.e. decrementing it) by two or more processes may leave an incorrect value in done. Mutual exclusion is one mechanism (but NOT the only one) to implement atomic operations on shared variables:
o Atomic operations on shared variables are kept in a section of code that can be executed by only one process at a time. Such a code section is called a critical section. Execution of a critical section by one process excludes all other processes from executing the critical section, hence ensuring atomicity.
o Critical sections of the same name exclude each other, but may run in parallel with critical sections of a different name.
The final, synchronized, MIMD recurrence solver is presented below.
procedure dorow(value i, done, n, a, x, c)
    shared n, a[n, n], x[n], c[n], done;
    private i, j, sum, priv;
    sum := c[i];
    for j := 1 step 1 until i-1 {
        copy x[j] into priv;
        sum := sum + a[i, j] * priv;
    }
    produce x[i] := sum;
    critical;                             /* lock out other processes */
        done := done - 1;
    end critical;                         /* allow other processes */
    return;
end procedure

shared n, a[n, n], x[n], c[n], done;
private i;
done := n;
for i := 1 step 1 until n-1 {
    void x[i];
    create dorow(i, done, n, a, x, c);
}
i := n;
void x[i];
call dorow(i, done, n, a, x, c);
while (done > 0) ;
< code to use x[ ] >
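The synchronized solver can be sketched in Python under a few stated assumptions: rows are 0-based rather than 1-based, a threading.Event stands in for each x[i]'s FULL bit (sufficient here because each x[i] is produced exactly once and only copied afterwards), and a threading.Lock plays the role of the critical section:

```python
import threading

def solve_recurrence(a, c):
    """Sketch: x[i] = c[i] + sum over j < i of a[i][j] * x[j],
    with one thread per row, as in the MIMD program above."""
    n = len(c)
    x = [None] * n
    full = [threading.Event() for _ in range(n)]   # void: all cells start empty
    done = n
    done_lock = threading.Lock()                   # stands in for the critical section

    def dorow(i):
        nonlocal done
        s = c[i]
        for j in range(i):
            full[j].wait()                         # copy x[j]: wait until FULL
            s += a[i][j] * x[j]
        x[i] = s
        full[i].set()                              # produce x[i]: mark it FULL
        with done_lock:                            # critical
            done -= 1                              # atomic decrement of done
                                                   # end critical

    workers = [threading.Thread(target=dorow, args=(i,)) for i in range(n - 1)]
    for t in workers:
        t.start()                                  # create n-1 processes
    dorow(n - 1)                                   # the caller handles the last row
    while done > 0:                                # busy-wait, as in the pseudocode
        pass
    return x
```

For a strictly lower-triangular a, the result matches the sequential recurrence regardless of the order in which threads are scheduled, because every copy of x[j] blocks until x[j] has been produced.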

There are two kinds of synchronization in this program:

1. Control-based, for the shared variable done
2. Data-based, for the shared variable x[j]

Here the critical section could be replaced by the following data-based synchronization:
consume done into pdone;
produce done := pdone - 1;

where pdone is declared private and done is initialized to FULL with value n. The ability to replace one synchronization with another is quite general, and synchronization methods are chosen on the basis of efficiency and ease of use rather than on the basis of intrinsic capability.
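As a sketch of this replacement, the class below gives done a FULL/empty bit and performs the decrement as a consume followed by a produce; consuming empties the cell, so every other process must wait until the decremented value is produced back. The names FullEmptyCell and countdown are illustrative assumptions, not from the text:

```python
import threading

class FullEmptyCell:
    """A shared variable with a FULL/empty bit; consume/produce give an
    atomic read-modify-write, replacing the critical section (a sketch)."""
    def __init__(self, value):
        self._cond = threading.Condition()
        self._value = value
        self._full = True                  # initialized to FULL, as the text says

    def consume(self):
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value

    def produce(self, value):
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

def countdown(n, workers):
    """Each worker decrements the shared counter once, via
    consume done into pdone; produce done := pdone - 1."""
    done = FullEmptyCell(n)

    def work():
        pdone = done.consume()             # cell is now empty: others must wait
        done.produce(pdone - 1)            # refill with the decremented value

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done.consume()
```

However the workers interleave, no decrement can be lost, which is exactly the atomicity the critical section provided.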

Work Distribution
In the MIMD recurrence solver, every process doesn't perform the same amount of computation. In particular, the process handling i = 1 does no work at all, while the one dealing with i = n does the most. This gives rise to load imbalance, resulting in inefficiency. Let P = number of processes and n = number of x[] values to compute. So far we have assumed P = n; however, n >> P in general. Now a question arises: which values of i are handled by each process? This decision can be made in two ways:
1. Prescheduled Work Distribution

At the time of program creation, the programmer or compiler decides the pattern of assignment (of i values to processes, for instance). There are two methods of prescheduled work distribution. In both methods the values of i are divided into groups of n/P and each group is assigned to one process(or).
a. Block Mapping: if each block contains consecutive values of i, we call it block mapping. However, load imbalance remains.
b. Cyclic Mapping: processor p is assigned the values of i such that p = i mod P, where P is the number of processors.
2. Self-scheduled Work Distribution

The work distribution decision is made at run time. This is achieved by maintaining a shared value of i that is atomically read and incremented by each process whenever it completes work for a previous value of i. This gives rise to dynamic load balancing, because processors doing more computation for their i values get fewer values before the supply of i values runs out.
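Both distribution styles can be sketched in Python. The i values are 0-based here, the function names are illustrative assumptions, and n is assumed divisible by P in the block case:

```python
import threading

def block_mapping(n, P):
    """Prescheduled block mapping: process p gets a block of
    consecutive i values (assumes n divisible by P for simplicity)."""
    size = n // P
    return {p: list(range(p * size, (p + 1) * size)) for p in range(P)}

def cyclic_mapping(n, P):
    """Prescheduled cyclic mapping: process p gets every i with i mod P == p."""
    return {p: [i for i in range(n) if i % P == p] for p in range(P)}

def self_scheduled(n, P, work):
    """Self-scheduling: a shared next-i counter is atomically read and
    incremented; each process grabs the next i when it finishes one."""
    next_i = 0
    lock = threading.Lock()
    assignment = {p: [] for p in range(P)}

    def worker(p):
        nonlocal next_i
        while True:
            with lock:                 # atomic fetch-and-increment of the counter
                if next_i >= n:
                    return             # supply of i values has run out
                i = next_i
                next_i += 1
            assignment[p].append(i)
            work(i)                    # do the actual work for this i

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(P)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignment
```

In the self-scheduled case, which process handles which i varies from run to run; all that is guaranteed, and all that matters, is that every i is handled exactly once, with slower processes naturally taking fewer values.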
******
