
What is Parallel Computing?

Traditionally, software has been written for serial computation:
o To be run on a single computer having a single Central Processing Unit (CPU);
o A problem is broken into a discrete series of instructions;
o Instructions are executed one after another;
o Only one instruction may execute at any moment in time.

In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
o To be run using multiple CPUs;
o A problem is broken into discrete parts that can be solved concurrently;
o Each part is further broken down to a series of instructions;
o Instructions from each part execute simultaneously on different CPUs.

The compute resources can include:
o A single computer with multiple processors;
o An arbitrary number of computers connected by a network;
o A combination of both.

The computational problem usually demonstrates characteristics such as the ability to:
o Be broken apart into discrete pieces of work that can be solved simultaneously;
o Execute multiple program instructions at any moment in time;
o Be solved in less time with multiple compute resources than with a single compute resource.
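The decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_into_parts` and `parallel_sum` are hypothetical, and the example uses threads for portability. In standard CPython, threads do not speed up CPU-bound work because of the global interpreter lock; replacing `ThreadPoolExecutor` with `ProcessPoolExecutor` is the usual way to use several CPUs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(data, num_parts):
    """Break the problem into roughly equal, independent pieces."""
    size = max(1, -(-len(data) // num_parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data, num_workers=4):
    """Sum `data` by solving each part concurrently, then combining."""
    if not data:
        return 0
    parts = split_into_parts(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each worker runs its own series of instructions on its own part.
        partial_sums = list(pool.map(sum, parts))
    # The partial results are combined serially at the end.
    return sum(partial_sums)


total = parallel_sum(list(range(1, 101)))
print(total)
```

The final combining step is serial, which is one reason the speedup from adding workers is limited in practice.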

The Universe is Parallel: Parallel computing is an evolution of serial computing that attempts to emulate what has always been the state of affairs in the natural world: many complex, interrelated events happening at the same time, yet within a sequence. For example:
o Galaxy formation
o Planetary movement
o Weather and ocean patterns
o Tectonic plate drift
o Rush hour traffic
o Automobile assembly line
o Building a space shuttle
o Ordering a hamburger at the drive through

The Real World is Massively Parallel

Uses for Parallel Computing:

Historically, parallel computing has been considered to be "the high end of computing", and has been used to model difficult scientific and engineering problems found in the real world. Some examples:
o Atmosphere, Earth, Environment
o Physics - applied, nuclear, particle, condensed matter, high pressure, fusion, photonics
o Bioscience, Biotechnology, Genetics
o Chemistry, Molecular Sciences
o Geology, Seismology
o Mechanical Engineering - from prosthetics to spacecraft
o Electrical Engineering, Circuit Design, Microelectronics
o Computer Science, Mathematics

Today, commercial applications provide an equal or greater driving force in the development of faster computers. These applications require the processing of large amounts of data in sophisticated ways. For example:
o Databases, data mining
o Oil exploration
o Web search engines, web-based business services
o Medical imaging and diagnosis
o Pharmaceutical design
o Management of national and multi-national corporations
o Financial and economic modeling
o Advanced graphics and virtual reality, particularly in the entertainment industry
o Networked video and multi-media technologies
o Collaborative work environments

Why Use Parallel Computing?


Main Reasons:

Save time and/or money: In theory, throwing more resources at a task will shorten its time to completion, with potential cost savings. Parallel clusters can be built from cheap, commodity components.

Solve larger problems: Many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory. For example:
o "Grand Challenge" (en.wikipedia.org/wiki/Grand_Challenge) problems requiring PetaFLOPS and PetaBytes of computing resources.
o Web search engines/databases processing millions of transactions per second.

Provide concurrency: A single compute resource can only do one thing at a time. Multiple computing resources can be doing many things simultaneously. For example, the Access Grid (www.accessgrid.org) provides a global collaboration network where people from around the world can meet and conduct work "virtually".

Use of non-local resources: Using compute resources on a wide area network, or even the Internet, when local compute resources are scarce. For example:
o SETI@home (setiathome.berkeley.edu) uses over 330,000 computers for a compute power of over 528 TeraFLOPS (as of August 04, 2008).
o Folding@home (folding.stanford.edu) uses over 340,000 computers for a compute power of 4.2 PetaFLOPS (as of November 4, 2008).

Limits to serial computing: Both physical and practical reasons pose significant constraints to simply building ever faster serial computers:
o Transmission speeds - the speed of a serial computer is directly dependent upon how fast data can move through hardware. Absolute limits are the speed of light (30 cm/nanosecond) and the transmission limit of copper wire (9 cm/nanosecond). Increasing speeds necessitate increasing proximity of processing elements.
o Limits to miniaturization - processor technology is allowing an increasing number of transistors to be placed on a chip. However, even with molecular or atomic-level components, a limit will be reached on how small components can be.
o Economic limitations - it is increasingly expensive to make a single processor faster. Using a larger number of moderately fast commodity processors to achieve the same (or better) performance is less expensive.
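To make the transmission-speed limit concrete, the short calculation below estimates how far a signal can travel during one clock cycle. The two speeds are the figures quoted in the text; the clock rates (1 GHz and 3 GHz) and the function name are illustrative assumptions, not values from the source.

```python
SPEED_OF_LIGHT_CM_PER_NS = 30.0  # figure quoted in the text
COPPER_LIMIT_CM_PER_NS = 9.0     # figure quoted in the text


def max_distance_per_cycle_cm(speed_cm_per_ns, clock_ghz):
    """Farthest a signal can travel during one clock cycle."""
    cycle_time_ns = 1.0 / clock_ghz  # a 1 GHz clock has a 1 ns cycle
    return speed_cm_per_ns * cycle_time_ns


for clock_ghz in (1.0, 3.0):  # illustrative clock rates, not from the text
    light = max_distance_per_cycle_cm(SPEED_OF_LIGHT_CM_PER_NS, clock_ghz)
    copper = max_distance_per_cycle_cm(COPPER_LIMIT_CM_PER_NS, clock_ghz)
    print(f"{clock_ghz:.0f} GHz: light {light:.1f} cm, copper {copper:.1f} cm")
```

Under these assumptions, tripling the clock rate cuts the reachable distance per cycle to a third, which is why faster clocks force processing elements closer together.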

Current computer architectures are increasingly relying upon hardware level parallelism to improve performance:
o Multiple execution units
o Pipelined instructions
o Multi-core

RAM model
Random Access Machine is a favorite model of a sequential computer. Its main features are:
1. Computation unit with a user defined program.
2. Read-only input tape and write-only output tape.
3. Unbounded number of local memory cells.
4. Each memory cell is capable of holding an integer of unbounded size.
5. Instruction set includes operations for moving data between memory cells, comparisons and conditional branches, and simple arithmetic operations.
6. Execution starts with the first instruction and ends when a HALT instruction is executed.
7. All operations take unit time regardless of the lengths of operands.
8. Time complexity = the number of instructions executed.
9. Space complexity = the number of memory cells accessed.
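The features listed above can be illustrated with a tiny interpreter. This is a minimal sketch, not a standard RAM definition: the instruction names (`READ`, `WRITE`, `SET`, `ADD`, `SUB`, `JGTZ`, `JUMP`, `HALT`) and the sample program are assumptions chosen for the example. The interpreter counts executed instructions as time and distinct accessed cells as space.

```python
from collections import defaultdict


def run_ram(program, input_tape):
    """Run `program` and return (output_tape, time, space)."""
    memory = defaultdict(int)  # unbounded cells holding unbounded integers
    touched = set()            # cells accessed, for space complexity
    inputs = iter(input_tape)  # read-only input tape
    output = []                # write-only output tape
    pc = 0                     # execution starts with the first instruction
    steps = 0

    def cell(addr):
        touched.add(addr)
        return addr

    while True:
        op, *args = program[pc]
        steps += 1             # every operation takes unit time
        pc += 1
        if op == "HALT":
            break
        elif op == "READ":     # READ dst: next input item -> memory[dst]
            memory[cell(args[0])] = next(inputs)
        elif op == "WRITE":    # WRITE src: memory[src] -> output tape
            output.append(memory[cell(args[0])])
        elif op == "SET":      # SET dst, constant
            memory[cell(args[0])] = args[1]
        elif op == "ADD":      # ADD dst, a, b
            memory[cell(args[0])] = memory[cell(args[1])] + memory[cell(args[2])]
        elif op == "SUB":      # SUB dst, a, b
            memory[cell(args[0])] = memory[cell(args[1])] - memory[cell(args[2])]
        elif op == "JGTZ":     # JGTZ src, target: branch if memory[src] > 0
            if memory[cell(args[0])] > 0:
                pc = args[1]
        elif op == "JUMP":     # JUMP target: unconditional branch
            pc = args[0]
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return output, steps, len(touched)


# Hypothetical program: read a count n, then n items, and output their sum.
SUM_PROGRAM = [
    ("READ", 0),        # 0: cell 0 = n
    ("SET", 1, 0),      # 1: cell 1 = running sum
    ("SET", 2, 1),      # 2: cell 2 = constant 1
    ("JGTZ", 0, 6),     # 3: if n > 0, go to the loop body
    ("WRITE", 1),       # 4: output the sum
    ("HALT",),          # 5
    ("READ", 3),        # 6: cell 3 = next item
    ("ADD", 1, 1, 3),   # 7: sum += item
    ("SUB", 0, 0, 2),   # 8: n -= 1
    ("JUMP", 3),        # 9: back to the loop test
]

result, time_used, space_used = run_ram(SUM_PROGRAM, [3, 10, 20, 30])
print(result, time_used, space_used)
```

For this program the time grows linearly with the number of input items while the space stays at four cells, regardless of how large the integers become.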

PRAM model
Parallel Random Access Machine is a straightforward and natural generalization of RAM. It is an idealized model of a shared memory SIMD machine. Its main features are:
1. Unbounded collection of numbered RAM processors P0, P1, P2, ... (without tapes).
2. Unbounded collection of shared memory cells M[0], M[1], M[2], ....
3. Each Pi has its own (unbounded) local memory (registers) and knows its index i.
4. Each processor can access any shared memory cell (unless there is an access conflict, see further) in unit time.
5. Input of a PRAM algorithm consists of n items stored in (usually the first) n shared memory cells.
6. Output of a PRAM algorithm consists of n' items stored in n' shared memory cells.
7. PRAM instructions execute in 3-phase cycles:
   1. Read (if any) from a shared memory cell.
   2. Local computation (if any).
   3. Write (if any) to a shared memory cell.
8. Processors execute these 3-phase PRAM instructions synchronously.
9. Special assumptions have to be made about R-R and W-W shared memory access conflicts.
10. The only way processors can exchange data is by writing into and reading from memory cells.
11. P0 has a special activation register specifying the maximum index of an active processor. Initially, only P0 is active; it computes the number of required active processors and loads this register, and then the other corresponding processors start executing their programs.
12. Computation proceeds until P0 halts, at which time all other active processors are halted.
13. Parallel time complexity = the time elapsed for P0's computation.
14. Space complexity = the number of shared memory cells accessed.

PRAM is an attractive and important model for designers of parallel algorithms. Why?
1. It is natural: the number of operations executed per one cycle on p processors is at most p.
2. It is strong: any processor can read or write any shared memory cell in unit time.
3. It is simple: it abstracts from any communication or synchronization overhead, which makes the complexity and correctness analysis of PRAM algorithms easier.
Therefore,
4. It can be used as a benchmark: if a problem has no feasible/efficient solution on PRAM, it has no feasible/efficient solution on any parallel machine.
5. It is useful: it is an idealization of existing (and nowadays more and more abundant) shared memory parallel machines.
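The synchronous 3-phase cycle can be simulated sequentially to see how it behaves. The sketch below is an assumption-laden illustration: the helper names `pram_cycle` and `erew_sum` are hypothetical, the activation register is not modeled, and access conflicts are resolved in the strictest way (any two processors touching the same cell in the same phase is treated as an error). The example sums n values in about log n cycles.

```python
def _check_exclusive(addresses, phase):
    """Reject R-R or W-W conflicts (an exclusive-access assumption)."""
    used = [a for a in addresses if a is not None]
    if len(used) != len(set(used)):
        raise ValueError(f"{phase} conflict: two processors accessed one cell")


def pram_cycle(shared, local, procs, read_addr, compute, write_addr):
    """Run one 3-phase instruction on all processors in `procs` at once."""
    # Phase 1: every processor reads (if any) from a shared memory cell.
    reads = {i: read_addr(i) for i in procs}
    _check_exclusive(reads.values(), "read")
    fetched = {i: (shared[a] if a is not None else None)
               for i, a in reads.items()}
    # Phase 2: local computation on the processor's own register.
    for i in procs:
        local[i] = compute(i, local.get(i), fetched[i])
    # Phase 3: every processor writes (if any) to a shared memory cell.
    writes = {i: write_addr(i) for i in procs}
    _check_exclusive(writes.values(), "write")
    for i, a in writes.items():
        if a is not None:
            shared[a] = local[i]


def erew_sum(values):
    """Sum a non-empty list; returns (sum, number of PRAM cycles)."""
    n = len(values)
    shared = list(values)  # input sits in the first n shared cells
    local = {}             # one register per processor
    # First cycle: processor i loads M[i] into its register.
    pram_cycle(shared, local, range(n),
               lambda i: i, lambda i, reg, v: v, lambda i: None)
    cycles = 1
    stride = 1
    while stride < n:
        procs = [i for i in range(0, n, 2 * stride) if i + stride < n]
        # Processor i adds M[i + stride] to its register and stores it in M[i].
        pram_cycle(shared, local, procs,
                   lambda i: i + stride,
                   lambda i, reg, v: reg + v,
                   lambda i: i)
        cycles += 1
        stride *= 2
    return shared[0], cycles


total, cycles = erew_sum([3, 1, 4, 1, 5, 9, 2, 6])
print(total, cycles)
```

Because all reads finish before any write begins, the result does not depend on the order in which the simulator visits the processors, which mirrors the synchronous execution the model assumes.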

Simulation From One PRAM Model To Other

PREFIX SUM

The sequential for loop executes ⌈log n⌉ times. Hence, the overall execution time is O(log n).
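The algorithm listing for this section did not survive in this copy, so the following is a sketch of one well-known scheme whose outer loop runs ⌈log n⌉ times: the doubling scan (often attributed to Hillis and Steele). It may differ from the algorithm the notes originally showed, and the function name is hypothetical. Each round is simulated by building a new array from the old one, so every "processor" reads values from before the round, as in a synchronous PRAM step.

```python
def parallel_prefix_sum(values):
    """Return (prefix sums, number of rounds) using the doubling scan."""
    x = list(values)
    n = len(x)
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    rounds = max(n - 1, 0).bit_length()
    for d in range(rounds):  # the sequential loop
        stride = 1 << d
        # All processors i >= stride act in the same step: each adds the
        # value `stride` positions to its left, read from the old array.
        x = [x[i] + x[i - stride] if i >= stride else x[i]
             for i in range(n)]
    return x, rounds


prefix, rounds = parallel_prefix_sum([3, 1, 4, 1, 5, 9, 2, 6])
print(prefix, rounds)
```

With one processor per element, each round takes constant time, so the ⌈log n⌉ rounds give logarithmic parallel time. The scheme performs on the order of n log n additions in total, more than the n - 1 a sequential scan needs.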

List Ranking Algorithm

Merging Two Sorted List

Cost Optimal Parallel Algorithms
