CPD 02

Introduction to Metrics, Applications and Architectures
Parallel and Distributed Computing
Department of Computer Science and Engineering (DEI)
Instituto Superior Técnico
September 14, 2011
José Monteiro (DEI / IST)
Parallel and Distributed Computing 2
2011-11-14
Outline
Simple Example: Opportunities for Parallelism
Speedup and Overheads
Application Areas
Parallel Systems
Simple Example: Opportunities for Parallelism

x = initX(A, B);
y = initY(A, B);
z = initZ(A, B);
for(i = 0; i < N_ENTRIES; i++)
    x[i] = compX(y[i], z[i]);
for(i = 1; i < N_ENTRIES; i++){
    x[i] = solveX(x[i-1]);
    z[i] = x[i] + y[i];
}
finalize1(&x, &y, &z);
finalize2(&x, &y, &z);
finalize3(&x, &y, &z);
Functional Parallelism
Data Parallelism
Pipelining
No good?
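To make the data-parallel case concrete: in the first loop of the example, each x[i] depends only on y[i] and z[i], so the index range can be cut into chunks that are computed in any order. In the second loop, x[i] depends on x[i-1], so the same trick does not apply directly. The following is a minimal sketch of a block decomposition, not the course's implementation. The body of compX and the helpers comp_chunk, chunk_bounds and comp_all_chunked are hypothetical names introduced here for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the slide's compX(); any pure function works. */
static double compX(double y, double z) { return 2.0 * y + z; }

/* Work assigned to one worker: the half-open index range [lo, hi). */
void comp_chunk(double *x, const double *y, const double *z,
                size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++)
        x[i] = compX(y[i], z[i]);
}

/* Block decomposition: worker w (0 <= w < p) gets a contiguous slice of n. */
void chunk_bounds(size_t n, size_t p, size_t w, size_t *lo, size_t *hi)
{
    *lo = (w * n) / p;
    *hi = ((w + 1) * n) / p;
}

/* Runs the p chunks one after another, deliberately in reverse order, to
 * stress that no chunk depends on another. On a parallel machine each call
 * to comp_chunk would be handed to its own thread or process instead. */
void comp_all_chunked(double *x, const double *y, const double *z,
                      size_t n, size_t p)
{
    for (size_t w = p; w-- > 0; ) {
        size_t lo, hi;
        chunk_bounds(n, p, w, &lo, &hi);
        comp_chunk(x, y, z, lo, hi);
    }
}
```

Calling comp_all_chunked(x, y, z, n, p) fills x with the same values as a single serial pass comp_chunk(x, y, z, 0, n), whatever the chunk order.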
How Much Faster?

Speedup
$S = \frac{t_{serial}}{t_{parallel}}$
Ideal speedup with p processors? $S = p$ (that is, $t_{parallel} = \frac{t_{serial}}{p}$)
Expected speedup? $S < p$
Can't we get superlinear speedup, $S > p$? Yes!
increased efficiency in memory access
some specific problems (for example, search)
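As a small illustration of the definitions above, the sketch below computes the speedup from two measured run times and compares it with the processor count. The function names speedup and classify_speedup are hypothetical, and the timings in the usage note are made-up inputs, not measurements.

```c
/* Speedup S = t_serial / t_parallel (both times in the same unit). */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Compares a speedup s with the processor count p:
 * returns -1 for sublinear (s < p), 0 for ideal (s == p),
 * and +1 for superlinear (s > p). */
int classify_speedup(double s, int p)
{
    if (s < (double)p) return -1;
    if (s > (double)p) return 1;
    return 0;
}
```

For example, a run taking 100 s serially and 30 s on 4 processors has a speedup of about 3.3, which classify_speedup reports as sublinear; the same program finishing in 20 s on 4 processors would be superlinear.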
Limitations for Ideal Speedup

Overheads that limit parallel speedup?
data transfers (or more generally, communication among tasks)
task startup / finalize
load balancing
inherent sequential portions of computation
Effect of Sequential Fraction

$t_{parallel} = f \, t_{serial} + (1 - f) \, \frac{t_{serial}}{p}$

$S(p, f) = \frac{1}{f + \frac{1 - f}{p}}$

$\lim_{p \to \infty} S(p, f) = \frac{1}{f}$

[Figure: speedup curves for f = 0%, 5%, 10%, 20%]
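The formula above is easy to evaluate numerically. The following sketch assumes f is the sequential fraction, given as a value in [0, 1], and p is the processor count. The function names amdahl_speedup and amdahl_limit are hypothetical, chosen here for illustration.

```c
/* Speedup with sequential fraction f (0 <= f <= 1) on p processors:
 * S(p, f) = 1 / (f + (1 - f) / p). */
double amdahl_speedup(double f, double p)
{
    return 1.0 / (f + (1.0 - f) / p);
}

/* Upper bound on the speedup as p grows without limit: 1 / f.
 * Assumes f > 0; with f == 0 there is no finite bound. */
double amdahl_limit(double f)
{
    return 1.0 / f;
}
```

With 16 processors, the four fractions shown in the figure give speedups of 16 (f = 0%), about 9.1 (f = 5%), 6.4 (f = 10%) and 4 (f = 20%). With f = 5% the speedup can never exceed 20, no matter how many processors are added.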
Difficulties of Parallel Programming
Algorithm development is harder

define and coordinate concurrent tasks
Software programming is more complex

low-level parallel directives
debugging is significantly more difficult
lack of programming models and environments
Rapid pace of change in computer system architecture

a parallel algorithm may not be efficient for the next generation of parallel computers
Application Areas
Why bother with parallel computation?
Continued demand for greater computational power from many different domains!
Two major classes of problems in parallel computation:
Grand Challenge problems

Problems that cannot be solved in a reasonable amount of time with today's computers.
Embarrassingly Parallel problems

Problems whose workload can be easily divided into (almost) independent tasks.
Grand Challenge problems
Global Environmental/Ecosystem Modeling
Biomechanics and biomedical imaging
Fluid dynamics
Molecular nanotechnology
Nuclear power and weapons simulations
Embarrassingly Parallel problems
Numerical weather forecasting
Computer graphics / animation
Basic Local Alignment Search Tool (BLAST) in bioinformatics
Monte-Carlo methods
Genetic algorithms
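A Monte-Carlo estimate of pi shows what "(almost) independent tasks" looks like in practice. In the sketch below each task draws its own random points and only the final hit counts are combined. Everything here is an illustrative assumption rather than course material: the function names count_hits and estimate_pi, the per-task seeding, and the simple 64-bit linear congruential generator, which keeps the block self-contained.

```c
#include <stdint.h>

/* One independent task: count random points of the unit square that fall
 * inside the quarter circle of radius 1. The generator state is local to
 * the task, so tasks share no data while they run. */
uint64_t count_hits(uint64_t seed, uint64_t samples)
{
    uint64_t state = seed, hits = 0;
    for (uint64_t i = 0; i < samples; i++) {
        /* 64-bit linear congruential step; the top 53 bits give [0, 1). */
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        double x = (double)(state >> 11) / 9007199254740992.0;
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        double y = (double)(state >> 11) / 9007199254740992.0;
        if (x * x + y * y <= 1.0)
            hits++;
    }
    return hits;
}

/* Combines the tasks. They run sequentially here; since they are
 * independent, each call could run on a different processor and only the
 * final sum would need communication. */
double estimate_pi(unsigned tasks, uint64_t samples_per_task)
{
    uint64_t total = 0;
    for (unsigned t = 0; t < tasks; t++)
        total += count_hits((uint64_t)t + 1, samples_per_task);
    return 4.0 * (double)total / ((double)tasks * (double)samples_per_task);
}
```

The only interaction between tasks is the final addition of hit counts, which is why problems of this kind scale so easily.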
Example: Weather Forecasting
The atmosphere is modeled by dividing it into three-dimensional cells (e.g., $1\,\text{km}^3$).
Time is discretized into intervals (1 second, 1 minute, 1 hour).
Atmospheric conditions (temperature, pressure, humidity, etc.) for each cell are computed as a function of the conditions of neighboring cells in this and previous time intervals.

For the forecast of continental Portugal, take an area of $1000\,\text{km} \times 500\,\text{km} = 5 \times 10^5\,\text{km}^2$.
Assuming an atmosphere height of 50 km, there are $25 \times 10^6$ cells.
If each cell takes 200 floating point operations, we require a total of $5 \times 10^9$ operations for each time interval.
If the second is the time interval, and we want to compute the forecast for tomorrow (almost $10^5$ seconds in a day), there are a total of $5 \times 10^{14}$ operations.
An Intel Pentium IV 3.2 GHz performs at 3 GFLOPS, hence taking about 46 hours with these rounded figures (about 40 hours with the exact 86,400 seconds in a day).
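The estimate can be reproduced with a few lines of arithmetic. The sketch below assumes cells of 1 km^3, as in the model description, and takes the slide's figures as inputs. The function names forecast_total_flop and hours_at are hypothetical.

```c
/* Total floating point operations for a forecast, assuming 1 km^3 cells:
 * cells = area * height, then one update per cell per time step. */
double forecast_total_flop(double area_km2, double height_km,
                           double flop_per_cell, double time_steps)
{
    double cells = area_km2 * height_km;
    return cells * flop_per_cell * time_steps;
}

/* Hours needed to execute total_flop at a sustained rate of
 * flop_per_second (3e9 for the 3 GFLOPS processor in the text). */
double hours_at(double total_flop, double flop_per_second)
{
    return total_flop / flop_per_second / 3600.0;
}
```

With the rounded 10^5 time steps, hours_at(forecast_total_flop(5e5, 50, 200, 1e5), 3e9) gives roughly 46 hours. With 86,400 one-second steps it gives 40 hours. Either way, the forecast for tomorrow would finish after tomorrow.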
Example: n-Body Problem
Each body has a given position, velocity, and acceleration that need to be computed for every time interval.
Each body attracts (and/or repels) every other body. For n bodies, there are a total of $n^2$ interactions that need to be accounted for.
Example: a galaxy has more than $10^{11}$ stars, leading to more than $10^{22}$ floating point operations for each time interval!
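The quadratic cost comes directly from the double loop over bodies. The sketch below is a deliberately simplified, one-dimensional direct-sum version, assuming Newtonian attraction and bodies at distinct positions. The function name accelerations_1d and the parameter g (the gravitational constant, in whatever units the caller uses) are illustrative choices, not part of the original material.

```c
#include <stddef.h>

/* Direct-sum accelerations for n bodies on a line (1-D simplification).
 * pos and mass are inputs, acc receives the result. Assumes all positions
 * are distinct. Returns the number of pairwise interactions evaluated,
 * which is n * (n - 1), i.e. on the order of n^2. */
size_t accelerations_1d(size_t n, const double *pos, const double *mass,
                        double g, double *acc)
{
    size_t interactions = 0;
    for (size_t i = 0; i < n; i++) {
        acc[i] = 0.0;
        for (size_t j = 0; j < n; j++) {
            if (j == i)
                continue;              /* a body does not act on itself */
            double dx = pos[j] - pos[i];
            double dist = dx < 0.0 ? -dx : dx;
            /* magnitude g * m_j / dist^2, directed from body i to body j */
            acc[i] += g * mass[j] * dx / (dist * dist * dist);
            interactions++;
        }
    }
    return interactions;
}
```

Each iteration of the outer loop only writes acc[i] and only reads pos and mass, so the outer loop can be divided among processors. The total work still grows quadratically with n.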
Processor Evolution
Supercomputer Evolution
Top 10 Supercomputers (June 2011)
Projected Performance Development
First petaFLOPS system available in 2009!
Estimate of the human brain's computational power: $10^{14}$ neural connections at 200 calculations per second $\Rightarrow$ 20 PFLOPS
Types of Supercomputers
Processor Arrays (SIMD)

Name associated with vector processing, very popular in early supercomputers.
Multicore (SMP)

Set of processors sharing a common main memory.
Massively Parallel Processors (MPP)

Processors with individual main memory with tightly coupled interconnections.
Clusters

Processors with individual main memory linked together using InfiniBand, Quadrics, Myrinet, or Gigabit Ethernet connections.
COW / NOW: Cluster / Network Of Workstations
Beowulf: cluster made of PCs running Linux using TCP/IP (COTS: Commodity-Off-The-Shelf)
Constellation

MPP / cluster where each node is a multicore.
Evolution of Types of Supercomputers
Evolution of Dominant Companies
Hierarchy of Computational Power
Warehouse-size Computers
Multicores
Sample of today's multicore processors:
AMD

Opteron: dual, quad, hex, 8-, 12-cores
Phenom: dual, quad, hex cores
Intel

Core i7: six hyperthreaded cores
Dunnington (Xeon): six cores
Sun

Niagara: 8 cores; 8-way fine-grain multithreading per core
IBM

Power 7: dual, quad, hex, 8-core
Cell: 1 PPC core; 8 SPEs w/ SIMD parallelism
Next Class
technologies for parallel programming
models for computer architecture
review of computer architecture
levels of parallelism
