
Table of Contents

UNIT I
MULTI-CORE PROCESSORS
1.1 Single-Core to Multi-Core Architectures
    1.1.1 Introduction
    1.1.2 Single-Core Processor
    1.1.3 Multi-Core Processor
    1.1.4 Development
    1.1.5 Technical Factors
    1.1.6 Advantages
1.2 SIMD and MIMD Systems
    1.2.1 SIMD Systems
    1.2.2 MIMD Systems
1.3 Interconnection Networks
    1.3.1 Shared-Memory Interconnects
    1.3.2 Distributed-Memory Interconnects
1.4 Symmetric and Distributed Shared Memory Architectures
    1.4.1 Symmetric Shared Memory Architectures
    1.4.2 Distributed Shared Memory Architectures
1.5 Cache Coherence
    1.5.1 Snooping Cache Coherence
    1.5.2 Directory-Based Cache Coherence
    1.5.3 False Sharing
1.6 Performance Issues
    1.6.1 Speedup and Efficiency
    1.6.2 Amdahl's Law
    1.6.3 Scalability
    1.6.4 Taking Timings
1.7 Parallel Program Design
    1.7.1 An Example

Short Answers
Long Answers

UNIT 2
PARALLEL PROGRAM CHALLENGES
    2.1.1 Defining Performance
    2.1.2 Understanding Algorithmic Complexity
    2.1.3 Why Algorithmic Complexity Is Important
    2.1.4 Using Algorithmic Complexity with Care
    2.1.5 How Structure Impacts Performance
    2.1.6 The Impact of Data Structures on Performance
2.2 Scalability
    2.2.1 Constraints to Application Scaling
    2.2.2 Performance Limited by Serial Code
    2.2.3 Superlinear Scaling
    2.2.4 Scaling of Library Code
    2.2.5 Hardware Constraints to Scaling
    2.2.6 Operating System Constraints to Scaling
    2.2.7 Multicore Processors and Scaling
2.3 Synchronization and Data Sharing
2.4 Data Races
    2.4.1 Using Tools to Detect Data Races
    2.4.2 Avoiding Data Races
2.5 Synchronization Primitives (Mutexes, Locks, Semaphores, Barriers)
    2.5.1 Mutexes and Critical Regions
    2.5.2 Spin Locks
    2.5.3 Semaphores
    2.5.4 Readers-Writer Locks
    2.5.5 Barriers
2.6 Deadlocks and Livelocks
2.7 Communication Between Threads (Condition Variables, Signals, Message Queues, and Pipes)
    2.7.1 Memory, Shared Memory, and Memory-Mapped Files
    2.7.2 Condition Variables
    2.7.3 Signals and Events
    2.7.4 Message Queues
    2.7.5 Named Pipes
    2.7.6 Communication Through the Network Stack
    2.7.7 Other Approaches to Sharing Data Between Threads
    2.7.8 Storing Thread-Private Data

Short Answers
Long Answers

UNIT 3
SHARED MEMORY PROGRAMMING WITH OpenMP
3.1 OpenMP Execution Model
3.2 Memory Model
    3.2.1 Structure of the OpenMP Memory Model
    3.2.2 Device Data Environments
    3.2.3 The Flush Operation
    3.2.4 OpenMP Memory Consistency
3.3 OpenMP Directives
    3.3.1 Directive Format
    3.3.2 Conditional Compilation
    3.3.3 Internal Control Variables
    3.3.4 Array Sections
    3.3.5 Parallel Construct
    3.3.6 Canonical Loop Form
3.4 Work-Sharing Constructs
    3.4.1 Loop Construct
    3.4.2 Sections Construct
    3.4.3 Single Construct
    3.4.4 workshare Construct
3.5 Library Functions
    3.5.1 Runtime Library Definitions
    3.5.2 Execution Environment Routines
    3.5.3 Lock Routines
    3.5.4 Timing Routines
3.6 Handling Data and Functional Parallelism
    3.6.1 General Data Parallelism
    3.6.2 Functional Parallelism
3.7 Handling Loops
    3.7.1 Parallel for Pragma
    3.7.2 Function omp_get_num_procs
3.8 Performance Considerations
    3.8.1 Inverting Loops
    3.8.2 Conditionally Executing Loops
    3.8.3 Scheduling Loops
Short Answers
Long Answers

UNIT 4
DISTRIBUTED MEMORY PROGRAMMING WITH MPI
4.1 Introduction
4.2 MPI Program Execution
    4.2.1 Compilation and Execution
    4.2.2 MPI Programs
    4.2.3 MPI_Init and MPI_Finalize
    4.2.4 Communicators, MPI_Comm_size and MPI_Comm_rank
    4.2.5 SPMD Programs
    4.2.6 Communication
    4.2.7 Message Matching
    4.2.8 The status_p Argument
4.3 MPI Constructs
    4.3.1 Datatype Constructors
    4.3.2 Subarray Datatype Constructor
    4.3.3 Distributed Array Datatype Constructor
    4.3.4 Cartesian Constructor
    4.3.5 Distributed Graph Constructor
4.4 Libraries
    4.4.1 Contexts of Communication
    4.4.2 Groups of Processes
    4.4.3 Virtual Topologies
    4.4.4 Attribute Caching
    4.4.5 Communicators
4.5 MPI Send and Receive
    4.5.1 MPI_Send
    4.5.2 MPI_Recv
    4.5.3 Semantics of MPI_Send and MPI_Recv
4.6 Point-to-Point and Collective Communication
    4.6.1 Point-to-Point Communication
    4.6.2 Collective Communication
4.7 MPI Derived Datatypes
4.8 Performance Evaluation
    4.8.1 Taking Timings
    4.8.2 Results
    4.8.3 Speedup and Efficiency
    4.8.4 Scalability
Short Answers
Long Answers

UNIT 5
PARALLEL PROGRAM DEVELOPMENT CASE STUDIES
5.1 n-Body Solvers
    5.1.1 The Problem
    5.1.2 Two Serial Programs
    5.1.3 Parallelizing the n-Body Solvers
    5.1.4 A Word About I/O
    5.1.5 Parallelizing the Basic Solver Using OpenMP
    5.1.6 Parallelizing the Reduced Solver Using OpenMP
    5.1.7 Evaluating the OpenMP Codes
    5.1.8 Parallelizing the Solvers Using Pthreads
    5.1.9 Parallelizing the Basic Solver Using MPI
    5.1.10 Parallelizing the Reduced Solver Using MPI
    5.1.11 Performance of the MPI Solvers
5.2 Tree Search
    5.2.1 Recursive Depth-First Search
    5.2.2 Nonrecursive Depth-First Search
    5.2.3 Data Structures for the Serial Implementations
    5.2.4 Performance of the Serial Implementations
    5.2.5 Parallelizing Tree Search
    5.2.6 A Static Parallelization of Tree Search Using Pthreads
    5.2.7 A Dynamic Parallelization of Tree Search Using Pthreads
    5.2.8 Evaluating the Pthreads Tree-Search Programs
    5.2.9 Parallelizing the Tree-Search Programs Using OpenMP
    5.2.10 Performance of the OpenMP Implementations
5.3 OpenMP and MPI Implementations and Comparison
    5.3.1 Implementation of Tree Search Using MPI and Static Partitioning
    5.3.2 Implementation of Tree Search Using MPI and Dynamic Partitioning
Short Answers
Long Answers