Introduction To Parallel Processing

I. Parallel computer systems can be categorized into three structural phase: 1. Pipelined computers 2. Array processors 3.
Multiprocessor systems
In computing, a Pipeline is a set of data processing elements connected in series, so that the output of one element is the input of next one. The elements of a pipeline are often executed in parallel or in time-sliced-fashion. II. Computer related pipelines include: 1. Instruction pipelines 2. Graphics pipelines 3. Software pipelines Buffered, Synchronous pipelines: Conventional microprocessors are synchronous circuits that use buffered, synchronous pipelines. In these pipelines, "pipeline registers" are inserted inbetween pipeline stages, and are clocked synchronously. The time between each clock signal is set to be greater than the longest delay between pipeline stages, so that when the registers are clocked, the data that is written to them is the final result of the previous stage. Buffered, Asynchronous pipelines: Asynchronous pipelines are used in asynchronous circuits, and have their pipeline registers clocked asynchronously. Generally speaking, they use a request/acknowledge system, wherein each stage can detect when it's "finished". When a stage is finished and the next stage has sent it a "request" signal, the stage sends an "acknowledge" signal to the next stage, and a "request" signal to the previous stage. When a stage receives an "acknowledge" signal, it clocks its input registers, thus reading in the data from the previous stage. The AMULET microprocessor is an example of a microprocessor that uses buffered, asynchronous pipelines. Unbuffered pipelines: Unbuffered pipelines, called "wave pipelines", do not have registers in-between pipeline stages. Instead, the delays in the pipeline are "balanced" so that, for each stage, the difference between the first stabilized output data and the last is minimized. Thus, data flows in "waves" through the pipeline, and each wave is kept as short (synchronous) as possible.
The maximum rate that data can be fed into a wave pipeline is determined by the maximum difference in delay between the first piece of data coming out of the pipe and the last piece of data, for any given wave. If data is fed in faster than this, it is possible for waves of data to interfere with each other.
III.An Instruction pipeline is a technique used in the design of computers and other electronic devices to increase their instruction throughput (the number of instruction that can be executed in a unit of time). Pipelining doesnt reduce the time it takes to complete an instruction; it increases the number of instructions that can be processed at once, thus reducing the delay between completed instructions. The classic RISC pipeline is broken into five stages with a set of flip flops between each stage. Instruction fetch Instruction decode and register fetch Execute Memory access Register write back
When a programmer (or compiler) writes assembly code, they make the assumption that each instruction is executed before execution of the subsequent instruction is begun. This assumption is invalidated by pipelining. When this causes a program to behave incorrectly, the situation is known as a hazard. Various techniques for resolving hazards such as forwarding and stalling exist. A non-pipeline architecture is inefficient because some CPU components (modules) are idle while another module is active during the instruction cycle. Pipelining does not completely cancel out idle time in a CPU but making those modules work in parallel improves program execution significantly. Processors with pipelining are organized inside into stages which can semiindependently work on separate jobs. Each stage is organized and linked into a 'chain' so each stage's output is fed to another stage until the job is done. This organization of the processor allows overall processing time to be significantly reduced. A deeper pipeline means that there are more stages in the pipeline, and therefore, fewer logic gates in each stage. This generally means that the processor's frequency can be increased as the cycle time is lowered. This happens because there are fewer components in each stage of the pipeline, so the propagation delay is decreased for the overall stage.
Basic five-stage pipeline in a RISC machine: IF = Instruction Fetch ID = Instruction Decode, EX = Execute, MEM = Memory access WB = Register write back In the fourth clock cycle (the green column), the earliest instruction is in MEM stage, and the latest instruction has not yet entered the pipeline.
Advantages of Pipelining: The cycle time of the processor is reduced, thus increasing instruction issue-rate in most cases. Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.
Disadvantages of Pipelining: A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture. The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip flops must be added to the data path of a pipelined processor. A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.
Generic pipeline:
To the right is a generic pipeline with four stages: 1. 2. 3. 4. Fetch Decode Execute Write-back
The top gray box is the list of instructions waiting to be executed; the bottom gray box is the list of instructions that have been completed; and the middle white box is the pipeline. Execution is as follows:
Time
Execution
Four instructions are awaiting to be executed
the green instruction is fetched from memory the green instruction is decoded the purple instruction is fetched from memory the green instruction is executed (actual operation is performed)
2 3
the purple instruction is decoded the blue instruction is fetched the green instruction's results are written back to the register file or
memory 4 5 6 7 8 9 the red instruction is written back the red instruction is completed the purple instruction is executed the blue instruction is decoded the red instruction is fetched the green instruction is completed the purple instruction is written back the blue instruction is executed the red instruction is decoded The purple instruction is completed the blue instruction is written back the red instruction is executed the blue instruction is completed
All instructions are executed
IV. In 3D computer graphics, the terms graphics pipeline or rendering pipeline most commonly refer to the current state of the art method of rasterization-based rendering as supported by commodity graphics hardware. The graphics pipeline typically accepts some representation of 3D primitives as an input and result in a 2D raster images as output.
V. An array processor or vector processor, is a CPU that implements an instruction set containing instruction that operate on one-dimensional arrays of data called vectors.
Flynn's taxonomy Single instruction Multiple instruction
Single data Multiple data
SISD SIMD
MISD MIMD
In computing, SISD (single instruction, single data) is a term referring to a computer architecture in which a single processor, a uniprocessor, executes a single instruction stream, to operate on data stored in a single memory.
Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism.
In computing, MIMD (multiple instruction, multiple data) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently
In computing, MISD (multiple instruction, single data) is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline.
VI. Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them.
Processor symmetry:
In a multiprocessing system, all CPUs may be equal, or some may be reserved for special purposes. A combination of hardware and operating-system software design considerations determine the symmetry (or lack thereof) in a given system. Systems that treat all CPUs equally are called symmetric multiprocessing (SMP) systems. In systems where all CPUs are not equal, system resources may be divided in a number of ways, including asymmetric multiprocessing (ASMP), non-uniform memory access (NUMA) multiprocessing, and clustered multiprocessing.
Instruction and data streams

In multiprocessing, the processors can be used to execute a single sequence of instructions in multiple contexts (single-instruction, multiple-data or SIMD, often used in vector processing), multiple sequences of instructions in a single context (multiple-instruction, single-data or MISD, used for redundancy in fail-safe systems and sometimes applied to describe pipelined processors or hyper-threading), or multiple sequences of instructions in multiple contexts (multiple-instruction, multiple-data or MIMD).
Processor coupling
Tightly-coupled multiprocessor systems contain multiple CPUs that are connected at the bus level. Chip multiprocessors, also known as multi-core computing, involves more than one processor placed on a single chip and can be thought of the most extreme form of tightly-coupled multiprocessing. Mainframe systems with multiple processors are often tightly-coupled. Loosely-coupled multiprocessor systems (often referred to as clusters) are based on multiple standalone single or dual processor commodity computers interconnected via a high speed communication system (Gigabit Ethernet is common). A Linux Beowulf cluster is an example of a loosely-coupled system. Tightly-coupled systems perform better and are physically smaller than loosely-coupled systems.
Power consumption is also a consideration. Tightly-coupled systems tend to be much more energy efficient than clusters. This is because considerable economy can be realized by designing components to work together from the beginning in tightly-coupled systems.
In computing, symmetric multiprocessing (SMP) involves a multiprocessor computer hardware architecture where two or more identical processors are connected to a single shared main memory and are controlled by a single OS instance. Most common multiprocessor systems today use SMP architecture. A computer system that uses symmetric multiprocessing is called a symmetric multiprocessor or symmetric multiprocessor system (SMP system). SMP systems allow any processor to work on any task no matter where the data for that task are located in memory, provided that each task in the system is not in execution on two or more processors at the same time; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently.
Multiprocessor system with centralized Shared-Memory called Main Memory (MM) operating under a single OS (Operating System) with two or more homogeneous processors.
SMP (Symmetric Multiprocessor System) is a tightly coupled multiprocessor system with a pool of homogeneous processors running independently, each processor executing different programs and working on different data and with capability of sharing common resources (memory, I/O device, interrupt system, etc) and connected using a system bus or a crossbar. Usually each processor has an associated private high-speed memory known as cache memory (or cache) to speed-up the MM data access and to reduce the system bus traffic. Multiprocessing is a type of "processing" in which two or more processors work together to "process more than one program simultaneously". The term Multiprocessor is referred to the hardware architecture that allows multiprocessing.
Asymmetric multiprocessing, or AMP, was a software stopgap for handling multiple CPUs before symmetric multiprocessing, or SMP, was available.
A multi-core processor is a single computing component with two or more independent actual processors (called "cores"), which are the units that read and execute program instructions. The instructions are ordinary cpu instructions like add, move data, and branch, but the multiple cores can run multiple instructions at the same time, increasing overall speed for programs amenable to parallel computing. Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor or CMP), or onto multiple dies in a single chip package.
A dual-core processor has two cores (e.g. AMD Phenom II X2, Intel Core Duo), a quad-core processor contains four cores (e.g. AMD Phenom II X4, intel's quad-core processors, see i3, i5, and i7at Intel Core), a hexa-core processor contains six cores (e.g. AMD Phenom II X6, Intel Core i7 Extreme Edition 980X), an octa-core processor contains eight cores (e.g. AMD FX-8150). A multi-core processor implements multiprocessing in a single physical package. Cores may or may not share caches, and they may implement message passing or shared memory inter-core communication methods. Common network topologies to interconnect cores include bus, ring, two-dimensional mesh, and crossbar. Homogeneous multi-core systems include only identical cores. Heterogeneous multi-core systems have cores which are not identical.

Introduction To Parallel Processing

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Introduction To Parallel Processing

Uploaded by

Copyright:

Available Formats

I. Parallel computer systems can be categorized into three structural phase: 1. Pipelined computers 2. Array processors 3.

Four instructions are awaiting to be executed

All instructions are executed

Flynn's taxonomy Single instruction Multiple instruction

Single data Multiple data

Instruction and data streams

You might also like