
CONTENT BEYOND THE SYLLABUS

PARALLEL PROCESSING

High-performance computers are increasingly in demand in structural analysis, weather forecasting, petroleum exploration, medical diagnosis, aerodynamics simulation, artificial intelligence, expert systems, genetic engineering, and signal and image processing, among many other scientific and engineering applications. Without such high-performance computers, many of these advances cannot be made within a reasonable time. Achieving high performance depends not only on faster and more reliable hardware devices but also on major improvements in computer architecture and processing techniques.

Flynn's Taxonomy

In general, digital computers may be classified into four categories according to the multiplicity of instruction and data streams. This scheme for classifying computer organizations was introduced by Michael J. Flynn. The essential computing process is the execution of a sequence of instructions on a set of data. The term stream denotes a sequence of items (instructions or data) as executed or operated upon by a single processor. Instructions and data are defined with respect to a reference machine: an instruction stream is a sequence of instructions as executed by the machine, and a data stream is a sequence of data, including input, partial, or temporary results, called for by the instruction stream. Computer organizations are characterized by the multiplicity of the hardware provided to service the instruction and data streams. Flynn's four machine organizations are:

Single instruction stream, single data stream (SISD)
Single instruction stream, multiple data stream (SIMD)
Multiple instruction stream, single data stream (MISD)
Multiple instruction stream, multiple data stream (MIMD)

SISD computer organization
This organization represents most serial computers available today. Instructions are executed sequentially, but their execution stages may be overlapped.

SIMD computer organization
In this organization, multiple processing elements (PEs) are supervised by the same control unit. All PEs receive the same instruction broadcast from the control unit but operate on different data sets from distinct data streams.
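The SIMD broadcast model can be sketched in ordinary Python: a single "control unit" issues one instruction per step, and every processing element applies it in lockstep to its own local data. The instruction names, PE count, and data values below are illustrative, not taken from any particular machine.

```python
# Minimal SIMD sketch: one control unit broadcasts each instruction
# to every processing element (PE); each PE holds its own data item.
# Opcode names and PE count are illustrative assumptions.

def simd_execute(program, data_streams):
    """Run the same instruction sequence on every PE's local data."""
    pe_state = list(data_streams)          # one accumulator per PE
    for opcode, operand in program:        # control unit broadcasts
        for i in range(len(pe_state)):     # all PEs act in lockstep
            if opcode == "ADD":
                pe_state[i] += operand
            elif opcode == "MUL":
                pe_state[i] *= operand
    return pe_state

# Four PEs, distinct data streams, identical instruction stream:
program = [("ADD", 1), ("MUL", 2)]
print(simd_execute(program, [10, 20, 30, 40]))  # [22, 42, 62, 82]
```

Note that the single instruction stream is the outer loop; only the data differ per PE, which is exactly the SIMD distinction.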

MISD computer organization
There are n processor units, each receiving a distinct instruction stream but operating over the same data stream and its derivatives. The results (output) of one processor become the input (operands) of the next processor in the macropipe.

MIMD computer organization
Most multiprocessor systems and multiple-computer systems can be classified in this category. An MIMD computer implies interactions among the n processors because all memory streams are derived from the same data space shared by all processors. If the n data streams were derived from disjoint subspaces of the shared memories, we would have the so-called multiple SISD (MSISD) operation, which is nothing but a set of n independent SISD uniprocessor systems. The last three classes of computer organization are the classes of parallel computers.

Parallel/Vector Computers
Intrinsic parallel computers are those that execute programs in MIMD mode. There are two major classes of parallel computers: shared-memory multiprocessors and message-passing multicomputers. The major distinction between multiprocessors and multicomputers lies in memory sharing and the mechanisms used for interprocessor communication. The processors in a multiprocessor system communicate with each other through shared variables in a common memory. Each computer node in a multicomputer system has a local memory, unshared with other nodes; interprocessor communication is done through message passing among the nodes.

Explicit vector instructions were introduced with the appearance of vector processors. A vector processor is equipped with multiple vector pipelines that can be used concurrently under hardware or firmware control. There are two families of pipelined vector processors: memory-to-memory architecture supports the pipelined flow of vector operands directly from memory to the pipelines and back to memory, while register-to-register architecture uses vector registers to interface between the memory and the functional pipelines.

Pipelining: An Overlapped Parallelism
Pipelining offers an economical way to realize temporal parallelism in digital computers. The concept of pipeline processing in a computer is similar to that of an assembly line in an industrial plant. To achieve pipelining, one must subdivide the input task (process) into a sequence of subtasks, each of which can be executed by a specialized hardware stage that operates concurrently with the other stages in the pipeline. Successive tasks are streamed into the pipe and executed in an overlapped fashion at the subtask level.

The subdivision of labor in assembly lines has contributed to the success of mass production in modern industry. By the same token, pipeline processing has improved system throughput in the modern digital computer. Assembly lines have been used in automated industrial plants to increase productivity. Their original form is a flow line (pipeline) of assembly stations where items are assembled continuously from separate parts along a moving conveyor belt. Ideally, all the assembly stations should have equal processing speed; otherwise, the slowest station becomes the bottleneck of the entire pipe. This bottleneck problem, plus the congestion caused by improper buffering, may leave many stations idle, waiting for new parts. The subdivision of the input task into a proper sequence of subtasks is therefore a crucial factor in determining the performance of the pipeline. In a uniform-delay pipeline, all tasks have equal processing time in all stations, and the stations of such an ideal line can operate synchronously with full resource utilization. In reality, however, successive stations have unequal delays. The optimal partition of the pipeline depends on a number of factors, including the quality (efficiency and capability) of the working units, the desired processing speed, and the cost-effectiveness of the entire line.

Arithmetic pipelining
The arithmetic logic units of a computer can be segmented for pipelined operations on various data formats. Well-known arithmetic pipeline examples are the four-stage pipes used in the STAR-100, the eight-stage pipelines used in the TI-ASC, the up to 14 pipeline stages used in the CRAY-1, and the up to 26 stages per pipe in the CYBER-205.
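The throughput and bottleneck arguments above can be made concrete with a small timing model: for a k-stage pipeline whose clock period is set by the slowest stage, n tasks finish in (k + n - 1) cycles, versus n times the full stage-delay sum for strictly sequential execution. The stage delays below are invented purely for illustration.

```python
# Timing model for a linear pipeline: the clock period is fixed by
# the slowest (bottleneck) stage, and n tasks need k + n - 1 cycles.
# The stage delays are illustrative values, not from a real machine.

def pipeline_time(stage_delays, n_tasks):
    """Total time to push n_tasks through the overlapped pipeline."""
    k = len(stage_delays)
    cycle = max(stage_delays)              # bottleneck sets the clock
    return (k + n_tasks - 1) * cycle

def sequential_time(stage_delays, n_tasks):
    """Total time with no overlap: each task runs all stages alone."""
    return sum(stage_delays) * n_tasks

delays = [2, 2, 3, 2]                      # a non-uniform 4-stage pipe
n = 100
tp = pipeline_time(delays, n)              # (4 + 99) * 3 = 309
ts = sequential_time(delays, n)            # 9 * 100 = 900
print(tp, ts, round(ts / tp, 2))           # speedup ~ 2.91, not 4
```

The model also shows why non-uniform delays hurt: with equal stage delays of 2, the same pipe would run at cycle time 2 and approach a speedup of 4 for large n.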
Instruction pipelining
The execution of a stream of instructions can be pipelined by overlapping the execution of the current instruction with the fetch, decode, and operand fetch of subsequent instructions. This technique is also known as instruction lookahead. Almost all high-performance computers are now equipped with instruction-execution pipelines.

Processor pipelining
This refers to the pipelined processing of the same data stream by a cascade of processors, each of which performs a specific task. The data stream passes through the first processor, with results stored in a memory block that is also accessible by the second processor. The second processor then passes the refined results to the third, and so on.
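Processor pipelining, where each processor in a cascade refines the previous one's results, maps naturally onto chained Python generators: each stage consumes the preceding stage's output stream. The three stage functions below are hypothetical examples, not taken from the text.

```python
# Sketch of processor (macro) pipelining: a cascade of processors,
# each refining the output stream of the previous one. The stage
# functions and data are hypothetical, chosen only for illustration.

def scale(stream):          # first processor: scale raw samples
    for x in stream:
        yield x * 10

def offset(stream):         # second processor: refine with an offset
    for x in stream:
        yield x + 1

def clip(stream):           # third processor: bound the results
    for x in stream:
        yield min(x, 25)

raw = [1, 2, 3]
pipeline = clip(offset(scale(iter(raw))))
print(list(pipeline))       # [11, 21, 25]
```

As in the hardware cascade, each stage only ever sees the refined output of its predecessor, and all stages can in principle work on different items at once.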

Unifunctional vs. multifunctional pipelines
A pipeline unit with a fixed and dedicated function, such as a floating-point adder, is called unifunctional. A multifunctional pipe may perform different functions, either at different times or with different subsets of stages in the pipeline.

Static vs. dynamic pipelines
A static pipeline may assume only one functional configuration at a time. Static pipelines can be either unifunctional or multifunctional. Overlapped processing is possible in static pipes only if instructions of the same type are executed continuously; the function performed by a static pipeline should therefore not change frequently, or its performance may be very low. A dynamic pipeline processor permits several functional configurations to exist simultaneously; in this sense, a dynamic pipeline must be multifunctional. Conversely, a unifunctional pipe must be static. The dynamic configuration needs much more elaborate control and sequencing mechanisms than a static pipeline. Most existing computers are equipped with static pipes, either unifunctional or multifunctional.
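The cost of changing functions in a static pipeline can be illustrated with a toy cycle-count model: each switch to a new function drains and refills the k-stage pipe, while a run of same-type instructions completes one result per cycle. The stage count and instruction mix are assumptions for the sketch, not figures from the text.

```python
# Why a static multifunctional pipe dislikes frequent function
# changes: each switch refills the k-stage pipe before the new
# function produces results. Stage count and mix are illustrative.

def static_pipe_cycles(functions, k):
    """Cycles for an instruction sequence, refilling on each switch."""
    cycles = 0
    prev = None
    for f in functions:
        if f != prev:
            cycles += k        # drain/refill for the new configuration
            prev = f
        else:
            cycles += 1        # overlapped: one result per cycle
    return cycles

k = 4
same = ["ADD"] * 8             # one configuration throughout
mixed = ["ADD", "MUL"] * 4     # switch on every instruction
print(static_pipe_cycles(same, k))   # 4 + 7 = 11
print(static_pipe_cycles(mixed, k))  # 8 * 4 = 32
```

The same eight instructions take almost three times as long when the function alternates, which is why static pipes favor batching instructions of the same type.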
