SUPER SCALAR AND VLIW ARCHITECTURE

1) INSTRUCTION LEVEL PARALLELISM: The potential overlap among instructions is called instruction-level parallelism (ILP). ILP is a measure of how many of the operations in a computer program can be performed simultaneously. There are two approaches to exploiting it:

- a hardware approach, which discovers parallelism dynamically at run time;
- a software approach, which discovers it statically at compile time.

A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model, in which instructions execute one after the other in the order specified by the programmer. ILP allows the compiler and the processor to overlap the execution of multiple instructions, or even to change the order in which instructions are executed. Techniques used to exploit ILP include:

- instruction pipelining, in which the execution of multiple instructions is partially overlapped;
- superscalar execution, VLIW, and the closely related Explicitly Parallel Instruction Computing (EPIC) concepts, in which multiple execution units execute multiple instructions in parallel.
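The sketch below makes "how many operations can be performed simultaneously" concrete. It computes ILP as the number of instructions divided by the length of the longest dependence chain. The `Instr` record and `ideal_ilp` function are hypothetical names. The model assumes an idealized machine: unlimited execution units, a one-cycle latency for every instruction, and only read-after-write dependences (false dependences are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction: dest <- op(srcs)."""
    text: str
    dest: str
    srcs: tuple[str, ...] = ()


def ideal_ilp(program: list[Instr]) -> float:
    """Average instructions per cycle on an idealized machine.

    Assumptions (not from the text): unlimited execution units, a one-cycle
    latency for every instruction, and only true (read-after-write)
    dependences; write-after-read and write-after-write are ignored.
    """
    ready_at: dict[str, int] = {}  # register -> cycle its value becomes available
    last_cycle = 0
    for ins in program:
        # An instruction can start as soon as all of its sources are ready.
        start = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        ready_at[ins.dest] = start + 1
        last_cycle = max(last_cycle, start + 1)
    return len(program) / last_cycle if program else 0.0


program = [
    Instr("r1 = load a", "r1"),
    Instr("r2 = load b", "r2"),
    Instr("r6 = load c", "r6"),
    Instr("r3 = r1 + r2", "r3", ("r1", "r2")),
    Instr("r4 = r1 * 2", "r4", ("r1",)),
    Instr("r5 = r3 - r4", "r5", ("r3", "r4")),
]
print(ideal_ilp(program))  # 6 instructions in 3 cycles -> 2.0
```

A fully dependent chain (each instruction reading the previous result) gives an ILP of 1.0, which is the sequential execution model described above.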
2) SUPER SCALAR ARCHITECTURE: Superscalar processors are designed to exploit more instruction-level parallelism in user programs by issuing more than one instruction per clock cycle to multiple execution units. Only independent instructions can be executed in parallel without causing a wait state. The amount of instruction-level parallelism varies widely depending on the type of code being executed.
SUPER SCALAR PIPELINE:-
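The rule that only independent instructions can issue together can be illustrated with a toy model of an in-order superscalar pipeline. In this hypothetical sketch, `inorder_issue_cycles` issues up to `width` instructions per cycle in program order. It stops at the first instruction whose operand is not yet ready. The one-cycle latency and the (dest, srcs) encoding are assumptions, not features of any particular processor.

```python
def inorder_issue_cycles(program, width):
    """Cycles needed to issue `program` on a hypothetical in-order
    superscalar processor that issues up to `width` instructions per cycle.

    program: list of (dest, srcs) register tuples, in program order.
    Simplifying assumptions: every result is available one cycle after
    issue, and issue stops at the first instruction that needs a value
    that is not yet available (a wait state).
    """
    ready_at = {}  # register -> first cycle its value can be read
    cycle = 0
    i = 0
    while i < len(program):
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if any(ready_at.get(r, 0) > cycle for r in srcs):
                break  # depends on an unfinished instruction: stall
            ready_at[dest] = cycle + 1
            issued += 1
            i += 1
        cycle += 1
    return cycle


program = [
    ("r1", ()),            # independent
    ("r2", ()),            # independent
    ("r3", ("r1", "r2")),  # needs r1 and r2
    ("r4", ("r3",)),       # needs r3
    ("r5", ()),            # independent
    ("r6", ("r5",)),       # needs r5
]
for w in (1, 2, 4):
    print(w, inorder_issue_cycles(program, w))  # prints 1 6, then 2 4, then 4 4
```

Widening issue from 2 to 4 gives no speedup for this code. The code does not contain enough independent instructions, which is the sense in which the available ILP depends on the code being executed.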

3) SUPER SCALAR ARCHITECTURE BLOCK DIAGRAM:-
The figure shows a crude high-level block diagram of a superscalar RISC or CISC processor implementation. The implementation consists of a collection of execution units (integer ALUs, floating-point ALUs, load/store units, branch units, etc.). These units are fed operations from an instruction dispatcher and operands from a register file. The execution units have reservation stations to buffer operations that have been issued but not yet executed. Such operations may be waiting on operands that are not yet available.

The instruction dispatcher examines a window of instructions held in a buffer and decides which ones can be dispatched to execution units. It tries to dispatch as many instructions at once as possible, i.e., it attempts to discover the maximum amount of instruction-level parallelism. Higher degrees of superscalar execution, i.e., more execution units, require wider windows and a more sophisticated dispatcher.

It is conceptually simple, though expensive, to build an implementation with many execution units and an aggressive dispatcher, but it is not currently profitable to do so. The reason has more to do with software than hardware. Compilers for RISC and CISC processors produce code with certain goals in mind, typically to minimize code size and run time. For scalar and very simple superscalar implementations, these goals are mostly compatible. For high-performance superscalar implementations, on the other hand, minimizing code size limits the performance that can be achieved. It results in frequent conditional branches, about one every six instructions. Conceptually, the processor must wait until a branch is resolved before it can begin to look for parallelism at the branch target.

To avoid waiting for conditional branches to be resolved, high-performance superscalar implementations use branch prediction. The processor makes an early guess about the outcome of the branch and begins looking for parallelism along the predicted path. Dispatching and executing instructions from a predicted but unconfirmed path is called speculative execution.

Because branch prediction is not 100% accurate, speculative execution requires the ability to undo the effects of speculatively executed instructions when a branch is mispredicted. Some implementations, such as Intel's superscalar Pentium, simply prevent instructions along the predicted path from progressing far enough to modify any visible processor state. To gain the most from speculative execution, however, instructions along the predicted path must be allowed to execute fully.

To undo the effects of full speculative execution, a hardware structure called a reorder buffer can be employed. This structure is an adjunct to the register file. It keeps track of all results produced by instructions that have recently executed, or that have been dispatched to execution units but have not yet completed. The reorder buffer provides a place for the results of speculatively executed instructions (and solves other problems as well). When a conditional branch is finally resolved, the speculative results are handled in one of two ways:

- they are dropped from the reorder buffer if the branch was mispredicted;
- they are written from the buffer to the register file if the branch was predicted correctly.
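The speculation mechanism described above can be sketched as follows. The 2-bit saturating-counter predictor is a common textbook scheme, chosen here as an assumption; the text does not say how the early guess is made. `ReorderBuffer` and `run_branch` are hypothetical names. Instruction results are precomputed (dest, value) pairs, so only the buffering, commit, and squash steps are modeled.

```python
from collections import deque


class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=1):
        self.state = state  # start "weakly not taken" (an arbitrary choice)

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


class ReorderBuffer:
    """Holds results of speculatively executed instructions until the
    guarding branch resolves."""

    def __init__(self):
        self.entries = deque()  # (dest register, value), in program order

    def add(self, dest, value):
        self.entries.append((dest, value))

    def resolve(self, correct, regfile):
        if correct:
            # Branch predicted correctly: write results to the register file in order.
            while self.entries:
                dest, value = self.entries.popleft()
                regfile[dest] = value
        else:
            # Mispredicted: drop the results; visible state was never touched.
            self.entries.clear()


def run_branch(predictor, rob, regfile, actually_taken, taken_path, fallthrough_path):
    """Speculatively execute past one branch. Each path is a list of
    (dest, value) results, precomputed here to keep the sketch small."""
    guess = predictor.predict()
    for dest, value in (taken_path if guess else fallthrough_path):
        rob.add(dest, value)  # speculative: goes to the ROB, not the register file
    correct = guess == actually_taken
    rob.resolve(correct, regfile)
    if not correct:
        # Re-execute along the correct path (non-speculatively, for simplicity).
        for dest, value in (taken_path if actually_taken else fallthrough_path):
            regfile[dest] = value
    predictor.update(actually_taken)
    return correct


regs = {"r1": 0, "r2": 0}
bp, rob = TwoBitPredictor(), ReorderBuffer()
print(run_branch(bp, rob, regs, True, [("r1", 10)], [("r2", 20)]), regs)  # False {'r1': 10, 'r2': 0}
print(run_branch(bp, rob, regs, True, [("r1", 11)], [("r2", 21)]), regs)  # True {'r1': 11, 'r2': 0}
```

In the first call the predictor guesses "not taken", so the speculative write to `r2` is squashed in the reorder buffer and never reaches the register file. In the second call the guess is correct and the buffered result is committed.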
LIMITATIONS: Performance improvement is limited by three key areas:

1) the limited amount of instruction-level parallelism;
2) the complexity and time cost of the dispatcher and the associated dependency-checking logic;
3) branch instruction processing.

Examples of superscalar processors are the PowerPC 604 and the Pentium. The PowerPC 604 has six independent execution units: a branch execution unit, a load/store unit, three integer units, and one floating-point unit. The Pentium has three independent execution units: two integer units and a floating-point unit.
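Limitation 2 can be quantified with a rough, hypothetical model of the dependency-checking logic. Suppose each of the two source operands of every instruction in an issue group must be compared against the destination of every earlier instruction in the same group. The number of comparators then grows roughly with the square of the issue width. The function below is a sketch of that count only; real dispatchers also check against instructions already in flight.

```python
def raw_check_comparators(width, srcs_per_instr=2):
    """Register-number comparisons needed to find read-after-write
    dependences inside one group of `width` instructions issued together.

    Simplified model (an assumption, not a specific processor): each source
    operand of the i-th instruction in the group is compared with the
    destination of each of the i earlier instructions in the same group.
    """
    return sum(srcs_per_instr * i for i in range(width))


for w in (2, 4, 8, 16):
    print(w, raw_check_comparators(w))
```

This prints 2, 12, 56, and 240 comparisons for widths 2, 4, 8, and 16, i.e. W(W - 1) for two sources per instruction. Doubling the width roughly quadruples the checking logic.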
VLIW ARCHITECTURE: A VLIW (very long instruction word) computer is based on an architecture that implements instruction-level parallelism (ILP), meaning the execution of multiple instructions at the same time. A typical VLIW machine has instruction words hundreds of bits in length. Multiple functional units are used concurrently in a VLIW processor, and all functional units share a common large register file.

A very long instruction word specifies multiple primitive operations that are grouped together. These operations are issued at the same time to the functional units provided as part of the hardware, which read their operands from and write their results to the shared register file.

Unlike superscalar architectures, VLIW architectures schedule everything statically: scheduling is not done at run time by the hardware but is handled by the compiler. The compiler analyzes the program for instruction-level parallelism and packs independent operations into the object code as long instruction words. It is these words that are referred to as Very Long Instruction Words. Because the compiler prearranges the object code so that the VLIW chip can quickly execute the operations in parallel, the microprocessor is freed from the complex, continual run-time analysis that superscalar RISC and CISC chips must perform.
VLIW PROCESSOR BLOCK DIAGRAM:-

The figure shows a generic VLIW implementation, without the complex reorder buffer and the decoding and dispatching logic. A VLIW architecture reduces hardware complexity relative to a superscalar implementation, but it requires a much more complex compiler. Extracting maximum performance from a superscalar RISC or CISC implementation does require sophisticated compiler techniques, but the level of sophistication in a VLIW compiler is significantly higher.
VLIW simply moves complexity from hardware into software. Fortunately, this trade-off has a significant side benefit: the complexity is paid for only once, when the compiler is written, instead of every time a chip is fabricated. Among the possible benefits is a smaller chip, which leads to increased profits for the microprocessor vendor and/or lower prices for the customers who use the microprocessors. Complexity is usually easier to deal with in a software design than in a hardware design. The chip may therefore cost less to design, be quicker to design, and require less debugging, all of which can make the design cheaper. Also, improvements to the compiler can be made after chips have been fabricated. Improvements to superscalar dispatch hardware, by contrast, require changes to the microprocessor, which naturally incurs all the expense of re-spinning the chip design.
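The static scheduling performed by a VLIW compiler can be sketched as a greedy packer. It places each operation in the earliest instruction word where its operands are ready and a slot of the right type is free. Everything here is hypothetical:

- the word format in `SLOTS` (two ALU slots plus one memory slot);
- the one-cycle latencies;
- the assumption that each register is written once, so only read-after-write dependences matter.

Real VLIW compilers use far more elaborate scheduling techniques.

```python
SLOTS = {"alu": 2, "mem": 1}  # hypothetical word: two ALU slots and one memory slot


def pack_bundles(ops, slots=SLOTS):
    """Greedy static (compile-time) scheduling of operations into VLIW words.

    ops: list of (name, kind, dest, srcs) in program order.
    Assumptions for this sketch: every operation has a one-cycle latency,
    each register is written only once, and memory dependences are ignored.
    Returns a list of words; each word maps a slot kind to its operations,
    with unused slots filled by explicit "nop"s.
    """
    produced_in = {}  # register -> index of the word whose operation writes it
    words = []
    for name, kind, dest, srcs in ops:
        # Earliest legal word: one after the last word that produces a source.
        k = max((produced_in[r] + 1 for r in srcs if r in produced_in), default=0)
        while True:
            if k == len(words):
                words.append({s: [] for s in slots})
            if len(words[k][kind]) < slots[kind]:
                words[k][kind].append(name)
                break
            k += 1  # no free slot of this kind in word k: try the next word
        produced_in[dest] = k
    # A fixed-format word must fill every slot, so pad with no-ops.
    for word in words:
        for s, n in slots.items():
            word[s] += ["nop"] * (n - len(word[s]))
    return words


ops = [
    ("ld r1", "mem", "r1", ()),
    ("ld r2", "mem", "r2", ()),
    ("add r3", "alu", "r3", ("r1", "r2")),
    ("mul r4", "alu", "r4", ("r1",)),
    ("sub r5", "alu", "r5", ("r3", "r4")),
    ("st r5", "mem", "m0", ("r5",)),  # "m0" is a placeholder for the stored memory location
]
for i, word in enumerate(pack_bundles(ops)):
    print(i, word["alu"] + word["mem"])
```

The packer moves `mul r4` up beside `ld r2` because it only needs `r1`, which is the kind of reordering the hardware dispatcher would otherwise have to discover at run time. Nine of the fifteen slots still end up holding NOPs.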
VLIW PIPELINING:-
LIMITATIONS: VLIW architecture still has several problems to overcome: code expansion, high power consumption, and limited scalability. In addition, the VLIW compiler is machine-specific and is an integral part of the VLIW system. A poor VLIW compiler has a much more negative impact on performance than a poor RISC or CISC compiler would. An example of a VLIW architecture is the Viper processor, which executes four 32-bit operations concurrently, including up to two load/store operations at once. To address the compiler problem, Viper uses only one ALU.
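Code expansion arises because a fixed-format VLIW word must fill every operation field, even when the compiler finds too little parallelism. The hypothetical `code_expansion` function below compares the size of the VLIW code with the size of the useful operations it contains. It assumes four 32-bit operation fields per word (loosely modeled on the Viper figures in the text) and NOP padding for empty fields. The per-word occupancy in the example is made up for illustration.

```python
def code_expansion(ops_per_word, slots_per_word=4, bits_per_op=32):
    """Ratio of VLIW code size to the size of the useful operations alone.

    Assumes a fixed-format word of `slots_per_word` operation fields of
    `bits_per_op` bits each, with every empty field filled by a NOP.
    ops_per_word lists how many useful operations the compiler managed
    to place in each word.
    """
    vliw_bits = len(ops_per_word) * slots_per_word * bits_per_op
    useful_bits = sum(ops_per_word) * bits_per_op
    return vliw_bits / useful_bits


# Hypothetical schedule: only the first word is completely full.
print(round(code_expansion([4, 2, 1, 3, 1]), 2))  # 1.82
```

Here 11 useful operations occupy five 128-bit words, so the padded code is about 82% larger than the operations alone. A fully packed schedule would give a ratio of 1.0.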
VLIW V/S SUPER SCALAR ARCHITECTURE:-
Unlike VLIW, superscalar architectures use dynamic scheduling, which shifts all of the ILP complexity to the hardware. This leads to a degree of hardware complexity not seen in VLIW hardware. VLIW chips don't need most of the complex circuitry that superscalar chips must use to coordinate parallel execution at run time.

Thus, in VLIW, hardware complexity is greatly reduced. The executable instructions are generated directly by the compiler and are then executed as native code by the functional units present in the hardware. As a result, VLIW chips can potentially:

1) cost less;
2) achieve significantly higher performance than comparable RISC and CISC chips.
QUESTIONS:

1) Compare superscalar and VLIW architectures.
2) Explain VLIW and superscalar architectures.