2) SUPERSCALAR ARCHITECTURE:
Superscalar processors are designed to exploit instruction-level parallelism in user programs by issuing more than one instruction per clock cycle. Only independent instructions can be executed in parallel without causing a wait state. The amount of instruction-level parallelism available varies widely depending on the type of code being executed.
SUPERSCALAR PIPELINE:-
The figure shows a simplified high-level block diagram of a superscalar RISC or CISC processor implementation. The implementation consists of a collection of execution units (integer ALUs, floating-point ALUs, load/store units, branch units, etc.) that are fed operations from an instruction dispatcher and operands from a register file. The execution units have reservation stations that buffer operations that have been issued but not yet executed; such operations may be waiting on operands that are not yet available. The instruction dispatcher examines a window of instructions held in a buffer and decides which of them can be dispatched to execution units. It tries to dispatch as many instructions at once as possible, i.e., it attempts to discover the maximum amount of instruction-level parallelism. Higher degrees of superscalar execution, i.e., more execution units, require wider windows and a more sophisticated dispatcher.
It is conceptually simple, though expensive, to build an implementation with many execution units and an aggressive dispatcher, but it is not currently profitable to do so. The reason has more to do with software than hardware. Compilers for RISC and CISC processors produce code with certain goals in mind, typically to minimize code size and run time. For scalar and very simple superscalar implementations, these goals are mostly compatible. For high-performance superscalar implementations, on the other hand, the goal of minimizing code size limits the performance that the implementation can achieve. Performance is
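The dispatcher behaviour described above can be illustrated with a minimal Python sketch. It is not a model of any real processor: the `Instr` record, the `dispatch` function, the register names and the unit counts are all hypothetical, and the sketch assumes no register renaming, so an instruction is dispatched only if it is independent of every earlier instruction in the window and an execution unit of the right kind is free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, some sources."""
    name: str
    dest: str
    srcs: tuple
    unit: str  # kind of execution unit needed, e.g. "alu" or "mem"


def dispatch(window, free_units):
    """Return the instructions from the window that can issue this cycle.

    Conservative rule (an assumption of this sketch): an instruction is held
    back if it has a RAW, WAW or WAR conflict with any earlier instruction in
    the window, or if no execution unit of its kind is still free.
    """
    free = dict(free_units)      # copy so the caller's dict is not modified
    earlier_writes = set()       # registers written by earlier instructions
    earlier_reads = set()        # registers read by earlier instructions
    issued = []
    for ins in window:
        raw = any(s in earlier_writes for s in ins.srcs)
        waw = ins.dest in earlier_writes
        war = ins.dest in earlier_reads
        if not (raw or waw or war) and free.get(ins.unit, 0) > 0:
            free[ins.unit] -= 1
            issued.append(ins)
        # Whether issued or held, later instructions must respect this one.
        earlier_writes.add(ins.dest)
        earlier_reads.update(ins.srcs)
    return issued


# Example window in program order (register names are illustrative).
window = [
    Instr("add", "r1", ("r2", "r3"), "alu"),
    Instr("load", "r4", ("r5",), "mem"),
    Instr("sub", "r6", ("r1", "r4"), "alu"),   # needs results of add and load
    Instr("mul", "r7", ("r2", "r5"), "alu"),
]

issued = dispatch(window, {"alu": 2, "mem": 1})
print([i.name for i in issued])  # ['add', 'load', 'mul']
```

In this example `sub` is held back because it reads `r1` and `r4`, which are produced by `add` and `load` in the same window, while `mul` is independent and is dispatched past it. A wider window or more execution units gives the dispatcher more chances to find such independent instructions, at the cost of more comparisons per cycle.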
Superscalar and VLIW architecture
VLIW ARCHITECTURE:
A VLIW (very long instruction word) computer is based on an architecture that implements instruction-level parallelism (ILP), meaning the execution of multiple instructions at the same time. A typical VLIW machine has instruction words hundreds of bits in length. Each very long instruction word specifies multiple primitive operations that are grouped together. These operations are executed concurrently by multiple functional units provided as part of the hardware, all of which share a common large register file for their operands and results. Unlike superscalar architectures, in the VLIW architecture all scheduling is static: it is not done at run time by the hardware but is handled by the compiler. The compiler takes the operations that can be executed in parallel, as determined by its analysis of instruction-level parallelism, and compiles them into object code made up of very long instruction words. The compiler prearranges the object code so that the VLIW chip can quickly execute the instructions in parallel. This frees the microprocessor from having to perform the complex and continual run-time analysis that superscalar RISC and CISC chips must do.
VLIW PROCESSOR BLOCK DIAGRAM:-
VLIW simply moves complexity from hardware into software. Luckily, this trade-off has a significant side benefit: the complexity is paid for only once, when the compiler is written, instead of every time a chip is fabricated. Among the possible benefits is a smaller chip, which leads to increased profits for the microprocessor vendor and/or lower prices for the customers who use the microprocessors. Complexity is usually easier to deal with in a software design than in a hardware design. Thus, the chip may cost less to design, be quicker to design, and require less debugging, all of which can make the design cheaper. Also, improvements to the compiler can be made after chips have been fabricated, whereas improvements to superscalar dispatch hardware require changes to the microprocessor, which naturally incurs all the expenses of revising and re-fabricating the chip.
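The idea of moving scheduling into the compiler can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any real VLIW compiler: the `Op` record, the `SLOTS` layout (two ALU slots and one memory slot per long word) and the `pack` function are hypothetical, every operation is assumed to produce its result in one cycle, and operations are packed strictly in program order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical primitive operation: one destination, some sources."""
    name: str
    dest: str
    srcs: tuple
    unit: str  # kind of functional unit needed, e.g. "alu" or "mem"


# Assumed layout of one long instruction word: one entry per functional unit.
SLOTS = ("alu", "alu", "mem")


def pack(ops, slots=SLOTS):
    """Greedily pack operations, in program order, into long words.

    A new word is started when the next operation depends on a result
    produced in the current word, or when no slot of its kind is left.
    Unused slots are filled with "nop".
    """
    words = []
    current = []

    def flush():
        # Place each operation of the current word into a slot of its kind.
        remaining = list(current)
        word = []
        for kind in slots:
            match = next((op for op in remaining if op.unit == kind), None)
            if match is None:
                word.append("nop")
            else:
                word.append(match.name)
                remaining.remove(match)
        words.append(tuple(word))
        current.clear()

    for op in ops:
        if op.unit not in slots:
            raise ValueError(f"no functional unit of kind {op.unit!r}")
        written = {o.dest for o in current}
        depends = op.dest in written or any(s in written for s in op.srcs)
        used = sum(1 for o in current if o.unit == op.unit)
        if current and (depends or used >= slots.count(op.unit)):
            flush()
        current.append(op)
    if current:
        flush()
    return words


program = [
    Op("add", "r1", ("r2", "r3"), "alu"),
    Op("load", "r4", ("r5",), "mem"),
    Op("sub", "r6", ("r1", "r4"), "alu"),   # needs results of add and load
    Op("mul", "r7", ("r2", "r5"), "alu"),
]

for word in pack(program):
    print(word)
# ('add', 'nop', 'load')
# ('sub', 'mul', 'nop')
```

All of the dependence checking happens once, inside `pack`, before the program runs; the hardware only has to execute each word as given. A more capable scheduler could move `mul` up into the empty slot of the first word, which is the kind of improvement that can be delivered through a compiler update after the chips have shipped.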
VLIW PIPELINING:-
LIMITATIONS:
VLIW architecture still has several problems it must overcome: code expansion, high power consumption, and scalability. In addition, the VLIW compiler is specific to its target machine and is an integral part of the VLIW system. A poor VLIW compiler will have a much more negative impact on performance than a poor RISC or CISC compiler would. An example of a VLIW architecture is the Viper processor, which executes four 32-bit operations concurrently. It can perform up to two load/store operations at once. To solve the compiler problem, Viper uses only one ALU.
Superscalar architectures, in contrast, use dynamic scheduling, which transfers all of the ILP complexity to the hardware. This leads to greater hardware complexity that is not seen in VLIW hardware: VLIW chips do not need most of the complex circuitry that superscalar chips must use to coordinate parallel execution at run time.
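The code expansion listed among the limitations can be made concrete with a small calculation. The sketch below assumes a fixed-width, uncompressed encoding in which every slot of a long word occupies 32 bits even when it holds a no-op; the four-slot schedule and the function names are hypothetical and chosen only for illustration.

```python
def nop_fraction(words):
    """Fraction of all slots in the schedule that hold a no-op."""
    slots = sum(len(w) for w in words)
    nops = sum(1 for w in words for op in w if op == "nop")
    return nops / slots if slots else 0.0


def code_size_bits(words, bits_per_op=32):
    """Size of the schedule if every slot, used or not, is encoded."""
    return sum(len(w) for w in words) * bits_per_op


# Hypothetical schedule for a machine with four slots per long word.
schedule = [
    ("add", "nop", "nop", "load"),
    ("sub", "mul", "nop", "nop"),
    ("store", "nop", "nop", "nop"),
]

useful_ops = sum(1 for w in schedule for op in w if op != "nop")
print(code_size_bits(schedule))   # 384 bits for the long words
print(useful_ops * 32)            # 160 bits if only useful operations were stored
print(nop_fraction(schedule))     # 7 of 12 slots are no-ops
```

Under these assumptions the program occupies 384 bits although only 160 bits carry useful operations. The less parallelism the compiler finds, the more slots are padded and the larger the code becomes.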
QUESTIONS:
1) Compare superscalar and VLIW architectures.
2) Explain VLIW and superscalar architectures.
REFERENCES:
1) Techniques to Improve Performance Beyond Pipelining: Superpipelining, Superscalar, and VLIW, by Jean-Luc Gaudiot, Jung Yup Kang, Won Woo Ro
2) An Introduction to Very-Long Instruction Word (VLIW) Computer Architecture, by Philips Semiconductors
3) VLIW Computing, by Serge Vaks, Mike Roznik, Aakrati Mehta