
JOURNAL OF COMPUTING, VOLUME 4, ISSUE 5, MAY 2012, ISSN 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG


EPIC Architecture Evaluation and Comparison With Other Architectures


Majid Meghdadi, Ali Akbar Arjomand Hashjin, Yaghoob Gharani Bonab, and Behzad Gaffari

Abstract: HP and Intel have recently introduced a new style of instruction set called EPIC (Explicitly Parallel Instruction Computing), together with a specific architecture called IPF (Itanium Processor Family). This paper aims to identify the differences between EPIC and earlier instruction-set styles such as VLIW and superscalar. Different aspects of EPIC in computer design are examined, and its historical antecedents are reviewed. The earlier architectural ideas that EPIC combines to maximize performance are also studied.

Key Words: EPIC architecture, Itanium Processor Family, VLIW, Memory Latencies.

1 Introduction

Explicitly parallel instruction computing, or EPIC, is a term coined by HP and Intel in 1997 to describe the style of computation that researchers had been investigating since the early 1980s; it is also called the independence architecture. EPIC was the basis for the development of the Itanium architecture by HP and Intel, and HP later stated that EPIC was simply the old name for the Itanium architecture. The jointly designed instruction set architecture was called IA-64. Explicitly parallel instruction computing lets the compiler, through software, tell the microprocessor which instructions can execute in parallel, instead of using complex on-chip circuitry to discover parallelism at run time. The purpose of the invention was to make performance easy to scale without resorting to higher clock frequencies.

2 Three major tasks in ILP execution

Parallel processing of instructions requires three major tasks:
1. Checking the dependencies between instructions to determine which instructions can be grouped together for parallel execution.
2. Assigning instructions to hardware function units.
3. Determining the time at which instructions should initiate execution.

Table 1. Four Major Categories of ILP Architectures.
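The first of the three ILP tasks listed above, dependency checking to form groups of independent instructions, can be sketched as follows. The register-tuple encoding and the greedy in-order policy are illustrative assumptions, not any processor's actual algorithm, and only RAW/WAW hazards are modeled:

```python
# Greedy in-order grouping of independent instructions (illustrative).
# Each instruction is (dest_register, [source_registers]).
def group_independent(instrs):
    groups, current, written = [], [], set()
    for dest, srcs in instrs:
        # RAW or WAW hazard against anything already in the current group?
        if any(s in written for s in srcs) or dest in written:
            groups.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups

prog = [("r1", ["r4"]),        # load  r1 <- [r4]
        ("r2", ["r5"]),        # load  r2 <- [r5]
        ("r3", ["r1", "r2"]),  # add   r3 <- r1 + r2
        ("r6", ["r3"])]        # shift r6 <- r3
print(group_independent(prog))
# the two loads form one group; each later instruction starts a new group
```

Superscalars perform this check in hardware over a small issue window; EPIC moves it entirely into the compiler.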
Majid Meghdadi, PhD, is with the Computer Engineering Department, University of Zanjan, Zanjan, Iran. Ali Akbar Arjomand Hashjin, Yaghoob Gharani Bonab, and Behzad Gaffari are MSc students with the Computer Engineering Faculty, University of Zanjan, Zanjan, Iran.


Table 1 illustrates the four major categories of ILP architectures, classified by whether the three tasks are performed in hardware or in software. A superscalar processor runs a traditional, sequential instruction set: the program is written for a sequential machine model, meaning that its results must be the same as if the instructions had been processed one at a time on a sequential machine, each instruction completing before the next is examined. Superscalar processors include hardware to accelerate the program by fetching, decoding, executing, and completing multiple instructions per cycle, while preserving this sequential appearance. Decoding and issuing multiple instructions requires hardware dependency checking; assigning instructions to function units requires hardware routing; and scheduling the execution time of instructions requires scoreboard hardware.

VLIW is at the opposite extreme: all three ILP tasks are assigned to the compiler. A VLIW implementation uses instruction words that provide an individual operation for each function unit in each cycle, so the width of the instruction word depends on the number of function units; the long instruction word of the Multiflow machines, for example, was up to 28 operations wide. Groups of independent operations are placed together in a VLIW, and each operation is assigned to a function unit purely by its position (slot) within the long instruction word. Scheduling is limited to the cycle in which an operation appears, and all operations in a VLIW begin execution in parallel. The sequence of instruction words thus defines a program for one specific implementation: which operations can be grouped together, and where in the instruction word they go, is decided by the compiler. This is also the main disadvantage of VLIW: code compiled for an implementation with particular function units and registers will not run correctly on a different implementation with different function units or registers.
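The slot-position binding that makes VLIW code implementation-specific can be sketched in a few lines. The four-slot machine layout below is a hypothetical example, not any real VLIW:

```python
# Illustrative VLIW word for a hypothetical 4-slot machine: the slot
# position, not any field in the operation itself, selects the unit.
FUNCTION_UNITS = ["alu0", "alu1", "mem", "branch"]  # assumed machine layout

def issue(vliw_word):
    """Dispatch each operation to the unit hard-wired to its slot."""
    return [(unit, op) for unit, op in zip(FUNCTION_UNITS, vliw_word)
            if op != "nop"]

word = ["add r1,r2,r3", "nop", "ld r4,0(r5)", "nop"]
print(issue(word))  # [('alu0', 'add r1,r2,r3'), ('mem', 'ld r4,0(r5)')]
```

A machine with a different number or arrangement of function units would interpret the same word differently, which is exactly the incompatibility described above.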
If the compiler determines the grouping of independent instructions and passes it to the hardware through explicit information in the instructions, we have what Fisher and Rau called an independence architecture, or what is now known as the EPIC style of architecture. EPIC maintains compatibility across different implementations, like superscalars, but does not need the superscalar's dependency-checking hardware. In this sense EPIC combines the best of the superscalar and VLIW approaches. The first EPIC architecture appears to be Burton Smith's Horizon (1988), which provided an explicit lookahead field giving the distance to the next dependent instruction. Another category of ILP architecture is one in which grouping and function-unit assignment are done by the compiler, but cycle scheduling and initiation of operations are done by the hardware. This approach is called dynamic VLIW and has several advantages over classical VLIW, because it can respond to run-time events that the compiler cannot anticipate at compile time.

3 Features of EPIC architecture and historical antecedents

3.1 Explicit parallelism

As described above, explicit information about the independence of instructions in the program is the main feature of the EPIC architecture. In IPF, three 41-bit instructions are packed together into a 128-bit bundle. A 5-bit template field identifies the instruction types and the architectural stops between instruction groups. In little-endian format (least significant byte first), a bundle is laid out as follows:

  bits 127..87    bits 86..46    bits 45..5     bits 4..0
  instruction 2   instruction 1  instruction 0  template

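The 128-bit bundle layout, a 5-bit template followed by three 41-bit instruction slots, can be sketched with plain bit arithmetic. The field granularity follows the layout above, but the real IPF template encoding is more involved than a bare integer:

```python
def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into a 128-bit bundle."""
    assert template < (1 << 5) and all(s < (1 << 41) for s in slots)
    bundle = template
    for i, slot in enumerate(slots):     # slot 0 sits just above the template
        bundle |= slot << (5 + 41 * i)
    return bundle

def unpack_bundle(bundle):
    template = bundle & 0x1F
    slots = [(bundle >> (5 + 41 * i)) & ((1 << 41) - 1) for i in range(3)]
    return template, slots

b = pack_bundle(0b00010, [0x123, 0x456, 0x789])
assert unpack_bundle(b) == (0b00010, [0x123, 0x456, 0x789])
```

Note that 3 x 41 + 5 = 128, so the three slots and the template exactly fill the bundle.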

A bundle can contain zero, one, or at most two stops, so instruction groups (sets of mutually independent instructions) can span multiple bundles. Padding a bundle with no-ops may be necessary in some cases. The instruction type (one of six: integer ALU, integer non-ALU, memory, floating point, branch, and extended) can assist function-unit assignment and routing during decoding, but this information provides only the type of
instruction, not the identity of a specific function unit. In this respect IPF falls into neither the dynamic nor the classical VLIW class.

3.2 Predicated execution

To avoid conditional branches, every instruction can be conditioned on a true or false predicate value. Only instructions whose predicate is true are allowed to write their destination registers. A conditional construct can therefore be executed without branching (this is called if-conversion): the instructions from the two sides are guarded by complementary predicate registers and can be run simultaneously.

3.3 Branch support

A conditional branch consists of three separate actions: (1) deciding whether to branch, (2) providing the target address, and (3) changing the PC. By separating these actions, several comparisons can be made at once, early in the instruction stream; different targets can be identified so that instructions can be prefetched; and, since a single instruction or a set of instructions can have the effect of a prioritized multiway branch, the change of the PC can be deferred. IPF uses predicate registers to record comparison results and provides eight branch registers for use in prefetching. IPF branch instructions are predicated and can specify either a branch register (action 3) or an IP-relative address (actions 2+3).

3.4 Compiler control of the memory hierarchy

An EPIC architecture should be able to accept hints about the likely latency of a load (that is, where in the memory hierarchy the data will be found) and about the appropriate placement of loaded or stored data in the memory hierarchy. Such hints let the compiler schedule instructions accurately; even so, register interlocks and scoreboarding are still used as a backstop.
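The predicated execution described in Section 3.2 can be mimicked in software: both sides of a condition are computed, but only instructions whose guarding predicate is true commit to their destination register. This is a toy model, not IPF's actual semantics:

```python
def run_predicated(regs, preds, program):
    """Each instruction is (pred_name, dest, fn): commit fn(regs) to dest
    only when the guarding predicate is true."""
    for pred, dest, fn in program:
        result = fn(regs)          # both sides are computed...
        if preds[pred]:            # ...but only the true side commits
            regs[dest] = result
    return regs

regs = {"r1": 10, "r2": 3}
preds = {"p1": True, "p2": False}  # set by a compare; p2 is the complement of p1
program = [
    ("p1", "r3", lambda r: r["r1"] + r["r2"]),   # then-side
    ("p2", "r3", lambda r: r["r1"] - r["r2"]),   # else-side
]
print(run_predicated(regs, preds, program)["r3"])  # 13: only p1 commits
```

Both "sides" occupy issue slots, but no branch (and hence no misprediction) is involved.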

IPF provides the suggested hints and also supports prefetching through its update addressing modes. Cache bypass for data memory was a feature of vector processors, owing to the limited temporal locality of vector operands.

3.5 Control speculation

To start loads (or other potentially long-running instructions) on time, they must often be moved up beyond a branch. The problem with this approach occurs when the load (or other instruction) generates an exception: if the branch is taken, the load would not have been executed in the original program, and so the exception should not be seen. To allow this type of code scheduling, an EPIC architecture should provide a speculative form of load (or other long-running instruction) and tagged operands. When the speculative instruction causes an exception, the exception is deferred by tagging the result with the required information. The exception is handled only when a non-speculative instruction reads the tagged operand (in fact, multiple instructions may use the tagged operand in the meantime and merely pass the tag on). Thus, if the branch over which the instruction was moved is not taken, no exception occurs, preserving the semantics of the original program. IPF provides speculative load and speculation check instructions. Integer speculative loads set a NaT (Not a Thing) bit associated with the integer destination register when an exception is deferred. Floating-point speculative loads place a NaTVal (Not a Thing Value) code in the floating-point destination register when an exception is deferred. These bits and encoded values propagate through other instructions until a speculation check

instruction is executed. At that point a set NaT bit or a NaTVal value raises the exception.

3.6 Data speculation

To be able to reorder load and store instructions, the compiler must know the memory addresses to which the instructions refer. Because
of aliasing, compilers are not always able to determine this at compile time. In the absence of exact alias analysis, most compilers must settle for safe but slower (i.e., unreordered) code. EPIC architectures provide speculative loads that can be used when an alias situation is unlikely but still possible. A speculative load is moved earlier in the schedule to start the load as early as possible, and at the place where the loaded value is needed, a data-verifying load is used instead. If no aliasing has occurred, the value retrieved by the speculative load is used by the data-verifying load instruction; otherwise, the data-verifying load re-executes the load to obtain the new value. IPF provides advanced load and advanced load check instructions that use an Advanced Load Address Table (ALAT) to detect stores that invalidate advanced load values.

4 Effect of Hazards on Superscalar Execution

Figure 2 shows the effects of hazards on superscalar execution.
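The advanced-load scheme of Section 3.6 can be modeled with a toy ALAT: an advanced load records the address it read, an intervening store to that address invalidates the entry, and the check either reuses the early value or redoes the load. The class and method names are illustrative, not the hardware's algorithm:

```python
class ALAT:
    """Toy Advanced Load Address Table: maps register -> watched address."""
    def __init__(self):
        self.entries = {}

    def advanced_load(self, mem, reg, addr):
        self.entries[reg] = addr       # start the load early, remember addr
        return mem[addr]

    def store(self, mem, addr, value):
        mem[addr] = value
        # a store to a watched address invalidates the advanced load
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check(self, mem, reg, addr, early_value):
        if reg in self.entries:        # no aliasing store occurred
            return early_value
        return mem[addr]               # aliased: redo the load

mem = {100: 7}
alat = ALAT()
v = alat.advanced_load(mem, "r1", 100)  # hoisted load returns 7
alat.store(mem, 100, 42)                # aliasing store invalidates it
print(alat.check(mem, "r1", 100, v))    # 42: the check reloads
```

When no aliasing store intervenes, the check succeeds and the hoisted value is used at full speed; the reload path is the rare recovery case.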

4.1 Data hazards

Data hazards limit ILP. For example:

I1: movi r3, 3
I2: add  r1, r3, r4

Instructions I1 and I2 cannot execute together correctly even if multiple integer units are available, because I2 reads the value of r3 that I1 produces. Superscalar architectures must provide logic to determine whether such dependencies exist: pre-fetch a small number (e.g., 16) of instructions, use circuitry to determine the dependencies between them, and issue only those instructions that are independent. This is a complex and expensive process: it needs skilled engineering effort, occupies precious chip real estate, and yields complex circuits with a higher failure rate. In EPIC, the compiler instead identifies the instructions that can be executed in parallel and informs the hardware. IA-64 defines 128-bit bundles containing three 41-bit instructions and a 5-bit template; dependent and independent instructions may be mixed in the same bundle, and the compiler sets the template bits to tell the hardware which instructions are independent. The template can also mark independence across bundle boundaries. This system maximizes utilization of the available function units.

4.2 Control hazards

Consider this program:

I1: mov  r2, r3
I2: sub  r1, r2, r7
I3: beq  r1, I7
I4: sub  r3, r1, r7
I5: muli r3, r3, 2
I6: j    I10
I7: muli r7, r3, 4
I8: div  r6, r7, r3
I9: shri r6, 7
I10:

When we encounter the branch at I3, should we start pre-fetching and executing I4-I6, or I7-I9? This decision is made before the branch

Fig. 2. The effect of hazards on superscalar execution [5]

The end result in each case is that fewer than two instructions complete per clock cycle, which causes a loss of efficiency, and thus of processing power.


outcome is known, in order to prevent stalls. Mistakes are costly: the effects of erroneously executed instructions must be undone, and erroneous execution wastes time. The IA-64 solution is predication: the compiler tags each side of a branch with a predicate, bundles the tagged instructions, and sets the template bits to allow parallel execution of the predicated instructions. Both sides of the branch are executed simultaneously. When the outcome of the branch is known, the effects of the correct side are committed (registers modified, and so on), while the effects (and the remainder) of the wrong side are discarded. The benefits: there is no need to undo effects, because they are committed only after we know which side of the branch is correct, and the time taken to execute the wrong side is at least partially amortized by the simultaneous execution of the correct side, assuming sufficient function units.

4.3 Memory latencies

Memory units are slow, with delays of between one and several tens of thousands of cycles, depending on multi-level cache hit/miss profiles. For

example:

I1: lw  r1, 0(r4)
I2: add r2, r1, r3

A potential delay of several thousand cycles between I1 and I2 may make it impractical to schedule sufficient instructions in between. The solution is to move load instructions toward the start of the program; this is called hoisting. Hoisted loads execute concurrently with the rest of the program, so with luck the data will be ready in the register when it is read. Loads may, however, belong to decision paths that are never executed, and hoisting causes these loads to be executed anyway, even if their contents are not actually required. Loads are therefore speculative: they are done on the speculation that their results will be needed later. A check instruction is placed just before the load result is needed; it checks for exceptions in the speculative load and commits the effect of the load to the target register.

5 Conclusions

EPIC architectures are a new style of instruction set for computers. They are a skillful combination of several preexisting ideas in computer architecture, together with a nontraditional division of the responsibilities of ILP processing between the compiler and the hardware. As such, EPIC architectures can claim to combine the best attributes of superscalar processors (compatibility across implementations) and of VLIW processors (efficiency, thanks to less control logic). Through nontraditional translation, current traditional instruction sets can still be used while the combined hardware and software system exploits the efficiency of VLIW and EPIC implementations.

6 References

[1] V. Kathail, M. S. Schlansker, and B. R. Rau, "Compiling for EPIC Architectures," Proceedings of the IEEE, vol. 89, no. 11, November 2001.
[2] W. W. S. Chu, R. G. Dimond, S. Perrott, S. P. Seng, and W. Luk, "Customisable EPIC Processor: Architecture and Tools," Department of Computing, Imperial College London, IEEE, 2004.

[3] R. Colwell, et al., "A VLIW Architecture for a Trace Scheduling Compiler," IEEE Transactions on Computers, August 1988, pp. 967-979.


[4] M. Smotherman, "Understanding EPIC Architectures and Implementations," Clemson University, Department of Computer Science.
[5] W. Stallings, Computer Organization and Architecture, 5th Edition, Prentice Hall International Editions, ISBN 0-13-085263-5, 2000.
[6] M. S. Schlansker and B. R. Rau, "EPIC: Explicitly Parallel Instruction Computing," Computer, February 2000, pp. 37-45.
[7] K. V. Palem, S. Talla, and W. F. Wong, "Compiler Optimizations for Adaptive EPIC Processors," Proc. First International Workshop on Embedded Software, LNCS 2211, Springer-Verlag, 2001, pp. 257-273.

[8] J. Crawford, "Introducing the Itanium Processors," IEEE Micro, September-October 2000, pp. 9-11.
[9] J. Stinson and S. Rusu, "A 1.5GHz Third Generation Itanium Processor," Intel Corporation, 2002.
[10] M. S. Schlansker and B. R. Rau, "EPIC: An Architecture for Instruction-Level Parallel Processors."

[11] M. Schlansker, et al., "Achieving High Levels of Instruction-Level Parallelism with Reduced Hardware Complexity," HP Labs Tech. Rept. HPL-96-120, November 1994.
[12] S. Banerjia, et al., "MPS: Miss-Path Scheduling for Multiple-Issue Processors," IEEE Transactions on Computers, December 1998, pp. 1382-1397.
