
Computer Architecture Lab.

Alpha
Microprocessor - Case Study I

Alpha Philosophy
Smart compiler, smart machine, and a GREAT circuit design
  Compiler creates record of execution
  Machine exploits additional information available at runtime
  Works across barriers to compile-time analysis
Focus on scalar programs
  Add resources for vector
  Amdahl's law
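
The reference to Amdahl's law can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the sample workload (20% of the time vectorizable, a 10x faster vector unit) are assumptions, not figures from the slides.

```python
def amdahl_speedup(enhanced_fraction: float, enhancement_speedup: float) -> float:
    """Overall speedup when only a fraction of execution time is accelerated."""
    if not 0.0 <= enhanced_fraction <= 1.0:
        raise ValueError("enhanced_fraction must be between 0 and 1")
    if enhancement_speedup <= 0.0:
        raise ValueError("enhancement_speedup must be positive")
    # The unenhanced part still runs at the old speed; only the rest shrinks.
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhancement_speedup)


if __name__ == "__main__":
    # Hypothetical workload: 20% vectorizable, vector hardware 10x faster.
    print(f"overall speedup: {amdahl_speedup(0.2, 10.0):.3f}")
```

With a small vectorizable fraction the overall gain stays modest, which is one way to read the slide's choice to focus on scalar programs first.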

Alpha Roadmap
[Figure: Alpha roadmap, 1995-2001, with axes labeled "Higher Performance" and "Lower Cost"]
EV5/333 (21164), 0.5 µm
EV56/600 (21164), 0.35 µm
PCA56/533 (21164PC), 0.35 µm
EV6/575 (21264), 0.35 µm
EV67/750 (21264), 0.28 µm
PCA57/600 (21164PC), 0.28 µm
EV68/1000 (21264), 0.18 µm
EV7/1000 (21364), 0.18 µm
EV8, 0.13 µm

Alpha Architecture
Full 64-bit load/store RISC architecture
High clock speed, multiple instruction issue, and multiple processors
Separate integer and floating-point registers (thirty-two each)
64-bit virtual byte addressing
32-bit fixed instruction size (6-bit opcode)
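
The fixed 32-bit instruction size with a 6-bit opcode makes field extraction a matter of shifts and masks. The sketch below assumes the Alpha memory-instruction layout (opcode in bits 31:26, Ra in 25:21, Rb in 20:16, signed 16-bit displacement in 15:0); only the 6-bit opcode comes from the slide, and the function name and sample word are illustrative.

```python
def decode_alpha_memory_format(word: int) -> dict:
    """Split a 32-bit instruction word into memory-format fields (assumed layout)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction word must fit in 32 bits")
    displacement = word & 0xFFFF
    if displacement & 0x8000:          # sign-extend the 16-bit displacement
        displacement -= 0x10000
    return {
        "opcode": (word >> 26) & 0x3F,  # 6-bit opcode, as stated on the slide
        "ra": (word >> 21) & 0x1F,      # 5 bits select one of 32 registers
        "rb": (word >> 16) & 0x1F,
        "disp": displacement,
    }


if __name__ == "__main__":
    print(decode_alpha_memory_format(0xA4220008))  # arbitrary example word
```

Five-bit register fields match the thirty-two registers per register file listed above.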

PALcode
Similar to the BIOS libraries in a PC
Privileged mode
  complete control of the machine's state
  physical I-stream
  interrupts disabled
Special instructions
  Access to all internal state and private GPRs
  Virtual or physical LD/ST
  Privileged jump
Strict coding rules
[Figure: software layers, top to bottom: Applications, Operating System, PAL, HW]

Alpha 21064 Overview
Very fast clock (200 MHz 21064 in 1992 and 275 MHz 21064A in 1994)
Simple instructions
Dual-issue superscalar
Three parallel pipelines
  Integer pipeline: 7 stages
  Floating-point pipeline: 10 stages
  Load/store pipeline: 7 stages
Dynamic branch prediction with a 2048-entry table
Separate instruction and data caches
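
To illustrate how a table-based dynamic branch predictor works in general, here is a minimal sketch of a 2048-entry history table. The slide gives only the table size; the 2-bit saturating counters, the indexing by instruction address, and all names are assumptions and do not describe the actual 21064 design.

```python
class BranchHistoryTable:
    """Direct-mapped table of 2-bit saturating counters (generic sketch)."""

    def __init__(self, entries: int = 2048):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.counters = [1] * entries   # start at "weakly not taken"

    def _index(self, pc: int) -> int:
        # Instructions are 4 bytes, so the two low address bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


if __name__ == "__main__":
    bht = BranchHistoryTable()
    for outcome in (True, True, False, True):
        print(bht.predict(0x1000), outcome)
        bht.update(0x1000, outcome)
```

A 2-bit counter tolerates one mispredicted iteration (a loop exit, for example) without flipping the prediction for the next run of the loop.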

Alpha 21064 Block Diagram

Alpha 21064 Pipeline
[Figure: Alpha 21064 pipeline. All instructions pass through instruction fetch, swap, decode, and issue. Integer instructions continue through ALU 1, ALU 2, and register write; load/store instructions continue through address add, cache access, and register write. Bypass paths connect the later stages.]

Alpha 21164 Overview
Quad-issue superscalar
Low latency in functional units
High-throughput, nonblocking memory subsystem
Low-latency primary caches
Large second-level, on-chip write-back cache
10 percent faster than the previous 21064 implementation

Alpha 21164PC
Shipping at 583 MHz, November 1998
16.7/17.0 estimated SPECint95 (base/peak)
20.7/22.7 estimated SPECfp95 (base/peak)
340 MB/sec STREAMS
Chip features:
  1.0 cm² die
  7 million transistors
  32 KB 2-way set-associative I-cache
  16 KB virtual D-cache
  improved 3-cycle multiplier
  improved 6 bit/cycle divider
  increased write buffer size (8 x 32 B)
  support for 200 MHz off-chip cache

Register File (21164)
40 integer registers
  R0-R31 for GPRs
  Eight shadow registers for PALcode
  4 read ports (2 per pipe), 2 write ports (1 per pipe)
32 FP registers
  9 ports (5 read, 4 write)

Integer Pipeline (21164)
Bypass from any stage
  except multiply: only from S6
[Figure: 21164 integer pipeline, stages 0-6: cache access, decode, swap and predict, issue and RF read, ALU 1, ALU 2, write. Also shown: PC generation, VA generation, ITB, DTB, hit/miss, and bypass (forwarding) paths.]

Floating Point Pipeline (21164)
9 stages (5 ns cycles)
[Figure: 21164 floating-point pipeline, stages 0-8: cache access, decode, swap and predict, issue, RF read, followed by the add and multiply stages, rounding, and write, with bypass paths.]

Memory Pipeline (21164)
The 21164 includes the L2 cache access in the pipeline
[Figure: 21164 memory pipeline]

Instruction/Data Stream (21164)
I-stream support
  Stream buffer prefetches in-line code
  BHT/JSR stack/bubble squash
D-stream support
  Pending LD/wrapped reads improve latency
  Burst-mode RAM support
  Hit under miss
  Pending store, fully pipelined LD/ST to cache

Alpha 21264 Overview
Third-generation 64-bit Alpha microprocessor
New motion-video instructions (MVI)
4-way out-of-order issue
  dynamic scheduling
  register renaming
  speculative execution
4 integer execution units
2 floating-point execution units
BIU maintains coherency between the D-cache and the L2 cache and main memory
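
Register renaming, listed above as part of out-of-order issue, can be sketched as a map from architectural to physical registers plus a free list. This is a generic illustration, not the 21264's actual mechanism: the class and method names are hypothetical, the sizes in the usage example (32 architectural, 80 physical) are picked for illustration, and returning registers to the free list at retirement is omitted.

```python
from collections import deque


class RenameMap:
    """Maps architectural registers to physical registers (generic sketch)."""

    def __init__(self, num_arch: int, num_phys: int):
        if num_phys < num_arch:
            raise ValueError("need at least one physical register per architectural one")
        self.mapping = list(range(num_arch))             # arch index -> phys index
        self.free = deque(range(num_arch, num_phys))     # unallocated physical registers

    def rename(self, dest: int, sources: list) -> tuple:
        # Sources are looked up first so they see the mapping from before this write.
        phys_sources = [self.mapping[s] for s in sources]
        if not self.free:
            raise RuntimeError("no free physical register; issue would stall here")
        phys_dest = self.free.popleft()
        self.mapping[dest] = phys_dest
        return phys_dest, phys_sources


if __name__ == "__main__":
    rm = RenameMap(num_arch=32, num_phys=80)
    print(rm.rename(1, [2, 3]))   # first write to r1 gets a fresh physical register
    print(rm.rename(1, [1]))      # second write to r1 gets another; it reads the first
```

Because each write to the same architectural register lands in a different physical register, write-after-write and write-after-read hazards no longer force instructions to wait.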

Alpha 21264 Pipeline

Alpha 21264 Update
Microprocessor Forum 1996
  30+ SPECint95 and 50+ SPECfp95
  500 MHz in 0.35 µm CMOS
  Spectacular memory bandwidth
  Systems 2H97
First power on July 1997 (no FP)
Full function power on Feb 1998
Production power on June 1998

Alpha 21364 Highlights
Improve
  Single-processor performance, operating frequency, and memory system
  SMP scaling
  System performance density
  Reliability and availability
Decrease
  System cost
  System complexity

Alpha 21364 Features
Alpha 21264 core with enhancements
Integrated L2 cache
Integrated memory controller
Integrated network interface
Support for lock-step operation to enable high-availability systems

Alpha 21364 Block Diagram
[Figure: Alpha 21364 block diagram. Blocks shown: 21264 core with 64K I-cache and 64K D-cache, 16 L1 miss buffers, 16 L1 victim buffers, L2 cache, 16 L2 victim buffers, memory controller (RAMBUS), network interface (N, S, E, W, I/O), and Address In/Address Out paths.]

Alpha 21364 Core
[Figure: Alpha 21364 core pipeline, stages 0-6: FETCH, MAP, QUEUE, REG, EXEC, DCACHE.
Fetch: branch predictors, next-line address, L1 instruction cache (64 KB, 2-set).
Integer side: integer register map, integer issue queue (20 entries), two register files (80 entries each), four Exec units, two of them with address (Addr) generation.
Floating-point side: FP register map, FP issue queue (15 entries), register file (72 entries), FP ADD with Div/Sqrt, FP MUL.
Memory: L1 data cache (64 KB, 2-set), victim buffer, miss address, L2 cache (1.5 MB, 6-set).
Annotations: 4 instructions/cycle; 80 in-flight instructions plus 32 loads and 32 stores.]

Alpha 21364 Integrated L2 Cache
1.5 MB
6-way set associative
16 GB/s total read/write bandwidth
16 victim buffers for L1 -> L2
16 victim buffers for L2 -> memory
ECC SECDED code
12 ns load-to-use latency

Alpha 21364 Integrated Memory Controller
Direct Rambus
  High data capacity per pin
  800 MHz operation
  30 ns CAS latency pin to pin
  6 GB/sec read or write bandwidth
  100s of open pages
Directory-based cache coherence
ECC SECDED

Alpha 21364 Integrated Network Interface
Direct processor-to-processor interconnect
10 GB/second per processor
15 ns processor-to-processor latency
Out-of-order network with adaptive routing
Asynchronous clocking between processors
3 GB/second I/O interface per processor

Alpha 21364 System Block Diagram

Alpha 21364 Technology
0.18 µm CMOS process
1000+ MHz
100 watts at 1.5 volts
3.5 cm² die
6-layer metal
100 million transistors
  8 million logic, 92 million RAM
70 SPECint95 (estimated)
120 SPECfp95 (estimated)
RTL model running
Tapeout 4Q99

ARM
Microprocessor - Case Study I

ARM Architecture Overview
Advanced RISC Machine
Originally intended as a simple, low-cost, 32-bit system for use in personal computers
Very definite RISC properties
  low number of instructions, addressing modes, and instruction formats
  all instructions execute in one cycle
  memory accessed only by load/store instructions
  hardwired control
Best MIPS per watt and per dollar

ARM Application Overview
Portable
Apple Newton PDA, Mobile Computer
GSM, PCS, Smart Phone, Video Phone
ISDN Chip
Embedded
ATM Card
Smart Card
Consumer Multimedia
Oracle Network Computer
Settop Box
Video Game, Education Game
Camera

ARM Chips
CPU       Product Description                      Process  Die Area  Average Power  Performance
ARM7TDMI  ARM7TDMI core (optimized hard macro)     0.35 µm  2.1 mm²   0.6 mW/MHz     0.9 MIPS/MHz or 59 MIPS at 66 MHz
                                                   0.25 µm  1.0 mm²   N/A            N/A
ARM710T   ARM7TDMI core, 8 KB unified cache, MMU   0.35 µm  11.7 mm²  1.8 mW/MHz     0.9 MIPS/MHz or 53 MIPS at 59 MHz
                                                   0.25 µm  5.8 mm²   N/A            N/A
ARM740T   ARM7TDMI core, 8 KB unified cache, MMU   0.35 µm  9.8 mm²   1.8 mW/MHz     0.9 MIPS/MHz or 53 MIPS at 59 MHz
                                                   0.25 µm  4.9 mm²   N/A            N/A
Note: power calculations for the ARM710T and ARM740T were made with the cache on.

ARM Chips (Cont'd)
CPU      Product Description                          Process  Die Area  Average Power  Performance
SA-110   SA-110 core                                  0.35 µm  50 mm²    110 mW         1.15 MIPS/MHz or 115 MIPS at 100 MHz
                                                      0.35 µm  N/A       N/A            1.15 MIPS/MHz or 191 MIPS at 166 MHz
                                                      0.35 µm  N/A       N/A            1.15 MIPS/MHz or 230 MIPS at 200 MHz
                                                      0.35 µm  N/A       N/A            1.15 MIPS/MHz or 268 MIPS at 233 MHz
SA-1100  SA-110 core, caches, MMU,                    0.35 µm  N/A       230 mW         1.13 MIPS/MHz or 150 MIPS at 133 MHz
         display controller                           0.35 µm  N/A       N/A            1.13 MIPS/MHz or 180 MIPS at 160 MHz
                                                      0.35 µm  N/A       330 mW         1.16 MIPS/MHz or 220 MIPS at 190 MHz
                                                      0.35 µm  N/A       550 mW         1.14 MIPS/MHz or 250 MIPS at 220 MHz
SA-1110  SA-110 core, SA-1100 functions,              0.35 µm  N/A       240 mW         1.13 MIPS/MHz or 150 MIPS at 133 MHz
         enhanced memory I/O                          0.35 µm  N/A       400 mW         1.14 MIPS/MHz or 235 MIPS at 206 MHz
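
The performance column pairs a MIPS/MHz rating with an absolute MIPS figure at a given clock, and the two are related by a simple product. The sketch below checks that relationship and derives MIPS per watt. The helper names are hypothetical, and pairing the 400 mW figure with the 206 MHz SA-1110 entry assumes the rows of the flattened table line up in order.

```python
def mips_at_clock(mips_per_mhz: float, clock_mhz: float) -> float:
    """Absolute MIPS from a per-MHz rating and a clock frequency."""
    return mips_per_mhz * clock_mhz


def mips_per_watt(mips: float, power_mw: float) -> float:
    """Efficiency in MIPS per watt, given power in milliwatts."""
    if power_mw <= 0:
        raise ValueError("power must be positive")
    return mips / (power_mw / 1000.0)


if __name__ == "__main__":
    # SA-110 at 233 MHz: 1.15 MIPS/MHz, listed as 268 MIPS after rounding.
    print(round(mips_at_clock(1.15, 233)))
    # SA-1110 at 206 MHz: 235 MIPS, assumed to be the 400 mW row.
    print(round(mips_per_watt(235, 400), 1))
```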

Instruction Set Summary (V-4)
Features
  high code density
  Conditional execution
    first 4 bits of each opcode encode 16 possible conditions
  Barrel shifter
  Encoding of semantic content in each instruction
    easy instruction decoding
  A small number of highly flexible instruction types
  Consistent instruction data formats
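
Conditional execution can be illustrated by decoding the 4-bit condition field, which occupies the top four bits (31:28) of every 32-bit ARM instruction. The slide states only that there are 16 conditions; the mnemonics and flag tests below follow the standard ARM condition table, and the function names are illustrative.

```python
CONDITION_NAMES = ("EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
                   "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV")

# Each test takes the N, Z, C, V flags of the status register.
_CONDITION_TESTS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,
    "CC": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
    "NV": lambda n, z, c, v: False,   # "never"; reserved encoding
}


def condition_name(instruction: int) -> str:
    """Mnemonic for the condition field in bits 31:28."""
    return CONDITION_NAMES[(instruction >> 28) & 0xF]


def condition_passes(instruction: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """True if the instruction would execute with the given flags."""
    return bool(_CONDITION_TESTS[condition_name(instruction)](n, z, c, v))


if __name__ == "__main__":
    word = 0x0A000000  # condition field 0000: executes only when Z is set
    print(condition_name(word), condition_passes(word, n=False, z=True, c=False, v=False))
```

Because every instruction carries its own condition, short if/else sequences can be written without branches, which supports the code-density point on the slide.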

10 Basic Instruction Types
2 types (ALU, barrel shifter, multiplier, 16 visible 32-bit registers)
  Data processing and PSR transfer
    arithmetic (SUB RSB ADD ADC SBC RSC CMP CMN)
    logic (AND EOR TST TEQ ORR MOV BIC MVN)
    shift (LSL, LSR, ASR, ASL, ROR)
  Multiply and Multiply-Accumulate (MUL, MLA)
3 types (transfer of data between main memory and register bank)
  flexibility of addressing (single data transfer: LDR, STR)
  rapid context switching (block data transfer: LDM, STM)
  managing semaphores (single data swap: SWP)
2 types (flow and privilege level of execution)
  Branch and Branch with Link (B, BL)
  Software Interrupt (SWI)
2 types (external coprocessor)
  coprocessor data operation (CDP)
  coprocessor data transfer (LDC, STC), register transfer (MRC, MCR)

Instruction Format

Operating Mode
PSR (Program Status Register) M[4:0]   Mode
10000                                  User mode
10001                                  Fast Interrupt Request (FIQ) mode
10010                                  Interrupt Request (IRQ) mode
10011                                  Supervisor mode
10111                                  Abort mode
11011                                  Undefined mode

Register Bank
37 registers in total (31 general, 6 status)
Visible at any one time: 16 general registers (R0-R15) and 2 status registers (CPSR, SPSR)
R15 => PC, R14 => link register
CPSR (Current Program Status Register)
SPSR (Saved Program Status Register)

Register Bank (Cont'd)
General Registers and Program Counter
System & User  FIQ       Supervisor  Abort     IRQ       Undefined
R0             R0        R0          R0        R0        R0
R1             R1        R1          R1        R1        R1
R2             R2        R2          R2        R2        R2
R3             R3        R3          R3        R3        R3
R4             R4        R4          R4        R4        R4
R5             R5        R5          R5        R5        R5
R6             R6        R6          R6        R6        R6
R7             R7        R7          R7        R7        R7
R8             R8_fiq    R8          R8        R8        R8
R9             R9_fiq    R9          R9        R9        R9
R10            R10_fiq   R10         R10       R10       R10
R11            R11_fiq   R11         R11       R11       R11
R12            R12_fiq   R12         R12       R12       R12
R13            R13_fiq   R13_svc     R13_abt   R13_irq   R13_und
R14            R14_fiq   R14_svc     R14_abt   R14_irq   R14_und
R15(PC)        R15(PC)   R15(PC)     R15(PC)   R15(PC)   R15(PC)
Program Status Registers
System & User  FIQ       Supervisor  Abort     IRQ       Undefined
CPSR           CPSR      CPSR        CPSR      CPSR      CPSR
               SPSR_fiq  SPSR_svc    SPSR_abt  SPSR_irq  SPSR_und
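
The banking scheme in the register table can be expressed as a small lookup, which also confirms the total of 37 registers (31 general, 6 status). This is a sketch of the table above, not of any hardware implementation; the mode abbreviations and function name are illustrative.

```python
# Register numbers that each mode replaces with its own private copy.
BANKED = {
    "usr": (),
    "fiq": (8, 9, 10, 11, 12, 13, 14),
    "svc": (13, 14),
    "abt": (13, 14),
    "irq": (13, 14),
    "und": (13, 14),
}


def physical_register(mode: str, number: int) -> str:
    """Name of the physical register that Rn refers to in the given mode."""
    if mode not in BANKED:
        raise ValueError(f"unknown mode: {mode}")
    if not 0 <= number <= 15:
        raise ValueError("register number must be 0-15")
    if number == 15:
        return "R15(PC)"                      # the PC is shared by all modes
    if number in BANKED[mode]:
        return f"R{number}_{mode}"
    return f"R{number}"


def status_registers() -> set:
    """One shared CPSR plus one SPSR per exception mode."""
    return {"CPSR"} | {f"SPSR_{m}" for m in BANKED if m != "usr"}


if __name__ == "__main__":
    general = {physical_register(m, n) for m in BANKED for n in range(16)}
    print(len(general), len(status_registers()), len(general) + len(status_registers()))
```

FIQ mode banks seven registers rather than two, so a fast interrupt handler has private working registers and does not need to save and restore R8-R12.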

ARM7 Block Diagram

ARM7 Pipeline
Fetch
  PH-1: off-chip memory access
  PH-2: Instruction Reg. <- an instruction from off-chip memory
Decode
  PH-1: Decode-stage instruction register <- instruction register
  PH-2: Decode instruction
Execute
  PH-1: Operand fetch and shifter operation
  PH-2: ALU operation and result write operation

ARM7TDMI (Thumb)
For control applications
  CISC: good code density but power limitation
  RISC: poor code density
Solution to code size problem
  hand coding (reduces 10-20%)
  compressed code which is expanded at run time (-30%)
Thumb concept
  on execution, 16-bit Thumb code is decompressed to the equivalent 32-bit ARM instruction

ARM7TDMI (Thumb)
Example: ADD Rd, #Const
Thumb code: 001 10 Rd 8-bit immediate
  001: major opcode denoting format 3 (move/compare/add/sub with immediate value)
  10: minor opcode denoting ADD instruction
  Rd: destination and source register
  8-bit immediate: immediate value
ARM code: 1110 00 1 0100 1 0Rd 0Rd 0000 8-bit immediate
  1110: "always" condition code
Thumb code limitations
  only eight registers
  2-operand instructions
  uses 3-bit register specifiers
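
The ADD example can be worked through in code: the 16-bit Thumb encoding is expanded field by field into the 32-bit ARM encoding given on the slide. The sketch handles only this one instruction form, and the function name is hypothetical. The real decompressor is hardware in the decode path, not software.

```python
def expand_thumb_add_immediate(halfword: int) -> int:
    """Expand Thumb 'ADD Rd, #imm8' (format 3) into the equivalent ARM word."""
    if not 0 <= halfword <= 0xFFFF:
        raise ValueError("Thumb instruction must fit in 16 bits")
    if (halfword >> 11) != 0b00110:            # major opcode 001, minor opcode 10
        raise ValueError("not a format 3 ADD instruction")
    rd = (halfword >> 8) & 0x7                 # 3-bit register specifier
    imm8 = halfword & 0xFF
    return (
        (0b1110 << 28)    # "always" condition code
        | (0b00 << 26)
        | (1 << 25)       # immediate operand
        | (0b0100 << 21)  # ADD
        | (1 << 20)       # set condition flags
        | (rd << 16)      # source register, zero-extended to 4 bits
        | (rd << 12)      # destination register, zero-extended to 4 bits
        | imm8            # rotate field 0000 followed by the 8-bit immediate
    )


if __name__ == "__main__":
    print(hex(expand_thumb_add_immediate(0x3105)))  # Thumb: ADD r1, #5
```

The zero-extended 3-bit register fields show why Thumb code sees only eight registers, and the repeated Rd shows why the instruction is limited to two operands.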

ARM7TDMI (Thumb) - Cont'd
Normalized Dhrystone 1.1 code size
(Source: Microprocessor Forum, 1993, and vendor data)

ARM7TDMI (Thumb) - Cont'd
Processors at 5 volts in 16-bit memory systems
[Table: processor, system clock and voltage, power (W), Dhrystone 1.1 (MIPS), and MIPS/W for eight processors, including the ARM7DMI, ARM 710, Z380, 486SLC, and 386SLC]
(Source: Microprocessor Forum, 1993, and vendor data)

ARM810 Overview
80 MIPS (3.3 V, 0.5 micron) performance
5-stage pipeline
  Higher clock rate
  increased die size
Parallel operation of shifter and adder
  decreased core cycle time
CPI = 1.43
  double-bandwidth cache reads
  load and store instructions: 1 cycle
  branch prediction

ARM810 Block Diagram
5-stage pipeline
ARM8 CPU core
  PU
  static branch prediction in PU
8 KB unified cache
  write-back/write-through
MMU
  two-level page-table structure

ARM8 Core Block Diagram
[Figure: ARM8 core block diagram showing the CPU core (register bank, decoder and control logic), the prefetch unit, and the memory interface, connected by instruction, PC, address, read-data, and write-data buses.]

StrongARM-110
185 MIPS (2.0 V, 0.35 micron) -> PDA
Core logic
  5-stage pipeline
  Harvard architecture (I-cache, D-cache)
Cache
  I-cache: 16 KB, 32-way set-associative, with 32-byte blocks
  D-cache: 16 KB, 32-way set-associative, write-back, with 32-byte blocks
MMU
  IMMU, DMMU
  separate TLBs (32 entries each)
  data TLB (flush-all/single), instruction TLB (flush-all)
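
The cache parameters above determine how an address is split into tag, set index, and block offset. The sketch below derives that split for the 16 KB, 32-way, 32-byte-block caches; the 32-bit address width and the function name are assumptions.

```python
def cache_geometry(size_bytes: int, ways: int, block_bytes: int, address_bits: int = 32) -> dict:
    """Number of sets and the tag/index/offset bit split for a set-associative cache."""
    for value in (size_bytes, ways, block_bytes):
        if value <= 0 or value & (value - 1):
            raise ValueError("all parameters must be powers of two")
    sets = size_bytes // (ways * block_bytes)
    if sets == 0:
        raise ValueError("cache too small for this associativity and block size")
    offset_bits = block_bytes.bit_length() - 1   # log2 of the block size
    index_bits = sets.bit_length() - 1           # log2 of the number of sets
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": address_bits - index_bits - offset_bits,
    }


if __name__ == "__main__":
    print(cache_geometry(16 * 1024, ways=32, block_bytes=32))
```

With such high associativity only a few address bits select the set, and most of the address is compared as a tag across the 32 ways.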

StrongARM-110 Block Diagram

StrongARM-110
Die area: 50 mm²
Feature size: 0.35 µm
Channel length: 0.25 µm
Vtn/Vtp: 0.35 V/-0.35 V
Power supply: 2.0 V
Clock: 160 MHz
IBOX - instruction unit          EBOX - integer execution unit
MUL - integer multiplier         IMMU - instruction MMU
DMMU - data MMU                  Icache - instruction cache
Dcache - data cache              WB - write buffer
BIU - bus interface unit

StrongARM-110 (Power Down Mode)
Idle
  PLL (Phase Lock Loop)
  Power: 450 mW -> 20 mW
Sleep
  I/O
  Current: 50 µA

PowerPC
Microprocessor - Case Study I

PowerPC Architecture
Introduction
  Performance Optimization With Enhanced RISC - Performance Computing
  Jointly developed by Apple, IBM, and Motorola
  Produced the RS/6000 workstation and the Power Macintosh
Tailored to specific market segments
  performance-hungry applications: PowerPC 604e, PowerPC 620
  automatic energy conservation: PowerPC 603e
Open standard, PowerPC Platform: PowerPC architecture (involved in the PowerOpen environment)
Scalable architecture
  64-bit implementation
  32-bit implementation

PowerPC 601
Highlights
  First implementation of PowerPC
  Implements the 32-bit portion of the PowerPC architecture
  Dispatches up to three inst/cycle
  Three execution units (IU, BPU, FPU)
  32 general-purpose registers
  32 64-bit floating-point registers
  32-bit address, 64-bit data bus
  Single-cycle multiply-add instruction (pipelined FPU)

PowerPC 601
Highlights
  32 KB unified non-blocking cache (no coherency)
    eight-way set associative
    physically addressed
    LRU replacement algorithm
  On-chip MMU
    256-entry two-way set-associative unified TLB
  Graphics coprocessor support
  Branch Processing Unit
    Early exposition of branches, effectively producing zero-cycle branches
    Four-entry translation shadow buffer

PowerPC 602
Highlights
  implements the low-cost, low-power, 32-bit portion of the PowerPC architecture
  Power Management Unit provides dynamic and static power-saving modes
  Four execution units (IU, FPU, BPU, LSU)
  Two register files (GPRs, FPRs)
  dispatches 1 inst/cycle, retires up to 1 inst/cycle, up to 4 instructions in execution
  branch folding
  separate 4 KB code and 4 KB data caches
  one-cycle cache access
  Separate inst/data block address translation (BAT) arrays (4 entries each)
  Time-multiplexed address and data bus

PowerPC 602 Block Diagram

PowerPC 603/603e
Highlights
  Power Management Unit
  dispatches 2 inst/cycle
  static branch prediction
  16 KB code and data caches: 3-state coherency, copy-back data cache
  one-cycle cache access
  8 BAT registers
  fast-trap mechanism for software TLB reload

PowerPC 603/603e
Power management unit
  Static low-power design
  Dynamic power management
Instruction fetching and branch unit
  6-instruction prefetch queue
  static branch prediction
Dispatch unit
  Dispatches 2 instructions/cycle
  4-stage pipeline: Fetch, Dispatch, Execute, and Complete
Load/store unit: one-cycle cache access
Cache unit: separate 16 KB code and 16 KB data caches
  4-way set associative, 3-state coherency, copy-back data cache
MMU: 8 BAT registers

PowerPC 604/604e
Advanced Superscalar CPU Design
  Speculative execution past 2 unresolved branches
  16-entry reorder buffer
  2-entry reservation station per execution unit
  Register renaming on GPR, FPR, and CR
  Dynamic branch prediction
8-word-wide inst fetch bus from cache
Dispatches up to 4 inst/cycle, 8-inst dispatch buffer
Six execution units
  Branch, Load/Store
  2 simple fixed-point units, 1 complex fixed-point unit
  floating point

PowerPC 604/604e
Completion Unit
  Completes 4 inst plus 1 store and 1 branch
Load/Store Unit
  Hardware-supported misaligned little-endian accesses
  Hardware-controlled load/store multiple registers
  Out-of-order load/store
Branch Prediction
  512-entry branch history table
  64-entry branch target address cache, fully associative
Memory Queue
  Two-entry read queue, three-entry write queue
  Memory coherency maintained through bus snooping

PowerPC 604/604e Block Diagram

PowerPC 604/604e Pipeline

PowerPC 604/604e
Data cache and memory queues

PowerPC 604/604e
Cache
  Separate physical caches, each 32 KB, 4-way set associative
  Write-through/copy-back, line-fill buffer forwarding
  Non-blocking cache
  Bus snooping
  MESI data cache coherency control
  Software-controlled instruction cache coherency
MMU
  52-bit virtual and 32-bit real addressing
  4 inst, 4 data block address translation registers
  64-entry 2-way code/data TLB
  Big/little-endian addressing
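
MESI coherency control, listed under the data cache above, tracks each cache line as Modified, Exclusive, Shared, or Invalid. The sketch below is a textbook-style state function for a single line, offered as an illustration only: the event names are hypothetical, and it does not model the 604's actual bus transactions or write-back timing.

```python
MESI_STATES = ("M", "E", "S", "I")


def next_mesi_state(state: str, event: str, others_have_copy: bool = False) -> str:
    """Next state of one cache line after a local access or a snooped bus access."""
    if state not in MESI_STATES:
        raise ValueError(f"unknown state: {state}")
    if event == "local_read":
        if state == "I":
            # A read miss loads the line; it is exclusive only if no other cache holds it.
            return "S" if others_have_copy else "E"
        return state
    if event == "local_write":
        # Writing always ends in Modified; other copies are invalidated on the bus.
        return "M"
    if event == "snoop_read":
        # Another cache reads the line: a valid copy becomes Shared
        # (a Modified line supplies or writes back its data first).
        return "I" if state == "I" else "S"
    if event == "snoop_write":
        # Another cache wants to write: this copy must be invalidated.
        return "I"
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    state = "I"
    for event in ("local_read", "local_write", "snoop_read", "snoop_write"):
        state = next_mesi_state(state, event)
        print(event, "->", state)
```

The Exclusive state lets a cache upgrade to Modified on a write without a bus transaction, which is the main gain over a 3-state protocol.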

PowerPC 620
64-bit Advanced Superscalar Processor
  fetches and dispatches up to 4 inst/cycle
  speculative execution past 4 unresolved branches
  register renaming
Execution Unit
  six execution units
  3 integer units with 2 reservation stations each
  branch unit with 4 reservation stations
Static/Dynamic Branch Prediction
  branch prediction in fetch and dispatch stages
  256-entry branch target address cache, 2048-entry branch history table

PowerPC 620 Microarchitecture

PowerPC 620
MMU
  80-bit virtual, 64-bit effective addressing
  128-entry, 2-way set-associative shared TLB
  20-entry fully associative segment lookaside buffer
  16 segment registers for 32-bit mode support
Caches
  Separate 32 KB 8-way set-associative instruction and data caches
  Separate 64-entry, fully associative effective-to-real address translator
  4 inst, 4 data BAT registers
  coherent data cache (MESI)
  on-chip L2 cache interface

PowerPC Family of Embedded Processors
PowerPC 740 Embedded Processor
  suitable for high-end communication and networking applications (hubs, routers, LAN switches, network computers, storage controllers)
Application-Specific Processors (ASSP)
  devices will be optimized for specific applications
  modular design techniques
Custom Processors
  Core - ASIC program
