You are on page 1of 22

CSE477 VLSI Digital Circuits Fall 2002 Lecture 26: Low Power Techniques in Microarchitectures and Memories

Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477

[Adapted from Rabaeys Digital Integrated Circuits, 2002, J. Rabaey et al.]


CSE477 L26 System Power.1 Irwin&Vijay, PSU, 2002

Review: Energy & Power Equations


E = CL VDD2 P01 + tsc VDD Ipeak P01 + VDD Ileakage
f01 = P01 * fclock

P = CL VDD2 f01 + tscVDD Ipeak f01 + VDD Ileakage


Dynamic power (~90% today and decreasing relatively) Short-circuit power (~8% today and decreasing absolutely) Leakage power (~2% today and increasing)

CSE477 L26 System Power.2

Irwin&Vijay, PSU, 2002

Power and Energy Design Space


Constant Variable Throughput/Latency Throughput/Latency Design Time Non-active Modules Run Time Logic Design Clock Gating DFS, DVS Reduced Vdd Sizing Multi-Vdd Leakage + Multi-VT Sleep Transistors Multi-Vdd Variable VT + Variable VT (Dynamic Freq, Voltage Scaling)

Energy Active

CSE477 L26 System Power.3

Irwin&Vijay, PSU, 2002

Bus Multiplexing

Buses are a significant source of power dissipation due to high switching activities and large capacitive loading
q q

15% of total power in Alpha 21064 30% of total power in Intel 80386

Share long data buses with time multiplexing (S1 uses even cycles, S2 odd)
D1 S1 D1

S1

S2

D2

S2

D2

But what if data samples are correlated (e.g., sign bits)?


Irwin&Vijay, PSU, 2002

CSE477 L26 System Power.4

Correlated Data Streams


1

Bit switching probabilities

Muxed Dedicated

0.5

For a shared (multiplexed) bus advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
q

Bus sharing should not be used for positively correlated data streams Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) more random switching

0 14 12 10 8 6 4 2 0

MSB

Bit position

LSB

CSE477 L26 System Power.5

Irwin&Vijay, PSU, 2002

Glitch Reduction by Pipelining

Glitches depend on the logic depth of the circuit - gates deeper in the logic network are more prone to glitching
q q

arrival times of the gate inputs are more spread due to delay imbalances usually affected more by primary input switching

Reduce logic depth by adding pipeline registers


q

additional energy used by the clock and pipeline registers


Fetch Instruction Decode Execute Memory WriteBack

MDR

MAR

PC

I$

D$

pipeline stage isolation register

clk
CSE477 L26 System Power.6 Irwin&Vijay, PSU, 2002

Power and Energy Design Space


Constant Variable Throughput/Latency Throughput/Latency Design Time Non-active Modules Run Time Logic Design Clock Gating DFS, DVS Reduced Vdd Sizing Multi-Vdd Leakage + Multi-VT Sleep Transistors Multi-Vdd Variable VT + Variable VT (Dynamic Freq, Voltage Scaling)

Energy Active

CSE477 L26 System Power.7

Irwin&Vijay, PSU, 2002

Clock Gating

Most popular method for power reduction of clock signals and functional units Gate off clock to idle functional units
q q

e.g., floating point units need logic to generate signal disable

R Functional e unit g

- increases complexity of control logic - consumes power - timing critical to avoid clock glitches at OR gate output
q

clock disable

additional gate delay on clock signal


- gating OR gate can replace a buffer in the clock distribution tree

CSE477 L26 System Power.8

Irwin&Vijay, PSU, 2002

Clock Gating in a Pipelined Datapath

For idle units (e.g., floating point units in Exec stage, WB stage for instructions with no write back operation)
Fetch Instruction Decode Execute Memory WriteBack

I$

D$

clk No FP
CSE477 L26 System Power.9

No WB
Irwin&Vijay, PSU, 2002

MDR

MAR

PC

Power and Energy Design Space


Constant Variable Throughput/Latency Throughput/Latency Design Time Non-active Modules Run Time Logic Design Clock Gating DFS, DVS Reduced Vdd Sizing Multi-Vdd Leakage + Multi-VT Sleep Transistors Multi-Vdd Variable VT + Variable VT (Dynamic Freq, Voltage Scaling)

Energy Active

CSE477 L26 System Power.10

Irwin&Vijay, PSU, 2002

Review: Dynamic Power as a Function of VDD

tp(normalized)

Decreasing the VDD decreases dynamic energy consumption (quadratically) But, increases gate delay (decreases performance)

5.5 5 4.5 4 3.5 3 2.5 2 1.5 1 0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4

VDD (V)

Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other logic to reduce dynamic energy consumption.
Irwin&Vijay, PSU, 2002

CSE477 L26 System Power.11

Dynamic Frequency and Voltage Scaling

Intels SpeedStep
q

Hardware that steps down the clock frequency (dynamic frequency scaling DFS) when the user unplugs from AC power
- PLL from 650MHz 500MHz

CPU stalls during SpeedStep adjustment

Transmeta LongRun
q

Hardware that applies both DFS and DVS (dynamic supply voltage scaling)
- 32 levels of VDD from 1.1V to 1.6V - PLL from 200MHz 700MHz in increments of 33MHz

Triggered when CPU load change is detected by software


- heavier load ramp up VDD, when stable speed up clock - lighter load slow down clock, when PLL locks onto new rate, ramp down VDD

CPU stalls only during PLL relock (< 20 microsec)


Irwin&Vijay, PSU, 2002

CSE477 L26 System Power.12

Dynamic Thermal Management (DTM)

Trigger Mechanism:
When do we enable DTM techniques?

Initiation Mechanism:
How do we enable technique?

Response Mechanism:
What technique do we enable?

CSE477 L26 System Power.13

Irwin&Vijay, PSU, 2002

DTM Trigger Mechanisms

Mechanism: How to deduce temperature? Direct approach: on-chip temperature sensors


q

Policy: When to begin responding?


q

Trigger level set too high means higher packaging costs Trigger level set too low means frequent triggering and loss in performance

Based on differential voltage change across 2 diodes of different sizes May require >1 sensor Hysteresis and delay are problems

q q

Choose trigger level to exploit difference between average and worst case power
Irwin&Vijay, PSU, 2002

CSE477 L26 System Power.14

DTM Initiation and Response Mechanisms

Operating system or microarchitectural control?


q

Hardware support can reduce performance penalty by 20-30%

Initiation of policy incurs some delay


q q

When using DVS and/or DFS, much of the performance penalty can be attributed to enabling/disabling overhead Increasing policy delay reduces overhead; smarter initiation techniques would help as well

Thermal window (100Kcycles+)


q

Larger thermal windows smooth short thermal spikes


Irwin&Vijay, PSU, 2002

CSE477 L26 System Power.15

DTM Activation and Deactivation Cycle


Trigger Turn Reached Response On Initiation Response Delay Delay Check Temp Check Temp Turn Response Off

Policy Delay

Shutoff Delay

Initiation Delay OS interrupt/handler Response Delay Invocation time (e.g., adjust clock) Policy Delay Number of cycles engaged Shutoff Delay Disabling time (e.g., re-adjust clock)

CSE477 L26 System Power.16

Irwin&Vijay, PSU, 2002

DTM Savings Benefits


Designed for cooling capacity without DTM
Temperature

Designed for cooling capacity with DTM DTM trigger level

System Cost Savings

DTM Disabled

DTM/Response Engaged
Time

CSE477 L26 System Power.17

Irwin&Vijay, PSU, 2002

Power and Energy Design Space


Constant Variable Throughput/Latency Throughput/Latency Design Time Non-active Modules Run Time Logic Design Clock Gating DFS, DVS Reduced Vdd Sizing Multi-Vdd Leakage + Multi-VT Sleep Transistors Multi-Vdd Variable VT + Variable VT (Dynamic Freq, Voltage Scaling)

Energy Active

CSE477 L26 System Power.18

Irwin&Vijay, PSU, 2002

Speculated Power of a 15mm P


70 60
Power (Watts) 0.25 , 15mm die, 2V

50 40 30 20 10 -

Power (Watts)

Leakage Active

70 60 50 40 30 20 10 Leakage Active 9% 0% 0% 1% 1% 2% 3% 5% 7% 0.18 , 15mm die, 1.4V

0% 0% 0% 0% 1% 1% 1% 2% 3%

30

40

50

60

70

80

90

10 0 11 0

30

40

50

60

70

80 80

90 90

Temp (C)

70 60
Power (Watts)

50 40 30 20 10 -

Power (Watts)

Leakage 0.13 , 15mm die. 1V Active 26% 20% 15% 8% 11% 1% 2% 3% 5%

70 60 50 40 30 20 10 14% 6% 9%

Temp (C) 33% 19% 26%

41% 49% 56%

0.1 , 15mm die, 0.7V Leakage Active

30

40

50

60

70

80

90

30

40

50

60

10 0 11 0

70

Temp (C)
CSE477 L26 System Power.19

Temp (C)

Irwin&Vijay, PSU, 2002

10 0 11 0

10 0 11 0

Review: Leakage as a Function of Design Time VT

But, reducing VT decreases gate delay (increases performance)

ID (A)

Reducing the VT increases the subthreshold leakage current (exponentially)

VT=0.4V VT=0.1V

0.2

0.4

0.6

0.8

VGS (V)

Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.
Irwin&Vijay, PSU, 2002

CSE477 L26 System Power.20

Review: Variable VT (ABB) at Run Time

VT = VT0 + (|-2F + VSB| - |-2F|)


where VT0 is the threshold voltage at VSB = 0 VSB is the source-bulk (substrate) voltage is the body-effect coefficient

For an n-channel device, the substrate is normally tied to ground

0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 -2.5 -2 -1.5 -1 -0.5 0

A negative bias causes VT to increase from 0.45V to 0.85V

Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)

CSE477 L26 System Power.21

VT (V)

VSB (V)
Irwin&Vijay, PSU, 2002

Next Lecture and Reminders

Next lecture
q

System level interconnect


- Reading assignment Rabaey, et al, xx

Reminders
q q q

Project final reports due December 5th Final grading negotiations/correction (except for the final exam) must be concluded by December 10th Final exam scheduled
- Monday, December 16th from 10:10 to noon in 118 and 121 Thomas

CSE477 L26 System Power.22

Irwin&Vijay, PSU, 2002

You might also like