Pipelining


why wait . . . ?

... let's solve a "real problem"

device: washer
function: fill, agitate, spin
washerPD = 30 mins

device: dryer
function: heat, fast spin
dryerPD = 60 mins

one load at a time

the reason that students put off doing laundry so long is not because they procrastinate, are lazy, or even because they are working with their computation slides.
doing one load at a time is not smart

step 1: wash (30 mins)
step 2: dry (60 mins)

doing N loads of laundry

the "combinational" way: no pipelining yet!

step 1: wash load 1
step 2: dry load 1
step 3: wash load 2
step 4: dry load 2
.....

total = N*(washerPD + dryerPD) = N*90 mins

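doing N loads... the 1st year E.E. way

1st year E.E. students pipeline the laundry process

step 1: wash load 1
step 2: dry load 1, wash load 2
step 3: dry load 2, wash load 3
.....

total = N*max(washerPD, dryerPD) = N*60 mins

actually, it's more like N*60 + 30 mins if we account for the startup transient correctly

when doing pipeline analysis, we're mostly interested in the "steady state" where we assume we have an infinite supply of inputs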
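The two totals above can be checked with a few lines of Python. This is only a sketch of the laundry arithmetic from the slides; the function names are made up for illustration, and the pipelined formula assumes identical loads and exactly one washer and one dryer.

```python
WASHER_PD = 30  # minutes, from the slides
DRYER_PD = 60   # minutes, from the slides

def combinational_total(n_loads: int) -> int:
    """Finish each load completely before starting the next one."""
    return n_loads * (WASHER_PD + DRYER_PD)

def pipelined_total(n_loads: int) -> int:
    """Wash the next load while the previous one is drying.

    The slower device sets the pace; the faster one contributes once,
    which is the startup transient mentioned on the slide.
    """
    if n_loads == 0:
        return 0
    return n_loads * max(WASHER_PD, DRYER_PD) + min(WASHER_PD, DRYER_PD)

for n in (1, 4, 10):
    print(n, combinational_total(n), pipelined_total(n))
```

For large N the extra 30 minutes becomes negligible, which is why the steady-state figure of N*60 mins is the one used in pipeline analysis.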

some definitions

latency:
the delay from when an input is established until the output associated with that input becomes valid
(0th year's laundry = 90 mins)
(1st year's laundry = 120 mins, assuming that the wash is started as soon as possible and waits (wet) in the washer until the dryer is available)

this implies that in a six-hour wait 0th's gets 4 loads done, while 1st's gets 5 and goes home half an hour earlier

throughput:
the rate at which inputs or outputs are processed
(0th year's laundry = 1/90 mins^-1 ≈ 0.011 mins^-1)
(1st year's laundry = 1/60 mins^-1 ≈ 0.017 mins^-1)
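The six-hour comparison can be reproduced with a small calculation. The helper names below are hypothetical. The sketch assumes that in both schemes the first load is done after 90 minutes (30 mins wash plus 60 mins dry) and that later loads finish every 90 minutes in the combinational scheme and every 60 minutes in the pipelined one.

```python
def loads_done(minutes_available, first_done, interval):
    """Number of loads finished within the available time."""
    if minutes_available < first_done:
        return 0
    return 1 + (minutes_available - first_done) // interval

def finish_time(n_loads, first_done, interval):
    """Minute at which the n-th load comes out of the dryer."""
    return first_done + (n_loads - 1) * interval

SIX_HOURS = 6 * 60
zeroth = loads_done(SIX_HOURS, first_done=90, interval=90)
first = loads_done(SIX_HOURS, first_done=90, interval=60)
print(zeroth, finish_time(zeroth, 90, 90))  # 4 loads, done at minute 360
print(first, finish_time(first, 90, 60))    # 5 loads, done at minute 330
```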

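okay, back to circuits...

[figure: combinational circuit in which F and G both take X and feed H, which produces P(X)]

latency = tPD
throughput = 1/tPD

we can't get the answer faster, but are we making effective use of our hardware at all times?

[figure: timing diagram of X, F(X), G(X) and P(X)]

F(X) and G(X) must be held stable while H performs its computation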

pipelined circuits

use registers to hold H's input stable!

now F and G can be working on input X_{i+1} while H is still computing on X_i.

because of the 2-stage pipeline: for an input X applied during clock j, P(X) is valid during clock j+2.

suppose F, G, H have propagation delays of 15, 20, 25 ns and we are using ideal zero-delay registers:

                    latency (ns)    throughput (1/ns)
unpipelined         45              1/45
2-stage pipeline    50 (worse)      1/25 (better)
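The numbers for this example can be derived as follows. This is a sketch that assumes F and G operate in parallel on X and both feed H (consistent with the unpipelined latency of 45 ns), and that the registers are ideal; the function names are illustrative only.

```python
def unpipelined(t_f, t_g, t_h):
    """Latency and throughput of the purely combinational circuit."""
    latency = max(t_f, t_g) + t_h   # slowest of F/G, then H
    return latency, 1 / latency

def two_stage(t_f, t_g, t_h):
    """Stage 1 = F and G in parallel, stage 2 = H, ideal registers."""
    period = max(max(t_f, t_g), t_h)  # clock must cover the slowest stage
    return 2 * period, 1 / period

print(unpipelined(15, 20, 25))  # latency 45 ns, throughput 1/45 per ns
print(two_stage(15, 20, 25))    # latency 50 ns, throughput 1/25 per ns
```

The pipelined latency is worse because the 20 ns stage is padded to the 25 ns clock period, while the throughput is better because a new input is accepted every clock.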

pipeline diagrams

                        clock cycle
pipeline stages    i    i+1       i+2          i+3
...
G reg                   G(X_i)    G(X_{i+1})   G(X_{i+2})

the results for a given input move diagonally through the diagram, progressing through one pipeline stage each clock cycle
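A pipeline diagram of this kind can be generated mechanically. The sketch below assumes the 2-stage F/G/H pipeline from the earlier slide, with F and G registers one stage deep and the H register two stages deep; the row labels and the helper names are assumptions made for illustration.

```python
def diagram_row(label, depth, n_cycles):
    """Cells of one row; depth = registers between the input and this row."""
    cells = []
    for c in range(n_cycles):
        k = c - depth  # index of the input whose result is visible in cycle i+c
        if k < 0:
            cells.append("")  # no result derived from X_i onward has arrived yet
        else:
            arg = "Xi" if k == 0 else f"Xi+{k}"
            cells.append(f"{label}({arg})")
    return cells

def print_diagram(rows, n_cycles=4):
    header = ["i"] + [f"i+{c}" for c in range(1, n_cycles)]
    print(f"{'':8}" + "".join(f"{h:10}" for h in header))
    for label, depth in rows:
        cells = diagram_row(label, depth, n_cycles)
        print(f"{label + ' reg':8}" + "".join(f"{cell:10}" for cell in cells))

print_diagram([("F", 1), ("G", 1), ("H", 2)])
```

Reading the printed table diagonally shows one input advancing one stage per clock cycle.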

pipeline conventions

definition:
a K-stage pipeline ("K-pipeline") is an acyclic circuit having exactly K registers on every path from an input to an output
a combinational circuit is thus a 0-stage pipeline

convention:
every pipeline stage, hence every K-stage pipeline, has a register on its output (not on its input)

always:
the clock common to all registers must have a period sufficient to cover propagation over combinational paths PLUS (input) register tPD PLUS (output) register tSETUP

the latency of a K-pipeline is K times the period of the clock common to all registers

the throughput of a K-pipeline is the frequency of the clock
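The clocking rule can be written as a short calculation. This sketch takes one combinational delay per stage and hypothetical register parameters `t_pd_reg` and `t_setup` (both default to zero, matching the ideal registers used in the examples); it is an illustration of the rule, not a timing tool.

```python
def min_clock_period(stage_delays, t_pd_reg=0.0, t_setup=0.0):
    """Shortest clock period that covers the slowest stage plus register overhead."""
    return t_pd_reg + max(stage_delays) + t_setup

def k_pipeline_metrics(stage_delays, t_pd_reg=0.0, t_setup=0.0):
    """Return (latency, throughput) for a K-pipeline with K = len(stage_delays)."""
    period = min_clock_period(stage_delays, t_pd_reg, t_setup)
    k = len(stage_delays)
    return k * period, 1 / period

# Stage delays in ns: stage 1 is limited by G (20 ns), stage 2 is H (25 ns).
print(k_pipeline_metrics([20, 25]))                         # ideal registers
print(k_pipeline_metrics([20, 25], t_pd_reg=2, t_setup=1))  # assumed register timing
```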

ill-formed pipelines

consider a bad job of pipelining:

[figure: pipelined circuit with inputs X and Y and modules including A and B]

problem:
successive inputs get mixed: e.g., B(A(X_{i+1}), Y_i)
this happened because some paths from inputs to outputs had 2 registers, and some had only 1!

can this happen on a well-formed K-pipeline?
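Well-formedness can be checked mechanically by counting registers along every input-to-output path. The sketch below uses a made-up graph encoding (a dict from node to a list of `(next_node, registers_on_that_connection)` pairs) and a hypothetical circuit that reproduces the 1-versus-2 register mismatch; it is not the circuit from the slide's figure.

```python
from functools import lru_cache

def register_counts(edges, inputs, outputs):
    """Set of register counts over all input-to-output paths of an acyclic circuit."""
    outputs = set(outputs)

    @lru_cache(maxsize=None)
    def counts_from(node):
        result = set()
        if node in outputs:
            result.add(0)
        for nxt, regs in edges.get(node, []):
            result |= {regs + c for c in counts_from(nxt)}
        return frozenset(result)

    all_counts = set()
    for src in inputs:
        all_counts |= counts_from(src)
    return all_counts

def is_well_formed(edges, inputs, outputs):
    """A K-pipeline has exactly K registers on every path, so one distinct count."""
    return len(register_counts(edges, inputs, outputs)) == 1

# Hypothetical ill-formed circuit: the Y path has one register more than the X path.
bad = {"X": [("A", 0)], "A": [("B", 0)], "Y": [("B", 1)], "B": [("out", 1)]}
# Adding a register between A and B balances the two paths.
fixed = {"X": [("A", 0)], "A": [("B", 1)], "Y": [("B", 1)], "B": [("out", 1)]}

print(register_counts(bad, ["X", "Y"], ["out"]), is_well_formed(bad, ["X", "Y"], ["out"]))
print(register_counts(fixed, ["X", "Y"], ["out"]), is_well_formed(fixed, ["X", "Y"], ["out"]))
```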

a pipelining methodology

step 1:
draw a line that crosses every output in the circuit, and select one endpoint as an origin

step 2:
continue to draw new lines from the origin across various circuit connections such that these new lines partition the inputs from the outputs

adding a pipeline register at every point where a separating line crosses a connection will always generate a valid pipeline

STRATEGY:
focus your attention on placing pipelining registers around the slowest circuit elements (bottlenecks)

pipeline example

          LATENCY   THROUGHPUT
0-pipe    4         1/4
1-pipe    4         1/4
2-pipe    4         1/2
3-pipe    6         1/2

observations:
• 1-pipeline improves neither latency nor throughput
• throughput is improved by breaking long combinational paths, allowing faster clock
• too many stages cost latency while not improving throughput
• back-to-back registers are often required to keep pipeline well-formed

considering pipelining

• advantages
– higher throughput than the corresponding combinational device
– different parts of the logic work on different parts of the problem
• disadvantages
– generally, increases latency
– only as good as the weakest link

how 1st year EE's do laundry, A.D. 2010

they work around the bottleneck: first they find a place with twice as many dryers as washers

[figure: steps 1 to 5 of the laundry schedule]

latency = 90 min

circuit interleaving

one way to overcome a pipeline bottleneck is to replicate the critical element as many times as needed and alternate inputs between the various copies

N-1 registers

latency = 2 clocks

to N pipeline stages

combining pipelining and interleaving

combining interleaving with pipelining moves the bottleneck from the C-element to the F-element

the interleaved C' module, built from C-elements with a propagation delay of 8 ns, delivers one result every 4 ns (throughput 1/4 ns^-1) with a latency of 8 ns

this can be considered as an extra pipelining stage that passes through the middle of the C' module
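The effect of interleaving can be illustrated with a round-robin schedule. This sketch assumes ideal, zero-delay steering logic around the replicated element; the function name and the tuple layout are invented for the example.

```python
def interleave_schedule(element_delay_ns, clock_period_ns, n_copies, n_inputs):
    """Alternate inputs between copies; return (input, copy, start, done) tuples."""
    # Enough copies are needed to cover one element delay with clock periods.
    assert n_copies * clock_period_ns >= element_delay_ns, "too few copies"
    busy_until = [0] * n_copies
    schedule = []
    for i in range(n_inputs):
        copy = i % n_copies                  # round-robin choice of copy
        start = i * clock_period_ns          # one new input per clock
        assert start >= busy_until[copy]     # the copy has finished its previous input
        done = start + element_delay_ns
        busy_until[copy] = done
        schedule.append((i, copy, start, done))
    return schedule

# Two copies of an 8 ns element, clocked every 4 ns.
for row in interleave_schedule(8, 4, n_copies=2, n_inputs=4):
    print(row)
```

Each input still takes 8 ns (the latency), but results appear 4 ns apart, which matches the figures quoted for the C' module.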

assignment B4

Pipeline a combinational encryptor for throughput!

The device takes an integer value X and computes an encrypted version C(X).
The propagation delay of each module is given in ms (contamination delays are zero).

[figure: encryptor circuit from X to C(X) with per-module delays and numbered edges; the layout is not recoverable from the extracted labels]

• what is the latency and throughput of the unpipelined device?
• give the locations for registers (ideal, zero-delay) by edge number after maximizing the throughput! use as few registers as possible!
• give the latency and throughput of your pipelined device!

before monday, march 8, 9:00

From: student@tue.nl
To: computation@ics.ele.tue.nl
Subject: B4

27 0.40
4 9 10
30 0.80

drawing in attachment <student_B3.xxx>

multiplication (positive numbers)

multiplicand                        A3   A2   A1   A0
multiplier                    x     B3   B2   B1   B0
                              -----------------------
                                  A3B0 A2B0 A1B0 A0B0
                             A3B1 A2B1 A1B1 A0B1
                        A3B2 A2B2 A1B2 A0B2
                 +  A3B3 A2B3 A1B3 A0B3

ABi is called a "partial product"

easy part:
forming partial products (just an AND gate since Bi is either 0 or 1)

hard part:
adding partial products column by column with carry

multiplication

example:

            1 0 1 1
      x     1 0 0 1
      -------------
      1 1 0 0 0 1 1

multiplying N-bit number by M-bit number gives (N+M)-bit result

easy part:
forming partial products (just an AND gate since Bi is either 0 or 1)

hard part:
adding M N-bit partial products

sequential multiplier

assume the multiplicand (A) has N bits and the multiplier (B) has M bits.

init: P ← 0, load A & B
repeat M times {
    P ← P + (B_LSB == 1 ? A : 0)
    shift P/B right one bit
}

can process one partial product at a time and then cycle the circuit M times
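The loop above can be modelled in Python. This is a behavioural sketch, not a hardware description: `p` and `b` stand for the P and B registers, the shift of the combined P/B register is modelled by moving the LSB of P into the MSB of B, and the parameter names `n_bits` and `m_bits` are assumptions.

```python
def sequential_multiply(a, b, n_bits, m_bits):
    """Shift-and-add multiply of an n_bits multiplicand by an m_bits multiplier."""
    assert 0 <= a < (1 << n_bits) and 0 <= b < (1 << m_bits)
    p = 0                                   # init: P <- 0
    for _ in range(m_bits):                 # repeat M times
        if b & 1:                           # B_LSB == 1 ?
            p += a                          # P <- P + A (may carry into bit n_bits)
        # shift P/B right one bit: the LSB of P becomes the MSB of B
        b = (b >> 1) | ((p & 1) << (m_bits - 1))
        p >>= 1
    return (p << m_bits) | b                # the product sits in P:B

print(bin(sequential_multiply(0b1011, 0b1001, 4, 4)))  # 0b1100011
```

The multiplier bits are consumed from the right while the low product bits fill the same register from the left, so no separate low-half product register is needed.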

sequential multiplier (64-bit ALU)

after initialization (product register at 0) and loading the operands;

repeat 32 times:
{ if 1==LSB in multiplier, then add multiplicand, else add zero;
  shift multiplier 1 right;
  shift multiplicand 1 left }

[figure: 64-bit multiplicand register (bits 63..0, operand loaded in bits 31..0, shift left), 64-bit ALU, 64-bit product register (bits 63..0), 32-bit multiplier register (bits 31..0, shift right); registers have enable inputs (E); the multiplier LSB goes to a finite state machine]
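A behavioural sketch of this first datapath is given below. It assumes unsigned operands and models the registers as Python integers masked to their widths; the function name and the `width` parameter are made up, with `width=32` corresponding to the slide.

```python
def multiply_wide_alu(multiplicand, multiplier, width=32):
    """Shift the multiplicand left and the multiplier right, adding in a 2*width ALU."""
    assert 0 <= multiplicand < (1 << width) and 0 <= multiplier < (1 << width)
    mask = (1 << (2 * width)) - 1          # the ALU and product register are 2*width bits
    product = 0                             # product register at 0
    for _ in range(width):                  # repeat 32 times for width=32
        if multiplier & 1:                  # 1 == LSB in multiplier
            product = (product + multiplicand) & mask
        multiplier >>= 1                    # shift multiplier 1 right
        multiplicand = (multiplicand << 1) & mask  # shift multiplicand 1 left
    return product

print(multiply_wide_alu(11, 9, width=4))    # 99
```

Half of the wide multiplicand register and half of the ALU are always working on zeros, which is the inefficiency the 32-bit ALU variants address.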

sequential multiplier (32-bit ALU)

after initialization (product register at 0) and loading the operands;

repeat 32 times:
{ if 1==LSB in multiplier, then add multiplicand, else add zero;
  shift multiplier 1 right;
  shift product 1 right }

[figure: 32-bit multiplicand register (bits 31..0), 32-bit ALU, 64-bit product register (bits 63..0, shift right), 32-bit multiplier register (bits 31..0, shift right); registers have enable inputs (E); the multiplier LSB goes to a finite state machine]

sequential multiplier (32-bit ALU)

after initialization (product register at 0) and loading the operands;

repeat 32 times:
{ if 1==LSB in multiplier, then add multiplicand, else add zero;
  shift content of product register 1 right }

[figure: 32-bit multiplicand register (bits 31..0), 32-bit ALU, 64-bit product register (bits 63..0, shift right) that also carries the "multiplier" label; the LSB is taken from this register]

a combinational multiplier

[figure: 4x4 array multiplier; for each multiplier bit B0..B3, the bits A3..A0 feed a row of four full adders (FA); outputs P7..P0]

tPD = 10*tPD,FA
(follow the path from A0 to P7)
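A bit-level model of such an array can be sketched as follows. It assumes one row of full adders per multiplier bit with the carry rippling along each row; the exact wiring of the figure is not recoverable here, so this is one plausible arrangement, and all names are illustrative. Bit lists are least significant bit first.

```python
def full_adder(a, b, cin):
    """One FA cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def array_multiply(a_bits, b_bits):
    """Add the partial products row by row using full adders only."""
    n, m = len(a_bits), len(b_bits)
    acc = [0] * (n + m)                       # running sum, ends up as P(n+m-1)..P0
    for i, b in enumerate(b_bits):            # one row of FAs per multiplier bit
        carry = 0
        for j, a in enumerate(a_bits):
            pp = a & b                        # partial product bit: an AND gate
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        acc[i + n] = carry                    # carry out of the row
    return acc

def to_bits(value, width):
    return [(value >> k) & 1 for k in range(width)]

def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))

print(from_bits(array_multiply(to_bits(11, 4), to_bits(9, 4))))  # 99
```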

pipelined multiplier

[figure: the 4x4 multiplier in a "carry save" configuration; for each multiplier bit B0..B3, the bits A3..A0 feed a row of four full adders (FA), followed by a final row of four FAs; outputs P7..P0]
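The idea behind a carry-save arrangement can be shown at word level: each row passes on a sum word and a carry word instead of rippling the carry, and only the last row performs a carry-propagating addition. The sketch below uses Python integers as bit vectors and is an assumption about how the rows cooperate, not a transcription of the figure.

```python
def carry_save_multiply(a, b, width=4):
    """Accumulate partial products with 3:2 compression, then do one final add."""
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    sum_word, carry_word = 0, 0
    for i in range(width):                        # one FA row per multiplier bit
        pp = (a << i) if (b >> i) & 1 else 0      # shifted partial product
        # every bit position acts as an independent full adder (no ripple)
        new_sum = sum_word ^ carry_word ^ pp
        new_carry = ((sum_word & carry_word) | (sum_word & pp) | (carry_word & pp)) << 1
        sum_word, carry_word = new_sum, new_carry
    return sum_word + carry_word                  # final row: carry-propagate adder

print(carry_save_multiply(11, 9))  # 99
```

Because no row waits for a carry to ripple sideways, every row has a short, uniform delay, which makes the rows natural places for pipeline registers.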

summary

• latency (L) = time it takes for a given input to affect an output
• throughput (T) = rate at which new outputs appear
• for combinational circuits: L = tPD of device, T = 1/L
• for K-stage pipeline (K > 0):
– always have registers on output(s)
– K registers on every path from input to output
– T = (tPD,reg + tPD,slowest pipeline stage + tSETUP)^-1
– L = K/T
• to increase throughput: split the slowest stage
• no further splitting possible, use replication/interleaving
• pipelined latency ≥ combinational latency
• pipelining can be combined with circuit interleaving

reading: chapter 4.5 (p. 332) and 3.3
