Chapter 3 Pipelining

3.1 Pipeline Model
Terminology
task
subtask
stage
staging register
Total processing time for each task:
$T_{pl} = \sum_{i=1}^{k} (t_i + d_i)$, where $t_i$ is the processing time of stage i,
$d_i$ is the delay of its staging register, and k is the
number of stages
3.1 Pipeline Model (continued)
Total processing time for each task without pipelining:
$T_{seq} = \sum_{i=1}^{k} t_i$
pipeline cycle time, $t_{max} = \max(t_i + d_i)$, $1 \le i \le k$
clock frequency = $1 / t_{max}$
pipeline cycle time $t_{cyc}$ can be denoted by
$t_{cyc} = T_{seq}/k + d$, where d is the staging register delay
speedup, $S = \frac{N \cdot T_{seq}}{(k + N - 1) \cdot t_{cyc}}$, where N is the number
of tasks.
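As a rough numeric illustration of the formulas above, the following Python sketch computes the pipelined time per task, the sequential time, the cycle time, and the speedup. The stage times and register delays are made-up values, the function name `pipeline_metrics` is hypothetical, and the sketch takes the cycle time to be the slowest stage plus its register delay (t_max).

```python
def pipeline_metrics(stage_times, reg_delays, n_tasks):
    """Return (T_pl, T_seq, t_cyc, speedup) for a linear k-stage pipeline."""
    k = len(stage_times)
    # T_pl: one task passing through every stage and staging register.
    t_pl = sum(t + d for t, d in zip(stage_times, reg_delays))
    # T_seq: the same work done without staging registers.
    t_seq = sum(stage_times)
    # Cycle time is set by the slowest stage (assumption: t_cyc = t_max).
    t_cyc = max(t + d for t, d in zip(stage_times, reg_delays))
    # N tasks finish in (k + N - 1) cycles once the pipeline is started.
    speedup = (n_tasks * t_seq) / ((k + n_tasks - 1) * t_cyc)
    return t_pl, t_seq, t_cyc, speedup


# Hypothetical 4-stage pipeline, times in nanoseconds.
t_pl, t_seq, t_cyc, s = pipeline_metrics([10, 12, 9, 11], [1, 1, 1, 1], n_tasks=100)
print(t_pl, t_seq, t_cyc, round(s, 2))  # 46 42 13 3.14
```

With unbalanced stages the speedup stays well below k = 4, because every stage is clocked at the pace of the slowest one.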
3.1 Pipeline Model (continued)
If the staging register delay is ignored and the
processing times of the stages are the same,
$t_{cyc} = T_{seq} / k$.
Therefore, $S_{ideal}$ becomes $\frac{N \cdot k}{k + N - 1}$.
If $N \to \infty$, $S_{ideal} \to k$.
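The ideal speedup and its limit follow directly from the speedup formula. A short derivation, assuming every stage takes exactly T_seq/k and the register delay is zero:

```latex
\begin{align*}
S_{ideal} &= \frac{N \, T_{seq}}{(k + N - 1)\, t_{cyc}}
           = \frac{N \, T_{seq}}{(k + N - 1)\, T_{seq}/k}
           = \frac{N k}{k + N - 1} \\
\lim_{N \to \infty} S_{ideal}
          &= \lim_{N \to \infty} \frac{k}{1 + (k - 1)/N} = k
\end{align*}
```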
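3.1 Pipeline Model (continued)
The total cost of the pipeline is given by
$C = L \cdot k + C_p$, where $C_p = \sum_{i=1}^{k} c_i$, $c_i$ is the cost of stage i, and L is the cost
of each staging register.
To minimize the composite cost per computation rate, $C \cdot t_{cyc} = (L \cdot k + C_p)(T_{seq}/k + d)$,
$k = \sqrt{\frac{C_p \cdot T_{seq}}{L \cdot d}}$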
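The optimal stage count can be checked numerically. The sketch below assumes the cost per computation rate is (L·k + Cp)·(Tseq/k + d), whose minimizer is k = sqrt(Cp·Tseq / (L·d)), and compares that closed form against a brute-force search over integer k. All numeric values and function names are hypothetical.

```python
import math


def cost_per_rate(k, c_p, reg_cost, t_seq, d):
    """Composite cost divided by computation rate: (L*k + Cp) * (Tseq/k + d)."""
    return (reg_cost * k + c_p) * (t_seq / k + d)


def optimal_stages(c_p, reg_cost, t_seq, d):
    """Closed-form minimizer k = sqrt(Cp * Tseq / (L * d))."""
    return math.sqrt(c_p * t_seq / (reg_cost * d))


# Hypothetical values: Cp = 800 cost units, L = 5 per register,
# Tseq = 40 ns, d = 1 ns.
k_opt = optimal_stages(c_p=800, reg_cost=5, t_seq=40, d=1)
# Brute-force check over integer stage counts 1..200.
best_int = min(range(1, 201), key=lambda k: cost_per_rate(k, 800, 5, 40, 1))
print(k_opt, best_int)  # 80.0 80
```

In practice k must be an integer and is limited by how finely the logic can be partitioned, so the closed form is best read as a guide rather than a design rule.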
3.1 Pipeline Model (continued)
In practice, making the delays of pipeline stages
equal is a complicated and time-consuming process.
For maximum performance, it is essential that the stages be
close to balanced.
Balancing is done for commercial processors, although it is neither easy
nor cheap to do.
Another problem with pipelines is the overhead in
terms of handling exceptions or interrupts.
A deep pipeline increases the interrupt handling overhead.
Pipeline Types
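Pipeline Types (Handler's classification)
Instruction pipelines
Stages: FI (fetch instruction), DI (decode instruction), CA (calculate address), FO (fetch operands), EX (execute), ST (store result)
Arithmetic pipelines
Processor pipelines: a cascade of
processors, each executing a specific
module of the application program.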
Instruction pipeline
reservation table
Row: stages
Column: pipeline cycles
The cycle time of instruction pipelines is
often determined by the stages
requiring memory access.
Control Hazard
Conditional branch instructions
The target address of a branch is known only
after the condition has been evaluated.
Ways to resolve control hazards
The pipeline is frozen until the branch target is known.
The pipeline predicts that the branch will not be
taken.
The target instruction sequence is fetched into a buffer
while the nonbranch sequence is being fed into the pipeline.
Arithmetic pipelines
Floating point addition

Consider S = A + B, where A = (Ea, Ma), B = (Eb, Mb),
and S = (Es, Ms)
Addition steps (Figure 3.5)
Equalize the exponents
Add mantissas
Normalize Ms and adjust Es to reflect the normalization
Round Ms
Renormalize Ms and adjust Es
Modified floating point add pipeline (Figures 3.6 and 3.7)
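The five addition steps can be sketched on a toy number format. This is not the pipeline of Figures 3.5 to 3.7: it assumes positive operands only, an 8-bit normalized mantissa, three guard bits, and round-half-up, all chosen for illustration. The names `fp_add`, `to_float`, `P`, and `G` are hypothetical.

```python
P = 8   # mantissa bits (toy value)
G = 3   # extra guard bits kept during the addition


def fp_add(a, b):
    """Add two positive toy floats (E, M) whose value is M / 2**P * 2**E,
    with 2**(P-1) <= M < 2**P (normalized)."""
    (ea, ma), (eb, mb) = a, b
    # Step 1: equalize exponents by shifting the smaller operand right.
    if ea < eb:
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    ma_x = ma << G
    mb_x = (mb << G) >> (ea - eb)
    es = ea
    # Step 2: add mantissas.
    ms = ma_x + mb_x
    # Step 3: normalize the sum (at most one carry for positive operands).
    if ms >= 1 << (P + G):
        ms >>= 1
        es += 1
    # Step 4: round to P bits (round half up).
    ms = (ms + (1 << (G - 1))) >> G
    # Step 5: renormalize if rounding overflowed the mantissa.
    if ms == 1 << P:
        ms >>= 1
        es += 1
    return es, ms


def to_float(x):
    """Convert a toy (E, M) pair to a Python float for checking."""
    e, m = x
    return m / 2**P * 2**e


print(to_float(fp_add((3, 192), (1, 128))))   # 6.0 + 1.0 -> 7.0
print(fp_add((0, 255), (-8, 128)))            # rounding forces step 5 -> (1, 128)
```

Each step maps onto one pipeline stage, which is why a rounding overflow needs its own renormalization stage rather than a loop back to step 3.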
Arithmetic pipelines (cont.)
Floating point multiplication
Consider P = A x B, where A = (Ea, Ma), B = (Eb, Mb),
and P = (Ep, Mp)
Multiplication steps (Figure 3.8)
Add exponents
Multiply mantissas
Normalize Mp and adjust Ep
Round Mp
Renormalize Mp and adjust Ep
Modified floating point multiply pipeline (Figure 3.9)
Arithmetic pipelines (cont.)
Multifunction pipeline
Performs more than one operation
A control input is needed for proper
operation of the multifunction pipeline.
Figure 3.10: floating point adder/multiplier
Classification scheme by Ramamoorthy and Li
Functionality
unifunctional
multifunctional
Configuration
static
dynamic
Mode of operation:
scalar
vector
3.2 Pipeline Control and Performance
To provide the maximum possible throughput, the pipeline
must be kept full and flowing smoothly.
Two conditions affect the smooth flow of a pipeline:
the rate of input of data
data interlocks between the stages
Example 3.1: the pipeline completes one
operation per cycle (once it is full)
Example 3.2: non-linear pipeline
Structural hazard
Occurs due to the non-availability of
appropriate hardware
One obvious way of avoiding a structural
hazard is to insert additional hardware
into the pipeline.
Example 3.3
Figure 3.12 depicts the operation of the
pipeline.
In cycles 3, 4, 5, and 6, simultaneous accesses are
needed.
If we assume that the machine has separate data
and instruction caches, the problems in cycles 5 and 6
are solved.
One way to solve the problem in cycle 4 is to stall
the ADD instruction (Figure 3.13).
The stalling process results in a degradation of pipeline
performance.
Collision vectors
Initiation: launching of an operation into the
pipeline
Latency: the number of cycles that elapse
between two initiations
Latency sequence: the latencies between
successive initiations
Collision: occurs if a stage in the pipeline is
required to perform more than one task at
any time
Collision vectors (cont.)
Forbidden set: the set of all possible column
distances between two entries on some row
of the reservation table (RT).
The collision vector (CV) can be derived from
the forbidden set F and can be used to
control the initiation of operations in the
pipeline.
$CV = (v_{n-1}, v_{n-2}, \ldots, v_2, v_1)$
$v_i = 1$ if i is in the forbidden set, and 0 otherwise
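The definitions above translate directly into code. The following sketch derives the forbidden set and the collision vector from a reservation table given as rows of 0/1 marks. The table used here is a made-up example, not one of the figures, and the function names are hypothetical.

```python
def forbidden_set(reservation_table):
    """reservation_table: one row per stage, each row a list of 0/1 marks,
    one per pipeline cycle. Returns the set of forbidden latencies."""
    forbidden = set()
    for row in reservation_table:
        cols = [c for c, mark in enumerate(row) if mark]
        # Every column distance between two marks on the same row is forbidden.
        for i, c1 in enumerate(cols):
            for c2 in cols[i + 1:]:
                forbidden.add(c2 - c1)
    return forbidden


def collision_vector(reservation_table):
    """Return CV as a list (v_{n-1}, ..., v_1), MSB first, n = number of columns."""
    n = len(reservation_table[0])
    f = forbidden_set(reservation_table)
    return [1 if i in f else 0 for i in range(n - 1, 0, -1)]


# Hypothetical 3-stage, 6-cycle reservation table.
rt = [
    [1, 0, 0, 1, 0, 0],   # stage 1 used in cycles 1 and 4
    [0, 1, 1, 0, 0, 0],   # stage 2 used in cycles 2 and 3
    [0, 0, 0, 0, 1, 1],   # stage 3 used in cycles 5 and 6
]
print(sorted(forbidden_set(rt)), collision_vector(rt))  # [1, 3] [0, 0, 1, 0, 1]
```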
Examples
Example 3.4
(a) Overlapped RT
(b) Collision vector (CV)
Examples 3.5 and 3.6
Collision case and no-collision case
Control
How to control the initiation of operations in the pipeline using the CV:
Place the CV in a shift register.
If the LSB of the shift register is 1, do not initiate an
operation at that cycle; shift the register right once,
inserting 0 at the vacant MSB position.
If the LSB of the shift register is 0, initiate a new
operation at that cycle; shift the register right once,
inserting 0 at the vacant MSB position. To
reflect the superposition of the new
initiation on the earlier ones, perform a bit-by-bit
OR of the original CV with the contents of the shift register.
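The shift-register procedure above can be simulated in a few lines. This sketch assumes a first initiation at cycle 0 and a new operation always waiting, so an operation is started whenever the LSB allows it. The function name `greedy_initiations` is hypothetical.

```python
def greedy_initiations(cv_bits, cycles):
    """cv_bits: collision vector as a string, MSB first, e.g. "00101".
    Returns the cycle numbers at which operations are initiated."""
    cv = int(cv_bits, 2)
    reg = cv                 # shift register loaded with the CV
    started = [0]            # assumption: first operation enters at cycle 0
    for cycle in range(1, cycles):
        if reg & 1:          # LSB = 1: collision, do not initiate
            reg >>= 1        # shift right, 0 enters at the MSB
        else:                # LSB = 0: safe to initiate
            reg = (reg >> 1) | cv   # shift, then OR in the original CV
            started.append(cycle)
    return started


print(greedy_initiations("00101", 12))  # [0, 2, 4, 6, 8, 10]
print(greedy_initiations("00111", 13))  # [0, 4, 8, 12]
```

The second call uses the collision vector (00111): latencies 1, 2, and 3 are forbidden, so the controller settles into one initiation every 4 cycles.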
3.2.3 Performance
The CV of Figure 3.11: (00111)
Figure 3.15(a) shows the state transitions.
3.2.3 Performance
Average latency
simple cycle
greedy cycle
MAL (minimum average latency)
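The state transitions and the greedy cycle can be generated mechanically from a collision vector. The sketch below builds the state diagram by shifting and OR-ing as in the control procedure, then follows the smallest permitted latency from the initial state until a state repeats. Function names are hypothetical, and the second collision vector in the usage lines (1011010) is an assumed example that does not come from the figures.

```python
def state_diagram(cv_bits):
    """Map each state to {latency: next_state}. States are ints; bit i-1 set
    means latency i is forbidden. Any latency longer than the CV returns the
    pipeline to the initial state, so one such arc stands for all of them."""
    width = len(cv_bits)
    cv = int(cv_bits, 2)
    diagram, todo = {}, [cv]
    while todo:
        state = todo.pop()
        if state in diagram:
            continue
        arcs = {}
        for lat in range(1, width + 2):   # width + 1 stands for "longer than CV"
            if lat > width or not (state >> (lat - 1)) & 1:
                arcs[lat] = (state >> lat) | cv
                todo.append(arcs[lat])
        diagram[state] = arcs
    return diagram


def greedy_cycle(cv_bits):
    """Always take the smallest permitted latency; return the latencies on
    the cycle that this choice eventually repeats."""
    diagram = state_diagram(cv_bits)
    state, seen, path = int(cv_bits, 2), {}, []
    while state not in seen:
        seen[state] = len(path)
        lat = min(diagram[state])
        path.append(lat)
        state = diagram[state][lat]
    return path[seen[state]:]


cycle = greedy_cycle("00111")
print(cycle, sum(cycle) / len(cycle))     # [4] 4.0
cycle2 = greedy_cycle("1011010")
print(cycle2, sum(cycle2) / len(cycle2))  # [1, 8] 4.5
```

For (00111) the diagram has a single state and the greedy cycle has average latency 4. For the assumed vector 1011010, the greedy cycle averages 4.5, while the diagram also contains a self-loop with latency 3 on state 1011011, so the greedy cycle is not always the one that achieves the MAL.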
3.2.4 Multifunction Pipelines

Figure 3.17
Vxx, Vxy, Vyx, Vyy
3.3 Other Pipeline Problems
Data interlock: due to the sharing of
resources (data hazard)
data forwarding
internal forwarding
write-read forwarding
read-read forwarding
write-write forwarding
load/store architectures versus
memory/memory architectures
3.3 Other Pipeline Problems

(continued)
Conditional Branches
branch prediction
delayed branch
branch-prediction buffer
branch history
multiple instruction buffers
Interrupts
precise interrupt scheme
3.4 Dynamic Pipelines
Instruction deferral
scoreboard
Tomasulo's algorithm
Performance evaluation
maximizing the total number of initiations
per unit time
minimizing the total time required to handle
a specific sequence of initiation table types
3.5 Example systems

CDC Star-100
CDC 6600
MIPS R-4000
3.6 Summary
Three approaches have been tried to
improve the performance beyond the
ideal CPI case:
superpipeline
superscalar
VLIW (Very Long Instruction Word)
End of Chapter 3
