

Palladium Clocking in ICE/STB flow

Two levels of abstraction

• There are two levels at which we can discuss Palladium
operation when using ICE flow:
– FCLK level
– Step clock level
• Do not mix up these levels! When discussing the behavior of a
design in Palladium, always think at the FCLK level. It is
never necessary to consider the step clock level to understand
the behavior of a design in Palladium.

2 5/31/19 Cadence Confidential


Emulation using ICE flow is “cycle-based”
• A run consists of a sequence of “emulation cycles” or “FCLK cycles”.
FCLK frequency is usually 0.5 – 1.2 MHz.
• Each net in the design is updated once per FCLK cycle.
• Waveforms show one sample per FCLK cycle. No finer granularity of
time is visible to the user, except in 1X mode, where waveforms show
two samples per FCLK cycle.

Example: Clocks with frequency ratio 6:5

[Figure: waveforms of CK6X and CK5X over FCLK cycles 0 to 11, one sample per FCLK cycle]

• Inputs from target system are sampled once per FCLK cycle. Outputs
change once per FCLK cycle. (If multi-sampled IO is enabled, then
inputs can be sampled twice per FCLK and outputs can change twice
per FCLK.)
Build Flow

[Figure: build flow diagram. Recoverable labels: DUT+EB SRC (SystemVerilog, VHAN/VLAN, and C/C++); IXCOM; DUT+EB SW Model and DUT HW Model (Modified SRC); HDLICE, scripted by IXCOM; irun -c; Import; Snapshot; compilerOptions.qel; xeCompile (Precompile, Compile); xeDebug; PD-XP Database; irun -R]

What is given to back-end compiler

• Precompile takes the gate-level netlist, and creates a
new netlist given to the back-end compiler.
• Netlist given to back-end compiler consists of:
– Combinational Gates
– DLY cells
– Memory Port Primitives (MPR, MPW)
• This netlist cannot have combinational logic cycles.
– Precompile breaks any loops by adding DLY cells.
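
The loop-breaking step can be illustrated with a minimal Python sketch. This is not the actual precompile algorithm, which the slides do not describe; it only shows one common way to make a gate graph acyclic: a depth-first search marks every edge that closes a cycle as the place where a DLY cell would be inserted. The function name `break_loops` and the dictionary encoding of the netlist are hypothetical.

```python
def break_loops(fanout):
    """Return the set of (src, dst) edges on which a DLY cell would be inserted.

    fanout maps each gate name to the list of gates it drives.
    Removing the returned edges leaves an acyclic combinational graph.
    Illustrative only: not the real precompile algorithm.
    """
    nodes = set(fanout)
    for dsts in fanout.values():
        nodes.update(dsts)
    state = {n: "unvisited" for n in nodes}
    dly_edges = set()

    def visit(node):
        state[node] = "on_path"
        for nxt in fanout.get(node, []):
            if state[nxt] == "on_path":
                # This edge closes a combinational cycle: cut it with a DLY.
                dly_edges.add((node, nxt))
            elif state[nxt] == "unvisited":
                visit(nxt)
        state[node] = "done"

    for node in sorted(nodes):  # sorted for a deterministic result
        if state[node] == "unvisited":
            visit(node)
    return dly_edges


# Hypothetical example: two cross-coupled gates n1 and n2, driven by inv.
latch = {"inv": ["n1"], "n1": ["n2"], "n2": ["n1"]}
cuts = break_loops(latch)
print(cuts)
```

Which edge of a loop gets the DLY cell depends on the traversal order in this sketch; a real compiler would presumably choose based on other criteria.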



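Behavior of Primitives
• Combinational gate: example AND with inputs A, B and output Z
– Let A_i, B_i, Z_i be the values of A, B, Z at cycle i (for any i)
– The behavior is: Z_i = A_i && B_i
[Figure: AND gate waveforms over FCLK cycles 0 to 3]
• Delay cell: DLY with input D and output Q
– The behavior is: Q_i = D_{i-1}
[Figure: waveforms of D and Q over FCLK cycles 0 to 3]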

Behavior of Design Flipflop
• Flipflop: Q_FDP0(CK,D,Q)
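– The desired behavior is: Q_i = (CK_i && ~CK_{i-1}) ? D_{i-1} : Q_{i-1}

[Figure: flip-flop with data input D, output Q, and clock input C driven from CK through gates; waveforms of D, CK, and Q over FCLK cycles 0 to 5]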



D Flip-Flop Implementation

Behavior: Q_i = (CK_i && ~CK_{i-1}) ? D_{i-1} : Q_{i-1}

[Figure: flip-flop implemented with three DLY cells and combinational logic; inputs D and CK, output Q]
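
As a minimal sketch of the idea on this slide, the Python model below builds the flip-flop from three delay values (for CK, D, and Q) plus a multiplexer, and advances it one FCLK cycle at a time. The function name `dff_from_dly` and the assumption that all DLY cells hold 0 before cycle 0 are illustrative choices, not taken from the slides.

```python
def dff_from_dly(ck_wave, d_wave, q_init=0):
    """Simulate Q_i = (CK_i && ~CK_{i-1}) ? D_{i-1} : Q_{i-1}, one value per FCLK."""
    # Outputs of the three DLY cells: previous-cycle CK, D, and Q.
    # Assumption: they all read 0 (or q_init) before the first cycle.
    ck_dly, d_dly, q_dly = 0, 0, q_init
    q_wave = []
    for ck, d in zip(ck_wave, d_wave):
        rising = ck and not ck_dly         # CK_i && ~CK_{i-1}
        q = d_dly if rising else q_dly     # take old D on a rising edge, else hold
        q_wave.append(q)
        ck_dly, d_dly, q_dly = ck, d, q    # DLY cells capture this cycle's values
    return q_wave


ck = [0, 1, 0, 1, 0, 1]
d = [1, 0, 0, 1, 1, 0]
q = dff_from_dly(ck, d)
print(q)  # [0, 1, 1, 0, 0, 1]
```

At cycle 1, D changes in the same cycle as the rising edge of CK, and Q still takes the previous value of D, as the equation requires.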



No Timing Problems in ICE flow
• No Hold-time Violations
– Each flip-flop obeys a mathematical equation
Q_i = (CK_i && ~CK_{i-1}) ? D_{i-1} : Q_{i-1}
that gives its output in terms of the values of its inputs
in the current and previous FCLK cycles and its state
in the previous cycle. The behavior does not depend on the
number of logic levels generating CK.

[Figure: flip-flop with data input D, output Q, and clock input C driven from CK through gates; waveforms of D, CK, and Q over FCLK cycles 0 to 5; annotation: flipflop clocks in “old” data]
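
The following Python sketch illustrates the claim with a hypothetical two-stage shift register in which the second flip-flop's clock passes through extra buffer levels. Because combinational logic settles within the same FCLK cycle, the buffered clock equals CK in every cycle, and the second stage always captures the old output of the first stage. All names here (`ff`, `delayed_clock`, `run_shift_register`) are illustrative, and the initial state of 0 is an assumption.

```python
def ff(ck, ck_prev, d_prev, q_prev):
    """One flip-flop update: Q_i = (CK_i && ~CK_{i-1}) ? D_{i-1} : Q_{i-1}."""
    return d_prev if (ck and not ck_prev) else q_prev


def delayed_clock(ck, levels):
    """Pass CK through `levels` buffers; buffers add no FCLK-cycle delay."""
    for _ in range(levels):
        ck = ck & 1
    return ck


def run_shift_register(ck_wave, din_wave, levels):
    q1 = q2 = 0                      # assumed initial flip-flop state
    ck_prev = gck_prev = din_prev = 0
    q2_wave = []
    for ck, din in zip(ck_wave, din_wave):
        gck = delayed_clock(ck, levels)
        new_q1 = ff(ck, ck_prev, din_prev, q1)
        # q1 still holds last cycle's value here, so stage 2 sees "old" data.
        new_q2 = ff(gck, gck_prev, q1, q2)
        q2_wave.append(new_q2)
        q1, q2 = new_q1, new_q2
        ck_prev, gck_prev, din_prev = ck, gck, din
    return q2_wave


ck = [0, 1, 0, 1, 0, 1, 0, 1]
din = [1, 1, 0, 0, 1, 1, 0, 0]
no_skew = run_shift_register(ck, din, levels=0)
deep_tree = run_shift_register(ck, din, levels=5)
print(no_skew == deep_tree)  # True
```

The result is the same for any number of buffer levels, which is the sense in which hold-time violations cannot occur at the FCLK level.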
No Timing Problems in ICE flow (2)
• No “glitches”
– Combinational logic “glitches” do not occur
– In a given FCLK cycle, a combinational gate’s output is
computed only after its inputs are computed

[Figure: gate combining CK and ENA into gated clock GCK, with waveforms of CK, ENA, and GCK over FCLK cycles 0 to 3; annotation: no “glitch” here]



IXCOM flow
• FCLK is free-running
• Clocks generated within the DUT are “controlled” by the
testbench.
– Example: in Lab3, slowclk was generated in testbench and 10x faster
clock clk was generated in DUT. With tbrun, clk runs for 5 cycles (10
FCLKs, or 5 FCLKs when using +1xua) and then stops.
• Waveforms are based on simulation time (not on FCLK)
• If multiple primary inputs change at the same simulation
time, all change at the same FCLK.
– Primary inputs are delayed by one FCLK compared to “deposit” or “force”.

[Figure: simulator versus emulator waveforms of CK, ENA, and GCK over simulation time 0 to 30 ns; annotation: no glitch]

IXCOM flow (2)

[Figure: simulator versus emulator waveforms over simulation time 0 to 50 ns for a flip-flop whose clock input C is driven from CK through gates; annotation: flipflop clocks in “old” data]

1X Mode

• Palladium supports a feature called “1X mode”. In 1X
mode, each design net is effectively evaluated twice per
FCLK cycle. Waveforms show two samples per FCLK
cycle. External inputs are sampled twice per FCLK cycle,
and outputs change twice per FCLK cycle.
• To the user, 1X mode looks the same as 2X mode. It’s just that
two design-net evaluations occur in each hardware
emulation cycle.



1X Mode: How it works
• Palladium hardware evaluates each net only once per FCLK.
But in 1X mode, some design signals need to change twice per
FCLK!
• Therefore, in 1X mode, for any signal that can change at both
rising and falling edges of the fastest design clock, precompile
creates two nets (called net and shadow net) representing the
signal’s value in the high and low phases of the fastest clock.

[Figure: example with signals Q1, Q2, CK, and Z; Q1 and Q2 each have only a net, while Z has both a net and a shadow (sh)]

1X Mode: How it works (2)

• Runtime software understands shadow nets and
displays waveforms for all nets correctly as if
running in 2X mode. End-user is not aware of
shadows.
• 1X mode has a capacity cost, proportional to the
number of shadow nets.
• Fortunately, for most designs, most nets do not
need shadows.
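
A rough Python sketch of how a net and its shadow could be merged into a waveform with two samples per FCLK cycle. The slides do not say which of the two nets carries the high-phase value or in which order the phases are displayed; this sketch assumes the net comes first and the shadow second. The function name `expand_waveform` is hypothetical.

```python
def expand_waveform(net, shadow=None):
    """Turn one-value-per-FCLK data into two samples per FCLK cycle.

    net    : list with one value per FCLK cycle
    shadow : optional list with the value in the other clock phase;
             None means the signal has no shadow net.
    Assumption: the net sample is shown first, then the shadow sample.
    """
    samples = []
    for i, value in enumerate(net):
        second = value if shadow is None else shadow[i]
        samples.extend([value, second])
    return samples


# Signal with a shadow: it can differ between the two phases of a cycle.
z = expand_waveform([1, 0], shadow=[0, 1])
# Signal without a shadow: the same value is shown for both phases.
q1 = expand_waveform([1, 0])
print(z, q1)  # [1, 0, 0, 1] [1, 1, 0, 0]
```

Only signals that need a shadow store a second value per cycle, which matches the statement that the capacity cost grows with the number of shadow nets.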



Palladium Emulation at Step Clock Level
• The Emulation Clock Cycle (FCLK cycle) is divided into
a sequence of “steps”.
– These steps are almost entirely invisible to the user
• Each step takes 2 ns (i.e. Step clock is 500 MHz)
• Steps are used for both:
– Logic Evaluation
– Communication

[Figure: one FCLK cycle divided into steps; typically 250-1000 steps, depending on design]



Palladium Emulation (2)
• Within a given FCLK cycle, the following happens (slightly
oversimplified):
• At an early step (e.g. 0), each DLY cell output gets the value the delay
cell input had in the previous FCLK cycle
• Then the outputs of combinational gates are computed according to
a “schedule”. Output of a given gate is computed at same or later
step than all its inputs
[Figure: example circuit with nets A, Z, D, Q and a DLY cell, annotated with scheduled step numbers 0, 5, 7, and 8]

• Inputs from target, and outputs to target, are also scheduled. Inputs
are sampled at the scheduled step; outputs change at the scheduled
step. Bi-di enable, output and input are separately scheduled
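
The per-cycle procedure above can be sketched in Python as follows. The scheduling rule used here (each gate one step after its latest input) is an assumption for illustration; the real compiler also schedules communication and target I/O, which this sketch ignores. The names `make_schedule` and `run_cycle` and the example netlist are hypothetical.

```python
def make_schedule(gates, dly_cells, primary_inputs):
    """Assign each net a step so that every gate follows all of its inputs."""
    step = {q: 0 for q in dly_cells}              # DLY outputs are ready at step 0
    step.update({n: 0 for n in primary_inputs})   # assumption: inputs ready at step 0
    pending = dict(gates)
    while pending:
        ready = [z for z, (_, ins) in pending.items() if all(i in step for i in ins)]
        if not ready:
            raise ValueError("combinational loop: a DLY cell is missing")
        for z in ready:
            step[z] = 1 + max(step[i] for i in pending[z][1])
            del pending[z]
    return step


def run_cycle(gates, dly_cells, schedule, inputs, dly_state):
    """Evaluate one FCLK cycle; return (net values, next DLY state)."""
    values = dict(inputs)
    values.update(dly_state)   # step 0: each Q gets the previous cycle's D
    for z in sorted(gates, key=schedule.get):
        fn, ins = gates[z]
        values[z] = fn(*(values[i] for i in ins))
    next_state = {q: values[d] for q, d in dly_cells.items()}
    return values, next_state


# Hypothetical netlist: Z = A AND Q, D = NOT Z, and a DLY cell from D to Q.
gates = {
    "Z": (lambda a, b: a & b, ("A", "Q")),
    "D": (lambda a: 1 - a, ("Z",)),
}
dly_cells = {"Q": "D"}   # maps each DLY output net to its input net

schedule = make_schedule(gates, dly_cells, ["A"])
state = {"Q": 0}         # assumed initial DLY contents
z_trace = []
for a in [1, 1, 1, 0]:
    values, state = run_cycle(gates, dly_cells, schedule, {"A": a}, state)
    z_trace.append(values["Z"])
print(schedule, z_trace)
```

Because each gate is evaluated exactly once per cycle, after all of its inputs, intermediate values never propagate, which is why no glitches are observed at the FCLK level.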



Palladium Emulation (3)
• Palladium XP processor architecture
– 12x256 processors per domain
– One processor is shown below (over-simplified)

[Figure: one processor with an instruction memory and a result memory, entries 0 to 511; example entries: 1 holds DLY (result Q), 7 holds AND (result Z). Alongside, the example circuit with nets A, Z, D, Q and a DLY cell, annotated with step numbers 1, 5, 7, and 8]

• At each step, one instruction is executed.
• The instruction can represent a DLY cell or a 4-input gate.
• Each instruction takes previously computed results (from the same
or a different processor) as its inputs, and stores its output in the
result memory.
• An FCLK cycle consists of one or more passes through the set of
instructions (0-511).



Upper Bound on Emulation Speed

• From this description, you can see that for the compiler to
achieve a step count of 256, no processor can implement
more than 256 instructions.
• This means the utilization of primitives must be no more
than 50%.
• If the emulator is 100% utilized, the upper bound on FCLK
speed is 500 MHz / 512, which is about 1 MHz.
• To achieve 2 MHz, the emulator must be less than half
utilized.
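
The arithmetic behind these bounds can be checked with a short Python sketch. It uses only figures from the slides (2 ns per step, 512 instruction slots per processor); the function names are hypothetical, and the model ignores anything other than step count.

```python
STEP_CLOCK_HZ = 500e6      # one step every 2 ns
INSTRUCTION_SLOTS = 512    # instructions 0-511 per processor


def fclk_hz(steps_per_cycle):
    """FCLK frequency when one FCLK cycle takes the given number of steps."""
    return STEP_CLOCK_HZ / steps_per_cycle


def max_utilization(steps_per_cycle):
    """Largest fraction of instruction slots usable at this step count."""
    return min(1.0, steps_per_cycle / INSTRUCTION_SLOTS)


for steps in (1000, 512, 256, 250):
    print(steps, fclk_hz(steps) / 1e6, max_utilization(steps))
```

With 512 steps the FCLK is 0.9765625 MHz at up to 100% utilization; with 256 steps it is 1.953125 MHz at no more than 50%; reaching a full 2 MHz needs 250 steps, which is why utilization must be below half.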

