
Metastability

Cairo University Faculty of Engineering


Department of Electronics and Electrical Communications
Dr. Karim Ossama Abbas
2012

Contents
Definition of metastability
Entering metastability
Control signal synchronizers
Reliability of synchronization
Mesochronous and multisynchronous systems
Main reference: Metastability and Synchronizers: A Tutorial, Ran Ginosar, IEEE CASS and CS

Metastable point

A metastable point is, strictly speaking, a stable (equilibrium) point
However, it is different from the truly stable states 1 and 2
The metastable point is extremely sensitive
It will inevitably settle on either stable point 1 or 2 if left alone
Noise alone is enough to do this

Metastability in logic
As far as inverters are concerned:
Metastability is being in the high gain region, particularly close to the logic threshold
The stable states are the low gain regions: the logic 0 and logic 1 regions, below VIL and above VIH

How do we enter metastability


We enter metastability if data and clock are switched simultaneously or too close together
This is equivalent to a violation of setup or hold time
This is almost inevitable if we cross clock domains without additional measures

Metastability inside a flip flop

CLK is 0 and D changes from 0 to 1


Node A is initially at 0
Node A is supposed to charge to 1
We have to give enough time for this to happen (setup time)

Metastability inside a flip flop

CLK changes too soon from 0 to 1


If this happens while A is midway through charging
Then B is also midway through discharging
A and B drive the inverters into a metastable state

Metastability inside a flip flop

The two inverters WILL exit this metastable state through their positive feedback
However, it will take them too much time to exit

Metastability inside a flip flop

Metastability: when it takes longer than Tcq for data to appear at the output of a latch
Note that many people mistake metastability for wrong or invalid data appearing on the output

Waveforms of a metastable latch

The input waveforms are moved closer to the clock by steps of 100ps
The closer the waveform, the longer it takes the output to settle
All must eventually settle

Waveforms of a metastable latch

The input waveforms are moved closer to the clock by steps of 1ps
Again, the closer we are, the longer it takes to settle
Which direction we settle depends on noise and how close to the inverter threshold we are

Practical situations with metastability


Metastability may seem like an easy situation to avoid
However it is very common
Most commonly seen while crossing clock domains
For example, consider an FFT circuit passing data to a baseband demodulator
Both use different and unrelated clocks
Because the frequencies are independent, there will be periodic instances where data and clock edges are too close together
Setup and hold times are periodically violated for the receiver

Probability of metastability
First we need to know how often the latch enters metastability
Assume that for metastability to occur a change in signal must happen in a certain window Tw around the edge (e.g. setup + hold times)
Now let's assume the change is uniformly distributed within the clock period Tc (assuming e.g. two independent clock domains)
Thus the probability of entering metastability is Tw/Tc = Tw Fc

Rate of metastability
D does not change every cycle
Assume D changes at a rate Fd < Fc
Then the rate of metastability is Fd Tw Fc
Consider the following example:
Fc=1GHz
Fd=100MHz (D changes on every tenth cycle)
Tw=20ps (typical value)

Rate=2,000,000 times/sec = TWICE per MICROSECOND
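
As a quick check of the arithmetic above, here is a minimal Python sketch. The function and argument names are illustrative and not from the slides; the numbers are the ones in the example.

```python
def metastability_rate(f_clock_hz, f_data_hz, t_window_s):
    """Rate (events per second) of entering metastability: Fd * Tw * Fc."""
    return f_data_hz * t_window_s * f_clock_hz

# Values from the example: Fc = 1 GHz, Fd = 100 MHz, Tw = 20 ps
rate = metastability_rate(f_clock_hz=1e9, f_data_hz=100e6, t_window_s=20e-12)
print(f"{rate:.3g} events per second")
print(f"{rate / 1e6:.3g} events per microsecond")
```

Doubling either clock frequency or the window Tw doubles the rate, since the expression is a plain product.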

What does the metastability rate mean?
Metastability is a failure mode
We get logic values that we don't expect
We may get the logic values we expect but after a longer than expected delay
Thus the following registers store the wrong values
This error propagates and the circuit completely fails
If this happens twice per microsecond, then the circuit is
completely useless

Circuit model to exit metastability


In metastability both inverters in the master latch are in the high gain region
The inverters are modelled as amplifiers
Their gain is A, the output impedance is R (small signal)
C is the parasitic capacitance from gates and drains

Behaviour to exit metastability


A small difference exists between Va and Vb (due to initialization or noise)
This difference is quickly amplified by the positive feedback
The master latch quickly resolves to a stable state
However, this is not as quick as normal operation, where the input is DRIVEN and DRIVING the inverter pair

Mathematical behaviour of inverter pair
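
A first-order sketch of this behaviour, assuming each inverter is an amplifier of gain -A with output resistance R driving a capacitance C, and with Va and Vb measured relative to the metastable point. This is the usual small-signal treatment and is an assumption here, not necessarily the exact form on the original slide:

```latex
\begin{aligned}
RC\,\frac{dV_a}{dt} &= -A\,V_b - V_a, \qquad
RC\,\frac{dV_b}{dt} = -A\,V_a - V_b \\
V \equiv V_a - V_b \;&\Rightarrow\; RC\,\frac{dV}{dt} = (A-1)\,V \\
V(t) &= V_0\,e^{t/\tau}, \qquad
\tau = \frac{RC}{A-1} \approx \frac{RC}{A} = \frac{C}{g_m}
\quad\text{with } g_m = \frac{A}{R}
\end{aligned}
```

The difference V grows exponentially from its initial value V0, with a time constant set by C, A, and R.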

Notes on factors affecting metastability exit
C, A, and R affect the time to exit metastability by affecting the time constant
C is the parasitic capacitance and, to first order, scales with technology
Three-dimensional effects and wiring capacitances are affecting this trend
A and R are related through the analog transconductance: gm = A/R
The time constant was generally expected to drop with technology, but it has not scaled well lately

Graph of exiting metastability


For a long time after the edge the voltage remains constant!
Where is the exponential growth?
The linear scale is fooling us
The visible part of the growth is not the exponential part

Log scale graph of exiting metastability

Exponential growth starts right at the edge
We exit the exponential regime once the inverters enter their low gain regions (passing Vih or Vil)
Thus we trace a time between the initial voltage difference Vo and entering the stable regions at voltage V1

Time to exit metastability

The time to exit metastability depends logarithmically on the entering voltage Vo
V1 is a constant
The larger the Vo, the shorter the time
If Vo = V1 then time = 0 (there is no metastability)
If Vo = 0 then time = infinity (impossible due to noise)
Thus metastability is NOT being exactly in the middle and exiting randomly, it is entering the high gain region and exiting in a known but long time
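
The logarithmic dependence can be sketched as follows, assuming the voltage difference grows as Vo e^(t/tau) until it reaches V1, so that the exit time is tau ln(V1/Vo). The value of V1 and the list of Vo values are hypothetical, chosen only for illustration; tau = 10 ps is taken as an assumed typical time constant.

```python
import math

def time_to_exit(v0, v1, tau):
    """Time for the difference to grow from v0 to v1: tau * ln(v1 / v0)."""
    if v0 <= 0:
        return math.inf  # exactly balanced: never exits (noise prevents this in practice)
    return tau * math.log(v1 / v0)

TAU = 10e-12  # assumed time constant, 10 ps
V1 = 0.5      # hypothetical difference (volts) at which the inverters leave the high gain region

for v0 in (0.5, 1e-3, 1e-6, 1e-9, 0.0):
    t_ps = time_to_exit(v0, V1, TAU) * 1e12
    print(f"Vo = {v0:g} V -> exit time = {t_ps:.1f} ps")
```

Each factor of 1000 reduction in Vo adds the same fixed amount of time, which is what a logarithmic dependence means.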

Time to exit is a log function of starting voltage difference

Probability of remaining in metastability
In general we have no idea how large Vo is
It is affected by the exact conditions when the edge occurred
Also depends on noise and coupled signals!

We do know though that the voltage difference grows exponentially
So we can obtain a value for the PROBABILITY that the latch or flip flop is still metastable at a time S
If a flip-flop enters metastability, the probability it is still metastable at a time t > 0 later is:
$e^{-t/\tau}$
where $\tau$ is the time constant of the inverter pair

Synchronization
We can NEVER guarantee that asynchronous signals will not cause metastability
In fact we can GUARANTEE they will cause metastability at a certain rate
The best we can do is design a scheme that makes the
probability of metastability extremely low

Condition for failure


Now assume we are sampling a metastable circuit at a time S after it enters metastability
i.e. at a time S after the clock edge that caused metastability
We define a failure as the signal remaining metastable at that later time we are sampling at
This is a two component event:
We enter metastability
At a time S later we are still in metastability

Probability and rate of entering metastability
Remember the probability of metastability is Tw Fc (Tw is the critical time around the clock edge and Fc is the clock frequency)
And the rate of metastability is Fd Tw Fc (Fd is the rate of data change)

MTBF
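The inverse of rate is time
The rate of failure is the rate of entering metastability times the probability of still being metastable at S: $F_d T_w F_c \, e^{-S/\tau}$ ($\tau$ is the time constant of the inverter pair)
The inverse of the rate of failure is the Mean Time Between Failures (MTBF): $\text{MTBF} = e^{S/\tau} / (T_w F_c F_d)$
Our aim is thus to increase MTBF as much as we can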

Solution to metastability?
Let's assume we give the circuit one full cycle to resolve from metastability before sampling
This means S = one complete cycle time
Using typical values for 28nm technology (33nm modified)
Time constant = 10ps, Tw = 20ps, Fc = 1GHz, Fd = 100MHz, and S = one complete cycle

MTBF = 4x10^29 years
If the circuit fails due to metastability, the next failure will probably happen after the end of the world!
That's safe enough
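
The figure above can be reproduced with a short Python sketch, assuming the failure rate is the rate of entering metastability (Fd Tw Fc) multiplied by the probability e^(-S/tau) of still being metastable at time S, so that MTBF = e^(S/tau) / (Tw Fc Fd). Function and variable names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def mtbf_seconds(s, tau, t_window, f_clock, f_data):
    """MTBF = e^(S/tau) / (Tw * Fc * Fd), in seconds."""
    return math.exp(s / tau) / (t_window * f_clock * f_data)

f_clock = 1e9  # 1 GHz, so one full cycle is 1 ns
mtbf = mtbf_seconds(s=1 / f_clock, tau=10e-12, t_window=20e-12,
                    f_clock=f_clock, f_data=100e6)
mtbf_years = mtbf / SECONDS_PER_YEAR
print(f"MTBF = {mtbf_years:.1e} years")
```

The exponential term dominates: S/tau = 100 here, so the result is extremely sensitive to both the resolution time S and the time constant.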

Observation of metastability at output

If node A is metastable, what do we see at output?
Q is three inverters later
We will not see an undefined value, we will see either 1 or 0

Observation of metastability at output

However, the 1 or 0 may switch before settling to the correct value
It will take longer than Tcq for the correct value to appear on
Q
This is the true definition of metastability at output port

Two flip flop synchronization circuit

There is a finite and significant chance that Q1 goes metastable
However, FF2 samples this output S later (one cycle later)
The probability that Q2 will go metastable is very low
We must subtract the following from S:
Wire delay, setup time of FF2
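
To see how much these deductions matter, here is a hedged sketch that recomputes the MTBF with S = Tc - wire delay - setup time, assuming MTBF = e^(S/tau) / (Tw Fc Fd). The 50 ps wire delay and 50 ps setup time are hypothetical values chosen only for illustration; the other numbers are the typical values used in these slides.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def synchronizer_mtbf_years(t_clock, t_wire, t_setup, tau, t_window, f_data):
    """MTBF in years when only S = Tc - wire delay - setup time is left to resolve."""
    s = t_clock - t_wire - t_setup
    f_clock = 1.0 / t_clock
    mtbf_s = math.exp(s / tau) / (t_window * f_clock * f_data)
    return mtbf_s / SECONDS_PER_YEAR

# Tc = 1 ns, tau = 10 ps, Tw = 20 ps, Fd = 100 MHz
ideal = synchronizer_mtbf_years(1e-9, 0.0, 0.0, 10e-12, 20e-12, 100e6)
# Hypothetical deductions: 50 ps wire delay + 50 ps setup time
derated = synchronizer_mtbf_years(1e-9, 50e-12, 50e-12, 10e-12, 20e-12, 100e6)

print(f"full cycle available : {ideal:.1e} years")
print(f"100 ps lost to delays: {derated:.1e} years")
print(f"MTBF reduced by a factor of about {ideal / derated:.0f}")
```

Losing 100 ps of resolution time with tau = 10 ps costs a factor of e^10 in MTBF, so the deductions should not be ignored even when the result is still comfortably large.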

Scenarios in the 2 FF synchronization circuit

Scenario a:
FF1 catches the correct value of D1 in cycle 1
FF2 samples this value in cycle 2
The value appears at the output of FF2 in cycle 2
Q2 is correct in cycle 3

Scenarios in the 2 FF synchronization circuit

Scenario b:
FF1 completely misses the 1 on D1
Q1 remains at 0
FF1 samples correct D1 in cycle 2
Q2 is correct in cycle 3

Scenarios in the 2 FF synchronization circuit

Scenario c:
FF1 goes metastable
Q1 resolves to the correct value but very slowly
However, there is negligible probability it will fail to resolve before the next edge (S)
FF2 samples the correct value in cycle 2
Q2 is correct in cycle 3 (except once every MTBF)

Scenarios in the 2 FF synchronization circuit

Scenario d:
FF1 goes metastable
Q1 resolves to incorrect value (logic low)
Difference between cases b and d? (missing the value or resolving metastability wrong)
However, on cycle 2 the correct logic value is registered
FF2 samples the correct value from Q1 a cycle later
Q2 is correct in cycle 3

Scenarios in the 2 FF synchronization circuit

Scenario e:
FF1 goes metastable and Q1 shoots high
Q1 glitches and resolves metastability at 0
FF1 samples the correct value comfortably in cycle 2
Q2 is correct in cycle 3

Scenarios in the 2 FF synchronization circuit

Scenario f:
FF1 goes metastable and its output shoots to high
Q1 resolves to correct value (maintaining high)
FF2 samples correct value in cycle 2
Q2 is correct in cycle 3

Scenarios
Data can be:
Missed
Caught
Metastably sampled
Resolves to correct value
Resolves to wrong value
Shoots to either value then resolves to either value

The only common factor between all scenarios is that one cycle later the second flip flop CERTAINLY (or at least within an MTBF) samples correctly
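
The scenarios can be explored with a small cycle-level Python model. This is a sketch under a strong simplifying assumption: when FF1 goes metastable it always resolves to 0 or 1 (at random) before the next edge, i.e. the once-per-MTBF failure is ignored. The function name and the trace format are illustrative.

```python
import random

def two_ff_synchronizer(d_samples, metastable_cycles=(), seed=0):
    """Cycle-level model of two flip flops in series.

    d_samples[i] is the value of D1 held after clock edge i.
    On edges listed in metastable_cycles, FF1 captures a random 0 or 1,
    modelling a metastable capture that resolves before the next edge.
    Returns a list of (cycle, d, q1, q2) tuples.
    """
    rng = random.Random(seed)
    q1 = q2 = 0
    trace = []
    for cycle, d in enumerate(d_samples):
        new_q2 = q1  # FF2 samples the value Q1 held before this edge
        new_q1 = rng.choice((0, 1)) if cycle in metastable_cycles else d
        q1, q2 = new_q1, new_q2
        trace.append((cycle, d, q1, q2))
    return trace

# D1 rises right at edge 1 and is then held high; FF1 is metastable on that edge.
for seed in range(4):
    trace = two_ff_synchronizer([0, 1, 1, 1, 1], metastable_cycles={1}, seed=seed)
    print("Q1:", [q1 for _, _, q1, _ in trace], "Q2:", [q2 for _, _, _, q2 in trace])
```

Whichever way FF1 resolves on edge 1, Q1 is correct after edge 2 and Q2 is correct after edge 3, provided D1 is held long enough. Only the cycle in between differs from run to run.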

Turning the synchronization circuit into a synchronizer
If data is to be sampled correctly, it must be maintained on the input for up to two cycles (sometimes one!)
How do we know how long to maintain?

Turning the synchronization circuit into a synchronizer
The sender should put its data on a bus and raise the request control signal
The receiver sees the request and reads the data, raising acknowledge. The sender can now change data and re-raise req

Turning the synchronization circuit into a synchronizer
But req is raised at the transmitter clock, ack is raised at the receiver clock! They could become metastable
We are synchronizing the two controls through 2 FF synchronizers
Worst case is a control is read a cycle late
No problem!
Conservative!

Turning the synchronization circuit into a synchronizer
Note data is NEVER passed through synchronizers
FF synchronizers are ONLY used with control signals
Data is on busses
Synchronizers have a one cycle uncertainty on when they manage to synchronize

Turning the synchronization circuit into a synchronizer
If a bus were passed through synchronizers, it is almost certain that some bits of the bus would synchronize in the first cycle and others in the second cycle
One bit wrong on the whole bus is a failure of the whole data bus

Overhead of synchronization
It takes 2 cycles (always consider worst case) to synchronize req
2 cycles to synchronize ack
1 cycle at each side (at least) to read data
Now if another transfer requires a similar change in controls then req and ack must be lowered to get ready for the next cycle
Transmitter reads ack high (2 cycles) and lowers req (1 cycle)
Receiver sees req going low (2 cycles) and lowers ack (1 cycle)
Overhead is 2+2+1+1+2+2+1+1 = 12 cycles to transmit one word!!
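
The tally above can be written out as a short Python sketch. The step labels paraphrase the list on this slide, in the same order as the terms of the sum; the variable names are illustrative.

```python
# (step, worst-case cycles), in the order of the terms in the sum above
HANDSHAKE_STEPS = [
    ("synchronize req", 2),
    ("synchronize ack", 2),
    ("read data, one side", 1),
    ("read data, other side", 1),
    ("transmitter reads ack high", 2),
    ("receiver sees req going low", 2),
    ("transmitter lowers req", 1),
    ("receiver lowers ack", 1),
]

total_cycles = sum(cycles for _, cycles in HANDSHAKE_STEPS)
sync_cycles = sum(cycles for _, cycles in HANDSHAKE_STEPS if cycles == 2)

print(f"total overhead: {total_cycles} cycles per word")
print(f"of which synchronizer latency: {sync_cycles} cycles")
```

Two thirds of the overhead is synchronizer latency, which is why a scheme that avoids a handshake per word pays off for bulk transfers.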

Asynchronous FIFO for high payload transfer
In case we have a lot of data words to transfer in bulk, the best approach is the simplest
We will use a two port, two clock asynchronous FIFO
FIFO = First In First Out
It is simply a shift register
However, it has indicators for EMPTY and FULL
It is two port because one port writes data, the other reads
If the two ports are controlled by different clocks, it is
asynchronous

Two port FIFO

Simply a two port RAM


The write port has a write pointer (last written address)
The read port has a read pointer (last read address)
Reads and writes can only be sequential

Empty and Full, what do they mean?


Empty indicates there is no more data to read
Empty is significant to the receiver
Empty tells the receiver to stop reading until Empty goes low
Full indicates we have run out of space to write
Full is significant to the transmitter
Full tells the transmitter to stop writing until Full goes low

Empty and Full, how to calculate?


Empty happens when the read and write pointers are equal
But full also occurs when the read and write pointers are equal!!

INSERT FIGURE OF QUEUE BEING READ AND WRITTEN

Empty and Full, how to calculate?


However, Empty occurs when we have JUST read and the read and write pointers are equal
Full occurs when we have JUST written and the read and write pointers are equal
So if read is incremented and we get equal pointers: EMPTY
If we write and we get equal pointers: FULL

INSERT FIGURE OF QUEUE BEING READ AND WRITTEN
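
The rule above can be sketched in Python as a single-clock model of the pointer bookkeeping only; there is no clock-domain crossing or synchronizer here. The class name is illustrative, and as an assumption the pointers hold the NEXT address to read or write rather than the last one.

```python
class PointerFifo:
    """Circular buffer whose flags follow the 'just read / just written' rule."""

    def __init__(self, depth):
        self.mem = [None] * depth
        self.depth = depth
        self.rd = 0          # next address to read (assumption, see lead-in)
        self.wr = 0          # next address to write
        self.empty = True
        self.full = False

    def write(self, word):
        if self.full:
            raise OverflowError("FIFO is full")
        self.mem[self.wr] = word
        self.wr = (self.wr + 1) % self.depth
        self.empty = False
        # Pointers equal right after a WRITE means the buffer wrapped: FULL
        self.full = self.wr == self.rd

    def read(self):
        if self.empty:
            raise IndexError("FIFO is empty")
        word = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.full = False
        # Pointers equal right after a READ means nothing is left: EMPTY
        self.empty = self.rd == self.wr
        return word


fifo = PointerFifo(depth=4)
for word in "abcd":
    fifo.write(word)
print("after 4 writes: full =", fifo.full, "empty =", fifo.empty)
print("read:", fifo.read(), "-> full =", fifo.full)
```

The same pointer equality produces opposite flags depending on which operation caused it, which resolves the ambiguity between Empty and Full.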

Who increments the pointers?


Read is incremented by the receiver whenever it reads a new word
Write is incremented by the transmitter whenever it writes a new word
Full and Empty = NOT(Read-Write), i.e. each flag is raised when the difference between the pointers is zero
Thus calculating both Full and Empty requires reading pointers from two clock domains!

Synchronizing pointers

The two pointers must be passed through synchronization FFs before going to the other side for comparison


Read pointer is generated by the receiver
Write pointer is generated by the transmitter

Going EMPTY

The receiver is reading, but the transmitter is slower or has stopped writing
The receiver increments read pointer
It reads write pointer through synch
It raises empty flag and stops reading

Going EMPTY Synch one cycle late

What happens if synch is one cycle late?


Write pointer may have been incremented but we missed the increment till one cycle later


Means we raise empty when FIFO is not empty
Safe, we unnecessarily stop reading for one cycle, no problem
No scenario where we fail to raise empty when FIFO is empty

Going FULL

The transmitter is writing, but the receiver is slower or has stopped reading
The transmitter increments write pointer
It reads Read pointer through synch
It raises Full flag and stops Writing

Going FULL Synch one cycle late

What happens if synch is one cycle late?


Read pointer may have been incremented but we missed the increment till one cycle later


Means we raise Full when receiver has cleared one address
Safe, we unnecessarily stop writing for one cycle, no problem
No scenario where we fail to raise Full when FIFO is Full

Advantages of FIFO
Much lower latency than normal synchronizers
We have at most 2 cycles of being held up if the empty or full flags are asserted
Once reading or writing have resumed they are synchronous with the port clock
In other words there is no handshaking unless we hit the empty or full conditions
While the read and write pointers are far apart, each port behaves like a synchronous circuit

We are cheating ... a little


We previously said we shouldn't use synchronizers with data words
Pointers are not single bit controls, they are data busses
The pointer counters are Gray encoded, they do not simply increment in binary
With Gray encoding only one bit changes on every increment
Only the bit that is changing can be caught mid-transition by the synchronizer
Whichever way that bit resolves, the pointer read is either the old value or the new one, and both are safe
Usually works well
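
The single-bit-change property can be checked with a short Python sketch. It uses the standard binary-reflected Gray code, n XOR (n >> 1); the 4-bit width is an arbitrary choice for illustration.

```python
def binary_to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Inverse conversion: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

WIDTH = 4  # illustrative pointer width
codes = [binary_to_gray(i) for i in range(2 ** WIDTH)]

for i, code in enumerate(codes):
    nxt = codes[(i + 1) % len(codes)]        # includes the wrap-around step
    changed_bits = bin(code ^ nxt).count("1")
    print(f"{i:2d}: {code:0{WIDTH}b}  bits changing on next increment: {changed_bits}")
```

Every increment, including the wrap from the last code back to the first, changes exactly one bit, so a sampled pointer can only be the old value or the new one.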
