
ECE 679: Digital Systems Engineering
Patrick Chiang
Office Hours: 1-2PM Mon-Thurs
GLSN 100

Class Introductions
Who am I
Who are you

Class Basics
4 Homeworks (20%) (groups of 2)
Midterm (40%)
Final Project (40%)
4-page IEEE report
10-minute presentation (groups of 2)

Guest lecture (Dr. Frank O'Mahony)
Intel Research Labs (May 4th)
Intel Field Trip (June 7th, TBD)
Presentations of 1-2 best project reports

Homework

Skim Dally/Poulton, Digital Systems Engineering, Chapter 3

Skim overview paper:
http://mos.stanford.edu/papers/mh_micro_98.pdf
Includes running StatEye
Oregon State Matlab (eecs.oregonstate.edu/it)
www.stateye.org

Problem Set #1
rlc files -- ~pchiang/hspice (rlc_spice_deck; rlc.rlc)
Spice models -- ~pchiang/hspice/process_files/
130nm to 22nm
Simulator lang = spice

Spectre models
DEFINE gpdk090 /nfs/guille/analog/c/cdsmgr/process/gpdk090_v3.8/libs.cdb/gpdk090

What does this mean for analog designers?
Ever build an ADC?
Ever wonder what to do with the digital bits?
[Figure: analog input sampled at Fs = 600MHz; 8-16 bit digital output @ 100MHz, 200MHz, 400MHz goes to a vector analyzer]
Why does this clock rate not increase?
What is this output really doing? Where is it going?
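A quick way to see why the output is a problem: every sample produces one `bits`-wide word, so the raw output rate is bits x sample rate. The sketch below assumes one parallel CMOS pin per bit and a hypothetical 4Gb/s serial link per lane with no coding overhead; the function names are illustrative only.

```python
import math


def adc_output_rate_gbps(bits: int, rate_mhz: float) -> float:
    """Raw digital output rate in Gb/s: one `bits`-wide word per sample."""
    return bits * rate_mhz / 1000.0


def serial_lanes_needed(total_gbps: float, lane_gbps: float) -> int:
    """Minimum number of serial lanes at `lane_gbps` each (no coding overhead)."""
    return math.ceil(total_gbps / lane_gbps)


if __name__ == "__main__":
    lane_gbps = 4.0  # hypothetical per-link serial rate
    for bits in (8, 16):
        for rate_mhz in (100, 200, 400):
            total = adc_output_rate_gbps(bits, rate_mhz)
            print(f"{bits:2d} bits @ {rate_mhz} MHz: {total:.1f} Gb/s -> "
                  f"{bits} parallel pins or "
                  f"{serial_lanes_needed(total, lane_gbps)} serial lane(s)")
```

At 16 bits and 400MHz the output is 6.4Gb/s: 16 parallel pins each toggling at 400MHz, or 2 lanes at the assumed 4Gb/s.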

Brief Summary
Introduction to the area
Why serial links are important
What are the current technology trends/limitations

4Gb/s Low-Power, Area-Efficient Serial Links
[Figure: high-speed I/Os interconnecting different chips (CPU, memory, IBM processor, other subsystems such as a 1m FR4 router backplane), from transmitter output to receiver input; 2000 and 2001 0.25um test chips with transmitter equalization and receiver offset cancellation]

Ming-Ju E. Lee, William J. Dally, John W. Poulton, Patrick Chiang, Stephen F. Greenwood. An 84-mW 4Gb/s Clock and Data Recovery Circuit for Serial Link Applications. VLSI Circuits Symposium, Kyoto, Japan, June 2001, pp. 149-152.
Ming-Ju E. Lee, William J. Dally, Patrick Chiang. Low-Power Area-Efficient High-Speed I/O Circuit Techniques. IEEE Journal of Solid-State Circuits, Vol. 35, No. 11, November 2000, pp. 1591-1599.

Scaling Serial Links: From 4Gb/s to 20Gb/s
Thesis: Develop a 20Gb/s Serial Link
Area: 500um x 500um
Power: 200mW/link

1 bit time = 1 FO4 delay
Timing uncertainty becomes the KEY issue
[Figure: eye diagrams (v vs. t) at 4Gb/s (250ps bit time) and 20Gb/s (50ps bit time)]
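The bit time (one unit interval, UI) is simply 1/data rate, so the same absolute timing error consumes five times more of the eye at 20Gb/s than at 4Gb/s. A minimal sketch; the 10ps jitter figure is a hypothetical example, not a measurement.

```python
def unit_interval_ps(rate_gbps: float) -> float:
    """Bit time (one unit interval) in picoseconds for a data rate in Gb/s."""
    return 1000.0 / rate_gbps


def jitter_in_ui(jitter_ps: float, rate_gbps: float) -> float:
    """Express a timing uncertainty as a fraction of the bit time."""
    return jitter_ps / unit_interval_ps(rate_gbps)


if __name__ == "__main__":
    jitter_ps = 10.0  # hypothetical peak-to-peak timing uncertainty
    for rate in (4.0, 20.0):
        print(f"{rate:4.1f} Gb/s: UI = {unit_interval_ps(rate):.0f} ps, "
              f"{jitter_ps:.0f} ps jitter = {jitter_in_ui(jitter_ps, rate):.0%} of the UI")
```

The 250ps and 50ps bit times match the eye diagrams; 10ps of jitter is 4% of the UI at 4Gb/s but 20% at 20Gb/s.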

Transmitter Block Diagram
No post-PLL clock buffers

Test Chip
[Figure: die photo (700um x 1.1mm) with 10GHz PLL, phase interpolators, DLL, TX, transmitter muxing, clock recovery, RX, PRBS generator, PRBS checker, test interface, and test structures]

UMC 1.2V, 0.13um CMOS (single Vt)
Die size 700um x 1.15mm
50 Ohm pad termination using wafer probes

PLL Measurements
[Figure: power spectrum; open-loop VCO]
Phase noise @ 1MHz: -97dBc/Hz
10GHz jitter (RMS): 0.97ps
10GHz jitter (pk-pk): 8.0ps
PLL power: 38.6mW
VCO power: 6mW
Tuning range: 1.14-1.31

[Figure: jitter with Q=10 and Q=5]
$\zeta = \frac{R}{2}\sqrt{\frac{I_{CP} K_{VCO} C}{2\pi}}$

Jitter limited by the 1.25GHz input reference clock
HP 8133A input clock (1.2ps RMS, 8.9ps pk-pk)

Eye Diagram (Data Rate = 19.2Gb/s)
Jitter: 2.2ps RMS, 15.6ps pk-pk
Voltage ripple caused by lack of a current source at the differential-pair tail node
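A scope histogram only captures jitter over a finite number of edges. If the jitter were purely Gaussian (random), its peak-to-peak value at a target BER could be extrapolated as 2·Q·σ, where Q is the number of standard deviations at which the Gaussian tail probability equals the BER. The sketch below applies that simplified model to the measured 2.2ps RMS at 19.2Gb/s; it ignores deterministic jitter, so treat the numbers as rough estimates.

```python
from statistics import NormalDist


def q_factor(ber: float) -> float:
    """Standard deviations at which a one-sided Gaussian tail equals `ber`."""
    return -NormalDist().inv_cdf(ber)


def rj_pk_pk_ps(rms_ps: float, ber: float) -> float:
    """Peak-to-peak jitter at `ber`, assuming purely Gaussian jitter (2*Q*sigma)."""
    return 2.0 * q_factor(ber) * rms_ps


if __name__ == "__main__":
    rms_ps = 2.2               # measured RMS jitter at 19.2Gb/s
    ui_ps = 1000.0 / 19.2      # bit time at 19.2Gb/s (about 52.1 ps)
    for ber in (1e-9, 1e-12):
        pp = rj_pk_pk_ps(rms_ps, ber)
        print(f"BER {ber:.0e}: Q = {q_factor(ber):.2f}, "
              f"pk-pk = {pp:.1f} ps ({pp / ui_ps:.2f} UI)")
```

At a BER of 1e-12, Q is about 7, so a purely random 2.2ps RMS would extrapolate to roughly 31ps pk-pk, well above the finite-sample 15.6ps reading.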

High-Speed Transmitter Comparisons
(data rate; power; area; jitter RMS, pk-pk; technology)
P. Chiang, VLSI 2004: 20Gb/s; 165mW; 0.2275mm^2; 2.37ps, 15ps; 0.13um CMOS
J. Kim, ISSCC 2005: 40Gb/s; 2.7W; 9.18mm^2; 1.53ps, 8.11ps; 0.13um CMOS
U. Singh, VLSI 2005: 34Gb/s; 1.335W; 4.16mm^2; 1.44ps, 9.44ps; 0.18um CMOS
D. Shaeffer, ISSCC 2003: 40Gb/s; 4.9W; 8.25mm^2; 880fs, 5.1ps; 0.09um SiGe
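Dividing power by data rate gives energy per bit (mW per Gb/s is numerically pJ/bit), a common way to compare links of different speeds. The rate and power values are copied from the comparison; assigning the 34Gb/s column to U. Singh and the 40Gb/s SiGe column to D. Shaeffer follows the slide's column order and is an assumption.

```python
# (design, data rate in Gb/s, power in W), taken from the comparison slide
DESIGNS = [
    ("P. Chiang, VLSI 2004", 20.0, 0.165),
    ("J. Kim, ISSCC 2005", 40.0, 2.7),
    ("U. Singh, VLSI 2005", 34.0, 1.335),
    ("D. Shaeffer, ISSCC 2003", 40.0, 4.9),
]


def mw_per_gbps(rate_gbps: float, power_w: float) -> float:
    """Power efficiency in mW/(Gb/s), which equals energy per bit in pJ/bit."""
    return power_w * 1000.0 / rate_gbps


if __name__ == "__main__":
    # Print designs from most to least efficient
    for name, rate, power in sorted(DESIGNS, key=lambda d: mw_per_gbps(d[1], d[2])):
        print(f"{name:25s} {mw_per_gbps(rate, power):6.1f} mW/(Gb/s)")
```

The 20Gb/s 0.13um transmitter comes out at about 8.25mW/(Gb/s), consistent with the 200mW/link thesis target at 20Gb/s.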

A 250mW Full-Rate 10Gb/s Transceiver Core in 90nm CMOS using a Tri-State Binary PD with 100ps Gated Digital Output
T. Masuda, et al., ISSCC 2007.
A full-rate 10Gb/s transceiver core employing a tri-state binary PD
with 100ps gated digital output is implemented in a 90nm CMOS
process. Direct drive from the VCO is utilized to eliminate the
10GHz clock buffer current. The RX exhibits a recovered jitter
of 906fs(rms) and an input sensitivity of 5.9mV. The TX generates
a jitter of 5mUI(rms). The chip consumes 250mW.

Conventional Serial Link Receivers
Conventional architectures also use a multi-phase PLL
Static phase offset
Power-supply sensitivity
[Figure: 20Gb/s input data through a pre-amp to four samplers clocked by multiphase PLL outputs ck[0]-ck[3], producing D[0]-D[3]]
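In this quarter-rate receiver, four clock phases ck[0]-ck[3] each run at one quarter of the data rate and are spaced one bit time apart, so sampler k collects every fourth bit onto D[k]. A minimal behavioral sketch assuming ideal phases that each sample at the centre of their bit; a static phase offset would move the entries of `offsets_ps` away from those centres.

```python
def quarter_rate_sampling(bits, rate_gbps, phases=4):
    """Behavioral model of a multi-phase (here quarter-rate) sampling receiver.

    Returns the per-phase clock frequency (GHz), the ideal sampling instant of
    each phase within one clock period (ps, at the centre of its bit), and the
    demultiplexed output streams D[0]..D[phases-1].
    """
    ui_ps = 1000.0 / rate_gbps          # one bit time
    clock_ghz = rate_gbps / phases      # each phase runs at 1/phases of the data rate
    offsets_ps = [k * ui_ps + ui_ps / 2 for k in range(phases)]
    outputs = [bits[k::phases] for k in range(phases)]  # sampler k sees bits k, k+phases, ...
    return clock_ghz, offsets_ps, outputs


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    clk, offsets, d = quarter_rate_sampling(data, rate_gbps=20.0)
    print(f"clock per phase: {clk} GHz, phase spacing: {offsets[1] - offsets[0]} ps")
    for k, (t, stream) in enumerate(zip(offsets, d)):
        print(f"ck[{k}] samples at {t} ps -> D[{k}] = {stream}")
```

At 20Gb/s each phase runs at 5GHz and adjacent phases are 50ps (90 degrees) apart, so any error in that spacing directly eats into a 50ps bit.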

2nd Generation Transmitter
[Figure: block diagram with off-chip 1.25GHz reference, phase comparator, charge pump, varactor control, 10GHz oscillator (10GHz CLK/CLKB), 10GHz->5GHz and 5GHz->2.5GHz dividers, low-high buffers (8 phases @ 5GHz, 4 phases @ 5GHz), PRBS/BER checker, retiming, and a 2:1 MUX tree combining 5Gb/s streams into 10Gb/s streams, with main and equalizing paths and a 50ps delay]
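The transmitter serializes with a tree of 2:1 MUXes: four 5Gb/s streams become two 10Gb/s streams and then one 20Gb/s stream, each stage doubling the rate. A behavioral sketch of the bit ordering, assuming lane k carries bits k, k+4, k+8, ... of the serial stream (this lane assignment is an assumption, not read from the diagram).

```python
def mux2(a, b):
    """2:1 MUX: interleave two equal-length half-rate streams as a[0], b[0], a[1], b[1], ..."""
    if len(a) != len(b):
        raise ValueError("2:1 MUX inputs must have equal length")
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def mux_tree(lanes):
    """Serialize 2**n parallel lanes with n stages of 2:1 MUXes.

    Lane k must carry bits k, k + len(lanes), ... of the serial stream; each stage
    pairs lane i with lane i + half so the interleaved result stays in bit order.
    """
    if not lanes or len(lanes) & (len(lanes) - 1):
        raise ValueError("number of lanes must be a power of two")
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [mux2(lanes[i], lanes[i + half]) for i in range(half)]
    return lanes[0]


if __name__ == "__main__":
    serial = list(range(16))                      # bit indices of the 20Gb/s stream
    lanes = [serial[k::4] for k in range(4)]      # four 5Gb/s lanes
    print(mux_tree(lanes) == serial)              # 5 -> 10 -> 20 Gb/s, original order restored
```

Pairing lane i with lane i + half at every stage is what keeps the output in order; in hardware the same ordering is set by which clock phase drives each MUX select.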

2-tap equalizer implemented to compensate for channel losses (20Gb/s main path plus equalizing path)
Achieve 50ps analog delay with CML buffers
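A 2-tap transmit equalizer subtracts a scaled copy of the data delayed by one bit from the main path: y[n] = x[n] - α·x[n-1]. At 20Gb/s one bit is 50ps, which is why the equalizing path needs a 50ps delay. A minimal sketch with ±1 NRZ symbols; the tap weight α and the unnormalized output swing are assumptions for illustration (a real driver may instead scale both paths so the peak swing stays fixed).

```python
def two_tap_fir(bits, alpha):
    """2-tap FIR pre-emphasis on NRZ data: y[n] = x[n] - alpha * x[n-1].

    bits  : sequence of 0/1 values
    alpha : weight of the one-bit-delayed (equalizing) path, 0 <= alpha < 1
    The output is not normalized, so transitions reach 1 + alpha and runs of
    identical bits settle at 1 - alpha.
    """
    x = [1.0 if b else -1.0 for b in bits]   # map 0/1 to -1/+1 symbols
    y = []
    prev = 0.0                               # assume an idle (0) line before the first bit
    for cur in x:
        y.append(cur - alpha * prev)         # main path minus delayed, scaled copy
        prev = cur
    return y


if __name__ == "__main__":
    print(two_tap_fir([1, 1, 1, 0, 0, 1], alpha=0.25))
    # [1.0, 0.75, 0.75, -1.25, -0.75, 1.25]
```

Transitions are boosted relative to long runs of identical bits, which pre-compensates the channel's high-frequency loss.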

Fabrication: Test Chip
ST Microelectronics 0.13um test chip
307mW / transceiver
0.46mm^2
20mV input sensitivity

2006 0.13um Test Chip
[Figure: die photo; transmitter 450um x 350um, receiver 500um x 600um]

Results (all results single-ended)
Ideal channel, 20Gb/s: 80mV, 43ps
Channel with -6.5dB @ 10GHz, 20Gb/s: 33mV, 37ps

Results (contd)
Ideal channel with equalizer tap weight = 0.37, 20Gb/s: 72mV, 36.4ps
Channel with -6.5dB @ 10GHz and equalizer tap weight = 0.37, 20Gb/s: 62mV, 35ps

Rationale for Multi-cores
Next-generation computing: multi-core processing
i.e. multiple, parallel DSPs (e.g. MACs)

Why can't we achieve higher clock frequencies?
Wire delays don't scale like transistors
Power increases exponentially (when pushing process technology)
Timing margins degraded by:
Variability
Power-supply noise
Digital crosstalk

NOTE: More independent threads require more memory bandwidth

Intel, 80 Cores, ISSCC 2007

Research: Explore Parallel Serial Links
Serial links also exhibit the same characteristics:
Channel losses get worse
Power consumption increases significantly with bandwidth
Timing precision limited by:
Static Phase Offset (process variation)
Power-supply Induced Jitter
Interchannel Crosstalk
Serial links also need to push for high amounts of parallelism
How is this different from conventional link design?
Channel equalization becomes more difficult
Adjacent channel crosstalk
Difficult channel estimation problem
(power, flexibility, data-rate, equalizer design, channel, distance)
Amortize Clock Power for Multiple Links
Distributed resonant clocking of analog/mixed-signal front-ends

Problem of IO
2500 pins / 2 = 1250 (~1200) differential pairs
Assume 10Gb/s per link: ~12Tb/s aggregate bandwidth
At 10mW per Gb/s (10pJ/bit): ~120W
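The pin-bandwidth arithmetic as a small calculator: aggregate bandwidth is pairs x per-link rate, and I/O power is bandwidth x energy efficiency. A 120W total corresponds to 10mW per Gb/s (10pJ/bit); 100mW per Gb/s would give 1200W. The function and argument names are illustrative.

```python
def io_budget(diff_pairs: int, gbps_per_link: float, mw_per_gbps: float):
    """Return (aggregate bandwidth in Gb/s, total I/O power in W)."""
    bandwidth_gbps = diff_pairs * gbps_per_link
    power_w = bandwidth_gbps * mw_per_gbps / 1000.0
    return bandwidth_gbps, power_w


if __name__ == "__main__":
    pairs = 1200  # roughly 2500 pins / 2
    for eff in (10.0, 100.0):  # mW per Gb/s
        bw, p = io_budget(pairs, 10.0, eff)
        print(f"{pairs} pairs x 10Gb/s = {bw / 1000:.0f} Tb/s; "
              f"at {eff:.0f} mW/(Gb/s): {p:.0f} W")
```

The power term scales linearly with efficiency, which is why pJ/bit, not just data rate, decides whether thousands of links fit in a package power budget.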

StatEye Playing
Fun with StatEye
5Gb/s -> 10Gb/s
Worse channels
Worse timing jitter
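For intuition on why worse channels close the eye, a worst-case bound helps before running StatEye. Peak-distortion analysis takes the single-bit pulse response sampled once per UI and assumes every ISI cursor lines up against the main cursor. This is a simpler, pessimistic method, not StatEye's statistical algorithm, and the cursor values below are hypothetical; at a higher data rate the same channel spreads the pulse over more UIs, which shows up as larger ISI cursors.

```python
def worst_case_eye_height(cursors, main_index):
    """Peak-distortion eye height for +/-1 NRZ data.

    cursors    : pulse response sampled once per UI
    main_index : index of the main (decision) cursor
    Returns 2 * (h0 - sum|h_k|); a value <= 0 means the eye is closed.
    """
    h0 = cursors[main_index]
    isi = sum(abs(h) for k, h in enumerate(cursors) if k != main_index)
    return 2.0 * (h0 - isi)


if __name__ == "__main__":
    mild = [0.05, 0.60, 0.20, 0.10, 0.05]    # hypothetical lower-loss channel
    lossy = [0.10, 0.40, 0.25, 0.15, 0.10]   # hypothetical higher-loss channel
    for name, h in (("mild", mild), ("lossy", lossy)):
        eye = worst_case_eye_height(h, main_index=1)
        state = "open" if eye > 0 else "closed"
        print(f"{name}: worst-case eye height = {eye:+.2f} ({state})")
```

StatEye instead builds a probability distribution of the received sample, so it reports eye contours at a target BER rather than this single worst case.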

Homework examples

Next Time
Telegrapher's Equations
Reflection coefficients

Channel Models
Skin Effect
Dielectric constant
vias
