
Design, Modeling and Simulation Methodology for Source Synchronous DDR Memory Subsystems

N. Pham, M. Cases, J. Bandyopadhyay


IBM Corporation, Netfinity PC Servers
11400 Burnet Road, Austin, TX 78758
Tel: (512) 838-6225; Fax: (512) 823-5938
Email: npham, cases, iayab @us.ibm.com
1.0 Abstract
This paper describes the performance modeling and simulation methodology used to optimize the source synchronous timing equations for the system level interconnects. Actual double data rate (DDR) memory system configurations and timing specifications are used to describe the design methodology. The delay skew budget and noise margin allocation for the various components of the optimization equations are discussed in conjunction with their associated delay skew control techniques. These include driver/receiver circuit design techniques, such as controlled driver impedance and edge rate and differential receivers; resistive termination schemes; and board impedance and crosstalk controlled designs. Finally, design guidelines are given for commercially available DDR SDRAM designs using double pumped bus transfer rates.

2.0 Introduction
Continued advances in silicon technology have yielded dramatic increases in both circuit speed and wiring density. These advances are shifting the performance challenges to the chip-to-chip interconnections. For digital computer systems, the board level interconnection technology is lagging behind the on-chip circuit-to-circuit interconnection technology. As a result of these limitations, system level performance is limited by data transfer rates and memory bandwidths, including L2 cache, L3 cache and main memory subsystems. The Semiconductor Industry Association technology road map estimates that by the year 2001, silicon devices will be capable of operating at speeds greater than 1 GHz, while external bus speeds using conventional clocking techniques will be operating at speeds of at most 300 MHz [1]. Today, increased data rates (400-800 MHz) can be achieved using various circuit and timing techniques, such as source synchronous clock signaling.

3.0 Background
Strong PC and server demands for increased memory bandwidth have forced leading semiconductor makers worldwide to develop high-performance dynamic random access memory (DRAM) chip technologies such as RAMBUS and DDR SDRAM. These new memory technologies will target multiple applications on cost-effective computer platforms in the next five years. The initial offerings for DDR registered DIMMs are targeted for the server, workstation, and high-end desktop computer market segments. The initial offerings in the next two years will be based on memory subsystems with a clock frequency of 100-133 MHz (200-266 Mb/sec/pin), with limited extensions possible using alternate DRAM packaging schemes.
The basic DIMM usage is assumed to be 1-4 DIMMs per data bus, 1-8 DIMMs per address bus, and 1-2 SDRAM banks per DIMM. Some chipsets support unbuffered and registered DIMMs, and other loading extensions are possible using DIMMs with FET switches. Total DIMM capacities of 128MB, 256MB, 512MB, and 1GB are expected to be available commercially in 1Q00-4Q00 using 128Mb and 256Mb SDRAM chip (x4, x8) designs in a x64 or x72 DIMM organization.

4.0 Optimization Equations
For memory subsystems using the common clock signaling scheme, a common clock is used to reference all transactions on the bus. In contrast, the source synchronous technique generates sampling strobes on the same chip as the signals. Both the signal and the strobe are then sent to the receiving chips through the printed circuit boards of the system. In this technique, the absolute signal propagation delays (flight times) are removed from the system timing equations, which allows increased bus data transfer rates. All the delay terms are transformed to differential delays between the data and the data strobe signals. Therefore, the signal integrity and the electrical performance of the package and board interconnects are the major limiting factors to the bus speed. For the common clock scheme, the design optimization problem reduces to the minimization of the signal propagation delay spread across the manufacturing process and the environmental conditions. For the source synchronous scheme, the design optimization problem reduces to the minimization of the differential delays (or skews) between the signals and the associated strobe [2].
The basic source synchronous bus timing optimization equations are:

$$T_{vb,\min} > T_{DSd} + T_{Sr} + \Delta t_{\max} \qquad \text{(EQ 1)}$$

$$T_{va,\min} > T_{DSd} + T_{Hr} + \Delta t_{\min} \qquad \text{(EQ 2)}$$

$$(T_{vb} + T_{va}) < 0.5\,T_{cycle} \qquad \text{(EQ 3)}$$

Where,
Tvb/Tva is the minimum time the signal is required to be valid at the receiving components before/after the sampling edge of the strobe.
TDSd is the signal jitter with respect to the strobe for the driving component.
TSr/THr is the setup/hold delay for the receiving components.
Δt is the difference between the signal and the strobe arrival times.
Tcycle is the bus transfer cycle time.

The Δt component is a lumped sum effect of many undesirable events in the system. The system propagation delay skew Δt is composed of the following elements:

0-7803-5908-9/00/$10.00 ©2000 IEEE 267 2000 Electronic Components and Technology Conference


$$\Delta t = (\Delta Z_0 + \mathrm{Xtalk} + \mathrm{ISI} + \Delta \mathrm{length} + T_{rf} + T_{vref})/2 \qquad \text{(EQ 4)}$$

Where,
Xtalk is the board and connector crosstalk induced delay.
ΔZ0 is the impedance mismatch effect in multiple board systems.
Δlength is the wiring length tolerance of the data lines with respect to the associated strobe.
ISI is the intersymbol interference effect.
Tvref is the Vref noise tolerance effect for single ended differential receivers.
Trf is the signal rise/fall delay skew.

ISI is the effect of residual signal settling noise on subsequent transfer cycles. When the differential receiver is used as a single ended circuit, the second input is tied to a reference voltage, Vref, where the switch point occurs. Assuming that there is one strobe and two data lines with input signals at points a and b as shown in Figure 1, the strobe latches in the data when the voltage signal crosses Vref. Since the data lines can be rising or falling at t1 or t2, they create the rise and fall delay skew (Trf). It is important to keep Vref free of noise, and close to the crossing point x, to minimize Trf. The crossing point x is also a function of the impedance mismatch of the p and n devices in the driver circuitry. The controller could implement a common I/O receiver design to eliminate the need for the Vref pin. The signal switching point then depends on the threshold voltages of the p and n MOSFET devices, and is therefore sensitive to mismatches between them. On the other hand, the SDRAM implements a differential receiver design with a Vref pin. Vref is generated at the package pin using a voltage divider network to the driver power supply. This receiver reference voltage design is susceptible to power supply noise. Therefore, conventional analog filter circuit design techniques should be used to generate Vref.

FIGURE 1. Signal Rise/Fall Delay Skew

Figure 2 shows a source synchronous scheme where the strobe is centered with respect to the associated group of data lines, including the Δt delay skew for both data and strobe. The board delay skew for either data valid time (Tva/Tvb) is half of the total board propagation delay skew.

FIGURE 2. Board Delay Skew Allocation

5.0 Bus Timing Specifications
Tables 1 and 2 summarize some of the important DC/AC electrical specifications and bus timing requirements for PC200 grade DDR SDRAM DIMMs [3], where data is captured at twice the clock frequency of 100 MHz.

TABLE 1. DC/AC Voltage Specifications
Parameter                        Symbol     Min             Max
Supply voltage                   Vdd/Vddq   2.3 V           2.7 V
I/O reference voltage            Vref       Vddq/2          Vddq/2
Input High (logic 1) (DQ/DQS)    Vih(AC)    Vref + 0.32 V   -
Input Low (logic 0) (DQ/DQS)     Vil(AC)    -               Vref - 0.32 V

TABLE 2. (PC200; caption and leading rows not recoverable from the source)
Parameter                            Symbol    PC200 Min   PC200 Max
Data-in/mask to DQS hold time        tDQSH     0.6 ns      -
Clock and Strobe
DQS falling edge to CK setup time    tDSS      0.2 tCK     -
DQS falling edge to CK hold time     tDSH      0.2 tCK     -
DQS output access time from CK       tDQSCK    0.8 ns      -

The DDR SDRAM uses a differential clock (CK) input to latch the address and command signals. The address and command setup and hold times from CK are tIS and tIH respectively. For a write cycle, the controller sends strobe and data to the DDR SDRAM one cycle following the write command, within the tDQSS timing specification. The data is latched on both edges of DQS. For a read cycle, the SDRAM takes a few cycles (CAS latency) to assert the DQ and DQS signals. The DQ to DQS skew is tDQSQ, and the data is held valid for the tQH time duration.
During a read cycle, the DDR SDRAM uses a delay locked loop (DLL) circuit which tracks both edges of the CK input signals and aligns the DQS output edges with the CK input edges. For the controller design, the read cycle timing is a complete loop from the read command launch time to the DQS signal appearing at its receiver. During a write cycle, the controller will launch DQS after CK with a delay of tDQSS. In this case, the timing specification tDQSS/tDQSH with respect to CK falling edges is used for timing the data bus.
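The relationship between the 100 MHz clock and the per-transfer timing quantities quoted in this section can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `ddr_timing` and the dictionary keys are hypothetical, and the only inputs taken from the text are the clock frequency, the two-transfers-per-clock property of DDR, and the 0.2 tCK values listed for tDSS and tDSH.

```python
# Illustrative arithmetic only: derives per-transfer timing quantities from
# the clock frequency and the 0.2*tCK DQS-to-CK windows quoted in the text.
# The function name and dictionary keys are hypothetical.

def ddr_timing(clock_mhz: float) -> dict:
    """Return basic DDR timing quantities for a clock given in MHz."""
    t_ck_ns = 1000.0 / clock_mhz  # clock period in ns
    return {
        "tCK_ns": t_ck_ns,
        "data_rate_mbps_per_pin": 2.0 * clock_mhz,  # two transfers per clock
        "bit_time_ns": t_ck_ns / 2.0,               # duration of one transfer
        "tDSS_min_ns": 0.2 * t_ck_ns,               # DQS falling edge to CK setup
        "tDSH_min_ns": 0.2 * t_ck_ns,               # DQS falling edge to CK hold
    }

pc200 = ddr_timing(100.0)
for name, value in pc200.items():
    print(f"{name}: {value:g}")
```

For the PC200 case this gives a 10 ns clock period, a 5 ns transfer window, and 2 ns for each of the 0.2 tCK windows, which is the time scale against which the picosecond-level skew terms of EQ 4 have to be budgeted.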



6.0 Modeling Methodology
The main memory subsystem described in this paper supports the 184-pin, 2.5 Volt, PC200, 72-bit wide, Registered DDR SDRAM DIMMs. The DIMM uses the "x4" DDR SDRAMs. The DIMM can be built using either one bank (mono) or two banks (stacked) of SDRAMs. For 1 GB DIMMs, 18 stacked 256Mb SDRAMs are used. This system is designed with 8 DIMM slots using a 144-bit double data bus width mounted on a riser card. This approach is a trade-off between timing and compactness in the planar design. A schematic of the board layout is shown in Figure 3.

FIGURE 3. Main Memory Layout Schematic

Figure 4 is a graphical representation of the simulated topology during a write cycle. The lead-in transmission line length (w1+w2+w3) is 4" to 6", the DIMM spacing (w4) is 0.4" to 1", and the trace length to Rt (w5) is 0.2". Rs and Rt are 1% tolerance resistors with values of 16 and 24 Ω respectively. The DIMM circuitry is not shown in Figure 4. Figure 5 summarizes the electrical models for the controller package, the DIMM connectors, the riser card connector, and the DIMM package.

FIGURE 4. Main Memory Simulation Topology

FIGURE 5. Package/Connector Models

Table 3 shows simulation conditions for typical system level fast and slow design corners. These simulation conditions include power supply, temperature, and silicon process, as well as board/card level electrical parameter variation with process and temperature. The fast case usually addresses overshoot, undershoot and ring-back specifications, while the slow case usually addresses bus timing and signal slew rate specifications.

TABLE 3. Simulation Conditions
Parameter                  Slow (w/c)   Fast (b/c)
Voltages                   Low          High
Temperature                High         Low
Driver / Receiver models   Slow         Fast
Z0/T0*
Connector RLC
* Impedance and propagation delay of multiple boards

The simulation cases are further divided into:
1. Read and write cycles.
2. Impedance mismatching from system, riser, and DIMM card to look for worst case reflection and ISI skews.
3. Light loaded cases with one mono DIMM plugged at various slots to look for AC and DC level violations.
4. Heavy loaded cases with stacked DIMMs in all slots to look for worst case timings and signal slew rates.
5. Wiring length and component tolerances for physical topology options.
6. Coupling induced delay skews from connector and trace spacings using a three line coupling model.
7. Receiver Vref noise effects on the Trf skew delay.
8. Simultaneous switching output (SSO) noise generated by data switching, which impacts data and strobe differently due to the 1/4 cycle strobe centering delay.

The daisy chain topology of Figure 4 is selected for the DQ and DQS nets, based on a net topology sensitivity analysis. Other net topologies, such as the tee topology where the lead-in length comes to a center point with 2 DIMMs on each side, offer slower signal transition rates. The major concern is a resonant point caused by the DIMM connector and the SDRAM package parasitics, which creates AC signal level violations. A series resistor is placed on the DIMM between these components to dampen the resonance effect. The reflection also induces an ISI effect, forcing the use of a load-end termination technique. A termination resistor Rt, connected to Vref, was placed at the last DIMM connector to keep the ISI effect within a reasonable range. The reflection also occurs in the read cycle. A series resistor Rs is also inserted between the lead-in trace and the first DIMM connector to reduce the impedance mismatch between the lead-in trace and the trace segments connecting the DIMMs. Both heavy and light loaded cases are simulated to investigate DC and AC level sensitivities to resistor termination values as well as ISI pattern effects.
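The reflections that Rs and Rt are meant to control arise wherever a travelling wave meets a change in impedance. As a rough illustration, the standard transmission-line reflection coefficient can be evaluated for an impedance step between two boards. The sketch below is not the simulation flow used in this work; the function name is hypothetical, and the 45 Ω and 55 Ω values are assumed corner values of a nominal 50 Ω line chosen only for illustration. The loaded impedance of the DIMM segments is not given in the text and is not modeled here.

```python
def reflection_coefficient(z_to: float, z_from: float) -> float:
    """Voltage reflection coefficient for a wave travelling from a line of
    impedance z_from into an impedance z_to (both in ohms)."""
    return (z_to - z_from) / (z_to + z_from)

# Hypothetical corner: two boards at opposite ends of an assumed
# 50 ohm +/-10% impedance tolerance.
z_low, z_high = 45.0, 55.0
gamma = reflection_coefficient(z_high, z_low)
print(f"45 -> 55 ohm step reflects {gamma:.1%} of the incident voltage")

# A matched junction reflects nothing.
print(reflection_coefficient(50.0, 50.0))
```

A step from 45 Ω to 55 Ω reflects one tenth of the incident voltage, and the same step traversed in the opposite direction reflects the same fraction with inverted sign. Each such discontinuity adds residual settling noise, which is one reason the multi-board impedance mismatch cases are simulated separately.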



7.0 Simulation Results
An ISI pattern 48 bits long is chosen to simulate the randomness of the data signal. The strobe driver input signal is composed of four pulses followed by a quiet duration pattern to emulate the strobe signal waveform. Both driver and receiver models include the substrate power and ground inductance effect, and they are connected to power distribution models. The strobe driver input is shifted by 1/4 cycle from the data signal to take into account the power and ground noise induced driver delay skew. It was found that the strobe and the data have a compatible Δt delay skew based on these conditions. Crosstalk on the three coupled line model and the connector effect are analyzed separately to speed up the simulation effort. Sensitivity analysis using either one mono or one stacked DIMM plugged into all combinations of slot positions indicated that the worst case voltage level is obtained with one mono DIMM plugged into position a under the best case (b/c) condition, for either the read or the write cycle (refer to Figures 3 and 4 for the slot position definition). Figure 6 shows read cycle waveforms measured at the controller pin, where the rise and fall edges are superimposed on the same plot (eye diagram format). The voltage levels of these waveforms met both the DC and AC Vih/Vil specifications.

FIGURE 6. One Mono DIMM Plugged into Slot a, b/c - Read Cycle

Table 4 shows a sample of timing simulation results for the ISI, Trf, and ΔZ0 effects using the stacked DIMM plugged into all slot positions and no Vref variation. The skew delays are measured from the first rising to the last falling edges when they cross the Vref level, at points t1 and t2 as shown in Figure 1 for a read cycle. Figure 8 shows similar results for a write cycle.

TABLE 4. Simulation Results for ISI+Trf+ΔZ0
               Read (ps)        Write (ps)
Vref ±0 mV     b/c     w/c      b/c     w/c
DIMM a         200     220      60      145
DIMM d         350     270      16      175

FIGURE 7. Best Case Read Cycle Simulation Results

FIGURE 8. Best Case Write Cycle Simulation Results

Table 5 summarizes the timing allocations for both a write and a read data bus. A similar table can be constructed for the address and control bus, and is not shown. This spreadsheet is used to indicate timing closure for Tva and Tvb, and is useful to identify the design areas where improvements can be made. The Δt parameters are multiplied by 2 to account for both DQ and DQS. The strobe is centered at the optimum point of the data valid window to take advantage of unequal data setup and hold times. The board and card trace impedances are controlled using 50 Ω ±10% triplate structures, while the crosstalk induced delay adder is minimized using a trace space to trace width aspect ratio of greater than three. This corresponds to a coupled voltage level of less than 5% of the aggressor's voltage swing.

TABLE 5. Summary of Timing Allocations
(table body not recoverable from the source; the surviving row labels are Vref ±100 mV, Vref (n-p mismatch), and Design Margin)
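The kind of timing closure bookkeeping that such a spreadsheet performs can be sketched from EQ 1 through EQ 4. The following is a minimal sketch under stated assumptions, not a reproduction of Table 5: every number is a hypothetical placeholder, the names `SkewTerms` and `margin` are invented for illustration, Tcycle is assumed to equal the 100 MHz clock period, and a single Δt value is applied to both the setup and the hold check as a simplification.

```python
from dataclasses import dataclass


@dataclass
class SkewTerms:
    """Board-level skew contributors of EQ 4, all in picoseconds."""
    dz0: float
    xtalk: float
    isi: float
    dlength: float
    trf: float
    tvref: float

    def delta_t(self) -> float:
        # EQ 4: half of the lumped total is charged to each data valid time.
        return (self.dz0 + self.xtalk + self.isi
                + self.dlength + self.trf + self.tvref) / 2.0


def margin(t_valid: float, t_dsd: float, t_setup_or_hold: float,
           delta_t: float) -> float:
    """Slack left in EQ 1 (setup) or EQ 2 (hold); positive means closure."""
    return t_valid - (t_dsd + t_setup_or_hold + delta_t)


# Hypothetical placeholder numbers, not values taken from Table 5.
skew = SkewTerms(dz0=100, xtalk=50, isi=200, dlength=50, trf=100, tvref=100)
t_cycle = 10_000.0        # assumed: 100 MHz clock period, in ps
t_vb = t_va = 1200.0      # assumed valid times before/after the strobe edge

assert t_vb + t_va < 0.5 * t_cycle  # EQ 3
setup_margin = margin(t_vb, t_dsd=400, t_setup_or_hold=400,
                      delta_t=skew.delta_t())
hold_margin = margin(t_va, t_dsd=400, t_setup_or_hold=400,
                     delta_t=skew.delta_t())
print(f"delta_t = {skew.delta_t():.0f} ps, "
      f"setup margin = {setup_margin:.0f} ps, "
      f"hold margin = {hold_margin:.0f} ps")
```

Keeping each contributor as a separate field mirrors the purpose stated above: when a margin goes negative, the largest terms point to the design areas where improvement would pay off most.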



8.0 Design Trade-offs
The controller driver design is optimized for the worst case net topology. A fast slew rate tends to create resonance with the stacked DIMM package inductance, such as the 0.5 nH inductance in Figure 5c, causing a plateau region in the middle of the signal transition. The slew rate is selected based on the signal waveforms at both the SDRAM pin and the receiver circuit input pin. The driver impedance affects the rise and fall time, dv/dt. In the heavy loaded stacked DIMM case, DIMM position d experiences the slowest dv/dt, which makes it susceptible to Vref variation. It also reduces the timing margin of the SDRAM components. The driver impedance of about 15-20 Ω is designed to satisfy both the slow slew rate and the low impedance requirements.
The net topology design requires extensive simulations to control the reflections and resonances mentioned in the previous sections. The mechanical dimensions of the system affect choices such as using a vertical riser card or interleaving the DIMMs, which in turn dictate the choice of the DIMM spacing (w4) and the values of the resistors Rs and Rt. Each choice of DIMM spacing requires appropriate values of Rs and Rt. Besides the topology presented in this paper, other topologies, such as one with 10 mm DIMM spacing, meet the design requirement with Rs and Rt values of 10 and 15 Ω respectively. The lead-in trace determines the ISI delay skew. As a rough estimate, for every 2 inch deviation from the designed trace length, the Δt delay skew varies by approximately 100 ps.

9.0 Design Rules and Guidelines
The DDR subsystem design rules and guidelines are developed from a series of sensitivity analyses performed using pre-physical design assumptions. These results are used to drive initial component placement and wiring. The design guidelines are grouped into three basic categories: data bus, address bus, and command lines. Additionally, these guidelines are written for a set of signal lines and their associated strobe line, since the delay skew between the data and strobe is the critical design parameter. The design rules and guidelines for a typical PC200 DDR DIMM memory subsystem are summarized below (refer to the net topology shown in Figure 4 for the net trace length definitions):
1. Data bus
   A. Controller total lead-in trace length (w1+w2+w3):
      Data strobe (DQS) = 5±1”
      Data lines within group (DQ) = DQS±0.1”
      Series resistor Rs to first DIMM (w3) ≤0.2”
   B. DIMM spacing (w4) = 0.4”-1” (balanced lengths across all DIMMs)
   C. Far end termination trace length (w5):
      Last DIMM connector to termination resistor Rt ≤0.2”
2. Address bus
   A. Controller total lead-in trace length (w1+w2+w3):
      Differential clock (CK) = 5”
      Address lines within group (ADD) = CK±1”
   B. DIMM spacing (w4) = 0.4”-1” (balanced lengths across all DIMMs)
   C. Far end termination trace length (w5):
      Last DIMM connector to termination resistor Rt ≤0.2”
3. Command lines
   A. Controller total lead-in trace length (w1+w2+w3):
      Differential clock (CK) = 5”
      Command lines within group (CMD) = CK±1”
   B. DIMM spacing (w4) = 0.4”-1” (balanced lengths across all DIMMs)
   C. Far end termination trace length (w5):
      Last DIMM connector to termination resistor Rt ≤0.2”
In order to achieve balanced trace lengths for critical signals within a group or between groups, meander lines can be used to provide fine control of the delay skew. Because of the difference in delay between a straight line and a meander line of tight pitch, design rules are needed to guarantee proper delay skew control for meander lines [4].

10.0 Summary
A design and simulation methodology is presented that allows for optimization of the source synchronous timing equations in a practical memory subsystem environment using PC200 DDR DIMMs. For system timing closure, the critical design parameters are the lead-in trace length, the DIMM connector spacing, and the driver's electrical characteristics (i.e., the driver's output impedance and signal slew rate). The trace lengths should be well balanced within a signal group and its associated strobe. An accurate modeling and simulation methodology is essential to guarantee hardware functionality, primarily because of the sensitivity of the system timing parameters to inter-symbol interference, signal coupling and impedance mismatch effects. Finally, a bus timing allocation table and its associated design rules and guidelines are developed using typical commercial DDR DIMM specifications.

11.0 Acknowledgment
The authors wish to thank Paul Coteus from the IBM Research Division, Mark Kellogg from the IBM Microelectronics Division, and John Borkenhagen from the IBM Server Division for their reviews, suggestions, and assistance during the period of this work.

12.0 References
1. Semiconductor Industry Association, “The National Technology Roadmap for Semiconductors: Technology Needs,” 1997 Edition.
2. T. Arabi, J. Jones and D. Riendeau, “Modeling, Simulation and Design of the Interconnect and Packaging of an Ultra-High Speed Source Synchronous Bus,” IEEE EPEP’98, pp. 8-11, October 1998.
3. JEDEC Standard Specification Document for DDR SDRAM, July 1999.
4. B. Rubin and B. Singh, “Study of Meander Line Delay in Circuit Boards,” IEEE EPEP’99, pp. 193-196, October 1999.

