Abstract

An 8-bit adder, operating at 800 MHz, and a single-stage bit-serial adder, running at more than 1 GHz, have been implemented and successfully tested in a standard 1.0 µm CMOS process. The performance was achieved through the use of carry-increment full adders, fine-grain pipelining, and merging the combinational logic into the pipeline registers.

1 Introduction

2 Architecture

The architecture of the 8-bit adder is based on the carry-increment adders (CIA) proposed by Tyagi [2]. The scheme is similar to carry-select adders (CSA), but instead of selecting between two results, the carry-input increments or passes the intermediate result. The area compared to a standard CSA is thereby reduced by about one third. Figure 1 shows a 3-bit block of such a CIA. The delay from the block-carry-in C_pb of
IEEE 1995 CUSTOM INTEGRATED CIRCUITS CONFERENCE 0-7803-2584-2/95 $3.00 © 1995 IEEE
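The carry-increment scheme described in the Architecture section can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the function and helper names (`carry_increment_block`, `to_bits`, `from_bits`) are hypothetical, bit lists are taken LSB first, and the model describes the logical behaviour of one block, not the transistor-level circuit of the paper.

```python
def to_bits(value, width):
    """Hypothetical helper: integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Hypothetical helper: LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def carry_increment_block(a_bits, b_bits, block_carry_in):
    """Behavioural model of one carry-increment block (assumed structure)."""
    # Step 1: form the intermediate result as if the block carry-in were 0.
    partial = []
    carry = 0
    for a, b in zip(a_bits, b_bits):
        partial.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    block_generate = carry  # carry-out produced by the block on its own

    # Step 2: the block carry-in either passes the intermediate result
    # (carry-in 0) or increments it (carry-in 1).
    sums = []
    increment = block_carry_in
    for s in partial:
        sums.append(s ^ increment)
        increment &= s  # the increment ripples only through bits that are 1

    # The two carry-out sources are mutually exclusive, so OR is sufficient.
    block_carry_out = block_generate | increment
    return sums, block_carry_out
```

For a 3-bit block as in Figure 1, `carry_increment_block(to_bits(5, 3), to_bits(2, 3), 1)` adds 5 + 2 + 1: the intermediate result 7 is incremented, giving sum bits of 0 and a block carry-out of 1. Unlike a carry-select block, only one intermediate result is computed, which is where the area saving comes from.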
3 Circuit technique of the logic flip-flops

In addition to an efficient architecture, the circuit technique is another crucial aspect of high-speed digital design. The true single-phase clocking (TSPC) technique [3] has proved to be one of the fastest techniques for digital CMOS circuits [4]. Figure 3 shows such a dynamic TSPC D-flip-flop optimized for high-speed and glitch-free operation [5]. Prescalers that we have implemented in 1.0 µm CMOS using this flip-flop run at over 2.6 GHz.

This flip-flop is also the kernel for the new logic-flip-flops. Compared to a standard pipeline, where the combinational logic is located between the register stages, these gates have been merged into the registers. Thereby the propagation delay of the combinational gate is shared with the delays of the flip-flop, significantly reducing the cycle time of a pipeline stage.

Figure 4 shows such a logic-flip-flop with a NAND gate in its first stage. The setup time, needed to charge or discharge the first intermediate node B, is slightly increased. The propagation delay, which is the time it takes to transfer the value of node B to the outputs after the rising clock edge, remains about the same as that for the D-flip-flop. The cycle time of such a NAND pipeline stage is therefore reduced from the delay of the NAND gate plus the delay of the D-flip-flop down to approximately the delay of the D-flip-flop itself.

Going one step further, we also put logic functions into the second stage of the TSPC D-flip-flop (cf. figure 5). The first stage has been duplicated, having the same setup time as the NAND-flip-flop. The additional transistor MB in the second stage performs the NAND operation of the two first stages. This transistor slightly increases the propagation delay of this AND-OR-INVERT (AOI-) flip-flop. The cycle time of an AOI pipeline stage with external gates is nearly 70% longer than the cycle time of this AOI-flip-flop.

Figure 5: Schematic of the AOI-flip-flop (drawn without circuitry for glitch-free outputs and low frequency)

Table I gives a summary of the performance of these logic-flip-flops. Although the clock load increases with more functionality, one can argue that the clock load and area consumption would be even bigger if the circuit were built using finest-grain pipelining with only conventional NAND gates and flip-flops.
Additionally, a flip-flop for the three-input function A + B·C was designed. Although only three different logic-flip-flops (plus the D-flip-flop) are available, any logic gate can be implemented because complementary outputs are available. The logic-flip-flops have similar circuitry for glitch-free and low frequency operation as the TSPC D-flip-flop.

Even more logic can be merged into the TSPC D-flip-flop. For the bit-serial adder two logic flip-flops have been designed, one for the sum and one for the carry operation.
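The bit-serial adder built from a sum flip-flop and a carry flip-flop can be sketched behaviourally as follows. The text does not give the logic equations merged into the two flip-flops, so this sketch assumes the standard full-adder functions (three-input XOR for the sum, majority for the carry); the function name `bit_serial_add` and the LSB-first stream convention are likewise assumptions made for illustration.

```python
def bit_serial_add(a_bits, b_bits):
    """Add two LSB-first bit streams, one bit pair per clock cycle."""
    carry_ff = 0  # state held by the carry logic-flip-flop
    sum_ff = 0    # state held by the sum logic-flip-flop
    out = []
    for a, b in zip(a_bits, b_bits):
        # Both flip-flops evaluate their merged logic on the same rising
        # clock edge, using the carry stored before that edge.
        next_sum = a ^ b ^ carry_ff
        next_carry = (a & b) | (a & carry_ff) | (b & carry_ff)
        sum_ff, carry_ff = next_sum, next_carry
        out.append(sum_ff)  # registered sum bit, visible after the edge
    return out, carry_ff
```

For example, `bit_serial_add([1, 0, 1, 0], [1, 1, 0, 0])` adds 5 and 3 and returns `([0, 0, 0, 1], 0)`, i.e. 8 with no final carry. Because the logic sits inside the registers, the only feedback path per cycle is from the carry flip-flop output back into the two flip-flop inputs.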
Figure 6: Clock distribution tree. Numbers below buffers indicate channel widths of PMOS and NMOS (in µm)

TABLE I
Performance of logic-flip-flops¹

Circuit    t_s [ns]   t_pd [ns]   # Trans.   Width² [µm]   CLK load³ [fF]
D-FF       0.26       0.65        16 (11)    29            142
NAND-FF    0.38       0.70        19 (14)    31            187
AOI-FF     0.36       0.78        30 (21)    50            273
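The setup times and propagation delays in Table I allow a rough estimate of the achievable clock rate of a logic-flip-flop pipeline stage. The sketch below assumes that the cycle time is approximately the sum of setup time and propagation delay, with no clock skew and no wiring delay; this relation is an assumption made for illustration and is not stated in the text. The names `TABLE_I_NS`, `cycle_time_ns`, and `max_clock_mhz` are hypothetical.

```python
# (setup time, propagation delay) in ns, taken from Table I
TABLE_I_NS = {
    "NAND-FF": (0.38, 0.70),
    "AOI-FF": (0.36, 0.78),
}


def cycle_time_ns(t_setup, t_pd):
    """Assumed minimum cycle time: setup time plus propagation delay."""
    return t_setup + t_pd


def max_clock_mhz(t_setup, t_pd):
    """Clock rate in MHz corresponding to the assumed cycle time."""
    return 1000.0 / cycle_time_ns(t_setup, t_pd)


if __name__ == "__main__":
    for name, (t_s, t_pd) in TABLE_I_NS.items():
        print(f"{name}: {cycle_time_ns(t_s, t_pd):.2f} ns, "
              f"{max_clock_mhz(t_s, t_pd):.0f} MHz")
```

Under this assumption the AOI-flip-flop stage has a cycle time of about 1.14 ns, which corresponds to a clock rate somewhat above the 800 MHz reported for the 8-bit adder; skew and interconnect would account for part of the difference.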
Footnotes to Table I:
¹ Numbers stem from extracted layout and typical SPICE simulations.
² All cells have the same height of 75 µm.
³ 100 fF load at each output.

4 Implementation

The bit-serial and the 8-bit parallel adder have been implemented in a two-metal single-poly 1.0 µm self-aligned CMOS process. The logic-flip-flops for the 8-bit adder are full-custom cells, designed like standard cells to allow easy abutment. The size of each transistor has been optimized using SPICE simulations in an iterative process to achieve minimal delays and to satisfy the internal loads of the adder.

The input and output signals are fed via dedicated high-frequency pads including ESD protection up to 1.7 kV. To synchronize the input signals and to generate their complements, a row of D-flip-flops has been added in front of the first stage.

In high-speed designs the clock signal must be distributed with special care. The clock skew must be controlled by carefully balancing the clock tree. The TSPC technique needs no inverted clock signal and all flip-flops in this design operate on the rising clock edge. The architecture of the 8-bit adder itself reduces the total clock load by about one third compared to a CSA.

The clock signal is distributed through a four stage clock-tree (cf. fig. 6). Two central buffers feed the clock signal to each pipeline-stage, starting at the last stage, going back to the first stage. The clock buffers of each stage were sized according to the extracted capacitance of the clock inputs of all its flip-flops. Each flip-flop is laid out with the clock line in its middle, parallel to the power rails. Therefore not only the power supplies but also the clock signal can be readily connected by abutment, further simplifying the clock distribution.

Simulations show that the clock skew is below 80 ps and the rise times are around 250 ps. The area between pad ring and the circuit core is used for power supply blocking capacitors. Die size (including pads) is 3.3 mm², the adder itself occupying only 0.37 mm². Figure 7 shows a micrograph of the 8-bit adder chip.

Figure 7: Photomicrograph of the pipelined 8-bit adder

For test purposes an additional 8-bit adder block was implemented with a boundary scan structure. These scan-registers (= AOI-flip-flops) at the inputs and outputs allow testing with only a few fast input and output signals.

5 Measurements

The 8-bit adder was bonded into a high speed ceramic package. The bonded chip was tested on a HP83000 ASIC tester,
Figure 8: Input and output waveforms at 666 MHz on the ASIC tester.