(Team Unknown) : 6.2. Subsystem Design Principles

Wayne Wolf, Modern VLSI Design: System-on-Chip Design, Third Edition, Prentice Hall
6.2. Subsystem Design Principles

6.2.1. Pipelining
Pipelining is a relatively simple method for reducing the clock period of long combinational operations. Complex combinational components can pose serious constraints on system design if their delay is much longer than the delay of the other components. If the propagation time through that combinational element determines the clock period, logic in the rest of the chip may sit idle for most of the clock cycle while the critical element finishes. Pipelining allows large combinational functions to be broken up into pieces whose delays are in balance with the rest of the system components. As illustrated in Figure 6-2, pipelining entails introducing a rank of memory elements along a cut through the combinational logic. We usually want all the outputs to appear on the same clock cycle, which implies that the cut must divide the inputs and outputs into disjoint sets. The pipelined system still computes the same combinational function but that function requires several cycles to be computed.
Figure 6-2. Structure of a pipelined system.
The number of cycles between the presentation of an input value and the appearance of its associated output is the latency of the pipeline. Figure 6-3 shows how clock period and latency vary with the number of stages in a pipeline. Moderate amounts of pipelining cause a great reduction in clock period, but heavy pipelining provides only modest and decreasing gains. Meanwhile, latency increases linearly. The total delay through the pipelined system is slightly longer than the combinational delay because of the setup and hold requirements of the memory elements. But the clock period can be substantially reduced by pipelining, which balances the pipelined system's critical delay path lengths with those in the rest of the system.
Figure 6-3. Clock period and latency as a function of pipeline depth.
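The trend in Figure 6-3 can be reproduced with a simple first-order model. The sketch below assumes an ideally balanced pipeline in which each stage gets an equal share of the combinational delay plus a fixed per-stage register overhead. The function name, the parameter names, and the numbers are illustrative assumptions and are not taken from the text.

```python
def pipeline_timing(comb_delay_ns, stages, reg_overhead_ns):
    """First-order timing of an ideally balanced pipeline (illustrative model).

    comb_delay_ns   -- delay of the unpipelined combinational logic
    stages          -- number of pipeline stages, each ending in a register rank
    reg_overhead_ns -- assumed per-stage cost of the memory elements
    Returns (clock_period_ns, latency_cycles, total_delay_ns).
    """
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    period = comb_delay_ns / stages + reg_overhead_ns  # shrinks as 1/stages
    latency = stages                                   # grows linearly
    total = period * stages                            # exceeds comb_delay_ns
    return period, latency, total


if __name__ == "__main__":
    # Hypothetical numbers: 40 ns of logic, 1 ns of register overhead.
    for n in (1, 2, 4, 8, 16):
        period, latency, total = pipeline_timing(40.0, n, 1.0)
        print(f"{n:2d} stages: period {period:5.2f} ns, "
              f"latency {latency:2d} cycles, total delay {total:5.2f} ns")
```

In this model, doubling the depth from 1 to 2 stages removes about half of the clock period, while doubling from 8 to 16 removes only a few nanoseconds, and the total delay grows by one register overhead per added stage.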
Although memory elements can be placed along any cut when pipelining, the best placement balances the delays through the combinational logic. As shown in Figure 6-4, if memory elements are placed so that some delay paths are much longer than others, you have recreated in miniature the same conditions that caused you to pipeline the logic in the first place. Perfect pipelining balances the delays between ranks of memory elements, using the principles described in Section 5.4.
Figure 6-4. A pipeline with unbalanced stage delays.
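The effect of an unbalanced cut can be illustrated with a small search. The sketch below assumes a strong simplification: the logic is a linear chain of blocks with known delays, and a cut may fall only between blocks. Real logic is a graph rather than a chain, so this is not a general placement method. The helper names and the delay values are hypothetical.

```python
from itertools import combinations


def stage_delays(delays, cuts):
    """Sum the block delays of each stage produced by the given cut positions."""
    bounds = [0, *cuts, len(delays)]
    return [sum(delays[a:b]) for a, b in zip(bounds, bounds[1:])]


def best_cuts(delays, stages):
    """Exhaustively pick the cuts that minimize the slowest stage.

    The slowest stage sets the clock period, so minimizing it is what
    balancing the pipeline means in this simplified chain model.
    """
    candidates = combinations(range(1, len(delays)), stages - 1)
    return min(candidates, key=lambda cuts: max(stage_delays(delays, cuts)))


if __name__ == "__main__":
    chain = [3, 1, 4, 1, 5, 2]  # hypothetical block delays in ns
    print("unbalanced cut:", stage_delays(chain, (1,)))
    cuts = best_cuts(chain, 2)
    print("balanced cut:  ", stage_delays(chain, cuts))
```

Exhaustive search is used only because it is easy to check by hand; it is practical only for short chains.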
6.2.2. Data Paths

A data path is both a logical and a physical structure: it is built from components which perform typical data operations, such as addition, and it has a layout structure which takes advantage of the regular logical design of the data operators. Data paths typically include several types of components: registers (memory elements) store data; adders and ALUs perform arithmetic; shifters perform bit operations; counters may be used for program counters. A data path may include point-to-point connections between pairs of components, but the typical data path has too many connections for every component to be connected to every other component. Data is often passed between components on one or more busses, or common connections; the number of busses determines the maximum number of data transfers on a clock cycle and is a primary design parameter of data paths.
Most data operations are regular: we saw in the previous sections that adders, ALUs, shifters, and other operators can be constructed from arrays of smaller components. The cleanest way to take advantage of this regularity in most cases is to design the layout as a bit-slice, as shown in Figure 6-5. A bit-slice, as the name implies, is a one-bit version of the complete data path, and the n-bit data path is constructed by replicating the bit-slice. Typically, data flows horizontally through the bit-slice along point-to-point connections or busses, while control signals (which provide read and write signals to registers, opcodes for ALUs, etc.) flow vertically.
Figure 6-5. Structure of a typical bit-slice data path.
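The idea of building an n-bit operator by replicating a one-bit slice can be shown at the logical level. The sketch below assumes the slice is a ripple-carry full adder, chosen only as a familiar example. It models logic, not layout, and the function names are illustrative.

```python
def adder_slice(a, b, carry_in):
    """One-bit slice: a full adder on single bits (each argument is 0 or 1)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def bit_sliced_add(x, y, width):
    """Build a width-bit adder by instantiating the same slice once per bit."""
    carry = 0
    result = 0
    for i in range(width):
        # Every iteration uses an identical slice; only the bit position differs.
        s, carry = adder_slice((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(bit_sliced_add(5, 3, 4))   # 4-bit sum and carry out
    print(bit_sliced_add(15, 1, 4))  # overflow sets the carry out
```

Because every slice is identical, widening the operator changes only the `width` argument, which mirrors how the layout grows by stacking more copies of the same cell.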
Bit-slice layout design requires careful, simultaneous design of the cells that comprise the data path. Since the bit-slice must be stacked vertically, any signals that pass through the cells must be tilable: the signals must be aligned at top and bottom. Horizontal constraints are often harder to satisfy. The VDD and VSS lines must run horizontally through the cells, as must busses. Signals between adjacent cells must also be aligned. While the vertical wires usually distribute signals, the horizontal wires are often interrupted by logic gates. The transistors in the cells impose constraints on the layout that may make it hard to place horizontal connections at the required positions. As shown in Figure 6-6, cells often need to be stretched beyond their natural heights to make connections with other cells.
Figure 6-6. Abutting cells may require moving pins or stretching.
The data path's layout design also requires careful consideration of layer assignments. With a process that provides two levels of metal, metal 1 is typically used for horizontal wires and metal 2 for vertical wires. A wiring plan helps organize your thoughts on the wires required, their positions, and the best layers to use for each wire. A black-and-white wiring plan is shown in Figure 6-7; you should draw your wiring plans in color to emphasize layer choices.
Figure 6-7. A simple wiring plan.
Two circuit design problems unique to data path design are registers and busses. The circuit chosen for the register depends on the number of registers required. If only a few registers are needed, a standard latch or flip-flop from Section 5.2 is a perfectly reasonable choice. If many registers are required, area can be saved by using an n-port static RAM, which includes one row enable and one pair of bit lines for each port. Although an individual bit/word can allow only one read or write operation at a time, the RAM array can support the simultaneous, independent reading or writing of two words simply by setting the select lines of the two words high at the same time. One SRAM port is required for each bus in the data path.
A bit-slice with connections between all pairs of components would be much too large: not only would it be much too tall because of the large number of horizontal wires needed for the connections, but it would also be made longer by the many signals required to control the connections. Data paths are almost always made with busses that provide fewer connections at much less cost in area. Since the system probably doesn't need all data path components to talk to each other simultaneously, connections can be shared. But these shared connections often require special circuits that take up a small amount of space while providing adequate speed of communication. While a multiplexer provides the logical function required to control access to the bus, a mux built from static complementary gates would be much too large. A more clever circuit design for a bus is as a distributed NOR gate: the common wire forms the NOR gate's output, while pulldowns at the sources select the source and set the NOR gate's output. (All devices connected to the bus can read it in this scheme.) The circuit choices for busses are much like those for the advanced gate circuits of Section 3.5: pseudo-nMOS, shown in Figure 6-8, and precharged, shown in Figure 6-9. The trade-offs are also similar: the pseudo-nMOS bus is slow but does not require a separate precharge phase.
Figure 6-8. A pseudo-nMOS bus circuit.
Figure 6-9. A precharged bus circuit.
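The logical behavior of the distributed NOR bus can be sketched in a few lines. The model below assumes that each source's pulldown conducts only when that source is selected and its data bit is 1, and that the common wire is otherwise held high by the pullup or precharge device. That reading of the circuit, and all names in the code, are assumptions made for illustration. Timing and precharge phases are not modeled.

```python
def nor_bus(sources):
    """Return the level of the shared bus wire (0 or 1).

    sources -- iterable of (select, data) bit pairs, one per bus driver.
    The wire is the NOR of all (select AND data) pulldown terms.
    """
    pulled_down = any(select and data for select, data in sources)
    return 0 if pulled_down else 1


def read_bus(sources):
    """Any reader recovers the selected source's data by inverting the wire."""
    return 1 - nor_bus(sources)


if __name__ == "__main__":
    # Three hypothetical drivers; only the second one is selected.
    drivers = [(0, 1), (1, 1), (0, 0)]
    print("bus wire:", nor_bus(drivers), "value read:", read_bus(drivers))
```

With exactly one source selected, the wire carries the complement of that source's data, so the shared wire and its pulldowns do the job of a multiplexer without a gate-level mux at each destination.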
