Abstract

The goal of this paper is to present an innovative conceptual framework for accurate and efficient estimation of power dissipation in data-path intensive architectures described at the RT and behavioral levels. The aim is to give the designer the capability of analyzing different solutions in the architectural design space before the synthesis tasks. The proposed methodology addresses all the elements composing a typical data-path architecture, such as storage units, functional units and multiplexers. The paper includes experimental results demonstrating the validity of the proposed approach.

1. Introduction

Power dissipation has become one of the main constraints in the design of integrated circuits (ICs) in recent years, due to the steady increase of integration level and operating frequency. These aspects, combined with the growing demand for portable systems supplied by limited batteries (e.g., portable PCs, laptops, PDAs, cellular phones), have increased the importance of power dissipation issues during the design of electronic systems.

VLSI designers need advanced techniques and related tools for the early estimation of power dissipation during the design phases, in order to satisfy the power constraints without significantly reducing the global performance. The goal is to meet design turn-around time deadlines while exploring the space of possible design alternatives. The accuracy and efficiency of a high-level analysis approach should be the "booster" to meet the power requirements, avoiding a costly redesign process. It should be pointed out that relative accuracy in power estimation is more important than absolute accuracy since, usually, the main goal is to compare alternative design solutions [1].

General surveys of VLSI power estimation techniques can be found in [1], [2], [3], [4]. An important aspect is that most of the average power dissipation in CMOS technology is strongly related to the switching activity of the circuit nodes. This fact is expressed in [3] by stating that power estimation is a pattern-dependent process.

Analytical and stochastic power estimation techniques at the behavioral level have been proposed in [5], targeting real-time DSP applications on ASIC architectures. The power dissipated by some ASIC components, such as data-path components, has been analytically estimated from the Control Data Flow Graph (CDFG) representing the design. For other ASIC components, such as interconnects and controllers, for which the information available at the behavioral level is not sufficient, statistical models were built to estimate power based on a stochastic study of 23 different ASICs, showing an average error of approximately 20%. However, these models do not account for the power consumed by multiplexers and memories.

Power estimation techniques based on high-level descriptions and tailored to data-path architectures have also been proposed in [6]. These techniques derive stochastic models of buses and internal modules from the statistical behavior of the inputs. In [7], a power estimation model for data-path architectures at the RT level is described. The model accounts for the switching activity by using the Dual Bit Type (DBT) method, which considers two input bit types rather than one: the random activity of the least significant bits (LSBs) and the correlated activity of the most significant bits (MSBs).

The aim of this paper is to provide a conceptual analysis framework for accurate and efficient estimation of power dissipation in embedded and VLSI systems at the architectural level. This choice is motivated by the fact that the architectural level and its corresponding RTL description are the design entry point for the majority of embedded systems and IC designs.

In the proposed approach, the target embedded system architecture for power estimation (to be implemented in a single ASIC) is similar to those proposed in [5], [8] (see Figure 1).

Figure 1. Target architecture at RT level

The ASIC architecture is defined at a pre-synthesis RT level and consists of the following components:
- data-path, composed of storage units, functional units and multiplexers. The typical operation along the data-path implies a register-to-register transfer, consisting of operands read from storage units, an operation performed on the operands, and the result stored in another register [8];
- memories, such as single- and multi-port memories, cache memories, TLBs, FIFOs and LIFOs; we assume that all read/write accesses to the memory are performed through a register;
- control unit, implemented as a set of finite state machines generating the control lines for the data-path components and the memories;
- embedded core processor, such as a standard processor, a microcontroller or a DSP, with its memory (part of which may be external), implementing the software-bound part;
- internal clock distribution logic;
- crossbar network, interfacing the architectural units through a communication protocol at system level. The interconnection power of the crossbar network is included in the power dissipated
1. compute the lifetimes of all the variables in the given VHDL code, composed of S statements. A variable v is said to live over a set of sequential code statements {i, i+1, i+2, ..., i+n} when the variable is written in statement i and last accessed in statement i+n. When a variable is written in a statement i+k of the set but last used in the same statement i+k of the next iteration, it is assumed to live over the entire set;
2. represent the lifetime of each variable as a vertical line from statement i through statement i+n in the column j reserved for the corresponding variable $v_j$;
3. determine the maximum number N of overlapping lifetimes by computing the maximum number of vertical lines intersecting any horizontal cut-line [8];
4. estimate the minimal number N of sets of registers necessary to implement the code by using register sharing. Register sharing is applied whenever a group of variables with the same bit-width $b_i$ can be mapped to the same register. The total number of registers is given by $\sum_{i=1}^{N} b_i$;
5. select a possible mapping of variables into registers by using register sharing;
6. compute the number $w_i$ of writes to the variables mapped to the same set of registers;
7. estimate $a_i$ of each set of registers by dividing $w_i$ by the number of statements S: $a_i = w_i / S$; hence $TR_{i,MAX} = a_i f_{CLK}$.

Considering $P_{REG}$, it is worth noting that the power of latches and flip-flops is consumed not only during output transitions but also during all clock edges by the internal clock buffers, even though the stored data does not change. The non-switching power dissipated by internal clock buffers accounts for approximately 30% of the average power for the LCB500K technology supplied by LSI Logic [9]. This technology features a 0.5-micron drawn gate length and a 0.38-micron effective channel length, optimized for 3.3 V operation.

Note that the internal clock buffers are independent of the output load; thus the non-switching power of latches and flip-flops is load-independent but depends on the clock input ramp time.

Globally, the average power of the registers can be estimated as:

$$P_{REG} = \sum_{k=1}^{N} \left( P_k + P_{NSk} \right)$$

where N is the number of sets of registers $s_k$, each constituted by the same type of latches or D flip-flops, $P_k$ is the average power of each set $s_k$, and $P_{NSk}$ is the average non-switching power dissipated by the internal clock buffers of the registers in the set $s_k$, that is, the average power dissipated by the internal clock buffers when there is no output transition.

It should be pointed out that the measured average power $P_k$ tabulated in the target library also includes the power dissipated by the internal clock buffers during the clock edges corresponding to output transitions. Hence the estimated value of $P_k$ should account for an activity factor given by $TR_k$, while the estimated value of $P_{NSk}$ should consider an activity factor of $(f_{CLK} - TR_k)$.

The estimated values of $P_k$ and $P_{NSk}$ for each set $s_k$ are given respectively by:

$$P_k = n_k \, P_{rk} \, TR_k \, \delta$$

$$P_{NSk} = n_k \, P_{0k} \, (f_{CLK} - TR_k) \, \delta$$

where:
$n_k$ = estimated number of registers in the set $s_k$;
$P_{rk}$ = average power consumption per MHz of a single register of type $t_k$ at a given reference frequency $f_0$, for different output standard loads $C_k$ (representing both load cells and interconnections) and clock input ramp times;
$P_{0k}$ = non-switching power consumption per MHz of a single register of type $t_k$ at a given reference frequency $f_0$, as a function of the clock input ramp time;
$TR_k$ = estimated toggle rate of the registers in the set $s_k$, computed with the live variable analysis;
$\delta$ = derating factor.

$P_{MUX}$ estimation. To estimate the size and number of multiplexers from the VHDL code, it is necessary to determine the number of paths in the data-path. The analysis of design paths and the related notation are similar to those in [8]; however, the proposed approach also considers the paths from primary inputs to internal registers and from internal registers to primary outputs.

Once the size and number of multiplexers have been computed, we have to derive the switching activity of the multiplexers, given the model of the 2-input non-inverting multiplexer depicted in Figure 2.

Figure 2. The 2-input non-inverting multiplexer model for power estimation

A simplified model for the maximum switching activity of the output Z of a 2-input non-inverting multiplexer is:

$$\alpha_Z = \alpha_A \left( 1 - p_S^1 \right) + \alpha_B \, p_S^1$$

where $\alpha_A$ is the switching activity of input A, $\alpha_B$ is the switching activity of input B, and $p_S^1$ is the static signal probability of the selection net S.

Globally, the average power dissipated by the multiplexers can be estimated as:

$$P_{MUX} = \sum_{k=1}^{N} P_k$$

where N is the estimated number of multiplexers and $P_k$ is the average power of each multiplexer. The value of $P_k$ for each multiplexer k is given by:

$$P_k = P_{tk} \, TR_k \, \delta$$

where $P_{tk}$ is the average power consumption per MHz of a 2-input non-inverting multiplexer, $TR_k$ is the estimated toggle rate of the multiplexer, and $\delta$ is the derating factor.

$P_{FU}$ estimation. The power model for functional units is a complexity-based model [8] that estimates the complexity of functional units in terms of equivalent gates. The number of equivalent gates necessary to implement the logic from the VHDL code is derived from a library of macro-functions such as adders, multipliers, etc. The library should include the estimated number of logic gates for each macro-function, depending on the number of operands and the parallelism. Once the number of equivalent gates for each macro-function has been evaluated, the estimated power dissipated by the functional units can be expressed as:

$$P_{FU} = \sum_{k=1}^{N} P_k$$

where N is the number of macro-functions and $P_k$ is the power of each macro-function, given by:

$$P_k = n_k \, P_{TECH} \, TR_k \, \delta$$

where $P_{TECH}$ is a technological parameter expressed in [μW/(gate·MHz)], $n_k$ is the estimated number of logic gates in macro-function k, and $TR_k$ is the toggle rate of the gates of macro-function k.
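The register-estimation procedure (steps 1-7 above) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' tool. The `Var` record and the helpers `live_range`, `max_overlap`, `map_registers` and `toggle_rates` are hypothetical names. Statements are indexed 0..S-1, and each variable is assumed to be written once per pass unless `writes` says otherwise. For step 5, the "possible mapping" is chosen with a greedy left-edge assignment among variables of equal bit-width. That is one common choice and not necessarily the one used in [8].

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Var:
    name: str
    width: int       # bit-width b of the variable
    write: int       # statement index (0..S-1) where the variable is written
    last_use: int    # statement index of its last access
    writes: int = 1  # assumption: writes per pass over the S statements


def live_range(v: Var, S: int) -> range:
    """Step 1: statements over which v is live (inclusive).
    A last use before the write belongs to the next loop iteration,
    so the variable is taken to live over the entire set."""
    if v.last_use < v.write:
        return range(S)
    return range(v.write, v.last_use + 1)


def max_overlap(variables, S: int) -> int:
    """Steps 2-3: maximum number of lifetimes crossed by any horizontal cut-line."""
    counts = [0] * S
    for v in variables:
        for s in live_range(v, S):
            counts[s] += 1
    return max(counts, default=0)


def map_registers(variables, S: int):
    """Steps 4-5: share registers only among variables of equal bit-width.
    Assumed mapping: greedy left-edge (sort by lifetime start, first fit)."""
    by_width = defaultdict(list)
    for v in variables:
        by_width[v.width].append(v)
    reg_sets = []
    for width in sorted(by_width):
        regs = []  # each entry: (occupied statements, member variables)
        for v in sorted(by_width[width], key=lambda x: live_range(x, S).start):
            span = set(live_range(v, S))
            for occupied, members in regs:
                if not (occupied & span):
                    occupied |= span  # in-place union, updates the stored set
                    members.append(v)
                    break
            else:
                regs.append((span, [v]))
        reg_sets.extend(members for _, members in regs)
    return reg_sets


def toggle_rates(reg_sets, S: int, f_clk: float):
    """Steps 6-7: a_i = w_i / S and TR_i,MAX = a_i * f_clk for each register set."""
    return [sum(v.writes for v in members) / S * f_clk for members in reg_sets]


# Hypothetical example: S = 4 statements, f_CLK = 100 MHz.
S = 4
vs = [Var("a", 8, 0, 2), Var("b", 8, 1, 3), Var("c", 8, 3, 0),
      Var("d", 16, 0, 1), Var("e", 16, 2, 3)]
sets = map_registers(vs, S)
print(max_overlap(vs[:3], S))                 # 3 overlapping 8-bit lifetimes
print([[v.name for v in m] for m in sets])    # [['a'], ['c'], ['b'], ['d', 'e']]
print(sum(m[0].width for m in sets))          # 40 register bits in total
print(toggle_rates(sets, S, 100.0))           # [25.0, 25.0, 25.0, 50.0]
```

Variable `c` is last used before it is written, so it lives over all four statements and cannot share a register. The 16-bit variables `d` and `e` have disjoint lifetimes and share one set.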
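The register power model $P_{REG} = \sum_k (P_k + P_{NSk})$ can be evaluated directly once $n_k$, $P_{rk}$, $P_{0k}$ and $TR_k$ are known. The sketch below assumes per-MHz powers in μW/MHz and frequencies and toggle rates in MHz, so results are in μW. The `RegisterSet` fields, the `register_power` helper and the numbers in the example are hypothetical placeholders, not LCB500K library data.

```python
from dataclasses import dataclass


@dataclass
class RegisterSet:
    n: int        # n_k: estimated number of registers in the set s_k
    p_r: float    # P_rk: average power per MHz of one register of type t_k
    p_0: float    # P_0k: non-switching (clock buffer) power per MHz of one register
    tr: float     # TR_k: estimated toggle rate of the set, in MHz


def register_power(sets, f_clk: float, derating: float) -> float:
    """P_REG = sum_k (P_k + P_NSk), where
    P_k   = n_k * P_rk * TR_k * delta            (edges with output transitions)
    P_NSk = n_k * P_0k * (f_CLK - TR_k) * delta  (edges without output transitions)"""
    total = 0.0
    for s in sets:
        # Assumed sanity check: a toggle rate above f_CLK would make P_NSk negative.
        if not 0.0 <= s.tr <= f_clk:
            raise ValueError("toggle rate must lie in [0, f_clk]")
        p_k = s.n * s.p_r * s.tr * derating
        p_ns = s.n * s.p_0 * (f_clk - s.tr) * derating
        total += p_k + p_ns
    return total


# Hypothetical example: two 8-bit sets at f_CLK = 100 MHz, delta = 1.
example = [RegisterSet(8, 2.0, 0.5, 25.0), RegisterSet(8, 2.0, 0.5, 50.0)]
print(register_power(example, 100.0, 1.0))  # 1700.0
```

The two terms split each clock period between edges that switch the output and edges that only drive the internal clock buffers. A set that rarely toggles is therefore still charged the non-switching power for the remaining $f_{CLK} - TR_k$ edges.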
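The multiplexer and functional-unit models share one shape: a per-unit power per MHz scaled by a toggle rate and the derating factor $\delta$. The sketch below evaluates $\alpha_Z$, $P_{MUX}$ and $P_{FU}$. One step is an assumption not stated in the text: the output activity is converted into a toggle rate as $TR_k = \alpha_Z f_{CLK}$, by analogy with $TR_{i,MAX} = a_i f_{CLK}$ for registers. The function names and all numeric values are hypothetical.

```python
def mux_output_activity(alpha_a: float, alpha_b: float, p_s1: float) -> float:
    """alpha_Z = alpha_A * (1 - p_S^1) + alpha_B * p_S^1 for a 2-input
    non-inverting multiplexer (input B passes when S = 1)."""
    return alpha_a * (1.0 - p_s1) + alpha_b * p_s1


def mux_power(muxes, f_clk: float, derating: float) -> float:
    """P_MUX = sum_k P_tk * TR_k * delta.
    Each mux is a tuple (P_tk, alpha_A, alpha_B, p_S^1).
    Assumption: TR_k = alpha_Z * f_clk."""
    total = 0.0
    for p_t, alpha_a, alpha_b, p_s1 in muxes:
        tr = mux_output_activity(alpha_a, alpha_b, p_s1) * f_clk
        total += p_t * tr * derating
    return total


def fu_power(macro_functions, p_tech: float, derating: float) -> float:
    """P_FU = sum_k n_k * P_TECH * TR_k * delta.
    Each macro-function is a tuple (n_k gates, TR_k in MHz)."""
    return sum(n_k * p_tech * tr_k * derating for n_k, tr_k in macro_functions)


# Hypothetical example values.
print(mux_output_activity(0.5, 0.25, 0.25))                  # 0.4375
print(mux_power([(1.0, 0.5, 0.25, 0.25)] * 2, 100.0, 1.0))   # 87.5
print(fu_power([(200, 12.5)], 0.25, 1.0))                    # 625.0
```

Because $p_S^1$ weights the two inputs, a select line that rarely changes input leaves the output activity close to that of the dominant input.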
4. Experimental results and conclusions

To verify the advantages of the proposed analysis over conventional approaches, several experiments have been carried out. To give the flavor of the potential mismatches existing even in simple designs, this section reports the experimental data obtained by developing the benchmark ASIC shown in Figure 3. The architecture contains some of the typical elements composing a generic data-path: two sets of registers, an adder, the I/O pads, a battery of 64 multiplexers and a clock distribution tree. The synthesis data have been obtained using the Synopsys Design Compiler environment with the state-of-the-art HCMOS6 technology (0.35 μm, 3.3 V) supplied by SGS-Thomson Microelectronics, at a target operating frequency of 100 MHz.

Figure 3. The ASIC architecture of the power estimation benchmark

Experimental results are reported in Table 1, considering several input switching activities (0.75, 0.5, 0.25 and 0.1). The results obtained by the proposed methodology have been compared to those obtained by the Synopsys Design Power tool on the gate-level netlist. Note that both estimation methods are based on a zero-delay model. Regarding the global power, the proposed method provides an over-estimation: the percentage error ranges from 1.59% to 4.16% with respect to the Synopsys estimates. However, since the switched capacitance of I/O nodes is usually larger than that of internal nodes by up to three orders of magnitude, the major contribution to the global power is the I/O power. In the benchmark, the I/O power represents on average 94.83% of the total power, due to the reduced size of the core logic. Thus, a more realistic measure derives from the comparison of the core power: the model provides an over-estimation ranging from 8.52% to 31.64%. In particular, the results show an average percentage error of 24.87% for the registers, 30.49% for the multiplexers and -0.54% for the adder. Globally, the relative accuracy of our approach compared with the Design Power gate-level tool is considered satisfactory at this level of abstraction. Furthermore, traditional gate-level methods suffer from a main drawback with respect to our approach: the need to perform time-consuming tasks such as synthesis. By avoiding moving down to the gate-level description, our approach represents an innovative methodology that meets the requirements for accurate power estimation in a reasonable design time.

Work is in progress aimed at integrating the proposed conceptual framework and the related power metrics within the partitioning task of a more general hardware-software co-design environment for embedded systems [10]. Moreover, the analysis will be extended to also consider the control units, the embedded core processor and the memories of the target system architecture.

5. References

[1] P. Landman, "High-Level Power Estimation", in Proc. of ISLPED'96, Int. Symp. on Low Power Electronics and Design, Monterey, CA, August 12-14, 1996, pp. 29-35.
[2] S. Devadas and S. Malik, "A Survey of Optimization Techniques Targeting Low Power VLSI Circuits", in Proc. of the 32nd Design Automation Conference, 1995.
[3] F. N. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits", IEEE Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 2, No. 4, pp. 446-455, December 1994.
[4] A. P. Chandrakasan, R. W. Brodersen, "Minimizing Power Consumption in Digital CMOS Circuits", Proceedings of the IEEE, Vol. 83, No. 4, pp. 498-523, April 1995.
[5] R. Mehra and J. Rabaey, "Behavioral Level Power Estimation and Exploration", in Proc. of First Int. Workshop on Low Power Design, Napa Valley, CA, pp. 197-202, April 1994.
[6] P. E. Landman and J. M. Rabaey, "Power Estimation for High Level Synthesis", in Proc. of EDAC-EUROASIC '93, Paris, France, pp. 361-366, Feb. 1993.
[7] P. E. Landman and J. M. Rabaey, "Architectural Power Analysis: The Dual Bit Type Method", Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 3, No. 2, June 1995.
[8] S. Narayan and D. D. Gajski, "Area and Performance Estimation from System-level Specifications", Technical Report ICS-92.16, Dept. of Information and Computer Science, University of California, Irvine, December 20, 1992.
[9] LCB500K, Cell-Based ASIC Design Manual, LSI Logic Corporation, June 1995.
[10] A. Balboni, W. Fornaciari, D. Sciuto, "TOSCA: a pragmatic approach to co-design automation of control dominated systems", Hardware/Software Co-design, NATO ASI Series, Series E: Applied Sciences, Vol. 310, pp. 265-294, Kluwer Academic Publishers, 1996.
[11] W. Fornaciari, P. Gubian, D. Sciuto, C. Silvano, "A conceptual analysis framework for low power design of embedded systems", in Proc. of IEEE Int. Conf.: Innovative Systems in Silicon, October 9-11, 1996, Austin, TX, USA.