
1997 IEEE International Symposium on Circuits and Systems, June 9-12, 1997, Hong Kong

High-Level Power Estimation of VLSI Systems


W. Fornaciari (1), P. Gubian (2), D. Sciuto (1), C. Silvano (2)
(1) Politecnico di Milano, Dip. di Elettronica e Informazione, P.zza L. Da Vinci, 32 - 20133 Milano, Italy
(2) Univ. degli Studi di Brescia, Dip. di Elettronica per l'Automazione, Via Branze, 38 - 25123 Brescia, Italy

0-7803-3583-X/97 $10.00 ©1997 IEEE

Abstract

The goal of this paper is to present an innovative conceptual framework suitable for achieving accurate and efficient estimation of power dissipation for data-path intensive architectures described at RT and Behavioral levels. The aim is to provide the designer with the capability of analyzing different solutions in the architectural design space, before the synthesis tasks.

The proposed methodology addresses all the elements composing a typical data-path architecture, such as storage units, functional units and multiplexers. The paper includes experimental results demonstrating the validity of the proposed approach.

1. Introduction

Power dissipation has become one of the main constraints in the design of integrated circuits (ICs) in recent years, owing to the steady increase in integration level and operating frequency. These trends, combined with the growing demand for battery-powered portable systems (e.g., portable PCs, laptops, PDAs, cellular phones), have increased the importance of power dissipation issues in the design of electronic systems.

VLSI designers need advanced techniques and related tools for the early estimation of power dissipation during the design phases, in order to satisfy the power constraints without significantly reducing the global performance. The goal is to meet the design turn-around time deadlines while exploring the space of possible design alternatives. Accuracy and efficiency of a high-level analysis approach should be the "booster" to meet the power requirements, avoiding a costly redesign process. Note that relative accuracy in power estimation is more important than absolute accuracy since, usually, the main goal is to compare alternative design solutions [1].

General surveys of VLSI power estimation techniques can be found in [1], [2], [3], [4]. An important aspect is that most of the average power dissipation in CMOS technology is strongly related to the switching activity of the circuit nodes. For this reason, [3] characterizes power estimation as a pattern-dependent process.

Analytical and stochastic power estimation techniques at behavioral level have been proposed in [5], targeting real-time DSP applications on ASIC architectures. The power dissipated by some ASIC components, such as data-path components, has been analytically estimated from the Control Data Flow Graph (CDFG) representing the design. For other ASIC components, such as interconnects and controllers, for which the power information available at the behavioral level is not sufficient, statistical models were built to estimate power based on a stochastic study of 23 different ASICs, showing an average error of approximately 20%. However, the proposed models do not account for the power consumed by multiplexers and memories.

Power estimation techniques based on high-level descriptions have also been proposed in [6], tailored to data-path architectures. These techniques derive stochastic models of busses and internal modules from the statistical behavior of the inputs. In [7], a power estimation model for data-path architectures operating at the RTL level is described. The model accounts for the switching activity by using the Dual Bit Type (DBT) method, considering two input bit types rather than one: the random activity of the least significant bits (LSBs) and the correlated activity of the most significant bits (MSBs).

The aim of this paper is to provide a conceptual analysis framework for accurate and efficient estimation of power dissipation in embedded and VLSI systems at the architectural level. This choice is motivated by the fact that the architectural level, with its corresponding RTL description, is the design entry point for the majority of embedded systems and IC designs.

In the proposed approach, the target embedded system architecture for power estimation (to be implemented in a single ASIC) is similar to those proposed in [5], [8] (see figure 1).

Figure 1. Target architecture at RT level

The ASIC architecture is defined at a pre-synthesis RT level and consists of the following components:
• data-path, composed of storage units, functional units and multiplexers. The typical operation along the data-path implies a register-to-register transfer, consisting of operands read from storage units, an operation performed on the operands and the results stored in another register [8];
• memories, such as single and multi-port memories, cache memories, TLBs, FIFOs and LIFOs, etc.; we assume that all read/write accesses to the memory will be performed through a register;
• control unit, implemented as a set of finite state machines and generating the control lines for the data-path components and the memories;
• embedded core processor, such as a standard processor, a microcontroller, a DSP, etc., with its memory (even if part of the memory can be external) implementing the SW bound part;
• clock distribution logic;
• crossbar network, to interface the architectural units by using a communication protocol at system level. The interconnection power of the crossbar network is included in the power dissipated by the outputs of data-path, memories and control logic;
• primary I/O pads.

Due to space limits, this paper deals only with the estimation of the data-path power budget. The proposed analysis framework is based on the ASIC model at the behavioral/RT level specified by using the VHDL language. The entire analysis is based on the probabilistic estimation of the switching activity. The inputs for the analysis are:
• the ASIC specification, consisting of a hierarchical VHDL description implementing the target system architecture depicted in figure 1;
• an allocation library, composed of available components implementing basic blocks such as registers, adders, multipliers, multiplexers, logic gates, I/O buffers, etc. Each component is specified in terms of models describing logic behavior, input capacitances, area and power characteristics dependent on the chosen technology;
• technological parameters, such as frequency, power supply, derating factors (considering the effects of variations in process, voltage and temperature), etc.;
• the switching behavior of the ASIC primary I/O pads in terms of toggle rate of the input, output and bi-directional nets.

The paper is organized as follows. The discussion starts by introducing the switching activity factors, derived from system specifications, constituting the basis of the analysis. The proposed estimation model is presented in Section 3, along with its formal definitions. Results obtained from a data-path application benchmark are reported in Section 4, which also outlines the future developments of our investigation.

2. Background of the analysis

Power dissipation in CMOS circuits can be considered as composed of both static and dynamic components. Static power is mainly due to leakage currents caused by reverse-biased diodes and sub-threshold transistor conduction. However, the dominant part of the power budget in CMOS circuits is due to dynamic power, which is composed of two terms [2]. The first term, indicated as switching activity power ($P_{SW}$), is due to the charge and discharge of the circuit node capacitances on the output of each logic gate and usually accounts for over 90% of the total power [2]. The second term, indicated as short-circuit power ($P_{SH}$), represents the short-circuit current from the supply to ground during output transitions; it can usually be ignored, since it represents 5-10% of the dynamic power [1], [2]. In general, the values of $P_{SW}$ and $P_{SH}$ are not directly accessible separately, since the available data of standard libraries are normally expressed in terms of average power, gathering both dynamic contributions [9].

The switching activity of each signal (a primary I/O or an internal signal) is fully characterized by the following two components [4]: a static component, taking into account the static probability of a signal; a dynamic component, taking into account the timing behavior of the circuit.

The static component can be expressed in terms of the static signal probability $p_n^1$ of each node $n$ (note that $p_n^1 \le 1$ and $p_n^0 = 1 - p_n^1$).

The dynamic component is based on the definition of the transition probability $p_n^{01}$ of each node $n$, that is the probability of a zero to one transition. Under the spatial and temporal independence assumption [3], the transition probability $p_n^{01}$ is given by the probability that the current state is zero times the probability that the next state is one:

$$p_n^{01} = p_n^0 \, p_n^1 = (1 - p_n^1) \, p_n^1$$

Under the same assumption, the switching activity factor of a node $n$, indicated as $\alpha_n$, is given by:

$$\alpha_n = p_n^{01} + p_n^{10} = 2 \, p_n^1 (1 - p_n^1)$$

Given the switching activity factor $\alpha_n$ of a node $n$, the corresponding toggle rate can be defined as $TR_n = \alpha_n f_{CLK}$, where $f_{CLK}$ is the system clock frequency.

The values of the toggle rates for the input, output and bi-directional nets of the ASIC should be provided with the specifications at the system level. The values of these parameters for the internal nets will be derived from the specifications in the proposed model.

3. Power estimation model of the target architecture

In the proposed single ASIC architecture, the average power $P_{AVE}$ can be subdivided into $P_{AVE} = P_{IO} + P_{CORE}$, where $P_{IO}$ and $P_{CORE}$ are the average power of the I/O nets and internal nets of the ASIC, respectively.

The proposed model is based on the following assumptions:
• the supply and ground voltage levels in the ASIC are fixed, although it is worth noting the impact of supply voltage reduction on power;
• the design style is based on synchronous sequential circuits;
• the data transfer occurs at the register-to-register level;
• a Zero Delay Model has been used, thus ignoring the contribution of glitches and hazards to power.

$P_{IO}$ estimation. Although a pre-synthesis analysis is performed, the ASIC interface is assumed to be known from the system specifications, in terms of primary I/O buffer characteristics and related switching activity.

The set $S$ of input, output and bidirectional nets of the ASIC can be subdivided into sets $s_k$ of $n_k$ nets each, corresponding to the same type $t_k$ of I/O buffers: $S = \{s_1, s_2, \ldots, s_k, \ldots, s_N\}$. If the generic set $s_k$ is composed of output buffers of type $t_k$, the average power of the set $s_k$ can be estimated as

$$P_k = n_k \, P_{tk} \, TR_k \, \delta$$

where $n_k$ is the number of output nets in the set $s_k$; $P_{tk}$ is the average power consumption per MHz of a single output buffer of type $t_k$, computed as a function of the output load $C_k$ at a given reference frequency $f_0$; $TR_k$ is the toggle rate of the output nets of the set $s_k$, derived from system specifications; $\delta$ is the derating factor applied to the power values contained in the target library, which models the variations in process, voltage and temperature (note that the previous equation for $P_k$ is valid in a range of $f_0$). Similarly, the average power of input buffers can be computed, depending on estimated internal standard loads and input ramptime.

$P_{CORE}$ estimation. The $P_{CORE}$ estimation model is based on the different components of the target architecture at RTL level. $P_{CORE}$ can be detailed as:

$$P_{CORE} = P_{DP} + P_{MEM} + P_{CNTR} + P_{PROC}$$

where the single terms represent the average power dissipated by the data-path, the memories, the control logic and the embedded core processor. The present paper takes into account only the terms related to the data-path power.

$P_{DP}$ estimation. Let $P_{DP}$ represent the average power of the data-path, expressed as $P_{DP} = P_{REG} + P_{MUX} + P_{FU}$, where $P_{REG}$ is the power of the registers or register files; $P_{MUX}$ is the power of the multiplexers; $P_{FU}$ is the power of the functional units.

$P_{REG}$ estimation. The preliminary step is the estimation of the required registers and, consequently, the values of the maximum toggle rate $TR_i$ for each of them. Depending on the level of abstraction considered, such data are either directly available from the RTL description, or live variable analysis [8] can be applied to VHDL behavioral level specifications.

The algorithm to estimate the number of registers is similar to the one proposed in [8] for the computation of the lifetime of a variable in terms of its definition and use over a selected set of code statements. New steps have been added to the algorithm proposed in [8] to obtain information concerning the switching activity of the registers. The proposed algorithm can be summarized as follows (more details can be found in [11]):

1. compute the lifetimes of all the variables in the given VHDL code, composed of $S$ statements. A variable $v$ is said to live over a set of sequential code statements $\{i, i+1, i+2, \ldots, i+n\}$ when the variable is written in statement $i$ and is last accessed in statement $i+n$. When a variable is written in a statement $i+k$ in the set, but last used in the same statement $i+k$ of the next iteration, it is assumed to live over the entire set;
2. represent the lifetime of each variable as a vertical line from statement $i$ through statement $i+n$ in the column $j$ reserved for the corresponding variable $v_j$;
3. determine the maximum number $N$ of overlapping lifetimes by computing the maximum number of vertical lines intersecting with any horizontal cut-line [8];
4. estimate the minimal number $N$ of sets of registers necessary to implement the code by using register sharing. Register sharing has to be applied whenever a group of variables, with the same bit-width $b_i$, can be mapped to the same register. The total number of registers is given by $\sum_{i=1}^{N} b_i$;
5. select a possible mapping of variables into registers by using register sharing;
6. compute the number $w_i$ of writes to the variables mapped to the same set of registers;
7. estimate $\alpha_i$ of each set of registers by dividing $w_i$ by the number of statements $S$: $\alpha_i = w_i / S$; hence $TR_{i,MAX} = \alpha_i f_{CLK}$.

Considering $P_{REG}$, it is worth noting that the power of latches and flip-flops is consumed not only during output transitions, but also during all clock edges by the internal clock buffers, even when the stored data do not change. The non-switching power dissipated by internal clock buffers accounts for approximately 30% of the average power for the LCB500K technology supplied by LSI Logic [9]. The technology features geometries with 0.5-micron drawn gate length and 0.38-micron effective channel length, optimized for 3.3 V operation.

Note that the internal clock buffers are independent of the output load, thus the non-switching power of latches and flip-flops is load-independent, but dependent on the clock input ramptime.

Globally, the average power of the registers can be estimated as:

$$P_{REG} = \sum_{k=1}^{N} (P_k + P_{NSk})$$

where $N$ is the number of sets of registers $s_k$ constituted by the same type of latches or D flip-flops, $P_k$ is the average power of each set $s_k$, and $P_{NSk}$ is the average non-switching power dissipated by the internal clock buffers of the registers in the set $s_k$, that is the average power dissipated by the internal clock buffers when there is no output transition.

It has to be pointed out that the measured average power $P_k$, tabulated in the target library, also includes the power dissipated by the internal clock buffers during clock edges corresponding to output transitions. Hence the estimated value of $P_k$ should account for an activity factor given by $TR_k$, while the estimated value of $P_{NSk}$ should consider an activity factor of $(f_{CLK} - TR_k)$.

The estimated values of $P_k$ and $P_{NSk}$ for each set $s_k$ are given respectively by:

$$P_k = n_k \, P_{tk} \, TR_k \, \delta$$

$$P_{NSk} = n_k \, P_{0k} \, (f_{CLK} - TR_k) \, \delta$$

where:
$n_k$ = estimated number of registers in the set $s_k$;
$P_{tk}$ = average power consumption per MHz of a single register of type $t_k$ at a given reference frequency $f_0$, for different output standard loads $C_k$ (representing both load cells and interconnections) and clock input ramptime;
$P_{0k}$ = non-switching power consumption per MHz of a single register of type $t_k$ at a given reference frequency $f_0$, as a function of the clock input ramptime;
$TR_k$ = estimated toggle rate of the registers in the set $s_k$, as computed with the live variable analysis;
$\delta$ = derating factor.

$P_{MUX}$ estimation. To estimate the size and number of multiplexers from the VHDL code, it is necessary to determine the number of paths in the data-path. The analysis of design paths and the related notations are similar to those in [8]; however, the proposed approach also considers the paths from primary inputs to internal registers and from internal registers to primary outputs.

Once the size and number of multiplexers have been computed, we derive the switching activity of the multiplexers, given the model of the 2-input non-inverting multiplexer depicted in figure 2.

Figure 2. The 2-input non-inverting multiplexer model for power estimation

A simplified model for the maximum switching activity of the output $Z$ of a 2-input non-inverting multiplexer is:

$$\alpha_Z = \alpha_A (1 - p_S^1) + \alpha_B \, p_S^1$$

where $\alpha_A$ is the switching activity of the input $A$; $\alpha_B$ is the switching activity of the input $B$ and $p_S^1$ is the static signal probability of the selection net $S$.

Globally, the average power dissipated by the multiplexers can be estimated as:

$$P_{MUX} = \sum_{k=1}^{N} P_k$$

where $N$ is the estimated number of multiplexers and $P_k$ is the average power of each multiplexer.

The value of $P_k$ for each multiplexer $k$ is given by:

$$P_k = P_{tk} \, TR_k \, \delta$$

where $P_{tk}$ is the average power consumption per MHz of a 2-input non-inverting multiplexer; $TR_k$ is the estimated toggle rate of the multiplexer and $\delta$ is the derating factor.

$P_{FU}$ estimation. The power model for functional units is a complexity-based model [8], estimating the complexity of functional units in terms of equivalent gates. The number of equivalent gates necessary to implement the logic from the VHDL code is derived from a library of macro-functions such as adders, multipliers, etc. The library should include the estimated number of logic gates for each macro-function, depending on the number of operands and the parallelism. Once the number of equivalent gates for each macro-function has been evaluated, the estimated power dissipated by the functional units can be expressed as:

$$P_{FU} = \sum_{k=1}^{N} P_k$$

where $N$ is the number of macro-functions, and $P_k$ is the power of each macro-function given by:

$$P_k = n_k \, P_{TECH} \, TR_k \, \delta$$

where $P_{TECH}$ is a technological parameter expressed in [µW/(gate MHz)]; $n_k$ is the estimated number of logic gates in the macro-function $k$; $TR_k$ is the toggle rate of the gates of the macro-function $k$.
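As a rough illustration of how the terms of Section 3 combine, the following Python sketch evaluates $P_{DP} = P_{REG} + P_{MUX} + P_{FU}$ from toggle rates derived with the switching-activity expressions of Section 2. It is only a sketch under stated assumptions: the function names, the `RegisterSet` container and every numeric value used with it are hypothetical and chosen for illustration; none of them comes from the LCB500K or HCMOS6 libraries or from the authors' tool. Powers are assumed to be in µW, library coefficients in µW/MHz and toggle rates in MHz.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


def switching_activity(p1: float) -> float:
    """alpha_n = 2 * p1 * (1 - p1), assuming spatial and temporal independence."""
    return 2.0 * p1 * (1.0 - p1)


def register_activity(writes: int, statements: int) -> float:
    """Step 7 of the register algorithm: alpha_i = w_i / S."""
    return writes / statements


def toggle_rate(alpha: float, f_clk: float) -> float:
    """TR = alpha * f_CLK (same frequency unit as f_clk, here MHz)."""
    return alpha * f_clk


def mux_output_activity(alpha_a: float, alpha_b: float, p_sel: float) -> float:
    """alpha_Z = alpha_A * (1 - p_S) + alpha_B * p_S for a 2-input multiplexer."""
    return alpha_a * (1.0 - p_sel) + alpha_b * p_sel


@dataclass
class RegisterSet:
    """Hypothetical container for one set s_k of identical registers."""
    n: int        # n_k, number of registers in the set
    p_tk: float   # switching power per MHz of one register
    p_0k: float   # non-switching (clock buffer) power per MHz of one register
    tr: float     # TR_k, toggle rate of the set


def register_power(sets: Iterable[RegisterSet], f_clk: float, delta: float) -> float:
    """P_REG = sum over k of (P_k + P_NSk)."""
    total = 0.0
    for s in sets:
        p_k = s.n * s.p_tk * s.tr * delta             # output transitions
        p_ns = s.n * s.p_0k * (f_clk - s.tr) * delta  # clock edges without transitions
        total += p_k + p_ns
    return total


def mux_power(muxes: Iterable[Tuple[float, float]], delta: float) -> float:
    """P_MUX = sum of P_tk * TR_k * delta over (p_tk, tr) pairs."""
    return sum(p_tk * tr * delta for p_tk, tr in muxes)


def fu_power(macros: Iterable[Tuple[int, float]], p_tech: float, delta: float) -> float:
    """P_FU = sum of n_k * P_TECH * TR_k * delta over (gate count, tr) pairs."""
    return sum(n * p_tech * tr * delta for n, tr in macros)


def datapath_power(p_reg: float, p_mux: float, p_fu: float) -> float:
    """P_DP = P_REG + P_MUX + P_FU."""
    return p_reg + p_mux + p_fu


if __name__ == "__main__":
    F_CLK, DELTA = 100.0, 1.0  # illustrative clock (MHz) and derating factor
    tr_reg = toggle_rate(register_activity(writes=5, statements=10), F_CLK)
    tr_mux = toggle_rate(mux_output_activity(0.5, 0.25, 0.5), F_CLK)
    p_reg = register_power([RegisterSet(8, 2.0, 0.5, tr_reg)], F_CLK, DELTA)
    p_mux = mux_power([(0.25, tr_mux)] * 64, DELTA)
    p_fu = fu_power([(100, tr_reg)], p_tech=0.5, delta=DELTA)
    print(datapath_power(p_reg, p_mux, p_fu))
```

Keeping each term in its own function mirrors the additive structure of the model, so a memory or control-unit term could later be added without touching the existing ones.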
4. Experimental results and conclusions
To verify the advantages of the proposed analysis over conventional approaches, several experiments have been carried out. To give the flavor of the potential mismatches existing even in simple designs, this section reports the experimental data obtained by developing the benchmark ASIC shown in figure 3. The architecture contains some of the typical elements composing a generic data-path: two sets of registers, an adder, the I/O pads, a battery of 64 multiplexers and a clock distribution tree. The synthesis data have been obtained by using the Synopsys Design Compiler environment with the state-of-the-art HCMOS6 technology, featuring 0.35 µm and 3.3 V, supplied by SGS-Thomson Microelectronics, at the target operating frequency of 100 MHz.

Figure 3. The ASIC architecture of the power estimation benchmark

Design Power 1 3s 14.3') 0.24 z.ox 164.11
Perc. Error 0 0%~ 36 27%, 13 339, -0 5 8 % I 4%
Ave. Perc. Err. -09% 24.87% 30.49% -0.54% I 4%

Table 1. Power estimation comparison results for the data-path benchmark ASIC

Experimental results are reported in table 1, considering several input switching activities (0.75, 0.5, 0.25 and 0.1). The results obtained by the proposed methodology have been compared to the results obtained by the Synopsys Design Power tool on the gate-level netlist. Note that both estimation methods are based on a Zero Delay Model. Regarding the global power, the proposed method provides an over-estimation: the percentage error ranges from 1.59% to 4.16% with respect to the Synopsys estimates. However, since the switched capacitance of I/O nodes is usually larger than the switched capacitance of the internal nodes by up to three orders of magnitude, the major contribution to the global power is constituted by the I/O power. In the benchmark, the I/O power represents 94.83%, on average, of the total power, due to the reduced size of the core logic. Thus, a more realistic measure derives from the comparison of the core power: the model provides an over-estimation from 8.52% to 31.64%. In particular, the results show an average percentage error of 24.87% for the registers, 30.49% for the multiplexers and -0.54% for the adder. Globally, the relative accuracy of our approach compared with the Design Power gate-level tool is considered satisfactory at this level of abstraction. Furthermore, traditional gate-level methods suffer from a main drawback with respect to our approach: the need to perform time-consuming tasks such as synthesis. In contrast, our approach, by avoiding the move down to the gate-level description, represents an innovative methodology meeting the requirements to achieve accurate power estimation in a reasonable design time.

Work is in progress aiming at integrating the proposed conceptual framework and the related power metrics within the partitioning task of a more general hardware-software co-design environment for embedded systems [10]. Moreover, the analysis will be extended to consider also the control units, the embedded core processor and the memories of the target system architecture.

5. References

[1] P. Landman, "High-Level Power Estimation", in Proc. of ISLPED'96, Int. Symp. on Low Power Electronics and Design, Monterey, CA, August 12-14, 1996, pp. 29-35.
[2] S. Devadas and S. Malik, "A Survey of Optimization Techniques Targeting Low Power VLSI Circuits", in Proc. of the 32nd Design Automation Conference, 1995.
[3] F.N. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits", IEEE Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 2, No. 4, pp. 446-455, December 1994.
[4] A.P. Chandrakasan, R.W. Brodersen, "Minimizing Power Consumption in Digital CMOS Circuits", Proceedings of the IEEE, Vol. 83, No. 4, pp. 498-523, April 1995.
[5] R. Mehra and J. Rabaey, "Behavioral Level Power Estimation and Exploration", in Proc. of First Int. Workshop on Low Power Design, Napa Valley, CA, pp. 197-202, April 1994.
[6] P.E. Landman and J.M. Rabaey, "Power Estimation for High Level Synthesis", in Proc. of EDAC-EUROASIC '93, Paris, France, pp. 361-366, Feb. 1993.
[7] P.E. Landman and J.M. Rabaey, "Architectural Power Analysis: The Dual Bit Type Method", Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 3, No. 2, June 1995.
[8] S. Narayan and D.D. Gajski, "Area and Performance Estimation from System-level Specifications", Technical Report ICS-92-16, Dept. of Information and Computer Science, University of California, Irvine, December 20, 1992.
[9] LCB500K, Cell-Based ASIC Design Manual, LSI Logic Corporation, June 1995.
[10] A. Balboni, W. Fornaciari, D. Sciuto, "TOSCA: a pragmatic approach to co-design automation of control dominated systems", Hardware/Software Co-design, NATO ASI Series, Series E: Applied Sciences - vol. 310, pp. 265-294, Kluwer Academic Publisher, 1996.
[11] W. Fornaciari, P. Gubian, D. Sciuto, C. Silvano, "A conceptual analysis framework for low power design of embedded systems", in Proc. of IEEE Int. Conf.: Innovative System In Silicon, October 9-11, 1996, Austin, TX, USA.
