Abstract
Power is one of the most important design metrics and needs to be estimated early to
enable early design wins and to give the product a competitive edge. A pre-silicon power
model of an SoC projects the average power consumed across a variety of widely accepted
benchmarks (called Key Power Indicators or KPIs). An accurate and scalable power model
enables architects to optimize design metrics under a given product power envelope.
The growing importance of the power metric has led power architects to converge on
estimating SoC power and performing what-if analysis even before the silicon is in
hand. There is a strong push towards modular infrastructure that is accurate and efficient in
maintaining a multi-user power model for newer SoCs and their derivatives.
The work presented here concentrates on a template-driven technique to roll up power
from IP to SoC and platform levels. The work proposes a self-sustained ecosystem for SoCs that
a) enforces a template structure on power collateral and b) exploits the structure and context
using automation to defray the complexity of power modeling. Once completely productized,
the approach presented in this work will significantly improve the accuracy and efficiency of
maintaining a multi-user power model for newer SoCs and their derivatives.
Contents
Abstract
List of Figures
Acronyms, Abbreviations and Nomenclature
Chapter 1 Introduction
  1.1 Motivation
  1.2 Aims and goals
  1.3 Organisation of dissertation
Chapter 2 Review of Literature
  2.1 Existing approach
  2.2 Self-sustained power modeling infrastructure
    2.2.1 Introduction to Aceplorer power models
      2.2.1.1 Hardware modeling
      2.2.1.2 Scenario modeling
      2.2.1.3 Simulations
  2.3 PTPX Overview
    2.3.1 Introduction
    2.3.2 Running PTPX session
    2.3.3 Report generation
Chapter 3 Power roll-up infrastructure
  3.1 Introduction
  3.2 Excel
  3.3 Aceplorer
  3.4 DB
  3.5 Annealing tools
  3.6 Web
  3.7 Sharepoint BI
Chapter 4 Methodology
  4.1 Modeling of SoC
  4.2 Hardware Modeling
  4.3 Scenario Modeling
Chapter 5 Results and Discussions
  5.1 PnP study
Chapter 6 Summary and Conclusion
  6.1 Summary
  6.2 Conclusion
  6.3 Scope of Future Work
Literature Cited
Acknowledgements
List of Figures
Figure 2.1 Tree model summary
Figure 2.2 GUI representation of power model
Figure 2.3 Detailed view of component Mem1
Figure 2.4 Python definition of statics
Figure 2.5 Power state equations
Figure 2.6 Block representation of power models
Figure 2.7 Scenario tree
Figure 2.8 Tree representation of scenario in GUI
Figure 2.9 Job element
Figure 2.10 ACE computed overall power of SoC
Figure 2.11 Energy distribution for different blocks
Figure 2.12 Power state residencies
Figure 2.13 Component SMPS1 power result
Figure 2.14 Simulation table
Figure 2.15 Power report
Figure 2.16 Report threshold voltage groups
Figure 2.17 Report clock gate savings
Figure 3.1 Power roll up infrastructure
Figure 3.2 Sharepoint BI
Figure 4.1 Generic block template
Figure 4.2 Parallel jobs
Figure 5.1 PnP curve for an experimented workload
Figure 5.2 PnP curve with different VR topology
Figure 5.3 PnP curve for different workload
Figure 5.4 IA freq for different fps
Figure 5.5 Core and interconnect power variation
Acronyms, Abbreviations and Nomenclature
ACE   Aceplorer
API   Application programming interface
ASIC  Application specific integrated circuit
CCS   Composite current source
CPU   Central processing unit
DDR   Double data rate
DB    Database
ESL   Electronic system level
FCFS  First come first served
GUI   Graphical user interface
HDL   Hardware description language
IA    Intel architecture
IC    Integrated circuit
IP    Intellectual property
KPI   Key power indicator
LDO   Low dropout regulator
MB    Megabytes
MIPS  Million instructions per second
NLPM  Nonlinear power model
PnP   Power and performance
PU    Processing unit
RTL   Register transfer level
SAIF  Switching activity interchange format
SDC   Synopsys design constraints
Si    Silicon
SMPS  Switched mode power supply
SoC   System on chip
VCD   Value change dump
VR    Voltage rail
Chapter 1
Introduction
1.1 Motivation
Power is one of the most important design metrics and needs to be estimated early in the
design cycle. A pre-silicon power
model of an SoC projects the average power consumed across a variety of widely accepted
benchmarks (called Key Power Indicators or KPIs). An accurate and scalable power model
enables architects to optimize design metrics under a given product power envelope. As a result,
the battery life targets and performance data that emerge from this pre-silicon model enable
early design wins and potentially provide a competitive edge for the product.
The project presents a template-driven technique to roll up power from IP to SoC and
platform levels and proposes a self-sustained ecosystem for SoCs that a) enforces a template
structure on power collateral and b) exploits the structure and context using automation to defray
the complexity of power modeling. Once completely productized, the approach presented in this
work will significantly improve the accuracy and efficiency of maintaining a multi-user power
model for newer SoCs and their derivatives.
While static spreadsheets serve as a transparent mechanism to aggregate power data,
they do not scale with increasing levels of hierarchy and with dependencies between hardware and
software. Furthermore, a spreadsheet-based model for a typical SoC is of the order of hundreds of
MB; it frequently stalls the CPU and hampers the architectural studies that rely on parameter
sweeps. These limitations, together with the lack of structure in the power data of IP blocks, lead to large
single-user spreadsheets that slowly perish (due to lack of scalability) before any data from
silicon is available to correlate with.
1.2 Aims and goals
The main objective of this work is to build an infrastructure that enables
estimation of pre-silicon power and allows what-if analysis on the product, thereby enabling
architects to optimize a variety of design metrics under a given product power envelope. The
what-if analysis performed on the power model provides early feedback to architects
and designers.
1.3 Organisation of dissertation
The thesis is organized into six chapters. Chapter 2 presents an overview of the
existing approach to power modeling, the features and capabilities of the Aceplorer [13] tool for
power modeling, and the PrimeTime PX tool [14] for power estimation of various IPs.
Chapter 3 presents the complete power roll-up ecosystem, which makes the power models more
flexible, reusable and user friendly. Chapter 4 describes the bottom-up approach to modeling an SoC,
from the creation of IP templates to the complete SoC. Chapter 5 presents the results and
discussion. Chapter 6 gives the summary, conclusions and scope of future work.
Chapter 2
Review of Literature
2.1 Existing approach
Existing power models are spreadsheet based. The power variables (frequency, voltage,
activity, capacitance etc.) and the raw data (process variations, leakage scaling, leakage power,
temperature coefficient for leakage power, f-v curves etc.) for every IP are maintained in
individual sheets/tabs of an Excel workbook. Typical spreadsheets for fragments of the SoC at Intel exceed
100 MB.
These power models are predominantly single user and are not scalable. The lack of
hierarchical and visual representation confines each power model to a single user. However,
a successful power model is typically built through seamless refresh of lower-level blocks all the
way up to the platform. Furthermore, adding new variables and accounting for their effect on SoC
power is a great challenge. The spreadsheet also sometimes stalls the CPU, making what-if analysis
time consuming.
2.2 Self-sustained power modeling infrastructure
The above limitations of existing power models led power architects to converge on
multi-user, scalable power models that are
functionally and visually more descriptive and that facilitate quick what-if analysis.
2.2.1 Introduction to Aceplorer power models
Aceplorer [13] is an ESL (electronic system level) tool for power estimation. ACE power
models are not functional models. The idea is to capture the power architect's intent
and to separate it from the functional intent. Aceplorer [13] uses power states to
describe the power consumption behavior of electronic components such as an IP, an IC or a
platform. Interdependencies between
parameters of different blocks can also be modeled to provide more accurate and reliable
figures. ACE power models can be broadly divided into two parts: hardware modeling and scenario
modeling.
Hardware modeling represents the hardware block and defines the power-consuming
variables and their interdependencies to characterize the power consumption of the block in each
of its power states.
Scenario modeling models the residencies of the block in each of its power states for the
given use case.
2.2.1.1 Hardware modeling
Standard elements for ACE power models:
The elements for describing the architecture of a system are macrocomponent, component,
processing unit, constraint, parasitic and power switch.
a) Component: Leaf element of the power model hierarchy that represents a physical
component in a system and consumes power. It is described by:
- Its inputs/outputs: logical interface container.
- Its variables: variables container.
- Its statics: statics container.
- Its power states: states container.
Power consumption equations must be defined for each power state to fully describe the
power behavior of the component.
b) Processing unit (PU) element: Leaf element of the power model hierarchy that represents a
physical component in a system and consumes power. It is described by:
- Its inputs/outputs: logical interface container.
- Its variables: variables container.
- Its statics: statics container.
- Its power states: states container.
Power consumption equations must be defined for each power state to fully describe the
power behavior of the processing unit. A PU differs from a component in its
ability to handle additional parameters:
- The maximum processing rate (max_processing_rate) defines the maximum
speed at which the PU processes a workload representing an application running on the
system. The unit can be user-defined, such as MIPS or bytes/sec. The duration of a
task is then computed on the fly during simulation.
- The arbitration scheme defines how the PU schedules tasks.
It may be either first come first served (FCFS) or fixed priority. In the fixed priority
scheme, after finishing one task the PU executes the task with the highest priority in the
task queue.
- The arbitration penalty represents the extra load due to scheduling.
- The processing ratio represents the processing power currently available to the
processing unit.
c) Parasitic element: It represents the physical component that can be modeled using an
electrical resistance. This is a leaf element of the power model hierarchy. It is described
by:
- Its inputs/outputs: logical interface container.
- Its variables: variables container.
- Its statics: statics container.
- Its power states: states container.
d) Power switch element: It models power switches and allows selection of a voltage source
from N voltage sources. It is described by:
- Its inputs/outputs: logical interface container.
- Its variables: variables container.
- Its statics: statics container.
- Its power states: states container.
e) Constraint: It represents a constraint set on the power model. It is a leaf element
in the power model hierarchy. It has no physical existence in the real system and it does
not consume power. It is used to describe environmental conditions (voltage of the
external power supply, input bandwidth, etc.) or to perform translation/aggregation of
data from one block to another. It is described by:
- Its inputs/outputs: logical interface container.
- Its variables: variables container.
- Its statics: statics container.
- Its power states: states container.
f) Macrocomponent: A virtual wrapper around components, processing units, constraints,
parasitics, power switches or other macrocomponents. It defines a level in the model
hierarchy. Hence it does not have any internal parameters (there are no statics, variables,
etc.). However, its power is simulated as the sum of the power of its inner elements. It is
described by:
- Its inputs/outputs: logical interface container.
- The connections between the elements inside it: links container.
- Its power states at the macrocomponent level: states container.
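The element hierarchy described above can be illustrated with a minimal Python sketch. This is not the Aceplorer API: the class names, the element names (ip_a, mem_a, cluster) and the per-state power values are hypothetical and chosen only to show how a macrocomponent's power is obtained as the sum of the power of its inner elements.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Component:
    """Leaf element: consumes power according to its current power state."""
    name: str
    state_power_w: Dict[str, float]  # power state -> power in watts (made-up values)
    state: str = "off"

    def power(self) -> float:
        return self.state_power_w[self.state]


@dataclass
class Macrocomponent:
    """Virtual wrapper: holds no statics or variables of its own."""
    name: str
    children: List[Union[Component, "Macrocomponent"]] = field(default_factory=list)

    def power(self) -> float:
        # The power of a macrocomponent is the sum of the power of its inner elements.
        return sum(child.power() for child in self.children)


# Hypothetical two-level hierarchy.
ip_a = Component("ip_a", {"off": 0.0, "active": 0.25}, state="active")
mem_a = Component("mem_a", {"off": 0.0, "read": 0.5, "write": 0.75}, state="write")
cluster = Macrocomponent("cluster", [ip_a, mem_a])
ddr = Component("ddr", {"off": 0.0, "active": 1.0}, state="active")
soc = Macrocomponent("soc", [cluster, ddr])

print(cluster.power())  # 1.0
print(soc.power())      # 2.0
```

Because macrocomponents may contain other macrocomponents, the same recursive sum yields the power at every level of the hierarchy.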
Fig 2.1 shows the ACE tree model summary. The tree model shows the different Aceplorer [13]
elements that form the power model hierarchy. The macrocomponent element acts as a wrapper for
other elements.
Figure 2.1 Tree model summary
Fig 2.2 shows the GUI representation of the power model of an SoC. It gives a top-level view of all the
macrocomponents and components used inside the power model. This view does not give any
details of the internal variables used to capture the power behavior of a component. The plus
sign on the left-hand side of each element in the figure can be clicked to drill down into
the internal details of that element.
The top-level model is SoC, comprising two macrocomponents, macroblock1 and DDR.
Macrocomponent macroblock1 comprises two components, block1 and mem1. The
SoC also contains components SMPS1, SMPS2, LDO1, LDO2 etc., a processing unit
PU1 and several constraints such as Battery and ClockDiv.
Figure 2.2 GUI representation of power model
Fig 2.3 shows the detailed view of the Mem1 component that represents a memory.
Figure 2.3 Detailed view of component Mem1.
The memory has two supply voltages (Vdd1 and Vdd2), a clock, and read and write bandwidth as its inputs.
The current consumption variables I_leak_p and I_dyn_p are tied to Vdd2. This means that
I_leak_p is the leakage current and I_dyn_p is the dynamic current drawn
from the voltage supply Vdd2. The power drawn from the Vdd2 supply is the sum of the
products of each current consumption variable and Vdd2. Similarly, there are current consumption
variables for the other voltage supply, Vdd1. The key is to define the current consumption
variables accurately enough for each power state (off, write, read, retention).
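As a worked illustration of the rule that the power drawn from a supply is the sum of the products of each current consumption variable and the supply voltage, the following Python sketch computes the memory power per state. The rail names and I_leak_p / I_dyn_p come from the text; the Vdd1 variable names (I_leak, I_dyn), the voltages and all current values are assumptions picked for easy arithmetic, not characterized data.

```python
from typing import Dict

# Supply voltages in volts (assumed values).
SUPPLY_V: Dict[str, float] = {"Vdd1": 1.5, "Vdd2": 1.0}

# state -> rail -> {current variable: current in mA}; all numbers are illustrative.
CURRENTS_MA: Dict[str, Dict[str, Dict[str, float]]] = {
    "off": {
        "Vdd1": {"I_leak": 0.0, "I_dyn": 0.0},
        "Vdd2": {"I_leak_p": 0.0, "I_dyn_p": 0.0},
    },
    "retention": {
        "Vdd1": {"I_leak": 0.03125, "I_dyn": 0.0},
        "Vdd2": {"I_leak_p": 0.0625, "I_dyn_p": 0.0},
    },
    "read": {
        "Vdd1": {"I_leak": 0.03125, "I_dyn": 0.125},
        "Vdd2": {"I_leak_p": 0.0625, "I_dyn_p": 0.25},
    },
}


def rail_power_mw(state: str, rail: str) -> float:
    """Power drawn from one rail: rail voltage times the sum of its currents."""
    return SUPPLY_V[rail] * sum(CURRENTS_MA[state][rail].values())


def state_power_mw(state: str) -> float:
    """Total memory power in a state: sum over both supply rails."""
    return sum(rail_power_mw(state, rail) for rail in CURRENTS_MA[state])


for state in CURRENTS_MA:
    print(state, state_power_mw(state), "mW")
```

With these made-up numbers the read state draws 0.3125 mW from Vdd2 and 0.234375 mW from Vdd1.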
Statics represent the other power variables required for power computation. A static
can be either a simple expression (like Word_size) or a simple or complex Python expression (like
I_leak_retention_p).
Figure 2.4 Python definition of statics.
Fig 2.5 shows the various power states that Mem1 can take. Each power state has its
own current consumption equations, which determine the power consumed by the
memory in that state. The duration/residency of each state is determined by the scenario section
(discussed later).
Figure 2.5 Power state equations.
Fig 2.6 shows the ACE GUI block representation of power models. The block
representation gives a visual feel for how the different blocks are connected, which output of a
block feeds the other blocks and how it in turn affects the power distribution of those blocks.
Figure 2.6 Block representation of power models.
2.2.1.2 Scenario modeling
Standard elements for scenario modeling are:
1) Scenario element: It represents an application running on the power model. Scenarios are
time based and are attached to a power model. Each scenario has a mandatory flow
element and can include sequential and parallel sequences as well as flow parameter
elements.
2) Flow element: It is a container, one per scenario/job, that stores the sequence of events in the
scenario/job. It must have unique start and end elements and can include multiple
arc, comment, delay, job, stamp, step, task and syncro elements.
3) Job element: It is a wrapper for leaf elements such as arc, comment, delay, job, stamp,
step, task and syncro. It allows creating hierarchical scenarios. Its attributes are:
a) Name: defines the name of the job
b) Repeat factor: defines how many times the job should be re-executed
c) Path to component: defines the power model element to which the job is applied
4) Arc element: It is used to link elements contained in the flow container. The elements
contained in the flow container are executed sequentially unless parallel arc elements
are used, in which case the branches execute concurrently. A syncro element is
used to synchronize concurrent branches: it ensures that flow execution
proceeds only after all concurrent branches have terminated. The attributes of an arc are:
a) Arc source: defines the starting point of the arc
b) Arc destination: defines the end point of the arc
c) Branch id: integer that is used to make the branching decision
5) Step element: It is a leaf element of the scenario definition. It represents an
elementary task of the application that runs on the model. It defines a set of state
settings to be applied to a component, constraint, power switch or parasitic for a
given duration. Its attributes are:
a) Step name: defines the name of the step
b) Duration: defines the duration of the step in seconds
c) Description: any text description
6) Stamp element: It is a leaf element of the scenario definition. It represents an
instantaneous change in the scenario run on the model. It is mostly used for
initializing the model. It is similar to a step but does not have a duration. Its attributes are:
a) Step name: defines the name of the step
b) Description: any text description
7) State setting element: It describes the power state change of a component,
macrocomponent, constraint, power switch or parasitic. It can be inserted in a step
or stamp element. Its attributes are:
a) Path to the component: gives the path of the block to which the setting is
applied.
b) State entered: gives the power state of the block
c) Concurrency policy: the policy can be either concurrent or exclusive.
i) Concurrent - for the duration of its parent step, concurrent
steps can apply the same state setting, i.e. set the same state on the same
component.
ii) Exclusive - for the duration of its parent step, no concurrent step
can do so.
d) Description: any text description
8) Task element: It is a leaf element of the scenario definition. It defines a workload
assigned to a processing unit. Unlike a step, its duration is calculated by Aceplorer [13]
from the workload during the simulation run.
9) Flow parameter element: It is a variable, available in the scenario section, for
parametrizing a flow. It can be used, for example, to define step and delay durations,
job repeat factors or the processing load of a task. It can also be used within
programmable settings and parameterizable settings. It can be added at different levels
in the scenario hierarchy:
a) At scenarios level: visible to all scenarios defined in the power model
b) At scenario level: visible within the given scenario
c) At job level: visible within the job
Its attributes are:
a) Name: gives the name of the flow parameter.
b) Type: defines the type of the flow parameter.
c) Equation: defines the value of the flow parameter.
d) Description: any text description
10) Programmable setting element: It drives the power model from the scenario by overriding
a static of an element that has been made programmable. Flow parameters can be accessed
in the programmable setting equation. This allows different
values to be assigned to the statics for different power states of the element. Its attributes are:
a) Path to component: refers to the path of the element
b) Programmable reference: refers to the static that has been made programmable
c) Equation: defines the value of the programmable setting
d) Description: any text description
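The interaction between a task element and the processing unit parameters (maximum processing rate, processing ratio, arbitration scheme and arbitration penalty) can be sketched in Python as follows. This is an illustrative model under stated assumptions, not Aceplorer's scheduler: all tasks are assumed to be queued at time zero, a larger priority value is assumed to mean more urgent, and the arbitration penalty is assumed to be an extra load added to every task.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    name: str
    load: float        # workload in the PU's own unit, e.g. millions of instructions
    priority: int = 0  # assumed convention: larger value runs first under fixed priority


def schedule(
    tasks: List[Task],
    max_processing_rate: float,        # e.g. MIPS
    processing_ratio: float = 1.0,     # fraction of the PU currently available
    arbitration_penalty: float = 0.0,  # assumed: extra load per scheduled task
    scheme: str = "fcfs",
) -> List[Tuple[str, float, float]]:
    """Return (task name, start time, end time) for every task, in execution order."""
    if scheme == "fcfs":
        order = list(tasks)  # queue order is preserved
    elif scheme == "fixed_priority":
        order = sorted(tasks, key=lambda t: -t.priority)  # stable sort keeps ties in queue order
    else:
        raise ValueError(f"unknown arbitration scheme: {scheme}")

    effective_rate = max_processing_rate * processing_ratio
    now = 0.0
    timeline = []
    for task in order:
        # The duration is derived from the workload instead of being given explicitly.
        duration = (task.load + arbitration_penalty) / effective_rate
        timeline.append((task.name, now, now + duration))
        now += duration
    return timeline


queue = [Task("decode", load=200.0, priority=1), Task("render", load=100.0, priority=5)]
print(schedule(queue, max_processing_rate=100.0, scheme="fcfs"))
print(schedule(queue, max_processing_rate=100.0, scheme="fixed_priority"))
```

Under FCFS the tasks run in queue order, while under fixed priority the higher-priority task is taken first; the total busy time is the same in both cases.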
Fig 2.7 shows the tree structure of the scenario. The scenario comprises a
combination of parallel and serial jobs. Each job is modeled to capture the duration of each
power state of the respective component.
Figure 2.7 Scenario tree.
Fig 2.8 shows the internal settings of the example job. The job defines the
different steps, which correspond to the different power states of the entire SoC. Step Active specifies
that the SoC is in the active state for 20 seconds. The active state of the SoC corresponds to a particular state
setting for each individual element: DDR is in the active state, mem1 is in the write state, etc. Similarly, each
state of the SoC corresponds to a different combination of states of the individual elements.
Fig 2.9 shows the block view of a job element, which has a start and an end. Start represents the
beginning of the application and end represents the end of the application.
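How steps and state settings translate into residencies, energy and average power can be sketched in a few lines of Python. The Active step (20 seconds, DDR active, mem1 write) follows the example above; the Idle step, its state names and every power value are hypothetical and serve only to make the arithmetic visible.

```python
from typing import Dict, List, Tuple

# (step name, duration in seconds, {element: power state entered})
Step = Tuple[str, float, Dict[str, str]]

# Power per element and state in watts (made-up values).
STATE_POWER_W: Dict[str, Dict[str, float]] = {
    "DDR": {"active": 1.0, "self_refresh": 0.25},
    "mem1": {"write": 0.5, "retention": 0.125},
}

SCENARIO: List[Step] = [
    ("Active", 20.0, {"DDR": "active", "mem1": "write"}),
    ("Idle", 60.0, {"DDR": "self_refresh", "mem1": "retention"}),  # hypothetical step
]


def simulate(steps: List[Step], state_power_w: Dict[str, Dict[str, float]]):
    """Accumulate per-element energy (J), state residency (fraction) and average power (W)."""
    total_time = sum(duration for _, duration, _ in steps)
    energy_j: Dict[str, float] = {}
    residency: Dict[str, Dict[str, float]] = {}
    for _, duration, settings in steps:
        for element, state in settings.items():
            energy_j[element] = energy_j.get(element, 0.0) + state_power_w[element][state] * duration
            per_state = residency.setdefault(element, {})
            per_state[state] = per_state.get(state, 0.0) + duration / total_time
    average_w = {element: joules / total_time for element, joules in energy_j.items()}
    return energy_j, residency, average_w


energy, residency, average = simulate(SCENARIO, STATE_POWER_W)
print(energy)     # joules per element
print(residency)  # fraction of time per state
print(average)    # average watts per element
```

The average power of the whole model is then the sum of the per-element averages, which is the quantity a KPI projection would report for the use case.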
Figure 2.8 Tree representation of scenario in GUI
Figure 2.9 Job element.
2.2.1.3 Simulations
Once hardware and scenario modeling are done, the simulation is carried out to
study the power behavior of the SoC. The simulation generates different curves and reports for
analyzing the power results of every IP. A few of them are described in the figures below.
Fig 2.10 shows the power report for the element SoC, which is the top-level module.
Similarly, any element in the tree on the left-hand side can be selected and the corresponding
power report can be viewed.
Figure 2.10 ACE computed overall power of SoC
Fig 2.11 represents the energy distribution of different blocks over time.
Figure 2.11 Energy distribution for different blocks
Fig 2.12 shows the residencies of component SMPS during the simulation run.
Figure 2.12 Power state residencies
Fig 2.13 represents the power consumed by component SMPS1.
Figure 2.13 Component SMPS1 power result
Fig 2.14 shows dynamic and static current of component SMPS at different instants of
simulation.
Figure 2.14 Simulation table.
2.3 PTPX Overview
2.3.1 Introduction
PrimeTime PX [14] is an add-on feature of the PrimeTime [14] tool suite that accurately
analyzes the power dissipation of cell-based designs. It is intended as an advanced solution for ASIC
and structured custom circuit designers who are developing products for power-critical
applications such as portable computing and telecommunications.
PrimeTime PX provides vector-free and vector-based peak power and average power
analysis. The vectors supplied to PrimeTime PX are either RTL or gate-level simulation results in the
Value Change Dump (VCD) format or the Switching Activity Interchange Format (SAIF).
PrimeTime PX provides support for multi voltage and power domain analysis. It also has an
integrated graphical user interface (GUI) for visual power debugging.
PrimeTime PX builds a detailed power profile of the design based on the circuit
connectivity, the switching activity, the net capacitance, and the cell-level power behavior data in
the Synopsys database format (.db) library. The library can be a nonlinear power model (NLPM)
or a Composite Current Source (CCS) library. It then calculates the power behavior for a circuit
at the cell level and reports the power consumption at the chip, block, and cell levels.
The following power analysis techniques are available in PrimeTime PX:
Averaged power analysis: For purely averaged power analysis, PrimeTime PX supports
propagation of switching activity based on defaults, user-defined switching, or switching derived
from an HDL simulation (either RTL or gate level).
Time-based power analysis: For extremely accurate analysis of power with respect to
time, PrimeTime PX supports analysis based on the RTL or gate-level simulation activity over
time. PrimeTime PX uses an event-driven algorithm to calculate the power consumption for each
event. The tool generates detailed time-based power waveforms to provide both average and peak
power results and also reports the power results.
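The relationship between a time-based power waveform and the average and peak figures reported from it can be illustrated with a short Python sketch. It is not PrimeTime PX's algorithm; it simply assumes a piecewise-constant waveform given as hypothetical (time, power) samples and integrates it.

```python
from typing import List, Tuple


def average_and_peak(samples: List[Tuple[float, float]], end_time: float) -> Tuple[float, float]:
    """Average and peak power of a piecewise-constant waveform.

    samples: (time_s, power_w) pairs sorted by time; each power value holds
    until the next sample, and the last one holds until end_time.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    boundaries = [t for t, _ in samples[1:]] + [end_time]
    energy_j = 0.0
    peak_w = samples[0][1]
    for (start, power), stop in zip(samples, boundaries):
        energy_j += power * (stop - start)  # area under this segment
        peak_w = max(peak_w, power)
    duration = end_time - samples[0][0]
    return energy_j / duration, peak_w


# Hypothetical waveform: 1 W, a 3 W burst between 1 s and 2 s, then 1 W until 4 s.
waveform = [(0.0, 1.0), (1.0, 3.0), (2.0, 1.0)]
avg_w, peak_w = average_and_peak(waveform, end_time=4.0)
print(avg_w, peak_w)  # 1.5 3.0
```

An averaged analysis would report only the first number, whereas a time-based analysis also exposes the burst that sets the peak.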
2.3.2 Running PTPX session
The following steps are common to all types of power analysis and are followed to create
a PTPX [14] session:
1) Set the power analysis mode: The power analysis mode is set to either averaged or
time-based.
Command used: set_app_var power_analysis_mode averaged
2) Read and link the design: The gate-level netlist is read and linked to the library
cells. The search_path variable is set first; it gives the paths to be searched for
libraries and the design. The library name and the Verilog netlist name are then given, and finally
the link command links the design.
set_app_var search_path "./src/hdl/gate ../src/libs/snps"
set_app_var link_library "* core_type.db"
read_verilog mac.vg
current_design mac
link
3) Read SDC (Synopsys design constraints) and annotate parasitics: The constraints are
read and the net parasitics are annotated.
read_sdc ../src/hdl/gate/mac.sdc
read_spef ../src/annotate/mac.spef.gz
4) Read switching activity: The switching activity file (VCD or SAIF) is read; it contains
information about the toggle rate and static probability of nets/pins/ports.
read_vcd -strip_path tb/macinst ../sim/vcd.dump.gz
read_saif -strip_path tb/macinst ../sim/mac.saif
5) Update power: Finally, the update_power command propagates the
switching activity across the nets and updates the power of the block.
Command used: update_power
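In the spirit of the template-driven automation discussed in this work, the five steps above could be emitted from a small Python helper, sketched below. The function name and its parameters are hypothetical. The Tcl commands follow the steps listed above, with one assumption: parasitics are annotated with read_parasitics, which is the PrimeTime command name assumed here for the SPEF annotation step. Only the averaged, SAIF-driven flow is covered.

```python
from typing import List, Sequence


def build_ptpx_script(
    design: str,
    netlist: str,
    search_path: Sequence[str],
    link_library: str,
    sdc: str,
    parasitics: str,
    saif: str,
    strip_path: str,
) -> str:
    """Return a Tcl script for an averaged power analysis run (sketch only)."""
    joined_paths = " ".join(search_path)
    lines: List[str] = [
        "# 1) power analysis mode",
        "set_app_var power_analysis_mode averaged",
        "# 2) read and link the design",
        f'set_app_var search_path "{joined_paths}"',
        f'set_app_var link_library "* {link_library}"',
        f"read_verilog {netlist}",
        f"current_design {design}",
        "link",
        "# 3) constraints and parasitics (read_parasitics is an assumption, see lead-in)",
        f"read_sdc {sdc}",
        f"read_parasitics {parasitics}",
        "# 4) switching activity",
        f"read_saif {saif} -strip_path {strip_path}",
        "# 5) propagate activity, compute power and report it",
        "update_power",
        "report_power",
    ]
    return "\n".join(lines) + "\n"


script = build_ptpx_script(
    design="mac",
    netlist="mac.vg",
    search_path=["../src/hdl/gate", "../src/libs/snps"],
    link_library="core_type.db",
    sdc="../src/hdl/gate/mac.sdc",
    parasitics="../src/annotate/mac.spef.gz",
    saif="../sim/mac.saif",
    strip_path="tb/macinst",
)
print(script)
```

Generating the script from parameters keeps the per-IP differences (design name, file locations) in one place, so the same flow can be replayed for every IP block.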
2.3.3 Report generation:
Once the power for the IP is updated, the various customizable reports can be generated
for deeper analysis of the power behavior.
Below are few reports that can be generated:
1) Report generated by report_power command:
22
Figure 2.15 Report power.
The above report shows the power distribution (switching, internal and leakage power) of
the design unit mac.
2) Report generated by report_threshold_voltage_group:
Figure 2.16 Report threshold voltage group
The above report shows the distribution of low-Vth and high-Vth cells across different
power groups.
3) Report generated by report_clock_gate_savings:
Toggle saving is defined as the fraction of the input clock toggles to the clock gates that are pr
Figure 2.17 Report clock gate savings
Chapter 3
Power roll-up infrastructure
3.1 Introduction
The power roll-up ecosystem enforces a template structure on power collaterals. All the
components of typical industry silicon can be categorized under one of the following
categories: Hard IP, Soft IP, Core, Graphics, and Memory. In this work we present the standard
templates defined for these critical components. The templates enable a systematic lookup
at various hierarchical levels. The Aceplorer [13] tool, with its Python API, enables generation of
power models through automation and enables dynamic simulation. DB is a MySQL-based
database with version control and multi-user hierarchical access. Web is a web page that
interacts with Aceplorer [13] and DB in the background to generate power results. The annealing
tools [12] employ annealing flows to meet performance within the power envelope. Finally, there
is SharePoint BI, an off-the-shelf gallery for DB analytics and KPI-aware visualizations.
Figure 3.1 Power roll-up infrastructure.
3.2 Excel
The existing power models are spreadsheet based. The Excel power models are huge, with a
large number of tabs. One tab holds the configuration, which sets the values of power
supplies, frequency, bandwidth, residencies, enable lines, etc. for different IPs under different
scenarios. There are tabs for VF (voltage-frequency) characterization and leakage characterization.
The VF tab shows the voltage required to support the various frequencies. The leakage tab shows the
leakage power for different library cells under different voltages and temperatures. There are tabs
for dynamic and leakage power, which show the power consumed (both dynamic and leakage)
by every IP under all scenarios. The power values are obtained from complex power relations
that refer to variables present in other tabs, such as the configuration and leakage tabs.
[Figure 3.1 block labels: Excel (existing power models), Aceplorer power models, Database, Annealing tools, Web, SharePoint BI]
In essence, the power number for a single IP under a single scenario is obtained by a
complex computation involving a huge number of power variables defined in other tabs. As a
result, tracking power numbers becomes difficult. The lack of hierarchy makes the situation
worse, and the spreadsheet-based model becomes unmaintainable before the actual silicon data is
available for correlation.
3.3 Aceplorer
An Aceplorer [13] template for each IP is first created with the help of the existing power models
and then instantiated to model the complete SoC. The Aceplorer [13] template of an IP has its
interfaces, power variables, power states and power equations finalized. The variables and
equations are generic enough to instantiate the IP in the power model of any future-generation SoC.
The scenario template describing the job of the IP is also finalized; it holds the generic definition of the
duration of the IP's power states. Once the templates (both hardware and scenario) of every IP, with
generic variables and equations, are finalized, they are saved in a separate file. Aceplorer [13] has
a built-in Python API that enables automation in instantiating these templates and generating the
complete power model of the entire SoC. Once both the hardware and scenario sections are modeled,
the power model is simulated to correlate the simulated power values with the power
values calculated in the Excel-based power model. The IP's hardware and scenario
templates are fine-tuned if necessary to reach 100% correlation in power values.
Aceplorer [13] supports native tables in which the leakage tables, VF tables, etc. can be
stored and the value for a given operating condition can be queried. These tables are
frequently updated based on new experimental data. New tables can be imported
into Aceplorer [13] as and when they become available, using the Python automation.
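The kind of query such tables answer can be sketched in plain Python. This is not the Aceplorer native table API; the table contents, the names VF_TABLE and LEAKAGE_TABLE_MW and both helper functions are hypothetical and serve only to illustrate a lookup by operating condition.

```python
# Hypothetical VF table: (maximum supported frequency in MHz, required voltage in V),
# sorted by frequency.
VF_TABLE = [(400, 0.70), (800, 0.80), (1200, 0.95), (1600, 1.10)]

# Hypothetical leakage table: (voltage in V, temperature in C) -> leakage power in mW.
LEAKAGE_TABLE_MW = {
    (0.80, 25): 18.0,
    (0.80, 85): 45.0,
    (0.95, 25): 30.0,
    (0.95, 85): 75.0,
}


def voltage_for(freq_mhz, vf_table=VF_TABLE):
    """Return the lowest tabulated voltage that supports freq_mhz."""
    for max_freq_mhz, volt in vf_table:
        if freq_mhz <= max_freq_mhz:
            return volt
    raise ValueError(f"{freq_mhz} MHz is above the highest tabulated frequency")


def leakage_mw(volt, temp_c, table=LEAKAGE_TABLE_MW):
    """Return the tabulated leakage power for an exact (voltage, temperature) entry."""
    try:
        return table[(volt, temp_c)]
    except KeyError:
        raise KeyError(f"no leakage entry for {volt} V at {temp_c} C") from None


volt = voltage_for(1000)
print(volt, leakage_mw(volt, 85))
```

Replacing a table with newer characterization data then only means swapping the data structure; the query code stays the same.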
Once the power model is ported to Aceplorer [13], the Python automation can be used
to write a script that sweeps one variable against another. This opens up a wide range
of what-if analyses. For example, the frequency of one block can be varied and the power of the
complete SoC plotted against it. Similarly, the power values of two blocks can be plotted
against the frequency variation of a third block.
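A sweep of this kind can be sketched with a toy model in plain Python. The code does not use the Aceplorer API; the block names, the parameter values and the simple relation P = C * V^2 * f + leakage are assumptions made for illustration only.

```python
def block_power_mw(cap_nf, volt, freq_mhz, leak_mw):
    """Dynamic power C*V^2*f plus leakage; nF * V^2 * MHz gives mW."""
    return cap_nf * volt ** 2 * freq_mhz + leak_mw


def sweep_block_frequency(blocks, swept_block, freqs_mhz):
    """Return (frequency, total SoC power in mW) for each swept frequency."""
    results = []
    for freq in freqs_mhz:
        total = 0.0
        for name, params in blocks.items():
            # Only the swept block changes frequency; the others keep theirs.
            f = freq if name == swept_block else params["freq_mhz"]
            total += block_power_mw(params["cap_nf"], params["volt"], f,
                                    params["leak_mw"])
        results.append((freq, total))
    return results


# Hypothetical two-block SoC.
soc = {
    "core": {"cap_nf": 1.0, "volt": 1.0, "freq_mhz": 100, "leak_mw": 10.0},
    "fabric": {"cap_nf": 2.0, "volt": 1.0, "freq_mhz": 200, "leak_mw": 20.0},
}
for freq, total in sweep_block_frequency(soc, "core", [100, 200, 400]):
    print(freq, total)
```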
Once the templates reached 100% correlation with the existing Excel-based power model
for the SoC, they were instantiated to model several other existing SoCs in order to arrive
at the generic template of each IP. Hundreds of scenarios were run for each SoC to obtain the power
numbers and correlate them with the existing Excel-based power numbers.
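The correlation step amounts to comparing two sets of per-scenario power numbers. The sketch below shows one way such a check could look; the function name correlate, the scenario names and the values are hypothetical, and the tolerance is an assumption rather than the criterion used in this work.

```python
import math


def correlate(reference_mw, simulated_mw, rel_tol=1e-6):
    """Return {scenario: (reference, simulated)} for every scenario that does not match.

    A scenario missing from the simulated results is reported with None.
    """
    mismatches = {}
    for scenario, ref in reference_mw.items():
        sim = simulated_mw.get(scenario)
        if sim is None or not math.isclose(ref, sim, rel_tol=rel_tol):
            mismatches[scenario] = (ref, sim)
    return mismatches


# Hypothetical power numbers in mW.
excel_mw = {"idle": 100.0, "video": 250.0, "game": 400.0}
model_mw = {"idle": 100.0, "video": 251.0}
print(correlate(excel_mw, model_mw))
```

An empty result corresponds to full correlation; any entry points to the scenario whose template needs fine tuning.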
3.4 DB
DB is a MySQL-based database with version control and multi-user hierarchical access.
This database holds all the power models (.dmo files), scenarios (.dsi files) and templates (.dpr
files), and the files can be queried from any machine that has Aceplorer [13] installed and the required
permissions. In essence, the complete Aceplorer [13] power model can be built from scratch
from the database. Users working on the same project across different sites can make the
necessary changes and save the files back to the database.
This enables smooth movement of files between multiple users working on the same project.
The version control mechanism tracks the changes made to earlier files.
3.5 Annealing tools
Annealing [12] is the process of finding the minimum frequency that
delivers the required performance, in order to optimize power. During simulation, it is often
necessary to perform annealing to meet the performance within the power envelope.
As the frequency is increased, the power consumption, being directly proportional to
frequency, should go up; but with the increase in frequency the residency decreases, thereby
pushing the power down. There exists an optimum frequency that meets the performance
with the lowest power [9]. Determining this frequency is called annealing [12].
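The trade-off between frequency and residency can be illustrated with a toy search over operating points. This is not the internal annealing tool [12]; the operating points, the effective capacitance, the leakage proportional to voltage and the idle power are all assumed values, and residency is taken as the required work rate divided by the operating frequency.

```python
def average_power_mw(freq_mhz, volt, work_mcycles_per_s,
                     cap_nf, leak_mw_per_v, idle_mw):
    """Average power at one operating point, or None if performance is not met."""
    residency = work_mcycles_per_s / freq_mhz  # fraction of time spent active
    if residency > 1.0:
        return None  # the block cannot finish the work at this frequency
    active_mw = cap_nf * volt ** 2 * freq_mhz + leak_mw_per_v * volt
    return residency * active_mw + (1.0 - residency) * idle_mw


def anneal(operating_points, work_mcycles_per_s,
           cap_nf=1.0, leak_mw_per_v=100.0, idle_mw=5.0):
    """Return (freq_mhz, volt, power_mw) of the feasible point with the lowest power."""
    best = None
    for freq_mhz, volt in operating_points:
        power = average_power_mw(freq_mhz, volt, work_mcycles_per_s,
                                 cap_nf, leak_mw_per_v, idle_mw)
        if power is None:
            continue
        if best is None or power < best[2]:
            best = (freq_mhz, volt, power)
    return best


# Hypothetical (MHz, V) operating points and a workload of 600 Mcycles/s.
points = [(400, 0.70), (800, 0.80), (1200, 1.00)]
print(anneal(points, work_mcycles_per_s=600))
```

In this example the 400 MHz point cannot deliver the workload, and the 1200 MHz point pays for its lower residency with a higher voltage, so the 800 MHz point is selected.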
3.6 Web
Web is a web page, accessed through the Chrome browser, that interacts with Aceplorer [13] in
the background and allows power models to be simulated from the web page. It can also interact
with the database and Excel to fetch templates and then simulate the new template-based SoC
using Aceplorer [13] in the background.
3.7 SharePoint BI
SharePoint BI is an off-the-shelf gallery for DB analytics and KPI-aware visualizations. It is an
added feature for better visual representation.
Figure 3.2 SharePoint BI
Chapter 4
Methodology
4.1 Modeling of SoC
The first step in modeling an SoC using Aceplorer [13] is the creation of Aceplorer [13]
templates for the different IP blocks and of their scenario templates. The emphasis is on creating
versatile IP templates that can be reused/instantiated for derivative SoCs that use
the same IPs.
Once these IP templates are finalized, they are instantiated and connected properly to
complete the power model. Then the scenario template is finalized, which forms the backbone
of all the scenarios.
4.2 Hardware Modeling
The hardware template of an IP is intended to capture the power variables that
describe the power numbers of the IP across different scenarios. The hardware template of the IP
defines its interfaces (inputs and outputs), its power states (the states needed to categorize the
power behavior of the IP under different operating conditions) and its power variables (capacitance,
leakage scaling, etc.). IP templates are instantiated to mirror the IPs present in the SoC, and the
interface connections are made to represent the complete SoC. The template for a generic block
is shown below as an example.
Figure 4.1 Generic block template.
The above generic block has two power supplies (Vdd1 and Vdd2), a clock, and read and write
bandwidth as its inputs. Four variables, two for each power supply, are defined to represent
the leakage and dynamic current drawn from the respective supply. Once these variables
are known (determined by the scenario and by complex equations involving the power variables),
Aceplorer [13] multiplies them by the supply voltage to return the dynamic and static power
consumption. Statics are defined (some are hardcoded and some are derived from other statics),
which ultimately contribute to the computation of the dynamic and leakage currents. Moreover, these
statics can be made programmable so that they take different values or expressions depending on the
scenario. Finally, the power states (off, read, write) of the block are defined. These are the
states required to characterize the power behavior of the block.
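The structure of such a template can be mirrored in a small Python data class. This is a sketch only and not the Aceplorer template format: the class name GenericBlock and the current values are hypothetical, and the per-state currents are fixed numbers here, whereas in the model they are computed from statics, clock and bandwidth.

```python
from dataclasses import dataclass


@dataclass
class GenericBlock:
    """Toy block with two supplies and per-state supply currents."""
    vdd1: float  # supply voltage 1 in V
    vdd2: float  # supply voltage 2 in V
    # state -> (i_dyn_vdd1, i_leak_vdd1, i_dyn_vdd2, i_leak_vdd2), all in mA
    currents_ma: dict

    def power_mw(self, state):
        """Return (dynamic, leakage) power in mW as voltage times current."""
        i_dyn1, i_leak1, i_dyn2, i_leak2 = self.currents_ma[state]
        dynamic = self.vdd1 * i_dyn1 + self.vdd2 * i_dyn2
        leakage = self.vdd1 * i_leak1 + self.vdd2 * i_leak2
        return dynamic, leakage


# Hypothetical currents for the three power states named in the text.
block = GenericBlock(
    vdd1=1.0,
    vdd2=2.0,
    currents_ma={
        "off": (0.0, 0.0, 0.0, 0.0),
        "read": (10.0, 1.0, 5.0, 2.0),
        "write": (20.0, 1.0, 8.0, 2.0),
    },
)
print(block.power_mw("read"))
```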
4.3 Scenario Modeling
Once the hardware blocks are instantiated in Aceplorer [13], the scenario that
represents the application running on the SoC is modeled. A job element is created for every block.
Each job element models the application performed by the respective block. Job elements
for the blocks can be connected in parallel or in series to model the scenario section.
The figure below shows the jobs performed by five blocks running in parallel. Job1 models the
application running on block1, job2 models the application running on block2, and so on. Inside
each job, the statics or variables defined for the respective block can be accessed and modified
depending on the scenario.
Figure 4.2 Parallel jobs.
Inside the job, the residencies of the block's power states are also defined; each
pushes the block into the respective state for the computed duration. The residency percentage is
computed dynamically from the scenario, other variables and the operating conditions.
Running a different application boils down to updating the durations/residencies of all the
power states of the different blocks.
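The effect of residencies on the reported power can be sketched as a residency-weighted average over the power states. The function name scenario_power_mw and the numbers are hypothetical; the sketch assumes the residencies are given as fractions that sum to one.

```python
def scenario_power_mw(state_power_mw, residency):
    """Average power of one block: sum of state power times state residency."""
    total = sum(residency.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"residencies sum to {total}, expected 1.0")
    return sum(state_power_mw[state] * share for state, share in residency.items())


# Hypothetical per-state power in mW for one block.
state_power = {"off": 0.0, "read": 100.0, "write": 200.0}

# Two applications differ only in how long the block stays in each state.
playback = {"off": 0.5, "read": 0.25, "write": 0.25}
capture = {"off": 0.25, "read": 0.25, "write": 0.5}
print(scenario_power_mw(state_power, playback))
print(scenario_power_mw(state_power, capture))
```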
Chapter 5
Results and Discussions
5.1 PnP study
A power and performance (PnP) study examines the performance of the SoC within
the given power envelope. It is an important study for jointly optimizing
power and performance.
For a given performance, the power distribution of the SoC generated by Aceplorer
[13] can be analyzed, the blocks consuming the most power can be identified, and appropriate
feedback can be given to the design team for power optimization.
Since the SoC supports multiple VR topologies [10], the PnP curves were generated (from
Aceplorer [13] with the help of the Python API) for different VR topologies, and the optimum
topology for the given application was identified. For example:
The PnP curve for a 2VR topology with one voltage rail combination is shown below. The
curve is obtained for a workload in which one block is the master and the other block is the slave [1]. The
master operating frequency is varied using the Python API, and the power and performance values are
obtained from the Aceplorer [13] simulation results. A Python script finally generates the PnP
curve.
Figure 5.1 PnP curve for an experimental workload.
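The way such a curve is assembled can be sketched as follows. In this work the power and performance values come from Aceplorer simulation results; in the sketch they come from an assumed toy model instead, in which performance in fps scales with the master frequency and the slave contributes a constant power. The function name pnp_curve and all numbers are hypothetical.

```python
def pnp_curve(master_points, mcycles_per_frame, master_cap_nf, slave_power_mw):
    """Return (fps, total power in mW) for each master (MHz, V) operating point."""
    curve = []
    for freq_mhz, volt in master_points:
        fps = freq_mhz / mcycles_per_frame                # performance axis
        master_mw = master_cap_nf * volt ** 2 * freq_mhz  # dynamic power only
        curve.append((fps, master_mw + slave_power_mw))
    return curve


# Hypothetical master operating points; the slave stays at a fixed operating point.
master_points = [(400, 1.0), (800, 1.0), (1200, 1.0)]
for fps, power in pnp_curve(master_points, mcycles_per_frame=10,
                            master_cap_nf=0.5, slave_power_mw=50.0):
    print(fps, power)
```

Repeating the sweep with a different set of operating points, one set per voltage rail combination, gives one curve per topology for comparison.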
The PnP curve for a 2VR topology with a different voltage rail combination, for the same
workload, is shown below. The master operating frequency is varied using the Python API [2] [4], and
the power and performance values are obtained from the Aceplorer [13] simulation results.
A Python script finally generates the PnP curve.
Figure 5.2 PnP curve with different VR topology.
A similar set of analyses was done for a different workload, in which the master and slave
interchange their roles [2] [4]. The slave operates at its minimum frequency and the master
frequency is varied to obtain the PnP curve.
Figure 5.3 PnP curve for different workload.
The IA frequency required to support different fps (the annealing curve) has been plotted
(Figure 5.4) to find the minimum frequency at which IA can operate without compromising
performance. The operating system can use this pre-silicon model to dynamically
off-load the Graphics engine or the Core based on the current power envelope and estimated
battery life.
Figure 5.4 IA freq for different fps.
Figure 5.5 shows the variation of core and interconnect power with frequency. The
non-linearity of the model can be exploited to operate the design at an optimum point that
results in minimum power [10] at the desired fps.
Figure 5.5 Core and interconnect power variation.
Figure 5.5 shows the difference in power scaling between the Core, which primarily
comprises high-speed logic, and the Interconnect, which is a generic fabric. As expected, the
Core power consumption is more sensitive to variations in frequency than the Interconnect
power.
Chapter 6
Summary and Conclusion
6.1 Summary
The thesis presents an overall infrastructure that can project the pre-Si power of SoCs and
facilitate what-if analysis of their power behavior. The Aceplorer [13] tool is used
to build a power model of each IP that captures its power behavior. These power models of
individual IPs are integrated through proper interfaces to build the power model of the complete SoC.
The complete power model can then be simulated across different scenarios to study the power
distribution. Moreover, what-if analysis can be performed to improve the power metric of the
complete SoC.
6.2 Conclusion
Through the power modeling infrastructure built in this work, we have extracted power
and performance data for a large Intel-based SoC. As the performance or throughput demand on
the SoC increases, the total skew power increases as well, and an optimum point based on the
product envelope can be chosen. The PnP curve was plotted for different combinations of
voltage rails to find the best voltage rail combination for a given workload.
6.3 Scope of Future Work
The power modeling technique for pre-Si power estimation tends to become complex
because of the huge number of power variables involved. Tracking the power variables
becomes difficult because of the long list of hierarchical levels. Specifically, when the
dependency between architectural variables grows deeper than three levels, the
templates presented in this work might need to be enhanced. What-if analysis across different
curves can lead to early design wins in the product cycle. There is potential for future work to
build visualization and analytics on the power data and to create dedicated views that cater to the
1) Architect, 2) IP power owner and 3) Marketing/Target Skew Planner.
Literature Cited
[1] Manish Arora, Redefining the role of the CPU in the era of CPU-GPU integration, IEEE
Micro 2012.
[2] Manish Arora, The architecture and evolution of CPU-GPU systems for general purpose
computing, Research Surveys, Department of Computer Science and Engineering, UC San
Diego, 2012.
[3] Indrani Paul, Cooperative Boosting: Needy versus Greedy power management, ISCA 2013.
[4] NVIDIA GPU and Intel CPU family comparison articles, http://wikipedia.org.
[5] A. Bakhoda, Analyzing CUDA workloads using a detailed GPU simulator, IEEE
International Symposium on high performance computer architecture 2001.
[6] V.W. Lee, Debunking the 100x GPU vs. CPU myth: An evolution of throughput computing
on CPU and GPU, International Symposium on computer architecture 2010.
[7] Kangmin Lee, Low-power network-on-chip for high-performance SoC design, VLSI
Systems, IEEE 2006.
[8] M. Bohr, New era of scaling in SoC world, IEEE 2009.
[9] M. Horowitz, Low-power digital design, Low power electronics, IEEE 1994.
[10] Rakesh Chadha and J. Bhasker, An ASIC Low Power Primer: Analysis, Techniques and
Specification, Springer.
[11] Solvnet, https://solvnet.synopsys.com.
[12] Annealing tools, Intel Internal Tool Reference.
[13] Docea Aceplorer, http://www.doceapower.com.
[14] Synopsys PrimeTime Reference Manual, 2013.
Acknowledgements
First and foremost I would like to thank my guide Dr. B. Lakshmi, Department of
Electronics and Communication Engineering, National Institute of Technology Warangal, for
allowing me to do this work and for her constant support throughout the project. I
am grateful to her for her invaluable guidance and motivation.
I would like to thank Prof. K.S.R. Krishna Prasad, Department of Electronics and
Communication Engineering, National Institute of Technology Warangal, for his invaluable
guidance and motivation. His devotion to the field of VLSI has been a great inspiration in
finishing my master's studies.
I would like to thank the Head of the Department, Prof. N.V.S.N. Sarma, and all the
faculty members of the ECE department for their help during my M.Tech.
I would like to thank my mentor, Dr. Arun Janarthanan, Component Design Engineer,
Intel Technology India Private Limited, Bangalore, and my manager, Mrs. Bharathi V,
Engineering Manager, Intel India Private Limited, Bangalore, for their esteemed guidance,
valuable suggestions and time throughout. Their guidance and vast knowledge helped me
throughout my internship tenure and the writing of this thesis. I could not have imagined having a
better manager and mentor for my master's studies.
Finally, I thank all my classmates for their help in discussing the problems of this
project. I am grateful to my parents for their constant support.
Ranjan Kumar
Roll no. 124568
Date:
