CADENCE DESIGN SYSTEMS, INC.

Optimization Best Practices in RTL Compiler
Application Note

9/27/2011

This application note outlines some best practices and strategies for optimization with
RTL Compiler (referred to as RC throughout the document).

Contents

1 Purpose
2 Data preparation
2.1 Libraries
2.2 HDL
2.3 Constraints
3 Top-down vs. Bottom-up
4 Wireload Model vs. PLE
5 Runtime Consideration
6 Optimization Goals
6.1 Timing
6.2 Power
6.3 Area
SUMMARY

COPYRIGHT © 2012, CADENCE DESIGN SYSTEMS, INC. ALL RIGHTS RESERVED


1 Purpose

This application note outlines some best practices and strategies for optimization with
RTL Compiler (referred to as RC throughout the document). The discussion does not cover advanced
synthesis features such as Low Power and RC Physical. The commands and features discussed here are
based on the RC 10.1 release.

2 Data preparation

A good result from a synthesis tool depends greatly on the input data. The old saying “garbage in,
garbage out” also holds for RTL Compiler. Before running synthesis, the user should check the input
data, pay attention to the warning messages, and correct any obvious issues.

2.1 Libraries

When reading in the timing libraries (only the Liberty format is supported), RC might mark some library
cells as unusable or as timing models. These cells won’t be used by the tool for mapping, but they can
be instantiated in the HDL. A library cell becomes unusable when its timing or function is too complex
for RC to understand. On the other hand, a library cell that has no output function or has multiple
outputs, for example a RAM cell, becomes a timing model. An exception is a full-adder or half-adder
cell, which has two outputs but is not considered a timing model.

RC can mark a library cell as avoid (dont_use) when the cell exists in the timing library but does not exist
in the physical library (LEF file). This behavior often surprises users at mapping time, when RC
issues the following error messages:

Libraries have 0 usable logic and 0 usable sequential lib-cells.
Error : Cannot perform synthesis because libraries do not have usable
inverters. [LBR-171] [synthesize]
: Inverters are required for mapping. Ensure that the loaded
libraries contain at least one usable inverter.
Error : Cannot perform synthesis because libraries do not have usable basic
gates. [LBR-172] [synthesize]
: At least one usable two-input and/or/nand/nor gate (modulo
inversion at inputs) is required for mapping. Ensure that the loaded
libraries contain at least one such cell.
Synthesis failed.

The library consistency check mentioned above can be disabled with the root attribute
‘lib_lef_consistency_check_enable’. However, it’s good practice to identify the cause
of the library inconsistency and correct it before synthesis.

When using RC with the PLE flow (Physical Layout Estimates), the captable file needs to be read in
together with the LEF library for more accuracy. RC will check the consistency between the metal layers
defined in LEF versus captable and issue an error if there are mismatches.

2.2 HDL

Besides the obvious HDL syntax errors that may occur while reading HDL, users need to pay attention to
blackboxes and latch inference during elaboration. If RC does not find an HDL module or a library cell
for an instance referenced in the HDL, it issues the following warning message and treats the instance
as a blackbox:

Warning : Black-boxes are represented as unresolved references in the design. [TUI-273]
: Cannot resolve reference to 'dummy'.
: To resolve the reference, either load a technology library
containing the cell by appending to the 'library' attribute, or read in the
hdl file containing the module before performing elaboration. As the design
is incomplete, synthesis results may not correspond to the entire design.

If a module has only port definition, but no logic inside, it will be regarded as a logic abstract. RC will
issue a different message when encountering such modules:

Warning : Detected a logic abstract. [CDFG-331]
: Logic abstract 'dummy' in file 'bb.v' on line 9.
: A logic abstract is an unresolved reference with defined port names
and directions. It is inferred from an empty Verilog or VHDL design, or when
the 'black_box' pragma or 'blackbox' hdl_arch attribute is specified. Use
'set_attribute hdl_infer_unresolved_from_logic_abstract false /' to treat an
empty module as a defined module.

There is a slight difference between a logic abstract and a blackbox. A logic abstract has port
definitions, which is not the case with a blackbox: all ports of a blackbox are treated as inout ports.
During optimization, the logic connected to a blackbox or logic abstract instance will not be optimized
or deleted.

Another thing the user might want to check after elaboration is unintentional latch inference. This
normally occurs when case logic is coded with a missing condition or without a default condition. RC
issues an informational message when it infers latches during elaboration:

Info : Latch inferred. [CDFG2G-616]
: Latch inferred for variable 'q' in file 'lat.v' on line 8, column 9.

If latches are not intended in the design, the attribute ‘hdl_error_on_latch’ can be
enabled before elaboration so that RC issues an error message when it encounters a latch. After
elaboration, the command ‘report sequential’ can be used to report all the sequential instances
inferred. This is an alternative way to check for latch inference.

A useful command for sanity checking the design is ‘check_design’. This command will report
blackboxes, empty modules (logic abstract), floating pins, multidriven pins, and constant pins found in
the design. The command also reports cell consistency between the timing library and the physical
library.
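
The checks described in this section can be gathered into a short pre-synthesis script. The following
is a minimal sketch rather than a reference flow: the file name top.v and the module name top are
placeholders, the libraries are assumed to be loaded already, and ‘hdl_error_on_latch’ is assumed to be
a Boolean root attribute set in the same way as the other root attributes shown in this note.

```tcl
# Sketch only: top.v and top are placeholder names; libraries are assumed loaded.
# Assumption: hdl_error_on_latch is a Boolean root attribute.
set_attribute hdl_error_on_latch true /   ;# turn unintended latches into errors

read_hdl top.v
elaborate top

report sequential   ;# list inferred sequential instances and look for latches
check_design        ;# blackboxes, logic abstracts, floating/multidriven/constant pins
```

Designs that use latches on purpose would leave ‘hdl_error_on_latch’ at its default and rely on the
‘report sequential’ output instead.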

2.3 Constraints

RC is very sensitive to timing constraints. During the first stage of global mapping, the timing constraints
are used to estimate the initial target slack, which is a timing goal for RC to work toward during the
optimization process. Inaccurate or unrealistic constraints can lead to a bad starting point during global
mapping, and consequently, a bad result at the end. The practice of overconstraining the design with a
small guard band is not recommended in RC.

At a minimum, the user should check the constraints with the ‘report timing -lint’ command
before synthesis. This command reports combinational loops, missing I/O constraints, flops with
missing clocks, and other timing-related issues. More advanced constraint checking can be done with
the ‘write_do_ccd validate’ command, which writes a dofile for the Conformal Constraint Designer
(CCD) tool. The user then runs CCD with this dofile, separately from RC, to validate the
constraints.

Unrealistic constraints, such as large external delays or large output capacitive loads, are not flagged
by RC. However, they might show up later during global mapping as paths with a large negative target
slack on which timing cannot be closed. The designer needs to review such constraints for validity
before synthesis.

Constraints can be entered in SDC format with the ‘read_sdc’ command or interactively with the
‘dc::’ prefix. Experienced RC users can also mix SDC with RC native constraints. One caution
here is that RC native constraints always use ps and fF as the units for time and capacitance, respectively.
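
The unit caution above is easiest to see with numbers. The following pure Tcl sketch converts values
before they are passed to a native constraint. It assumes that the SDC file expresses time in ns and
capacitance in pF, which depends on the library and is not stated in this note; the helper names
ns_to_ps and pf_to_ff are made up for illustration.

```tcl
# Sketch only: assumes the SDC side uses ns and pF. Check the units of your
# own libraries and SDC before reusing these factors.
proc ns_to_ps {ns} { return [expr {$ns * 1000.0}] }
proc pf_to_ff {pf} { return [expr {$pf * 1000.0}] }

puts [ns_to_ps 1.5]    ;# 1.5 ns  -> 1500.0 (ps)
puts [pf_to_ff 0.25]   ;# 0.25 pF -> 250.0 (fF)
```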

3 Top-down vs. Bottom-up

In general, RC gives better QoR (timing, area, power) when synthesizing top-down, because it can see
the entire design rather than one module at a time as in the bottom-up flow. There is no capacity
limit with the 64-bit version of the tool. However, synthesizing from the top level is sometimes
impractical because of computing resources and runtime. For example, if the violating timing path is
contained in a single module, one could synthesize that module alone using the bottom-up approach to
resolve the issue rather than waiting for the result of synthesizing the entire design.

The tricky part of the bottom-up approach is getting the correct I/O constraints for the synthesized
module as seen from the top design level. This can be done with the ‘derive_environment’ command,
with which RC extracts the module I/O constraints from the top-level constraints. The basic flow is as
follows:

# read the top level and all modules (file names are placeholders)
read_hdl {top.v <module_files>}
elaborate top
read_sdc top.sdc
synthesize -to_map -effort low
derive_environment -sdc_only <module_instance_to_be_extracted>
cd /designs/<extracted_module>
write_sdc > module.sdc

Once the lower level module SDC is extracted, the module can be synthesized alone. The resulting
netlist of the module can be linked back to the top level design as follows:

read_netlist bottom.vg    ;# netlist from the synthesized module
read_hdl {top.v <module_files>}
elaborate top

# at this point, there are 2 designs in the RC virtual directory: bottom, top
cd /designs/top
change_link -instance bottom_inst -design /designs/bottom

# if no re-synthesis is required for the bottom module
set_attribute preserve true bottom_inst

4 Wireload Model vs. PLE

During synthesis (not post-placement), wire capacitance and resistance are traditionally estimated
from a wireload model (WLM). WLMs are often provided with the technology libraries and contain
statistical parasitic values obtained from past designs. They are lookup tables of wire capacitance,
resistance, and area based on the fanout of the net. A more accurate form of WLM is a custom
WLM, which is derived after the design is placed. However, in order to place the design, the initial
netlist needs to be synthesized with the WLM supplied with the library or with a zero WLM. Another
issue with the custom WLM is that it represents a static view of the design placement at the time the
model was extracted. As the RTL and constraints are modified during the course of the design, the WLM
becomes less accurate.

PLE (Physical Layout Estimates), on the other hand, dynamically models the effects of placement
based on the current state of the design and constraints. At a minimum, it requires the LEF library.
Adding the captable file and the DEF floorplan makes the PLE model more accurate. If the
LEF library is read in, RC will automatically be in PLE mode for synthesis. This can be verified with
the root attribute ‘interconnect_mode’. In this mode, the ‘report ple’ command shows
the values of the estimated wire parasitics in each metal layer.
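
The verification step described above can be sketched as two commands. This assumes the LEF library
(and optionally the captable) has already been read; the value that ‘interconnect_mode’ returns in PLE
mode is not stated in this note, so it is not shown here.

```tcl
# Sketch only: run after the LEF library (and optionally the captable) is read.
get_attribute interconnect_mode /   ;# query the current interconnect mode
report ple                          ;# estimated wire parasitics per metal layer
```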

5 Runtime Consideration

RC has a superthreading capability with which the synthesis job can be split into parallel processes on
multiple machines or multiple CPUs. When RC is invoked on a multi-processor machine with the
RTL_Compiler_Physical license, superthreading is automatically enabled and another synthesis
thread is launched on the second CPU. However, each additional thread after the second one
requires another RTL_Compiler_XL license. The number of CPUs or servers can be set with the root
attribute ‘super_thread_servers’. Currently, the maximum turnaround time reduction is close to 4X.

In addition to superthreading, RC also speeds up repeated synthesis runs by caching intermediate data.
This is set up using a couple of attributes:

# set the cache directory
set_attribute super_thread_cache <directory> /

# set the cache size (defaults to 1000 MB)
set_attribute max_super_thread_cache_size <integer> /

6 Optimization Goals

With the default settings, RC produces the smallest design that satisfies the timing constraints. Power
constraints need to be set by the user; RC does not automatically optimize for power during
synthesis. There is no ‘one-size-fits-all’ recipe for optimization. The recommended flow is to run the
design through the baseline flow once with the ‘synthesize -to_map’ command, analyze the
result, then fine-tune the flow if needed.
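
A baseline run along these lines might look like the following sketch. The file names top.v and top.sdc
and the module name top are placeholders, the libraries are assumed to be loaded already, and the
‘report area’ command is an addition that this note does not otherwise discuss.

```tcl
# Baseline sketch: top.v, top and top.sdc are placeholders; libraries assumed loaded.
read_hdl top.v
elaborate top
read_sdc top.sdc
report timing -lint     ;# constraint sanity check before synthesis

synthesize -to_map      ;# baseline run with default settings

report timing           ;# check the critical path first
report area             ;# then record area for later comparison
```

Keeping the reports from this run gives a reference point for judging whether any of the tuning
options in the following sections actually helps.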

6.1 Timing

Timing is the number one priority in RC during optimization. RC optimizes the design to satisfy the
timing constraints first, and then considers area, power, and design rule violations (DRC) later in the
process. The following are some ideas for fine-tuning optimization to improve timing.

• Ungrouping

Generally, RC structures and optimizes logic better when there is no module hierarchy. For example,
datapath components across module hierarchies cannot be merged. However, flattening the design into a
single module is not recommended either, because the synthesized netlist will be difficult to verify and
debug. In version 10.1, RC automatically ungroups user hierarchies if they contain only muxes or
datapath components. This feature is enabled during high effort synthesis. Users can turn off this auto
ungrouping with the global attribute:

set_attribute auto_ungroup none /

or selectively at the instance or subdesign:

set_attribute ungroup_ok false <instance/subdesign>

• Boundary Optimization

RC automatically performs boundary optimization for all modules in the design. This optimization
includes constant propagation, removal of logic connected to undriven inputs or unloaded
outputs, collapsing of equal or opposite hierarchical pins, hierarchical pin inversion, and rewiring of
equivalent signals. Boundary optimization should not be disabled unless that is needed to resolve a
formal verification issue. In that case, it can be disabled on selected modules with the ‘boundary_opto’
attribute, or at the pin level with the remaining attributes listed here:

boundary_opto
boundary_optimize_constant_hier_pins
boundary_optimize_equal_opposite_hier_pins
boundary_optimize_feedthrough_hier_pins
boundary_optimize_invert_hier_pins

• Datapath Optimization

Datapath optimization is enabled by default and is done more aggressively in high effort generic
synthesis. This is where the architectures of datapath components are selected to best satisfy the timing
constraints. Operator merging, CSA transformation, and resource sharing also happen in this
optimization stage. To see which datapath architectures RC selected, use the ‘report
datapath’ command after the generic synthesis and mapping stages. RC should select the ‘very_fast’
architecture if the datapath component is on a critical path. If it does not, the user can manually
select the architecture by setting this attribute prior to generic synthesis:

set_attribute user_speed_grade very_fast <datapath_module>

• Path Grouping

During optimization, RC works on one cost group at a time until there is no further improvement on the
critical path of that cost group. For every clock defined with the ‘create_clock’ constraint, a cost
group is created for all endpoints related to that clock waveform. If a design has a single clock, then all
paths in the design will be in the same cost group. Suppose the critical path is an I/O path
(input-to-register or register-to-output). If RC cannot improve its timing, it will give up without
optimizing the register-to-register paths in the same cost group.

It’s a good practice to separate the I/O paths from the register-register paths in different cost groups so
that RC can still optimize for critical paths in each cost group. Example of cost group settings:

define_cost_group -name in2reg
path_group -from [all_inputs] -group in2reg

define_cost_group -name reg2out
path_group -to [all_outputs] -group reg2out

define_cost_group -name in2out
path_group -from [all_inputs] -to [all_outputs] -group in2out

define_cost_group -name reg2reg
path_group -from [all des seq] -to [all des seq] -group reg2reg

• TNS Optimization

As mentioned previously, RC works on the critical path (WNS) of each cost group until there is no more
improvement. The non-critical paths can be downsized to save area. In the TNS optimization mode,
all endpoints of the cost group are optimized. Optimizing for TNS reduces the number of violating paths
and might reduce WNS in some cases. The drawbacks are longer runtime and a possible area
increase. To enable TNS optimization, use:

set_attribute tns_opto true /

• Incremental Optimization

Multiple incremental synthesis runs can help improve the timing result. It’s common practice to try a
couple of incremental runs until there is no more improvement. If runtime is not a concern, one can try
the Ultra Incremental mode by setting the attribute ‘iopt_ultra_optimization’. In this
mode, RC works more rigorously to achieve the best result.
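
As an illustration, repeated incremental runs could be scripted as follows. This is a sketch under
assumptions: the design is already mapped, incremental synthesis is invoked with ‘synthesize
-incremental’ (the command form is not spelled out in this note), ‘iopt_ultra_optimization’ is a Boolean
root attribute, and the number of passes is arbitrary.

```tcl
# Sketch only: assumes a mapped design; the number of passes is arbitrary.
synthesize -incremental
report timing                 ;# compare slack against the previous run
synthesize -incremental
report timing                 ;# stop iterating once slack no longer improves

# Optional, runtime permitting.
# Assumption: iopt_ultra_optimization is a Boolean root attribute.
set_attribute iopt_ultra_optimization true /
synthesize -incremental
```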

• Retiming

Retiming is an advanced optimization technique in which registers are repositioned to reduce cycle time
or area without changing the input-output latency of the design. This technique is best suited to
designs that can be pipelined. Its drawback is that it can cause problems in formal verification.
It also requires the RTL_Compiler_GXL license. Retiming can be enabled at the top
level or on selected modules using the ‘retime’ attribute.

• Path Adjust

Tightening the constraint on a selected path makes the path more critical and forces RC to
work harder on it. This trick can help close timing on a small number of violating paths. The
constraint needs to be set before mapping:

path_adjust -delay <delay> -from <start_point> -to <end_point>

Use a negative delay number to tighten the constraint (the effect is that of subtracting the delay from
the normal cycle time). Conversely, a positive delay number relaxes the constraint on the path, which
can be used to save area on non-critical paths.
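
A small worked example may make the sign convention concrete. The numbers below are hypothetical, time
is in ps as for all RC native constraints, and the start and end points are left as placeholders.

```tcl
# Worked example with hypothetical numbers (RC native time unit is ps).
# Clock period:            2000 ps
# path_adjust -delay -200: path is treated as if it had 2000 - 200 = 1800 ps
# path_adjust -delay  300: path is treated as if it had 2000 + 300 = 2300 ps
path_adjust -delay -200 -from <start_point> -to <end_point>   ;# tighten
path_adjust -delay  300 -from <start_point> -to <end_point>   ;# relax
```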

• Cell Biasing

During synthesis, cell area has a great impact on cell selection. Among cells with the same function,
cells with a smaller area are favored over larger ones. In RC, library cells can be made
more or less favorable during mapping by modifying the cell attribute ‘area_multiplier’.
Increasing this multiplier makes the cell less favorable; decreasing it makes the cell more
favorable. For example, to encourage RC to use more complex cells (AOI, OAI gates), one
would set the area_multiplier for these cells to less than 1.0.
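
The AOI/OAI example could be scripted roughly as follows. This is a sketch under assumptions: the name
patterns AOI* and OAI* are hypothetical and depend on the library naming convention, the value 0.8 is
arbitrary, and ‘find / -libcell’ is assumed to return the matching library cells in the same way that
‘find / -cost_group’ is used elsewhere in this note.

```tcl
# Sketch only: AOI*/OAI* patterns and the 0.8 value are hypothetical.
# Assumption: 'find / -libcell <pattern>' returns the matching library cells.
foreach cell [find / -libcell AOI*] {
    set_attribute area_multiplier 0.8 $cell   ;# < 1.0 makes the cell more favorable
}
foreach cell [find / -libcell OAI*] {
    set_attribute area_multiplier 0.8 $cell
}
```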

• Initial Target Setting

At the beginning of the global mapping step, RC estimates a target slack for each cost group. This
estimate is based on the libraries, the logic structure, and the constraints. RC works toward
this target during the optimization process. In the logfile, search for the keyword ‘target slack’;
the values are printed before and after the global mapping step. A cost group with a large negative
target slack normally indicates a problem area. If the constraints are clean, one could try setting the
initial target to 0 or a positive number using the initial_target attribute. This makes RC work harder on
this cost group. Example setting:

set_attribute initial_target 0 [find / -cost_group reg2reg]

Note that this attribute is hidden and should only be used as a last resort for corner cases. The result
will vary depending on the design.

6.2 Power

Power optimization is not enabled by default. RC optimizes for leakage power and dynamic power if
power constraints are set with the attributes max_leakage_power and
max_dynamic_power. A constraint can be set for leakage power, dynamic power, or both. If
both power constraints are set, the attribute ‘lp_power_optimization_weight’ also
needs to be set to indicate the balance between leakage and dynamic power optimization. The
attribute is used as follows:

set_attribute lp_power_optimization_weight <weight> /designs/<design>

The weight (w) value is a floating-point number between 0 and 1. RC uses the weight to calculate the
total power with the formula:

weighted_total_power = w x leakage_power + (1-w) x dynamic_power

Without setting this weight factor, RC only optimizes for leakage power.
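
To see how the weight shifts the balance, the formula can be evaluated by hand or with a few lines of
plain Tcl. The proc name and the power values below are made up for illustration; only the formula
itself comes from this note.

```tcl
# Pure Tcl illustration of the weighted power formula; values are made up.
proc weighted_total_power {w leakage dynamic} {
    return [expr {$w * $leakage + (1.0 - $w) * $dynamic}]
}

# w = 0.25, leakage = 40, dynamic = 200 (same arbitrary power unit):
# 0.25 * 40 + 0.75 * 200 = 10 + 150 = 160
puts [weighted_total_power 0.25 40.0 200.0]   ;# prints 160.0
```

A weight close to 1 makes leakage dominate the weighted total, while a weight close to 0 makes dynamic
power dominate.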

When optimizing for leakage power, RC makes use of multiple threshold voltage (Vth) libraries if they
are available. Cells from the high Vth library are used on non-critical paths to reduce leakage
power. The root attribute ‘lp_multi_vt_optimization_effort’ controls how aggressively
the high Vth cells are used to reduce leakage power.

When optimizing for dynamic power, the user needs to have the switching activities annotated to
increase the accuracy of the dynamic power goal. RC uses a default switching activity of 0.02 toggle/ns
for every non-clock net. This switching activity might not be close to the actual switching activity
obtained when the design is in an operational mode. Therefore, the dynamic power that RC optimizes
for will not reflect the switching power during actual operation.

The trade-off between power and area can be controlled by the root attribute
‘power_optimization_effort’, which is set to ‘medium’ by default. At the ‘high’ setting, RC
optimizes aggressively to improve power at the expense of area.

6.3 Area

By default, RC optimizes for the smallest design that satisfies the timing constraints, so there is no
area constraint (i.e. set_max_area) in RC. The logic on non-critical paths is automatically downsized to
save area. Unused sequential instances are removed by default. It is a good exercise to
synthesize the design without any constraints. That yields the smallest design possible, and the
resulting area can be used as a baseline for area comparison in subsequent runs.

Assuming there is no timing violation in the design, these are possible places to check for area
reduction:

• Artificial DRC: DRC fixing is a source for area increase. Check for artificial max_fanout,
max_capacitance, and max_transition constraints that might have been set in the
design.
• Datapath components: RC usually starts with the very_fast datapath architecture and downgrades
components as needed during optimization. For this reason, do not ungroup datapath components before
mapping, so that they can still be downgraded. During incremental synthesis, the datapath
components can be downsized by enabling the root attribute ‘dp_postmap_downsize’.
However, this operation might increase runtime.
• Resource sharing: use high effort generic synthesis to enable resource sharing.
• Preserved modules: there should not be any preserved modules on non-critical paths, because they
won’t be downsized to save area.
• WLM: don’t use the segmented wireload mode; it gives the most pessimistic view of net
parasitics and area.
• Boundary optimization: disabling boundary_opto might increase area since constant nets won’t
be propagated and optimized.
• Path adjust: the path_adjust method discussed in the Timing section above can be used to
relax the timing (by adding a positive delay number) for the non-critical paths for area saving.

SUMMARY

This application note outlined some best practices and strategies for optimization with
RTL Compiler. We discussed how and why the user should check the input data, pay attention to the
warning messages, and correct any obvious issues. We compared the top-down and bottom-up flows and the
wireload model and PLE approaches, and covered runtime considerations. Finally, we described how to
achieve optimization goals for timing, power, and area.
