
Overview

Eclypse Low Power Solution: Clock Tree Synthesis


IC Compiler CTS Addresses Complex Power Issues
David Hsu, Director of Product Marketing, Synopsys Inc.
Harvey Toyama, Product Marketing Manager, Synopsys Inc.
December 2008

Abstract
With the predominance of mobile devices, the rising cost of energy, and an increasing sensitivity to green practices, low power consumption has become a major concern for design engineers. This paper will outline some best practices for low power design and explain how IC Compiler, a key part of Synopsys Eclypse Low Power Solution, delivers low power clock tree synthesis (CTS) that concurrently achieves the lowest design power and the best possible performance and area.
[Figure 1: Eclypse Low Power Solution. Software and System Exploration: Innovator; DesignWare IP; C, RTL, UPF; Power-aware Verification: VCS with MVSIM, MVRC, HSIM; Power-aware Implementation: Design Compiler, DFT MAX, IC Compiler; Low Power Sign-off: Formality, MVRC, PrimeTime PX, PrimeRail; Services]
Figure 1. Clock Tree Synthesis is a key component of IC Compiler within Eclypse

Low Power Methodology

Introduction
Over the past several years, low power design has steadily moved up engineers' list of design concerns and now sits alongside timing as a major design objective. Several factors are driving low power's ascent. At the consumer level, the explosion in the popularity and capability of handheld systems has made extended battery life a major selling point for portable devices. At the opposite end of the spectrum, huge server farms that handle ever-growing Internet traffic require vast amounts of power, to the point where the cost of energy (for power and cooling) now overshadows the cost of the servers themselves. Rising energy costs and growing awareness of global warming are also increasing sensitivity to power consumption, which has ultimately led to the burgeoning green movement in IC design.

The Eclypse Low Power Solution provides the industry's most complete low power flow, with a versatile portfolio of design and verification tools, intellectual property (IP), and design services, all built on reliable industry standards and silicon-proven methodology. IC Compiler, the physical implementation component of the Galaxy Design Platform and a key component of the Eclypse Low Power Solution, uses an array of low power design techniques to achieve the best possible balance of minimum silicon area, maximum design performance, and lowest overall power consumption. The Eclypse Low Power Solution gives designers the technology, platforms, and services they need to develop power-optimized, leading-edge silicon. IC Compiler and the Eclypse Low Power Solution support the industry-standard Unified Power Format (UPF), which specifies power behavior information throughout the design flow. UPF lets designers describe low power design intent, improving how advanced low power integrated circuits are designed, verified, and implemented. UPF 1.0, an Accellera standard (also pending ratification as IEEE P1801), allows all EDA tool providers to implement advanced tool features that enable the design and verification of low power ICs.
2008 Synopsys, Inc.

Technology Solutions
Before exploring the low power clock tree synthesis technologies in Synopsys IC Compiler, it is important to define the two basic components of power consumption in an IC: dynamic and static power. Dynamic power is the power dissipated when a logic element changes state, that is, on 0-to-1 or 1-to-0 transitions in the design. Static power, also referred to as leakage power, is the power dissipated by device leakage currents, even when the circuit is not switching. As silicon process nodes continue to shrink, static power, which was generally negligible above 180nm, has become a significant source of power loss. Reducing both dynamic and static power is therefore essential in low power design, particularly for designs at 90nm and below.

To reduce both dynamic and static power, many designs today use multiple voltage (MV) supplies, which allow certain regions to operate at lower voltage (and therefore lower power) levels. These regions, referred to as power domains, require separate power rails and different component libraries to accommodate the multiple voltage levels correctly. To further reduce static power, regions can also be shut down completely, which requires MTCMOS cells or other power switching devices to turn power on and off correctly. Some techniques also allow the voltage levels of these MV regions to change dynamically during operation. A full, robust low power flow requires all of these techniques and capabilities to achieve maximum power savings.
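
The effect of voltage, activity, and shutdown on power can be made concrete with the standard first-order CMOS model, in which dynamic power is alpha * C * Vdd^2 * f and leakage power is Vdd * I_leak. The Python sketch below is an illustration under simplifying assumptions that are not from this paper: leakage current is held constant across supply voltages (in practice it also falls with Vdd), lowering Vdd is assumed not to force a lower clock frequency, and a shut-down domain is treated as drawing no power at all. The `PowerDomain` class, its fields, and the numeric values are hypothetical.

```python
"""First-order CMOS power model (textbook approximation, not a Synopsys formula).

Dynamic (switching) power:  P_dyn  = alpha * C * Vdd**2 * f
Static (leakage) power:     P_leak = Vdd * I_leak   (I_leak held constant here)
"""
from dataclasses import dataclass


@dataclass
class PowerDomain:
    name: str
    vdd: float               # supply voltage in volts
    cap_f: float             # total switched capacitance in farads
    freq_hz: float           # clock frequency in hertz
    activity: float          # switching activity alpha (toggles per node per cycle)
    i_leak_a: float          # total leakage current in amperes (assumed constant)
    powered_on: bool = True  # False models a shut-down (power-gated) domain

    def dynamic_power(self) -> float:
        if not self.powered_on:
            return 0.0
        return self.activity * self.cap_f * self.vdd ** 2 * self.freq_hz

    def static_power(self) -> float:
        # Idealized: an off domain leaks nothing; real power switches leak a little.
        if not self.powered_on:
            return 0.0
        return self.vdd * self.i_leak_a

    def total_power(self) -> float:
        return self.dynamic_power() + self.static_power()


# Same hypothetical logic block at a nominal and a reduced supply voltage.
core = PowerDomain("core", vdd=1.0, cap_f=1e-9, freq_hz=500e6, activity=0.1, i_leak_a=10e-3)
low_v = PowerDomain("core_lv", vdd=0.8, cap_f=1e-9, freq_hz=500e6, activity=0.1, i_leak_a=10e-3)
print(core.dynamic_power(), low_v.dynamic_power())
```

In this model, moving the same logic from 1.0 V to 0.8 V cuts dynamic power to 0.64 of its original value (about 0.05 W to 0.032 W here), because dynamic power scales with the square of the supply voltage; shutting the domain off also removes its leakage.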

Managing Clock Resources


Having established the basic components of power consumption and the multi-voltage architectures that reduce overall power, this paper now focuses on the clock resource optimization techniques that designers use to conserve power. Because the clock network typically consumes 30% to 50% of a design's power, it is important to manage clock resources wisely in low power design. In the past, when clocks were relatively slow and power consumption was less of a concern, every net on a clock tree was pulsed on every clock cycle. This synchronous clocking architecture kept toggling clock nets even when the sequential elements they drove had no new data to capture, which was extremely inefficient. Clock skew, the difference in the arrival time of the clock signal at different sequential elements, was minimized using large, high-powered buffers. Today, with clock rates in the gigahertz range and low power a major design goal, dissipating power in the clock tree for sequential elements when no data transitions are occurring is simply an unacceptable waste.

Clock gating is the most commonly used power saving technique in clock tree power management. Simply put, clock gating disables sections of the clock tree that do not need to be clocked on every cycle to maintain the functional integrity of the circuit. Clock gating may be inserted at various stages of the design flow: during RTL compilation, in the gate-level netlist, or after placement. As with other areas of design implementation, clock gating offers the greatest range of features and flexibility at the higher levels of design abstraction. A flow that supports clock gating insertion at RTL and applies additional refinement and optimization techniques throughout the implementation flow produces the best results.

In the physical implementation stage, power-aware placement can further reduce power, which makes it important to analyze how registers are distributed throughout the design.
The best practice is to place registers that share a clock buffer close together. This technique, referred to as clustering, conserves power because it minimizes the required strength of the buffers on each branch of the clock tree. A placement that ignores the proximity of registers to one another and simply scatters them can increase power consumption while also introducing timing and on-chip variation (OCV) problems. An optimal flow also uses clock tree optimization to fine-tune the design after placement but before routing. Within the performance parameters established at the outset of system design, optimization automatically makes trade-offs among skew, insertion delay, and power to achieve the best overall design. In designs where the lowest power is essential, engineers can de-rate the clock, adjust buffer strengths, and fine-tune the number of clock gating cells used.
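
The benefit of clustering can be illustrated with a rough wire-capacitance estimate. The sketch below assumes, purely for illustration, that each clock buffer's net is approximated by the half-perimeter wirelength (HPWL) of the bounding box around the registers it drives, and that wire capacitance grows linearly with that length. The capacitance-per-micrometer value, the buffer names, and the coordinates are hypothetical, and this is not how IC Compiler models clock nets.

```python
"""Illustrative estimate of why clustering registers per clock buffer saves power.

Assumption (not from the paper): each buffer-to-register net is estimated by the
half-perimeter wirelength (HPWL) of the registers' bounding box, and wire
capacitance scales linearly with that length.
"""
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # register (x, y) location in micrometers

WIRE_CAP_FF_PER_UM = 0.2  # hypothetical wire capacitance per micrometer (fF/um)


def hpwl(points: List[Point]) -> float:
    """Half-perimeter of the bounding box enclosing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def clock_wire_cap_ff(groups: Dict[str, List[Point]]) -> float:
    """Total estimated wire capacitance (fF) over all buffer-to-register nets."""
    return sum(hpwl(regs) * WIRE_CAP_FF_PER_UM for regs in groups.values())


# Two hypothetical buffers driving four registers each; only placement differs.
clustered = {
    "buf_a": [(0, 0), (10, 0), (0, 10), (10, 10)],
    "buf_b": [(100, 100), (110, 100), (100, 110), (110, 110)],
}
scattered = {
    "buf_a": [(0, 0), (110, 0), (0, 110), (110, 110)],
    "buf_b": [(10, 10), (100, 10), (10, 100), (100, 100)],
}

print(clock_wire_cap_ff(clustered), clock_wire_cap_ff(scattered))
```

In this toy case both placements use the same two buffers and eight registers, but the clustered placement needs one-tenth of the estimated wire capacitance (8 fF versus 80 fF). Because clock nets switch every cycle, that capacitance translates directly into dynamic power and into the drive strength each buffer needs.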


Finally, even the best set of features requires an accurate analysis infrastructure to achieve optimal results. Clock tree synthesis requires highly accurate parasitic extraction and timing analysis. Because these are usually measured after placement and routing, the analyses must be consistent throughout the flow. Sharing the same analysis engines throughout the flow ensures that early results correlate closely with final results. A design infrastructure or platform that uses common native engines from beginning to end yields the best results in clocking, circuit performance, and power.

With the essential elements of low power clock tree implementation defined, this paper now explores how the Synopsys Eclypse Low Power Solution enables engineers to achieve optimal power, timing, and area in their designs. A key part of the Eclypse Low Power Solution is Design Compiler with Power Compiler, which together perform power-aware RTL synthesis, including clock gating insertion at the highest practical level of design abstraction and flexibility. This power-aware synthesis is driven by design constraints in the Synopsys Design Constraints (SDC) format and can be guided by switching activity information in the Switching Activity Interchange Format (SAIF). Power-driven clock gating does more than insert clock gates: it can also remove clock gates where an ungated clock would result in lower overall power consumption, and it can collapse or expand clock gating levels to achieve an optimally balanced clock gating structure.
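
The decision to keep or remove a clock gate can be framed as a simple power comparison. The sketch below is a hypothetical model, not the Power Compiler algorithm: it treats clock nets as switching every cycle, charges the gating cell a fixed, always-clocked capacitance overhead, and assumes the enable probability `p_enable` has already been derived from switching activity data such as SAIF. Enable-logic power and timing effects are ignored, and the class and function names are illustrative.

```python
"""Sketch of a power-driven keep/remove decision for one clock gate.

Assumptions (hypothetical model, not the Power Compiler algorithm):
- Clock nets switch every cycle, so their dynamic power is C * Vdd**2 * f.
- The gated subtree is clocked only in cycles where its enable is high,
  which happens with probability p_enable (e.g. derived from SAIF data).
- The gating cell adds a fixed, always-clocked capacitance overhead.
"""
from dataclasses import dataclass


@dataclass
class GatedSubtree:
    downstream_cap_f: float     # clock pin + wire capacitance behind the gate (F)
    gate_overhead_cap_f: float  # clocked capacitance added by the gating cell (F)
    p_enable: float             # fraction of cycles in which the enable is high


def clock_power(cap_f: float, vdd: float, freq_hz: float) -> float:
    return cap_f * vdd ** 2 * freq_hz


def power_with_gate(t: GatedSubtree, vdd: float, freq_hz: float) -> float:
    overhead = clock_power(t.gate_overhead_cap_f, vdd, freq_hz)
    gated = t.p_enable * clock_power(t.downstream_cap_f, vdd, freq_hz)
    return overhead + gated


def power_without_gate(t: GatedSubtree, vdd: float, freq_hz: float) -> float:
    return clock_power(t.downstream_cap_f, vdd, freq_hz)


def keep_gate(t: GatedSubtree, vdd: float, freq_hz: float) -> bool:
    """Keep the gate only if it lowers total clock power."""
    return power_with_gate(t, vdd, freq_hz) < power_without_gate(t, vdd, freq_hz)


vdd, freq = 1.0, 1e9
mostly_idle = GatedSubtree(downstream_cap_f=100e-15, gate_overhead_cap_f=5e-15, p_enable=0.2)
mostly_active = GatedSubtree(downstream_cap_f=100e-15, gate_overhead_cap_f=5e-15, p_enable=0.97)
print(keep_gate(mostly_idle, vdd, freq))    # True: the gate saves power
print(keep_gate(mostly_active, vdd, freq))  # False: removing the gate is better
```

Because Vdd and f appear on both sides of the comparison, the break-even point in this model is (1 - p_enable) * downstream_cap_f = gate_overhead_cap_f. With 100 fF of downstream clock load and 5 fF of overhead, the gate pays off only when the enable is high in fewer than 95% of cycles.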

Synopsys IC Compiler CTS


IC Compiler, a single, convergent, chip-level physical implementation tool in the Eclypse Low Power Solution, includes flat and hierarchical design planning, placement and optimization, clock tree synthesis (CTS), and routing, all with low power capabilities. IC Compiler's feature-rich CTS engine, fully integrated into the low power flow, delivers the highest possible performance as measured by skew and insertion delay while concurrently managing clock tree power consumption.

Before optimizing the clock tree itself, IC Compiler performs power-aware placement of the standard cell logic. The placement engine uses SAIF information to assign activity-based weights to the nets in the design, so that high-activity nets are kept short. It also invokes a clock tree estimator to optimize placement through intelligent clustering of registers. Together, these methods place registers driven by the same clock buffer close to one another, which significantly reduces the power consumed by those nets and registers. The IC Compiler CTS engine also evaluates the initial placement of the clock gates to determine whether further optimization is necessary; if so, clock gates may be merged or split to fine-tune the power profile. If multi-voltage or multi-threshold cells are required, they are placed at this time from the appropriate libraries, and power switch cells are used to correctly control any power domains that will be shut down.

Once placement is complete, IC Compiler's CTS uses the Synopsys PrimeTime and Star-RCXT analysis engines to analyze placement, clock tree synthesis, optimization, and routing. PrimeTime and Star-RCXT, universally accepted by customers and foundries for industry-leading sign-off accuracy, are uniquely available to IC Compiler within the tool environment.
IC Compiler CTS automatically arrives at the optimal solution, taking into account logic optimization, placement, clock tree synthesis, and routing while using the same timing analysis engine throughout. Options can be set to apply non-default rules (NDRs) for layer usage, spacing, wire width, and shielding. NDRs allow clock net capacitance to be minimized, resulting in faster performance, better crosstalk immunity, and lower power; the detailed router enforces these rules after clock tree synthesis is complete. NDRs may also specify lower-resistance layers to reduce power. Clock shielding provides a secure clock structure that reduces coupling effects. Shielding can be done laterally, with shields on either side of the clock signal, or, for more critical clock lines, coaxially, with shields above and below as well as on the same layer. Clock tree optimization then examines the design to reduce buffer count and size where possible to further conserve power. In its analysis, the optimization step takes into account wire length and the signal strength needed to drive each buffer. Based on that information, it can automatically reduce buffer area and select lower-drive-strength buffers to reduce power.
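
The downsizing step can be sketched as choosing the weakest buffer that still meets a transition limit for its load. The library cells, resistance and area values, and the first-order transition estimate (drive resistance times load capacitance) below are hypothetical simplifications chosen for illustration; IC Compiler's actual optimization relies on the timing and extraction engines described above rather than this model.

```python
"""Sketch of buffer downsizing: pick the weakest buffer that still meets a limit.

Assumptions (hypothetical library and first-order RC model, not IC Compiler's):
- A buffer's output transition is approximated as drive_res_ohm * load_cap_f.
- Weaker buffers have higher drive resistance but smaller area, so they use
  less power whenever the load allows them to be used.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Buffer:
    name: str
    drive_res_ohm: float
    area_um2: float


# Hypothetical clock buffer library, ordered from weakest to strongest.
LIBRARY: List[Buffer] = [
    Buffer("CKBUF_X1", 4000.0, 1.0),
    Buffer("CKBUF_X2", 2000.0, 1.6),
    Buffer("CKBUF_X4", 1000.0, 2.8),
    Buffer("CKBUF_X8", 500.0, 5.2),
]


def transition_s(buf: Buffer, load_cap_f: float) -> float:
    """First-order output transition estimate in seconds."""
    return buf.drive_res_ohm * load_cap_f


def pick_buffer(load_cap_f: float, max_transition_s: float) -> Optional[Buffer]:
    """Return the smallest-area buffer meeting the transition limit, or None."""
    candidates = [b for b in LIBRARY if transition_s(b, load_cap_f) <= max_transition_s]
    return min(candidates, key=lambda b: b.area_um2, default=None)


for load_ff in (20, 60, 200):
    choice = pick_buffer(load_ff * 1e-15, max_transition_s=50e-12)
    print(load_ff, "fF ->", choice.name if choice else "no single buffer meets the limit")
```

With a 50 ps limit, a light 20 fF load gets CKBUF_X2 rather than a larger cell, a 60 fF load needs CKBUF_X8, and a 200 fF load exceeds every cell in this library, which in a real flow would call for splitting the load across more buffers.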


New Clock Mesh Technology


For designs with the highest performance requirements, IC Compiler CTS also provides Clock Mesh as an alternative to the conventional clock tree topology. As its name implies, the structure is a grid resembling the mesh of a sieve or fabric, with clock straps traversing the block or die in horizontal rows and vertical columns. Multiple drivers dispersed across the fabric drive the mesh evenly. Because the outputs of these drivers are shorted together in the mesh, variations tend to be smoothed out, and the result is near-zero clock skew at the mesh inputs to the lower-level buffers and clock gates. One of the main benefits of Clock Mesh technology is this ability to achieve much lower clock skew than a conventional clock tree topology. Clock Mesh technology also offers significant improvements in tolerance to on-chip variation (OCV), which is of great concern at 45nm and below. Clock Mesh is inherently variation tolerant because most of each clock path (from the root to the lower-level buffer or clock gate) is shared among all loads; only the final buffer-to-load paths are unique in a Clock Mesh flow. So although the mesh itself can consume more power than an equivalent tree implementation, Clock Mesh technology can still be a critical component in achieving the highest design performance goals as part of a comprehensive low power methodology. Figure 2 shows an example of a Clock Mesh latency map, with less than 16ps (picoseconds) of skew across the entire chip.

Figure 2. New Clock Mesh technology minimizes clock skew across the entire chip
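
The skew-averaging argument for Clock Mesh can be illustrated with a toy calculation. The model below is an assumption made for illustration, not IC Compiler's Clock Mesh analysis: in the tree, each sink sees its own branch delay plus a local last-stage delay, while in the mesh the shorted driver outputs are approximated by the average of the driver delays, so only the local delays differ between sinks. The delay values are hypothetical, and a real mesh typically needs circuit-level simulation to analyze accurately.

```python
"""Toy comparison of clock skew in a tree versus a mesh.

Assumptions (illustrative model, not IC Compiler's Clock Mesh analysis):
- Each tree branch / mesh driver has its own delay, including OCV.
- Tree: each sink sees only its own branch delay plus a local delay.
- Mesh: shorted driver outputs are approximated by the average driver delay,
  so every sink shares that delay and only the local (last buffer-to-load)
  delay differs.
"""
from typing import List


def skew(latencies: List[float]) -> float:
    """Skew is the spread between the latest and earliest clock arrival."""
    return max(latencies) - min(latencies)


def tree_latencies(branch_delays_ps: List[float], local_delays_ps: List[float]) -> List[float]:
    # Sink i is driven through branch i % len(branch_delays_ps).
    n = len(branch_delays_ps)
    return [branch_delays_ps[i % n] + d for i, d in enumerate(local_delays_ps)]


def mesh_latencies(driver_delays_ps: List[float], local_delays_ps: List[float]) -> List[float]:
    shared = sum(driver_delays_ps) / len(driver_delays_ps)  # averaging approximation
    return [shared + d for d in local_delays_ps]


driver_delays = [100.0, 108.0, 95.0, 103.0]  # hypothetical driver delays (ps)
local_delays = [20.0, 22.0, 21.0, 19.0, 20.0, 23.0, 21.0, 20.0]  # hypothetical (ps)

print(skew(tree_latencies(driver_delays, local_delays)))  # 15.0
print(skew(mesh_latencies(driver_delays, local_delays)))  # 4.0
```

With these numbers the tree shows 15 ps of skew because the 13 ps spread among branch delays adds to the local spread, whereas the mesh shows 4 ps, the spread of the local delays alone.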

Conclusion
Today's design engineers face a growing need to conserve power. To meet low power requirements while still meeting timing, cost, and time-to-market goals, they need a comprehensive, easy-to-use solution that takes them from RTL to sign-off. They also need robust low power IP and design services to help them meet their toughest design challenges. The Synopsys Eclypse Low Power Solution, which includes Design Compiler, Power Compiler, and IC Compiler, provides all of these elements. At the heart of the implementation flow, the clock tree synthesis engine analyzes the timing, area, and power trade-offs of the design and optimizes the clock network during physical layout to produce the highest-performance design with the lowest area and power. By integrating these technologies into one overall solution, the Eclypse Low Power Solution lets designers stay focused on solving low power issues without leaving the design environment for cumbersome point tools. The Synopsys Eclypse Low Power Solution with IC Compiler gives engineers access to the golden timing and extraction engines of PrimeTime and Star-RCXT, and provides everything they need to achieve predictable low power design success.

700 East Middlefield Road, Mountain View, CA 94043 T 650 584 5000 www.synopsys.com 2008 Synopsys, Inc. Synopsys, the Synopsys logo, DesignWare, Galaxy, Design Compiler, PrimeTime, and PCI Express are registered trademarks and Eclypse, Power Compiler, and Star-RCXT are trademarks of Synopsys, Inc. All other products or service names mentioned herein are trademarks of their respective holders and should be treated as such. Printed in the U.S.A. 12/08.CE.WO.08-16845
