
Clock Tree Synthesis for Generated Clocks

By: Kashyap Kansara, ASIC Engineer, eInfochips
Nainesh Shah, ASIC Engineer, eInfochips

With the advent of ultra-deep sub-micron technologies, a whole system can be integrated on a single chip. Designs therefore grow in size, and as complexity increases it becomes challenging to handle large clock structures. New features and higher chip complexity also make the functionality more complex. For traditional designs with a single clock domain, it is relatively easy to build and balance the clock tree. But as the number of clock domain crossings and generated clock domains increases, balancing the clock tree and closing static timing analysis (STA) both become challenging. This article focuses on implementation considerations for multiple generated clock domains. The following guidelines should be followed at the different design stages.

At logical level: If combinational elements are present in the clock path, their effect on the clock frequency must be taken into consideration. At the synthesis stage, care should be taken that the clock path is neither modified nor restructured, so applying hard forces on the clock module is a good idea. For example, apply forces on hand-instantiated cells so that they are neither remapped nor restructured.

At constraints preparation: The designer must check that the correct forces are applied on the clock generation nodes; for example, accurate source latency definitions must be applied. By default, the clock generation source also sees the network latency applied to all other clock sinks in the master clock domain. Any sequential element that is a generated clock source must be tapped with minimum delay, because all sinks in the generated clock domain are delayed by the generated clock's insertion delay. If required, apply a negative network latency on the clock generation source node to account for the clock network delay of the succeeding stage.

For example: ck is the clock pin of the design and drives the CK pin of the generated clock source flop G. Flop F is driven by the generated clock. The network latency for the master clock (ck) is 1000p, and the delay from the output of G to the CK pin of flop F is 400p. Apply:
- 1000p network latency on the master clock (ck)
- 400p network latency on the generated clock
- -400p network latency at the CK pin of the generated clock source G
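The arithmetic behind this example can be checked with a small calculation. The following is only a sketch: the function and constant names are hypothetical, the values are the ones quoted in the text (in picoseconds), and the CK-to-output delay of flop G is assumed to be negligible, as the example itself appears to assume.

```python
# Values in picoseconds, taken from the example in the text.
MASTER_NETWORK_LATENCY = 1000     # network latency on master clock ck
GENERATED_NETWORK_LATENCY = 400   # output of G to CK pin of flop F
NEGATIVE_LATENCY_AT_G = -400      # negative latency at CK pin of G


def target_delay_to_source(master_latency, negative_latency):
    """Target delay from ck to the CK pin of the generated clock source.

    The negative latency pulls the source flop earlier in the tree
    than the ordinary sinks of the master clock.
    """
    return master_latency + negative_latency


def total_insertion_delay(delay_to_source, generated_latency):
    """Insertion delay seen by a flop on the generated clock.

    Assumption: the CK-to-output delay of the source flop is ignored.
    """
    return delay_to_source + generated_latency


delay_to_g = target_delay_to_source(MASTER_NETWORK_LATENCY, NEGATIVE_LATENCY_AT_G)
total = total_insertion_delay(delay_to_g, GENERATED_NETWORK_LATENCY)

print(f"ck to CK pin of G: {delay_to_g}p")
print(f"ck to CK pin of F: {total}p")
```

With these values, the sinks of the generated clock end up with the same insertion delay as the sinks of the master clock, which is the condition needed for the two domains to be balanced.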
Dashboard October 2010 | dashboard@einfochips.com | www.einfochips.com Page 1


So the delay from the ck pin to the CK pin of the generated clock source G would be 600p, and the delay from the output of G to the CK pin of flop F would be 400p. The total insertion delay becomes 1000p.

Once there are large skew violations, it takes a lot of runtime and multiple iterations to fix all of them, and the quality of the design degrades. The designer also needs to consider, for skew balancing, the interaction between a flop driven by a generated clock and another flop driven by its master clock. In such cases, accurate constraints such as source latency and network latency definitions have a large impact on the skew between master and generated clock domain sinks. Higher accuracy in constraints preparation can save runtime and yield better results.

Identify the sinks in the design where multiple phases converge, and remove the convergence by applying proper constant definitions in the constraints. Apply all other appropriate constraints between synchronous and asynchronous clock domains, such as false paths and multicycle paths. Also make sure that the constraints correlate properly between the PNR tool and the sign-off tool.

At placement stage: The placement of the clock generation point must be constrained such that, once the clock tree is built, it can achieve the insertion delay expected at the synthesis stage. If required, clusters should be created to achieve better timing for the generated clock domains as well as their interacting domains.

CTS: The most important point at the CTS stage is to prepare the constraints for clock tree building. Skew anchor definitions must be given to the clock generation nodes in order to tap them early in the clock tree and reduce their insertion delay. Another important point is to define the proper skew relationship between the master and generated clock domains. Ignore the clock pins of secondary clock generation nodes for skew balancing. Balance all sinks of the secondary clock domain with the master clock domain by applying the proper phase relationship. Apply specific network and source latency targets for all the clock paths and try to achieve them. Remove the source latency definitions applied on all generated clock nodes.



Following is a real design example that resembles the above explanation. This design has 67K flops driven by 3 clock domains. The total insertion delay for the block is 2000p. As shown in the figure, the insertion delay is divided into 4 parts.
- If a flop is driven by the root clock, its insertion delay should be 2000p.
- The insertion delay for the flop that generates the clock should be (total insertion delay (2000p) - insertion delay budgeted for the generated clock (1400p)) = 600p.
- The insertion delay for all the flops working on the generated clock should be 1400p.
- The generated clock provides the clock to the flops and also generates the 2nd clock.

The following constraints are defined for the given example.

Source latency definition:

Define source latency for all clocks while the clock tree is in ideal mode. Remove the source latency for the generated clocks once the clock tree switches from ideal mode to computed mode. STA requires the actual arrival time of the clock. If source latency remains defined for a generated clock in computed mode, its arrival time becomes (source latency + actual network latency), so the timing will not be accurate.
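The double counting described above can be illustrated with the numbers of this design example. The sketch below is a simplified model, not the behaviour of any particular tool: it assumes that in computed mode the tool measures the generated clock path all the way from the root, and that CTS exactly met the 600p and 1400p targets. The function name `arrival_time` is hypothetical.

```python
def arrival_time(network_latency, source_latency=0):
    """Clock arrival at a sink = source latency + network latency (ps)."""
    return source_latency + network_latency


# Ideal mode: the tree is not built yet, so both latencies are constraints.
ideal = arrival_time(network_latency=1400, source_latency=600)

# Computed mode (assumption): the tool times the real path from the root,
# through the generating flop, to the sink. Assume CTS met its targets.
measured_from_root = 600 + 1400

# Source latency removed: arrival matches the ideal-mode estimate.
computed_clean = arrival_time(measured_from_root)

# Source latency left in place: the first 600p is counted twice.
computed_stale = arrival_time(measured_from_root, source_latency=600)

print(f"ideal mode:                   {ideal}p")
print(f"computed, latency removed:    {computed_clean}p")
print(f"computed, latency left in:    {computed_stale}p")
```

Under these assumptions, the stale source latency shifts the generated clock arrival by exactly the delay that the built clock tree already contains.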

G1, G2, G3 = clock gates (generating clocks from the main clock).
Source latency for generated clock1 = 600p
Source latency for generated clock2 = (600p + 500p) = 1100p
Source latency for generated clock3 = (1100p + 500p) = 1600p
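The source latencies above form a running sum of the delays between successive clock generation points. A minimal sketch of that accumulation follows, using the values from the text in picoseconds; the variable names are hypothetical, and the internal delay of each generating cell is assumed to be included in (or negligible against) the quoted stage delays.

```python
from itertools import accumulate

# Delay of each stage leading to a clock generation point, in ps:
# root -> G1, G1 -> G2, G2 -> G3 (values from the text).
stage_delays = [600, 500, 500]
clock_names = ["generated clock1", "generated clock2", "generated clock3"]

# The source latency of each generated clock is the sum of all
# stage delays between the root and its generation point.
source_latency = dict(zip(clock_names, accumulate(stage_delays)))

for name, latency in source_latency.items():
    print(f"Source latency for {name} = {latency}p")
```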

Figure 1

Network latency & skew constraints definitions:
- Apply network latency definitions at each clock definition node
- Apply negative latency definitions (or clock balance points) at the CK pins of the clock generating flops
- Apply skew anchor definitions, skew phase relationships and skew balancing constraints
- The results expected post CTS are shown in Figure 2

Total insertion delay = 2000p, insertion delay for main clock = 2000p

ID limit for clock1 = 1400p, for clock2 = 900p, for clock3 = 400p
Delay from CLK to CK pin of G1 = (2000p - 1400p) = 600p
Delay from output of G1 to CK pin of G2 = (1400p - 900p) = 500p
Delay from output of G1 to CK pin of FF23 & FF21 = 1400p
Delay from output of G2 to CK pin of G3 = (900p - 400p) = 500p
Delay from output of G2 to CK pin of FF33 & FF31 = 900p
Delay from output of G3 to CK pin of FF43, FF42 & FF41 = 400p

All constraint definitions related to latency are shown in Figure 1.
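The budget above can be cross-checked by deriving the stage delays from the ID limits and confirming that every sink sees the same total insertion delay. This is a sketch only: the helper names are hypothetical, all values are in picoseconds and come from the text, and the delay through each generating cell is assumed to be negligible.

```python
TOTAL_ID = 2000
# ID limit budgeted for each generated clock, ordered from root outwards.
ID_LIMITS = {"clock1": 1400, "clock2": 900, "clock3": 400}


def stage_delays(total_id, id_limits):
    """Delay of each stage feeding a generation point (root->G1, G1->G2, ...).

    Each stage delay is the difference between consecutive budgets.
    """
    budgets = [total_id] + list(id_limits.values())
    return [prev - cur for prev, cur in zip(budgets, budgets[1:])]


def sink_insertion_delays(total_id, id_limits):
    """Total root-to-sink insertion delay for the flops of each clock."""
    totals = {}
    upstream = 0  # accumulated delay from the root to the generation point
    for (name, limit), stage in zip(id_limits.items(),
                                    stage_delays(total_id, id_limits)):
        upstream += stage
        totals[name] = upstream + limit
    return totals


stages = stage_delays(TOTAL_ID, ID_LIMITS)
totals = sink_insertion_delays(TOTAL_ID, ID_LIMITS)

print("stage delays:", stages)
for name, value in totals.items():
    print(f"{name}: total insertion delay = {value}p")
```

If any total differed from the 2000p of the main clock, the corresponding sinks would be skewed against the main clock domain by that difference.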

Figure 2


The insertion delay (ID) achieved post CTS is shown in Figure 2, and it results in balanced skew. With the above insertion delay values, the design is able to achieve the least skew. With proper constraints, STA closure can be achieved in fewer iterations without missing any logically valid paths. If proper constraints are defined in terms of skew phases and clock phases, PNR tools can build the best possible clock tree, and the results correlate best with the sign-off tools.

Summary: From the points and examples discussed above, we can conclude that, to achieve the best possible clock tree with multiple generated clock domains, it is critical to define constraints such as negative network latency, source latency, skew, and the clock phase relationship between master and generated clock domains. To close STA in the fewest iterations, it is essential to keep all of these parameters in mind at each stage where they are required.

About the Authors:
Kashyap Kansara is an ASIC Engineer at eInfochips Limited, Ahmedabad, India, and can be reached at kashyap.kansara@einfochips.com
Nainesh Shah is an ASIC Engineer at eInfochips Limited, Ahmedabad, India, and can be reached at nainesh.shah@einfochips.com
