
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 3, MARCH 2011

Using Launch-on-Capture for Testing Scan Designs Containing Synchronous and Asynchronous Clock Domains
Shianling Wu, Member, IEEE, Laung-Terng Wang, Fellow, IEEE, Xiaoqing Wen, Senior Member, IEEE,
Zhigang Jiang, Lang Tan, Yu Zhang, Yu Hu, Member, IEEE, Wen-Ben Jone, Senior Member, IEEE,
Michael S. Hsiao, Senior Member, IEEE, James Chien-Mo Li, Member, IEEE,
Jiun-Lang Huang, Member, IEEE, and Lizhen Yu, Member, IEEE

Abstract: This paper presents a hybrid automatic test pattern
generation (ATPG) technique using the staggered launch-on-capture (LOC) scheme followed by the one-hot LOC scheme
for testing delay faults in a scan design containing asynchronous
clock domains. Typically, the staggered scheme produces small
test sets but needs long ATPG runtime, whereas the one-hot
scheme takes short ATPG runtime but yields large test sets. The
proposed hybrid technique is intended to reduce test pattern
count with acceptable ATPG runtime for multi-million-gate scan
designs. In case the scan design contains multiple synchronous
clock domains, each group of synchronous clock domains is
treated as a clock group and tested using a launch aligned or
a capture aligned LOC scheme. By combining these schemes
together, we found the pattern counts for two large industrial
designs were reduced by approximately 1.7X to 2.1X, while the
ATPG runtime was increased by 10% to 50%, when compared
to the one-hot clocking scheme alone.
Manuscript received September 11, 2009; revised September 7, 2010;
accepted September 15, 2010. Date of current version February 11, 2011. This
work was supported in part by the National Science Foundation of America,
under Grant CCF-0541103, and by the Japan Society for the Promotion of
Science Grant-in-Aid for Scientific Research (B) 22300017. This paper was
recommended by Associate Editor F. Lombardi.
S. Wu is with SynTest Technologies, Inc., Princeton, NJ 08550 USA, and
also with the Department of Creative Informatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan (e-mail: shianlingwu@syntest.com).
L.-T. Wang is with SynTest Technologies, Inc., Sunnyvale, CA 94086 USA,
and also with the Department of Electrical Engineering, National Taiwan
University, Taipei 106, Taiwan (e-mail: wang@syntest.com).
X. Wen is with the Department of Creative Informatics, Kyushu
Institute of Technology, Iizuka, Fukuoka 820-8502, Japan (e-mail:
wen@cse.kyutech.ac.jp).
Z. Jiang is with the ATPG Research and Development Group, SynTest
Technologies, Inc., Sunnyvale, CA 94086 USA (e-mail: zjiang@syntest.com).
L. Tan, Y. Zhang, and L. Yu are with SynTest Technologies, Inc., Shanghai
201200, China (e-mail: ltan@syntest.com.cn; yzhang@syntest.com.cn;
lzyu@syntest.com.cn).
Y. Hu is with the Institute of Computing Technology, Chinese Academy of
Sciences, Beijing 100190, China (e-mail: huyu@ict.ca.cn).
W.-B. Jone is with the Department of Electrical and Computer Engineering, University of Cincinnati, Cincinnati, OH 45221 USA (e-mail:
wjone@ececs.uc.edu).
M. S. Hsiao is with the Department of Electrical and Computer Engineering,
Virginia Polytechnic Institute and State University, Blacksburg, VA 24061
USA (e-mail: mhsiao@vt.edu).
J. C.-M. Li and J.-L. Huang are with the Department of Electrical
Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan (e-mail: cmli@cc.ee.ntu.edu.tw;
jlhuang@cc.ee.ntu.edu.tw).
Digital Object Identifier 10.1109/TCAD.2010.2092510

Index Terms: Aligned launch-on-capture, at-speed scan
testing, double-capture, hybrid launch-on-capture, launch-on-capture, one-hot launch-on-capture, staggered launch-on-capture.

I. Introduction

SCAN DESIGN is a design-for-testability (DFT) technique
in which the storage elements in a sequential circuit
are converted into scan cells and then these scan cells are
stitched together to form scan chains during scan testing [1]-[5]. By reconfiguring all storage elements into scan cells, the
complexity of automatic test pattern generation (ATPG) for
sequential circuits is transformed to that of manageable ATPG
for combinational circuits. Since the late 1990s, scan design
has become the most widely used DFT technique.
In recent years, with shrinking device geometry due to
advances in design and manufacturing technologies, circuits
containing millions or tens of millions of logic gates have
become common. While the scan design technique has offered
many benefits, it is now becoming a bottleneck for such
large designs due to the associated explosive increase in
test data volume. To fully detect defects in manufactured
chips, the amount of scan test data can easily overflow the
storage capacity of automatic test equipment (ATE). These all
contribute to an increase in test cost.
Traditionally, one of the most popular capture-clocking
schemes is one-hot clocking, in which every clock domain
is tested one by one. This scheme, however, often only saves
ATPG runtime but results in many more test patterns than
expected. Another scheme is simultaneous clocking in which
all clock domains are tested in parallel as long as data
propagating across clock domains are marked with unknown
(X) values whenever needed during ATPG. This scheme can
result in small pattern count but may lead to significant fault
coverage loss caused by the Xs.
In this paper, we first propose two capture-clocking
schemes, namely aligned clocking and staggered clocking,
which can be used to remedy the problems found in one-hot
clocking and simultaneous clocking. For ease of explanation,
we consider only delay faults, such as transition and path-delay
faults. Aligned clocking is mainly used for testing synchronous

0278-0070/$26.00 © 2011 IEEE


Fig. 1. Basic at-speed test schemes. (a) Launch-on-shift (a.k.a. skewed-load). (b) Launch-on-capture (a.k.a. broad-side or double-capture).

clock domains; whereas staggered clocking is for testing asynchronous clock domains. Next, we propose to partition clock
domains into clock groups, each of which contains a group of
synchronous clock domains or noninteracting asynchronous
clock domains. Synchronous clock domains are a group of
clock domains whose frequencies have integer multiple relations, e.g., 25, 50, and 100 MHz. Asynchronous clock domains
are a group of clock domains whose frequencies are totally
unrelated, e.g., 30, 48, and 100 MHz. By clock grouping, we
can effectively reduce the number of clock controls during
ATPG. We then analyze why using the staggered clocking
scheme alone achieves smaller pattern count but its ATPG
runtime could be much longer. Lastly, we demonstrate that
using a hybrid scheme, which combines staggered clocking
and one-hot clocking, we can reduce pattern counts by 1.7X to
2.1X for two large industrial designs while the ATPG runtime
is increased by 10% to 50%, when compared to the case of
using the one-hot clocking scheme alone.
The rest of this paper is organized as follows. Section II
discusses two basic test timing control diagrams for detecting
delay faults. Section III presents the proposed hybrid launch-on-capture (LOC) schemes. Section IV discusses the hybrid
ATPG techniques. Section V shows results on two industrial
designs, and Section VI concludes this paper.

II. Background
There are two basic capture-clocking schemes for testing
multiple clock domains at-speed: 1) launch-on-capture (LOC),
and 2) launch-on-shift (LOS). LOC was referred to as broad-side in [6] or double-capture in [4]. LOS was referred to as
skewed-load in [7]. Both schemes are helpful in detecting
structural faults and delay faults within each clock domain
(called intra-clock-domain faults) or across clock domains
(called inter-clock-domain faults). Delay faults include transition faults and path-delay faults.

Fig. 2. One-hot LOC.

Fig. 3. Simultaneous LOC.

Unlike the LOS technique, which uses the last shift clock
pulse to launch a transition, LOC uses a capture clock pulse
to launch the transition. Fig. 1 shows the two basic at-speed
test schemes. Typically, at-speed delay fault testing of a
scan-based BIST design based on LOS can achieve
higher fault coverage with a smaller test pattern count [8]-[14].
The problems are that LOS can cause unwanted overtesting because more false paths may be exercised, and incur
higher implementation costs because the scan enable signal
SE must be operated at-speed for each clock domain. This
is in sharp contrast to LOC in which only a slow-speed,
global scan enable signal GSE for all clock domains is
needed.
A. One-Hot Launch-on-Capture
Using the one-hot LOC approach, a launch clock pulse
followed by a capture clock pulse is applied to only one
clock domain during each capture window, while all other
clocks are held inactive. An example timing diagram is
shown in Fig. 2. It applies two capture pulses (C1-followed-by-C2 or C3-followed-by-C4) at their respective clock domains' frequencies (of period d1 or d2) to detect intra-clock-domain delay faults, and uses a single, slow-speed
GSE to drive both clock domains. Thus, this approach can
be used for at-speed testing of intra-clock-domain delay
faults. The major disadvantage of one-hot LOC is long test
time.
B. Simultaneous Launch-on-Capture
The long test time problem of one-hot LOC can be resolved
by using the simultaneous LOC scheme illustrated in Fig. 3.
The simultaneous LOC scheme allows testing to be performed
on all clock domains in parallel, which is quite helpful when
clock domains do not interact with each other. For clock
domains where data may propagate from one clock domain
to the other, the values of source scan cells in the originating
clock domains will have to be forced to Xs during ATPG
in order to avoid any pattern mismatch. This could cause
significant fault coverage loss.


Fig. 4. Capture aligned LOC.

III. Proposed Test Timing Control


This section proposes test timing control methods for
capture-clocking. Techniques to improve fault coverage and
reduce pattern count and ATPG runtime are discussed in
Section IV.
A. Aligned Launch-on-Capture for Synchronous Domains
The X-masking problem in simultaneous LOC can be mitigated by using the proposed aligned LOC scheme for synchronous clock domains, where the clock frequency in either
of two clock domains is an integer multiple of the other. The
aligned LOC scheme is effective in reducing the sequential
depth of the capture clocks as opposed to the staggered scheme
(to be described later) in the capture window. Hence, test
time can become shorter. Also, the scheme can detect inter-clock-domain faults among all synchronous clock domains
simultaneously.
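The integer-multiple condition that qualifies clock domains for aligned LOC can be checked mechanically. The following is a minimal sketch, assuming frequencies are given as integers in MHz; the function name `is_synchronous` is hypothetical, and the check looks only at frequency ratios, not at whether the clocks are actually derived from a common reference.

```python
def is_synchronous(freqs_mhz):
    """Return True if every pair of frequencies is related by an integer multiple.

    freqs_mhz: iterable of positive integers (MHz). This is a ratio check only;
    it does not verify that the clocks share a reference clock.
    """
    fs = sorted(freqs_mhz)
    # Compare each frequency with every higher one.
    return all(hi % lo == 0 for i, lo in enumerate(fs) for hi in fs[i + 1:])


if __name__ == "__main__":
    print(is_synchronous([25, 50, 100]))   # example frequencies from the Introduction
    print(is_synchronous([30, 48, 100]))   # unrelated frequencies from the Introduction
```

With the example frequencies from the Introduction, the first set passes the check and the second does not, which matches the synchronous and asynchronous labels used there.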
There are two possible ways to implement the aligned LOC
scheme, namely, capture aligned LOC and launch aligned
LOC. Fig. 4 shows the timing of the capture aligned LOC
scheme. The major advantage of this approach is that all intra-clock-domain and inter-clock-domain faults can be tested. A
Ci to C arrow in Fig. 4 represents a set of delay faults that
can be detected by a pair of clocks. For example, there are
three arrows from C1 to C. The horizontal arrow from C1
to C represents those intra-clock-domain delay faults within
the clock domain CK1. The other two arrows represent those
inter-clock-domain delay faults from CK1 to CK2 and from
CK1 to CK3, respectively. The remaining six arrows can be
interpreted in the same manner.
Since the active edges (rising edges) of the three capture
pulses (see dashed line C) must be aligned precisely, the
circuit must contain one reference clock, and the frequency of
all remaining test clocks must be derived from the reference
clock. In the example given here, CK1 is the reference clock
operating at the highest frequency, and CK2 and CK3 are
derived from CK1 and designed to operate at 1/2 and 1/4 the
frequency of CK1, respectively. Therefore, this approach is
only applicable for at-speed testing of intra-clock-domain and
inter-clock-domain delay faults in synchronous clock domains.
A similar aligned LOC approach is shown in Fig. 5 that
aligns all first capture edges (i.e., the launch edges) rather than
second capture edges. This approach is referred to as launch
aligned LOC. Similar to capture aligned LOC, it is also only
applicable for at-speed testing of intra-clock-domain and inter-clock-domain delay faults in synchronous clock domains.

Fig. 5. Launch aligned LOC.

Consider the three clock domains, driven by CK1, CK2, and
CK3, again. The eight arrows among the dashed line C and
the three capture pulses, C1, C2, and C3, represent the intra-clock-domain and inter-clock-domain delay faults detected by
the corresponding clocks. Note that in order to detect the inter-clock-domain delay faults from CK1 to CK3, a special capture
pulse C4 is required. As this method requires a much more
complex timing-control diagram, a clock suppression circuit
similar to those proposed in [15]-[19] is needed to enable or
disable the selected capture pulses. The dotted clock pulses
shown in the figure indicate the suppressed capture pulses.
The main advantages of both aligned LOC approaches are
that: 1) all intra-clock-domain faults and inter-clock-domain
faults can be detected, and 2) a single, slow-speed GSE is
used. Hence, both approaches can be used for true at-speed
testing of synchronous clock domains. However, one major
drawback is that precise alignment of the capture pulses is
still required.
B. Staggered Launch-on-Capture for Asynchronous Domains
The staggered LOC scheme relaxes the capture alignment
requirement of the aligned LOC approaches [20],
[21]. A test timing control example is shown in Fig. 6. In this
figure, capture pulses C1-followed-by-C2 and C3-followed-by-C4 are applied in a sequential or staggered order in the capture
window to test all intra-clock-domain faults and inter-clock-domain structural faults in the two clock domains. A daisy-chain clock-triggering or token-ring clock-enabling technique
similar to that described in [22] can be employed to generate
the ordered sequence of capture clock pulses.
Although this figure only shows the case of C1-followed-by-C2 occurring before C3-followed-by-C4, the reversed
order is also feasible. We will explain the selection of clock
order in Section IV-B. The two capture pulses (C1 and
C3) are used to launch transitions at the outputs of some
scan cells, and the output responses to these transitions are
captured by the following two capture pulses (C2 and C4),
respectively. Both delays d2 and d4 are set according to the
operating frequency of their respective clock domains. Since
d1, d3, and d5 do not affect the detection of delay faults,
we can simply use a single, slow-speed GSE for driving all
clock domains. Hence, this scheme can be used to test all
intra-clock-domain faults and inter-clock-domain structural
faults in asynchronous clock domains.


Fig. 6. Staggered LOC.

When the logic crossing synchronous clock domains is
tested, d3 must satisfy the specified timing relation between
the two clock domains. However, there can be some delay
fault coverage loss among clock domains if a fixed order of
capture clocks is used across all capture windows. This fault
coverage loss is mostly related to sequentially redundant faults
that can be detected when one-hot clocking is employed.
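The defining property of staggered LOC, that launch/capture pulse pairs from different clocks never coincide within one capture window, can be illustrated with a small generator. This is a sketch under stated assumptions: it borrows the bit-string notation that Section IV-C uses for Fig. 8(a), gives every clock the same four-column slot holding the pattern "0101", and ignores the real delays d1 through d5. The function names are hypothetical.

```python
def staggered_window(clock_order):
    """Build one capture window with a disjoint launch/capture pair per clock.

    clock_order: clock names in the order they should be pulsed.
    Each clock gets a four-column slot "0101" (idle, launch, idle, capture);
    a trailing idle column closes the window. Slot width is an assumption.
    """
    n = len(clock_order)
    spec = {}
    for i, clock in enumerate(clock_order):
        spec[clock] = "0000" * i + "0101" + "0000" * (n - 1 - i) + "0"
    return spec


def pulses_are_disjoint(spec):
    """Return True if no column has more than one clock at 1."""
    return all(column.count("1") <= 1 for column in zip(*spec.values()))


if __name__ == "__main__":
    window = staggered_window(["CK1", "CK2"])
    for name, bits in window.items():
        print(f"%{name} = {bits};")
    print(pulses_are_disjoint(window))
```

For two clocks, the generated strings coincide with the %CK1 and %CK2 strings given for Fig. 8(a) in Section IV-C; reversing the input list yields the other staggered order.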
IV. Hybrid ATPG Techniques
This section discusses the hybrid ATPG techniques that
can be used to reduce pattern count with reasonable ATPG
runtime. First, clock grouping and clock ordering along with
clock specification are performed. Then, the proposed hybrid
staggered-followed-by-one-hot ATPG scheme, combining the
staggered LOC and one-hot LOC approaches, based on the
clock grouping results, is discussed.
A. Clock Grouping
The first step in reducing ATPG runtime of testing a scan
design is to identify all asynchronous clock domains that
do not interact with each other or run in a synchronous
manner. For data paths that originate and terminate at different
asynchronous clock domains, additional care must be taken in
terms of the way the clocks are applied, in order to guarantee
the success of the capture operation. This is mainly due to the
fact that the clock skew between different clock domains is
typically nondeterministic. A data path originating in one clock
domain and terminating in another might result in erroneous
captured values when both clocks are pulsed simultaneously,
and the clock skew between the two clocks is larger than
the data path delay from the originating clock domain to the
terminating clock domain. In order to avoid the mismatch, the
timing governing the relationship of such a data path shown
in the following equation must be observed:
    Clock-skew < Data-path-delay + Originating-Clock-to-Q delay.
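The timing relation above can be expressed as a simple predicate. The sketch below assumes all three quantities are available in the same time unit (nanoseconds here); the function and parameter names are hypothetical, and the numeric values in the usage lines are invented for illustration only.

```python
def capture_is_safe(clock_skew_ns, data_path_delay_ns, clock_to_q_ns):
    """Return True if Clock-skew < Data-path-delay + Originating-Clock-to-Q delay.

    When this is False, pulsing both clocks simultaneously may capture a
    wrong value, so the two domains should not share a capture edge.
    """
    return clock_skew_ns < data_path_delay_ns + clock_to_q_ns


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the paper.
    print(capture_is_safe(0.3, 0.5, 0.2))  # skew smaller than path + clock-to-Q
    print(capture_is_safe(0.8, 0.5, 0.2))  # skew too large for simultaneous capture
```

Because inter-domain skew is typically nondeterministic, such a check can rarely be trusted for asynchronous domains, which is the motivation for applying their clocks sequentially instead.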
If the above relation does not hold, a mismatch may occur
during the capture operation. In order to prevent this problem,
clocks belonging to different clock domains can be applied
sequentially (using the staggered clocking scheme), as opposed
to simultaneously, such that any clock skew which exists
between the clock domains can be ignored during the test
generation process. It is also possible to apply only one
clock during each capture operation using the one-hot clocking
scheme. However, almost all designs have noninteracting clock
domains whose clocks can be applied simultaneously to reduce the
complexity and final pattern count of the pattern generation
and fault simulation processes. Clock grouping is a process
used to analyze all data paths in the circuit in order to
determine all independent or noninteracting clocks, which can
be grouped and applied simultaneously. Note that we can
still group two noninteracting asynchronous domains together
when the operating frequencies are different. This is
because an on-chip clock is typically used for controlling each
asynchronous clock domain.

Fig. 7. Clock grouping example.

An example of the clock grouping process is shown in
Fig. 7. This example shows the results of performing a circuit
analysis operation on a scan design to identify all clock domain
interactions, where an arrow indicates a data transfer from
one clock domain to a different clock domain. As shown in
Fig. 7, the circuit in this example has seven clock domains,
CD1 through CD7, and five crossing-clock-domain data paths, CCD1
through CCD5. From this example it can be seen that CD2 and CD3
are independent from each other, and hence their related clocks
can be applied simultaneously during test of CK2. Similarly,
the clocks of clock domains CD4 through CD7 can also be applied simultaneously during test of CK3. Therefore, in this example, three
grouped clocks instead of seven individual clocks can be used
to test the circuit during the capture operation.
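The grouping step described above can be sketched as a greedy partition of an interaction graph. This is an illustrative assumption, not the authors' program: the paper does not state which algorithm is used, the function name `group_clocks` is hypothetical, and the crossing list below is invented (it is not the topology of Fig. 7).

```python
def group_clocks(domains, crossings):
    """Greedy first-fit grouping of clock domains.

    domains:   list of domain names, visited in the given order.
    crossings: list of (source, destination) pairs, one per data path that
               crosses from one domain to another.
    A domain joins the first group that contains no domain it exchanges
    data with; otherwise it starts a new group.
    """
    interacts = {d: set() for d in domains}
    for src, dst in crossings:
        interacts[src].add(dst)
        interacts[dst].add(src)

    groups = []
    for d in domains:
        for group in groups:
            if interacts[d].isdisjoint(group):
                group.append(d)
                break
        else:
            groups.append([d])
    return groups


if __name__ == "__main__":
    # Hypothetical design: seven domains, five crossing paths.
    domains = ["D1", "D2", "D3", "D4", "D5", "D6", "D7"]
    crossings = [("D1", "D2"), ("D1", "D3"), ("D2", "D4"),
                 ("D3", "D5"), ("D1", "D4")]
    for i, group in enumerate(group_clocks(domains, crossings), start=1):
        print(f"group {i}: {group}")
```

First-fit is order dependent and does not guarantee the minimum number of groups; it only guarantees that no two domains in the same group exchange data, which is the property the capture operation relies on.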
B. Clock Ordering
Each clock group thus consists of a group of noninteracting
asynchronous clock domains or a group of synchronous clock
domains running at frequencies that are integer multiples of one another. As each
clock group varies in gate count (circuit size), the order of
these clock groups plays an important role in the circuit's fault
coverage that can be obtained.
There are n! different ways to order n clock groups when
staggered clocking is employed. Using the gate count of each
clock group as an ordering criterion, a common approach
is to place the clock groups either in descending order or
in ascending order. Although it is difficult to predict which
of the n! clock orders would give the best result [23], [24],
one logical reasoning would be that the descending order can
yield better results than the ascending order. In the staggered
approach, the clock groups that receive their capture clock
pairs later are at a disadvantage due to higher sequential depth
during ATPG. This is because their generated patterns should
traverse through a larger number of clock cycles to justify
values through the storage elements of other clock groups
that received their clock pairs earlier. Thus, larger sized clock
groups should receive their clock pairs earlier, so justification

can be done earlier, resulting in better fault coverage, pattern
count, and ATPG runtime overall.
Assume the gate count of grouped clock domain GCD1
is larger than that of grouped clock domain GCD2.
Fig. 8(a) and (b) show the clock order of CK1 (controlling
GCD1) and CK2 (controlling GCD2) in descending order and
ascending order, respectively.

Fig. 8. Clock order when GCD1 is larger than GCD2. (a) In descending
order. (b) In ascending order.

C. Clock Specification
The identified ordered clock groups can now be used for
capture-clocking using the basic LOC schemes described
in the previous two sections. During ATPG, we specify the
clock pulses in the capture window according to the given
clock order. For instance, in Fig. 8(a), we can specify the two
clocks and GSE as
%CK1 = 010100000;
%CK2 = 000001010;
%GSE = 000000000.
Similarly, in Fig. 5, we can specify the clock order as
%CK1 = 01010001000000;
%CK2 = 01100110000000;
%CK3 = 01111000011110;
%GSE = 00000000000000.
The idea of specifying the clocks in the above format is
to allow the ATPG program to properly perform circuit
expansion (time-frame expansion) prior to ATPG, depending
on whether some clocks may have overlapping or nonoverlapping clock pulses. For example, 10 rather than 14 time
frames will be expanded for the ATPG process of Fig. 5 shown
above, so as not to have two or more consecutive identical clock phases
(columns), like 0100 or 0010. This is particularly helpful
when an aligned LOC approach is used for testing synchronous
clock domains.
Because all of the above-mentioned clocking schemes can
cause fault coverage loss, large pattern count, or long ATPG
runtime, we propose a hybrid scheme below to reduce pattern
count with reasonable ATPG runtime while maintaining the
same fault coverage as the one-hot clocking scheme.
D. Staggered-Followed-by-One-Hot ATPG
The hybrid approach is to apply the staggered LOC scheme
in the first phase and the one-hot LOC scheme in the second
phase. During the first phase, all clock groups are specified in
a predetermined, sequential, or staggered order. ATPG is then
performed based on the given staggered order. In order to
reduce ATPG runtime, circuit expansion based on the ordered
sequences of clock groups is done on the scan design during
a preprocessing step. Since the staggered clocking scheme
specifically deploys physically disjoint capture clock pulses
from different clock domains (in our case, different clock
groups), there is no need to insert Xs at the fanout branches
of each originating flipflop in any originating clock domain.
Therefore, this staggered approach will not create unnecessary
Xs that would complicate response compaction in a compression design. Since staggered clocking can cause the ATPG program
to mark hard-detected faults as untestable or undetected due
to the ordered sequence of clock groups, the second phase
running one-hot ATPG is required to detect those missed
faults.
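The circuit-expansion preprocessing step mentioned above can be related to the clock specification strings of Section IV-C. One reading that is consistent with the "10 rather than 14 time frames" figure quoted there is that runs of identical columns across all signals collapse into a single time frame. The sketch below implements that reading; it is an assumption about the preprocessing, not a description of the authors' tool, and the function name is hypothetical.

```python
def expand_time_frames(spec):
    """Collapse runs of identical columns of a clock specification.

    spec: dict mapping signal name to a bit string; all strings must have
          the same length.
    Returns the list of columns that remain after merging consecutive
    duplicates. Each remaining column is taken to be one time frame.
    """
    if len({len(bits) for bits in spec.values()}) != 1:
        raise ValueError("all signals must have the same number of columns")
    columns = list(zip(*spec.values()))
    return [col for i, col in enumerate(columns)
            if i == 0 or col != columns[i - 1]]


if __name__ == "__main__":
    # Specification given in Section IV-C for the launch aligned LOC of Fig. 5.
    fig5 = {
        "CK1": "01010001000000",
        "CK2": "01100110000000",
        "CK3": "01111000011110",
        "GSE": "00000000000000",
    }
    frames = expand_time_frames(fig5)
    print(len(frames), "time frames out of", len(fig5["CK1"]), "columns")
```

Applied to the Fig. 5 specification, the two runs that collapse are the repeated columns 0100 and 0010 (read as CK1, CK2, CK3, GSE), which are the two patterns named in Section IV-C.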
During ATPG, fault coverage, pattern count, and ATPG
runtime are closely monitored in the program to determine the
timing to switch over from staggered ATPG to one-hot ATPG.
The switch-over criteria of this two-phase capture-clocking
scheme can be made more intelligent, e.g., by monitoring the
increment in fault coverage and runtime versus pattern count
or a percentage of faults already processed. All such rule sets
are used in the program to automatically determine a switch-over point that achieves a balance between ATPG runtime and
pattern count.
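The paper does not disclose the actual rule sets, so the following is only a sketch of one plausible switch-over rule of the kind described above: leave the staggered phase once the most recent patterns stop adding meaningful fault coverage. The function name, the window size, and the gain threshold are all hypothetical.

```python
def should_switch_to_one_hot(coverage_history, window=32, min_gain=0.01):
    """Decide whether to leave the staggered phase.

    coverage_history: cumulative fault coverage (in percent) recorded after
                      each generated pattern, oldest first.
    window:           number of most recent patterns to inspect (assumed).
    min_gain:         minimum coverage gain, in percentage points, that the
                      window must deliver to keep going (assumed).
    """
    if len(coverage_history) <= window:
        return False  # not enough data to judge the trend yet
    recent_gain = coverage_history[-1] - coverage_history[-1 - window]
    return recent_gain < min_gain


if __name__ == "__main__":
    climbing = [0.5 * i for i in range(40)]   # coverage still rising
    plateau = [50.0] * 40                     # coverage has flattened
    print(should_switch_to_one_hot(climbing, window=8, min_gain=0.1))
    print(should_switch_to_one_hot(plateau, window=8, min_gain=0.1))
```

A production rule would presumably also weigh runtime and the share of faults already processed, as the text indicates; those inputs are omitted here to keep the sketch small.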
V. Experimental Results
The proposed hybrid staggered-followed-by-one-hot LOC
scheme has been applied to many industrial designs. We
present two large designs in the range of 1 to 5 million primitives
to illustrate the effectiveness of the proposed scheme.
A. Design Statistics
Table I summarizes the statistics of the two designs. We
developed a program to identify all independent clock groups.
A clock group consists of the clocks that do not interact
with each other or control a group of synchronous clock
domains. This allows all clocks in the clock group to be
activated simultaneously during capture without suffering from
any clock skew issue. In the experiments, we then performed
ATPG based on the number of clock groups identified by the
program.
As an LOC scheme is employed for testing scan designs
and an internal PLL-triggered on-chip test clock is often
used to control a clock domain, the proposed hybrid LOC

scheme presented here would not have any additional impact
on design and its implementations. One only has to ensure that
in the staggered approach, enough delay is inserted between
interacting clock domains so data can propagate from an
originating domain to all receiving domains.

TABLE I
Design Statistics

                        Design A     Design B
No. of primitives       1.1 M        4.7 M
No. of faults           3 109 012    8 879 940
No. of flipflops        102 K        281 K
No. of clock domains    33           8
No. of clock groups     5            8

The one-hot clocking and staggered clocking schemes were
first applied independently to the two industrial designs listed
in Table I. Only intra-clock-domain transition faults are considered. The computer used was a 64-bit based PC operating
at 2.5 GHz under the Linux operating system.
B. Results
Tables II and III summarize the test application results.
We consider only transition faults existing between flipflops
within each clock domain. The results show that one-hot
clocking leads to shorter ATPG runtime and higher fault
coverage but larger pattern count than staggered clocking. This
is expected as staggered clocking can result in sequentially
untestable faults.

TABLE II
Application Results on Design A

                                     One-Hot      Staggered
Hard-detected faults                 2 556 801    2 490 101
Fault coverage (%)                   84.24%       80.09%
Pattern count (one-hot/staggered)    8309         7705 (1.08X)
ATPG runtime                         9:09:06      21:20:40

TABLE III
Application Results on Design B

                                     One-Hot      Staggered
Hard-detected faults                 7 667 136    7 063 030
Fault coverage (%)                   86.34%       82.93%
Pattern count (one-hot/staggered)    39 099       12 401 (3.15X)
ATPG runtime                         41:45:53     66:19:39

The results using the proposed hybrid clocking scheme
on Designs A and B are listed in Table IV. In our experiments, staggered clocking is automatically switched to one-hot clocking after the switch-over criteria are met. The first
column shows the circuit name. In the next three columns,
fault coverage, pattern count, and ATPG runtime are associated
with three numbers each. The first number is the result using
staggered clocking, the second using one-hot clocking, and the
third is the sum of the two steps.

TABLE IV
Hybrid Clocking Results on Two Industrial Designs

Circuit     Fault Coverage              Pattern Count                     ATPG Runtime
Design A    78.84% + 5.40% (84.24%)     1505 + 3192 (4697) (1.77X)        6:18:16 + 7:05:58 (13:24:14)
Design B    76.34% + 10.00% (86.34%)    1792 + 16 857 (18 649) (2.10X)    08:04:52 + 39:09:43 (47:14:35)

For Design A, the pattern count using one-hot clocking
alone given in Table II is 1.77 (= 8309/4697) times the
pattern count using the hybrid approach given in Table IV.

The ATPG runtime, on the other hand, was increased by
approximately 50%. For Design B, the pattern
count using one-hot clocking alone given in Table III is 2.10
(= 39099/18649) times the pattern count using the hybrid
approach given in Table IV. The ATPG runtime, on the other
hand, was increased by approximately 10%.
C. Comparison

TABLE V
Results on Design A in Descending and Ascending Orders

Design A               Fault Coverage             Pattern Count                 ATPG Runtime
In descending order    78.84% + 5.40% (84.24%)    1505 + 3192 (4697) (1.77X)    6:18:16 + 7:05:58 (13:24:14)
In ascending order     78.60% + 5.64% (84.24%)    1428 + 3522 (4950) (1.68X)    6:06:12 + 7:45:33 (13:51:45)

Table V shows the ATPG results using hybrid clocking
with the gate counts in the five clock groups processed in
descending and ascending orders for Design A. ATPG in
descending order means that a clock group with the largest
gate count is captured first. The result indicates that performing
ATPG based on the descending order of the gate counts of all
clock groups yields smaller pattern count and ATPG runtime
than the ascending order. This is mainly due to
reduced sequential depth, as explained in Section IV-B. By
making larger sized clock groups receive their clock pairs
earlier, faults inside these clock groups would not have to
be justified through other smaller sized clock groups. This will lead
to higher fault coverage, smaller pattern count, and shorter
ATPG runtime.
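The descending-order rule discussed above amounts to sorting clock groups by gate count before assigning capture slots. A minimal sketch follows; the function name is hypothetical and the per-group gate counts are invented, since the paper does not list them.

```python
import math


def order_clock_groups(gate_counts, descending=True):
    """Return clock-group names sorted by gate count.

    gate_counts: dict mapping clock-group name to its gate count.
    descending:  True places the largest group first, so that it receives
                 its launch/capture clock pair earliest.
    """
    return sorted(gate_counts, key=gate_counts.get, reverse=descending)


if __name__ == "__main__":
    # Hypothetical gate counts for three clock groups.
    gate_counts = {"GCD1": 600_000, "GCD2": 150_000, "GCD3": 350_000}
    print(order_clock_groups(gate_counts))                    # descending order
    print(order_clock_groups(gate_counts, descending=False))  # ascending order
    # Number of possible staggered orders for the five clock groups of Design A.
    print(math.factorial(5))
```

Sorting replaces a search over all n! orders with a single heuristic choice, which is why only the descending and ascending extremes are compared in Table V.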
D. Summary
In summary, the application results show that: 1) proper
clock grouping and clock ordering help reduce pattern count,
and 2) the proposed hybrid scheme on average can yield a
1.7X to 2.1X reduction in pattern count as compared to using
the one-hot scheme alone. One-hot clocking, however, has
the benefit of shorter ATPG runtime. Hence, we recommend
simply using one-hot ATPG at the early development stage
to get a feel for the fault coverage and pattern count of
the design. When the design is being taped out, the hybrid
scheme is then run to reduce pattern count. Since the same
maximum fault coverage can be reached as with one-hot clocking,
the proposed hybrid scheme provides an ideal solution to
reduce manufacturing test cost.
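The reduction factors and runtime increases quoted in this section can be rechecked from the reported pattern counts and h:mm:ss runtimes. The short script below does that arithmetic; the function and variable names are hypothetical, and the input numbers are the ones reported in Tables II through IV.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' runtime string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def compare(one_hot_patterns, hybrid_patterns, one_hot_runtime, hybrid_runtime):
    """Return (pattern reduction factor, fractional ATPG runtime increase)."""
    reduction = one_hot_patterns / hybrid_patterns
    increase = to_seconds(hybrid_runtime) / to_seconds(one_hot_runtime) - 1
    return reduction, increase


if __name__ == "__main__":
    reported = {
        "Design A": (8309, 1505 + 3192, "9:09:06", "13:24:14"),
        "Design B": (39099, 1792 + 16857, "41:45:53", "47:14:35"),
    }
    for name, values in reported.items():
        reduction, increase = compare(*values)
        print(f"{name}: {reduction:.2f}X fewer patterns, "
              f"{increase:.0%} longer ATPG runtime")
```

The pattern ratios reproduce the 1.77X and 2.10X figures, and the runtime increases come out at roughly 46% for Design A and 13% for Design B, in line with the rounded "approximately 50%" and "approximately 10%" stated in the text.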
VI. Conclusion
Modern scan designs can contain tens of millions of logic
gates and dozens of clock domains. When a scan design
contains a mix of synchronous and asynchronous clock domains, the conventional one-hot LOC scheme for capture-clocking results in very large test sets. This can substantially
increase both test application time and scan test cost. While
the simultaneous LOC scheme could be used to reduce pattern
count, the loss of fault coverage and applicability to test
compression could be unacceptable.
In this paper, we first presented an aligned LOC scheme, either launch aligned or capture aligned, for testing synchronous
domains in a scan design. We next presented a staggered
LOC scheme for testing asynchronous domains. After clock
grouping, we then presented a hybrid ATPG technique that
combines staggered LOC and one-hot LOC clocking schemes
together. The hybrid staggered-followed-by-one-hot scheme
resulted in a 1.7X to 2.1X reduction in pattern count with an ATPG
runtime increase of approximately 10% to 50%, compared
to the one-hot scheme alone, on two large industrial scan
designs that contain asynchronous clock groups. Because one-hot clocking is always used after staggered clocking, the
hybrid scheme causes no loss in fault coverage.
It should be noted that novel, commercial ATPG approaches,
such as those proposed in [25] and [26] for detecting stuck-at
faults, may also be applicable for testing delay faults. Although
we are unable to compare the results, we predict that the
proposed hybrid scheme could result in smaller pattern count,
because the patented staggered clocking scheme is applied to
all asynchronous clock domains in the first phase.
Acknowledgment
The authors are grateful to the anonymous referees for
pointing out unclear descriptions of this paper and giving
constructive suggestions.
References
[1] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems
Testing and Testable Design. Piscataway, NJ: IEEE Press, 1990.
[2] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for
Digital, Memory and Mixed-Signal VLSI Circuits. Boston, MA: Springer,
2000.
[3] N. K. Jha and S. K. Gupta, Testing of Digital Systems. London, U.K.:
Cambridge University Press, 2003.
[4] L.-T. Wang, C.-W. Wu, and X. Wen, Eds., VLSI Test Principles and
Architectures: Design for Testability. San Francisco, CA: Morgan Kaufmann, 2006.
[5] L.-T. Wang, C. E. Stroud, and N. A. Touba, Eds., System-on-Chip Test
Architectures: Nanometer Design for Testability. San Francisco, CA:
Morgan Kaufmann, 2007.
[6] J. Savir and S. Patil, "Broad-side delay test," IEEE Trans. Comput.-Aided
Design, vol. 13, no. 8, pp. 1057–1064, Aug. 1994.
[7] J. Savir and S. Patil, "Scan-based transition test," IEEE Trans. Comput.-Aided
Design, vol. 12, no. 8, pp. 1232–1241, Aug. 1993.
[8] S. Wang, X. Liu, and S. T. Chakradhar, "Hybrid delay scan: A low hardware
overhead scan-based delay test technique for high fault coverage
and compact test sets," in Proc. IEEE/ACM Design, Autom. Test Eur.
Conf., Feb. 2004, pp. 1296–1301.
[9] J. Abraham, U. Goel, and A. Kumar, "Multi-cycle sensitizable transition
delay faults," in Proc. IEEE VLSI Test Symp., Apr.–May 2006, pp. 306–313.
[10] Z. Zhang, S. M. Reddy, I. Pomeranz, X. Lin, and J. Rajski, "Scan tests
with multiple fault activation cycles for delay faults," in Proc. IEEE
VLSI Test Symp., Apr.–May 2006, pp. 343–348.
[11] N. Ahmed and M. Tehranipoor, "Improving transition delay test using a
hybrid method," IEEE Design Test Comput., vol. 23, no. 5, pp. 402–412,
Sep.–Oct. 2006.
[12] G. Xu and A. D. Singh, "Delay test scan flip-flop: DFT for high
coverage delay testing," in Proc. Int. Conf. VLSI Des., Jan. 2007, pp.
763–768.
[13] G. Xu and A. D. Singh, "Achieving high transition delay fault coverage
with partial DTSFF scan chains," in Proc. IEEE Int. Test Conf., Oct.
2007, pp. 1–9.
[14] I. Park and E. J. McCluskey, "Launch-on-shift-capture transition tests,"
in Proc. IEEE Int. Test Conf., Oct. 2008, pp. 1–9.
[15] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan,
and J. Rajski, "Logic BIST for large industrial designs: Real issues
and case studies," in Proc. IEEE Int. Test Conf., Sep. 1999,
pp. 358–367.
[16] M. Beck, O. Barondeau, M. Kaibel, F. Poehl, X. Lin, and R. Press,
"Logic design for on-chip test clock generation: Implementation details
and impact on delay test quality," in Proc. IEEE/ACM Design Autom.
Test Eur. Conf., Mar. 2005, pp. 56–61.
[17] H. Furukawa, X. Wen, L.-T. Wang, B. Sheu, Z. Jiang, and S. Wu,
"A novel and practical control scheme for inter-clock at-speed testing,"
in Proc. IEEE Int. Test Conf., Oct. 2006, pp. 1–10.
[18] X.-X. Fan, Y. Hu, and L.-T. Wang, "An on-chip test clock control scheme
for multi-clock at-speed testing," in Proc. IEEE Asian Test Symp., Oct.
2007, pp. 341–348.
[19] B. Keller, A. Uzzaman, B. Li, and T. Snethen, "Using programmable
on-product clock generation (OPCG) for delay test," in Proc. IEEE Asian
Test Symp., Oct. 2007, pp. 69–72.
[20] L.-T. Wang, P.-C. Hsu, S.-C. Kao, M.-C. Lin, H.-P. Wang, H.-J. Chao,
and X. Wen, "Multiple-capture DFT system for detecting or locating
crossing clock-domain faults during scan-test," U.S. Patent 7 260 756,
Aug. 21, 2007.
[21] L.-T. Wang, P.-C. Hsu, S.-C. Kao, M.-C. Lin, H.-P. Wang, H.-J. Chao,
and X. Wen, "Multiple-capture DFT system for detecting or locating
crossing clock-domain faults during self-test or scan-test," European
Patent 1 360 513, Apr. 2, 2008.
[22] L.-T. Wang, X. Wen, S. Wu, H. Furukawa, H.-J. Chao, B. Sheu, J.
Guo, and W.-B. Jone, "Using launch-on-capture for testing BIST designs
containing synchronous and asynchronous clock domains," IEEE Trans.
Comput.-Aided Des., vol. 29, no. 2, pp. 299–312, Feb. 2010.
[23] L.-T. Wang, K. S. Abdel-Hafez, X. Wen, B. Sheu, and S.-M. Wang,
"Smart capture for ATPG (automatic test pattern generation) and fault
simulation of scan-based integrated circuits," U.S. Patent 7 124 342,
Oct. 17, 2006.
[24] K. S. Abdel-Hafez, L.-T. Wang, B. Sheu, Z. Wang, and Z. Jiang,
"Method for performing ATPG and fault simulation in a scan-based
integrated circuit," U.S. Patent 7 210 082, Apr. 24, 2007.
[25] V. Jain and J. Waicukauski, "Scan test data volume reduction in
multi-clocked designs with safe capture technique," in Proc. IEEE Int. Test
Conf., Oct. 2002, pp. 148–153.
[26] X. Lin and R. Thompson, "Test generation for designs with multiple
clocks," in Proc. IEEE/ACM Design Autom. Conf., Jun. 2003, pp. 662–667.


Shianling Wu (S'88–M'09) received the M.S. degree in computer science from Columbia University,
New York, NY.
She joined SynTest Technologies, Inc., Princeton,
NJ, in 2003, and is currently the Vice President of
Engineering focusing on advanced very large scale
integration design-for-testability (DFT) research and
development. Prior to SynTest, she was with Bell
Laboratories, Madison, WI, for over 23 years. In
2008, she was with the Department of Creative Informatics at Kyushu Institute of Technology, Iizuka,
Fukuoka, Japan, where she is now a Ph.D. candidate. She currently holds
five U.S. patents and has three pending U.S. patents. She has published over
15 DFT papers and contributed chapters to two DFT textbooks: VLSI Test
Principles and Architectures in 2006 and Electronic Design Automation in
2009.
Ms. Wu has served as a Program Committee Member for the IEEE
International Test Conference, the Asian Test Symposium, and the North
Atlantic Test Workshop. She won numerous AT&T and Lucent Awards
and received the Best Panel Award with her panelists in the 2005 IEEE
International Test Conference. She was a member of SEMATECH, SRC,
GSRC, STARC-International, VSIA, and the IEEE1500 Standard Committee.
Laung-Terng Wang (M'87–SM'04–F'08) received
the B.S.E.E. and M.S.E.E. degrees from National
Taiwan University, Taipei, Taiwan, in 1975 and
1977, respectively, and the M.S.E.E. and E.E.Ph.D.
degrees under the Honors Cooperative Program from
Stanford University, Stanford, CA, in 1982 and
1987, respectively.
He has been the Chairman and Chief Executive
Officer with SynTest Technologies, Inc., Sunnyvale,
CA, since January 1990, and a Visiting Professor
with the Department of Electrical Engineering and
the Graduate Institute of Electronics Engineering, National Taiwan University
since July 2009. Prior to founding SynTest in 1990, he held several positions
in industry, including Intel, Santa Clara, CA, from 1980 to 1983, and
Daisy Systems, Mountain View, CA, from 1983 to 1986, and was with the
Department of Electrical Engineering, Stanford University, as a Research
Associate and Lecturer from 1987 to 1991. He currently holds 28 U.S.
patents, 15 European patents, one Japanese patent, and one Chinese patent
in the areas of scan synthesis, test generation, at-speed scan testing, test
compression, logic built-in self-test, and design for debug-and-diagnosis. The
design-for-testability technologies developed by him have been successfully
implemented in thousands of application-specific integrated circuit designs
worldwide. He has also co-authored and co-edited three internationally used
DFT/EDA textbooks: VLSI Test Principles and Architectures in 2006, System-on-Chip Test Architectures in 2007, and Electronic Design Automation in
2009.
Dr. Wang is a member of Sigma Xi. He received the 2007 Meritorious
Service Award from the IEEE Computer Society and was a co-recipient of
the 2008 IEICE Information and Systems Society Excellent Paper Award for
an excellent series of papers that appeared in the IEICE Transactions on
Information and Systems during a period of 5 years. He is a Golden Core
Member of the IEEE Computer Society, and is a member of the 2010 IEEE
Computer Society Fellow Evaluation Committee.
Xiaoqing Wen (S'89–M'93–SM'08) received the
B.E. degree in computer science and technology
from Tsinghua University, Beijing, China, in 1986,
the M.E. degree in information engineering from
Hiroshima University, Hiroshima, Japan, in 1990,
and the Ph.D. degree in applied physics from Osaka
University, Osaka, Japan, in 1993.
From 1993 to 1997, he was an Assistant Professor
with Akita University, Akita, Japan. He was a Visiting Researcher with the University of Wisconsin,
Madison, from October 1995 to March 1996. He
joined SynTest Technologies, Inc., Sunnyvale, CA, in 1998, and served as
its Chief Technology Officer until 2003. In 2004, he joined the Department
of Creative Informatics, Kyushu Institute of Technology, Iizuka, Fukuoka,
Japan, where he is currently a Professor. He co-authored and co-edited two
books: VLSI Test Principles and Architectures: Design for Testability (San
Francisco, CA: Morgan Kaufmann, 2006) and Power-Aware Testing and Test
Strategies for Low Power Devices (New York: Springer, 2009). He currently
holds 23 U.S. patents and five Japanese patents in logic built-in self-test,
test compression, and low-capture-power (LCP) test generation. His current
research interests include design, test, and diagnosis of integrated circuits.
Dr. Wen is a member of the IEICE, IPSJ, and REAJ. He received the 2008
IEICE-ISS Best Paper Award for LCP X-filling/test generation.
Zhigang Jiang received the B.S. degree from
the Department of Electrical Engineering, Tsinghua
University, Beijing, China, in 1995, the M.S. degree
from the Department of Electrical Engineering, San
Jose State University, San Jose, CA, in 1997, and
the Ph.D. degree from the Department of Electrical
Engineering, University of Southern California, Los
Angeles, in 2005.
He currently manages the ATPG Research and Development Group, SynTest Technologies, Sunnyvale,
CA. His current research interests include design for
testability, built-in self-test, fault diagnosis, and design of high-performance
computer-aided design tools.
Lang Tan received the B.S. degree in computer
science from Central South University, Changsha,
China, in 2004, and the M.S. degree from the
Department of Computer Science, Shanghai Jiaotong
University, Shanghai, China, in 2007.
He is currently a Research and Development Engineer with SynTest Technologies, Inc., Shanghai.
His current research interests include design for
testability, test compression, low-power testing, and
fault diagnosis.

Yu Zhang received the B.S. degree from the Department of Computer Science, Anhui University,
Hefei, China, in 2005, and the M.S. degree from
the Department of Computer Science, University of
Science and Technology of China, Hefei, in 2008.
He is currently a Research and Development Engineer with the ATPG Group, SynTest Technologies,
Inc., Shanghai, China. His primary research interests
include design for testability, fault modeling, test
generation, test compression, and low-power testing.

Yu Hu (M'06) received the B.S., M.S., and Ph.D.
degrees, all in electrical engineering from the University of Electronic Science and Technology of
China, Chengdu, China, in 1997, 1999, and 2003,
respectively.
She is currently an Associate Professor with the Institute of Computing Technology, Chinese Academy
of Sciences, Beijing, China. Her current research
interests include reliable design, fault diagnosis, and
testing.
Dr. Hu is a member of ACM, IEICE, and CCF.
Wen-Ben Jone (M'84–SM'02) was born in Taipei,
Taiwan. He received the B.S. degree in computer science and the M.S. degree in computer engineering,
both from National Chiao-Tung University, Hsinchu,
Taiwan, in 1979 and 1981, respectively, and the
Ph.D. degree in computer engineering and science
from Case Western Reserve University, Cleveland,
OH, in 1987.
In 1987, he joined the Department of Computer
Science, New Mexico Institute of Mining and Technology, Socorro, where he was promoted to
Associate Professor in 1992. From 1993 to 2000, he was a Full Professor
with the Department of Computer Engineering and Information Science,
National Chung-Cheng University, Chiayi, Taiwan. Since 2001, he has been
an Associate Professor with the Department of Electrical and Computer
Engineering, University of Cincinnati, Cincinnati, OH. He was a Visiting
Scholar with the Institute of Information Science, Academia Sinica, Taipei,
Taiwan, and with the Department of Computer Science and Engineering,
Chinese University of Hong Kong, Shatin, Hong Kong. His current research
interests include very large scale integration design for testability and reliability, low-power circuit design and test, and computer architecture and parallel
processing.
Dr. Jone was a co-recipient of the 2003 IEEE Donald G. Fink Prize Paper
Award. He was also a co-recipient of the Best Paper Award of the 2008
International Symposium on Low-Power Electronics and Design.
Michael S. Hsiao (S'95–M'97–SM'04) received
the B.S. degree in computer engineering (highest
honors), and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois
at Urbana-Champaign, Urbana, in 1992, 1993, and
1997, respectively.
He was a Visiting Scientist with NEC America,
Inc., Princeton, NJ, in 1997, and in 2002 he was a
Visiting Professor with Intel, Santa Clara, CA. He
was an Assistant Professor with the Department of
Electrical and Computer Engineering, Rutgers,
The State University of New Jersey, Piscataway, between 1997 and 2001.
From 2001 to 2006, he was an Associate Professor with the Department
of Electrical and Computer Engineering, Virginia Polytechnic Institute and
State University, Blacksburg. Since 2006, he has been a Professor with the
same department. He and his research group have published more than 180
refereed journal and conference papers. His current research interests include
very large scale integration testing, design verification, diagnosis, and power
management.
Dr. Hsiao was a recipient of the Digital Equipment Corporation Fellowship,
the McDonnell Douglas Scholarship, the National Science Foundation CAREER Award, and is recognized for most influential papers in the first 10
years (1998–2007) of the Design Automation and Test Conference in Europe.
He has served on the program committees for more than 40 IEEE international
conferences and workshops, in addition to serving as an Associate Editor on
ACM Transactions on Design Automation of Electronic Systems, as well as on
editorial boards of several journals.
James Chien-Mo Li (S'93–M'02) received the
B.S.E.E. degree from National Taiwan University,
Taipei, Taiwan, in 1993, and the M.S.E.E. and Ph.D.
degrees in electrical engineering from Stanford University, Stanford, CA, in 1997 and 2002, respectively.
He is currently an Associate Professor with the
Graduate Institute of Electronics Engineering, National Taiwan University. His current research interests include design for testability, built-in self-testing, low-power testing, and fault diagnosis.
Jiun-Lang Huang (S'96–M'99) received the B.S.
degree in electrical engineering from National
Taiwan University, Taipei, Taiwan, in 1992, and the
M.S. and Ph.D. degrees in electrical and computer
engineering from the University of California, Santa
Barbara (UCSB), in 1995 and 1999, respectively.
From 2000 to 2001, he was an Assistant Research Engineer with the ECE Department, UCSB.
In 2001, he joined National Taiwan University and is
currently an Associate Professor with the Graduate
Institute of Electronics Engineering and the Department of Electrical Engineering. His current research interests include design
for testability, built-in self-test, and calibration for mixed-signal systems.
Lizhen Yu (M'10) received the B.S. degree in computer science and applications and the M.S. degree
in nuclear technologies and applications, both from
the University of Science and Technology of China,
Hefei, China, in 2001 and 2005, respectively.
She is currently a Research and Development
Manager with SynTest Technologies, Inc., Shanghai,
China. Her current research interests include design
for testability, design methodology, simulation, and
so on, for static random access memories and digital
logic. She has published three papers in these areas.
