
4885 – P.

1 To Appear in IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS

Using Launch-on-Capture for Testing BIST Designs Containing Synchronous and Asynchronous Clock Domains
Laung-Terng Wang, Fellow, IEEE, Xiaoqing Wen, Senior Member, IEEE,
Shianling Wu, Member, IEEE, Hiroshi Furukawa, Hao-Jan Chao, Member, IEEE, Boryau Sheu,
Jianghao Guo, and Wen-Ben Jone, Senior Member, IEEE

Abstract—This paper presents a new at-speed logic Built-In Self-Test (BIST) architecture supporting two launch-on-capture schemes, namely aligned double-capture and staggered double-capture, for testing multi-frequency synchronous and asynchronous clock domains in a scan-based BIST design. The proposed architecture also includes BIST debug and diagnosis circuitry to help locate BIST failures. The aligned scheme detects and allows diagnosis of structural and delay faults among all synchronous clock domains, whereas the staggered scheme detects and allows diagnosis of structural and delay faults among all asynchronous clock domains. Both schemes solve the long-standing problem of using the conventional one-hot scheme, which requires testing each clock domain one at a time, or the simultaneous scheme, which requires adding isolation logic to normal functional paths across interacting clock domains. Physical implementation is easily achieved by the proposed solution due to the use of a slow-speed, global scan enable signal and reduced timing-critical design requirements. Application results for industrial designs demonstrate the effectiveness of the proposed architecture.

Index Terms—Aligned Double-Capture, At-Speed Self-Test, Double-Capture, Launch-on-Capture, Logic BIST, Staggered Double-Capture

Manuscript received May 12, 2008; revised November 12, 2008.
Laung-Terng Wang is with Dept. of Electrical Engineering and Graduate Institute of Electronics Engineering at National Taiwan University, and SynTest Technologies, Inc., 505 S. Pastoria Ave., Suite 101, Sunnyvale, CA 94086, USA (1-408-720-9956x200; fax: 1-408-720-9960; e-mail: wang@syntest.com).
Xiaoqing Wen is with Dept. of Creative Informatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan (e-mail: wen@cse.kyutech.ac.jp).
Shianling Wu is with SynTest Technologies, Inc., Princeton Junction, NJ 08550, USA (e-mail: shianlingwu@syntest.com).
Hiroshi Furukawa is with Dept. of Creative Informatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan (e-mail: hiroshi.furukawa@nms.necel.com).
Hao-Jan Chao is with SynTest Technologies, Inc., 2F, No. 27, Industry E. Road 9, Science-Based Industrial Park Hsinchu, Taiwan (e-mail: tom_chao@syntest.com.tw).
Boryau Sheu was formerly with SynTest Technologies, 505 S. Pastoria Ave., Suite 101, Sunnyvale, CA 94086, USA (e-mail: jack_sheu@sdesigns.com).
Jianghao Guo is with Dept. of Electrical and Computer Engineering, University of Cincinnati, OH 45221, USA (e-mail: guojh@email.uc.edu).
Wen-Ben Jone is with Dept. of Electrical and Computer Engineering, University of Cincinnati, OH 45221, USA (e-mail: wjone@ececs.uc.edu).
Digital Object Identifier 09.0909/TCAD.2009.826558

I. INTRODUCTION

LOGIC Built-In Self-Test (BIST) [1]-[7] is a Design-for-Testability (DFT) technique in which a portion of a circuit on a chip, board, or system is used to test the digital logic circuit itself. Logic BIST is crucial for many applications, in particular, for life-critical and mission-critical applications. These applications, commonly found in the aerospace/defense, automotive, banking, computer, health care, networking, and telecommunications industries, require on-chip, on-board, in-system, and in-field self-test to ensure the reliability of the entire system, as well as its ability to perform remote test and diagnosis.

The logic BIST technique widely used in industry is based on the STUMPS (Self-Test Using a MISR and Parallel Shift register sequence generator) structure [8]. In the STUMPS architecture, a Pseudorandom Pattern Generator (PRPG) is used to generate pseudorandom patterns and shift each pattern in parallel to the inputs of the scan chains embedded in a scan-based design, and a Multiple-Input Signature Register (MISR) is used to compact the test responses shifted out of the scan chain outputs to create a signature. After a pre-determined number of test cycles are executed, the final signature is then compared against an embedded golden (good circuit) signature to judge whether the circuit under test (CUT) passes or fails. As no test patterns are supplied externally, logic BIST can reduce test cost and also allow the circuit to perform in-field self-test.

While logic BIST offers many benefits, its real value is in providing at-speed testing for high-speed and high-performance circuits. These circuits often contain multiple clock domains, each running at a frequency that is either synchronous or asynchronous to the other clock domains. Two clock domains are said to be synchronous if the active edges of both clocks controlling the two clock domains can be aligned precisely or triggered
simultaneously. Two clock domains are said to be asynchronous if they are not synchronous.

Despite its conceptual simplicity, logic BIST faces many practical hurdles, especially in at-speed testing for multi-clock, multi-frequency circuits. Each clock in such a circuit controls a clock domain, whose clock skew is minimized and which runs at a frequency either synchronous or asynchronous to other clock domains. The most critical yet difficult part of logic BIST is how to detect intra-clock-domain faults and inter-clock-domain faults thoroughly and efficiently with a proper capture-clocking scheme. An intra-clock-domain fault originates at one clock domain and terminates at the same clock domain. An inter-clock-domain fault originates at one clock domain but terminates at another clock domain.

Previous STUMPS-based logic BIST schemes [9]-[11] have not been effectively applied in practice. This is mainly due to the need to manipulate test frequency when the CUT contains asynchronous clock domains, and the difficulty in timing control and physical implementation. Alternatively, a conventional at-speed BIST scheme using one-hot clocking or simultaneous clocking would need to test one clock domain at a time, resulting in long test time, or add isolation logic to normal functional paths across interacting clock domains, resulting in fault coverage loss across these clock domains.

These problems will be addressed in this paper, with a new logic BIST architecture using launch-on-capture schemes - aligned clocking and staggered clocking - that achieves true at-speed test quality for any multi-clock, multi-frequency design and that is easy to implement physically.

It should be noted that all above-mentioned capture-clocking schemes are applicable for both BIST designs and scan designs. The main difference is that for BIST designs no unknown (X) values are allowed to propagate through the scan chains to reach the MISR. Throughout this paper, we will assume that a STUMPS-based architecture is used and that each clock domain contains one test clock and one scan enable signal. The faults we will consider for comparison include structural faults, such as stuck-at faults and bridging faults, as well as timing-related delay faults, such as path-delay faults and transition faults.

The rest of the paper is organized as follows: Section 2 describes the background. Section 3 presents the logic BIST architecture. Section 4 discusses at-speed timing control issues, and Section 5 focuses on physical implementation issues. Section 6 shows results on several industrial designs, and Section 7 concludes the paper.

II. BACKGROUND

There are two basic capture-clocking schemes for testing multiple clock domains at-speed: (1) skewed-load (which is now commonly called launch-on-shift [LOS]) [12] and (2) double-capture (which was called broad-side in [13] but is now commonly called launch-on-capture [LOC]). Both schemes are helpful for detecting structural faults and delay faults within each clock domain (called intra-clock-domain faults) or across clock domains (called inter-clock-domain faults). Skewed-load uses the last shift clock pulse followed immediately by a capture clock pulse to launch a transition and capture its output test response, respectively. Double-capture uses two consecutive capture clock pulses to launch the transition and capture the output test response, respectively. In either scheme, both launch and capture clock pulses must be running at the domain’s operating speed or at-speed. The difference is that skewed-load requires the domain’s scan enable signal SE to switch its value between the launch and capture clock pulses, making SE function as a clock signal. Figure 1 shows sample waveforms using the basic skewed-load and double-capture at-speed test schemes.

[Waveforms of CK and SE. (a) Skewed-load (a.k.a. launch-on-shift): shift pulses, with the last shift pulse acting as launch, followed by the capture pulse. (b) Double-capture (a.k.a. launch-on-capture): shift pulses, dead cycles, then the launch and capture pulses.]
Fig. 1. Basic at-speed test schemes.

Typically, testing a scan-based BIST design based on skewed-load for at-speed delay fault testing can achieve higher fault coverage with shorter test length [14]-[20]. Although some novel DFT techniques as proposed in [21] and [22] have addressed the timing problem of operating the scan enable signal SE at-speed for each clock domain, skewed-load can cause unwanted over-testing because more false paths can be exercised, and incur higher implementation cost associated with the at-speed scan enable signal SE. This is in sharp contrast to double-capture in which only a slow-speed, global scan enable signal GSE for all clock domains is needed.

Therefore, this paper will focus on logic BIST architecture based on double-capture. There are two known at-speed capture-clocking schemes: (1) one-hot double-capture that conducts capture for one clock domain at a time and (2) simultaneous double-capture that allows testing to be performed on all clock domains in parallel.

A. One-Hot Double-Capture

The one-hot double-capture scheme tests clock domains one by one. A sample timing diagram for two clocks is shown in Fig. 2. The main advantages are that (1) two consecutive capture pulses are applied (C1-followed-by-C2 or C3-followed-by-C4) at their respective clock domains’
frequencies (of period d1 or d2) to test intra-clock-domain delay faults, and (2) a single, slow-speed global scan enable signal GSE is used to drive both clock domains.

[Timing diagram: CK1 (capture pulses C1, C2; period d1), CK2 (capture pulses C3, C4; period d2), and GSE across alternating shift and capture windows.]
Fig. 2. One-hot double-capture.

Hence, this scheme can be used for true at-speed testing of intra-clock-domain delay faults in both synchronous and asynchronous clock domains. However, this scheme suffers from two drawbacks in that (1) it cannot be used to detect inter-clock-domain delay faults and (2) it has long test time.

B. Simultaneous Double-Capture

The long test time problem of one-hot double-capture can be resolved by using the simultaneous double-capture scheme illustrated in Fig. 3. The simultaneous double-capture scheme allows testing to be performed on all clock domains in parallel. However, since data may propagate from one clock domain to the other clock domain, isolation logic (or capture-disabling circuitry), such as AND/OR gates or multiplexers, must be added at the sources, sinks, or along the normal functional paths to force all inter-clock-domain paths to exhibit constant values of 0’s or 1’s at the receiver side.

[Timing diagram: CK1 (capture pulses C1, C2), CK2 (capture pulses C3, C4), and GSE across a shift window, a capture window, and a shift window.]
Fig. 3. Simultaneous double-capture.

The major advantages of this approach are that (1) all intra-clock-domain (structural and delay) faults can be tested simultaneously, thus yielding much shorter test time, and (2) there is no need to worry about the clock skew issue between any two clock domains, be they synchronous or asynchronous. This approach, however, requires that isolation logic be inserted across all interacting clock domains so each clock domain can be tested independently of all other clock domains. The insertion of isolation logic into the domain boundary exposes the design to one major drawback which is not present in one-hot double-capture: the added circuitry to isolate all interacting clock domains may increase the propagation delay of the design in normal mode and will prevent all inter-clock-domain faults - structural faults and delay faults - from being detected.

As the one-hot double-capture or simultaneous double-capture scheme cannot detect inter-clock-domain delay faults, how to preserve all benefits of both schemes while at the same time removing all of their drawbacks is the main focus of this paper. In the following, we propose a novel logic BIST architecture and launch-on-capture schemes for detecting intra-clock-domain faults and inter-clock-domain faults in a multi-frequency circuit containing both synchronous and asynchronous clock domains. The proposed launch-on-capture (or double-capture) schemes are intended to increase the fault coverage of the circuit. The new architecture implemented with the schemes is intended to facilitate physical implementation as well as debug and diagnosis of ASIC devices from the system level down to the chip level [23]-[26]. The architecture requires a BIST-ready core that has complied with all scan and BIST design rules.

III. LOGIC BIST ARCHITECTURE USING LAUNCH-ON-CAPTURE

A. General Architecture

The proposed logic BIST architecture is illustrated in Fig. 4. The BIST architecture for testing the BIST-ready core consists of a test pattern generator (TPG) for generating test stimuli, an input selector for providing pseudorandom or top-up ATPG patterns for the core-under-test, an output response analyzer (ORA) for compacting test responses, a clock gating block (discussed in Section IV-C) for generating test clocks from original or functional clocks, and a BIST controller for coordinating the whole BIST operation. The top-up ATPG patterns can include compressed ATPG patterns to improve the circuit’s fault coverage during manufacturing test, when the combinational-logic-based scan compression architecture as proposed in [27]-[29] is embedded in the design. The test clocks are placed in a predetermined order of sequence (see Section IV) so that single-capture or double-capture clock pulses can be supplied to the BIST-ready core. The self-test operation is started by asserting the Start signal, its end is indicated by the Finish signal, and its result is shown by the Result signal. A standard IEEE 1149.1 Boundary-Scan interface under the control of the test access port (TAP) controller is used for loading initialization and configuration data or for downloading internal states for fault diagnosis.

[Block diagram: TPG (PRPG1 and PRPG2 with PS1/SpE1 and PS2/SpE2), input selector (PIs/SIs), BIST-ready core (clock domains #1 and #2, POs/SOs), ORA (SpC1 and SpC2 with MISR1 and MISR2), controller (Start, Finish, Result, TDI, TDO, TCK, TMS), and clock gating block deriving TCK1 and TCK2 from CCK1 and CCK2, which come from CK1 and CK2.]
Fig. 4. Logic BIST architecture.
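
The TPG-to-ORA data path described above follows the STUMPS structure, which can be illustrated with a small behavioral model. The following Python sketch is only an illustration under stated assumptions: the 8-bit width, the feedback taps, the single shared LFSR/MISR, and the toy `good_cut` and `faulty_cut` functions are all hypothetical and are not taken from the paper; phase shifters, space compactors, and clocking are omitted.

```python
from typing import Callable

WIDTH = 8                 # assumed: 8 scan chains, so 8-bit PRPG and MISR
TAPS = (7, 5, 4, 3)       # assumed feedback taps (0-based bit positions)
MASK = (1 << WIDTH) - 1


def lfsr_step(state: int) -> int:
    """Advance the LFSR by one clock: shift left, feed back the XOR of the taps."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    return ((state << 1) | feedback) & MASK


def misr_step(state: int, slice_bits: int) -> int:
    """Advance the MISR by one clock: LFSR step XORed with one scan-out slice."""
    return lfsr_step(state) ^ (slice_bits & MASK)


def run_bist(cut: Callable[[int], int], seed: int, patterns: int, chain_len: int) -> int:
    """Apply `patterns` pseudorandom patterns and return the final signature."""
    prpg, misr = seed & MASK, 0
    for _ in range(patterns):
        # Shift window: the PRPG feeds one bit per chain on each shift clock.
        stimulus = []
        for _ in range(chain_len):
            stimulus.append(prpg)
            prpg = lfsr_step(prpg)
        # Capture window: the toy CUT model maps each slice to a response slice.
        responses = [cut(s) for s in stimulus]
        # Next shift window: responses are shifted out and compacted.
        for r in responses:
            misr = misr_step(misr, r)
    return misr


def good_cut(s: int) -> int:
    """Hypothetical fault-free combinational response."""
    return (s ^ (s >> 1)) & MASK


def faulty_cut(s: int) -> int:
    """Same as good_cut, except one output bit is wrong for one stimulus."""
    return good_cut(s) ^ (1 if s == 8 else 0)


golden = run_bist(good_cut, seed=1, patterns=2, chain_len=4)
observed = run_bist(faulty_cut, seed=1, patterns=2, chain_len=4)
print("pass" if observed == golden else "fail")
```

In this toy run the single erroneous response bit changes the final signature, so the comparison against the golden signature reports a failure. A real MISR can alias (map a faulty response stream to the golden signature) when several errors cancel, which this sketch does not attempt to quantify.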

B. BIST-Ready Core

The BIST-ready core is a full-scan circuit (scan design) that satisfies all scan design rules. Additional circuitry may need to be added for preventing bus conflicts at tri-state buses and for disabling asynchronous set/reset signals and false paths. In addition to scan design rules, the BIST-ready core must also satisfy all BIST-specific design rules, such as X-blocking and test point insertion (TPI). X-blocking, which blocks all unknowns (X’s) from reaching the scan outputs, is conducted in an intelligent way so that critical paths are avoided and that the X-blocking circuitry is placed as close to the X-source as possible. TPI is guided by fault simulation results. In addition, all multi-cycle paths and false paths may be selected or blocked depending on test needs.

C. TPG Circuitry and ORA Circuitry

In general, clock skews between two interacting clock domains in a BIST-ready core, as shown in Fig. 4, are not aggressively managed. In order to avoid additional design efforts for clock skew management in logic BIST, two PRPG-MISR pairs, one for each clock domain, can be used, even though both clock domains may operate at the same frequency. However, if hardware overhead is a major concern, one PRPG-MISR pair can be used. Also, linear phase shifters (PS1 and PS2) (a.k.a. space expanders [SpE1 and SpE2]) can be used to reduce the length of PRPGs, and space compactors (SpC1 and SpC2) can be used to reduce the length of MISRs.

D. Test Control Circuitry

The test control circuitry consists of a BIST controller and a clock gating block. The inputs to the clock gating block are system clocks CK1 and CK2, which become CCK1 and CCK2 after going through some buffers. The two clocks CCK1 and CCK2 are in fact used by the PRPG and MISR pair as will be discussed in Section V. In addition, the clock gating block is controlled by signals from the BIST controller to generate test clocks TCK1 and TCK2. The timings of TCK1 and TCK2, especially in capture mode, play a critical role in determining the test capability and ease of physical implementation of the logic BIST scheme. The BIST controller works in tandem with an embedded TAP controller, which complies with the IEEE 1149.1 Boundary-Scan standard to coordinate the test, debug, and diagnosis tasks.

E. Debug and Diagnosis Circuitry

In addition to Logic BIST, diagnosis of the BIST-ready core at the core level and then down to the failed scan chain and signature cycle levels, to locate faulty scan cells and logic gates, is also important. Many innovative BIST debug and diagnosis approaches have been proposed and surveyed in [5], [6], [30], and [31]. This subsection details a few unique design for debug and diagnosis (DFD) features.

E-1. Core-Level Diagnosis

To facilitate test, debug, and diagnosis, each clock domain (core) is embedded with a unique core-identifier (CID) bit to decide whether this clock domain will be targeted or not [32]. A design for debug and diagnosis (DFD) circuitry including the CID bit register will be surrounding all BIST cores that have been synthesized with their respective logic BIST controllers under boundary-scan control. There will be an isolation wrapper for each core. With the objective to enable end-to-end debug and diagnosis from core to in-field applications, these CID bits are stitched to form a shift register, a CID register, so during test, debug, and diagnosis, these bits can be programmed on-the-fly and these cores are tested either in series or in parallel.

In order to debug or diagnose logic BIST cores in an integrated circuit, one first sets the CID bits of the BIST cores to be diagnosed to all 1’s. These cores can then be processed in parallel. In addition to single error diagnosis, the CID register can further include additional bits in each logic BIST core, when desired, to increase diagnostic resolution for multiple errors that may arise from different faults, such as stuck-type faults, delay faults, or bridging faults.

There are a number of benefits with this CID approach. First, it allows designers to enable or disable diagnosis of selected BIST cores at any time. Second, it allows designers to skip the failed BIST cores and focus on diagnosis of other BIST cores. Third, it allows designers to perform multiple error diagnosis. Finally, it allows test engineers to manage power consumption during production testing by selectively choosing BIST cores for serial/parallel testing.

E-2. Signature Diagnosis

The logic BIST (LBIST) architecture includes special DFD circuitry to help locate BIST failures down to all failed signature cycles. When BIST does not pass a manufacturing test or a system test, designers can utilize the DFD circuitry to enter the signature diagnosis mode by loading initial seeds into the PRPG and the MISR, as well as the required number of LBIST cycles to perform signature diagnosis.

When the logic BIST operation in the logic BIST controller completes the selected LBIST cycles, the controller will issue a cycle-end signal, halt the BIST operation, and begin to wait for a continue signal to resume its BIST operation and to reset the cycle-end signal. During this period, all LBIST output responses are captured repeatedly into a test-and-diagnosis (TDR) register to form an intermediate signature that will be shifted out for analysis. If the intermediate signature does not agree with the expected signature during analysis, designers can further instruct the TAP controller to inform the controller to issue the continue signal to resume the logic BIST operation. The BIST operations are repeated until designers have located the first failed BIST pattern (signature cycle) or all failed BIST patterns.

This cycle-based signature diagnosis approach indicates that the contents of the TDR register are only sampled after a cycle-end signal is received. If there are many BIST operations to be performed in parallel, the TAP controller must wait until all cycle-end signals have been generated. In addition to loading the initial seeds for the PRPG and the MISR, the TDR register shown in Fig. 5 stores additional LBIST data including the golden signature for comparison with the MISR, pattern counter, scan-chain mask, cycle-mask
start index, cycle-mask stop index, and programmable shift and capture modes. Additional seeds can also be pre-computed ahead of time. Those seeds are then loaded into the BIST core to run multiple, short, logic BIST test sessions to further help with diagnosis.

[TDR register fields, listed from the TDO end to the TDI end:]
P/F: Pass/Fail Indicator
X: Scan-Chain Mask
Y: Cycle-Mask Start Index
Z: Cycle-Mask Stop Index
PC: Pattern Counter
PSM: Programmable Shift Mode
PCM: Programmable Capture Mode
SEED: Initial seeds for PRPG and MISR
SIG: Final Signature of MISR
Fig. 5. Contents of the test-and-diagnosis (TDR) register.

There are a number of advantages in using this test-and-diagnosis (TDR) register for signature diagnosis. First, it can locate failed patterns. Second, it can further locate multiple capture errors in the scan chain. Finally, linear search or binary search can be employed to reduce the diagnosis time.

Since the TDR register contains a pattern counter in support of the cycle-based approach, the LBIST controller is capable of launching test sessions with varying pattern counts, i.e., a test session with its required test patterns derived from pre-computed logic/fault simulation, or a test session with a predetermined pattern count. For diagnosis purposes, a test session can be of a single pattern, thereby allowing designers to apply test patterns one-by-one to sort and identify failed patterns for the entire test set. Alternatively, a binary search can be used to speed up finding the failed patterns through interactive test sessions between the device-under-debug and the tester (or system). As the TDR register supports reseeding of the PRPG/MISR, designers can adjust the starting and ending points of a test session according to test and diagnosis needs. The identification of failed patterns must be done before proceeding to masked-chain or one-chain diagnosis to minimize the time-consuming diagnosis effort.

E-3. Masked-Chain Diagnosis

The DFD circuitry also includes the XYZ diagnosis structure proposed in [33] and the improved masking methods developed in [34] to locate the failed scan chain(s) and the associated failed scan cells in each failed scan chain. An LBIST TDR register for debug or diagnosis of LBIST failures is embedded in each core. The XYZ operation using the contents of the TDR register shown in Fig. 5 in support of XYZ diagnosis is illustrated in Fig. 6, where the X field indicates which scan chain(s) to mask; Y and Z fields indicate which ranges of scan cells in the scan chain(s) to mask.

For n scan chains, there will be n bits in the X field and n AND gates in the masked-chain diagnosis (MCD) logic feeding the MISR. A 0 set to the ith bit of the X field, X(i)=0, where 0<i<n, causes the MISR to receive a constant 0 for the ith chain. The method of feeding constant values to the MISR is called a masking mechanism. With this embedded scan-chain masking logic, designers can run test sessions in a linear or binary search fashion to locate faulty chain(s).

[Logic diagram: PRPG, scan chains, AND-gate scan-chain masking logic, and MISR, controlled by the TDR fields (PCM, PSM, PC, Z, Y, X, P/F) shifted between TDI and TDO.]
Fig. 6. Masked-chain diagnosis (MCD) logic diagram.

Once a faulty chain is identified, the next task is to locate the faulty scan flip-flops within the chain. The Y and Z fields specify the start and stop indices of the scan flip-flops. The cycle-mask start index and cycle-mask stop index will allow the logic BIST controller to skip the region (cycles) where failed scan flip-flops reside so BIST diagnosis can still be performed on the remaining fault-free scan cells. The masked-chain diagnosis feature is also helpful for unknown (X) masking in case the X-sources fall within a small fraction of the scan index or range. The programmable shift and capture modes are to shift the scan chains at a selected, reduced frequency to avoid overheating as well as to select the number of capture clock pulses for stuck-at or delay fault testing during capture, respectively.

E-4. One-Chain Diagnosis

The DFD circuitry further includes an additional test mechanism to dump all scan values from the reconfigured scan chains. Assume a failed pattern (signature cycle) has been located during signature diagnosis or XYZ diagnosis. The TAP controller can then load an instruction to the TDR register and instruct the logic BIST controller to (1) set the TDR register to one-chain diagnosis (OCD) mode, and (2) shift out the contents of all scan cells in the scan chains which are reconfigured as an OCD register for analysis. The reconfigured OCD register is treated as one of the several data register chains of the TAP controller after the diagnostic instruction register has been decoded.

As the number of failed patterns could be large, all failed patterns are fault graded such that the number of diagnostic patterns being applied in OCD mode can be reduced without affecting the resolution of finding the root cause from the failed combinational logic. Additional diagnostic patterns can then be shifted into the scan chains to pinpoint the failed combinational logic gates. This allows the BIST core to be debugged and diagnosed at the board or system level, enabling in-field diagnosis [35].
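
The scan-chain masking of Section E-3 combined with a binary search can be sketched in a few lines. The Python model below is a hedged illustration, not the paper's circuitry: it assumes 8 chains indexed from 0, exactly one faulty chain, a hypothetical 8-bit MISR with illustrative taps, and golden signatures recomputed for every mask from fault-free slices (in practice these would come from simulation). The names `signature` and `locate_faulty_chain` are invented for this example.

```python
from typing import Sequence

WIDTH = 8                 # assumed number of scan chains (n)
TAPS = (7, 5, 4, 3)       # assumed MISR feedback taps (0-based bit positions)
ALL_CHAINS = (1 << WIDTH) - 1


def misr_step(state: int, slice_bits: int) -> int:
    """One MISR clock: LFSR shift with feedback, XORed with the input slice."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    shifted = ((state << 1) | feedback) & ALL_CHAINS
    return shifted ^ (slice_bits & ALL_CHAINS)


def signature(slices: Sequence[int], x_field: int) -> int:
    """Compact scan-out slices through the AND-gate mask: X(i)=0 forces chain i to 0."""
    misr = 0
    for s in slices:
        misr = misr_step(misr, s & x_field)
    return misr


def locate_faulty_chain(good: Sequence[int], observed: Sequence[int], n: int) -> int:
    """Binary search over the X field, assuming exactly one faulty chain."""
    lo, hi = 0, n                      # the faulty chain lies in [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        masked = ((1 << mid) - 1) ^ ((1 << lo) - 1)   # chains lo..mid-1
        x_field = ALL_CHAINS & ~masked                # X(i)=0 for masked chains
        if signature(observed, x_field) == signature(good, x_field):
            hi = mid                   # mismatch disappeared: fault is masked
        else:
            lo = mid                   # mismatch remains: fault is unmasked
    return lo


# Hypothetical fault-free slices (bit i = output of chain i) and a run in
# which chain 5 delivers one wrong bit in the third slice.
good_slices = [0x12, 0x34, 0x56, 0x78]
observed_slices = [0x12, 0x34, 0x56 ^ 0x20, 0x78]
print(locate_faulty_chain(good_slices, observed_slices, WIDTH))
```

With these inputs the search needs three masked test sessions for eight chains, compared with up to eight for a linear search. The sketch ignores signature aliasing and multiple faulty chains, both of which a practical flow would have to handle.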
4885 – P. 6 To Appear in IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS

IV. TEST TIMING CONTROL


Shift Window Capture Window Shift Window
This section proposes test timing control methods for C1
capture clocking. Techniques to improve fault coverage and TCK1 … …
d1 d2 d3
ease physical implementation are then discussed. C2

A. Basic Approach TCK2 … …


GSE
The basic idea is to use an ordered sequence of capture Fig. 7. Timing control using staggered single-capture.
clocks for all clock domains in each capture window
[23]-[26]. The order can be properly selected based on test A-2. Aligned Double-Capture
requirements.
A-1. Staggered Single-Capture The inability of staggered single-capture to detect
intra-clock-domain delay faults as well as simultaneous
Single-capture is a slow-speed test technique in which double-capture to detect inter-clock-domain delay faults
only one capture pulse is applied to each clock domain. It is among synchronous clock domains can be resolved by using
the simplest approach for detecting all intra-clock-domain the aligned double-capture scheme. One aligned
and inter-clock-domain structural faults. No double-capture approach that aligns all capture edges
intra-clock-domain delay faults can be detected within each together is illustrated in Fig. 8. The approach is referred to as
clock domain. An example of low-speed test timing control capture aligned double-capture. The major advantage of
is shown in Fig. 7, where test clocks TCK1 and TCK2 are using this approach is that all intra-clock-domain and
staggered and generated by the clock gating block shown in inter-clock-domain faults for synchronous clock domains
Fig. 4. In this approach, capture pulses C1 and C2 are applied can be tested. The arrows shown in Fig. 8 indicate the delay
in a sequential or staggered order within the capture window faults that can be detected. For example, the three arrows
to test all intra-clock-domain and inter-clock-domain from TCK1 to C are used to test all intra-clock-domain delay
structural faults in the two clock domains. For synchronous faults in the clock domain controlled by TCK1, and all
clock domains, adjusting d2 will allow us to detect inter-clock-domain delay faults from TCK1 to TCK2 and
inter-clock-domain delay faults between the two clock TCK3. The remaining six arrows shown from TCK2 to C, and
domains at-speed. TCK3 to C are used to test the remaining delay faults.
For asynchronous clock domains, we often can only detect
inter-clock-domain structural faults, not inter-clock-domain C1 C
C2
delay faults. As synchronization circuits (or two-flop C3
synchronizers) are mostly used as a handshaking protocol to
transfer data between two asynchronous clock domains in TCK1
normal mode [36]-[37], it is still possible to detect
inter-clock-domain delay faults as long as the delay d2 can be TCK2
properly adjusted (or programmed) via the logic BIST
controller. Theoretically, a synchronizer can tolerate any delay, so testing of an inter-clock-domain delay fault across asynchronous clock domains appears unnecessary. However, an excessive delay in the “request” circuit (including the synchronizer) or in the data path across the two asynchronous clock domains might not be acceptable in a design that implements a real-time system. Thus, by adjusting the d2 value, such an excessive delay can also be detected by this clocking scheme. The value of d2 depends on the circuit delay across the two asynchronous clock domains under consideration and on the allowed delay between the two clock domains, which mainly depends on the system requirement.
Since d1 and d3 can be as long as desired, a single, slow-speed global scan enable signal GSE can be used. This significantly simplifies the logic BIST physical implementation for designs with multiple clock domains. There may be some structural fault coverage loss between clock domains if the ordered sequence of capture clocks is fixed for all capture cycles, because a fixed order may create sequentially untestable faults that could be detected if the sequence were reversed.

Fig. 8. Capture aligned double-capture.

Since the active edges (rising edges) of the three capture pulses (see vertical dashed line C) must be aligned precisely, the circuit must contain one reference clock, and the frequency of all remaining test clocks must be derived from the reference clock. In the example given here, TCK1 is the reference clock operating at the highest frequency, and TCK2 and TCK3 are derived from TCK1 and designed to operate at 1/2 and 1/4 the frequency of TCK1, respectively. Therefore, this approach is only applicable for at-speed testing of intra-clock-domain and inter-clock-domain delay faults in synchronous clock domains.
A similar aligned double-capture approach, shown in Fig. 9, aligns all first capture edges rather than second capture edges. This approach is referred to as launch aligned double-capture. Like capture aligned double-capture, it is only applicable for at-speed testing of intra-clock-domain and inter-clock-domain delay faults in synchronous domains.
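As a rough illustration of the two aligned schemes, the sketch below computes idealized launch and capture edge times for each clock domain. It assumes zero clock skew and test clocks whose periods are integer multiples of the reference period, and it ignores the suppressed pulses and the special pulse C4 of Fig. 9. The function name `aligned_double_capture`, the 2 ns reference period, and the alignment time are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def aligned_double_capture(ref_period, dividers, align_at, mode="capture"):
    """Return {clock: (launch_edge, capture_edge)} for an idealized aligned scheme.

    ref_period -- period of the reference clock (assumed unit: ns)
    dividers   -- {clock name: integer frequency divider relative to the reference}
    align_at   -- time of the shared aligned edge (dashed line C in Figs. 8 and 9)
    mode       -- "capture": second (capture) edges are aligned, as in Fig. 8
                  "launch":  first (launch) edges are aligned, as in Fig. 9
    """
    if mode not in ("capture", "launch"):
        raise ValueError("mode must be 'capture' or 'launch'")
    edges = {}
    for clock, divider in dividers.items():
        period = ref_period * divider  # at-speed period of this domain
        if mode == "capture":
            edges[clock] = (align_at - period, align_at)
        else:
            edges[clock] = (align_at, align_at + period)
    return edges


# Hypothetical numbers: 2 ns reference; TCK2 and TCK3 run at 1/2 and 1/4 the rate.
dividers = {"TCK1": 1, "TCK2": 2, "TCK3": 4}
capture_aligned = aligned_double_capture(2, dividers, align_at=100, mode="capture")
launch_aligned = aligned_double_capture(2, dividers, align_at=100, mode="launch")
```

In either mode every domain keeps its own at-speed launch-to-capture interval; only the edge that is shared across domains changes, which is why both variants depend on precise edge alignment.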
Consider the three clock domains, driven by TCK1, TCK2, and TCK3, again. The eight arrows among the dashed line C and the three capture pulses, C1, C2, and C3, indicate the intra-clock-domain and inter-clock-domain delay faults that can be tested. Unlike Fig. 8, however, in order to detect the inter-clock-domain delay faults from TCK1 to TCK3, a special shift pulse C4 is required. As this method requires a much more complex timing-control scheme, a clock suppression circuit similar to that proposed in [38] needs to be used to enable or disable the selected capture pulses. The dotted clock pulses shown in the figure indicate the suppressed capture pulses.
The main advantages of both aligned double-capture approaches are that (1) all intra-clock-domain faults and inter-clock-domain faults can be detected and (2) a single, slow-speed global scan enable signal GSE is used. Hence, both approaches can be used for true at-speed testing of synchronous clock domains. However, one major drawback is that precise alignment of the capture pulses is still required.

Fig. 9. Launch aligned double-capture.

A-3. Staggered Double-Capture
The staggered double-capture scheme solves the capture alignment problem in the aligned double-capture approach. An example at-speed timing control diagram is shown in Fig. 10. In the capture window, two capture pulses are generated for each clock domain. The first two capture pulses (C1 and C3) are used to create transitions at the outputs of scan cells, and the output responses to the transitions are captured by the second two capture pulses (C2 and C4), respectively. Delays d2 and d4 are each set according to the operating frequency of the respective domain. Since d1, d3, and d5 can be adjusted to any length, we can simply use a single, slow-speed global scan enable signal GSE for driving all clock domains. Hence, this approach can provide at-speed testing of intra-clock-domain delay faults within each clock domain. Similar to staggered single-capture, for both synchronous and asynchronous clock domains, adjusting d3 will enable detection of inter-clock-domain delay faults and inter-clock-domain structural faults. Since a single GSE signal is used, this scheme significantly eases physical implementation and allows designers to integrate logic BIST with scan/ATPG easily in order to improve the circuit’s manufacturing fault coverage.

Fig. 10. Timing control using staggered double-capture.

B. Fault Detection Capability
Modern VLSI designs often contain synchronous and asynchronous clock domains. To maximize BIST (structural and delay) fault coverage, this section discusses the fault detection capability associated with each timing control diagram.

B-1. Intra-Clock-Domain Fault Detection
Intra-clock-domain fault detection is relatively easy by using an ordered sequence of capture clocks for all clock domains in each capture window. For each clock domain, a single clock pulse is used to detect structural faults in low-speed testing, while two at-speed clock pulses are used to detect timing-related faults in at-speed testing. It is preferable to use the double-capture scheme as it detects not only structural faults but also timing-related faults.

B-2. Inter-Clock-Domain Fault Detection
Inter-clock-domain fault detection is more complex, especially for timing-related delay faults. Figure 11 shows four timing waveforms for detecting inter-clock-domain faults from the clock domain driven by TCK1 to the clock domain driven by TCK2.
If structural faults are to be detected, then delay d can be adjusted to be larger than the clock-skew between the two clock domains. This adjustment is easy. On the other hand, if timing-related delay faults also need to be detected, then delay d should be further adjusted to satisfy the specified timing relation between the two clock domains. Generally, the waveform of Fig. 11d can achieve higher inter-clock-domain delay fault coverage since a pattern of higher randomness is applied and fault effects can be captured immediately.
We conducted an experiment using the logic BIST product TurboBIST-Logic [39] developed by SynTest Technologies to compare the transition delay fault detection capabilities of the waveforms shown in Fig. 11. The circuit consisted of 11K primitives (instances) and had three clock domains driven by cclk, mclk, and pclk, respectively. For the sake of clarity, we disabled cclk and only explored the fault detection capabilities for the transition delay faults across the clock-domain logic block between the clock domain driven by mclk and the one driven by pclk. As shown in Table I, the waveform shown in Fig. 11d can achieve the highest inter-clock-domain transition fault coverage. If cclk is enabled, one would expect the resulting transition fault coverage to be much higher than that reported in Table I.
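The staggered double-capture timing of Fig. 10 and the choice of the inter-domain delay discussed above can be sketched as follows. This is a minimal model under stated assumptions: edges are ideal, time units are arbitrary, and the "specified timing relation" for delay faults is modeled simply as an upper bound `max_allowed_delay` on the stagger delay. The function names and the numbers in the example are hypothetical and do not come from the paper.

```python
def staggered_double_capture(d1, periods, gaps):
    """Return [(launch_edge, capture_edge), ...], one pair per clock domain in order.

    d1      -- delay from the start of the capture window to the first launch edge
    periods -- at-speed launch-to-capture delay of each domain (d2, d4, ... in Fig. 10)
    gaps    -- delay from one domain's capture edge to the next domain's
               launch edge (d3, ... in Fig. 10)
    """
    if len(gaps) != len(periods) - 1:
        raise ValueError("need exactly one gap between consecutive domains")
    schedule = []
    t = d1
    for i, period in enumerate(periods):
        launch, capture = t, t + period
        schedule.append((launch, capture))
        if i < len(gaps):
            t = capture + gaps[i]  # stagger the next domain after this capture edge
    return schedule


def inter_domain_targets(d, clock_skew, max_allowed_delay=None):
    """Fault classes across two domains that a stagger delay d can target.

    Structural faults need d larger than the clock skew. Delay faults are
    assumed here to additionally need d within max_allowed_delay.
    """
    targets = set()
    if d > clock_skew:
        targets.add("structural")
        if max_allowed_delay is not None and d <= max_allowed_delay:
            targets.add("delay")
    return targets


# Hypothetical example: two domains with at-speed delays of 4 and 8 time units.
schedule = staggered_double_capture(d1=50, periods=[4, 8], gaps=[20])
relaxed = inter_domain_targets(20, clock_skew=1)
tight = inter_domain_targets(5, clock_skew=1, max_allowed_delay=6)
```

With a long gap only structural faults are targeted across the two domains; tightening the gap toward the allowed inter-domain delay is what brings delay faults into scope.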
(a) Two capture pulses followed by two capture pulses (staggered double-capture)
(b) Two capture pulses followed by one capture pulse
(c) One capture pulse followed by two capture pulses
(d) One capture pulse followed by one capture pulse (staggered single-capture)
Fig. 11. Inter-clock-domain delay fault test timing.

TABLE I
INTER-CLOCK-DOMAIN DELAY FAULT DETECTION CAPABILITY
Test Timing    Fault Coverage
Fig. 11a       61.11%
Fig. 11b       61.11%
Fig. 11c       84.92%
Fig. 11d       87.70%

B-3. Fault Detection Summary
Tables II and III show the type of intra-clock-domain and inter-clock-domain faults in synchronous and asynchronous clock domains that can be detected by the above-mentioned timing control schemes described in Sections II and IV. Each scheme has its advantages and disadvantages. For example:
(1) One-hot double-capture can yield the highest fault coverage for intra-clock-domain delay faults, but cannot detect inter-clock-domain delay faults.
(2) Simultaneous double-capture can only detect intra-clock-domain structural and delay faults, but cannot detect any inter-clock-domain structural or delay faults.
(3) Staggered single-capture can yield the highest fault coverage for detecting intra-clock-domain structural faults in both synchronous and asynchronous clock domains, but cannot detect any intra-clock-domain delay faults.
(4) Aligned double-capture can yield the highest fault coverage for detecting all intra-clock-domain and inter-clock-domain delay faults in synchronous clock domains, but is not applicable for testing asynchronous clock domains; hence, it is best suited for testing synchronous clock domains.
(5) Staggered double-capture can also detect inter-clock-domain structural and delay faults that aligned double-capture cannot detect; hence, it is best suited for testing asynchronous clock domains.

The summary further indicates that it is preferred to use a hybrid clocking scheme that includes aligned double-capture for testing synchronous clock domains and staggered double-capture for testing asynchronous clock domains. Alternatively, one may consider testing all synchronous and asynchronous clock domains in multiple test sessions using a hybrid scheme that includes simultaneous double-capture and staggered single-capture. The drawbacks are the complexity of the BIST controller for providing multiple test sessions and the need to add isolation logic across all interacting clock domains.

TABLE II
INTRA-CLOCK-DOMAIN FAULT DETECTION CAPABILITY
(S: STRUCTURAL FAULTS DETECTED; D: DELAY FAULTS DETECTED)
Timing Control Scheme          Synchronous     Asynchronous
                               Intra-domain    Intra-domain
One-hot double-capture         (S, D)          (S, D)
Simultaneous double-capture    (S, D)          (S, D)
Staggered single-capture       (S)             (S)
Aligned double-capture         (S, D)          −
Staggered double-capture       (S, D)          (S, D)

TABLE III
INTER-CLOCK-DOMAIN FAULT DETECTION CAPABILITY
(S: STRUCTURAL FAULTS DETECTED; D: DELAY FAULTS DETECTED)
Timing Control Scheme          Synchronous     Asynchronous
                               Inter-domain    Inter-domain
One-hot double-capture         (S)             (S)
Simultaneous double-capture    −               −
Staggered single-capture       (S, D)          (S, D)
Aligned double-capture         (S, D)          −
Staggered double-capture       (S, D)          (S, D)
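The contents of Tables II and III lend themselves to a simple lookup. The sketch below transcribes the two tables into a dictionary and lists the timing control schemes that detect a required set of fault classes. The names `CAPABILITY` and `schemes_covering` are hypothetical, and the code captures only detectability as tabulated, not the relative coverage levels or implementation costs discussed in the text.

```python
# (scope, domain type) -> fault classes detected, transcribed from Tables II and III.
# "S" = structural faults, "D" = delay faults, "" = none (shown as "−" in the tables).
CAPABILITY = {
    "one-hot double-capture": {
        ("intra", "sync"): "SD", ("intra", "async"): "SD",
        ("inter", "sync"): "S", ("inter", "async"): "S",
    },
    "simultaneous double-capture": {
        ("intra", "sync"): "SD", ("intra", "async"): "SD",
        ("inter", "sync"): "", ("inter", "async"): "",
    },
    "staggered single-capture": {
        ("intra", "sync"): "S", ("intra", "async"): "S",
        ("inter", "sync"): "SD", ("inter", "async"): "SD",
    },
    "aligned double-capture": {
        ("intra", "sync"): "SD", ("intra", "async"): "",
        ("inter", "sync"): "SD", ("inter", "async"): "",
    },
    "staggered double-capture": {
        ("intra", "sync"): "SD", ("intra", "async"): "SD",
        ("inter", "sync"): "SD", ("inter", "async"): "SD",
    },
}


def schemes_covering(required):
    """Return the schemes (in table order) that detect every required fault class.

    required -- iterable of (scope, domain, fault) tuples, for example
                ("inter", "async", "D") for inter-domain delay faults
                between asynchronous clock domains.
    """
    required = list(required)
    return [
        name
        for name, caps in CAPABILITY.items()
        if all(fault in caps[(scope, domain)] for scope, domain, fault in required)
    ]


# Delay faults within and across synchronous domains.
sync_delay = schemes_covering([("intra", "sync", "D"), ("inter", "sync", "D")])
# Delay faults within and across asynchronous domains.
async_delay = schemes_covering([("intra", "async", "D"), ("inter", "async", "D")])
```

Per the tables, both double-capture variants qualify for synchronous domains, while only staggered double-capture qualifies for asynchronous domains, which is consistent with the hybrid scheme recommended above.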
C. Capture Clock Generation
In order to generate an ordered sequence of double-capture clocks, one can use clock suppression, daisy-chain clock-triggering, or token-ring clock-enabling. Generally, the clock suppression technique as proposed in [38] is more suitable for testing synchronous clock domains, whereas the daisy-chain clock-triggering technique or the token-ring clock-enabling technique as proposed in [23]-[25] is more suitable for testing asynchronous clock domains. Other design flavors of on-chip clock controllers can also be found in [40] and [41].
The clock suppression scheme typically requires using a reference clock operating at the highest frequency. Figure 12 shows an example launch aligned double-capture for two interacting clocks. Figure 13 shows a clock suppression circuit for generating the launch aligned double-capture waveform given in Fig. 12. This circuit uses a reference clock (CK1) to program the capture window. The contents of the 8-bit shift register are preset to {0011, 1111} during each shift window. Due to its programmability, the approach can also be used to generate timing waveforms for testing asynchronous designs. One major requirement is that, depending on needs, the delay measured by the number of reference clock pulses be equal to or longer than delay d between C2 and C3 as shown in Fig. 11a. A novel clock gating circuit for generating staggered single-capture clocks for inter-clock-domain at-speed testing of synchronous clock domains can also be found in [42].

Fig. 12. An example launch aligned double-capture clocking.

Fig. 13. A clock suppression circuit for generating the waveform given in Fig. 12.

The daisy-chain clock-triggering technique means that the completion of the shift-in operation triggers the GSE signal to become 0, switching operation mode from shift to capture. This in turn triggers the generation of two at-speed clock pulses for the first clock domain, the rising edge of the second capture clock pulse triggers the generation of two at-speed clock pulses for the second clock domain, and so on. Finally, the rising edge of the second capture clock pulse for the last clock domain triggers the GSE signal to become 1, switching operation mode from capture to shift. A timing waveform is shown in Fig. 14, where the delay d is adjusted depending on whether inter-clock-domain structural or delay faults are to be detected.

Fig. 14. Daisy-chain clock-triggering.

The token-ring clock-enabling technique is very similar to the daisy-chain clock-triggering technique. The only difference between them is that daisy-chain clock-triggering uses a clock edge to trigger the next event, while token-ring clock-enabling uses a signal level to enable the next event. Figure 15 shows a daisy-chain clock-triggering circuit for generating the staggered double-capture waveform given in Fig. 14. When the BIST mode is activated, the SE1/SE2 generators and 2-pulse controllers will generate the required scan enable and double-capture clock pulses per the arrows shown in Fig. 14. Each SE1/SE2 can be treated as a GSE signal for CD1/CD2.

Fig. 15. A daisy-chain clock-triggering circuit for generating the waveform given in Fig. 14.

V. PHYSICAL IMPLEMENTATION ISSUES
A major difference between ATE-based scan test and logic BIST is that the latter requires more complex self-test circuitry to be implemented together with the functional circuitry. Successfully conducting the physical implementation of the functional circuitry for a high-speed and high-performance design is in itself a big challenge. If the self-test circuitry adds a large number of critical signals and requires strict clock-skew management, the physical implementation of logic BIST can become prohibitively difficult.
The proposed logic BIST scheme employs several techniques to ease physical implementation. Most significantly, a slow-speed global scan enable signal GSE is used to greatly reduce the clock-skew management complexity and a PRPG-MISR pair is used for each clock domain to avoid layout routing congestion. One more technique using re-timing logic to control clock skew among each group of PRPG, scan chain, and MISR is illustrated in Fig. 16.
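For readers less familiar with the PRPG-MISR pairing used per clock domain in Section V, the following sketch models a PRPG as a small linear feedback shift register (LFSR) and a MISR as the same register with the response word XORed in on every clock. The 4-bit width, the feedback taps (4, 3), the seed, and the stand-in "circuit response" are illustrative assumptions; the PRPG and MISR lengths reported later in the paper are 19 bits or more, and their polynomials are not given.

```python
def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR by one clock and return the new state.

    taps lists 1-based bit positions that are XORed to form the feedback bit.
    """
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def prpg_patterns(seed, taps, width, count):
    """Generate `count` pseudorandom patterns, starting from a nonzero seed."""
    patterns, state = [], seed
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state, taps, width)
    return patterns


def misr_signature(responses, taps, width, seed=0):
    """Compact a sequence of response words into a single signature."""
    mask = (1 << width) - 1
    signature = seed
    for response in responses:
        signature = lfsr_step(signature, taps, width) ^ (response & mask)
    return signature


WIDTH, TAPS = 4, (4, 3)  # illustrative 4-bit register, not a value from the paper
patterns = prpg_patterns(0b0001, TAPS, WIDTH, 15)

# Stand-in for the circuit under test: any deterministic function of the pattern.
responses = [p ^ 0b1010 for p in patterns]
good_signature = misr_signature(responses, TAPS, WIDTH)

# Flip one response bit to mimic a fault effect reaching the MISR.
faulty = list(responses)
faulty[7] ^= 0b0100
bad_signature = misr_signature(faulty, TAPS, WIDTH)
```

With these taps the register cycles through all 15 nonzero states before repeating, and a single corrupted response bit changes the final signature, which is the property a BIST controller relies on when comparing against the expected signature.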
Fig. 16. Controlled clock skew management.

For the sake of clarity, Fig. 16 shows only two scan chains, one each in two different clock domains. During shifting, the PRPG-MISR pair and its associated scan chain are reconfigured as one shift register. Since the PRPG and the MISR are mostly placed far from the scan chain, timing violations may occur between the PRPG and the scan chain inputs as well as between the scan chain outputs and the MISR.
To facilitate physical design, we propose a technique that always makes the CCK clock that drives the PRPG and the MISR arrive earlier than the TCK shift clock that drives the scan chain. With this approach, only hold-time violations may occur from the PRPG to the scan chain inputs, while only setup-time violations may occur from the scan chain outputs to the MISR. In this case, the hold-time violations can be corrected with re-timing D flip-flops, whereas the setup-time violations can be avoided by reducing logic levels from the scan chain outputs to the MISR or be corrected with re-timing D flip-flops. The re-timing logic should consist of at least one negative-edge pipelining register (D flip-flop) and one positive-edge pipelining register (D flip-flop).
Figure 17 illustrates an example re-timing logic among the PRPG, a scan chain, and the MISR, using two pipelining registers on each end. Note that the two clocks, CCK and TCK, could belong to one clock tree with a small phase shift; the space expander and space compactor as shown in Fig. 16 can then be ignored. By keeping the clock-skew problem under control in this way, we can significantly simplify physical implementation.
Most importantly, the proposed logic BIST scheme does not add any isolation logic along the paths from scan chain B to scan chain A through the combinational logic block C. This greatly reduces design complexity and avoids potential performance degradation.

Fig. 17. Re-timing logic among the PRPG, a scan chain, and the MISR.

VI. EXPERIMENTAL RESULTS
The logic BIST architecture proposed in this paper has been successfully implemented in many industrial designs [23], [26], [42]. The TurboBIST-Logic tool [39] developed by SynTest Technologies was used for logic BIST implementation. The tool allows for designing a logic BIST system at the register-transfer level (RTL) or gate level based on:
• The type of logic BIST architecture to adopt.
• The number of PRPG-MISR pairs to use.
• The length of each PRPG-MISR (or PEPG-MISR) pair.
• The stuck-at and transition faults to be detected and BIST timing control diagrams to be used for detecting these faults in synchronous clock domains and asynchronous clock domains.
• The types of optional logic to be added in the BIST system to ease physical implementation, facilitate debug and diagnosis, as well as improve the circuit’s fault coverage.

The tool carries out three major steps in designing the logic BIST system once all decisions regarding the logic BIST architecture are made. They are described below.

A. BIST Rule Checking and Violation Repair
The first step is to perform logic BIST design rule checking on the RTL or gate-level design. All DFT rule violations of the scan design rules and BIST-specific design rules as given in [39] must be repaired. Once all DFT rule violations are repaired, the design, now referred to as a BIST-ready core, is considered to comply with all scan and logic BIST design rules.

B. RTL BIST Synthesis
Then, the tool automatically creates the RTL logic BIST controller. The capture-clocking schemes supported in the BIST controller include capture aligned double-capture for synchronous clock domains and staggered double-capture for asynchronous clock domains. The number of scan chains for each clock domain is specified along with the names of their associated scan inputs (SIs) and scan outputs (SOs) without inserting the actual scan chains into the circuit. The scan synthesis task can be handled as part of the general synthesis task, implemented using any (commercially available) synthesis tool for converting the RTL BIST-ready core and the logic BIST system into a gate-level netlist.

C. Design Verification and Fault Coverage Enhancement
Finally, Verilog testbenches are generated and the synthesized netlist is verified with functional and/or timing verification to ensure that the logic BIST system functions as intended. Fault simulation is then performed on the pseudorandom patterns generated by the PRPGs to determine the circuit’s fault coverage. If the circuit does not reach the target fault coverage goal, additional test points (including control points and/or observation points) can be
selected and inserted in the BIST-ready core, or top-up ATPG patterns (including compressed ATPG patterns when available) can be used during manufacturing test, to increase the circuit’s fault coverage. The tool allows adding extra test points in advance at the RTL design in the hope of achieving the target fault coverage goal. Otherwise, test points can be inserted at the gate level and the fault simulation process is repeated until the final fault coverage goal is reached. An example fault simulation and test point insertion flow is illustrated in Fig. 18.

[Flowchart boxes: Test Point Selection at RTL Design; Logic/Scan Synthesis; Fault Simulation; Coverage Acceptable? (No: Gate-Level Test Point Insertion; Yes: Done)]
Fig. 18. Fault simulation and test point insertion flow.

D. Industrial Design Examples
Based on the TurboBIST-Logic tool capabilities, we have implemented the staggered double-capture architecture in two commercially popular CPU cores [23] and added observation points to improve the fault coverage of each core. The circuit statistics and experimental results for the two IP cores are shown in Table IV.

TABLE IV
EXPERIMENTAL RESULTS FOR IP CORES X and Y
                           Core X           Core Y
# of Primitives            218,100          633,400
# of Flip-Flops            10,300           33,200
# of Clock Domains         2                8
Operating Frequency        250MHz           330MHz
# of PRPG/MISR Pairs       2                8
PRPG Length Range          19               19
MISR Length Range          1: 19 / 1: 99    7: 19 / 1: 80
# of Observation Points    1,000            1,000
# of BIST Patterns         20,000           20,000
BIST Fault Coverage
– Transition Faults        93.82%           93.22%
BIST Area Overhead         4.40%            3.20%

During the implementation, we also chose to: (1) construct one PRPG-MISR pair for each clock domain because there was clock-domain-crossing logic between any two clock domains, (2) insert scan cells into all PIs and POs so as to increase intra-clock-domain transition fault coverage, and (3) skip inserting space compactors between scan chain outputs and the MISRs in order to avoid setup-time violations. This is why there were two long MISRs, one with 99 bits in Core X and the other with 80 bits in Core Y. Such a MISR is generally related to the main and large clock domain that contains a larger number of scan chains. The results indicate that it is possible to achieve more than 93% BIST intra-clock-domain transition fault coverage (in all synchronous and asynchronous clock domains) when 1000 observation points were inserted into the design. Because each core has already implemented scan chains, we can only obtain the BIST area overhead that includes the PRPG/MISR pairs, the BIST controller, and additional circuits to insert the space expanders and observation points, block all unknown (X) signals potentially propagated to the MISRs, and mask off all multi-cycle paths and false paths.
In addition, we implemented logic BIST on two large industrial designs, ranging from 15 to 20 million primitives. Table V shows the experimental results. Again, we implemented the staggered double-capture architecture in all clock domains of each design to reduce physical implementation efforts. We calculated both the BIST transition fault coverage in all synchronous and asynchronous clock domains using the intra-clock-domain transition fault model and the BIST stuck-at fault coverage using the staggered double-capture patterns in each design. Knowing these BIST coverage numbers allows for top-up (transition and stuck-at) ATPG patterns to be added at a later stage to increase the circuit’s fault coverage during manufacturing test. Both designs have been successfully taped out and worked the first time on manufactured chips.

TABLE V
EXPERIMENTAL RESULTS FOR INDUSTRIAL DESIGNS
                           CKT1       CKT2
# of Primitives            15M        20M
# of Flip-Flops            673K       1.61M
# of Clock Domains         6          9
Operating Frequency        266MHz     533MHz
# of PRPG/MISR Pairs       6          9
PRPG Length Range          19-27      19-24
MISR Length Range          19-253     21-220
# of BIST Patterns         64,000     64,000
BIST Fault Coverage
– Stuck-At Faults          87.79%     89.38%
– Transition Faults        86.63%     82.32%
BIST Area Overhead         2.39%      2.07%

The results show that each design can run at its intended operating frequency. For a design containing millions of gates, the BIST overhead becomes a small fraction of the design. Once again, because each circuit has already implemented scan chains, the BIST area overhead is computed in the same way as for the two CPU cores. However, obtaining better BIST fault coverage, for example, 95% or higher, is a challenge without adding fault coverage improvement features, such as test point insertion or other techniques discussed in [5]-[6] and [43]-[45]. This is typical for BIST designs when only pseudorandom patterns and launch-on-capture schemes are used.
Table VI shows the circuit statistics and the combined BIST and ATPG experimental results for another industrial design that has been in production [42]. The results demonstrated how the BIST design is coupled with top-up ATPG to detect 96.9% intra-clock-domain transition faults
in all synchronous and asynchronous clock domains. In this experiment, 64K random patterns using logic BIST and 8,183 deterministic patterns using top-up ATPG were applied to the design.
To detect inter-clock-domain transition faults in all synchronous clock domains, additional top-up ATPG patterns were also generated and applied using the inter-clock at-speed control scheme proposed in [42]. The results are shown in Table VII. In this case, 6 inter-clock logic blocks, A ~ F, were targeted. For example, A is a logic block from a 100MHz clock to a 300MHz clock, and contains 36,858 transition faults. Table VII shows the generated ATPG patterns, fault coverage, and CPU time. Furthermore, the area overhead in this application was mainly due to 6 inter-clock enable generators, each containing 20 flip-flops and consuming an area of roughly 124 equivalent 2-input NAND gates.

TABLE VI
INTRA-DOMAIN FAULT DETECTION FOR BIST DESIGN CKT3
                               CKT3
# of Primitives                11.1M
# of Flip-Flops                404.9K
# of Clock Domains             11
Min. Frequency                 66MHz
Max. Frequency                 533MHz
# of Scan Chains               32
Maximum Chain Length           13,598
# of BIST Patterns             64,000
# of Top-Up ATPG Patterns      8,183
Intra-Domain Fault Coverage
– Transition Faults            96.9%

TABLE VII
INTER-DOMAIN FAULT DETECTION FOR BIST DESIGN CKT3
Inter-Clock     From     To       # of      ATPG
Logic Blocks    (MHz)    (MHz)    Faults    # of Vec.    Fault Cov. (%)    CPU (h:m)
A               100      300      36858     232          86.4              4:30
B               133      533      8350      32           100               0:15
C               133      266      4942      36           100               0:14
D               533      133      1940      9            100               0:10
E               266      533      732       9            100               0:10
F               266      133      64        3            100               0:09

In the case of detecting inter-clock-domain delay faults in asynchronous clock domains, unfortunately, we have not been able to conduct true experiments because no industrial chip taped out as of today has been designed for such a purpose. However, as discussed earlier, our proposed staggered single-capture or double-capture scheme can deal with such cases by carefully adjusting the d value given in Fig. 11a or 11d. Consequently, delay fault testing of inter-clock-domain faults in asynchronous clock domains is exactly the same as that in synchronous clock domains. Table I has demonstrated via an industrial design how various clocking schemes affect the fault coverage of inter-clock-domain transition faults between two synchronous (or two asynchronous) clock domains.

VII. CONCLUSIONS
Delay fault testing based on launch-on-capture is commonly practiced in industry due to the ease of using a slow-speed global scan enable signal. When a BIST design contains a mix of synchronous and asynchronous clock domains, the conventional one-hot or simultaneous double-capture scheme cannot detect any inter-clock-domain delay faults. This makes it even more difficult for logic BIST to achieve high fault coverage when mainly pseudorandom patterns are used for testing.
This paper presented a new at-speed logic BIST architecture using double-capture (a.k.a. launch-on-capture) for testing BIST designs containing multi-frequency synchronous and asynchronous clock domains. To facilitate debug and diagnosis, the BIST architecture also includes BIST diagnosis logic to help locate BIST failures.
It was shown that the aligned double-capture scheme employed in the architecture is most suitable for testing synchronous clock domains to achieve true at-speed test quality, whereas the staggered double-capture scheme employed is most suitable for testing asynchronous clock domains. Physical implementation becomes easier due to the use of a slow-speed global scan enable signal and reduced timing-critical design requirements. If only structural faults are considered for detection and diagnosis, the BIST architecture built upon the staggered single-capture scheme can result in the highest fault coverage with the lowest BIST overhead. Application results for several industrial designs have demonstrated the effectiveness of the proposed architecture. These results further indicated that the proposed double-capture schemes can reach high BIST fault coverage. For designs containing both synchronous and asynchronous clock domains, a hybrid clocking scheme using aligned double-capture and staggered double-capture is proposed; however, challenges still lie ahead with regard to how to increase the BIST transition fault coverage of the design to a much more acceptable level, say 95% or above, and how to locate BIST failures more effectively during debug and diagnosis from the system level down to the chip level.

ACKNOWLEDGMENTS
The authors are grateful to the anonymous referees for pointing out unclear descriptions in the paper and giving constructive suggestions. The authors also would like to thank Dr. B. Cheon and E. Lee of Samsung Electronics in Korea, Tomotaka Odajima of Marubeni Information Systems in Japan, and many colleagues of SynTest Technologies in the US, Korea, China, and Taiwan for providing the experimental results listed in Tables IV to VII. This work was supported in part by National Science Foundation of USA under grant CCF-0541103.
REFERENCES
[1] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design. Piscataway, NJ: IEEE Press, 1990.
[2] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory & Mixed-Signal VLSI Circuits. Boston, MA: Springer, 2000.
[3] C. E. Stroud, A Designer’s Guide to Built-In Self-Test. Boston, MA: Springer, 2002.
[4] N. K. Jha and S. K. Gupta, Testing of Digital Systems. London, UK: Cambridge University Press, 2003.
[5] L.-T. Wang, C.-W. Wu, and X. Wen, Eds., VLSI Test Principles and Architectures: Design for Testability. San Francisco, CA: Morgan Kaufmann, 2006.
[6] L.-T. Wang, C. E. Stroud, and N. A. Touba, Eds., System-on-Chip Test Architectures: Nanometer Design for Testability. San Francisco, CA: Morgan Kaufmann, 2007.
[7] L.-T. Wang, Y.-W. Chang, and K.-T. Cheng, Eds., Electronic Design Automation: Synthesis, Verification, and Test. San Francisco, CA: Morgan Kaufmann, 2009.
[8] P. H. Bardell and W. H. McAnney, “Self-testing of multiple logic modules,” in Proc. IEEE Int. Test Conf., 1982, pp. 200-204.
[9] B. Nadeau-Dostie, A. Hassan, D. Burek, and S. Sunter, “Multiple Clock Rate Test Apparatus for Testing Digital Systems,” U.S. Patent 5 349 587, Sept. 20, 1994.
[10] S. Bhawmik, “Method and Apparatus for Built-In Self-Test with Multiple Clock Circuits,” U.S. Patent 5 680 543, Oct. 21, 1997.
[11] G. Hetherington, T. Fryars, N. Tamarapalli, M. Kassab, A. Hassan, and J. Rajski, “Logic BIST for large industrial designs: Real issues and case studies,” in Proc. IEEE Int. Test Conf., 1999, pp. 358-367.
[12] J. Savir and S. Patil, “Scan-based transition test,” IEEE Trans. on Computer-Aided Design, vol. 12, no. 8, pp. 1232-1241, Aug. 1993.
[13] J. Savir and S. Patil, “Broad-side delay test,” IEEE Trans. on Computer-Aided Design, vol. 13, no. 8, pp. 1057-1064, Aug. 1994.
[14] S. Wang, X. Liu, and S. T. Chakradhar, “Hybrid delay scan: A low hardware overhead scan-based delay test technique for high fault coverage and compact test sets,” in Proc. IEEE/ACM Design, Automation and Test in Europe Conf., 2004, pp. 1296-1301.
[15] J. Abraham, U. Goel, and A. Kumar, “Multi-cycle sensitizable transition delay faults,” in Proc. IEEE VLSI Test Symp., 2006, pp. 306-311.
[16] Z. Zhang, S. M. Reddy, I. Pomeranz, X. Lin and J. Rajski, “Scan tests with multiple fault activation cycles for delay faults,” in Proc. IEEE VLSI Test Symp., 2006, pp. 343-348.
[17] N. Ahmed and M. Tehranipoor, “Improving Transition delay test using a hybrid method,” IEEE Design & Test of Computers, vol. 23, no. 5, pp. 402-412, Sept.-Oct. 2006.
[26] J. Qian, X. Wang, Q. Yang, F. Zhuang, J. Jia, X. Li, Y. Zuo, J. Mekkoth, J. Liu, H.-J. Chao, S. Wu, H. Yang, L. Yu, F. Zhao, and L.-T. Wang, “Logic BIST architecture for system-level test and diagnosis,” in Proc. IEEE Asian Test Symp., 2009.
[27] L.-T. Wang, X. Wen, H. Furukawa, F.-S. Hsu, S.-H. Lin, S.-W. Tsai, K. S. Abdel-Hafez, and S. Wu, “VirtualScan: A new compressed scan technology for test cost reduction,” in Proc. IEEE Int. Test Conf., 2004, pp. 916-925.
[28] L.-T. Wang, X. Wen, S. Wu, Z. Wang, Z. Jiang, B. Sheu, and X. Gu, “VirtualScan: Test compression technology using combinational logic and one-pass ATPG,” IEEE Design & Test of Computers, vol. 25, no. 2, pp. 122-130, March-April 2008.
[29] L.-T. Wang, H.-P. Wang, X. Wen, M.-C. Lin, S.-H. Lin, D.-C. Yeh, S.-W. Tsai, and K. S. Abdel-Hafez, “Method and Apparatus for Broadcasting Scan Patterns in a Scan Based Integrated Circuit,” U.S. Patents 7 412 637 and 7 412 672, Aug. 12, 2008.
[30] W.-T. Cheng, M. Sharma, T. Rinderknecht, L. Lai, and C. Hill, “Signature-based diagnosis for logic BIST,” in Proc. IEEE Int. Test Conf., 2006, pp. 265-273.
[31] Y. Huang, R. Guo, W.-T. Cheng, and J. C.-M. Li, “Survey of Scan Chain Diagnosis,” IEEE Design & Test of Computers, vol. 25, no. 3, pp. 240-248, May-June 2008.
[32] L.-T. Wang, M.-T. Chang, S.-H. Lin, H.-J. Chao, J. Lee, H.-P. Wang, X. Wen, P.-C. Hsu, S.-C. Kao, M.-C. Lin, S.-W. Tsai, and C.-C. Hsu, “Method and Apparatus for Diagnosing Failures in an Integrated Circuit Using Design-for-Debug (DFD) Techniques,” European Patent 1 364 436, May 24, 2006; also in U.S. Patent 7 284 175, Oct. 16, 2007.
[33] J. Ghosh-Dastidar and N. A. Touba, “A rapid and scalable diagnosis scheme for BIST environments with a large number of scan chains,” in Proc. IEEE VLSI Test Symp., 2000, pp. 79-85.
[34] K. S. Abdel-Hafez, X. Wen, L.-T. Wang, P.-C. Hsu, S.-C. Kao, H.-J. Chao, and H.-P. Wang, “Method and Apparatus for Debug, Diagnosis, and Yield Improvement for Scan-Based Integrated Circuits,” U.S. Patent 7 058 869, June 6, 2006.
[35] L.-T. Wang, X. Wen, K. S. Abdel-Hafez, S.-H. Lin, H.-P. Wang, M.-T. Chang, P.-C. Hsu, S.-C. Kao, M.-C. Lin, and C.-C. Hsu, “Method and Apparatus for Unifying Self-Test with Scan-Test During Prototype Debug and Production Test,” U.S. Patent 7 444 567, Oct. 28, 2008.
[36] C. Dike and E. Burton, “Miller and noise effects in a synchronizing flip-flop,” IEEE J. of Solid-State Circuits, vol. 34, no. 6, pp. 849-855, June 1999.
[37] R. Ginosar, “Fourteen ways to fool your synchronizer,” in Proc. IEEE Int. Symp. on Asynchronous Circuits and Systems, 2003, pp. 89-96.
[38] M. Beck, O. Barondeau, M. Kaibel, F. Poehl, X. Lin, and R. Press,
[18] G. Xu and A. D. Singh, “Achieving high transition delay fault “Logic design for on-chip test clock generation - Implementation
coverage with partial DTSFF scan chains,” in Proc. IEEE Int. Test details and impact on delay test quality,” in Proc. IEEE/ACM
Conf., 2007, Paper 17.1. Design, Automation and Test in Europe Conf., 2005, pp. 56-61.
[19] B. Nadeau-Dostie, K. Takeshita, and J.-F. Côté, “Power-aware [39] TurboBIST-Logic User’s Manual, SynTest Technologies,
at-speed scan test methodology for circuits with synchronous Sunnyvale, CA, USA, 2009. (http:www.syntest.com)
clocks,” in Proc. IEEE Int. Test Conf, 2008, Paper 9.3. [40] B. Keller, A. Uzzaman, B. Li, and T. Snethen, “Using
[20] I. Park and E. J. McCluskey, “Launch-on-shift-capture transition programmable on-product clock generation (OPCG) for delay test,”
tests,” in Proc. IEEE Int. Test Conf, 2008, Paper 35.3. in Proc. IEEE Asian Test Symp., 2007, pp. 69-72.
[21] G. Xu and A. D. Singh, “Low Cost LOS Delay Test with Slow Scan [41] X.-X. Fan, Y. Hu, and L.-T. Wang, “An on-chip test clock control
Enable,” in Proc. IEEE European Test Symp., 2006, pp. 9-14. scheme for multi-clock at-speed testing,” in Proc. IEEE Asian Test
[22] G. Xu and A. D. Singh, “Delay test scan flip-flop: DFT for high Symp., 2007, pp. 341-348.
coverage delay testing,” in Proc. Int. Conf. on VLSI Design, 2007, [42] H. Furukawa, X. Wen, L.-T. Wang, B. Sheu, Z. Jiang, and S. Wu,
pp. 763-768. “A novel and practical control scheme for inter-clock at-speed
[23] B. Cheon, E. Lee, L.-T. Wang, X. Wen, P. Hsu, J. Cho, J. Park, H. testing,” in Proc. IEEE Int. Test Conf., 2006, Paper 17.2.
Chao, and S. Wu, “At-speed logic BIST for IP cores,” in Proc. [43] H.-C. Tsai, K.-T. Cheng, and S. Bhawmik, “Improving the test
IEEE/ACM Design, Automation and Test in Europe Conf., 2005, pp. quality for scan-based BIST using a general test application
860-861. scheme,” in Proc. ACM/IEEE Design Automation Conf., 1999, pp.
[24] L.-T. Wang, X. Wen, P.-C. Hsu, S. Wu, and J. Guo, “At-speed logic 748–753.
BIST architecture for multi-clock designs,” in Proc. IEEE Int. Conf. [44] Y. Li, S. Makar, and S. Mitra, “CASP: Concurrent autonomous chip
on Computer Design, 2005, pp. 475-478. self-test using stored test patterns,” in Proc. IEEE/ACM Design,
[25] L.-T. Wang, P.-C. Hsu, S.-C. Kao, M.-C. Lin, H.-P. Wang, H.-J. Automation and Test in Europe Conf., 2008, pp. 885-890.
Chao, and X. Wen, “Multiple-Capture DFT System for Detecting or [45] L.-T. Wang, H. S. Hsiao, H.-J. Chao, Z. Jiang, S. Wu, and J. Yan,
Locating Crossing Clock-Domain Faults During Self-Test or “Method and Apparatus for Delay Fault Coverage Enhancement,”
Scan-Test,” U.S. Patent 7 007 213, Feb. 28, 2006; also in European U.S. Patent Application 12/554,437, Sept. 4, 2009.
Patent 1 360 513, Apr. 2, 2008.

Laung-Terng (L.-T.) Wang (M’87–SM’04–F’08) received his BSEE and MSEE degrees from National Taiwan University in 1975 and 1977, respectively, and his MSEE and EE Ph.D. degrees under the Honors Cooperative Program (HCP) from Stanford University in 1982 and 1987, respectively.
He has been chairman and chief executive officer (CEO) of SynTest Technologies, Inc. (Sunnyvale, CA) since January 1990, and a visiting professor in the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering at National Taiwan University since July 2009. Prior to founding SynTest in 1990, he worked in the industry, including Intel (Santa Clara, CA) from 1980 to 1983 and Daisy Systems (Mountain View, CA) from 1983 to 1986, and was with the Department of Electrical Engineering at Stanford University as Research Associate and Lecturer from 1987 to 1991.
Dr. Wang currently holds 21 U.S. Patents, 15 European Patents, one Japan Patent, and one China Patent, in the areas of scan synthesis, test generation, at-speed scan testing, test compression, logic built-in self-test (BIST), and design for debug-and-diagnosis (DFD). The design-for-testability (DFT) technologies Dr. Wang has developed have been successfully implemented in thousands of ASIC designs worldwide. He has also co-authored and co-edited three internationally used DFT/EDA textbooks – VLSI Test Principles and Architectures (2006), System-on-Chip Test Architectures (2007), and Electronic Design Automation (2009).
A member of Sigma Xi, Dr. Wang received a 2007 Meritorious Service Award from the IEEE Computer Society and is a co-recipient of the 2008 IEICE Information and Systems Society Excellent Paper Award for an excellent series of papers that appeared in IEICE Transactions on Information and Systems during a period of five years. He is a Fellow of the IEEE and a Golden Core Member of the IEEE Computer Society.

Xiaoqing Wen (S’89–M’93–SM’08) received his B.S. degree in Computer Science and Technology from Tsinghua University, Beijing, China, in 1986, his M.S. degree in Information Engineering from Hiroshima University, Hiroshima, Japan, in 1990, and his Ph.D. degree in Applied Physics from Osaka University, Osaka, Japan, in 1993.
From 1993 to 1997, he was a Lecturer at Akita University. He was a Visiting Researcher at University of Wisconsin - Madison from October 1995 to March 1996. He joined SynTest Technologies, Inc. (Sunnyvale, CA) in 1998 and served as its chief technology officer (CTO) until 2003. In 2004, he joined the Kyushu Institute of Technology, Iizuka, Japan, where he is currently a Professor and Chair of the Department of Creative Informatics. His research interests include design, test, and diagnosis of integrated circuits.
Prof. Wen currently holds 15 U.S. Patents, 2 Japan Patents, and 22 pending U.S. and Japan Patents in logic built-in self-test (BIST), test compression, and low-capture-power (LCP) test generation. He has also co-authored and co-edited two textbooks – VLSI Test Principles and Architectures (2006) and Power-Aware Testing and Test Strategies for Low Power Devices (2009). He received the 2008 IEICE Information and Systems Society Excellent Paper Award for LCP X-filling and test generation. He is a senior member of the IEEE, a member of the IEICE, and a member of the REAJ.

Shianling Wu (A’88–M’09) has an M.S. in Computer Science from Columbia University. She joined SynTest Technologies, Inc. (Princeton Junction, NJ) in 2003 and is presently Vice President of Engineering focusing on advanced VLSI DFT research and development. Prior to SynTest, she was with Bell Laboratories for over 23 years. She currently holds 5 U.S. Patents and has 3 pending U.S. Patents. She has published over 15 DFT papers and contributed chapters to two DFT textbooks – VLSI Test Principles and Architectures (2006) and Electronic Design Automation (2009).
She has served as a program committee member for IEEE International Test Conference, Asian Test Symposium, and North Atlantic Test Workshop. She won numerous AT&T and Lucent Awards and received a Best Panel Award in the 2005 IEEE International Test Conference. She was a member of SEMATECH, SRC, GSRC, STARC-International, VSIA, and the IEEE 1500 Standard Committee. She is a member of the IEEE.

Hiroshi Furukawa received his B.S. degree in Electrical Engineering from Kumamoto University, Japan, in 1992.
He joined NEC Micro Systems (Kumamoto, Japan) in 1992 and is currently a design manager. He is also a Ph.D. student in the Department of Creative Informatics at the Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, Japan. His research interests include VLSI testing and logic built-in self-test (BIST).

Hao-Jan Chao (M’09) graduated from the Department of Electronic Engineering at National Taipei University of Technology (formerly, National Taipei Institute of Technology), Taipei, Taiwan, in 1990. He further received his B.S. degree in Nautical Technology at National Taiwan Ocean University, Keelung, Taiwan, in 1995, and his M.S. degree in Electrical Engineering from National Central University, Taoyuan, Taiwan, in 1999.
He has been working at SynTest Technologies, Inc., Hsinchu, Taiwan, since 1999, and is currently an R&D Manager responsible for the logic built-in self-test (BIST) as well as silicon debug and fault diagnosis product lines. His research interests include logic BIST, core-based design for testability (DFT), and design for debug-and-diagnosis (DFD).

Boryau (Jack) Sheu received his BSEE and MSEE degrees from National Taiwan University and Washington University in St. Louis in 1984 and 1991, respectively.
He was Director of Operation and R&D at SynTest Technologies from January 2001 to April 2009. Prior to SynTest, he held various software engineering positions in startup companies. He is currently with Sigma Designs (Milpitas, CA) focusing on DFT flow and implementation. His research interests include at-speed ATPG, logic BIST, as well as test strategy and integration for VDSM SOC designs. He is a co-inventor of 6 U.S. Patents related to at-speed scan testing and test compression.

Jianghao Guo received his B.S. and M.S. degrees in Control Science and Engineering from Shanghai Jiaotong University, Shanghai, China, in 2001 and 2004, respectively. He joined SynTest Technologies, Inc., Shanghai, China, in 2004, where he served as an engineering manager until 2008.
He is currently a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of Cincinnati. His research interests include VLSI testing, computer architecture, and multi-core system design.

Wen-Ben Jone (M’84–SM’02) was born in Taipei, Taiwan, Republic of China. He received the B.S. degree in Computer Science in 1979, the M.S. degree in Computer Engineering in 1981, both from National Chiao-Tung University, Hsinchu, Taiwan, and the Ph.D. degree in Computer Engineering and Science from Case Western Reserve University, Cleveland, Ohio, in 1987.
In 1987, he joined the Department of Computer Science at New Mexico Institute of Mining and Technology, Socorro, New Mexico, where he was promoted to an associate professor in 1992. From 1993 to 2000, he was a full professor in the Department of Computer Engineering and Information Science at National Chung-Cheng University, Chiayi, Taiwan, R.O.C. Since 2001, he has been an associate professor in the Department of Electrical and Computer Engineering at the University of Cincinnati, Ohio. His research interests include VLSI design for testability, built-in self-test, memory testing, high-performance circuit testing, MEMS testing and repair, and low-power circuit design and test.
Dr. Jone has been a reviewer in these research areas in various technical journals and conferences. He also served on the program committee of various technical conferences. He received the Best Thesis Award from The Chinese Institute of Electrical Engineering (Republic of China) in 1981. He is a co-recipient of the 2003 IEEE Donald G. Fink Prize Paper Award. He is also a co-recipient of the Best Paper Award of the 2008 IEEE International Symposium on Low-Power Electronics & Design. He is a senior member of the IEEE and the IEEE Computer Society Test Technology Technical Committee.
