simultaneously. Two clock domains are said to be asynchronous if they are not synchronous.

Despite its conceptual simplicity, logic BIST faces many practical hurdles, especially in at-speed testing for multi-clock, multi-frequency circuits. Each clock in such a circuit controls a clock domain, whose clock skew is minimized and which runs at a frequency either synchronous or asynchronous to other clock domains. The most critical yet difficult part of logic BIST is how to detect intra-clock-domain faults and inter-clock-domain faults thoroughly and efficiently with a proper capture-clocking scheme. An intra-clock-domain fault originates at one clock domain and terminates at the same clock domain. An inter-clock-domain fault originates at one clock domain but terminates at another clock domain.

Previous STUMPS-based logic BIST schemes [9]-[11] have not been effectively applied in practice. The reasons are mainly due to the need to manipulate test frequency when the CUT contains asynchronous clock domains, and the difficulty in timing control and physical implementation. Alternatively, a conventional at-speed BIST scheme using one-hot clocking or simultaneous clocking would need to test one clock domain at a time resulting in long test time or add isolation logic to normal functional paths across interacting clock domains resulting in fault coverage loss across these clock domains.

These problems will be addressed in this paper, with a new logic BIST architecture using launch-on-capture schemes - aligned clocking and staggered clocking - that achieves true at-speed test quality for any multi-clock, multi-frequency design and that is easy for physical implementation.

It should be noted that all above-mentioned capture-clocking schemes are applicable for both BIST designs and scan designs. The main difference is that for BIST designs no unknown (X) values are allowed to propagate through the scan chains to reach the MISR.

Throughout this paper, we will assume that a STUMPS-based architecture is used and that each clock domain contains one test clock and one scan enable signal. The faults we will consider for comparison include structural faults, such as stuck-at faults and bridging faults, as well as timing-related delay faults, such as path-delay faults and transition faults.

The rest of the paper is organized as follows: Section 2 describes the background. Section 3 presents the logic BIST architecture. Section 4 discusses at-speed timing control issues, and Section 5 focuses on physical implementation issues. Section 6 shows results on several industrial designs, and Section 7 concludes the paper.

II. BACKGROUND

There are two basic capture-clocking schemes for testing multiple clock domains at-speed: (1) skewed-load (which is now commonly called launch-on-shift [LOS]) [12] and (2) double-capture (which was called broad-side in [13] but is now commonly called launch-on-capture [LOC]). Both schemes are helpful for detecting structural faults and delay faults within each clock domain (called intra-clock-domain faults) or across clock domains (called inter-clock-domain faults). Skewed-load uses the last shift clock pulse followed immediately by a capture clock pulse to launch a transition and capture its output test response, respectively. Double-capture uses two consecutive capture clock pulses to launch the transition and capture the output test response, respectively. In either scheme, both launch and capture clock pulses must be running at the domain’s operating speed or at-speed. The difference is that skewed-load requires the domain’s scan enable signal SE to switch its value between the launch and capture clock pulses making SE function as a clock signal. Figure 1 shows sample waveforms using the basic skewed-load and double-capture at-speed test schemes.

Typically, testing a scan-based BIST design based on skewed-load for at-speed delay fault testing can achieve higher fault coverage with shorter test length [14]-[20]. Although some novel DFT techniques as proposed in [21] and [22] have addressed the timing problem of operating the scan enable signal SE at-speed for each clock domain, skewed-load can cause unwanted over-testing because more false paths can be exercised, and incur higher implementation cost associated with the at-speed scan enable signal SE. This is in sharp contrast to double-capture in which only a slow-speed, global scan enable signal GSE for all clock domains is needed.

Therefore, this paper will focus on logic BIST architecture based on double-capture. There are two known at-speed capture-clocking schemes: (1) one-hot double-capture that conducts capture for one clock domain at a time and (2) simultaneous double-capture that allows testing to be performed on all clock domains in parallel.

Fig. 1. Basic at-speed test schemes: (a) skewed-load (a.k.a. launch-on-shift); (b) double-capture (a.k.a. launch-on-capture). [Waveforms of CK and SE: in (a), the last shift pulse launches and the following pulse captures; in (b), dead cycles separate shifting from the launch and capture pulses.]

A. One-Hot Double-Capture

The one-hot double-capture scheme tests clock domains one by one. A sample timing diagram for two clocks is shown in Fig. 2. The main advantages are that (1) two consecutive capture pulses are applied (C1-followed-by-C2 or C3-followed-by-C4) at their respective clock domains’
4885 – P. 3 To Appear in IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
frequencies (of period d1 or d2) to test intra-clock-domain delay faults, and (2) a single, slow-speed global scan enable signal GSE is used to drive both clock domains.

Fig. 2. One-hot double-capture. [Timing diagram: CK1 with capture pulses C1 and C2 (period d1), CK2 with capture pulses C3 and C4 (period d2), and GSE, across alternating shift and capture windows.]

Hence, this scheme can be used for true at-speed testing of intra-clock-domain delay faults in both synchronous and asynchronous clock domains. However, this scheme suffers from two drawbacks in that (1) it cannot be used to detect inter-clock-domain delay faults and (2) it has long test time.

B. Simultaneous Double-Capture

The long test time problem of one-hot double-capture can be resolved by using the simultaneous double-capture scheme illustrated in Fig. 3. The simultaneous double-capture scheme allows testing to be performed on all clock domains in parallel. However, since data may propagate from one clock domain to the other clock domain, isolation logic (or capture-disabling circuitry), such as AND/OR gates or multiplexers, must be added at the sources, sinks, or along the normal functional paths to force all inter-clock-domain paths to exhibit constant values of 0’s or 1’s at the receiver side.

Fig. 3. Simultaneous double-capture. [Timing diagram: CK1 with capture pulses C1 and C2, CK2 with capture pulses C3 and C4, and GSE, with both domains pulsed in the same capture window between shift windows.]

The major advantages of this approach are that (1) all intra-clock-domain (structural and delay) faults can be tested simultaneously thus yielding much shorter test time and (2) there is no need to worry about the clock skew issue between any two clock domains, be they synchronous or asynchronous. This approach, however, requires that isolation logic be inserted across all interacting clock domains so each clock domain can be tested independent of all other clock domains. The insertion of isolation logic into the domain boundary exposes the design to one major drawback which is not present in one-hot double-capture: the added circuitry to isolate all interacting clock domains may increase the propagation delay of the design in normal mode and will prevent all inter-clock-domain faults - structural faults and delay faults - from being detected.

As the one-hot double-capture or simultaneous double-capture scheme cannot detect inter-clock-domain delay faults, how to preserve all benefits of both schemes while at the same time remove all of their drawbacks is the main focus of this paper. In the following, we propose novel logic BIST architecture and launch-on-capture schemes for detecting intra-clock-domain faults and inter-clock-domain faults in a multi-frequency circuit containing both synchronous and asynchronous clock domains. The proposed launch-on-capture (or double-capture) schemes are intended to increase the fault coverage of the circuit. The new architecture implemented with the schemes is intended to facilitate physical implementation as well as debug and diagnosis of ASIC devices from the system level down to the chip level [23]-[26]. The architecture requires a BIST-ready core that has complied with all scan and BIST design rules.

III. LOGIC BIST ARCHITECTURE USING LAUNCH-ON-CAPTURE

A. General Architecture

The proposed logic BIST architecture is illustrated in Fig. 4. The BIST architecture for testing the BIST-ready core consists of a test pattern generator (TPG) for generating test stimuli, an input selector for providing pseudorandom or top-up ATPG patterns for the core-under-test, an output response analyzer (ORA) for compacting test responses, a clock gating block (discussed in Section IV-C) for generating test clocks from original or functional clocks, and a BIST controller for coordinating the whole BIST operation. The top-up ATPG patterns can include compressed ATPG patterns to improve the circuit’s fault coverage during manufacturing test, when the combinational-logic-based scan compression architecture as proposed in [27]-[29] is embedded in the design. The test clocks are placed in a predetermined order of sequence (see Section IV) so that single-capture or double-capture clock pulses can be supplied to the BIST-ready core. The self-test operation is started by asserting the Start signal, its end is indicated by the Finish signal, and its result is shown by the Result signal. A standard IEEE 1149.1 Boundary-Scan interface under the control of the test access port (TAP) controller is used for loading initialization and configuration data or for downloading internal states for fault diagnosis.

Fig. 4. Logic BIST architecture. [Block diagram labels: TPG (PRPG1, PRPG2, PS1/SpE1, PS2/SpE2); input selector with PIs/SIs; BIST-ready core (Clock Domain #1, Clock Domain #2) with POs/SOs; ORA (SpC1, SpC2, MISR1, MISR2); controller with Start, Finish, Result, TDI, TDO, TCK, and TMS; clock gating block with CK1, CK2, CCK1, CCK2, TCK1, and TCK2.]
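
As a rough illustration of the self-test loop described above, the following Python sketch models a single PRPG-MISR pair: an LFSR-based PRPG supplies pseudorandom patterns, a MISR compacts the responses into a signature, and the comparison against a golden signature plays the role of the Result signal. The 8-bit width, the feedback taps, the seed, and the `good_core` function are illustrative assumptions and are not taken from the architecture in Fig. 4; phase shifters, space compactors, scan shifting, and clocking are omitted.

```python
# Hypothetical sketch: width, taps, seed, and core model are assumptions
# made for illustration only.

WIDTH = 8
TAPS = (7, 5, 4, 3)  # assumed feedback taps (bit positions, LSB = 0)
MASK = (1 << WIDTH) - 1


def lfsr_step(state):
    """Shift left by one bit and feed the XOR of the tapped bits into bit 0."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    return ((state << 1) | feedback) & MASK


def prpg(seed, count):
    """Return `count` pseudorandom patterns, starting from a nonzero seed."""
    patterns, state = [], seed & MASK
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns


def misr(responses, seed=0):
    """Compact a response sequence: one LFSR step, then XOR in the response."""
    state = seed & MASK
    for response in responses:
        state = lfsr_step(state) ^ (response & MASK)
    return state


def good_core(pattern):
    """Stand-in for the fault-free core-under-test (purely illustrative)."""
    return (pattern ^ (pattern >> 3)) & MASK


def run_bist(core, golden_signature, seed=0x5A, count=100):
    """Apply PRPG patterns to `core`; return True for pass, False for fail."""
    responses = [core(p) for p in prpg(seed, count)]
    return misr(responses) == golden_signature
```

In this model the golden signature would be obtained from a fault-free run, for example `misr([good_core(p) for p in prpg(0x5A, 100)])`. Because the state update includes the most significant bit in its feedback, it is invertible, so an error confined to a single cycle always changes the final signature; errors spread over several cycles can still alias.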
B. BIST-Ready Core

The BIST-ready core is a full-scan circuit (scan design) that satisfies all scan design rules. Additional circuitry may need to be added for preventing bus conflicts at tri-state buses and for disabling asynchronous set/reset signals and false paths. In addition to scan design rules, the BIST-ready core must also satisfy all BIST-specific design rules, such as X-blocking and test point insertion (TPI). X-blocking, which blocks all unknowns (X’s) from reaching the scan outputs, is conducted in an intelligent way so that critical paths are avoided and that the X-blocking circuitry is placed as close to the X-source as possible. TPI is guided by fault simulation results. In addition, all multi-cycle paths and false paths may be selected or blocked depending on test needs.

C. TPG Circuitry and ORA Circuitry

In general, clock skews between two interacting clock domains in a BIST-ready core, as shown in Fig. 4, are not aggressively managed. In order to avoid additional design efforts for clock skew management in logic BIST, two PRPG-MISR pairs, one for each clock domain, can be used, even though both clock domains may operate at the same frequency. However, if hardware overhead is a major concern, one PRPG-MISR pair can be used. Also, linear phase shifters (PS1 and PS2) (a.k.a. space expanders [SpE1 and SpE2]) can be used to reduce the length of PRPGs, and space compactors (SpC1 and SpC2) can be used to reduce the length of MISRs.

D. Test Control Circuitry

The test control circuitry consists of a BIST controller and a clock gating block. The inputs to the clock gating block are system clocks CK1 and CK2, which become CCK1 and CCK2 after going through some buffers. The two clocks CCK1 and CCK2 are in fact used by the PRPG and MISR pair as will be discussed in Section V. In addition, the clock gating block is controlled by signals from the BIST controller to generate test clocks TCK1 and TCK2. The timings of TCK1 and TCK2, especially in capture mode, play a critical role in determining the test capability and ease of physical implementation of the logic BIST scheme. The BIST controller works in tandem with an embedded TAP controller, which complies with the IEEE 1149.1 Boundary-Scan standard to coordinate the test, debug, and diagnosis tasks.

E. Debug and Diagnosis Circuitry

In addition to logic BIST, diagnosis of the BIST-ready core at the core level and then down to the failed scan chain and signature cycle levels, to locate faulty scan cells and logic gates, is also important. Many innovative BIST debug and diagnosis approaches have been proposed and surveyed in [5], [6], [30], and [31]. This subsection details a few unique design for debug and diagnosis (DFD) features.

E-1. Core-Level Diagnosis

To facilitate test, debug, and diagnosis, each clock domain (core) is embedded with a unique core-identifier (CID) bit to decide whether this clock domain will be targeted or not [32]. A design for debug and diagnosis (DFD) circuitry including the CID bit register will be surrounding all BIST cores that have been synthesized with their respective logic BIST controllers under boundary-scan control. There will be an isolation wrapper for each core. With the objective to enable end-to-end debug and diagnosis from core to in-field applications, these CID bits are stitched to form a shift register, a CID register, so during test, debug, and diagnosis, these bits can be programmed on-the-fly and these cores are tested either in series or in parallel.

In order to debug or diagnose logic BIST cores in an integrated circuit, one first sets the CID bits of the BIST cores to be diagnosed to all 1’s. These cores can then be processed in parallel. In addition to single error diagnosis, the CID register can further include additional bits in each logic BIST core, when desired, to increase diagnostic resolution for multiple errors that may arise from different faults, such as stuck-type faults, delay faults, or bridging faults.

There are a number of benefits with this CID approach. First, it allows designers to enable or disable diagnosis of selected BIST cores at any time. Second, it allows designers to skip the failed BIST cores and focus on diagnosis of other BIST cores. Third, it allows designers to perform multiple error diagnosis. Finally, it allows test engineers to manage power consumption during production testing by selectively choosing BIST cores for serial/parallel testing.

E-2. Signature Diagnosis

The logic BIST (LBIST) architecture includes special DFD circuitry to help locate BIST failures down to all failed signature cycles. When BIST does not pass a manufacturing test or a system test, designers can utilize the DFD circuitry to enter the signature diagnosis mode by loading initial seeds into the PRPG and the MISR, as well as the required number of LBIST cycles to perform signature diagnosis.

When the logic BIST operation in the logic BIST controller completes the selected LBIST cycles, the controller will issue a cycle-end signal, halt the BIST operation, and begin to wait for a continue signal to resume its BIST operation and to reset the cycle-end signal. During this period, all LBIST output responses are captured repeatedly into a test-and-diagnosis (TDR) register to form an intermediate signature that will be shifted out for analysis. If the intermediate signature does not agree with the expected signature during analysis, designers can further instruct the TAP controller to inform the controller to issue the continue signal to resume the logic BIST operation. The BIST operations are repeated until designers have located the first failed BIST pattern (signature cycle) or all failed BIST patterns.

This cycle-based signature diagnosis approach indicates that the contents of the TDR register are only sampled after a cycle-end signal is received. If there are many BIST operations to be performed in parallel, the TAP controller must wait until all cycle-end signals have been generated. In addition to loading the initial seeds for the PRPG and the MISR, the TDR register shown in Fig. 5 stores additional LBIST data including the golden signature for comparison with the MISR, pattern counter, scan-chain mask, cycle-mask
start index, cycle-mask stop index, and programmable shift and capture modes. Additional seeds can also be pre-computed ahead of time. Those seeds are then loaded into the BIST core to run multiple, short, logic BIST test sessions to further help with diagnosis.

Fig. 5. Contents of the test-and-diagnosis (TDR) register. [Fields shown, with a TDO output: P/F (pass/fail indicator), X (scan-chain mask), Y (cycle-mask start index), Z (cycle-mask stop index), PC (pattern counter), PSM (programmable shift mode), PCM (programmable capture mode).]

There are a number of advantages in using this test-and-diagnosis (TDR) register for signature diagnosis. First, it can locate failed patterns. Second, it can further locate multiple capture errors in the scan chain. Finally, linear search or binary search can be employed to reduce the diagnosis time.

Since the TDR register contains a pattern counter in support of the cycle-based approach, the LBIST controller is capable of launching test sessions with varying pattern counts, i.e., a test session with its required test patterns derived from pre-computed logic/fault simulation, or a test session with a predetermined pattern count. For diagnosis purposes, a test session can be of a single pattern, thereby allowing designers to apply test patterns one-by-one to sort and identify failed patterns for the entire test set. Alternatively, a binary search can be used to speed up finding the failed patterns through interactive test sessions between the device-under-debug and the tester (or system). As the TDR register supports reseeding of the PRPG/MISR, designers can adjust the starting and ending points of a test session according to test and diagnosis needs. The identification of failed patterns must be done before proceeding to masked-chain or one-chain diagnosis to minimize the time-consuming diagnosis effort.

E-3. Masked-Chain Diagnosis

The DFD circuitry also includes the XYZ diagnosis structure proposed in [33] and the improved masking methods developed in [34] to locate the failed scan chain(s) and the associated failed scan cells in each failed scan chain. An LBIST TDR register for debug or diagnosis of LBIST failures is embedded in each core. The XYZ operation using the contents of the TDR register shown in Fig. 5 in support of XYZ diagnosis is illustrated in Fig. 6, where the X field indicates which scan chain(s) to mask; Y and Z fields indicate which ranges of scan cells in the scan chain(s) to mask.

For n scan chains, there will be n bits in the X field and n AND gates in the masked-chain diagnosis (MCD) logic feeding the MISR. A 0 set to the ith bit of the X field, X(i)=0, where 0<i<n, causes the MISR to receive a constant 0 for the ith chain. The method of feeding constant values to the MISR is called a masking mechanism. With this embedded scan-chain masking logic, designers can run test sessions in a linear or binary search fashion to locate faulty chain(s).

Fig. 6. Masked-chain diagnosis (MCD) logic diagram. [Diagram labels: PRPG, scan chains, AND gates forming the scan-chain masking logic, MISR.]

Once a faulty chain is identified, the next task is to locate the faulty scan flip-flops within the chain. The Y and Z fields specify the start and stop indices of the scan flip-flops. The cycle-mask start index and cycle-mask stop index will allow the logic BIST controller to skip the region (cycles) where failed scan flip-flops reside so BIST diagnosis can still be performed on the remaining fault-free scan cells. The masked-chain diagnosis feature is also helpful for unknown (X) masking in case the X-sources fall within a small fraction of the scan index or range. The programmable shift and capture modes are to shift the scan chains at a selected, reduced frequency to avoid overheating as well as to select the number of capture clock pulses for stuck-at or delay fault testing during capture, respectively.

E-4. One-Chain Diagnosis

The DFD circuitry further includes an additional test mechanism to dump all scan values from the reconfigured scan chains. Assume a failed pattern (signature cycle) has been located during signature diagnosis or XYZ diagnosis. The TAP controller can then load an instruction to the TDR register and instruct the logic BIST controller to (1) set the TDR register to one-chain diagnosis (OCD) mode, and (2) shift out the contents of all scan cells in the scan chains which are reconfigured as an OCD register for analysis. The reconfigured OCD register is treated as one of the several data register chains of the TAP controller after the diagnostic instruction register has been decoded.

As the number of failed patterns could be large, all failed patterns are fault graded such that the number of diagnostic patterns being applied in OCD mode can be reduced without affecting the resolution of finding the root cause from the failed combinational logic. Additional diagnostic patterns can then be shifted into the scan chains to pinpoint the failed combinational logic gates. This allows the BIST core to be debugged and diagnosed at the board or system level, enabling in-field diagnosis [35].
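
The binary search over scan chains mentioned under Masked-Chain Diagnosis can be sketched in Python as follows. This is a minimal model under stated assumptions, not the MCD logic of Fig. 6: chains are indexed from 0, exactly one chain is assumed faulty, the MISR feedback taps are arbitrary, and the expected (fault-free) chain outputs are assumed to be available from simulation so that a reference signature can be formed for every X-field setting. All function names are hypothetical.

```python
# Hypothetical sketch of scan-chain masking plus binary search.
# Assumes n >= 2 chains and exactly one faulty chain.

def misr_signature(words, width):
    """Compact per-cycle words with a simple MISR (assumed taps: MSB and bit 0)."""
    mask = (1 << width) - 1
    state = 0
    for word in words:
        feedback = ((state >> (width - 1)) ^ state) & 1
        state = (((state << 1) | feedback) & mask) ^ word
    return state


def masked_words(chain_bits, x_field):
    """AND each chain output with its X bit; X(i)=0 feeds a constant 0."""
    words = []
    for cycle in chain_bits:              # cycle: one bit per chain
        word = 0
        for i, bit in enumerate(cycle):
            word |= (bit & x_field[i]) << i
        words.append(word)
    return words


def session_fails(observed, expected, x_field):
    """Run one masked test session and compare signatures."""
    n = len(x_field)
    return (misr_signature(masked_words(observed, x_field), n)
            != misr_signature(masked_words(expected, x_field), n))


def locate_faulty_chain(observed, expected, n):
    """Binary search: keep only chains lo..mid unmasked in each session."""
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        x_field = [1 if lo <= i <= mid else 0 for i in range(n)]
        if session_fails(observed, expected, x_field):
            hi = mid          # the failing chain is among lo..mid
        else:
            lo = mid + 1      # the failing chain is among mid+1..hi
    return lo
```

For n chains this needs about log2(n) masked sessions instead of the n sessions of a linear search. With several faulty chains, or with signature aliasing, the search would have to be repeated or replaced by a linear scan.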
C. Capture Clock Generation

In order to generate an ordered sequence of double-capture clocks, one can use clock suppression, daisy-chain clock-triggering, or token-ring clock-enabling. Generally, the clock suppression technique as proposed in [38] is more suitable for testing synchronous clock domains, whereas the daisy-chain clock-triggering technique or the token-ring clock-enabling technique as proposed in [23]-[25] is more suitable for testing asynchronous clock domains. Other design flavors of on-chip clock controllers can also be found in [40] and [41].

The clock suppression scheme typically requires using a reference clock operating at the highest frequency. Figure 12 shows an example launch aligned double-capture for two interacting clocks. Figure 13 shows a clock suppression circuit for generating the launch aligned double-capture waveform given in Fig. 12. This circuit uses a reference clock (CK1) to program the capture window. The contents of the 8-bit shift register are preset to {0011, 1111} during each shift window. Due to its programmability, the approach can also be used to generate timing waveforms for testing asynchronous designs. One major requirement is that, depending on needs, the delay measured by the number of reference clock pulses be equal to or longer than delay d between C2 and C3 as shown in Fig. 11a. A novel clock gating circuit for generating staggered single-capture clocks for inter-clock-domain at-speed testing of synchronous clock domains can also be found in [42].

Fig. 12. An example launch aligned double-capture clocking. [Timing diagram: TCK1 with capture pulses C1 and C2, TCK2 with capture pulses C3 and C4, and GSE, with a capture window between two shift windows.]

Fig. 13. A clock suppression circuit for generating the waveform given in Fig. 12. [Diagram labels: BIST mode, GSE generator, GSE, CK1, CK2, shift-register contents 0011 and 1111 with ‘0’ shifted in, TCK1, TCK2.]

The daisy-chain clock-triggering technique means that the completion of the shift-in operation triggers the GSE signal to become 0, switching operation mode from shift to capture. This in turn triggers the generation of two at-speed clock pulses for the first clock domain; the rising edge of the second capture clock pulse triggers the generation of two at-speed clock pulses for the second clock domain; and so on. Finally, the rising edge of the second capture clock pulse for the last clock domain triggers the GSE signal to become 1, switching operation mode from capture to shift. A timing waveform is shown in Fig. 14 where the delay d is properly adjusted depending on whether inter-clock-domain structural or delay faults are to be detected or not.

Fig. 14. Daisy-chain clock-triggering. [Timing diagram: GSE, TCK1, and TCK2, with delay d marked.]

The token-ring clock-enabling technique is very similar to the daisy-chain clock-triggering technique. The only difference between them is that the daisy-chain clock-triggering technique uses a clock edge to trigger the next event, while the token-ring clock-enabling technique uses a signal level to enable the next event. Figure 15 shows a daisy-chain clock-triggering circuit for generating the staggered double-capture waveform given in Fig. 14. When the BIST mode is activated, the SE1/SE2 generators and 2-pulse controllers will generate the required scan enable and double-capture clock pulses per the arrows shown in Fig. 14. Each SE1/SE2 can be treated as a GSE signal for CD1/CD2.

Fig. 15. A daisy-chain clock-triggering circuit for generating the waveform given in Fig. 14. [Diagram labels: BIST mode, SE1 generator, SE2 generator, two 2-pulse controllers, CK1, CK2, SE1, SE2, TCK1, TCK2.]

V. PHYSICAL IMPLEMENTATION ISSUES

A major difference between ATE-based scan test and logic BIST is that the latter requires more complex self-test circuitry to be implemented together with the functional circuitry. Successfully conducting the physical implementation of the functional circuitry for a high-speed and high-performance design is in itself a big challenge. If the self-test circuitry adds a large number of critical signals and requires strict clock-skew management, the physical implementation of logic BIST can become prohibitively difficult.

The proposed logic BIST scheme employs several techniques to ease physical implementation. Most significantly, a slow-speed global scan enable signal GSE is used to greatly reduce the clock-skew management complexity and a PRPG-MISR pair is used for each clock domain to avoid layout routing congestion. One more technique using re-timing logic to control clock skew among each group of PRPG, scan chain, and MISR is illustrated in Fig. 16.
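
Returning to the daisy-chain clock-triggering technique described under Capture Clock Generation, the order of events in one capture window can be sketched in Python as follows. This is only an event-ordering model under stated assumptions: times are in picoseconds, a single hypothetical `delay_ps` stands in for the delay d at every hand-off (including the first and last GSE transitions), and the example periods are illustrative rather than taken from any design in the paper.

```python
# Hypothetical sketch: event ordering for daisy-chain clock-triggering.
# Time units (ps), the uniform delay, and all names are assumptions.

def daisy_chain_events(periods_ps, delay_ps):
    """Return (time_ps, signal, event) tuples for one capture window.

    periods_ps: capture-clock period of each domain, in chain order.
    delay_ps:   assumed trigger-to-action delay applied at every hand-off.
    """
    events = [(0, "GSE", "fall")]            # shift-in done: shift -> capture
    t = 0
    for k, period in enumerate(periods_ps, start=1):
        launch = t + delay_ps                # triggered by the previous event
        capture = launch + period            # second pulse, one period later
        events.append((launch, f"TCK{k}", "launch"))
        events.append((capture, f"TCK{k}", "capture"))
        t = capture                          # this edge triggers the next domain
    events.append((t + delay_ps, "GSE", "rise"))   # capture -> shift
    return events


if __name__ == "__main__":
    for event in daisy_chain_events([4000, 2000], 1000):
        print(event)
```

Each domain's launch and capture pulses are one domain period apart, which keeps the intra-clock-domain test at-speed, while the gap between one domain's capture pulse and the next domain's launch pulse is set only by the assumed delay.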
selected and inserted in the BIST-ready core, or top-up ATPG patterns (including compressed ATPG patterns when available) can be used during manufacturing test, to increase the circuit’s fault coverage. The tool allows adding extra test points in advance at the RTL design with the hope to achieve the target fault coverage goal. Otherwise, test points can be inserted at the gate level and the fault simulation process is repeated until the final fault coverage goal is reached. An example fault simulation and test point insertion flow is illustrated in Fig. 18.

Fig. 18. Fault simulation and test point insertion flow. [Flowchart: Test Point Selection at RTL Design, then Logic/Scan Synthesis, then Fault Simulation, then "Coverage Acceptable?"; No leads to Gate-Level Test Point Insertion and back to Fault Simulation; Yes leads to Done.]

D. Industrial Design Examples

Based on the TurboBIST-Logic tool capabilities, we have implemented the staggered double-capture architecture in two commercially popular CPU cores [23] and added observation points to improve the fault coverage of each core. The circuit statistics and experimental results for the two IP cores are shown in Table IV.

TABLE IV
EXPERIMENTAL RESULTS FOR IP CORES X AND Y

                            Core X          Core Y
# of Primitives             218,100         633,400
# of Flip-Flops             10,300          33,200
# of Clock Domains          2               8
Operating Frequency         250MHz          330MHz
# of PRPG/MISR Pairs        2               8
PRPG Length Range           19              19
MISR Length Range           1: 19 / 1: 99   7: 19 / 1: 80
# of Observation Points     1,000           1,000
# of BIST Patterns          20,000          20,000
BIST Fault Coverage
  – Transition Faults       93.82%          93.22%
BIST Area Overhead          4.40%           3.20%

During the implementation, we also chose to: (1) construct one PRPG-MISR pair for each clock domain because there was clock-domain-crossing logic between any two clock domains, (2) insert scan cells into all PIs and POs so as to increase intra-clock-domain transition fault coverage, and (3) skip inserting space compactors between scan chain outputs and the MISRs in order to avoid setup-time violations. This is why there were two long MISRs, one with 99 bits in Core X and the other with 80 bits in Core Y. Such a MISR is generally related to the main and large clock domain that contains a larger number of scan chains. The results indicate that it is possible to achieve more than 93% BIST intra-clock-domain transition fault coverage (in all synchronous and asynchronous clock domains) when 1000 observation points were inserted into the design. Because each core has already implemented scan chains, we can only obtain the BIST area overhead that includes the PRPG/MISR pairs, the BIST controller, and additional circuits to insert the space expanders and observation points, block all unknown (X) signals potentially propagated to the MISRs, and mask off all multi-cycle paths and false paths.

In addition, we implemented logic BIST on two large industrial designs, ranging from 15 to 20 million primitives. Table V shows the experimental results. Again, we implemented the staggered double-capture architecture in all clock domains of each design to reduce physical implementation efforts. We calculated both BIST transition fault coverage in all synchronous and asynchronous clock domains using the intra-clock-domain transition fault model and the BIST stuck-at fault coverage using the staggered double-capture patterns in each design. Knowing these BIST coverage numbers allows for top-up (transition and stuck-at) ATPG patterns to be added at a later stage to increase the circuit’s fault coverage during manufacturing test. Both designs have been successfully taped out and worked the first time on manufactured chips.

TABLE V
EXPERIMENTAL RESULTS FOR INDUSTRIAL DESIGNS

                            CKT1            CKT2
# of Primitives             15M             20M
# of Flip-Flops             673K            1.61M
# of Clock Domains          6               9
Operating Frequency         266MHz          533MHz
# of PRPG/MISR Pairs        6               9
PRPG Length Range           19-27           19-24
MISR Length Range           19-253          21-220
# of BIST Patterns          64,000          64,000
BIST Fault Coverage
  – Stuck-At Faults         87.79%          89.38%
  – Transition Faults       86.63%          82.32%
BIST Area Overhead          2.39%           2.07%

The results show that each design can run at its intended operating frequency. For a design containing millions of gates, the BIST overhead becomes a fraction of the design. Once again, because each circuit has already implemented scan chains, the BIST area overhead is computed in the same way as for the two CPU cores. However, obtaining better BIST fault coverage, for example, 95% or higher, is a challenge without adding additional fault coverage improvement features, such as test point insertion or other techniques discussed in [5]-[6] and [43]-[45]. This is typical for BIST designs when only pseudorandom patterns and launch-on-capture schemes are used.

Table VI shows the circuit statistics and the combined BIST and ATPG experimental results for another industrial design that has been in production [42]. The results demonstrated how the BIST design is coupled with top-up ATPG to detect 96.9% intra-clock-domain transition faults
in all synchronous and asynchronous clock domains. In this experiment, 64K random patterns using logic BIST and 8,183 deterministic patterns using top-up ATPG were applied to the design.

To detect inter-clock-domain transition faults in all synchronous clock domains, additional top-up ATPG patterns were also generated and applied using the inter-clock at-speed control scheme proposed in [42]. The results are shown in Table VII. In this case, 6 inter-clock logic blocks, A through F, were targeted. For example, A is a logic block from a 100MHz clock to a 300MHz clock, and contains 36,858 transition faults. Table VII shows the generated ATPG patterns, fault coverage, and CPU time. Furthermore, the area overhead in this application was mainly due to 6 inter-clock enable generators, each containing 20 flip-flops and consuming an area of roughly 124 equivalent 2-input NAND gates.

TABLE VI
INTRA-DOMAIN FAULT DETECTION FOR BIST DESIGN CKT3

                               CKT3
# of Primitives                11.1M
# of Flip-Flops                404.9K
# of Clock Domains             11
Min. Frequency                 66MHz
Max. Frequency                 533MHz
# of Scan Chains               32
Maximum Chain Length           13,598
# of BIST Patterns             64,000
# of Top-Up ATPG Patterns      8,183
Intra-Domain Fault Coverage
– Transition Faults            96.9%

TABLE VII
INTER-DOMAIN FAULT DETECTION FOR BIST DESIGN CKT3

Inter-Clock    From     To       # of      ATPG         ATPG Fault   ATPG CPU
Logic Block    (MHz)    (MHz)    Faults    # of Vec.    Cov. (%)     (h:m)
A              100      300      36858     232          86.4         4:30
B              133      533      8350      32           100          0:15
C              133      266      4942      36           100          0:14
D              533      133      1940      9            100          0:10
E              266      533      732       9            100          0:10
F              266      133      64        3            100          0:09

In the case of detecting inter-clock-domain delay faults in asynchronous clock domains, unfortunately, we have not been able to conduct true experiments because no industrial chip taped out to date has been designed for this purpose. However, as discussed earlier, our proposed staggered single or double capture scheme can deal with such cases by carefully adjusting the d value given in Figs. 10a or 10d. Consequently, delay fault testing of inter-clock-domain faults in asynchronous clock domains is exactly the same as that in synchronous clock domains. Table I has demonstrated via an industrial design how various clocking schemes affect the fault coverage of inter-clock-domain transition faults between two synchronous (or two asynchronous) clock domains.

VII. CONCLUSIONS

Delay fault testing based on launch-on-capture is commonly practiced in industry due to the ease of using a slow-speed global scan enable signal. When a BIST design contains a mix of synchronous and asynchronous clock domains, the conventional one-hot or simultaneous double-capture scheme cannot detect any inter-clock-domain delay faults. This makes it even more difficult for logic BIST to achieve high BIST fault coverage when pseudorandom patterns are mainly used for testing.

This paper presented a new at-speed logic BIST architecture using double-capture (a.k.a. launch-on-capture) for testing BIST designs containing multi-frequency synchronous and asynchronous clock domains. To facilitate debug and diagnosis, the BIST architecture also includes BIST diagnosis logic to help locate BIST failures.

It was shown that the aligned double-capture scheme employed in the architecture is most suitable for testing synchronous clock domains to achieve true at-speed test quality, whereas the staggered double-capture scheme employed is most suitable for testing asynchronous clock domains. Physical implementation becomes easier due to the use of a slow-speed global scan enable signal and reduced timing-critical design requirements. If only structural faults are considered for detection and diagnosis, the BIST architecture built upon the staggered single-capture scheme can result in the highest fault coverage with the lowest BIST overhead. Application results for several industrial designs have demonstrated the effectiveness of the proposed architecture. These results further indicated that the proposed double-capture schemes can reach high BIST fault coverage. For designs containing both synchronous and asynchronous clock domains, a hybrid clocking scheme using aligned double-capture and staggered double-capture is proposed; however, challenges still lie ahead with regard to how to increase the BIST transition fault coverage of the design to a much more acceptable level, say 95% or above, and how to locate BIST failures more effectively during debug and diagnosis from the system level down to the chip level.

ACKNOWLEDGMENTS

The authors are grateful to the anonymous referees for pointing out unclear descriptions in the paper and giving constructive suggestions. The authors also would like to thank Dr. B. Cheon and E. Lee of Samsung Electronics in Korea, Tomotaka Odajima of Marubeni Information Systems in Japan, and many colleagues of SynTest Technologies in the US, Korea, China, and Taiwan for providing the experimental results listed in Tables IV to VII. This work was supported in part by the National Science Foundation of the USA under grant CCF-0541103.
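
As a supplementary cross-check on Table VII, the per-block results can be combined into a single fault-weighted figure. The sketch below is not part of the reported experiments. It assumes that the Fault Cov. column is a percentage of each block's listed transition faults, and the names TABLE_VII, aggregate_coverage, and the generator constants are hypothetical, introduced only for illustration. Because the published coverage values are rounded, the aggregate is approximate.

```python
# Rows copied from Table VII:
# block -> (from MHz, to MHz, # of faults, # of ATPG vectors, fault coverage in %)
TABLE_VII = {
    "A": (100, 300, 36858, 232, 86.4),
    "B": (133, 533, 8350, 32, 100.0),
    "C": (133, 266, 4942, 36, 100.0),
    "D": (533, 133, 1940, 9, 100.0),
    "E": (266, 533, 732, 9, 100.0),
    "F": (266, 133, 64, 3, 100.0),
}


def aggregate_coverage(rows):
    """Fault-weighted coverage (in %) over all inter-clock logic blocks.

    Assumption: each block's coverage is relative to its own fault count.
    """
    total = sum(row[2] for row in rows.values())
    detected = sum(row[2] * row[4] / 100.0 for row in rows.values())
    return 100.0 * detected / total


total_faults = sum(row[2] for row in TABLE_VII.values())
total_vectors = sum(row[3] for row in TABLE_VII.values())
aggregate = aggregate_coverage(TABLE_VII)

# Area figures quoted in the text: 6 inter-clock enable generators,
# each with 20 flip-flops and roughly 124 equivalent 2-input NAND gates.
GENERATORS = 6
FFS_PER_GENERATOR = 20
NAND_PER_GENERATOR = 124
total_ffs = GENERATORS * FFS_PER_GENERATOR
total_area = GENERATORS * NAND_PER_GENERATOR

print(f"targeted faults: {total_faults}")
print(f"top-up vectors: {total_vectors}")
print(f"weighted inter-domain coverage: {aggregate:.2f}%")
print(f"enable generators: {total_ffs} flip-flops, ~{total_area} NAND2 equivalents")
```

Under these assumptions the six blocks total 52,886 targeted faults and 321 top-up vectors, and the weighted coverage comes out near 90.5%. Block A holds roughly 70% of the targeted faults, so its 86.4% coverage dominates the aggregate. The six enable generators together amount to 120 flip-flops and roughly 744 equivalent 2-input NAND gates.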