
Time-Multiplexed Compressed Test of SOC Designs


Adam B. Kinsman, Student Member, IEEE, and Nicola Nicolici, Member, IEEE

Manuscript received September 30, 2008; revised February 04, 2009; accepted March 23, 2009. First published August 21, 2009; current version published July 23, 2010. A. B. Kinsman and N. Nicolici are with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada (e-mail: kinsmaab@mcmaster.ca; nicola@ece.mcmaster.ca). Digital Object Identifier 10.1109/TVLSI.2009.2021602
Abstract—In this paper we observe that the necessary amount of compressed test data transferred from the tester to the embedded cores in a system-on-a-chip (SOC) varies significantly during the testing process. This motivates a novel approach to compressed SOC testing based on time-multiplexing the tester channels. It is shown how the introduction of a few control channels enables the sharing of the data channels on which compressed seeds are passed to every embedded core. Through the use of modular and scalable hardware for on-chip test control and test data decompression, we define a new algorithmic framework for test data compression that is applicable to SOC devices comprising intellectual property (IP)-protected blocks.

I. INTRODUCTION

IMPERFECTIONS in the manufacturing process of very large scale integrated (VLSI) circuits give rise to manufacturing defects that result in improper circuit operation, and which must be screened by the application of test patterns. Scan-based testing reduces the complexity of sequential automatic test pattern generation (ATPG) to combinational ATPG by providing full controllability/observability of the internal state elements. While providing clear benefits in terms of test preparation and fault coverage through structural test of complex system-on-a-chip (SOC) designs, the huge volume of test data (VTD) renders scan-based testing ineffective due to limitations of the automatic test equipment (ATE). Although the ATE memory buffers can be reloaded, this is achieved at the expense of increased time the SOC spends on the ATE, thus raising the cost of test. To overcome the VTD problem, test data compression is employed; the relevant prior approaches are discussed next.

A. Related Work

Two main directions exist in test data compression, corresponding to the two main ways in which a stream-mode-only tester may achieve compression, i.e., by clocking the scan chains faster than the ATE, or by driving many scan chains from few tester channels. The key feature of temporal decompression is that it exploits the ratio between the scan frequency and the ATE frequency, while for spatial decompression the ATE frequency and scan frequency are kept the same, and compression arises from the ratio between the number of internal scan chains and the number of tester channels. Work done in both directions is outlined in this section. Before discussing temporal and spatial methods in depth, a distinction is made between test set dependent and test set independent methods. Those which base the structure of the decompressor

in some way on the test set are known as test set dependent, such as those which follow. In [1], the test set is analyzed to determine a set of scan cells to invert, which will place more vectors in the output space of the decompressor, thereby improving the compression. A scan tree is constructed in [2], where the output of a scan cell branches to multiple chains and the structure of the tree (i.e., where to branch) is determined by dependencies in the test set. A finite state machine (FSM) is used in [3] to drive a combinational expander which is derived from the test set. Finally, [4] crafts a decompressor based on the test patterns at hand, as is also the case in [5]. Although test set dependent methods can give good compression for a particular test set, once the hardware has been physically implemented, changes to the test set (e.g., used for diagnosis or to detect an un-modeled fault) may come at significant compression cost or, in the worst case scenario, some patterns cannot be expanded through the decompressor. For this reason, it is better to form a generic decompressor which gives a more even treatment of all vectors with certain general characteristics (i.e., average care bit density). Such methods are called test set independent and form the focus of the remainder of our discussion. In the temporal domain, test data compression is often achieved through the use of statistical coding methods, whereby a codeword sent by the tester in a small number of clock cycles is expanded into a long symbol in the scan chains over a large number of clock cycles. To avoid accumulation of test data in the decompressor, temporal methods therefore require that the ATE and the scan chains of the circuit under test (CUT) be clocked at different frequencies. Huffman coding is employed by [6], while [7] uses Golomb codes to achieve test data compression. Variable-length input Huffman compression (VIHC) is proposed in [8], of which Golomb coding is shown to be a special case, and system level considerations for VIHC are given in [9]. In [10], another class of codes, called frequency-directed run-length (FDR) codes, is discussed, which improves compression by addressing the low probability of long runs of zeros. FDR codes are also employed by [11], which focuses on the partitioning of test resources at the system level. In [12], run-length encoding is used, but longer run-length codes (which are shown to occur rarely) are stored in an on-chip dictionary to increase the efficiency of coding. In [13], the test set is divided into blocks and compressed using nine codewords, one of which contains an explicit uncompressed word (for the case where a block does not fit any of the other eight codewords). Most multilevel compression schemes rely on temporal methods as well; an example is [14], which first maps care bits onto seeds, as in general linear feedback shift register (LFSR) based compression, and then applies statistical coding to compress the LFSR seeds. One of the main strengths of statistical coding methods is the ability to compress highly specified test sets, making them suitable for intellectual property protected (IP-protected) cores, where ATPG and fault simulation may not be performed



by the system integrator, and thus test generation is the responsibility of the core provider. The system integrator must translate the core level tests of each core into an entire system level test. To reduce the amount of data that must be communicated between the core provider and the system integrator, the test sets may be highly compacted (having high care bit densities). Although well suited to high care bit densities, statistical methods suffer from synchronization and power issues due to the increased scan clock frequency, making them difficult to implement in practice. Finally, because statistical methods must assign care bits and then compress the entire test set, the spatial methods presented in the next paragraph may provide better compression, on the basis that they compress only the specified bits of the test set.

As mentioned, spatial compression methods, which exploit a few-to-many mapping of the tester channels, have been more widely adopted in practice for test data compression. Although essentially any logic circuitry may be used, linear machines (both sequential and combinational) based on XOR gates have been employed extensively due to the fact that they can be modeled by linear equations in GF(2), i.e., modulo-2 arithmetic, leading to tractable calculation of compressed streams through Gaussian elimination. In exchange for more easily calculable compressed streams, however, comes an increased probability of an unsatisfiable pattern (known as lockout) due to the linear dependencies. The combinational variety of linear decompressor is used by [15], which employs integrated ATPG/fault simulation to reduce the probability of lockout; this is also the case in [16], which achieves additional compression through extremely short scan chains. Despite the simplicity of the combinational approach, sequential decompressors seem to have prevailed due to the ability to share free variables from the tester across multiple clock cycles. As noted in [17], linear dependence governs the expansion of compressed bits into care bits in the test set: a system providing $n$ free variables for $m$ care bits has a probability of lockout on the order of $2^{-(n-m)}$. For combinational decompressors there is no sharing of compressed bits from one scan clock cycle to another, and the number of bits provided per cycle must accommodate the maximum number of care bits occurring in any scan clock cycle of the test set. For sequential decompressors, compression is improved because the number of bits sent per clock cycle need only match the average care bit density over a block, rather than the per-cycle maximum as in the combinational case. The compressed bits (seed variables) arriving at the sequential decompressor over multiple clock cycles may be handled in a variety of ways. In [18], they are buffered and transferred to the LFSR once, at the beginning of a decompressed block of test data. Again in this method, ATPG/fault simulation are integrated to cover as many faults as possible with each pattern and to merge multiple patterns onto a single seed. An alternative is to allow the variables to affect the LFSR state during each clock cycle, the so-called injection based methods, as covered in [19]. Both [20] and [21] also work on this principle of injecting free variables, but use ring generators instead of LFSRs. The injection approach is also used in [22], which connects a few scan chains to an LFSR, the injection input to the LFSR being seen as a virtual scan chain. Injection methods allow accumulation of variables over long periods of time without flushing them from the system, as happens when buffering is used. If extended bursts of care bits exist in the test set, however, enough free variables may not have accumulated to avoid lockout.

A methodology has been proposed to overcome this, where shifting of the scan chains is stalled while more free variables are provided by the tester. This method is used by [23] to achieve good compression at relatively low area overhead. A similar approach is also used by [24], enabling the use of a smaller LFSR; the area is increased, however, since it is used in concert with the buffering approach described earlier. In both cases, gating of the scan clock leads to excessive scan times when bursts of care bits exist. Moreover, in a core-based environment a scan clock gating line will be required for each core, which, besides imposing an additional constraint on the design flow, will also increase the volume of test data (unless these scan clock gating channels are time-multiplexed, as is done in this paper). The approach used by [25] employs a large LFSR at the system level to drive the scan chains of many cores, and follows the same intuition as [26] of aligning cores/patterns with high care bit density alongside those with low care bit density, creating a virtual system with, on average, lower care bit density per pattern. Although, as mentioned, linear state machines are the most widely used type of spatial decompressor, other methods do exist. Examples are [27], which uses a decoder to flip a chosen bit in the frame which will be applied to the scan chains, and [28], which uses cores already present in the SOC for decompression. For ease of implementation, linear spatial methods have prevailed. Additionally, better compression can often be obtained for lower care bit densities, since only compression of the specified bits is required. When care bit density starts to increase, however, these methods fail due to pattern lockout. So a dilemma arises when IP-protected cores are used in an SOC and the compression task is placed on the system integrator. On one hand, temporal methods provide a mechanism by which higher care bit density patterns may be applied, while their practical implementation is convoluted. On the other hand, spatial methods are easy to implement but are unable to compress patterns with high care bit density, which are likely to come from core providers attempting to reduce the amount of data communicated by them. Furthermore, because spatial methods rely on low care bit densities, they are heavily dependent on integrated ATPG/fault simulation, which is prohibited for IP-protected cores, to achieve added compression through pattern reduction.

B. Motivation and Objective

In light of the previous discussion, for IP-protected cores one would like to maintain the advantages of both temporal and spatial methods while removing their individual limitations: provide the ability to compress test sets with high care bit density, as in the case of temporal methods, and enable a seamless integration between the ATE and SOC, inherent to spatial methods. Although current industrial practices rely on a flattened design approach to test generation/fault simulation and test data compression, because the purpose of core-based SOC design is to reduce the cost of the implementation cycle through reuse, the following five main criteria for testing IP-protected cores in SOCs can be devised:

IP-consistency: We desire a method that does not require access to the internal details of the core to provide compression; thus, ATPG/fault simulation should not be relied upon;



Modularity: In order to facilitate reuse, the method should be easily pluggable into any system; furthermore, the hardware generation should be easily automated;
Scalability: To remain relevant in the long term, the hardware architectures employed and the algorithms used should scale reasonably with circuit size;
Programmability: Any test pattern should pass through the decompression engine, and changes to the test set should not influence its compressibility so long as the general features of the test set (i.e., care bit density) remain intact;
Low Cost Test Platform: No special requirements should be made of the ATE; we consider a very low cost basic tester which operates only in the stream mode, without any advanced features such as sequencing-per-pin or repeat/zero fill options, which add to the cost of managing the sequencer memory.

The objective of our work is to propose a new method for input test data compression (we consider output compression to be a separate problem) that satisfies the above five criteria for SOCs with full-scan IP-protected cores. The key observation introduced in Section II points out that embedded cores exhibit a nonuniform demand for compressed data. This opens new opportunities for sharing the tester channels between different cores by time-multiplexing them. New algorithms for test data compression and architectures for decompression at both core and system levels are detailed in Section III. Experimental results presented in Section IV demonstrate that the proposed solution leads to reduced VTD and area compared to those approaches which provide the same features. Finally, Section V concludes the paper.

II. RELATING CARE BIT CLUSTERING TO RESEED DENSITY

To reduce VTD and testing time by simultaneous testing of multiple cores sharing a common set of tester channels, the test sets of the cores must satisfy the condition that 100% of the tester bandwidth is not required during 100% of the testing time.

Fig. 1. Non-uniform distributions enabling improved compression.

Fig. 1 shows empirically how this occurs (the test set used here is for s38584 from the ISCAS89 benchmark set [29], and Comp-(number) indicates care bits per pattern bounded by (number), a topic which will be revisited in Section IV). In Fig. 1(a), a test set has been broken into blocks of four consecutive scan clock cycles which are binned and plotted

according to care bit density. It is immediately clear that care bit density is low for most of the blocks and high for only a few. Although a block length of 4 consecutive scan cycles has been chosen, the concept remains valid if a different block length is selected. Due to the direct correlation between care bits and the seed bits required from the tester, as discussed in [17], we expect a correlation between the care bit density and the required density of reseeds, as observed in Fig. 1(b). For this figure, each pattern from the test set was broken into blocks, as many as possible of which were satisfied from a single seed, so some patterns were reseeded frequently (those with high care bit densities) and some reseeded infrequently (those with low care bit densities). The patterns were then binned according to the number of reseeds per pattern. As expected, nearly all of the patterns have very few reseeds, while only a few require frequent reseeding. Because the tester operates in the stream mode (as per the low cost tester requirement), during the time in which reseeding is not required the test data is wasted, i.e., the tester channels are under-utilized. It is this fact that we exploit: during the time in which one core does not require the seed channels for reseeding, we use the same tester channels to reseed another core and thereby improve the tester channel utilization, which leads to reduced VTD.

III. TIME MULTIPLEXING FOR TEST DATA REDUCTION

Based on the requirements given in Section I and the observations made in Section II, in this section we introduce a new test data decompression framework for SOCs with IP-protected blocks, including the computer-aided design (CAD) support. The intuition of the method is described first, specifically the distinguishing features of the algorithms used for compression at the core and system levels. This is followed by the basic decompression architecture and its variations that enable compression of test cubes with higher care bit densities.

A. Basic Core and System Level Compression Algorithms

At the most basic level, the intuition behind our approach is to share a single set of tester channels and allow one core to utilize them while the others are under-utilizing them. The principle is shown in Fig. 2, which illustrates the fundamental ways in which multiple cores in an SOC can be tested. In part (a) all the cores are given independent tester channels, while in part (b) the number of tester channels required is much smaller, as one set of channels is allocated and used by the cores in sequence,


Fig. 2. Choices for SOC test application (each square = 1 core).

i.e., the cores are tested in series. The large volume of test data from choices (a) and (b) can be addressed through part (c) of the figure, where a small number of tester channels is allocated and the cores are tested simultaneously. This is possible only if a single core needs the channels at a time. However, this requires some control information to be passed from the ATE to the SOC that indicates when each core will sample the seed placed on the tester channels. We discuss next how provision of such data streams at the core level and exploitation of them at the system level is accomplished.
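The control overhead this implies is small. As a rough illustration (our own sketch, not code from the paper), a binary control word broadcast once per block can address any of N cores plus a no-operation code, so ceil(log2(N+1)) control channels suffice; the schedule values and the 0-means-NOOP convention below are assumptions made for illustration:

```python
import math

def control_channels(num_cores: int) -> int:
    # One extra code ("no core samples this block") must be encodable,
    # so the control word distinguishes num_cores + 1 cases.
    return math.ceil(math.log2(num_cores + 1))

# A hypothetical block-level schedule for 5 cores sharing one seed bus;
# 0 marks a NOOP block where no core picks up the broadcast seed.
schedule = [1, 0, 3, 3, 0, 2, 5, 0, 4, 1]

width = control_channels(5)        # 3 control channels for 5 cores
for block, core in enumerate(schedule):
    word = format(core, f"0{width}b")
    note = "(NOOP)" if core == 0 else f"-> core {core} reseeds"
    print(f"block {block}: control={word} {note}")
```

This matches the later observation (Section III-A-2) that only the control lines need to grow, logarithmically, with the number of cores.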
Algorithm 1: CompressPattern

1) Core-Level Compression by Generating Incremental Solutions to Linear Equations: The central distinguishing feature of our approach is to augment the slack in the demand for the stream of seed bits of core decompressors, thus enabling the packing algorithm (described later in Section III-A-2) to merge

the compressed streams of all the cores at the system level (as intuitively indicated in Fig. 2(c)). The algorithm which accomplishes core level compression is given as Algorithm 1, the operation of which is illustrated in Fig. 3. The hardware configuration is shown down the left side of the figure, along with the equations [17] which model it. Noting that scan time for the given pattern increases moving to the left, the rightmost frame is the first one provided. The pattern is divided into blocks of length 4 (upper left portion of the figure) as per line 1 of the algorithm. The column of equations down the center of the figure represents the case in which each block is reseeded individually; that is, for each block the matrices $M^t$ used to generate the corresponding equations range from $M^0$ (for the rightmost/oldest frame of that block) to $M^3$ (for the leftmost/most recent frame of that block), where $M$ is the LFSR transition matrix. Operation of the algorithm is observed in the column of equations shown down the right side of the figure. Because we seek to link multiple blocks together in a sequence generated from a single seed, blocks must be checked for consistency with one another. When the first consistency check is performed, only the equations of the first block are present. Since the equations for this block are consistent, as shown by the uppermost check mark in the figure, we try adding another block to the sequence (line 4 of Algorithm 1). In order to add the equations for the second block, we must first align them to the same reseed point as the first block. This is accomplished through multiplication with the LFSR transition matrix $M$ raised to the fourth power. To become convinced of this fact, consider that the equations for a block are derived from the care bits in the block as though the reseed was at the beginning of the block, and we have already established that $M^t$ gives the LFSR output at $t$ cycles from reseed. Since the second block expanded from the same seed represents cycles 4, 5, 6 and 7, we obtain its equations as $M^4 \cdot M^t$ for the same $M^t$ ($t = 0, \ldots, 3$) as were used to obtain the original equations. Upon shifting and adding the equations for the second block, the system is no longer consistent, as shown by the X in the figure. The Reseed points signal is asserted to indicate the LFSR requires reseeding. The equations of block two then form the starting point for a new sequence. After the second sequence starts, as the check marks indicate, we are able to string the second, third and fourth blocks together. Since adding the equations of the fifth block would result in an inconsistency, we solve the system for blocks two, three and four, and write the solution to the block two position of the seed stream. This process continues until either an inconsistency arises within a single block (line 15 is executed and an empty Seed stream is returned), in which case the hardware must be reconfigured (see Section III-D), or until all the blocks have been covered. Performing compression at the core level, which we call StreamBIST, on each core in the system yields a collection of core level decompressors and accompanying compressed seed streams, as shown in Fig. 4; a simplified sketch of this flow follows below. The process of merging all this information into a system level test architecture and data stream (named StreamPack) is discussed next, together with the unique advantages of the proposed methodology.¹
¹The reader is referred to [30] for specific implementation details of StreamBIST/StreamPack.
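To make the procedure concrete, the following self-contained sketch mimics Algorithm 1 on a toy decompressor: a hypothetical 4-bit LFSR with $x^4 + x + 1$ feedback whose cells drive the scan chains directly (no phase shifter), with the equations tracked symbolically over GF(2). It is a simplification under stated assumptions, not the paper's implementation:

```python
K = 4   # LFSR length = number of seed variables; 4 scan chains, assumed
B = 4   # block length in scan clock cycles, as in Fig. 3

def lfsr_step(rows):
    # Symbolic LFSR step: rows[i] is a bitmask over seed variables giving
    # the GF(2) combination currently held in cell i (taps assumed).
    return [rows[3] ^ rows[0], rows[0], rows[1], rows[2]]

def add_equation(basis, lhs, rhs):
    # Incremental GF(2) elimination; False means the new equation
    # contradicts those already accumulated (the lockout event).
    while lhs:
        p = lhs.bit_length() - 1
        if p not in basis:
            basis[p] = (lhs, rhs)
            return True
        bl, br = basis[p]
        lhs ^= bl
        rhs ^= br
    return rhs == 0

def block_equations(block, offset):
    # Equations for one block's care bits, 'offset' cycles after the
    # reseed point; advancing the rows by M^offset is the alignment step.
    rows = [1 << i for i in range(K)]
    for _ in range(offset):
        rows = lfsr_step(rows)
    eqs = []
    for cyc in range(B):
        eqs += [(rows[chain], val) for c, chain, val in block if c == cyc]
        rows = lfsr_step(rows)
    return eqs

def compress_pattern(blocks):
    # Greedy pass of Algorithm 1: keep appending blocks to the current
    # seed while the combined system stays consistent; reseed otherwise.
    flags, basis, offset = [], {}, 0
    for i, blk in enumerate(blocks):
        if i > 0:
            trial = dict(basis)
            if all(add_equation(trial, l, r)
                   for l, r in block_equations(blk, offset)):
                basis, offset = trial, offset + B
                flags.append(False)        # expand from the previous seed
                continue
        basis = {}
        if not all(add_equation(basis, l, r)
                   for l, r in block_equations(blk, 0)):
            raise ValueError("lockout within one block: reconfigure (III-D)")
        offset = B
        flags.append(True)                 # a reseed point, as in Fig. 3
    return flags

# Care bits as (cycle-in-block, scan chain, value); the mostly-empty
# blocks mirror the low care bit densities of Fig. 1.
pattern = [[(0, 0, 1), (2, 3, 0)], [(1, 1, 1)], [], [(0, 2, 1), (3, 0, 0)]]
print(compress_pattern(pattern))   # reseed flags, one per block
```

The dictionary-based add_equation performs the incremental Gaussian elimination that makes the per-block consistency check cheap; a False return is exactly the lockout event that forces a reseed.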


Fig. 3. Calculation of seeds for StreamBIST.

Fig. 4. Compression ow for StreamPack.

2) System-Level Compression by Packing Streams of Seeds: By aligning periods of activity on the seed lines for one core with periods of inactivity for the other cores, seeds for different cores may be interlaced on a single, shared set of tester channels. The improved tester channel utilization is leveraged to reduce test time and, by virtue of the fact that we use all tester channels in the stream mode, VTD is reduced. The StreamPack algorithm (Algorithm 2) re-orders patterns to enable seed line sharing (time-multiplexing). Some system level detail is provided which indicates how the cores are connected to the test

infrastructure. This simply permits a few cores to be grouped serially as in Fig. 2(b), so that instead of all the cores in the SOC being tested simultaneously [as in Fig. 2(c)], the respective groups of cores are tested concurrently, while only one core within each group is active at a time. This can be useful for addressing constraints on which cores may be active at the same time, but the simplest and most frequently used configuration is one core per group (all cores tested concurrently). This feature also enables smooth integration of multimode StreamPacking, as will be discussed in Section III-D. The basic principle of merging seed streams is shown in Fig. 5, which is used to illustrate the main points of Algorithm 2. The lists of seed streams are sorted in descending order of the number of reseeds (line 4). The intuition behind this step is that patterns with more reseeds will be more difficult to pack, and since the lists are traversed in order when trying to pack, we aim to pack the more difficult patterns early, to leave more flexibility later in the packing process. Then, by shifting the patterns with respect to one another, an arrangement can be found where there are no seed conflicts, i.e., only one core requires the seed lines at a time. We always try to place a pattern in the shortest group which has not yet had all its patterns placed (line 7). The purpose of this is to ensure even treatment among the groups. If the patterns for one group were fully


assigned before the others, there would be too much rigidity to effectively assign patterns to the remaining groups. All the unplaced patterns in the current group are tried in the current placement slot, as per the loop on line 10 of Algorithm 2.
Algorithm 2: StreamPack

Each pattern is evaluated against the already placed patterns to see if there are any seed conflicts (multiple cores requiring reseeding during the same block). If a seed conflict arises, the next pattern is tried, and so on. If all the patterns have been tried and none fit, a NOOP is inserted, which is a single block on one group where no seed is provided. NOOPs are added until finally there are no seed conflicts. Once a pattern fits, it is transferred from its list to the corresponding output list (line 13), and this continues until all the input lists are empty (line 6). After pattern re-ordering has been completed, it is a relatively straightforward job to merge the seed streams and calculate the control streams; a condensed sketch of this packing loop is given below.
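In the sketch (our simplification, not the paper's code), each pattern is reduced to its per-block reseed flags, and the conflict rule is simply that at most one core may book the shared seed bus in any global block; the line references in the comments point to Algorithm 2 as described above:

```python
def conflicts(bus, start, flags):
    # Would this pattern, started at global block 'start', request the
    # seed bus in a block some other core has already booked?
    return any(f and (start + i) in bus for i, f in enumerate(flags))

def stream_pack(groups):
    # groups: one list of patterns per group (one core per group in the
    # simplest configuration); a pattern is a list of reseed flags.
    for g in groups:                     # hardest patterns first (line 4)
        g.sort(key=sum, reverse=True)
    remaining = [list(g) for g in groups]
    ends = [0] * len(groups)             # next free block per group
    schedule = [[] for _ in groups]
    bus = set()                          # blocks where the seed bus is taken
    while any(remaining):
        # Serve the shortest group that still has patterns (line 7).
        g = min((i for i, r in enumerate(remaining) if r),
                key=lambda i: ends[i])
        for p in remaining[g]:           # try each unplaced pattern (line 10)
            if not conflicts(bus, ends[g], p):
                bus.update(ends[g] + i for i, f in enumerate(p) if f)
                schedule[g].append((ends[g], p))
                ends[g] += len(p)
                remaining[g].remove(p)   # move to the output list (line 13)
                break
        else:
            ends[g] += 1                 # nothing fits: insert a NOOP block
    return schedule, bus

# Two single-core groups sharing one seed bus:
groups = [[[True, False, True, False], [True, False, False, False]],
          [[True, True, False, False]]]
schedule, bus = stream_pack(groups)
for g, placements in enumerate(schedule):
    print(f"group {g}: {placements}")
```

Running the example staggers the start blocks so that no two reseeds land on the same bus block, with NOOP gaps appearing only between patterns, as the text notes next.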

Note that since NOOPs are inserted only between patterns, the scan chains need never be stalled. Each pattern for each core is shifted into the scan chains for that core as a whole, and the gaps between the patterns on each core are adjusted to remove seed conflicts. The fact that our StreamBIST algorithm presented in Section III-A-1 does not rely on ATPG/fault simulation in its compression enables IP-consistency for a core-based methodology. Modularity is ensured by StreamPack's ability to accept any type of core that exhibits irregular demand in test data and fit it into the system level test infrastructure. A key advantage of our methodology lies in its scalability. For example, as illustrated in Fig. 5, adding more cores to the SOC will only marginally impact the total VTD (because only the control lines need to scale, logarithmically, with the number of cores) as long as 100% tester channel utilization is not reached for a given tester channel bandwidth, and pattern reordering, which is based on a polynomial-time greedy algorithm, avoids the presence of too many NOOPs. Besides the above, the area investment will also be low. The specific details of the core-level and system-level hardware are given in Sections III-B to III-D; however, the key area advantage of the proposed time-multiplexing is shown in Fig. 6. Since the seeds are time-multiplexed, the same seed is broadcast to each core and only one will actually load it. For this reason, in contrast to other approaches that use serial transfers of seeds from testers to linear decompressors [18] [Fig. 6(a)], a single seed buffer may be shared among all the cores, as in Fig. 6(b). Even in light of routing concerns, the removal of the functional necessity of this buffer provides an added degree of freedom to explore the area/performance tradeoff. The same area advantage is maintained over a more recent approach [31], which uses the similar concept of expanding a single seed from a buffer to multiple blocks, as originally proposed in [26]. Other approaches that expand test data over multiple time frames, such as [20], which avoids seed buffers by using injectors in linear decompressors, are different in the sense that they exploit ATEs comprising repeat and/or zero fill features. In contrast, our method works under the assumption of a low cost test platform that uses only a simple stream mode of operation. Therefore, the novelty of our method lies in increasing the utilization of the tester channels, and hence compression, by intelligently scheduling the reseed times using low-cost on-chip hardware. Sections III-B to III-D detail the hardware architectures and the pertaining algorithmic considerations necessary for maintaining the above-mentioned benefits while ensuring programmability (Section III-D), i.e., that any pattern can be passed through the decompressor.

B. Seed-Only Packing

Fig. 7 shows the hardware proposed for time-multiplexing of tester channels. We call this first architecture seed-only because only the seed channels are time-multiplexed. Note that the LFSR, the modulo counter, the seed and control registers, and the scan chains are all in the same clock domain during the test mode. Seeds are shifted into the Seed Register on the Seed Lines on a block by block (as opposed to a pattern by pattern) basis. For any given block, there is an option to pick up


Fig. 5. Packing of seed streams and generation of control.

Fig. 6. Transformation from core level seed buffering to system level.

a new seed (by asserting the Reseed Indicator) or to continue expanding the next block from the previous seed, thus allowing some slack to be introduced in the demand for tester channels by a single core. Our approach thus compresses variable sized blocks of test data onto fixed size seeds, for which the determination of entropy is left as an open problem by [32]. It is this variable-to-fixed encoding that enables time-multiplexing. The architecture is similar in concept to the Illinois Scan Architecture for uncompressed data [33], [34]; however, some control must be added to regulate which core is using the tester channels during any given block. As shown in the figure, this can be accomplished relatively simply: a Control register is used to buffer the control stream, and a periodic reseed signal is generated in every block; a toy model of this per-core sampling behavior is sketched below. It can be seen that the hardware used in this compression methodology is not exceedingly elaborate, especially at the system level. The simplicity of the hardware involved is misleading, however: it is only made possible by the sophisticated CAD support discussed earlier.
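In the model below, the core IDs, the 0-means-NOOP code and the seed labels are all assumptions made for illustration; every core observes the same broadcast streams, but only the addressed core latches the seed at the block boundary:

```python
def core_view(core_id, control_words, seed_words):
    # What one core's decompressor does per block under the seed-only
    # scheme of Fig. 7 (toy model, not the paper's hardware).
    actions = []
    for ctrl, seed in zip(control_words, seed_words):
        if ctrl == core_id:
            actions.append(f"reseed with {seed}")   # periodic reseed point
        else:
            actions.append("expand previous seed")  # bus belongs elsewhere
    return actions

control = [1, 2, 2, 0, 1]                  # 0 = NOOP block, nobody samples
seeds   = ["s0", "s1", "s2", "--", "s3"]   # words on the shared seed lines
for cid in (1, 2):
    print(f"core {cid}:", core_view(cid, control, seeds))
```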

C. Seed-Mux Packing

In order to cope with the linear dependencies inherent to LFSRs, one potential solution is to introduce some reconfigurability into the output sequence of the decompressor to enable compression of an extended set of patterns. That is precisely the approach used in the seed and mux packing architecture described in this section. The changes made to the architecture are very simple. A barrel shifter-like structure is added on each side of the phase shifter network in the core-level decompressor. The configuration is selected by an additional set of data lines, the mux lines, which can be packed in essentially the same way as the seed lines. As a result, system level modifications consist mainly of the provision of additional hardware, like that for seed sharing, to control the distribution of the mux lines. The major change to the algorithm involves Gaussian elimination for multiple systems of equations, since now the phase shifter may be reconfigured. The challenge here is computational complexity. Consider the case where two mux lines are used, one for the input muxes and one for the output muxes (as in Fig. 8). This means that in each clock cycle, a choice of four distinct phase shifter matrices exists. Over two clock cycles, sixteen systems of equations may be formed (any of four matrices in the first clock cycle with any of four in the second). Extrapolating, it is easy to see that the number of potential systems of equations which must be checked grows exponentially with the block length (the illustration below makes this concrete). At the algorithmic level this is manifested through backtracking, which can result in explosive run-times. One way to circumvent the need for backtracking is the use of symbolic Gaussian elimination, as in [35]; however, memory requirements would then explode instead of run-time, since the equations are added across multiple time frames. Instead, we have opted to impose a backtrack limit permitting only a set number of trials before abandoning the block. Imposing the backtrack limit permits increased block lengths, which improves compression (through the reduced number of seed lines required) by more than what is lost through the blocks which may be compressible but are abandoned by the backtrack limit. Furthermore, it has been our experience that smaller muxes (2-to-1 in our case) are preferred for reasons of area overhead, and at the same time they reduce the complexity, since fewer configurations per frame must be tried. In this way we are still able to gain significant compression without paying severely in memory or run-time. Modifications to the algorithm at the system level are much simpler.
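The configuration count can be tabulated directly; the backtrack limit value below is an illustrative assumption, not the one used in the paper:

```python
# With one in-mux line and one out-mux line (Fig. 8) there are 4 distinct
# phase-shifter configurations per scan clock cycle, so the number of
# candidate equation systems for a block grows as 4^(block length).
CONFIGS_PER_CYCLE = 4            # {2 in-mux settings} x {2 out-mux settings}
BACKTRACK_LIMIT = 1024           # trials allowed before abandoning a block

for block_len in (2, 4, 8, 16):
    full = CONFIGS_PER_CYCLE ** block_len
    print(f"block length {block_len:2d}: {full:>14,} systems,"
          f" at most {min(full, BACKTRACK_LIMIT):,} tried")
```

The cap trades a few potentially compressible blocks for bounded run-time, which is exactly the compromise described above.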


Fig. 7. Decompression scheme for seed-only packing.

Fig. 8. Core level decompressor for seed and mux packing.

The pattern compatibility check must now ensure that not more than one core requires reseeding, not more than one core requires the in-mux lines, and not more than one core requires the out-mux lines during any block of the test pattern. Although this could result in more NOOP insertion due to the increased probability of conflict, our experiments indicate that for increased care bit densities this does not happen. Although this method is able to improve compression somewhat by enabling increased block lengths, the difference is not substantial, as will be discussed in the experimental results (Section IV). For this reason, we propose next another method of addressing linear dependencies, one which ensures that every pattern can be applied through the decompressor.

D. MultiMode Packing

As we have seen in the previous section, introducing some reconfigurability (muxes and mux data streams) in the decompressor can enable better compression by increasing the block length, but as will be demonstrated in the experimental results, the improvement in compression is not consistently dependable: often VTD is improved, but it sometimes increases. This can

result either from a larger number of NOOPs inserted due to the increased probability of conflict, or, if the testing time is reduced, it may not be reduced enough to balance the additional data overhead on the mux lines. Thus, either the extra NOOPs or the extra data (mux lines) dilute the effect of the increased block length. This section proposes another way to address the linear dependencies responsible for the lockouts which reduce compression. The proposed solution uses reconfiguration to reduce the number of care bits which may appear in any block by reducing the number of scan chains. With fewer scan chains and the same block length, the ratio of expanded bits to provided bits in each block is reduced; thus patterns with higher care bit densities may be passed, but with poorer compression (the short derivation below makes the tradeoff explicit). In the forthcoming discussion, explanations will be centered around the multimode seed-only architecture (shown in Fig. 9) for clarity; however, everything discussed applies equally to the multimode seed-mux architecture, which can be obtained by adding reconfiguration muxes between the phase shifter output muxes and the scan chain inputs of the circuit from Fig. 8.
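The tradeoff can be stated in one line. Using symbols introduced here only for illustration ($s$ seed bits provided per block, $c$ active scan chains, $b$ the block length, $d$ the care bit density), solvability of the per-block GF(2) system requires roughly as many free variables as care bits:

```latex
% Each block expands s provided bits into c*b scan-chain bits, of which
% about d*c*b are care bits, so:
\[
  s \;\gtrsim\; d\,c\,b
  \qquad\Longleftrightarrow\qquad
  d_{\max} \;\approx\; \frac{s}{c\,b}.
\]
```

Halving the number of active scan chains $c$ (one reconfiguration step) thus doubles the densest pattern a mode can pass, while the surviving chains become twice as long, which is the compression penalty described above.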


Fig. 9. Multimode core level decompressor.

Of high importance in this approach is that reconfiguration can be performed a sufficient number of times to enable any pattern to be passed through the decompressor, so long as one is willing to pay the penalty in reduced compression (known in some other methods as a bypass mode). This means that application of a pattern with 100% care bit density would be straightforward with this architecture, whereas other approaches (including the seed-only and seed-mux architectures without reconfiguration, as discussed earlier) might not be able to apply the pattern. This fits well with the programmability feature which we desire, as discussed earlier. In terms of hardware overhead, a mux per scan chain is required at the core level. At the core level, CAD support changes are minor: an algorithm is wrapped around the StreamBIST algorithm to enable simple calculation of the multimode streams. Starting with the test patterns formatted for the first mode, StreamBIST compresses all the patterns it can in the first mode (with the highest number of scan chains). Next, the incompressible patterns are reformatted and passed again through StreamBIST, and the cycle continues until all patterns have been compressed. When packing multimode decompressor streams, the ability to connect cores into groups with exclusive test activity (see the description of Algorithm 2) enables different modes of the same core to be placed as virtual cores in the same group. Thus, no two modes of any core will be tested at the same time, and we can directly reuse the same StreamPack algorithm to obtain multimode packed streams. By allowing the freedom to test the core in one mode during one pattern and in another during the next, the probability of NOOP insertion is reduced, and thus so is the overall test time. Although we target a low cost test platform, it is essential to note that our method is beneficial even when testers with more advanced features, such as repeat and/or zero fill, are used. According to [36], the repeat fill feature should be used for the

duration of at least sixteen repeats (due to the configuration of the sequencer memory). Since, when patterns have a high care bit density, it is very unlikely that long runs of repeats will exist, the fill feature may be deemed unusable. On the contrary, we have observed that in a core-based SOC environment, even in the case of the s35932 ISCAS89 circuit [29], which is known to have highly dense test cubes, our approach exploits the few idle cycles in the reseed stream to allocate the tester data channels to other cores that need to be reseeded. This is enabled by the contribution of our paper, i.e., the few test control streams we employ and the low-cost on-chip logic used to distribute the seeds to the appropriate cores, thus intelligently leveraging the available tester channel bandwidth while ensuring that there is a mode to compress every single pattern.

IV. EXPERIMENTAL RESULTS

This section provides experimental results for the StreamBIST/StreamPack methodology for SOCs with IP-protected cores. Seeking to establish a compression approach which provides all five features introduced in Section I-B, Section IV-A first shows the effect of packing on tester channel utilization as the basis of VTD improvement. The costs of area overhead and execution time are discussed in Section IV-B. Finally, Section IV-C shows a comparison of both compression and features against several relevant known solutions.

A. Tester Channel Utilization and VTD

In this section, we discuss the effect of time-multiplexed tester channels on tester channel utilization. Fig. 10 shows how employing StreamPack can substantially improve the utilization of the tester channels. As a note, Comp-(num), as first introduced in Section II, refers to care bit density through the level of compaction. During compaction for a given Comp-(num)


TABLE I SEED-ONLY AND SEED-MUX PACKING VTD AND SCAN TIME

TABLE II SEED-ONLY AND SEED-MUX MULTI-MODE VTD RESULTS (SECTION III-D)

set, two patterns were merged only if the compacted pattern contained fewer than (num) specified bits; thus Comp-50 is a sparse test set and Comp-1000 is a dense test set. Tester channel utilization in the figure was calculated for a hypothetical SOC (comprising the five largest ISCAS89 benchmark circuits [29]: s13207, s15850, s35932, s38417 and s38584) in the following way. Before packing, StreamBIST was first applied to each core, and utilization was taken as the ratio of the total number of reseeded blocks across the five cores of our hypothetical SOC to the total number of blocks for the five cores, giving the No Packing result in the figure. Post-packing utilization was taken (after running StreamPack) as the number of active (reseeded) blocks against the total blocks in the StreamPack output stream. The reason for the improvement is that during StreamPack the algorithm attempts to use the inactive periods on each core (the unseeded blocks) to reseed another core. This reduces the overall number of unused blocks and thus improves the utilization. Another way to consider this point is that the number of reseeds contained in the pre- and post-packed streams is the same, while the total test time is reduced from the serial case: the total number of blocks (the denominator of the utilization expression) is reduced while the numerator remains constant, thus utilization increases. We change at this point from analysis based on tester channel utilization to analysis based on compression, where the effects of block size and number of scan chains are explored more thoroughly. Table I shows the scan times and VTDs for the cases of no packing, seed-only, and seed-mux packing for our hypothetical SOC. In both Tables I and II, the number in brackets indicates the block length which was used. Block size can be selected based on target compression, since one seed will expand at least one block of test data. Furthermore, the seed size in Table I was equal to the number of scan chains, and for Table II it was

Fig. 10. Tester channel utilization improvement.

32. Varying numbers of control channels were needed for the different cases; however, the control data required was always included in the VTD figure. Since a low-cost streaming ATE is assumed (as per the five desired features), the VTD is the direct product of the number of tester channels and the scan time, both of which are given as outputs of StreamPack once the control logic has been generated and all NOOPs have been inserted. The VTD of seed-only packing is improved over the no packing case, where neither StreamBIST nor StreamPack is performed. Moving to the seed-mux case, the addition of muxes enables increased block length/more blocks expanded from a single seed, which improves packing, hence the decrease in scan time. While the block length increase also enables fewer seed channels to be used, more lines are added to drive the muxes. For the higher care bit densities (Comp-400 onward), the block length could not be increased and the scan time saved could not compensate for the increased number of lines, hence the poorer VTD. Turning to the multimode approaches (see Table II), the VTDs obtained are significantly better than in the single mode cases just discussed. This is because in the single mode case each pattern is subject to the same hardware as must be allocated for the worst case pattern. Multimode compression allows the difficult to compress patterns to be tested in a low compression mode, while the easily compressible patterns (the majority) are provided with high compression. Fig. 11 shows the percentage of patterns per mode for a block length of 4 in the seed-only case using Mintest compaction (bottom row of Table II, the


Fig. 11. Breakout of patterns versus scan chains for ISCAS circuits.

same setup which will be used in the comparison to other works in Section IV-C). Note that for s13207, s15850 and s38584 the majority of patterns are passed in the lowest mode (highest compression), while in s35932 the majority are relegated all the way to the final mode (with the lowest compression), as expected from the high care bit density which the patterns of s35932 exhibit. The relationship between the seed-only and seed-mux multimode approaches arises for the same reasons as discussed for the single mode case above. In these multimode cases, the number of scan chains in the first (highest compression) mode is 32.

B. Area Overhead and Execution Time

To obtain results for area, the HDL descriptions generated by StreamBIST and StreamPack for each different sized decompressor were compiled for TSMC [37] 0.18 µm CMOS technology using Synopsys Design Compiler [38] for synthesis, and the results are given as equivalent numbers of 2-input NAND gates. The relationship between block length and decompressor area is shown for 8 and 16 bit LFSR decompressors (with 8 and 16 scan chains, respectively) in Fig. 12(a), and for 32 bit LFSR and multimode decompressors (with 32 scan chains) in Fig. 12(b). Note the large jump between the decompressor area for a block length of 1 and that for a block length of 2. This arises from the fact that when using a block length of 1, the need for a seed buffer is removed, since all variables are passed from the seed lines to the LFSR in a single clock. When the block length is increased beyond 1, the area for the seed buffer must be added (resulting in the noticeable jump); however, once the buffer has been added, the area remains relatively constant with the block size. The small variations in area that appear across different block sizes in the 8, 16 and 32 bit LFSR cases result from a different phase shifter being generated for differing block sizes. Note that the area reported here is pessimistic, since no optimization of the phase shifter has been done, as in [39]. Because of this, our phase shifter area exceeds the lower bound of one gate per scan chain, and our results could be improved if the phase shifter were optimized. The reason for the increased area from each seed-only case to its respective seed-mux case is clear, since the difference between the hardware architectures is the addition of a mux for each scan chain on either side of the phase shifter.
Fig. 12. Core level decompressor area overhead.

In Fig. 13, some system level considerations are illustrated. The most notable feature of the graph is the minuscule area of the control hardware. As mentioned before, this area consists only of a small register and some decoding logic, and it grows logarithmically with the number of cores in the system. The uppermost curve (control+cores, individual seed buffers) of the figure shows the total area required at the system level, as the sum of the core level decompressors and the control hardware. Since the area is dominated by the core level decompressors, it is no surprise that it grows linearly with the number of cores. As a result of sharing the same seed buffer among all the cores (see Fig. 6 from Section III-A-2), the area is reduced to the values shown in the middle curve (control+cores, shared seed buffer) of Fig. 13. It should be noted that the system level area for our approach does not depend on the circuit size, only on the number of scan chains and the number of cores. In the case of 30 cores and 32 scan chains per core, we may make a pessimistic assumption of 100 FFs per scan chain and 20 gates per FF, giving 1,920,000 gates for the total circuit area. For this case, the area shown in Fig. 13 amounts to an overhead well below 2%, and it is expected that even with variations in FFs/chain and gates/FF it is very unlikely that the overhead will exceed 2%. Furthermore, as the number of cores grows, the area saved will increase, i.e., the number of seed buffers replaced by a single seed buffer will increase. Furthermore, if the phase


TABLE III VTD FOR COMPARED APPROACHES († = FOOTNOTE 2, ‡ = FOOTNOTE 3)

Fig. 13. System level area overhead.

shifter area is optimized as mentioned before, the seed buffer will constitute a greater portion of each core level decompressor and, as a result, the savings will further increase. In this way, time multiplexing has enabled increased compression not at the expense of extra area, but with reduced area compared to using [18] for each core. Similarly to the core level domination of area, execution time is dominated by the core level compression algorithm. Experiments were performed on a 3.06 GHz Pentium 4 Xeon processing node with 1 GB of memory running Red Hat Linux 7.3, which yielded execution times ranging on average from a few patterns per second to 30 seconds per pattern. Results in the upper end of this range were for the seed-mux StreamBIST case, and were a result of the backtracking required to try multiple configurations of the mux lines. With the introduction of multimode StreamBIST, reconfiguration handles lockout and thus the need for muxes is eliminated. The domination of execution time by core level compression arises largely from the stringing of multiple blocks together. Gaussian elimination is performed on each block, which is of worst case quadratic complexity in the number of equations (number of care bits), linked to the size of each block. The execution time of pattern compression is then relatively manageable, with linear dependence on the number of blocks in the test set and essentially quadratic dependence on the block size. This complexity at the

core level, however, enables fast calculation of packed streams at the system level. Specifically, the algorithm ran in under one minute for all experiments performed in this paper. Although the worst case complexity of StreamPack is quadratic in the total number of test patterns, practical run times are often far less, since as the algorithm sweeps through the list of patterns remaining to be placed, a pattern fit is frequently found quickly. The seed conflict check is simple and executes quickly, and as each pattern is processed the number of remaining patterns diminishes, thereby accelerating the process further, accounting for the fast execution time and giving confidence regarding scalability. With the core level compression needing to be run only when the test set is changed (an unlikely event for IP-protected cores), and the system level compression run for each system that uses the core, it is advantageous to place most of the algorithmic complexity at the core level to facilitate fast development of system level tests.

C. Comparison to Existing Solutions

Although reduction of VTD by increased compression was a major goal of this work, we have considered it equally important to provide the features discussed with regard to compression methods for SOCs with IP-protected cores. These features ultimately determine the domain of applicability of a method, and by narrowing the scope of comparison, important issues can be missed. It should be kept in mind that different features are required in different situations; given that comparison cannot be made against each of the over one hundred approaches proposed even in just the most recent years, the following discussion details a comparison with some representative approaches to test data compression, not only in terms of the compression which they achieve but also the features they provide. In Section I-B we established the importance of IP-consistency, modularity, scalability, programmability and a low cost test platform, and the methods discussed here are evaluated in terms of these five criteria. Table III summarizes the VTD comparison for the selected approaches, and Table IV shows their compliance with the established criteria. Note that, although it does not appear in the tables, an implicit comparison has been made with [18] through the tester channel utilization discussion provided in Section IV-A and the area discussion from Section IV-B.


TABLE IV CRITERIA SATISFIED FOR COMPARED APPROACHES

Beginning with IP-consistency, [5], [20], [22] all rely on integrated fault simulation and ATPG to provide compression, whereas [1], [2] modify the circuit. In both cases circuit details are required, and thus IP-protection is not respected; however, for [5], [20], [22], the methods could be modified to not rely on ATPG/fault simulation, thereby making them IP-consistent. Because this would require appreciable effort and would cause significant changes to the VTD results as reported in the respective papers, we quote criterion satisfaction here according to the way these methods were presented initially. In terms of modularity, [2], [5], [14], [22], [27] do not appear to exhibit any system level exploitable feature, and [3], [13], [40] suffer from methodology issues (i.e., the requirement of an acknowledgment signal to the ATE and a scan clock). It can be noted that for those approaches which are modular, our method is largely complementary, providing a way to manage connection of the cores at the system level. With respect to scalability, [27] suffers large complexity when deciding how to map care bits and mutation order, [2] requires a large signature analyzer to collect scan tree outputs, and [5], [40] use unreasonably short internal scan chains. As before, [5], [40] can be considered scalable if applied to reasonable length scan chains, but this will impact the VTD, so these methods are listed as non-scalable since the results have been reported in a non-scalable way. Turning attention to programmability, none of [1], [14], [20], [22] provide any mechanism for application of patterns with high care bit densities once the hardware is in place, and [5] crafts the phase shifter around the test set, reducing the chance that even new vectors of low care bit density could be applied. Finally, all these methods can immediately use a low cost test platform; in the cases where particular ATE features are exploited, they are not crucial to the compression methodology. In light of the above, the only methods which provide identical features as indicated in Table IV are [9], [33]. As Table III indicates, we achieve relatively similar and superior VTDs to [9] and [33], respectively. A further work [25], which follows the same intuition as [26] (the basis for this work), has also recently been proposed and incorporates all five of the desired features, although programmability is obtained through the use of scan clock gating, which can negatively impact modularity, and VTD is improved over that reported here at the expense of higher on-chip area. Note that while a single LFSR does implicit time-multiplexing of the seeds from the tester, when multiple LFSRs are used instead, reconfiguration can be done on a single core, thereby focusing the seeds where they are most needed. Also, solving one large system of equations across all cores for an $Nk$-size LFSR should have complexity proportional to $(Nk)^3$, while the case of $N$ separate $k$-bit LFSRs would be more like $N k^3$, a factor of $N^2$ less, where $N$ is the number of cores (see the worked instance below).
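As a worked instance of this complexity argument (our numeric illustration; the values of $N$ and $k$ are arbitrary):

```latex
% Gaussian elimination is cubic in the system size (constants ignored):
\[
  \underbrace{\mathcal{O}\!\big((Nk)^{3}\big)}_{\text{one shared } Nk\text{-bit LFSR}}
  \quad\text{vs.}\quad
  \underbrace{\mathcal{O}\!\big(N k^{3}\big)}_{N\ \text{per-core } k\text{-bit LFSRs}}
  \qquad\Rightarrow\qquad
  \text{ratio} = N^{2}.
\]
% E.g., N = 10 cores, k = 32:
\[
  (10 \cdot 32)^{3} \approx 3.3\times 10^{7}
  \quad\text{vs.}\quad
  10 \cdot 32^{3} \approx 3.3\times 10^{5},
\]
% a factor of N^2 = 100 in favor of the per-core decompressors.
```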

It should be noted that the item in Table III marked with a † indicates that the result includes s5378 and s9234 from the ISCAS89 benchmark set². Our result for VTD in this case is 262 000. Along the same lines, items marked with a ‡ indicate that s35932 has been neglected³. This circuit is particularly random pattern resistant (not easily encodable in pseudorandom streams) and is thus often not reported in VTD results for compression methods, especially those employing linear decompressors. It is important to note that the programmability criterion requires the ability to apply any pattern, and the lack of results for s35932 is often an indicator of a lack of programmability, as can be seen in the respective row of Table IV. Finally, it is noteworthy that removing the methodology constraints on our approach, for example by integrating ATPG/fault simulation during compression or by increasing the number of scan chains to over 256 for the ISCAS89 circuits (which results in very short scan chains but large decompressor sizes), would significantly improve our VTD results reported in Tables I and II. Nevertheless, we choose not to do this because it would defeat the very purpose of this work, which is to offer a clear picture of how compressed SOC testing can be achieved for IP-protected blocks with seamless integration to low cost test equipment. Besides, the added value of this paper, i.e., time-multiplexing the tester channels, can be better judged if the VTD results are not biased by other factors that influence them, such as integrating compression with ATPG/fault simulation. This will actually enable a better decision on the integration of time-multiplexing of tester channels into any compatible complementary compression technique, whenever the SOC design methodology permits it.

V. CONCLUSION

Despite enabling the use of low cost testers for rapidly achieving high fault coverage, compression methods must consciously use the available tester channels to ensure non-disruptive scaling to future devices of increased complexity and to design methodologies that have to deal with IP-protected cores. In this paper we have shown that, generally, tester channels are under-utilized in compressed SOC test when dealing with multiple embedded cores that are provided with their own test cubes. Through exploitation of care bit clustering, a compression method has been proposed based on time-multiplexing the tester channels among multiple cores. Although focused on compressed test of SOCs with IP-protected cores, the proposed solution can also be viewed as an add-on to other compression methods, whenever different clusters of logic exhibit non-uniform demand for compressed data during test.
²VTDs so marked in the table indicate inclusion of s5378 and s9234.
³VTDs so marked in the table indicate that s35932 is not included.


REFERENCES
[1] K. J. Balakrishnan and N. A. Touba, "Improving encoding efficiency for linear decompressors using scan inversion," in Proc. IEEE ITC, 2004, pp. 936–944.
[2] K. Miyase, S. Kajihara, and S. M. Reddy, "Multiple scan tree design with test vector modification," in Proc. IEEE ATS, 2004, pp. 76–81.
[3] L. Li and K. Chakrabarty, "Test data compression using dictionaries with fixed-length indices," in Proc. IEEE VLSI Test Symp. (VTS), 2003, pp. 219–224.
[4] I. Pomeranz and S. M. Reddy, "Test data compression based on input-output dependence," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 22, no. 5, pp. 1450–1455, Oct. 2003.
[5] W. Rao, I. Bayraktaroglu, and A. Orailoglu, "Test application time and volume compression through seed overlapping," in Proc. IEEE/ACM DAC, 2003, pp. 732–737.
[6] A. Jas, J. Ghosh-Dastidar, M.-E. Ng, and N. A. Touba, "An efficient test vector compression scheme using selective Huffman coding," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 22, no. 3, pp. 797–806, Jun. 2003.
[7] A. Chandra and K. Chakrabarty, "Test data compression and decompression based on internal scan chains and Golomb coding," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 3, pp. 715–722, Jun. 2002.
[8] P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici, "Variable-length input Huffman coding for system-on-a-chip test," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 22, no. 3, pp. 783–796, Jun. 2003.
[9] P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici, "Synchronization overhead in SoC compressed test," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 1, pp. 140–152, Jan. 2005.
[10] A. Chandra and K. Chakrabarty, "Test data compression and test resource partitioning for system-on-a-chip using frequency-directed run-length (FDR) codes," IEEE Trans. Comput., vol. 52, pp. 1076–1088, Aug. 2003.
[11] V. Iyengar, A. Chandra, S. Schweizer, and K. Chakrabarty, "A unified approach for SoC testing using test data compression and TAM optimization," in Proc. IEEE/ACM Design, Automation and Test in Europe (DATE), 2003, pp. 1188–1189.
[12] A. Wurtenberger, C. S. Tautermann, and S. Hellebrand, "A hybrid coding strategy for optimized test data compression," in Proc. IEEE ITC, 2003, pp. 451–459.
[13] M. Tehranipour, M. Nourani, and K. Chakrabarty, "Nine-coded compression technique with application to reduced pin-count testing and flexible on-chip decompression," in Proc. IEEE/ACM DATE, 2004, vol. 2, pp. 1284–1289.
[14] C. V. Krishna and N. A. Touba, "Reducing test data volume using LFSR reseeding with seed compression," in Proc. IEEE ITC, 2002, pp. 321–330.
[15] S. Mitra and K. S. Kim, "XMAX: X-tolerant architecture for maximal test compression," in Proc. IEEE ICCD, 2003, pp. 326–330.
[16] I. Bayraktaroglu and A. Orailoglu, "Concurrent application of compaction and compression for test time and data volume reduction in scan designs," IEEE Trans. Comput., vol. 52, no. 11, pp. 1480–1489, Nov. 2003.
[17] B. Koenemann, "LFSR-coded test patterns for scan designs," in Proc. IEEE ETC, 1991, pp. 237–242.
[18] P. Wohl, J. A. Waicukauski, S. Patel, and M. B. Amin, "Efficient compression and application of deterministic patterns in a logic BIST architecture," in Proc. IEEE/ACM DAC, 2003, pp. 566–569.
[19] C. V. Krishna, A. Jas, and N. A. Touba, "Test vector encoding using partial LFSR reseeding," in Proc. IEEE ITC, 2001, pp. 885–893.
[20] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, "Embedded deterministic test," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 23, no. 5, pp. 776–792, May 2004.
[21] G. Mrugalski, J. Rajski, and J. Tyszer, "Ring generators: New devices for embedded test applications," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 23, no. 9, pp. 1306–1320, Sep. 2004.
[22] A. Jas, B. Pouya, and N. A. Touba, "Test data compression technique for embedded cores using virtual scan chains," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 7, pp. 775–781, Jul. 2004.
[23] B. Koenemann, C. Barnhart, B. Keller, T. Snethen, O. Farnsworth, and D. Wheater, "A SmartBIST variant with guaranteed encoding," in Proc. IEEE ATS, 2001, pp. 325–330.
[24] E. H. Volkerink and S. Mitra, "Efficient seed utilization for reseeding based compression," in Proc. IEEE VTS, 2003, pp. 232–237.
[25] Z. Wang, K. Chakrabarty, and S. Wang, "SoC testing using LFSR reseeding, and scan-slice-based TAM optimization and test scheduling," in Proc. IEEE/ACM DATE, 2007, pp. 201–206.
[26] A. B. Kinsman and N. Nicolici, "Time-multiplexed test data decompression architecture for core-based SoCs with improved utilization of tester channels," in Proc. IEEE ETS, 2005, pp. 196–201.
[27] S. Reda and A. Orailoglu, "Reducing test application time through test data mutation encoding," in Proc. IEEE/ACM DATE, 2002, pp. 387–393.
[28] L. Schafer, R. Dorsch, and H.-J. Wunderlich, "RESPIN++: Deterministic embedded test," in Proc. IEEE Eur. Test Workshop, 2002, pp. 37–44.
[29] F. Brglez, D. Bryan, and K. Kozminski, "Combinational profiles of sequential benchmark circuits," in Proc. IEEE ISCAS, 1989, pp. 1929–1934.
[30] A. B. Kinsman, "Embedded deterministic test for systems-on-a-chip," Master's thesis, McMaster Univ., Hamilton, ON, Canada, 2005.
[31] P. Wohl, J. A. Waicukauski, S. Patel, F. DaSilva, T. W. Williams, and R. Kapur, "Efficient compression of deterministic patterns into multiple PRPG seeds," in Proc. IEEE ITC, 2005, pp. 1–10.
[32] K. J. Balakrishnan and N. A. Touba, "Relating entropy theory to test data compression," in Proc. IEEE ETS, 2004, pp. 94–99.
[33] I. Hamzaoglu and J. H. Patel, "Reducing test application time for full scan embedded cores," in Proc. Int. Symp. Fault-Tolerant Comput., 1999, pp. 260–267.
[34] F. F. Hsu, K. M. Butler, and J. H. Patel, "A case study on the implementation of the Illinois scan architecture," in Proc. IEEE ITC, 2001, pp. 538–547.
[35] K. J. Balakrishnan and N. A. Touba, "Reconfigurable linear decompressors using symbolic Gaussian elimination," in Proc. IEEE/ACM DATE, 2005, vol. 2, pp. 1130–1135.
[36] H. Vranken, F. Hapke, S. Rogge, D. Chindamo, and E. Volkerink, "ATPG padding and ATE vector repeat per port for reducing test data volume," in Proc. IEEE ITC, 2003, pp. 1069–1078.
[37] Taiwan Semiconductor Manufacturing Corporation (TSMC) website. [Online]. Available: http://www.tsmc.com
[38] Design Compiler, Synopsys synthesis tools, 2003. [Online]. Available: http://www.synopsys.com/products/logic/design_compiler.html
[39] J. Rajski, N. Tamarapalli, and J. Tyszer, "Automated synthesis of phase shifters for built-in self-test applications," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 19, pp. 1175–1188, Oct. 2000.
[40] C. V. Krishna and N. A. Touba, "Adjustable width linear combinational scan vector decompression," in Proc. IEEE/ACM ICCAD, 2003, pp. 863–866.

Adam B. Kinsman (S'04) received the B.Eng. degree in engineering physics and the M.A.Sc. degree in computer engineering from McMaster University, Hamilton, ON, Canada, in 2004 and 2005, respectively, where he is currently pursuing the Ph.D. degree in computer engineering. His research interests include computer-aided design for test and debug.

Nicola Nicolici (S'99–M'00) received the Dipl. Ing. degree in computer engineering from the University of Timisoara, Romania, in 1997, and the Ph.D. degree in electronics and computer science from the University of Southampton, U.K., in 2000. He is an Associate Professor in the Department of Electrical and Computer Engineering at McMaster University, Canada. His research interests are in the area of computer-aided design and test. Dr. Nicolici has authored a number of papers in this area and received the IEEE TTTC Beausang Award for the Best Student Paper at the International Test Conference (ITC 2000) and the Best Paper Award at the IEEE/ACM Design Automation and Test in Europe Conference (DATE 2004).
