

IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS, VOL. 4, NO. 6, DECEMBER 2010

3-D System-on-System (SoS) Biomedical-Imaging Architecture for Health-Care Applications


Sang-Jin Lee, Student Member, IEEE, Omid Kavehei, Student Member, IEEE, Yoon-Ki Hong, Tae Won Cho, Member, IEEE, Younggap You, Member, IEEE, Kyoungrok Cho, Member, IEEE, and Kamran Eshraghian
Abstract: This paper presents the implementation of a 3-D architecture for a biomedical-imaging system based on a multilayered system-on-system structure. The architecture consists of a complementary metal-oxide semiconductor image sensor layer, memory, a 3-D discrete wavelet transform (3D-DWT), a 3-D Advanced Encryption Standard (3D-AES), and an RF transmitter as an add-on layer. Multilayer silicon (Si) stacking permits fabrication and optimization of individual layers by different processing technologies to achieve optimal performance. Utilization of a through-silicon-via (TSV) scheme can address the required low-power operation as well as high-speed performance. Potential benefits of 3-D vertical integration include an improved form factor as well as a reduction in the total wiring length, multifunctionality, power efficiency, and flexible heterogeneous integration. The proposed imaging architecture was simulated by using Cadence Spectre and Synopsys HSPICE, while implementation was carried out with Cadence Virtuoso and Mentor Graphics Calibre.

Index Terms: 3D-AES, 3-D discrete wavelet transform (3D-DWT), biomedical imaging, system-on-system (SoS), unary computation.
Fig. 1. Proposed ubiquitous mobile digital biomedical-imaging system as part of health-care monitoring and management.

I. INTRODUCTION

MOBILE health-care monitoring systems and services are rapidly growing as the result of advances in silicon complementary metal-oxide semiconductor (CMOS) scaling as well as rapid improvements in the availability of broadband communication systems and networks. Video images, such as magnetic resonance imaging (MRI), computed tomography (CT), and X-rays, introduce heavy demand on the storage capacity of the memory layer of a processing engine. Contemporary research into the future digital biomedical-imaging technology depicted in Fig. 1 is conjectured to influence the manner in which radiologists, medical practitioners, and specialists interact. The dynamic and bandwidth requirements of

Manuscript received March 31, 2010; revised July 25, 2010; accepted September 09, 2010. Date of current version November 24, 2010. This work was supported by the World Class University (WCU) Project of MEST and KOSEF through Chungbuk National University under Grant no. R33-2008-000-1040-0. This paper was recommended by Associate Editor A. Bermak. S.-J. Lee, Y.-K. Hong, T. W. Cho, Y. You, K. Cho, and K. Eshraghian are with the College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 361-763, Chungbuk, South Korea (e-mail: sjlee@hbt.chungbuk.ac.kr; omid@hbt.chungbuk.ac.kr; twcho@chungbuk.ac.kr; ygyou@chungbuk.ac.kr; krcho@chungbuk.ac.kr; k.eshraghian@innovationlabs.com.au). O. Kavehei is with the College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 361-763, Chungbuk, South Korea. He is also with the School of Electrical and Electronic Engineering, University of Adelaide, Adelaide SA 5005, Australia (e-mail: omid@eleceng.adelaide.edu.au). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TBCAS.2010.2079330

medical sensor data vary significantly, from 10 ksamples/s at 20 b/sample for CT to more than 100 Msamples/s at 16 b/sample for MRI. Medical sensor data rates are increasing exponentially. It is conjectured that the next generations of CT and MRI systems are likely to generate 4 Gb/s to more than 200 Gb/s of sampled data [1]. Usually, compression algorithms offer either lossy or lossless operation. As a consequence, several approaches, such as discrete cosine transform (DCT)-based JPEG, MPEG, and H.26x, have been implemented to address the complexity associated with the massive amount of data required to be transported in one form or another. Wavelet-based image processing, such as the discrete wavelet transform (DWT), has emerged as an alternative to DCT-based schemes. The approach permits implementation of highly scalable image compression with inherently superior features, such as progressive lossy-to-lossless performance within a single data stream, the ability to enhance the quality associated with selected spatial regions in selected quality layers, and the absence of blocky artifacts at low bit rates. Substantial clinical research has been devoted to applications of image compression for biomedical equipment and, as a consequence, the medical imaging standard Digital Imaging and Communications in Medicine (DICOM) employs JPEG 2000 [2], [3]. Privacy protection has also been an important issue as medical image data are transmitted through public communication networks. Encryption is an effective way to protect medical information records against modification and eavesdropping on communication channels and, hence, can ensure the security and integrity of stored data against tampering. The conventional 2-D system-on-a-chip (SoC) technology [4] that has characterized the implementation strategies of the industry over the last decade has many challenges in terms of



Fig. 2. Three-dimensional stacked IC technology featuring 5-µm TSV technology, enabling the interconnection of global wires between different dies [7].

area utilization, long signal paths, and the related complexity of signal routing that requires a large number of inputs/outputs (I/Os), resulting in an increase in power consumption. Consequently, a number of constraints are encountered in the design of high-speed and low-power portable multimedia-based biomedical-imaging systems. Recent advances in 3-D multilayered fabrication, together with the progress in vertical interconnect technology, such as that of the through-silicon via (TSV) shown in Fig. 2, make 3-D architectural mapping a viable option for the gigascale integrated systems [5]-[7] demanded by the more futuristic meta-health cards and the like [8]. Therefore, the combination of 3-D integrated architectures with multilayer silicon die stacking is a promising solution to the severe problems faced by the integrated-circuit (IC) industry as geometries are scaled below 32 nm. Three-dimensional VLSI technology increases packing density while having the potential to provide a significant improvement in propagation delay and reduced power consumption when compared to its equivalent 2-D counterpart. For example, the propagation delay and energy dissipation associated with interconnects are reduced by up to 50% at a 45-nm feature size [9]. Hence, multilayer Si die stacking is an imminent scheme for mobile health-care monitoring that can address low-power requirements, small footprint/volume, and high reliability, as well as the option for high-speed performance. Through utilization of TSV technology, chips can be individually optimized and then vertically stacked. The length of a TSV can range from 20 to 50 µm [10], much shorter than the global wires of conventional 2-D SoC designs. The smaller size and shorter physical dimensions reduce parasitic effects, resulting in faster and cooler circuits. TSVs are good thermal conductors. Therefore, strategic positioning of TSVs in adjacent layers, inclusion of dummy thermal TSVs, and placement of TSVs within hotspots can enhance the thermal conduction of the central layers at the cost of additional chip area. Other options are also available, such as an on-chip thermoelectric thin-film cooler that has shown limited promise [11]; however, these exotic techniques are more expensive and have limited applications. Although scaled transistors show improved performance between 90 and 22 nm, the wiring interconnects dominate the behavior [5], [12]-[15]. The improvement gained from scaled transistors is insignificant when compared with the influence of scaled interconnects. The 3-D ICs offer a promising solution, reducing the interconnect length and the footprint without the need for scaled transistors [16]. As a consequence, the focus of this paper is the realization of a multilayered SoS architecture that consists of an image capture layer (upper layer), memory, 3-D DWT (3D-DWT), 3-D

Advanced Encryption Standard (3D-AES), and an RF transmitter that can be implemented as an add-on layer to facilitate secure data transmission over public communication channels. For the image capture layer, many alternatives are feasible that can produce the necessary data stream [17]. The focus in our approach, however, is directed toward the design and simulation of a single-inverter pulsewidth modulation (PWM) readout structure that provides a digital data stream to the next layer and consumes about 5% of the total power. This paper is organized as follows. Section II generalizes the overall 3-D system architecture for image capture, data compression, and encryption. Section III introduces the image-capturing layer, followed by Section IV where the 3-D image-processing and encryption layers are presented. Performance and related analysis are given in Section V. Section VI concludes this paper.

II. 3-D MULTILAYERED PHYSICAL ARCHITECTURE

The 3-D portable health-care monitoring system is comprised of multiple Si dies that accommodate system layers, such as the CMOS image sensor (CIS), memory layer, 3D-DWT [18], and 3D-AES [19] blocks, and an add-on RF transmitter. Fig. 3 illustrates the logical architecture as well as the approach toward the segmentation of the layers. The approach pursued in the design of the CIS layer is based on a single-inverter digital pixel sensor (DPS). In this arrangement, the pixel's output signal is compatible with a unary data-computation framework (coding the lengths of runs of zeros in a bit vector), in anticipation that this approach will prove advantageous in future designs [20]. In order to represent a decimal number N in the unary number system, an arbitrarily chosen symbol representing 1 is repeated N times. For example, decimal 6 as an 8-b word in unary is represented by 11111100. The significant feature to be noted is that there is only one switching transition from 1 to 0. Unary computation has the potential to improve the frame rate of the CIS and to provide further improvement in power dissipation due to the reduction in switching activity. The stored image frames in the memory layer (Fig. 3) are forwarded to the 3D-DWT layer, which consists of a series of decompositions and embedded block coding with optimized truncation (EBCOT). Subsequently, the compressed image data are encrypted by the 3D-AES layer and can be transmitted by the RF transmitter to a host computer. The memory layer may be implemented using the more conventional SRAM. However, due to the possible requirement for very low power and ultra-high-capacity memory, inclusion of the memristor (a portmanteau of memory and resistor) creates new possibilities [21] for implementing this layer. The significant features of the memristor are its ability to remember its state after the removal of the power source, its nano-based feature size that would allow realization of terabit memories at low power, and its compatibility with CMOS processing technology [5]. In a 3-D implementation, the average global interconnect wiring length and, hence, the associated delay overhead are reduced by approximately the square root of the number of layers [22]. Furthermore, the smaller feature size and shorter physical dimensions of TSV technology, when adopted as the interconnect between layers, reduce the parasitic effects, resulting in faster and cooler circuits. To address the possible dissipation problem of the central layers that could


Fig. 4. Proposed pixel with conditional reset transistors: M2 and M3.

A. CMOS Image Sensor (CIS) Layer

In order to improve the dynamic range of the CIS, as our initial step we first augmented the conventional CIS design [25] with the addition of conditional reset transistors M2 and M3 in the pixel, as illustrated in Fig. 4. The M3 transistor selects only one pixel in a given column for conditional reset via the c_reset signal, which is a column-select common signal. The M1 transistor resets the photodiode at the end of every integration period, which prevents pixels from blooming. The source-follower transistor M4 amplifies the photodiode's output signal and applies it to a column line of the CIS (pixel_out).

1) Conditional Reset Algorithm: The pixel output is sampled at exponentially increasing exposure times and each sample is digitized [26]. The digitized samples for each pixel are combined into a floating-point format, that is, a floating-point number with an exponent determined by the selected exposure and a fixed-length mantissa. This increases the dynamic range in proportion to the ratio of the longest to the shortest exposure while preserving the mantissa resolution over each range of illumination. Since analog-to-digital (A/D) conversion is carried out at each sampling instance, there is a potential problem of blooming due to pixel saturation. A higher dynamic range, however, can be obtained by reducing the sample-to-sample interval [25], [27]. In the conditional-reset-based pixel, the combination of the photodiode and fd node parasitic capacitances was estimated to be 48 fF, and the estimated conversion gain was 3.3 µV per electron [25]. The pixel output voltage of 1.6 V was refreshed at 5-ms integration time frames. The photocurrent and the total number of electrons were calculated to be 15.3 pA and 500 000, respectively [25]. The product of the conversion gain and the total number of electrons gives a voltage drop of 1.6 V at the pixel output [25]. The pixel area and fill factor are 16 µm × 16 µm and 34%, respectively [25]. To improve the performance of the CIS layer, we redesigned and simulated a 128 × 128 pixel structure (Fig. 5) using a Hynix 0.25-µm standard CMOS process and commercial design tools. Using the conditional reset algorithm, the sensor's dynamic range increases with the number of samples taken during an integration period and is limited only by the speed of the readout circuitry.
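To make the multi-sampling idea concrete, the following Python sketch combines exposures taken at exponentially increasing integration times into an exponent/mantissa code, in the spirit of the floating-point pixel coding of [26]. The function names and the saturation-based exposure selection are illustrative assumptions, not the authors' implementation.

```python
def encode_hdr_pixel(samples, m_bits=8):
    """Combine pixel samples taken at exponentially increasing exposure
    times (T, 2T, 4T, ...) into a floating-point code: an exponent that
    selects the longest non-saturated exposure and an m-bit mantissa.
    `samples` is ordered from the shortest to the longest exposure and
    each entry is an ADC code in [0, 2**m_bits - 1].  (Sketch only.)"""
    full_scale = 2 ** m_bits - 1
    exponent, mantissa = 0, samples[0]
    for k, code in enumerate(samples):
        if code < full_scale:          # keep the longest exposure that did not saturate
            exponent, mantissa = k, code
    return exponent, mantissa

def decode_hdr_pixel(exponent, mantissa, n_samples):
    """Map (exponent, mantissa) back to a linear light estimate referenced
    to the longest exposure, 2**(n_samples - 1) * T."""
    return mantissa * 2 ** (n_samples - 1 - exponent)

# Example: a bright pixel that saturates at the two longest exposures.
samples = [40, 81, 163, 255, 255]      # ADC codes for exposures T, 2T, 4T, 8T, 16T
exp, man = encode_hdr_pixel(samples)
print(exp, man, decode_hdr_pixel(exp, man, len(samples)))   # 2 163 652
```

The decoded value (652) exceeds the single-exposure full scale of 255, which is the dynamic-range extension the conditional reset scheme aims for.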

Fig. 3. Proposed 3-D multilayer secure imaging system where the interlayer interconnections are based on the TSV technology. (a) Logical architecture. (b) Physical implementation using TSVs for connecting the layers.

lead to a rise in temperature, the strategy in our proposed architecture is to change the thermal profile of the central core by interleaving 3D-DWT and 3D-AES into two layers (Fig. 3). This approach distributes the switching activity between two layers and, as a consequence, changes the thermal profile of the core. Moreover, a significant aspect of the proposed 3-D architecture for medical applications is the possibility for compression to take place at the sensor level [23]. The advantage offered appears in terms of performance (bandwidth/memory requirements) and computational complexity improvements [23], [24].

III. IMAGE CAPTURING: CMOS IMAGE SENSOR LAYER

A conventional CIS usually includes pixel arrays, gain amplifiers, and analog-to-digital converters (ADCs) implemented on the same Si die. The overall performance of the CIS can be enhanced if the high-speed digital processor layer and the analog layer are optimized separately and connected by TSVs [14]. An asynchronous event-driven signal processor can manage communication between the layers.


Fig. 5. CMOS image sensor chip layout. (a) CIS chip layout used for extraction and simulations. (b) Individual pixel layout.

Fig. 7. Physical architecture of a multilayered CIS, which also includes the memory layer as well as the compression (3D-DWT) and encryption (3D-AES) layers.

Fig. 6. New pixel circuit with an inverter acting as an A/D converter. The 0 to 1 transition at the output of the inverter can occur only once which implies a reduction in switching activity and, hence, an improvement in power dissipation.
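The single-transition property of the unary (thermometer) code produced by this pixel, as described in Section II, can be illustrated with a short Python sketch; the helper names below are hypothetical and only model the coding, not the circuit.

```python
def to_unary(value, bits=8):
    """Unary (thermometer) code: `value` ones followed by zeros, so an
    intensity such as decimal 6 in an 8-slot word becomes 11111100 and
    the word contains a single 1-to-0 transition."""
    length = 2 ** bits                 # 256 slots for an 8-b intensity
    return [1] * value + [0] * (length - value)

def transitions(word):
    """Count level changes; a unary word switches at most once."""
    return sum(1 for a, b in zip(word, word[1:]) if a != b)

pwm_word = to_unary(6, bits=3)         # 8-slot example matching the text
print(''.join(map(str, pwm_word)), transitions(pwm_word))   # 11111100 1
```

An 8-b binary word can toggle on every bit, whereas the unary word above toggles once regardless of the coded value, which is the source of the claimed reduction in switching activity.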

B. Advanced CIS Pixel

The circuit depicted in Fig. 4 has an advantage in dynamic range. However, the pixel area is larger than that of the conventional structure. The need to include an A/D converter also influences area utilization, speed, and power consumption. To overcome the area/speed limitations and to address the need for the CIS layer to be connected directly to the next layer, a new digital pixel sensor (DPS) approach, shown in Fig. 6, is investigated. The pixel output in this novel circuit is a pulsewidth-modulated signal that transfers illumination levels directly to the next layer. The amount of charge induced by the incident illumination determines the output state of the inverter. Pixel information with an 8-b depth has 256 steps of luminance. Within the unary number system, there is only one transition in these 256 steps, which makes it possible to reduce the dynamic power dissipation of the circuit [20]. The nature of unary arithmetic is an important strategy for the design of future processors for low-power and portable applications. The M3 transistor shown in Fig. 6 is used to cancel the offset voltage while the M2 transistor is OFF; M3 and the reset transistor are ON simultaneously. Since the inverter's output is not fully charged, it gradually charges as the input to the inverter decreases, at a rate that depends on the illumination level. As a consequence, it is possible to increase the frame rate by reducing the integration time. The new pixel downloads frame data to the processor layer through TSVs in parallel. Since the integration time can be several milliseconds, the pixel has adequate time to accumulate the input charge resulting from low-level incident illumination. Moreover, the dynamic switching 0 to 1 transition occurs

only once in the output data stream of the pixel, as shown in Fig. 6. The physical architecture of a multilayered CIS, which also includes the compression (3D-DWT) and encryption (3D-AES) layers, is shown in Fig. 7. The pixel array and peripheral circuitry are implemented in separate layers and linked by TSV interconnects. The upper layer comprises the pixel arrays, row drivers and decoders, and RGB sample-and-hold circuitry. The lower layers perform image storage in memory, such as SRAM or memristor-based memory [21], image compression with the DWT, and encryption with the AES.

IV. 3-D IMAGE SIGNAL-PROCESSING ARCHITECTURE

The DWT algorithm has been widely used in image-processing techniques [18]. Traditionally, the wavelet scheme in image processing is based on the 2D-DWT, where the focus is on treating only still images as planes. In contrast, the 3D-DWT expands the 2D-DWT to three dimensions for the compression of volumetric data, such as that created by CT and MRI scanners. The 3D-DWT has the advantage of allowing a series of 2-D images to be further compressed in the third dimension by exploiting the correlation between adjacent images in the series. The outcome provides better compression ratios as well as the absence of blocking artifacts [28], [29]. Therefore, such an approach is implemented in a variety of applications, including the image compression and noise reduction demanded by MRI and CT scans. The algorithm for the DWT is 3-D while that of the AES is 2-D. However, in order to optimize the thermal profile of the core layer, the 2D-AES algorithm is implemented in two layers, which gives a computational advantage gained through much shorter TSV communication paths and, hence, further power reduction.

A. 3-D-DWT Architecture

A 3-D image is an extension of 2-D images along the temporal direction. The 3-D-DWT performs a spatial-temporal decomposition along the two spatial (x and y) and one temporal (t)


directions on the image sequences. The captured image sequences pass through the 3D-DWT decomposition and are coded by the embedded block coder. The 3-D-DWT repeats the decomposition of the 1-D DWT. The DWT computation is described by

$y_L(n) = \sum_{k=0}^{K-1} h(k)\, x(2n - k)$   (1)

$y_H(n) = \sum_{k=0}^{K-1} g(k)\, x(2n - k)$   (2)

where $y_L$ and $y_H$ are the low- and high-pass subband filter outputs, respectively. The functions $h(k)$ and $g(k)$ represent the impulse responses of the low-pass and high-pass filters, respectively, $K$ is the filter length, and $y_L(n)$ and $y_H(n)$ are the related output signals [30]. Equation (1) describes a filter for the low-frequency range, yielding approximate information, while (2) describes a filter for the high-frequency range, delivering detailed information. The decomposed approximate information is further decomposed into another set of approximate and detailed information with a lower degree of resolution. Fig. 8 illustrates one level of the 3D-DWT, where H-pass and L-pass represent a high-pass filter and a low-pass filter, respectively. The downsampled LLL (one-level low-pass) component can be decomposed further through multilevel DWT decomposition. The LLL component is the low-resolution version of the original image sequence and retains most of its features. The number of levels of image decomposition depends on the system configuration dictated by the desired system performance. A conventional 3D-DWT is inefficient since it requires accessing all of the image frames at the same time and, therefore, a significant amount of memory space is needed to perform the DWT process [31]. The concept of the group of frames (GoF), which is similar to the group of pictures in MPEG, is introduced here to overcome the drawbacks associated with the conventional 3-D-DWT. Furthermore, a 2-D physical implementation of the 3-D-DWT algorithm has limitations in terms of compression efficiency due to frame access. Therefore, a new architecture based on a two-layered system is introduced, which addresses the frame-access issue. The 3-D physical implementation using TSVs permits access to all frames on the same time axis, thus providing better data-compression efficiency.

B. DWT Functional Block and Filter Design

The functional block composition used to implement one level of the 3-D-DWT is illustrated in Fig. 9, using the algorithm highlighted in Fig. 8. The DWT functional block consists of filter bank-1s (FB-1s), filter bank-2s (FB-2s), and their related interconnections. The nine bank-1 filters downsample in the temporal direction. The two bank-2 filters and the next four bank-1 filters decompose the images in the two spatial directions. The filter architecture is based on the Daubechies 9/7 filter [31]. The key parameters in the design of the filters are the peak signal-to-noise ratio (PSNR) and the occupied chip area. Cao et al. achieved a high level of accuracy in a fixed-point implementation by using 13-b coefficients [32]. Our approach is based on a modified version of Cao's filter bank-1, which maps the adder/compressor array into two parallel streams in order to generate the high and the low subband filter outputs simultaneously. Fig. 10 highlights the modified filter bank-1

Fig. 8. One-level 3-D-DWT decomposition, where H and L represent a high-pass filter and a low-pass filter, respectively. The LLL component is the low-resolution version of the original image sequence and retains most of its features.
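A minimal software model of the one-level decomposition in Fig. 8 is sketched below. It applies the analysis filters of (1) and (2) separably along the t, y, and x axes; the Haar taps and circular boundary handling are used purely for illustration, whereas the paper's filter banks are built on the Daubechies 9/7 filter.

```python
import numpy as np

# Analysis filters for eqs. (1)-(2); Haar taps are for illustration only.
h = np.array([0.5, 0.5])     # low-pass impulse response
g = np.array([0.5, -0.5])    # high-pass impulse response

def analyze_1d(x, axis):
    """One level of 1-D DWT along `axis`: filter per (1)-(2), then downsample by 2."""
    def filt(taps):
        y = sum(taps[k] * np.roll(x, k, axis=axis) for k in range(len(taps)))
        return np.take(y, range(0, x.shape[axis], 2), axis=axis)
    return filt(h), filt(g)

def dwt3d_one_level(volume):
    """Decompose a (t, y, x) volume into the eight subbands of Fig. 8."""
    subbands = {'': volume}
    for axis in range(3):                    # axis order: t, then y, then x
        new = {}
        for name, data in subbands.items():
            lo, hi = analyze_1d(data, axis)
            new[name + 'L'] = lo
            new[name + 'H'] = hi
        subbands = new
    return subbands                          # keys: LLL, LLH, ..., HHH

frames = np.random.rand(8, 16, 16)           # 8 frames of 16x16 pixels
bands = dwt3d_one_level(frames)
print(sorted(bands), bands['LLL'].shape)     # eight subbands, LLL has shape (4, 8, 8)
```

The LLL subband is the half-resolution approximation of the frame group, which is the component carried forward to the next decomposition level.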

Fig. 9. Logical architecture of the 3-D-DWT with filter bank-1 and filter bank-2.

having a serial input. Filter bank-2 is the same as filter bank-1 but takes nine pixel data inputs in parallel, while the 3-D-DWT decomposes the sequence of images along the t axis (time), the y axis (vertical), and the x axis (horizontal) simultaneously. This implies that 81 pixels are processed in three dimensions at a time.

C. Unary-to-Binary Converter

A unary-to-binary (U2B) converter is needed to save data from the CMOS image sensor to the memory layer, which can be either an SRAM or a memristor-based structure. The proposed U2B converter consists of a controller, a register array, and a clock source. As shown in Fig. 11, the converted data are temporarily saved in an 8-b register to adjust the timing, which is based on S. Agarwal's flip-flop [33]. The DWT processing utilizes binary computation; however, the binary computation can be replaced with unary processing [34], [35].

D. 3-D-AES-Based Image Encryption

The medical images are encrypted with the block cipher algorithm AES, which has a 128-b input data block and a cipher key of length 128, 192, or 256 b [19]. The block cipher can


Fig. 12. Logical architecture for the 3-D-AES functional block.

V. FUNCTIONAL MODULE SIMULATIONS AND PERFORMANCE ANALYSIS

In this section, we present simulation results and analyze the performance of the individual modules to be implemented in the proposed 3-D biomedical imager.
Fig. 10. Modified filter bank-1 design. The adder/compressor array is divided into two parallel structures in order for the high and the low subband filter outputs to be generated simultaneously.

A. Pixel Dynamic Range and Power Evaluation

The inverter-based pixel structure was designed using Samsung 0.13-µm CMOS technology. Fig. 13 shows the relationship between the normalized photodiode capacitance, the sensitivity (V/sec-lux), and the effective photoreceptor area (µm²). A set of active pixel sensors with different photodiode dimensions and a 4-µm pixel pitch was simulated. The cell areas were varied between 1 and 6.25 µm² (x axis), and the photodiode capacitances, which are related to the charge accumulated at each pixel, were noted. One vertical axis of Fig. 13 highlights the normalized capacitance for each pixel while the other shows the sensitivity of a pixel. As the photodiode area increases, the sensitivity decreases and more effective charge becomes available. However, there is a highly nonlinear relationship between the photodiode junction capacitance and the photodiode area in the region of relatively low sensitivity and high fill factor (effective photoreceptor area/total pixel area). The power analysis for the single-inverter pixel was also carried out using the same technology. Simulations show that the average power consumption to capture a frame, where the total photocurrent varies from 10 pA to 10 nA across the 128 × 128 sensor, is 1.36 mW. Fig. 14 also shows the delay between the integration-time trigger (T_int) and the switching point at the output of the inverter (Pix_out). The delay decreases with increasing light intensity, and an adaptive approach to adjusting the integration time would significantly help to reduce the power consumption of the pixel.

B. CIS Readout Architectures and Related Analysis

There are several options for the implementation of the readout part of the CIS. In El-Desouki's work [37], readout is classified in terms of pixel-by-pixel ADC (PBP-ADC), per-column ADC (PC-ADC), and per-pixel ADC (PP-ADC). Fig. 15 highlights these three different CIS architectures. Their respective frame rates (FR) in each case can be modeled

Fig. 11. Logical architecture for the unary-to-binary converter.
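A behavioural sketch of the U2B conversion described in Section IV-C is given below: the binary output is simply the count of ones preceding the single 1-to-0 transition of the unary word. The counter/register timing of Fig. 11 is abstracted away, and the function name is hypothetical.

```python
def unary_to_binary(word, bits=8):
    """Convert a thermometer-coded word (ones followed by zeros) into an
    n-bit binary value by counting the leading ones, mimicking the
    counter-plus-register behaviour of the U2B block (sketch only)."""
    count = 0
    for level in word:
        if level == 0:          # the single 1-to-0 transition ends the count
            break
        count += 1
    return format(count, f'0{bits}b')

print(unary_to_binary([1, 1, 1, 1, 1, 1, 0, 0], bits=3))   # '110' -> decimal 6
```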

perform encipher and decipher operations using repeated application of a substitution-permutation network (SPN) to 128 b of data. Each time the SPN is used, it is supplied with a different round key. These round keys are generated by a function known as key expansion. The first round comprises a 128-b XOR of the plaintext with the key to form a new 128-b state. Each middle round operates on the state by performing the sub-bytes, shift-rows, mix-columns, and add-round-key operations. In the decryption procedure, the order of the transformations is reversed. Mix-columns (or inverse mix-columns) is not included in the last round of the function [19]. Fig. 12 shows the block diagram for the AES [36]. The 3-D-AES is implemented in two layers, reducing the data-path length. The functional blocks in the upper and bottom layers are partitioned in accordance with their area utilization and power density. In this architecture, the MixColumn multiplier and S-box 1 have approximately five times higher power density than the other functional blocks [36]. Therefore, the MixColumn multiplier and S-box 1 are distributed across the two layers and positioned so that they are not in the proximity of neighboring blocks that have high power density.
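For reference, the corresponding software operation on one 128-b block can be reproduced with a standard AES library; the sketch below assumes the pycryptodome package is available and uses an illustrative random key and data block, whereas the paper targets a dedicated two-layer hardware core.

```python
from Crypto.Cipher import AES   # pycryptodome; assumed to be installed
import os

key = os.urandom(16)            # illustrative 128-b cipher key
block = bytes(range(16))        # one 128-b block of compressed image data

cipher = AES.new(key, AES.MODE_ECB)        # single-block SPN operation, as in Fig. 12
ciphertext = cipher.encrypt(block)
recovered = AES.new(key, AES.MODE_ECB).decrypt(ciphertext)

assert recovered == block                  # decipher reverses the transformations
print(ciphertext.hex())
```

In a deployed system a chaining mode rather than single-block ECB would normally be used; ECB is shown here only because the text discusses the cipher one 128-b block at a time.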


Fig. 13. Relationship between the photodiode-normalized capacitance with respect to sensitivity and effective photodiode area.

Fig. 15. Different readout architectures of CIS. (a) Pixel-by-pixel ADC (PBP-ADC). (b) Per-column ADC (PC-ADC). (c) Per-pixel ADC (PP-ADC) architectures.


Fig. 14. Delay between the starting point of the integration time and the inverter's output switching point as a function of photocurrent, where i_tot, i_ph, i_dark, T_int, and Pix_out are the total current that passes through the photodiode, the photocurrent, the dark current, the integration time (here its triggering point), and the inverter's output, respectively.

by (3), (4), and (5), respectively, where M and N are the number of rows and columns in the pixel array, T_ADC is the time necessary for the ADC to complete one conversion, T_I/O is the time required by the chip I/O to send out the converted digital result, b is the number of digital bits, and P is the number of parallel outputs [37], [38]. In our proposed CIS architecture, the conversion time corresponds to the integration time of the single-inverter PWM readout. The remaining parameters are kept the same for the simulations. If unary computation is considered for the CIS, then the frame rate (FR) of the proposed architecture is modified to the expression given in (6).

Thus, the FR of the readout for the proposed architecture can be estimated as the resolution increases. This is illustrated in Fig. 16, where the integration time, the I/O clock frequency, and the number of parallel outputs are fixed at 2 ms, 50 MHz, and 32, respectively. The bandwidth of the output I/O bus is assumed to be 32 b for the purpose of comparison with the other architectures. The results shown in Fig. 16 indicate that for high-resolution imaging, that is, higher than 10 K, the unary architecture is a better performer than the conventional approaches. Thus, our proposed inverter-based architecture can maintain a higher resolution at a very high frame rate compared with the conventional architectures.

C. Dynamic Power Estimation for TSV Data Transition

Since the TSV interconnection plays an important role in the proposed architecture, we address the issue of dynamic power dissipation in this section. The conventional CIS outputs image data in binary format at the output of the A/D converter, while the inverter-based CMOS image sensor outputs image data in unary format. As a preprocessor, we employ a data-format converter that converts the data from unary to binary at the front stage of the DWT using a 256-b counter. When the unary data changes from 0 to 1, the controller writes the count value into the registers. The dynamic power dissipation is expressed as

$P = \alpha\, C_L\, V^{2}\, F$   (7)

where $C_L$ is the load capacitance of the TSV, $V$ is the operating voltage, $F$ is the frame rate, and $\alpha$ is the transition probability.
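A quick numerical check of (7) with the TSV capacitance of Table I (239.5 fF), the 1.2-V supply, and the 50-MHz rate used in the simulations is sketched below; treating the transition probability as the average number of toggles per bit of an 8-b word is an assumption made here purely for illustration.

```python
C_L = 239.5e-15     # TSV load capacitance from Table I (F)
V = 1.2             # operating voltage (V)
F = 50e6            # rate used in the simulations (Hz)

def dynamic_power(alpha):
    """Eq. (7): P = alpha * C_L * V^2 * F, with alpha the transition
    probability of the data carried over the TSV."""
    return alpha * C_L * V * V * F

# Worst case per 8-b word: 8 transitions in binary, 1 in unary.
p_binary = dynamic_power(alpha=8 / 8)     # every bit toggles
p_unary = dynamic_power(alpha=1 / 8)      # a single toggle per word
print(f"binary: {p_binary * 1e6:.2f} uW, unary: {p_unary * 1e6:.2f} uW")
```

Under these assumptions the unary format dissipates roughly one eighth of the worst-case binary figure per TSV, which is consistent with the trend reported in Fig. 17.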


Fig. 16. Relationship between frame rate and resolution of different CIS readout architectures. The readout architecture is classified in terms of pixel-by-pixel ADC (PBP-ADC), per-column ADC (PC-ADC), and per-pixel ADC (PP-ADC), also showing the corresponding proposed inverter-based architecture in the unary computation domain.

TABLE I PHYSICAL DIMENSION AND CHARACTERISTICS OF TSV

Fig. 17. Dynamic power dissipation for TSV using two medical images (MRI and CT) and two nonmedical images (Stefan and Container) for the binary and unary implementations. B and U denote binary and unary simulations, respectively.

Table I shows the physical dimensions and characteristics of the TSV; the data of Marchal et al. [7] are used for the dynamic power analysis. A TSV with a 5-µm diameter has a capacitance of 239.5 fF and is chosen to carry out the simulation work. The operating voltage and the frame rate are 1.2 V and 50 MHz, respectively. Data-format transitions in the binary format differ from those in the unary format. For transmitting 8-b data in binary, the worst-case number of transitions is 8, while in unary, the transition occurs at most once during data transmission. In order to simulate the behavior of the TSV and estimate the dynamic power dissipation, (7) is employed and two medical images (MRI and CT) are adopted for the simulation. In addition, two nonmedical sequences (Stefan and Container) with the QCIF (176 × 144) format and 64 frames, in both the binary and unary data formats, are also included. Stefan shows higher dynamic power dissipation than the medical images since the image changes dynamically. The unary format keeps the power dissipation constant as the data switch only once. The results of the simulation, illustrating the dynamic power versus frame number, are shown in Fig. 17.

D. Comparison Between 2-D and 3-D-DWT Implementation

The sample medical images of MRI and CT shown in Fig. 18 illustrate the original and reconstructed CT and MRI test images that were adopted for the simulations. The DWT images were compressed with quantization and embedded block coding [39]. The raw data in all cases are formatted as UXGA (1600 × 1200) with 8-b depth and 64 frames. The PSNR and the related mean-square error (MSE) for the 2-D and 3-D-DWT are modeled by (8) and (9) [40]:

$\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{255^{2}}{\mathrm{MSE}} \right)$   (8)

$\mathrm{MSE} = \frac{1}{N_p N_f} \sum_{f=1}^{N_f} \sum_{p=1}^{N_p} \left[ I(p,f) - \hat{I}(p,f) \right]^{2}$   (9)

where MSE is the quadratic average difference between the original and the restored images, $N_p$ represents the number of pixels in a frame, and $N_f$ is the number of frames. A comparison between several approaches, ranging from lossless to near-lossless, is shown in Table II. The 3-D approach for the medical MRI and CT test images offers approximately 20% improvement in PSNR over the corresponding 2-D compression when the compression ratio is kept at 2:1. The compression ratio has been calculated as the ratio between the number of bits occupied by the unpacked image and the number of bits in the compressed image. Fig. 19 shows the PSNR comparison between the 2-D and 3-D-DWT for Ultra eXtended Graphics Array (UXGA, 1600 × 1200) MRI and CT sequences with 64 frames. As the results show, the PSNR for the 3-D-DWT is 61.0 dB, which is higher than the corresponding 2-D-DWT value of only 51.2 dB under the 2:1 compression ratio. The LLL component carries the largest amount of image information. The 3D-DWT therefore allows a more aggressive compression to take place while retaining the LLL contribution to the PSNR.

E. Power Estimation for 3-D Implementation

We designed a 128 × 128 CMOS image sensor pixel array with the proposed single-inverter architecture and estimated the power of the CIS, U2B, memory, 3-D-DWT, and AES blocks. The average power consumption of the CIS is 1.36 mW at 200 MHz using 1.2 V. For the 3-D-DWT, we employed Cao's filter architecture [32] having a 9/7-tap filter that consists of


Fig. 18. Test images (UXGA, 1600 × 1200, with 8-b depth and 64 frames) of (a) the original CT and MRI and (b) the CT and MRI reconstructed by the proposed algorithm.

TABLE II PERFORMANCE COMPARISON OF THE PROPOSED ARCHITECTURE AND EXISTING METHODS IN TERMS OF COMPRESSION RATIO AND AVERAGE PSNR
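The PSNR and MSE figures reported in Table II and Fig. 19 follow (8) and (9); a direct transcription for 8-b data (peak value 255) is sketched below, evaluated on synthetic arrays since the actual test volumes are not reproduced here.

```python
import numpy as np

def mse(original, restored):
    """Eq. (9): quadratic average difference over all pixels and frames."""
    diff = original.astype(np.float64) - restored.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, restored, peak=255.0):
    """Eq. (8): PSNR in dB for 8-b image data."""
    return 10.0 * np.log10(peak ** 2 / mse(original, restored))

# Illustrative 4-frame, 8-b volume with a small reconstruction error.
rng = np.random.default_rng(0)
orig = rng.integers(0, 256, size=(4, 64, 64), dtype=np.uint8)
recon = np.clip(orig.astype(int) + rng.integers(-2, 3, size=orig.shape), 0, 255)
print(f"PSNR = {psnr(orig, recon):.1f} dB")
```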

Fig. 19. PSNR comparison result of the 2-D and 3D-DWT for (a) MRI and (b) CT images. TABLE III POWER ESTIMATION FOR THE PROPOSED ARCHITECTURE

27 adders. In this design, we replaced the adders with Kavehei's circuit [44], which has a power consumption on the order of 2.2 mW. The 3-D-AES is based on Panu's architecture [36], which consumes 5.6 mW at 152 MHz. Table III shows the estimated power consumption of the functional blocks of our proposed 3-D architecture. All power estimations were simulated using Samsung 0.13-µm standard CMOS technology at 1.2 V.

VI. CONCLUSION

This paper presented an SoS architecture that targets biomedical imaging and secure transmission over various communication networks. Multiple chips in conventional systems are integrated into a consolidated 3-D structure employing TSV technology. Individual dies can be fabricated with the optimal Si process technology suitably adapted to the targeted applications.

The architecture yields several advantages over the conventional single-die approach. High performance and low power are among the major potential advantages of this multilayered structure. The speed improvement is achieved due to the shorter interconnects. The system dissipates less power due to the replacement of power-hungry interchip I/O circuitry with inter-die land-TSV structures, without the need for signal amplification. The load of each circuit is much smaller than that of conventional I/O ports and, therefore, the power consumption is lower. Several options for the readout part of the CIS were studied. The results derived from simulations indicate that, for high-resolution applications, the proposed CIS architecture based on unary computation shows better performance than conventional approaches. Furthermore, the proposed 3-D architecture can maintain a high resolution of over 10 million pixels at a relatively high frame rate. The comparison between the 2-D and 3-D architectures for MRI and CT images highlighted that the 3-D implementation provides an improvement in PSNR of about 20% over the corresponding 2-D compression. The 3-D integration is perhaps the best available option to continue with Moore's law.

ACKNOWLEDGMENT

The authors would like to thank iDataMap Pty Ltd. for their contribution.


REFERENCES
[1] A. Wegener, "Compression of medical sensor data [Exploratory DSP]," IEEE Signal Process. Mag., vol. 27, no. 4, pp. 125-130, Jul. 2010.
[2] D. Clunie, DICOM Standard, 2009. [Online]. Available: http://www.dclunie.com/dicom-status/status.html
[3] D. Dhouib, A. Nait-Ali, C. Olivier, and M. S. Naceur, "Performance evaluation of wavelet based coders on brain MRI volumetric medical datasets for storage and wireless transmission," Int. J. Biol., Biomed. Med. Sci., vol. 3, pp. 147-156, 2008.
[4] F. Catthoor, N. D. Dutt, and C. E. Kozyrakis, "How to solve the current memory access and data transfer bottlenecks: At the processor architecture or at the compiler level," in Proc. Conf. Des., Autom. Test Eur., New York, 2000, pp. 426-435.
[5] International Technology Roadmap for Semiconductors (ITRS), Interconnect, 2008. [Online]. Available: http://public.itrs.net
[6] E. Beyne, "3D interconnection and packaging: Impending reality or still a dream?," in Proc. IEEE Int. Solid-State Circuits Conf., 2004, vol. 1, pp. 138-139.
[7] P. Marchal, B. Bougard, G. Katti, M. Stucchi, W. Dehaene, A. Papanikolaou, D. Verkest, B. Swinnen, and E. Beyne, "3-D technology assessment: Path-finding the technology/design sweet-spot," Proc. IEEE, vol. 97, no. 1, pp. 96-107, Jan. 2009.
[8] K. Eshraghian, RadCard (DICOM Datacard) iDataMap, 2010. [Online]. Available: http://www.idatamap.com
[9] M. Bamal, S. List, M. Stucchi, A. Verhulst, M. Van Hove, R. Cartuyvels, G. Beyer, and K. Maex, "Performance comparison of interconnect technology and architecture options for deep submicron technology nodes," in Proc. Int. Interconnect Technol. Conf., 2006, pp. 202-204.
[10] V. F. Pavlidis and E. G. Friedman, "Interconnect-based design methodologies for three-dimensional integrated circuits," Proc. IEEE, vol. 97, no. 1, pp. 123-140, Jan. 2009.
[11] I. Chowdhury, R. Prasher, K. Lofgreen, G. Chrysler, S. Narasimhan, R. Mahajan, D. Koester, R. Alley, and R. Venkatasubramanian, "On-chip cooling by superlattice-based thin-film thermoelectric," Nature Nanotechnol., vol. 4, pp. 235-238, 2009.
[12] J. A. Davis, R. Venkatesan, A. Kaloyeros, M. Beylansky, S. J. Souri, K. Banerjee, K. C. Saraswat, A. Rahman, R. Reif, and J. D. Meindl, "Interconnect limits on gigascale integration (GSI) in the 21st century," Proc. IEEE, vol. 89, no. 3, pp. 305-324, Mar. 2001.
[13] G. Metze, M. Khbels, N. Goldsman, and B. Jacob, "Heterogeneous integration," Tech Trend Notes, vol. 12, no. 2, p. 3, Sep. 2003.
[14] R. S. Patti, "Three-dimensional integrated circuits and the future of system-on-chip designs," Proc. IEEE, vol. 94, no. 6, pp. 1214-1224, Jun. 2006.
[15] S. Gupta, M. Hilbert, S. Hong, and R. Patti, "Techniques for producing 3D ICs with high-density interconnect," presented at the 21st Int. VLSI Multilevel Interconnection Conf., Waikoloa Beach, HI, Sep. 2004.
[16] M. Koyanagi, T. Fukushima, and T. Tanaka, "High-density through silicon vias for 3-D LSIs," Proc. IEEE, vol. 97, no. 1, pp. 49-59, Jan. 2009.
[17] T. Miyoshi, Y. Arai, M. Hirose, R. Ichimiya, Y. Ikemoto, T. Kohriki, T. Tsuboyama, and Y. Unno, "Performance study of SOI monolithic pixel detectors for X-ray application," Nucl. Instrum. Meth. Phys. Res. A: Accel., Spectrometers, Detectors Assoc. Equip., Apr. 2010, to be published.
[18] R. M. Jiang and D. Crookes, "Area-efficient high-speed 3D-DWT processor architecture," Electron. Lett., vol. 43, no. 9, 2007.
[19] Advanced Encryption Standard (AES), National Institute of Standards and Technology (NIST), FIPS PUB 197, Nov. 2001, Fed. Inf. Process. Std. Publ.
[20] S. Xue and B. Oelmann, "Unary-prefixed encoding of lengths of consecutive zeros in bit vector," Electron. Lett., vol. 41, no. 6, pp. 346-347, Mar. 2005.
[21] O. Kavehei, A. Iqbal, Y. S. Kim, K. Eshraghian, S. F. Al-Sarawi, and D. Abbott, "The fourth element: Characteristics, modelling and electromagnetic theory of the memristor," Proc. Roy. Soc. A: Math., Phys. Eng. Sci., vol. 466, no. 2120, pp. 2175-2202, 2010.
[22] J. W. Joyner, P. Zarkesh-Ha, and J. D. Meindl, "Global interconnect design in a three-dimensional system-on-a-chip," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 4, pp. 367-372, Apr. 2004.
[23] C. Posch, D. Matolin, and R. Wohlgenannt, "A QVGA 143 dB dynamic range asynchronous address-event PWM dynamic image sensor with lossless pixel-level video compression," in Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 7-11, 2010, pp. 400-401.
[24] F. Luisier, C. Vonesch, T. Blu, and M. Unser, "Fast interscale wavelet denoising of Poisson-corrupted images," Signal Process., vol. 90, no. 2, pp. 415-427, Feb. 2010.
[25] S. H. Yang and K. R. Cho, "High dynamic range CMOS image sensor with conditional reset," in Proc. IEEE Custom Integrated Circuits Conf., 2002, pp. 265-268.
[26] D. X. D. Yang, A. El Gamal, B. Fowler, and H. Tian, "A 640 × 512 CMOS image sensor with ultrawide dynamic range floating-point pixel-level ADC," IEEE J. Solid-State Circuits, vol. 34, no. 12, pp. 1821-1834, Dec. 1999.
[27] Z. Milin and A. Bermak, "Compressive acquisition CMOS image sensor: From the algorithm to hardware implementation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 3, pp. 490-500, Mar. 2010.
[28] R. M. Jiang and D. Crookes, "FPGA implementation of 3D discrete wavelet transform for real-time medical imaging," in Proc. Circuit Theory and Design, Aug. 2007, pp. 519-522.
[29] H. Y. Yoo, K. Lee, and B. D. Kwon, "Implementation of 3D discrete wavelet scheme for space-borne imagery classification and its application," in Proc. IEEE Int. Geoscience and Remote Sensing Symp., Jul. 23-28, 2007, pp. 3437-3440.
[30] T. Huang, P. C. Tseng, and L. G. Chen, "Flipping structure: An efficient VLSI architecture for lifting-based discrete wavelet transform," IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1080-1089, Apr. 2004.
[31] O. Fatemi and S. Bolouki, "Pipeline, memory-efficient and programmable architecture for 2D discrete wavelet transform using lifting scheme," in Proc. Inst. Elect. Eng., Circuits, Devices and Systems, Dec. 2005, vol. 9, pp. 703-708.
[32] X. Cao, Q. Xie, C. Peng, Q. Wang, and D. Yu, "An efficient VLSI implementation of distributed architecture for DWT," in Proc. IEEE 8th Workshop Multimedia Signal Processing, Oct. 2006, pp. 364-367.
[33] S. Agarwal, P. Ramanathan, and P. T. Vanathi, "Comparative analysis of low power high performance flip-flops in 0.13-µm technology," in Proc. Int. Conf. Advanced Computing and Communications, Dec. 18-21, 2007, pp. 209-213.
[34] Y. K. Hong, K. T. Kim, Y. G. Kim, Y. H. Kim, K. R. Cho, T. W. Cho, Y. You, and K. Eshraghian, "Three-dimensional discrete wavelet transform (DWT) based on unary arithmetic," in Proc. 14th World Multi-Conf. Systemics, Cybernetics and Informatics, Orlando, FL, 2010, vol. 1, pp. 30-33.
[35] S. Y. Kim, K. T. Kim, K. R. Cho, T. W. Cho, and Y. You, "Unary computation for biomedical data," in Proc. 14th World Multi-Conf. Systemics, Cybernetics and Informatics, Orlando, FL, 2010, vol. 1, pp. 34-38.
[36] H. Panu, A. Timo, H. Marko, and D. H. Timo, "Design and implementation of low-area and low-power AES encryption hardware core," in Proc. 9th EUROMICRO Conf. Digital System Design, 2006, pp. 577-583.
[37] M. El-Desouki, M. J. Deen, Q. Fang, L. Liu, F. Tse, and D. Armstrong, "CMOS image sensors for high speed applications," Sensors, vol. 9, pp. 430-444, Jan. 2009.
[38] S. Chen, F. Boussaid, and A. Bermak, "Robust intermediate read-out for deep submicron technology CMOS image sensors," IEEE Sensors J., vol. 8, no. 3, pp. 286-294, Mar. 2008.
[39] K. Mei, N. Zheng, C. Huang, Y. Liu, and Q. Zeng, "VLSI design of a high-speed and area-efficient JPEG2000 encoder," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 8, pp. 1065-1078, Aug. 2007.
[40] A. B. Watson, "What's wrong with mean-squared error?," in Digital Images and Human Vision. Cambridge, MA: MIT Press, 1993, pp. 207-220.
[41] S. Chokchaitam, M. Iwahashi, and S. Jitapunkul, "A new unified lossless/lossy image compression based on a new integer DCT," IEICE Trans. Inf. Syst., vol. E88-D, no. 7, pp. 1598-1606, Jul. 2005.
[42] Y. Xie, X. Tang, and M. Sun, "Image compression based on classification row by row and LZW encoding," in Proc. Congr. Image and Signal Processing, May 27-30, 2008, vol. 1, pp. 617-621.
[43] V. Sanchez, P. Nasiopoulos, and R. Abugharbieh, "Efficient lossless compression of 4-D medical images based on the advanced video coding scheme," IEEE Trans. Inf. Technol. Biomed., vol. 12, no. 4, pp. 442-446, Jul. 2008.
[44] O. Kavehei, M. R. Azghadi, K. Navi, and A. P. Mirbaha, "Design of robust and high-performance 1-bit CMOS full adder for nanometer design," in Proc. IEEE Computer Soc. Annu. Symp. VLSI, Apr. 7-9, 2008, pp. 10-15.


Sang-Jin Lee (S'10) received the B.S. degree in chemical engineering and the M.S. degree in information and communication engineering from Chungbuk National University, Cheongju, Korea, in 2008 and 2010, respectively, where he is currently pursuing the Ph.D. degree. His current research interests include the design of multilayer system-on-systems technology, complementary metal-oxide semiconductor image sensors, and cryptographic and embedded systems. Mr. Lee is a member of the World Class University (WCU) Program at Chungbuk National University, Korea.


Omid Kavehei (S'05) received the M.S. degree in computer systems engineering from the National University of Iran (Shahid Beheshti University), Tehran, Iran, in 2005 and is currently pursuing the Ph.D. degree in electrical and electronic engineering at the University of Adelaide, Adelaide, Australia. Since 2009, he has been a Visiting Scholar at Chungbuk National University (CBNU), South Korea. His research interests include emerging nonvolatile memory systems and bio-inspired information processing. Dr. Kavehei received an Endeavor International Postgraduate Research Scholarship from the Australian Government. He was a recipient of the D. R. Stranks Travelling Fellowship, Simon Rockliff (DSTO) Scholarship, Research Abroad Scholarship, and the World Class University (WCU) Program Research Scholarships.

Kyoungrok Cho (S'89-M'92) received the B.S. degree in electronic engineering from Kyoungpook National University, Taegu, Korea, in 1977, and the M.S. and Ph.D. degrees in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1989 and 1992, respectively. From 1979 to 1986, he was with the TV Research Center of LG Electronics, Korea. Currently, he is a Professor in the College of Electrical and Computer Engineering, Chungbuk National University, Korea. His research interests are in the field of high-speed and low-power circuit design, system-on-a-chip platform design for communication systems, prospective complementary metal-oxide semiconductor image sensors, memristor-based circuits, and the design of multilayer system-on-systems technology. Currently, he is Director of the World Class University (WCU) Program at Chungbuk National University, Korea. He is also a Director of the IC Design Education Center at Chungbuk National University. During 1999 and 2006, he spent two years at Oregon State University as a Visiting Scholar. Prof. Cho was the recipient of The Institute of Electronics Engineers of Korea (IEEK) award in 2004. He is a member of the IEEK.

Yoon-Ki Hong received the B.S. degree in electronic engineering from the Chungbuk National University, Cheongju, Korea, in 2009, where he is currently pursuing the M.S. degree. His current research interests include discrete wavelet transform and circuit design. He is also a member of the World Class University (WCU) Program at Chungbuk National University, Korea.

Tae Won Cho (M'92) received the B.S. degree in electronic engineering from Seoul National University, Seoul, Korea, in 1973, the M.S. degree in electrical engineering from the University of Louisville, Louisville, KY, in 1986, and the Ph.D. degree in electrical engineering from the University of Kentucky in 1992. From 1973 to 1983, he was with Gold Star, Korea. Since 1992, he has been a Professor at the College of Electrical and Computer Engineering, Chungbuk National University, Chungbuk, South Korea. He is also a representative of the Research Institute of Ubiquitous Bio-Information Technology and CEO of Youbicom. His research interests are very-large-scale integrated design, low-power circuits, computer system architecture, and embedded systems. Dr. Cho is a member of the Institute of Electronics Engineers of Korea.

Kamran Eshraghian received the B.Tech., M.Eng.Sc., and Ph.D. degrees from the University of Adelaide, Adelaide, South Australia, and the Dr.-Ing. e.h. degree from the University of Ulm, Germany. He is best known in the international arena as one of the fathers of complementary metal-oxide semiconductor very-large-scale integrated (VLSI) systems, having influenced two generations of researchers in academia and industry. In 1979, he joined the Department of Electrical and Electronic Engineering at the University of Adelaide, after spending about ten years with Philips Research, both in Australia and Europe. In 1994, he was invited to take up the Foundation Chair of Computer, Electronics and Communications Engineering in Western Australia, became Head of the School of Engineering and Mathematics and Distinguished University Professor, and subsequently became the Director of the Electron Science Research Institute. In 2004, he became Founder/President of Elabs as part of his vision for the horizontal integration of nanoelectronics with bio- and photon-based technologies, thus creating a new design domain for system-on-system integration. Currently, he is the President of Innovation Labs and is the Chairman of the Board of Directors of two high-technology companies. In 2007, he was Visiting Professor of Engineering and the holder of the inaugural Ferrero Family Chair in Electrical Engineering at UC Merced prior to his move in 2009 to Chungbuk National University, Korea, as a Distinguished Professor in the World Class University (WCU) program. He has co-authored six textbooks and has lectured widely on VLSI and multitechnology systems. He has founded six high-technology companies, providing intimate links between university research and industry. Prof. Eshraghian is a Fellow and Life Member of the Institution of Engineers, Australia. In 2004, he was awarded for his research into the integration of nanoelectronics with that of lightwave technology.

Younggap You (M'81) received the B.S. degree in electronic engineering from Sogang University, Seoul, Korea, in 1975, and the M.S. and Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, in 1981 and 1986, respectively. From 1975 to 1979, he was with the Agency for Defense Development, Korea, where he was involved in high-speed logic design. He was a Principal Engineer at LG Semiconductor Inc. (now Hynix Semiconductor Inc.) from 1986 through 1988. Currently, he is a Professor and Dean of the College of Electrical and Computer Engineering at Chungbuk National University, Cheongju, Korea. His research interests are fault-tolerant computing, cryptography, silicon die-stacking architectures, and systems testing. Dr. You is a member of the World Class University Program at Chungbuk National University. He has 25 patents and has authored five technical books, including Fundamentals of DRAM Design and Analysis (2004). He is a member of The Institute of Electronics Engineers of Korea.
