VLSI Project

DETECTING BACKGROUND SETTING FOR DYNAMIC SCENE
ABSTRACT Processing real-time image sequences is now possible because of advances in digital signal processing, wide-band communication, and high-performance VLSI. With developments in video technology, a surveillance system can be built from low-cost devices such as a web camera. With crime rates rising, society needs security and safety, and video surveillance has become an important means of countering threats of crime and terrorism. The most fundamental part of surveillance is foreground detection, which is the retrieval of an object of interest. The object of interest can be obtained with the common background subtraction technique. A problem arises with this technique: because the light source varies, the background constantly changes, so pixel intensities change while object detection takes place. These intensity changes lead to improper foreground detection, in which background is detected as a foreground object. This paper proposes a method to model and update the background of the scene by an intersection solving method.
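As a rough illustration of the background subtraction idea described above, the following sketch thresholds the difference between a frame and a background model, then updates the model where no foreground was found. The abstract does not describe the intersection solving method, so a plain running-average update is used here instead; the function names, the threshold, and the blending factor alpha are all assumptions made for illustration.

```python
def foreground_mask(frame, background, threshold):
    """Mark a pixel as foreground when it differs from the background model
    by more than `threshold` (an assumed tuning parameter)."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def update_background(frame, background, mask, alpha):
    """Running-average update (a stand-in, not the paper's method).
    Background pixels drift toward the new frame so slow lighting changes
    are absorbed; foreground pixels leave the model untouched."""
    return [[b if m else (1 - alpha) * b + alpha * p
             for p, b, m in zip(frow, brow, mrow)]
            for frow, brow, mrow in zip(frame, background, mask)]


if __name__ == "__main__":
    background = [[100, 100], [100, 100]]
    frame = [[104, 100], [200, 100]]   # slight lighting drift plus one object pixel
    mask = foreground_mask(frame, background, threshold=20)
    print(mask)
    print(update_background(frame, background, mask, alpha=0.5))
```

Updating only the non-foreground pixels is one simple way to keep a moving object from being absorbed into the background model.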
LOSSLESS IMPLEMENTATION OF DAUBECHIES 8-TAP WAVELET TRANSFORM
ABSTRACT A new mapping scheme, together with its hardware implementation, for error-free computation of the Daubechies 8-tap wavelet transform is presented. The multidimensional technique maps the irrational transform basis coefficients to integers and results in a considerable reduction in hardware and power consumption. When implemented in a Xilinx FPGA, the scheme costs 518 logic cells and 186 registers and runs at a frequency of 71 MHz. Compared with a finite-precision architecture, the proposed scheme yields a reduction of 15% in hardware and 41% in power consumption for similar image reconstruction, and a noticeable improvement in image reconstruction quality.
PERFORMANCE ANALYSIS OF INTEGER WAVELET TRANSFORM FOR IMAGE COMPRESSION
ABSTRACT For image compression, the chosen transform must reduce the size of the resultant data compared to the original data set. In this paper, a new lossless image compression method is proposed. For continuous and discrete time cases, the wavelet transform and the wavelet packet transform have emerged as popular techniques. While the integer wavelet transform using the lifting scheme significantly reduces the computation time, we propose a completely new approach for further speeding up the computation. First, the wavelet packet transform (WPT) and the lifting scheme (LS) are described. Then an application of the LS to the WPT is presented, which leads to the integer wavelet packet transform (IWPT). The proposed IWPT yields a representation that can be lossless, as it maps an integer-valued sequence onto integer-valued coefficients. The idea of the wavelet packet tree is used to transform still and color images. The IWPT tree can be built by iterating the single wavelet decomposition step on both the low-pass and high-pass branches, with rounding to obtain integer transforms. Thus, the proposed method provides a good compression ratio.
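The integer-to-integer property described above can be sketched with the simplest lifting pair, the integer Haar (S) transform. The abstract does not name the wavelet filter used in the paper, so this choice is an assumption, as are all function names. The packet tree is formed by repeating the same step on both the low-pass and high-pass outputs.

```python
def lift_forward(x):
    """One integer lifting step on an even-length integer list.
    Returns (approx, detail); both stay integer-valued."""
    approx, detail = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = b - a             # predict step: detail is the difference
        s = a + (d >> 1)      # update step: floor(d / 2) keeps s an integer
        approx.append(s)
        detail.append(d)
    return approx, detail


def lift_inverse(approx, detail):
    """Exactly undo lift_forward by reversing the lifting steps."""
    x = []
    for s, d in zip(approx, detail):
        a = s - (d >> 1)
        x.extend([a, a + d])
    return x


def packet_decompose(x, levels):
    """Wavelet packet tree: decompose BOTH branches at every level.
    len(x) must be divisible by 2**levels. Returns the leaf bands in order."""
    if levels == 0:
        return [x]
    s, d = lift_forward(x)
    return packet_decompose(s, levels - 1) + packet_decompose(d, levels - 1)


def packet_reconstruct(leaves):
    """Merge neighbouring leaf bands pairwise until one signal remains."""
    while len(leaves) > 1:
        leaves = [lift_inverse(leaves[i], leaves[i + 1])
                  for i in range(0, len(leaves), 2)]
    return leaves[0]


if __name__ == "__main__":
    signal = [5, 8, 3, 3, 10, 2, 7, 1]
    bands = packet_decompose(signal, 2)
    print(bands)
    print(packet_reconstruct(bands) == signal)
```

Because every step uses only integer additions and shifts, the inverse recovers the input exactly, which is what makes the representation usable for lossless compression.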
A MEDIAN FILTER FPGA WITH HARVARD ARCHITECTURE
ABSTRACT To improve the speed of the image processing chip, capture market share quickly, and reduce costs, this paper designs a chip based on the Harvard architecture and an FPGA. The chip also uses a new hardware algorithm. Using the chip, the processing time is 13.2? less than that of a chip with the Von Neumann architecture. The filter uses 13% of the FPGA's gates, less than the share claimed by the multi-image processing chip.
AUTOMATIC ROAD EXTRACTION USING HIGH RESOLUTION SATELLITE IMAGES BASED ON LEVEL SET AND MEAN SHIFT METHODS
ABSTRACT Analysis of high resolution satellite images has been an important research topic for urban analysis. One of the important tasks in urban analysis is automatic road network extraction. Two approaches for road extraction, based on the Level Set and Mean Shift methods, are proposed. Extracting roads from an original image is difficult and computationally expensive because of the presence of other road-like features with straight edges. The image is preprocessed to improve the tolerance by reducing the noise (buildings, parking lots, vegetation regions and other open spaces), and roads are first extracted as elongated regions. Non-linear noise segments are removed using a median filter, based on the fact that road networks consist of a large number of small linear structures. Road extraction is then performed using the Level Set and Mean Shift methods. Finally, the accuracy of the extracted road images is evaluated using quality measures. IKONOS data at 1 m resolution has been used for the experiment.
A NEW ADAPTIVE WEIGHT ALGORITHM FOR SALT AND PEPPER NOISE REMOVAL
ABSTRACT A new adaptive weight algorithm is developed for the removal of salt and pepper noise. It consists of two major steps: first, noise pixels are detected according to the correlations between image pixels; then, different removal methods are applied depending on the noise level. For a low noise level, the mean of the neighborhood signal pixels is used to remove the noise, and for a high noise level, an adaptive weight algorithm is used. Experiments show that the proposed algorithm has advantages over regularizing methods in terms of both edge preservation and noise removal. Even for heavily contaminated images with a noise level as high as 90%, it still achieves good performance.
REMOVAL OF HIGH DENSITY SALT AND PEPPER NOISE THROUGH MODIFIED DECISION BASED UNSYMMETRIC TRIMMED MEDIAN FILTER
ABSTRACT A modified decision based unsymmetric trimmed median filter algorithm is proposed in this paper for the restoration of grayscale and color images that are highly corrupted by salt and pepper noise. The proposed algorithm replaces the noisy pixel with the trimmed median value when pixel values other than 0 and 255 are present in the selected window. When all the pixel values in the window are 0 or 255, the noisy pixel is replaced by the mean value of all the elements in the window. The proposed algorithm shows better results than the Standard Median Filter (MF), Decision Based Algorithm (DBA), Modified Decision Based Algorithm (MDBA), and Progressive Switched Median Filter (PSMF). It is tested against different grayscale and color images and gives better Peak Signal-to-Noise Ratio (PSNR) and Image Enhancement Factor (IEF).
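The decision rule stated in the abstract can be sketched for a single window as follows. The 3x3 window size, the integer rounding, the choice to leave border pixels unchanged, and the function names are assumptions made for this illustration, not details taken from the paper.

```python
from statistics import median

NOISE_VALUES = (0, 255)   # salt and pepper extremes for 8-bit images


def restore_pixel(window):
    """Apply the decision rule to one flattened window (centre = middle element)."""
    centre = window[len(window) // 2]
    if centre not in NOISE_VALUES:
        return centre                       # uncorrupted pixel: keep as is
    trimmed = [v for v in window if v not in NOISE_VALUES]
    if not trimmed:
        # Every value in the window is 0 or 255: fall back to the window mean.
        return sum(window) // len(window)
    # Otherwise use the median of the values left after trimming 0s and 255s.
    return int(median(trimmed))


def filter_image(image):
    """Run restore_pixel over the interior of a 2-D list of gray levels.
    Border pixels are copied unchanged (an assumption of this sketch)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = restore_pixel(window)
    return out


if __name__ == "__main__":
    noisy = [[10, 0, 12],
             [255, 255, 11],
             [13, 0, 14]]
    print(filter_image(noisy))
```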
OPERATION IMPROVEMENT OF INDOOR ROBOT BY GESTURE RECOGNITION ABSTRACT Recently, the demand for the indoor robots has increased. Therefore, increased opportunities for many people to operate the robots have emerged. However, for many people, it is often difficult to operate a robot using the conventional methods like remote control. To solve this problem, we propose a robot operation system using the hand gesture recognition. Our method pays attention to the direction and movement of the hand. We were able to recognize several gestures in real-time.
ADIABATIC TECHNIQUE FOR ENERGY EFFICIENT LOGIC CIRCUITS DESIGN
ABSTRACT The energy dissipation in conventional CMOS circuits can be minimized through the adiabatic technique. With the adiabatic technique, dissipation in the PMOS network can be minimized, and some of the energy stored in the load capacitance can be recycled instead of being dissipated as heat. However, the adiabatic technique is highly dependent on parameter variation. With the help of TSPICE simulations, the energy consumption is analyzed under parameter variation. In the analysis, two logic families, ECRL (Efficient Charge Recovery Logic) and PFAL (Positive Feedback Adiabatic Logic), are compared with conventional CMOS logic for inverter and 2:1 multiplexer circuits. It is found that the adiabatic technique is a good choice for low power applications in a specified frequency range.
DESIGN AND FPGA IMPLEMENTATION OF MODIFIED DISTRIBUTIVE ARITHMETIC BASED DWT-IDWT PROCESSOR FOR IMAGE COMPRESSION
ABSTRACT Image compression is one of the major image processing techniques and is widely used in medical, automotive, consumer and military applications. The discrete wavelet transform (DWT) is the most popular transformation technique adopted for image compression. The complexity of the DWT is high due to the large number of arithmetic operations. In this work, a modified Distributive Arithmetic (DA) based DWT architecture is proposed and implemented on an FPGA. The modified approach consumes 6% of the area on a Virtex-II Pro FPGA and operates at 134 MHz. The modified DA-DWT architecture has a latency of 44 clock cycles and a throughput of 4 clock cycles. This design is twice as fast as the reference design and is thus suitable for applications that require high speed image processing algorithms.
AN FPGA-BASED ARCHITECTURE FOR LINEAR AND MORPHOLOGICAL IMAGE FILTERING
ABSTRACT Field Programmable Gate Array (FPGA) technology has become a viable target for the implementation of real time algorithms suited to video image processing applications. The unique architecture of the FPGA has allowed the technology to be used in many applications encompassing all aspects of video image processing. Among those algorithms, linear filtering based on a 2D convolution and non-linear 2D morphological filters represent a basic set of image operations for a number of applications. In this work, an implementation of linear and morphological image filtering on a Nexys II board with a Xilinx Spartan 3E FPGA, intended for educational purposes, is presented. The system is connected to a USB port of a personal computer, and together they form a powerful and low-cost design station. The FPGA-based system is accessed through a MATLAB graphical user interface, which handles the communication setup. A comparison between results obtained from MATLAB simulations and the described FPGA-based implementation is presented.
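The two filter families named above share the same sliding-window structure and differ only in the operation applied to each window, which is part of why they map well onto one hardware datapath. The sketch below is a software model only, assuming a 3x3 window, a flat 3x3 structuring element, and unchanged border pixels; none of these details or names come from the paper.

```python
def filter3x3(image, window_op):
    """Slide a 3x3 window over the interior of a 2-D list and apply window_op.
    Border pixels are copied unchanged (an assumption of this sketch)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = window_op(window)
    return out


def convolution(kernel):
    """Linear filter: weighted sum with the kernel rotated by 180 degrees,
    which is what distinguishes convolution from correlation."""
    flipped = [k for row in kernel for k in row][::-1]
    return lambda window: sum(k * v for k, v in zip(flipped, window))


# Morphological filters with a flat 3x3 structuring element.
erosion = min      # darkest value in the neighbourhood
dilation = max     # brightest value in the neighbourhood


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(filter3x3(img, convolution(box))[1][1])
    print(filter3x3(img, erosion)[1][1], filter3x3(img, dilation)[1][1])
```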
DESIGN OF A LOW POWER FLIP-FLOP USING CMOS DEEP SUBMICRON TECHNOLOGY
ABSTRACT This paper presents a low power, high speed flip-flop design with fewer transistors, in which only one transistor is clocked by a short pulse train: a true single phase clocking (TSPC) flip-flop. Compared to a conventional flip-flop, it has 5 transistors with only one transistor clocked, and thus has a smaller size and lower power consumption. It can be used in various applications such as digital VLSI clocking systems, buffers, registers and microprocessors. Power dissipation and propagation delay are analyzed for various flip-flops and latches at 0.13 µm and 0.35 µm technologies. The leakage power increases as technology is scaled down. The leakage power is reduced by using the best of the run time techniques, namely MTCMOS. A comparison of different conventional flip-flops, latches and the TSPC flip-flop in terms of power consumption, propagation delay, and power-delay product is presented with SPICE simulation results.
LOW-POWER AND AREA-EFFICIENT CARRY SELECT ADDER
ABSTRACT The Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing its area and power consumption. This work uses a simple and efficient gate-level modification to significantly reduce the area and power of the CSLA. Based on this modification, 8-, 16-, 32-, and 64-b square-root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture. The proposed design has reduced area and power compared with the regular SQRT CSLA, with only a slight increase in delay. This work evaluates the performance of the proposed designs in terms of delay, area, power, and their products, by hand with logical effort and through custom design and layout in a 0.18-µm CMOS process technology. The analysis of the results shows that the proposed CSLA structure is better than the regular SQRT CSLA.
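The following behavioral sketch models the regular carry select structure that the abstract uses as its baseline: each block computes its sum for both possible carry-in values, and the real carry then selects one of them. The abstract does not describe the gate-level modification, so it is not modeled here. The function name and the block widths are assumptions chosen for illustration.

```python
def carry_select_add(a, b, widths):
    """Add two unsigned integers block by block.
    `widths` lists the block sizes from the least significant block upward;
    their total is the adder width. Returns (sum, carry_out)."""
    result, carry, shift = 0, 0, 0
    for w in widths:
        mask = (1 << w) - 1
        xa = (a >> shift) & mask
        xb = (b >> shift) & mask
        sum_if_0 = xa + xb          # block result precomputed for carry-in 0
        sum_if_1 = xa + xb + 1      # block result precomputed for carry-in 1
        chosen = sum_if_1 if carry else sum_if_0   # the select multiplexer
        result |= (chosen & mask) << shift
        carry = chosen >> w         # carry into the next block
        shift += w
    return result, carry


if __name__ == "__main__":
    # Block sizes that grow with significance, in the spirit of a
    # square-root CSLA; this particular split of 16 bits is an assumption.
    widths = [2, 2, 3, 4, 5]
    print(carry_select_add(12345, 54321, widths))
```

In hardware, the two candidate sums are produced in parallel, so the critical path per block is only the multiplexer delay once the incoming carry is known. The duplicated adders are the area and power cost that the paper sets out to reduce.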
A PIPELINE VLSI ARCHITECTURE FOR HIGH-SPEED COMPUTATION OF THE 1-D DISCRETE WAVELET TRANSFORM ABSTRACT In this paper, a scheme for the design of a high-speed pipeline VLSI architecture for the computation of the 1-D discrete wavelet transform (DWT) is proposed. The main focus of the scheme is on reducing the number and period of clock cycles for the DWT computation with little or no overhead on the hardware resources by maximizing the inter- and intrastage parallelisms of the pipeline. The interstage parallelism is enhanced by optimally mapping the computational load associated with the various DWT decomposition levels to the stages of the pipeline and by synchronizing their operations. The intrastage parallelism is enhanced by decomposing the filtering operation equally into two subtasks that can be performed independently in parallel and by optimally organizing the bitwise operations for performing each subtask so that the delay of the critical data path from a partial-product bit to a bit of the output sample for the filtering operation is minimized. It is shown that an architecture designed based on the proposed scheme requires a smaller number of clock cycles compared to that of the architectures employing comparable hardware resources. In fact, the requirement on the hardware resources of the architecture designed by using the proposed scheme also gets improved due to a smaller number of registers that need to be employed. Based on the proposed scheme, a specific example of designing an architecture for the DWT computation is considered. In order to assess the feasibility and the efficiency of the proposed scheme, the architecture thus designed is simulated and implemented on a field-programmable gate-array board. It is seen that the simulation and implementation results conform to the stated goals of the proposed scheme, thus making the scheme a viable approach for designing a practical and realizable architecture for real-time DWT computation.
DUAL STACK METHOD: A NOVEL APPROACH TO LOW LEAKAGE AND SPEED POWER PRODUCT VLSI DESIGN
ABSTRACT The development of digital integrated circuits is challenged by higher power consumption. The combination of higher clock speeds, greater functional integration, and smaller process geometries has contributed to significant growth in power density. Scaling improves transistor density and functionality on a chip, and it helps to increase the speed and frequency of operation and hence performance. As voltages scale downward with the geometries, threshold voltages must also decrease to gain the performance advantages of the new technology, but leakage current then increases exponentially. Today leakage power has become an increasingly important issue in processor hardware and software design. In 65 nm and below technologies, leakage accounts for 30-40% of processor power. In this paper, we propose a new dual stack approach for reducing both leakage and dynamic power. Moreover, the novel dual stack approach shows the lowest speed power product when compared to the existing methods.
POWER MANAGEMENT OF MIMO NETWORK INTERFACES ON MOBILE SYSTEMS Very Large Scale Integration (VLSI) Systems, IEEE Transactions on
ABSTRACT High-speed wireless network interfaces are among the most power-hungry components on mobile systems. This is particularly true for multiple-input-multiple-output (MIMO) network interfaces, which use multiple RF chains simultaneously. In this paper, we present a novel power management solution for MIMO network interfaces on mobile systems, called antenna management. The key idea is to adaptively disable a subset of antennas and their RF chains to reduce circuit power consumption when the capacity improvement from using a large number of antennas is small. Antenna management judiciously determines the number of active antennas to minimize energy per bit while satisfying the data rate requirement. This work provides both a theoretical framework and a system design for antenna management. We first present an algorithm that efficiently solves the problem of minimizing energy per bit, and then offer its 802.11n-compliant system designs. We employ both Matlab-based simulation and prototype-based experiments to validate the energy efficiency benefit of antenna management. The results show that antenna management can achieve a 21% one-end energy per bit reduction at the front end of the MIMO network interface, compared to a static MIMO configuration that keeps all antennas active.
HIGH-SPEED LOW-POWER VITERBI DECODER DESIGN FOR TCM DECODERS Very Large Scale Integration (VLSI) Systems IEEE Transactions on
ABSTRACT High-speed, low-power design of Viterbi decoders for trellis coded modulation (TCM) systems is presented in this paper. It is well known that the Viterbi decoder (VD) is the dominant module determining the overall power consumption of TCM decoders. We propose a precomputation architecture incorporated with the T-algorithm for the VD, which can effectively reduce the power consumption without degrading the decoding speed much. A general solution to derive the optimal precomputation steps is also given in the paper. The implementation result of a VD for a rate-3/4 convolutional code used in a TCM system shows that, compared with the full trellis VD, the precomputation architecture reduces the power consumption by as much as 70% without performance loss, while the degradation in clock speed is negligible.
PVT VARIATION TOLERANT CURRENT SOURCE WITH ON-CHIP DIGITAL SELF-CALIBRATION Very Large Scale Integration (VLSI) Systems IEEE Transactions on
ABSTRACT A current source with a small current error has been proposed to maintain the bandwidth of the system without an increase in power consumption for a margin. It minimizes the current error under process, supply voltage, and temperature (PVT) variations. Because the on-resistance of the nMOS array is self-calibrated digitally by an on-chip digital PVT detector, a current error of only ±2% is achieved. The current source has been implemented in an 80-nm CMOS process, occupies 0.018 mm² and consumes 94.9 µW at a supply voltage of 1.0 V.
LOW-COMPLEXITY SEQUENTIAL SEARCHER FOR ROBUST SYMBOL SYNCHRONIZATION IN OFDM SYSTEMS Very Large Scale Integration (VLSI) Systems IEEE Transactions on
ABSTRACT Based on the frequency-domain analog-to-digital conversion (FD ADC), this work builds a low-complexity sequential searcher for robust symbol synchronization in a 4 × 4 FD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) modem. The proposed scheme adopts a symbol-rate sequential search with a simple cross-correlation metric to recover symbol timing over the frequency domain. Simulation results show that the detection error is less than 2% at signal-to-noise ratio (SNR) ≤ 5 dB. Performance loss is not significant when carrier frequency offset (CFO) ≤ 100 ppm. Using an in-house 65-nm CMOS technology, the proposed solution occupies 84.881 k gates and consumes 5.2 mW at 1.0 V supply voltage. This work makes the FD ADC more attractive for adoption in high throughput OFDM systems.
AN AUTONOMOUS VECTOR/SCALAR FLOATING POINT COPROCESSOR FOR FPGAS
ABSTRACT We present a Floating Point Vector Coprocessor that works with the Xilinx embedded processors. The FPVC is completely autonomous from the embedded processor, exploiting parallelism and exhibiting greater speedup than alternative vector processors. The FPVC supports scalar computation so that loops can be executed independently of the main embedded processor. Floating point addition, multiplication, division and square root are implemented with the Northeastern University VFLOAT library. The FPVC is parameterized so that the number of vector lanes and maximum vector length can be easily modified. We have implemented the FPVC on a Xilinx Virtex 5 connected via the Processor Local Bus (PLB) to the embedded PowerPC. Our results show more than five times improved performance over the PowerPC augmented with the Xilinx Floating Point Unit on applications from linear algebra: QR and Cholesky decomposition.
BUILDING AN AMBA AHB COMPLIANT MEMORY CONTROLLER
ABSTRACT Microprocessor performance has improved rapidly in recent years. In contrast, memory latencies and bandwidths have improved little. As a result, memory access time has become a bottleneck that limits system performance. The memory controller (MC) is designed and built to attack this problem. The memory controller is the part of the system that controls the memory, and it is normally integrated into the system chipset. This paper shows how to build an Advanced Microcontroller Bus Architecture (AMBA) compliant MC as an Advanced High-performance Bus (AHB) slave. The MC is designed for system memory control with the main memory consisting of SRAM and ROM. Additionally, the problems met in the design process are discussed and the solutions are given in the paper.
4 BIT SFQ MULTIPLIER BASED ON BOOTH ENCODER
ABSTRACT We have designed a 2-bit Booth encoder with Josephson Transmission Lines (JTLs) and Passive Transmission Lines (PTLs) by using cell-based techniques and tools. The Booth encoding method is one of the algorithms for obtaining partial products. With this method, the number of partial products is halved compared to the AND array method. We have fabricated a test chip for a multiplier with a 2-bit Booth encoder with JTLs and PTLs. It has a processing frequency of 20 GHz with a bias margin of 25%. The frequency of this circuit increases up to 45 GHz when the bias voltage is raised 25% above the design voltage. The circuit area of the multiplier designed with the Booth encoder method is compared to that designed with the AND array method.
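The halving of partial products can be seen in a small software model of radix-4 Booth recoding, which examines the multiplier two bits at a time. That the paper's 2-bit Booth encoder corresponds to this standard recoding is an assumption, and the function names are made up for this sketch.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's complement multiplier of even width `bits` into
    radix-4 Booth digits in {-2, -1, 0, 1, 2}, least significant first."""
    y = multiplier & ((1 << bits) - 1)
    digits = []
    prev = 0                                  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        low = (y >> i) & 1
        high = (y >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)  # digit = y[i] + y[i-1] - 2*y[i+1]
        prev = high
    return digits


def booth_multiply(a, b, bits):
    """Multiply a by the `bits`-wide two's complement value b by summing
    one partial product per Booth digit (bits / 2 of them in total)."""
    return sum(d * a * 4 ** k
               for k, d in enumerate(booth_radix4_digits(b, bits)))


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))    # two digits for a 4-bit multiplier
    print(booth_multiply(5, -3, 4))
```

A 4-bit multiplier yields two Booth digits, so two partial products are summed where an AND array would generate four.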
HIGH-ACCURACY FIXED-WIDTH MODIFIED BOOTH MULTIPLIERS FOR LOSSY APPLICATIONS
ABSTRACT The fixed-width multiplier is attractive for many multimedia and digital signal processing systems in which it is desirable to maintain a fixed format and a small loss of accuracy in the output data is acceptable. This paper presents the design of high-accuracy fixed-width modified Booth multipliers. To reduce the truncation error, we first slightly modify the partial product matrix of Booth multiplication and then derive an effective error compensation function that makes the error distribution more symmetric about, and more concentrated around, zero error, giving the fixed-width modified Booth multiplier very small mean and mean-square errors. In addition, a simple compensation circuit mainly composed of a simplified sorting network is also proposed. Compared to previous circuits, the proposed error compensation circuit can achieve a tiny mean error and a significant reduction in mean-square error (e.g., at least 12.3% reduction for the 16-bit fixed-width multiplier) while maintaining approximately the same hardware overhead. Furthermore, experimental results on two real-life applications also demonstrate that the proposed fixed-width multipliers can improve the average peak signal-to-noise ratio of output images by at least 2.0 dB and 1.1 dB, respectively.
EFFICIENT WEIGHTED MODULO 2^N+1 ADDERS BY PARTITIONED PARALLEL-PREFIX COMPUTATION AND ENHANCED CIRCULAR CARRY GENERATION
ABSTRACT In this paper, we propose a low complexity design of a weighted modulo 2^n+1 adder, derived by decomposing the parallel-prefix computation into several blocks of smaller input bit-widths. In addition, we propose a novel enhanced circular carry generation (ECCG) unit to process the carry bits produced by all the parallel-prefix computation units (of small input bit-widths) to obtain the final modulo sum efficiently in terms of area-delay product. We have implemented the proposed adders using 0.13 µm CMOS technology, and from the synthesis results we find that our proposed adder outperforms the previously reported weighted modulo 2^n+1 adders. It offers a saving in area-delay product of up to 49% over the existing methods.
DESIGN AND CHARACTERIZATION OF PARALLEL PREFIX ADDERS USING FPGAS
ABSTRACT Parallel-prefix adders (also known as carry-tree adders) are known to have the best performance in VLSI designs. However, this performance advantage does not translate directly into FPGA implementations due to constraints on logic block configurations and routing overhead. This paper investigates three types of carry-tree adders (the Kogge-Stone, sparse Kogge-Stone, and spanning tree adder) and compares them to the simple Ripple Carry Adder (RCA) and Carry Skip Adder (CSA). These designs of varied bit-widths were implemented on a Xilinx Spartan 3E FPGA and delay measurements were made with a high-performance logic analyzer. Due to the presence of a fast carry chain, the RCA designs exhibit better delay performance up to 128 bits. The carry-tree adders are expected to have a speed advantage over the RCA as bit widths approach 256.
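For readers unfamiliar with carry-tree adders, the sketch below models the Kogge-Stone prefix computation in software: generate and propagate signals are combined over distances 1, 2, 4, and so on, so all carries are available after about log2(width) stages. This is a bit-level behavioral model under assumed names, not the FPGA design evaluated in the paper.

```python
def kogge_stone_add(a, b, width):
    """Add two unsigned `width`-bit integers with a Kogge-Stone prefix tree.
    Returns (sum modulo 2**width, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate bits
    half_sum = p[:]                 # a XOR b, needed again for the final sum
    dist = 1
    while dist < width:
        new_g, new_p = g[:], p[:]
        for i in range(dist, width):
            # Prefix operator: merge the span ending at i with the span
            # ending at i - dist.
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2
    # After the last stage, g[i] is the carry out of bit position i.
    carries = [0] + g[:-1]
    total = 0
    for i in range(width):
        total |= (half_sum[i] ^ carries[i]) << i
    return total, g[-1]


if __name__ == "__main__":
    print(kogge_stone_add(200, 100, 8))
```

Each pass of the while loop corresponds to one level of the carry tree, so a 128-bit adder needs 7 levels where a ripple carry adder needs 128 sequential carry steps. On an FPGA the dedicated carry chain makes those ripple steps very cheap, which is consistent with the measurements reported above.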
HIGH SPEED ASIC DESIGN OF COMPLEX MULTIPLIER USING VEDIC MATHEMATICS
ABSTRACT Vedic Mathematics is the ancient methodology of Indian mathematics, which has a unique technique of calculation based on 16 Sutras (formulae). A high speed complex multiplier design (ASIC) using Vedic Mathematics is presented in this paper. The idea for designing the multiplier and the adder/subtractor unit is adopted from the ancient Indian mathematics of the Vedas. Using those formulas, the partial products and sums are generated in one step, which reduces the carry propagation from LSB to MSB. The implementation of Vedic mathematics and its application to the complex multiplier ensure a substantial reduction of propagation delay in comparison with DA based architectures and parallel adder based implementations, which are the most commonly used architectures. The functionality of these circuits was checked, and performance parameters such as propagation delay and dynamic power consumption were calculated with SPICE Spectre using standard 90 nm CMOS technology. The propagation delay of the resulting (16, 16) × (16, 16) complex multiplier is only 4 ns, and it consumes 6.5 mW of power. We achieved almost 25% improvement in speed over earlier reported complex multipliers, e.g. parallel adder and DA based architectures.
A LIGHTWEIGHT HIGH-PERFORMANCE FAULT DETECTION SCHEME FOR THE ADVANCED ENCRYPTION STANDARD USING COMPOSITE FIELDS
ABSTRACT Faults that accidentally or maliciously occur in hardware implementations of the Advanced Encryption Standard (AES) may cause erroneous encrypted/decrypted output. The use of appropriate fault detection schemes for the AES makes it robust to internal defects and fault attacks. In this paper, we present a lightweight concurrent fault detection scheme for the AES. In the proposed approach, the composite field S-box and inverse S-box are divided into blocks and the predicted parities of these blocks are obtained. Through exhaustive searches among all available composite fields, we have found the optimum solutions for the least overhead parity-based fault detection structures. Moreover, through our error injection simulations for one S-box (respectively inverse S-box), we show that a total error coverage of almost 100% for 16 S-boxes (respectively inverse S-boxes) can be achieved. Finally, it is shown that both the application-specific integrated circuit and field-programmable gate array implementations of the fault detection structures using the obtained optimum composite fields have better hardware and time complexities compared to their counterparts.
IMPLEMENTATION AND PERFORMANCE ANALYSIS OF SEAL ENCRYPTION ON FPGA, GPU AND MULTICORE PROCESSORS
ABSTRACT Accelerators, such as field programmable gate arrays (FPGAs) and graphics processing units (GPUs), are special purpose processors designed to speed up compute-intensive sections of applications. FPGAs are highly customizable, while GPUs provide massive parallel execution resources and high memory bandwidth. In this paper, we compare the performance of these architectures, presenting a performance study of SEAL, a fast, software-oriented encryption algorithm on a Virtex-6 FPGA, a Graphics Processor Unit (GPU), and Intel Core i7, a 2-way hyper-threaded, 4-core processor. We show that each platform has relative competitive advantages in encrypting an input plaintext using SEAL.
ON THE TRANSMISSION METHOD FOR SHORT RANGE MIMO COMMUNICATIONS
ABSTRACT This paper investigates a transmission scheme that is suitable for short-range multiple-input-multiple-output (MIMO) transmission. Since the distance between two array antennas that face each other is comparable with the size of the array antenna aperture in short-range MIMO, the propagation characteristics are greatly different from those in conventional MIMO. Unlike conventional MIMO, an optimal element spacing, which maximizes channel capacity, exists in short-range MIMO. Moreover, the channel capacity with optimal antenna spacing exceeds the ergodic capacity of independent identically distributed (i.i.d.) channels, since the optimal eigenvalue distribution, which can maximize channel capacity, is obtained in short-range MIMO. In this paper, we focus on the actual transmission methods, because complex transmission schemes such as eigenmode transmission or maximum-likelihood detection are required to obtain ideal channel capacity. We clarify that the channel capacity obtained by zero forcing (ZF) at the receiver without beamforming at the transmitter is almost the same as that using eigenmode transmission when considering the optimal element spacing. The effectiveness of short-range MIMO communication is also clarified using a 4 × 4 MIMO testbed with actual signals based on the IEEE 802.11n standard. Simulated and measured results show that optimal element spacing is a key parameter in short-range MIMO communication. We found that designing antenna arrays with optimal element spacing is a very effective approach to achieving a simple hardware configuration.
DESIGN AND IMPLEMENTATION OF CORDIC PROCESSOR FOR COMPLEX DPLL
ABSTRACT Nowadays, various Digital Signal Processing (DSP) systems are implemented on programmable signal processors or on application specific VLSI chips. The Coordinate Rotation Digital Computer (CORDIC) algorithm has turned out to be well suited to such programmable signal processors. In recent times, it has been a widely researched topic in the field of vector rotation based DSP applications due to its simplicity. This paper presents the design of a pipelined architecture for the coordinate rotation algorithm, used to compute the loop performance of a complex Digital Phase Locked Loop (DPLL) in an in-phase and quadrature channel receiver. The design of CORDIC in the vector rotation mode results in high system throughput due to its pipelined architecture, where latency is reduced in each pipeline stage. For on-chip application, the area reduction in the proposed design is achieved by optimizing the number of micro rotations. For better loop performance of the first order complex DPLL and to minimize quantization error, the number of iterations is also optimized.
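The micro rotations mentioned above can be illustrated with a floating-point model of CORDIC in rotation mode. Each iteration rotates the vector by plus or minus atan(2^-i), which in hardware needs only shifts and additions. The iteration count of 16 and the function name are assumptions for this sketch. A real design would use fixed-point arithmetic and a precomputed angle table, with one iteration per pipeline stage.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians using CORDIC micro rotations.
    Converges for |angle| up to roughly 1.74 rad."""
    # Each micro rotation stretches the vector; the accumulated gain is a
    # constant that depends only on the number of iterations.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle                                   # residual angle still to rotate
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0             # direction that reduces |z|
        shift = 2.0 ** -i                       # a right shift by i in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)               # table lookup in hardware
    return x / gain, y / gain


if __name__ == "__main__":
    c, s = cordic_rotate(1.0, 0.0, math.pi / 6)
    print(c, s)   # close to cos(30 deg) and sin(30 deg)
```

Fewer iterations mean fewer pipeline stages and less area, but a larger residual angle error. This is the trade-off behind optimizing the number of micro rotations.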
DIRECT DIGITAL FREQUENCY SYNTHESIZER USING NONUNIFORM PIECEWISE-LINEAR APPROXIMATION
ABSTRACT This paper investigates a novel direct digital frequency synthesizer architecture, based on piecewise linear approximation with segments of nonuniform length. The new approach allows reducing the total number of segments with respect to the well-known uniform segmentation. In this way the size of the coefficient ROM is also reduced with beneficial effects in terms of speed and power. We show that the optimal nonuniform segmentation (that maximizes the spurious-free dynamic range for a given number of nonuniform segments) can be obtained as the solution of a mixed-integer linear programming problem. Three simple, suboptimal, nonuniform segmentation schemes (which lend themselves to efficient hardware implementation) are proposed in this paper. We present also several design examples and VLSI implementation results, which demonstrate the effectiveness of the developed technique.
A ROTATION-BASED BIST WITH SELF-FEEDBACK LOGIC TO ACHIEVE COMPLETE FAULT COVERAGE
ABSTRACT This paper presents a deterministic BIST technique that can efficiently achieve complete fault coverage without using any storage devices. A novel test structure containing a self-feedback logic unit and a circular shift register is proposed, by which all the required deterministic patterns can be generated on-chip in real time. Experiments on ISCAS'85 benchmark circuits show that, compared with previous work addressing the same problem, our technique requires much less test time to achieve 100% fault coverage for all testable stuck-at faults.
TECHNIQUE OF LFSR BASED TEST GENERATOR SYNTHESIS FOR DETERMINISTIC AND PSEUDORANDOM TESTING
ABSTRACT The structure of a test system based on built-in self-test (BIST) circuitry is proposed. The main idea is to minimize hardware overhead and to automate the generation of BIST circuitry. A test generator based on a linear feedback shift register (LFSR) provides two types of testing: pseudorandom and deterministic. The proposed modified Berlekamp-Massey algorithm is used to generate the LFSR polynomial coefficients. Experimental results of applying the technique to several ISCAS'89 benchmark circuits are presented.
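To make the role of the Berlekamp-Massey algorithm concrete, here is a minimal sketch of the textbook algorithm over GF(2), not the modified version the abstract proposes. Given a bit stream (for example, a serialized deterministic test pattern), it returns the shortest LFSR feedback polynomial that reproduces the stream. The function names and the example polynomial 1 + x + x^4 are illustrative assumptions.

```python
def lfsr_sequence(coeffs, seed, length):
    """Generate bits with s[n] = XOR over j of coeffs[j] & s[n-j].

    `coeffs` is [1, c1, ..., cL]; `seed` holds the first L output bits.
    """
    degree = len(coeffs) - 1
    bits = list(seed)
    while len(bits) < length:
        new_bit = 0
        for j in range(1, degree + 1):
            new_bit ^= coeffs[j] & bits[-j]
        bits.append(new_bit)
    return bits[:length]


def berlekamp_massey(bits):
    """Return (L, coeffs) for the shortest LFSR that generates `bits`.

    Textbook Berlekamp-Massey over GF(2); coeffs[0] is always 1.
    """
    n = len(bits)
    c = [1] + [0] * n   # current connection polynomial
    b = [1] + [0] * n   # polynomial saved at the last length change
    length, m = 0, 1    # m = steps since the last length change
    for i in range(n):
        # Discrepancy between the actual bit and the LFSR prediction.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d == 0:
            m += 1
        elif 2 * length <= i:
            # Prediction failed and the register must grow.
            previous = c[:]
            for j in range(n + 1 - m):
                c[j + m] ^= b[j]
            length, b, m = i + 1 - length, previous, 1
        else:
            # Prediction failed but the current length is sufficient.
            for j in range(n + 1 - m):
                c[j + m] ^= b[j]
            m += 1
    return length, c[:length + 1]


if __name__ == "__main__":
    # Pseudorandom mode: run a known LFSR (1 + x + x^4, chosen as an example).
    stream = lfsr_sequence([1, 1, 0, 0, 1], seed=[1, 0, 0, 0], length=30)
    print("stream:", "".join(map(str, stream)))
    # Deterministic mode: recover an LFSR that reproduces a given stream.
    print("recovered:", berlekamp_massey(stream))
```

In a BIST setting, the same register structure can then serve both modes: free-running for pseudorandom patterns, or configured with synthesized coefficients and a seed to replay a required deterministic sequence.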
TASK MIGRATION IN MESH NOCS OVER VIRTUAL POINT-TO-POINT CONNECTIONS
ABSTRACT Processor allocation in today's many-core MPSoCs is a challenging task, especially since the order and requirements of incoming applications are unknown at design time. Task migration is a method that has attracted much attention in recent years as a way to improve network performance, balance the workload across processing cores, or mitigate the effect of hot processing elements in thermal management methodologies. Runtime task migration was first proposed for multicomputers, with load balancing as the major objective. However, specific NoC properties, such as a limited amount of communication buffers, greater sensitivity to implementation complexity, and tight latency and power consumption constraints, bring new challenges to using task migration mechanisms in NoCs. As a consequence, the efficiency and applicability of traditional migration mechanisms (developed for multicomputers) are in question. Given the limited resource budget of NoC-based MPSoCs and the tight performance constraints of running applications, in this paper we propose an efficient methodology based on virtual point-to-point (VIP for short) connections. These dedicated VIP connections provide low-latency and low-power paths for the heavy communication flows created by task migration mechanisms. Analysis of the results shows that the proposed scheme reduces message latency by 13% and migration latency by 14%, while achieving 10% power savings compared with the previously proposed task migration strategy (known as Gathering-Routing-Scattering) for mesh multiprocessors.
