Professional Documents
Culture Documents
Abstract—A new approach for Discrete Wavelet Transform computational efficiency, saving of memory, "in-place"
(DWT) has been proposed recently under the name of lifting computation of the DWT, integer-to-integer wavelet
scheme. This scheme presents many advantages over the transform (IWT), symmetric forward and inverse
convolution-based approach. In this paper, a high speed 9/7 transform, etc[4].
lifting DWT algorithm which is implementation on FPGA Real-time, high quality, image transmission from
with multi-stage pipelining structure and rational 9/7 mobile wireless sensors requires a low-power, low-cost,
coefficients is presented. Compared with the architecture high-speed hardware implementation of a state-of-the-art
without multi-stage pipeline, the proposed architecture has codec like the JPEG2000 lossy coder[5]. This coder uses
higher operating frequency, the design raises operating the 9/7 discrete wavelet transform (DWT). The
frequency around 3 times more fast, at the expense of high-speed implementation of lifting-based 9/7 DWT on
about 40% more hardware area. The hardware architecture field-programmable gate array (FPGA) using multi-stage
is suitable for high speed implementation. pipelining and rational wavelet coefficients is presented
in this paper.
Keywords-Discrete wavelet transform (DWT); lifting scheme;
The rest of the paper is organized as follows. In
multi-stage pipelining; FPGA
section 2 the theoretical basis of the lifting-based discrete
Fig.2. 2D-DWT
A. Split phase.
In this split phase, the data set x(n) is split into two ċ. DESIGN OF THE ARCHITECTURE
subsets to separate the even samples from the odd ones: The design of the 2D-DWT has 5 blocks: wavelet
X e = X ( 2n), X o = X (2 n + 1) (1) level select, 1D-DWT, boundary process, memory and
B. Prediction phase. memory control, as shown in Fig.3.
Authorized licensed use limited to: Sethu Institute of Technology. Downloaded on March 27,2010 at 02:36:55 EDT from IEEE Xplore. Restrictions apply.
TABLE1. Lifting coefficients: original9/7, rational9/7 and fixed
point binary of rational9/7
Coefficient Original 9/7 Rational Fixed point binary of with 6 multipliers, 8 adders and 14 registers.
9/7 rational 9/7
Į -1.586134342 -3/2 11.1
k -1.230174105 4/5 00.1100110011001100 Generic multipliers usually have high area cost.
1/k 0.812893066 5/4 01.01 Multiplication by constant can be performed by shifted
The calculation of the rational 9/7 lifting factorization additions when the number of bits in the multiplication is
proposed in [7] is relatively easier than the original 9/7 large. The decimal point showed in binary representation
lifting factorization. The rational 9/7 architecture requires in table1 is for documenting the design of the control, as
only 2 float-point multipliers, 1 integer multiplier, 10 it is not considered in the hardware multiplier, which is
integer adders and 5 shifters. The original 9/7 architecture fix point. According to the binary representation of
requires 6 float-point multipliers and 8 float-point adders. multiplier constants presented in table1, the multiplication
The floating operation of the former is only 1/7 of the by Ȗ needs 7 adders, the first one performs c6+c8, the
latter, but the image compression performance of them next five ones are to perform the sum of shifted partial
are almost the same. Table2 shows the peak signal to products of c6+c8, and the last one performs the sum with
noise ratio (PSNR) obtained for two standard 512×512 c9.
images (‘lena’-img1, ‘barbara’-img2) of different bit rates The multiplication by Į needs 4 adders, ȕ needs 3
(0.25, 0.5 and 1 bit per pixel)at wavelet decomposition adders and the multiplication by į needs 5 adders. The k
level 5. equivalent constant has 8 high bits and this stage needed a
TABLE2. PSNR values of the original and rational 9/7 wavelet simple multiplication, so 7 adders can perform the
Image L Original 9/7 Rational 9/7 multiplication by k. The 1/k equivalent has 2 high bits, so
0.25 0.5 1 0.25 0.5 1
1 adder can perform the multiplication by 1/k.
Authorized licensed use limited to: Sethu Institute of Technology. Downloaded on March 27,2010 at 02:36:55 EDT from IEEE Xplore. Restrictions apply.
The 2-i in the expression can be achieved by right shift of the fastest design presented in [9]) due to use of the
i bits. The pipelining arithmetic stage structure to perform rational 9/7 DWT coefficients, tree-type arrangement and
the multiplication by Ȗ is shown in Fig.6(b). This stage is Horner’s rule for the accumulation of partial products in
straightforward as the insertion of some registers inside multiplication.
the logic. There is just one sum operation at each pipeline TABLE3. Implementation results
Design1 550 60 3
Design2 812 186 18
č. CONCLUSIONS
(a) The original arithmetic stage structure
This paper presents an efficient FPGA
implementation of the DWT algorithm with lifting
scheme. For the purpose of reducing the resource
consumption and speeding up the performance, the
rational 9/7 DWT coefficients and multi-stage pipeline
technique architecture are used. By use of the rational 9/7
DWT coefficients, the PSNR values of the reconstructed
image will not exceed 0.1dB. At the same time, the
Horner’s rule and the Tree-Height are used to reduce the
multiplication truncation error and latency. The
(b) The pipelining arithmetic stage structure
performance evaluation has proven that the proposed
Fig.6. Arithmetic stage structure of Ȗ multiplication
architecture has much higher operating frequency than the
Hence, pipelining increases the area cost of the architecture which without multi-stage pipelining.
architecture, but reduces the worst delay path between
registers, increasing the throughput rate of the DWT. And REFERENCE
[1] R. Calderbank, I. Daubechies, W. Sweldens, and B.L. Yeo, “Losless
it also reduces the power consumption. image compression using integer to integer wavelet transforms,”
International Conference on Image Processing (ICIP), 1997, Vol. I,
pp. 596–599.
Č. IMPLEMENTATION RESULTS
[2] A.A. Haj, “Fast Discrete Wavelet Transformation Using FPGAs and
Distributed Arithmetic,” International Journal of Applied Science
Here we present the implementation results of and Engineering, 2003, 160-171
1D-DWT architectures presented in section 3. The two [3] S.L. Bishop, S. Rai, B. Gunturk, J.L. Trahan, R.Vaidyanathan,
“Reconfigurable Implementation of Wavelet Integer Lifting
designs were compiled and simulated in a Stratix FPGA Transforms for Image Compression”, ReConFig’06, pp.1-9, Sept.
2006
device from Altera. Table3 shows the results of area cost, [4] S. Khanfir, M. Jemni, “Reconfigurable Hardware Implementations
for Lifting-Based DWT Image Processing Algorithms,” ICESS’08,
maximum data throughput (maximum operating 2008, pp. 283-290.
frequency) and number of pipeline stages. The design 1 is [5] ITU-T Recommentation T.800. JPEG2000 Image Coding
System—PartI, ITU Std (2002, Jul.). [Online]. Available:
the architecture showed in Fig.6(a). The design 2 is an http://www.itu.int/ITU-T/
[6] W. Sweldens, “The lifting scheme: a custom-design construction of
implementation with pipelined integer shifted adders, biorthogonal wavelets,” Applied and Computational Harmonic
Analysis, vol.3, no. 15, pp. 186-200, 1996
resulting in a 18 stages pipeline, which showed in [7] D. Tay, ‘A class of lifting based integer wavelet transform’. IEEE
Int.Conf. on Image Processing, 2001, pp. 602–605.
Fig.6(b). This is an architecture that has a 3 times larger
[8] K.K. Parhi, “VLSI digital signal processing systems: design and
operating frequency of design 1. So, the architecture of implementation”, John Wiley & Sons, Inc, 1999, p.508-510
[9] S.V. Silva, S. Bampi, “Area and Throughput Trade-Offs in the
design 2 definitely shows a better area-throughput Design of Pipelined Discrete Wavelet Transform Architectures,”
Proceedings of the Design Automation and Test in Europe
compromise than the first architecture. Conference and Exhibition (DATE’05), 1530-1591, 2005
Compared with the architectures presented in [9], our
architecture presents a larger data throughput (1.2 times
Authorized licensed use limited to: Sethu Institute of Technology. Downloaded on March 27,2010 at 02:36:55 EDT from IEEE Xplore. Restrictions apply.