Submitted to the faculty
Of
MASTER OF TECHNOLOGY
By
Sri P.MURALIDHAR
Lecturer
(DEEMED UNIVERSITY)
2006-2008
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY
(DEEMED UNIVERSITY)
WARANGAL – 506 004
2008
CERTIFICATE
This is to certify that this is the bona fide record of the project work
"IMPLEMENTATION OF WAVELET TRANSFORM BASED IMAGE COMPRESSION
ON FPGA" carried out by E.VAMSHI KRISHNA (062502) of final year M.Tech
(Electronic Instrumentation) during the academic year 2007-2008, in partial
fulfillment of the requirements for the award of the degree of Master of Technology.
ACKNOWLEDGEMENTS
I wish to thank all the staff members of the department for their kind cooperation
and support given throughout my project work. I am also thankful to all of my friends
who have given valuable suggestions throughout the project work.
Finally, I wish to thank all those who were involved, directly or indirectly, in the
successful completion of my project work.
E.VAMSHI KRISHNA
ABSTRACT
Image processing systems can encode raw images with different degrees of
precision, achieving varying levels of compression. Different encoders with different
compression ratios can be built and used for different applications. The need to
dynamically adjust the compression ratio of the encoder arises in many applications. One
example involves the real-time transmission of encoded data over a packet switched
network.
In this thesis, a design for efficient hardware acceleration of the Discrete Wavelet
Transform is proposed and implemented on FPGA. The complete design of the codec on
FPGA is presented. Implementation details of the individual blocks are discussed in great
detail. Finally, results from testing are reported and discussed.
CONTENTS
Chapter 3: HARDWARE
3.1 Field Programmable Gate Array Architecture
3.1.1 Applications of FPGA
3.2 Altera DE2 Board
3.2.1 Examine the Board
3.2.2 Features
3.3 NIOS II Processor
3.3.1 Introduction
3.3.2 NIOS II Architecture
3.3.3 NIOS II Hardware Development
3.3.4 System Development Flow
3.3.5 Generating the System in SOPC Builder
3.3.6 NIOS II Software Development Tools
3.3.7 Custom Instructions
3.3.8 Summary of Development of NIOS II System
3.3.9 SOPC Builder
4.3.3 Entropy Encoding
4.3.4 Bit Packing
4.3.5 Output File Format
4.3.6 Stage 2 Overall Architecture
4.4 Stage 3
4.4.1 Entropy Decoding
4.4.2 Run Length Decoding
4.4.3 Dequantization
4.4.4 Stage 3 Overall Architecture
4.5 Stage 4: Inverse DWT
4.5.1 3 Stages of Inverse Waveletting
4.6 Implementations
Chapter 5: RESULTS
APPENDIX
REFERENCES
List of Figures
Figure 4.18: Binary Shifter for Bit Packing
Figure 4.19: Output File Format
Figure 4.20: Stage 2, Data Flow Diagram
Figure 4.21: Stage 2, Control Flow Diagram
Figure 4.22: Entropy Decoder
Figure 4.23: Run Length Decoder
Figure 4.24: Dequantizer
Figure 4.25: Stage 3, Data Flow Diagram
Figure 4.26: Coefficient Processing along X Direction
Figure 4.27: Coefficient Processing along Y Direction
Figure 4.28: RTL View of Inverse Wavelet
Figure 4.29: Custom Logic System Module with Avalon Switch Fabric
Figure 4.30: Block Diagram of Processing Element (PE)
Figure 4.31: 3-way Handshaking between PE & Host
Figure 4.32: Timing Diagram for Memory Read after a Write Access
Figure 5.1: VHDL Simulation Output for Forward Wavelet X
Figure 5.2: VHDL Simulation Output for Forward Wavelet Y
Figure 5.3: VHDL Simulation Output for Stage 1 Top Level
Figure 5.4: VHDL Simulation Output for Inverse Wavelet Transform
Figure 5.5: VHDL Simulation Output for Quantizer
Figure 5.6: VHDL Simulation Output for Dequantizer
Figure 5.7: VHDL Simulation Output for Run Length Encoder
Figure 5.8: VHDL Simulation Output for Shifter
Figure 5.9: PSNR & RMSE Equations
Figure 5.10: Original Image of LENA
Figure 5.11: Reconstructed Images of LENA
Figure 5.12: Original Image of BARBARA
Figure 5.13: Reconstructed Images of BARBARA
Figure 5.14: Original Image of GOLD HILL
Figure 5.15: Reconstructed Images of GOLD HILL
Figure 5.16: Compression Ratio vs PSNR Graph of LENA
Figure 5.17: Compression Ratio vs PSNR Graph of BARBARA
Figure 5.18: Compression Ratio vs PSNR Graph of GOLD HILL
List of Tables
Table 2.1: (2,2) CDF Wavelet with Lifting Scheme
Table 4.1: Bit Range Allocation for RLE
Table 4.2: Coefficient Value Calculation for Dequantizer
Table 5.1: Compression Level & Noise Measurement of LENA
Table 5.2: Compression Level & Noise Measurement of BARBARA
Table 5.3: Compression Level & Noise Measurement of GOLD HILL
CHAPTER 1
INTRODUCTION
1.2 Background
1.3 Problem Description
Computer data compression is, of course, a powerful, enabling technology that
plays a vital role in the information age. Among the various types of data commonly
transferred over networks, image and video data comprises the bulk of the bit traffic. For
example, current estimates indicate that image data take up over 40% of the volume on
the Internet. The explosive growth in demand for image and video data, coupled with
delivery bottlenecks has kept compression technology at a premium. So to overcome this
problem an efficient Image compression technique using Wavelet Transform is
proposed. Not only the efficiency but the execution time also plays a vital role, so a
hardware FPGA is used as implementation platform which not only provides fast
executions but also a concept of Reconfigurability Thus Implementation of Wavelet
Transform based Image Compression on FPGA is the main focus of my work.
CHAPTER 2
WAVELET TRANSFORM BASED IMAGE COMPRESSION
2.1 Introduction
Image compression is different from binary data compression. When
binary data compression techniques are applied to images, the results are not optimal. In
lossless compression, the data (such as executables and documents) are compressed
such that decompression yields an exact replica of the original data; popular PC
utilities like WinZip and Adobe Acrobat perform lossless compression. Images, on the
other hand, need not be reproduced exactly. A 'good' approximation of the original
image is enough for most purposes, as long as the error between the original and the
compressed image is tolerable, so lossy compression techniques can be used. This is
because images have certain statistical properties that can be exploited by encoders
specifically designed for them, and because some of the finer detail in an image can be
sacrificed to save bandwidth or storage space.
In digital images the neighboring pixels are correlated and therefore contain
redundant information, so before an image is compressed these correlated pixels must
be identified. The fundamental components of compression are redundancy reduction
and irrelevancy reduction. Redundancy means duplication; irrelevancy means the parts
of the signal that will not be noticed by the signal receiver, which is the Human Visual
System (HVS). Three types of redundancy can be identified:
Spatial Redundancy is the correlation between neighboring pixel values.
Spectral Redundancy is the correlation between different color planes or
spectral bands.
Temporal Redundancy is the correlation between adjacent frames in a sequence
of images (in video applications).
[Figure: Image compression system - Input Signal -> Source Encoder -> Quantizer -> Entropy Encoder -> Compressed Signal]
Source Encoder
The source encoder is the first major component of an image compression system.
A variety of linear transforms are available, such as the Discrete Fourier Transform
(DFT), Discrete Cosine Transform (DCT), and Discrete Wavelet Transform (DWT). The
Discrete Wavelet Transform is the main focus of this work.
Quantizer
A quantizer reduces the precision of the values generated by the encoder and
therefore reduces the number of bits required to store the transform coefficients. This
process is lossy. When quantization is performed on each individual coefficient, it is
known as Scalar Quantization (SQ); if it is performed on a group of coefficients
together, it is called Vector Quantization (VQ).
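Scalar quantization can be sketched in a few lines; the step size q used here is a hypothetical illustration value, not one taken from this design:

```python
def quantize(coeffs, q):
    """Scalar quantization: map each coefficient to an integer index."""
    return [int(round(c / q)) for c in coeffs]

def dequantize(indices, q):
    """Reconstruct approximate coefficients; the rounding error is the loss."""
    return [i * q for i in indices]

coeffs = [12.7, -3.2, 0.4, 55.0]
idx = quantize(coeffs, q=4)       # small integers need fewer bits to store
approx = dequantize(idx, q=4)     # lossy: each value is within q/2 of the original
```

A coarser step size q gives higher compression at the cost of larger reconstruction error, which is the trade-off a dynamically adjustable quantizer exploits.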
Entropy Encoder
An entropy encoder further compresses the quantized values to achieve even
better overall compression. The commonly used entropy encoders are the Huffman
encoder, the arithmetic encoder, and the simple run-length encoder. For the best overall
compression performance, all three components must work well together.
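A minimal sketch of the simple run-length encoder mentioned above, specialized (as in the later stage-2 description) to runs of zeros; the (0, n) token format is an illustrative assumption, not the bit-level format of this design:

```python
def rle_zeros(values):
    """Run-length encode runs of zeros as (0, run_length) pairs;
    nonzero values pass through unchanged."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            if run:
                out.append((0, run))
                run = 0
            out.append(v)
    if run:
        out.append((0, run))
    return out

def rld_zeros(tokens):
    """Inverse: expand each (0, n) pair back into n zeros."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            out.extend([0] * t[1])
        else:
            out.append(t)
    return out

data = [5, 0, 0, 0, -2, 0, 7]
enc = rle_zeros(data)
```

Quantized wavelet coefficients contain long runs of zeros after thresholding, which is why this scheme pays off before the final entropy-coding pass.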
2.5 Wavelets
A wavelet is a localized function in time (or space, in the case of images) with
zero mean. A wavelet basis is derived from the wavelet (small wave) by its own
dilations and translations:
W_{j,k}(t) = 2^(-j/2) W(2^(-j)t - k)
Functionally, the Discrete Wavelet Transform (DWT) is very similar to the
Discrete Fourier Transform, in that the transformation function is orthogonal: a signal
passed twice through the transform is unchanged. As the input signal is a set of
samples, both transforms are convolutions. While the basis functions of the Fourier
transform are sinusoids, the wavelet basis is a set of waves obtained by the dilations
and translations of the mother wavelet.
The wavelet basis forms an orthogonal basis if the basis vectors are orthogonal to
their own dilations and translations. A less stringent condition is that the vectors be
biorthogonal. The DWT and inverse DWT can be implemented by filter banks,
consisting of an analysis filter and a synthesis filter. When the analysis and synthesis
filters are transposes as well as inverses of each other, the whole filter bank is
orthogonal; when they are inverses, but not necessarily transposes, the filter bank is
biorthogonal.
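The orthogonal case can be checked numerically for the 2-tap Haar filter pair: the synthesis matrix is the transpose of the analysis matrix, and their product is the identity. This sketch uses the orthonormal Haar vectors (the example later in the text uses the average/difference form instead):

```python
import math

# Haar analysis "filter bank" for a 2-sample block, written as a matrix.
# Row 0 is the scaling (lowpass) vector, row 1 the wavelet (highpass) vector.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

# Synthesis filter = transpose of analysis filter; their product is the
# identity, so this filter bank is orthogonal.
I = matmul(transpose(H), H)
```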
2.5.3 A simple example - the Haar wavelet
One of the first wavelets was that of Haar. The Haar scaling function is:
w(t) = 1 for 0 <= t <= 1, and 0 otherwise
Applying the Haar wavelet to a sequence of values computes its averages and
differences. For example, a pair of values a, b is replaced by the average s = (a+b)/2 and
the difference d = (b-a); the values of a and b can be reconstructed as a = s - d/2 and
b = s + d/2. An input signal with 2^n samples is replaced with 2^(n-1) averages s_0(i)
and 2^(n-1) differences d_0(i). The averages can be thought of as a coarser
representation of the signal, and the differences as the information needed to go back to
the original resolution. The averages and differences are then computed on the coarser
signal s_0(i) of length 2^(n-1), giving s_1(i) and d_1(i) of length 2^(n-2) each. This
operation can be performed n times, until we run out of samples. The inverse operation
starts by computing s_(n-2)(i) from s_(n-1)(i) and d_(n-1)(i).
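The averaging and differencing pass described above can be sketched directly:

```python
def haar_forward(x):
    """One Haar analysis pass: pairwise averages s and differences d."""
    s = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
    d = [b - a for a, b in zip(x[0::2], x[1::2])]
    return s, d

def haar_inverse(s, d):
    """Reconstruct the original samples: a = s - d/2, b = s + d/2."""
    x = []
    for si, di in zip(s, d):
        x += [si - di / 2, si + di / 2]
    return x

x = [2, 4, 6, 8]
s, d = haar_forward(x)          # coarse signal and detail
```

Repeating haar_forward on s alone gives the next coarser level, exactly the n-level recursion described in the text.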
A more general lifting scheme consists of three steps: split, predict, and update.
The split stage divides the signal into two disjoint sets of samples; in the above
example, these are the even-numbered samples and the odd-numbered samples, each
containing half as many samples as the original signal. If the signal is locally
correlated, consecutive samples are highly correlated; in other words, given one set it
should be possible to predict the other. Here, the even samples are used to predict the
odd samples, and the detail is the difference between each odd sample and its
prediction. In the Haar case the prediction is simple: every even value is used to predict
the next odd value. The order of the predictor in the Haar case is 1, and it eliminates
zeroth-order correlation. The inverse is performed as undo-update, undo-predict, and
merge.
f = Σ_i a_i w_i
Although rounding adds non-linearity to the transform, the transform is fully
invertible as long as the rounding is deterministic.
Forward transform:
s_i <- x_2i
d_i <- x_2i+1
d_i <- d_i - (s_i + s_i+1)/2
s_i <- s_i + (d_i-1 + d_i)/4
Inverse transform:
s_i <- s_i - (d_i-1 + d_i)/4
d_i <- d_i + (s_i + s_i+1)/2
x_2i <- s_i
x_2i+1 <- d_i
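A minimal integer sketch of these lifting steps, with floor-rounded division; repeating the edge sample at the boundaries is a simplifying assumption, since the text does not specify the boundary treatment. Because each step only adds or subtracts a deterministic function of the other channel, the inverse recovers the input exactly:

```python
def fwd22(x):
    """One forward (2,2) lifting pass on an even-length integer signal."""
    s, d = x[0::2], x[1::2]
    for i in range(len(d)):                       # predict odds from evens
        sr = s[i + 1] if i + 1 < len(s) else s[i] # repeat edge sample
        d[i] -= (s[i] + sr) // 2
    for i in range(len(s)):                       # update evens from details
        dl = d[i - 1] if i > 0 else d[0]
        s[i] += (dl + d[i] + 2) // 4              # +2 gives rounded division
    return s, d

def inv22(s, d):
    """Exact inverse: undo update, undo predict, then merge."""
    s, d = s[:], d[:]
    for i in range(len(s)):
        dl = d[i - 1] if i > 0 else d[0]
        s[i] -= (dl + d[i] + 2) // 4
    for i in range(len(d)):
        sr = s[i + 1] if i + 1 < len(s) else s[i]
        d[i] += (s[i] + sr) // 2
    x = [0] * (len(s) + len(d))
    x[0::2], x[1::2] = s, d
    return x

x = [10, 13, 25, 26, 29, 21, 7, 15]
s, d = fwd22(x)
assert inv22(s, d) == x   # deterministic rounding => perfect reconstruction
```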
Another advantage of a wavelet basis is that it supports multiresolution. Consider
the windowed Fourier transform: the effect of the window is to localize the signal being
analyzed, but because a single window is used for all frequencies, the resolution of the
analysis is the same at all frequencies. To capture signal discontinuities (and spikes),
one needs shorter windows, or shorter basis functions; at the same time, to analyze
low-frequency signal components, one needs longer basis functions. With wavelet-based
decomposition the window sizes vary, which allows the signal to be analyzed at
different resolution levels.
CHAPTER 3
HARDWARE
3.1 FPGA Basics
3.1.1 Field Programmable Gate Array (FPGA) Architecture:
A Field Programmable Device (FPD) is a type of integrated circuit used for
implementing digital hardware, where the chip can be configured by the end user to
realize different designs. Programming of such a device often involves placing the chip
into a special programming unit, but some chips can also be configured “in-system”.
Another name for FPDs is programmable logic devices (PLDs).
An FPGA (Field-Programmable Gate Array) is an FPD featuring a general
structure that allows very high logic capacity. The basic structure of FPGAs is
array-based: each chip comprises a two-dimensional array of logic blocks that
can be interconnected via horizontal and vertical routing channels. FPGAs consist of an
array of uncommitted circuit elements, called logic blocks, and interconnect resources,
and their configuration is performed through programming by the end user. As the only
type of FPD that supports very high logic capacity, FPGAs have been responsible for a
major shift in the way digital circuits are designed.
Besides logic, the other key feature that characterizes an FPGA is its interconnecting
structure. The interconnect is arranged in horizontal and vertical channels.
There are two basic categories of FPGAs used today:
1. SRAM-based FPGAs.
2. Antifuse-based FPGAs.
• Another promising area for FPGA application is the use of FPGAs as custom
computing machines. This involves using the programmable parts to "execute"
software, rather than compiling the software for execution on a regular CPU.
• Layout traces and components are carefully arranged so that they are properly
aligned. This alignment increases the manufacturing yield and eases the board
debugging procedure.
• A jumper-free design improves robustness; jumpers are a common point of
failure and can frustrate users who do not keep the manuals at hand.
• Component selection was made according to the volume shipped. The most
common component configurations used in PC and DVD players were selected,
to ensure a continuous supply of the components in the future.
• Protection on power and I/Os is provided to cover most accidental cases in
the field.
3.2.1 Examine the Board
3.2.2 Features:
The DE2 board provides users many features to enable various multimedia project
developments. Component selection was made according to the most popular designs in
volume-production multimedia products such as DVD, VCD, and MP3 players. The DE2
platform allows users to quickly understand all the insight tricks to design real
multimedia projects for industry.
• Altera Cyclone II 2C35 FPGA with 35000 LEs
• Altera Serial Configuration device (EPCS16) for Cyclone II 2C35
• USB Blaster built in on board for programming and user API controlling
• JTAG Mode and AS Mode are supported
• 8Mbyte (1M x 4 x 16) SDRAM
• 1Mbyte Flash Memory (upgradeable to 4Mbyte)
• SD Card Socket
• 4 Push-button switches
• 18 DPDT switches
• 9 Green User LEDs
• 18 Red User LEDs
• 50MHz Oscillator and 27MHz Oscillator for external clock sources
• 24-bit CD-Quality Audio CODEC with line-in, line-out, and microphone-in
jacks
• VGA DAC (10-bit high-speed triple DACs) with VGA-out connector
• TV Decoder (NTSC/PAL) and TV-in connector
• 10/100 Ethernet Controller with socket
• USB Host/Slave Controller with USB type A and type B connectors
• RS-232 Transceiver and 9-pin connector
• PS/2 mouse/keyboard connector
• IrDA transceiver
• Two 40-pin Expansion Headers with diode protection
• DE2 Lab CD-ROM which contains many examples with source code to exercise
the boards, including: SDRAM and Flash Controller, CD-Quality Music Player,
VGA and TV Labs, SD Card reader, RS-232/PS-2 Communication Labs,
NIOS II, and Control Panel API
3.3 NIOS II PROCESSOR
3.3.1 Introduction
The NIOS II soft core processor is a general purpose RISC processor having the
following features
• Full 32 bit instruction set, data path and address space
• 32 general purpose registers
• 32 external interrupt sources
• Single-instruction 32 × 32 multiply and divide producing a 32-bit result
• Dedicated instructions for computing 64-bit and 128-bit products of
multiplication
• Single-instruction barrel shifter
• Access to a variety of on-chip peripherals, and interfaces to off-chip memories
and peripherals
• Hardware-assisted debug module enabling processor start, stop, step and trace
under integrated development environment (IDE) control
• Software development environment based on the GNU C/C++ tool chain and
Eclipse IDE
• Instruction set architecture (ISA) compatible across all NIOS II processor
systems
A NIOS II processor system is equivalent to a microcontroller or "computer on a chip":
it includes a CPU and a combination of peripherals and memory on a single chip. The
NIOS II software development environment is the NIOS II Integrated Development
Environment (IDE).
• Instruction and data cache memories
• Tightly coupled memory interfaces for instructions and data
• JTAG debug module
These functional units implement the NIOS II instruction set. Each functional
unit can be implemented in hardware or emulated in software.
The NIOS II architecture supports register addressing, displacement addressing,
immediate addressing, register indirect addressing, and absolute addressing. In register
addressing, all operands are registers, and the result is stored back to a register. In
displacement addressing, the address is calculated as the sum of a register and a signed
16-bit immediate value. In immediate addressing, the operand is a constant within the
instruction itself. Register indirect addressing is displacement addressing with a
displacement of 0.
Figure 3.4 Example design of NIOS
Figure 3.5 NIOS II development flow
3.3.5 Defining and generating the system in SOPC Builder
After analyzing the system hardware requirements, use the SOPC Builder tool, which
is included in the Altera Quartus II software. Using SOPC Builder, specify the NIOS II
processor core(s), memory, and other peripherals the system requires. SOPC Builder
automatically generates the interconnect logic to integrate the components in the
hardware system.
Select from a list of standard processor cores and peripherals provided with the
NIOS II development tools. We can also add our own custom hardware to accelerate
system performance: custom instruction logic added to the NIOS II core accelerates
CPU performance, while a custom peripheral offloads tasks from the CPU.
The primary outputs of SOPC Builder are:
1. SOPC Builder system file (.ptf) - This file stores the hardware contents of the
system. The NIOS II IDE requires the .ptf file to compile software for the target
hardware.
2. Hardware description (HDL) files - These are the hardware design files that
describe the SOPC Builder system. The Quartus II software uses these HDL files to
compile the overall design.
Using the Quartus II software we can assign pin locations for the I/O signals,
specify timing requirements, and set other design constraints. Finally, the Quartus II
project is compiled to produce the FPGA configuration file (.sof).
The FPGA configuration file is downloaded to the target FPGA board using an Altera
download cable such as the USB Blaster. After configuration, the FPGA behaves as the
hardware specified, which in this case is the NIOS II processor system.
3.3.6 NIOS II software development tasks
Using the NIOS II IDE we can perform all the software development tasks for the
NIOS II processor system that was built. After the system has been generated in SOPC
Builder, we can write C/C++ application code with the NIOS II IDE. In addition to the
application code, we can design and reuse custom libraries in our NIOS II IDE projects.
After the configuration of Altera FPGA board with NIOS II system, we can
download the software using a download cable.
3. Memory initialization files (.hex): These are the initialization files for the on-chip
memories that support initialization content.
4. Flash programming data: The IDE includes a flash programmer, which writes the
program to flash memory. The flash programmer adds appropriate boot code to
allow the program to boot from flash memory.
Figure 3.7 Custom instruction logic in NIOS II
3.3.8 Summary of Development of NIOS II System
1) Analyze system requirements: Based on the application, decide the components of
the system.
2) Start the Quartus II software and open a project: Open a new Quartus II project. This
project serves as an easy starting point for the NIOS development flow.
3) Start a new SOPC Builder system: SOPC Builder is used to generate the NIOS II
processor system. Add the desired peripherals, and configure how they connect
together. The following steps create a system in SOPC Builder:
• Choose SOPC Builder (Tools menu) in the Quartus II software. SOPC Builder
starts and displays the Create New System dialog box.
• Enter the system name.
• Select Verilog or VHDL as the target HDL.
4) Define the system in SOPC Builder:
Define the hardware characteristics of the NIOS II system, such as which NIOS II
core to use and what peripherals to include in the system. SOPC Builder does not
define the software behavior. The following steps define a system:
• Specify the target FPGA and clock settings.
• Add the NIOS II soft core, on-chip memory, and other peripherals.
• Specify base addresses and interrupt request (IRQ) priorities.
• Specify further NIOS II settings.
• Generate the SOPC Builder system.
Figure 3.8 Components of the SOPC system
5) Integrate SOPC builder into Quartus II project:
The following steps are to be performed to complete the hardware design.
• Instantiate the SOPC Builder system module in the Quartus II project.
• Assign FPGA pins.
• Compile the Quartus II project
• Verify timing
6) Download hardware design to target FPGA:
• Connect the board to the host computer with the download cable, and apply
power to the board.
• Choose Programmer (Tools menu) in the Quartus II software. The Programmer
window appears and automatically displays the appropriate configuration file
(nios2_quartus2_project.sof).
• Click Hardware Setup in the top-left corner of the Programmer window to verify
your download cable settings. The Hardware Setup dialog box appears.
• Select the appropriate download cable in the currently selected hardware list.
• Turn on Program/Configure.
7) Develop software using NIOS II IDE:
• Create a new C/C++ application project.
• Compile the project
8) Run the program: We can run the program on target hardware or on the NIOS II
Instruction set simulator.
design file that connects all the components together. SOPC Builder generates both
Verilog HDL and VHDL equally, and does not favor one over the other.
3.3.9.2 SOPC Builder components
SOPC Builder components are the building blocks of the system module. SOPC
Builder components use the Avalon interface for the physical connection of components,
and SOPC Builder can be used to connect any logical device (either on-chip or off-chip)
that has an Avalon interface. The Avalon interface uses an address-mapped read/write
protocol that allows master components to read and/or write any slave component.
Altera provides ready-to-use SOPC Builder components, including:
• Microprocessors, such as the NIOS II processor
• Microcontroller peripherals
• Timers
• Serial communication interfaces, such as a UART and a serial peripheral
interface (SPI)
• General purpose I/O
• Digital signal processing (DSP) functions
• Communications peripherals
• Interfaces to off-chip devices
− Memory controllers
− Buses and bridges
− Application-specific standard products (ASSP)
− Application-specific integrated circuits (ASIC)
− Processors
The purpose of SOPC Builder is to abstract away the complexity of interconnect logic,
allowing designers to focus on the details of their custom components and the high-level
system architecture.
3.3.9.4 SDRAM
Altera provides a free SDRAM controller core, which uses inexpensive SDRAM
as bulk RAM in FPGA designs. The SDRAM controller core is necessary, because
Avalon signals cannot describe the complex interface on an SDRAM device. The
SDRAM controller acts as a bridge between the Avalon switch fabric and the pins on an
SDRAM device. The SDRAM controller can operate in excess of 100 MHz. The choice
of SDRAM device(s) and the configuration of the device(s) on the board heavily
influence the component-level design for the SDRAM controller. Typically, the
component-level design task involves parameterizing the SDRAM controller core to
match the SDRAM device(s) on the board.
The Avalon Tristate Bridge automatically adds registers to output signals from the
Tristate Bridge to off-chip devices. Registering the input and output signal shortens the
register-to-register delay from the memory device to the FPGA, resulting in higher
system f_max performance. However, in each direction, the registers add one additional
cycle of latency for Avalon master ports accessing memory connected to the Tristate
Bridge. The registers do not affect the timing of the transfers from the perspective of the
memory device. The Avalon interface for the PWM component requires a single slave
port using a small set of Avalon signals to handle simple read and write transfers to the
registers.
The component's Avalon slave port has the following characteristics:
• It is synchronous to the Avalon slave port clock.
• It is readable and writeable.
• It has zero wait states for reading and writing, because the registers are able to
respond to transfers within one clock cycle.
• It has no setup or hold restrictions for reading and writing.
• Read latency is not required, because all transfers can complete in one clock
cycle. Read latency would not improve performance.
• It uses native address alignment, because the slave port is connected to registers
rather than a memory device.
3.3.9.8 SRAM
The choice of RAM device(s) and the configuration of the device(s) on the board
determine how the interface component is created. The component-level design task
involves entering parameters into the component editor to match the device(s) on the
board.
3.3.9.9 UART
A UART (universal asynchronous receiver/transmitter) performs the main task in
serial communications: it converts incoming parallel information to serial data that
can be sent over a communication line. A
second UART can be used to receive the information. The UART performs all the tasks,
timing, parity checking, etc. needed for the communication. The only extra devices
attached are line driver chips capable of transforming the TTL level signals to line
voltages and vice versa. To use the UART in different environments, registers are
accessible to set or review the communication parameters. Eight I/O bytes are used for
each UART to access its registers. Settable parameters are for example the
communication speed, the type of parity check, and the way incoming information is
signaled to the running software.
CHAPTER 4
DESIGN AND IMPLEMENTATION
This chapter explains aspects of design and implementation of the encoder and decoder.
[Figure: Compression - Original Image -> Discrete Wavelet Transform -> Quantization -> Run-Length Coding -> Huffman Coding -> Compressed Image. Decompression - Compressed Image -> Huffman Decoding -> Run-Length Decoding -> Inverse Quantization -> Inverse Discrete Wavelet Transform -> Reconstructed Image]
Figure 4.1 Flow Chart showing the sequence of steps in Compression & Decompression
The above flow chart gives the sequence of routines executed in image
compression and decompression.
4.1 Design parameters and constraints
4.1.1 Memory read/write
The input image to the encoder is a raw gray-scale frame of 512 by 512 pixels.
Each pixel is represented by 256 gray-scale levels (8 bits). The input frame is loaded
into the embedded memory by the host computer, and the results are read back once all
the PEs have processed it. The top-level VHDL module of each stage is referred to as a
Processing Element (PE). The PEs also use the embedded memory as intermediate
storage to hold results between different stages of processing.
4.1.2 Design partitioning
The whole computation is partitioned into four stages. The first stage computes
the discrete wavelet transform coefficients of the input image frame and writes them
back to the embedded memory. The second stage operates on this result and performs
dynamic quantization, zero thresholding, run-length encoding of zeros, and entropy
encoding of the coefficients. The third stage performs entropy decoding, run-length
decoding of zeros, and dequantization. The fourth stage computes the pixel data from
the wavelet coefficients.
Figure 4.2 Stage-wise Ordering of Routines
4.2 Stage 1: Discrete Wavelet Transform
The Discrete Wavelet Transform is implemented by filter banks. The filter used is the
(2,2) Cohen-Daubechies-Feauveau wavelet filter. Though much longer filters are common
for audio data, relatively short filters are used for video.
can be written back. In our design, we use the in-place ordering scheme described above,
which is optimized for memory read/write operation. Once the three stages of wave-
letting are done, we revert to Mallat ordering.
Once the filter has been applied along all rows in a stage, the same filter is applied along
the columns. With the aforementioned interleaved ordering scheme, alternate columns
are all fs or all gs. Unlike the row traversal, the two values obtained in a memory read on
a column traversal, are not consecutive values of the same column. Rather, they are
corresponding values from two different vertically parallel streams (figure 4.4).
These differences along the row and column computations are accounted for by
having two separate data flow blocks along the two directions. The data flow block in the X
direction (ForwardWaveletX) accepts two successive values of the same row and outputs
either two consecutive fs or two consecutive gs, in alternate fashion. The data flow block
in the Y direction (ForwardWaveletY) accepts one value each from two parallel streams and outputs either the fs for the two streams or the gs, in an alternate manner (figure 4.5). These blocks also need information on when a row/column starts/ends to handle the boundary conditions. They also have a pipeline latency of 3 cycles.
Figure 4.7 RTL view of Forward Wavelet Y
Once all the 512 rows are processed, the filters are applied in the Y direction. This completes the first stage of wavelet decomposition. While the conventional Mallat ordering scheme aggregates coefficients into the 4 quadrants, our ordering scheme interleaves the coefficients in the memory. The second stage of wavelet decomposition only processes the low frequency coefficients from the first stage. This corresponds to the upper left hand quadrant in the Mallat scheme. Thus, the second stage operates on rows and columns of length 256, while the third stage operates on rows and columns of length 128. The aggregation of coefficients along the 3 stages under Mallat ordering is shown in figure 4.7. The memory map with the interleaved ordering is shown in figure 4.8.
Figure 4.10 Interleaved ordering along the 3 stages of wavelet decomposition
4.3 Stage 2
Stage 2 does the rest of the processing on the wavelet coefficients computed in
the first stage. The coefficients are quantized, zero-thresholded, the zeroes run-length
encoded, and entropy encoded to get the final compressed image.
4.3.1 Dynamic quantization
The coefficients from different sub-bands (different quadrants in the Mallat
ordering scheme) are quantized separately. The dynamic range of the coefficients for
each sub-band (computed in the first stage) is divided into 16 quantization levels. The
coefficients are quantized into one of the 16 possible levels. The maximum and
minimum values of the coefficients for each sub-band are also needed while decoding the
image.
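The dynamic quantization described above can be sketched as follows; this is a behavioural model, and the exact rounding used by the hardware quantizer is an assumption:

```python
def quantize(coeff, cmin, cmax):
    """Map a wavelet coefficient into one of 16 levels (4 bits) using
    the sub-band's dynamic range [cmin, cmax].  Integer arithmetic;
    the hardware's exact rounding is an assumption."""
    if cmax == cmin:          # degenerate sub-band: one level only
        return 0
    level = (coeff - cmin) * 16 // (cmax - cmin)
    return min(level, 15)     # cmax itself maps to the top level, 15
```

The minimum of the sub-band maps to level 0, the maximum to level 15, and the mid-range value to level 8.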
[Figure content: Dynamic Quantizer block — inputs Maximum (16), Minimum (16), Enable, Clock; output Quantized Coefficients (4).]
Figure 4.13 RTL view of Dynamic Quantizer
4.3.2 Run-length encoding
After the zero thresholding a large number of coefficients are truncated to zero.
Long sequences of zeroes can be effectively compressed by run-length encoding, which
replaces each individual occurrence of a zero in a continuous spell with a count
indicating the length of the spell. To decode a run-length encoded stream, this count has
to be distinguishable from other characters of the input data set. The other valid character
is the 4 bit output from the quantizer. The sixteen numbers 0 to 15 are reserved for the
quantizer output values, while the numbers 16 to 255 (240 numbers) are free. Thus, any
continuous spell of zeroes ranging from 1 (represented by the number 16) to 240
(represented by the number 255) can be replaced by the corresponding count. Longer
spells have to be broken down to fall within this range. Table 4.1 shows the bit range
allocation. The run-length encoder might not have an output on every cycle. The
succeeding block has to be signaled as to when to read the RLE count, and when to wait
for a spell to finish. Whenever RLE detects a zero, it asserts ’RLErunning,’ and starts
counting the sequence of continuous zeroes. The current sum of zeroes is always
available on ’RLEout.’ When the continuous spell of zeroes ends, ’RLErunning’ is
deasserted, and ’RLEspellEnd’ is asserted for one cycle to allow the next block to read
off the RLE count. The RLE counter is also reset to 15.
In this set-up, there is a look-ahead problem. Before the RLE can signal the end of a
spell, it needs to see the next value in the stream. But the RLE is used in conjunction with
the dynamic quantizer (RLE and quantizer are connected in parallel), which is a 4 stage
pipeline.
The RLE might face an arbitrarily long sequence of zeroes, but it can count only up to
a maximum of 240 zeroes. Thus, when the RLE has seen 240 continuous zeroes and still
more zeroes are arriving, ’RLEspellEnd’ is asserted for one clock cycle, and the
internal counter is reset to 15. Here, ’RLErunning’ remains high throughout the spell.
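The count encoding described above (a spell of n zeroes, 1 ≤ n ≤ 240, becomes the single value n + 15; longer spells are split) can be sketched behaviourally. Here a stream value of 0 stands for a zero-thresholded coefficient, which is an assumption made for illustration:

```python
def rle_encode(stream):
    """Run-length encode zeroes: quantizer outputs 1..15 pass through,
    a spell of n zeroes (1 <= n <= 240) becomes the value n + 15, and
    longer spells are split.  A behavioural sketch of the RLE block,
    not the cycle-accurate RTL."""
    out, run = [], 0
    for v in stream:
        if v == 0:
            run += 1
            if run == 240:            # counter saturates: emit 255, restart
                out.append(255)
                run = 0
        else:
            if run:                   # spell ended: emit its count
                out.append(run + 15)
                run = 0
            out.append(v)
    if run:                           # flush a trailing spell
        out.append(run + 15)
    return out
```

A spell of 241 zeroes therefore becomes the pair 255, 16 — exactly the split behaviour the RLEspellEnd pulse implements in hardware.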
[Figure content: Run Length Encoder block — inputs Input (16), Zero Threshold (16), Enable, Flush, Clock; outputs RLE out (8), RLE running, RLE Spell end.]
Table 4.1 Bit range allocation for RLE
4.3.3 Entropy encoding
Entropy encoding involves assigning a smaller length encoding to more
frequently used characters in the data set and a larger length encoding to infrequently
used characters in the data set. This involves variable length encoding of the input data.
To efficiently retrieve the original data, an encoded word should not be a proper prefix of
any other encoded word. Huffman trees are an efficient way of coming up with a
variable length encoding for a set of characters, given the relative frequencies. Further,
for a Huffman tree based encoding, decoding can be done in linear time (linear in the
length of the encoded word). Various other schemes of encoding using different levels
of context sensitive information exist; these might incur a costlier decoding function.
Encoding scheme
In this implementation, I use an encoding scheme which is not a Huffman tree
based code. The bit allocation is shown in figure 4.12. Eight bit inputs are variable length
encoded into 3 to 18 bits. The encoding is implemented by two look-up tables on the
FPGA. Given an eight bit input, the first look-up table (LUT) provides the size of the
encoding. The second LUT gives the actual encoding. Only the relevant bits from the
second LUT should be used; the rest of the bits in the output are don't cares and are
chosen as either logic 0 or 1 during logic optimization.
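The two-LUT scheme can be modelled as below. LEN_LUT and CODE_LUT are hypothetical stand-ins for the FPGA tables; the actual bit allocation of figure 4.12 is not reproduced here:

```python
# Hypothetical stand-ins for the two FPGA look-up tables.  The real
# code assignment comes from the thesis' figure 4.12, not shown here.
LEN_LUT = {0x10: 3, 0x20: 8, 0x30: 18}               # code length per input
CODE_LUT = {0x10: 0b101, 0x20: 0b11010011, 0x30: 0x2ABCD}  # code bits

def encode_symbol(byte):
    """Return (bits, length) for an 8-bit input.  Masking to `length`
    bits mirrors the don't-care outputs of the second LUT, of which
    only the relevant low bits are used."""
    bits, length = CODE_LUT[byte], LEN_LUT[byte]
    return bits & ((1 << length) - 1), length
```

Whatever the don't-care bits settle to after logic optimization, the mask guarantees only the meaningful bits reach the bit packer.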
[Figure content: Entropy encoder — the 8 bit input feeds a LUT for the encoding length (5) and a LUT for the variable length encoding (18).]
4.3.4 Bit packing
The output of the entropy encoder varies from 3 to 18 bits. The bits need to be
packed into 32 bit words before being written back to the embedded memory. This is
achieved by the shifter discussed below. This shifter is inspired by the Xtetris
computer game and the binary search algorithm.
Shifter
The shifter consists of 5 register stages, each 32 bits wide. The input data can be
shifted (rotated) by 16 or latched without shifting, to stage 1. The data can be shifted by
8 or passed on straight from stage 1 to stage 2. Similarly data can be shifted by 4, 2, and
1 when moving between the remaining stages. Data is shifted from stage to stage, and is
accumulated at the last stage. When the last stage has 32 bits of data, a memory write is
initiated and the last stage is flushed.
The data is shifted to the right place over the 5 stages in order to complete a word
at the last stage. The key decision is whether to shift or not at each stage. A 5 bit counter
is maintained to store the length of the data currently held. For example, let the lengths
of the words arriving at stage 1 be a1, a2, a3, etc. The counter will have the values 0, a1,
a1+a2, etc. in the corresponding clock cycles. The counter is allowed to overflow once it
reaches 31. Thus, the counter value indicates where the next word should start by the
time it reaches the last stage. Different bits of the counter (delayed appropriately) are
used to decide whether to shift or not at each stage.
Part of the last stage needs double buffering. To determine the size of the double buffer
needed, consider the worst case: the last stage already has 31 bits and the next data
coming from stage 4 is of maximum size (18 bits). Only 1 out of the 18 bits can be added
to the last stage before a memory write is initiated. The remaining 17 bits need to be
buffered for this cycle and brought out in the next cycle. Thus, 17 out of the 32 bits in
the last stage are double buffered: whenever an overflow is detected, the double buffer is
loaded with the excess bits, which are taken out during the next cycle.
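The word-packing behaviour of the shifter (not its 5-stage binary-shift structure) can be sketched as:

```python
def pack_words(symbols):
    """Pack (bits, length) pairs into 32-bit words.  A word is emitted
    whenever 32 bits have accumulated; the excess bits (up to 17, the
    double-buffered part) carry over into the next word.  Behavioural
    model only, not the 5-stage register structure."""
    words, acc, fill = [], 0, 0
    for bits, length in symbols:
        acc = (acc << length) | (bits & ((1 << length) - 1))
        fill += length
        if fill >= 32:
            excess = fill - 32
            words.append((acc >> excess) & 0xFFFFFFFF)  # completed word
            acc &= (1 << excess) - 1                    # buffered leftover
            fill = excess
    return words, acc, fill                             # leftover bits too
```

Two 16-bit inputs complete exactly one word with nothing buffered, while a lone 3-bit code simply accumulates.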
The output file written has all the information needed to reconstruct the image.
The format of the output file generated is shown in figure 4.15.
Figure 4.19 Output file format
Figure 4.20 Stage 2, data flow diagram
The control flow is shown in figure 4.17. Before reading the wavelet coefficients,
the maximum and minimum of coefficients in each sub-band are read from the lower
memory. The coefficients are then read and processed for each sub-band, starting with
the lowest frequency band. As shown in the state diagram, a memory read is fired in
stage Read 001. Memory read has a latency of 2 clock cycles. The result of the read is
finally available in state Read 100. Memory writes are completed in the same cycle. The
two intermediate states, Read 010 and Write can be used to write back the output, if
output is available. Each memory read brings in two wavelet coefficients. Consider the
worst case, where the two coefficients get expanded to 18 bits each. There are then two
memory write cycles before the next read. Whenever a memory write is performed, the
memory address register is incremented.
The read address generators read each sub-band from the interleaved memory
pattern. The output is written as a continuous stream, starting with the lowest sub-band.
Thus, the output is effectively in Mallat ordering and can be progressively transmitted or
decoded.
Figure 4.21 Stage 2, control flow diagram
4.4 Stage 3
Stage 3 decodes the compressed data into wavelet coefficients. The compressed
image data is entropy decoded, run length zeroes are decoded to the zero threshold value,
and the quantized values are dequantized to the coefficient values.
The decoding is implemented by a single look-up table on the FPGA. The 18 bit
input pattern is searched in the look-up table and the corresponding eight bit pattern is
given as output.
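The table-based decoding can be modelled as a prefix search; the code_table contents here are hypothetical stand-ins for the real assignment stored in the FPGA LUT:

```python
def entropy_decode(bitstream, code_table):
    """Prefix-decode a bit string using a {code_string: value} table,
    a behavioural stand-in for the single decode LUT on the FPGA.
    Codes are 3 to 18 bits; because the code is prefix-free, the
    first match at each position is unambiguous."""
    out, i = [], 0
    while i < len(bitstream):
        for length in range(3, 19):           # try the allowed code lengths
            word = bitstream[i:i + length]
            if word in code_table:
                out.append(code_table[word])
                i += length
                break
        else:
            raise ValueError("no code matches at bit %d" % i)
    return out
```

The prefix-free property guaranteed by the encoder is what makes this greedy first-match scan correct.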
Figure 4.22 Entropy Decoder
4.4.3 Dequantization
Here the quantized 4 bit values are dequantized into 16 bit wavelet coefficients.
The inputs to this block are the 16 bit maximum and 16 bit minimum values of the block
and the 4 bit quantized value. Based upon the 4 bit value, the 16 bit coefficient value is
calculated as shown in Table 4.2. Thus, coefficient values for each block are calculated
with their respective maximum and minimum values.
0000 ⇒ Min + [1 * (Max − Min)] / 16
0001 ⇒ Min + [2 * (Max − Min)] / 16
0010 ⇒ Min + [3 * (Max − Min)] / 16
0011 ⇒ Min + [4 * (Max − Min)] / 16
...
1100 ⇒ Min + [13 * (Max − Min)] / 16
1101 ⇒ Min + [14 * (Max − Min)] / 16
1110 ⇒ Min + [15 * (Max − Min)] / 16
1111 ⇒ Max
Table 4.2 Coefficient values calculation for Dequantizer
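The mapping of Table 4.2 can be sketched as follows; integer arithmetic is used, and the hardware's exact rounding is an assumption:

```python
def dequantize(q, cmin, cmax):
    """Reconstruct a 16-bit coefficient from a 4-bit level using the
    mapping of Table 4.2: level q maps to cmin + (q+1)(cmax-cmin)/16,
    except that level 15 maps to cmax itself."""
    if q == 15:
        return cmax
    return cmin + (q + 1) * (cmax - cmin) // 16
```

For a sub-band with Min = 0 and Max = 160, level 0000 reconstructs to 10 and level 1110 to 150, matching the table row by row.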
Figure 4.24 Dequantizer
4.4.4 Stage 3, Overall architecture
Figure 4.25 Stage 3 Data Flow Diagram
Figure 4.27 Coefficient values processing along Y direction
4.6 Implementation
The VHDL top level modules (PEs) are added to the NIOS system as custom
logic peripherals. Initially the NIOS system is created using the NIOS core processor,
SRAM, SDRAM, JTAG UART and PIOs. Then the top level VHDL module of each stage
is added as a custom logic peripheral to the NIOS system. The block diagram of the NIOS
system with the custom logic peripherals is shown in Figure 4.24.
[Figure content: Ethernet PHY chip and system module — master (M) and slave (S) ports connected through the Avalon Switch Fabric.]
Figure 4.29 Custom Logic System Module with Avalon Switch Fabric
The top level VHDL module for each stage is named a Processing Element
(PE). The PE uses a set of memory and control signals to access memory and communicate
with the host. The PE along with the memory and the control signals is shown in Figure 4.25.
Figure 4.30 Block Diagram of Processing Element (PE)
PE and memory create a block interface to the outside world (host). Signals
having arrows pointing to the block denote the input signals to PE and memory interface.
Signals having arrows leaving the block denote the output signals from interface to host.
The timing diagram for the PE accessing its memory is shown in Figure 4.27. Each
PE communicates with its memory through a 22-bit address bus, a 32-bit bi-directional
data bus, a read-write select control signal PE_MemWriteSel_n, and a strobe signal
PE_MemStrobe_n. When the PE is intended to do the processing, NIOS enables the PE by
driving the PE_MemBusGrant_n signal to ‘0’. PE_MemStrobe_n should be set to ‘0’ when
the PE is accessing the memory. A ‘0’ on the PE_MemWriteSel_n signal denotes a write access and
a ‘1’ denotes a read access. The PE is required to assert the address of the access on the address
signal. If the PE wants to write memory, the data to be written must be asserted in the same
cycle as the address assertion. If the PE wants to read data from memory, the data will be
available three clock cycles after the PE asserts the address.
Figure 4.32 Timing Diagram for Memory Read after a Write Access
CHAPTER 5
RESULTS
Forward Wavelet X
Figure 5.1 VHDL Simulation Output for Forward Wavelet X
Figure 5.1 captured from Model Sim-Altera shows the timing of the Forward Wavelet X.
Fwav_p2, Fwav_p3 are the input signals, Fwav_f, Fwav_g are the output signals.
Fwavstart and Fwavend signal the starting and ending of the row data. Fwavclk
provides the clock signal to the component with a time period of 100 ns. In every cycle 2
pixel values are given to the input lines, and 2 fs or 2 gs are latched on to the output lines.
Forward Wavelet Y
Figure 5.2 VHDL Simulation Output for Forward Wavelet Y
Figure 5.2 captured from Model Sim-Altera shows the timing of the Forward Wavelet Y.
Fwav_a4, Fwav_b4 are the input signals, Fwav_a, Fwav_b are the output signals.
Fwavstart and Fwavend signal the starting and ending of the column data. Fwavclk
provides the clock signal to the component with a time period of 100 ns. In every cycle 2
pixel values are given to the input lines, and 2 fs or 2 gs are latched on to the output lines.
Stage1 Top Level
Figure 5.3 VHDL Simulation Output for Stage1 Top Level
Figure 5.3 captured from Model Sim-Altera shows the timing of the Stage1 toplevel.
Data_InReg is the input signal; Data_OutReg and Addr_OutReg are the output signals.
MemReadSel_n, MemWriteSel_n and MemStrobe_n are the control signals used for
synchronizing the read and write operations. Fwavclk provides the clock signal to the
component with a time period of 100 ns. In every cycle 2 pixel values are given to
Data_InReg, and 2 fs or 2 gs are latched on to Data_OutReg.
Inverse Wavelet
Figure 5.4 VHDL Simulation Output for Inverse Wavelet
Figure 5.4 captured from Model Sim-Altera shows the timing of the Inverse Wavelet.
Fwav_f, Fwav_g are the input signals; Fwav_p2, Fwav_p3 are the output signals.
Fwavstart signals the starting of the row data. Fwavclk provides the clock signal to the
component with a time period of 100 ns. In every cycle 2 coefficient values (2 fs or 2 gs)
are given to the input lines, and 2 pixel values are latched on to the output lines.
Quantizer
Figure 5.5 VHDL Simulation Output for Quantizer
Figure 5.5 captured from Model Sim-Altera shows the timing of the Quantizer.
QUANTin, QUANTmax, QUANTmin are the input signals, QUANTout is the output
signal. QUANTclk provides the clock signal to the component with a time period of
100 ns. For every sub band the max and min values are given and remain until all the
coefficients of the sub band are quantized. In every cycle a coefficient value is given to
QUANTin, and the quantized 4 bit output is latched on to QUANTout.
Dequantizer
Figure 5.6 VHDL Simulation Output for Dequantizer
Figure 5.6 captured from Model Sim-Altera shows the timing of the Dequantizer.
QUANTin, QUANTmax, QUANTmin are the input signals, QUANTout is the output
signal. QUANTclk provides the clock signal to the component with a time period of
100 ns. For every sub band the max and min values are given and remain until all the
quantized values of the sub band are dequantized. In every cycle a quantized 4 bit value
is given to QUANTin, and the coefficient value is latched on to QUANTout.
Run-Length Encoder
Figure 5.7 VHDL Simulation Output for Run‐Length Encoder
Figure 5.7 captured from Model Sim-Altera shows the timing of the Run Length
Encoder. RLEin, RLEzeroth are the input signals, RLEout, RLEspellend, RLErunning
are the output signals. RLEclk provides the clock signal to the component with a time
period of 100 ns. For every sub band RLEzeroth is given and remains until all the values
of the sub band are run length coded. In every cycle a coefficient value is given to RLEin
and compared to RLEzeroth; if it is less than RLEzeroth, the zero count is activated by
asserting RLErunning and the number of zeroes is counted. When the run of zeroes is
broken, the count is latched on RLEout and RLEspellend is made high for 1 clock
cycle.
Shifter
Figure 5.8 VHDL Simulation Output for Shifter
Figure 5.8 captured from Model Sim-Altera shows the timing of the shifter. SFTRdatain,
SFTRlenin are the input signals, SFTRout, SFTRouten are the output signals. SFTRclk
provides the clock signal to the component with a time period of 100 ns. In every clock
cycle SFTRlenin gives the size of the valid data on SFTRdatain. As soon as the
accumulated data reaches 32 bits, the 32 bit packed output is latched on SFTRout and
SFTRouten is asserted for 1 clock cycle.
5.3 NIOS II Implementation Results
The encoder runs in two stages and the decoder runs in two stages. A raw frame of
512 by 512 pixels is loaded into the embedded memory. The starting address of the image
data memory location and the control are transferred to stage 1 by NIOS. After each
stage finishes its processing on this memory, it interrupts NIOS to signal the end of the
operation; NIOS then transfers the control and the starting address of the memory locations
to the next stage. The hardware configuration runs at a system clock of 10 MHz. For each
level of compression the hardware is reconfigured and downloaded onto the FPGA.
The embedded memory is loaded and unloaded by the host computer using the
operating system driver routines. The control software used for loading the memory
and servicing the interrupts of each stage is written in C. The device driver routines provided
by the board vendor are employed for this task.
Figure 5.9 PSNR & RMSE Equations
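For reference, the usual definitions behind Figure 5.9, assuming the standard 8-bit peak value of 255:

```python
import math

def rmse(orig, recon):
    """Root mean square error between two equally sized pixel lists."""
    n = len(orig)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(orig, recon)) / n)

def psnr(orig, recon, peak=255):
    """Peak signal-to-noise ratio in dB for 8-bit gray scale images:
    PSNR = 20 * log10(peak / RMSE)."""
    e = rmse(orig, recon)
    return float('inf') if e == 0 else 20 * math.log10(peak / e)
```

These are the quantities tabulated against compression level for the LENA, BARBARA and GOLD HILL images below.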
5.3.2 Results analysis for LENA (512 x 512) image
Figure 5.10 Original Image of LENA
Table 5.1 Compression Level & Noise Measurements of LENA Image
Reconstructed Image after Reconstructed Image after Reconstructed Image after
Minimum Compression Medium Compression Maximum Compression
Figure 5.11 Reconstructed Images of LENA
Figure 5.12 Original Image of BARBARA
Table 5.2 Compression Level & Noise Measurements of BARBARA Image
Reconstructed Image after Reconstructed Image after Reconstructed Image after
Minimum Compression Medium Compression Maximum Compression
Figure 5.13 Reconstructed Images of BARBARA
Figure 5.14 Original Image of GOLD HILL
Table 5.3 Compression Level & Noise Measurements of GOLD HILL Image
Reconstructed Image after Reconstructed Image after Reconstructed Image after
Minimum Compression Medium Compression Maximum Compression
Figure 5.15 Reconstructed Images of GOLD HILL
5.4 Performance Graphs (Compression Ratio vs PSNR (dB))
[Plots of PSNR (dB) against compression ratio. Data points (compression ratio, PSNR):
LENA: (9.11, 30.894), (47.18, 29.53), (69.58, 28.05)
BARBARA: (8.82, 25.017), (32.01, 24.47), (53.33, 23.42)
GOLD HILL: (8.7, 30.038), (43.18, 28.11), (72.09, 26.41)]
CONCLUSIONS
FUTURE WORK
The lessons learned from this work will help us enhance similar implementations in the
future. A few of the improvements that we now foresee are listed below:
APPENDIX
REFERENCES