
First Edition

U. Chuks

6/1/2010
Copyright © 2010 by U. Chuks

Cover design by U. Chuks


Book design by U. Chuks

All rights reserved.

No part of this book may be reproduced in any form or by any electronic or mechanical means, including information storage and retrieval systems, without permission in writing from the author. The only exception is by a reviewer, who may quote short excerpts in a review.

U. Chuks
Visit my page at
http://www.lulu.com/spotlight/Debarge
Contents

Preface
Chapter 1 – Introduction
  1.1 Overview of Digital Image Processing
    1.1.1 Application Areas
  1.2 Digital Image Filtering
    1.2.1 Frequency Domain
    1.2.2 Spatial Domain
  1.3 VHDL Development Environment
    1.3.1 Creating a new project in ModelSim
    1.3.2 Creating a new project in Xilinx ISE
    1.3.3 Image file data in VHDL image processing
    1.3.4 Notes on VHDL for Image Processing
  References
Chapter 2 – Spatial Filter Hardware Architectures
  2.1 Linear Filter Architectures
    2.1.1 Generic Filter architecture
    2.1.2 Separable Filter architecture
    2.1.3 Symmetric Filter Kernel architecture
    2.1.4 Quadrant Symmetric Filter architecture
  2.2 Non-linear Filter Architectures
  Summary
  References
Chapter 3 – Image Reconstruction
  3.1 Image Demosaicking
  3.2 VHDL implementation
    3.2.1 Image Selection
  Summary
  References
Chapter 4 – Image Enhancement
  4.1 Point-based Enhancement
    4.1.1 Logarithm Transform
    4.1.2 Gamma Correction
    4.1.3 Histogram Clipping
  4.2 Local/neighbourhood enhancement
    4.2.1 Unsharp Masking
    4.2.2 Logarithmic local adaptive enhancement
  4.3 Global/Frequency Domain Enhancement
    4.3.1 Homomorphic filter
  4.4 VHDL implementation
  Summary
  References
Chapter 5 – Image Edge Detection and Smoothing
  5.1 Image edge detection kernels
    5.1.1 Sobel edge filter
    5.1.2 Prewitt edge filter
    5.1.3 High Pass Filter
  5.2 Image Smoothing Filters
    5.2.1 Mean/Averaging filter
    5.2.2 Gaussian Lowpass filter
  Summary
  References
Chapter 6 – Colour Image Conversion
  6.1 Additive colour spaces
  6.2 Subtractive Colour spaces
  6.3 Video Colour spaces
  6.4 Non-linear/non-trivial colour spaces
  Summary
  References
Circuit Schematics
Creating Projects/Files in VHDL Environment
VHDL Code
Index

Preface
The relative dearth of books on the know-how involved in implementing image processing algorithms in hardware was the motivating factor in writing this book. It is written for those with a prior understanding of image processing fundamentals, who may or may not be familiar with environments such as MATLAB and VHDL. The subject is therefore addressed very early on, bypassing the fundamental theories of image processing, which are better covered in several contemporary books cited in the reference sections of the chapters.

By delving into the architectural design and implications of the chosen algorithms, the reader is familiarized with the tools necessary to take an algorithm from theory, through software, to a hardware architecture.

Though the book does not discuss the vast theoretical mathematical processes underlying image processing, it is hoped that by providing working examples of actual VHDL and MATLAB code, together with simulation results, the concepts of practical image processing can be appreciated.

This first edition attempts to provide a working aid to readers who wish to use the VHDL hardware description language to implement image processing algorithms from software.

Chapter 1
Introduction
Digital image processing is an extremely broad and ever-expanding discipline, as more applications, techniques and products utilize digital image capture in some form or another. From industrial processes like manufacturing to consumer devices like video games and cameras, image processing chips and algorithms have become ubiquitous in everyday life.

1.1 Overview of Digital Image Processing

Image processing can be performed in certain domains using:
- Point (pixel-by-pixel) processing operations.
- Local/neighbourhood/window mask operations.
- Global processing operations.

A list of the areas of digital image processing includes, but is not limited to:
- Image Acquisition and Reconstruction
- Image Enhancement
- Image Restoration
- Geometric Transformations and Image Registration
- Colour Image Processing
- Image Compression
- Morphological Image Processing
- Image Segmentation
- Object and Pattern Recognition

For the purposes of this book, we shall focus on the areas of Image Reconstruction, Enhancement and Colour Image Processing, and the VHDL implementation of selected algorithms from these areas.

1.1.1 Application Areas

- Image Reconstruction and Enhancement techniques are used in digital cameras, photography, TV and computer vision chips.
- Colour Image and Video Enhancement is used in digital video, photography, medical imaging, remote sensing and forensic investigation.
- Colour Image processing involves colour segmentation, detection, recognition and feature extraction.

1.2 Digital Image Filtering

Digital image filtering is a powerful and vital area of image processing. Convolution, the fundamental mathematical operation underpinning the process, makes filtering one of the most important and most studied topics in digital signal and image processing.

Digital image filtering can be performed in the frequency, spatial or wavelet domain. Operating in any of these domains requires a domain transformation: changing the representation of a signal or image into a form in which it is easier to visualize and/or modify the particular aspect of the signal one wishes to analyze, observe or improve upon.

1.2.1 Frequency Domain

Filtering in the frequency domain involves transforming an image into a representation of its spectral components, then using a frequency filter to modify the image by passing particular frequencies and suppressing or eliminating unwanted frequency components. The frequency transform may be the famous Fourier Transform or the Cosine Transform; other frequency transforms exist in the literature, but these are the most popular. The (Discrete) Fourier Transform is another core component of digital image processing and signal analysis. The transform is built on the premise that complex signals can be formed from fundamental, basic signals combined together spectrally. For a discrete image function f(x, y) of M × N dimensions with spatial coordinates x and y, the DFT is given as

F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \, e^{-j 2\pi \left( ux/M + vy/N \right)}    (1.2.1-1)

and its inverse transform back to the spatial domain is

f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v) \, e^{j 2\pi \left( ux/M + vy/N \right)}    (1.2.1-2)

where F(u, v) is the discrete image function in the frequency domain with frequency coordinates u and v, and j is the imaginary unit. The basic steps involved in frequency domain processing are shown in Figure 1.2.1(i).

Figure 1.2.1(i) – Fundamental steps of frequency domain filtering: pre-processing, Fourier transform, frequency domain filter, inverse Fourier transform, post-processing
The frequency domain is more intuitive because the spatial image information is transformed into frequency-dependent information, making it easier to analyze image features across a range of frequencies. Figure 1.2.1(ii) illustrates the frequency transformation of the spatial information inherent in an image.

Figure 1.2.1(ii) – (a) Image in spatial domain (b) Image in frequency domain

1.2.2 Spatial Domain

Spatial domain processing operates on signals in two-dimensional space or higher, e.g. greyscale, colour and MRI images. Spatial domain image processing can involve point-based, neighbourhood/kernel/mask or global processing operations.

Spatial domain mask filtering involves convolving a small spatial filter kernel or mask with successive local regions of the image, repeating the operation until the entire image is processed. Linear spatial filtering processes each pixel as a linear combination of the surrounding, adjacent neighbourhood pixels, while non-linear spatial filtering uses statistical, set-theoretic or logical if-else operations to process each pixel; examples include the median and variance filters used in image restoration. Figure 1.2.2(i) shows the basics of spatial domain processing, where Ii(x, y) is the input image and Io(x, y) is the processed output image.

Figure 1.2.2(i) – Basic steps in spatial domain filtering: Ii(x, y) → pre-processing → filter function → post-processing → Io(x, y)

Spatial domain filtering is highly favoured in hardware image processing implementations due to the practical feasibility of employing it in real-time industrial processes. Figure 1.2.2(ii) shows plots of the frequency response and the spatial domain equivalent of high-pass and low-pass filters.

Figure 1.2.2(ii) – Low-pass filter in the (a) frequency domain (b) spatial domain and high-pass filter in the (c) frequency domain (d) spatial domain

This gives an idea of the span of the spatial domain filter kernels relative to their frequency domain counterparts.

Since many of the algorithms in this book involve spatial domain filtering techniques and their implementation in hardware description languages (HDLs), emphasis will be placed on spatial domain processing throughout the book.

1.3 VHDL Development Environment

VHDL is a language for describing the behaviour of digital hardware devices and highly complex circuits such as FPGAs, ASICs and CPLDs; in other words, it is a hardware description language (HDL). Verilog is the other commonly used HDL, and VHDL itself derives its syntax from the Ada programming language. VHDL is preferred here because it is an open, freely available standard with a large base of user input and support helping to improve and develop the language further. There have been several revisions of the language since its inception in the 1980s, with varying syntax rules.

Tools for hardware development with VHDL include popular software such as ModelSim for simulation, and the Xilinx ISE tools and Leonardo Spectrum for complete circuit design and development. With software environments like MathWorks MATLAB and Microsoft Visual Studio, image processing algorithms and theory can now be much more easily implemented and verified in software before being rolled out into physical digital hardware.

We will be using the Xilinx software, and the ModelSim software for Xilinx devices, for the purposes of this book.
1.3.1 Creating a new project in ModelSim

Before proceeding, the ModelSim software from Mentor Graphics must be installed and enabled. Free ModelSim software can be downloaded from the Xilinx website or other sources; the one used for this example is a much earlier version of ModelSim (6.0a) tailored for Xilinx devices.

Once ModelSim is installed, run it, and a window like the one in Figure 1.3.1(i) should appear.

Figure 1.3.1(i) – ModelSim starting window

Close the welcome page and click on File, select New ->
Project as shown in Figure 1.3.1(ii).

Click on the Project option and a dialog box appears as shown in Figure 1.3.1(iii). You can then enter the project name; however, we first select an appropriate location to store all project files, to keep a more organized work folder. Thus, click on Browse and the dialog box shown in Figure 1.3.1(iv) appears. Now we can navigate to an appropriate folder, or create one if it doesn't exist. In this case, a previously created folder called 'colour space converters' is used to store the project files. Clicking 'OK' returns us to the 'Create a New Project' dialog box; we name the project 'Colour space converters' and click 'OK'.

Figure 1.3.1(ii) – Creating a new project in ModelSim
A small window appears for us to add a new or existing file
as shown in Appendix B, Figure B1.

Since we would like to add a new file for illustrative purposes, we create a file called 'example_file' as in Figure B3, and it appears in the left-hand workspace as depicted in Figure B4.

Then we add existing files by clicking 'Add Existing File', navigating to the relevant files and selecting them as shown in Figure B5. They now appear alongside the newly created file, as shown in Figure B6.

The rest of the process is easy to follow. For further instruction, refer to Appendix B or the Xilinx sources listed at the end of the chapter.

Now these files can be compiled before simulation, as shown in the subsequent figures.

Successful compilation is indicated by messages in green, while a failed compilation produces messages in red that indicate the errors and their locations, like all smart debugging editors for software development. Any errors are located and corrected and the files recompiled until there are no more syntax errors.

Figure 1.3.1(iii) – Creating a new project

Once there are no more errors, simulation of the files can begin. Clicking on the simulation tab will open a window to select the files to be simulated. However, you must first create a test bench file: a test bench is simply a test file that exercises your designed system to verify its correct functionality.

You can choose to add several more windows to view the ports and signals in your design.

Figure 1.3.1(iv) – Changing directory for new project

The newly created file is empty upon inspection, so we have to add some code to it. We start by importing the standard IEEE libraries needed, as shown in Figure 1.3.1(v), at the top of the blank file.

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

Figure 1.3.1(v) – Adding libraries

The "IEEE.std_logic_1164" and "IEEE.std_logic_arith" packages are the standard logic and standard logic arithmetic libraries, which provide the logic types and functions needed for most VHDL logic designs. (Note that std_logic_arith is a widely used vendor-defined package; later practice favours the IEEE-standard numeric_std package for arithmetic.)

With that done, the next step is to add the architecture of the system we would like to describe in this example file. The block diagram for the design we are going to implement in VHDL is shown in Figure 1.3.1(vi).

Figure 1.3.1(vi) – Top level system description of example_file (inputs: clk, rst and input_port; output: output_port)

This leads to the top level architecture description in VHDL code, shown in Figure 1.3.1(vii).

----TOP SYSTEM LEVEL DESCRIPTION-----
entity example_file is
  port ( --the collection of all input and output ports at the top level
    clk         : in  std_logic; --clock for synchronization
    rst         : in  std_logic; --reset signal for new data
    input_port  : in  bit;       --input port
    output_port : out bit        --output port
  );
end example_file;

Figure 1.3.1(vii) – VHDL code for black box description of example_file

The code in Figure 1.3.1(vii) is the textual description of the black box diagram shown in Figure 1.3.1(vi).

The next step is to detail the actual operation of the system and the relationship between the input and output ports; this is shown in the VHDL code in Figure 1.3.1(viii).

---architecture and behaviour of TOP SYSTEM LEVEL DESCRIPTION in more detail
architecture behaviour of example_file is
  ---list signals which connect input to output ports here, for example:
  signal intermediate_port : bit := '0'; --initialize to zero
begin ---start
  process(clk, rst) --process triggered by the clock or reset pin
  begin
    if rst = '0' then --reset all output ports
      intermediate_port <= '0'; --initialize
      output_port       <= '0'; --initialize
    elsif clk'event and clk = '1' then --operate on rising edge of clock
      intermediate_port <= not(input_port);                 --logical inverter
      output_port       <= intermediate_port or input_port; --logical OR
    end if;
  end process;
end behaviour; --end of architectural behaviour

Figure 1.3.1(viii) – VHDL code for operation of example_file

The first line of code in Figure 1.3.1(viii) defines the beginning of the behavioural level of the architecture. The next line defines a signal or wire that will be used to connect the input port to the output port; it is defined as a single bit and initialized to zero.

The next line indicates the beginning of a triggered process that responds to both the clock and reset signals.

The if…elsif…end if statements indicate which actions and statements to trigger when the stated conditions are met.

The actual logical operation occurs on the rising edge of the clock: the intermediate signal takes the value from the input port and inverts it, while the output port performs the logical 'or' of the inverted and non-inverted signals to produce the output value. Though this is an elaborate circuit design for a simple inverter operation, it illustrates several aspects that will be recurring themes throughout the book.
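
As a concrete illustration of a test bench for this design, the following minimal sketch instantiates example_file, generates a clock and applies a simple stimulus. The clock period and the stimulus values here are assumptions chosen purely for illustration.

---minimal test bench sketch for example_file (clock period and stimulus assumed)
library IEEE;
use IEEE.std_logic_1164.all;

entity example_file_tb is
end example_file_tb;

architecture test of example_file_tb is
  component example_file
    port ( clk         : in  std_logic;
           rst         : in  std_logic;
           input_port  : in  bit;
           output_port : out bit );
  end component;
  signal clk         : std_logic := '0';
  signal rst         : std_logic := '0';
  signal input_port  : bit       := '0';
  signal output_port : bit;
begin
  uut : example_file
    port map (clk => clk, rst => rst,
              input_port => input_port, output_port => output_port);

  clk <= not clk after 10 ns; --free-running clock, 20 ns period (assumed)

  stimulus : process
  begin
    rst <= '0';        --hold the design in reset
    wait for 25 ns;
    rst <= '1';        --release reset
    input_port <= '1'; --drive a test value
    wait for 40 ns;
    input_port <= '0';
    wait;              --stop driving; observe waveforms in ModelSim
  end process;
end test;

Compiling this file alongside example_file and simulating it in ModelSim allows the output_port waveform to be checked against the expected inverter/OR behaviour.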

1.3.2 Creating a new project in Xilinx ISE

As with ModelSim, software for evaluating VHDL designs on FPGA devices can be downloaded free of charge: for example, Leonardo Spectrum supports Altera and Actel FPGAs, while the Project Navigator software from Xilinx supports Xilinx devices. The Xilinx ISE version used in this book is 7.1.

Once the software has been fully installed, we can begin. Opening the program brings up a welcome screen, just as when we launched ModelSim.

Creating a project in the Xilinx ISE is similar to the process in ModelSim; however, one has to select the specific FPGA device onto which the design is to be loaded. This is because the design must be mapped onto a physical device, and the ISE software contains special algorithms that emulate the actual hardware device to ensure that the design is safe and error-free before being downloaded to a real device. This avoids costly errors and damage to the device from incorrectly routed pins when designing for large and expensive devices like ASICs.

A brief introduction to creating a project in Xilinx is shown in Figures 1.3.2(i) – 1.3.2(iv).

Figure 1.3.2(i) – Opening the Xilinx Project Navigator

We then click 'OK' on the welcome dialog box to access the project workspace. Then click on File, select New Project as shown in Figure 1.3.2(ii), and enter a new name for the project as shown in Figure 1.3.2(iii). Then click 'Next': the window shown in Figure 1.3.2(iv) prompts you to select the FPGA hardware device family in which your final design is going to be implemented. We select the Xilinx Spartan 3 FPGA chip, indicated by the part number xc3s200, with package ft256 and speed grade -4; this device is referred to as 3s200ft256-4 in the Project Navigator.

We leave all the other options as they are, since we will be using the ModelSim simulator and the VHDL language for most of the work, and will only implement the final design after correct simulation and verification.

Depending on the device you are targeting, the device family name will be different. Note, however, that the free software does not include every FPGA device in every available device family in its database, so for some devices you will not be able to generate a programming file for download to an actual FPGA.

The design process from theoretical algorithm description to circuit development and flashing to an FPGA device is a non-linear exercise, as the design may need to be optimized and/or modified depending on the design constraints of the project.

Figure 1.3.2(ii) – Creating a new project in Xilinx Project Navigator

Figure 1.3.2(iii) – Creating a new project name

Figure 1.3.2(iv) – Selecting a Xilinx FPGA target device

Clicking Next through the next set of options allows you to add HDL source files, similar to ModelSim. You can add them here, or just click through to create the project and then add the files manually, as in ModelSim.

1.3.3 Image file data in VHDL image processing


Figure 1.3.3 shows an image in the form of a text file, which will be read using the textio library in VHDL. A software program was written to convert image files to text so that they can be processed. The images can be converted to any numerical representation, including binary or hexadecimal (to save space); integers were chosen for easy readability and debugging, and for illustration of the concepts. Afterwards, another software program converts the processed text files back to images for viewing.
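
On the VHDL side, a test bench can read such a file with the standard std.textio package. The following minimal sketch reads one integer pixel per line; the file name and the one-pixel-per-line layout are assumptions for illustration.

---sketch: reading integer pixel values from a text file in a test bench
library IEEE;
use IEEE.std_logic_1164.all;
use std.textio.all;

entity image_read_tb is
end image_read_tb;

architecture test of image_read_tb is
begin
  read_image : process
    file img_file     : text open read_mode is "lena_256x256.txt"; --assumed name
    variable img_line : line;
    variable pixel    : integer;
  begin
    while not endfile(img_file) loop
      readline(img_file, img_line); --fetch one line of the text file
      read(img_line, pixel);        --parse an integer pixel value
      --here the pixel would be driven onto the filter's input port,
      --one value per clock cycle
    end loop;
    wait; --done reading
  end process;
end test;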
Writing MATLAB code is the easiest and quickest way of doing this conversion when working with VHDL. MATLAB also enables fast and easy prototyping of algorithms without re-inventing the wheel and being forced to write every function needed to perform standard operations, especially image processing algorithms. This is why it was chosen over the .NET environment.

Coding in VHDL is a much different experience from coding in MATLAB, C++ or Java, since it describes hardware, which has to be designed as circuits rather than simply as software programs.

VHDL makes it much easier to describe highly complex circuits that would be impractical to design with basic logic gates, and it infers the fundamental logical behaviour from the nature of the operations described in the code. In that sense, it is similar to the Unified Modeling Language (UML) used to design and model large, complex object-oriented software systems in software engineering.

SIMULINK in MATLAB is similar in this respect, and new tools have been developed to allow designers with little to no knowledge of VHDL to work with MATLAB and VHDL code. However, the costs of these tools are quite prohibitive for the average designer with a small budget.

FPGA system development requires a reasonable amount of financial investment, and the cost of the actual prototype hardware chip can be considerable, in addition to the software tools needed to support the hardware. With the free tools described here and a little time spent learning VHDL, however, designing new systems becomes much more fulfilling and gives coders the chance to really learn how the code and the system they are building work at both the macro and micro level. Extensive periods spent debugging VHDL code will also make the coder a much better programmer.

Figure 1.3.3 – Image as a text file to be read into a VHDL test bench

1.3.4 Notes on VHDL for Image Processing

Most readers of this book have probably had some exposure to programming, or have at least heard of languages and packages like C++, Java, C, C#, Visual Basic and MATLAB. Fewer people are aware of HDLs like VHDL and Verilog, which make it much easier to design large, complex circuits for digital hardware chips like ASICs, FPGAs and CPLDs used in highly sophisticated systems and devices.

When using high-level languages like C# and MATLAB, writing programs to perform mathematical tasks and operations is much easier, and users can draw on existing libraries to build larger systems that perform more complex mathematical computations without thinking much about them.

With a language like VHDL, however, performing certain mathematical computations, such as statistical calculations or even division, requires careful system design and planning if the end product is to be a fully synthesizable circuit for downloading to an FPGA. In other words, floating-point calculation in VHDL for FPGAs is a painful and difficult task for the uninitiated and for those without developer and design resources. Some hardware vendors have developed their own specialized floating-point cores, but these come at a premium cost and are not for the average hardware design hobbyist. Floating-point calculations take up a lot of device resources, as do operations like division, especially by divisors that are not powers of two. Thus, most experienced designers prefer to work with fixed-point arithmetic.
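
As a small illustration of the fixed-point style, the following sketch scales an 8-bit pixel by the fraction 3/8 using only an integer multiply and a right shift, avoiding a divider entirely; the entity name and widths are assumptions for illustration.

---sketch: fixed-point scaling of an 8-bit pixel by 3/8, with no divider
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity fixed_point_scale is
  port ( pixel_in  : in  unsigned(7 downto 0);
         pixel_out : out unsigned(7 downto 0) );
end fixed_point_scale;

architecture rtl of fixed_point_scale is
begin
  process(pixel_in)
    variable product : unsigned(11 downto 0);
  begin
    product   := pixel_in * to_unsigned(3, 4); --multiply by the numerator, 3
    pixel_out <= product(10 downto 3);         --divide by 8 via a 3-bit right shift
  end process;
end rtl;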

For example, a program that calculates the logarithm, cosine or exponential of signal values is usually handled in software by calling a log, cosine or exponential function from the built-in library, without ever being aware of the algorithm behind the function. This is not the case with VHDL or hardware implementation. Though VHDL has libraries for these non-linear functions, the freely available functions are not synthesizable: they cannot be realized in digital hardware, so hardware design engineers must devise efficient architectures for these algorithms, or purchase hardware IP cores developed by FPGA vendors, before they can implement them on an FPGA.

The first obvious route to building these types of functions is to create a look-up table (LUT) of pre-calculated entries in addressable memory (ROM), which can then be accessed over a defined range of values. However, the size of the LUT can expand to unmanageable proportions and render the entire system inefficient, cumbersome and wasteful. A better approach therefore involves a mixture of pre-computed values and calculation of the remaining values, to reduce the memory size and increase efficiency. The LUT is thus a constantly recurring theme in the design of hardware systems that perform intensive mathematical computation and signal processing.
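
A minimal LUT in VHDL is just a constant array indexed by the input value, as in the sketch below; the table size and the entry values are illustrative placeholders, not a real function table.

---sketch: a small pre-computed look-up table (ROM) for a non-linear function
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity lut_rom is
  port ( addr     : in  unsigned(3 downto 0);   --16-entry table
         data_out : out unsigned(7 downto 0) );
end lut_rom;

architecture rtl of lut_rom is
  type rom_t is array (0 to 15) of unsigned(7 downto 0);
  --entries pre-computed offline (e.g. a scaled logarithm); placeholder values
  constant ROM : rom_t := (
    to_unsigned(0, 8),  to_unsigned(16, 8), to_unsigned(32, 8),
    to_unsigned(48, 8), others => to_unsigned(255, 8));
begin
  data_out <= ROM(to_integer(addr)); --simple asynchronous ROM read
end rtl;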

Usually, when a non-linear component is an essential part of an algorithm, the LUT becomes one alternative for implementing that crucial part, or an alternative algorithm may have to be devised in accordance with error trade-off curves. This is a standard theme of research papers and journals on digital logic circuits.

Newer and more expensive FPGAs now have a soft processor core built into them, giving the designer the flexibility of apportioning soft computing tasks to the processor on the FPGA while devoting more appropriate device resources to architectural demands. However, this raises the further challenges of real-time reconfigurable computing and of linking the soft-core and hard-core aspects of the system to work in tandem.

Most of the images used in this book are well known in the image processing community and were obtained from the University of Southern California Signal and Image Processing Institute website, with others from relevant research papers and online repositories.

References
- R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed.: Prentice Hall, 2002.
- R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB: Prentice Hall, 2004.
- W. K. Pratt, Digital Image Processing, 4th ed.: Wiley-Interscience, 2007.
- U. Nnolim, "FPGA Architectures for Logarithmic Colour Image Processing", Ph.D. thesis, University of Kent at Canterbury, Canterbury-Kent, 2009.
- MathWorks, "Image Processing Toolbox 6 User's Guide for use with MATLAB," The MathWorks, 2008, pp. 285-288.
- MathWorks, "Designing Linear Filters in the Frequency Domain," in Image Processing Toolbox for use with MATLAB: The MathWorks, 2008.
- MathWorks, "Filter Design Toolbox 4.5," 2009.
- Weber, "The USC-SIPI Image Database," University of Southern California Signal and Image Processing Institute (USC-SIPI), 1981.
- Zuloaga, J. L. Martín, U. Bidarte, and J. A. Ezquerra, "VHDL test bench for digital image processing systems using a new image format."
- Cyliax, "The FPGA Tour: Learning the ropes," in Circuit Cellar online, 1999.
- T. Johnston, K. T. Gribbon, and D. G. Bailey, "Implementing Image Processing Algorithms on FPGAs," in Proceedings of the Eleventh Electronics New Zealand Conference (ENZCon'04), Palmerston North, 2004, pp. 118-123.
- EETimes, "PLDs/FPGAs," 2009.
- Digilent, "http://www.digilentinc.com," 2009.
- E. R. Davies, Machine Vision: Theory, Algorithms, Practicalities, 3rd ed.: Morgan Kaufmann Publishers, 2005.
- Xilinx, "XST User Guide": http://www.xilinx.com, 2008.
- Xilinx, "FPGA Design Flow Overview (ISE Help)," vol. 2008: Xilinx, 2005.

Chapter 2
Spatial Filter Hardware Architectures
Prior to the implementation of the various filters, it is
necessary to lay the groundwork for the design of spatial
filter hardware architectures in VHDL.

2.1 Linear Filter Architectures

Using spatial filter kernels for image filtering in hardware systems has been a standard route for many hardware design engineers. As a result, various spatial domain architectures exist in company technical reports, academic journals and conference papers dedicated to FPGA-based image processing. This is not surprising, given the myriad of image processing applications that incorporate image filtering techniques.

Such applications include, but are not limited to, image contrast enhancement/sharpening, demosaicking, restoration/noise removal/deblurring, edge detection, pattern recognition, segmentation and inpainting.

Several authors have published papers implementing a myriad of algorithms built on spatial filtering hardware architectures for FPGA platforms, performing different tasks or serving as add-ons for even more complex and sophisticated processing operations.

A sample application area in industrial processes is the detection of structural defects in manufactured products, using real-time imaging and edge detection techniques to remove damaged products from the assembly line.

Though frequency domain (Fourier transform) filtering may be faster for larger images and optical processes, spatial filtering with relatively small kernels makes several of these processes feasible for physical, real-time applications and reduces computational costs and resources in FPGA digital hardware systems.

Figure 2.1(i) shows one of the essential components of a spatial domain filter: the window generator, here for a 5 × 5 kernel, which evaluates the local region of the image.

Figure 2.1(i) – 5×5 window generator hardware architecture: each of the five line inputs passes through a row of five flip flops (FF) to its corresponding line output

The boxes represent the flip flops (FF) or delay elements, each box providing one delay. In digital signal processing notation, a flip flop is represented in the z-domain by z^{-1} and in the discrete-time domain as x[n-1], where x is the delayed signal. The data comes in from the left-hand side of the unit and each line is delayed by 5 cycles; for a 3 × 3 kernel, there would be three lines, each delayed by 3 cycles.

Figure 2.1(ii) shows the line buffer array unit, which consists of long shift registers composed of several flip flops. Each line buffer is set to the length of one row of the image; thus, for a 128 × 128 greyscale image with 8 bits per pixel, each line buffer is 128 elements long and 8 bits deep.

Figure 2.1(ii) – Line buffer array hardware architecture: Data_in feeds five line buffers (Line Buffer1 – Line Buffer5), which produce the outputs Line out1 – Line out5

The rest of the architecture would include adders, dividers, and multipliers or look-up tables; these are not shown, as they are much easier to understand and implement.

The main components of the spatial domain architectures are the window generator and the line delay elements. The delay elements for the line buffers can be built from First-In First-Out (FIFO) or shift register components.
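
As an illustration, one line buffer can be described as a shift register in a few lines of VHDL. The sketch below matches the 128 × 128, 8-bit example above; the entity name and generic are chosen for illustration.

---sketch of one line buffer as a shift register: 128 pixels of 8 bits each
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity line_buffer is
  generic ( LINE_LENGTH : integer := 128 );
  port ( clk       : in  std_logic;
         pixel_in  : in  unsigned(7 downto 0);
         pixel_out : out unsigned(7 downto 0) );
end line_buffer;

architecture rtl of line_buffer is
  type buf_t is array (0 to LINE_LENGTH-1) of unsigned(7 downto 0);
  signal buf : buf_t := (others => (others => '0'));
begin
  process(clk)
  begin
    if rising_edge(clk) then
      buf <= pixel_in & buf(0 to LINE_LENGTH-2); --shift one pixel per clock
    end if;
  end process;
  pixel_out <= buf(LINE_LENGTH-1); --delayed by one full image row
end rtl;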

The architecture of the processing elements is heavily determined by the mathematical properties of the filter kernels. For instance, the symmetric or separable nature of certain kernels is exploited in the hardware design to reduce multiply-accumulate operations. There are mainly three kinds of filter kernels: symmetric, separable-symmetric, and non-separable non-symmetric kernels. To understand the need for this classification, it is necessary to discuss how the number of mathematical operations grows when image processing algorithms are implemented in digital hardware.

2.1.1 Generic Filter architecture

In the standard spatial filter architecture, the filter kernel is defined as-is, and each coefficient of the kernel has its own dedicated multiplier and corresponding image window coefficient. This architecture is flexible for a given kernel size: any combination of coefficient values can be loaded without modifying the architecture in any way. However, it is inefficient when a set of coefficients in the filter share the same value, and the redundancy grows as the number of matching coefficients increases. It also becomes computationally complex as the kernel size increases, since more processing elements are needed to perform the full operation on a similarly sized image window. The utility of this filter is therefore limited to small kernel sizes, ranging from 3×3 to about 9×9; beyond this, the definition and instantiation of the architecture and its coefficients become unwieldy, especially in the hardware description languages used to program the devices. Figure 2.1.1 depicts an example of a generic 5×5 filter kernel architecture.

Figure 2.1.1 – Generic 5×5 spatial filter hardware architecture: Data_in passes through a chain of line buffers and flip-flop (FF) window registers; the 25 window values are multiplied by coefficients c0 – c24, and the partial products are summed in adder (∑) blocks to form Data_out

The 25 filter coefficients, c0 to c24, are multiplied with the values stored in the window generator grid of flip flops (FF). These coefficients are weights, which determine the extent of the contribution of the image pixels to the final convolution output. The partial products are then summed in the adder blocks; not shown in the diagram is a further adder block that sums all five sums of products. The final sum is divided by a constant value, usually chosen as a power of two for good digital design practice.
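
To make the multiply-accumulate structure concrete, the sketch below shows the core of a generic 3 × 3 version in VHDL: nine window values times nine coefficients, summed and then divided by a power of two (here 16) with a shift. The package, port names and widths are assumptions for illustration.

---sketch: multiply-accumulate core of a generic 3 x 3 spatial filter
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

package filter_types is
  type word_array is array (0 to 8) of signed(8 downto 0);
end package filter_types;

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use work.filter_types.all;

entity mac_3x3 is
  port ( clk      : in  std_logic;
         window   : in  word_array;  --pixels from the window generator
         coeff    : in  word_array;  --the nine kernel coefficients
         data_out : out signed(15 downto 0) );
end mac_3x3;

architecture rtl of mac_3x3 is
begin
  process(clk)
    variable acc : signed(21 downto 0);
  begin
    if rising_edge(clk) then
      acc := (others => '0');
      for i in 0 to 8 loop
        acc := acc + resize(window(i) * coeff(i), acc'length); --partial products
      end loop;
      data_out <= resize(shift_right(acc, 4), 16); --divide by 16 (a power of two)
    end if;
  end process;
end rtl;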

2.1.2 Separable Filter architecture

Separable filter kernel architectures are much more computationally efficient where applicable, and are particularly suited to low-pass filtering with Gaussian kernels (which have the separability property). The architecture reduces a two-dimensional N × N filter kernel to two one-dimensional filters of length N: a one-dimensional convolution along the rows (which is much cheaper than a 2-D convolution) is performed, followed by another along the columns. The savings in multiply-accumulate operations, resulting from the reduction in the number of processing elements demanded by the architecture, can be truly appreciated when designing very large convolution kernels. Since spatial domain convolution is already most efficient for small filter kernel sizes, separable spatial filter kernels further increase this efficiency, especially for large kernels compared with a generic filter architecture implementation.

Figure 2.1.2 depicts an example separable filter kernel architecture for a 5 × 5 spatial filter, now reduced to 5 coefficients, since the row and column filter coefficients are the same, one 1-D filter being the transpose of the other.

Figure 2.1.2 – Separable 5×5 spatial filter hardware architecture

Observing the diagram in Figure 2.1.2, it can be seen that the number of processing elements and filter coefficients has been dramatically reduced in this architecture; for example, the 25 coefficients of the generic filter architecture have been reduced to just 5 coefficients, which are reused.
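
The saving follows directly from the factorization of a separable kernel; in operator notation,

\[ K = \mathbf{k}\,\mathbf{k}^{T}, \qquad K \ast I = \mathbf{k} \ast \left( \mathbf{k}^{T} \ast I \right), \]

so an N × N convolution costing N^2 multiplies per pixel is replaced by two length-N convolutions costing 2N multiplies per pixel.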

2.1.3 Symmetric Filter Kernel architecture

Symmetric filter kernel architectures are more suited to high-pass and high-frequency emphasis (boost filtering) operations. Equal coefficient weights allow a reduction in the number of processing elements, and thereby in the number of multiply-accumulate operations: the set of image window pixels sharing a coefficient value is added together first, and the sum is then multiplied once by that coefficient. Figure 2.1.3(i) shows a Gaussian symmetric high-pass filter generated using the windowing method, while Figure 2.1.3(ii) depicts an example symmetric filter kernel architecture.

Figure 2.1.3(i) – Frequency domain response of a symmetric Gaussian high-pass filter obtained from a spatial domain symmetric Gaussian via the windowing method

Figure 2.1.3(ii) – 5 × 5 symmetric spatial filter hardware architecture

2.1.4 Quadrant Symmetric Filter architecture

The quadrant symmetric filter is basically one quadrant (a quarter) of a circularly symmetric filter kernel, reused as though rotated through 360 degrees. The hardware architecture is very efficient, since it occupies a quarter of the space normally used for a full filter kernel.

To summarize the discussion of spatial filter hardware architectures, it is useful to compare the savings in hardware resources in terms of reduced multiply-accumulate operations.

For an N × N spatial filter kernel, N × N multiplications and (N × N) - 1 additions are required per output pixel. For example, a 3 × 3 filter needs 9 multiplications and 8 additions per output pixel, while a 9 × 9 filter needs 81 multiplications and 80 additions.

Since multiplications are costly in terms of hardware, designs are geared towards reducing the number of multiplication operations or eliminating them entirely.

Table 2.1.4 summarizes the number of multiplication and addition operations per image pixel required for varying filter kernel sizes using the different filter architectures.

Kernel size | */pixel (GFKA) | +/pixel (GFKA) | */pixel (SFKA) | +/pixel (SFKA) | */pixel (Sym FKA) | +/pixel (Sym FKA)
3×3         | 9              | 8              | 6              | 4              | 4/3               | 8
5×5         | 25             | 24             | 10             | 8              | 6/5               | 24
7×7         | 49             | 48             | 14             | 12             | 8/7               | 48
9×9         | 81             | 80             | 18             | 16             | 10/9              | 80
13×13       | 169            | 168            | 26             | 24             | 14/13             | 168
27×27       | 729            | 728            | 54             | 52             | 28/27             | 728
31×31       | 961            | 960            | 62             | 60             | 32/31             | 960

Table 2.1.4 – MAC operations per pixel by filter kernel size and architecture type

KEY
*/pixel – Multiplications per pixel
+/pixel – Additions per pixel
GFKA – Generic Filter Kernel Architecture
SFKA – Separable Filter Kernel Architecture
Sym FKA – Circular Symmetric Filter Kernel Architecture

2.2 Non-linear Filter Architectures

Non-linear filter architectures are more complex than their linear counterparts, and their structure depends on the algorithm or on the order statistics it uses. Since most of the algorithms covered in this book involve linear filtering, we focus on linear spatial domain filtering.

Summary
In this chapter, we discussed several linear spatial filter hardware architectures used for implementing algorithms in FPGAs using VHDL, and analyzed the cost savings of each architecture with regard to the use of processing elements in hardware.

References
- U. Nnolim, "FPGA Architectures for Logarithmic Colour Image Processing", Ph.D. thesis, University of Kent at Canterbury, Canterbury-Kent, 2009.
- Cyliax, "The FPGA Tour: Learning the ropes," in Circuit Cellar online, 1999.
- E. Nelson, "Implementation of Image Processing Algorithms on FPGA Hardware," M.Sc. thesis, Department of Electrical Engineering, Vanderbilt University, Nashville, TN, 2000, p. 86.
- T. Johnston, K. T. Gribbon, and D. G. Bailey, "Implementing Image Processing Algorithms on FPGAs," in Proceedings of the Eleventh Electronics New Zealand Conference (ENZCon'04), Palmerston North, 2004, pp. 118-123.
- S. Saponara, L. Fanucci, S. Marsi, G. Ramponi, D. Kammler, and E. M. Witte, "Application-Specific Instruction-Set Processor for Retinex-Like Image and Video Processing," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 54, pp. 596-600, July 2007.
- EETimes, "PLDs/FPGAs," 2009.
- Google, "Google Directory," in Manufacturers, 2009.
- Digilent, "http://www.digilentinc.com," 2009.
- E. R. Davies, Machine Vision: Theory, Algorithms, Practicalities, 3rd ed.: Morgan Kaufmann Publishers, 2005.
- Xilinx, "XST User Guide": http://www.xilinx.com, 2008.
- Xilinx, "FPGA Design Flow Overview (ISE Help)," vol. 2008: Xilinx, 2005.
- MathWorks, "Designing Linear Filters in the Frequency Domain," in Image Processing Toolbox for use with MATLAB: The MathWorks, 2008.
- MathWorks, "Filter Design Toolbox 4.5," 2009.

Chapter 3
Image Reconstruction
The four stages of image retrieval, from camera sensor acquisition to display device, comprise demosaicking, white/colour balancing, gamma correction and histogram clipping. The process of interest in this chapter is the demosaicking stage, and the VHDL implementation of a demosaicking algorithm will also be described. The steps of colour image acquisition from the colour filter array are shown in Figure 3.

Figure 3 – Image acquisition process from camera sensor: demosaicking → colour balancing → gamma correction → histogram clipping

3.1 Image Demosaicking

Demosaicking attempts to reconstruct a full colour image, using interpolation techniques, from the incomplete sampled colour data produced by an image sensor overlaid with a colour filter array (CFA).

The Bayer array is the most common type of colour filter array used in colour sampling for image acquisition. Other methods of colour image sampling are the tri-filter and the Foveon sensor; references to these methods are listed at the end of the chapter.

Before we delve deeper into the mechanics of demosaicking, it is necessary to describe the Bayer filter array. This grid system involves a CCD or CMOS sensor chip with M columns and N rows, with a colour filter attached to the sensor in a certain pattern. For example, the colour filters could be arranged as in the Bayer colour filter array shown in Figure 3.1(i).
G R G R G

B G B G B

G R G R G

B G B G B

G R G R G

Figure 3.1(i) – Bayer Colour Filter Array configuration

Here R, G and B stand for the red, green and blue colour filters respectively, and the sensor chip produces an M × N array. There are two green pixels for every red and blue pixel in each 2×2 grid, because CFAs are designed to suit the human eye's greater sensitivity to green light.

The demosaicking process involves splitting the colour image into its separate colour channels and filtering each with an interpolating filter. The final convolution results from each channel are recombined to produce the demosaicked image.

The basic linear interpolation demosaicking algorithm is given for one channel of an RGB colour image in (3.1-1) to (3.1-5). Writing I_0 for the original channel, and using operator notation in which a product with a kernel denotes convolution:

I_1 = I_0 + h_1 \ast I_0    (3.1-1)

I_2 = h_2 \ast I_1    (3.1-2)

I_o = I_0 + I_1 + I_2    (3.1-3)

Yielding

I_o = I_0 + (1 + h_2) \ast (I_0 + h_1 \ast I_0)    (3.1-4)

Expressing the output image as a function of the input image gives the expression:

I_o = \left[ 1 + (1 + h_1)(1 + h_2) \right] I_0    (3.1-5)

where I_0, I_1 and I_2 are the original, interpolated stage-1 and stage-2 images respectively, I_o is the demosaicked output image, and h_1 and h_2 are interpolation kernels, usually consisting of an arrangement of ones and zeros. In this implementation, h_1 and h_2 are 3 × 3 spatial domain checkerboard kernels.

Note the redundant summation of I_1 and I_2 with I_0, the original image.
Keeping in mind that this is for one channel of an RGB colour image, the process is performed as described on the R and B channels, and in modified form on the G channel, as will be explained in the following subsections.

The system level diagram of the process for an RGB colour image is shown in Figure 3.1(ii):

Figure 3.1(ii) – Image demosaicking process: the R, G and B channels pass through convolution/interpolation and redundant summation to produce the demosaicked R', G' and B' channels

In the diagram, the convolution block performs the interpolation together with the redundant summation.

Some example images have been demosaicked to illustrate the results. The first example is the image shown on the left-hand side of Figure 3.1(iii), which needs to be demosaicked. More examples of demosaicking are shown in Figures 3.1(v) and 3.1(vi).

Figure 3.1(iii) – (a) Original undersampled RGB image overlaid with Bayer colour filter array and (b) demosaicked image

Figure 3.1(iv) – (a) Original undersampled R, G and B channels (b) Interpolated R, G and B channels
The images in Figure 3.1(iv) show the gaps in the image channel samples: the checkerboard pattern of black spaces/pixels in each channel indicates the colours lost between colour pixels.

A checkerboard filter kernel is generated and convolved with the images in Figure 3.1(iv)(a) to produce the interpolated images in Figure 3.1(iv)(b). As can be seen, most of the holes or black pixels have been filled. The images in Figure 3.1(iv)(b) can be filtered again with checkerboard filters to eliminate the lines still visible in the blue and red channels.

The reason the green channel is interpolated in one pass is that there are two green pixels for every red and blue pixel; the green channel therefore provides the strongest contribution in each 2 × 2 grid of the array.

It is important to note that there are various other demosaicking algorithms, including Pixel Binning/Doubling, Nearest Neighbour, Bilinear, Smooth Hue Transition, Edge-sensing Bilinear, Relative Edge-sensing Bilinear, Edge-sensing Bilinear 2, Variable Number of Gradients and Pattern Recognition interpolation methods. For more information about these methods, consult the sources listed at the end of the chapter. Comparisons between some of the methods are made using energy images in Figure 3.1(vii).

This is by no means an exhaustive list, but it indicates that demosaicking is a very important and broad field, as evidenced by the volume of published literature in research conference papers and journals.

Figure 3.1(v) – (a) Image with Bayer pattern (b) Demosaicked image

Figure 3.1(vi) – (a) Image combined with Bayer array pattern and demosaicked using (b) bilinear interpolation; (c) original image, demosaicked using (d) bilinear 2, (e) high quality bilinear and (f) the Gaussian-Laplacian method

It is important to note that modern digital cameras can store images in raw format, which enables users to accurately demosaick images in software without being restricted to the camera's hardware.

Figure 3.1(vii) – Energy images calculated using Sobel kernel operators for (a) the original image and (b) the image combined with the Bayer array pattern, and for images demosaicked using (c), (d) bilinear interpolation, (e) Gaussian smoothing with Laplacian and (f) pixel doubling

3.2 VHDL implementation

In this section, the VHDL implementation of the linear interpolation algorithm used in demosaicking RGB colour images is discussed. The first part of the chapter dealt with the software implementation, using MATLAB as the prototyping platform.

Using MATLAB, the implementation was quite trivial; in the hardware domain, however, the VHDL implementation of a synthesizable digital circuit for the demosaicking algorithm is a lot more involved, as we will discover.

Prior to coding in VHDL, the first step is to understand the dataflow and devise the architecture for the algorithm. A rough start is to draw a system level diagram that includes all the major processing blocks of the algorithm. A top level system diagram is shown in Figure 3.2(i).

Figure 3.2(i) – Black box system top level description of demosaicking: R, G and B in; R', G' and B' out

This is the black box system specification for the demosaicking algorithm. The next step is to go down a level into the demosaicking box to add more detail to the system. Figure 3.2(ii) shows the system level description of the first interpolation stage of the demosaicking algorithm for the R channel.

Figure 3.2(ii) – System level 1 description showing the first interpolation stage of the R channel: R passes through linear spatial filter 1 to give Rc, which is summed with R to produce Rs

The R channel is convolved with the linear spatial filter mask specified in the previous section and used in the MATLAB implementation. The convolved R channel, Rc, is then summed with the original R channel to produce an interpolated channel, Rs, which is passed on to the second interpolation stage shown in Figure 3.2(iii). In this stage, Rs is convolved with another linear spatial filter mask to produce a new signal, Rcs, which is subsequently summed with the original R channel and with the Rs output of the first interpolation stage. This produces the final interpolated channel, R', shown as the output in Figure 3.2(iii).

Figure 3.2(iii) – System level 1 description showing the second interpolation stage of the R channel: Rs passes through linear spatial filter 2 to give Rcs, which is summed with R and Rs to produce R'

The block diagrams shown in Figures 3.2(ii) and (iii) also apply to the B channel. For the G channel, only the first interpolation stage is needed, as shown in the original algorithm equations; the system level description for G is given in Figure 3.2(iv).

Figure 3.2(iv) – System level 1 description showing the interpolation stage of the G channel: G passes through linear spatial filter 1 to give Gc, which is summed with G to produce G'

The system design can also be done in SIMULINK, the visual system description component of MATLAB. The complete circuit would look like that shown in Figure 3.2(v).

Figure 3.2(v) – SIMULINK system description of the linear interpolation demosaicking algorithm: an Image From File source (dc168_lenna_bayer.png) feeds embedded MATLAB function blocks (interp_filter_r and interp_filter_r2, interp_filter_g, interp_filter_b and interp_filter_b2), whose R_prime, G_prime and B_prime outputs drive video viewers

The diagram designed in SIMULINK in Figure 3.2(v) is the system level architecture of the demosaicking algorithm with its major processing blocks.

The next step is to develop and design the crucial inner components of those major processing blocks. From the mathematical expression for the algorithm, we know that the system will incorporate 3×3 spatial filters and adders. This leads to the design specification for the spatial filter, the most crucial component of this algorithm.

Several spatial filter architectures exist in the research literature, with various modifications and specifications depending on the nature of the desired filter. These basic architectures were discussed in Chapter 2 and include the generic, separable, symmetric and separable symmetric forms. In this section, we choose the generic 3 × 3 filter architecture, using long shift registers rather than FIFOs for the line buffers feeding the filter. Recall that a hardware spatial filter architecture comprises a window generator, a pixel counter, line buffers, shift registers, flip-flops, adders and multipliers. Building on the spatial filter architectures discussed in Chapter 2, all that needs to be modified are the filter coefficients and the divider settings. Skeleton VHDL code, which can be adapted for this design, can be found in the Appendices.

A brief snippet of the VHDL code used in constructing the interpolation step for the R channel is shown in Figure 3.2(vi). The top part of the code in Figure 3.2(vi) includes the declarations of the necessary libraries and packages.

Figure 3.2(vi) – VHDL code snippet for specifying interpolation filter for
R channel


[Block diagram: inputs top_clk, top_rst, dat_in → Interp_filter_r → outputs dat_out, D_out_valid]
Figure 3.2(vii) – Visual system level description of the VHDL code snippet for specifying interpolation filter for R channel

The component specification of the "interp_mask_5x5_512" part in the VHDL code shown in Figure 3.2(vi) is embedded within the system level description of the interp_filter_r system, as described in Figures 3.2(ii) – 3.2(iii).

3.2.1 Image Selection


Now we select the image to process. For convenience, we choose the Lena image overlaid on a CFA array, as shown in Figure 3.2.1(i). The criteria for choosing this image include its familiarity to the image processing community and the fact that it is a square image (256 × 256), which makes it easier to specify in the hardware filter without having to pad the image or add extra pixels.

Figure 3.2.1(i) – Original image to be demosaicked


Based on what was discussed about demosaicking, we know that the easiest channel to demosaick is the green channel: there are two green pixels for every red and blue pixel in a 2 × 2 CFA array, so only one interpolation pass is required. We will therefore discuss the green channel last.

(a) (b)

(c) (d)
Figure 3.2.1(ii) – Demosaicked R image channel (software simulation): (a) original R channel, (b) filtered channel, Rc, from first stage interpolation, (c) filtered channel, Rcs, from second stage interpolation, (d) demosaicked image
In Figure 3.2.1(ii), we can observe the intermediate interpolation results of the spatial filter. Panel (a) is the original red channel, R, of the image in Figure 3.2.1(i). Panel (b) is the interpolated image from the first stage, Rc, from the diagram in Figure 3.2(ii). Panel (c) is the second interpolated image, Rcs, from Figure 3.2(iii), while (d) is the final demosaicked R channel, R’.

The images shown in Figure 3.2.1(iii) are the results obtained from the software (a) and hardware (b) simulations. The results show no visually perceptible difference, indicating that the hardware filter scheme was implemented correctly.

Since the visual results are this close, we do not attempt to quantify the accuracy by taking the difference between the images obtained from the software and hardware simulations.

The three image channels processed with both the software and hardware implementations of the demosaicking algorithm are shown for visual analysis.

The three channels are then recombined to create the composite RGB colour image, which is compared with the colour image obtained from the software simulation as well as the original CFA overlaid image in Figure 3.2.1(iv).


(a) (b)
Figure 3.2.1(iii) – Demosaicked images with (a) software simulation
and (b) hardware simulation: first row: R channel, second row: G
channel and third row: B channel


(a) (b)
Figure 3.2.1(iv) – Demosaicked colour image: (a) software simulation
(b) hardware simulation

Comparing the images in Figure 3.2.1(iv) shows the strikingly good result obtained from the hardware simulation, in addition to the successful removal of the CFA interference in the demosaicked image. On closer inspection, however, one may observe colour artifacts in regions of sharp frequency discontinuities in the image.

Also, because this image contains only a moderate amount of high frequency information, one can get away with this linear interpolation demosaicking method. For images with a lot of high frequency information, the limitations of linear methods become ever more apparent.

In Figure 3.2.1(v), we present the original CFA overlaid image alongside the demosaicked results for comparison, and the results are even more striking.

The investigation of more advanced methods is left to the reader who wishes to learn more. Some useful sources and research papers are listed at the end of the chapter for further research.

(a) (b) (c)


Figure 3.2.1(v) – (a) Image to be demosaicked (b) Demosaicked
image (software) (c) Demosaicked image (hardware simulation)

A snapshot of the ModelSim simulation window is shown in


Figure 3.2.1(vi) indicating the clock signal, the inputs and
outputs of the interpolation process.

Figure 3.2.1(vi) – Snapshot of VHDL image processing in ModelSim


simulation window

The system top level description generated by Xilinx ISE from the VHDL code is shown in Figure 3.2.1(vii). Since we are dealing with unsigned 8-bit images, we only require 8 bits per channel, giving 256 grey levels for each channel. The data_out_valid signal and the clock are needed for proper synchronization of the inputs and outputs of the system. Note that this diagram mirrors the black box system description defined at the beginning of this section describing the VHDL implementation of the algorithm.

Figure 3.2.1(vii) – Black box top level VHDL description of


demosaicking algorithm

The next level of the top level system shows the major
components of the system for each of the R, G and B
channels.

Further probing reveals structures similar to those described at the beginning of the VHDL section of this chapter. Refer to the Appendix for more detailed RTL technology schematics and levels of the system.


Figure 3.2.1(viii) – first level of VHDL description of demosaicking


algorithm

The synthesis results for the implemented demosaicking algorithm on the Xilinx Spartan 3 FPGA chip are given as:

Minimum period: 13.437ns (Maximum Frequency: 74.421MHz)


Minimum input arrival time before clock: 6.464ns
Maximum output required time after clock: 10.644ns
Maximum combinational path delay: 4.935ns

The maximum frequency implies that for a 256 × 256 image, the frame rate for this architecture is given by:

frame rate = f_max / (M × N) = 74.421 × 10^6 / (256 × 256)

Using this formula yields approximately 1135 frames/sec, which is exceedingly fast.

Using the spatial filter architectures described in Chapter 2, several of the other demosaicking methods can be implemented in VHDL and hardware. Some good papers on image demosaicking are listed in the references section, to enable the reader to start implementing the various algorithms and experimenting with them quickly.

Summary
In this chapter, the demosaicking process using linear interpolation was described and implemented in software, followed by the VHDL implementation of the linear interpolation algorithm for demosaicking.

References
 W. K. Pratt, Digital Image Processing, 4th ed., Wiley-Interscience, 2007.
 H. S. Malvar et al., "High-Quality Linear Interpolation for Demosaicing of Bayer-Patterned Color Images", Microsoft Research, One Microsoft Way, Redmond, WA 98052.
 A. Lukin and D. Kubasov, "An Improved Demosaicing Algorithm", Faculty of Applied Mathematics and Computer Science, Moscow State University, Russia.
 R. Jean, "Demosaicing with the Bayer Pattern", Department of Computer Science, University of North Carolina.
 R. A. Maschal Jr. et al., "Review of Bayer Pattern Color Filter Array (CFA) Demosaicing with New Quality Assessment Algorithms", Army Research Laboratory, ARL-TR-5061, January 2010.
 Y.-K. Cho et al., "Two Stage Demosaicing Algorithm for Color Filter Arrays", International Journal of Future Generation Communication and Networking, vol. 3, no. 1, March 2010.
 R. Ramanath and W. E. Snyder, "Adaptive Demosaicking", Journal of Electronic Imaging, vol. 12, no. 4, pp. 633–642, October 2003.
 B. Ajdin et al., "Demosaicing by Smoothing along 1D Features", MPI Informatik, Saarbrücken, Germany.
 Y. Huang, "Demosaicking Recognition with Applications in Digital Photo Authentication based on a Quadratic Pixel Correlation Model", Shanghai Video Capture Team, ATI Graphics Division, AMD Inc.

Chapter 4
Image Enhancement
This chapter explores some image enhancement concepts,
algorithms, their architectures and implementation in VHDL.

Image enhancement is a process that involves improving an image by modifying attributes such as contrast, colour, tone and sharpness. It can be performed manually by a human user or automatically by an image enhancement algorithm implemented as a computer program. Unlike image restoration, image enhancement is a subjective process: it usually operates without prior objective image information with which to judge or quantify the amount of enhancement. Enhancement results are also usually targeted at human end users, who judge the quality of an enhanced image by visual assessment, something that is difficult for a machine or program to perform.

Image enhancement can be performed in the spatial, frequency, wavelet and fuzzy domains, and in each of these domains the operations can be classified as local (point and/or mask) or global, and as linear or nonlinear processes.

A myriad of algorithms have been developed in this field, both in industry and in academia, as evidenced by the numerous conference papers, journals, reports and books; several useful sources are listed at the end of the chapter for further study.


4.1 Point-based Enhancement


Point-based operations work on each individual pixel of the image, independent of surrounding pixels, to enhance the whole image. Examples include the logarithm, cosine, exponential and square root transforms.

4.1.1 Logarithm Transform


An example of a point-based enhancement process is the logarithm transform. It is used to compress the dynamic range of the image scene and can also serve as a pre-processing step for further image processing, as will be seen in a subsequent section. The logarithm transform using the natural logarithm (base e) is given as:

s = c · ln(1 + r)            (4.1-1)

where r is the input pixel intensity, s is the output intensity and c is a scaling constant.

In digital hardware implementation, it is more convenient and logical to use binary (base 2) logarithms instead. A simple logarithm circuit could consist of a range of pre-computed logarithm values stored in ROM as a look-up table (LUT). This relatively trivial design is shown in Figure 4.1.1(i). More complex designs can be found in the relevant literature.
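A minimal MATLAB model of the LUT approach is sketched below, assuming an 8-bit input and an 8-bit output word; the actual word lengths and scaling in a hardware design may differ.

% Sketch of a ROM LUT logarithm, assuming 8-bit input and output.
lut = uint8(round(255 * log2(1:255) / log2(255)));  % pre-computed ROM contents
img = imread('cameraman.tif');                      % 8-bit greyscale input
out = zeros(size(img), 'uint8');
nz  = img > 0;                 % log of zero is undefined, so map 0 to 0
out(nz) = lut(img(nz));        % the pixel value acts as the ROM address

Scaling the table so that the full input range maps onto the full output range plays the role of the offset stage in Figure 4.1.1(i).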

[Block diagram: Linear Input → Address Generator → ROM LUT → + Offset → Logarithmic Output]

Figure 4.1.1(i) – ROM LUT-based binary logarithm hardware architecture

Figure 4.1.1(ii) shows the results of using the design in Figure 4.1.1(i) to enhance the original cameraman image (top left), producing the log-transformed image (top right); the double-precision, floating-point log-transformed image (bottom left) and the error image (bottom right) are also shown.

Figure 4.1.1(ii) – Comparison of image processed with fixed-point LUT


logarithm values against double-precision, floating-point logarithm
values

There is a subtle difference between the fixed point and


floating point logarithm results in Figure 4.1.1(ii).

As mentioned earlier, there are several other, more complex algorithms for computing binary logarithms in digital logic circuits, with varying trade-offs in power, accuracy, efficiency, memory requirements and speed. However, the topic of binary logarithm calculation is quite broad and beyond the scope of this book. The next section discusses the Gamma Correction method used in colour display devices.

4.1.2 Gamma Correction


Gamma correction is a simple process for adapting images for various display, viewing and printing devices. The formula is straightforward: it is a power-law transform, s = c·r^γ, where the exponent γ is a constant known as the gamma factor, r is the normalized input intensity, s is the output and c is a scaling constant. An example of an image processed with Gamma Correction is shown in Figure 4.1.2.

(a) (b)
Figure 4.1.2 – (a) Original image (b) Gamma Corrected image
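As a sketch, the whole operation is one line of MATLAB once the image is normalized; the gamma value of 2.2 below is an assumed example, not a value taken from this book.

% Gamma correction sketch, assuming an 8-bit colour image.
g   = 2.2;                                % assumed gamma factor
in  = im2double(imread('peppers.png'));   % normalize pixels to [0, 1]
out = im2uint8(in .^ g);                  % power-law transform; use 1/g to
                                          % pre-compensate a display instead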

Note the change in colour after Gamma Correction, especially for adjacent, similar colours. The next section discusses Histogram Clipping, which belongs to the same class of algorithms as Histogram Equalization.

4.1.3 Histogram Clipping


Histogram clipping involves re-adjusting pixel intensities to enable the proper display of the image acquired from the camera sensor. It expands the dynamic range of the captured image to improve colour contrast.

Figure 4.1.3(i) shows the image from Figure 4.1.2(a) processed with Histogram Clipping, and with Gamma Correction applied after clipping.

(a) (b)
Figure 4.1.3(i) – (a) Histogram clipped image and (b) Gamma Corrected image after Histogram Clipping

Note the difference between the original and Gamma Corrected images in Figure 4.1.2 and the Histogram Clipped image in Figure 4.1.3(i)(a) and its Gamma Corrected version in (b).

The code snippet for the basic histogram clipping algorithm


is shown in Figure 4.1.3(ii).

Figure 4.1.3(ii) – MATLAB code snippet of histogram clipping
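A minimal sketch of the idea, assuming a 1% clip at each tail of the histogram (the clip fraction used in the book's snippet may differ), is:

% Histogram clipping sketch, assuming a 1% clip at each tail.
in = im2double(imread('peppers.png'));
v  = sort(in(:));                          % ordered pixel intensities
lo = v(max(1, round(0.01 * numel(v))));    % lower clip point (1st percentile)
hi = v(round(0.99 * numel(v)));            % upper clip point (99th percentile)
out = (min(max(in, lo), hi) - lo) / (hi - lo);  % clip, then stretch to [0, 1]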


4.2 Local/neighbourhood enhancement
These enhancement methods process each pixel as a function of its neighbourhood of adjacent image pixels, using linear or non-linear filter processes. Examples of this type of filtering include unsharp masking (linear) and logarithmic local adaptive (non-linear) enhancement.

4.2.1 Unsharp Masking


Unsharp masking uses sharpening masks like the Laplacian to sharpen an image by magnifying the effects of its high frequency components, where most of the information in the scene resides. The Laplacian masks used in the software and VHDL hardware implementations are:

L1 = [ -1 -1 -1        L2 = [  0 -1  0
       -1  9 -1               -1  5 -1
       -1 -1 -1 ]              0 -1  0 ]

respectively.

The image results from the hardware simulation of the low-


pass and Laplacian 3x3 filters are shown in Figure 4.2.1.

(a) (b) (c)


Figure 4.2.1 – VHDL-based hardware simulation of (a) - (c) Laplacian-
filtered images using varying kernel coefficients
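In software, a sketch of the same operation with the two masks is a single convolution per mask; the conv2 call mirrors what the hardware window generator and multiply-accumulate tree compute.

% Unsharp masking sketch with the L1 and L2 masks above.
L1 = [-1 -1 -1; -1 9 -1; -1 -1 -1];   % 8-neighbour sharpening mask
L2 = [0 -1 0; -1 5 -1; 0 -1 0];       % milder 4-neighbour mask
in     = double(imread('cameraman.tif'));
sharp1 = conv2(in, L1, 'same');       % stronger sharpening
sharp2 = conv2(in, L2, 'same');       % milder sharpening
imshow(uint8(sharp2))                 % uint8() clips back into display range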

4.2.2 Logarithmic local adaptive enhancement
This algorithm uses the logarithm transform together with local non-linear statistics (the local image variance) to enhance the image. The method is similar to a spatial filtering operation combined with a logarithm transform. Figure 4.2.2 shows an image processed with the algorithm.

(a) (b)
Figure 4.2.2 – (a) Original image (b) Image processed with LLAE

This method produces improved contrast in the processed image, as is evident in Figure 4.2.2, where the lines and details of the mountain terrain can be clearly seen after enhancement, in addition to richer colours.

4.3 Global/Frequency Domain Enhancement


Global/frequency domain enhancement processes the image as a function of the frequency components of the entire image. The transform maps the spatially varying image into a spectral one by summing the contributions of every pixel in relation to the entire image. The image is then processed in the spectral domain with a spectral filter, after which it is transformed back to the spatial domain for visual observation.

4.3.1 Homomorphic filter
The operation of the Homomorphic filter is based on the illuminance/reflectance image model. It was developed by Alan Oppenheim, initially for the filtering of audio signals, and has found numerous applications in digital image processing. The technique achieves enhancement by improving the contrast and compressing the dynamic range of the image scene. The process follows the scheme in Figure 1.2 and the equation for the operation is given as:

g(x, y) = exp( FFT⁻¹[ H(u, v) · FFT( ln f(x, y) ) ] )            (4.3.1-1)

where g(x, y) is the enhanced image, f(x, y) is the input image, FFT stands for the Fast Fourier Transform (FFT⁻¹ its inverse) and H(u, v) is the frequency domain filter.
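A minimal MATLAB sketch of equation (4.3.1-1) is shown below, assuming a Gaussian high-emphasis frequency filter; the gains gL, gH and the cutoff D0 are illustrative values, not parameters from this book.

% Frequency-domain homomorphic filtering sketch.
in = im2double(imread('pout.tif'));
[M, N] = size(in);
[u, v] = meshgrid(-floor(N/2):ceil(N/2)-1, -floor(M/2):ceil(M/2)-1);
gL = 0.5; gH = 2.0; D0 = 30;                       % assumed filter parameters
H  = (gH - gL) * (1 - exp(-(u.^2 + v.^2)/(2*D0^2))) + gL;  % high-emphasis
F  = fftshift(fft2(log(1 + in)));                  % FFT of the log image
out = exp(real(ifft2(ifftshift(H .* F)))) - 1;     % filter, invert, exponentiate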

With the basic introduction to enhancement, the next step is


to describe the VHDL implementation of the key
enhancement algorithm.

4.4 VHDL implementation


Performing the Fourier Transform is much more demanding in hardware than in software, and though hardware IP cores exist for the FFT algorithm, it makes sense to move frequency domain image filtering processes to the spatial domain for ease of hardware implementation. Thus, the VHDL implementation of the Homomorphic filter is done in the spatial domain: we avoid the Fourier Transform computation altogether and generate a small but effective spatial domain filter kernel for the filtering.
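The spatial-domain pipeline then reduces to three stages, sketched below; the 3 × 3 kernel k is a hypothetical high-emphasis mask standing in for the one designed for the actual filter.

% Spatial-domain homomorphic pipeline sketch: LIN2LOG, filter, LOG2LIN.
k   = [-1 -1 -1; -1 12 -1; -1 -1 -1] / 4;   % assumed high-emphasis kernel
in  = im2double(imread('pout.tif'));
lg  = log(1 + in);                          % LIN2LOG stage
flt = conv2(lg, k, 'same');                 % 3 x 3 spatial filter stage
out = exp(flt) - 1;                         % LOG2LIN stage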

In implementing this, the main components are the logarithm transformation blocks and the spatial domain filter. Building each component separately makes debugging and testing much easier. Once more, we describe the top level system first: it has the RGB input and output ports, shown in Figure 4.4(i). The next level, in Figure 4.4(ii), shows the main inner components of the top level system.

Figure 4.4(i) – Top level architecture of RGB Homomorphic filter

[Block diagram: 8-bit Red, Green and Blue inputs → LIN2LOG → 3 × 3 Spatial Filter → LOG2LIN → Red, Green and Blue outputs, with Clk and Data_Valid synchronization signals]

Figure 4.4(ii) – Top level architecture of RGB Homomorphic system with inner sub-components

The image shown in Figure 4.4(iii) was processed with an RGB Homomorphic filter implemented in VHDL for an FPGA. The hardware simulation result is shown alongside the original image for comparison.

(a) (b)
Figure 4.4(iii) – (a) Original image (b) processed image with RGB
Homomorphic filter (hardware simulation)

It can easily be observed that the Homomorphic filter clearly improved the original image: there is more detail in the enhanced scene, and foreground and background objects can now be distinguished. The maximum speed of this architecture on the Xilinx Spartan 3 FPGA is around 80 MHz, based on synthesis results.

Summary
We discussed several image enhancement algorithms, implemented the more effective and popular ones in VHDL, and analysed the image results of the implemented architectures.

References
 U. Nnolim, “FPGA Architectures for Logarithmic Colour Image
Processing”, Ph.D. thesis, University of Kent at Canterbury,
Canterbury-Kent, 2009.

 W. K. Pratt, Digital Image Processing, 4 ed.: Wiley-Interscience,
2007.
 R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image
Processing Using MATLAB: Prentice Hall, 2004.
 R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2
ed.: Prentice Hall, 2002.
 MathWorks, "Image Processing Toolbox 6 User's Guide for use
with MATLAB," The Mathworks, 2008, pp. 285 - 288.
 Weber, "The USC-SIPI Image Database," University of Southern California Signal and Image Processing Institute (USC-SIPI), 1981.
 A. Zuloaga, J. L. Martín, U. Bidarte, and J. A. Ezquerra, "VHDL test bench for digital image processing systems using a new image format."
 Xilinx, "XST User Guide": http://www.xilinx.com, 2008.
 G. Deng and L. W. Cahill, "Multiscale image enhancement
using the logarithmic image processing model," Electronics
Letters, vol. 29, pp. 803 - 804, 29 Apr 1993.
 G. Deng, L. W. Cahill, and G. R. Tobin, "The Study of Logarithmic Image Processing Model and Its Application to Image Enhancement," IEEE Transactions on Image Processing, vol. 4, pp. 506-512, 1995.
 S. E. Umbaugh, Computer Imaging: Digital Image Analysis and
Processing. Boca Raton, FL: CRC Press, Taylor & Francis
Group, 2005.
 A. Oppenheim, R. W. Schafer, and T. G. Stockham, "Nonlinear
Filtering of Multiplied and Convolved Signals," Proceedings of
the IEEE, vol. 56, pp. 1264 - 1291, August 1968.
 U. Nnolim and P. Lee, "Homomorphic Filtering of colour images
using a Spatial Filter Kernel in the HSI colour space," in IEEE
Instrumentation and Measurement Technology Conference
Proceedings, 2008, (IMTC 2008) Victoria, Vancouver Island,
Canada: IEEE, 2008, pp. 1738-1743.
 F. T. Arslan and A. M. Grigoryan, "Fast Splitting alpha - Rooting
Method of Image Enhancement: Tensor Representation," IEEE
Transactions on Image Processing, vol. 15, pp. 3375 - 3384,
November 2006.
 S. S. Agaian, K. Panetta, and A. M. Grigoryan, "Transform-
Based Image Enhancement Algorithms with Performance
Measure," IEEE Transactions on Image Processing, vol. 10, pp.
367 - 382, March 2001.

Chapter 5
Image Edge Detection and
Smoothing
This chapter deals with the VHDL implementation of image edge detection and smoothing filter kernels using the spatial filter architectures from Chapter 2. The original greyscale images to be processed are shown in Figure 5.

All the filters are modular in their design, thus the RGB
colour versions are simply triplicate instantiations of the
greyscale filters.

Figure 5 – Original (256 × 256) images to be processed

5.1 Image edge detection kernels


These kernels are digital mask approximations of derivative filters for edge enhancement, and they include:
 Sobel kernel
 Prewitt kernel
 Roberts kernel

They are derived from discrete approximations of derivative operators, such as the gradient and the Laplacian.


This class of filter kernels is used to find and identify edges in an image by computing gradients of the image in the vertical and horizontal directions, which are then combined to produce the gradient amplitude of the image. Well-known kernels include the Sobel, Prewitt and Roberts kernels. The Canny edge detection method also uses such edge-finding filters as part of its algorithm.

The Sobel, Prewitt and Roberts kernel approximations are simple but effective tools for image edge and corner detection. The most effective commonly used method for detecting both weak and strong edges is the famous Canny edge detector. Although the Canny algorithm is more involved and beyond the focus of this book, the filtering techniques mentioned here provide its basic building blocks.

5.1.1 Sobel edge filter


The Sobel kernel masks used in the VHDL implementation to find the horizontal and vertical edges in the image were:

SX = [  1  2  1        SY = [ -1  0  1
        0  0  0               -2  0  2
       -1 -2 -1 ]             -1  0  1 ]

The x and y subscripts denote the horizontal and vertical directions respectively.
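A software sketch of the complete Sobel step combines the two directional responses into a gradient magnitude:

% Sobel edge detection sketch with the SX and SY masks above.
SX = [1 2 1; 0 0 0; -1 -2 -1];        % horizontal-edge mask
SY = [-1 0 1; -2 0 2; -1 0 1];        % vertical-edge mask
in  = double(imread('cameraman.tif'));
gx  = conv2(in, SX, 'same');          % response in the x direction
gy  = conv2(in, SY, 'same');          % response in the y direction
mag = sqrt(gx.^2 + gy.^2);            % combined edge amplitude
imshow(mat2gray(mag))                 % scale into the display range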

The hardware and software simulation results of images processed with these filter kernels are shown in Figure 5.1.1.

(a) (b)

(c) (d)
Figure 5.1.1 – Comparison between (a) & (b) VHDL-based hardware
simulation of Sobel filter (x and y direction) processed image and (c)
and (d) MATLAB-based software simulation of Sobel filter (x and y
direction)

5.1.2 Prewitt edge filter


The Prewitt kernel masks used for finding horizontal and vertical lines in the image in the VHDL implementation were:

PX = [ -1  0  1        PY = [  1  1  1
       -1  0  1                0  0  0
       -1  0  1 ]             -1 -1 -1 ]


(a) (b)
Figure 5.1.2 – VHDL-based hardware simulation of (a) & (b) Prewitt
filter (x and y direction) processed image.

The image results for the edge filters and the high-pass filter appear in this form because most of the (negative) output pixel values fall outside the unsigned 8-bit integer display range. Appropriate scaling within the filter (using the mat2gray function in MATLAB, for example) would ensure that all pixel values are mapped into the display range; the end result is an embossing effect in the output image.

On further analysis and comparison, the results from the


hardware filter simulation are quite comparable to the
software versions.

5.1.3 High Pass Filter


The high-pass filter passes only the high frequency components of the image (such as lines and edges) and is the default form of the edge or derivative filters. These filters are the image processing application of derivatives from calculus, as mentioned earlier.


The kernel for the default high-pass filter used is defined as:

HPF = [ -1 -1 -1
        -1  8 -1
        -1 -1 -1 ]

Though the filter kernels mentioned earlier are also types of high-pass filters, the default version is much harsher than the Sobel and Prewitt filters, as can be observed from the VHDL hardware simulation results in Figure 5.1.3. The reason for the harshness is easily seen from the kernel coefficients: all neighbours are equally weighted, so weak edges are not favoured over strong edges, unlike with the other edge filter kernels.

Figure 5.1.3 – VHDL hardware simulations of high-pass filtered images

5.2 Image Smoothing Filters


These filters enhance the low frequency components of the image scene by reducing gradients, i.e. sharp changes, across the image, which manifests visually as a smoothing effect. They can also be called integration or anti-derivative filters, by analogy with calculus. They can be used for demosaicking and for noise removal or suppression, and their effectiveness varies with the complexity and degree of non-linearity of the algorithms.

5.2.1 Mean/Averaging filter


Averaging or mean filters are low-pass filters used for image smoothing tasks such as removing noise from an image. They attenuate the high frequency components, which contribute to the visual sharpness and high contrast areas of an image. The easiest averaging filter to implement uses the kernel:

LPF = (1/9) [ 1 1 1
              1 1 1
              1 1 1 ]

There is considerable loss of detail when using the basic mean (box) filter for image smoothing/denoising, as it blurs edges along with the noise it is attempting to remove. Note also that the low-pass mean filter is complementary to the high-pass filter in Section 5.1.3.

5.2.2 Gaussian Lowpass filter


The Gaussian low-pass filter is another type of smoothing filter. It produces a better result than the standard averaging filter because it assigns different weights to different pixels in the local neighbourhood. Gaussian filters can also be separable and/or circularly symmetric, depending on the design. Separable filter kernels are very important in hardware image filtering because they reduce the number of multiplications needed, as discussed in Chapter 2. The kernel for the 3 × 3 spatial Gaussian filter is:

G = (1/16) [ 1 2 1
             2 4 2
             1 2 1 ]

which can also be expressed in its separable form:

G = (1/4) [ 1
            2
            1 ] × (1/4) [ 1 2 1 ]
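The separability can be checked in MATLAB: the two 1-D passes below produce the same output as the full 2-D convolution while needing fewer multiplications per pixel.

% Separable Gaussian filtering sketch.
G   = [1 2 1; 2 4 2; 1 2 1] / 16;     % full 2-D kernel
col = [1; 2; 1] / 4;                  % vertical 1-D factor
row = [1 2 1] / 4;                    % horizontal 1-D factor
in     = double(imread('cameraman.tif'));
out2d  = conv2(in, G, 'same');                         % nine multiplies/pixel
outsep = conv2(conv2(in, col, 'same'), row, 'same');   % six multiplies/pixel
max(abs(out2d(:) - outsep(:)))        % ~0, up to rounding error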

Figure 5.2.2 shows the image results comparing a mean


filter with the Gaussian low-pass filter.

(a) (b)
Figure 5.2.2 – VHDL-based hardware simulation of (a) mean filter & (b)
Gaussian low-pass filtered images

In (b), the Gaussian filter with its varied weights provides a much better result than the mean-filtered image in (a).

It is important to note that the architectures for these filters can be further minimized for efficient use of hardware resources. For example, the high-pass filter can use a symmetric filter architecture, the low-pass filter can use a separable and symmetric architecture, and the Laplacian and high-boost filters for edge enhancement can also use a symmetric architecture. Additionally, the Sobel, Prewitt and Gaussian filters can use symmetric and separable filter architectures.

Summary
In this chapter, we introduced spatial filters used for edge
detection and smoothing and showed the VHDL
implementation of the algorithms compared with the
software versions.

References
 U. Nnolim, “FPGA Architectures for Logarithmic Colour Image
Processing”, Ph.D. thesis, University of Kent at Canterbury,
Canterbury-Kent, 2009.
 W. K. Pratt, Digital Image Processing, 4 ed.: Wiley-Interscience,
2007.
 R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image
Processing Using MATLAB: Prentice Hall, 2004.
 R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2
ed.: Prentice Hall, 2002.
 Weber, "The USC-SIPI Image Database," University of Southern California Signal and Image Processing Institute (USC-SIPI), 1981.

Chapter 6
Colour Image Conversion
This chapter deals with the VHDL implementation of colour
space converters for colour image processing.

Colour space conversions are necessary for certain morphological and analytical processes, such as segmentation and pattern and texture recognition, where the colour information of each pixel must be accurately preserved throughout the processing. Processing an RGB colour image with certain algorithms, like histogram equalization, will lead to distorted hues in the output image, since each colour pixel in an RGB image is a vector of three scalar values from the individual R, G and B channels.

Colour space conversions can be additive, subtractive, linear or non-linear processes. Usually, the more involved the conversion process, the better the results.

Examples of the various types of colour spaces include but


are not limited to:

6.1 Additive colour spaces


The additive colour spaces include:
CIELAB/ L*a*b* Colour Coordinate System
RGB Colour Coordinate System

These colour spaces are used in areas such as digital film, photography and television.

The CIELAB colour space was developed to be independent of display devices and is one of the more complete colour spaces, since it approximates human vision. However, many colours in the L*a*b* space cannot be realized in the real world and are termed imaginary colours. The colour space therefore requires many bits for accurate representation: conversion to 24-bit RGB is a lossy process, and at least 48-bit RGB is needed for good resolution.

The RGB colour space was devised for computer vision, display (LCDs, CRTs, etc.) and camera devices, and has several variants, including sRGB (used in HD digital image and video cameras) and Adobe RGB. It is made up of Red, Green and Blue channels, combinations of which generate a myriad of secondary and higher-order colours.

6.2 Subtractive Colour spaces


CMY Colour Coordinate System
CMYK Colour Coordinate System

Subtractive colour spaces like CMY (Cyan, Magenta, Yellow) and CMYK (CMY plus black Key) are used for printing. For CMY, the simple formula is:

C = 1 − R,   M = 1 − G,   Y = 1 − B            (6.2-1)

where the R, G and B values have first been normalized to the range [0, 1] using:

R = R′/255,   G = G′/255,   B = B′/255            (6.2-2)

for 8-bit channel values R′, G′ and B′.

However, simply by inspection of this formula, one can see that it is not very good in practice. Thus, CMYK is the preferred colour space for printers.

The formula for the CMYK method is a bit more involved and depends on the colour space and the ICC colour profiles used by the hardware device that outputs the colour image (e.g. scanner, printer, camera, camcorder).
Some sample formulae include:

(6.2-3)

(6.2-4)

Another variation is given as;

(6.2-5)

(6.2-6)

(6.2-7)

(6.2-8)

80
Colour Image Conversion
(6.2-9)

(6.2-10)

(a) (b) (c) (d)


Figure 6.2(i) –(a)C image (b)M image (c) Y image (d) K image

(a) (b) (c) (d)


Figure 6.2(ii) –(a)C image (b)M image (c) Y image (d) K image

(a) (b)
Figure 6.2(iii) –(a)CMY image (b) CMYK image (K not added)

The VHDL implementation of the CMYK converter is trivial and is left as an exercise for the interested reader, using the design approach outlined for the more complex designs.

6.3 Video Colour spaces


YIQ NTSC Transmission Colour Coordinate System
YCbCr Transmission Colour Coordinate System
YUV Transmission Colour Coordinate System

These colour space conversions must be fast and efficient to be useful in video operation. The typical form for such transformations is given in (6.3-1).

[X  Y  Z]ᵀ = A · [R  G  B]ᵀ            (6.3-1)

where X, Y and Z are the channels of the required colour space, R, G and B are the initial channels from the RGB colour space, and the entries of the matrix A are constant coefficients.

The implemented colour spaces are the YIQ (NTSC) colour


space, YCbCr and the Y’UV colour spaces. The MATLAB
code for the conversion is given in Figure 6.3(i).

The YIQ transformation matrix is given as:

[Y]   [ 0.299   0.587   0.114 ] [R]
[I] = [ 0.596  -0.274  -0.322 ] [G]            (6.3-2)
[Q]   [ 0.211  -0.523   0.312 ] [B]
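A sketch of the conversion in MATLAB applies (6.3-2) to every pixel at once; this is the floating-point reference that the fixed-point VHDL version in Appendix C approximates.

% RGB to YIQ conversion sketch using the matrix in (6.3-2).
T = [0.299  0.587  0.114;
     0.596 -0.274 -0.322;
     0.211 -0.523  0.312];
rgb = im2double(imread('peppers.png'));
pix = reshape(rgb, [], 3)';            % 3 x (M*N) matrix of RGB columns
yiq = reshape((T * pix)', size(rgb));  % transform and restore image shape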

A software program was developed to test the algorithm and to serve as a template for the hardware system implemented in VHDL. The program is shown in Figure 6.3(i).

Figure 6.3(i) – MATLAB program for the RGB to YIQ/Y’UV conversion

The top level system architecture is given in the form shown


in Figure 6.3(ii).
[Block diagram: R, G, B → RGB to YIQ converter → Y, I, Q]
Figure 6.3(ii) – Top level description of the RGB to YIQ converter

The detailed system is shown in Figure 6.3(iii).

Figure 6.3(iii) – Hardware architecture of RGB2YIQ/Y’UV converter

(a) (b) (c)


Figure 6.3(iv) – (a) RGB image (b)Software and (c) hardware
simulation results of RGB2YIQ/NTSC colourspace converter

The transformation matrix for the Y’UV conversion from RGB is given as:


[Y]   [  0.299   0.587   0.114 ] [R]
[U] = [ -0.147  -0.289   0.436 ] [G]            (6.3-3)
[V]   [  0.615  -0.515  -0.100 ] [B]

(a) (b) (c)


Figure 6.3(v) –(a)RGB image (b)Software and (c) hardware simulation
results of RGB2Y’UV colourspace converter

Figure 6.3(vi) –VHDL code snippet for RGB2YIQ/Y’UV colour converter


showing coefficients

The coding of the signed floating-point coefficient values in VHDL is achieved with a custom MATLAB program that converts the values from double-precision floating point to the fixed-point representation used in the VHDL. The use of fixed-point arithmetic is necessary for the system to be feasible and synthesizable in hardware. The RTL level system description generated from the synthesized VHDL code is shown in Figure 6.3(vii).
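A sketch of such a conversion program is shown below. The Appendix C coefficients are 16-bit two's complement words scaled by 2^14, which is consistent with the a10(24 downto 14) output slice in the converter code.

% Float-to-fixed conversion sketch for the YIQ coefficients.
c = [0.299 0.587 0.114 0.596 -0.274 -0.322 0.211 -0.523 0.312];
q = round(c * 2^14);                 % scale to Q2.14 fixed point
words = dec2bin(mod(q, 2^16), 16);   % 16-bit two's complement bit strings
% words(1,:) gives '0001001100100011', matching coeff0 in Appendix C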

Figure 6.3(vii) – RTL top level of RGB to YIQ/Y’UV colour converter

Based on synthesis results on a Spartan 3 FPGA device,


the device usage is as shown in Table 6.3.

Device                       Usage              Percentage
Number of Slices             268 out of 1920    13%
Number of Slice Flip Flops   373 out of 3840    9%
Number of 4 input LUTs       174 out of 3840    4%
Number of bonded IOBs        51 out of 173      29%
Number of MULT18X18s         9 out of 12        75%
Number of GCLKs              1 out of 8         12%

Table 6.3 – Device utilization summary of RGB2YIQ/Y’UV converter

The minimum period is 8.313 ns (maximum frequency: 120.293 MHz), which is extremely fast. Thus, for a 256 × 256 image, using the frame rate formula from Chapter 3, we get about 1835 frames per second. The results of the software and VHDL hardware simulations are shown and compared in Figure 6.3(viii).

(a) (b) (c)


Figure 6.3(viii) – Software (first row) and VHDL hardware (second row)
simulation results of RGB2YIQ converter showing (a) Y (b) I and (c) Q
channels

This particular implementation takes 8-bit colour values and can output up to 10 bits, though it can easily be scaled to output 9 bits, where the extra bit holds the sign, since negative values are expected. The formula for conversion back to RGB from YIQ is given as:

[R]   [ 1.000   0.956   0.621 ] [Y]
[G] = [ 1.000  -0.272  -0.647 ] [I]            (6.3-4)
[B]   [ 1.000  -1.106   1.703 ] [Q]

The architecture for this conversion is the same as for RGB2YIQ, except that the coefficients differ; the image results from the VHDL hardware simulation of the YIQ to RGB conversion are shown in Figure 6.3(ix).

(a) (b)
Figure 6.3(ix) – (a) Software and (b) VHDL hardware simulation results
of YIQ2RGB converter

The colour of the image obtained from the hardware result (b) in Figure 6.3(ix) is different; improving the colour of the output is left as an exercise for the reader. The next converter to investigate is the RGB to YCbCr architecture.

[Block diagram: R, G, B → RGB to YCbCr converter → Y, Cb, Cr]

Figure 6.3(x) – Top level description of the RGB to YCbCr converter

The equation for the RGB to Y’CbCr conversion is similar to those of the YIQ and Y’UV methods, in that they all involve a simple matrix multiplication, and is shown in (6.3-5).

(6.3-5)

The architecture is also similar except that there are
additional adders for the constant integer values.

Figure 6.3(ix) – Hardware architecture of RGB2YCbCr converter

The results of the hardware and software simulations are shown in Figure 6.3(x), and it is very difficult to differentiate the two images, considering that the hardware result was generated with fixed-point arithmetic and truncated integers, without the floating point values used in the software version. It is left to the reader to investigate the conversion back to RGB space, and the likely result in RGB space, using the image results from the VHDL hardware simulation.


Figure 6.3(x) – (a)Software and (b) VHDL hardware simulation results


of RGB2YCbCr colourspace converter

(a) (b) (c)


Figure 6.3(xi) – Software (first row) and VHDL hardware (second row) simulation results of RGB2YCbCr converter showing (a) Y, (b) Cb and (c) Cr channels

The RGB2Y’CbCr circuit was rapidly realized by adding three extra adders to the circuit template used for the NTSC and Y’UV conversions and loading a different set of coefficients. Thus, the device utilization results and operating frequencies are similar.

The ease of hardware implementation of video colour space conversion is a great advantage when designing digital hardware circuits for high speed colour video and image processing, where colour space conversions are regularly required.

6.4 Non-linear/non-trivial colour spaces


These are the more complex colour transformations, which are better models of human colour perception. These colour spaces decouple the colour information from the intensity and saturation information in order to preserve the values after non-linear processing. They include the:
 Karhunen-Loeve Colour Coordinate System
 HSV Colour Coordinate System
 HSI/LHS/IHS Colour Coordinate System
We will focus on the HSI and HSV colour spaces in this section.

The architecture for the conventional RGB2HSI conversion described in [] is depicted in Figure 6.4(i).

RG B
I (6.4-1)
3

3  min R, G, B 
S  1 (6.4-2)
RG B

91
Colour Image Conversion
 if B  G
H  (6.4-3)
360   if B  G

 1 


R  G   R  B  

  cos 1  2 (6.4-4)



 R  G 2  R  B G  B   

 

[Block diagram: R, G, B → subtractors, squaring, square root, divider and cos⁻¹ units → MUX with 2π correction → H; comparator/min unit and divider → S; adder and divide-by-3 → I]
Figure 6.4(i) – RGB2HSI colour converter hardware architecture

The results of the HSI implementation are shown in Figure 6.4(ii). For further information on this implementation, refer to the references at the end of the chapter. The conventional HSI conversion is extremely difficult to implement accurately in digital hardware without floating point facilities or large LUTs.

The results of the implementation are shown in Figure 6.4(ii), where the last two images show the results when the individual channels are processed and recombined after being output, and when they are recombined before being output.

From visual observation, the hardware simulation results


are quite good.

Figure 6.4(ii) – Software and hardware simulation results of RGB2HSI


converter

The equations for conversion to HSV space are:

V = max(R, G, B)            (6.4-5)

S = V − min(R, G, B)            (6.4-6)

H = (G − B)/S,       for R = V
H = 2 + (B − R)/S,   for G = V            (6.4-7)
H = 4 + (R − G)/S,   for B = V

The hardware architecture for RGB to HSV colour space conversion is shown in Figure 6.4(iii). Note the division operations: given the constraints of digital hardware, the reader should devise a solution for implementing these dividers in a synthesizable circuit for typical FPGA hardware.
[Block diagram: R, G, B → subtractors and dividers with constants 2, 4 and 6 → MUX → H; Max and Min units → subtractor → S; Max → V]
Figure 6.4(iii) – Hardware architecture of RGB2HSV converter

The synthesizable HSV conversion is relatively easy to implement in digital hardware without floating point units or large LUTs.

The results of the VHDL implementation are shown in Figure 6.4(iv). Compare the hue from the HSV conversion with that from the HSI conversion and decide which is better for colour image processing.

(a) (b) (c)


Figure 6.4(iv) – (a) Software and hardware simulation results of
RGB2HSV converter for (b) individual component channel processing
and (c) combined channel processing

The implementations of these non-linear colour converters
are quite involved and much more complicated than the
VHDL implementations of the other colour conversion
algorithms.

Summary
In this chapter, several types of colour space conversions
were investigated and implemented in VHDL for analysis.
The architectures show varying levels of complexity in the
implementation and can be combined with other
architectures to form a hardware image processing pipeline.
It should also be kept in mind that the architectures
developed here are not the most efficient or compact but
provide a basis for further investigation by the interested
reader.

References
 W. K. Pratt, Digital Image Processing, 4 ed.: Wiley-Interscience,
2007.
 R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image
Processing Using MATLAB: Prentice Hall, 2004.
 R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2
ed.: Prentice Hall, 2002.
 U. Nnolim, “FPGA Architectures for Logarithmic Colour Image
Processing”, Ph.D. thesis, University of Kent at Canterbury,
Canterbury-Kent, 2009.
 MathWorks, "Image Processing Toolbox 6 User's Guide for use
with MATLAB," The Mathworks, 2008, pp. 285 - 288.
 Weber, "The USC-SIPI Image Database," University of Southern California Signal and Image Processing Institute (USC-SIPI), 1981.
 E. Welch, R. Moorhead, and J. K. Owens, "Image Processing
using the HSI Colour space," in IEEE Proceedings of
Southeastcon '91, Williamsburg, VA, USA, 1991, pp. 722-725.
 T. Carron and P. Lambert, "Colour Edge Detector using jointly
Hue, Saturation and Intensity," in Proceedings of the IEEE

International Conference on Image Processing (ICIP-94),
Austin, TX, USA, 1994, pp. 977-981.
 Andreadis, "A real-time color space converter for the
measurement of appearance," Journal of Pattern Recognition
vol. 34 pp. 1181-1187, 2001.
 EETimes, "PLDs/FPGAs," 2009.
 Xilinx, "XST User Guide ": http://www.xilinx.com, 2008.

APPENDIX A
Circuit Schematics

Appendix A contains the schematic design files and the


device usage summary generated from the synthesized
VHDL code (relevant sample code sections are also
included) using the Xilinx Integrated Software Environment
(ISE) synthesis tools.


Figure A1 – Demosaicking RTL schematic1


Figure A2 – Demosaicking RTL schematic2


Figure A3 – Demosaicking RTL schematic3


Figure A4 – Demosaicking RTL schematic4


Figure A5 – Demosaicking RTL schematic5


Figure A6 – Demosaicking RTL schematic6


Figure A7 – Demosaicking RTL schematic7


Figure A8 – Colour Space Converter RTL schematic

APPENDIX B
Creating Projects/Files in VHDL Environment

Appendix B contains the continuation guide of setting up a


project in ModelSim and Xilinx ISE environments.


Figure B1 – Naming a new project


Figure B2 – Adding a new or existing project file


Figure B3 – Creating a new file


Figure B4 – Loading existing files


Figure B5 – Addition and Selection of existing files


Figure B6 – Loaded files

Figure B7 – Inspection of newly created file



Figure B8 – Inspection of existing file

Figure B9 – Compilation of selected files


Figure B10 – Compiling Loaded files

Figure B11 – Successful compilation



Figure B12 – Code Snippet of newly created VHDL file


Figure B13 – Adding a new VHDL source in an open project


Figure B14 – Adding an existing file to an open project

APPENDIX C
VHDL Code
Appendix C lists samples of relevant VHDL code sections.


example_file.vhd
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
---- TOP SYSTEM LEVEL DESCRIPTION ----
entity example_file is
  port ( -- the collection of all input and output ports in the top level
    Clk         : in  std_logic;  -- clock for synchronization
    rst         : in  std_logic;  -- reset signal for new data
    input_port  : in  bit;        -- input port
    output_port : out bit);       -- output port
end example_file;
-- architecture and behaviour of TOP SYSTEM LEVEL DESCRIPTION in more detail
architecture behaviour of example_file is
  -- list signals which connect input to output ports here, for example:
  signal intermediate_port : bit := '0';  -- initialize to zero
begin  -- start
  process(clk, rst)  -- process triggered by clock or reset pin
  begin
    if rst = '0' then  -- reset all output ports
      intermediate_port <= '0';  -- initialize
      output_port       <= '0';  -- initialize
    elsif clk'event and clk = '1' then  -- operate on rising edge of clock
      intermediate_port <= not(input_port);           -- logical inverter
      output_port <= intermediate_port or input_port; -- logical OR operation
    end if;
  end process;
end behaviour;  -- end of architectural behaviour


colour_converter_pkg.vhd
------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
package colour_converter_pkg is
  -- Filter Coefficients --------------------------------------------
  -- NTSC CONVERSION COEFFICIENTS USING Y, I, Q
  -- (16-bit two's complement words, scaled by 2^14)
  constant coeff0 : std_logic_vector(15 downto 0) := "0001001100100011"; --  0.299
  constant coeff1 : std_logic_vector(15 downto 0) := "0010010110010001"; --  0.587
  constant coeff2 : std_logic_vector(15 downto 0) := "0000011101001100"; --  0.114
  constant coeff3 : std_logic_vector(15 downto 0) := "0010011000100101"; --  0.596
  constant coeff4 : std_logic_vector(15 downto 0) := "1110111001110111"; -- -0.274
  constant coeff5 : std_logic_vector(15 downto 0) := "1110101101100100"; -- -0.322
  constant coeff6 : std_logic_vector(15 downto 0) := "0000110110000001"; --  0.211
  constant coeff7 : std_logic_vector(15 downto 0) := "1101111010000111"; -- -0.523
  constant coeff8 : std_logic_vector(15 downto 0) := "0001001111111000"; --  0.312
  -- End colour coefficients ----------------------------------------
  constant data_width : integer := 16;
end colour_converter_pkg;
------------------------------------------------

colour_converter.vhd
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use work.colour_converter_pkg.all;

entity colour_converter is
  generic (data_width : integer := 16);
  port (
    Clk            : in  std_logic;
    rst            : in  std_logic;
    R, G, B        : in  integer range 0 to 255;
    X, Y, Z        : out integer range -511 to 511;
    Data_out_valid : out std_logic
  );
end colour_converter;

architecture struct of colour_converter is
  signal x11, x12, x13, x21, x22, x23, x31, x32, x33 :
    std_logic_vector(data_width-1 downto 0);
  signal m0, m1, m2, m3, m4, m5, m6, m7, m8 :
    signed((data_width*2) downto 0) := (others => '0');
  signal a10, a20, a30 :
    signed((data_width*2)+1 downto 0) := (others => '0');
begin
  -- replicate each input channel for the three matrix rows
  x11 <= conv_std_logic_vector(R, 16);
  x21 <= x11; x31 <= x21;
  x12 <= conv_std_logic_vector(G, 16);
  x22 <= x12; x32 <= x22;
  x13 <= conv_std_logic_vector(B, 16);
  x23 <= x13; x33 <= x23;
  ---- multiplication by the fixed-point coefficients ----
  m0 <= signed('0' & x11) * signed(coeff0);
  m1 <= signed('0' & x12) * signed(coeff1);
  m2 <= signed('0' & x13) * signed(coeff2);
  m3 <= signed('0' & x21) * signed(coeff3);
  m4 <= signed('0' & x22) * signed(coeff4);
  m5 <= signed('0' & x23) * signed(coeff5);
  m6 <= signed('0' & x31) * signed(coeff6);
  m7 <= signed('0' & x32) * signed(coeff7);
  m8 <= signed('0' & x33) * signed(coeff8);
  ---- addition (sign-extended) ----
  a10 <= (m0(32) & m0) + m1 + m2;
  a20 <= (m3(32) & m3) + m4 + m5;
  a30 <= (m6(32) & m6) + m7 + m8;
  ---- output: discard the 14 fractional bits ----
  Data_out_valid <= '1';
  X <= conv_integer(a10(24 downto 14));
  Y <= conv_integer(a20(24 downto 14));
  Z <= conv_integer(a30(24 downto 14));
end struct;
