
Adaptive Real-Time Image Thresholding

for Hardware Implementation



by


Elham Ashari


A thesis
presented to the University of Waterloo
in fulfillment of the
thesis requirement for the degree of
Master of Applied Science
in
Electrical and Computer Engineering



Waterloo, Ontario, Canada 2004


Elham Ashari 2004


I hereby declare that I am the sole author of this thesis.
I authorize the University of Waterloo to lend this thesis to other institutions or individuals for
the purpose of scholarly research.





Elham Ashari





I further authorize the University of Waterloo to reproduce this thesis by photocopying or other
means, in total or in part, at the request of other institutions or individuals for the purpose of
scholarly research.





Elham Ashari





Abstract

An adaptive thresholding algorithm for extracting targets from the background in a given image
sequence is proposed for implementation in hardware. Conventional histogram-based thresholding
methods are deficient in detecting targets because of poor contrast between the targets and the
background, or because of changes in illumination. Other thresholding techniques are not fast
enough for hardware implementation in real-time applications. The proposed thresholding
algorithm calculates a global optimum threshold by learning from the image background and
foreground features. A simple two-weight neural network is employed to cluster the foreground
and background pixels.
The main application of the algorithm is in a high-speed laser rangefinder. Experiments
confirm that the proposed algorithm separates objects from the background better than other
thresholding methods. A further advantage of the algorithm is its simplicity and ease of
implementation in hardware. A special-purpose hardware implementation in an FPGA is also
presented. The approximations applied in the data path and their effects on the results are
discussed in detail.
Finally, the speed requirements of real-time applications and speed enhancement in this
type of application are described. Two different pipelined architectures and their speed
performance are analysed.

Acknowledgements

I would like to express my deepest appreciation to my supervisor, Prof. Richard Hornsey, for his
constant support, encouragement, advice, and belief in me throughout this work. He gave me
the opportunity to continue my education at the postgraduate level.

I would like to acknowledge Canadian Microelectronics Corporation and Xilinx Inc. for their
generous donation of an XSA-50 Spartan-II Prototyping Board with a 2.5 V, 50,000-gate FPGA.

I would also like to acknowledge Dr. Hamid R. Tizhoosh, professor of system design engineering,
for his interesting ideas and the discussions we shared.

I would like to show my appreciation to my friends at the VISOR lab, especially Winnie Wong, for
her valuable help.

My special and warmest thanks go to my husband, Masoud Makrehchi, who has been a constant source
of support and encouragement. I am grateful for his efforts on behalf of my educational and
emotional life.

Finally, my profound gratitude goes to my parents, l. Mosheghian and A.M. Ashari, for their
unconditional love and support.




Contents

1 Introduction and Motivation
1.1 Motivation
1.2 Outline

2 Overview of Image Binarization
2.1 Thresholding
2.2 Review of Existing Thresholding Techniques
2.3 Performance Evaluation

3 Proposed Approach
3.1 Competitive Learning Neural Network
3.2 Proposed Thresholding Method
3.2.1 The Network Convergence
3.3 MATLAB Simulation Results
3.3.1 The Laser Spot Application
3.3.1.1 Poor Contrast Images
3.3.1.2 Noisy Images
3.3.1.3 Various Illuminations
3.3.2 Other Applications
3.3.2.1 Document Binarization
3.3.2.2 Face Recognition
3.3.2.3 Low Resolution Images
3.4 Summary

4 Hardware Framework
4.1 Introduction
4.2 Platform Overview
4.3 System Overview
4.3.1 Downloading the Design to XSA board
4.3.2 Downloading/Uploading the Image

5 Algorithm Implementation in Hardware
5.1 Hardware Block Diagram
5.2 Memory Controller Unit
5.2.1 Read/Write Operation Timing
5.3 Weight-Updating Unit
5.3.1 Data Type
5.3.2 Data Width
5.4 Thresholding Unit
5.5 Clock Distribution
5.6 Implementation Results
5.6.1 Visual Performance
5.6.2 Speed
5.6.3 Area

6 Future Work
6.1 Approximate Thresholding
6.2 Pipeline Thresholding Architecture
6.2.1 Pipelined Thresholding
6.2.2 Parallel Pipeline Thresholding

7 Concluding Remarks
A Abbreviations
B VHDL Code
B.1 General.vhd
B.2 memCnt.vhd
B.3 memCntMod
B.4 ImBinar.vhd
B.5 ImBinarMod.vhd

List of Figures

2.1: A triangulation rangefinder
2.2: Thresholding example
2.3: Optimum threshold value range for low contrast images
2.4: Thresholding with patterned background
3.1: A simple processing element (node)
3.2: A neural network structure
3.3: A simple competitive neural network
3.4: Two-dimensional data clusters and their weight vectors
3.5: Weights positions
3.6: Update process flow chart
3.7: Convergence and constant learning rate
3.8: Convergence and initial value
3.9: WCT and proposed method
3.10: WCT and proposed method
3.11: Weight convergence in the proposed method
3.12: Weights convergence and initial values in the proposed method
3.13: Poor contrast images of laser spot
3.14: Noise effect on the performance of the proposed approach, shown in terms of the
X-projection and Y-projection of the laser spot
3.15: Different illuminations
3.16: The results of consecutive subtraction of binary images in Figure 3.15
3.17: Document binarization application
3.18: Face recognition application
3.19: Low resolution images
4.1: XSA board
4.2: (a) Test system, (b) Practical system
4.3: Board programming flow
4.4: SDRAM programming flow
5.1: Thresholding block diagram
5.2: Memory controller interfaces
5.3: Memory controller timing diagram
5.4: Weight-updating unit flow chart
5.5: Functional simulation result for weight-updating unit
5.6: An example for approximation
5.7: Threshold unit flow chart
5.8: Functional simulation result for thresholding unit
5.9: Clock distribution
5.10: Hardware vs. MATLAB results for normal images
5.11: Hardware vs. MATLAB results for poor contrast images
5.12: Critical path
6.1: The system block diagram
6.2: Pipeline thresholding block diagram
6.3: Pipelined thresholding timing diagram
6.4: Multi-frame pipeline thresholding block diagram
6.5: Parallel pipeline block diagram


List of Tables

2.1: Thresholding evaluation ranking of NDT images
3.1: Centroid position changes of the laser spot
5.1: Memory/Memory Controller interfacing
5.2: Memory Controller/Internal Logic interfacing
5.3: The calculated values for weights
5.4: Objects in weight-updating unit
5.5: Threshold value with approximation and without approximation
5.6: Logic consumption in FPGA





Chapter 1
Introduction and Motivation

Image binarization is one of the principal problems of image processing applications. To
extract useful information from an image, we need to divide it into distinctive components,
e.g. background and foreground objects, for further analysis. Often the gray levels of the
foreground pixels are quite different from those of the background. Thresholding then becomes an
effective technique for separating objects from the background. Several superior methods for
image binarization have been reported and implemented. The main goal of most of these is high
efficiency in terms of performance rather than speed. However, for some applications, especially
those involving customized hardware, speed is the key requirement.
For example, on-chip image processing integrated with CMOS image sensors is
prevalent in a variety of imaging systems. In such systems, real-time processing and
information are vital. Applications of these systems include robotics, automobiles, object
tracking, and laser rangefinding.


1.1 Motivation
This thesis attempts to enhance a thresholding method for image binarization and implement it
in hardware for real-time applications. A fast and simple thresholding technique has widespread
applications in practical imaging systems. One application is laser rangefinding, where the
range of an object in motion is determined and the captured image is binarized. The
thresholding technique is applied to separate the laser spot from the background and to locate
the spot centroid. Rangefinding technology and CMOS image sensors are used in industrial
inspection, such as production quality control monitoring, solder paste height and uniformity,
and lead position and pitch on integrated circuits.
Another application is systems that use a laser pointer as a pointing device and
acquire image frames at a real-time rate. Examples are a paper-based puzzle game on a
computer with a display and a laser pointer as the mouse, and an interactive presentation in
which the audience can control the slides with a laser pointer [1].
A further application of real-time thresholding is document processing and Optical
Character Recognition (OCR). For example, a high-speed scanner can scan and process over one
hundred pages per minute. The speed requirement of such a system calls for dedicated hardware
for image processing and binarization. Typically, images captured in scanners by a CMOS or
CCD camera are converted to binary images. A document consists of text on a relatively
uniform background. Therefore, converting it to a binary image is suitable for output and
storage because it significantly reduces size without loss of important data.
All of the mentioned applications have one thing in common: high-performance,
high-precision systems dictate an efficient and fast thresholding algorithm. They also use
image binarization as a pre-processing step prior to further processing. Therefore, they have to
be able to separate the objects from the background by calculating an optimum threshold value,
to avoid losing important information (such as object dimensions and shape). Consequently, the
goals of this research are:
1. Exploring an efficient thresholding method, in terms of speed and performance, suitable for
hardware implementation.
2. Implementing the proposed algorithm in hardware.
3. Enhancing the hardware design for pipelined architecture.



1.2 Outline
This thesis describes a new technique for image thresholding in real-time applications. The
thresholding technique is implemented in an FPGA. We also present two architectures for
real-time applications.
Chapter 2 provides an overview of image binarization, concentrating on the laser spot location
application. Well-known image thresholding techniques and their performance evaluation are
discussed.
Chapter 3 describes the proposed thresholding algorithm. A performance comparison in MATLAB
is presented. Simulation results for the performance of our algorithm in comparison with
Otsu's method and the weight-based clustering method are shown. The performance of the
proposed algorithm for different illumination, noise, and applications is also discussed.
Chapter 4 provides an overview of the Xilinx XSA board as a platform for demonstrating the
hardware design. The setup of the test bench system is described.
Chapter 5 presents the FPGA implementation of the proposed algorithm. Simulation results
for the hardware implementation, concentrating on functional performance, are discussed. The
hardware performance in terms of speed and area is pointed out.
Chapter 6 presents different architectures for real-time applications as possible future
directions of this research. The timing conditions for each architecture are discussed.
Chapter 7 draws some key conclusions from the work presented in this thesis. The results
of the research are summarized and pros and cons are highlighted.
Appendix A includes a glossary of terms to assist the reader's comprehension.
Appendix B includes the VHDL code for the developed hardware.




Chapter 2
Overview of Image Binarization

Image binarization has a broad range of applications in machine vision, biomedical imaging,
image segmentation, shape recognition, and target tracking. In all of these applications the
gray level information of the image is reduced to bi-level information. The main application of
the present research is high-speed object location and range determination, where the
unimportant image information can be ignored. Here, an object is defined as a cluster of gray
level pixels above a threshold value and well separated from the background. Figure 2.1
illustrates an application for fast object location, a "triangulation rangefinder" that uses
the image of a laser spot to determine the range to a moving object. The imaging system need
only output the "centroid location" of the image of the target spot, which can be calibrated to
provide the range. In such cases the determination of the object location, that is, the centroid
of the object, is crucial. One efficient way to find the centroid location is to binarize the
captured image to separate the laser spot from the background.
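As a minimal sketch of this step, assuming the image has already been binarized into rows of 0/1 values, the centroid can be taken as the mean row and column index of the foreground pixels. The function name `spot_centroid` and the list-of-lists image layout are illustrative assumptions and do not come from the thesis.

```python
def spot_centroid(binary):
    """Return the (row, column) centroid of the foreground pixels, or None.

    `binary` is assumed to be a list of rows holding 0 (background)
    or 1 (foreground). This layout is an assumption for illustration.
    """
    count = 0
    row_sum = 0
    col_sum = 0
    for i, row in enumerate(binary):
        for j, value in enumerate(row):
            if value:
                count += 1
                row_sum += i
                col_sum += j
    if count == 0:
        return None  # no foreground pixel, so no spot was detected
    return (row_sum / count, col_sum / count)


# A 2x2 spot centred between rows 1-2 and columns 1-2.
spot = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(spot_centroid(spot))  # (1.5, 1.5)
```

Because only counts and coordinate sums are accumulated, the same computation could in principle be done in a single pass over the pixel stream, which is why a reliable binarization step matters for the rangefinder.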
High speed is one important consideration. For example, in some industrial inspection
applications, such as surface profiling, position monitoring, or metrology (e.g. thickness),
rangefinding is required at high throughput. Another consideration is accuracy. High accuracy is
necessary either for achieving reasonable performance over a long working distance or for
high-resolution applications.


Figure 2.1: A triangulation rangefinder


2.1 Thresholding
The objective of image binarization is to divide an image into two groups. In image processing
applications the gray level values assigned to an object are different from the gray level
values of the background. Thresholding is then an effective way to separate foreground and
background, and the result is a binary image. A binary image is obtained by assigning zeros to
pixels whose values do not exceed the threshold and ones to the remaining pixels.
Let us consider an image f of size N × M with L gray levels in the range [0, L − 1]. The
gray level or brightness of a pixel with coordinates (i, j) is denoted by f(i, j). The
threshold, T, is a value in the range [0, L − 1]. The thresholding technique determines
an optimum value for T based on predefined measurements, so that:

\[
g(i, j) =
\begin{cases}
1 & \text{for } f(i, j) > T \\
0 & \text{for } f(i, j) \le T
\end{cases}
\tag{2-1}
\]


where g(i, j) is the binarized image. In this work we are interested in a light object on a dark
background; therefore, in the binarized image, pixels below a certain gray level are
represented by 0, i.e. background, and pixels above that level are represented by 1, i.e.
foreground. Figure 2.2 shows an example of a gray-level image, its gray level distribution
histogram, and the thresholding result.
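The thresholding rule can be illustrated with a short sketch. It assumes an 8-bit image stored as a list of rows and a threshold that is already known; treating a pixel exactly equal to T as background is an assumption about the boundary case, and the name `binarize` is hypothetical.

```python
def binarize(image, threshold):
    """Map each gray level to 1 (foreground) if it exceeds the threshold,
    otherwise to 0 (background). A light object on a dark background is
    assumed, as in the text.
    """
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# A tiny image with a bright spot on a dark background.
image = [
    [12, 15, 14],
    [13, 240, 235],
    [11, 238, 16],
]
print(binarize(image, 128))
```

Choosing the value passed as `threshold` is the actual problem addressed in the rest of the chapter; the comparison itself is trivial, which is what makes it attractive for hardware.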


Figure 2.2: Thresholding example, (a) Original image, (b) Image histogram, (c) Binarized image.

Generally there are two approaches to thresholding, local and global. In global
thresholding a single threshold value is applied to the entire image, while in local
thresholding the image is divided into many sub-regions and a specific threshold is applied to
each sub-region. Local thresholding examines the relationships between the brightness of
neighbouring pixels to adapt the threshold according to the intensity statistics of different
regions. Sometimes it employs more than one threshold value.
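The difference between the two approaches can be sketched as follows. The example splits the image into square blocks and thresholds each block at its own mean gray level; the block mean is an assumption chosen for brevity (the text does not prescribe a particular local statistic), and `local_threshold` is a hypothetical name.

```python
def local_threshold(image, block):
    """Binarize `image` block by block, using each block's mean gray level
    as its threshold. The block-mean rule is an illustrative assumption.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            r1 = min(r0 + block, rows)
            c1 = min(c0 + block, cols)
            pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            t = sum(pixels) / len(pixels)  # per-block threshold
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = 1 if image[r][c] > t else 0
    return out


# The left half is dark and the right half is bright, each with its own detail.
picture = [
    [10, 20, 200, 220],
    [10, 20, 200, 220],
]
print(local_threshold(picture, 2))
```

A single global threshold between 20 and 200 would mark the whole right half as foreground and lose the detail inside each half, whereas the per-block thresholds keep it. The price is one extra pass per block to gather statistics.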
Some factors affect this gray level value (the "threshold") and complicate the
thresholding, such as poor contrast, inconsistency between the sizes of object and
background, non-uniformity in the background, and correlated noise. Sometimes the binary
image loses too much of the region, and sometimes it gains too many irrelevant background
pixels. So, the success of the binarization depends critically on the selection of an
appropriate threshold.
For example, Figure 2.3(a) shows a case where the original image is a low contrast image,
i.e. the gray levels of the foreground pixels are not totally distinct from those of
the background. Three different binary images are obtained by thresholding at different levels.
The binarized images show that thresholding in this case is very sensitive to the threshold
value.
Another case where the threshold value is critical arises when the object (spot) is located
on a non-uniform or patterned background. The pattern frequently has a gray level
between those of the background and the foreground. Figure 2.4(a) shows a captured image on a
patterned background. Figure 2.4(b) shows the binarized image for a high threshold, where the
irrelevant background information and noisy pixels are included in the result. In Figure 2.4(c)
the threshold is decreased but some parts of the background pattern are detected as object. For
these cases the thresholding method has to be adaptive to the image characteristics.




Figure 2.3: Optimum threshold value range for low contrast images,
(a) Original image, (b) Binary image with T=219, (c) Binary image with T=205, (d) Binary image with T=201.

All of these cases show that finding an optimum threshold requires an adaptive
algorithm. The adaptive algorithm calculates a threshold value based on image features such
as image statistics and image illumination.
In the next section a brief review of the existing thresholding techniques is presented.
This review is based on two well-known references in the literature, [2] and [3]. Quantitative
performance evaluations of these thresholding techniques have been performed and the
techniques are ranked based on their performance.



Figure 2.4: Thresholding with patterned background,
(a) Original image, (b) High threshold value, (c) Low threshold value.


2.2 Review of Existing Thresholding Techniques
According to the information the thresholding techniques exploit, Sezgin [2] has
categorized them into six groups. The categories are:
1. Histogram shape-based methods, where the histogram of the image is viewed as a mixture of
two Gaussian distributions associated with the object and background classes, such as
convex hull thresholding (Rosenfeld [4]) and peak-and-valley thresholding (Sezan [5]).
2. Clustering-based methods, where the gray level pixels are clustered into two classes as
either background or foreground objects, or alternatively are modeled as a mixture of two
Gaussians, such as iterative thresholding (Riddler [6]), clustering thresholding (Otsu [7]),
minimum error thresholding (Kittler [8]), and fuzzy clustering thresholding (Jawaher [9]).
3. Entropy-based methods, which use the difference in entropy between the foreground and
background regions, such as entropy thresholding (Kapur [10]) and entropy thresholding
(Shanbag [11]).
4. Object attribute-based methods, which find a measure of similarity (fuzzy shape similarity,
edge coincidence, etc.) between the gray-level and the binarized images, such as edge field
matching thresholding (Hertz [12]) and topological stable-state thresholding (Pikaz [13]).
5. Spatial methods, which use higher-order probability distributions and/or correlation between
pixels, such as higher-order entropy thresholding (Abutaleb [14]).
6. Local methods, which calculate the threshold value at each pixel based on the local image
characteristics, such as the local contrast method (White [15]) and surface-fitting
thresholding (Yanowitz [16]).
Among these thresholding methods we need to find one with high performance.
However, an important concern in image thresholding is performance evaluation. In the next
section, measures for performance evaluation are discussed and the results of applying these
measures to some of the mentioned techniques are shown.


2.3 Performance Evaluation
Performance evaluation of low-level image processing, e.g. binarization, is inherently
difficult. One approach to performance evaluation for binarization is to define a set of
criteria with different weights and give a score to each criterion [17][18]. These criteria are
visual criteria or computational criteria. However, the visual measure is subjective and varies
between applications. The main drawback of subjective evaluation is its application dependency,
but it is useful for identifying poor binarization methods. Alternatively, a quantitative
measure may be used for evaluation [2][3].
There are some visual or subjective guidelines for performance evaluation, for example:
regions should be uniform and homogeneous with respect to the original image; region interiors
should be without artifacts; adjacent regions should have significantly different values; and
the boundaries of each segment should be simple and continuous and, for our laser rangefinder
application, spatially accurate.
For quantitative performance evaluation, Sezgin [2] has carried out a comparative
survey of the methods listed above and has advanced some useful criteria for thresholding
performance evaluation in two different contexts, document images and NDT (non-destructive
testing) images. The survey employed an average of five performance criteria: misclassification
error, edge mismatch, relative foreground area error, modified Hausdorff distance, and region
non-uniformity. As our work is applied to NDT images, the results of the thresholding method
evaluation for NDT images are shown in Table 2.1 for the top seven methods. It can be seen that
all these top-ranked methods belong to the clustering and entropy categories. The ranking also
considers the subjective evaluation of the visual outlines of the extracted object.
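To make one of these criteria concrete, the sketch below computes a misclassification error as the fraction of pixels whose label in a thresholded image disagrees with a reference (ground-truth) binary image. This is the usual reading of the criterion and is stated here as an assumption; the text above does not give the formula, and the function name is hypothetical.

```python
def misclassification_error(reference, result):
    """Fraction of pixels labelled differently in `result` than in `reference`.

    Both arguments are assumed to be equally sized lists of rows of 0/1
    labels. 0.0 means perfect agreement and 1.0 means every pixel is wrong.
    """
    total = 0
    agree = 0
    for ref_row, res_row in zip(reference, result):
        for ref, res in zip(ref_row, res_row):
            total += 1
            if ref == res:
                agree += 1
    return 1 - agree / total


truth = [[0, 1], [1, 1]]
thresholded = [[0, 1], [0, 1]]
print(misclassification_error(truth, thresholded))  # 0.25
```

A measure of this kind needs a ground-truth image, which is why it suits benchmark studies better than on-line use inside a rangefinder.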
From the hardware implementation point of view, the effectiveness of a thresholding
method can also be considered in terms of other parameters such as speed and complexity.
These become very important in real-time image processing applications. All of the high-ranked
cluster-based techniques (Table 2.1) have to compute some image features, such as the
histogram, the maximum/minimum gray level values, or the variance of the image, before
calculating the threshold value. Therefore, the image must be preprocessed pixel by pixel,
which imposes a large processing overhead. The entropy-based techniques also require complex
computations, such as logarithms. In a hardware implementation, logarithm and standard
deviation calculations complicate the hardware requirements of these methods.
Moreover, the methods require considerable processing time after the full image is available to
compute the threshold. Although the methods discussed have good performance, they are not
generally suitable for our work. Alternatively, we can enhance or modify these techniques. The
basic requirements for the thresholding method are adaptability and efficiency. It should also
have the least possible dependency on image pre-processing. For our specific application the
technique should be able to calculate an optimum threshold for poor contrast images.

Table 2.1: Thresholding evaluation ranking of NDT images.

Rank  Method           Thresholding function

1     Cluster-Kittler  $T_{opt} = \arg\min \{ P(T) \log \sigma_f(T) + [1 - P(T)] \log \sigma_b(T) - P(T) \log P(T) - [1 - P(T)] \log [1 - P(T)] \}$,
                       where $\sigma_f(T)$ and $\sigma_b(T)$ are foreground and background standard deviations

2     Entropy-Kapur    $T_{opt} = \arg\max [ H_f(T) + H_b(T) ]$ with
                       $H_f(T) = -\sum_{g=0}^{T} \frac{p(g)}{P(T)} \log \frac{p(g)}{P(T)}$ and
                       $H_b(T) = -\sum_{g=T+1}^{G} \frac{p(g)}{P(T)} \log \frac{p(g)}{P(T)}$

3     Entropy-Sahoo    $T_{opt} = T_{[1]} \left[ P_{T_{[1]}} + \tfrac{1}{4} w B_1 \right] + \tfrac{1}{4} T_{[2]} w B_2 + T_{[3]} \left[ 1 - P_{T_{[3]}} + \tfrac{1}{4} w B_3 \right]$, where
                       $P_{T_{[k]}} = \sum_{g=0}^{T_{[k]}} p(g)$, $k = 1, 2, 3$, and $w = P(T_{[3]}) - P(T_{[1]})$

4     Entropy-Yen      $T_{opt} = \arg\max \{ C_b(T) + C_f(T) \}$ with
                       $C_b(T) = -\log \sum_{g=0}^{T} \left[ \frac{p(g)}{P(T)} \right]^2$ and
                       $C_f(T) = -\log \sum_{g=T+1}^{G} \left[ \frac{p(g)}{1 - P(T)} \right]^2$

5     Cluster-Lloyd    $T_{opt} = \arg\min \left[ \frac{m_f(T) + m_b(T)}{2} + \frac{\sigma^2}{m_f(T) - m_b(T)} \log \frac{1 - P(T)}{P(T)} \right]$,
                       where $\sigma^2$ is the variance of the whole image

6     Cluster-Otsu     $T_{opt} = \arg\max \left\{ \frac{P(T) [1 - P(T)] [m_f(T) - m_b(T)]^2}{P(T) \sigma_f^2(T) + [1 - P(T)] \sigma_b^2(T)} \right\}$

7     Cluster-Yanni    $T_{opt} = (g_{max} - g_{min}) \sum_{g=g_{min}}^{g^{*}_{mid}} p(g)$






11
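To make the pre-processing overhead of the cluster-based methods concrete, the following Python sketch evaluates the Cluster-Otsu criterion of Table 2.1 by brute force. It is only an illustration under stated assumptions: the function name, the 256-level default, and the choice to treat gray levels at or below T as one class are not taken from the thesis. Note that the full histogram must be built before the search over T can start.

```python
def otsu_threshold(pixels, levels=256):
    """Return the T that maximizes the Cluster-Otsu ratio criterion.

    Assumption of this sketch: pixels with gray level <= T form one class
    and the remaining pixels form the other.
    """
    # Pass 1: the whole image is needed to build the histogram p(g).
    hist = [0] * levels
    for g in pixels:
        hist[g] += 1
    n = len(pixels)
    p = [h / n for h in hist]

    best_t, best_score = 0, -1.0
    # Pass 2: exhaustive search over candidate thresholds.
    for t in range(levels - 1):
        P = sum(p[: t + 1])  # P(T): probability mass at or below T
        if P <= 0.0 or P >= 1.0:
            continue  # one class is empty, criterion undefined
        m_f = sum(g * p[g] for g in range(t + 1)) / P
        m_b = sum(g * p[g] for g in range(t + 1, levels)) / (1 - P)
        var_f = sum((g - m_f) ** 2 * p[g] for g in range(t + 1)) / P
        var_b = sum((g - m_b) ** 2 * p[g] for g in range(t + 1, levels)) / (1 - P)
        between = P * (1 - P) * (m_f - m_b) ** 2
        within = P * var_f + (1 - P) * var_b
        score = between / within if within > 0 else float("inf")
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

Even in this small form the method needs a histogram memory, multipliers, and dividers, which is the kind of cost the discussion above refers to.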



Chapter 3
Proposed Approach

As mentioned in Chapter 2, clustering-based methods are among the high-ranked
thresholding techniques. In this method, the gray level pixels of an image are divided into two
clusters, foreground and background. The optimum threshold can be calculated from these
clusters. There are several approaches for clustering a set of input gray level pixels. For example,
Talukdar and Sridhar [19] used an artificial neural network structure as a clustering technique,
called weight-based clustering threshold (WCT). The weight-based clustering method uses
the clustering property of artificial neural networks to calculate a threshold, where the threshold
is the average of the centroids of the two clusters.

This chapter presents a method inspired by the WCT method. The distinguishing features of the
method are its speed, simplicity and ease of implementation in hardware. It is a single-pass
algorithm that requires minimal pre/post-processing. First the basics of neural networks and
competitive learning neural networks are described. Then the proposed method and the
results of its MATLAB implementation are discussed in detail.


3.1 Competitive Learning Neural Network
The clustering of gray level pixels in an image can be done by an artificial neural network. Neural
networks are especially useful for classification and function approximation/mapping problems.
An artificial neural network (ANN) is an interconnected assembly of simple elements called
nodes. The processing ability of the network is stored in the interlayer connection strengths,
called weights, which are obtained by a process of learning from a set of training patterns. Figure
3.1 shows the basics of an artificial neural node. Inputs to the network are represented by $X(n)$.
Each of these inputs is multiplied by a connection weight; these weights are represented by
$W(n)$. In the simplest case, these products are simply summed and fed through a transfer function
to generate an output.


Figure 3.1: A simple processing element (node).

The summation is described as

$$net = \sum_i W_i X_i \qquad \text{(3-1)}$$

and the transfer function is

$$Y = f(net) \qquad \text{(3-2)}$$
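As a minimal sketch of equations (3-1) and (3-2), a single node can be written in Python as follows. The function name and the default choice of `math.tanh` as the transfer function are assumptions made only for illustration; the text does not fix a particular transfer function.

```python
import math


def node_output(weights, inputs, transfer=math.tanh):
    """Compute the output of one processing element (node)."""
    # (3-1): weighted sum of the inputs
    net = sum(w * x for w, x in zip(weights, inputs))
    # (3-2): pass the sum through the transfer function
    return transfer(net)
```

For example, `node_output([0.5, -1.0], [2.0, 1.0], transfer=lambda n: n)` uses an identity transfer function and simply returns the weighted sum.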

Artificial neural networks cluster the primitive nodes. This clustering occurs by creating layers,
which are then connected to one another. The way these layers connect can vary.
All artificial neural networks have a similar structure or topology: some of the nodes interface to
inputs, some hidden nodes interface to internal neurons, and other nodes provide outputs.

The nodes are grouped into layers. The input layer consists of nodes that receive input
data from the external environment. The output layer consists of nodes that communicate the
output of the system to the external environment. There are usually a number of hidden layers
between these two layers. Figure 3.2 shows a simple neural network structure with only two
hidden layers. When the input layer receives data, its nodes produce outputs, which in turn
become inputs to the other layers of the network. The process continues until a specified
condition is satisfied.


Figure 3.2: A neural network structure.

Essentially, an ANN has two modes of operation: training mode and operation mode. In
the training mode, weights are changed based on the training data set. The learning ability of a
neural network is determined by its architecture and by the algorithmic method chosen for
training. All learning algorithms are grouped into supervised and unsupervised learning.

In supervised learning the final grouping of data, called target values, is known, which
implies the input data is labelled. The labelled data is given to the ANN during training so that
the ANN can adjust the weights and match its outputs to the target values. After training the
ANN is given a set of unlabelled data, which are input values without target values. The resulting
output is evaluated by measuring its distance from the correct target value (label).

In unsupervised learning, the ANN is not provided with the target values during training.
An unsupervised method can learn a summary of a probability distribution, and the
summarized distribution can then be used to make predictions. Unsupervised ANNs usually perform
some kind of data compression, such as dimensionality reduction or clustering. One application
of unsupervised neural networks is unlabelled data clustering.

During recent decades, very diverse categories of ANN have been introduced by
researchers. Each category is applicable to a specific domain, and proposing a general neural
network to solve all problems seems to be impossible. One of the proposed solutions
applicable to classification and image segmentation problems is Unsupervised Competitive
Learning [20]. This network divides the input data into a number of clusters such that the inputs
in the same cluster have similar features.
A basic competitive learning network has one layer of input nodes and one layer of
output nodes, called the competitive layer. Figure 3.3 depicts a simple competitive learning
network. In this network, an input pattern is a sample point represented by an
n-dimensional vector. The number of nodes in the output layer is equal to the number of
distinct classes. The outputs are directly connected to the inputs by weighted connections.


Figure 3.3: A simple competitive neural network.

In this ANN model, the weights are updated using a competitive learning algorithm.
Weights are initially assigned random values. A weighted sum is calculated for the connections
from each input pattern to an output node. For each pattern, only the weights associated with
the winning node are updated. As the updating process proceeds, the weights of the winning
node become closer to the current input pattern. The output node moves a certain
proportion of its distance to the input pattern, scaled by the learning rate. Each weight is
updated by the following equation:

$$W_{new} = W_{old} + \eta\,(I - W_{old}) \qquad \text{(3-3)}$$

where $W$ is the winning cluster center, $\eta$ is the learning rate, and $I$ is the input.

It can be seen that the output node on the competitive layer whose weight is closest to the
training input pattern wins the competition, and the weight connected to this node changes so
that the distance is decreased. As the training of the competitive learning network proceeds for a
group of similar data patterns, the same weight wins again and is updated. After a certain
number of input patterns have been fed in, one specific weight tends to be updated for a set of
similar input patterns. Therefore, if input patterns with distinct classes are applied to the network,
each output node will represent one class of input data, and the winning weight will belong to this
output node.
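The winner-take-all update of equation (3-3) can be sketched in Python as below. This is an illustrative sketch rather than the implementation used in this work: the function name is hypothetical, and the winner is chosen here by smallest squared Euclidean distance, which is one common way to decide which weight is closest to the input.

```python
def competitive_step(weights, x, rate):
    """Apply one competitive learning step in place; return the winner index.

    weights: list of weight vectors, one per output node
    x:       input pattern (same dimension as each weight vector)
    rate:    learning rate applied to the winning node only
    """
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

    # The output node whose weight vector is closest to the input wins.
    winner = min(range(len(weights)), key=lambda i: sq_dist(weights[i]))
    # (3-3): move only the winning weight a fraction of the way toward x.
    weights[winner] = [wi + rate * (xi - wi) for wi, xi in zip(weights[winner], x)]
    return winner
```

Feeding a stream of patterns through this function leaves each weight vector near the patterns it has repeatedly won, which is the clustering behaviour described above.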
In the training mode, the closest weight moves toward the cluster to which the input
data belongs. Network convergence is guaranteed only if the weight changes decrease
during the training mode. If the weight changes are always less than a predefined small value, the
ANN is trained and the weight vectors represent the input classes. Figure 3.4 demonstrates the
distribution of two distinct input patterns. We assume the neural network has only two weights
to be updated. The final values of these weights are supposed to represent the centers of the two
clusters. It should be noted that the center of a cluster is the point from which the average
distance of all points is minimized.


Figure 3.4: Two-dimensional data clusters and their weight vectors.


3.2 Proposed Thresholding Method
In the proposed algorithm, we use the classification capability of a competitive learning neural
network to compute the threshold of an image. It is assumed that the gray level image can be
classified into two distinct classes of pixels, background and foreground. The input pixels are
used for training the unsupervised competitive learning neural network. After the network
converges, or at the end of the training mode, the weights represent the centroids of the two
groups of pixels, that is, the centers of the background and the foreground. Figure 3.5 shows a
bimodal image histogram and the weight positions at the end of training mode. The threshold is
located at the position with optimum distance from both groups. In the operation mode, once
the threshold value is available, we can apply it to all pixels of the image to convert the gray
level image to a binary image.

Figure 3.5: Weights positions.

Let us assume every image consists of two distinct classes, which are two major groups of pixels
describing two different subjective properties of the image. We call the two pixel groups
background and foreground. Obviously, this sort of image has a bimodal histogram.
Consequently, in order to construct an ANN, we need just two weights to classify these two
groups of pixels. The weights are updated by input pixels. For every input pixel the closest
weight is selected for updating. The difference between the input pixel and the closest
weight is scaled and added to the closest weight. This value is the updated value for the winning
(closest) weight. The update function is as follows:

$$W_{new} = W_{old} + \eta\,(I_i - W_{old}) \qquad \text{(3-4)}$$

As equation (3-4) shows, the difference between the input and the old weight is scaled by a factor
$\eta$, also called the learning rate. This weight update is applied for every pixel of the image. At
the end of the training mode, the weights are located at the centers of the two clusters of pixels,
namely background and foreground, and the threshold is calculated by taking the average of these
two weights. Figure 3.6 shows the flowchart for the weight updating and thresholding process.


Figure 3.6: Update process flow chart.
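The update and thresholding process can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function names, the initial weights (64 and 192), and the fixed learning rate of 0.05 are illustrative values chosen here, and a fixed rate is used only to keep the example short.

```python
def two_weight_threshold(pixels, w_init=(64.0, 192.0), rate=0.05):
    """Single pass over the pixels; returns (threshold, final weights)."""
    w = list(w_init)
    for i in pixels:
        # Select the winning (closest) weight for this pixel.
        k = 0 if abs(i - w[0]) <= abs(i - w[1]) else 1
        # (3-4): scale the difference and add it to the winning weight.
        w[k] += rate * (i - w[k])
    # The threshold is the average of the two weights.
    return (w[0] + w[1]) / 2.0, w


def binarize(pixels, threshold):
    """Operation mode: map each gray level to 0 or 1."""
    return [1 if i > threshold else 0 for i in pixels]
```

For a pixel stream that alternates between gray levels 50 and 200, the two weights settle near 50 and 200 and the returned threshold is close to 125. Only one comparison, one subtraction, one scaling and one addition are needed per pixel, which is what makes the scheme attractive for hardware.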

3.2.1 The Network Convergence
It is necessary to analyse the convergence criteria of the artificial neural network. In order to achieve
a set of precise weights at the end of the training process that yields an optimum threshold, the
network has to converge. The rate of convergence is a critical parameter in most feedback neural
networks, but in feed-forward networks like the proposed network the rate of convergence is
not applicable. The reason is that we have to terminate the training process at the end of the data set
(the pixels, in this application), whether the network has converged or not. As a result we have to
analyse the convergence criteria and set bounds on the convergence parameters to guarantee
convergence. The convergence depends entirely on two parameters, the learning rate and the initial
values. Determining the learning rate as well as the initial values is dependent on the application.

With a constant learning rate the network does not converge [20]. Initially the
weights may not be near the actual centroids. If the learning rate parameter is set to a small
value, the learning process proceeds smoothly. In this case the network may not converge
to a stable value within the available number of pixels, because the weights move towards the actual
centroids too slowly. On the other hand, if the learning rate parameter is set to a large value,
learning is accelerated, but there is a risk that the network diverges and
becomes unstable.

This problem is explained with an example. Figure 3.7(a) and (b) show the original gray
level image and its histogram. The gray level distribution histogram (Figure 3.7(b))
indicates that ideally the weights should be around 100 and 140, which are the centroids of the
background and foreground clusters of the image. With a small learning rate, the network
does not converge. Figure 3.7(d) and (e) show that the weights do not move fast enough toward
the expected weight values or centroids. The case of a large learning rate for the same
input image is shown in Figure 3.7(g) and (h). There are fluctuations in the weight values and
consequently convergence is not achieved. Since the ANN does not converge properly, the
optimum threshold is not obtained. This results in poor binary images, Figure 3.7(c) and (f).

Figure 3.7: Convergence and constant learning rate, (a) Original image, (b) Image histogram,
(c) (d) and (e) Binarized image, weight1, weight2 respectively with low learning rate,
(f) (g) and (h) Binarized image, weight1, weight2 respectively with high learning rate.
Axes: gray level (histogram); iteration (weight plots, with the expected value marked).

Therefore the learning rate has to be decreased gradually as the training proceeds. In
practice, to guarantee the convergence of the network, the learning rate is taken as the reciprocal
of the number of cases that have been assigned to the winning cluster. Let us assume that, for an
input image, $CW_i$ pixels have previously been assigned to the $i$-th weight as the closest or winning
weight. If $W_i$ is a winning weight once more, it will be updated as follows:

$$W_i^{new} = W_i^{old} + \frac{1}{CW_i + 1}\,(I - W_i^{old}) \qquad \text{(3-5)}$$

Reducing the learning rate causes each weight to approach the mean of all pixels
assigned to the corresponding cluster [20] and guarantees convergence of the algorithm to an
optimum value of the error function (the sum of squared Euclidean distances between inputs
and weights). In other words, as the number of input pixels increases, the learning rate of every
weight, and consequently the update value for the winning weight, is reduced. Although this
guarantees convergence, it is risky when the initial values of the weights leave them
trapped in a local minimum. The WCT method applies equation (3-5) to update the
weights.
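A short Python sketch makes the link between equation (3-5) and the cluster mean explicit. The function name is hypothetical, and the sketch assumes the counter $CW_i$ starts at zero before any pixel has been assigned.

```python
def wct_update(w, count, i):
    """One update of a winning weight following (3-5).

    w:     current value of the winning weight
    count: number of pixels previously assigned to this weight (CW_i)
    i:     current input pixel
    Returns the updated weight and the incremented count.
    """
    rate = 1.0 / (count + 1)  # reciprocal learning rate
    return w + rate * (i - w), count + 1
```

Starting from `w = 10.0` and `count = 0` and feeding the pixels 100, 120 and 140 gives 100, then 110, then 120, that is, the running mean of the pixels seen so far. With a zero starting count the first update uses a rate of 1 and overwrites the initial weight value, so under this assumption the initial values matter only through which weight wins each pixel.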
Figure 3.8 shows how, in the WCT method, the convergence of the network can be
sensitive to the initial values of the weights. Figure 3.8(d) and (e) show that when the initial values are
close to the centroids the weight updating process converges. This results in an acceptable
binary image, shown in Figure 3.8(c). But if the initial values are far away from the centroids the
network does not converge. The weight trajectories for an inappropriate initial value are
shown in Figure 3.8(g) and (h). In Figure 3.8(h) the initial value of weight2 is set to a small value
and the weight becomes trapped in a local minimum.


Figure 3.8: Convergence and initial value, (a) Original image, (b) Image histogram,
(c) (d) and (e) Binarized image, weight1, weight2 respectively for initial values 128,
(f) (g) and (h) Binarized image, weight1, weight2 respectively for initial values 10.
Axes: frequency (histogram); Weight 1 and Weight 2 (weight plots).

The drawback of this method is its sensitivity to the initial values of the weights. Some
modifications are necessary to make the learning rate less dependent on the initial values of the
weights. Equation (3-5) indicates that the learning rate decreases continuously, in inverse
proportion to the number of times the winning weight has been updated. After a number of input
pixels have been processed, the weight change becomes smaller. In some cases this is not desirable,
especially when the gray level image does not have a uniform distribution, for example, images with poor
contrast. The problem appears when the initial value is far away from the centroid
of the cluster and the weight falls into a local minimum. Alternatively, training can start with a
constant learning rate, and the learning rate decrement applies only after a predetermined point. The
breaking point for decreasing the learning rate monotonically may be set to the ratio of object pixels
to background pixels. Before this point the network is learning the behaviour of the object and
background pixels, and after it the rate of weight change needs to be reduced. This modification
enhances the network convergence but makes the approach application dependent.

The learning rate enhancement is shown with an example in Figure 3.9. The WCT and
proposed methods are applied to a sample picture with equal initial weight values. The initial
weights are set to out-of-range values. Figure 3.9(e) shows that weight2 for the WCT method does
not move toward the correct value. The binary results are shown in Figure 3.9(c)
and (f).

Figure 3.9: WCT and proposed method, (a) Original image, (b) Image histogram,
(c) (d) and (e) Binarized image, weight1, weight2 respectively for WCT method, threshold=92,
(f) (g) and (h) Binarized image, weight1, weight2 respectively for proposed method, threshold=112.

For our application, almost all the objects occupy about 25% of the image pixels. So the
learning rate is constant up to 25% of the image pixels and after that it starts decreasing. This can be
seen with an example in Figure 3.10, where the original image has poor contrast. Compared to the
WCT method, the results are improved even for images with poor contrast (Figure 3.10).

In the proposed method, the network is trained with all the image pixels. The
learning process starts with a constant learning rate. After a percentage of the image pixels has
been processed, roughly equal to the ratio of the number of foreground pixels to the
number of background pixels for the particular application, the learning rate is decreased to
make the weight changes smaller than a specific value. This is shown in Figure 3.11, where the
weight changes and threshold changes are plotted. The weights start converging during the
learning rate reduction.
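The two-phase learning rate described above can be sketched as follows. This is a hedged illustration only: the function name, the base rate of 0.05, the initial weights, and in particular the decay law used after the breaking point (the base rate divided by the number of wins since that point) are assumptions of this sketch, since the text specifies only that the rate is held constant first and then decreased.

```python
def proposed_threshold(pixels, w_init=(0.0, 255.0), base_rate=0.05,
                       hold_fraction=0.25):
    """Single-pass two-weight clustering with a hold-then-decay learning rate.

    hold_fraction: share of the pixels processed at the constant rate
                   (25% corresponds to the application described above).
    Returns (threshold, final weights).
    """
    w = list(w_init)
    wins = [0, 0]  # updates per weight counted after the breaking point
    hold_count = int(hold_fraction * len(pixels))
    for n, i in enumerate(pixels):
        k = 0 if abs(i - w[0]) <= abs(i - w[1]) else 1
        if n < hold_count:
            rate = base_rate  # phase 1: constant learning rate
        else:
            wins[k] += 1
            rate = base_rate / wins[k]  # phase 2: assumed decay law
        w[k] += rate * (i - w[k])
    return (w[0] + w[1]) / 2.0, w
```

With a poor-contrast stream that alternates between gray levels 100 and 140, the weights move quickly toward the two levels during the constant phase and then settle as the rate shrinks, giving a threshold close to 120. Changing `hold_fraction` adapts the breaking point to a different application.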
The chance of finding an optimum global threshold can be improved by using rational
initialization. However, the network should have minimum sensitivity to its initial state.
The simulation results show that the network converges for random initial weight values.
Figure 3.12 shows that even when the initial value is not close to the optimum value the
network converges to the proper value.

Figure 3.10: WCT and proposed method, (a) Original image, (b) Image histogram,
(c) (d) and (e) Binarized image, weight1, weight2 respectively for WCT method, threshold=150,
(f) (g) and (h) Binarized image, weight1, weight2 respectively for proposed method, threshold=202.




Figure 3.11: Weight convergence in the proposed method, (a) Weight1, (b) Weight2, (c) Threshold.
Axes: value against iteration.

Figure 3.12: Weights convergence and initial values in the proposed method,
(a) Weight1, (b) Weight2, (c) Threshold.



3.3 MATLAB Simulation Results
The proposed algorithm is comparable to the other binarization techniques discussed in Chapter
2 in terms of speed and efficiency of calculating the optimum threshold. In this section we
attempt to show the visual performance and the optimum threshold calculation efficiency of the
proposed method. The algorithm has been implemented in MATLAB and compared with a
built-in MATLAB threshold method, which is Otsu's method. The results are also illustrated under
different circumstances, such as noisy images, different illumination and low resolution images.

Although the main application of the proposed algorithm in this research is tracking a
laser spot on different backgrounds, studying the results for other applications is also interesting.
The comparisons evaluate how the performance of the proposed method compares with
classical methods such as Otsu's thresholding method. Different types of images with various qualities
are supplied to the proposed and Otsu's methods, and the resulting binary images are
compared to determine which method is more efficient.

The simulation results confirm that the proposed method outperforms Otsu's method in
most conditions, especially in poor quality cases. The images in this experimental study have
been categorized into (i) the laser spot application and (ii) other applications.

3.3.1 The Laser Spot Application
In the laser spot application it is necessary to extract the laser spot from different background
patterns. It is also important to verify the performance of the proposed method for this
application under different circumstances.

3.3.1.1 Poor Contrast Images
In a low contrast image, thresholding can be more challenging because of the compressed
contrast range. Although a pre-processing contrast stretching technique can improve
the image quality and influence the result of image thresholding, it adds a pre-processing
pass to the algorithm, which is undesirable especially in real-time applications.

Figure 3.13 shows the results of applying the two thresholding methods to low
contrast images of laser spots on different backgrounds. The result of the proposed
method is subjectively better than that of Otsu's method.

Figure 3.13: Poor contrast images of laser spot, (a) Original images,
(b) and (c) Binary images by proposed and Otsu's method respectively.


3.3.1.2 Noisy Images
Another important experiment is the study of the effect of spatial noise on the performance of
the proposed approach. An image can be degraded by two different types of spatial noise,
correlative and additive noise, of which the latter is the most prevalent. As a correlative or
multiplicative noise, we corrupt the image with speckle noise. The additive noises are Gaussian and
salt-and-pepper noise. In studying the effect of noise on the performance of the thresholding
technique for tracking the laser spot in triangulation range-finding, we investigate how the center
of mass (centroid) of the laser spot in the binary image may move. The desired situation is one where
noise does not affect the location of the center of the main object in the image. In order to locate the
centroid of the object, the horizontal and vertical image projections are used. The
horizontal/vertical projection is the sum of pixel values along the image columns/rows. Figure 3.14
shows the results of this experiment: the center of the laser spot changes only slightly.
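The projection-based centroid can be sketched in Python as follows. This is an illustrative sketch, not the MATLAB code used for the experiment: the function name is hypothetical, the image is assumed to be a list of equally long rows, and the centroid is taken as the intensity-weighted mean position of each projection.

```python
def centroid_from_projections(image):
    """Return the (x, y) center of mass of an image, or None if it is all zero."""
    rows, cols = len(image), len(image[0])
    # Horizontal projection: sum of pixel values along each column.
    x_proj = [sum(image[r][c] for r in range(rows)) for c in range(cols)]
    # Vertical projection: sum of pixel values along each row.
    y_proj = [sum(image[r]) for r in range(rows)]
    total = sum(y_proj)
    if total == 0:
        return None  # no object pixels, centroid undefined
    x = sum(c * v for c, v in enumerate(x_proj)) / total
    y = sum(r * v for r, v in enumerate(y_proj)) / total
    return x, y
```

Applying the function to a gray level image and to its binarized version and comparing the two results gives the kind of centroid shift that the experiment measures.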
The accuracy of the laser spot location directly affects the precision of the system. To
evaluate the performance of the proposed thresholding approach, the exact centroid positions of
the object in the gray level images and in their binarized results are calculated and summarized in
Table 3.1. It can be seen that the changes are negligible.

Table 3.1: Centroid position changes of the laser spot.

Image | Gray level image centroid position | Binary image centroid position
Original image | (74,64) | (73,65)
Gaussian noisy image | (77,59) | (77,58)
Salt-and-pepper noisy image | (80,64) | (80,64)
Speckle noisy image | (77,66) | (84,66)

Figure 3.14: Noise effect on the performance of the proposed approach, shown in terms of the
X-projection and Y-projection of the laser spot, (a) and (b) Original image and its binary image,
(c) and (d) Gaussian noisy (mean=0, variance=0.01) image and its binary image,
(e) and (f) Salt-and-pepper (density=0.03) noisy image and its binary image,
(g) and (h) Speckle (variance=0.02) noisy image and its binary image.

3.3.1.3 Various Illuminations
Another case study is the impact of illumination or brightness on the performance of the
proposed approach. We set up an experiment in which the illumination of the image is increased in
three steps. In each step, the thresholding algorithm is applied to the image. As there is no
spatial difference between the original gray level images, a desirable thresholding algorithm should not move
the edges. In other words, an efficient thresholding technique is independent of the image
illumination. Figure 3.15(a) shows the original images, which are some plain shapes (objects) on a
background with various illuminations. The histograms and the binary images are
shown in Figure 3.15(b) and (c). From the binary images we can visually say that, except for the
'star', all the edges of the 'triangular' and the 'rectangular' shapes remain unchanged.

To confirm that the shape of the object is not degraded by the illumination changes, the
difference between the binary image in each step and that of the former step is obtained. The difference
algorithm is a simple pixel-by-pixel image subtraction. Figure 3.16 shows the results of
consecutive subtraction of the binary images in Figure 3.15(c).
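The pixel-by-pixel subtraction can be sketched in Python as below. The function names are hypothetical, both images are assumed to have the same size with values 0 or 1, and the absolute value is used here so that a pixel changing in either direction is marked, which is an assumption of this sketch.

```python
def binary_difference(current, previous):
    """Pixel-by-pixel absolute difference of two equally sized binary images."""
    return [
        [abs(a - b) for a, b in zip(row_c, row_p)]
        for row_c, row_p in zip(current, previous)
    ]


def changed_pixels(diff):
    """Count the pixels that differ between the two images."""
    return sum(sum(row) for row in diff)
```

A difference image that is zero everywhere, or a `changed_pixels` count of zero, indicates that the object edges did not move between the two illumination steps.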

Figure 3.15: Different illuminations, (a) Original images with consecutively increased illumination,
(b) Histograms, (c) Binary images.

Figure 3.16: The results of consecutive subtraction of binary images in Figure 3.15,
(a) Second binary image - first binary image, (b) Third binary image - first binary image,
(c) Fourth binary image - first binary image.


3.3.2 Other Applications
The main assumption of the proposed algorithm is that the input gray level image has two
distinct groups of pixels. This holds for some applications, such as laser spot tracking and
text binarization. However, it may not be an efficient method for other applications, such as face
recognition and low resolution applications. In the following sections the algorithm is applied to
these different applications and compared to Otsu's method.

3.3.2.1 Document Binarization
Document binarization has been a subject of research interest in the last fifteen years and many
high performance algorithms have been developed [21], [22]. However, some applications of
document binarization need real-time thresholding. For example, a high speed scanner can scan
and process over one hundred pages per minute. The speed requirement in such a system dictates
dedicated hardware for image processing and binarization. Typically, images captured in
scanners by a CMOS or CCD camera are converted to binary images. A document consists of text
on a relatively uniform background. Therefore converting it to a binary image is suitable for
output and storage because it significantly reduces size without loss of important data. In
text binarization applications, where the image can be easily clustered into object and
background, we expect well-binarized results from the proposed algorithm.

Figure 3.17(a) is an image of a scanned text downloaded from the 'Electronic Text Center'
website [23]. The visual performance of the proposed thresholding method is quite comparable with
that of Otsu's method. It successfully extracts the text from the image, despite the illumination
gradient in the original gray level image.


Figure 3.17: Document binarization application, (a) Original images,
(b) and (c) Binary images by proposed and Otsu's method respectively.

3.3.2.2 Face Recognition
Face recognition has been a focus of computer vision researchers for many years. Facial
feature detection plays an important role in applications such as human computer interaction,
video surveillance, face detection and face recognition. The problem of facial feature detection
from gray scale video images requires a fast thresholding algorithm. The aim of
binarization in this application is detecting facial features such as eyes, nose, and lips.

Figure 3.18(a) shows gray level face images downloaded from the 'Yale
Face Database B' [24], [25]. Although these images do not have distinct regions of foreground
and background in their histograms, the proposed weight clustering thresholding shows good
results in detecting facial features, Figure 3.18(b).


Figure 3.18: Face recognition application, (a) Original images,
(b) and (c) Binary images by proposed and Otsu's method respectively.

3.3.2.3 Low Resolution Images
In some applications, especially real-time ones in which memory space and sampling time are
critical issues, the captured images have low resolution. In other applications, such as video
surveillance, as higher resolution sensors become available people are interested in
recognizing objects further away. Hence, identifying and extracting objects from lower
resolution images is still important. The proposed method does not
outperform Otsu's method here, but it is still satisfactory in terms of preserving the edges, which
are important in image understanding and machine vision applications. This is
offset by the lower complexity, suitability for hardware implementation and one-pass
behaviour of the proposed approach. Figure 3.19 shows some examples of low resolution
images. These gray level images were taken with a CMOS camera.


Figure 3.19: Low resolution images, (a) Original image,
(b) and (c) Binary images by proposed and Otsu's method respectively.


3.4 Summary
In this chapter, the proposed thresholding method was presented. We take advantage of a
one-pass thresholding algorithm implemented by a simple weight-based clustering neural
network. Despite its simplicity, the performance of the algorithm is still comparable to
well-known methods. The performance of the proposed method was tested in different conditions
such as low contrast, low brightness, and noisy images.

The proposed method is a modified version of the WCT method. The modification is
applied to the learning rate, which improves the convergence of the network and decreases the
sensitivity to the initial values. In this technique the learning rate starts to decrease after a
specific percentage of the input image has been processed. This percentage is roughly the ratio of
object pixels to background pixels. Although this makes the proposed algorithm dependent on the
application, the results show significant improvements in the network convergence. Moreover, this
ratio is adjustable and can easily be set for different types of applications.

Our goal was to find a high-performance and easy-to-implement thresholding technique.
The proposed method meets our requirements.






Chapter 4
Hardware Framework

The proposed algorithm needs to be implemented in hardware. Some design decisions
depend on the hardware platform and have to be made at the beginning of the design phase.
For example, the processes for writing and reading images depend on the storage capacity and
structure of the platform. The target platform used for the hardware implementation in this thesis is
introduced in this chapter.


4.1 Introduction
Field Programmable Gate Arrays (FPGAs) have recently been used as an effective platform for
implementing many image processing applications. These reprogrammable devices contain a
collection of programmable logic blocks interconnected via wires and programmable switches.
The logic functionality of each block is specified via a small programmable memory, called a lookup
table, driven by a limited number of inputs, which generates a single Boolean output. While early
FPGA architectures contained small numbers of logic blocks, new device families have quickly
grown to capacities of tens of thousands of lookup tables containing millions of gates of logic.

In this work we have used VHDL (Very High Speed Integrated Circuit Hardware Description
Language) to program the FPGA. Implementing a hardware design in an FPGA
requires several steps. From the description of the requirements and the algorithm, the following
can be designed: a block diagram illustrating the functionality of the hardware, a data flow
diagram showing the data path, a state machine describing the control behaviour, and an
optimized VHDL model describing the required hardware.
A VHDL model consists of two main parts, the entity and the architecture. The entity
part defines the input and output ports of the model while the architecture defines the behaviour
of the model. A VHDL entity can be compared to a discrete component, and it is important to
specify the interfaces between the VHDL entities. When the VHDL models for the hardware
design are ready, simulations can be run. Generally the aim of simulation is to verify the
behaviour of the models. The simulation is run in a simulation tool with a test bench. There are
different levels of simulation. The primary simulation checks only the functionality of the
models and does not take the hardware timing constraints into account. The simulations
covering timing information and constraints are run after the synthesis.
When the primary simulation results show that the functionality satisfies the
requirements, synthesis of the design can begin. In the synthesis process, the VHDL models are
transformed into physical hardware, taking the timing information into account. Timing
information and constraints are given to the tool prior to the synthesis. The synthesis tool tries
to meet the timing requirements based on a predefined clock frequency and the optimization options
provided by the tool.
The hardware generated by synthesis then has to be placed on the FPGA floor
plan. The place and route tool produces a file that is used for
static timing analysis and back annotation. The static timing analysis shows whether the generated
hardware functions at the required frequencies and whether all constraints are met. Back
annotation can be used to verify that the generated hardware has the functionality specified in
the VHDL models.


4.2 Platform Overview
The proposed algorithm for image binarization is implemented on the XESS XSA-50 Spartan II
development board [26]. The inherent features of this board make it suitable for our application.
The board has a high-speed Spartan II FPGA. In addition, with its on-board SDRAM, the XSA-50 is
an appropriate interface between the proposed hardware design and a CMOS "Camera On a Chip"
image sensor.
Figure 4.1 shows the XSA-50 board and its components. The XSA board consists of the
following components:
XC2S50 Spartan II FPGA. This field programmable gate array (FPGA) is the main
programmable logic device on the XSA Board. It has 50k gates in a 144-pin PQFP package.
XC9572XL CPLD. This complex programmable logic device (CPLD) manages the
interface between the PC and the components of the XSA board via the parallel port.
Oscillator. The oscillator on this board is programmable. It generates the master clock for
the board. It has a maximum frequency of 100 MHz that can be divided to provide
frequencies of 100 MHz, 50 MHz, 33 MHz, 25 MHz, ..., 48.7 kHz.
Flash. A 128 KByte Flash device stores non-volatile data and configuration
bitstreams for the Spartan II FPGA.
SDRAM. An 8 MByte SDRAM provides volatile data storage accessible by the FPGA. It is
organized as 4 banks of 1,048,576 x 16. All the inputs and outputs are synchronized with
the rising edge of the input clock. The SDRAM includes some programmable options
such as the length of the pipeline, the number of consecutive read or write cycles
initiated by a single control command, and the burst count sequence.
LED. A seven-segment LED allows visible feedback as the board operates. It can be
used for test and debugging purposes by the FPGA or CPLD.
DIP switch. A four-position DIP switch passes settings to the XSA Board or controls the
upper address bits of the Flash device. This switch is accessible from the FPGA and CPLD.
Pushbutton. A single pushbutton input to the FPGA. This can be used for test and debug.
Parallel Port. This is the main interface for passing configuration bitstreams and data to
and from the XSA board.
PS/2 Port. A keyboard or mouse can interface to the board through this port.
VGA Port. The board can send signals to display graphics on a VGA monitor through
this port.



Figure 4.1: XSA board.

The XSA-50 board is used only as demonstration hardware. In the next section the
practical system and the test system setup for the hardware implementation are described.


4.3 System overview
Figure 4.2(a) shows the block diagram of the test system in which the XSA board is used as our
hardware implementation platform. It consists of three major parts: a CMOS camera, a PC, and
the XSA board. The functions of each part are described as follows.
1. CMOS camera: this unit captures one image frame of an object in a digital format.
This image is transferred to the PC through a USB bus.
2. PC: the role of the PC in the test system is to provide an intermediate level that interfaces
the FPGA-based processing unit to the image sensor, which is the CMOS camera. First, the
PC receives one image frame from the CMOS camera and provides it to the XSA board to
be processed. The PC also receives and displays the image after binarization. In the practical
system the PC will be replaced with a customized interface, which manages all
transactions between the camera/display on one side and the thresholding process unit on
the other side. This alternative system is depicted in Figure 4.2(b).
3. XSA board: this contains two major components, the SDRAM for storing the gray
level and binarized images, and the FPGA, which includes the main core of the
thresholding process.


Figure 4.2: (a) Test system, (b) Practical system.

There are two different programming modes for the test system. One programming
mode is for configuring the FPGA based on the design model, and the other programming
mode is for downloading/uploading the captured image to/from the board and the off-chip
SDRAM. In the following sections the sequences of the two programming modes are described.

4.3.1 Downloading the Design to XSA board
The process in which the design code is loaded to the board is referred to as configuration.
Once the board is configured and the programming information is loaded, the FPGA switches from
the programming mode to the operational mode. In the operational mode the designed logic
runs. The logic works until the FPGA is reprogrammed or the power is turned off. Configuring
the FPGA consists of the following steps, as illustrated in Figure 4.3 [27]. The proposed
thresholding algorithm is modeled and the description of the logic circuit is provided using the
hardware description language VHDL. To transform the VHDL code into a netlist we used
a logic synthesizer program from Xilinx [28]. The netlist is a description of the logic
gates in the design and their interconnections. The implementation tools used to map the logic
gates and interconnections into the FPGA come from the same Xilinx tool suite. The FPGA consists of many
configurable logic blocks (CLBs), which can be further decomposed into look-up tables (LUTs)
that perform logic operations. The CLBs and LUTs are connected with various routing resources.
The mapping tool collects the netlist gates into groups that fit into the LUTs, and then the place
& route tool assigns the gate collections to specific CLBs while opening or closing the switches
in the routing matrices to connect the gates together. Once the implementation phase is
complete, a program extracts the state of the switches in the routing matrices and generates a
bitstream where the ones and zeroes correspond to open or closed switches. The bitstream is
downloaded into the FPGA chip. The switches in the FPGA are opened or closed in response
to the binary bits in the bitstream. Upon completion of the downloading, the FPGA will
perform the operations specified by the VHDL code. Xilinx provides the VHDL editor,
logic synthesizer, fitter, and bitstream generator software. The XSTOOLs from XESS [29]
provide utilities for downloading the bitstream into an XSA Board containing a Xilinx XC2S50
Spartan II FPGA.


Figure 4.3: Board programming flow.


4.3.2 Downloading/Uploading the Image
The XSA board contains an 8-MByte synchronous DRAM. The data stored in the SDRAM can
be downloaded and uploaded by the GXSLOAD tool. This is used for downloading an image frame
before the FPGA starts to operate and for uploading the binarized image after the FPGA completes
the operations.
The SDRAM needs to have access to the parallel port for downloading/uploading the
data. The XSTOOLs provide an interface to download a .bit file to the FPGA. Upon
downloading, the FPGA is reprogrammed to create an interface between the SDRAM and the
PC parallel port. The FPGA has access to the PC parallel port through the CPLD. Therefore the
CPLD has to be loaded in advance with a .svf file to create an interface between the parallel port
and the FPGA. Figure 4.4 represents the SDRAM programming flow.


Figure 4.4: SDRAM programming flow.

The XSTOOLs allow us to load the SDRAM with .exo, .mcs, .hex, or .xes files. We used the
.xes format to download/upload the image. The .xes format is a hexadecimal format with 16-bit
addresses. This is a simplified file format that does not use checksums. The contents of the .xes
files are downloaded into the SDRAM through the parallel port. The FPGA remains configured
as an interface between the PC and the SDRAM.
With the XSTOOLs we are also able to read the content of the SDRAM by uploading it to
the PC. To upload data from an address range in the SDRAM, the upper and lower addresses of
the range are required. The FPGA on the XSA Board is reprogrammed to create an interface between the
RAM device and the PC parallel port. The SDRAM data between the high and low addresses
(inclusive) is uploaded through the parallel port. The uploaded data is stored in a file in .xes
format.
The 16-bit data words in the SDRAM are mapped into the eight-bit data format of the
.xes files using a Big Endian style. That is, the 16-bit word at address N in the SDRAM is stored
in the eight-bit file with the upper eight bits at location 2N and the lower eight bits at location
2N+1. This byte ordering applies to both SDRAM uploads and downloads.
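
The byte ordering can be illustrated with a short Python sketch. The helper names words_to_bytes and bytes_to_words are hypothetical and are not part of the download tools; the sketch only assumes the big-endian layout described above, with the upper byte of word N at byte location 2N and the lower byte at the location that follows it.

```python
def words_to_bytes(words):
    """Flatten 16-bit SDRAM words into a big-endian list of bytes."""
    out = []
    for w in words:
        if not 0 <= w <= 0xFFFF:
            raise ValueError("word out of 16-bit range")
        out.append((w >> 8) & 0xFF)  # upper eight bits at location 2N
        out.append(w & 0xFF)         # lower eight bits at the next location
    return out


def bytes_to_words(data):
    """Rebuild 16-bit words from a big-endian list of bytes."""
    if len(data) % 2:
        raise ValueError("byte count must be even")
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]


if __name__ == "__main__":
    words = [0x00FF, 0x1234]
    packed = words_to_bytes(words)
    print(packed)
    print(bytes_to_words(packed) == words)
```

Since each gray-level pixel occupies one 16-bit word in this design, a round trip through both helpers is a quick way to check that an uploaded image file matches what was downloaded.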




Chapter 5
Algorithm Implementation in Hardware

This chapter discusses the hardware implementation of the adaptive image thresholding. The
proposed method has the advantage of being computationally simple without decreasing the
efficiency of calculating the optimum threshold. These features of the algorithm make it very
suitable for hardware implementation. In this chapter an overview of the hardware
implementation is presented and each individual processing unit is described in detail. The
simulation and performance results are analysed as well.


5.1 Hardware Block Diagram
Essentially the hardware includes three major modules: the memory controller unit, the
weight-updating unit, and the thresholding unit. The modules and their interconnections are
illustrated in Figure 5.1. The memory controller unit generates timing and control signals to
send/receive image pixel data to/from the off-chip memory. The weight-updating unit is, in fact,
an arithmetic processor that calculates the weights and threshold values. Each input pixel is read
from memory and compared with the weights, and the closer weight is updated. The update is based
on the difference between the input pixel and the weight, scaled by a learning rate factor. Once a
complete frame of the image is processed, the centers of the background and foreground clusters
have been computed. The thresholding unit determines the threshold value by averaging the weights.
Then each pixel of the same image frame is read from off-chip memory via the memory controller unit.
Each pixel read is compared to the threshold value, and the result is written back to the memory.

[Block diagram: Memory Controller Unit, Weight Update Unit, and Thresholding Unit, connected by
Data, Address, Read, Write, Done, and Start signals.]

Figure 5.1: Thresholding block diagram.

All of these units are modeled in VHDL. In the following sections the function
of each module is described in more detail.


5.2 Memory Controller Unit
The hardware architecture depends on the hardware platform, in this case the XSA board. For
example, in the primary hardware design phase a choice has to be made for the captured image
storage. This storage can be either off-chip or on-chip memory. The Spartan-II FPGA on the XSA
board provides dedicated blocks of on-chip, dual read/write port synchronous RAM. Each port
of the block RAM can be independently configured as a read/write port, a read port, or a
write port, and can be configured to a specific data width. The advantages of using on-chip dual
port memory are, firstly, that there is no need for the memory controller module. Secondly, with
dual port memory, the updating module and the thresholding module are able to write to and read
from the memory simultaneously, which means they can work in parallel. This parallel structure
would significantly enhance the speed. Therefore, using on-chip dual port memory can significantly
improve the speed and area of the hardware design. But the main drawback of using on-chip
memory is its small size. The typical image size in our application is 128 x 128, which
needs at least 132K of memory. But the XC2S50 has only 32K of RAM available. Therefore the off-chip
memory is used for image storage.
The memory on the XSA board is a high-speed CMOS dynamic random-access memory
containing 67,108,864 bits. It is internally configured as a 4-bank DRAM with a synchronous
interface, and all signals are registered on the rising (positive) edge of the clock signal. Each of the
banks is organized as 4096 rows by 256 columns by 16 bits. Read and write accesses to the
SDRAM are burst oriented. Accesses begin with the initiation of an ACTIVE command, which is then
followed by a READ or WRITE command. The address bits registered coincident with the
ACTIVE command are used to select the bank and row to be accessed: two input pins select
the bank and 12 input pins select the row. The address bits registered coincident with the READ
or WRITE command are used to select the starting column location for the burst access.
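
As an illustration of this organization, the following Python sketch splits a word address into bank, row, and column fields. It assumes the cl_addr layout given in Table 5.2 (bank in the two most significant bits, then 12 row bits, then 8 column bits); the function names split_address and capacity_bits are hypothetical and only serve the example.

```python
BANK_BITS, ROW_BITS, COL_BITS = 2, 12, 8  # 4 banks x 4096 rows x 256 columns
WORD_BITS = 16                            # each column holds one 16-bit word
ADDR_BITS = BANK_BITS + ROW_BITS + COL_BITS


def split_address(addr):
    """Split a 22-bit word address into (bank, row, column)."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    col = addr & ((1 << COL_BITS) - 1)                # 8 least significant bits
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)  # next 12 bits
    bank = addr >> (COL_BITS + ROW_BITS)              # 2 most significant bits
    return bank, row, col


def capacity_bits():
    """Total capacity: banks x rows x columns x bits per word."""
    return (1 << ADDR_BITS) * WORD_BITS


if __name__ == "__main__":
    print(split_address((2 << 20) | (5 << 8) | 7))
    print(capacity_bits())
```

The capacity computed this way agrees with the 8 MByte size of the device. A 128 x 128 image stored one pixel per word spans 64 consecutive rows of a single bank, so most accesses within a frame stay in the already active row.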
The memory controller unit is an SDRAM controller that accepts simple read and write
requests and generates the timed signals required to perform the operations on the SDRAM.
The controller unit also manages the refresh operations needed to keep the SDRAM data valid,
and will place the SDRAM in a self-refresh mode so data remains valid even if the controller
terminates operation.
The controller is implemented in VHDL and is part of the FPGA logic. The core of the
VHDL code for this unit is adapted from the XSA board website [30], and some modifications
have been made to optimize it for our design. Figure 5.2 shows the data and control paths between
the memory and the memory controller, and between the memory controller and the internal logic.
Generally the interface signals of the memory controller unit are divided into two
groups: memory/memory controller signals, which interface to the off-chip SDRAM, and
memory controller/internal logic signals, which interface to the internal logic. Table 5.1 and
Table 5.2 describe the signal functions for both interfaces.



Figure 5.2: Memory controller interfaces.

Table 5.1: Memory/Memory Controller Interfacing.
Signal Name | Signal Type | Signal Description
mc_cke | output | Controls the internal clock signal. When it is deactivated the SDRAM will be in the power-down, suspend, or self-refresh state. It is directly output to the clock-enable input of the memory.
mc_cs | output | Drives the chip-select of the memory. It enables all inputs to the SDRAM except for clock, clock-enable, and data input/output.
mc_ras | output | Drives the row address strobe (RAS) input of the memory.
mc_cas | output | Drives the column address strobe (CAS) input of the memory.
mc_we | output | Drives the write-enable input of the memory.
mc_ba | output | Selects one of the four banks of memory. Bank address inputs define to which bank an ACTIVE, READ, WRITE, or PRECHARGE command is being applied.
mc_data | input/output | Input/output data word to read/write from/to memory during a read/write operation.
mc_addr | output | Outputs the row and column address for the memory location.
udqm | output | Drives the SDRAM input that controls the drivers for the upper half of the data bus during read operations.
ldqm | output | Drives the SDRAM input that controls the drivers for the lower half of the data bus during read operations.


Table 5.2: Memory Controller/Internal Logic Interfacing.
Signal Name | Signal Type | Signal Description
reset | input | Resets the logic of the controller unit and also initializes the SDRAM.
cl_rd | input | Initiates a read command from the memory. It is sampled on the rising clock edge and is held high for the read operation. It must also go low after the done signal indicates the end of the read operation.
cl_wr | input | Initiates a write operation. It is sampled on the rising edge of the clock. The internal logic holds this signal high for the write operation, and it must go low after the done signal goes high.
prog | output | Indicates the initiation of a read or write operation.
done | output | Indicates the completion of a read or write operation.
cl_addr | input | Indicates the address of the SDRAM word that is to be read or written. The internal logic must keep the address value during the read and write operation. The two most significant bits indicate the bank address bits of the SDRAM, the next 12 bits correspond to the row address within that bank, and the least significant 8 bits correspond to the column address within that row.
cl_data | input/output | The data for read/write from/to memory is put on this bus. For the write operation the data is valid during the whole operation. For the read operation the data is latched in the internal logic.

5.2.1 Read/Write Operation Timing
The timing diagram for a read operation from memory is shown in Figure 5.3(a). In this diagram
it is assumed that the read operation accesses a memory location in the currently active bank and row
of the SDRAM. The sequence of the read operation is:
T1: The SDRAM address, cl_addr, is put on the bus and the read
control signal, cl_rd, is driven high. Then the prog signal is activated if the SDRAM
controller is able to begin the read operation. The address and read control must be held
stable at least until the next rising edge after prog goes high.
T2: The prog signal goes high. The column address is output on mc_addr, which goes directly
to the SDRAM chip, and the SDRAM control signals are set to initiate a read operation.
There are cases in which the memory controller unit delays the initiation of the
read operation. In the first case, a row refresh is in progress: the memory controller completes
the row refresh operation, then the SDRAM banks are precharged and the bank and
row containing the given address are activated, after which the read operation can progress.
In the second case, the given address is not in the currently active bank or row of the
SDRAM: the memory controller precharges the SDRAM banks and activates the bank and row
containing the given address. Whenever the initiation of the read
operation is delayed, the prog signal is held low until the read operation is actually
initiated.
T3: The SDRAM initiates a read of the given column address on the rising clock edge.
T4: The SDRAM waits for the data to arrive from the given address.
T5: The data from the SDRAM arrives sometime during this cycle and is guaranteed to
be stable by the end of the cycle.
T6: The data from the SDRAM is clocked into a register on the rising clock edge. The
done signal goes high to signal the internal logic that the data is available on the cl_data
bus. The read control must be lowered before the next rising clock edge or else another
read operation will be initiated.
T7: The done signal goes low again but the data on cl_data remains stable until another
read operation is completed.
The timing diagram for a write operation is shown in Figure 5.3(b). The sequence of a write is
almost the same as that of the read operation.
T1: The SDRAM address and the data to be written are put on the bus and the write
control signal, cl_wr, is driven high. If the memory controller is able to begin the write
operation, then the prog signal goes high. The address, data, and write control must be
held stable at least until the next rising edge after prog goes high.
T2: The prog signal goes low. The data and the column address are output on the pins
that go to the SDRAM chip and the SDRAM control signals are set to initiate the write
operation. The done signal goes high because the memory controller is effectively done at
this point, since the SDRAM can complete the write operation on its own. As with the read
operation, there are several cases in which the memory controller delays the initiation of the
write operation.
T3: On the rising clock edge the SDRAM latches the address and data and initiates a
write operation. The output drivers on the data bus are disabled to free the SDRAM data
bus.
T4 and T5: The SDRAM continues its internal operations to write the data into the given
address. In the previous sequence of actions, it was assumed that the write operation was
initiated as soon as the write control signal was asserted.


Figure 5.3: Memory Controller timing diagram.

The data sheet lists several timing requirements for communication with the SDRAM,
such as the minimum initialization interval, the minimum interval between active and precharge, the
minimum interval between active and READ/WRITE commands, the maximum refresh interval, the duration
of the refresh operation, the minimum precharge command duration, and the write recovery time. To
account for the timing constraints a generic parameter is set in the VHDL code. This parameter represents the
operating frequency of the memory controller. All other required timing parameters are set
based on this frequency. This parameter is used to determine the widths of the timers that
sequence the controller operations.
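
The way timer settings follow from a single frequency parameter can be sketched in Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the nanosecond figures used in the example are placeholders, not values taken from the SDRAM data sheet or from the controller code.

```python
def min_ns_to_cycles(t_ns, freq_khz):
    """Cycles needed to cover a MINIMUM interval of t_ns (rounds up)."""
    return (t_ns * freq_khz + 999_999) // 1_000_000


def max_ns_to_cycles(t_ns, freq_khz):
    """Cycles that fit inside a MAXIMUM interval of t_ns (rounds down)."""
    return (t_ns * freq_khz) // 1_000_000


def timer_width(cycles):
    """Number of bits a counter needs to hold the value `cycles`."""
    return max(1, cycles.bit_length())


if __name__ == "__main__":
    freq_khz = 100_000            # placeholder operating frequency (100 MHz)
    active_to_rw_ns = 20          # placeholder minimum interval
    refresh_period_ns = 15_625    # placeholder maximum interval between refreshes
    for name, cycles in [
        ("active to read/write", min_ns_to_cycles(active_to_rw_ns, freq_khz)),
        ("refresh period", max_ns_to_cycles(refresh_period_ns, freq_khz)),
    ]:
        print(name, cycles, "cycles,", timer_width(cycles), "bit timer")
```

Minimum intervals are rounded up and maximum intervals are rounded down so that a change of clock frequency can never violate a constraint, only add slack.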


5.3 Weight-Updating Unit
The thresholding process starts from the weight-updating unit. This process is controlled by an
external signal, start of frame. If this signal is not activated the complete system remains in the
reset state. The system starts operating only when this signal is active. So, the weight-updating
module waits for the start of frame signal, which is driven by the pushbutton on the XSA board. Then
it activates a control signal, cl_rd, to request a read operation for a pixel from the memory controller
unit. It also puts the address of the pixel on the address bus, cl_addr. By the time the memory controller
unit drives the done signal high, the valid data is ready on the data bus, cl_data. After the weights
are updated for the pixel, cl_rd is activated again to request a read operation for the next
pixel. This continues until the end of frame signal is activated by the memory controller unit. As
a result, two weight values are output to the thresholding unit. Figure 5.4 illustrates the
flowchart for the weight-updating unit.

[Flow chart: Start of frame, Initialize weights, Read a pixel, Compare, Update W1 or Update W2,
End of frame?, Store weights.]

Figure 5.4: Weight-updating unit flow chart.

This unit is implemented in VHDL. The weight-updating circuitry computes the output
data based on the input data, so its complexity is dominated by its data path rather than control
or storage. In fact the major arithmetic operations on the input data value are performed here.
The network is implemented with two weights, W1 and W2. The input image pixel, I_i, is compared to
them and one of the weights is updated with equation (5-1) or (5-2).


$$ W1_{new} = W1_{old} + \alpha \, (I_i - W1_{old}) \qquad \text{(5-1)} $$
$$ W2_{new} = W2_{old} + \alpha \, (I_i - W2_{old}) \qquad \text{(5-2)} $$

Each weight is compared to the input and the comparator finds the smaller input-weight
distance. Therefore the comparator must compare absolute values. The result of the
comparator is used to select the weight with the smaller distance, which is the one to be updated.
The difference for that weight is scaled by the learning rate and the result is added to the weight
value to obtain the updated value. For each input pixel the comparator allows only one weight to
be updated while the other remains unchanged.
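
The update step described above can be sketched in a few lines of Python. The sketch assumes the update has the form W_new = W_old + alpha * (I - W_old), uses ordinary real arithmetic rather than the hardware data path, and resolves ties in favour of the first weight; the function name update_weights is hypothetical.

```python
def update_weights(w1, w2, pixel, alpha):
    """One competitive-learning step: move the closer weight toward the pixel."""
    if abs(pixel - w1) <= abs(pixel - w2):
        w1 = w1 + alpha * (pixel - w1)   # first weight is closer (or tied)
    else:
        w2 = w2 + alpha * (pixel - w2)   # second weight is closer
    return w1, w2


if __name__ == "__main__":
    w1, w2 = 128, 128                    # both weights start at mid-gray
    for pixel in (40, 200, 35, 210):     # made-up dark and bright pixels
        w1, w2 = update_weights(w1, w2, pixel, alpha=0.5)
        print(pixel, w1, w2)
```

With dark and bright pixels alternating, one weight drifts toward the dark cluster and the other toward the bright cluster, which is the behaviour the comparator and the two update equations are meant to produce.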
In this unit the weights are initialized with constant values. As discussed in
Chapter 3, this algorithm is not sensitive to the initial values, so the initial values are defined as
constants in the VHDL code. These constants are set to 128, the midpoint of the
histogram of gray scale images.
The weight-updating unit needs to know the size of the image. This value must be
set before the start of frame signal is activated. Here we have assumed that all input images
have the same size. The start address and the end address of the input image in the SDRAM are passed
to the weight-updating unit through generics in the VHDL code.
This unit is implemented with two separate processes: a combinatorial process,
which describes the logic between the flip-flops, and a sequential (clocked) process,
which initializes the registers and describes the flip-flops. In the combinatorial process all of the
arithmetic operations are combinatorial. The final result for one weight update is registered. The
following is the pseudo code for it.


-- Combinatorial process: computes the next value of each weight
process (cl_data, w1_pres, w2_pres)
  -- Variables definition
begin
  -- Variables assignment
  if dif1 <= dif2 then
    -- the input is closer to (or equally far from) the first weight
    w1_next <= ...;
    w2_next <= ...;
  else
    -- the input is closer to the second weight
    w2_next <= ...;
    w1_next <= ...;
  end if;
end process;

-- Clocked process: registers the weights on the rising clock edge
process (clk)
begin
  if rising_edge(clk) then
    if rst = '1' then
      -- signals initialization (synchronous reset)
    else
      w1_pres <= w1_next;
      w2_pres <= w2_next;
    end if;
  end if;
end process;
end Behavioral;


To verify that the VHDL model works as intended, it has to be tested. Testing the
VHDL models requires simulation tools and a test bench. Using the test bench it is possible
to apply the appropriate combination of test stimuli. Test stimuli define the input signals for the
VHDL entities, and it is important to specify a sufficient number of stimuli to test the required
functionality of the design. The VHDL models for the FPGA designs were simulated using
ModelSim XE.
The purpose of testing the weight-updating module is to verify the desired value of the
weights for each input pixel. As test stimuli a simple combination of random values is applied to
the input data, cl_data. Once the reset signal, reset, is activated the weights are initialized with their
initial value, 128. The result of the functional simulation can be seen in Figure 5.5.



Figure 5.5: Functional simulation result for weight-updating unit.

The desired results for each input pixel are calculated by substituting into equations (5-1)
and (5-2) and compared with the simulation results in Table 5.3. Comparing the results confirms
that the weight values (W1 and W2) are correct. The simulation results also show
that the weight values for each pixel are ready in the same clock cycle. As
explained previously, in each clock cycle only one weight is changed while the other weight holds
its previous value.

Table 5.3: The calculated values for weights.
clk Cycle | Reset | Input pixel (cl_data) | Weight1 (W1) | Weight2 (W2)
0 | 0 | 98 | X | X
1 | 1 | 1 | 128 | 128
2 | 0 | 72 | 100 | 128
3 | 0 | 32 | 66 | 128
4 | 0 | 178 | 66 | 153
5 | 0 | 102 | 84 | 153
6 | 0 | 231 | 84 | 192
7 | 0 | 29 | 56 | 192
8 | 0 | 48 | 52 | 192
9 | 0 | 139 | 52 | 165
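
The weight trajectory of Table 5.3 can be checked with a small integer model in Python. Several things here are assumptions rather than facts from the design: the learning rate is taken as 0.5 (128 on the scaled 0..255 range), the eight least significant product bits are dropped with an arithmetic right shift, ties go to the first weight, and the input sequence is an assumed one, chosen to be consistent with the tabulated weights. The names step and run are hypothetical.

```python
ALPHA_SCALED = 128   # assumed: learning rate 0.5 on the scaled 0..255 range
W_INIT = 128         # initial value of both weights after reset


def step(w1, w2, pixel, alpha_scaled=ALPHA_SCALED):
    """Update the closer weight with an integer multiply and 8-bit truncation."""
    d1, d2 = pixel - w1, pixel - w2
    if abs(d1) <= abs(d2):
        w1 += (d1 * alpha_scaled) >> 8   # arithmetic shift drops the 8 LSBs
    else:
        w2 += (d2 * alpha_scaled) >> 8
    return w1, w2


def run(pixels):
    """Return the (W1, W2) pair after each pixel, starting from reset."""
    w1 = w2 = W_INIT
    trace = []
    for p in pixels:
        w1, w2 = step(w1, w2, p)
        trace.append((w1, w2))
    return trace


if __name__ == "__main__":
    for pixel, (w1, w2) in zip([72, 32, 178, 102, 231, 29, 48, 139],
                               run([72, 32, 178, 102, 231, 29, 48, 139])):
        print(pixel, w1, w2)
```

Under these assumptions the model ends at W1 = 52 and W2 = 165. The two steps with odd differences (inputs 29 and 139) show the effect of truncation: the half-unit is rounded toward minus infinity rather than toward zero.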

Generally each object in the VHDL model has a type and a class. The type indicates what
kind of data the object contains and the class indicates what can be done with the object. Defining
an object means instancing a constant, variable, or signal. It is important to assign a correct type
and class to the objects of the design. In the next two sections the data type and data width of
the modeled weight-updating unit are described.


5.3.1 Data Type
In the weight-updating unit some signed operations are used, such as calculating the differences,
so the internal data path has to be signed. However, the basic data types of VHDL are limited
when using the language for RTL design. To do signed arithmetic with the data in the
circuit design, a strong and supportive VHDL package is required. Two packages in the IEEE
library, numeric_std and std_logic_arith, are the most commonly used. We used the numeric_std
package for the signed data path arithmetic in the VHDL implementation. This package defines
arithmetic over std_logic vectors and integers. The std_logic_arith package has less uniform
support for mixed integer/signal arithmetic and has a greater tendency for differences between
tools. The numeric_std package also defines the types signed and unsigned, which are std_logic
vectors on which signed or unsigned arithmetic can be performed simply.

5.3.2 Data Width
Normally the gray level value of an image pixel is an integer between 0 and 255 and is
represented with an 8-bit signal. But here, as equations (5-1) and (5-2) show, the internal data
values have to be between -255 and 255. In order to cover the signed range one extra sign bit is
added to the signals and they are represented with 9 bits. On the other hand, data access to the
SDRAM uses 16-bit words, so the output and input data paths need to be 16 bits wide as well. To
compensate for the mismatch between the internal data path and the interface data path, the final
results from the internal data path are concatenated with "0000000".
In this algorithm we also need real numbers for some parts of the numerical
computation. For example, the learning rate is a real number between 0 and 1. There are many
ways to represent non-integers, such as floating point. Floating point allows a wide range of
values to be represented, and its numerical stability has been well studied in the research literature.
But floating-point arithmetic units consume significantly more hardware resources than
integer arithmetic [31], and this makes them more suitable for million-gate FPGAs such as the Xilinx
Virtex series. Of course, many enhancements and optimizations have been proposed for real
number representation [32] [33]. Since the resources of the current FPGA device are limited and
because the focus of this algorithm is not on high numerical precision, all numbers are
represented as integers and an approximation is applied for the arithmetic.

Figure 5.6 shows an example in which the approximation is applied. In this example
the inputs of the multiplier are the difference between the input and the weight, which is an integer
represented by D, and the learning rate, which is real and represented by alpha. D is a signed
integer value between -255 and 255 and is represented in 8 bits plus one sign bit. Alpha is a real
value between 0 and 1 and it is scaled to lie between 0 and 255, so it can also be represented by 8 bits
plus one sign bit. The output of the multiplier is an 18-bit number. The 8 least significant
bits of the product are in fact the mantissa (the fractional part). If these bits are truncated, the
remaining bits are the rounded value of the real product.
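
The bit widths in this example can be checked with a short Python sketch. It assumes that the scaled learning rate stands for alpha_scaled / 256, which is what dropping eight bits implies, and that truncation is an arithmetic right shift. The names prod and prodt follow Table 5.4; fits_signed and scaled_multiply are hypothetical helper names.

```python
def fits_signed(value, bits):
    """True if value is representable as a two's complement number of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def scaled_multiply(d, alpha_scaled):
    """Multiply a 9-bit signed difference by a learning rate scaled to 0..255."""
    if not fits_signed(d, 9) or not 0 <= alpha_scaled <= 255:
        raise ValueError("operand out of range")
    prod = d * alpha_scaled    # full product, fits in 18 signed bits
    prodt = prod >> 8          # truncated product: 8 fractional bits dropped
    return prod, prodt


if __name__ == "__main__":
    for d, a in [(100, 64), (-55, 128), (-255, 255), (255, 255)]:
        prod, prodt = scaled_multiply(d, a)
        exact = d * a / 256
        print(d, a, prod, prodt, exact - prodt)
```

The last printed column is the truncation error, which always lies in the range from 0 up to, but not including, one gray level. This is the precision given up in exchange for avoiding a floating-point unit.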


Figure 5.6: An example for approximation.

The objects employed in the data path of the weight-updating unit and their
classes, types, and widths are listed in Table 5.4.

Table 5.4: Objects in weight-updating unit.
Object | Description | Class | Data Type | Data width
cl_data | Input/output data from/to memory controller | Signal | Unsigned | 16
addr | Address to memory controller | Signal | Unsigned | 22
addr_pres | Address register for current state | Signal | Unsigned | 22
addr_next | Address register for next state | Signal | Unsigned | 22
w_pres | Weight registers for old value | Signal | Signed | 9
w_next | Weight registers for new value | Signal | Signed | 9
w_init | Weights initialization registers | Constant | Signed | 9
alpha | Learning rate variable | Variable | Signed | 9
prod | Multiplier product value | Variable | Signed | 18
prodt | Truncated multiplier product value | Variable | Signed | 9


5.4 Thresholding Unit
Once the weight-updating unit receives the end of frame signal indicating the completion of the
weight update process for one frame, the weight values are sent to the thresholding unit. First the
threshold value is calculated by averaging the weight values. Then the thresholding unit puts the
address of the first pixel on the address bus and activates the read control signal, cl_rd. After the
memory controller unit drives the done signal high, this unit reads the pixel from the data bus and
compares it to the threshold value. If the value read is less than the threshold, the write control
signal, cl_wr, is driven high and '0' is output to the data bus to be written to the same address of
the memory; otherwise '1' is output to the data bus. When the done signal from the memory
controller unit indicates the completion of the write operation, the unit requests the next read
operation and puts the address of the next pixel on the bus. This process is repeated until the end
of frame signal terminates the process. Figure 5.7 shows the flow chart for the thresholding unit.
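
The second pass over the frame can be summarized in a few lines of Python. The sketch leaves out the memory handshake and keeps only the arithmetic; the use of integer division for the average is an assumption, and the function names are hypothetical.

```python
def threshold_from_weights(w1, w2):
    """Threshold is the average of the two cluster centers (integer division assumed)."""
    return (w1 + w2) // 2


def binarize(pixels, threshold):
    """Pixels below the threshold become 0, all other pixels become 1."""
    return [0 if p < threshold else 1 for p in pixels]


if __name__ == "__main__":
    t = threshold_from_weights(52, 165)     # example weight values
    print(t)
    print(binarize([29, 48, 107, 108, 139, 231], t))
```

Because the result overwrites the pixel at the same address, the hardware needs no second frame buffer; the list returned here plays the role of the rewritten memory contents.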


Figure 5.7: Threshold unit flow chart.

To verify the functionality of the thresholding unit, its functional simulation is shown in
Figure 5.8. The threshold value has been calculated by the time the end of frame
signal goes high. Each data input from the din bus is compared to the threshold value, and the result
is sent out on the dout bus.


Figure 5.8: Functional simulation result for the thresholding unit.


5.5 Clock Distribution
The clock source of the XSA board is a programmable oscillator. The clock signal
from this oscillator is connected directly to a dedicated clock input of the CPLD. The CPLD
passes the clock signal on to the FPGA, which allows the CPLD to control the clock source for
the FPGA. Physically, the SDRAM clock signal is routed back to a dedicated clock input of the
FPGA to allow synchronization of the FPGA's internal operations with the SDRAM operations.
Figure 5.9 shows the clock distribution block diagram for the complete design.
The master clock input of the memory controller, clk, is connected to the FPGA master
clock input pin. As noted above, the programmable oscillator provides the FPGA clock through
the CPLD, so the CPLD can control the input clock of the FPGA. The memory controller
generates an output clock, mc_clk, which is derived from the master clock and is connected to
the clock input of the external memory. A copy of the memory clock signal, called mc_clk_b, is fed
back into an FPGA global clock input. This signal passes through the FPGA to the external
memory and back to the FPGA again, so it is delayed. To compensate for this delay, a
DLL is used to synchronize clk and mc_clk_b. This ensures that all the signals of the memory
controller and the external SDRAM are registered at the positive edge of the clock.
The clock source for the internal logic circuitry is also provided by the memory
controller unit. Another DLL is used to synchronize the memory controller unit and the internal
logic units.


Figure 5.9: Clock distribution.



5.6 Implementation Results
The results of the hardware implementation are analyzed in terms of visual performance,
speed, and area consumption for the thresholding technique.

5.6.1 Visual Performance
The XSA board is programmed with the implemented design. Binarization has been performed
on more than 100 different input images. The method effectively calculates the
optimum global threshold for all of the images for which Otsu's method is successful. Figure
5.10 shows actual outputs of the hardware thresholding in comparison with the Matlab
simulations. For images of a different size the XSA board has to be reprogrammed.


Figure 5.10: Hardware vs. Matlab results for normal images. (a) Original images;
(b) and (c) binary images from hardware and Matlab thresholding, respectively.
Panels are labeled (a-1) to (a-4), (b-1) to (b-4), and (c-1) to (c-4).

The binarized images from the hardware design are compared with the binarized images
from Matlab in Figure 5.10. These images are samples of normal bimodal images with good
contrast. The approximation applied to the data path in the hardware implementation has no
visible effect on the binarized images. However, for poor-contrast images, where
the background and foreground are not distinct groups, the applied approximation degrades the
binarized images. The effect of the approximation on poor-contrast images is shown in Figure
5.11.


Figure 5.11: Hardware vs. Matlab results for poor contrast images. (a) Original images;
(b) and (c) binary images from hardware and Matlab thresholding, respectively.

To help quantify the changes in the threshold value, the numeric results for
Figure 5.10 and Figure 5.11 are summarized in Table 5.5. As the table shows, there
is no significant difference in the relative change of the threshold value between good-contrast
and poor-contrast images. In poor-contrast images, however, the tolerance around the optimum
threshold value is very small, which causes the degradation in the binary results for these images.


Table 5.5: Threshold value with approximation and without approximation.
Image              Contrast  Threshold value (T1)  Threshold value (T2)  Change
                             with approx. (HW)     w/o approx.           (T2/T1)
Figure 5.10 (a-1)  good      136                   158.096               1.162
Figure 5.10 (a-2)  good      102                   110.1                 1.058
Figure 5.10 (a-3)  good      1                     8.98                  1.234
Figure 5.10 (a-4)  good      140                   160.936               1.149
Figure 5.11 (a-1)  poor      165                   18.53                 1.082
Figure 5.11 (a-2)  poor      20                    216.502               1.041
Figure 5.11 (a-3)  poor      15                    186.2                 1.06
Figure 5.11 (a-4)  poor      110                   131.349               1.194

5.6.2 Speed
To compute the processing speed of the system, the relation between the total processing time
for a frame and the number of image pixels must be determined. The total processing time per
image is the weight-updating time plus the thresholding time, that is

$$T(\text{frame}) = n\,(t_{p1} + t_{rd}) + n\,(t_{p2} + t_{rd} + t_{wr}) \tag{5-3}$$

where $n$ is the number of image pixels, $t_{p1}$ is the operating time of the weight-calculating process
per pixel, $t_{p2}$ is the operating time of the thresholding process per pixel, $t_{rd}$ is the memory read
access time per pixel, and $t_{wr}$ is the memory write access time per pixel. For this design the memory
read access time and memory write access time, expressed in terms of the clock period $t$, are

$$t_{rd} = 5\,t \tag{5-4}$$
$$t_{wr} = 1\,t \tag{5-5}$$

So the total processing time for a frame is given by equation (5-6):

$$T(\text{frame}) = 13\,n\,t \tag{5-6}$$

The maximum operating frequency of the design is determined by the critical path. The
speed performance of the design is analyzed with the 'Static timing analyzer' generated by ..1.
The static timing analyzer is run after place and route of the design, when every primitive
element of logic and interconnect has been determined. The analyzer determines the critical path
by tracing every circuit path from flip-flop to flip-flop for every clock, which yields the
maximum operating frequency of the circuit.
The worst-case delay path, the so-called critical path, for the design is shown in Figure
5.12. The timing analyzer report indicates that the worst-case path delay for the design is 16.201
ns, which means that the minimum clock period for every stage is 16.201 ns. Therefore, for a
128 x 128 image the minimum total processing time is about 3.45 ms (13 x 128 x 128 x
16.201 ns). Compared with the real-time application requirement of 30 frames per second, the
implemented design meets the timing requirement. For a larger image of size 1000
x 1000 the processing time per frame is 0.21 s.
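
The arithmetic behind these figures follows directly from equation (5-6) and can be checked with a few lines of Python. This is a minimal sketch: the factor 13 and the 16.201 ns clock period are taken from the text, while the function names are hypothetical.

```python
CYCLES_PER_PIXEL = 13          # from equation (5-6): T(frame) = 13 * n * t
CLOCK_PERIOD_NS = 16.201       # worst-case path delay reported by the timing analyzer


def frame_time_s(width: int, height: int, clock_period_ns: float = CLOCK_PERIOD_NS) -> float:
    """Total processing time for one frame in seconds, per equation (5-6)."""
    n_pixels = width * height
    return CYCLES_PER_PIXEL * n_pixels * clock_period_ns * 1e-9


def max_frame_rate(width: int, height: int, clock_period_ns: float = CLOCK_PERIOD_NS) -> float:
    """Highest frame rate (frames per second) the design can sustain."""
    return 1.0 / frame_time_s(width, height, clock_period_ns)


if __name__ == "__main__":
    for size in (128, 1000):
        t = frame_time_s(size, size)
        print(f"{size} x {size}: {t * 1e3:.3f} ms per frame, "
              f"{max_frame_rate(size, size):.1f} frames/s")
```

Since the processing time is proportional to the number of pixels, the same functions can be used to find the largest image size that still meets a given frame-rate requirement.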
The numeric results indicate that the implemented algorithm is suitable for high-speed
applications. From equation (5-6) it is clear that the total time for thresholding a frame changes
linearly with the number of image pixels. In Otsu's method, where the threshold is
calculated from the histogram, this relation is not linear [ ]. As the image size increases, the
processing time $t_{p1}$ increases significantly. In general, all histogram-based thresholding methods
require extra time for computing the image histogram, and this extra time differs from
algorithm to algorithm.

Figure 5.12: Critical path.

5.6.3 Area
In terms of area the design also performs very well. The storage used in this
implementation is very small: two registers are required to store the weights, and
two temporary locations are used for storing the differences. In contrast, Otsu's method and all
other histogram-based thresholding methods require 256 locations for storing the histogram.
The place and route report from ..1 gives the amount of logic consumed. These
results are summarized in Table 5.6.

Table 5.6: Logic consumption in FPGA.
Name                 Unit description  Number of units
IOs                                    53
Registers            Macro             36
Multiplexers         Macro             4
Tristates            Macro             1
Adders/Subtractors   Macro             14
Comparators          Macro             5
BELS                 Cell              86
Flip Flops/Latches   Cell              226
Clock Buffers        Cell              2
IO Buffers           Cell              50
DLLs                 Cell              2


In summary, the number of used slices is 288 out of 768 (37%), the number of slice flip-flops
is 226 out of 1536 (14%), the number of 4-input LUTs is 515 out of 1536 (33%), the number
of bonded IOBs is 50 out of 96 (52%), and the number of GCLKs is 2 out of 4 (50%).




Chapter 6
Future Work

The main motivation of this research is to propose an image thresholding algorithm that is fast
enough to be employed in a real-time imaging system. In designing a real-time image
processing system, the major constraint is speed. In general, real-time image processing
systems cannot be allowed unlimited processing time. Only the time interval between two frames
is available for: (i) image capture; (ii) pre-processing such as filtering, histogram calculation, and
contrast stretching; (iii) the main image processing task, such as thresholding, edge detection, or
segmentation; and (iv) image output.
A real-time image processing solution can be implemented in
either software or hardware. For a software implementation, a wide range of multi-purpose
processing units, from DSP chips to personal computers, can be used. However, using these
processors involves trade-offs between complexity, cost, speed, and flexibility.
One solution is to develop the real-time processing application software on a DSP
processor. The benefit of using a DSP chip is the flexibility to modify the software of the real-time
image processing application. On the other hand, the more complicated the image processing
algorithm, the more critical the processing time becomes on the DSP chip. For a complex
algorithm, a DSP chip may not be able to process in real time and may miss some frames. In these
cases a parallel structure can be used to improve the speed, but a single DSP chip cannot provide
parallelism.
Alternatively, a hardware implementation can be used as a solution. A hardware
implementation may save many of the temporal overheads present in a software implementation.
However, the drawbacks of a hardware implementation are the lack of flexibility and the expensive
design and debugging processes. To achieve more flexibility, the FPGA approach is a smart choice
because the structure of the hardware implementation is fully configurable.
In addition to speed, the prominent advantage of hardware is its potential to
realize parallelism at different levels, from 'operations' to 'processes'. An operation is a very
low-level task, while a process is a higher-level task. An example of parallel operations in our
implementation is feeding a data byte to two different units, a comparator and a
multiplier. The comparator compares the input data with other data while the multiplier multiplies
the data by a coefficient. These two operations can be accomplished in parallel.
The three major processes in the proposed hardware are capturing an image frame and
writing it to memory, calculating the threshold of the image frame, and applying the obtained
threshold to the image frame. A parallel-process solution cannot be applied to this structure
because each process needs the previous one to complete its task. Alternatively, a
pipeline architecture, which provides a degree of parallelism, can be employed. Pipelining as a
hardware technique can significantly increase the speed of FPGA designs.
The focus of this chapter is to investigate how to increase the speed of the proposed
system so that it meets the requirement for real-time application independent of image size. Two
possible approaches are proposed here: approximate thresholding and
pipeline thresholding.


6.1 Approximate Thresholding
Let us assume that the implemented hardware contains two major processes: the weight-updating
process, P1, and the threshold-applying process, P2. The order of complexity of each process is
O(n), where n is the number of pixels in the image. For the sake of generality, memory access
times are ignored in this discussion. The total processing time per frame is then

$$T(\text{frame}) = n\,t_{p1} + n\,t_{p2} \tag{6-1}$$


Obviously, decreasing $t_{p1}$ or $t_{p2}$ reduces the total processing time and enhances the
performance of the algorithm. The processing time of the thresholding unit is fixed and cannot be
reduced, because every single pixel must be converted from gray level to binary
representation. However, $t_{p1}$, the processing time of the weight-calculating unit, can be reduced in
two ways.
1. The process can be terminated as soon as the network has converged, that is, when the
weight changes become smaller than a small value. The drawback of this approach is
the possibility of falling into local optima, which are very common in image
thresholding problems.
2. In a real-time application, in which there is a sequence of image frames, a
frame is usually statistically and subjectively similar to the previous frame. This means that
subsequent similar frames can be skipped in the weight-calculating process. In the best case
only the weight-calculating time is saved, i.e., the total time becomes
$T(\text{frame}) = n\,t_{p1}$. The drawback of this approach is that the threshold of a sequence of
image frames is not completely predictable. For example, there might be changes in the
position of the object against the background, in the number of objects in the image, or in the
image contrast and brightness across the sequence of frames. All of these effects
can change the statistical behaviour of the image and consequently the image threshold.
A rough solution is to calculate the image threshold once every few frames, that is,
once in every k frames. The average total time is then

$$T(\text{frame}) = n\,t_{p1} + \frac{n\,t_{p1}}{k} = \frac{k+1}{k}\,n\,t_{p1} \tag{6-2}$$

In general, neither of the above approaches is effective, because the performance of the
system may be significantly degraded by ignoring either a number of pixels of a frame or
several subsequent image frames during the threshold calculation. As a result,
approximate thresholding may have very poor output performance, which can be critical in
real-time applications.
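
The first approach, terminating the weight update once the weight changes become small, can be sketched as follows. This is an illustrative Python model only: it uses a generic two-weight winner-take-all update, which is an assumption and not necessarily the update rule of the proposed method, and the parameters alpha, eps, and patience as well as the function name are hypothetical.

```python
def update_weights_early_stop(pixels, w_lo, w_hi, alpha=0.05, eps=1e-3, patience=64):
    """Update two cluster weights and stop early once the changes stay small.

    Each pixel moves the nearer weight toward itself by alpha times the
    difference (a generic competitive-learning step, assumed here). The loop
    ends when `patience` consecutive updates are smaller than `eps`.
    Returns the two weights and the number of pixels actually used.
    """
    small_updates = 0
    used = 0
    for used, p in enumerate(pixels, start=1):
        if abs(p - w_lo) <= abs(p - w_hi):
            delta = alpha * (p - w_lo)
            w_lo += delta
        else:
            delta = alpha * (p - w_hi)
            w_hi += delta
        small_updates = small_updates + 1 if abs(delta) < eps else 0
        if small_updates >= patience:
            break                      # treated as converged: skip remaining pixels
    return w_lo, w_hi, used


if __name__ == "__main__":
    frame = [20, 220] * 500            # synthetic two-level image, 1000 pixels
    lo, hi, used = update_weights_early_stop(frame, 0.0, 255.0,
                                             alpha=0.5, eps=0.01, patience=10)
    print(lo, hi, (lo + hi) / 2, used)
```

On this synthetic frame the loop stops after a small fraction of the pixels. On real images the stopping point depends on the image content, and, as noted above, an early stop may leave the weights at a local optimum.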


6.2 Pipeline Thresholding Architecture
The second approach is to improve the architecture of the thresholding system. In this section,
regardless of the hardware platform, two pipeline architectures, pipeline thresholding and parallel
pipeline thresholding, are proposed. In contrast to the first approach, thresholding based on
approximation, the proposed architectures are recommended for real-time applications.
The proposed algorithm works in the system shown in Figure 6.1. A CMOS
camera takes an image frame, which is sent to the threshold processing block via an interface. The
captured image is binarized and the result is displayed or sent to the next operational block. As
described before, the thresholding algorithm is divided into two main processes. One
process, named P1, updates the weights and calculates the threshold for a frame. The other process,
named P2 in Figure 6.1, applies the calculated threshold to the frame.


Figure 6.1: The system block diagram.

Based on the described system block diagram, we introduce two pipeline
architectures, 'pipeline thresholding' and 'parallel pipeline thresholding'.

6.2.1 Pipelined Thresholding
In the pipeline thresholding architecture the i-th frame is processed in P1 while the (i-1)-th
frame is processed in P2 concurrently. The weights are calculated for the i-th frame while the
threshold, which is now ready for the previous frame, is applied to the (i-1)-th frame. This
architecture is shown in Figure 6.2.


Figure 6.2: Pipeline thresholding block diagram.

Figure 6.3 shows the timing diagram for the pipeline architecture. The time required to
move the system to the next cycle is called the 'system cycle'. The length of the system cycle is
determined by the time required by the slowest process, which is the weight-updating process,
$t_{p1}$. From the second system cycle onward, the throughput increases to two
processes per cycle.


Figure 6.3: Pipelined thresholding timing diagram.

If $t_{p1}$ is the processing time for P1, $t_{p2}$ is the processing time for P2, and m is the
number of frames, then the total processing time per frame in the pipelined architecture is
reduced to

$$T(\text{frame}) = n\,t_{p1} \tag{6-3}$$

where $t_{p1} > t_{p2}$.


Now suppose that each frame is divided into m subframes and each subframe is
represented by $f_{i,j}$, where i is the frame number and j is the subframe number within the
i-th frame. With the same architecture, $f_{i,j}$ can be applied to P1 while $f_{i-1,j}$ is applied to
the P2 process concurrently. This architecture is named multi-frame pipeline thresholding and is
shown in Figure 6.4.


Figure 6.4: Multi-frame pipeline thresholding block diagram.

The number of pixels in a subframe is smaller than in a frame, so in the multi-frame
architecture the system cycle reduces to $t'_{p1}$, which is smaller than $t_{p1}$. As discussed in
Chapter 5, the processing time of each frame changes linearly with the number of pixels
per frame, so that

$$t'_{p1} = \frac{t_{p1}}{n} \tag{6-4}$$

where n is the number of pixels per image. The total time for processing m frames in this
architecture would be

$$T(m\ \text{frames}) = n\,m\,t'_{p1} \tag{6-5}$$

The system cycle is decreased (equation (6-5)) but, in comparison with the frame
pipeline architecture, there is no speed enhancement in the multi-frame pipeline. Suppose
that the weight-updating unit, P1, has an adaptive cycle based on the convergence rate: once the
weights converge to the desired values, P1 is terminated. If $t''_{p1}$ represents the adaptive
cycle time, then $t''_{p1} < t'_{p1}$. In cases where $t''_{p1} < t_{p2}$, the total time for processing
m frames would be decreased to

$$T(m\ \text{frames}) = n\,m\,t_{p2} \tag{6-6}$$

6.2.2 Parallel Pipeline Thresholding
In the parallel pipeline architecture the number of processor units is increased to the number
of subframes per frame (Figure 6.5). All subframes of the i-th frame, $f_{i,j}$, are processed for
weight calculation while all subframes of the (i-1)-th frame, $f_{i-1,j}$, are processed for applying
the threshold.


Figure 6.5: Parallel pipeline block diagram.

If the processing time of each weight-calculating unit is denoted by $t_{p1}$ and that of each
thresholding unit by $t_{p2}$, then $t_{p1} > t_{p2}$. The total processing time per frame is

$$T(\text{frame}) = \frac{n}{m}\,t_{p1} \tag{6-7}$$

Equation (6-7) shows that the architecture significantly reduces the total processing time at the
expense of area. Each subframe has dedicated hardware, i.e., the number of processing units
equals the number of subframes, which is a large overhead in a hardware implementation.
However, million-gate FPGAs are a good choice for developing such an architecture.
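
The trade-off between speed and area expressed by equation (6-7) can be tabulated with a short Python sketch. The per-pixel time and the frame size used in the example are hypothetical values chosen for illustration, and the function name is not taken from the design.

```python
def parallel_pipeline(n_pixels: int, m_subframes: int, t_p1: float):
    """Return (time per frame, number of weight-calculating units).

    Follows equation (6-7): with one dedicated unit per subframe, the time
    per frame is (n / m) * t_p1, while the hardware grows linearly with m.
    """
    if m_subframes < 1:
        raise ValueError("at least one subframe is required")
    time_per_frame = (n_pixels / m_subframes) * t_p1
    return time_per_frame, m_subframes


if __name__ == "__main__":
    n = 128 * 128                      # example frame size
    t_p1 = 2.0                         # hypothetical per-pixel time (arbitrary units)
    for m in (1, 2, 4, 8):
        time_per_frame, units = parallel_pipeline(n, m, t_p1)
        print(f"m={m}: time per frame={time_per_frame:.0f}, units={units}")
```

Doubling the number of subframes halves the time per frame and doubles the number of processing units, which is the area overhead discussed above.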









Chapter 7
Concluding Remarks

An adaptive image thresholding algorithm suitable for real-time applications is presented in this
thesis. Instead of using a well-known but complicated threshold algorithm, the proposed
threshold technique employs a simple neural network with two nodes to separate the
foreground and background pixels of a gray-level image. It was assumed that the network should
converge within one pass over an image frame. The proposed method is a modified version of the
WCT method. The learning rate is modified to enhance the network convergence
and consequently to improve the performance of the algorithm. This makes the algorithm
application-dependent, but only one parameter has to be set for each application. On the other
hand, any other application that needs to cluster foreground and background pixels can employ
the threshold method.
The results show that the problems of binarization for poor contrast and non-uniform
background in the range-finding application are solved. The main advantages of the proposed
algorithm are its simplicity, with no degradation of the binary image, and minimal pre/post-
processing.
Another goal of this thesis was to implement the algorithm in hardware. The hardware
implemented in FPGA was tested on more than 100 images. The precision of the calculated
optimum threshold can be improved by using a floating-point representation for the data path.
For future hardware implementations, the two pipelined architectures proposed in this
thesis can be used for real-time and high-speed applications. In these architectures there is a
trade-off between area and speed.




Appendix A
Abbreviations

ANN Artificial Neural Network
ASIC Application Specific Integrated Circuit
CAS Column Address Strobe
CLB Configurable Logic Block
CPLD Complex Programmable Logic Device
FPGA Field Programmable Gate Array
IOB Input/Output Blocks
LUT Look Up Table
LVQ Learning Vector Quantization
NDT Non-destructive Testing
PQFP Plastic Quad Flat Pack
RAS Row Address Strobe
RTL Register Transfer Level
VHDL Very high speed integrated circuit Hardware Description Language
VQ Vector Quantization
WCT Weighted based Clustering Threshold





Appendix B
VHDL Code

The source files for the developed hardware are:
general.vhd: Some functions and definitions useful in many applications are provided in
this file.
memCnt.vhd: This file describes the core state machine of the memory controller.
memCntMod.vhd: This file creates a wrapper around the memory controller core to
customize it for the XSA Board.
vivar.vhd: The image binarization tester state machine is described in this file.
vivarMod.vhd: The top-level file that combines the logic and the memory controller
core to make the complete system for the XSA Board.

B.1 general.vhd
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

package general is
  constant YES  : std_logic := '1';
  constant NO   : std_logic := '0';
  constant HI   : std_logic := '1';
  constant LO   : std_logic := '0';
  constant ONE  : std_logic := '1';
  constant ZERO : std_logic := '0';
  function boolean2stdlogic(b : in boolean) return std_logic;
  function log2(v : in natural) return natural;
end package general;

package body general is

  -- convert a boolean value into the equivalent std_logic level
  function boolean2stdlogic(b : in boolean) return std_logic is
    variable s : std_logic;
  begin
    if b then
      s := '1';
    else
      s := '0';
    end if;
    return s;
  end function boolean2stdlogic;

  -- return the smallest exponent i such that 2**i >= v
  function log2(v : in natural) return natural is
    variable n    : natural;
    variable logn : natural;
  begin
    n := 1;
    for i in 0 to 128 loop
      logn := i;
      exit when (n >= v);
      n := n * 2;
    end loop;
    return logn;
  end function log2;

end package body general;
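
The log2 function above is used later to size the timer and counter registers, for example unsigned(log2(INIT_CYCLES_N+1)-1 downto 0). Its behaviour can be checked with a small Python model. This is only an illustrative sketch that mirrors the loop of the VHDL function; the Python function name is hypothetical.

```python
def log2_ceil(v: int) -> int:
    """Mirror of the VHDL log2: smallest exponent i such that 2**i >= v."""
    n = 1
    logn = 0
    for i in range(0, 129):            # same bounds as "for i in 0 to 128 loop"
        logn = i
        if n >= v:                     # corresponds to "exit when (n >= v)"
            break
        n *= 2
    return logn


if __name__ == "__main__":
    # A register that must hold the values 0..N needs log2_ceil(N + 1) bits.
    for n_max in (3, 4, 255, 256, 4096):
        print(n_max, log2_ceil(n_max + 1))
```

The function returns the ceiling of the base-2 logarithm, so log2(NROWS) with NROWS = 4096 gives 12 row address bits and log2(NCOLS) with NCOLS = 256 gives 8 column address bits.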

B.2 memCnt.vhd
library IEEE, UNISIM;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use WORK.general.all;
package memCnt is
component memCnt
generic(
FREQ :natural:= 70_000; -- operating frequency in KHz
DATA_WIDTH :natural:= 16; -- logic & MEM data width
NROWS :natural:= 4096; -- number of rows in MEM array
NCOLS :natural:= 256; -- number of columns in MEM array
CL_ADDR_WIDTH :natural:= 22; -- logic-side address width
MC_ADDR_WIDTH :natural:= 12; -- MEM-side address width
MAX_NOP:natural:= 10000;-- number of NOPs before entering self-refresh
IN_PHASE:boolean:= TRUE -- MEM and controller work on same or opposite clock edge

);
port(
-- logic side
clk :in std_logic; -- master clock
lock :in std_logic; -- true if clock is stable
rst :in std_logic; -- reset
cl_rd :in std_logic; -- initiate read operation
cl_wr :in std_logic; -- initiate write operation
prog :out std_logic; -- read/write/self-refresh op has begun
done :out std_logic; -- read or write operation is done
rdDone :out std_logic; -- read operation is done and data is available
cl_addr :in unsigned(CL_ADDR_WIDTH-1 downto 0); -- address from logic to MEM
cl_Din :in unsigned(DATA_WIDTH-1 downto 0); -- data from logic to MEM
cl_Dout :out unsigned(DATA_WIDTH-1 downto 0); -- data from MEM to logic
-- MEM side
mc_cke :out std_logic; -- clock-enable to MEM
ce_n :out std_logic; -- chip-select to MEM
mc_ras :out std_logic; -- MEM row address strobe
mc_cas :out std_logic; -- MEM column address strobe
mc_we :out std_logic; -- MEM write enable

mc_ba :out unsigned(1 downto 0); -- MEM bank address
mc_addr :out unsigned(MC_ADDR_WIDTH-1 downto 0); -- MEM row/column address
sDIn :in unsigned(DATA_WIDTH-1 downto 0); -- data from MEM
sDOut :out unsigned(DATA_WIDTH-1 downto 0); -- data to MEM
sDOutEn:out std_logic; -- true if data is output to MEM on sDOut
udqm :out std_logic; -- enable upper-byte of MEM databus if true
ldqm :out std_logic -- enable lower-byte of MEM databus if true
);
end component;
end package memCnt;

library IEEE, UNISIM;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use WORK.general.all;

entity memCnt is
generic(
FREQ :natural:= 70_000; -- operating frequency in KHz
IN_PHASE: boolean:= TRUE;
MAX_NOP: natural := 10000;
DATA_WIDTH: natural := 16; -- logic & MEM data width
NROWS: natural := 4096; -- number of rows in MEM array
NCOLS: natural := 256; -- number of columns in MEM array
CL_ADDR_WIDTH:natural := 22; -- logic-side address width
MC_ADDR_WIDTH:natural := 12 -- MEM-side address width
);
port(
-- logic side
clk :in std_logic; -- master clock
lock :in std_logic; -- true if clock is stable
rst :in std_logic; -- reset
cl_rd :in std_logic; -- initiate read operation
cl_wr :in std_logic; -- initiate write operation
prog:out std_logic; -- read/write/self-refresh op has begun
done :out std_logic; -- read or write operation is done
rdDone :out std_logic; -- read operation is done and data is available
cl_addr :in unsigned(CL_ADDR_WIDTH-1 downto 0); -- address from logic to MEM
cl_Din :in unsigned(DATA_WIDTH-1 downto 0); -- data from logic to MEM
cl_Dout :out unsigned(DATA_WIDTH-1 downto 0); -- data from MEM to logic
-- MEM side
mc_cke :out std_logic; -- clock-enable to MEM
ce_n :out std_logic; -- chip-select to MEM
mc_ras :out std_logic; -- MEM row address strobe
mc_cas :out std_logic; -- MEM column address strobe
mc_we :out std_logic; -- MEM write enable
mc_ba :out unsigned(1 downto 0); -- MEM bank address
mc_addr :out unsigned(MC_ADDR_WIDTH-1 downto 0); -- MEM row/column address
sDIn :in unsigned(DATA_WIDTH-1 downto 0); -- data from MEM
sDOut :out unsigned(DATA_WIDTH-1 downto 0); -- data to MEM
sDOutEn:out std_logic; -- true if data is output to MEM on sDOut
udqm :out std_logic; -- enable upper-byte of MEM databus if true
ldqm :out std_logic -- enable lower-byte of MEM databus if true
);
end memCnt;

architecture arch of memCnt is
constant OUTPUT:std_logic := '1'; -- direction of dataflow w.r.t. this controller
constant INPUT :std_logic := '0';
constant NOP :std_logic := '0'; -- no operation
constant READ :std_logic := '1'; -- read operation
constant WRITE :std_logic := '1'; -- write operation

-- MEM timing parameters
constant Tinit :natural := 200; -- min initialization interval (us)
constant Tref :natural := 64_000_000;-- maximum refresh interval (ns)
constant Trfc :natural := 66; -- duration of refresh operation (ns)
constant Trp :natural:= 20; -- min precharge command duration (ns)
constant Twr :natural := 15; -- write recovery time (ns)
constant Txsr :natural := 75; -- exit self-refresh time (ns)
constant Tras :natural := 45;--min interval between active-precharge commands(ns)

constant Trcd :natural := 20;-- min interval between active and R/W commands (ns)

-- MEM timing parameters converted into clock cycles (based on FREQ)
-----------------------------------------------------------------------
constant NORM :natural := 1_000_000;
-- normalize ns * KHz
constant INIT_CYCLES_N :natural := 1+((Tinit*FREQ)/1000);
-- MEM power-on initialization interval
constant RAS_CYCLES_N: natural := 1+((Tras*FREQ)/NORM);
-- active-to-precharge interval
constant RCD_CYCLES_N: natural := 1+((Trcd*FREQ)/NORM);
-- active-to-R/W interval
constant REF_CYCLES_N: natural := 1+(((Tref/NROWS)*FREQ)/NORM);
-- interval between row refreshes
constant RFC_CYCLES_N: natural := 1+((Trfc*FREQ)/NORM);
-- refresh operation interval
constant RP_CYCLES_N: natural := 1+((Trp*FREQ)/NORM);
-- precharge operation interval
constant WR_CYCLES_N: natural := 1+((Twr*FREQ)/NORM);
-- write recovery time
constant XSR_CYCLES_N: natural := 1+((Txsr*FREQ)/NORM);
-- exit self-refresh time
constant MODE_CYCLES_N:natural := 2;
-- mode register setup time
constant CAS_CYCLES_N: natural := 3;
-- CAS latency
constant RFSH_OPS_N: natural := 8;
-- number of refresh operations needed to init MEM

-- timer registers that count down times for various MEM operations
-----------------------------------------------------------------------
signal timer_r, timer_x: unsigned(log2(INIT_CYCLES_N+1)-1 downto 0);
-- current MEM op time
signal rasTimer_r, rasTimer_x:unsigned(log2(RAS_CYCLES_N+1)-1 downto 0);
-- active-to-precharge time
signal wrTimer_r, wrTimer_x: unsigned(log2(WR_CYCLES_N+1)-1 downto 0);
-- write-to-precharge time
signal refTimer_r, refTimer_x:unsigned(log2(REF_CYCLES_N+1)-1 downto 0);
-- time between row refreshes
signal rfshCntr_r, rfshCntr_x:unsigned(log2(NROWS+1)-1 downto 0);
-- counts refreshes that are needed
signal nopCntr_r, nopCntr_x: unsigned(log2(MAX_NOP+1)-1 downto 0);
-- counts consecutive NOP operations
signal doSelfRfsh: std_logic;
-- active when the NOP counter hits zero and self-refresh can start

-- MEM timing parameters converted into unsigned clock cycles for clarity
----------------------------------------------------------------------------
constant INIT_CYCLES :unsigned := TO_UNSIGNED(INIT_CYCLES_N, timer_r'length);
constant RAS_CYCLES :unsigned := TO_UNSIGNED(RAS_CYCLES_N, rasTimer_r'length);
constant RCD_CYCLES :unsigned := TO_UNSIGNED(RCD_CYCLES_N, timer_r'length);
constant REF_CYCLES :unsigned := TO_UNSIGNED(REF_CYCLES_N, refTimer_r'length);
constant RFC_CYCLES :unsigned := TO_UNSIGNED(RFC_CYCLES_N, timer_r'length);
constant RP_CYCLES :unsigned := TO_UNSIGNED(RP_CYCLES_N, timer_r'length);
constant WR_CYCLES :unsigned := TO_UNSIGNED(WR_CYCLES_N, wrTimer_r'length);
constant XSR_CYCLES :unsigned := TO_UNSIGNED(XSR_CYCLES_N, timer_r'length);
constant MODE_CYCLES :unsigned := TO_UNSIGNED(MODE_CYCLES_N, timer_r'length);
constant CAS_CYCLES :unsigned := TO_UNSIGNED(CAS_CYCLES_N, timer_r'length);
constant RFSH_OPS :unsigned := TO_UNSIGNED(RFSH_OPS_N, rfshCntr_r'length);
constant MAX_NOP_CNT :unsigned := TO_UNSIGNED(MAX_NOP, nopCntr_r'length);

-- states of the MEM controller state machine
------------------------------------------------
type cntlState is (
INITWAIT, -- initialization
INITPCHG, -- initialization - initial precharge of MEM banks
INITSETMODE, -- initialization - set MEM mode
INITRFSH, -- initialization - do initial refreshes
RW, -- read/write/refresh the MEM
ACTIVATE, -- open a row of the MEM for reading/writing
REFRESHROW, -- refresh a row of the MEM

SELFREFRESH -- keep MEM in self-refresh mode with CKE low
);

signal state_r, state_x: cntlState; -- state register and next state

-- commands that are sent to the MEM to make it perform certain operations
-- commands use these MEM input pins (ce_n,mc_ras,mc_cas,mc_we,udqm,ldqm)
-----------------------------------------------------------------------------
subtype sdramCmd is unsigned(5 downto 0);
constant NOP_CMD :sdramCmd := "011100";
constant ACTIVE_CMD :sdramCmd := "001100";
constant READ_CMD :sdramCmd := "010100";
constant WRITE_CMD :sdramCmd := "010000";
constant PCHG_CMD :sdramCmd := "001011";
constant MODE_CMD :sdramCmd := "000011";
constant RFSH_CMD :sdramCmd := "000111";

-- MEM mode register
subtype sdramMode is unsigned(11 downto 0);
constant MODE: sdramMode := "00" & "0" & "00" & "011" & "0" & "000";

-- the logic address is decomposed into these sets of MEM address components
constant ROW_LEN :natural := log2(NROWS);-- number of row address bits
constant COL_LEN :natural := log2(NCOLS);-- number of column address bits

signal bank :unsigned(mc_ba'range); -- bank address bits
signal row :unsigned(ROW_LEN - 1 downto 0); -- row address within bank
signal col :unsigned(mc_addr'range); -- column address within row

-- registers that store the currently active bank and row of the MEM
signal activeBank_r, activeBank_x :unsigned(bank'range);
signal activeRow_r, activeRow_x :unsigned(row'range);
signal activeFlag_r, activeFlag_x:std_logic;-- indicates that some row is active
signal doActivate:std_logic;-- indicates when a new row needs to be activated

-- there is a command bit embedded within the MEM column address
constant CMDBIT_POS :natural := 10; -- position of command bit
constant AUTO_PCHG_ON:std_logic := '1';-- CMDBIT value to auto-precharge the bank
constant AUTO_PCHG_OFF :std_logic := '0';-- CMDBIT value to disable auto-precharge
constant ALL_BANKS :std_logic := '1';-- CMDBIT value to select all banks
constant ACTIVE_BANK:std_logic := '0';-- CMDBIT value to select only active bank

-- status signals that indicate when certain operations are in progress
signal wrInProgress :std_logic; -- write operation in progress
signal rdInProgress :std_logic; -- read operation in progress
signal activateInProgress:std_logic; -- row activation is in progress

-- these registers track the progress of read and write operations

-- registered outputs to logic
signal prog_r,prog_x :std_logic;
-- true when MEM read or write operation is started
signal cl_Dout_r,cl_Dout_x:unsigned(cl_Dout'range);
-- holds data read from MEM and sent to the logic
signal cl_DoutOppPhase_r, cl_DoutOppPhase_x :unsigned(cl_Dout'range);
-- holds data read from MEM on opposite clock edge

-- registered outputs to MEM
signal mc_cke_r,mc_cke_x:std_logic; -- clock enable
signal cmd_r,cmd_x :sdramCmd; -- MEM command bits
signal mc_ba_r,mc_ba_x :unsigned(mc_ba'range); -- MEM bank address bits
signal mc_addr_r,mc_addr_x:unsigned(mc_addr'range); -- MEM row/column address
signal mc_data_r,mc_data_x:unsigned(sDOut'range); -- MEM out databus
signal mc_dataDir_r,mc_dataDir_x:std_logic;-- MEM databus direction control bit

begin
-----------------------------------------------------------
-- attach some internal signals to the I/O ports
-----------------------------------------------------------
-- attach registered MEM control signals to MEM input pins
(ce_n,mc_ras,mc_cas,mc_we,udqm,ldqm) <= cmd_r;-- MEM operation control bits

mc_cke <= mc_cke_r; -- MEM clock enable
mc_ba <= mc_ba_r; -- MEM bank address
mc_addr <= mc_addr_r; -- MEM address
sDOut <= mc_data_r; -- MEM output data bus
sDOutEn<= YES when mc_dataDir_r=OUTPUT else NO;-- output databus enable

-- attach some port signals
cl_Dout <= cl_Dout_r; -- data back to logic
prog <= prog_r; -- true if requested operation has begun


-----------------------------------------------------------
-- compute the next state and outputs
-----------------------------------------------------------
combinatorial: process(cl_rd, cl_wr, cl_addr, cl_Din, cl_Dout_r, sDIn, state_r,
prog_x, activeFlag_r, activeBank_r, activeRow_r, cl_DoutOppPhase_r,
nopCntr_r,lock, rfshCntr_r, timer_r, rasTimer_r, wrTimer_r, refTimer_r,
cmd_r,mc_cke_r)
begin
-----------------------------------------------------------
-- setup default values for signals
-----------------------------------------------------------
prog_x <= NO; -- no operations have begun
mc_cke_x <= YES; -- enable MEM clock
cmd_x <= NOP_CMD; -- set MEM command to no-operation
mc_dataDir_x <= INPUT; -- accept data from the MEM
mc_data_x <= cl_Din(mc_data_x'range);-- output data from logic to MEM
-- reload these registers and flags with their existing values
state_x <= state_r;
activeFlag_x <= activeFlag_r;
activeBank_x <= activeBank_r;
activeRow_x <= activeRow_r;
rfshCntr_x <= rfshCntr_r;
-----------------------------------------------------------
-- setup default value for the MEM address
-----------------------------------------------------------

-- extract bank field from logic address
bank <= cl_addr(bank'length+ROW_LEN+COL_LEN-1 downto ROW_LEN +COL_LEN);
mc_ba_x <= bank; -- set MEM bank address bits

-- extract row, column fields from logic address
row <= cl_addr(ROW_LEN + COL_LEN - 1 downto COL_LEN);

-- extend column until it is as large as the(MEM address bus - 1)
col <= (others=>'0'); -- set it to all zeroes
col(COL_LEN-1 downto 0) <= cl_addr(COL_LEN-1 downto 0);

-- by default, set MEM address to the column address with interspersed
-- command bit set to disable auto-precharge
mc_addr_x <= col(col'high-1 downto CMDBIT_POS) & AUTO_PCHG_OFF
& col(CMDBIT_POS-1 downto 0);

-----------------------------------------------------------
-- manage row activation
-----------------------------------------------------------
-- request a row activation operation if the row and bank of the current
-- address do not match the currently active row and bank, or if no row
-- and bank is currently active
if (row /= activeRow_r)or(bank /= activeBank_r)or(activeFlag_r = NO) then
doActivate <= YES;
else
doActivate <= NO;
end if;

-----------------------------------------------------------
-- manage self-refresh
-----------------------------------------------------------
-- enter self-refresh if neither a read nor a write is requested for
-- MAX_NOP_CNT consecutive cycles.
if (cl_rd = YES) or (cl_wr = YES) then

-- any read or write resets NOP counter and exits self-refresh state
nopCntr_x <= (others=>'0');
doSelfRfsh <= NO;
elsif nopCntr_r /= MAX_NOP_CNT then
-- increment NOP counter whenever there is no read or write operation
nopCntr_x <= nopCntr_r + 1;
doSelfRfsh <= NO;
else
-- start self-refresh when counter hits maximum NOP count
-- and leave counter unchanged
nopCntr_x <= nopCntr_r;
doSelfRfsh <= YES;
end if;


-----------------------------------------------------------
-- update the timers
-----------------------------------------------------------
-- row activation timer
if rasTimer_r /= 0 then
-- decrement a non-zero timer and set the flag
-- to indicate the row activation is still in progress
rasTimer_x <= rasTimer_r - 1;
activateInProgress <= YES;
else
-- on timeout, keep the timer at zero and reset the flag
-- to indicate the row activation operation is done
rasTimer_x <= rasTimer_r;
activateInProgress <= NO;
end if;

-- write operation timer
if wrTimer_r /= 0 then
-- decrement a non-zero timer and set the flag
-- to indicate the write operation is still in progress
wrTimer_x <= wrTimer_r - 1;
wrInProgress <= YES;
else
-- on timeout, keep the timer at zero and reset the flag that
-- indicates a write operation is in progress
wrTimer_x <= wrTimer_r;
wrInProgress <= NO;
end if;

-- refresh timer
if refTimer_r /= 0 then
refTimer_x <= refTimer_r - 1;
else
-- on timeout, reload the timer with the interval between row refreshes
-- and increment the counter for the no. of row refreshes that are needed
refTimer_x <= REF_CYCLES;
rfshCntr_x <= rfshCntr_r + 1;
end if;

-- main timer for sequencing MEM operations
if timer_r /= 0 then
-- decrement the timer and do nothing else since the previous operation
-- has not completed yet.
timer_x <= timer_r - 1;
else
-- the previous operation has completed once the timer hits zero
timer_x <= timer_r; -- by default, leave the timer at zero

-----------------------------------------------------------
-- compute the next state and outputs
-----------------------------------------------------------
case state_r is

-----------------------------------------------------------
-- let clock stabilize and then wait for the MEM to initialize
-----------------------------------------------------------

when INITWAIT =>
if lock = YES then
timer_x <= INIT_CYCLES;
state_x <= INITPCHG;
else
mc_cke_x <= NO;
end if;

-----------------------------------------------------------
-- precharge all MEM banks after power-on initialization
-----------------------------------------------------------
when INITPCHG =>
cmd_x <= PCHG_CMD;
mc_addr_x(CMDBIT_POS) <= ALL_BANKS;
timer_x <= RP_CYCLES;
rfshCntr_x <= RFSH_OPS - 1;
state_x <= INITRFSH;

-----------------------------------------------------------
-- refresh the MEM a number of times after initial precharge
-----------------------------------------------------------
when INITRFSH =>
cmd_x <= RFSH_CMD;
timer_x <= RFC_CYCLES;
rfshCntr_x <= rfshCntr_r - 1;
if rfshCntr_r = 0 then
state_x <= INITSETMODE;
end if;

-----------------------------------------------------------
-- set the mode register of the MEM
-----------------------------------------------------------
when INITSETMODE =>
cmd_x <= MODE_CMD;
mc_addr_x <= MODE;
timer_x <= MODE_CYCLES;
state_x <= RW;

-------------------------------------------------------------------
-- process read/write/refresh ops after initialization is done
-------------------------------------------------------------------
when RW =>
---------------------------------------------------------
-- highest priority operation: row refresh
-- do a refresh op if the refresh counter is non-zero
---------------------------------------------------------
if rfshCntr_r /= 0 then
if (activateInProgress=NO) and (wrInProgress=NO)
and (rdInProgress=NO) then
cmd_x <= PCHG_CMD;
mc_addr_x(CMDBIT_POS)<= ALL_BANKS;
timer_x <= RP_CYCLES;
activeFlag_x<= NO;
state_x <= REFRESHROW;
end if;
elsif cl_rd = YES then
if doActivate = YES then
if (activateInProgress=NO) and (wrInProgress=NO)
and (rdInProgress=NO) then
cmd_x <= PCHG_CMD;
mc_addr_x(CMDBIT_POS) <= ALL_BANKS;
timer_x <= RP_CYCLES;
activeFlag_x <= NO;
state_x <= ACTIVATE;
end if;
elsif (rdInProgress=NO) then
cmd_x <= READ_CMD;
prog_x <= YES;
end if;
---------------------------------------------------------
-- do a logic-initiated write operation

---------------------------------------------------------
elsif cl_wr = YES then
if doActivate = YES then
if (activateInProgress=NO) and
(wrInProgress=NO) and (rdInProgress=NO) then
cmd_x <= PCHG_CMD;
timer_x <= RP_CYCLES;
activeFlag_x<= NO;
state_x <= ACTIVATE;
mc_addr_x(CMDBIT_POS) <= ALL_BANKS;
end if;
elsif rdInProgress = NO then
cmd_x <= WRITE_CMD;
mc_dataDir_x <= OUTPUT;
wrTimer_x <= WR_CYCLES;
prog_x <= YES;
end if;
---------------------------------------------------------
-- do a logic-initiated self-refresh operation
---------------------------------------------------------
elsif doSelfRfsh = YES then
if (activateInProgress=NO) and (wrInProgress=NO)
and (rdInProgress=NO) then
cmd_x <= PCHG_CMD;
timer_x <= RP_CYCLES;
activeFlag_x <= NO;
state_x <= SELFREFRESH;
mc_addr_x(CMDBIT_POS) <= ALL_BANKS;
end if;
--------------------------------------------------------
-- no operation
-------------------------------------------------------
else
state_x <= RW;
end if;

-----------------------------------------------------------
-- activate a row of the MEM
-----------------------------------------------------------
when ACTIVATE =>
cmd_x <= ACTIVE_CMD;
mc_addr_x <= (others=>'0');
mc_addr_x(row'range) <= row;
activeBank_x <= bank;
activeRow_x <= row;
activeFlag_x <= YES;
rasTimer_x <= RAS_CYCLES;
timer_x <= RCD_CYCLES;
state_x <= RW;

-----------------------------------------------------------
-- refresh a row of the MEM
-----------------------------------------------------------
when REFRESHROW =>
cmd_x <= RFSH_CMD;
timer_x <= RFC_CYCLES;
rfshCntr_x <= rfshCntr_r - 1;
state_x <= RW;

-----------------------------------------------------------
-- place the MEM into self-refresh and keep it there until
--further notice
-----------------------------------------------------------
when SELFREFRESH =>
if (doSelfRfsh = YES) or (lock = NO) then
cmd_x <= RFSH_CMD;
mc_cke_x<= NO;
else
mc_cke_x <= YES;
rfshCntr_x <= (others=>'0');
activeFlag_x<= NO;

timer_x <= XSR_CYCLES;
state_x <= RW;
end if;

-----------------------------------------------------------
-- unknown state
-----------------------------------------------------------
when others =>
state_x <= INITWAIT;
end case;
end if;
end process combinatorial;

-----------------------------------------------------------
-- update registers on the appropriate clock edge
-----------------------------------------------------------
update: process(rst,clk)
begin
if rst = YES then
state_r <= INITWAIT;
activeBank_r <= (others=>'0');
activeRow_r <= (others=>'0');
activeFlag_r <= NO;
rfshCntr_r <= (others=>'0');
timer_r <= (others=>'0');
refTimer_r <= REF_CYCLES;
rasTimer_r <= (others=>'0');
wrTimer_r <= (others=>'0');
nopCntr_r <= (others=>'0');
prog_r <= NO;
mc_cke_r <= NO;
cmd_r <= NOP_CMD;
mc_ba_r <= (others=>'0');
mc_addr_r <= (others=>'0');
mc_data_r <= (others=>'0');
mc_dataDir_r <= INPUT;
cl_Dout_r <= (others=>'0');
elsif clk'event and clk='1' then
state_r <= state_x;
activeBank_r <= activeBank_x;
activeRow_r <= activeRow_x;
activeFlag_r <= activeFlag_x;
rfshCntr_r <= rfshCntr_x;
timer_r <= timer_x;
refTimer_r <= refTimer_x;
rasTimer_r <= rasTimer_x;
wrTimer_r <= wrTimer_x;
nopCntr_r <= nopCntr_x;
prog_r <= prog_x;
mc_cke_r <= mc_cke_x;
cmd_r <= cmd_x;
mc_ba_r <= mc_ba_x;
mc_addr_r <= mc_addr_x;
mc_data_r <= mc_data_x;
mc_dataDir_r <= mc_dataDir_x;
cl_Dout_r <= cl_Dout_x;
end if;

if rst = YES then
cl_DoutOppPhase_r<= (others=>'0');
elsif clk'event and clk='0' then
cl_DoutOppPhase_r<= cl_DoutOppPhase_x;
end if;

end process update;
end arch;
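
The address handling in the controller above can be checked against a small software model. The following Python sketch is not part of the original design: the function names are hypothetical, and `log2_ceil` only assumes that the `log2` function from `WORK.general` returns the number of address bits (12 for 4096 rows, 8 for 256 columns). It splits a logic-side address into bank, row and column fields and splices the auto-precharge command bit into position `CMDBIT_POS`, in the same way as the default assignment to `mc_addr_x`.

```python
# Software model of the address decomposition (illustrative sketch only).
NROWS, NCOLS = 4096, 256      # generics used for the memory array
CMDBIT_POS = 10               # position of the command bit in the column address
MC_ADDR_WIDTH = 12            # MEM-side address width


def log2_ceil(n: int) -> int:
    """Assumed behaviour of log2 in WORK.general: bits needed to index n items."""
    return max(1, (n - 1).bit_length())


ROW_LEN = log2_ceil(NROWS)
COL_LEN = log2_ceil(NCOLS)


def split_address(cl_addr: int):
    """Return (bank, row, col) fields of a logic-side address."""
    col = cl_addr & ((1 << COL_LEN) - 1)
    row = (cl_addr >> COL_LEN) & ((1 << ROW_LEN) - 1)
    bank = cl_addr >> (ROW_LEN + COL_LEN)
    return bank, row, col


def column_word(col: int, auto_precharge: bool = False) -> int:
    """Insert the command bit at CMDBIT_POS, shifting higher column bits up."""
    low = col & ((1 << CMDBIT_POS) - 1)
    high = col >> CMDBIT_POS
    word = (high << (CMDBIT_POS + 1)) | (int(auto_precharge) << CMDBIT_POS) | low
    return word & ((1 << MC_ADDR_WIDTH) - 1)
```

With the default generics the three fields add up to the 22-bit logic-side address (2 bank bits, 12 row bits, 8 column bits), so a model like this can be used to generate expected `mc_ba` and `mc_addr` values for a simulation testbench.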


B.3 memCntMod.vhd
library IEEE, UNISIM;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use UNISIM.VComponents.all;
use WORK.general.all;
use WORK.memCnt.all;

package memCnt is
component memCntMod
generic(
FREQ :natural:= 70_000; -- operating frequency in KHz
DATA_WIDTH :natural:= 16; -- logic & MEM data width
NROWS :natural:= 4096; -- number of rows in MEM array
NCOLS :natural:= 256; -- number of columns in MEM array
CL_ADDR_WIDTH :natural:= 22; -- logic-side address width
MC_ADDR_WIDTH :natural:= 12; -- MEM-side address width
MAX_NOP: natural:= 10000 -- number of NOPs before entering self-refresh

);
port(
-- logic side
clk :in std_logic; -- master clock
bufclk :out std_logic; -- buffered master clock
clk1x :out std_logic; -- logic clock sync'ed to master clock
clk2x :out std_logic; -- double-speed logic clock
lock :out std_logic; -- logic clock is locked to master clock=1
rst :in std_logic; -- reset
cl_rd :in std_logic; -- initiate read operation
cl_wr :in std_logic; -- initiate write operation
prog:out std_logic; -- read/write/self-refresh op begun
done :out std_logic; -- read or write operation is done
rdDone :out std_logic; -- read done and data is available
cl_addr :in unsigned(CL_ADDR_WIDTH-1 downto 0); -- address from logic
cl_Din :in unsigned(DATA_WIDTH-1 downto 0); -- data from logic
cl_Dout :out unsigned(DATA_WIDTH-1 downto 0); -- data to logic
-- MEM side
sclkfb :in std_logic; -- clock from MEM after PCB delays
sclk :out std_logic; -- MEM clock sync'ed to master clock
cke :out std_logic; -- clock-enable to MEM
mc_cs :out std_logic; -- chip-select to MEM
mc_ras :out std_logic; -- MEM row address strobe
mc_cas :out std_logic; -- MEM column address strobe
mc_we :out std_logic; -- MEM write enable
mc_ba :out unsigned(1 downto 0); -- MEM bank address bits
mc_addr :out unsigned(MC_ADDR_WIDTH-1 downto 0);-- MEM row/column address
mc_data :inout unsigned(DATA_WIDTH-1 downto 0); -- MEM in/out databus
udqm :out std_logic; -- high databits I/O mask
ldqm :out std_logic -- low databits I/O mask
);
end component;

end package memCnt;

library IEEE, UNISIM;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use UNISIM.VComponents.all;
use WORK.general.all;
use WORK.memCnt.all;

entity memCntMod is
generic(
FREQ :natural:= 70_000; -- operating frequency in KHz
DATA_WIDTH :natural:= 16; -- logic & MEM data width
NROWS :natural:= 4096; -- number of rows in MEM array
NCOLS :natural:= 256; -- number of columns in MEM array
CL_ADDR_WIDTH :natural:= 22; -- logic-side address width
MC_ADDR_WIDTH :natural:= 12; -- MEM-side address width

MAX_NOP: natural:= 10000 -- number of NOPs before entering self-refresh
);
port(
-- logic side
clk :in std_logic; -- master clock
bufclk :out std_logic; -- buffered master clock
clk1x :out std_logic; -- logic clock sync'ed to master clock
clk2x :out std_logic; -- double-speed logic clock
lock :out std_logic; -- logic clock is locked to master clock=1
rst :in std_logic; -- reset
cl_rd :in std_logic; -- initiate read operation
cl_wr :in std_logic; -- initiate write operation
prog:out std_logic; -- read/write/self-refresh op begun
done :out std_logic; -- read or write operation is done
rdDone :out std_logic; -- read done and data is available
cl_addr :in unsigned(CL_ADDR_WIDTH-1 downto 0); -- address from logic
cl_Din :in unsigned(DATA_WIDTH-1 downto 0); -- data from logic
cl_Dout :out unsigned(DATA_WIDTH-1 downto 0); -- data to logic
-- MEM side
sclkfb :in std_logic; -- clock from MEM after PCB delays
sclk :out std_logic; -- MEM clock sync'ed to master clock
cke :out std_logic; -- clock-enable to MEM
mc_cs :out std_logic; -- chip-select to MEM
mc_ras :out std_logic; -- MEM row address strobe
mc_cas :out std_logic; -- MEM column address strobe
mc_we :out std_logic; -- MEM write enable
mc_ba :out unsigned(1 downto 0); -- MEM bank address bits
mc_addr :out unsigned(MC_ADDR_WIDTH-1 downto 0);-- MEM row/column address
mc_data :inout unsigned(DATA_WIDTH-1 downto 0); -- MEM in/out databus
udqm :out std_logic; -- high databits I/O mask
ldqm :out std_logic -- low databits I/O mask
);
end memCntMod;


architecture arch of memCntMod is
-- the MEM controller and external MEM chip will clock on the same edge
-- if the frequency is greater than the minimum DLL lock frequency
constant MIN_LOCK_FREQ: natural := 25_000;
constant IN_PHASE: boolean := (FREQ >= MIN_LOCK_FREQ);

-- signals for internal logic clock DLL
signal int_clkin, int_clk1x, int_clk1x_b, int_clk2x, int_clk2x_b, int_lock: std_logic;

-- signals for external logic clock DLL
signal ext_clkin, sclkfb_b, ext_clk1x, ext_lock: std_logic;
signal clk_i : std_logic; -- clock for MEM controller logic

signal lock_i: std_logic;

-- bus for holding output data from MEM
signal sDOut: unsigned(mc_data'range);
signal sDOutEn: std_logic;

begin
-----------------------------------------------------------
-- setup the DLLs for clock generation
-----------------------------------------------------------
clkin: IBUFG port map (I=>clk, O=>int_clkin);
ext_clkin <= int_clkin when IN_PHASE else not int_clkin;

gen_dlls: if IN_PHASE generate
-- generate an internal clock sync'ed to the master clock
dllint: CLKDLL port map(
CLKIN=>int_clkin,
CLKFB=>int_clk1x_b,
CLK0=>int_clk1x,
RST=>ZERO,
CLK90=>open,
CLK180=>open,
CLK270=>open,

CLK2X=>int_clk2x,
CLKDV=>open,
LOCKED=>int_lock
);
int_clk1x_buf : BUFG port map(I=>int_clk1x, O=>int_clk1x_b);
int_clk2x_buf : BUFG port map(I=>int_clk2x, O=>int_clk2x_b);
sclkfb_buf : IBUFG port map(I=>sclkfb, O=>sclkfb_b);
dllext : CLKDLL port map(
CLKIN=>ext_clkin,
CLKFB=>sclkfb_b,
CLK0 =>ext_clk1x,
RST =>ZERO,
CLK90=>open,
CLK180=>open,
CLK270=>open,
CLK2X=>open,
CLKDV=>open,
LOCKED=>ext_lock);
end generate;

bufclk <= int_clkin;
clk_i <= int_clk1x_b when IN_PHASE else int_clkin;
clk1x <= int_clk1x_b when IN_PHASE else int_clkin;
clk2x <= int_clk2x_b when IN_PHASE else int_clkin;
sclk <= ext_clk1x when IN_PHASE else ext_clkin;

-- indicate the lock status of the internal and external DLL
lock_i <= int_lock and ext_lock when IN_PHASE else YES;
lock <= lock_i; -- lock signal for the logic

-- MEM memory controller module
u1: memCnt
generic map(
FREQ => FREQ,
IN_PHASE => IN_PHASE,
PIPE_EN => PIPE_EN,
MAX_NOP => MAX_NOP,
NROWS => NROWS,
NCOLS => NCOLS,
DATA_WIDTH=> DATA_WIDTH,
CL_ADDR_WIDTH => CL_ADDR_WIDTH,
MC_ADDR_WIDTH => MC_ADDR_WIDTH
)
port map(
clk => clk_i, -- master clock from external clock source
lock => lock_i, -- valid synchronized clocks indicator
rst => rst, -- reset
cl_rd => cl_rd, -- logic-side MEM read control from memory tester
cl_wr => cl_wr, -- logic-side MEM write control from memory tester
prog => prog, -- MEM read/write begun indicator
rdDone => rdDone, -- MEM read done indicator
done => done, -- MEM read/write done indicator
cl_addr => cl_addr, -- logic-side address from memory tester
cl_Din => cl_Din, -- test data pattern from memory tester
cl_Dout => cl_Dout, -- MEM data output to memory tester
status => status, -- MEM controller state (for diagnostics)
mc_cke => cke, -- MEM clock enable
ce_n => mc_cs, -- MEM chip-select
mc_ras => mc_ras, -- MEM RAS
mc_cas => mc_cas, -- MEM CAS
mc_we => mc_we, -- MEM write-enable
mc_ba => mc_ba, -- MEM bank address
mc_addr => mc_addr, -- MEM address
sDIn => mc_data, -- input data from MEM
sDOut => sDOut, -- output data to MEM
sDOutEn => sDOutEn, -- enable drivers to send data to MEM
udqm => udqm, -- MEM UDQM
ldqm => ldqm -- MEM LDQM
);
mc_data <= sDOut when sDOutEn=YES else (others=>'Z');
end arch;

B.4 ImBinar.vhd
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use WORK.general.all;

package mem is
component ImBinar
generic(
DATA_WIDTH :natural := 16; -- memory data width
ADDR_WIDTH :natural := 22; -- memory address width
BEG_TEST :natural := 16#00_0000#; -- beginning test range address
END_TEST :natural := 16#00_000F# -- ending test range address

);

port(
clk :in std_logic; -- master clock input
rst :in std_logic; -- reset or pushbutton on the board
done :in std_logic; -- memory operation done indicator
cl_rd :out std_logic; -- memory read control signal
cl_wr :out std_logic; -- memory write control signal
addr :out unsigned(ADDR_WIDTH-1 downto 0);-- address to memory
dIn :in unsigned(DATA_WIDTH-1 downto 0);-- data from memory
dOut :out unsigned(DATA_WIDTH-1 downto 0);-- data to memory
doAgain:in std_logic; -- re-do memory test
progress:out std_logic_vector(2 downto 0) -- memory test progress indicator

);
end component;

end package mem;
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use WORK.general.all;


entity ImBinar is
generic(
DATA_WIDTH :natural := 16; -- memory data width
ADDR_WIDTH :natural := 22; -- memory address width
BEG_TEST :natural := 16#00_0001#;-- beginning test range address
END_TEST :natural := 16#00_0010#-- ending test range address
);
port(
clk :in std_logic; -- master clock input
rst :in std_logic; -- reset or pushbutton on the board
done :in std_logic; -- memory operation done indicator
cl_rd :out std_logic; -- memory read control signal
cl_wr :out std_logic; -- memory write control signal
addr :out unsigned(ADDR_WIDTH-1 downto 0);-- address to memory
dIn :in unsigned(DATA_WIDTH-1 downto 0);-- data from memory
dOut :out unsigned(DATA_WIDTH-1 downto 0);-- data to memory
doAgain:in std_logic; -- re-do memory test
progress:out std_logic_vector(2 downto 0)-- memory test progress indicator
);
end ImBinar;
architecture Behavioral of ImBinar is
type testState is (
INIT, -- initialization
MEM_RD_1, -- Read data from mem
TRESH_CAL, -- calculate the threshold
MEM_RD_2, -- Read data from mem and compare
MEM_WR, -- Write the binarized image in mem
STOP
);

signal state_r,state_x :testState; -- state register and next state

signal addr_r,addr_x :unsigned(addr'range); -- address register
signal data_x :unsigned(dOut'range);-- data register
signal w1_r, w2_r :signed(8 downto 0); -- current weights registers
signal w1_x,w2_x :signed(8 downto 0); -- next weights registers
signal alpha_r,alpha_x :signed(8 downto 0); -- alpha register
signal thresh :unsigned(8 downto 0);--threshold register

constant w1_init :signed(8 downto 0):= "010000000"; -- weight1 Initial value
constant w2_init :signed(8 downto 0):= "010000000"; -- weight2 Initial value
Constant alpha_init :signed(8 downto 0):= "010000000"; -- alpha initial value

begin

-- states of the state machine
combinatorial: process(state_r,addr_r,dIn,done,doAgain,w1_r,w2_r, alpha_r)

--Internal Variables
variable d1,d2,ad1,ad2 :signed(8 downto 0);
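variable prod1, prod2 :signed (17 downto 0);
variable prod_alpha :signed(17 downto 0); -- alpha times its decay factor
variable prodt1, prodt2:signed(8 downto 0);
variable alphat :signed(8 downto 0); -- truncated alpha product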
variable s_dIn :signed(8 downto 0);
variable dIn_8 :unsigned(7 downto 0);

begin

-- default operations (do nothing unless explicitly stated in
-- the following case statement)
cl_rd <= NO; -- no memory read
cl_wr <= NO; -- no memory write
addr_x <= addr_r; -- next address is the same as current address
state_x<= state_r; -- no change in states

-- compute the next state and operations
case state_r is
------------------------------------------------------
-- initialize the registers
------------------------------------------------------
when INIT =>
progress<= "000";-- indicate the current controller state
addr_x <= TO_UNSIGNED(BEG_TEST,addr_x'length);
-- load starting mem address
state_x <= MEM_RD_1; -- next go to memory read state
w1_x <= w1_init; --initialize w1
w2_x <= w2_init; --initialize w2
alpha_x <= alpha_init;-- initialize alpha
thresh <= (others=>'0');-- reset threshold register
------------------------------------------------------
-- Read data from mem
------------------------------------------------------
when MEM_RD_1 =>
progress<= "001"; -- indicate the current controller state
if done = NO then
cl_rd <= YES;
else
cl_rd <= NO;
prod_alpha := alpha_r* "011111101";
alphat:= prod_alpha(16 downto 8);
alpha_x<=alphat;
dIn_8:=dIn(7 downto 0);
s_dIn:= signed('0'& dIn_8);
d1 := s_dIn - w1_r;
d2 := s_dIn - w2_r;
ad1 := abs(d1);
ad2 := abs(d2);
prod1 := alpha_r * d1;
prod2 := alpha_r * d2;
prodt1:= prod1 (16 downto 8);
prodt2:= prod2 (16 downto 8);

-- Compare weights and update
if ad1 <= ad2 then

w1_x <= w1_r+ prodt1;
w2_x<= w2_r;
-- Check if the weight value is
-- between 0 and 255
if w1_x<0 then
w1_x <="000000000";
elsif w1_x>255 then
w1_x <="011111111";
end if;
else
w2_x <= w2_r+ prodt2;
w1_x<= w1_r;
-- Check if the weight value is
-- between 0 and 255
if w2_x<0 then
w2_x <="000000000";
elsif w2_x>255 then
w2_x <="011111111";
end if;
end if;


if addr_r = TO_UNSIGNED(END_TEST,addr_r'length) then
state_x <= TRESH_CAL; --go to the next state
else
addr_x <= addr_r + 1;
-- increment address to check next
-- memory location
end if;
end if;
------------------------------------------------------
-- calculate the threshold
------------------------------------------------------
when TRESH_CAL =>
progress<= "010"; -- indicate the current controller state
thresh<= (unsigned(w1_x+w2_x))/2;--calculate threshold
state_x<= MEM_RD_2;
addr_x <= TO_UNSIGNED(BEG_TEST,addr_x'length);
-- load starting mem address
------------------------------------------------------
-- Read data from mem and compare
------------------------------------------------------
when MEM_RD_2=>
progress <= "011";
if done = NO then
cl_rd <= YES;
else
cl_rd <= NO;
-- Applying threshold value
if dIn <= ("0000000"&thresh) then
data_x <= X"0000";
else
data_x <= X"00FF";
end if;
state_x <= MEM_WR;
end if;
------------------------------------------------------
-- Write the binarized image in mem
------------------------------------------------------
when MEM_WR =>
progress<= "100"; -- indicate the current controller state
if done = NO then
cl_wr <= YES;
else
cl_wr<= NO;
if addr_r/= TO_UNSIGNED(END_TEST,addr_r'length) then
addr_x <= addr_r + 1;
-- increment address to check next
-- memory location

state_x <= MEM_RD_2; -- go to the next state

else
state_x <= STOP;
end if;
end if;
------------------------------------------------------
-- STOP
------------------------------------------------------
when others=>
progress <= "101"; -- indicate the current controller state
if (doAgain = YES) then
addr_x <= TO_UNSIGNED(BEG_TEST,addr_x'length);
-- load starting mem address
state_x<= INIT;
-- go to the INIT state and re-do memory test
end if;
end case;
end process;

-- update the registers
update: process(clk)
begin
if clk'event and clk = '1' then
if rst = YES then
-- go to starting state
state_r <= INIT;
else
-- update address register, and state
state_r <= state_x;
addr_r <= addr_x;
w1_r<=w1_x;
w2_r<=w2_x;
alpha_r<=alpha_x;
end if;
end if;
end process;

-- connect internal registers to external busses(outputs)
addr <= addr_r;
dOut <= data_x;

end Behavioral;
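
For checking the hardware results in software, the arithmetic of the MEM_RD_1, TRESH_CAL and MEM_RD_2 states can be mirrored by a short bit-accurate model. The Python sketch below is an illustration under stated assumptions rather than the code used in the thesis: the function names are hypothetical, the learning rate is assumed to be the registered value `alpha_r` interpreted with 8 fractional bits, and taking bits 16 downto 8 of each signed product is modelled as an arithmetic right shift by 8.

```python
# Bit-accurate software model of the two-weight threshold learning (sketch).
W_INIT = 128        # "010000000": initial value of both weights
ALPHA_INIT = 128    # "010000000": initial learning rate, 0.5 with 8 fractional bits
ALPHA_DECAY = 253   # "011111101": alpha is multiplied by 253/256 for every pixel


def clamp8(value: int) -> int:
    """Keep a weight inside the 8-bit grey-level range 0..255."""
    return max(0, min(255, value))


def learn_threshold(pixels, w1=W_INIT, w2=W_INIT, alpha=ALPHA_INIT):
    """Return (threshold, w1, w2) after one pass over the pixel values."""
    for pixel in pixels:
        p = pixel & 0xFF                  # only the low 8 data bits are used
        d1, d2 = p - w1, p - w2
        if abs(d1) <= abs(d2):            # weight 1 is the closer one: update it
            w1 = clamp8(w1 + ((alpha * d1) >> 8))
        else:                             # otherwise update weight 2
            w2 = clamp8(w2 + ((alpha * d2) >> 8))
        alpha = (alpha * ALPHA_DECAY) >> 8
    return (w1 + w2) // 2, w1, w2


def binarize(pixels, thresh):
    """Map each pixel to 0x0000 or 0x00FF, as in the MEM_RD_2 state."""
    return [0x0000 if p <= thresh else 0x00FF for p in pixels]
```

A typical use is to call `learn_threshold` on the same pixel values that are loaded into the test range of the memory and to compare `binarize(pixels, threshold)` with the data written back by the MEM_WR state.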

B.5 ImBinarmod.vhd
library IEEE, UNISIM;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use UNISIM.VComponents.all;
use WORK.general.all;
use WORK.mem.all;
use WORK.memCnt.all;

entity ImBinarMod is
generic(
FREQ :natural:= 70_000; -- frequency of operation in KHz
DATA_WIDTH :natural:= 16; -- MEM data width
BEG_ADDR :natural:= 16#00_0000#;-- beginning MEM address
END_ADDR :natural:= 16#3f_FFFF#;-- ending MEM address
BEG_TEST :natural:= 16#00_0000#;-- beginning test range address
END_TEST :natural:= 16#00_000F# -- ending test range address
);
port(
pushb_n:in std_logic; -- pushbutton input
clk :in std_logic; -- main clock input from external clock source
sclkfb :in std_logic; -- feedback MEM clock with PCB delays
mc_data :inout unsigned(DATA_WIDTH-1 downto 0);-- data bus to/from MEM

ce_n :out std_logic; -- Flash RAM chip-enable
sclk :out std_logic; -- clock to MEM
mc_cke :out std_logic; -- MEM clock-enable
mc_cs :out std_logic; -- MEM chip-select
mc_ras :out std_logic; -- MEM RAS
mc_cas :out std_logic; -- MEM CAS
mc_we :out std_logic; -- MEM write-enable
mc_ba :out unsigned( 1 downto 0); -- MEM bank-address
mc_addr :out unsigned(11 downto 0); -- MEM address bus
udqm :out std_logic; -- MEM UDQM
ldqm :out std_logic; -- MEM LDQM
seven_seg:out unsigned(6 downto 0) -- seven segment LED
);
end ImBinarMod;

architecture arch of ImBinarMod is
constant ADDR_WIDTH: natural := log2(END_ADDR-BEG_ADDR+1);
signal rst_i :std_logic; -- internal reset signal
signal clk_i :std_logic; -- internal master clock signal
signal clk_b :std_logic; -- buffered input (non-DLL) clock
signal lock :std_logic; -- MEM clock DLL lock indicator
signal begun :std_logic; -- MEM operation started indicator
signal done :std_logic; -- MEM operation complete indicator
signal rdDone :std_logic; -- MEM read operation complete indicator
signal cl_addr :unsigned(ADDR_WIDTH-1 downto 0); -- logic address bus
signal cl_Din :unsigned(DATA_WIDTH-1 downto 0); -- logic-side data to MEM
signal cl_Dout :unsigned(DATA_WIDTH-1 downto 0); -- logic-side data from MEM
signal cl_rd :std_logic; -- logic-side read control signal
signal cl_wr :std_logic; -- logic-side write control signal
signal dataIn :unsigned(DATA_WIDTH-1 downto 0); -- input databus from MEM
signal dataOut :unsigned(DATA_WIDTH-1 downto 0); -- output databus to MEM
signal progress:std_logic_vector(2 downto 0); -- test progress indicator
signal syncPushb: std_logic_vector(1 downto 0);

attribute INIT: string;
attribute INIT of rst_i: signal is "1";
begin
ce_n <= '1'; -- disable Flash RAM

-- internal reset flag is set active by config. bitstream
-- and then gets reset after clocks start.
process(clk_b)
begin
if(clk_b'event and clk_b='1') then
if lock = NO then
rst_i <= YES; -- stay in reset until DLLs start up and lock
else
rst_i <= NO; -- release reset once DLLs lock
end if;
end if;
end process;

-- synchronize the pushbutton to the main clock
process(clk_b)
begin
if(clk_b'event and clk_b='1') then
syncPushb <= syncPushb(syncPushb'high-1 downto 0) & not pushb_n;
end if;
end process;

-- generic ImBinar module
slow_u0: ImBinar
generic map(
DATA_WIDTH => cl_Din'length,
ADDR_WIDTH => cl_addr'length,
BEG_TEST => BEG_TEST,
END_TEST => END_TEST
)
port map(
clk => clk_i, -- master internal clock
rst => rst_i, -- reset

doAgain => syncPushb(syncPushb'high), -- re-do the memory test
done => done, -- MEM controller operation complete
dIn => cl_Dout, -- logic-side data from MEM goes to memory tester
cl_rd => cl_rd, -- logic-side MEM read control from memory tester
cl_wr => cl_wr, -- logic-side MEM write control from memory tester
addr => cl_addr, -- logic-side address from memory tester
dOut => cl_Din, -- logic-side data to MEM from memory tester
progress=> progress -- current phase of memory test
);

-- MEM memory controller module
u1: memCntMod
generic map(
FREQ => FREQ, -- master clock frequency
DATA_WIDTH => cl_Din'length, -- width of the logic and MEM databus
NROWS => 4096, -- number of rows in the MEM
NCOLS => 256, -- number of columns in each row
CL_ADDR_WIDTH => cl_addr'length, -- logic-side address width
MC_ADDR_WIDTH => mc_addr'length -- MEM-side address width
)
port map(
clk => clk, -- master clock from external clock source (unbuffered)
bufclk => clk_b, -- buffered master clock output
clk1x => clk_i, -- synchronized master clock
clk2x => open, -- synchronized doubled master clock
lock => lock, -- DLL lock indicator
rst => rst_i, -- reset
cl_rd => cl_rd, -- logic-side MEM read control from ImBinar
cl_wr => cl_wr, -- logic-side MEM write control from memory tester
prog => begun, -- indicates memory read/write has begun
rdDone => rdDone, -- indicates MEM memory read operation is done
done => done, -- indicates MEM memory read or write operation is done
cl_addr => cl_addr, -- logic-side address from memory tester to MEM
cl_Din => cl_Din, -- test data pattern from memory tester to MEM
cl_Dout => cl_Dout, -- MEM data output to memory tester
sclkfb => sclkfb, -- clock feedback with added external PCB delays
sclk => sclk, -- synchronized clock to external MEM
cke => mc_cke, -- MEM clock enable
mc_cs => mc_cs, -- MEM chip-select
mc_ras => mc_ras, -- MEM RAS
mc_cas => mc_cas, -- MEM CAS
mc_we => mc_we, -- MEM write-enable
mc_ba => mc_ba, -- MEM bank address
mc_addr => mc_addr, -- MEM address
mc_data => mc_data, -- MEM databus
udqm => udqm, -- MEM UDQM
ldqm => ldqm -- MEM LDQM
);

-- indicate the phase of ImBinar on the seven-segment display
seven_seg <= "1110111" when progress="000" else -- "0"
"0010010" when progress="001" else -- "1"
"1011101" when progress="010" else -- "2"
"1011011" when progress="011" else -- "3"
"0111010" when progress="100" else -- "4"
"1101011" when progress="101" else -- "5"
"1111111";
end arch;




References

[1] J. Young Oh and W. Stuerzlinger, "Laser Pointers as Collaborative Pointing Devices", Graphics Interface
2002, Eds. Stürzlinger, McCool, AK Peters and CHCCS, ISSN 0713-5424, ISBN
1-56881-183-7, 141-149, May 2002.

[2] M. Sezgin and B. Sankur, "Survey over Image Thresholding Techniques and Quantitative
Performance Evaluation", Journal of Electronic Imaging 13(1), 146-165, January 2004.

[3] O. D. Trier and A. K. Jain, "Goal-Directed Evaluation of Binarization Methods",
IEEE Transactions on Pattern Analysis and Machine Intelligence 17, no. 12, December 1995.

[4] A. Rosenfeld and P. De la Torre, "Histogram Concavity Analysis as an Aid in Threshold
Selection", IEEE Trans. Systems, Man and Cybernetics, SMC-13, 231-235, 1983.

[5] M. I. Sezan, "A Peak Detection Algorithm and Its Application to Histogram-based Image
Data Reduction", Graphic Models and Image Processing, 29, 47-59, 1985.

[6] T. W. Ridler and S. Calvard, "Picture Thresholding Using an Iterative Selection Method",
IEEE Trans. Systems, Man and Cybernetics, SMC-8, 630-632, 1978.

[7] N. Otsu, "A Threshold Selection Method from Gray Level Histograms", IEEE Trans. Systems,
Man and Cybernetics, SMC-9, 62-66, 1979.


[8] J. Kittler and J. Illingworth, "Minimum Error Thresholding", Pattern Recognition 19, 41-47,
1986.

[9] C. V. Jawahar, P. K. Biswas, and A. K. Ray, "Investigations on Fuzzy Thresholding Based on
Fuzzy Clustering", Pattern Recognition, 30(10), 1605-1613, 1997.

[10] J. N. Kapur, P. K. Sahoo, and A. K. C. Wong, "A New Method for Gray-level Picture
Thresholding Using the Entropy of the Histogram", Graphic Models and Image Processing 29,
273-285, 1985.

[11] A. G. Shanbhag, "Utilization of Information Measure as a Means of Image Thresholding",
Computer Vision Graphic Image Processing, 56, 414-419, 1994.

[12] L. Hertz and R. W. Schafer, "Multilevel Thresholding Using Edge Matching", Computer
Vision Graphics Image Processing 44, 279-295, 1988.

[13] A. Pikaz and A. Averbuch, "Digital Image Thresholding Based on Topological Stable State",
Pattern Recognition 29, 829-843, 1996.

[14] A. S. Abutaleb, "Automatic Thresholding of Gray-level Pictures Using Two-dimensional
Entropy", Computer Vision Graphic Image Processing 47, 22-32, 1989.

[15] J. M. White and G. D. Rohrer, "Image Thresholding for Optical Character Recognition and
Other Applications Requiring Character Image Extraction", IBM J. Res. Dev. 27(4),
400-411, 1983.

[16] S. D. Yanowitz and A. M. Bruckstein, "A New Method for Image Segmentation", Computer
Graphic Image Processing 46, 82-95, 1989.

[17] O. D. Trier and T. Taxt, "Evaluation of Binarization Methods for Document Images",
IEEE Trans. Pattern Analysis and Machine Intelligence 17, no. 3, 312-315, 1995.


[18] S. W. Lee, L. Lam, and C. Y. Suen, "Performance Evaluation of Skeletonizing Algorithms for
Document Image Processing", Proc. First Int'l Conf. Document Analysis and Recognition,
260-271, 1991.

[19] D. Talukdar and R. Sridhar, "Adaptive ASIC Thresholding for Image Processing
Application", International ASIC Conference, September 29, 1993.

[20] T. Kohonen, "Learning Vector Quantization", Neural Networks, 1 (suppl. 1), 303, 1988.

[21] M. Kamel and A. Zhao, "Extraction of Binary Character/Graphics Images from Grayscale
Document Images", Graphical Models and Image Processing 55, 203-217, 1993.

[22] J. Sauvola, T. Seppanen, S. Haapakoski, and M. Pietikainen, "Adaptive Document
Binarization", ICDAR, 147-152, 1997.

[23] Electronic Text Centre, "Optical Character Recognition: Some Samples",
http://courses.cs.vt.edu/csonline/AI/Lessons/VisualProcessing/OCRscans.html.

[24] Yale Face Database B, http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html, Yale
University, May 31, 2001.

[25] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From Few to Many:
Illumination Cone Models for Face Recognition under Variable Lighting and Pose",
IEEE Trans. Pattern Analysis and Machine Intelligence 23, 643-660, 2001.

[26] Xess Corp., "XSA Board V1.1, V1.2 User Manual",
www.xess.com/manuals/xsa-manual-1_2.pdf, March 2004.

[27] XESS Corporation, "Using Xilinx WebPACK Software to Create FPGA Designs for the
XSA Board", www.xess.com, 2001.

[28] Xilinx Inc., www.xilinx.com, 1994-2004.


[29] Xess Corp., www.xess.com, 1993-2003.

[30] D. Vanden Bout, "XSA Board SDRAM Controller",
http://www.xess.com/appnotes/an-030104-sdramcntl.pdf, March 2004.

[31] N. Shirazi, A. Walters, and P. Athanas, "Quantitative Analysis of Floating-point Arithmetic on
FPGA Based Custom Computing Machines", Symposium on FPGAs for Custom
Computing Machines, California, April 1995.

[32] W. B. Ligon III, S. McMillan, G. Monn, K. Schoonover, F. Stivers, and K. D. Underwood,
"A Re-evaluation of the Practicality of Floating-point Operations on FPGAs",
Symposium on FPGAs for Custom Computing Machines, 1998.

[33] L. Louca, T. A. Cook, and W. H. Johnson, "Single Precision Floating Point Addition and
Multiplication on FPGAs", FPGAs for Custom Computing Machines, 1996.
