using Matlab
Ciaran Cooney
School of Engineering
Dundalk Institute of Technology
Supervisors: Tim Daly, Paul Egan, Tommy Gartland, Alan Kennedy
2016
Abstract
Text detection and character recognition in natural scene images is a challenging and
complex operation due to the potential for varying degrees of quality expected from
the input data. Therefore, the development of a robust and adaptable algorithm requires
several stages of pre-processing to identify regions of interest before character
recognition can be applied. This paper presents a methodology for implementation of
a character recognition algorithm based on identification of the alphanumeric digits
on vehicle registration plates.
The text detection algorithm has been integrated within a system requiring
initial image acquisition and a visual indication of results. The reason for this
development is to promote the use of the technique in a commercial application. A
wireless network and graphical user interface are incorporated to supplement the
primary utility of the system, i.e. image processing and character recognition.
Results demonstrate the strengths and weaknesses of the techniques employed.
The quality of the input image, ambient conditions and various parameters within the
algorithm itself are found to impact the Optical Character Recognition (OCR) engine's
ability to accurately detect text.
Acknowledgments
Declaration
This thesis is entirely the author's own work and has not been taken from the work of others,
except as cited and acknowledged within the text.
The thesis has been prepared according to the regulations of Dundalk Institute of
Technology and has not been submitted in whole or in part for an award in this or any
other institution.
List of Abbreviations and Symbols
Table of Contents
Abstract
Acknowledgments
1 Introduction
2 Literature Review
2.2 Technique
3 Theory
4 Methodology
6 Results and Discussion
7 Conclusions
Appendix A
List of Figures
Figure 27 Alfa Romeo Input Image
List of Tables
1 Introduction
1.1 Introduction
Image processing in general, and object recognition in particular, are becoming
increasingly important facets of modern electronics and communications. Some of the
more prevalent applications include medical imaging using fMRI (Steele et al., 2016),
process automation in industrial settings (Choi, Yun, Koo, & Kim, 2012) and text
detection in natural scene images (Zhao, Fang, Lin, & Wu, 2015) (Liu, Su, Yi, & Hu,
2016). The techniques deployed across these applications are wide-ranging and
diverse due to the different requirements of each. With such a vast array of criteria for
investigation it is necessary to define a specific area of interest.
Text Detection, or Character Recognition, is a field of study with an extensive
literature behind it and a burgeoning market for applications. Typical applications
where character recognition is especially important include scanning of text
documents, reading license plate numbers and language translation of text images.
Just as there are many applications for text detection, there are many techniques and
methodologies for implementation of a detection algorithm. Edge-detection,
thresholding and Hough transforms are three of the most common methods employed.
In fact, Otsu's method (Otsu, 1979) is a thresholding technique often implemented
within commercial Optical Character Recognition (OCR) algorithms.
License plate recognition is a standard paradigm for investigation and
experimentation of character recognition techniques and is the frame in which this
project has been carried out. A variety of methods have been implemented in license
plate detection such as Harris Corner and Character Segmentation (Panchal, Patel, &
Panchal, 2016), the use of SIFT descriptors (Yu Wang, Ban, Chen, Hu, & Yang,
2015) and probabilistic neural networks (Öztürk & Özen, 2012).
Much of the preliminary work undertaken has been focused on obtaining a
deeper understanding of the various techniques involved in text detection processes,
particularly those related to natural-scene images. Although the theory is extremely
important, practical usage must also be considered. With this, hardware and software
platforms are investigated in the literature review for this project to ascertain their
relative compatibility with image processing applications.
To test the efficacy of the investigation into the various detection and
recognition methods a practical implementation of these techniques is developed. In
most cases character recognition systems will consist of several component parts
including acquisition, pre-processing and recognition. The system proposed here
incorporates each of these elements within a wireless network which will provide an
automated response to positive character detection and an equivalent alert to failed or
negative detection.
The system is framed as a method for detecting the characters of a vehicle
registration plate and permitting or denying entry based on comparison of the detected
text and a pre-existing vehicle-registration database. However, there is inherent
flexibility in the model and it may be adapted to service other applications. Figure 1 is
a flowchart depicting a high-level description of the required functionality of the
system.
Figure 1: High-level flowchart of the required system functionality: Image Acquisition → Image Transmission → Pre-Processing → OCR → Results Comparison → Automated Response.
All the relevant theory, methodology and results relating to implementation of the
system described are contained within the main body of this document.
2 Literature Review
2.1 Introduction
Image processing and text recognition are increasingly important areas for research
and development in the modern world. Sectors in which image processing techniques
provide the basis for critical applications include medical, communications and
security. In the medical industry image processing techniques, such as improving the
quality of fMRI scans, have been employed in diagnostics (Misaki et al., 2015), with
some modern applications facilitating automated diagnosis of certain conditions.
Text recognition is an area with increasing relevance and the technology in this
area is keeping pace with this need. One of the most impressive applications present
in the literature is the use of text recognition technology in the development of a
text-to-speech synthesis system (Rebai & BenAyed, 2015).
Not only are the potential applications for image processing widespread but the
techniques used to extract the information are equally diverse. Methods deployed are
of course dependent on the desired outcome and there is no shortage of techniques
that can be tailored towards a specific target. Image processing is not unlike other
types of data processing in that the particular process is chosen based on the exact
requirements of the intended application.
With the project for which this literature review has been compiled being
primarily concerned with character recognition in a static image, much of this report
has been written with reference to this area (Zhao et al., 2015; Zhu, Wang, & Dong,
2015).
The expected outcome of this paper is to review, understand and analyse the
present literature on image processing techniques, the platforms used to implement
these techniques and the applications which most commonly employ image
processing as a means of achieving a desired outcome. Section 2 of the report gives an
overview of the techniques employed in the processing of images, usually to extract a
specific piece of information. Section 3 will discuss the operation of Optical
Character Recognition (OCR), which is an adaptable algorithm designed to recognise
specific features contained within an image, i.e. text. The fourth and fifth sections of
the report will feature an assessment of the hardware and software platforms which
could be used to implement the specific techniques associated with image processing.
The report will conclude with a concise summary of the key findings from the
literature review. An outline will be included providing some of the relevant
information which will inform the future progress of this project.
2.2 Technique
There are numerous techniques documented and discussed in the literature available
on image processing. Among those most prominently featured are segmentation,
edge-detection and thresholding. Of course, the technique(s) employed by researchers
or professionals are largely dependent upon the requirements of a given application,
although not exclusively so. In some cases the limitations of software or hardware
may be the deciding factor in choices regarding technique.
Edge-detection is one of the most common approaches to segmentation, with its
method of detecting meaningful discontinuities in intensity values (Rafael C. Gonzalez,
Woods, & Eddins). The method makes use of derivatives and is generally computed
using a Laplacian filter. In their 1997 paper, Smith and Brady (1997) document an
approach to low-level image processing, labelled the SUSAN principle, which was
built on existing edge-detection and corner-detection techniques.
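The Laplacian filtering referred to above can be sketched directly: convolving a grayscale image with the discrete Laplacian kernel responds strongly at intensity discontinuities and is zero on flat regions. The thesis itself works in Matlab; the following Python/NumPy sketch, with invented function names, is purely illustrative:

```python
import numpy as np

def laplacian_edges(img):
    """Convolve a 2-D grayscale image with the 4-neighbour discrete
    Laplacian kernel and return the filtered response."""
    kernel = np.array([[0,  1, 0],
                       [1, -4, 1],
                       [0,  1, 0]], dtype=float)
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0],
                                           dx:dx + img.shape[1]]
    return out

# A flat region gives zero response; an intensity step gives a non-zero ridge.
flat = np.full((5, 5), 7.0)
print(np.allclose(laplacian_edges(flat), 0))  # True
```

Because the kernel weights sum to zero, uniform regions cancel out and only discontinuities survive, which is exactly the property segmentation by edge-detection relies on.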
Another method with considerable presence within the literature is the use of
Moment Invariants. Moments are used to analyse and characterize the patterns
contained within an image and are thus useful in character recognition. For instance,
Zernike moment invariants have been shown to be extremely effective in pattern
recognition applications (Belkasim, Shridhar, & Ahmadi, 1991).
Alongside edge-detection, thresholding is one of the techniques most commonly
used in image processing, specifically segmentation. The reason for this
prevalence seems to be its simplicity of implementation as well as the intuitive
properties it exhibits (Rafael C. Gonzalez et al.). Thresholding is used for all sorts of
applications that require the extraction of information from a given image. One such
application is the detection of glioblastoma multiforme tumors from brain magnetic
resonance images (Banerjee, Mitra, & Uma Shankar, 2016). Global thresholding is
shown in this case to estimate the statistical parameters of the object and
background of an image. The literature in this area certainly supports the view that
thresholding is among the primary techniques used in image processing.
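At its simplest, the global thresholding discussed here amounts to a single comparison per pixel against one scalar value. A minimal Python/NumPy sketch (array values invented for illustration; the studies cited estimate their thresholds far more elaborately):

```python
import numpy as np

def global_threshold(img, t):
    """Segment a grayscale image into object (1) and background (0)
    using a single global threshold t."""
    return (img > t).astype(np.uint8)

# A toy "scan" with three bright pixels against a dark background.
scan = np.array([[10,  12, 200],
                 [11, 210, 205],
                 [ 9,  14,  13]])
mask = global_threshold(scan, 128)
print(mask.sum())  # 3 pixels classified as object
```

The entire art of threshold-based segmentation lies in choosing `t`; Otsu's method, discussed later, automates that choice from the image histogram.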
Alongside the most common image processing techniques, the literature also
documents some that are more specialized. One such technique is Nonnegative Matrix
Factorisation (NMF). Problems can occur with this method and several algorithms
have been proposed to solve them (Hu, Guo, & Ma, 2015). Although NMF is
purported to be an effective tool for large scale data processing it is not one that is
likely to be pursued for the requirements of this project.
Another less prominent but interesting method sometimes used for image
processing is Fuzzy Logic (Amza & Cicic, 2015). Among its current uses are in
automated quality control image processing systems. It works by extracting
geometrical characteristics of an object and then using this information with a fuzzy
pre-filtering unit to estimate the probability of a foreign body being present on the
object being analyzed. Although fuzzy logic is extremely successful in these types of
applications, it does not appear to be the logical approach to a text recognition
application.
Before the more technical aspects of the image processing algorithm are
activated, it may be necessary to implement some of the more basic image processing
techniques to prepare an image for this. These basic adjustments may come in the
form of an image resizing, rotation or cropping, depending on the particular
characteristics of the image and the data to be extracted. In an article on low-quality
underwater images (Abdul Ghani & Mat Isa, 2015), the authors follow Eustace et
al. in adapting a contrast-limited adaptive histogram specification (CLAHS) as a pre-processing step.
In most cases, the literature presents a combination of techniques that have been
chosen because of a particular capability to carry out a specific function or as a means
of experimentation in order to improve existing techniques. With regards to any
nascent image processing project or assignment, it is quite clear that a pragmatic
approach should be taken from the outset so that a suitable technique(s) can be
chosen.
2.3 Optical Character Recognition
One of the more dominant themes present in the literature surrounding image
processing techniques is that of Optical Character Recognition (OCR). OCR appears
as the final processing step in many of the papers researching image extraction and
recognition. There is clearly a wide range of applications and extraction methods that
OCR can be used in conjunction with. Among some of the potential applications for
the use of OCR are keyword searches and document characterization in printed
documents (M. R. Gupta, Jacobson, & Garcia, 2007).
A summary of the theories underpinning the OCR function is provided in Optical
Character Recognition - Theory and Practice (Nagy, 1982). Among the topics
discussed in this work is the classical decision-theoretic formulation of the character
recognition problem. Statistical approximations, including dimensionality reduction,
feature extraction and feature detection are discussed with regard to the appropriate
statistical techniques.
Commercially available OCR algorithms are primarily designed to interpret
binary (black and white) images. However, more and more pre-processing techniques
are being developed as a means of preparing images for use with this function. An
example of this is the denoising and binarizing of historical documents as a pre-processing step (M. R. Gupta et al., 2007). Many researchers have pursued methods
based on development of a new or unique method of extraction that can be used along
with existing OCR functions (Roy et al., 2015).
One of the limitations associated with OCR-based applications is that they may
not work well when properties of the captured character images are significantly
different from those in the training data set. A supervised adaptation strategy is one
that has been developed as a potential solution to this problem (Du & Huo, 2013).
Nagy et al. also demonstrated that a character classifier trained on many typefaces can
be adapted effectively to text in a single unknown typeface by using a self-adaptation
strategy.
A further problem which can sometimes be faced when using an OCR algorithm
for text recognition is the assumption that individual characters can be isolated
(Fernández-Caballero, López, & Castillo, 2012). Some traditional methods of OCR
implementation have less than ideal recognition performance because of the difficulty
in achieving clear binary character images.
The literature clearly indicates that OCR is a vital function in relation to image
processing and text recognition. However, due to some of the limitations stated above,
it is important that any image be properly processed and segmented before being put
through an OCR algorithm.
2.4 Software
The extensive literature on image processing and text recognition techniques
incorporates the use of several types of software for implementation. Whether it is due
to personal preference or application specific criteria, it appears that there are a large
number of platforms available for consideration when undertaking an image
processing project.
Software packages developed with the specific intention of being used for
image processing applications are available, often initiated from academic research. A
classic example of this is ImageJ, software written in Java and designed to run on any
operating system. ImageJ supports various functions and capabilities. For instance, it
is able to acquire images directly from scanners, cameras or video sources. The program
also supports all common image manipulations including reading and writing of
image files and operations on individual pixels (Abràmoff et al., 2004).
The use of Labview as a tool for image acquisition and processing is an interesting
proposition and does have some presence in the literature. A program named Image-Sensor Software (ISS) is based on the Labview programming language (Jurjo,
Magluta, Roitman, & Batista Gonçalves, 2015). Use of this type of
software enables image acquisition tools such as zoom, focus and capture. The
features required by the overall image recognition system must be defined by the user
when programming.
Matlab is a powerful piece of software with many uses in modelling,
experimentation and signal analysis. Its connectivity with many advanced
programming languages (like C, Java, VB) and availability of a wide range of
toolboxes make it popular among the scientific and research community (R. Gupta,
Bera, & Mitra, 2010). It possesses an extensive array of tools which can be harnessed
in the interests of image recognition. The segmentation method is
particularly powerful within Matlab. Its use has been demonstrated by tracing yarn to
accurately compute useful parameters of fibre migration by statistically calculating
mean yarn axis and tracing out mean fibre axis(Khandual, Luximon, Rout, Grover, &
Kandi, 2015).
By employing Matlab as the means of processing an image for some form of
character recognition, the user has the ability to tailor code to develop algorithms with
specific image properties in mind. This may involve text or shape recognition, simple
colour recognition or perhaps properties contained within the image such as depth
perception.
Matlab has the additional advantage of being compatible for use in connection with
some form of hardware acquisition unit that may be implemented as part of an
embedded system. Its use in this context has been demonstrated successfully (R. Gupta et
al., 2010), as a method for controlling image acquisition as well as image processing.
There are some specialised software packages that have been designed to facilitate
a specific function. A prime example of one of these is Xmipp, software developed
primarily as a means of image processing in electron microscopy (de la Rosa-Trevín et
al., 2013). Graphical tools incorporated within this software include data visualisation
and particle picking which can allow visual selection of some of the key parameters of
an image. It can be seen from reviewing the literature that image processing software
is both prevalent and sophisticated. At times the sheer density of techniques available
can appear overwhelming; however, this does suggest that the type of
application being pursued in this project is very much achievable.
Although not always used exclusively, Matlab is very often used as a sub-section
of an overall processing technique. This seems to be due to the vast array of different
commands available within its image processing toolboxes. Images can be treated
using commands such as fspecial and imfilter in Matlab (HashemiSejzei &
Jamzad), before being processed elsewhere for different reasons. This is certainly a
consideration for the progress of the project being considered here, particularly in the
earlier stages of development when the use of some of these Matlab commands could
prove to be extremely informative.
2.5 Hardware
As with software, hardware is an important factor that must be given careful
consideration when entering into an image processing project. The relative strengths
and weaknesses of a specific hardware platform must be carefully gauged with
reference to the processing requirements. Not only this, but compatibility with a
chosen piece of software must be given due consideration. The presence of discussion
and critique of specific hardware units is not as strong as in software. This is primarily
due to the fact that most of the experimental work in this area is focused on the
various image processing algorithms, which are generally cross-platform.
Embedded systems have a fairly extensive presence in the literature as platforms for
image processing. An ARM processor in conjunction with Matlab and a
Linux based operating system has been used to automatically identify cracks in a wall
(Pereira & Pereira, 2015).
Some applications may require the use of high-speed image processing systems.
Due to demands that may include increasing the speed of a transform process or
decreasing overall processing time, it may be necessary to design a specific
architecture to support the function. This is often the case with complex algorithms
which can be implemented using an FPGA for prototyping and verification (Mondal,
Biswal, & Banerjee).
As commented upon at the beginning of this section, there is a comparative lack
of hardware-related literature. The obvious conclusion to draw from this fact is that
the choice of hardware is secondary to the choices of technique, algorithm and
software. However one of the key hardware considerations is the processing
capability of any PC or laptop being used. A powerful CPU and specifically the
inclusion of a Graphics Processing Unit (GPU) can dramatically improve the
performance of any image processing application (Cugola & Margara, 2012).
2.6 Conclusions
There are several component factors to be investigated when considering a project
related to image processing. The relative importance of each of these factors is
reflected in their presence in the literature. Certainly the techniques or algorithms to
be implemented are critical factors which will determine the success or failure of a
given project. As has been documented previously in this report, there are many
potential techniques that can be useful in a variety of applications. This being the
case, it is always an important first step to define the functionality of an application
before determining the correct method for achieving this aim.
With one of the potential objectives of a project being text recognition from a scene
image, the use of segmentation, and particularly thresholding techniques, is very
likely to be required in some form. As well as these processing techniques, Optical
Character Recognition (OCR) in one form or other is almost ubiquitous across text
recognition applications. As there are many commercially available OCR engines, the
decision of which to use is almost entirely intertwined with the choice of software
platform. Matlab, for example, has an OCR algorithm associated with its own image
processing toolboxes.
With regards to software selection for image processing functions, it appears as if
this may come down to a personal preference for a particular interface in many cases.
However, an analytical approach should be taken to ensure that the chosen software
has the desired capabilities. A secondary, or perhaps even primary, factor worth
consideration is the relative expense of some of the software available for image
processing tasks. As noted in this literature review, there are free image processing
programs currently available and extensively developed, although it is possible that
they may come with certain compatibility issues. At the opposite end of the spectrum,
software such as Matlab may only include its best image processing toolboxes at
additional expense, separate from the main program license.
One of the key decisions to be made is in the choice between the possible
implementation of an embedded system or developing the process on a PC or laptop.
Depending on the overall functionality of a system, it may be more desirable to have
an embedded image processing algorithm that acts as a device for detecting very
specific types of data. Alternatively, the use of a PC or laptop in this area allows for
continuing flexibility in the processing techniques even after completion of the final
design. As with every aspect related to this topic, decisions must be primarily based
upon the end-requirements of the application.
Overall impressions of the available literature on image processing techniques are
that the research and experimentation in this area is both extensive and expanding. It
is a field that is extremely relevant in the technology and communications sector
today and the work being undertaken reflects this status. Of course this means that its
pace of development is exceptionally fast but it also means that the potential
applications for its use will continue to grow.
3 Theory
3.1 Introduction
Text detection has of course been heavily researched with multiple methods being
suggested for application (cite). There are some differences in the literature as to how
these methods are categorised. Zhang, Zhao, Song, and Guo (2013), for example,
categorise these techniques into four groups: edge-based, texture-based, connected-component (CC)-based and others. However, Chen et al. (2011) have categorised
these techniques into two primary classes: texture-based and CC-based.
Maximally Stable Extremal Regions (MSER) is the technique being employed in this
case. The use of an MSER approach to text detection is advocated for several reasons.
Among these are the observations that text regions tend to have quite high colour-
contrasts with their backgrounds and they also typically consist of homogenous colour
formations (Liu et al., 2016).
The following sections introduce the theory underpinning the methodology
being implemented for this image processing algorithm in various stages. Each of the
key components of the algorithm are discussed individually and their anticipated effects
on a given input image stated. The theory in this section is laced with references to
Matlab and the methods available in this software for applying these techniques. The
section begins with a note on the image formats typically used in this type of
application. In many instances the image format itself is not a critical factor in image
processing but it is nevertheless worthy of consideration.
M-by-N-by-3 truecolour: A truecolour image is a 24-bit image (8 bits for each of the
colours Red, Green and Blue (RGB)), such as a JPEG, capable of displaying millions of
colours (2^24) (Robbins, 2007). The quantity of possible colours is due to the fact that each
byte is able to represent 256 different shades.
M-by-N 2D grayscale: This is an image in which all colours are a different shade of
grey. One of the virtues of this format is that less information is required for each
pixel. Pixels are stored as 8-bit integers, allowing for 256 (2^8) shades of grey from
white to black (Fisher, Perkins, Walker, & Wolfart, 2003b). Grayscale is a common
format in image processing.
M-by-N binary: In binary images pixels have only two possible intensity values.
These values are typically displayed as black and white, with 0 used for black and 1
or 255 used for white (Fisher, Perkins, Walker, & Wolfart, 2003a). The binary format
is often used to distinguish between text and background in pattern recognition
algorithms.
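The three classes above are related by two conversions: a truecolour image reduces to grayscale via a weighted sum of its R, G and B channels, and the grayscale image reduces to binary against a threshold. A hedged Python/NumPy sketch of both steps (the BT.601 luminance weights shown here are the same convention Matlab's rgb2gray uses):

```python
import numpy as np

def rgb_to_gray(rgb):
    """M-by-N-by-3 truecolour -> M-by-N grayscale (BT.601 luminance)."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    return rgb.astype(float) @ weights

def gray_to_binary(gray, threshold=128):
    """M-by-N grayscale -> M-by-N binary (0 = black, 1 = white)."""
    return (gray >= threshold).astype(np.uint8)

# A white pixel and a black pixel survive the round trip as 1 and 0.
rgb = np.array([[[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)
print(gray_to_binary(rgb_to_gray(rgb)))  # [[1 0]]
```

This is the same pipeline the recognition algorithm depends on: the OCR stage described later operates on the binary result, not on the original truecolour capture.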
As stated above, the class of image is not a defining factor in the success or
failure of the recognition algorithm. Due to this two image types have been
implemented throughout the testing and experimentation process: PNG and JPEG.
PNG is a relatively new image format and uses 24-bit true colour
(Willamette.edu, 2016). Although the files can be considerably larger than the JPEG
format this is not a major concern in this instance as all image files are to be deleted
immediately after use.
JPEG is said to be a lossy format (Willamette.edu, 2016), as some data loss is
associated with its compression. These losses result in slight degradation of the image
but have minimal impact on the visual perception of the image. JPEG is not limited in
colour and is a popular format for images containing natural scenes and vibrant
colours. However, the vibrancy of the colour image is not a primary factor for
consideration in this case.
Maximally Stable Extremal Region (MSER) detection has been defined as blob detection (Matas, Chum, Urban, & Pajdla, 2004), meaning that
the MSER command in Matlab will return relevant information pertaining to MSER
features in a given input image.
Due to the fact that an input image will present significant variation in
granulation, resolution and grey-scale levels, amongst other features, the roughness or
smoothness of the edges within that image can vary also (Moreno-Daz, Pichler, &
Quesada-Arencibia, 2012). For this reason the blob detection is applied with an
MSER algorithm for detecting sections of significant intensity within an image. The
extremal region associated with the MSER acronym is an area of connected
components within an image which maintains intensity levels below a threshold.
Through this technique areas of interest can be filtered to allow an OCR
algorithm to attempt character recognition.
3.5 Stroke-Width Thresholding
In an effort to obtain more consistent results a stroke width transform of the MSER
regions is generated and applied to perform filtering and pairing of the connected
components (Chen et al., 2011). The stroke width is computed with the bwdist
command which calculates the Euclidean distance transform of a binary image.
Epshtein, Ofek, and Wexler (2010) designed a method of stroke-width transformation
based on the premise that text characters could be detected from the regions where
stable stroke widths occurred.
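The distance transform at the heart of this step can be illustrated with a brute-force version: each foreground pixel receives its Euclidean distance to the nearest background pixel, which on a text stroke peaks at roughly half the stroke width along the centre line. This is equivalent to applying Matlab's bwdist to the complemented binary image; the thesis relies on Matlab's fast implementation, and the Python/NumPy sketch below is for illustration on small arrays only:

```python
import numpy as np

def euclidean_distance_transform(binary):
    """For every nonzero pixel, return the Euclidean distance to the
    nearest zero-valued pixel (brute force, fine for tiny images)."""
    h, w = binary.shape
    ys, xs = np.nonzero(binary == 0)                # background coordinates
    zeros = np.stack([ys, xs], axis=1).astype(float)
    dist = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            if binary[y, x]:
                dist[y, x] = np.sqrt(((zeros - (y, x)) ** 2).sum(axis=1)).min()
    return dist

# A 3-pixel-wide stroke: the centre pixel sits 2 pixels from the background.
stroke = np.array([[0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 0, 0, 0, 0]])
print(euclidean_distance_transform(stroke)[2, 2])  # 2.0
```

Sampling this transform along the thinned skeleton of a character, as described next, yields the collection of stroke width values that the thresholding step operates on.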
The reason for including this approach within a character detection algorithm
is that it can be effectively implemented as a means of reducing background noise.
This is because regions contained within the image are grouped into blocks, having
been further verified as containing properties relating to likely text characters (Yi &
Tian, 2011). For example, the stroke-width of the letter T should be identical to the
stroke-width of the letter D, assuming the text font is the same. However, a non-text
region is not likely to share this stroke-width and can therefore be eliminated as a text
region.
Thinning is a method of reducing binary objects in an image to strokes which
are a single pixel wide (R.C. Gonzalez, Woods, & Eddins, 2010). The Matlab
command bwmorph implements this approach with a series of operations including
dilations and erosions. Matlab enables the programmer to set the number of iterations
for which the thinning operation occurs. In fact, the number of iterations can be set to
infinity (inf) indicating that the operation will continue until the image ceases to
change.
The results from the distance transform and the thinning operation are then
combined to provide the stroke width values contained within the image. A
measurement for stroke width is calculated by dividing the standard deviation of the
stroke width values by the mean of the same stroke width values:
Stroke Width Measurement = std(stroke width values) / mean(stroke width values)
An array index is computed which is comprised of those regions of the image with
a greater stroke width measurement value than the value of the predefined stroke
width threshold. It is expected that those regions with a greater than threshold value
will be the text regions of the image. This index is then applied to the regions
returned by the mserRegions command so that the desired regions of the image, i.e.
the text regions, can be extracted.
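A minimal sketch of this stroke-width computation for a single candidate region, assuming `regionImage` is a binary mask of that region and `strokeWidthThreshold` has been predefined (variable names are illustrative):

```matlab
% Euclidean distance transform: distance from each foreground
% pixel to the nearest background pixel.
distanceImage = bwdist(~regionImage);

% Thin the region to a one-pixel skeleton and sample the distance
% transform along it to obtain the stroke width values.
skeletonImage = bwmorph(regionImage, 'thin', inf);
strokeWidthValues = distanceImage(skeletonImage);

% Stroke width measurement: standard deviation divided by mean.
strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);

% Index the region against the predefined threshold.
strokeWidthFilterIdx = strokeWidthMetric > strokeWidthThreshold;
```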
As a minor precaution, the expanded bounding boxes must be checked to ensure they
do not exceed the outer limits of the image. This is achieved by comparing the
maximum axis limits calculated from the expansion coefficient with the axis limits
defined by the size of the image. The new axis limits of the bounding boxes are then
taken as the minimum value computed from the previous comparison. This is
implemented in Matlab in the following fashion:
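The original code figure is not reproduced here; a sketch of the expansion and clipping steps, under the assumption that `bboxes` holds rows of [x y width height] and `I` is the input image, might read:

```matlab
% Convert boxes to corner coordinates.
xmin = bboxes(:,1);
ymin = bboxes(:,2);
xmax = xmin + bboxes(:,3) - 1;
ymax = ymin + bboxes(:,4) - 1;

% Expand each box slightly (coefficient is illustrative).
expansionAmount = 0.02;
xmin = (1 - expansionAmount) * xmin;
ymin = (1 - expansionAmount) * ymin;
xmax = (1 + expansionAmount) * xmax;
ymax = (1 + expansionAmount) * ymax;

% Clip to the image limits: the new limits are taken as the
% minimum of the expanded values and the image dimensions.
xmin = max(xmin, 1);
ymin = max(ymin, 1);
xmax = min(xmax, size(I,2));
ymax = min(ymax, size(I,1));

expandedBBoxes = [xmin ymin xmax - xmin + 1 ymax - ymin + 1];
```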
The height and width of the region of interest are determined by the x and y
coordinates established for the bounding boxes, and these must not extend beyond the
area of the image.
Almost all of the commercially available OCR functions are designed to operate
on binary images, and Matlab's is one of these. Matlab's OCR function uses Otsu's
method of thresholding (Otsu, 1979) to convert an input image into a binary
equivalent before the recognition process is implemented. Otsu's method has been
demonstrated to exhibit better overall performance in OCR than other techniques (M.
R. Gupta et al., 2007).
Modern OCR algorithms like the one employed by Matlab apply neural network
techniques to analyse the character stroke-edge. This stroke edge is effectively the
boundary between the concentration of character pixels and the background image.
The algorithm takes averages of the black and white
values along the edge of each character. The result is then matched to the characters
contained in the dataset and the closest estimation is selected as the output character
(Potocnik & Zadnik, 2016).
When the OCR algorithm has completed the recognition process, the results are
printed in the Matlab command line with the following entry: [txt.Text]. Should the
user require information on the properties of the OCR output, the ocrText object
contains the recognised text and metadata collected during optical character
recognition (Mathworks.com, 2016b). However, some of these features are not
available with the student edition of Matlab used during this project.
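A sketch of this stage, assuming `Iocr` is the processed binary image from the earlier steps:

```matlab
% Run optical character recognition on the processed image.
txt = ocr(Iocr);

% Print the recognised characters on the command line.
[txt.Text]

% The ocrText object also exposes metadata such as per-word
% confidences and bounding boxes.
txt.WordConfidences
```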
with implementation of an MSER methodology. However, under favourable lighting
conditions this should not pose a substantial problem, as vehicle registration plates
have reasonably distinct contrast between character and background regions.
4 Methodology
4.1 Introduction
Implementation of a system which will acquire an image, process it and provide
automated indication of the success or failure of the operation requires an elaborate
methodology to incorporate each of the individual components into one design. This
section documents how the complete system has been put together.
There are two distinct sections covered in this methodology section. The first is
the hardware element of the project containing all the communications involved,
including image acquisition and an automated response. The second is the procedure
implemented for processing of the acquired image and extraction of the desired text
region.
The overall system design is discussed, providing insight into how the various
components are expected to interact. The selection of the Raspberry Pi 2 Model B is
documented, along with the specifications of this microcontroller which lend it to the
application, before the additional circuitry required by the system is discussed with
particular regard to the design of a PCB.
Following the hardware description of the project a detailed description of the
image processing algorithm is provided. This description discusses each of the major
sections of the algorithm individually, highlighting the effects of each technique on a
given input image. As stated previously, this image processing algorithm is the
technical focus of the project and the level of detail reflects this.
The concluding paragraphs of this methodology section are intended to provide
the reader with information on how the extracted number plate text is compared to an
existing text string and how this comparison is used to provide indication of
recognition.
[Figure: overall system flow — Image Acquisition → Image Transmission → Pre-Processing → OCR → Results Comparison → Automated Response]
The acquired image is put through a series of steps designed to provide the best
possible image for the OCR function to operate on. This will involve some of the
methods mentioned in the introduction and
literature review of this document. The OCR function is used to produce a text string
output which is expected to match the characters present in the input image.
The result from the OCR function will then be used alongside some existing
database of vehicle registration numbers in a comparison function which will
determine whether or not the character string obtained is one of the registrations
expected. Finally, the result from the comparison function, which will be a Boolean 1
or 0, will be used to initiate an automated response, tailored to each condition.
Ethernet port
Camera interface
Display interface
MicroSD card slot
VideoCore IV 3D graphics core
The camera interface included in the list above enables the user to connect the
custom-designed add-on module for Raspberry Pi hardware (Mathworks.com, 2016d).
This small and lightweight device supports both still capture and video mode, making
it ideal for mobile projects. The camera has a 5 megapixel native resolution for still
capture and supports 1080p30 and 720p60 video modes.
The Raspberry Pi camera module is popular in home security applications and
wildlife camera traps and is often used for time-lapse and slow motion imaging
(Raspberrypi.org, 2016a).
The GPIO pins on the Raspberry Pi model B are an essential element in its use
as the central node of a system as they facilitate connection with external electronic
circuitry and sensors (Vujović & Maksimović, 2015). These pins can accept input and
output commands which can be programmed to act as required. With particular
reference to this project these input pins can be used to monitor the status of switches
or sensors which can be implemented as triggers for other components of the system.
The pin layout in Figure 3, taken from element14.com, is included below:
As can be seen from the pin diagram, the Raspberry Pi model B is equipped with several
DC power lines which can be used as a power source for external circuitry. In terms
of portability and using the microcontroller remotely this is a powerful feature as it
eliminates the necessity for further external power supplies which may otherwise be
required.
The facility to integrate a wireless network, database server and web server into
a single compact, low-power computer, which can be configured to run without a
monitor, keyboard or mouse is a major advantage when working with the Raspberry
Pi (Ferdoush & Li, 2014). This became a particularly important feature for use in this
project as the Pi could be controlled remotely following initial setup. As the system
was developed and became more refined, the wireless element grew in importance,
not only as a means of data transmission but as a method for implementing overall
control. For this reason the selection of the Raspberry Pi for the hardware
requirements of the project proved correct.
There are several options for powering the Raspberry Pi, with the condition
that the source is able to provide enough current to the device (Vujović &
Maksimović, 2015). The device is powered by 5V from a micro-USB connector;
however the current requirements differ for each model of the device and depend on
the number of connections drawing power from the microcontroller. For the model
being used in this case (2B), a PSU current capacity of 1.8 Amps is recommended
(Raspberrypi.org, 2016c).
With a device such as the Raspberry Pi acting as the central node of a system
like this one, there is a possibility that an excessive number of parasitic devices may
be connected and drawing current that the Pi cannot facilitate. It is therefore essential
that the number of connected devices and components is kept to the minimum
required. Typical connections to the Raspberry Pi, including HDMI cable, keyboard
and mouse require between 50mA and several hundred milliamps of current
(Raspberrypi.org, 2016c) and the camera module being used here requires a
significant draw of 250mA. Those external devices are required during the testing and
prototyping stages of this project. However, due to the specification of the system
some of these current drawing devices are not required for the final construction. With
remote connectivity there is no need for GUI-related connections to the Raspberry Pi,
thus relieving the power-burden on the device somewhat.
4.4 Wireless Network
The system design specifies that some form of wireless network is used for
communication between the microcontroller and the computer containing Matlab.
Wifi has been selected as the protocol for this purpose and there are several reasons
behind this decision.
The prevalence of Wifi in commercial and academic premises makes it an
easily accessible resource for implementation of this system. Wifi also enables
greater range than could be provided by a single Bluetooth device. The use of Wifi for
transmission of the acquired image is not a major concern as only one picture is being
sent at any one time.
The simple fact that Matlab is able to communicate directly with the
Raspberry Pi by forming a connection via the device's IP address made the selection of
Wifi a certainty. An IP address, along with a username and password for the
Raspberry Pi, is all that is required to enable remote control of the device from Matlab.
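With the MATLAB Support Package for Raspberry Pi Hardware installed, the connection might be established as follows (the IP address and credentials are placeholders):

```matlab
% Connect to the Raspberry Pi over the Wifi network.
mypi = raspi('192.168.1.10', 'pi', 'raspberry');

% Take control of the onboard camera module and acquire an image.
cam = cameraboard(mypi, 'Resolution', '1280x720');
img = snapshot(cam);
```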
The choice of Wifi as the network model may have been premature in regard
to experimental testing of the system due to the intermittent coverage in the lab
setting. This issue is discussed further in the section of this paper relating to testing.
Initial testing of the system in this configuration required that construction of a
circuit be carried out on a breadboard. Once successfully tested and a final design
settled upon, this circuit could be designed and constructed as a Printed Circuit Board
(PCB).
The circuit design incorporated two push-button switches: one to simulate the arrival
of a vehicle at the position where image acquisition takes place, and a second to
simulate the end of the operation and system reset. These two switches are connected
to Raspberry Pi GPIO pins which will be polled for a change in state.
The two LEDs being used to simulate the system's output response are also
connected to GPIO pins on the Raspberry Pi. Due to a lack of intensity experienced
while testing with the LEDs, two NPN transistors have been included in the circuit to
enable extra current to be driven to the LEDs.
The design of the circuit can be seen in the appendix and the PCB design in
Figure 5. Both the schematic and PCB layout have been drawn in Proteus.
Figure 5 Software Design Flowchart
The first critical objective of the program is to connect with the Raspberry Pi device
and take control of the onboard camera module. At this point the external LEDs are
set to 0 to ensure they are not considered as false positives.
The program then polls the appropriate GPIO pin which is connected to the
switch being used to trigger image acquisition. This polling effectively sees the
system wait for this switch to be pressed before any other action can begin. Figure 6
shows how this has been implemented in two simple lines of code.
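Assuming the raspi connection object `mypi` from the earlier setup and an illustrative GPIO pin number, the polling can be sketched as:

```matlab
% Wait until the push-button switch pulls the pin high.
while readDigitalPin(mypi, 17) == 0
    pause(0.1);   % poll ten times per second to avoid busy-waiting
end
```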
The parameter values seen in Figure 6 have generally been used throughout
testing but they can be adjusted and additional parameters included if required.
The image in Figure 7 has been operated on by the MSER technique discussed
and exhibits all the potential text regions detected. Due to the relatively wide scope of
this image a large number of MSER regions have been returned. The weakness of
this technique is obvious from this image, as the number of non-text regions
identified vastly outnumbers the text regions.
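The MSER stage itself can be sketched with the Computer Vision System Toolbox function `detectMSERFeatures`; the parameter values shown here are illustrative, not the exact ones used:

```matlab
% Detect MSER regions in the greyscale image.
Igray = rgb2gray(img);
[mserRegions, mserConnComp] = detectMSERFeatures(Igray, ...
    'RegionAreaRange', [200 8000], 'ThresholdDelta', 4);

% Overlay the detected regions for inspection.
figure; imshow(Igray); hold on;
plot(mserRegions, 'showPixelList', true, 'showEllipses', false);
```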
4.8 Regionprops
The Matlab command regionprops is used to remove MSER regions based on
geometric properties. Data from the MSER regions must be converted to
linear indices so that it can be operated on with regionprops. The regionprops
command then measures and returns statistical analysis of the MSER regions
previously identified. There are numerous property types which can be used for
geometric thresholding and selection may depend on the requirements of an
application. Thos properties selected in this instance are included in the section of
Matlab code in Figure 8.
Each property is measured against its threshold criteria. Eccentricity, Solidity and
Euler Number are each calculated in a similar manner using the regionprops command.
The Aspect Ratio is calculated for each region from its bounding box,
extracted using regionprops, as the width of the box divided by its height. A threshold
is applied in the same way as with the other geometric properties.
MSER regions determined by the thresholds are removed from the image
based on this technique. It is anticipated that this would result in a significant
reduction in the number of those non-text regions present in an image like the one
seen in Figure 7.
4.10 Bounding Boxes
As stated in the theory section, bounding boxes are used to bring form to the data
present in the image. Matlab is equipped with considerable functionality for applying
bounding boxes and the process is initiated by determining bounding boxes for each
of the remaining text regions. These bounding boxes can be expanded slightly to help
ensure overlap between connected components. This is achieved by applying a small
expansion amount to the bounding boxes and is an important feature in determining the
structure of a text string returned from the OCR function. The effect of varying the
expansion amount is discussed in greater detail in the results section. Figure 10 shows
the effect of applying bounding boxes to each of the character regions; overlap is
clearly visible among the components.
Figure 13 Merged Bounding Boxes
The character array returned by the OCR function cannot be directly compared with
the string entered as the expected registration digits.
The solution to the problem of comparing different data types is to convert
them both to a mutual type. This requires the use of the cellstr(S) function in Matlab
which facilitates the creation of a cell array of strings from any character array. A cell
array in Matlab is one whose elements are cells. Each cell in a cell array can hold any
Matlab data type including numerical arrays, character strings, symbolic objects and
structures (Hanselman & Littlefield, 2001).
Taking the example of a vehicle registration plate accurately detected by the
OCR engine as XJZ 7743, the result returned is a 1x10 character array and is
stored in Matlab as such. Entering the string B = 'XJZ 7743' stores the value
XJZ 7743 as a standard string. Comparisons of these two results with the string
compare function return a 0 as the values are in different formats.
This error is overcome by creating two cell arrays from the stated values. The
lines of code in Figure 14 below show the method for comparing these two cell
arrays:
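Assuming `txt` is the ocrText result from the recognition stage and an illustrative expected registration, the comparison takes roughly this form:

```matlab
% Convert both values to cell arrays of strings so that they
% share a mutual data type.
A = cellstr(strtrim(txt.Text));   % OCR output, whitespace trimmed
B = cellstr('XJZ 7743');          % expected registration (illustrative)

% String compare returns 1 for a match and 0 otherwise.
F = strcmp(A, B);
```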
In a commercial installation, some form of automated response would be required.
The basic premise of this function, as stated previously in this report, is to enable or
block entry to a parking facility and to alert an operator when this is considered
necessary.
The Raspberry Pi provides a suitable platform for this purpose as its GPIO
pins can be implemented to trigger an external response to the system inputs. In real-
world applications this output may be tailored to meet the specific requirements of a
given system. For example, a servo motor may be triggered to raise a barrier or an
alarm sounded to alert a system operator. In this case a simple LED can be used for
simulation and testing of the efficacy of the Matlab code and external circuitry.
In Matlab code an if statement can be used to determine a response which is
dependent upon the presence of a specified input condition(s). For instance, it may be
used to implement a certain set of conditions when the if statement is true,
otherwise the status-quo persists. Alternatively it could be used to determine an output
based on several potential input conditions, determining the required output upon the
presence of a given condition.
For testing the output of the system the input condition is provided for by the
result of the string compare function discussed in the previous section. Therefore the
code could be compiled to trigger some form of response when the output of the
string compare function (F) is equal to 1. In cases where the OCR function is unable
to determine a positive match F is equal to 0, in which case the system can be
configured to produce no response at all or an alternative response such as a red light
to indicate that the comparison is negative.
A section of code containing the if statement is inserted in Figure 15 below.
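The response logic takes roughly the following form, assuming the raspi object `mypi` and the comparison result `F` from earlier (GPIO pin numbers are illustrative):

```matlab
if F == 1
    % Positive match: switch on the green LED.
    writeDigitalPin(mypi, 22, 1);
else
    % Negative match: switch on the red LED.
    writeDigitalPin(mypi, 23, 1);
end
```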
The transistors in the circuit provide enough current so that the LED is easily visible;
the current provided from the GPIO pins on the Raspberry Pi is insufficient for this
purpose.
In the event of a positive match, i.e. F==1, the green LED is switched on. When
the code is run and the result is a negative match, then the red LED will be switched
on.
A single iteration of the system is completed when the second push-button switch
is pressed, as this switches off all external LEDs, closes all open figures, deletes the
input image and exits the while loop.
5 Experimental Testing
5.2 Complete System Test
Complete system testing combined each of the component elements of the project to
determine whether or not it would operate as expected. This process has not
proceeded as smoothly as anticipated although it has provided some positive results.
The hardware used for the complete system test, including the Raspberry Pi and PCB,
can be viewed in Figure 15.
5.3 Limitations to Testing
Several limitations to testing of the system have been experienced, some of which
may be relevant to the results obtained. Perhaps the most debilitating of these has
been the difficulty in obtaining and maintaining adequate wireless connectivity in the
lab. Intermittent connectivity led to a significant amount of time being expended on
troubleshooting network problems. Occasionally it was not possible to establish any
connectivity between the Raspberry Pi and the college network, making testing of the
overall system very difficult.
On reflection, a more prudent approach may have been to perform all testing
with an Ethernet connection to avoid time wasted on wireless issues. Final completion
of the system could, in that case, have incorporated the wireless element.
Lighting proved to be something of a restriction to results obtained from live
input images. The less than adequate lighting in the lab setting combined with
intermittent changes in intensity due to sunlight made consistency of results extremely
difficult during testing. However this can also be interpreted as a positive aspect as
solutions to these problems are required in real-world scenarios.
Finally with regards to limitations, it is important to understand that all of the
testing completed for this project has been in relation to static text images. What is
meant by the word static is that the text content of the image is stationary at the
moment of image acquisition. This is in contrast to more advanced systems that use
sophisticated techniques to extract text from moving vehicles for example.
6.1 Introduction
The results obtained from testing of the image processing algorithm and the overall
system are numerous and generally successful in relation to prior expectations. The
Raspberry Pi is able to acquire an image when triggered. This image can be
transmitted to a laptop wirelessly via a Wifi network where it is applied to the image
processing algorithm. In many instances the correct characters are obtained and a
green LED switched on in response.
With specific regard to the image processing and character recognition element
of the project, it is important to understand that the results demonstrated in the
following paragraphs have been obtained through many stages of experimentation
with the various components of the algorithm. It is not possible to discuss the result of
each test, but a detailed overview is provided.
Not all tests have been successful in achieving the desired target of the system
i.e. to correctly identify the characters in a vehicle registration plate. However each
unsuccessful test has provided information on the effects of the various processing
techniques which has helped in refining elements of the program. Some of the more
interesting results, obtained from unsuccessful tests, are documented in the Complex
Recognition section of this report.
The Basic Detection section will show how the algorithm has been successful in
identifying text regions and recognising them correctly as those in the input image.
The title of the section relates to the relative complexity of the input image which is a
primary reason for the positive results. The Complex Detection section employs a
series of examples to demonstrate how changes to thresholding parameters in the
algorithm affect its performance.
The results presented are supplemented by discussion and analysis of the overall
system and recommendations for further work.
The MSER method should have no difficulty in detecting all of the character regions
and should only detect minimal non-text regions, or perhaps even zero non-text
regions.
One of the problems encountered when attempting to generate a character
output was in returning the full registration in the correct order. Following the stroke-
width thresholding, bounding boxes are applied to the image in an attempt to form a
coherent structure from the data. As stated in the methodology section these bounding
boxes are calculated within Matlab but can be adjusted to suit specific applications.
Because there are two distinct sections, LLZ and 2268, and the bounding
boxes are included to establish text regions, the resultant output tended to return the
two sections in reverse order. A certain amount of trial and error can be required to
overcome an issue like this one, but adjustments to the expansion amount used to
increase the size of each box proved effective in overcoming the issue.
Figures 20 and 21 show how the bounding boxes have been applied
differently in two iterations of the same algorithm.
Figure 24 Basic Detection - Bounding Box Comparison (1)
The top image in Figure 21 shows how a small expansion amount can result in a
vehicle registration plate being separated into two distinct lines of text. This is an
unwanted situation as it can lead to errors when comparing the text string with an
existing database of registration numbers.
In the second image the increased expansion amount has ensured that the OCR
function will consider the text regions on the image as a single string. This is the ideal
scenario when inputting the image to an OCR function as it eliminates alternative
interpretation of the order of the data.
The algorithm has been extremely successful in identifying and correctly
recognising the characters when operating on basic input images like the one in
Figure 16. The processed image, having passed through each of the stages
documented in this section, is applied to the Matlab OCR function, which provides a
result based on its interpretation of the image. Comparing the edges of the character
regions in the image, it returns a text string based on correlation to existing templates.
Figures 22 and 23 show the result of the OCR operation on the processed
image, as printed on the Matlab command line.
In Figure 22 the result has been returned as two distinct text strings. Although it has
returned the correct characters and proven the effectiveness of the various pre-
processing stages as well as the OCR function, it is preferable that the result be a
single line of text.
Each parameter has a threshold value which can be adjusted to determine the effect of
each property in distinguishing between general colour concentrations and text
regions. The first five properties in each table are the geometric properties discussed
in Section 3.4 and the sixth is the Stroke-width threshold discussed in Section 3.5.
Table 1 contains the base values used to configure the region properties and
stroke-width thresholding levels. With these values in place several separate instances
of character recognition have been successful. In fact, with this configuration the
system has been able to produce the automated response to positive recognition
anticipated in the design. However, these successful cases have been induced in ideal
conditions or with much less complex input images than the one seen in Figure 24.
Table 1 Parameter Values - First Iteration
Parameter                Threshold Value
Aspect Ratio             >3
Eccentricity             >0.995
Solidity                 <0.3
Extent                   0.2< OR <0.9
Euler Number             <-4
Stroke-width Threshold   0.4
From left to right, the images in Figure 25, as well as in Figures 26, 27 and 28, depict
the three key stages of the image processing algorithm: 1) MSER region detection, 2)
removal of MSER regions based on geometric properties and 3) removal of remaining
non-text regions based on stroke-width detection.
The MSER technique has detected all of the characters in the image but has also
identified a very high number of additional regions which are considered potential
text regions. It is the sheer quantity of potential RoIs determined
using the MSER methodology that make further pre-processing of the image a
necessary requirement. However it should be noted that the volume of MSER regions
detected in this image is a consequence of the inherent complexity presented. Much of
the experimentation carried out for this project has been undertaken with extremely
basic text images, often resulting in detection of text regions only, or very limited
non-text regions.
In the second image presented in Figure 25 the regionprops command has
been employed to measure the specified geometric properties with the intention of
eliminating non-text regions based on the threshold values seen in Table 1. In this
instance the technique has been fairly successful in removing many of those blob
regions detected using MSER. The areas surrounding the license plate have been
removed, as have many of those on the grill and window-wipers of the vehicle. This
stage of the process has also been successful in maintaining the character regions on
the number plate for further processing.
Despite many of the non-text regions being removed during this stage of the
process it can be deduced from those remaining regions that the parameters
documented in Table 1 are not ideally refined for this image.
The final image in Figure 25 depicts the result of applying stroke-width
analysis and a threshold of 0.4 to the picture. As stated previously, the stroke-width
measurement is calculated as the standard deviation of the stroke widths divided by
the mean of the stroke widths. In this example the stroke-width threshold has been
entered as 0.4. Those areas of the image with a stroke-width measurement greater than
0.4 are indexed and identified as likely text regions.
Use of stroke-width analysis has been partially successful in removing some
of the remaining non-text regions, particularly the significant blob of colour to the
top-left of the image. However there are still several areas of non-text regions which
have not been eliminated. Perhaps of even greater significance is the fact that
application of the stroke width threshold has actually resulted in removal of one of the
license plate characters as a potential text character. In this instance the W does not
meet the specified criteria.
There are a number of possible reasons for the disappointing results obtained
in this example. One of these is the likelihood that the number of non-text regions
remaining after the geometric property thresholding had been applied has resulted in a
skewed calculation of the stroke-width average which is not primarily based on actual
text character values.
Another potential reason for the removal of the W character is that the
threshold setting may be too low when applying the stroke-width analysis. Although
increasing the threshold may result in this character being detected it may also result
in additional non-text regions being identified.
The results demonstrated in Figures 26, 27 and 28 will show how making
changes to parameter values in the image processing algorithm can improve or worsen
the overall performance in detection of RoIs.
In Table 2 the stroke-width threshold has been increased from 0.4 to 0.5 with
the other parameters remaining constant from the previous example. The expectation
is that the increased stroke-width threshold will result in inclusion of the W
character as a detected text region. However this change is not going to be a panacea
for the many non-text regions seen previously.
Table 2 Parameter Values - Second Iteration
Parameter                Threshold Value
Aspect Ratio             >3
Eccentricity             >0.995
Solidity                 <0.3
Extent                   0.2< OR <0.9
Euler Number             <-4
Stroke-width Threshold   0.5
As in the previous example, Figure 26 depicts the effect of the three important pre-
processing techniques on the input image.
Figure 29 Processing Results - Second Iteration
The first image shows the results from application of the MSER technique. This is
identical to the one seen in Figure 25 as the MSER technique has no dynamic
properties in this case and the input image is constant. This will also be the case when
MSER is applied in Figures 27 and 28.
As stated, for this iteration of the process all parameters have been held
constant except for the stroke-width threshold. Therefore the second image in Figure
26 displays the same effects as in Figure 25. By following this methodology the
potential effects of varying stroke-width thresholding can be seen.
The final image in Figure 26 demonstrates the effects of applying the
increased stroke-width filter. The most notable change from the previous example is
that every character has now been identified as a potential text region, as the W is
clearly highlighted. Despite the positive result in detection of the previously
overlooked character, the image also exhibits some negative traits associated with an
increased stroke-width threshold. The final image in Figure 26 shows that additional
non-text regions have been identified due to this change, particularly to the right of
the car bonnet, beneath the wing-mirror.
This result helps to highlight the importance of a well-defined stroke-width
threshold when implementing a character detection algorithm. When a narrow filter is
applied the potential for loss of text regions is increased. Equally debilitating is a filter
with too great a threshold as it will enable more non-text regions to be included.
The results displayed in Figure 27 have been achieved by applying greater
adjustments to the initial parameter values in an effort to implement a more robust
algorithm. Table 3 contains the parameter values entered in this instance.
Table 3 Parameter Values - Third Iteration
Parameter                Threshold Value
Aspect Ratio             >5
Eccentricity             >0.995
Solidity                 <0.15
Extent                   >0.3 OR >0.7
Euler Number             <-3
Stroke-width Threshold   0.2
All parameter values in Table 3 have been adjusted except for Eccentricity.
Solidity has been decreased to 0.15, the Euler Number threshold to <-3 and the
Extent range from >0.3 to >0.7. The Aspect Ratio has been increased and the
stroke-width threshold decreased significantly from 0.5 to 0.2.
These results, in which only minimal additional non-text regions have been eliminated
at the expense of an actual text region, suggest that some changes are required.
The final demonstration of the pre-processing stages using the input image in
Figure 16 is shown in Figure 28. The changes made to the parameter values have been
inserted into Table 4 and include some significant adjustments. The aspect ratio has
been adjusted quite dramatically to >1, with solidity also slightly reduced. The
Extent range has been adjusted following a series of tests aimed at refining it for
this particular image.
Table 4 Parameter Values - Fourth Iteration
Parameter                 Threshold Value
Aspect Ratio              >1
Eccentricity              >0.995
Solidity                  <0.1
Extent                    <0.35 OR <0.8
Euler Number              <-3
Stroke-width Threshold    0.5
Once again, the initial image in the sequence exhibits the same results as
previously. However, the second image clearly shows a significant improvement on
previous results.
It is evident in this instance that only actual text regions remain. Equally
impressive is the fact that this increased robustness did not result in removal of
the W character on the registration plate, as had occurred in previous examples.
The effectiveness of the first two stages of processing in this example has
made the stroke-width thresholding element redundant, and Figure 28 shows that
application of this technique produces an output image identical to the input.
Figure 33 Complete Test (Basic) - MSER regions
In this instance, the results show that all of the text regions in the image have
been detected. This is not unusual when applying the MSER methodology; however, it
is surprising that no other blob regions have been detected. There are several
possible reasons for this. One may be the lack of contrast between the different
regions of content in the image; another may be the particular lighting conditions
in the room at the time.
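The MSER detection stage whose output appears in Figure 33 can be sketched as follows. The greyscale conversion and the parameter values are assumptions for illustration, not the exact settings used in the project:

```matlab
% Detect MSER regions in the acquired image and overlay them for inspection.
greyImage = rgb2gray(colorImage);                   % MSER operates on intensity values
[mserRegions, mserConnComp] = detectMSERFeatures(greyImage, ...
    'RegionAreaRange', [200 8000], 'ThresholdDelta', 4);
figure; imshow(greyImage); hold on;
plot(mserRegions, 'showPixelList', true, 'showEllipses', false);
title('MSER regions');
hold off;
```

Narrowing `RegionAreaRange` or raising `ThresholdDelta` reduces the number of spurious blob regions detected, at the risk of losing faint characters.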
As a result of the impressive performance of the first stage of image
processing in this example, the subsequent two stages proved redundant, as they
produced results identical to those seen in Figure 33.
In Figure 34, the colour image acquired and transmitted by the Raspberry Pi
can be seen with bounding boxes applied. The substantial overlap present among the
bounding boxes suggests that the expansion coefficient entered is larger than
required. However, this has not had a negative effect on the overall result
obtained.
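The bounding-box expansion referred to here can be sketched as below; `expansionAmount` stands in for the expansion coefficient, and 0.02 is an illustrative value only:

```matlab
% Expand each [x y w h] bounding box by a fraction of its size, then clip
% the result to the image borders so no box extends outside the frame.
expansionAmount = 0.02;
xmin = bboxes(:,1) - expansionAmount * bboxes(:,3);
ymin = bboxes(:,2) - expansionAmount * bboxes(:,4);
xmax = bboxes(:,1) + (1 + expansionAmount) * bboxes(:,3);
ymax = bboxes(:,2) + (1 + expansionAmount) * bboxes(:,4);
xmin = max(xmin, 1);                  ymin = max(ymin, 1);
xmax = min(xmax, size(greyImage,2));  ymax = min(ymax, size(greyImage,1));
expandedBBoxes = [xmin ymin xmax - xmin + 1 ymax - ymin + 1];
```

Too large a coefficient produces the heavy overlap visible in Figure 34; too small a value can leave neighbouring characters in separate, unmerged boxes.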
Figure 34 Complete Test (Basic) - Bounding Boxes
The image in Figure 34 also gives an impression of the light levels in the room at
the moment the image was taken. The apparent dimness may be one of the reasons why
the MSER stage of pre-processing proved so effective, as it lessens the contrast
present in the background of the image.
In Figure 35, the text region presented to the OCR engine is depicted by the
merged bounding box surrounding the text characters.
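Merging overlapping boxes into a single text-line region of this kind can be sketched with an overlap graph. This follows a standard approach rather than the exact project code, and assumes a matrix `expandedBBoxes` of [x y w h] boxes produced by the expansion step:

```matlab
% Group boxes that overlap and merge each group into one enclosing box.
overlapRatio = bboxOverlapRatio(expandedBBoxes, expandedBBoxes);
overlapRatio(1:size(overlapRatio,1)+1:end) = 0;     % ignore self-overlap
g = graph(overlapRatio > 0);                        % connect overlapping boxes
componentIndices = conncomp(g);                     % one label per merged group
xmin = accumarray(componentIndices', expandedBBoxes(:,1), [], @min);
ymin = accumarray(componentIndices', expandedBBoxes(:,2), [], @min);
xmax = accumarray(componentIndices', expandedBBoxes(:,1) + expandedBBoxes(:,3) - 1, [], @max);
ymax = accumarray(componentIndices', expandedBBoxes(:,2) + expandedBBoxes(:,4) - 1, [], @max);
textBBoxes = [xmin ymin xmax - xmin + 1 ymax - ymin + 1];
```

Because the characters of a registration plate sit on one line, this grouping typically collapses all of the character boxes into the single merged region presented to the OCR engine.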
Figure 36 Complete Test (Basic) - Result
The recognised text is then compared with a series of existing registration plate
numbers and a match is found. The final output of the system is the switching on of
the green LED to indicate detection and a positive match. This event proves that
the overall operation, from initial trigger to automated response, is functional.
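The recognition-and-response step described above can be sketched as follows. The plate list, GPIO pin numbers and character-cleaning rule are hypothetical; the Raspberry Pi calls assume the Matlab Support Package for Raspberry Pi Hardware, and `textBBoxes` is assumed to hold the single merged text region:

```matlab
% Recognise the text in the merged region, compare it against known plates
% and drive the indicator LEDs accordingly.
ocrResult = ocr(greyImage, textBBoxes);
plateText = upper(regexprep(ocrResult.Text, '[^0-9A-Za-z]', '')); % strip noise
knownPlates = {'131LH1234', '142DK5678'};           % hypothetical registered plates
mypi = raspi();                                     % connect to the Raspberry Pi
if any(strcmp(plateText, knownPlates))
    writeDigitalPin(mypi, 18, 1);                   % green LED: positive match
else
    writeDigitalPin(mypi, 23, 1);                   % red LED: no match found
end
```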
Figure 38 Complete Test (Complex) - MSER regions
With the addition of extra characters in the foreground image, and improved lighting
increasing contrast in the background image, the MSER stage of the algorithm has
detected all of the desired text regions, the undesired foreground characters and
one additional MSER region in the background. This shows that this iteration
presents a more difficult test of the system than before.
Having applied the technique of removal based on geometric properties, the
image in Figure 4 shows how the method has resulted in the removal of a number of
the non-text regions identified previously.
Some candidate regions remain in other parts of the image. However, an initial
visual impression suggests that these characters do not correlate closely with the
stroke-width of the text regions and should be removed by the next step.
Figure 6 shows the image after stroke-width thresholding has been applied.
Although the bounding boxes may be a little bigger than is strictly necessary, this
has not resulted in any unwanted regions being considered by the OCR function. The
final text region applied to the OCR engine can be seen in Figure 8.
The success of the algorithm in many instances is a vindication of the work
undertaken. These results also provide a solid foundation from which future work
can be developed.
Problems not encountered during lab tests would be uncovered, thus improving
understanding of the actual requirements of such a system in a commercial capacity.
The possibility of extending the research element of the project to encompass
the area of neural networks and machine learning is an interesting one. These are
methods which may provide an avenue for improving the range of inputs a character
recognition algorithm is able to decipher. Neural networks are widely used in image
processing applications and research is ongoing. For example, they have been used to
determine semantically related words in text documents through embedded clustering
and convolutional neural networks (P. Wang et al., 2016).
Machine learning algorithms have been applied in many contexts regarding
image processing such as improving the accuracy and reproducibility of diagnostic
imaging through a computer-assisted approach (Gertych et al., 2015). However, this
approach does not appear to have such a significant presence related specifically to
text recognition applications.
To correctly determine a course for future work in the subject area,
consideration must be given to an eventual outcome, thus informing the direction of
study and experimentation. Refinement of the possible outcomes will assist in
narrowing the range of potential areas of interest.
7 Conclusions
7.1 Introduction
Image processing and character recognition is a computationally intensive and
extremely sensitive undertaking. Approaches are fundamentally algorithmic and
require a structured methodology to produce accurate results. There is significant
debate in the literature as to the efficacy of each of the many widely practised
techniques, and it is perhaps obvious to state that the technique should be
selected on the basis of a pre-defined requirement of the results.
The approach taken for this work has been based on a Connected-Components
methodology, employing MSER and stroke-width analysis as a means of pre-
processing an image for character recognition. Based on the experimental results
obtained and the operation of the overall system, some conclusions can be drawn on
the effectiveness of the technique as applied in this instance.
Furthermore, the system developed can be scrutinised and its weaknesses
uncovered with the intention of improving performance in the future.
The wireless communication element proved one of the more problem-free sections of
the project. It is possible that, had the system been developed further, the
inclusion of a sensor network to trigger image acquisition may have caused greater
difficulty in this area. However, it is unlikely that this would have been a
defining issue.
Control of the Raspberry Pi GPIO pins for the various inputs and outputs of
the system, wirelessly through Matlab, is a fairly novel approach to implementing
this type of system. Although not without its difficulties, this wireless
communication, supplemented by the Graphical User Interface, provided a
user-friendly network for operators.
The requirement of an automated response was one of the initial specifications
of the system design and with the use of red and green LEDs it has been successfully
integrated into the final result. It is important to provide a visual representation of the
success or failure of the character recognition process to enhance the demonstration of
the product. As stated previously, an ideal model may have included an automated
response to initiate the lifting of a barrier or some other appropriate reaction. However,
with the stated primary aim of developing image processing and character
recognition, to proceed with this would have been an unnecessary waste of time.
The performance of the system rests largely on the stages which constitute the
pre-processing algorithm. It is in these primary and secondary stages of the
process, rather than in the OCR engines themselves, that research and development
must focus to improve the performance of character recognition techniques in
future.
References
Abdul Ghani, A. S., & Mat Isa, N. A. (2015). Enhancement of low quality underwater image through integrated
global and local contrast correction. Applied Soft Computing, 37, 332-344.
doi:http://dx.doi.org/10.1016/j.asoc.2015.08.033
Amza, C. G., & Cicic, D. T. (2015). Industrial Image Processing Using Fuzzy-logic. Procedia Engineering, 100,
492-498. doi:http://dx.doi.org/10.1016/j.proeng.2015.01.404
Arm.com. (2016). Arm.com.
Banerjee, S., Mitra, S., & Uma Shankar, B. (2016). Single seed delineation of brain tumor using multi-
thresholding. Information Sciences, 330, 88-103. doi:http://dx.doi.org/10.1016/j.ins.2015.10.018
Belkasim, S. O., Shridhar, M., & Ahmadi, M. (1991). Pattern recognition with moment invariants: A comparative
study and new results. Pattern Recognition, 24(12), 1117-1138. doi:http://dx.doi.org/10.1016/0031-
3203(91)90140-Z
Chen, H., Tsai, S. S., Schroth, G., Chen, D. M., Grzeszczuk, R., & Girod, B. (2011). Robust text detection in
natural images with edge-enhanced maximally stable extremal regions. Paper presented at the Image
Processing (ICIP), 2011 18th IEEE International Conference on.
Choi, S., Yun, J. P., Koo, K., & Kim, S. W. (2012). Localizing slab identification numbers in factory scene
images. Expert Systems with Applications, 39(9), 7621-7636.
doi:http://dx.doi.org/10.1016/j.eswa.2012.01.124
Cugola, G., & Margara, A. (2012). Low latency complex event processing on parallel hardware. Journal of
Parallel and Distributed Computing, 72(2), 205-218. doi:http://dx.doi.org/10.1016/j.jpdc.2011.11.002
de la Rosa-Trevín, J. M., Otón, J., Marabini, R., Zaldívar, A., Vargas, J., Carazo, J. M., & Sorzano, C. O. S.
(2013). Xmipp 3.0: An improved software suite for image processing in electron microscopy. Journal of
Structural Biology, 184(2), 321-328. doi:http://dx.doi.org/10.1016/j.jsb.2013.09.015
Du, J., & Huo, Q. (2013). A discriminative linear regression approach to adaptation of multi-prototype based
classifiers and its applications for Chinese OCR. Pattern Recognition, 46(8), 2313-2322.
doi:http://dx.doi.org/10.1016/j.patcog.2013.01.021
Epshtein, B., Ofek, E., & Wexler, Y. (2010). Detecting text in natural scenes with stroke width transform. Paper
presented at the Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on.
Ferdoush, S., & Li, X. (2014). Wireless Sensor Network System Design Using Raspberry Pi and Arduino for
Environmental Monitoring Applications. Procedia Computer Science, 34, 103-110.
doi:http://dx.doi.org/10.1016/j.procs.2014.07.059
Fernández-Caballero, A., López, M. T., & Castillo, J. C. (2012). Display text segmentation after learning best-
fitted OCR binarization parameters. Expert Systems with Applications, 39(4), 4032-4043.
doi:http://dx.doi.org/10.1016/j.eswa.2011.09.162
Fisher, R., Perkins, S., Walker, A., & Wolfart, E. (2003a). Binary Images.
Fisher, R., Perkins, S., Walker, A., & Wolfart, E. (2003b). Grayscale Images.
Gao, Y., Shan, X., Hu, Z., Wang, D., Li, Y., & Tian, X. Extended Compressed Tracking via Random Projection
Based on MSERs and Online LS-SVM Learning. Pattern Recognition.
doi:http://dx.doi.org/10.1016/j.patcog.2016.02.012
Gertych, A., Ing, N., Ma, Z., Fuchs, T. J., Salman, S., Mohanty, S., . . . Knudsen, B. S. (2015). Machine learning
approaches to analyze histological images of tissues from radical prostatectomies. Computerized
Medical Imaging and Graphics, 46, Part 2, 197-208.
doi:http://dx.doi.org/10.1016/j.compmedimag.2015.08.002
Gonzalez, R. C., Woods, R. E., & Eddins, S. L. Digital Image Processing Using MATLAB: Publishing House of
Electronics Industry.
Gonzalez, R. C., Woods, R. E., & Eddins, S. L. (2010). Digital Image Processing Using MATLAB: Tata McGraw
Hill Education.
Gupta, M. R., Jacobson, N. P., & Garcia, E. K. (2007). OCR binarization and image pre-processing for searching
historical documents. Pattern Recognition, 40(2), 389-397.
doi:http://dx.doi.org/10.1016/j.patcog.2006.04.043
Gupta, R., Bera, J. N., & Mitra, M. (2010). Development of an embedded system and MATLAB-based GUI for
online acquisition and analysis of ECG signal. Measurement, 43(9), 1119-1126.
doi:http://dx.doi.org/10.1016/j.measurement.2010.05.003
Hanselman, D. C., & Littlefield, B. (2001). Mastering MATLAB 6: A Comprehensive Tutorial and Reference:
Pearson Education.
HashemiSejzei, A., & Jamzad, M. Evaluation of Various Digital Image Processing Techniques for Detecting
Critical Crescent Moon and Introducing CMD - a tool for Critical Crescent Moon Detection. Optik -
International Journal for Light and Electron Optics. doi:http://dx.doi.org/10.1016/j.ijleo.2015.09.158
Hu, L.-Y., Guo, G.-D., & Ma, C.-F. (2015). Image processing using Newton-based algorithm of nonnegative
matrix factorization. Applied Mathematics and Computation, 269, 956-964.
doi:http://dx.doi.org/10.1016/j.amc.2015.08.034
Jurjo, D. L. B. R., Magluta, C., Roitman, N., & Batista Gonçalves, P. (2015). Analysis of the structural behavior of
a membrane using digital image processing. Mechanical Systems and Signal Processing, 54-55, 394-
404. doi:http://dx.doi.org/10.1016/j.ymssp.2014.08.010
Kavitha, A. S., Shivakumara, P., Kumar, G. H., & Lu, T. Text segmentation in degraded historical document
images. Egyptian Informatics Journal. doi:http://dx.doi.org/10.1016/j.eij.2015.11.003
Khandual, A., Luximon, A., Rout, N., Grover, T., & Kandi, I. M. (2015). Evaluation of Fibre Migration Angle by
Image Processing Using Economic Usb Camera and Matlab: Demonstrated Example. Materials Today:
Proceedings, 2(45), 2463-2471. doi:http://dx.doi.org/10.1016/j.matpr.2015.07.187
Khare, V., Shivakumara, P., & Raveendran, P. (2015). A new Histogram Oriented Moments descriptor for multi-
oriented moving text detection in video. Expert Systems with Applications, 42(21), 7627-7640.
doi:http://dx.doi.org/10.1016/j.eswa.2015.06.002
Liu, J., Su, H., Yi, Y., & Hu, W. (2016). Robust text detection via multi-degree of sharpening and blurring. Signal
Processing, 124, 259-265. doi:http://dx.doi.org/10.1016/j.sigpro.2015.06.025
Matas, J., Chum, O., Urban, M., & Pajdla, T. (2004). Robust wide-baseline stereo from maximally stable extremal
regions. Image and vision computing, 22(10), 761-767.
Mathworks.com. (2015). Automatically Detect and Recognise Text in Natural Images.
Mathworks.com. (2016a). ocr.
Mathworks.com. (2016b). ocrText Class.
Mathworks.com. (2016c). Regionprops - Measure properties of Image Regions.
Mathworks.com. (2016d). Working with Raspberry Pi camera board.
Misaki, M., Barzigar, N., Zotev, V., Phillips, R., Cheng, S., & Bodurka, J. (2015). Real-time fMRI processing with
physiological noise correction: Comparison with off-line analysis. Journal of Neuroscience Methods,
256, 117-121. doi:http://dx.doi.org/10.1016/j.jneumeth.2015.08.033
Mishra, A., Alahari, K., & Jawahar, C. V. (2016). Enhancing energy minimization framework for scene text
recognition with top-down cues. Computer Vision and Image Understanding, 145, 30-42.
doi:http://dx.doi.org/10.1016/j.cviu.2016.01.002
Mondal, P., Biswal, P. K., & Banerjee, S. FPGA based accelerated 3D affine transform for real-time image
processing applications. Computers & Electrical Engineering.
doi:http://dx.doi.org/10.1016/j.compeleceng.2015.04.017
Moreno-Díaz, R., Pichler, F., & Quesada-Arencibia, A. (2012). Computer Aided Systems Theory - EUROCAST
2011: 13th International Conference, Las Palmas de Gran Canaria, Spain, February 6-11, 2011,
Revised Selected Papers: Springer Berlin Heidelberg.
Nagy, G. (1982). Optical character recognition - Theory and practice. In Handbook of Statistics (Vol. 2,
pp. 621-649): Elsevier.
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Trans. on Systems, Man and
Cybernetics, 9(1), 62-66.
Panchal, T., Patel, H., & Panchal, A. (2016). License Plate Detection Using Harris Corner and Character
Segmentation by Integrated Approach from an Image. Procedia Computer Science, 79, 419-425.
doi:http://dx.doi.org/10.1016/j.procs.2016.03.054
Pereira, F. C., & Pereira, C. E. (2015). Embedded Image Processing Systems for Automatic Recognition of Cracks
using UAVs. IFAC-PapersOnLine, 48(10), 16-21. doi:http://dx.doi.org/10.1016/j.ifacol.2015.08.101
Potocnik, P., & Zadnik, Z. (2016). Handwritten character Recognition: Training a Simple NN for
classification using MATLAB.
Raspberrypi.org. (2016a). Camera Module.
Raspberrypi.org. (2016b). Raspberry Pi.
Raspberrypi.org. (2016c). What are the power requirements?
Rebai, I., & BenAyed, Y. (2015). Text-to-speech synthesis system with Arabic diacritic recognition system.
Computer Speech & Language, 34(1), 43-60. doi:http://dx.doi.org/10.1016/j.csl.2015.04.002
Robbins, J. N. (2007). Learning Web Design: A Beginner's Guide to (X) HTML, StyleSheets, and Web Graphics: "
O'Reilly Media, Inc.".
Roy, S., Shivakumara, P., Roy, P. P., Pal, U., Tan, C. L., & Lu, T. (2015). Bayesian classifier for multi-oriented
video text recognition system. Expert Systems with Applications, 42(13), 5554-5566.
doi:http://dx.doi.org/10.1016/j.eswa.2015.02.030
Smith, S., & Brady, J. M. (1997). SUSAN - A New Approach to Low Level Image Processing. International
Journal of Computer Vision, 23(1), 45-78. doi:10.1023/A:1007963824710
stackexchange.com. (2016). Why images need to be padded before filtering in Frequency Domain.
Steele, V. R., Anderson, N. E., Claus, E. D., Bernat, E. M., Rao, V., Assaf, M., . . . Kiehl, K. A. (2016).
Neuroimaging measures of error-processing: Extracting reliable signals from event-related potentials and
functional magnetic resonance imaging. NeuroImage, 132, 247-260.
doi:http://dx.doi.org/10.1016/j.neuroimage.2016.02.046
Tomar, V. S., & Bhatia, V. (2015). Low Cost and Power Software Defined Radio Using Raspberry Pi for Disaster
Effected Regions. Procedia Computer Science, 58, 401-407.
doi:http://dx.doi.org/10.1016/j.procs.2015.08.047
Tseng, L. Y., & Chen, R. C. (1998). Segmenting handwritten Chinese characters based on heuristic merging of
stroke bounding boxes and dynamic programming1. Pattern Recognition Letters, 19(10), 963-973.
doi:http://dx.doi.org/10.1016/S0167-8655(98)00073-7
Vujović, V., & Maksimović, M. (2015). Raspberry Pi as a Sensor Web node for home automation. Computers &
Electrical Engineering, 44, 153-171. doi:http://dx.doi.org/10.1016/j.compeleceng.2015.01.019
Wang, P., Xu, B., Xu, J., Tian, G., Liu, C.-L., & Hao, H. (2016). Semantic expansion using word embedding
clustering and convolutional neural network for improving short text classification. Neurocomputing,
174, Part B, 806-814. doi:http://dx.doi.org/10.1016/j.neucom.2015.09.096
Wang, Y., Ban, X., Chen, J., Hu, B., & Yang, X. (2015). License plate recognition based on SIFT feature. Optik -
International Journal for Light and Electron Optics, 126(21), 2895-2901.
doi:http://dx.doi.org/10.1016/j.ijleo.2015.07.040
Wang, Y., Hu, Y.-j., Fan, J.-c., Zhang, Y.-f., & Zhang, Q.-j. (2012). Collision Detection Based on Bounding Box
for NC Machining Simulation. Physics Procedia, 24, Part A, 247-252.
doi:http://dx.doi.org/10.1016/j.phpro.2012.02.037
Willamette.edu. (2016). Image File Formats.
Yi, C., & Tian, Y. (2011). Text String Detection from Natural Scenes by Structure-based Partition and Grouping.
IEEE Transactions on Image Processing, 20(9), 2594-2605. doi:10.1109/TIP.2011.2126586
Zhang, H., Zhao, K., Song, Y.-Z., & Guo, J. (2013). Text extraction from natural scene image: A survey.
Neurocomputing, 122, 310-323. doi:http://dx.doi.org/10.1016/j.neucom.2013.05.037
Zhao, Z., Fang, C., Lin, Z., & Wu, Y. (2015). A robust hybrid method for text detection in natural scenes by
learning-based partial differential equations. Neurocomputing, 168, 23-34.
doi:http://dx.doi.org/10.1016/j.neucom.2015.06.019
Zhu, A., Wang, G., & Dong, Y. (2015). Detecting natural scenes text via auto image partition, two-stage grouping
and two-layer classification. Pattern Recognition Letters, 67, Part 2, 153-162.
doi:http://dx.doi.org/10.1016/j.patrec.2015.06.009
Öztürk, F., & Özen, F. (2012). A New License Plate Recognition System Based on Probabilistic Neural Networks.
Procedia Technology, 1, 124-128. doi:http://dx.doi.org/10.1016/j.protcy.2012.02.024
Appendix A