
A Dissertation Report on
"Accurate Text Detection in Natural Scene Images using MSER Approach"
Is approved for the Degree of Master of Engineering in Computer Science and Engineering
by
Shital Digamber Gheware
Under the Guidance of

Prof. N. G. Dharashive
Dept. of Computer Science and Engineering

M. S. Bidve Engineering College, Latur
Maharashtra State, India
2017-2018
Dissertation Approval Sheet
Dissertation Entitled
“Accurate Text Detection in Natural Scene Images using MSER Approach”

by
Is approved for the Degree of Master of Engineering in Computer Science and Engineering
of
Swami Ramanand Teerth Marathwada University, Nanded
2017-2018
External Examiner Internal Examiner

Name: Name:
Date: Date:
M. S. Bidve Engineering College, Latur.
Department of Computer Science and Engineering
Certificate
This is to certify that the dissertation entitled “Accurate Text Detection in Natural Scene Images using
MSER Approach” is a bona fide record of the dissertation work carried out by Shital Digamber Gheware under
my supervision and guidance for the award of the degree of "Master of Computer Science & Engineering"
as per the requirements of M. S. Bidve Engineering College, Latur, (M.S.), India. In my opinion, the
work presented in this dissertation report is of the standard required for the award of the degree.
The candidate has submitted a satisfactory report as per the curriculum of Swami Ramanand Teerth
Marathwada University, Nanded, during the academic year 2017-2018.
PROF. N. G. DHARASHIVE PROF. N. J. PATHAN

PROJECT GUIDE ME CO-ORDINATOR
PROF. S. R. TANDLE PROF. N.B. KHATOD

H.O.D. PRINCIPAL
DECLARATION
I hereby declare that I have formed, completed and written the report entitled “Accurate
Text Detection in Natural Scene Images using MSER Approach”. It has not
previously been submitted as the basis for the award of any degree, diploma or other similar
title to any other examining body or university.
Place: Latur
Date:
Shital D. Gheware
M.E. II Year
Dept. of Computer Science and
Engineering
ACKNOWLEDGEMENT
With a profound sense of regard and gratitude, I thank Prof. N.G. Dharashive for his
valuable guidance, incessant interest and constructive suggestions during the course of this
dissertation. This dissertation would not have been possible without his zeal and interest
throughout the task, right from the beginning. I thank him for his valuable and immense
knowledge and timely help, which made this project a reality.
I also take this opportunity to thank Prof. S.R. Tandle, Head of the Computer Science
and Engineering Department, for his kind permission to undertake this
dissertation work and his keen encouragement during the course of this dissertation.
I also take this opportunity to thank Prof. N.J. Pathan, ME Coordinator, for his
kind permission to undertake this dissertation work and his keen encouragement
during the course of this dissertation.
I am grateful to record my sincere thanks to all staff members of the Computer
Science and Engineering Department for the support, help and assistance they
extended as and when required.
I am deeply thankful to our Principal, Prof. N.B. Khatod, and the college for providing
us with a platform to excel in life.
Last but by no means least, I am extremely grateful to my batch mates for their
consistent help.
ME(CSE)
Abstract
A text detection method discovers the presence of text in images, videos, etc. This technique
is essential for many applications based on content-based image analysis, such as web
image search, map analysis and video information retrieval. Detecting text in natural scene
images is very challenging because of complex backgrounds and noise. The
objective is to design a text detection system that can detect the maximum number of
characters in natural scene images.
In this text detection system, the input natural scene image is first pre-processed: the
input color image is converted into a gray scale image. In the next stage, character
candidates are extracted from the gray scale image using the MSER (Maximally Stable
Extremal Regions) region detector. MSER extracts various features from the input image and
divides it into a number of different regions. The stroke width transform is then applied,
which computes, for each pixel, the width of the most likely stroke containing that pixel.
A morphological filter is then applied to remove noise and unwanted regions detected by MSER.
After the morphological operation, heuristic rules are applied to these regions to remove
non-text candidates. The probability of text is estimated by calculating features such as
the height, width and area of contours. Finally, text candidates corresponding to true text
are constructed and displayed using rectangles, and the text is shown on the command prompt.
INDEX
Acknowledgement I
Abstract II
List of Figures VI
List of Tables VII

CHAPTER 1 INTRODUCTION 1
1.1 Text Detection 2

1.2 Literature Survey 2
1.2.1 Image types 2
1.2.2 Existing methods for scene text detection 4
1.2.3 Factors that should be considered during text detection 6
1.2.4 Different methods for Text Detection 7
1.2.5 Comparative Study 9
1.2.6 Applications 12
CHAPTER 2 BACKGROUND AND RELATED WORK 14

2.1 Overview 15
2.2 Various Methods For text detection 15
2.2.1 MSER ++ 15
2.2.2 Two stage algorithm for ER Pruning 18

2.2.3 Edge-Enhanced MSERS 18
2.2.4 Graph cut model with MSER 19
2.2.5 Snooper Text 22
2.2.6 Skeleton Matching based approach 22
2.2.7 Text detection for Multi-Orientation Scene Images Using Adaptive Clustering 23
2.2.8 Natural scene text detection with multi-layer segmentation and higher order conditional random field based analysis 25
2.2.9 Text detection approach based on confidence map and context information 25
CHAPTER 3 PROPOSED SYSTEM OF TEXT DETECTION IN NATURAL SCENE IMAGES USING MSER TECHNIQUE 27
3.1 Problem Definition 28
3.2 Problem Description 28

3.3 Scope of the work 28
3.4 Objectives 28
3.5 Technology and associated Platform 29
3.6 Mathematical Model 29
3.6.1 Complexity Analysis 30
3.7 UML DIAGRAMS 30
3.7.1 Use case diagram 30
3.7.2 Level 0 data flow diagram 31
3.7.3 Level 1 data flow diagram 31
3.7.4 Sequence diagram 32
3.8 Proposed System 32
3.8.1 Character Candidate Extraction (MSER) 33
3.8.2 Non-text elimination 35
3.8.3 Text construction 38
3.8.4 Display Text 38
3.8.5 Output 39
3.9 Selection Tools 40
3.9.1 General 40
3.9.2 Functions having definition 48
CHAPTER 4 PERFORMANCE ANALYSIS 53
4.1 Datasets 54
4.1.1 Sample Dataset 54
4.1.2 ICDAR 2011 dataset 54
4.2 Snapshot of the text detection System 56
4.3 Accuracy of the system 68
4.4 Comparative result of Different methods 70
4.5 Chapter Summary 71
CHAPTER 5 CONCLUSION AND FUTURE WORK 72

5.1 Conclusion 73
5.2 Future Enhancement 73
References 74
PAPER PUBLISHED 78
1. Survey on Text Detection, Segmentation and Recognition from a Natural Scene Images.
2. Accurate Text Detection in Natural Scene Images using MSER Approach
List of Figures
Fig No Figure Name Page No
1.1 Document text image 3
1.2 Scene text image 3
1.3 Sample born-digital image 4
2.1 MSER lattice induced by the inclusion relation. Only certain nodes correspond to characters 17
2.2 System Flowcharts 18
2.3 The flowchart of the algorithm and the illustration of MSER labeling via graph cut. (a) The flowchart. (b) MSERs in the original image. (c) The construction of the graph 21
2.4 Flowchart of the proposed system 23
3.1 Mathematical model for text detection system 29
3.2 Use Case Diagram for Text Detection System 30
3.3 Level 0 data flow diagram for Text Detection System 31
3.4 Level 1 data flow diagram for Text Detection System 31
3.5 Sequence diagram for Text Detection System 32
3.6 System architecture for text detection system 32
3.7 The process of MSER finding: (a) MSERs according to variations; (b) MSERs tree after linear reduction; (c) character candidates after tree accumulation 33
3.8 Implementation of SWT 37
3.9 Filing pixels with SWT values 37
3.10 The output of text detection system: a) Natural scene image b) Gray image c) MSER regions of input image d) Character candidates’ extractions e) Stroke width transform of input image f) Joining individual characters g) Text extractions h) Output text on command window 39
4.1 Sample Dataset images 54
4.2 ICDAR 2011 dataset images 55
4.3 GUI of text detection system 56
4.4 Scroll bar adjustment for resizing 57
4.5 When we press get image button the open window is going to open 57
4.6 Selected image is going to open in first axis 58
4.7 The result of MSER button press is displayed in second axis 58
4.8 Shows the result of Character data button press 59
4.9 The result of text Data button press 59
4.10 The text will be displayed in command window 60
4.11 The result of reset button press is displayed 60
4.12 The rotate button press for rotated image 61
4.13 The result of rotated image 61
4.14 The image having vertical text 62
4.15 The result of image having vertical text 62
4.16 The image having different font size of each character 63
4.17 The result of image having different font size of each character 63
4.18 The text detection of image having noise 64
4.19 The result of image having noise 64
4.20 The text detection of image having only numbers 65
4.21 The result of Image having only numbers 65
4.22 Text detection of blur image 66
4.23 detected Text of blur image 66
4.24 Text detection system apply on street images 67
4.25 Detected text of street view image 67
4.26 Comparison of MSER with other methods by Recall, Precision and F measure 70
List of Tables
Table No Table Name Page No
2.1 Comparative analysis of literature survey 10
4.1 Result analysis using relevant and retrieved values of sample dataset and ICDAR 2011 69
4.2 Result analysis of Own Dataset and ICDAR 2011 70
ROBUST TEXT DETECTION IN NATURAL SCENE IMAGES USING MSER APPROACH
CHAPTER 1
INTRODUCTION
This chapter introduces text detection in natural scene images using MSER.
MSBECL Page 1
1.1 TEXT DETECTION

Text detection from images is one of the important tasks in image processing and
computer vision. An image can contain useful information, and this information is
important for fully understanding the image. Text detection is useful in many
content-based image and video applications, such as content-based web image
search, video information retrieval, and mobile-based text analysis and recognition.
In recent years, the image, as the visual basis for perceiving the world, has become a key
medium for information acquisition, expression and transmission. The text in natural
scene images has to be robustly detected before it can be recognized and retrieved. Many
problems need to be solved in order to read text in natural images, including text
localization, character and word segmentation, recognition, and integration of language
models and context [8]. Text detection and localization in natural scene images is
important for content-based image analysis. This problem is challenging because of
complex backgrounds and variations in font, size, color and orientation [2]. If
text can be automatically detected, extracted and recognized by computers, then
more reliable content-based access to image data can be achieved. Therefore, how
to locate and extract textual information quickly and accurately from videos and
images has become an active research topic.
1.2 LITERATURE SURVEY
Many methods for text detection in natural scene images have been proposed over the
past years. This section briefly reviews some of these methods and presents a
comparative study of them.
1.2.1 Image types
In general, images are divided into three different types: document images, scene
images and born-digital images [1][20]. In this dissertation work, only natural scene
images are considered for the detection of text.
A. Document images
A document image is the image format of a document. It is created by a scanner or camera,
which transforms a paper-based document into an image for electronic reading. In the early
stage of text extraction, the focus was only on document images. Optical Character
Recognition (OCR) is one of the text extraction techniques that deal with document
images. In general, document images have good quality and a very clean background,
so existing OCR software can process them effectively.
Figure 1.1. Document text image
B. Scene images
Scene images contain text, such as advertising boards and banners, that is
captured by cameras; scene text therefore appears together with the background of the
scene. Text in these images is very challenging to detect and recognize because
the backgrounds are complex and the text appears in different sizes, styles and
alignments. Scene text is also affected by lighting conditions and perspective
distortions.
Figure 1.2. Scene text image
C. Born-digital images
Born-digital images are generated by computer software and saved as digital
images. Compared with document images and scene images, born-digital images have more
defects, such as more complex foreground/background, low resolution,
compression loss and severe edge softness. Therefore, during text extraction from
born-digital images, it is difficult to distinguish the text from the background.
Figure 1.3: Sample born-digital image
D. Heterogeneous images
This type of image combines the image types described above; for example, a
digital image may contain both scene text and document text.
1.2.2 Existing methods for scene text detection

Scene text detection methods can roughly be categorized into three groups [1] :
A. Sliding window based methods

Sliding window based methods, also known as region-based methods, use a sliding
window to search for possible texts in the image and then use machine learning
techniques to identify text. Region based methods attempt to detect and localize text
regions by texture analysis [2][11].
Advantage: This approach has been extremely successful in face and pedestrian
detection [16].
Disadvantages:
1. These methods are slow, as the image has to be processed at multiple scales [1][2][19].
2. Performance is sensitive to text alignment and orientation [2].
3. They require a large training set of text and non-text samples to train the
classifier [19].
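To make the cost of multi-scale scanning concrete, the following minimal sketch counts the window positions a sliding window detector would have to classify. The window size, stride and scale set are illustrative assumptions, not values taken from the cited methods.

```python
def sliding_windows(width, height, win=32, stride=8, scales=(1.0, 0.75, 0.5)):
    """Yield (scale, x, y) for every window position at every image scale.

    win, stride and scales are assumed example values.
    """
    for scale in scales:
        scaled_w, scaled_h = int(width * scale), int(height * scale)
        for y in range(0, scaled_h - win + 1, stride):
            for x in range(0, scaled_w - win + 1, stride):
                yield scale, x, y


def count_windows(width, height, **kwargs):
    """Number of windows a classifier would have to evaluate."""
    return sum(1 for _ in sliding_windows(width, height, **kwargs))


if __name__ == "__main__":
    print(count_windows(640, 480))
```

Every yielded position needs one classifier evaluation, so adding scales or reducing the stride increases the running time directly.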
B. Connected component based methods

Connected component based methods directly segment candidate text components by
edge detection or color clustering [2]. These methods extract character candidates from
images by connected component analysis and then group the character candidates
into text; text can be treated as a union of connected components (CCs) [9]. Additional
checks may be performed to remove false positives [13].
Advantages:
1. They achieve good results.
2. The ICDAR text locating contest and the ImagEval evaluation campaign ranked a
connected component approach in first place [15].
3. Further evidence of the effectiveness of connected components is their recent use for
text detection in video [15].
Disadvantages:
1. They fail on some natural scene images that have very low-contrast text and
strong illumination [1].
2. False positives are difficult to eliminate [19].
C. Hybrid methods
To overcome the problems of sliding window and connected component methods, hybrid
methods were introduced, which combine the advantages of both for better text
detection results [2].
Advantages:
1) Region-based information is very helpful for text component segmentation
and analysis.
2) Text components are differentiated from non-text components better
than with the other methods.
3) A learning-based energy minimization method can group text components into
text lines (words) robustly.
Disadvantage: Text segmentation is complex.
1.2.3 Factors that should be considered while detecting text from Scene Images
a) Font style, size (height, width) and thickness (stroke width)[14];
b) Co-ordinates (X, Y) or position of text in image;
c) Background as well as foreground colour and texture;
d) Camera position which can introduce geometric distortions or orientation;
e) Alignment;
f) Symbols, integers and non-text contents;
g) Illumination;
h) Language;
i) Resolution;
j) Contrast;
k) Blur and noise.

These factors relate to the textual information appearing in images, which can be
divided into two groups:
1. Text that appears in the image but does not represent any important content related to
the image is referred to as scene text.
2. Text that is produced separately from the image and is a good key to understanding the
image is called artificial text.
In contrast to scene text, artificial text is not only an important source of information
but also a significant entity for indexing and retrieval purposes. It is therefore a very
challenging task to detect, segment, recognize and retrieve text from an image
accurately and robustly.
1.2.4 Different methods for text detection

Many methods for text detection from scene images have been proposed by various authors
over the past years. This section gives a brief review of methods based on
connected components [2][9], edges [2][15], colors [2], combinations of edges and
colors [2], textures [2], corners [2], semiautomatic ground truth generation [7],
strokes [1], etc.
1) Connected Components based method:

The method consists of two steps. The first step is to extract connected components (CCs)
from the image using a specific method. The second step is to estimate whether each CC is
a text CC or not, based on CC features and CC relative features.
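The two steps can be sketched as follows. This is a minimal illustration on a small binary grid, assuming 4-connectivity and two hypothetical feature thresholds (`min_area`, `max_aspect`); the surveyed methods use richer features and are not reproduced here.

```python
from collections import deque


def connected_components(grid):
    """Step 1: label 4-connected foreground (1) pixels; return a list of pixel lists."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def looks_like_text(pixels, min_area=3, max_aspect=5.0):
    """Step 2: keep components whose area and bounding-box aspect ratio are plausible.

    min_area and max_aspect are assumed example thresholds.
    """
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    aspect = max(width, height) / min(width, height)
    return len(pixels) >= min_area and aspect <= max_aspect


if __name__ == "__main__":
    grid = [
        [1, 1, 0, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
    ]
    candidates = [cc for cc in connected_components(grid) if looks_like_text(cc)]
    print(len(candidates))
```

In the example grid, the single isolated pixel is rejected by the area rule and the long horizontal bar by the aspect-ratio rule.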
2) Sliding window based method:

Sliding window based methods, also known as region-based methods, use a sliding
window to search for possible texts in the image and then use machine learning
techniques to identify text. These methods are slow as the image has to be processed
in multiple scales.
3) Hybrid method:
The hybrid method presented by Pan et al. exploits a region detector to detect text
candidates and extracts connected components as character candidates by local
binarization; non-characters are eliminated with a Conditional Random Fields model,
and characters can finally be grouped into text [1].
4) Edge based method:

This method is based on character edges; the edge is a reliable feature of
text regardless of color/intensity, layout, orientation, etc. As a text region has
high contrast with its background, the edges of characters can be easily detected. There
are two steps in this method: first, an edge extraction algorithm (such as the Canny
edge detector) is used to obtain the edges; second, a smoothing algorithm or
morphology is used to connect the edges and obtain a full character boundary.
The main disadvantage of this method is that small image regions and strokes may be
misidentified. Therefore, the results of this method need to be verified using other methods.
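The two steps can be illustrated with a small, dependency-free sketch. Note the substitution: a plain Sobel gradient threshold stands in for the Canny detector named above, and a 3x3 dilation stands in for the morphological step; the threshold value is an assumption.

```python
def sobel_edges(img, threshold):
    """Step 1 (simplified): mark pixels whose Sobel gradient magnitude is large.

    img is a list of rows of gray values; border pixels are left as 0.
    """
    rows, cols = len(img), len(img[0])
    edges = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


def dilate(mask):
    """Step 2 (simplified): 3x3 dilation, which closes small gaps between edge pixels."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if any(mask[ny][nx]
                   for ny in range(max(0, y - 1), min(rows, y + 2))
                   for nx in range(max(0, x - 1), min(cols, x + 2))):
                out[y][x] = 1
    return out


if __name__ == "__main__":
    # A dark region next to a bright region: the edge lies between columns 2 and 3.
    image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
    for row in dilate(sobel_edges(image, threshold=500)):
        print(row)
```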
5) Color based method:

In this method, color clustering is done by grouping pixels with the same or
similar colors to form candidate regions. The candidate regions are then
analyzed and the CCs are estimated. The main challenge of this method is the degree of
clustering. If the data is over-clustered, the background and text regions may be mixed
together. If the data is under-clustered, the number of clusters increases
and system performance degrades.
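The sensitivity to the degree of clustering can be seen in a toy example. The sketch below uses a simple greedy, distance-threshold clustering of RGB colors, chosen only for brevity; the surveyed methods may use other clustering algorithms, and the `tolerance` values are assumptions.

```python
def cluster_colors(pixels, tolerance):
    """Greedy clustering: a color joins the first cluster whose center is within tolerance.

    Returns (centers, labels). The first color seen in a cluster serves as its center.
    """
    centers, labels = [], []
    for color in pixels:
        for idx, center in enumerate(centers):
            if sum((a - b) ** 2 for a, b in zip(color, center)) <= tolerance ** 2:
                labels.append(idx)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            centers.append(color)
            labels.append(len(centers) - 1)
    return centers, labels


if __name__ == "__main__":
    # Two near-white background pixels, two near-black text pixels, one red pixel.
    pixels = [(250, 250, 250), (245, 248, 252), (10, 10, 10), (12, 8, 11), (200, 30, 30)]
    for tolerance in (1, 20, 500):
        centers, labels = cluster_colors(pixels, tolerance)
        print(tolerance, len(centers), labels)
```

With a very large tolerance, text and background collapse into one cluster; with a very small one, nearly every pixel becomes its own cluster.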
6) Combination of edges and colors:

Some methods combine the edge based method (4) and the color based method (5), detecting
both the edges and the colors of the text. Combining both features achieves better results
than using either feature separately.
7) Texture based method:

This method treats text regions as a special texture. A candidate region is identified as
a text region or not according to the relevant texture features extracted from it.
To overcome the disadvantages of using a single approach, a hybrid approach has been
presented, which takes advantage of both texture-based and CC-based methods to robustly
detect and localize text in natural scene images. In this method, a text region detector
based on texture is designed. It is used to estimate the probabilities
of the position and scale of the text, and each candidate is then analyzed to decide
whether it is a text region or not.
8) Corner based method:

This approach is inspired by the observation that the characters in text usually
contain multiple corner points. The method describes the text regions formed by
the corner points using several discriminative features. Research on
corner-based methods is still at an early stage. Compared with the texture based method,
this method is faster but its performance is less satisfactory.
9) Stroke based method:

As basic elements of text strings, strokes provide robust features for text detection in
natural scene images. Text can be modeled as a combination of stroke components
with a variety of orientations, and features of text can be extracted from combinations
and distributions of the stroke components. One feature that separates text from other
elements of a scene is its nearly constant stroke width. This can be
used to recover regions that are likely to contain text. In stroke-based methods,
text stroke candidates are extracted by segmentation, verified by feature extraction
and classification, and grouped together by clustering. These methods are easy to
implement for specific applications because they are intuitive and simple. However,
complex backgrounds make text strokes hard to segment and verify.
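The "nearly constant stroke width" cue can be illustrated with a deliberately simplified measure: the lengths of horizontal foreground runs in a binary mask, summarized by their coefficient of variation. This is an assumption-laden stand-in for illustration only, not the stroke width transform itself.

```python
from statistics import mean, pstdev


def horizontal_stroke_widths(mask):
    """Return the length of every horizontal run of foreground (1) pixels."""
    widths = []
    for row in mask:
        run = 0
        for value in list(row) + [0]:  # the trailing 0 closes a run at the row end
            if value:
                run += 1
            elif run:
                widths.append(run)
                run = 0
    return widths


def stroke_width_variation(mask):
    """Coefficient of variation of run lengths; assumes the mask has foreground pixels."""
    widths = horizontal_stroke_widths(mask)
    return pstdev(widths) / mean(widths)


if __name__ == "__main__":
    letter_i = [[0, 1, 1, 0]] * 4                           # constant 2-pixel stroke
    blob = [[1, 0, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]  # widening shape
    print(stroke_width_variation(letter_i), stroke_width_variation(blob))
```

A low variation suggests a character-like component, while a high variation suggests an irregular, non-text shape.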
10) Semiautomatic Ground Truth Generation method:

The semiautomatic ground truth generation system for text detection and recognition
handles text with different orientations and languages. In this method, the system
allows the user to manually correct the ground truth if the automatic method produces
incorrect results. To evaluate performance, the method uses eleven attributes at the word
level, namely: line index, word index, coordinate values of the bounding box, area,
content, script type, orientation information, type of text (caption/scene), condition
of text (distortion/distortion free), start frame, and end frame.
1.2.5 Comparative Study
Comparison of MSER with other region detectors:

In Mikolajczyk et al., six region detectors are studied (Harris-affine, Hessian-affine,
MSER, edge-based regions, intensity extrema, and salient regions).
1. Region density - MSER detects about 2600 regions for a textured blur scene
and 230 for a light-changed scene.
2. Region size - MSER tends to detect many small regions, as opposed to large regions,
which may not cover a planar part of the scene.
3. Viewpoint change - MSER outperforms the five other region detectors on both
the original images and those with repeated texture motifs.
4. Scale change - MSER comes second, after the Hessian-affine detector,
under scale change and in-plane rotation.
5. Blur - MSER proved to be the most sensitive to this type of image change,
which is the only area in which this type of detector is lacking.
6. Light change - MSER showed the highest repeatability score for this type of
scene, with all the other detectors having good robustness as well.
Table below shows the comparative analysis of different methods by their accuracy
and datasets used.
Table 2.1: Comparative analysis of literature survey
Author's Name | Year | Precision | Recall | f-measure | Methodology | Dataset Used
Lukas Neumann et al. [1] | 2011 | 68.9 | 52.5 | 59.6 | MSER ++ | ICDAR 2011
Lukas Neumann et al. [18] | 2012 | 73.1 | 64.7 | 68.7 | 2 stage algorithm for ERs pruning | ICDAR 2011, Street View Text Dataset
Cunzhao Shi et al. [19] | 2013 | 83.3 | 63.1 | 71.8 | Graph cut model with MSER | ICDAR 2011
Xu-Cheng Yin et al. [1] | 2013 | 86.29 | 68.26 | 76.22 | MSER as character candidate | ICDAR 2011, Multilingual DB, Street view DB, Multi-orientation DB
Rodrigo Minetto et al. [24] | 2014 | 0.74 | 0.63 | 0.68 | Use of snooper text, toggle-mapping image segmentation, HOG-based descriptor | ITW, SVT, EPS, ICD DB
B. H. Shekar et al. [21] | 2015 | 0.84 | 0.79 | 0.82 | Obtaining skeleton using morphology | ICDAR 2003, ICDAR 2011
Xu-Cheng Yin et al. [22] | 2015 | 0.81 | 0.63 | 0.71 | Hierarchical clustering with a unified distance metric learning framework | USTB-SV1K DB, TD500 MSRA-DB, ICDAR 2011, ICDAR 2013
Xiaobing Wang et al. [23] | 2015 | 0.81 | 0.68 | 0.74 | Multi-layer segmentation, higher order conditional random field (CRF), graph cuts | ICDAR 2003, ICDAR 2011, ICDAR 2013
Runmin Wang et al. [25] | 2015 | 0.77 | 0.60 | 0.68 | Confidence map and context information | ICDAR 2005, ICDAR 2011, ICDAR 2013
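The f-measure column is the harmonic mean of precision and recall, f = 2PR / (P + R). The short sketch below applies this standard formula to two rows of Table 2.1; the function name is an illustrative choice.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Rows reported in percent in Table 2.1.
    print(round(f_measure(68.9, 52.5), 1))    # MSER ++ row
    print(round(f_measure(86.29, 68.26), 2))  # MSER as character candidate row
```

Because the harmonic mean is dominated by the smaller of the two values, a method with high precision but low recall still receives a modest f-measure.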
1.2.6 Applications [3]
Text detection, segmentation and extraction from complex images can be applied to a
variety of fields where information needs to be analyzed and understood. Some of
these applications are given below:
1) Image understanding: When images can be automatically understood and
indexed by computer, the efficiency of running digital libraries and video
database systems will be greatly improved [1].
2) Content-based image filtering: In content-based filtering, image spam can be
detected, and pornographic, reactionary and fraudulent words can be easily filtered
[1].
3) Super map: Text extraction technology can be applied to detect scene text in
images taken with laptops, phones and other equipment, for use in
maps, navigation, automatic translation, foreign-related tour guides, walking
robots and intelligent monitoring systems [1], and also to assist visually impaired
people [9].
4) Vehicle testing: Vehicle licenses and scene subtitles have many features in
common, so text extraction can be used to supervise traffic in real time.
After text extraction from highway video, the traffic situation can be
overseen and vehicle licenses can be recognized easily in traffic accidents,
which can improve the efficiency of transportation systems [1].
5) Optical character reading: Reads text from paper and translates the images into a
form that a computer can manipulate (for example, into ASCII codes). An OCR
system makes it possible to take a book, feed it directly into an electronic computer
file, and then edit the file using a word processor.
6) Automatic localization of postal addresses on envelopes and automatic
geocoding: Postal automation tries to get the mail from the sender to the recipient
quickly, in a reliable and economical process.
7) Text extraction in video sequences: Caption text or superimposed text provides
valuable information about the contents of images and video sequences.
8) Wearable applications: Wearable devices such as goggles, phones and cameras are
created to detect text elements, which can be converted into voice for blind
people [12].
9) Online electric goods search: Online shopping applications on mobile phones
allow customers to type the name of goods and get the required information about
them, with images and descriptions [12].
CHAPTER 2
BACKGROUND AND RELATED WORK
In this chapter, we review some of the existing work in the field of text detection in
natural scene images, both with and without the MSER technique.
2.1 Overview
MSER-based methods have demonstrated very promising performance in many real
projects. However, current MSER-based methods still have some key limitations:
they may suffer from the detection of repeating components and from insufficient text
candidate construction algorithms. In this section, we review MSER-based
methods, focusing on these two problems: 1) the MSER pruning problem and 2) the text
candidate construction problem. The main advantage of MSER-based methods over traditional
connected component based methods lies in the use of the MSER algorithm for
character extraction. The MSER algorithm is able to detect most characters even
when the image is of low quality (low resolution, strong noise, low contrast, etc.).
However, one severe but not so obvious pitfall of the MSER algorithm is that most of
the detected MSERs in fact repeat one another. Repeating MSERs are
problematic for the later character candidate grouping algorithm; thus most of the
repeating MSERs, apart from those that most likely correspond to characters,
need to be removed before being fed to the character grouping algorithm.
2.2 Various Methods For text detection

This section describes how text detection is performed by methods
that use the MSER approach and by methods that do not.
2.2.1 MSER ++ [21]

An efficient method for text localization and recognition in real-world images is
proposed. Thanks to effective pruning, it is able to exhaustively search the space of all
character sequences in real time (200 ms on a 640 × 480 image). The method exploits
higher-order properties of text such as word text lines. The authors demonstrate that the
grouping stage plays a key role in text localization performance and that a robust
and precise grouping stage is able to compensate for errors of the character detector. The
method includes a novel selector of Maximally Stable Extremal Regions (MSER)
which exploits region topology. Experimental validation shows that 95.7% of characters
in the ICDAR dataset are detected using the novel selector of MSERs with a low
sensitivity threshold. The proposed method was evaluated on the standard ICDAR
2003 dataset, where it achieved state-of-the-art results in both text localization and
recognition.
A. Character grouping search space
Let $I$ denote an image of $n$ pixels and let $P(I)$ denote the set of all sub-regions of the
image $I$. Let $s^L$ denote an arbitrary sequence of non-repeating image sub-regions of
length $L$, as in equation 2.1:
$s^L = (r_1, r_2, \ldots, r_L), \quad r_i \in P(I), \quad r_i \neq r_j \ \ \forall\, i \neq j$    (2.1)
Let $S^L$ denote the set of all sequences of length $L$ and let $S$ denote the set of all
sequences of lengths up to $n$, as in equation 2.2:
$S = \bigcup_{L=1}^{n} S^L$    (2.2)
Given a verification function $v : S \to \{0, 1\}$, the set of estimates (words) $E$ is the
set of sequences accepted by $v$, $E = \{ s \in S : v(s) = 1 \}$. Methods for text
localization aim to find an optimal verification function $v^*(s)$ such
that the f-measure of precision and recall is maximized, where $T$ denotes the set of words
in the ground truth.
B. Extended Maximally Stable Extremal Regions
The authors extend the MSER approach by using the whole tree of the MSER lattice induced by
the inclusion relation, in contrast to [6], where only root nodes (i.e. supremums of the
MSER lattice) were considered. This implied that a high MSER margin had to be
used to maximize the number of root nodes that correspond to letters. If a lower
margin is used, the MSER detector finds more regions, but only certain regions
correspond to characters. As shown in Figure 2.1, smaller MSERs are embedded
into bigger ones, thus forming a tree in which only certain combinations of nodes
can be selected as letters, because in a word one letter cannot be embedded into
another. Individual nodes of the MSER tree are referred to as MSER++ to emphasize
that multiple projections (gray, red, green and blue channels) are used and the
internal tree structure is taken into account.
Figure 2.1 MSER lattice induced by the inclusion relation. Only certain nodes
correspond to characters
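The constraint that one letter cannot be embedded into another can be expressed as a simple check on the region tree. The sketch below assumes the tree is given as a hypothetical `parent` mapping (child name to parent name, root mapped to `None`); it only validates a selection and is not the selector proposed in [21].

```python
def is_valid_selection(selected, parent):
    """Return True if no selected node is an ancestor of another selected node."""
    chosen = set(selected)
    for node in chosen:
        ancestor = parent.get(node)
        while ancestor is not None:
            if ancestor in chosen:
                return False  # a selected region is nested inside another selected one
            ancestor = parent.get(ancestor)
    return True


if __name__ == "__main__":
    # Hypothetical tree: a word region containing two letters, one with an inner hole.
    parent = {
        "root": None,
        "word": "root",
        "o_outer": "word",
        "o_inner": "o_outer",
        "k": "word",
    }
    print(is_valid_selection(["o_outer", "k"], parent))
    print(is_valid_selection(["o_outer", "o_inner"], parent))
```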
C. Exhaustive search
Let $M$ denote the set of MSER++ in the image $I$. Even though the cardinality of $M$ is
linear in the number of pixels, the cardinality of the set $S$ of all sequences is still
exponential.
Let $\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_n$ denote “upper-bound” verification functions
which determine whether $s^L$ is a subsequence of a text sequence or a text sequence itself:
$\hat{E} = \bigcup_{L=1}^{n} \{\, s^L : \forall s' \supset s^L,\ s' \in S^{L+1} :\ \hat{v}_{L+1}(s') = 0 \,\}$    (2.3)
This decomposition allows efficient pruning of the exhaustive search, because non-text
subsequences are excluded without the need to build a complete sequence, which
prevents the combinatorial explosion of enumerating the sets $S^L$ of all sequences
of length $L$.
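The pruning idea can be sketched as a level-by-level search in which a sequence is extended only if its prefix was accepted. Everything specific here is an assumption for illustration: regions are hypothetical `(x, height)` tuples, and `plausible` is a toy stand-in for the upper-bound verification functions, not the classifier used in [21].

```python
def grow_sequences(regions, can_extend, max_len):
    """Enumerate sequences of distinct regions, discarding any rejected prefix early."""
    accepted = []
    frontier = [(r,) for r in regions if can_extend((r,))]
    while frontier:
        accepted.extend(frontier)
        if len(frontier[0]) >= max_len:
            break
        # Only sequences that passed at length L are extended to length L + 1.
        frontier = [seq + (r,) for seq in frontier for r in regions
                    if r not in seq and can_extend(seq + (r,))]
    return accepted


def plausible(seq, max_gap=15, max_height_diff=2):
    """Toy verification: neighbours must run left to right, be close, and be similar in height."""
    return all(0 < b[0] - a[0] <= max_gap and abs(a[1] - b[1]) <= max_height_diff
               for a, b in zip(seq, seq[1:]))


if __name__ == "__main__":
    regions = [(0, 10), (12, 11), (24, 10), (90, 40)]  # (x position, height)
    sequences = grow_sequences(regions, plausible, max_len=4)
    print(max(sequences, key=len))
```

In this example only a handful of candidate sequences are ever built, whereas enumerating every ordering of the four regions up to length four would produce 64.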
D. Verification functions
The function $\hat{v}_1(r)$ is an SVM character classifier, which determines whether a
region is a character or not based on a set of region measurements (height ratio,
compactness, etc.). The function is scale invariant but not rotation invariant, so
possible rotations had to be included in the training set. On average, the $\hat{v}_1$
function correctly includes 83% of text regions whilst it correctly excludes 93% of
non-text regions such as plants, trees or other random textures.
MSBECL Page 17
2.2.2 Two stage algorithm for ER Pruning. [18]
An end-to-end real-time scene text localization and recognition method is presented.

The real-time performance is achieved by posing the character detection problem as an
efficient sequential selection from the set of Extremal Regions (ERs). The ER detector is
robust to blur, illumination, color and texture variation and handles low-contrast text.
In the first classification stage, the probability of each ER being a character is estimated using
novel features calculated with O (1) complexity per region tested. Only ERs with locally
maximal probability are selected for the second stage, where the classification is improved
using more computationally expensive features. A highly efficient exhaustive search with
feedback loops is then applied to group ERs into words and to select the most probable
character segmentation. Finally, text is recognized in an OCR stage trained using synthetic
fonts. The method was evaluated on two public datasets. On the ICDAR 2011 dataset, the
method achieves state-of-the-art text localization results amongst published methods and it is
the first one to report results for end-to-end text recognition. On the more challenging Street
View Text dataset, the method achieves state-of-the-art recall. The robustness of the proposed
method against noise and low contrast of characters is demonstrated by “false positives”
caused by detected watermark text in the dataset.
2.2.3 EDGE-ENHANCED MSERS [18]
Fig 2.2 System Flowcharts
Detecting text in natural images is an important prerequisite. They propose a novel
text detection algorithm, which employs edge-enhanced Maximally Stable Extremal
Regions as basic letter candidates. These candidates are then filtered using geometric
and stroke width information to exclude non-text objects. Letters are paired to identify
text lines, which are subsequently separated into words. They evaluate their system
using the ICDAR competition dataset and their mobile document database. The
experimental results demonstrate the excellent performance of the proposed
method. The flowchart of the text detection algorithm is shown in Fig. 2.2. At the input
of the system, the image intensities are linearly adjusted to enhance the contrast.
Subsequently, MSER regions are efficiently extracted from the image and enhanced
using Canny edges obtained from the original gray-scale image. As a next step, the
resulting connected components (CCs) are filtered using geometric constraints on
properties like aspect ratio and number of holes. The stroke width information is
robustly computed using a distance transform, and objects with high variation in stroke
width are rejected. Text candidates are grouped pairwise and form text lines. Finally,
words within a text line are separated, giving segmented word patches at the output of
the system.
2.2.4 Graph cut model with MSER [23]

For region-based methods, a large training set of text and non-text samples is needed
to train a suitable classifier, and it is especially difficult to make sure that the
non-text samples are representative enough. Moreover, as region-based methods need to
scan the image at different scales, the speed is relatively slow. It is also quite
difficult to design a fast and reliable CC analyzer that eliminates false positives
without losing the text components. To overcome the above problems, the authors propose
an effective CC-based scene text detection method that incorporates region-based as well
as context information into a graph model to build a fast and effective CC analyzer.
First, MSERs (Chum et al., 2002) detected in the original image are used as basic CCs.
Then, due to the high degree of interclass variation of scene characters as well as the
limited number of training samples, a single information source or classifier is not
enough to label the MSERs as text or non-text ones. Thus, in order to make use of various
information sources, an MSER-based graph model is constructed whose cost function
incorporates region-based as well as context-relevant information, and the MSERs can
then be efficiently labeled as text or non-text ones by minimizing the cost function via
the graph cut algorithm. Finally, since most non-text MSERs have been eliminated, the
remaining text candidate components are grouped into lines by simple heuristic rules,
and the false positives are removed by a trained classifier. Since the same evaluation
framework as the ICDAR 2011 text localization competition is used, the text lines are
partitioned into words. The proposed method is scale-insensitive and context-relevant,
and there is no need for multi-scale computation. Experimental results on the ICDAR 2011
text localization dataset report higher performance.
The flowchart of the proposed method is shown in Fig. 2.3. The main contributions
of this paper include four aspects.
1. MSERs detected in the original image, which are shown to be suitable for text
detection in the experiment (Neumann and Matas, 2010; Chen et al., 2011; Neumann
and Matas, 2011b), are used as basic CCs.
2. Effective features specially designed for MSERs are used to train a classifier to
estimate the probability of MSERs being text.
3. In order to design an effective CC analyzer to label the MSERs as text regions or
non-text ones, an MSER-based graph model is built whose cost function
incorporates region-based as well as context-relevant information.
4. The different information carried by the cost function can be optimally balanced to
get the final MSER labeling result by minimizing the cost function via the graph cut
algorithm.
Fig 2.3 The flowchart of the algorithm and the illustration of MSER labeling via
graph cut. (a) The flowchart. (b) MSERs in the Original image. (c) The construction
of the graph
Concretely, as shown in Fig. 2.3, two kinds of MSERs, dark-on-light and
light-on-dark ones, are first detected. Then, the focus is on the MSER labeling process,
which removes the non-text MSERs while preserving the text ones. To this end, a graph
whose nodes are the MSERs is first constructed, and the MSERs are then labeled as
text or non-text regions by minimizing the carefully designed unary and pairwise cost
function via the max-flow/min-cut algorithm (Boykov and Kolmogorov, 2004). Next, text
candidate components are grouped into lines, which are then partitioned into words,
and a classifier is used to remove the non-text blocks. Finally, results from both kinds
of MSERs are merged.
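To make the role of the unary and pairwise costs concrete, the following Python sketch labels four hypothetical regions by minimizing such a cost function. It deliberately replaces max-flow/min-cut with brute-force enumeration, which is only feasible for a handful of nodes; the cost values, the constant smoothness penalty and the function names are illustrative assumptions, not the costs designed in [23].

```python
from itertools import product


def energy(labels, unary, edges, smoothness):
    """Unary cost of each label plus a penalty for neighbours that disagree."""
    cost = sum(unary[i][label] for i, label in enumerate(labels))
    cost += sum(smoothness for i, j in edges if labels[i] != labels[j])
    return cost


def best_labeling(unary, edges, smoothness):
    """Exhaustive minimisation over all 0/1 labelings (0 = non-text, 1 = text)."""
    candidates = product((0, 1), repeat=len(unary))
    return min(candidates, key=lambda labels: energy(labels, unary, edges, smoothness))


# unary[i] = (cost of labelling region i as non-text, cost of labelling it as text)
unary = [(0.9, 0.1), (0.8, 0.2), (0.45, 0.55), (0.1, 0.9)]
edges = [(0, 1), (1, 2)]          # pairs of neighbouring regions
print(best_labeling(unary, edges, smoothness=0.3))   # prints (1, 1, 1, 0)
```

Region 2 would be labeled non-text on its unary cost alone, but the pairwise term pulls it to the label of its text neighbour, which is the effect the context-relevant information is meant to have.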
2.2.5 Snooper Text [24]

SNOOPERTEXT is an original detector for textual information embedded in photos
of building facades (such as names of stores, products and services) that was
developed for the iTowns urban geographic information project. SNOOPERTEXT
locates candidate characters by using toggle-mapping image segmentation and
character/non-character classification based on shape descriptors. The candidate
characters are then grouped to form either candidate words or candidate text lines.
These candidate regions are then validated by a text/non-text classifier using a
HOG-based descriptor specifically tuned to single-line text regions. These operations
are applied at multiple image scales in order to suppress irrelevant detail in character
shapes and to avoid the use of overly large kernels in the segmentation. The authors show
that SNOOPERTEXT outperforms other published state-of-the-art text detection
algorithms on standard image benchmarks. They also describe two metrics to evaluate
the end-to-end performance of text extraction systems, and show that the use of
SNOOPERTEXT as a pre-filter significantly improves the performance of a
general-purpose OCR algorithm when applied to photos of urban scenes.
2.2.6 Skeleton Matching based approach [21]

In this paper, the authors propose a skeleton matching based approach which aids text
localization in scene images. The input image is preprocessed and segmented into
blocks using connected component analysis. The skeleton of each segmented block is
obtained using a morphology-based approach. The skeletonized images are compared with
the trained templates in the database to categorize them into text and non-text blocks.
Further, newly designed geometrical rules and morphological operations are
employed on the detected text blocks for scene text localization. The experimental
results obtained on publicly available standard datasets illustrate that the proposed
method can detect and localize text of various sizes, fonts and colors.
Fig.2.4 Flowchart of the proposed system
2.2.7 Text detection for Multi-Orientation Scene Images Using Adaptive
Clustering [22]
Detection of text in camera-based images is a vital requirement for several computer
vision applications. The text detection task is frequently challenging due to difficulties
such as complex backgrounds and variations in text orientation, font, size and color. The
aim is to detect text in a unified manner by searching the image for words, text
areas or single character candidates. Text captured in natural scenes frequently appears
with multiple orientations and perspective distortions, yet most current research
efforts focus on horizontally oriented text. To address these issues, a unified distance
metric learning framework for adaptive hierarchical clustering is proposed, which learns
the weights of the character candidates at the same time and adaptively integrates
different feature similarities. On this basis, an effective multi-orientation text
detection system is built, which constructs the text candidates by grouping characters
using adaptive clustering.
Hierarchical clustering operates on a 2-dimensional proximity matrix and arranges the
data into a hierarchical structure. The outcome is typically presented as a binary tree
or dendrogram, from which different clusterings of the data are obtained.
The distance metric learning form is expressed in Eq. 2.4:
\[ d(x_i, x_j; \omega) = \omega^{T}\, \mathrm{vec}(x_i, x_j) \tag{2.4} \]
where ω is the weight vector and vec(x_i, x_j) is the similarity vector of the two
variables x_i and x_j.
The aim of this framework is to form two sets of point pairs: set S, containing pairs
of points from the same cluster, and set D, containing pairs of points from different
clusters. The distance between the pairs of points in D is maximized, while the distance
between the pairs in S is minimized. The framework also identifies the hard and
indicative pairs that serve as a representative part of the problem, i.e., given the
labeled cluster set {C_k}, k = 1, ..., m (with m clusters), the following strategy is
used to compute D and S, where C_{-k} denotes the points outside cluster C_k:
\[ D = \{ (\hat{x}_k, \hat{y}_k) = \arg\min_{x \in C_k,\ y \in C_{-k}} d(x, y; \omega) \}_{k=1}^{m} \tag{2.5} \]
\[ S = \{ (x_k^{*}, y_k^{*}) = \arg\max_{x \in C_k} \min_{y \in C_k} d(x, y; \omega) \}_{k=1}^{m} \tag{2.6} \]
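The selection of the pair sets D and S can be sketched in Python for one-dimensional points. The function name hard_pairs and the absolute-difference distance are illustrative assumptions (the framework uses the learned distance of Eq. 2.4), the inner minimum skips the point itself so that it is not trivially zero, and at least two clusters are assumed.

```python
def hard_pairs(clusters, dist):
    """Return (D, S): one between-cluster and one within-cluster pair per cluster."""
    D, S = [], []
    for k, cluster in enumerate(clusters):
        outside = [y for j, other in enumerate(clusters) if j != k for y in other]
        # D: the closest pair between cluster k and the points outside it.
        D.append(min(((x, y) for x in cluster for y in outside),
                     key=lambda pair: dist(*pair)))
        if len(cluster) < 2:
            continue
        # S: each point's nearest neighbour inside the cluster, then the farthest of those.
        nearest = [min(((x, y) for j, y in enumerate(cluster) if j != i),
                       key=lambda pair: dist(*pair))
                   for i, x in enumerate(cluster)]
        S.append(max(nearest, key=lambda pair: dist(*pair)))
    return D, S


clusters = [[0.0, 1.0, 4.0], [6.0, 7.0]]
D, S = hard_pairs(clusters, lambda a, b: abs(a - b))
print(D)   # prints [(4.0, 6.0), (6.0, 4.0)]
print(S)   # prints [(4.0, 1.0), (6.0, 7.0)]
```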
Multi-orientation scene text detection system

1) Character candidate extraction. Character candidates are extracted using the
MSER algorithm; most of the repeating components are removed with an MSER
pruning algorithm by minimizing regularized variations.
2) Text candidate construction. Text candidates are constructed using three
sequential coarse-to-fine character grouping steps, i.e., morphology clustering,
orientation clustering and projection clustering.
3) Text candidate elimination. The posterior probabilities of text candidates
corresponding to non-text are estimated using the character classifier, and text
candidates with high non-text probabilities are removed.
4) Text candidate classification. Text candidates corresponding to true text are
identified by the text classifier, which is trained to decide whether a text candidate
corresponds to true text or not.
Clustering types.
1) Morphology clustering (morphology-based grouping via clustering). Based on the
character appearances (color, stroke width and location differences), character
candidates are clustered into N_mor groups using single-link clustering.
2) Orientation clustering (orientation-based grouping via clustering). Based on the
character pair orientation, character pairs from each group G_i^mor are then clustered
into N_ori groups.
3) Projection clustering (projection-based grouping via clustering). Based on the
character pair intercepts (projections of orientation vectors), pairs in each group
G_i^ori are finally clustered into N_pro groups, where each group corresponds to a
text candidate.
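Single-link clustering, which all three grouping steps rely on, can be sketched in Python as follows. The sketch merges two groups whenever their closest members are within a threshold; the fixed threshold, the one-dimensional stroke width feature and the function name are illustrative assumptions, whereas the method in [22] learns the threshold and the feature weights.

```python
def single_link_clusters(points, dist, threshold):
    """Merge clusters while the closest pair of members is within the threshold."""
    clusters = [[p] for p in points]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                closest = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if closest <= threshold:
                    clusters[i].extend(clusters.pop(j))   # merge j into i
                    merged = True
                    break
            if merged:
                break                                     # restart the scan after a merge
    return clusters


stroke_widths = [2.0, 2.5, 3.0, 9.0, 9.4]
print(single_link_clusters(stroke_widths, lambda a, b: abs(a - b), threshold=0.6))
# prints [[2.0, 2.5, 3.0], [9.0, 9.4]]
```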
2.2.8 Natural scene text detection with multi-layer segmentation and higher
order conditional random field based analysis [23]
Text detection in natural scene images is a hot and challenging problem in pattern
recognition and computer vision. Considering the complex situations in natural scene
images, the authors propose a robust two-step method based on multi-layer
segmentation and a higher-order conditional random field (CRF). Given an input image,
the method separates text from its background by using multi-layer segmentation, which
decomposes the input image into nine layers. Then, the connected components (CCs) in
these different layers are obtained as candidate text. These candidate text CCs are
verified by higher-order CRF based analysis. Inspired by the multistage information
integration mechanism of visual brains, features from three different levels, including
separate CCs, CC pairs and CC strings, are integrated by the higher-order CRF model to
distinguish text from non-text. The remaining CCs are then grouped into words for
easy evaluation.
2.2.9 Text detection approach based on confidence map and context information
[25]
Text information plays a significant role in many applications because it provides more
descriptive and abstract information than other objects. In this paper, an approach
based on a confidence map and context information is proposed to robustly detect
text in natural scenes. Most conventional methods design sophisticated texture
features to describe the text regions, while this approach focuses on building a
confidence map model that integrates the seed candidate appearance and the relationships
with its adjacent candidates to highlight the text against the background; the candidates
with low confidence values are removed. In order to improve the recall rate, the
text context information is adopted to regain the missing text regions. Finally, the text
lines are formed and further verified, and the words are obtained by calculating a
threshold that separates intra-word letters from inter-word letters.
______________________________________________
CHAPTER 3
PROPOSED SYSTEM OF TEXT DETECTION IN NATURAL SCENE IMAGES
USING MSER TECHNIQUE
______________________________________________
This chapter describes the objectives of the proposed system, the requirements of
the system and its implementation.
_____________________________________________________________________
3.1 PROBLEM DEFINITION
To design a system for text detection in natural scene images that extracts
Maximally Stable Extremal Regions (MSERs) as character candidates.
3.2 PROBLEM DESCRIPTION

The goal is to help detect and recognize text in natural scene images by
developing a text detection system based on the maximally stable extremal region
approach. The method aims to consume less computational time and memory space through
an intelligent, flexible architecture for detecting MSERs as text. Text
detection in images can be applied to a variety of fields where information needs
to be analyzed and understood from an image. When a natural scene image is supplied
to the system, the image is converted to grayscale. Then MSERs, which are considered
character candidates, are detected in the image, and clustering is applied to group
characters into words.
3.3 SCOPE OF THE WORK

In this dissertation work, first the natural scene image is pre-processed to remove
various kinds of noise. Second, character candidates are extracted from the gray image
using the MSER algorithm. Third, non-text candidates are removed using text measurements
and the stroke width transform (SWT), and the probability of text is estimated. Fourth,
text candidates corresponding to true text are identified. Finally, the text is
recognized and displayed using OCR.
3.4 OBJECTIVES
The objectives of this project are as follows:
a. To develop a text detection system.
b. To compare various detection methods and their results through a literature survey.
c. To create a natural scene image dataset.
d. To detect text in natural scene images.
e. To detect the maximum number of characters in blurred images.
3.5 TECHNOLOGY AND ASSOCIATED PLATFORM
Hardware Requirements
1. Processor : Pentium IV (and onwards)
2. Memory (RAM) : 256 MB
3. Hard disk : 40 GB
4. Web cam / Camera
Software Requirements
1. Operating System : Microsoft Windows 7
2. MATLAB R2014b
3. Data sets (ICDAR 2011, own dataset)
3.6 MATHEMATICAL MODEL

The mathematical model for the text detection system is defined using a deterministic
finite automaton and the state diagram below:
M = {Q, ∑, δ, q0, F} (3.1)
Figure 3.1: Mathematical model for text detection system

where,
Q = {S1, S2, S3, S4, S5};
∑ = {Scene Image};
q0 = S1;
F = S5;
δ(S1, Preprocessing) = S2;
δ(S2, Character Candidate Extraction) = S3;
δ(S3, Non-Text Elimination) = S4;
δ(S4, Text Candidate Construction) = S5;
Assumptions:
i0 → input image;
i0 = ∑ pj for j = 0, ..., n, where pj is a pixel;
S1 = Take the input image I, resize it and convert it to grayscale;
I → G(I1);
S2 = Character candidate extraction;
I1 → find MSERs using detectMSERFeatures on the input image;
S3 = Non-text elimination;
I2 = common text measurements and stroke width transform;
S4 = Text candidate construction;
I3 = P(O(m, n; p)) → P(O(m, n; p) | text) P(text) + P(O(m, n; p) | non-text) P(non-text);
S5 = Display text;
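The state transitions of the model can be sketched in Python as a lookup table that is walked step by step. The dictionary TRANSITIONS and the function run_pipeline are hypothetical names introduced for illustration; the states and step labels are those listed in the model.

```python
TRANSITIONS = {
    ("S1", "Preprocessing"): "S2",
    ("S2", "Character Candidate Extraction"): "S3",
    ("S3", "Non-Text Elimination"): "S4",
    ("S4", "Text Candidate Construction"): "S5",
}


def run_pipeline(steps, start="S1", final="S5"):
    """Follow the transition function; return (last state, whether it is final)."""
    state = start
    for step in steps:
        if (state, step) not in TRANSITIONS:
            return state, False          # step not allowed in this state
        state = TRANSITIONS[(state, step)]
    return state, state == final


steps = ["Preprocessing", "Character Candidate Extraction",
         "Non-Text Elimination", "Text Candidate Construction"]
print(run_pipeline(steps))   # prints ('S5', True)
```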
3.6.1 Complexity Analysis

The complexity of this algorithm is linear in the number of tree nodes.
3.7 UML DIAGRAMS

3.7.1 Use case diagram
Figure 3.2 below shows Use Case diagram for the text detection system.
Figure 3.2: Use Case Diagram for Text Detection System
As shown in the figure, there are six use cases and one actor. The actor can
be the end user or a system that uploads the natural scene image in which text is to be
detected. The use cases are uploading the natural scene image to the text detection
system, pre-processing, extracting character candidates using the MSER algorithm,
non-text elimination, text construction with a rectangle and display of text.
3.7.2 Level 0 data flow diagram
Figure 3.3 below shows the Level 0 data flow diagram for the text detection system.
As shown in the figure, the text detection system takes the natural scene image as
input and produces the detected text as output.
Figure 3.3: Level 0 data flow diagram for Text Detection System
3.7.3 Level 1 data flow diagram

Figure 3.4 below shows the Level 1 data flow diagram for the text detection system:
Figure 3.4: Level 1 data flow diagram for Text Detection System
As shown in Figure 3.4, the user uploads the natural scene image, which is then
pre-processed. After that, the image passes through the extraction, elimination and
construction phases. Finally, the system displays the text detected in the input
image.
3.7.4 Sequence diagram
Figure 3.5 below shows the Sequence diagram for text detection system:
As shown in the figure, the sequence of text detection is as follows. First, the user
uploads the natural scene image to the system as input. The system then pre-processes
the input image, converting the RGB image to grayscale. Next, the character candidates
are extracted using the MSER algorithm. Finally, the non-text regions are eliminated,
and the final text is constructed and displayed using OCR.
Figure 3.5: Sequence diagram for Text Detection System
3.8 Proposed System
[Block diagram: Input natural scene image → Pre-processing → Character candidate
extraction (MSER) → Non-text elimination → Text candidate construction → Display text]
Figure 3.6: System architecture for text detection system.
The system architecture shows that the first stage acquires the natural scene image,
which is captured by a web cam or camera. The pre-processing stage first resizes the
image: the resize function takes an RGB, grayscale or binary image A as input and
returns an image B that is scale times the size of A. In the next stage, the
rgb2gray(RGB) function is used to convert RGB images to grayscale by eliminating the
hue and saturation information while retaining the luminance. A global thresholding
binarization method [7][17] could also be used here, but this work uses the
detectMSERFeatures function, which accepts only a grayscale image as input.
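The grayscale conversion can be illustrated with a short Python sketch. It uses the weighted sum 0.2989 R + 0.5870 G + 0.1140 B documented for rgb2gray; the nested-list image format and the function names are illustrative assumptions, and the rounding is a simplification of how an 8-bit image would be stored.

```python
def rgb_to_gray(pixel):
    """Luminance of one (R, G, B) pixel, rounded to the nearest integer."""
    r, g, b = pixel
    return round(0.2989 * r + 0.5870 * g + 0.1140 * b)


def image_to_gray(image):
    """Convert a nested list of (R, G, B) pixels into a nested list of gray values."""
    return [[rgb_to_gray(pixel) for pixel in row] for row in image]


image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
print(image_to_gray(image))   # prints [[76, 150], [29, 255]]
```

Green contributes most to the result and blue least, which is why hue and saturation disappear while perceived brightness is kept.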
3.8.1 Character Candidate Extraction (MSER)
Extremal Region
An extremal region is a connected component of an image whose pixels have either
higher or lower intensity than its outer boundary pixels [1]. Extremal regions of the
whole image are extracted as a rooted tree. An extremal region is in fact a set of
pixels. Its variation is defined as follows [1][18]:
Let R_l be an extremal region in the input image; then the branch of the tree rooted at
R_l is:
\[ B(R_l) = (R_l, R_{l+1}, \ldots, R_{l+\Delta}) \tag{3.2} \]
where ∆ is a parameter. The variation (instability) of R_l is defined as:
\[ V(R_l) = \frac{|R_{l+\Delta} - R_l|}{|R_l|} \tag{3.3} \]
where |R| denotes the number of pixels in R.
An extremal region R_l is a maximally stable extremal region if its variation is lower
than that of its parent R_{l−1} and child R_{l+1}, i.e. it is more stable than both [1].
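The variation and the stability test can be sketched in Python on a single branch of nested regions. The sketch assumes the regions are nested, so that the number of pixels in R_{l+∆} but not in R_l is simply the difference of the two areas; the list sizes and the function names are hypothetical.

```python
def variation(sizes, l, delta):
    """V(R_l) for nested regions, where sizes[l] is the pixel count of R_l."""
    return (sizes[l + delta] - sizes[l]) / sizes[l]


def stable_levels(sizes, delta):
    """Levels whose variation is lower than that of both neighbouring levels."""
    v = [variation(sizes, l, delta) for l in range(len(sizes) - delta)]
    return [l for l in range(1, len(v) - 1) if v[l] < v[l - 1] and v[l] < v[l + 1]]


# Pixel counts along one branch: the region barely grows around level 2,
# then floods into the background.
sizes = [100, 102, 103, 104, 150, 220, 400]
print(stable_levels(sizes, delta=1))   # prints [2]
```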
Figure 3.7: The process of MSER finding: (a) MSERs according to variations; (b) MSER
tree after linear reduction; (c) character candidates after tree accumulation.
Linear Reduction
The linear reduction algorithm is applied when a node of the MSER tree has only one
child. It selects, from the parent and the child, the region with the minimum variation
and removes the other one. This procedure is applied to the MSER tree recursively.
The algorithm works as follows.
Given the MSER node t, the linear reduction algorithm first counts the children of t.
If t has no children, it returns t. If t has only one child, it obtains the root c of
the child tree by first applying the linear reduction procedure to the child tree and
then compares the variations of parent and child: if t has a lower variation than c,
it links the children of c to t and returns t; otherwise it returns c. If t has more
than one child, it processes these children using the linear reduction procedure and
links the resulting trees to t.
Step 1: Apply linear reduction to the input MSER tree T

Step 2: if no. of children of T = 0 then
Return T
else if no. of children of T = 1 then
c = LINEAR REDUCTION(child of T)
Step 3: if variation of T is less than variation of c then
Link the children of c to T
Return T
Else return c
else (T has more than one child)
For each child c of T, link LINEAR REDUCTION(c) to T
Return T
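The linear reduction steps can be sketched in Python as a short recursive function. The Node class, with a name, a variation and a list of children, is a hypothetical data structure introduced for illustration; it is not the MSERRegions object used in the MATLAB implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    variation: float
    children: list = field(default_factory=list)


def linear_reduction(t):
    """Collapse parent-child chains, keeping the node with the lower variation."""
    if not t.children:
        return t
    if len(t.children) == 1:
        c = linear_reduction(t.children[0])
        if t.variation < c.variation:
            t.children = c.children      # drop c, adopt its children
            return t
        return c                         # drop t, keep the reduced child
    t.children = [linear_reduction(c) for c in t.children]
    return t


# A chain A -> B -> C: B has the lowest variation, so only B survives.
chain = Node("A", 0.5, [Node("B", 0.2, [Node("C", 0.4)])])
print(linear_reduction(chain).name)   # prints B
```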
Tree Accumulation
The tree accumulation algorithm is used when an MSER node has more than one child.
This procedure returns a set of disconnected nodes.
The algorithm for tree accumulation works as follows.
For a given node t, tree accumulation counts the children of t. If t has no children,
it returns t. If t has two or more children, it creates an empty set C and appends to C
the results of applying tree accumulation to each child of t; if one of the nodes in C
has a lower variation than t, it returns C, otherwise it removes the children of t and
returns t.
Step 1: Apply tree accumulation to the tree produced by linear reduction
Step 2: if T has two or more children then
Create empty set C
For each c in the children of T
C = C ∪ TREE ACCUMULATION(c)
Step 3: if variation of T is not greater than the minimum variation in C then
Discard the children of T
Return T
Else return C
Else return T
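Tree accumulation can be sketched in Python in the same style. The Node class is again a hypothetical structure introduced for illustration, and the input is assumed to be a tree that has already been through linear reduction, so every node has either no children or at least two.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    variation: float
    children: list = field(default_factory=list)


def tree_accumulation(t):
    """Return the disconnected nodes that are kept as character candidates."""
    if len(t.children) >= 2:
        candidates = []
        for c in t.children:
            candidates.extend(tree_accumulation(c))
        if t.variation <= min(n.variation for n in candidates):
            t.children = []              # the parent is more stable: keep it alone
            return [t]
        return candidates                # some descendant is more stable
    return [t]


# A word-sized region with two letter-sized children that are more stable.
word = Node("word", 0.5, [Node("a", 0.1), Node("b", 0.2)])
print([n.name for n in tree_accumulation(word)])   # prints ['a', 'b']
```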
In this project, the detectMSERFeatures function is used to obtain an MSERRegions
object. This object contains information about the MSER features detected in the 2-D
grayscale input image, from which the pixel lists are retrieved. Next, the Canny edge
detector is applied; it is an image processing technique for finding the boundaries of
objects within images, and it works by detecting discontinuities in brightness. Edge
detection is used for image segmentation and data extraction in areas such as image
processing, computer vision and machine vision. Then the image gradient of image I is
computed, which returns the magnitude and direction of the intensity changes. The helper
function helperGrowEdges grows edges along or opposite to the gradient:
helperGrowEdges(Edges, GradientDirection, TextPolarity) asymmetrically dilates the
binary image Edges in the direction specified by GradientDirection. TextPolarity is a
string specifying whether to grow along or opposite to the gradient direction,
'LightTextOnDark' or 'DarkTextOnLight', respectively. Connected components are then
found with bwconncomp: CC = bwconncomp(BW) returns the connected components CC found in
BW. The binary image BW can have any dimension. CC is a structure with four fields:
Connectivity, ImageSize, NumObjects and PixelIdxList.
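What connected component labeling computes can be illustrated with a small Python sketch based on breadth-first search. It assumes a 2-D binary image given as a nested list and 8-connectivity (pixels touching at edges or corners belong together); the function name is hypothetical, and the result is a plain list of pixel lists rather than the structure returned by bwconncomp.

```python
from collections import deque


def connected_components(bw):
    """Return one list of (row, col) pixels per 8-connected component."""
    rows, cols = len(bw), len(bw[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not bw[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):        # visit the 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bw[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


bw = [[1, 1, 0, 0],
      [0, 1, 0, 1],
      [0, 0, 0, 1],
      [1, 0, 0, 0]]
print([len(p) for p in connected_components(bw)])   # prints [3, 2, 1]
```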
3.8.2 Non-text elimination

Regions that do not follow the text measurement rules, based on properties such as
eccentricity, area, solidity and extent, are eliminated.
D = bwdist(BW) computes the Euclidean distance transform of the binary image BW.
For each pixel in BW, the distance transform assigns a number that is the distance
between that pixel and the nearest nonzero pixel of BW. bwdist uses the Euclidean
distance metric by default. BW can have any dimension. D is the same size as BW.
The helper function helperStrokeWidth transforms a distance image into a stroke width
image:
StrokeWidthImage = helperStrokeWidth(DistanceImage);
It returns a StrokeWidthImage computed from DistanceImage, containing a value for the
stroke width at each non-zero pixel in DistanceImage. DistanceImage is a
Euclidean distance transform of a binary image computed by bwdist.
Another useful discriminator for text in images is the variation in stroke width within
each text candidate. Characters in most languages have a similar stroke width or
thickness throughout. It is therefore useful to remove regions where the stroke width
exhibits too much variation [1]. The stroke width image is computed using the
helperStrokeWidth helper function. Most non-text regions show a large variation in
stroke width; these can be filtered using the coefficient of stroke width variation.
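The coefficient of stroke width variation is the standard deviation of the stroke widths in a region divided by their mean. A minimal Python sketch follows; the threshold of 0.4 and the function names are illustrative assumptions that would need tuning on real data.

```python
from statistics import mean, stdev


def stroke_width_variation(widths):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(widths) / mean(widths)


def looks_like_text(widths, max_variation=0.4):
    """Keep a region only if its stroke width is fairly uniform (assumed threshold)."""
    return stroke_width_variation(widths) <= max_variation


print(looks_like_text([4, 4, 5, 4, 5]))     # prints True  (uniform strokes)
print(looks_like_text([1, 9, 2, 12, 3]))    # prints False (erratic widths)
```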
The Stroke Width Transform Algorithm
This section describes the Stroke Width Transform algorithm as it is presented in [1],
with several additions and enhancements. These additions are discussed further in the
next section, 'The Application'. The algorithm receives an RGB image and returns an
image of the same size in which the regions of suspected text are marked. It has three
major steps: the stroke width transform, grouping the pixels into letter candidates
based on their stroke width, and finally, grouping letter candidates into regions of
text.
The Stroke Width Transform
A stroke in the image is a continuous band of nearly constant width. An example of
a stroke is shown in Figure 3.8(a). The Stroke Width Transform (SWT) is a local
operator which calculates, for each pixel, the width of the most likely stroke containing
the pixel.
First, all pixels are initialized with ∞ as their stroke width. Then the edge map of
the image is calculated using the Canny edge detector. The edges are considered as
possible stroke boundaries, and the goal is to find the width of each such stroke. If p
is an edge pixel, the direction of the gradient is roughly perpendicular to the
orientation of the stroke boundary. Therefore, the next step is to calculate the gradient
direction gp of the edge pixels and to follow the ray r = p + n*gp (n > 0) until another
edge pixel q is found. If the gradient direction gq at q is roughly opposite to gp, then
each pixel in the ray is assigned the distance between p and q as its stroke width,
unless it already has a lower value. If, however, an edge pixel q is not found, or gq is
not opposite to gp, the ray is discarded.
Fig 3.8 Implementation of SWT
Fig 3.9 Filling pixels with SWT values.

In order to accommodate both bright text on a dark background and dark text on a
bright background, the algorithm needs to be applied twice: once with the ray direction
gp and once with -gp. After the first pass described above, pixels in complex
locations might not hold the true stroke width value (Figure 3.8(b)). For that reason,
a second pass is made along each non-discarded ray, in which each pixel in the ray
receives the minimum of its current value and the median value along that ray. (In
the original algorithm, the pixels are assigned the median value, yet in the
experiments, better results were obtained by taking the minimum.)
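The second pass can be sketched in Python as follows. The sketch assumes that the first pass has already produced a dictionary swt mapping each pixel to its stroke width and a list of non-discarded rays, each ray being a list of pixels; both structures and the function name are hypothetical.

```python
from statistics import median


def refine_rays(swt, rays):
    """Clamp every pixel on a ray to the median stroke width along that ray."""
    for ray in rays:
        ray_median = median(swt[p] for p in ray)
        for p in ray:
            swt[p] = min(swt[p], ray_median)
    return swt


# Pixel (0, 2) sits in a corner where two strokes meet: the first pass left it
# with the long diagonal distance 9, although the horizontal stroke is 3 wide.
swt = {(0, 0): 3, (0, 1): 3, (0, 2): 9, (1, 2): 9, (2, 2): 9}
rays = [[(0, 0), (0, 1), (0, 2)], [(0, 2), (1, 2), (2, 2)]]
print(refine_rays(swt, rays)[(0, 2)])   # prints 3
```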
3.8.3 Text construction

To compute a bounding box of the text region, the individual characters are first
merged into a single connected component. This can be accomplished using
morphological closing followed by opening to clean up any outliers.
Then the bwconncomp() function is used again to find the text regions, and a bounding
box is applied to each.
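Computing the bounding box of a merged component reduces to taking the extreme rows and columns of its pixels. A minimal Python sketch, assuming the component is given as a list of (row, col) pixels and that the box is reported as (x, y, width, height) in whole pixels; the function name is hypothetical.

```python
def bounding_box(pixels):
    """Smallest axis-aligned box (x, y, width, height) containing all pixels."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    x, y = min(cols), min(rows)
    return x, y, max(cols) - x + 1, max(rows) - y + 1


component = [(2, 3), (2, 4), (3, 3), (5, 6)]
print(bounding_box(component))   # prints (3, 2, 4, 4)
```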
3.8.4 Display Text

Finally, the OCR function is used to recognize and display the text that is present in
the bounding box.
3.8.5 Output
Fig 3.10 The output of the text detection system: a) natural scene image; b) gray image;
c) MSER regions of the input image; d) character candidate extraction; e) stroke width
transform of the input image; f) joining individual characters; g) text extraction;
h) output text in the command window
3.9 Selection Tools
3.9.1 General
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation
programming language. Developed by MathWorks, MATLAB allows
matrix manipulations, plotting of functions and data, implementation of algorithms,
creation of user interfaces, and interfacing with programs written in other languages,
including C, C++, Java, and Fortran.
For the implementation of this project, the MATLAB R2014b software tool was used.
MATLAB® is a high-level technical computing language and interactive
environment for algorithm development, data visualization, data analysis, and
numerical computation. Using MATLAB, you can solve technical computing
problems faster than with traditional programming languages, such as C, C++, and
Fortran. MATLAB is a data analysis and visualization tool which has been designed with
powerful support for matrices and matrix operations. In addition, MATLAB has
excellent graphics capabilities and its own powerful programming language. One of
the reasons that MATLAB has become such an important tool is the use of sets
of MATLAB programs designed to support a particular task. These sets of programs are
called toolboxes, and the particular toolbox of interest here is the Image Processing
Toolbox. Rather than give a description of all of MATLAB's capabilities, we restrict
ourselves to just those aspects concerned with the handling of images. A MATLAB function
is a keyword which accepts various parameters and produces some sort of output: for
example a matrix, a string or a graph. Examples of such functions are sin, imread and
imclose. There are many functions in MATLAB, and it is very easy (and sometimes
necessary) to write our own. MATLAB's standard data type is the matrix; all data are
considered to be matrices of some sort. Images, of course, are matrices whose
elements are the grey values (or possibly the RGB values) of their pixels. Single values
are considered by MATLAB to be 1 × 1 matrices, while a string is merely a 1 × n matrix
of characters, n being the string's length. When you start up MATLAB, you have a blank
window called the Command Window, in which you enter commands. Given the
vast number of MATLAB's functions, and the different parameters they can take, a
command line style interface is in fact much more efficient than a complex sequence
of pull-down menus. You can use MATLAB in a wide range of applications,
including signal and image processing, communications, control design, test and
measurement, and financial modeling and analysis. Add-on toolboxes (collections of
special-purpose MATLAB functions) extend the MATLAB environment to solve
particular classes of problems in these application areas. MATLAB provides a number
of features for documenting and sharing your work. You can integrate your MATLAB
code with other languages and applications, and distribute your MATLAB algorithms
and applications. Although MATLAB is intended primarily for numerical computing,
an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic
computing capabilities. An additional package, Simulink, adds graphical multi-domain
simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia.
MATLAB users come from various backgrounds of engineering, science, and
economics. MATLAB is widely used in academic and research institutions as well as
industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering,
Little's specialty, but quickly spread to many other domains. It is now also used in
education, in particular the teaching of linear algebra and numerical analysis, and is
popular amongst scientists involved in image processing. The MATLAB application
is built around the MATLAB language. The simplest way to execute MATLAB code
is to type it in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can be used as
an interactive mathematical shell. Sequences of commands can be saved in a text file,
typically using the MATLAB Editor, as a script or encapsulated into a function,
extending the commands available.
MATLAB's Power of Computational Mathematics

MATLAB is used in every facet of computational mathematics. Some of the most
common kinds of calculation for which it is used are:
1. Dealing with Matrices and Arrays
2. 2-D and 3-D Plotting and graphics
3. Linear Algebra
4. Algebraic Equations
5. Non-linear Functions
6. Statistics
7. Data Analysis
8. Calculus and Differential Equations
9. Numerical Calculations
10. Integration
11. Transforms
12. Curve Fitting
13. Various other special functions
Key Features
1. High-level language for numerical computation, visualization, and application
development
2. Interactive environment for iterative exploration, design, and problem solving
3. Mathematical functions for linear algebra, statistics, Fourier analysis, filtering,
optimization, numerical integration, and solving ordinary differential
equations
4. Built-in graphics for visualizing data and tools for creating custom plots
5. Development tools for improving code quality and maintainability and
maximizing performance
6. Tools for building applications with custom graphical interfaces
7. Functions for integrating MATLAB based algorithms with external
applications and languages such as C, Java, .NET, and Microsoft® Excel®
Uses of MATLAB
MATLAB is widely used as a computational tool in science and engineering,
encompassing the fields of physics, chemistry, mathematics and all engineering
streams. It is used in a range of applications including:
a. Signal processing and communications
b. Image and video processing
c. Control systems
d. Test and measurement
e. Computational finance
f. Computational biology
MATLAB can be used on personal computers and powerful server systems, including
the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the
language can be extended with parallel implementations for common computational
functions, including for-loop unrolling. Additionally, this toolbox supports offloading
computationally intensive workloads to Cheaha, the campus compute cluster.
MATLAB is one of a few languages in which each variable is a matrix (broadly
construed) and "knows" how big it is. Moreover, the fundamental operators (e.g.
addition, multiplication) are programmed to deal with matrices when required, and
the MATLAB environment handles much of the bothersome housekeeping that makes
all this possible. Since so many of the procedures required for Macro-Investment
Analysis involve matrices, MATLAB proves to be an extremely efficient language
for both communication and implementation.
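The statement that every variable is a matrix that knows its own size can be checked with a few lines at the Command Window. This is only a minimal sketch; the variable names and values are chosen for illustration and do not come from the project code.

```matlab
A = [1 2; 3 4];        % a 2-by-2 matrix
x = 5;                 % a scalar is stored as a 1-by-1 matrix
s = 'text';            % a string is a 1-by-4 matrix of characters

sizeA = size(A);       % [2 2]
sizeX = size(x);       % [1 1]
sizeS = size(s);       % [1 4]

B = A * A;             % matrix product:       [7 10; 15 22]
C = A .* A;            % element-wise product: [1 4; 9 16]
D = A + x;             % the scalar is added to every element: [6 7; 8 9]
```

The same operators work unchanged on an image, because an image is simply a larger matrix of pixel values.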
a) INTERFACING WITH OTHER LANGUAGES.

MATLAB can call functions and subroutines written in the C programming language
or Fortran. A wrapper function is created allowing MATLAB data types to be
passed and returned. The dynamically loadable object files created by compiling such
functions are termed "MEX-files" (for MATLAB executable).
Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and
many MATLAB libraries (for example XML or SQL support) are implemented as
wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more
complicated, but can be done with a MATLAB extension, which is sold separately by
MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata Interface
that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from
MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist
to import and export MathML.
The M Files
MATLAB allows writing two kinds of program files:
Scripts - script files are program files with the .m extension. In these files, you write a
series of commands that you want to execute together. Scripts do not accept inputs and
do not return any outputs. They operate on data in the workspace.
Functions - function files are also program files with the .m extension. Functions can
accept inputs and return outputs. Internal variables are local to the function.
You can use the MATLAB Editor or any other text editor to create your .m files. A script
file contains multiple sequential lines of MATLAB commands and function calls. You
can run a script by typing its name at the command line.
Development Environment
a) Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations.
b) Spreadsheet Import Tool that provides more options for selecting and loading
mixed textual and numeric data.
c) Readability and navigation improvements to warning and error messages in
the MATLAB command window.
d) Automatic variable and function renaming in the MATLAB Editor .
Developing Algorithms and Applications

MATLAB provides a high-level language and development tools that let you quickly
develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are
fundamental to engineering and scientific problems. It enables fast development and
execution. With the MATLAB language, you can program and develop algorithms
faster than with traditional languages because you do not need to perform low-level
administrative tasks, such as declaring variables, specifying data types, and allocating
memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result,
one line of MATLAB code can often replace several lines of C or C++ code.
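As a small illustration of how a vectorized statement can replace an explicit loop, the sketch below squares every element of a vector in both styles. The vector length and variable names are assumptions made only for this example.

```matlab
n = 1000;
x = linspace(0, 1, n);        % n evenly spaced sample values

% Loop version: square the elements one at a time.
yLoop = zeros(1, n);
for k = 1:n
    yLoop(k) = x(k)^2;
end

% Vectorized version: one element-wise operation on the whole vector.
yVec = x.^2;

% Largest difference between the two results (expected to be negligible).
maxDiff = max(abs(yLoop - yVec));
```

The vectorized line does the work of the four-line loop, which is the sense in which one line of MATLAB can stand in for several lines of C or C++.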
At the same time, MATLAB provides all the features of a traditional programming
language, including arithmetic operators, flow control, data structures, data types,
object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without
compiling and linking, enabling you to quickly iterate to the optimal solution. For fast
execution of heavy matrix and vector computations, MATLAB uses processor-
optimized libraries. For general-purpose scalar computations, MATLAB generates
machine-code instructions using its JIT (Just-In-Time) compilation technology. This
technology, which is available on most platforms, provides execution speeds that rival
those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm
efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and
single stepping
Code Analyzer
Checks your code for problems and recommends modifications to maximize
performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file
dependencies, and code coverage
Designing Graphical User Interfaces
You can use the interactive tool GUIDE (Graphical User Interface Development
Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list
boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as
MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs
programmatically using MATLAB functions.
b) ANALYZING AND ACCESSING DATA.

MATLAB supports the entire data analysis process, from acquiring data from external
devices and databases, through preprocessing, visualization, and numerical analysis,
to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis
operations, including:
a. Interpolating and decimating
b. Extracting sections of data, scaling, and averaging
c. Thresholding and smoothing
d. Correlation, Fourier analysis, and filtering
e. 1-D peak, valley, and zero finding
f. Basic statistics and curve fitting
g. Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications,
databases, and external devices. You can read data from popular file formats, such as
Microsoft Excel; ASCII text or binary files; image, sound, and video files; and
scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you
work with data files in any format. Additional functions let you read data from Web
pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are
available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume
visualization functions, tools for interactively creating plots, and the ability to export
results to all popular graphics formats. You can customize plots by adding multiple
axes; changing line colors and markers; adding annotation, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
You can visualize vectors of data with 2-D plotting functions that create:
a. Line, area, bar, and pie charts.
b. Direction and velocity plots.
c. Histograms.
d. Polygons and surfaces.
e. Scatter/bubble plots.
f. Animations.
3-D Plotting and Volume Visualization

MATLAB provides functions for visualizing 2-D matrices, 3-D scalar, and 3-D vector
data. You can use these functions to visualize and understand large, often complex,
multidimensional data. You can also specify plot characteristics, such as camera
viewing angle, perspective, lighting effect, light source locations, and transparency.
3-D plotting functions include:
a. Surface, contour, and mesh.
b. Image plots.
c. Cone, slice, stream and isosurface.
c) PERFORMING NUMERIC COMPUTATION.

MATLAB contains mathematical, statistical, and engineering functions to support all
common engineering and science operations. These functions, developed by experts
in mathematics, are the foundation of the MATLAB language. The core math
functions use the LAPACK and BLAS linear algebra subroutine libraries and the
FFTW Discrete Fourier Transform library. Because these processor-dependent
libraries are optimized to the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code. MATLAB provides the following
types of functions for performing mathematical operations and analyzing data:
I. Matrix manipulation and linear algebra.
II. Polynomials and interpolation.
III. Fourier analysis and filtering.
IV. Optimization and numerical integration.
V. Ordinary differential equations (ODEs).
VI. Partial differential equations (PDEs).
VII. Sparse matrix operations.
MATLAB can perform arithmetic on a wide range of data types, including doubles,
singles, and integers.
3.9.2 Functions having definition

Stroke width transform
function strokeWidthImage = helperStrokeWidth(DistanceImage)
% strokeWidthImage = helperStrokeWidth(DistanceImage)
% returns a Stroke Width Image computed from DistanceImage, containing a
% value for stroke width at each non-zero pixel in the DistanceImage.
% DistanceImage is a Euclidean distance transform of a binary image
% computed by bwdist.

% Bin distances into integer values for comparison
DistanceImage = round(DistanceImage);

% Define 8-connected neighbors (row offsets in row 1, column offsets in row 2)
connectivity = [1 0; -1 0; 1 1; 0 1; -1 1; 1 -1; 0 -1; -1 -1]';

% Create padded version of distance image for matrix-wise neighbor comparison
paddedDistanceImage = padarray(DistanceImage, [1,1]);
Dind = find(paddedDistanceImage ~= 0);
sz = size(paddedDistanceImage);

% Compare whether eight neighbors are less than current pixel for all
% pixels in image
neighborIndices = repmat(Dind, [1,8]);
[I,J] = ind2sub(sz, neighborIndices);
I = bsxfun(@plus, I, connectivity(1,:));
J = bsxfun(@plus, J, connectivity(2,:));
neighborIndices = sub2ind(sz, I, J);
lookup = bsxfun(@lt, paddedDistanceImage(neighborIndices), ...
    paddedDistanceImage(Dind));
lookup(paddedDistanceImage(neighborIndices) == 0) = false;

% Propagate local maximum stroke values to neighbors recursively
maxStroke = max(max(paddedDistanceImage));
for Stroke = maxStroke:-1:1
    neighborIndextemp = ...
        neighborIndices(paddedDistanceImage(Dind) == Stroke, :);
    lookupTemp = lookup(paddedDistanceImage(Dind) == Stroke, :);
    neighborIndex = neighborIndextemp(lookupTemp);
    while ~isempty(neighborIndex)
        paddedDistanceImage(neighborIndex) = Stroke;
        [~,ia,~] = intersect(Dind, neighborIndex);
        neighborIndextemp = neighborIndices(ia, :);
        lookupTemp = lookup(ia, :);
        neighborIndex = neighborIndextemp(lookupTemp);
    end
end

% Remove pad to restore original image size
strokeWidthImage = paddedDistanceImage(2:end-1, 2:end-1);
HelperGrowEdges function
function GradientGrownEdgesMask = ...
    helperGrowEdges(Edges, GradientDirection, TextPolarity)
% helperGrowEdges Grow edges along or opposite to gradients
%
% GradientGrownEdgesMask =
% helperGrowEdges(Edges,GradientDirection,TextPolarity) asymmetrically
% dilates binary image Edges in the direction specified by
% GradientDirection. TextPolarity is a string specifying whether to grow
% along or opposite the GradientDirection, 'LightTextOnDark' or
% 'DarkTextOnLight', respectively.

% Quantize to 8 cardinal and ordinal directions
GrowthDirection = round((GradientDirection + 180) / 360 * 8);
GrowthDirection(GrowthDirection == 0) = 8;
if strcmp('DarkTextOnLight', TextPolarity)
    % Reverse growth direction for dark text
    GrowthDirection = mod(GrowthDirection + 3, 8) + 1;
end

% Build structuring elements
% Structuring element template to grow edges diagonally
northwestTemplate = ...
    [1,1,1,1,1,0,0;...
     1,1,1,1,1,0,0;...
     1,1,1,1,1,0,0;...
     1,1,1,1,0,0,0;...
     1,1,1,0,0,0,0;...
     zeros(2,7)];
% Structuring element template to grow edges horizontally and vertically
northTemplate = ...
    [0,1,1,1,1,1,0;...
     0,1,1,1,1,1,0;...
     0,1,1,1,1,1,0;...
     0,1,1,1,1,1,0;...
     zeros(3,7)];

% Each structuring element is a rotation of the element templates
N = strel(northTemplate);
W = strel(rot90(northTemplate,1));
S = strel(rot90(northTemplate,2));
E = strel(rot90(northTemplate,3));
NW = strel(northwestTemplate);
SW = strel(rot90(northwestTemplate));
SE = strel(rot90(northwestTemplate,2));
NE = strel(rot90(northwestTemplate,3));
Strels = [NE,N,NW,W,SW,S,SE,E];

% Initialize mask
GradientGrownEdgesMask = false(size(Edges));

% Use structuring element to grow Edges along each gradient direction
for i = 1:numel(Strels)
    BWCurrentDirection = false(size(Edges));
    BWCurrentDirection(Edges == true & GrowthDirection == i) = true;
    BWCurrentDirection = imdilate(BWCurrentDirection, Strels(i));
    GradientGrownEdgesMask = GradientGrownEdgesMask | BWCurrentDirection;
end
Morphological operations [5]

SE = strel('disk', R)
Create morphological structuring element (STREL)
SE = strel(shape, parameters) creates a structuring element, SE, of the type specified
by shape.
IM2 = imopen(IM,SE)
IM2 = imopen(IM,SE) performs morphological opening on the grayscale or binary
image IM with the structuring element SE. The argument SE must be a single
structuring element object, as opposed to an array of objects. The morphological open
operation is an erosion followed by a dilation, using the same structuring element for
both operations.
IM2 = imclose(IM,SE)
IM2 = imclose(IM,SE) performs morphological closing on the grayscale or binary
image IM, returning the closed image, IM2. The structuring element, SE, must be a
single structuring element object, as opposed to an array of objects. The
morphological close operation is a dilation followed by an erosion, using the same
structuring element for both operations.
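The effect of the two operations can be seen on a toy binary image. The sketch below is illustrative only: the 9-by-9 test images and the 3-by-3 square structuring element are assumptions made for this example, not the settings used in the text detection system.

```matlab
% Reference object: a 5-by-5 block of foreground pixels.
object = false(9, 9);
object(3:7, 3:7) = true;

noisy = object;
noisy(1, 9) = true;        % add one isolated noise pixel

holed = object;
holed(5, 5) = false;       % punch a one-pixel hole in the block

SE = strel('square', 3);   % 3-by-3 structuring element (assumed size)

opened = imopen(noisy, SE);    % erosion then dilation: removes the noise pixel
closed = imclose(holed, SE);   % dilation then erosion: fills the hole
```

In both cases the result is the clean 5-by-5 block: opening removes foreground details smaller than the structuring element, while closing fills background gaps smaller than it.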
OCR
Recognize text using optical character recognition.
txt = ocr(I) returns an ocrText object containing optical character recognition
information from the input image, I. The object contains recognized text, text
location, and a metric indicating the confidence of the recognition result.
txt = ocr(I, roi) recognizes text in I within one or more rectangular regions. The roi
input contains an M-by-4 matrix, with M regions of interest.
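A minimal usage sketch of the two call forms follows. It assumes the Computer Vision System Toolbox is installed and uses 'businessCard.png', a sample image distributed with that toolbox; the region of interest covering the whole image is chosen only for illustration.

```matlab
% Assumed input: sample image shipped with the toolbox; any image
% containing printed text can be substituted.
I = imread('businessCard.png');

% Recognize text in the whole image.
txt = ocr(I);

% Recognize text inside one rectangular region, given as
% [x y width height]. Here the region simply covers the full image.
roi = [1 1 size(I, 2) size(I, 1)];
txtRoi = ocr(I, roi);

% The returned ocrText object holds the recognized text and confidences.
disp(txtRoi.Text);
disp(txtRoi.WordConfidences);
```

In a detection pipeline the roi rows would normally be the bounding boxes of the detected text regions, so that recognition is restricted to those regions.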
CHAPTER 4
PERFORMANCE ANALYSIS
_____________________________________________________________________
This chapter describes the performance analysis of the proposed system.
_____________________________________________________________________
4.1 DATASETS
The accuracy of the text detection system is measured by three parameters: recall,
precision and F-measure. The system is evaluated using two
different datasets: 1. Sample Dataset; 2. ICDAR 2011. Experiments on
street view, multi-orientation, blurred, and similar foreground and background
images are also demonstrated to show the effectiveness of this method.
4.1.1 Sample Dataset

The Sample Dataset contains 25 different scene images, captured in various places,
under different lighting conditions and with different font sizes, styles, etc. These
images were tested on the text detection system and the results are reported using
recall, precision and F-measure values. Some of the sample images from our own
dataset are shown in the figure below:
Figure 4.1: Sample Dataset images
4.1.2 ICDAR 2011 dataset
The ICDAR 2011 Robust Reading Competition (Challenge 2: Reading Text in
Scene Images) database is a widely used database for benchmarking scene text
detection algorithms. The database contains
training images and testing images. Here, 25 scene images from the ICDAR 2011
dataset are tested on the MSER text detection system, using recall, precision and
F-measure values. Some of the sample images from the ICDAR 2011 dataset are
shown in the figure below:
Figure 4.2: ICDAR 2011 dataset images
4.2 SNAPSHOTS OF THE TEXT DETECTION SYSTEM

Detecting the text regions in a natural scene image involves several stages, which
we have wrapped in a graphical user interface with four buttons. The first button
takes a natural scene image as input; the input image is pre-processed by
converting it to gray scale. The second button shows the result of the MSER function.
The third button shows the character candidates, each enclosed in a bounding
box whether or not it is actually a character. Finally, the
fourth button shows the final text present in the image after non-text regions are
removed with the help of the stroke width transform and morphological operations.
Fig 4.3 shows the GUI of the text detection system.
Fig 4.3 GUI of text detection system
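The first three stages behind the buttons (gray-scale conversion, MSER detection, bounding boxes around character candidates) can be sketched as follows. This is not the project's GUI code: it assumes the Computer Vision System Toolbox, uses the toolbox sample image 'handicapSign.jpg' as a stand-in input, and the 'RegionAreaRange' and 'ThresholdDelta' values are illustrative rather than the settings used in this work.

```matlab
% Assumed input: sample image shipped with the toolbox; any RGB scene
% image can be substituted.
colorImage = imread('handicapSign.jpg');
grayImage  = rgb2gray(colorImage);        % pre-processing: RGB to gray scale

% Detect MSER regions (parameter values are illustrative assumptions).
regions = detectMSERFeatures(grayImage, ...
    'RegionAreaRange', [200 8000], 'ThresholdDelta', 4);

% Bounding box [x y width height] of every candidate region.
numRegions = regions.Count;
bboxes = zeros(numRegions, 4);
for k = 1:numRegions
    xy = double(regions.PixelList{k});    % M-by-2 list of [x y] pixels
    xyMin = min(xy, [], 1);
    xyMax = max(xy, [], 1);
    bboxes(k, :) = [xyMin, xyMax - xyMin + 1];
end

% Display the candidates on the gray-scale image.
figure; imshow(grayImage); hold on;
plot(regions, 'showPixelList', true, 'showEllipses', false);
title('MSER regions (character candidates)');
hold off;
```

At this point the candidates still include non-text regions; removing them is the job of the stroke width transform and morphological operations applied in the final stage.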
To check the robustness of the text detection system, we retrieved text from a variety
of natural scene images. Fig 4.4 to Fig 4.25 show the system applied to scene images
with varying font sizes, rotated images, images with vertical text, noisy images,
blurred images and street view images. The detected text is also shown in the
command window.
Fig 4.4 Scroll bar adjustment for resizing
Fig 4.5 The open window that appears when the get image button is pressed
Fig 4.6 The selected image is opened in the first axis
Fig 4.7 The result of the MSER button press is displayed in the second axis
Fig 4.8 The result of the Character Data button press
Fig 4.9 The result of the Text Data button press
Fig 4.10 The text is displayed in the command window
Fig 4.11 The result of the reset button press
Fig 4.12 The rotate button press for a rotated image
Fig 4.13 The result for the rotated image
Fig 4.14 An image having vertical text
Fig 4.15 The result for the image having vertical text
Fig 4.16 An image having a different font size for each character
Fig 4.17 The result for the image having a different font size for each character
Fig 4.18 Text detection on an image having noise
Fig 4.19 The result for the image having noise
Fig 4.20 Text detection on an image having only numbers
Fig 4.21 The result for the image having only numbers
Fig 4.22 Text detection on a blurred image
Fig 4.23 Detected text of the blurred image
Fig 4.24 Text detection system applied to a street view image
Fig 4.25 Detected text of the street view image
4.3 Accuracy of the system:
Tables 4.1 and 4.2 show the results for the two datasets, in terms of relevant and
retrieved counts and of precision, recall and F-measure. The number of text items in
the input image is taken as the relevant text, and the number of text items detected
by the system as the retrieved text. If the system detects a non-text region as text, it
is counted as a false positive in the output. If the system fails to detect a text region
present in the input image, it is counted as a false negative. The values are
calculated using the equations below:
Precision = (No. of text in input image ∩ No. of detected text in output image) ÷
No. of detected text in output image (4.1)
OR
Precision = TP / (TP + FP) (4.2)
The value of recall is calculated as below:
Recall = (No. of text in input image ∩ No. of detected text in output image) ÷
No. of text in input image (4.3)
OR
Recall = TP / (TP + FN) (4.4)
The F-measure is defined using the values of Precision and Recall calculated
above [10]:
F-measure = 2 × (Precision × Recall) ÷ (Precision + Recall) (4.5)
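As a worked example, the three measures can be computed for a single image. The counts below are those listed for image 9 in Table 4.1 (20 relevant text items, 18 detected, 6 false positives, 2 false negatives); the variable names are chosen only for this sketch.

```matlab
% Counts for image 9 in Table 4.1.
TP = 18;   % text items correctly detected (true retrieved)
FP = 6;    % non-text regions reported as text (false positives)
FN = 2;    % text items the system missed (false negatives)

precision = TP / (TP + FP);                                  % Eq. (4.2)
recall    = TP / (TP + FN);                                  % Eq. (4.4)
fMeasure  = 2 * precision * recall / (precision + recall);   % Eq. (4.5)

fprintf('Precision = %.2f%%\n', 100 * precision);   % 75.00%
fprintf('Recall    = %.2f%%\n', 100 * recall);      % 90.00%
fprintf('F-measure = %.2f%%\n', 100 * fMeasure);    % 81.82%
```

For this image the six false positives pull precision down to 75% while recall stays at 90%, and the F-measure of about 81.8% sits between the two.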
Table 4.1 Result analysis using relevant and retrieved values of sample dataset
and ICDAR 2011
Image   No. of text in   No. of text detected   False retrieved     False
No.     input image      in output image        (False positives)   negatives
        (Relevant)       (True retrieved)
1       42               42                     0                   0
2       7                7                      0                   0
3       13               13                     0                   0
4       5                5                      0                   0
5       17               17                     0                   0
6       12               12                     0                   0
7       5                5                      0                   0
8       12               12                     0                   0
9       20               18                     6                   2
10      6                6                      0                   0
11      33               31                     5                   2
12      5                5                      0                   0
13      3                3                      0                   0
14      11               11                     2                   0
15      32               32                     4                   0
16      12               11                     0                   1
17      8                8                      0                   0
18      10               10                     0                   0
19      14               14                     0                   0
20      25               25                     0                   0
21      18               17                     0                   1
22      39               38                     0                   1
23      53               50                     0                   3
24      34               34                     1                   0
25      20               20                     3                   0
26      19               18                     2                   1
27      10               10                     0                   0
28      118              111                    10                  7
29      55               55                     0                   0
30      8                8                      2                   0
31      20               20                     1                   0
32      43               43                     6                   0
33      64               62                     0                   2
34      16               16                     0                   0
35      11               11                     0                   0
36      15               15                     1                   0
37      25               25                     2                   0
38      25               25                     0                   0
39      26               26                     2                   0
40      10               10                     6                   0
41      9                8                      2                   1
42      13               13                     0                   0
43      13               11                     1                   2
44      37               33                     0                   4
45      23               19                     0                   4
46      5                5                      0                   0
47      9                6                      1                   2
Table 4.2 Result analysis of Own Dataset and ICDAR 2011
Dataset        Precision   Recall   F-measure
Own Dataset    94.81       98.41    96.57
ICDAR 2011     94.23       96.17    95.19
4.4 COMPARATIVE RESULT OF DIFFERENT METHODS
Figure 4.26: Comparison of MSER with other methods by Recall, Precision
and F-measure. (Bar chart; vertical axis ticks from 20 to 120; series: Recall,
Precision, F-measure.)
As shown in the above graph, the result of this text detection system is better than
that of the other methods. This system gives 94.23 precision, 96.17 recall and 95.19
F-measure, which is very high compared to the other methods; these values are
calculated from the values in Table 4.1.
4.5 Chapter Summary.
In this and the preceding chapters we have seen how our method gives better
results compared to other text detection systems. In this chapter we presented
screenshots of the system with different images, for example a noisy image, a blurred
image, vertical text (i.e. a multi-oriented image), a rotated image, and an image having
a different font size for each character. The system gave accurate results on these
images. A chart was also drawn to show the performance of the different systems on
the ICDAR 2011 dataset.
_____________________________________________
CHAPTER 5
CONCLUSION AND FUTURE WORK
This chapter describes conclusion and future work.
_________________________________
5.1 Conclusion.
We have developed a text detection system that detects the text present in
natural scene images and displays as many of the characters as possible on the
command window. To obtain accurate results we used the MSER
technique. The MSER method gives better results than various other techniques
because it easily finds the extremal regions in the input image. It finds most of the
characters in natural scene images, even in the presence of noise, blur, etc. For
non-text elimination we used text measurement rules, the stroke width transform and
morphological operations so that the system gives more accurate results. Finally, to
display the text we used the OCR technique.
Our contribution is a system that detects text and also shows the text present in the
image on the command window more accurately. The system also retrieves text
having different font sizes within an image, as well as text that is oriented.
5.2 FUTURE ENHANCEMENT
The future enhancement of this system is to retrieve multilingual text from natural
scene images, and to detect highly blurred text in low-resolution natural scene
images.
References
[1] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, “Robust Text
Detection in Natural Scene Images”, IEEE transaction on Pattern Analysis And
Machine Intelligence, Vol:36,2013, pp. 970-983.
[2] Yi-Feng Pan, Xinwen Hou, and Cheng-Lin Liu, “A Hybrid Approach to Detect
and Localize Texts in Natural Scene Images”, IEEE Transactions On Image
Processing, Vol. 20, 2011.
[3] Gheware S.D , Nandkishor Dharashive,”Survey on text detection

,segmentationand recognition from a natural scene images “,IJARCCE ,Vol 6.Issue
11,November 2017.
[4] Huizhong Chen, Sam S. Tsai, Georg Schroth, David M. Chen, Radek Grzeszczuk
and Bernd Girod, “Robust Text Detection In Natural Images With Edge-Enhanced
Maximally Stable Ex-tremal Regions”, 18th IEEE international conference on Image
Processing, pp.2609-2612, 2011.
[5] Thuy Ho, Ngoc Ly, “A Scene Text-Based Image Retrieval System”, IEEE
international symposium on Signal Processing and Information Tech., pp. 79-84,
2012.
[6] L. Neumann and J. Matas, “A method for text localization and recognition in real-
world images,” in ACCV 2010, ser.LNCS 6495, vol. IV, November 2010, pp. 2067–
2078.
[7] Aroop Mukherjee, Soumen Kanrar, “Enhancement of Image Resolution by

Binarization”, International Journal of Computer Applications (0975 8887),Volume
10 No.10, 2010.
[8] Teofilo E. de Campos, Bodla Rakesh Babu, Manik Varma, “Character
MSBECL Page 74
Recognition In Natural Images”, International conf. on Intelligence Science and Big

data Engg., pp. 193-200, 2011.
[9] Xiaobing Wang, Yonghang Song, Yuanlin Zhang, “Natural scene text detection in
multi-channel connected component segmentation”, 12th International conf. on
Document Analysis and Recognition, pp. 1375-1379, 2013.
[10] Chucai Yi, Yingli Tian, “Text extraction from scene images by character
appearance and structure modeling”, Elsevier journal on Computer Vision and Image
Understanding, 2013,pp. 182-194
[11] Chitrakala Gopalan, D.Manjula, “Sliding window approach based Text

Binarization from Complex Textual images”, International Journal on Computer
Science and Engineering Vol. 02, No.02, 2010, pp. 309-313
[12] Trung Quy Phan, Palaiahnakote Shivakumara, Souvik Bhowmick, Shimiao Li,
Chew Lim Tan,Umapada Pal, “Semiautomatic Ground Truth Generation for Text
Detection and Recognition in Video Images”, IEEE Trans. Circuits And Systems For
Video Technology, VOL. 24, NO. 8, 2014
[13] Rong-Chi Chang, “Intelligent Text Detection and Extraction from Natural Scene
th
Images”, 15 North- East Asia Symposium on Nano, Information tech. and reliability,
pp.23-28, 2011.
[14] Yao Li, Huchuan Lu, “Scene Text Detection via Stroke Width”, 21st International
Conference on Pattern Recognition, 2012. pp.681-684.
[15] Jonathan Fabrizio, Beatriz Marcotegui, Matthieu Cord , “Text detection in street
level images”,Journal on Pattern analysis applications, 2013, Vol 16, Issue 4, pp. 519-
533.
[16] K. Wang, B. Babenko, and S. Belongie. “End-to-end scene text recognition”.

International conf. on Computer Vision, pp.1457-1464, Vol. 10, 2011.
MSBECL Page 75
[17] Anand Mishra, Karteek Alahari, C.V. Jawahar, “An MRF Model for Binarization
of Natural Scene Text”, IEEE conf on Document Analysis and Recognition,
2011,pp.11-16.
[18] L. Neumann, J. Matas, “Real-time scene text localization and recognition, in

Proc. IEEE Conf.on Computer Vision and Pattern Recognition, 2012, pp. 3538−3545.
[19] Cunzhao Shi, Chunheng Wang, Baihua Xiao, Yang Zhang, Song Gao, “Scene
text detection using graph model built upon maximally stable extremal region”. vol
34, issue 2, 2013, page no. 107-116.
[20] C.P.Sumathi, T.Santhanam, G.Gayathri Devi, “A survey on various approaches of

text extraction in images”, International journal of computer science and engineering
survey, Vol 3, No.4, 2012, pp. 27-42
[21] B.H.Shekar, Smitha M.L., “Skeleton Matching based approach for Text
Localization in Scene Images”, arXiv:1502.03913v1, 2015, pp. 1-12.
[22] Xu-Cheng Yin, Wei-Yi Pei, Jun Zhang, and Hong-Wei Hao, “Multi-Orientation
Scene Text Detection with Adaptive Clustering”, IEEE Transactions On Pattern
Analysis And Machine Intelligence, VOL. 37, NO. 9, 2015, pp. 1930-1937.
[23] Xiaobing Wang, Yonghong Song, Yuanlin Zhang, Jingmin Xin, “Natural scene
text detection with multi-layersegmentation and higher order conditional random field
based analysis”, Else-vier publication on Pattern Recognition Letters 6061, 2015, pp.
41-47.
[24] Rodrigo Minetto , Nicolas Thome , Matthieu Cord, Neucimar J. Leite , Jorge
Stolfi, “Snooper-Text: A text detection system for automatic indexing of urban
scenes”, Elsevier publication on Computer Vision and Image Understanding 122,
2014, pp. 92-104.
[25] Runmin Wang, Nong Sang, Changxin Gao, “Text detection approach based on
confidence map and context information”, Elsevier publication on Neurocomputing
157, 2015, pp. 153-165.
[26] Bowornrat Sriman, Lambert Schomaker, “Object Attention Patches for Text
Detection and Recognition in Scene Images using SIFT”, ICPRAM 2015.
Paper Published
Title: Survey on Text Detection, Segmentation and Recognition from Natural Scene Images
Author Name: 1. Gheware S.D.
             2. Prof. N.G. Dharashive
Name of Journal: International Journal of Advanced Research in Computer and Communication Engineering
ISSN Number: 2278-1021
Volume: Volume 6, Issue 11, November 2017

Title: Accurate Text Detection in Natural Scene Images using MSER Approach
Author Name: 1. Gheware S.D.
             2. Prof. N.G. Dharashive
Name of Journal: International Journal of Advanced Research in Engineering and Technology
ISSN Number: 2320-6802
Volume: Volume 6, Issue VI, June 2018
