CHAPTER 1
INTRODUCTION
1.1 Introduction of Image Processing
Modern digital technology has made it possible to manipulate multi-dimensional
signals with systems that range from simple digital circuits to advanced parallel
computers. The goal of this manipulation can be divided into three categories: image
processing (image in, image out), image analysis (image in, measurements out), and
image understanding (image in, high-level description out).
We will focus on the fundamental concepts of image processing. Space does not permit
us to make more than a few introductory remarks about image analysis. Image
understanding requires an approach that differs fundamentally from the theme of this
book. Further, we will restrict ourselves to two-dimensional (2D) image processing,
although most of the concepts and techniques that are to be described can be extended
easily to three or more dimensions.
Figure: elements of a general-purpose image processing system: digitizer, mass storage, digital computer, display, and operator console.
Figure: fundamental steps of digital image processing: image acquisition, preprocessing, segmentation, and the knowledge base.
(v) Image Restoration: Image restoration is an area that also deals with improving the
appearance of an image. Unlike enhancement, which is subjective, restoration is
objective, in the sense that restoration techniques tend to be based on mathematical or
probabilistic models of image degradation.
(vi) Compression: Compression deals with techniques for reducing the storage required
to save an image or the bandwidth needed to transmit it. Compression is particularly
important for internet applications, where data volumes must be kept small.
(vii) Morphological Processing: Morphological processing deals with tools for extracting
image components that are useful in the representation and description of shape.
(viii) Segmentation: Segmentation procedures partition an image into its constituent parts
or objects. In general, autonomous segmentation is one of the most difficult tasks in
digital image processing. A rugged segmentation procedure brings the process a long way
toward successful solution of imaging problems that require objects to be identified
individually.
(ix) Representation and Description: Representation and description almost always
follow the output of a segmentation stage, which usually is raw pixel data, constituting
either the boundary of a region or all the points in the region itself. Choosing a
representation is only part of the solution for transforming raw data into a form suitable
for subsequent computer processing. Description deals with extracting attributes that
result in some quantitative information of interest or are basic for differentiating one class
of objects from another.
(x) Object Recognition: Recognition is the process that assigns a label, such as
"vehicle," to an object based on its descriptors.
(xi) Knowledge Base : Knowledge may be as simple as detailing regions of an image
where the information of interest is known to be located, thus limiting the search that has
to be conducted in seeking that information. The knowledge base also can be quite
complex, such as an interrelated list of all major possible defects in a materials inspection
problem or an image database containing high-resolution satellite images of a region in
connection with change-detection applications.
1.2.3 Digital computer
A digital computer is an electronic computer in which the input is discrete rather than
continuous, consisting of combinations of numbers, letters, and other characters written
in an appropriate programming language and represented internally in binary notation
(compare: analog computer).
Dept. Of E.C.E, SIETK, PUTTUR
1.3 Applications of Image Processing
Medical field
Remote sensing
Machine/Robot vision
Color processing
Pattern recognition
Video processing
Microscopic Imaging
Others
1.4 Segmentation
The division of an image into meaningful structures, image segmentation, is often
an essential step in image analysis, object representation, visualization, and many other
image processing tasks. In chapter 8, we focused on how to analyze and represent an
object, but we assumed the group of pixels that identified that object was known
beforehand. In this chapter, we will focus on methods that find the particular pixels that
make up an object. A great variety of segmentation methods has been proposed in the past
decades, and some categorization is necessary to present the methods properly here. A
disjunct categorization does not seem to be possible, though, because even two very
different segmentation approaches may share properties that defy singular categorization.
Region-based techniques. These techniques typically work by selecting a seed point
inside an object and then growing outward until it meets the object boundaries.
Clustering techniques. Although clustering is sometimes used as a synonym for
(agglomerative) segmentation techniques, we use it here to denote techniques that
are primarily used in exploratory data analysis of high-dimensional measurement
patterns. In this context, clustering methods attempt to group together patterns
that are similar in some sense. This goal is very similar to what we are attempting
to do when we segment an image, and indeed some clustering techniques can
readily be applied to image segmentation.
B. Sentence segmentation
Sentence segmentation is the problem of dividing a string of written language into
its component sentences. In English and some other languages, using punctuation,
particularly the full stop character, is a reasonable approximation. However, even in
English this problem is not trivial, due to the use of the full stop character for
abbreviations, which may or may not also terminate a sentence. For example, "Mr." is not
its own sentence in "Mr. Smith went to the shops in Jones Street." When processing plain
text, tables of abbreviations that contain periods can help prevent incorrect assignment of
sentence boundaries. As with word segmentation, not all written languages contain
punctuation characters which are useful for approximating sentence boundaries.
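The abbreviation-table approach described above can be sketched in a few lines of Python. The abbreviation set and the whitespace tokenization are illustrative simplifications, not taken from the report:

```python
# Sketch of abbreviation-aware sentence segmentation.
# The abbreviation table below is illustrative, not exhaustive.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "st.", "e.g.", "i.e."}

def split_sentences(text):
    """Split text on full stops, but do not break after known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(".") and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Mr. Smith went to the shops in Jones Street. He bought bread."))
```

Real systems must also handle quotes, ellipses, and sentence-final abbreviations, which this sketch deliberately ignores.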
C. Text segmentation
Topic analysis consists of two main tasks: topic identification and text
segmentation. While the first is a simple classification of a specific text, the latter case
implies that a document may contain multiple topics, and the task of computerized text
segmentation may be to discover these topics automatically and segment the text
accordingly. The topic boundaries may be apparent from section titles and paragraphs. In
other cases, one needs to use techniques similar to those used in document
classification. Segmenting the text into topics or discourse turns can be useful in some
natural language processing tasks: it can improve information retrieval or speech
recognition significantly (by indexing/recognizing documents more precisely, or by
returning the specific part of a document corresponding to the query). It is also needed in
topic detection and tracking systems and in text summarization. Many different
approaches have been tried, e.g., HMMs, lexical chains, passage similarity using word
co-occurrence, and clustering. The task is quite ambiguous: people evaluating text
segmentation systems often differ on where the topic boundaries lie, so evaluation is a
dubious problem as well.
D. Other segmentation problems
Processes may be required to segment text into segments besides those mentioned above,
including morphemes (a task usually called morphological analysis) or paragraphs.
E. Automatic segmentation approaches
Automatic segmentation is the problem in natural language processing of
implementing a computer process to segment text. When punctuation and similar clues
are not consistently available, the segmentation task often requires fairly non-trivial
techniques, such as statistical decision-making, large dictionaries, and consideration of
syntactic and semantic constraints. Effective natural language processing systems and
text segmentation tools usually operate on text from specific domains and sources. For
example, processing text used in medical records is a very different problem from
processing news articles or real estate advertisements. The process of developing text
segmentation tools starts with collecting a large corpus of text in an application domain.
There are two general approaches:
Annotate the sample corpus with boundary information and use machine learning.
Some text segmentation systems take advantage of any markup, like HTML, and known
document formats, like PDF, to provide additional evidence for sentence and paragraph
boundaries.
1.6 Compression
The objective of image compression is to reduce irrelevance and redundancy of
the image data in order to be able to store or transmit data in an efficient form.
1.6.1 Lossy and lossless compression
Image compression may be lossy or lossless. Lossless compression is preferred
for archival purposes and often for medical imaging, technical drawings, clip art, or
comics. Lossy compression methods, especially when used at low bit rates, introduce
compression artifacts. Lossy methods are especially suitable for natural images such as
photographs in applications where minor (sometimes imperceptible) loss of fidelity is
acceptable to achieve a substantial reduction in bit rate. The lossy compression that
produces imperceptible differences may be called visually lossless.
Methods for lossless image compression are:
Entropy encoding
Chain codes
Methods for lossy compression are:
Reducing the color space to the most common colors in the image. The selected
colors are specified in the color palette in the header of the compressed image.
Each pixel just references the index of a color in the color palette; this method can
be combined with dithering to avoid posterization.
Chroma subsampling. This takes advantage of the fact that the human eye
perceives spatial changes of brightness more sharply than those of color, by
averaging or dropping some of the chrominance information in the image.
Fractal compression.
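As an illustration of chroma subsampling, the sketch below averages each 2×2 block of a chroma channel, as a 4:2:0 scheme does. The function name and toy array are illustrative only:

```python
import numpy as np

def subsample_chroma_420(cb):
    """Average each 2x2 block of a chroma channel (4:2:0-style subsampling).
    The eye is less sensitive to color detail than to brightness, so halving
    each chroma dimension this way causes little perceived loss."""
    h, w = cb.shape
    assert h % 2 == 0 and w % 2 == 0
    return cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cb = np.arange(16, dtype=float).reshape(4, 4)
print(subsample_chroma_420(cb))  # each output entry is a 2x2-block mean
```

The luma (brightness) channel would be kept at full resolution, which is exactly the asymmetry the text describes.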
Scalability may also be found in lossless codecs, usually in the form of coarse-to-fine
pixel scans. Scalability is especially useful for previewing images while downloading
them (e.g., in a web browser) or for providing variable-quality access to, e.g., databases.
There are several types of scalability:
Resolution progressive: First encode a lower image resolution; then encode the
difference to higher resolutions.
A bit plane of a digital signal is the set of bits corresponding to a given bit position in
each of the binary numbers representing the signal. For example, for a 16-bit data
representation there are 16 bit planes: the first bit plane contains the set of the most
significant bits, and the 16th contains the least significant bits.
It is possible to see that the first bit plane gives the roughest but the most critical
approximation of the values of the signal, and the higher the number of the bit plane,
the less is its contribution to the final value. Thus, adding a bit plane gives a
progressively better approximation.
If a bit on the nth bit plane of an m-bit dataset is set to 1, it contributes a value of
2^(m-n); otherwise it contributes nothing. Therefore, each bit plane contributes half
the value of the previous bit plane.
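The bit-plane arithmetic above can be checked with a short sketch (function names are illustrative):

```python
import numpy as np

def bit_plane(data, n, m=16):
    """Return bit plane n (1 = most significant) of an m-bit dataset."""
    return (data >> (m - n)) & 1

def reconstruct(data, planes, m=16):
    """Approximate data from the first `planes` bit planes. A bit set on
    plane n contributes 2**(m - n), half the previous plane's weight."""
    acc = np.zeros_like(data)
    for n in range(1, planes + 1):
        acc += bit_plane(data, n, m) << (m - n)
    return acc

x = np.array([40000, 12345, 65535], dtype=np.int64)
for k in (1, 4, 16):
    print(k, reconstruct(x, k))  # coarse-to-fine approximation of x
```

Using all 16 planes reproduces the data exactly, matching the claim that each added plane refines the approximation.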
CHAPTER-2
LITERATURE SURVEY
2.1 MULTISCALE SEGMENTATION FOR MRC DOCUMENT
COMPRESSION USING COST FUNCTION
The Mixed Raster Content (MRC) standard (ITU-T T.44) specifies a framework
for document compression which can dramatically improve the compression/quality
tradeoff as compared to traditional lossy image compression algorithms. The key to
MRC's performance is the separation of the document into foreground and background
layers, represented as a binary mask. In this paper, we propose an integrated segmentation
algorithm based on the sequential application of two algorithms. The first, Cost Optimized
Segmentation (COS), is a block wise segmentation algorithm.
The second algorithm, Connected Component Classification (CCC), refines the
initial segmentation by classifying feature vectors of connected components using a
Markov random field (MRF) model. The integrated COS/CCC segmentation algorithms
are then incorporated into a resolution enhanced rendering (RER) method to achieve
high-quality rendering of documents containing text, pictures, and graphics, while
maintaining the desired compression ratios.
The procedure for Cost Optimized Segmentation (COS) is as follows. The image
is first divided into overlapping blocks. Each block contains m × m pixels, and adjacent
blocks overlap by m/2 pixels in both the horizontal and vertical directions. The blocks are
denoted O_{i,j} for i = 1, …, M and j = 1, …, N, where M and N are the number of
blocks in the vertical and horizontal directions. The pixels in each block are segmented
into foreground ("1") or background ("0") by the clustering method of Cheng and
Bouman. This results in an initial binary mask for each block, denoted C_{i,j} ∈ {0, 1}^(m×m).
However, in order to form a consistent segmentation of the page, these initial block
segmentations must be merged into a single binary mask. To do this, we allow each block
to be modified using a class assignment, s_{i,j} ∈ {0, 1, 2, 3}. By comparison, the most
traditional approach to text segmentation is Otsu's method, which thresholds pixels in an
effort to divide the document's histogram into object and background.
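A minimal sketch of the blockwise step is shown below. The paper's Cheng-Bouman clustering is replaced here by a simple midpoint threshold, so this illustrates only the overlapping-block structure, not the actual clustering:

```python
import numpy as np

def blockwise_segment(img, m=32):
    """Blockwise binary segmentation in the spirit of COS: overlapping
    m x m blocks with stride m // 2, each thresholded into foreground (1)
    and background (0). A midpoint threshold stands in for the paper's
    Cheng-Bouman clustering method."""
    h, w = img.shape
    masks = {}
    for i, y in enumerate(range(0, h - m + 1, m // 2)):
        for j, x in enumerate(range(0, w - m + 1, m // 2)):
            block = img[y:y + m, x:x + m]
            thresh = (block.min() + block.max()) / 2.0  # stand-in for clustering
            masks[(i, j)] = (block < thresh).astype(np.uint8)  # dark text = foreground
    return masks

img = np.full((64, 64), 255.0)
img[10:20, 10:40] = 0.0  # a dark "text" stroke on white paper
masks = blockwise_segment(img)
print(len(masks), masks[(0, 0)].sum())
```

The overlap means every interior pixel is covered by several block masks, which is what makes the subsequent merging (via the class assignments s_{i,j}) necessary.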
method is very simple to understand, computationally inexpensive, and efficient. In the
latter part, we exploit the contextual information using MRF-based post-processing to
improve the results of document segmentation. The rest of the paper is organized as
follows.
2.3.1 Estimating Globally Matched Wavelet Filters
Matched wavelet estimation for any signal is formulated as finding a closed-form
expression for extracting the compactly or infinitely supported wavelet which maximizes
the error norm between the signal reconstructed at the initial scaling subspace and the
successive lower wavelet subspace [1]. At an abstract level, our system uses a set of
trained wavelet filters matched to the text and non-text classes. When a mixed document
(having both text and non-text components) is passed through the text-matched filters, we
get blacked-out regions in the detail (high-pass) space corresponding to the text regions
of the document, and vice versa for the non-text matched wavelet filters. These
blacked-out regions in the output of the text and non-text wavelet filters are used to
classify various regions as either text or non-text.
An approach has been proposed for estimating matched wavelets for a given image. It
is further shown in [1] that estimated wavelets with a separable kernel have a higher peak
signal-to-noise ratio (PSNR) for the same bit-rate as compared with the standard 9/7
wavelet. In this section, we describe a technique for estimating a set of matched wavelets
from a database of images. We term these GMWs. These GMWs are used to generate
feature vectors for segmentation. We discuss their implementation further in subsequent
subsections.
2.3.2 Matched Wavelets & Their Estimation
First, we briefly review the theory of matched wavelets with separable kernels as
previously proposed. Consider a 2-D two-band wavelet system (with separable kernels)
shown in Fig. 2, where the two directions are horizontal and vertical. In a given direction,
the scaling (lowpass) filter is denoted h_0(n) and its dual h̃_0(n), while the wavelet
(highpass) filter is denoted f_1(n) and its dual f̃_1(n). Boxes showing 2 with an upward
or downward arrow denote upsampling and downsampling by two. The output of the
channel that is lowpass in both directions is called the approximation subspace or scaling
subspace, whereas the outputs of the other three channels are called detail subspaces.
This system is designed as a biorthogonal wavelet system, which means that it needs to
satisfy the following conditions for perfect reconstruction of the two-band filter bank:

f_1(n) = (-1)^n h_0(M - n)    (1)
f̃_1(n) = (-1)^n h̃_0(M - n)    (2)

The scaling function φ(t) and wavelet ψ(t) are governed by the two-scale relations for
the two-band wavelet system. Similar equations exist for estimating the dual scaling
function.
The error is defined as

e(x) = a(x) - ã(x)    (3)

where a(x) is the continuous 2-D image signal and ã(x) represents the 2-D image
reconstructed from the detail coefficients only. The corresponding error energy is then
defined as

E = ∫ |e(x)|² dx    (4)
The Fisher classifier is often used for two-class classification problems. Although
it can be extended to multiclass classification (three classes in our case), the
classification accuracy decreases due to the overlap between neighboring classes. Thus,
we need to make some modifications to the Fisher classifiers (explained in the last
section) to apply them in this case [3]. We use three Fisher classifiers, each optimized for
a two-class classification problem (text/picture, picture/background, and
background/text). Each classifier outputs a confidence in its classification, and the final
decision is made by fusing the outputs of all three classifiers.
Fig. 4. Distribution of Y for image and background as obtained from classifier 1. Similar
distributions are obtained for classifiers 2 and 3.
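A toy version of this pairwise-classifier fusion can be sketched as follows. The Gaussian feature data, the class names, and the signed-confidence fusion rule are illustrative assumptions; the paper's classifiers are trained on real document features:

```python
import numpy as np

def fisher_direction(a, b):
    """1-D Fisher discriminant direction for two classes
    (within-class scatter assumed invertible)."""
    sw = np.cov(a.T) + np.cov(b.T)
    return np.linalg.solve(sw, a.mean(0) - b.mean(0))

def pairwise_fisher_predict(x, classes):
    """Fuse three two-class Fisher classifiers (text/picture,
    picture/background, background/text) by summed signed confidences."""
    names = list(classes)
    scores = {n: 0.0 for n in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = classes[names[i]], classes[names[j]]
            w = fisher_direction(a, b)
            mid = w @ (a.mean(0) + b.mean(0)) / 2.0
            conf = w @ x - mid  # positive favors the first class of the pair
            scores[names[i]] += conf
            scores[names[j]] -= conf
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
classes = {"text": rng.normal(0, 1, (50, 2)),
           "picture": rng.normal(4, 1, (50, 2)),
           "background": rng.normal(8, 1, (50, 2))}
print(pairwise_fisher_predict(np.array([7.8, 8.1]), classes))
```

Summing signed confidences is only one possible fusion rule; the key point is that each pairwise classifier avoids the three-way overlap problem noted above.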
2.3.4 MRF POSTPROCESSING FOR DOCUMENT IMAGE SEGMENTATION
Segmentation may lead to overlapping in the feature space. This is especially true
for the picture and background classes, because of the lack of a hard distinction between
the textures of these two classes. We deal with this problem by exploiting the contextual
information around each pixel. A similar approach has recently been used to refine the
results of segmenting handwritten text, printed text, and noise in document images.
The results for the document image segmentation of the previous section show that
misclassification occurs either as isolated clusters of one class appearing inside another
class, or at the boundaries between different classes, as indicated by Fig. 5. Removing
this misclassification is equivalent to making the classification smoother. In this section,
we present an MRF-based postprocessing step that does so.
Figure 2.6: Document Segmentation results obtained for two sample images
The images show that misclassification occurs either at the class boundaries or
because of the presence of small isolated clusters. The problem of correcting the
misclassification belongs to a very general class of problems in vision and can be
formulated in terms of energy minimization. Every pixel must be assigned a label in the
set {text, picture, background}. Here f refers to a particular labeling of the pixels and f_p
refers to the value of the label of a particular pixel p. We consider the first-order MRF
model. This simplifies the energy function to the following form:

E(f) = Σ_{{p,q} ∈ N} V_{p,q}(f_p, f_q) + Σ_{p ∈ P} D_p(f_p)    (5)
The inputs to the algorithm are the classification confidence maps (for image, text, and
background) and the labelings (initial results) obtained in the last section using these
classification confidence maps. Using the initial labelings, the algorithm evaluates the
interaction energy V and minimizes the total energy E(f) to obtain a new labeling. This
step is repeated until no further minimization is possible, finally leaving the resulting
optimized labeling.
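The iterative minimization of E(f) can be illustrated with iterated conditional modes (ICM), a simple greedy minimizer used here as a stand-in for whatever optimizer the paper employs. The Potts interaction and the toy label map are assumptions:

```python
import numpy as np

def icm(labels, data_cost, beta=1.0, sweeps=5):
    """Iterated conditional modes for E(f) = sum V(f_p, f_q) + sum D_p(f_p)
    with a Potts interaction V = beta * [f_p != f_q] on a 4-neighborhood.
    labels: (h, w) int array; data_cost: (h, w, k) per-label data terms."""
    h, w, k = data_cost.shape
    f = labels.copy()
    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                costs = data_cost[y, x].copy()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        costs += beta * (np.arange(k) != f[ny, nx])
                f[y, x] = int(np.argmin(costs))  # greedy local update
    return f

# Noisy 2-label map: the data term prefers label 0 everywhere,
# and one flipped pixel plays the role of an isolated misclassified cluster.
data_cost = np.zeros((5, 5, 2)); data_cost[:, :, 1] = 2.0
noisy = np.zeros((5, 5), dtype=int); noisy[2, 2] = 1
print(icm(noisy, data_cost, beta=1.0).sum())
```

The smoothness term V penalizes label discontinuities, so the isolated cluster is removed, which is exactly the behavior the text attributes to the MRF postprocessing.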
Figure 2.7: Image segmentation and character segmentation.
Other methods are based on separator symbols, a frame of reference, and
horizontal-vertical segmentation. All these methods are very useful as a preprocessing
step for OCR. Some algorithms based on prior knowledge and separator-symbol frames
of reference might not be useful for NP (number plate) segmentation, as it is difficult to
have prior knowledge regarding a vehicle NP in advance. Dynamic programming and
segment confidence-based binary segmentation (SCBS) methods can be really useful for
NP character extraction.
Traditional generative Markov random fields for segmenting images model the
image data and the corresponding labels jointly, which requires extensive independence
assumptions for tractability. We present the conditional random field for an application
in sign detection, using typical scale- and orientation-selective texture filters and a
nonlinear texture operator based on the grating cell. The resulting model captures
dependencies between neighboring image region labels in a data-dependent way that
escapes the difficult problem of modeling image formation, instead focusing effort and
computation on the labeling task. We compare the results of training the model with
pseudo-likelihood against an approximation of the full likelihood with the iterative tree
reparameterization algorithm, and demonstrate improvement over previous methods.
Image segmentation and region labeling are common problems in computer
vision. In this work, we seek to identify signs in natural images by classifying regions
according to their textural properties. Our goal is to integrate with a wearable system that
will recognize any detected signs as a navigational aid to the visually impaired. Generic
sign detection is a difficult problem. Signs may be located anywhere in an image, exhibit
a wide range of sizes, and contain an extraordinarily broad set of fonts, colors,
arrangements, etc. For these reasons, we treat signs as a general texture class and seek to
discriminate such a class from the many others present in natural images.
The value of context in computer vision tasks has been studied in various ways
for many years. Two types of context are important for this problem: label context and
data context. In the absence of label context, local regions are classified independently,
which is a common approach to object detection. Such disregard for the (unknown) labels
of neighboring regions often leads to isolated false positives and false negatives.
The absence of data context means ignoring potentially helpful image data from any
neighbors of the region being classified. Both contexts are simultaneously important. For
instance, since neighboring regions often have the same label, we could penalize label
discontinuity in an image. If such regularity is imposed without regard for the actual data
in a region and local evidence for a label is weak, then continuity constraints would
typically override the local data. Conversely, local region evidence for a "sign" label
could be weak, but a strong edge in the adjoining region might bolster belief in the
presence of a sign at the site because the edge indicates a transition. Thus, considering
both the labels and data of neighboring regions is important for predicting labels. This is
exactly what the conditional random field (CRF) model provides. The advantage of the
discriminative contextual model over a generative one for detection tasks has recently
been shown in [8]. We demonstrate a training method that improves prediction results,
and we apply the model to a challenging real-world task. First, the details of the model
and how it differs from the typical random field are described, followed by a description
of the image features we use. We close with experiments and conclusions.
2.5.1 Image Features for Sign Detection
Text and sign detection has been the subject of much research. Earlier approaches
either use independent, local classifications or use heuristic methods, such as connected
component analysis. Much work has been based on edge detectors or more general
texture features, as well as color. Our approach calculates a joint labeling of image
patches, rather than labeling patches independently, and it obviates layout heuristics by
allowing the CRF to learn the characteristics of regions that contain text. Rather than
simply using functions of single filters (e.g., moments) or edges, we use a richer
representation that captures important relationships between responses to different
scale- and orientation-selective filters.
To measure the general textural properties of both sign and especially non-sign
(hence, background) image regions, we use responses of scale and orientation selective
filters. Specifically, we use previously described filter-response statistics in which
correlations between steerable pyramid responses at different scales and orientations are
the prominent features.
A biologically inspired non-linear texture operator for detecting gratings of bars at
a particular orientation and scale is described. Scale and orientation selective filters, such
as the steerable pyramid or Gabor filters, respond indiscriminately to both single edges
and one or more bars.
Dept. Of E.C.E, SIETK, PUTTUR
29
Figure 2.8: Grating cell data flow for a single scale and orientation.
The two boxes at I, T, and F represent center-on and center-off filters, while the boxes
at M are for the six receptive fields. Using an algorithm that ranks the discriminative
power of random field model features, we found the top three in the edge-less,
context-free MaxEnt model to be (i) the level of green hue (easily identifying vegetation
as background), (ii) the mean grating cell response (easily identifying text), and (iii) the
correlation between a vertically and a diagonally oriented filter of moderate scale (the
single most useful other "textural" feature).
CHAPTER-3
HIGH QUALITY MRC DOCUMENT CODING
3.1 MRC Model
Figure 3.1: MRC imaging model forms text and line art by using a binary mask to
choose between foreground and background layers.
In this work, we focus on the rate distortion optimized segmentation (RDOS)
algorithm. Strictly speaking, the RDOS encoder is not a true MRC encoder, because it
does not encode each layer of the MRC model independently. Nonetheless, the RDOS
method can in principle be modified to be a true MRC method, and the methods
introduced in this work are equally applicable to any typical MRC encoder.
The RDOS algorithm classifies each 8×8 block of pixels into one of four
classes: picture block, two-color block, one-color block, or other block. Each
class corresponds to a different coding method. The picture and other blocks use JPEG
block encoders. The one-color blocks are entropy coded using an arithmetic encoder. For
each two-color block, both the foreground and background colors are entropy coded
using arithmetic encoders, while the 8×8 binary mask is encoded using a JBIG2
encoder. The class of each block is chosen to maximize the rate-distortion performance
over the entire document. The optimization is achieved by applying each candidate
coding method to each block and then selecting the method which yields the best
rate-distortion trade-off.
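The per-block selection can be sketched as a Lagrangian rate-distortion choice. The candidate (rate, distortion) numbers below are invented purely for illustration:

```python
# Hedged sketch of RDOS block classification: each 8x8 block is coded by
# every candidate method, and the class minimizing
# distortion + lambda * rate is kept.
def pick_class(candidates, lam):
    """candidates: {class_name: (rate_bits, distortion)}; returns the
    class with the best rate-distortion trade-off at multiplier lam."""
    return min(candidates, key=lambda c: candidates[c][1] + lam * candidates[c][0])

# Illustrative results for one block (not measured values):
block_results = {
    "picture":   (210.0, 40.0),   # JPEG coder: high rate, low distortion
    "two-color": (60.0, 55.0),    # binary mask plus two colors
    "one-color": (8.0, 400.0),    # single color: tiny rate, large distortion
    "other":     (230.0, 38.0),   # JPEG coder variant
}
print(pick_class(block_results, lam=1.0))
```

Sweeping the multiplier lam trades bit rate against distortion: a large lam pushes blocks toward cheap classes, a small lam toward faithful ones, which is how a target compression ratio can be met.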
trained to identify the characteristic patterns of the RER encoder. Therefore, it can do a
much better job of accurately estimating the true pixel values.
Figure 3.2: Illustration of the MRC encoder and decoder with RER. Examples were
selected from actual RER inputs and outputs.
Figure 3.3 illustrates how the RER encoder and decoder are jointly optimized to
maximize the quality of the decoded document. As we will see, both the encoder and the
decoder have parameters which can be trained to produce the best possible result. The
error diffusion algorithm has a few parameters which control its behavior, and the
nonlinear predictor has a large number of parameters which specify the nodes of a
nonlinear regression tree.
Figure 3.3: Overview of method used to train the optimized encoder and decoder. Once
training is complete, the encoder and decoder function independently.
In each iteration of the optimization, the parameters of the encoder or decoder are
alternately fixed, while the parameters of the other one are optimized. Importantly, two
different sets of documents are used for training the encoder and decoder. We have found
this improves the robustness of the training procedure. Experimental results are shown
for test documents that are not contained in either set of training documents. The
experimental results indicate that this training process robustly converges to parameters
which reduce the distortion of the decoded document. Moreover, we have found that joint
optimization of the encoder and decoder performs substantially better than independent
optimization of these two functions.
3.3.1 The RER Encoder
Let X_s be a pixel in the raster document at location s. In the MRC format, each
pixel also has an associated foreground color, F_s, and background color, B_s. The binary
MRC mask then determines whether F_s or B_s will be used to represent the true pixel
value X_s. In RDOS encoding, the foreground and background colors are constant over
8×8 blocks, but in other MRC encoding methods the values of the foreground and
background colors can change from pixel to pixel. Next, define the scalar value α_s,
which determines the relative mixture of foreground and background color in the pixel
X_s. More specifically, α_s is given by the value on the real line which minimizes the
squared error
Dept. Of E.C.E, SIETK, PUTTUR
35
Figure 3.4 gives a geometric interpretation of α_s as the projection of the true pixel color
onto the line connecting the foreground and background colors, and this projection solves
the least squares approximation problem.
Figure 3.4: Least squares projection.
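Under the mixture model X_s ≈ α_s F_s + (1 − α_s) B_s, the least squares solution is the projection just described. A short sketch follows; the convention that α = 1 corresponds to the foreground is an assumption:

```python
import numpy as np

def alpha(x, f, b):
    """Least-squares mixture coefficient: projection of pixel color x onto
    the line joining background b (alpha = 0) and foreground f (alpha = 1).
    Closed-form minimizer of ||x - (alpha*f + (1 - alpha)*b)||^2."""
    d = f - b
    return float(d @ (x - b)) / float(d @ d)

f = np.array([0.0, 0.0, 0.0])        # foreground: black text
b = np.array([255.0, 255.0, 255.0])  # background: white paper
x = np.array([127.5, 127.5, 127.5])  # mid-gray edge pixel
print(alpha(x, f, b))  # an edge pixel halfway between the two colors
```

Edge pixels, where anti-aliasing mixes text and paper colors, are exactly where α_s takes intermediate values, which is why the predictor described next focuses on mask edges.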
serpentine scan order. Then w_{s,0}, w_{s,1}, w_{s,2}, and w_{s,3} are the values of the
four corresponding error diffusion weights. The values of these four weights are varied at
each pixel s using the formula
This data forms a binary vector, z_s, which is then used as input to a binary regression
tree predictor known as Tree-Based Resolution Synthesis (TBRS). The TBRS predictor
estimates the value of α_s in a two-step process. First, it classifies the vector z_s into one
of M classes using a binary tree classifier. The basic idea of TBRS is to use a binary
regression tree as a piecewise linear approximation to the conditional mean estimator.
The classification step is essential because it can separate out the distinct regions of the
document corresponding to mask edges of different orientation and shape.
One additional complication occurs with the RDOS method. Since it is not a true
MRC encoder, pixels which fall outside of two-color blocks have no binary mask values.
This can cause a problem when the pixel s falls near the boundary of a block and the
5×5 window about the pixel covers part of an adjacent block that is not a two-color block.
In this case, the pixels are classified as 0, 1, or 2 depending on whether they are close to
the background color, the foreground color, or neither color. The values 0, 1, and 2 are
then encoded as the binary values 00, 01, and 10, to ensure that the input vector z_s
remains binary.
3.3.3 Training
The objective of the training process is to optimize the performance of the RER
encoder and decoder by selecting the encoder and decoder parameters to maximize the
decoded document quality over a training set of documents. The distortion metric used to
measure document quality is mean squared error. While mean squared error is not always
a good measure of quality, for this application we found that it was always well correlated
with our subjective evaluation of quality.
The training process alternated between optimization of the encoder and decoder
parameters. So, when optimizing the encoder parameters, the previously obtained
decoder parameters were used; and when optimizing the decoder parameters, the
previously obtained encoder parameters were used. The training phases for the encoder
and decoder used different sets of training data. This strategy seemed to produce more
robust training results. The iterative optimization is always started by optimizing the
decoder using the initial encoder parameters.
Figure 3.7: Comparison of compression results. (a) A portion of the original test image;
(b) Compressed by standard RDOS at 0.184 bpp (130:1 compression ratio); (c)
Compressed by RER enhanced RDOS at 0.182 bpp (132:1 compression ratio).
CHAPTER-4
PROPOSED SYSTEM
Our segmentation method is composed of two algorithms that are applied in
sequence: the cost optimized segmentation (COS) algorithm and the connected
component classification (CCC) algorithm. The COS algorithm is a block wise
segmentation algorithm based upon cost optimization. COS produces a binary image
from a gray level or color document; however, the resulting binary image typically
contains many false text detections.
The CCC algorithm further processes the resulting binary image to improve the
accuracy of the segmentation. It does this by detecting non-text components (i.e., false
text detections) in a Bayesian framework which incorporates a Markov random field
(MRF) model of the component labels. One important innovation of our method is in the
design of the MRF prior model used in the CCC detection of text components. In
particular, we design the energy terms in the MRF distribution so that they adapt to
attributes of the neighboring components' relative locations and appearance. By doing
this, the MRF can enforce stronger dependencies between components which are more
likely to have come from related portions of the document.
Figure 4.1: The COS algorithm comprises two steps: block-wise segmentation and global
segmentation. The parameters of the cost function used in the global segmentation are
optimized in an offline training procedure.
4.1.1 Block-wise Segmentation
Block-wise segmentation is performed by first dividing the image into
overlapping blocks, where each block contains M x M pixels, and adjacent blocks overlap
by M/2 pixels in both the horizontal and vertical directions. The blocks are denoted O_{i,j}
for i = 1, ..., N_v and j = 1, ..., N_h, where N_v and N_h are the number of blocks in the
vertical and horizontal directions, respectively. If the height and width of the input image
are not divisible by M, the image is padded with zeros. For each block, the color axis
having the largest variance over the block is selected and stored in a corresponding gray
image block. The pixels in each block are segmented into foreground (1) or background (0)
by the clustering method of Cheng and Bouman [24]. The clustering method classifies each
pixel in a block by comparing it to a threshold t. This threshold is selected to minimize the
total subclass variance. More specifically, the minimum value of the total subclass
variance is given by

    min_t { ( N0 s0^2(t) + N1 s1^2(t) ) / ( N0 + N1 ) }
where N0 and N1 are the number of pixels classified as 0 and 1 in the block by the
threshold t, and s0^2(t) and s1^2(t) are the variances within each subclass (see Fig. 3).
Note that the subclass variance can be calculated efficiently. First, we create a histogram
by counting the number of pixels which fall into each value between 0 and 255. For each
threshold t, we can then recursively calculate N0, N1, s0^2(t), and s1^2(t) from the values
calculated for the previous threshold t - 1.
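The recursive histogram computation above can be sketched as follows. This is a hedged illustration of the thresholding step, keeping running sums over the histogram so each threshold is updated incrementally from the previous one; the exact weighting used by Cheng and Bouman's clustering method may differ.

```python
def best_threshold(pixels):
    """Pick the threshold minimizing the total subclass variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    total_sum = sum(v * h for v, h in enumerate(hist))
    total_sq = sum(v * v * h for v, h in enumerate(hist))

    best_t, best_var = 0, float("inf")
    n0 = s0 = q0 = 0.0          # count, sum, sum of squares for class 0
    for t in range(256):        # class 0: values <= t, class 1: values > t
        n0 += hist[t]           # recursive update from threshold t - 1
        s0 += t * hist[t]
        q0 += t * t * hist[t]
        n1 = n - n0
        if n0 == 0 or n1 == 0:
            continue
        var0 = q0 / n0 - (s0 / n0) ** 2
        var1 = (total_sq - q0) / n1 - ((total_sum - s0) / n1) ** 2
        total_var = (n0 * var0 + n1 * var1) / n
        if total_var < best_var:
            best_t, best_var = t, total_var
    return best_t

# Two well-separated clusters: the threshold falls between them.
print(best_threshold([10, 12, 11, 200, 201, 199]))  # 12
```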
Figure 4.2: Illustration of block-wise segmentation. The pixels in each block are
separated into foreground (1) or background (0) by comparing each pixel to the threshold t.
4.1.2 Global Segmentation
The global segmentation step integrates the individual segmentations of each
block into a single consistent segmentation of the page. To do this, we allow each block
to be modified using a class assignment, denoted s_{i,j} in {0, 1, 2, 3}.
Notice that for each block, the four possible values of s_{i,j} correspond to four possible
changes in the block's segmentation: original, reversed, all background, or all foreground.
If the block class is "original", then the original binary segmentation of the block is
retained. If the block class is "reversed", then the assignment of each pixel in the block is
reversed. If the block class is set to "all background" or "all foreground", then the pixels
in the block are set to all 0s or all 1s, respectively. Fig. 4 illustrates an example of the
four possible classes, where black indicates a label of 1 (foreground) and white
indicates a label of 0 (background).
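The four class assignments described above can be sketched directly. The class names are taken from the text; the numeric encoding 0..3 and the function name are assumptions for illustration.

```python
ORIGINAL, REVERSED, ALL_BACKGROUND, ALL_FOREGROUND = range(4)

def apply_class(block, cls):
    """Apply one of the four block classes to a binary segmentation."""
    if cls == ORIGINAL:
        return [row[:] for row in block]          # keep segmentation
    if cls == REVERSED:
        return [[1 - p for p in row] for row in block]  # flip every pixel
    if cls == ALL_BACKGROUND:
        return [[0] * len(row) for row in block]  # all 0s
    return [[1] * len(row) for row in block]      # all 1s

block = [[1, 0], [0, 1]]
print(apply_class(block, REVERSED))        # [[0, 1], [1, 0]]
print(apply_class(block, ALL_FOREGROUND))  # [[1, 1], [1, 1]]
```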
As shown, the cost function contains four terms: the first term represents the fit of
the segmentation to the image pixels, and the next three terms represent regularizing
constraints on the segmentation. The values l1, l2, and l3 are model parameters which can
be adjusted to achieve the best segmentation quality. The first term, V1, is the square root of
the total subclass variation within a block given the assumed segmentation. More
specifically, V1 equals g_{i,j} when s_{i,j} is 0 or 1, and equals s_{i,j}'s full standard
deviation when s_{i,j} is 2 or 3,
where s_{i,j} is the standard deviation of all the pixels in the block and g_{i,j} is the
square root of the minimum total subclass variance. Since g_{i,j} must always be less
than or equal to s_{i,j}, the term V1 can always be reduced by choosing a finer segmentation
corresponding to s_{i,j} = 0 or 1 rather than a smoother segmentation corresponding to
s_{i,j} = 2 or 3. The terms V2 and V3 regularize the segmentation by penalizing excessive
spatial variation in the segmentation. To compute the term V2, the number of segmentation
mismatches between pixels in the overlapping region between a block and the horizontally
adjacent block is counted. The term V2 is then calculated as the number of segmentation
mismatches divided by the total number of pixels in the overlapping region. V3 is similarly
defined for vertical mismatches. By minimizing these terms, the segmentation of each block is
made consistent with neighboring blocks. The term V4 denotes the number of pixels
classified as foreground (i.e., 1) in the block divided by the total number of pixels in the
block. This cost is used to ensure that most of the area of the image is classified as
background. For computational tractability, the cost minimization is performed iteratively
on individual rows of blocks, using a dynamic programming approach [37]. Note that the
row-wise approach does not generally minimize the global cost function in one pass through
the image.
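The horizontal consistency term described above can be sketched as the fraction of mismatched pixels in the shared overlap region between two adjacent blocks. The function name is illustrative; both arguments are the binary segmentations restricted to the overlap.

```python
def mismatch_fraction(overlap_a, overlap_b):
    """Fraction of disagreeing pixels between two overlap regions."""
    total = 0
    diff = 0
    for row_a, row_b in zip(overlap_a, overlap_b):
        for a, b in zip(row_a, row_b):
            total += 1
            diff += (a != b)   # bool counts as 0 or 1
    return diff / total

# Half of the four overlap pixels disagree, so the term is 0.5.
print(mismatch_fraction([[1, 0], [1, 1]], [[1, 1], [0, 1]]))  # 0.5
```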
Therefore, multiple iterations are performed from top to bottom in order to
adequately incorporate the vertical consistency term. In the first iteration, the
optimization of the i-th row incorporates the vertical consistency term for the (i-1)-th
row only. Starting from the second iteration, terms for both the (i-1)-th and (i+1)-th
rows are included. The optimization stops when no changes occur to any of the block
classes. Experimentally, the sequence of updates typically converges within 20 iterations.
The cost optimization produces a set of classes for overlapping blocks. Since the
output segmentation for each pixel is ambiguous due to the block overlap, the final COS
segmentation output is specified by the center region of each overlapping block. The
weighting coefficients l1, l2, and l3 were found by minimizing the weighted error between
the segmentation result and the ground truth,

    e = X_MD + d * X_FD

where d is a weighting factor for false detections, and the terms X_MD and X_FD are the
number of pixels in the missed detection and false detection categories, respectively.
Figure 3.4: Illustration of how the component inversion step can correct erroneous
segmentations of text. (a) Original document before segmentation. (b) Result of COS
binary segmentation. (c) Corrected segmentation after component inversion.
at 300 dpi resolution. The component inversion step corrects text segmentation errors that
sometimes occur in COS segmentation when text is locally embedded in a highlighted
region. Fig. 5(b) illustrates this type of error, where text is initially segmented as
background. Notice that the text "100 Years of Engineering Excellence" is initially
segmented as background due to the red surrounding region. In order to correct these
errors, we first detect foreground components that contain more than eight interior
background components (holes).
In each case, if the total number of interior background pixels is less than half of
the surrounding foreground pixels, the foreground and background assignments are
inverted. Fig. 5(c) shows the result of this inversion process. Note that this type of error is
a rare occurrence in the COS segmentation. The final step of component classification is
performed by extracting a feature vector for each component, and then computing a MAP
estimate of the component label. The feature vector, y_i, is calculated for each connected
component, CC_i, in the COS segmentation. Each y_i is a 4-D feature vector which describes
aspects of the i-th connected component, including edge depth and color uniformity.
Finally, the feature vector y_i is used to determine the class label, x_i, which takes a
value of 0 for non-text and 1 for text.
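The component inversion test described earlier (more than eight enclosed holes, with interior background pixels numbering less than half the foreground pixels) can be sketched on a small binary grid. The 4-connectivity and the function names are assumptions; a real implementation would operate per connected component.

```python
def holes(grid):
    """Return (number of enclosed background components, their pixel count)."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]

    def flood(r, c):
        stack, pixels, touches_border = [(r, c)], 0, False
        while stack:
            i, j = stack.pop()
            if not (0 <= i < h and 0 <= j < w):
                continue
            if seen[i][j] or grid[i][j] == 1:
                continue
            seen[i][j] = True
            pixels += 1
            if i in (0, h - 1) or j in (0, w - 1):
                touches_border = True      # open to the outside, not a hole
            stack += [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
        return pixels, touches_border

    count = total = 0
    for r in range(h):
        for c in range(w):
            if grid[r][c] == 0 and not seen[r][c]:
                pixels, border = flood(r, c)
                if not border:
                    count += 1
                    total += pixels
    return count, total

def should_invert(grid):
    """Inversion rule from the text: > 8 holes and few interior pixels."""
    n_holes, hole_pixels = holes(grid)
    fg = sum(sum(row) for row in grid)
    return n_holes > 8 and hole_pixels < fg / 2

# A ring of foreground enclosing one background pixel: one hole, no inversion.
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(holes(ring))  # (1, 1)
```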
The Bayesian segmentation model used for the CCC algorithm is shown in Fig. 6.
The conditional distribution of the feature vector y_i given the label x_i is modeled by a
multivariate Gaussian mixture, while the underlying true segmentation labels are modeled by
an MRF. Using this model, we classify each component by calculating the MAP estimate of the
labels, x, given the feature vectors, y. In order to do this, we first determine which
components are neighbors in the MRF. This is done based upon the geometric distance
between components on the page.
4.2.1 Statistical Model
Here, we describe more details of the statistical model used for the CCC
algorithm. The feature vectors for the text and non-text groups are modeled as
D-dimensional multivariate Gaussian mixture distributions.
The components of the feature vectors include measurements of edge depth and
external color uniformity of the connected component. The edge depth is defined as the
Euclidean distance between RGB values of neighboring pixels across the component
boundary (defined in the initial COS segmentation). The color uniformity is associated
with the variation of the pixels outside the boundary. In this experiment, we defined a
feature vector with four components, where the first two are mean and variance of the
edge depth and the last two are the variance and range of external pixel values. More
details are provided in the Appendix. To use an MRF, we must define a neighborhood
system. To do this, we first find the pixel location at the center of mass of each
connected component. Then for each component we search outward in a spiral pattern
until the k nearest neighbors are found. The number k is determined in an offline training
process along with other model parameters. We will use the symbol d_i to denote the set of
neighbors of connected component i. To ensure all neighbors are mutual, if component j is a
neighbor of component i, we add component i to the neighbor list of component j if this is
not already the case. In order to specify the distribution of the MRF, we first define
augmented feature vectors.
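The neighborhood construction above can be sketched as follows. This is a hedged illustration: a direct nearest-by-distance search on component centers stands in for the spiral search, and the lists are then symmetrized so every neighbor relation is mutual.

```python
def mutual_knn(centers, k):
    """k-nearest-neighbor sets over component centers, made mutual."""
    n = len(centers)
    neighbors = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (centers[i][0] - centers[j][0]) ** 2
                        + (centers[i][1] - centers[j][1]) ** 2,
        )
        neighbors.append(set(order[:k]))
    for i in range(n):            # enforce mutuality
        for j in neighbors[i]:
            neighbors[j].add(i)
    return neighbors

# Component 2 is far away; it picks 1, so 1 gains 2 after symmetrization.
print(mutual_knn([(0, 0), (1, 0), (10, 0)], 1))  # [{1}, {0, 2}, {1}]
```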
The augmented feature vector, z_i, for the i-th connected component consists of the
feature vector y_i concatenated with the horizontal and vertical pixel location of the
connected component's center. We found the location of connected components to be
extremely valuable contextual information for text detection. For more details of the
augmented feature vector, see the Appendix. Next, we define a measure of dissimilarity
between connected components in terms of the Mahalanobis distance of the augmented
feature vectors, given by

    D_{i,j} = [ (z_i - z_j)^T S^{-1} (z_i - z_j) ]^{1/2}

where S is the covariance matrix of the augmented feature vectors.
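The Mahalanobis dissimilarity can be sketched as follows. To keep the sketch free of matrix inversion, a diagonal covariance (per-dimension variances) is assumed; the full measure would use the inverse covariance matrix.

```python
import math

def mahalanobis_diag(z_i, z_j, variances):
    """Mahalanobis distance assuming a diagonal covariance matrix."""
    return math.sqrt(sum((a - b) ** 2 / v
                         for a, b, v in zip(z_i, z_j, variances)))

# With unit variances this reduces to the Euclidean distance.
print(mahalanobis_diag([0, 0], [3, 4], [1, 1]))  # 5.0
```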
Using the defined neighborhood system, we adopted an MRF model with pairwise
cliques. Let C be the set of all pairs {i, j} where i and j denote neighboring connected
components. Then, the labels x are assumed to be distributed according to a Gibbs
distribution whose energy is a sum over the pairwise cliques,
where S(.) is an indicator function taking the value 0 or 1, and a, b, and c are scalar
parameters of the MRF model. As we can see, the classification probability is penalized
by the number of neighboring pairs which have different classes. This number is also
weighted by a term that depends on the dissimilarity: if there exists a similar neighbor
close to a given component, the weighting term becomes large since D_{i,j} is small. This
favors increasing the probability that the two similar neighbors have the same class.
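The exact Gibbs energy cannot be recovered from the text, but its behavior can be sketched with one form consistent with the description: each neighboring pair with different labels pays a penalty that grows as the dissimilarity D(i, j) shrinks. The parameters a and b below are illustrative stand-ins for the scalar MRF parameters mentioned above.

```python
import math

def pairwise_energy(labels, cliques, dist, a=1.0, b=1.0):
    """Sum a distance-weighted penalty over disagreeing neighbor pairs."""
    energy = 0.0
    for i, j in cliques:
        if labels[i] != labels[j]:
            energy += a + b * math.exp(-dist[(i, j)])
        # pairs with equal labels contribute nothing
    return energy

labels = {0: 1, 1: 1, 2: 0}
cliques = [(0, 1), (1, 2)]
dist = {(0, 1): 0.1, (1, 2): 5.0}
# Only the (1, 2) pair disagrees; its large distance keeps the penalty small.
print(round(pairwise_energy(labels, cliques, dist), 3))  # 1.007
```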
The weighting factor d of the error for false detection was fixed to 0.5 for the
multiscale-COS/CCC training process.
test documents on two additional scanners: the HP Photosmart 3300 All-in-One series
and the Samsung SCX-5530 FN.
These test images were used to examine the robustness of the algorithms to
scanner variations. Fig. 4.9 illustrates segmentations generated by Otsu/CCC,
multiscale-COS/CCC/Zheng, DjVu, LuraDocument, COS, COS/CCC, and multiscale-COS/CCC for
a 300 dpi test image. The ground truth segmentation is also shown. This test image
contains many complex features such as different color text, light-color text on a dark
background, and various sizes of text. As shown, COS accurately detects most text
components, but the number of false detections is quite large.
However, COS/CCC eliminates most of these false detections without
significantly sacrificing text detection. In addition, multiscale-COS/CCC generally
detects both large and small text with minimal false component detection. The Otsu/CCC
method misses many text detections. LuraDocument is very sensitive to sharp edges
embedded in picture regions and detects a large number of false components. DjVu also
detects some false components, but the error is less severe than with LuraDocument. The
multiscale-COS/CCC/Zheng result is similar to our multiscale-COS/CCC result, but
our text detection error is slightly lower.
CHAPTER-5
SOFTWARE REQUIREMENTS
The main tools required for this project can be classified into two broad categories:
1) Hardware requirements
2) Software requirements
Built-in functions for complex operations and algorithms (e.g., FFT, DCT)
Algorithm development
MATLAB is an interactive system whose basic data element is an array that does
not require dimensioning. This allows solving many technical computing problems,
especially those with matrix and vector formulations.
The name MATLAB stands for matrix laboratory. MATLAB was originally
written to provide easy access to matrix software developed by the LINPACK and
EISPACK projects. Today, MATLAB uses software developed by the LAPACK and
ARPACK projects, which together represent the state-of-the-art in software for matrix
computation.
MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory and
advanced courses in mathematics, engineering, and science. In industry, MATLAB is the
tool of choice for high-productivity research, development, and analysis.
This is a library that allows writing C and FORTRAN programs that interact with
MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking),
calling MATLAB as a computational engine, and for reading and writing MAT-files.
5.2.3 Basic programming
This part provides a brief introduction to starting and quitting MATLAB, and the
tools and functions that help to work with MATLAB variables and files.
A. Starting MATLAB
On a Microsoft Windows platform, start MATLAB by double-clicking the MATLAB
shortcut icon. On a UNIX platform, start MATLAB by typing matlab at the operating
system prompt. You can change the directory in which MATLAB starts, define startup
options including running a script upon startup, and reduce startup time in some
situations.
B. Quitting MATLAB
To end a MATLAB session, select Exit MATLAB from the File menu in the
desktop, or type quit in the Command Window. To execute specified functions each time
MATLAB quits, such as saving the workspace, create and run a finish.m script.
C. MATLAB Desktop
When MATLAB starts, the MATLAB desktop appears, containing tools
(graphical user interfaces) for managing files, variables, and applications associated with
MATLAB. The first time MATLAB starts, the desktop appears as shown in the following
illustration, although your Launch Pad may contain different entries.
You can change the way the desktop looks by opening, closing, moving,
and resizing the tools in it. You can also move tools outside of the desktop or return them
back inside the desktop (docking). All the desktop tools provide common
features such as context menus and keyboard shortcuts. By selecting Preferences from the
File menu, you can also specify certain characteristics for the desktop tools, for example,
the font characteristics for Command Window text.
D. Desktop Tools
This section provides an introduction to MATLAB's desktop tools. MATLAB
functions can also be used to perform most of the features found in the desktop tools. The
tools are:
Command Window
Command History
Launch Pad
Help Browser
Workspace Browser
Array Editor
Editor/Debugger
i. Command Window
Use the Command Window to enter variables and run functions and M-files.
ii. Command History
The lines entered in the Command Window are logged in the Command History
window. In the Command History, previously used functions can be viewed, and selected
lines can be copied and executed. To save the input and output from a MATLAB session to a
file, use the diary function.
iii. Running External Programs
The exclamation point character ! is a shell escape and indicates that the rest of the
input line is a command to the operating system. This is useful for invoking utilities or
running other programs without quitting MATLAB. On Linux, for example, !emacs
magik.m invokes an editor called emacs for a file named magik.m. When you quit the
external program, the operating system returns control to MATLAB.
iv. Launch Pad
MATLAB's Launch Pad provides easy access to tools, demos, and documentation.
v. Help Browser
Use the Help browser to search and view documentation for all the MathWorks
products. The Help browser is a Web browser integrated into the MATLAB desktop that
displays HTML documents.
To open the Help browser, click the help button in the toolbar, or type helpbrowser
in the Command Window. The Help browser consists of two panes: the Help
Navigator, which is used to find information, and the display pane, used to view the
information.
vi. Help Navigator
Product filter - Set the filter to show documentation only for the products
specified.
Contents tab - View the titles and tables of contents of documentation for the
products.
Index tab - Find specific index entries (selected keywords) in the MathWorks
documentation.
Search tab - Look for a specific phrase in the documentation. To get help for a
specific function, set the Search type to Function Name.
vii. Display Pane
Browse to other pages - Use the arrows at the tops and bottoms of the pages, or
use the back and forward buttons in the toolbar.
Find a term in the page - Type a term in the Find in page field in the toolbar and
click Go.
Other features available in the display pane are: copying information, evaluating a
selection, and viewing Web pages.
viii. Search Path
To determine how to execute the functions you call, MATLAB uses a search path to find
M-files and other MATLAB-related files, which are organized in directories on your file
system. Any file you want to run in MATLAB must reside in the current directory or in a
directory that is on the search path. By default, the files supplied with MATLAB and
MathWorks toolboxes are included in the search path.
ix. Workspace Browser
The MATLAB workspace consists of the set of variables (named arrays) built up
during a MATLAB session and stored in memory. Variables are added to the workspace
by using functions, running M-files, and loading saved workspaces.
To view the workspace and information about each variable, use the Workspace
browser, or use the functions who and whos. To delete variables from the workspace,
select the variable and select Delete from the Edit menu. Alternatively, use the clear
function.
To save the workspace to a file that can be read during a later MATLAB session,
select Save Workspace As from the File menu, or use the save function. This saves the
workspace to a binary file called a MAT-file, which has a .mat extension. There are
options for saving to different formats. To read in a MAT-file, select Import Data from the
File menu, or use the load function.
5.2.4 Array Editor
Start by entering Dürer's matrix as a list of its elements. For this, follow a few basic
conventions: separate the elements of a row with blanks or commas; use a semicolon to
indicate the end of each row; and surround the entire list of elements with square brackets.
To enter Dürer's matrix, type

A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]

This exactly matches the numbers in the engraving. Once the matrix is entered, it is
automatically remembered in the MATLAB workspace, and it can now simply be referred
to as A.
5.2.7 Expressions
Like most other programming languages, MATLAB provides mathematical
expressions, but unlike most programming languages, these expressions involve entire
matrices. The building blocks of expressions are:
Variables
Numbers
Operators
Functions
5.2.8 Variables
MATLAB does not require any type declarations or dimension statements. When
MATLAB encounters a new variable name, it automatically creates the variable and
allocates the appropriate amount of storage. If the variable already exists, MATLAB
changes its contents and, if necessary, allocates new storage. For example, num_students
= 25 creates a 1-by-1 matrix named num_students and stores the value 25 in its single
element. Variable names consist of a letter, followed by any number of letters, digits, or
underscores. MATLAB uses only the first 31 characters of a variable name. MATLAB is
case sensitive; it distinguishes between uppercase and lowercase letters. A and a are not
the same variable. To view the matrix assigned to any variable, simply enter the variable
name.
5.2.9 Numbers
MATLAB uses conventional decimal notation, with an optional decimal point and
leading plus or minus sign, for numbers. Scientific notation uses the letter e to specify a
power-of-ten scale factor. Imaginary numbers use either i or j as a suffix. Some examples
of legal numbers are
3
9.6397238
1i
-99
1.60210e-20
-3.14159j
0.0001
6.02252e23
3e5i
All numbers are stored internally using the long format specified by the IEEE
floating-point standard. Floating-point numbers have a finite precision of roughly 16
significant decimal digits and a finite range of roughly 10^-308 to 10^+308.
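The long format described above is the IEEE 754 double-precision format, which Python's float type also uses, so the same limits can be inspected directly as a cross-check:

```python
import sys

print(sys.float_info.epsilon)  # ~2.22e-16: ~16 significant decimal digits
print(sys.float_info.min)      # ~2.2e-308, the smallest normal double
print(sys.float_info.max)      # ~1.8e+308, the largest finite double
```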
5.2.10 Operators
Expressions use familiar arithmetic operators and precedence rules.
+   Addition
-   Subtraction
*   Multiplication
/   Division
^   Power
'   Complex conjugate transpose
( ) Specify evaluation order
5.3 Functions
MATLAB provides a large number of standard elementary mathematical
functions, including abs, sqrt, exp, and sin. Taking the square root or logarithm of a
negative number is not an error; the appropriate complex result is produced
automatically. MATLAB also provides many more advanced mathematical functions,
including Bessel and gamma functions. Most of these functions accept complex
arguments. For a list of the elementary mathematical functions, type
help elfun
For a list of more advanced mathematical and matrix functions, type
help specfun
help elmat
Some of the functions, like sqrt and sin, are built-in. They are part of the
MATLAB core so they are very efficient, but the computational details are not readily
accessible. Other functions, like gamma and sinh, are implemented in M-files. The code
can be seen and even can be modified. Several special functions provide values of useful
constants.
pi        3.14159265...
i         Imaginary unit, sqrt(-1)
j         Same as i
eps       Floating-point relative precision
realmin   Smallest floating-point number
realmax   Largest floating-point number
Inf       Infinity
NaN       Not-a-number
CHAPTER-6
EXPERIMENTAL RESULTS
CHAPTER-7
CONCLUSION
We presented a novel segmentation algorithm for the compression of raster
documents. While the COS algorithm generates consistent initial segmentations, the CCC
algorithm substantially reduces false detections through the use of a component-wise
MRF context model. The MRF model uses a pair-wise Gibbs distribution which more
heavily weights nearby components with similar features. We showed that the COS/CCC
algorithm achieves greater text detection accuracy with a lower false detection rate than
the other methods evaluated.
REFERENCES
[1] ITU-T Recommendation T.44, Mixed Raster Content (MRC), International
Telecommunication Union, 1999.
[2] G. Nagy, S. Seth, and M. Viswanathan, "A prototype document image analysis system
for technical journals," Computer, vol. 25, no. 7, pp. 10-22, 1992.
[3] K. Y. Wong and F. M. Wahl, "Document analysis system," IBM J. Res. Develop., vol.
26, pp. 647-656, 1982.
[4] J. Fisher, "A rule-based system for document image segmentation," in Proc. 10th Int.
Conf. Pattern Recognit., 1990, pp. 567-572.
[5] L. O'Gorman, "The document spectrum for page layout analysis," IEEE Trans.
Pattern Anal. Mach. Intell., vol. 15, no. 11, pp. 1162-1173, Nov. 1993.
[6] Y. Chen and B. Wu, "A multi-plane approach for text segmentation of complex
document images," Pattern Recognit., vol. 42, no. 7, pp. 1419-1444, 2009.
[7] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Upper Saddle
River, NJ: Pearson Education, 2008.