
1. INTRODUCTION
1.1 Introduction
In recent years, research on intelligent video surveillance has increased noticeably. Foreground object detection is one of the fundamental and critical techniques in this field. Conventionally, background subtraction and temporal differencing have been widely used for foreground extraction with stationary cameras. Continuing improvements in background modeling techniques have led to many new and fascinating applications such as event detection, object behavior analysis, suspicious object detection, and traffic monitoring. However, factors such as dynamic backgrounds and moving shadows can affect the results of foreground detection and complicate the problem. Dynamic backgrounds, for instance, can cause escalators and swaying trees to be detected and treated as foreground regions. Moving shadows, which occur when light is blocked by moving objects, are frequently misclassified as foreground. This project focuses on the study of moving shadows and aims at developing an efficient and robust algorithm for moving shadow removal, along with related applications.
Real-time extraction of moving objects from video sequences is an important topic for various applications of computer vision, including counting cars in traffic, observing traffic patterns, automatic detection of trespassers, video data compression, and analysis of non-rigid motion. A recurring problem in extracting moving objects is that the shadows cast by objects are themselves detected as moving objects; this is one of the major problems to be solved. Methods for shadow detection and for object extraction using chromatic information have been proposed. When a shadow detection process is added to a moving-object extraction method, it becomes difficult to handle the whole process in real time. In contrast, object extraction methods based on chromatic information, to which the RGB color space converts easily, can handle the whole process in real time. However, a simple background subtraction using chromatic information is limited in its ability to address intensity changes.

• Objectives
The project has two objectives. The first is to eliminate shadows from the frames of videos, which assists machine vision in counting or detecting objects separately from their shadows, so that shadows are not counted as moving objects.
The second is to remove shadows from live videos, so that shadows are completely invisible to the user's observation and no details are hidden in the background. This is an important aspect where a user has to monitor videos on a surveillance system for abnormal movement or trespassing.
1.2 Environment to be used
For this project we will use the MathWorks MATLAB environment. It is one of the most suitable software packages for carrying out signal processing operations, especially on images, audio, and video, and for morphing. MATLAB® is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for matrix computation. MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.
The platform for the project has the following specifications:
• INTEL Celeron laptop
• Frequency: 1.5 GHz
• 760 MB RAM
• Windows XP Professional
• MATLAB Version 7.0

2. LITERATURE SURVEY
2.1 Literature Survey
• Regarding studies on the influence of moving shadows, Zhang et al. [1] classified these techniques into four categories: color models, statistical models, textural models, and geometric models. Color models use the difference in color between shaded and non-shaded pixels.
• Cucchiara et al. [2] removed moving shadows using the observation, in HSV color space, that the hue component of shaded pixels varies within a small range while the saturation component decreases more noticeably. They used this to detect moving objects, shadows, and ghosts in video sequences. The detection of shadows was based on the observation that shadows change the lightness of an area significantly without greatly modifying its colour information.
• Some researchers proposed shadow detection methods based on the RGB and normalized-RGB color spaces. Yang et al. [3] observed that the ratio of intensities between a shaded pixel and its neighboring shaded pixel in the current image is close to the corresponding ratio in the background image. They also made use of the slight change of intensities on the normalized R and G channels between the current and background images.
• Cavallaro et al. [4] found that the color components do not change their order and that the photometric invariant features vary only slightly when shadows occur. Besides the color model, the statistical model uses probabilistic functions to determine whether or not a pixel belongs to a shadow.
• Zhang et al. [1] introduced an illumination-invariant feature, then analyzed and modeled shadows as a chi-square distribution. They classified each moving pixel as shadow or foreground object by performing a significance test.
• Song and Tai [5] applied a Gaussian model to represent the constant RGB color ratios, and determined whether a moving pixel belonged to shadow or foreground by setting ±1.5 standard deviations as a threshold.

• Martel-Brisson and Zaccarin [6] proposed the GMSM (Gaussian Mixture Shadow Model) for shadow detection, integrated into a GMM-based background detection algorithm. They tested whether the mean of a distribution could describe a shaded region and, if so, selected that distribution to update the corresponding Gaussian mixture shadow model.
The texture model assumes that the texture of a foreground object is totally different from that of the background, and that texture is distributed uniformly inside the shaded region. Joshi and Papanikolopoulos [7, 8] proposed an algorithm that learns and detects shadows using a support vector machine (SVM). Support vector machines (SVMs) are a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. The original SVM algorithm was invented by Vladimir Vapnik, and the current standard incarnation (soft margin) was proposed by Corinna Cortes and Vladimir Vapnik. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input is a member of, making the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. Intuitively, an SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on. More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier. Although the original problem may be stated in a finite-dimensional space, it often happens that the sets to be discriminated are not linearly separable in that space. For this reason it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space. SVM schemes use a mapping into a larger space so that cross products may be computed easily in terms of the variables in the original space, keeping the computational load reasonable. The cross products in the larger space are defined in terms of a kernel function K(x, y), which can be selected to suit the problem. The hyperplanes in the larger space are defined as the set of points whose cross product with a vector in that space is constant. The vectors defining the hyperplanes can be chosen to be linear combinations, with parameters αi, of images of feature vectors occurring in the data base.

In this way the sum of kernels above can be used to measure the relative nearness of each test point to the data points originating in one or the other of the sets to be discriminated. Note that the set of points x mapped into any hyperplane can be quite convoluted, allowing much more complex discrimination between sets which are far from convex in the original space. Joshi and Papanikolopoulos defined four image features: intensity ratio, color distortion, edge magnitude distortion, and edge gradient distortion. They introduced a co-training architecture in which two SVM classifiers help each other during training, requiring only a small set of labeled shadow samples before the classifiers are trained for different video sequences. Leone et al. presented a shadow detection method using Gabor features. Mohammed Ibrahim and Anupama [9] proposed a method using division image analysis and projection histogram analysis. An image division operation is performed on the current and reference frames to highlight the homogeneous property of shadows; the remaining pixels on shadow boundaries are then eliminated using both column- and row-projection histogram analyses. The geometric model attempts to remove shadowed regions, or the shadowing effect, by observing the geometric information of objects.
• Hsieh et al. [10] used histograms of vehicles and the calculated lane center to detect lane markings, and developed a horizontal- and vertical-line-based method to remove shadows using the characteristics of those markings. This method can become ineffective when no lane markings are present.
2.2 Related Work

Several techniques for background subtraction and shadow detection have been proposed in past years. Background detection techniques may use grayscale or color images, while most shadow detection methods make use of chromaticity information. Some of these techniques are described next. The car tracking system of Koller et al. used an adaptive background model based on monochromatic images filtered with Gaussian and Gaussian-derivative (vertical and horizontal) kernels. McKenna et al. proposed a background model that combines pixel RGB and chromaticity values with local image gradients. In their W4 system, Haritaoglu and collaborators used grayscale images to build a background model, representing each pixel by three values: its minimum intensity value, its maximum intensity value, and the maximum intensity difference between consecutive frames observed during the training period.
Elgammal et al. used a nonparametric background model based on kernel-based estimators, which can be applied to both color and grayscale images. KaewTrakulPong and Bowden used color images for background representation.
In their method, each pixel in the scene is modelled by a mixture of Gaussian distributions (different Gaussians are assumed to represent different colors). Cucchiara's group used temporal median filtering in the RGB color space to produce a background model. Shadow detection algorithms have also been widely explored by several authors, mostly based on invariant color features that are not significantly affected by illumination conditions. McKenna et al. used pixel and edge information in each channel of the normalized RGB color space (or rgb) to detect shadowed pixels. Elgammal et al. also used the normalized rgb color space, but included a lightness measure to detect cast shadows. Prati's and Cucchiara's groups used the HSV color space, classifying as shadows those pixels having approximately the same hue and saturation as the background, but lower luminosity. KaewTrakulPong and Bowden used a chromatic distortion measure and a brightness threshold in the RGB space to determine foreground pixels affected by shadows. Salvador et al. adopted the c1c2c3
photometric invariant color model and explored geometric features of shadows. A few authors have studied shadow detection in monochromatic video sequences, with applications such as indoor video surveillance and conferencing in mind. Basically, they detect the penumbra of the shadow, assuming that edge intensity within the penumbra is much smaller than the edge intensity of actual moving objects. Clearly, this hypothesis does not hold for video sequences containing low-contrast foreground objects (especially in outdoor applications). A review of the literature indicates that several background models are available for color and/or grayscale video sequences, along with several shadow detection algorithms for removing the undesired segmentation of cast shadows. However, in accordance with other authors, we chose a background model based on median filtering, because it is effective and requires less computational cost than Gaussian or other complex statistics.

3. PERFORMANCE ANALYSIS
3.1 Block Diagram of the process
For the purpose of removing the shadows of objects in a video, we will roughly follow the block diagram below. The process shown here is mainly for a fixed reference background image. This is the most basic algorithm: it extracts the foreground by computing the difference in gray-level values between the foreground and the background.

FRAME ACQUISITION → BACKGROUND IMAGE → NORMALISATION → REMOVAL OF SHADOW → MORPHOLOGICAL OPERATION → RECONSTRUCTION

Fig. 3.1 Block Diagram of the shadow removal process


As stated above, the project will be undertaken in stages. First, experiments will be carried out on foreground extraction for the correct detection of moving objects in a video. The algorithm will then be applied for complete removal of shadows in a video, for clarity of view. First, however, it is important to understand the basic steps of the whole process. The block diagram above is explained briefly below.
 Reading of frame from a video sequence
The mmreader function constructs a multimedia reader object that can read video data from a multimedia file. This function is useful because it can read video files in formats that simpler functions such as aviread cannot.
The object created by mmreader is used to read the video file so that frames can be extracted from it and each frame can be processed. The function aviread is used to extract a frame from the video, where videoFile is the path of the movie file and Frameno is the number of the frame to be extracted. The function extracts the frame and stores it in the variable Movie.
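As a minimal sketch of this step (the file name traffic.avi and the frame index are placeholders, not the project's actual data):

    vidObj  = mmreader('traffic.avi');    % multimedia reader object
    nFrames = vidObj.NumberOfFrames;      % total number of frames
    frame   = read(vidObj, 10);           % RGB data of the 10th frame

    Movie  = aviread('traffic.avi', 10);  % aviread alternative for AVI files
    frame2 = Movie.cdata;                 % image data of that frame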

 Background Reference Image

A background reference image is required for the process of shadow removal. Every new frame will be compared against this reference, which yields the outlines of objects and their shadows. The first frame of a video can often be used as the background reference image, but the reference need not be the first frame of the sequence; it may be any other suitable frame. The background reference image is stored in a matrix variable, which is used in the further processing of each frame.
 Normalizing the RGB factor in each frame
First extract the background reference image, which is generally the first frame of the sequence. For normalization we compute, for each pixel:

N = R + G + B

where R, G, and B are the red, green, and blue pixel values and N is their sum, and then:

R = R/N, G = G/N, B = B/N

In this step we preprocess the background reference image and also each subsequent frame of the given video. This is done to remove changes of intensity in any frame: converting an RGB image into normalized RGB removes the effect of intensity variations.
The light reflected from an object depends not only on object colours but also on lighting geometry and illuminant colour. As a consequence, the raw colour recorded by a camera is not a reliable cue for object-based tasks such as recognition and tracking. One solution to this problem is to find functions of image colours that cancel out the dependencies due to illumination. While many invariant functions cancel out the dependency due either to geometry or to illuminant colour, only the comprehensive normalisation has been shown (theoretically and experimentally) to cancel both.
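A minimal sketch of this normalization in MATLAB (assuming frame is a uint8 RGB image already in the workspace):

    rgb = double(frame);                  % work in floating point
    N   = sum(rgb, 3);                    % N = R + G + B for every pixel
    N(N == 0) = 1;                        % guard against division by zero
    normRGB = rgb ./ repmat(N, [1 1 3]);  % divide each channel by the sum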

 Removal of Shadow
For each frame we perform the RGB normalization and then continue processing the image. Each frame of the video is subtracted from the background frame, and this is repeated for every frame in the sequence. A threshold is then applied to each difference frame, yielding a shadow-free image in which only the moving object is shown.
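A minimal sketch of this step (assuming normFrame and normBG are the normalized current frame and background from the previous step; the threshold T is a hand-tuned placeholder):

    diffImg = abs(normFrame - normBG);    % per-pixel deviation from background
    diffMax = max(diffImg, [], 3);        % strongest deviation over r, g, b
    T       = 0.05;                       % example threshold, tuned per video
    fgMask  = diffMax > T;                % binary mask of moving-object pixels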
 Morphological operation
Morphology is a broad set of image processing operations that process images
based on shapes. Morphological operations apply a structuring element to an input image,
creating an output image of the same size. In a morphological operation, the value of
each pixel in the output image is based on a comparison of the corresponding pixel in the
input image with its neighbors. By choosing the size and shape of the neighborhood, you
can construct a morphological operation that is sensitive to specific shapes in the input
image.
We apply dilation to obtain an accurate boundary of the moving object as well as of its shadow. Dilation adds pixels to the boundary of an image to obtain an exact boundary, and in a later step this dilated image is applied to the normal image to obtain the exact boundary of the moving object.
The process of dilation is as follows. The dilation operator takes two pieces of data as inputs. The first is the image which is to be dilated. The second is a set of coordinate points known as a structuring element. It is this structuring element that determines the precise effect of the dilation on the input image. The mathematical definition of grayscale dilation is identical except for the way in which the set of coordinates associated with the input image is derived; in addition, these coordinates are 3-D rather than 2-D. The procedure for implementing dilation on a binary image is as follows.
The dilation is applied to the binary image in a single pass. During the pass, if the pixel at hand equals binary 1, the structuring element is applied to the image with that particular pixel as the origin. This assumes that, in the thresholded image, the background is black and the foreground is white; if the image is not so, it has to be adjusted accordingly.
Dilation can also be applied to gray-level images in a single pass. While passing through the image, the structuring element is applied to each pixel such that the origin of the structuring element coincides with that particular pixel. In this case the corresponding pixel of the output image contains the maximum of the surrounding pixels, where only the pixels covered by the structuring element are compared with each other. Consider the following example:

1 1 1
1 1 1
1 1 1

Fig. 3.2 A binary image      Fig. 3.3 A 3×3 structuring element


The following image shows the effect of the dilation on the above binary image by
the above structuring element:

Fig. 3.4 Effect of dilation


In the above images, the boxes show the pixels: the white boxes indicate that a binary pixel contains 1, while the black boxes indicate that the corresponding pixel contains 0. It can be seen that the hole is removed from the image after applying the dilation. Dilations can be made directional by using less symmetrical structuring elements; for example, a structuring element that is 10 pixels wide and 1 pixel high will dilate in the horizontal direction only. Similarly, a 3×3 square structuring element with the origin in the middle of the top row, rather than at the center, will dilate the bottom of a region more strongly than the top.
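A minimal sketch of these dilation variants (assuming fgMask is the binary mask from the thresholding step):

    se3      = strel('square', 3);          % 3x3 structuring element of ones
    dilated  = imdilate(fgMask, se3);       % standard symmetric dilation

    seH      = strel('rectangle', [1 10]);  % 1 pixel high, 10 pixels wide
    dilatedH = imdilate(fgMask, seH);       % dilates horizontally only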
 Reconstruction of original image and its boundary
After applying all five of the above processes, the image must be reconstructed so that we obtain the original video with only the moving object detected and the shadows no longer detected. To obtain the reconstructed boundary of the object without the shadow region, the object-plus-shadow mask is multiplied point-wise with the dilated object region. Having obtained the boundary of the moving object through this point-wise multiplication, we can determine the extent of the moving object and then apply this boundary to the original frames of the video, showing the boundary of the moving object only.
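A minimal sketch of this reconstruction (assuming fgWithShadow is the raw object-plus-shadow mask, dilatedObj the dilated shadow-free object mask, and frame the original RGB frame; for binary masks the logical AND plays the role of point-wise multiplication):

    objMask  = fgWithShadow & dilatedObj;       % point-wise mask product
    boundary = bwperim(objMask);                % boundary pixels of the object
    outFrame = frame;
    outFrame(repmat(boundary, [1 1 3])) = 255;  % draw the boundary in white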
3.2 Flow chart for foreground extraction based on Gaussian Mixture Model
In this section we introduce the overall architecture of our moving shadow removal algorithm, which consists of five blocks: foreground object extraction, foreground-pixel extraction by edge-based shadow removal, foreground-pixel extraction by gray-level-based shadow removal, feature combination, and the practical applications.

Fig. 3.5 Flow Chart representing the process of Shadow removal


3.2.1 Foreground object extraction
The sequence of gray-level images is taken as the input of the foreground object extraction process, and the moving object with its minimum bounding rectangle is the output of the algorithm. The Gaussian Mixture Model (GMM) can be incorporated here to build the background image, which is a representative approach to background subtraction. Considering all the pros and cons of the two typical approaches, background subtraction is more appropriate than temporal differencing for extracting foreground objects: the former does a better job of extracting all relevant pixels, and this project aims at tackling problems arising in traffic monitoring systems, where cameras are usually fixed. Some previous studies have proposed a standard process of background construction, hence we put a higher premium on the following two parts: foreground-pixel extraction by edge-based and by gray-level-based shadow removal.
 Gaussian Mixture Model (GMM): In statistics, a mixture model is a probabilistic model for density estimation using a mixture distribution; that is, the observations in a mixture model are assumed to be distributed according to a mixture density. A mixture model can be regarded as a type of unsupervised learning or clustering. Mixture models should not be confused with models for compositional data, i.e., data whose components are constrained to sum to a constant value. A typical finite-dimensional mixture model is a hierarchical model consisting of the following components:
 N random variables corresponding to observations, each assumed to be distributed according to a mixture of K components, with each component belonging to the same parametric family of distributions but with different parameters
 N corresponding random latent variables specifying the identity of the mixture component of each observation, each distributed according to a K-dimensional categorical distribution
 A set of K mixture weights, each of which is a probability (a real number between 0 and 1), all of which sum to 1
 A set of K parameters, each specifying the parameters of the corresponding mixture component. In many cases, each "parameter" is actually a set of parameters: for example, observations distributed according to a mixture of one-dimensional Gaussian distributions will have a mean and a variance for each component, while observations distributed according to a mixture of V-dimensional categorical distributions (e.g., when each observation is a word from a vocabulary of size V) will have a vector of V probabilities, collectively summing to 1. Typically a Gaussian mixture model has the following parameters.
K                Number of mixture components
N                Number of observations
θi, i = 1…K      Parameters of the distribution of observations associated with component i
φi, i = 1…K      Mixture weight, i.e., the prior probability of component i
φ                K-dimensional vector composed of all the individual φi; must sum to 1
µi, i = 1…K      Mean of component i
σ²i, i = 1…K     Variance of component i
F(x|θ)           Probability distribution of an observation, parameterized on θ

Table 1 Gaussian Mixture Model Parameters
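As a small illustration of these parameters (not the project's background model itself), the mixture density F(x|θ) of a one-dimensional Gaussian mixture can be evaluated as follows; the weights, means, and variances are made-up numbers:

    phi    = [0.5 0.3 0.2];     % K = 3 mixture weights, summing to 1
    mu     = [0 30 200];        % mean of each component
    sigma2 = [25 100 400];      % variance of each component
    x      = 128;               % an observed gray-level value

    % F(x|theta) = sum over i of phi_i * N(x; mu_i, sigma2_i)
    F = sum(phi .* exp(-(x - mu).^2 ./ (2*sigma2)) ./ sqrt(2*pi*sigma2));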
3.2.2 Foreground pixel extraction by edge-based shadow removal
The main idea of extracting foreground pixels from the edge information of the detected object is that the edges of the object of interest can be identified, and the shadow removed, much more easily if the homogeneity of the shadow region stays within a small range of variance. Correspondingly, we can also obtain edge features for the non-shadow regions. The flowchart of foreground-pixel extraction by edge-based shadow removal is shown in Figure 3.6. Specifically, Sobel operations can be used to extract the edges of both the GMM-based background images and the foreground objects.

Fig. 3.6 Flow chart showing the process of foreground pixel extraction

 Sobel Operation
The Sobel operator performs a 2-D spatial gradient measurement on an image and
so emphasizes regions of high spatial frequency that correspond to edges. Typically it is
used to find the approximate absolute gradient magnitude at each point in an input
grayscale image. The operator consists of a pair of 3×3 convolution kernels as shown in
Figure 3.7. One kernel is simply the other rotated by 90°.

Figure 3.7 Sobel convolution kernels

Here Gx is the vertical Sobel kernel and Gy is the horizontal Sobel kernel.
These kernels are designed to respond maximally to edges running vertically and
horizontally relative to the pixel grid, one kernel for each of the two perpendicular
orientations. The kernels can be applied separately to the input image, to produce
separate measurements of the gradient component in each orientation (call these Gx and
Gy). These can then be combined together to find the absolute magnitude of the gradient
at each point and the orientation of that gradient. The gradient magnitude is given by:

|G| = √(Gx² + Gy²)

Typically, an approximate magnitude is computed using:

|G| = |Gx| + |Gy|

which is much faster to compute.


The angle of orientation of the edge (relative to the pixel grid) giving rise to the spatial gradient is given by:

θ = arctan(Gy / Gx)

In this case, orientation 0 is taken to mean that the direction of maximum contrast from black to white runs from left to right on the image, and other angles are measured anti-clockwise from this.
Often, this absolute magnitude is the only output the user sees; the two components of the gradient are conveniently computed and added in a single pass over the input image using the pseudo-convolution operator shown in Figure 3.8.

Figure 3.8 Pseudo-convolution kernels used to quickly compute approximate gradient magnitude

Using this kernel, and labelling the pixels of the 3×3 neighborhood P1 through P9 row by row, the approximate magnitude is given by:

|G| = |(P1 + 2·P2 + P3) − (P7 + 2·P8 + P9)| + |(P3 + 2·P6 + P9) − (P1 + 2·P4 + P7)|
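A minimal sketch of the Sobel operation (assuming grayImg is a gray-level image of class double; the edge threshold is a placeholder):

    Gx = [-1 0 1; -2 0 2; -1 0 1];        % responds to vertical edges
    Gy = [ 1 2 1;  0 0 0; -1 -2 -1];      % responds to horizontal edges

    gradX   = conv2(grayImg, Gx, 'same'); % gradient component along x
    gradY   = conv2(grayImg, Gy, 'same'); % gradient component along y
    gradMag = sqrt(gradX.^2 + gradY.^2);  % exact gradient magnitude
    gradApx = abs(gradX) + abs(gradY);    % faster approximate magnitude
    edgeMap = gradMag > 100;              % example threshold for edge pixels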
 Image subtraction
One image can be subtracted from another, pixel by pixel. Where the pixels have the same value, the resulting pixel is 0; where the two pixels differ, the resulting pixel is the difference between them. Image subtraction is used in motion detection, i.e., to detect object motion between two or more images. Typically, two images with the same background scene are subtracted, yielding an image that shows only the difference (the object in motion).
 Boundary Elimination
The outer borders have to be removed for the following reasons. Because the shadow region and the real foreground object share the same motion vectors, they are always adjacent to each other. Also, the interior of a shadowed region should be homogeneous (textureless or edgeless) while that of a foreground object should be non-homogeneous (with dominant edges), which implies that the edges contributed by shadows appear at the outer borders of the foreground objects. Considering these two properties, the objective of removing shadows can be treated as eliminating the outer borders while preserving the remaining edges, which belong to the real foreground objects. Note, however, that the latter property may not always be satisfied.
We use a mask to achieve boundary elimination. The selection of the mask depends entirely on the size of the border of the foreground in question. If the region covered by the mask belongs completely to foreground objects, we keep the point; otherwise, we eliminate it. After applying the outer boundary elimination, we obtain the features for the non-shadow pixels.
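A minimal sketch of this mask test (assuming edgeMap is the Sobel edge image and fgMask the foreground mask; the mask width w is a tuning parameter):

    w     = 5;                        % mask size, chosen to match border width
    inner = imerode(fgMask, ones(w)); % true only where the whole mask region
                                      % lies inside the foreground
    keptEdges = edgeMap & inner;      % edge points away from the outer borders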
3.2.3 Gray level based shadow removal foreground pixel extraction
Figure 3.9 shows the flowchart of gray-level-based shadow removal for foreground pixel extraction. Pixels belonging to shadow-potential regions are selected from the foreground objects, their darkening factors are calculated, and a Gaussian model is built for each gray level. Once the Gaussian model is trained, it is used to determine whether each pixel inside the foreground objects belongs to the shadowed region or not.

Fig. 3.9 Gray level based shadow removal foreground pixel extraction

 Gaussian Darkening Factor Model Updating


We model the darkening factor with one Gaussian model per gray level, rather than one per pixel. To begin with, we select the shadow-potential pixels as the updating data of the Gaussian models using the three predefined conditions introduced below.
• Pixels must belong to the foreground objects, since ideally the shadowed pixels are part of the foreground.
• The intensity of a pixel in the current frame must be smaller than that in the background frame, since shadowed pixels must be darker than background pixels.
• The pixels obtained from the foreground-pixel extraction by edge-based shadow removal are excluded, to reduce the number of pixels that might be classified as non-shadowed pixels.

After the pixels for updating are selected, we update the mean and standard deviation of the Gaussian model. Figure 3.10 displays the flowchart of the updating process of the Gaussian darkening factor model: after calculating the darkening factor, we update the Gaussian model.

Fig. 3.10 Gaussian darkening factor model updating procedure


A threshold has to be set as a minimum number of updates, and the update count of each Gaussian model must exceed this threshold to ensure the stability of the model. In addition, to reduce the computational load of the updating procedure, we limit each Gaussian model to at most 200 updates per frame.
 Determination of non-shadowed pixels
Here we describe how to extract the non-shadowed pixels using the trained Gaussian darkening factor model. Following the rules in Figure 3.10, we calculate the difference between the mean of the Gaussian model and the darkening factor, and check whether the difference is smaller than 3 times the standard deviation. If so, the pixel is classified as shadowed; otherwise, it is considered non-shadowed and can be kept as a feature point. Figure 3.11 describes the procedure for determining the non-shadowed pixels. If the Gaussian model for a gray level has not been trained, we check whether nearby Gaussian models are marked as trained. In our programs, we examine the nearest 6 Gaussian models and, if any trained model exists among them, choose the nearest one.
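A minimal sketch of the per-gray-level update and the 3-standard-deviation test (assuming bg and cur are the gray-level background and current frames, candidates is the mask of shadow-potential pixels selected above, and alpha is an assumed update rate):

    alpha = 0.05;                  % assumed learning rate for the update
    mu    = ones(256, 1);          % mean darkening factor per gray level
    sigma = 0.5 * ones(256, 1);    % standard deviation per gray level

    [rows, cols] = find(candidates);
    for k = 1:numel(rows)
        g = double(bg(rows(k), cols(k)));               % background gray level
        d = double(cur(rows(k), cols(k))) / max(g, 1);  % darkening factor
        idx        = round(g) + 1;                      % gray 0..255 -> 1..256
        mu(idx)    = (1 - alpha) * mu(idx) + alpha * d;
        sigma(idx) = sqrt((1 - alpha) * sigma(idx)^2 + alpha * (d - mu(idx))^2);
        % classified as shadowed if within 3 standard deviations of the mean
        isShadow = abs(d - mu(idx)) < 3 * sigma(idx);
    end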

Fig. 3.11 Non shadowed pixel determination

3.2.4 Feature combination


The two kinds of features introduced in Sections 3.2.2 and 3.2.3 are combined in our algorithm to extract foreground objects more accurately. Figure 3.12 exhibits the flowchart of feature combination.

Fig. 3.12 Flow chart for feature combination

 Noise filtering and dilation


Opening and Closing: The opening filter performs an erosion followed by a dilation (see above). In images containing bright objects on a dark background, the opening filter smooths object contours, breaks (opens) narrow connections, eliminates minor protrusions, and removes small bright spots. In images with dark objects on a bright background, the opening filter fills narrow gaps between objects. The closing filter is a morphological filter that performs a dilation followed by an erosion. In images containing dark objects on a bright background, the closing filter smooths object contours, breaks narrow connections, eliminates minor protrusions, and removes small dark spots. In images with bright objects on a dark background, the closing filter fills narrow gaps between objects.
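A minimal sketch of noise filtering followed by dilation (assuming combinedMask is the binary mask produced by the feature combination):

    se      = strel('square', 3);        % 3x3 structuring element
    cleaned = imopen(combinedMask, se);  % opening removes small speckles
    cleaned = imclose(cleaned, se);      % closing fills small holes and gaps
    finalFG = imdilate(cleaned, se);     % dilation restores object extent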
3.3 Proposed algorithm
In this section we describe the W4 background model [11] given by I. Haritaoglu, D. Harwood, and L. Davis in "W4: Real-time surveillance of people and their activities". The proposed algorithm is a small improvement over that model. We also present a novel method for shadow segmentation of foreground pixels, based on normalized cross-correlations and pixel ratios.
3.3.1 Background scene modelling
W4 uses a model of background variation given by a bimodal distribution constructed from order statistics of background values during a training period, obtaining a robust background model even if there are moving foreground objects in the field of view, such as walking people or moving cars. It uses a two-stage method based on excluding moving pixels from the background model computation. In the first stage, a pixel-wise median filter over time is applied to several seconds of video (typically 20-40 seconds) to distinguish moving pixels from stationary pixels (however, our experiments showed that 100 frames ≈ 3.4 seconds are typically enough for the training period, if not too many moving objects are present). In the second stage, only those stationary pixels are processed to construct the initial background model. Let V be an array containing N consecutive images, Vk(i, j) be the intensity of pixel (i, j) in the k-th image of V, and σ(i, j) and λ(i, j) be the standard deviation and median value of the intensities at pixel (i, j) over all images in V, respectively. The initial background model for a pixel (i, j) is formed by a three-dimensional vector: the minimum m(i, j) and maximum n(i, j) intensity values and the maximum intensity difference d(i, j) between consecutive frames observed during this training period. This condition guarantees that only stationary pixels are computed into the background model, i.e., Vz(i, j) is classified as a stationary pixel. After the training period, an initial background model B(i, j) is obtained. Then each input image It(i, j) of the video sequence is compared to B(i, j), and a pixel (i, j) is classified as a background pixel if:

|It(i, j) − m(i, j)| ≤ kµ  or  |It(i, j) − n(i, j)| ≤ kµ        (2)
where µ is the median of the largest inter-frame absolute difference image d(i, j), and k is a fixed parameter (the authors suggested the value k = 2). Note that if a certain pixel (i, j) has an intensity m(i, j) < It(i, j) < n(i, j) at a certain frame t, it should be classified as background (because it lies between the minimum and maximum values of the background model). However, Equation (2) may wrongly classify such a pixel as foreground, depending on k, µ, m(i, j), and n(i, j). For example, if µ = 5, k = 2, m(i, j) = 40, n(i, j) = 65, and It(i, j) = 52, Equation (2) would classify It(i, j) as foreground, even though it lies between m(i, j) and n(i, j). To solve this problem, we propose an alternative test for foreground detection, and classify It(i, j) as a foreground pixel if:

It(i, j) < (m(i, j) − kµ)  or  It(i, j) > (n(i, j) + kµ)        (3)
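A minimal sketch of this test (assuming m, n, and d are the per-pixel minimum, maximum, and maximum inter-frame difference images from the training period, and It is the current gray-level frame, all of class double):

    k  = 2;                 % parameter value suggested by the authors
    mu = median(d(:));      % median of the largest inter-frame differences

    % Equation (3): foreground lies outside the band [m - k*mu, n + k*mu]
    fgMask = (It < m - k*mu) | (It > n + k*mu);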
3.3.2 Shadow identification
In shadowed regions, a certain fraction α of the incoming light is expected to be blocked. Although several factors may influence the intensity of a pixel in shadow, we assume that the observed intensity of shadowed pixels is directly proportional to the incident light; consequently, shadowed pixels are scaled (darker) versions of the corresponding pixels in the background model. As noticed by other authors, the normalized cross-correlation (NCC) is useful for detecting shadow pixel candidates, since it can identify scaled versions of the same signal. In this work, we use the NCC as an initial step for shadow detection, and refine the result using local statistics of pixel ratios, as explained next.
• Detection of shadow pixel candidates
Let B(i, j) be the background image formed by temporal median filtering, and I(i, j) be an image of the video sequence. For each pixel (i, j) belonging to the foreground, consider a (2N + 1) × (2N + 1) template Tij such that Tij(n, m) = I(i + n, j + m), for −N ≤ n ≤ N, −N ≤ m ≤ N (i.e., Tij corresponds to a neighborhood of pixel (i, j)). The NCC between template Tij and image B at pixel (i, j) is then given by:

NCC(i, j) = ER(i, j) / (EB(i, j) · ETij)        (4)

where

ER(i, j) = Σn Σm B(i + n, j + m) · Tij(n, m),
EB(i, j) = √( Σn Σm B(i + n, j + m)² ),   ETij = √( Σn Σm Tij(n, m)² )        (5)

with both sums running over −N ≤ n, m ≤ N. For a pixel (i, j) in a shadowed region, the NCC in a neighboring region Tij should be large (close to one), and the energy ETij of this region should be lower than the energy EB(i, j) of the corresponding region in the background image. Thus, a pixel (i, j) is pre-classified as shadow if:

NCC(i, j) ≥ Lncc  and  ETij < EB(i, j)        (6)
where Lncc is a fixed threshold. If Lncc is low, several foreground pixels belonging to moving objects may be misclassified as shadows. On the other hand, a larger value of Lncc results in fewer false positives, but pixels belonging to actual shadows may be missed.
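A minimal sketch of this pre-classification (assuming I and B are gray-level images of class double, fgMask is the foreground mask, and N and Lncc are tuning parameters):

    N = 2; Lncc = 0.95;                  % assumed window half-size and threshold
    cand = false(size(I));
    [rs, cs] = find(fgMask);
    for k = 1:numel(rs)
        i = rs(k); j = cs(k);
        if i <= N || j <= N || i > size(I,1) - N || j > size(I,2) - N
            continue;                    % skip pixels too close to the border
        end
        T  = I(i-N:i+N, j-N:j+N);        % template around (i, j)
        Bn = B(i-N:i+N, j-N:j+N);        % same region in the background
        ER  = sum(sum(Bn .* T));
        EB  = sqrt(sum(sum(Bn.^2)));
        ET  = sqrt(sum(sum(T.^2)));
        ncc = ER / max(EB * ET, eps);    % guard against division by zero
        cand(i, j) = (ncc >= Lncc) && (ET < EB);   % Equation (6)
    end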
• Shadow refinement
The NCC provides a good initial estimate of the location of shadowed pixels, by detecting pixels whose surrounding neighborhood is approximately a scaled version of the reference background. However, some pixels belonging to valid moving objects may still be wrongly classified as shadow pixels. To remove such false positives, a refinement stage is applied to all pixels that satisfy Equation (6). The proposed refinement verifies whether the ratio I(i, j)/B(i, j) is approximately constant in a neighborhood around each shadow pixel candidate, by computing the standard deviation of I(i, j)/B(i, j) within this neighborhood. More specifically, we consider a region R of (2M + 1) × (2M + 1) pixels centered at each shadow pixel candidate (i, j), and classify it as a shadow pixel if:

stdR(I(i, j)/B(i, j)) < Lstd  and  Llow ≤ I(i, j)/B(i, j) < 1        (7)

where stdR(I(i, j)/B(i, j)) is the standard deviation of the quantities I(i, j)/B(i, j) over the region R, and Lstd, Llow are thresholds. More precisely, Lstd controls the maximum deviation within the neighbourhood being analyzed, and Llow prevents the misclassification of dark objects with very low pixel intensities as shadowed pixels.
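A minimal sketch of the refinement (assuming cand is the candidate mask from the previous step; stdfilt, from the Image Processing Toolbox, computes the local standard deviation over each window):

    M = 2; Lstd = 0.05; Llow = 0.5;          % assumed window size and thresholds
    ratio    = I ./ max(B, 1);               % pixel ratio, guarded against B = 0
    localStd = stdfilt(ratio, ones(2*M+1));  % std of the ratio per (2M+1) window

    % Equation (7): nearly constant ratio, strictly darker but not too dark
    shadowMask = cand & (localStd < Lstd) & (ratio >= Llow) & (ratio < 1);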

4. CONCLUSION
4.1 Conclusion
In this report we have presented some of the techniques that can be used for shadow removal from videos or image sequences. The actual project implementation will be carried out in the following steps:
• Testing the algorithm on still gray-level images
• Applying the algorithm to a gray-level image sequence
• Extracting the foreground from a fixed background reference image
• Applying the algorithm to remove shadows in real-time surveillance systems

The algorithm given above in Section 3.1 is based on the RGB color model for detecting shadows in a video sequence. Compared with the corresponding area in the background of a video sequence, the red, green, and blue values in a shadow all decrease, while their relative RGB proportions remain comparatively steady and constant. The proposed algorithm removes shadows simply and effectively.
We have also presented a real-time and efficient moving shadow removal algorithm based on versatile uses of the GMM in Section 3.2, including background removal and the development of features using Gaussian models. This algorithm innovatively uses the homogeneous property inside shadowed regions, and hierarchically detects foreground objects by extracting edge-based and gray-level-based features and combining them. Our approach is characterized by several original procedures, such as pixel-by-pixel maximization, subtraction of edges from background images in the corresponding regions, adaptive binarization, boundary elimination, the automatic selection mechanism for shadow-potential regions, and the Gaussian darkening factor model for each gray level.
Among these proposed procedures, pixel-by-pixel maximization and the subtraction of edges from background images in the corresponding regions deal with the problems caused by shadowed regions that contain edges. Adaptive binarization and boundary elimination are developed to extract the foreground pixels of non-shadowed regions. Most significantly, we propose the Gaussian darkening factor model for each gray level to extract non-shadow pixels from foreground objects using gray-level information, and we integrate all the useful features to locate the real objects without shadows. Finally, in comparison with previous approaches, the experimental results show that the proposed algorithm can accurately detect and locate foreground objects in different scenes and with various types of shadows. We will apply the presented algorithm to vehicle counting to prove its capability and effectiveness. The algorithm indeed improves the results of vehicle counting, and it has also been verified to be efficient, with a prompt processing speed.
4.2 Future Scope
The above algorithm works on videos with a normal background and on stable videos from a fixed camera. It can be modified to work on videos with complex backgrounds as well as on videos that are not stable. Unstable videos can first be stabilized using a video stabilizer, after which the algorithm can be applied; such a stabilizer can be implemented in the future. Also, by extending the real-time shadow removal algorithm, shadows can be removed from videos completely, assisting a human operator monitoring a surveillance system to visualize the objects clearly.
