Literature Survey Report


REPORT OF LITERATURE REVIEW ON COST EFFECTIVE COMPUTER VISION
ROBOT
Adarsh S Hegde 1 | Anshul Kumar 2 | Harshit S Badiger 3
Guided By: Mr. H.D. Kattimani

1 U.G. Student, Department of Electrical and Electronics Engineering, BMSIT&M, Bangalore, Karnataka, India
2 U.G. Student, Department of Electrical and Electronics Engineering, BMSIT&M, Bangalore, Karnataka, India
3 U.G. Student, Department of Electrical and Electronics Engineering, BMSIT&M, Bangalore, Karnataka, India

Dept. of EEE, BMSIT&M Adarsh S Hegde | Anshul Kumar | Harshit S Badiger



ABSTRACT
Computer vision has a long history of research, both in core techniques and in applications. Its foundations draw on knowledge from many subjects, and it finds applications in several domains. The main uses of machine vision are quality assurance, sorting, material handling and robotic guidance. Although computer vision has long been used in industry, recent high-volume production and regulatory demands for quality have made computer vision even more important, to the point of becoming an obligation in industry.

Literature Review
INTRODUCTION
Increased consumption and growing awareness of quality and safety have created a demand for improved quality in consumer products. The demand for user-specific customisation and the increase in competition have raised the need for cost reduction. This can be achieved by increasing the quality of products, reducing wastage during production, allowing flexibility in customisation and speeding up production. Human-based quality control is no longer adequate beyond a certain volume of production. At higher levels of production, it is important to have a system that simulates human actions. A vision system can be viewed as such a simulated system, combining the human eye (camera) and human intelligence (computer).

Machine Vision (MV) is the technology used to provide image-based analysis for
applications such as automatic inspection, process control and robot guidance in
industry. Vision Sensors/Machine Vision Systems analyse images to perform
appearance inspections, character inspections, positioning, and defect inspections. The
machine vision systems can be used in a wide range of applications because of their
flexibility and versatile features. The use of vision systems in inspection and motion
control applications imposes several real-time constraints on image processing.
However, constantly increasing performances and decreasing costs of machine vision
software and hardware make vision measuring systems more advantageous than the
conventional measuring systems. These vision systems can be used to precisely measure
variables such as distance, angle, position, orientation, colour, etc. The main advantage
of a machine vision-based system is its non-contact inspection principle, which is
important in the cases where it is difficult to implement contact measurements.

Machine vision technology also helps to achieve better productivity and aids in overall quality management, giving adopters a prominent competitive advantage over industries that do not implement vision systems. The scope of vision-based systems is not limited to the fields described here; it extends widely to many more industries, such as welding, where machine vision is used to identify and classify weld defects in environments where human inspection is not efficient. With the


advancements in this field, computer vision now extends even to human gait recognition systems.

The main aim of this paper is to study the various models implemented using computer
vision for object detection and automation, their design, implementations, problems
faced and desired practical outcomes and compare various models. Various design
models which are similar to our needs in terms of design and algorithm implementations
are found to be as follows:
1. Rapid Object Detection using a boosted cascade of simple features
2. Object Detection Using Image Processing
3. Study on Object Detection using OpenCV-Python

PROBLEMS IN EXISTING SYSTEM


Certain companies, such as Keyence, Cognex Corporation, Basler AG and Matrix Vision GmbH, develop and manufacture machine vision systems, software, sensors and digital cameras for industrial applications, traffic systems, medical devices and the video surveillance market. The global machine vision market is expected to reach USD 15.46 billion by the end of 2022, with an 8.18% CAGR during the forecast period 2017-2022. The machine vision market is growing in all regions; increasing application areas year on year, together with advancement in technology and integration, is driving the market on a global scale. Asia Pacific dominates the global market with more than 30% of market share, followed by Europe as the second biggest market due to heavy demand from the automotive and healthcare industries; North America stands third. The companies described above hold a near-monopoly in the vision system market. They have their own proprietary software, which is patented and practically impossible to copy, so manufacturing and other industrial companies have no choice but to purchase these systems from them. The problem is that these systems are very expensive: they start from a minimum of 4 lakh Indian rupees, and that is for the vision system alone. The cost of other components, such as robotic arms, must also be taken into account, which adds another fortune, and there is also a need for regular maintenance and updates of such systems.
As India is a cost-oriented country, purchases are driven mainly by the cost of a product, sometimes regardless of its quality. For a small-scale manufacturing company, it would cost a fortune to acquire the kind of machine vision system necessary for its day-to-day operation.


RAPID OBJECT DETECTION USING A BOOSTED CASCADE OF SIMPLE FEATURES

ABSTRACT
This paper describes a machine learning approach for visual object detection which is
capable of processing images extremely rapidly and achieving high detection rates. This
work is distinguished by three key contributions.

The first is the introduction of a new image representation called the “Integral Image”
which allows the features used by our detector to be computed very quickly. The second
is a learning algorithm, based on AdaBoost, which selects a small number of critical
visual features from a larger set and yields extremely efficient classifiers [1]. The third
contribution is a method for combining increasingly more complex classifiers in a
“cascade” which allows background regions of the image to be quickly discarded while
spending more computation on promising object-like regions.

The cascade can be viewed as an object-specific focus-of-attention mechanism which, unlike previous approaches, provides statistical guarantees that discarded regions are
unlikely to contain the object of interest. In the domain of face detection, the system
yields detection rates comparable to the best previous systems. Used in real-time
applications, the detector runs at 15 frames per second without resorting to image
differencing or skin colour detection.

INTRODUCTION
This article brings together new algorithms and insights to construct a framework for
robust and extremely rapid object detection. This framework is demonstrated on, and in
part motivated by, the task of face detection. This face detection system is most clearly
distinguished from previous approaches in its ability to detect faces extremely rapidly.
In other face detection systems, auxiliary information, such as image differences in
video sequences, or pixel colour in colour images, have been used to achieve high frame
rates. There are three main contributions of our object detection framework.
The first contribution of this article is a new image representation called an integral image that allows for very fast feature evaluation. In order to compute the features very rapidly at many scales, we introduce the integral image representation, which can be computed from an image using a few operations per pixel. Once computed, any one of these Haar-like features can be computed at any scale or location in constant time.


The second contribution of this paper is a method for constructing a classifier by selecting a small number of important features using AdaBoost [1]. Within any image sub-window, the total number of Haar-like features is very large, far larger than the
number of pixels. In order to ensure fast classification, the learning process must
exclude a large majority of the available features, and focus on a small set of critical
features. Motivated by the work of Tieu and Viola, feature selection is achieved through
a simple modification of the AdaBoost procedure: the weak learner is constrained so
that each weak classifier returned can depend on only a single feature [2].

As a result, each stage of the boosting process, which selects a new weak classifier, can
be viewed as a feature selection process. AdaBoost provides an effective learning
algorithm and strong bounds on generalization performance. The third major
contribution of this paper is a method for combining successively more complex
classifiers in a cascade structure which dramatically increases the speed of the detector
by focusing attention on promising regions of the image. The notion behind focus of
attention approaches is that it is often possible to rapidly determine where in an image
an object might occur [20, 8, 1]. More complex processing is reserved only for these
promising regions. The key measure of such an approach is the “false negative” rate of
the attentional process. It must be the case that all, or almost all, object instances are
selected by the attentional filter. We will describe a process for training an extremely
simple and efficient classifier which can be used as a “supervised” focus of attention
operator. The term supervised refers to the fact that the attentional operator is trained to
detect examples of a particular class. In the domain of face detection, it is possible to
achieve fewer than 1% false negatives and 40% false positives using a classifier constructed from two Haar-like features. The effect of this filter is to reduce by over one
half the number of locations where the final detector must be evaluated. Those sub-
windows which are not rejected by the initial classifier are processed by a sequence of
classifiers, each slightly more complex than the last. If any classifier rejects the sub-
window, no further processing is performed.

METHODOLOGY

Our object detection procedure classifies images based on the value of simple features.
There are many motivations for using features rather than the pixels directly. The most
common reason is that features can act to encode ad-hoc domain knowledge that is
difficult to learn using a finite quantity of training data. For this system there is also a
second critical motivation for features: the feature-based system operates much faster
than a pixel-based system. The simple features used are reminiscent of Haar basis
functions.
The value of a two-rectangle feature is the difference between the sum of the pixels
within two rectangular regions. The regions have the same size and shape and are
horizontally or vertically adjacent (see Figure 1). A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. Finally, a four-rectangle feature computes the difference between diagonal pairs of rectangles.

Figure 1

Example rectangle features shown relative to the enclosing detection window. The sum of the pixels
which lie within the white rectangles are subtracted from the sum of pixels in the grey rectangles. Two-
rectangle features are shown in (A) and (B). Figure (C) shows a three-rectangle feature, and (D) a four-
rectangle feature.

Integral Image

Rectangle features can be computed very rapidly using an intermediate representation for the image which we call the integral image. The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)

where ii(x, y) is the integral image and i(x, y) is the original image. Using the following pair of recurrences:

s(x, y) = s(x, y − 1) + i(x, y)
ii(x, y) = ii(x − 1, y) + s(x, y)

(where s(x, y) is the cumulative row sum, s(x, −1) = 0 and ii(−1, y) = 0) the integral image can be computed in one pass over the image.
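As a sketch, the recurrences above can be implemented directly in Python with NumPy; the function name and the row/column conventions are illustrative choices, and the result coincides with a plain 2-D cumulative sum of the image.

```python
import numpy as np

def integral_image(img):
    """One pass over the image, following the pair of recurrences:
    a cumulative row sum s is accumulated along each row, and each
    integral-image entry adds s to the entry one row above."""
    h, w = img.shape
    ii = np.zeros((h, w), dtype=np.int64)
    for r in range(h):               # rows
        s = 0                        # cumulative row sum, zero at row start
        for c in range(w):           # columns
            s += int(img[r, c])
            ii[r, c] = (ii[r - 1, c] if r > 0 else 0) + s
    return ii
```

Since each entry is the sum of all pixels up and to the left (inclusive), the output must equal `img.cumsum(axis=0).cumsum(axis=1)`, which makes the sketch easy to sanity-check.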


Figure 2

Using the integral image any rectangular sum can be computed in four array references
(see Figure 2). Clearly the difference between two rectangular sums can be computed
in eight references. Since the two-rectangle features defined above involve adjacent
rectangular sums they can be computed in six array references, eight in the case of the
three-rectangle features, and nine for four-rectangle features.
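A minimal sketch of the four-reference rectangle sum, assuming an integral image laid out as a 2-D cumulative sum; the function names and argument order are illustrative, not from the paper.

```python
import numpy as np

def rect_sum(ii, x, y, w, h):
    """Sum of the original image over the rectangle with top-left pixel
    (x, y), width w and height h, using four references into the
    integral image ii (ii[r, c] = sum over rows <= r, cols <= c)."""
    def at(r, c):
        return int(ii[r, c]) if r >= 0 and c >= 0 else 0
    r0, c0 = y - 1, x - 1            # just above / just left of the rectangle
    r1, c1 = y + h - 1, x + w - 1    # bottom-right corner, inclusive
    return at(r1, c1) - at(r0, c1) - at(r1, c0) + at(r0, c0)

def two_rect_feature(ii, x, y, w, h):
    """Horizontally adjacent two-rectangle feature: left sum minus right
    sum. Written naively this costs eight references; because the two
    rectangles share two corners, six references suffice in practice."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```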

Learning Classification Function

Given a feature set and a training set of positive and negative images, any number of
machine learning approaches could be used to learn a classification function. In our
system a variant of AdaBoost is used both to select a small set of features and train the
classifier. In its original form, the AdaBoost learning algorithm is used to boost the
classification performance of a simple (sometimes called weak) learning algorithm.
Recall that there are over 180,000 rectangle features associated with each image sub-
window, a number far larger than the number of pixels. Even though each feature can
be computed very efficiently, computing the complete set is prohibitively expensive.
Our hypothesis, which is borne out by experiment, is that a very small number of these
features can be combined to form an effective classifier. The main challenge is to find
these features.
In support of this goal, the weak learning algorithm is designed to select the single rectangle feature which best separates the positive and negative examples (this is similar to approaches used in the domain of image database retrieval). For each feature, the weak learner determines the optimal threshold classification function, such that the minimum number of examples is misclassified. Here x is a 24x24 pixel sub-window of an image.

A weak classifier hj(x) thus consists of a feature fj, a threshold θj and a parity pj indicating the direction of the inequality sign:

hj(x) = 1 if pj fj(x) < pj θj, and hj(x) = 0 otherwise.
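The thresholded weak classifier is a one-line rule; the sketch below uses illustrative names, with a parity of -1 flipping the direction of the inequality.

```python
def weak_classify(feature_value, theta, parity):
    """h(x) = 1 if parity * f(x) < parity * theta, else 0.
    parity is +1 or -1 and selects which side of the threshold
    is treated as the positive (object) class."""
    return 1 if parity * feature_value < parity * theta else 0
```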


Table 1: The AdaBoost algorithm for classifier learning. Each round of boosting selects one feature from the 180,000 potential features.
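As an illustration of the selection step in each boosting round, the sketch below exhaustively searches every feature, candidate threshold and parity for the lowest weighted error. It is a naive reconstruction for small toy data, not the authors' optimized procedure, and all names are illustrative.

```python
import numpy as np

def best_weak_learner(F, y, w):
    """One selection round: F is an (n_samples, n_features) matrix of
    feature values, y holds 0/1 labels and w the current AdaBoost sample
    weights. Every observed feature value is tried as a threshold with
    both parities; the (feature, theta, parity) triple with the lowest
    weighted error is returned."""
    n, m = F.shape
    best = (None, None, None, np.inf)          # (j, theta, parity, err)
    for j in range(m):
        for theta in np.unique(F[:, j]):
            for parity in (1, -1):
                h = (parity * F[:, j] < parity * theta).astype(int)
                err = np.sum(w * (h != y))     # weighted misclassification
                if err < best[3]:
                    best = (j, theta, parity, err)
    return best
```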


Learning Results

While details on the training and performance of the final system are presented later, several simple results merit discussion. Initial experiments demonstrated that a frontal face classifier constructed from 200 features yields a detection rate of 95% with a false positive rate of 1 in 14,084. These results are compelling, but not sufficient for many real-world tasks. In terms of computation, this classifier is probably faster than any other published system, requiring 0.7 seconds to scan a 384 by 288 pixel image.
For the task of face detection, the initial rectangle features selected by AdaBoost are
meaningful and easily interpreted. The first feature selected seems to focus on the
property that the region of the eyes is often darker than the region of the nose and cheeks
(see Figure 3).

Figure 3

The first and second features selected by AdaBoost. The two features are shown in the
top row and then overlaid on a typical training face in the bottom row. The first feature
measures the difference in intensity between the region of the eyes and a region across
the upper cheeks. The feature capitalizes on the observation that the eye region is often
darker than the cheeks. The second feature compares the intensities in the eye regions
to the intensity across the bridge of the nose.

This feature is relatively large in comparison with the detection sub-window, and should
be somewhat insensitive to size and location of the face. The second feature selected
relies on the property that the eyes are darker than the bridge of the nose.

The Attentional Cascade

This section describes an algorithm for constructing a cascade of classifiers which achieves increased detection performance while radically reducing computation time. The key insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances (i.e. the threshold of a boosted classifier can be adjusted so that the false negative rate is close to zero). Simpler classifiers are used to reject the majority of sub-windows before more complex classifiers are called upon to achieve low false positive rates.

Stages in the cascade (see Figure 4) are constructed by training classifiers using
AdaBoost and then adjusting the threshold to minimize false negatives. Note that the
default AdaBoost threshold is designed to yield a low error rate on the training data. In
general, a lower threshold yields higher detection rates and higher false positive rates.

Figure 4: Schematic depiction of the detection cascade.
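The early-rejection control flow of the cascade in Figure 4 can be sketched as follows; the stage functions and thresholds here are stand-ins for trained boosted classifiers.

```python
def cascade_classify(window, stages):
    """Evaluate a detection cascade. `stages` is a list of
    (stage_score, threshold) pairs, where stage_score maps a window to
    a boosted-classifier score. A window is rejected at the first stage
    whose score falls below its threshold, so background regions are
    discarded with very little computation."""
    for stage_score, threshold in stages:
        if stage_score(window) < threshold:
            return False          # background: stop, no further work
    return True                   # survived every stage: candidate detection
```

Because most sub-windows fail an early stage, the average cost per window stays near the cost of the first one or two stages.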


Detector Cascade Discussion

The complete face detection cascade has 38 stages with over 6000 features.
Nevertheless, the cascade structure results in fast average detection times. On a difficult
dataset, containing 507 faces and 75 million sub-windows, faces are detected using an
average of 10 feature evaluations per sub window. The structure of the cascaded
detection process is essentially that of a degenerate decision tree. Unlike techniques
which use a fixed detector, Amit and Geman [1] propose an alternative point of view
where unusual co-occurrences of simple image features are used to trigger the
evaluation of a more complex detection process. In this way the full detection process
need not be evaluated at many of the potential image locations and scales. While this
basic insight is very valuable, in their implementation it is necessary to first evaluate
some feature detector at every location. These features are then grouped to find unusual
co-occurrences. In practice, since the form of our detector and the features that it uses
are extremely efficient, the amortized cost of evaluating our detector at every scale and
location is much faster than finding and grouping edges throughout the image.


Results

A 38-layer cascaded classifier was trained to detect frontal upright faces. To train the
detector, a set of face and nonface training images were used. The face training set
consisted of 4916 hand labelled faces scaled and aligned to a base resolution of 24 by
24 pixels.

Figure 5

The faces were extracted from images downloaded during a random crawl of the world
wide web. Some typical face examples are shown in Figure 5. The non-face sub
windows used to train the detector come from 9544 images which were manually
inspected and found to not contain any faces. There are about 350 million sub windows
within these non-face images. The number of features in the first five layers of the
detector is 1, 10, 25, 25 and 50 features respectively. The remaining layers have
increasingly more features. The total number of features in all layers is 6061. Each
classifier in the cascade was trained with the 4916 training faces (plus their vertical
mirror images for a total of 9832 training faces) and 10,000 non-face sub-windows (also
of size 24 by 24 pixels) using the Adaboost training procedure. For the initial one feature
classifier, the non-face training examples were collected by selecting random sub-windows from a set of 9544 images which did not contain faces. The non-face examples
used to train subsequent layers were obtained by scanning the partial cascade across the
non-face images and collecting false positives. A maximum of 10000 such non-face
sub-windows were collected for each layer. The speed of the cascaded detector is
directly related to the number of features evaluated per scanned sub-window. Evaluated
on the MIT+CMU test set, an average of 10 features out of a total of 6061 are evaluated
per sub-window. This is possible because a large majority of sub-windows are rejected
by the first or second layer in the cascade.

Figure 6: Output of our face detector on a number of test images from the MIT+CMU
test set.

CONCLUSION
The approach was used to construct a face detection system which is approximately 15 times faster than any previous approach. This paper brings together new algorithms,
representations, and insights which are quite generic and may well have broader
application in computer vision and image processing.
Finally, this paper presents a set of detailed experiments on a difficult face detection
dataset which has been widely studied. This dataset includes faces under a very wide
range of conditions including: illumination, scale, pose, and camera variation.
Experiments on such a large and complex dataset are difficult and time consuming.
Nevertheless, systems which work under these conditions are unlikely to be brittle or
limited to a single set of conditions. More importantly conclusions drawn from this
dataset are unlikely to be experimental artefacts.


OBJECT DETECTION USING IMAGE PROCESSING
ABSTRACT
An Unmanned Aerial Vehicle (UAV) has great importance for the army in border security. The main objective of this article is to develop OpenCV-Python code using the Haar cascade algorithm for object and face detection. Currently, UAVs are used for detecting and attacking infiltrated ground targets. The main drawback of this type of UAV is that sometimes the object is not properly detected, which can cause the object to hit the UAV. This project aims to avoid such unwanted collisions and damage to the UAV. The UAV is also used for surveillance, using the Viola-Jones algorithm to detect and track humans. This algorithm uses the cascade object detector function and the vision.train function to train the algorithm. The main advantage of this code is the reduced processing time. The Python code was tested with the help of an available database of video and images, and the output was verified.

INTRODUCTION
An Unmanned Aerial Vehicle is an aircraft with no pilot on board. UAVs can be remotely controlled aircraft (e.g. flown by a pilot at a ground control station) or can fly autonomously based on pre-programmed flight plans or more complex dynamic automation systems. Today, images and video are everywhere; online photo sharing sites and social networks have them in the billions. The field of vision research [6] has been dominated by machine learning and statistics, using images and video to detect, classify, and track objects or events in order to "understand" a real-world scene. Programming a computer and designing algorithms for understanding what is in these images is the field of computer vision. Computer vision powers applications like image search, robot navigation, medical image analysis, photo management and many more.
Object detection can be further divided into soft detection, which only detects the presence of an object, and hard detection, which detects both the presence and location of the object. Object detection is typically carried out by searching each part of an image to localize parts whose photometric or geometric properties match those of the target object in the training database. This can be accomplished by scanning an object template across an image at different locations, scales, and rotations; a detection is declared if the similarity between the template and the image is sufficiently high. The similarity between a template and an image region can be measured by measures such as correlation or the sum of squared differences (SSD). Over the last several years it has been shown that image-based object detectors
are sensitive to the training data. Image processing is a method to convert an image into digital form and perform operations on it, in order to obtain an enhanced image or to extract useful information from it. It is a type of signal processing in which the input is an image, such as a video frame or photograph, and the output may be an image or characteristics associated with that image. Image processing basically includes the following three steps:

 Importing the image with an optical scanner or by digital photography.
 Analysing and manipulating the image, which includes data compression, image enhancement, and spotting patterns that are not visible to human eyes, as in satellite photographs.
 Output, the last stage, in which the result can be an altered image or a report based on image analysis.

Figure 7: Block diagram for object detection (Image Capture → Feature Detection → Collecting Putative Points → Object Detection)

METHODOLOGY
A simple face tracking system divides the tracking problem into three separate problems:

 Detect a face to track
 Identify facial features to track
 Track the face

Step 1: Detect a Face to Track

Before they begin tracking a face, they need to first detect it, using the vision.CascadeObjectDetector to detect the location of a face in a video frame. The cascade object detector uses the Viola-Jones detection algorithm (the mathematical modelling of Haar-like features and Viola-Jones is discussed later) and a trained classification model for detection. By default, the detector is configured to detect faces, but it can be configured for other object types. (See Figure 8)


Figure 8

Step 2: Identify Facial Features to Track.

Once the face is located in the video, the next step is to identify a feature that will help track it, for example its shape, texture, or colour. They choose a feature that is unique to the object and remains invariant even when the object moves. In this example, they use colour as the feature to track. The colour provides a good deal of contrast between the face and the background and does not change as the face rotates or moves. (See Figure 9)

Figure 9
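Colour-based tracking of this kind usually rests on a hue histogram of the detected face region and its back-projection onto later frames; below is a minimal NumPy sketch in which the bin count and function names are illustrative choices.

```python
import numpy as np

def hue_histogram(hue_roi, bins=16):
    """Normalised hue histogram of the detected face region
    (hue in 0..179, following OpenCV's HSV convention)."""
    hist, _ = np.histogram(hue_roi, bins=bins, range=(0, 180))
    return hist / max(hist.sum(), 1)

def back_project(hue_frame, hist, bins=16):
    """Replace every pixel's hue with the probability of that hue under
    the face model; bright regions in the result are face-coloured and
    can be fed to a mean-shift style tracker."""
    idx = np.clip((hue_frame.astype(int) * bins) // 180, 0, bins - 1)
    return hist[idx]
```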


Step 3: Track the Face

With the colour selected as the feature to track, they can use the vision-toolbox tracking functions. Once the face in the video is identified as a position occupied in the output area in terms of geometric coordinates, they can distinguish between the real face shape and its corresponding background. Here they present an algorithm to detect cars using shape features. (See Figure 10)

Figure 10

-Object Detection

They have introduced algorithms to visualize the feature spaces used by object detectors. These visualizations allow them to analyse object detection systems in new ways and gain new insight into detector failures. Here they present an algorithm to detect cars using shape features.
The easy way to do vehicle detection is by using Haar cascades, but for vehicle tracking we also need a tracking algorithm. Haar cascades are not the best choice for vehicle tracking because of their large number of false positives. What do we mean by false positives? In general, a false positive is an error in an evaluation process in which a condition being tested for is mistakenly reported as detected. The Haar cascade cars3.xml was trained using 526 images of cars from the rear (360x240 pixels, no scale). The images were extracted from the Car dataset taken on the freeways of southern California. This algorithm detects any moving object as a vehicle. To save the foreground masks, we can use the BGS Library. One suggestion is to use a background subtraction (BS) algorithm, but a BS algorithm alone is insufficient for vehicle tracking; we also need a blob tracker algorithm or a library such as cvBlob or OpenCVBlobsLib. The original video used for this work was taken from YouTube. To do vehicle tracking and counting, we need to:
 First, perform a background subtraction.
 Send the foreground mask to cvBlob or OpenCVBlobsLib; the cvBlob library provides methods to get the centroid, the track and the ID of the moving objects. We can also choose whether to draw a bounding box, or the centroid and the angle of the tracked object.
 Check whether the centroid of the moving object has crossed a virtual line.
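The virtual-line check in the last step can be sketched independently of the blob library; `tracks` below is a hypothetical stand-in for the per-ID centroid histories that a blob tracker such as cvBlob would provide.

```python
def count_line_crossings(tracks, line_y):
    """Count vehicles whose centroid crosses a horizontal virtual line.
    `tracks` maps a blob ID to the sequence of its centroid (x, y)
    positions over time, as reported by the blob tracker."""
    crossings = 0
    for _, centroids in tracks.items():
        for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
            # Crossed if two consecutive centroids straddle the line.
            if (y0 - line_y) * (y1 - line_y) < 0:
                crossings += 1
                break            # count each track at most once
    return crossings
```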

-Haar-like Features & Viola-Jones

The usage of Haar-like features in object detection [7,8,9,10] was first proposed by Paul Viola and Michael Jones in Viola & Jones (2001). This method then attracted much attention, and many people have extended it to various fields of object detection. The idea behind the Haar detection cascade is to eliminate negative examples with very little processing. A series of classifiers is applied to every sub-region in the image. If a sub-region does not pass a classifier, it is discarded and no further computation is performed on it. If a sub-region passes the first stage, which requires little computation, it is then passed on to the next stage, where a little more computation is needed, and so on. For face detection to occur, an image sub-region must pass all of these classifiers. One can train a classifier to improve its accuracy, but the classifier shipped with OpenCV for face detection works well enough for this purpose. (See Figure 11)

Figure 11


Haar-like features are an overcomplete set of two-dimensional (2D) Haar functions, which can be used to encode the local appearance of objects [11]. They consist of two or more rectangular regions enclosed in a template; the feature value f of a Haar-like feature with k rectangles is obtained from the pixel sums of those rectangles. One of the main reasons for the popularity of Haar-like features is that they provide a very attractive trade-off between speed of evaluation and accuracy. With a simple weak classifier based on Haar-like features costing just 60 microprocessor instructions, Viola and Jones [12] achieved 1% false negatives and 40% false positives for the face detection problem. The high speed of evaluation is mainly due to the use of integral images, which, once computed, can be used to rapidly evaluate any Haar-like feature at any scale in constant time. Since the introduction of horizontally and vertically aligned Haar-like features by Papageorgiou et al., many different Haar-like features have appeared in the literature [12,13,14]. The main difference between Haar-like feature variants is in the number of rectangles and the orientation of the rectangles with respect to the template.

CONCLUSION
They have developed this article from general to private, means from the need of
computer vision to how and why to detect objects and faces. They explained in detailed
all concepts of object and face detection and why is it so important that field? Their
results showed that the main aim was to detect the objects and the output objects were
detected from the real scene. The face detection program can be implemented to detect
and follow people in case of surveillance and other domains. They introduced a tool to
explain some of the success of object detection systems. They present algorithms to
visualize the success spaces of object detectors. This work is done in Python-OpenCV
and can be performed with MATLAB also but they preferred Python because they can
include it in OpenCV programs and the execution time in Python is lesser and simple.


STUDY ON OBJECT DETECTION USING OPENCV-PYTHON

ABSTRACT
Object detection is a well-known computer technology connected with computer vision
and image processing that focuses on detecting objects, or instances of a certain class
(such as humans, flowers, or animals), in digital images and videos. Several applications
of object detection have been well researched, including face detection, character
recognition, and vehicle counting. Object detection can be used for various purposes,
including retrieval and surveillance. In this study, the basic concepts used in object
detection with the OpenCV library for Python 2.7 are presented, along with ways of
improving the efficiency and accuracy of detection.

INTRODUCTION

Object detection [19] and localization in digital images has become one of the most
important applications for industries, easing user effort, saving time, and enabling
parallelism.

The main aim of studying and researching computer vision is to simulate the behaviour
of human vision directly by using a computer, and later to develop a system that
reduces human effort. Its main purpose is to reconstruct the visual properties of 3D
objects after analysing the extracted 2D information, since real-life 3D objects are
represented by 2D images. Object detection analysis determines the number, location,
size, and position of the objects in the input image. A common object detection method
is the colour-based approach, which detects objects based on their colour values [19].

The colour-based method is used because of its strong adaptability and robustness;
however, its detection speed needs improvement, because it tests every possible window
by exhaustive search and therefore has high computational complexity. Object detection
can also be extended, using automation and robotics, to tasks such as plucking fruit
(apples, bananas) from the corresponding tree with image processing techniques, making
the work easier and faster.

In this study, the OpenCV library implemented in Python 2.7 is used together with
NumPy to explore the world of object detection, and a virtual artificial neural network
is created using the Sci-kit (scikit-learn) tool.


METHODOLOGY
-Theory
Every object class has its own special features that help in classifying the object. With
more efficient algorithms, objects can be recognized even when they are partially
obstructed from direct view.

Various terms used in Object Detection are,

1) Edge matching
 Uses edge detection techniques to find the edges.
 Must account for the effect of changes in lighting and colour.
 Counts the number of overlapping edges.

2) Divide and Conquer search
 All candidate positions are considered as a set (a cell).
 A lower bound is determined on the score of the best position in the cell.
 The cell is pruned if the bound is too large.
 The process stops when a cell becomes small enough.

3) Grayscale matching
 Edges carry a lot of information while being robust to illumination changes.
 Pixel distance is computed as a function of both pixel intensity and position.
 The same comparison can be made with colour values too.

4) Gradient matching
 Comparing image gradients can also help make matching robust to illumination
changes.
 Matching is performed as for greyscale images.
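The robustness of gradient matching to illumination changes can be illustrated with a small NumPy sketch (a toy example, not a full matcher): an additive brightness offset leaves the image gradients exactly unchanged.

```python
import numpy as np

def gradients(img):
    """Central-difference approximations of the image gradient."""
    img = img.astype(float)
    gy = np.zeros_like(img)
    gx = np.zeros_like(img)
    gy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    gx[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    return gx, gy

rng = np.random.default_rng(0)
img = rng.integers(0, 200, size=(16, 16))
brighter = img + 50          # uniform (additive) illumination change

gx1, gy1 = gradients(img)
gx2, gy2 = gradients(brighter)
# The constant offset cancels in every difference, so a gradient-based
# matcher scores the two images as identical.
diff = np.abs(gx1 - gx2).max() + np.abs(gy1 - gy2).max()
```

This is why gradient (and edge) comparison is preferred over raw intensity comparison when lighting is uncontrolled.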

-OpenCV

OpenCV (Open Source Computer Vision) is an open-source [10] computer vision and
machine learning software library. Around 3,000 algorithms are currently embedded in
the OpenCV library, all of them efficiently optimized, and it supports real-time vision
applications. These algorithms range from classic algorithms to state-of-the-art
computer vision and machine learning algorithms. They are easily implemented in Java,
MATLAB, Python, C, C++, etc., and are well supported on operating systems such as
Windows, Mac OS, Linux, and Android.
There are more than 500 distinct algorithms and even more functions that compose or
support those algorithms. OpenCV is written natively in C++ and has a templated
interface that works seamlessly with STL containers.
For OpenCV to work with Python 2.7, the NumPy package must be installed first.

-NumPy

NumPy is the fundamental package for scientific computing with Python. It can be
treated as an extension of the Python programming language with support for
multidimensional matrices and arrays. It is open-source software with many
contributors. Among other things, it contains:

• A powerful N-dimensional array object.
• Broadcasting functions.
• Tools for integrating C/C++ and Fortran code.
• Useful linear algebra, Fourier transform, and random number capabilities.

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-
dimensional container of generic data.
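A short sketch of what the N-dimensional array and broadcasting look like in practice (the values are illustrative only):

```python
import numpy as np

# A grayscale "image" as a 2-D array and a per-row gain as a column vector;
# broadcasting applies the gain to every pixel without explicit loops.
img = np.array([[10, 20, 30],
                [40, 50, 60]], dtype=np.float64)
gain = np.array([[2.0], [0.5]])        # shape (2, 1) broadcasts against (2, 3)
scaled = img * gain

mean_intensity = scaled.mean()
```

This loop-free, elementwise style is the reason OpenCV's Python bindings represent images directly as NumPy arrays.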

Steps Involved in Object Detection

-Install OpenCV & Python

The following Python packages are to be downloaded and installed to their default
locations: Python 2.7.x, NumPy, and Matplotlib. Python will be installed to
C:/Python27/. Open the Python IDLE shell, enter import numpy, and make sure NumPy
is working. Then download OpenCV from SourceForge, go to the
opencv/build/python/2.7 folder, and copy cv2.pyd to C:/Python27/lib/site-packages.

-Read an Image

Use the function cv2.imread() to read an image. The image should be in the current
working directory; otherwise, the full path of the image must be given as the first
argument. The second argument is a flag that specifies the way the image should be read.


1. cv2.IMREAD_COLOR: Loads a colour image. Any transparency (alpha channel) in
the image is neglected. This is the default flag.
2. cv2.IMREAD_GRAYSCALE: Loads the image in grayscale mode.
3. cv2.IMREAD_UNCHANGED: Loads the image as such, including the alpha channel.

-Feature Detection

 Understanding features
 Corner detection
 Feature Matching
 Homography

Read Image → Feature Understanding → Feature Matching → Homography

Various Object Detection Algorithms Implemented in Python

-Haar like features

It is an effective object detection technique proposed by Paul Viola and Michael Jones
in 2001. It is a machine-learning-based method for object detection in which a classifier
is trained from a large number of images.

This classifier is then used to detect objects in an image. Initially, the algorithm needs
images with faces (positive images) and images without faces (negative images) to train
the classifier, from which features are then extracted.

This method introduces the concept of a cascade of classifiers. Instead of applying all
the features at once, the features are grouped into different stages of the classifier and
applied one by one. The window is discarded if it fails the first stage; if it passes, the
process continues with the next stage. A window that passes through all the stages is the
desired region.


Figure 12: Face Detection using Haar Cascade

-Circular Hough Transformation

This transformation was initially meant to detect arbitrary shapes in an image. It was
later modified to detect circular objects in low-contrast noisy images and is referred to
as the Circular Hough Transformation (CHT).

CHT relies on the equation of a circle:

r² = (x − a)² + (y − b)²

where a and b are the coordinates of the centre and r is the radius of the circle.

CHT relies on three parameters, which require more computation time and memory and
increase the complexity of extracting information from the image.
For simplicity, CHT programs are usually provided with a constant radius value, or with
a range of radii, prior to running the application.
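The voting idea behind CHT can be sketched for the simplified case of a known, fixed radius (a toy accumulator, not OpenCV's cv2.HoughCircles): every edge point votes for all centres lying at distance r from it, and the accumulator peak is the most-supported centre.

```python
import numpy as np

def hough_circle_center(edge_points, radius, shape):
    """Vote for circle centres at a fixed, known radius.  Each edge point
    (x, y) votes for every centre (a, b) on a circle of that radius
    around it; the accumulator maximum is the best-supported centre."""
    acc = np.zeros(shape, dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
    for x, y in edge_points:
        a = np.round(x - radius * np.cos(thetas)).astype(int)
        b = np.round(y - radius * np.sin(thetas)).astype(int)
        ok = (a >= 0) & (a < shape[1]) & (b >= 0) & (b < shape[0])
        np.add.at(acc, (b[ok], a[ok]), 1)
    return np.unravel_index(acc.argmax(), acc.shape)   # (row b, col a)

# Synthetic edge map: points on a circle of radius 10 centred at (a, b) = (30, 25).
t = np.linspace(0.0, 2.0 * np.pi, 90, endpoint=False)
edges = [(30 + 10 * np.cos(u), 25 + 10 * np.sin(u)) for u in t]

b, a = hough_circle_center(edges, radius=10, shape=(60, 60))
```

With an unknown radius the accumulator gains a third dimension, which is exactly the memory and time cost mentioned above.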

-Template matching

Template matching [16,17,18] is a high-level machine vision technique that detects
objects in an image that match a given image pattern. The technique matches the source
image against a template image, or patch.
If the template image has strong features, a feature-based approach may be used;
otherwise, a template-based approach is used.
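A minimal template-based matcher can be sketched as an exhaustive sum-of-squared-differences search; this illustrates the idea only and deliberately ignores the speed problems of exhaustive search noted earlier.

```python
import numpy as np

def match_template(image, template):
    """Exhaustive template matching: slide the template over every
    position and score with the sum of squared differences (SSD).
    Returns the (row, col) of the best match."""
    ih, iw = image.shape
    th, tw = template.shape
    best, best_pos = None, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = np.sum((image[r:r + th, c:c + tw] - template) ** 2)
            if best is None or ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(20, 20)).astype(float)
template = image[7:12, 3:8].copy()          # patch cut from a known position

pos = match_template(image, template)
```

OpenCV's cv2.matchTemplate performs the same sliding-window scoring far faster, with a choice of correlation-based metrics instead of SSD.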


-Blob Detection

This method detects regions in an image that differ in properties from their
surroundings. A blob is a region of the image in which all the points can be considered
similar to each other. There are two classes of blob detection methods: differential
methods and local-extrema methods.

-Gradient Based Method

The gradient-based method uses spatial and temporal partial derivatives to estimate
image flow at every position in the image. If the motion is not known in advance to be
restricted to a small range of possible values, then a multi-scale analysis must be
applied so that the scale of smoothing prior to derivative estimation is appropriate to
the scale of the motion.
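The derivative-based estimate can be illustrated in one dimension, where brightness constancy gives v ≈ −I_t / I_x; the signal and the shift below are made up purely for the sketch.

```python
import numpy as np

# Gradient-based motion estimate in 1-D: a smooth signal is shifted by a
# small known amount, and the spatial and temporal derivatives recover
# the shift via the brightness-constancy relation v ~ -I_t / I_x.
x = np.linspace(0.0, 2.0 * np.pi, 400)
shift = 0.05
frame0 = np.sin(x)
frame1 = np.sin(x - shift)                 # the pattern moved right by `shift`

ix = np.gradient(frame0, x)                # spatial derivative I_x
it = frame1 - frame0                       # temporal derivative I_t

mask = np.abs(ix) > 0.5                    # avoid dividing where I_x ~ 0
v = np.mean(-it[mask] / ix[mask])
```

The approximation holds only for small motions, which is precisely why large motions require the multi-scale (coarse-to-fine) analysis described above.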

-Deep Face Method

The DeepFace software was developed by Facebook's AI research group in Menlo Park,
California, with the support of an advanced deep-learning neural network. A neural
network is a piece of software that simulates an approximation of how real neurons
work. DeepFace learns by machine learning from a huge body of data, developing
high-level abstractions by looking for recurring patterns in faces.

CONCLUSION

Computer vision helps solve real-world problems in a more efficient and reliable way.
There are various ways in which object detection can be achieved with less complex
algorithms. Python has been preferred over MATLAB for integrating with OpenCV
because a running MATLAB program spends time interpreting MATLAB code, which is
built on Java, whereas OpenCV is essentially a library of functions written in C/C++.
Additionally, OpenCV is easier to use for someone with little programming background,
so OpenCV-Python is a good starting point for research on any concept of object
detection. Feature understanding and matching are the two crucial steps in object
detection and should therefore be performed with the highest possible accuracy.
DeepFace is an effective face detection method that is preferred over Haar cascades by
most social applications such as Facebook, Snapchat, and Instagram.

In the coming days OpenCV will be used immensely in solving various real-world
problems.


PROBLEM STATEMENT

“To create a machine vision system that is efficient, precise, and, most importantly, low
cost, using OpenCV-Python3 on a multipurpose robotic arm (DOBOT MAGICIAN)
with a Raspberry Pi 3, and to sort the nuts from a collection of various components by
picking them up and arranging them in an orderly manner according to the user input.”

OVERALL CONCLUSION

Computer vision is a technology that is booming right now in the engineering field, and
it makes our lives easier by helping humans solve real-world problems in the best way
possible. When computer vision is tied to robotics, many simple tasks can be automated,
thereby reducing human effort and time. Object detection, discussed above, is one such
problem, and it can be solved with computer vision. From the above discussion it is
clear that open-source software such as OpenCV can be employed for object detection.

OpenCV and Python, the major contributors to object detection in a predefined space,
have proven satisfactorily efficient across the different algorithms mentioned above.
The basics of object detection lie deep within the Viola-Jones algorithm discussed
above. All the different algorithms and approaches to object detection, such as
Haar-like features and the integral image, can be employed in solving our problem in
the most efficient manner possible.

Combining the algorithms, reducing the time complexity of the final chosen algorithm,
and configuring it on the final prototype of the robot will make the system more
efficient and faster.


REFERENCES

[1] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line
learning and an application to boosting," in Computational Learning Theory: Eurocolt
'95, Springer-Verlag, 1995, pp. 23-37.

[2] Anonymous. Anonymous. In Anonymous, 2000.

[3] J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. H. Lai, N. Davis, and F. Nuflo,
"Modeling visual attention via selective tuning," Artificial Intelligence Journal,
78(1-2):507-545, October 1995.

[4] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for
rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., 20(11):1254-1259,
November 1998.

[5] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line
learning and an application to boosting," in Computational Learning Theory: Eurocolt
'95, Springer-Verlag, 1995.

[6] M. Oliveira and V. Santos, "Automatic detection of cars in real roads using
Haar-like features," Department of Mechanical Engineering, University of Aveiro,
3810 Aveiro, Portugal.

[7] Visakha K. and Sidharth S. Prakash, "Haar classifier based identification and
tracking of moving objects from a video sequence," International Research Journal of
Engineering and Technology (IRJET), e-ISSN: 2395-0056, 5(1), Jan. 2018.

[8] A. Deza and D. Parikh, "Understanding image virality," CVPR 2015, Computer
Vision Foundation.

[9] B. Heisele, "Visual object recognition with supervised learning," MIT Center for
Biological and Computational Learning.

[10] C. Vondrick, A. Khosla, H. Pirsiavash, T. Malisiewicz, and A. Torralba,
"Visualizing object detection features," International Journal of Computer Vision.

[11] C. P. Papageorgiou, M. Oren, and T. Poggio, "A general framework for object
detection," in ICCV '98: Proceedings of the International Conference on Computer
Vision, Washington, DC, USA, 1998, pp. 555-562.

[12] P. Viola and M. J. Jones, "Rapid object detection using a boosted cascade of
simple features," in CVPR '01: Proceedings of the Conference on Computer Vision and
Pattern Recognition, Los Alamitos, CA, USA, 2001, pp. 511-518.

[13] S. Z. Li, L. Zhu, Z. Zhang, A. Blake, H. Zhang, and H. Shum, "Statistical learning
of multi-view face detection," in ECCV '02: Proceedings of the European Conference
on Computer Vision, Lecture Notes in Computer Science, vol. 2353, London, UK,
2002, pp. 67-81.

[14] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object
detection," in ICIP '02: Proceedings of the International Conference on Image
Processing, 2002, pp. 900-903.

[15] Nidhi, "Image processing and object detection," Dept. of Computer Applications,
NIT Kurukshetra, Haryana, 1(9): 396-399, 2015.

[16] K. Khurana and R. Awasthi, "Techniques for object recognition in images and
multi-object detection," IJARCET, ISSN: 2278-1323, April 2013.

[17] Latharani T. R., M. Z. Kurian, and Chidananda Murthy M. V., "Various object
recognition techniques for computer vision," Journal of Analysis and Computation,
ISSN: 0973-2861.

[18] R. Hussin, M. Rizon Juhari, Ng Wei Kang, R. C. Ismail, and A. Kamarudin,
"Digital image processing techniques for object detection from complex background
image," School of Microelectronic Engineering, University Malaysia Perlis, Perlis,
Malaysia, 2012.

[19] OpenCV.org, 2017. [Online]. Available: http://www.opencv.org/about

[20] Object Detection. [Online]. Available:
http://en.m.wikipedia.org/wiki/Object_detection
