Stereo Vision Based Object Detection

M.E.
(Electronics & Telecommunication) Stereo Vision based Object Detection
ABSTRACT
This report presents an analysis of color-stereo approaches to object detection. Stereo
vision has many applications, including pedestrian detection, 3D face detection, automated
systems, robotics, and aerial surveys. A stereo view of a scene is generally obtained with two
cameras, in a way analogous to human binocular vision: just as two eyes let humans perceive
depth, a disparity map computed from two camera images lets a computer estimate object
distance. From the distance of each object to the cameras, the 3D view of the scene can be
reconstructed. This is especially useful in automation, where robots need an understanding of
their 3D environment. The report discusses different approaches and techniques for analyzing
stereo views with the help of disparity maps derived from triangulation geometry. Using the
disparity map together with the v-disparity and u-disparity images, for vertical and
horizontal localization respectively, the object location can be determined. The disparity
map is generated from the two camera images by dense stereo matching, using the
correspondence-matching algorithm given by Konolige. For object detection, candidate bounding
boxes are generated with the help of the disparity images. The last step of the technique is
candidate filtering and merging. This approach is reported to be very accurate for object
detection.
CHAPTER 1
SKNCOE – Electronics & Telecommunication Engineering – 2010
INTRODUCTION
1.1 BACKGROUND
In recent years, computing speed and storage capacity have improved dramatically, enabling
advanced image processing at video rate. As a result, many video applications have been
widely studied, and robot vision has become one of the most active topics [1]. With an aging
society and fewer children, several companies have developed nursing-care robots. To realize
advanced autonomous robot control in such complex living environments, visual information
plays an important role. In particular, recognizing moving objects is essential for avoiding
collisions and for recognizing human gestures.
There are many methods for detecting moving objects from a camera [2], but most of them do
not extract the shape of the moving objects at the same time. As described in the following
sections, some systems have already been developed to detect moving objects from a camera.
However, they are restricted in how well they can locate objects, or they tend to be
expensive because they employ multiple sensing elements. If a recognition system can be
realized using camera images alone, it can easily be mounted on a variety of systems.
Therefore, in this seminar, we present a method for not only detecting but also extracting
moving objects using only a stereo camera.
1.2 RELEVANCE
There are several approaches to object detection. A moving object can be detected using only
one camera, but the problem arises in determining how far the object is from the camera:
object distance cannot be obtained from this approach [3]. It may therefore fail for
long-distance object detection or for determining the exact position of an object.
Background subtraction and inter-frame subtraction are well-known methods for detecting
moving objects. Background subtraction takes the difference between a previously captured
background image and the input image; with this method, moving objects can be detected almost
completely. Inter-frame subtraction takes the difference between the previous image and the
input image; compared to background subtraction, it adapts better to changes in a dynamic
environment [6].
However, when the object is not moving and is close to the camera, the above approaches fail
to detect it. For this reason the stereo approach is adopted; it is discussed in the
following chapters.
1.3 ORGANIZATION OF SEMINAR REPORT

Our aim is to detect objects using stereo cameras with the help of disparity maps. This can
be done with the following system modules [7]:
1. Dense-Stereo Matching: The first step is to perform dense stereo matching to yield
disparity estimates of the imaged scene. Here the correspondence-matching algorithm by
Konolige [4] is used for stereo matching.
2. u- and v-Disparity Image Generation: The u- and v-disparity images are histograms
that bin the disparity values d for each column or row in the image, respectively. The
resulting v-disparity histogram image indicates the density of disparities for each image
row v, whereas the u-disparity image shows the density of disparities for each image
column u. These images are used to locate objects in the next stage.
3. Object-Bounding-Box Generation: Objects are extracted from regions of interest (ROIs)
in the u- and v-disparity images. The ROIs in the u-disparity image are extracted by
scanning the rows of the image for continuous spans where the histogram value exceeds a
given threshold.
4. Object Filtering and Merging: Overlapping bounding boxes are merged if their overlap is
significant and the disparities associated with the bounding boxes are close.
5. Object Extraction: In this stage, object information such as shape, direction, and path
is extracted.
1.4 SUMMARY
This chapter has introduced the stereo approach to object detection and discussed how it
provides greater accuracy and more information about objects.
CHAPTER 2
LITERATURE SURVEY
2.1 INTRODUCTION
There are several approaches to object detection. A moving object can be detected using only
one camera, but the problem arises in determining how far the object is from the camera:
object distance cannot be extracted with this approach [3]. It may therefore fail for
long-distance object detection or for determining the exact position of an object.
Some approaches fail when the object is not moving and is close to the camera. The stereo
approach is therefore commonly used in object detection, as it gives basic information about
an object's location in the 3D environment. Many systems also combine the stereo approach
with other techniques to improve object detection as well as object extraction.
2.2 LITERATURE SURVEY
There are many approaches to object detection. The basic approaches are explained below.
Background subtraction and inter-frame subtraction are well-known methods for detecting
moving objects [6]. Background subtraction takes the difference between a previously captured
background image and the input image; with this method, moving objects can be detected almost
completely. Inter-frame subtraction takes the difference between the previous image and the
input image; compared to background subtraction, it adapts better to changes in a dynamic
environment. However, when the camera itself moves, it is difficult to detect moving objects
with these methods because the background and the moving objects become hard to distinguish.
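Both subtraction methods reduce to the same per-pixel operation and differ only in the
reference image. The following is a minimal sketch, assuming 8-bit grayscale frames stored as
NumPy arrays; the function name moving_mask and the threshold of 25 gray levels are
illustrative choices, not values taken from [6].

```python
import numpy as np

def moving_mask(reference, frame, threshold=25):
    """Mark pixels whose intensity differs from the reference image.

    reference: a pre-taken background image (background subtraction)
               or the previous frame (inter-frame subtraction).
    threshold: assumed gray-level difference that counts as "moving".
    """
    # Cast to a signed type so the subtraction of uint8 images cannot wrap around.
    diff = np.abs(frame.astype(np.int32) - reference.astype(np.int32))
    return diff > threshold

background = np.zeros((4, 4), dtype=np.uint8)
frame = background.copy()
frame[1, 2] = 200                      # a single "moving" pixel
mask = moving_mask(background, frame)  # True only at position (1, 2)
```

Passing the previous frame instead of a fixed background turns the same function into
inter-frame subtraction, which is why it adapts to slow scene changes but misses objects that
stop moving.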
In such cases, optical flow becomes a key to extracting moving objects. Frazier proposes a
method using complex logarithmic mapping, and Takeda proposes a method using the residual
error of focus-of-expansion (FOE) estimation. In the first case, the calculation cost is
large because the mapping is generated every frame, and the camera motion is restricted to
the parallel direction. In the latter case, the camera motion is not restricted, but it is
difficult to calculate the FOE from optical flow.
Ogale proposes a method using 2-D optical flow, in which moving objects are classified into
three classes. When an object moves in a different direction from the camera, it can be
detected using motion-based clustering. When an object moves in the same direction as the
camera, optical flow alone is not sufficient to detect it, so ordinal depth conflict is
needed as an additional constraint. When an object is occluded behind another object, the
depth order of the objects can be seen. At the same time, the optical flow lengths of
background objects are inversely proportional to their distance from the camera. Therefore,
if an ordinal depth conflict occurs, a moving object can be detected. However, for an
independent object, one that no other object overlaps, another source of information is said
to be needed.
Reference [6] employs 3-D optical flow and extracts all moving objects in images from a
freely moving stereo camera. First, successive left and right images are captured as input
from the stereo camera. Second, feature points are obtained from three successive right
images, and the 3-D coordinates of these points are obtained from the stereo images. Third,
feature points are related between the next frame and the previous frame, and the camera
motion is estimated by analyzing the change of world coordinates at the feature points. The
successive images are then corrected for the estimated camera motion. Finally, moving objects
are extracted using information from inter-frame subtraction, edges, and stereo. The process
is shown in Fig. 1.
Fig. 1: Compensation of Camera Motion and Object Detection

However, when the object is not moving and is close to the camera, the above approaches fail
to detect it. Because of these problems, the stereo approach is preferable for object
detection.
A fundamental step in analyzing objects with stereo imagery is to detect obstacles in the
scene and localize their position in 3D space from the disparity maps generated by stereo
correspondence matching [7]. The disparity images derived from stereo analysis can be used to
generate a list of object regions in the scene. The approach adapts a classical method of
obstacle detection in stereo imagery proposed by Labayrade et al. [5], which uses the concept
of v-disparity to identify potential obstacles in the scene. Essentially, v-disparity is a
histogram of the disparity image that counts the occurrences of disparity values for each row
in the image; it can be used to detect the ground plane in the scene and isolate regions that
contain obstacles. Variations of this approach to detecting objects in stereo imagery have
been implemented. The work in [7], however, illustrates a generalized framework that is able
to obtain dense stereo correspondences and robust ground plane estimates with both color- and
infrared-based stereo. The technique consists of the following stages:
1. Dense-Stereo Matching
2. u- and v-Disparity Image Generation
3. Object-Bounding-Box Generation
4. Object Filtering & Merging
5. Object Extraction
Another approach matches the stereo images using the Sum of Absolute Differences (SAD)
correlation algorithm to establish correspondence between image features in the different
views of the scene [8]. This produces a stereo disparity image containing information about
the depth of objects from the camera. A geometric projection algorithm is then used to
generate a 3-dimensional (3D) point map, placing the pixels of the disparity image in 3D
space. This is then converted to a 2-dimensional (2D) depth map, allowing the objects in the
scene to be viewed. This assistive technology, which uses stereoscopic cameras, is intended
for automated obstacle detection, path planning and following, and collision avoidance during
navigation.
Stereo Disparity
The purpose of stereo vision is to perform range measurements based on the left and right
images obtained from stereoscopic cameras. An algorithm establishes the correspondence
between image features in different views of the scene and then calculates the relative
displacement between the feature coordinates in each image. To produce a disparity image, the
PGR software's built-in SAD correlation algorithm compares a neighborhood in one stereo image
to a number of neighborhoods in the other stereo image [9]. In this algorithm, a window of
size (2M+1)×(2M+1) (called a correlation mask) is centered at the coordinates of the matching
pixels (i, j), (x, y) in one stereo image, IL and IR are the intensity functions of the left
and right images, and dmin and dmax are the minimum and maximum disparities. A disparity dmin
of zero pixels often corresponds to an infinitely distant object, and the disparity dmax
denotes the closest position of an object. The disparity range was tuned, through trials
comparing range and matching accuracy, to cover distances between 0.5 m and 4.75 m, which
provided adequate mapping accuracy and distance for response to obstacles.
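For reference, a SAD criterion consistent with the symbols defined above can be written as
follows. This is a sketch under stated assumptions: (x, y) is taken to be the pixel being
matched, (i, j) are offsets inside the correlation mask, and the direction of the shift d
depends on which image is used as the reference, so the sign used here is an assumption.

```latex
d^{*}(x, y) \;=\; \arg\min_{d_{\min} \le d \le d_{\max}}
\sum_{i=-M}^{M} \sum_{j=-M}^{M}
\left| I_L(x+i,\; y+j) \;-\; I_R(x+i+d,\; y+j) \right|
```

The inner double sum is the sum of absolute differences over the (2M+1)×(2M+1) mask, and the
disparity assigned to the pixel is the shift that minimizes it.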
3D Point Map Generation

Once a disparity image has been produced from the processed left and right camera images, a
3D point map can be created that maps the depth-determined pixels of the disparity image into
3D space. The PGRView software then allows the 3D point map to be rotated and observed from
different viewpoints. This is very useful for determining where noise occurs and which
calibration settings improve the 3D point cloud for accurate obstacle detection and depth
analysis.
The Result of above experiment is shown in diagrams below.
Another paper presents a method to solve the correspondence problem in stereo image matching
using the Sum of Absolute Differences (SAD) algorithm [10]. The computer vision application
in that paper is an autonomous vehicle with a stereo pair mounted on top. The range of each
detected object or obstacle is estimated with the curve fitting tool (cftool) provided by
Matlab. The disparity map is produced by the SAD block matching algorithm.
A. Image Rectification
Stereo image pairs can be rectified once the cameras are calibrated. To search for
corresponding points quickly and accurately along the scan lines, the stereo pair is
rectified so that corresponding epipolar lines are parallel to the horizontal scan lines and
the difference in the vertical direction is zero. Image rectification also undistorts the
images according to the parameters computed during camera calibration. After all intrinsic
and extrinsic camera parameters have been calculated, they can be used to rectify the images
according to the epipolar constraint [4]. The rectification process is shown in Fig. 5. It
starts with the acquisition of the stereo images, after which Matlab enhances the images
using histogram equalization. The next step is finding the matching points to be rectified,
which is itself a correspondence problem. The matched points and the camera calibration
information are then applied to reconstruct the stereo images into a rectified pair.
The equation below is used to rectify the images in Matlab:
$$I_{new}(x_0, y_0) = a_1 I_{old}(x_1, y_1) + a_2 I_{old}(x_2, y_2) + a_3 I_{old}(x_3, y_3) + a_4 I_{old}(x_4, y_4)$$
Fig. 5 Rectification Process
Fig. 6 Original Image (a) and Image after Rectification (b)

Here $I_{old}$ and $I_{new}$ are the original and the rectified image, and the blending
coefficients $a_i$ are computed separately for each camera. Fig. 6(a) and (b) show the image
before and after rectification. The output size of the rectified stereo image is 320x240. The
horizontal line across both images in Fig. 6(b) indicates that the left and right images are
horizontally aligned, in contrast to Fig. 6(a).
B. Stereo Correspondence
We assume from now on that the images are rectified, that is, the epipolar lines are parallel
to the rows of the images. In the rectified images, the horizontal coordinate is denoted by x
and the vertical coordinate by y. With that geometry, given a pixel at coordinate $x_l$ in
the left image, the problem of stereo matching is to find the coordinate $x_r$ of the
corresponding pixel in the same row of the right image. The difference $d = x_r - x_l$ is
called the disparity at that pixel. The basic matching approach is to take a window W
centered at the left pixel, translate that window by d, and compare the intensity values in W
in the left image with those in the translated W in the right image. The comparison metric
typically has the form:
$$\mathrm{SAD}\big(I_l(x, y),\, I_r(x+d, y)\big) = \sum_{(x, y) \in W} \left| I_l(x, y) - I_r(x+d, y) \right|$$
The SAD function measures the difference between the pixel values of the two windows. It is
computed at every pixel in the image and for every candidate disparity: for each pixel in the
left image, the absolute differences between the intensities in its neighborhood and those in
the shifted neighborhood in the right image are summed. The shift with the minimum SAD along
the row in the right image is chosen as the best match, and the disparity is the horizontal
pixel difference at that match. The output is a disparity image, which can be interpreted as
the inverse of depth (larger disparity for points closer to the cameras).
Fig. 7 SAD Block Matching Process

Stereo correspondence can be computed with simple standard algorithms based on block matching
and a matching criterion. The blocks are usually defined along the epipolar line for ease of
matching. Each block from the left image is matched to a block in the right image by shifting
the left block over the search area of pixels in the right image, as shown in Fig. 7. At each
shift, the sum of a comparison parameter, such as the intensity or color of the two blocks,
is computed and saved. This sum is called the "match strength". The shift that gives the best
result for the matching criterion is taken as the best match, or correspondence.
The SAD algorithm works in this way. Ideally, for every pixel mask within the original image
there should be a single mask within the second image that is nearly identical to the
original, so the SAD for this comparison should be close to zero.
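The block matching procedure described in this section can be sketched in a few lines of
NumPy. This is a brute-force illustration, not the implementation used in [10]: the function
name sad_disparity and its parameters are hypothetical, the images are assumed to be
rectified grayscale arrays, and the right-image window is searched at x + d to follow the
convention of the SAD formula in this section (on a physical rig the sign of the shift
depends on which camera is the reference).

```python
import numpy as np

def sad_disparity(left, right, max_disp, half_win):
    """Brute-force SAD block matching on a rectified grayscale pair."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    # Skip borders where the window or the shifted window would leave the image.
    for y in range(half_win, h - half_win):
        for x in range(half_win, w - half_win - max_disp):
            win_l = left[y - half_win:y + half_win + 1,
                         x - half_win:x + half_win + 1]
            best_d, best_sad = 0, np.inf
            for d in range(max_disp + 1):
                win_r = right[y - half_win:y + half_win + 1,
                              x + d - half_win:x + d + half_win + 1]
                sad = np.abs(win_l - win_r).sum()   # sum of absolute differences
                if sad < best_sad:                  # keep the smallest SAD
                    best_sad, best_d = sad, d
            disp[y, x] = best_d
    return disp

# Synthetic check: the left image is the right image shifted by 3 pixels.
rng = np.random.default_rng(0)
right_img = rng.integers(0, 256, size=(12, 30))
left_img = np.roll(right_img, -3, axis=1)
disp_map = sad_disparity(left_img, right_img, max_disp=5, half_win=1)
```

In the synthetic pair every interior pixel has an exact match at a shift of 3, so the SAD at
that shift is zero, which is the ideal case described above. Real images never match exactly,
which is why the minimum rather than a zero value is used.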
C. Disparity Mapping
Together with the stereo camera parameters from calibration, the disparity between
corresponding stereo points allows the distances in the stereo images to be retrieved. To
find corresponding pairs of stereo points, candidates first have to be compared for different
disparities, after which the best matching pairs can be determined. The maximum range at
which stereo vision can be used for detecting obstacles depends on the image and depth
resolution. The algorithm uses absolute differences of pixel intensities to compute the
similarity between stereo points: the sum of the absolute differences over a window
surrounding the points gives a similarity value for each candidate pair. The disparity
associated with the smallest SAD value is selected as the best match. The figure below shows
the disparity mapping using the SAD block matching algorithm.
D. Range Estimation using Curve Fitting Tool:

In that paper, the range of an obstacle is estimated using the curve fitting tool in Matlab,
which relates range to the pixel values of the disparity map. Each pixel of the disparity map
is evaluated through the fitted curve, with the horizontal coordinate referring to the left
image.
Fig. 8 using Tsai’s method
The equation of the distance estimation is:
$$\text{Range} = a\, e^{b x} + c\, e^{d x}$$
• a = 0.339
• b = -3.525
• c = 0.9817
• d = -0.4048
where a, b, c and d are constants produced by the curve fitting tool, and x is the pixel
value in the disparity map. The curve is shown in Fig. 9: the x axis represents the disparity
value and the y axis shows the distance, or range, in meters for each disparity value.
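As a small worked example, the fitted curve can be evaluated directly. The constants are the
ones listed above; the function name estimate_range is illustrative, and the scaling of x
(how disparity pixel values were normalized before fitting) is not stated in the text, so the
inputs below are only assumed sample values.

```python
import math

# Constants reported for the fitted two-term exponential curve.
A, B, C, D = 0.339, -3.525, 0.9817, -0.4048

def estimate_range(x):
    """Range in meters for a disparity-map value x, per Range = a*exp(b*x) + c*exp(d*x)."""
    return A * math.exp(B * x) + C * math.exp(D * x)

near = estimate_range(1.0)   # larger disparity value
far = estimate_range(0.0)    # at x = 0 the result is simply a + c
```

Because both exponents are negative, the estimated range decreases monotonically as the
disparity value grows, which matches the expectation that closer objects produce larger
disparities.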
Fig. 9 Curve Fitting tool Window

2.4 SUMMARY
This chapter discussed various approaches to detecting objects using stereo camera images.
The first approach used background subtraction and inter-frame subtraction to detect moving
objects. Another used stereo disparity computed with the Sum of Absolute Differences (SAD)
algorithm, followed by 3D point map generation. The last approach applied image rectification
before the SAD algorithm.
CHAPTER 3
INTRODUCTION
Fig. 10 Object Detection Algorithm

After the discussion of the approaches used in previous papers, the basic steps of the
adopted approach are listed below.
1. Dense-Stereo Matching
2. u- and v-Disparity Image Generation
3. Object-Bounding-Box Generation
4. Object Filtering & Merging
5. Object Extraction
In this chapter, all of the above steps are discussed in detail.
3.1 DENSE-STEREO MATCHING:

The first step is to perform dense stereo matching to yield disparity estimates of the imaged
scene. The images from the stereo camera are matched against each other, and the disparity
image is computed with one of various algorithms. The principle behind stereo matching is
triangulation geometry, explained below [12].
Triangulation Geometrics
The technique for gauging depth information from two offset images is called triangulation.
Triangulation makes use of a number of variables: the center points of the cameras (C1, C2),
the cameras' focal length (F), the angles (O1, O2), the image planes (IP1, IP2), and the
image points (P1, P2). The following example shows how the triangulation technique works.
Fig. 11 Triangulation Geometrics
For any point P of some object in the real world, P1 and P2 are the pixel representations of
P in the images IP1 and IP2 taken by cameras C1 and C2. F is the focal length of the camera
(the distance between lens and film). B is the offset distance between cameras C1 and C2. V1
and V2 are the horizontal positions of the pixel points with respect to the center of each
camera. The disparity of the points P1 and P2 from image to image is the difference between
V1 and V2, which is equivalent to the horizontal shift from P1 to P2 in the image planes.
Using this disparity, one can calculate the actual distance of the point in the real world
from the images. The following formula can be derived from the geometric relation above:
Distance of point in real world = (base offset) × (focal length of camera) / (disparity), or
$$D = \frac{b f}{d}$$
This formula gives the real-world distance of a point. If only the relative distance of
points is of interest, rather than the exact distance, even less information is needed. The
base offset and focal length of the camera are the same for both images, so the distance of
different points in the images varies solely with the disparity component. The relative
distance of points in the images can therefore be gauged without knowing the base offset and
focal length.
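The triangulation formula and the remark about relative distance can be illustrated with a
short sketch. The function name and the numeric values (a 0.12 m baseline and a focal length
of 700 pixels) are assumptions chosen only for illustration; they do not come from any of the
cited systems.

```python
def depth_from_disparity(baseline, focal_length, disparity):
    """Distance D = b * f / d, with f and d in pixels and b in meters."""
    if disparity <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return baseline * focal_length / disparity

# Absolute distance needs the baseline and focal length.
d_near = depth_from_disparity(0.12, 700.0, 42.0)
d_far = depth_from_disparity(0.12, 700.0, 21.0)

# Relative distance needs only the disparities: the ratio of depths
# is the inverse ratio of disparities, whatever b and f are.
ratio_from_depths = d_far / d_near
ratio_from_disparities = 42.0 / 21.0
```

The two ratios agree, which is the point made above: halving the disparity doubles the
distance, regardless of the camera parameters.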
Triangulation works under the assumption that points P1 and P2 represent the same point P in
the real world, so an algorithm for matching these two points is required. This can be done
by taking small regions in one image and comparing them to regions in the other image. Each
comparison is given a score, and the best match is used to calculate the disparity. The
technique for scoring region matches varies, but it is usually based on the number of pixels
that are the same on an exact or near-exact point basis. Both the triangulation technique for
stereo image matching and the technique for point matching within a region are successfully
implemented in the “Cooperative Algorithm for Stereo Matching and Occlusion Detection”.
One stereo vision experiment gives the stereo matching output shown in Fig. 12 [11].
Fig. 12 Dense Stereo Matching Result
In Fig. 12, only a very small difference can be seen between the left and right camera
images. This small difference is what the matching algorithm measures. Here the
correspondence-matching algorithm by Konolige [4] is selected for its ease of use and
reliable disparity generation for both color-stereo and infrared-stereo imagery. The
procedure has two stages: low-pass filtering and calibration, followed by stereo processing.
In the stereo processing stage, the stereo matching is based on correlation. The disparity
map is obtained with the correlation criterion of [6], in which dmax and dmin bound the
disparity range, m is the window size (m × m), Iright and Ileft are the right and left
images, and x and y are the coordinates of the point of interest. These processes yield the
depth image. In [6], dmax equals 40, dmin equals 0 and m equals 11.
3.2 U- AND V-DISPARITY IMAGE GENERATION:
The u- and v-disparity images are histograms that bin the disparity values d for each
column or row in the image, respectively. The resulting v-disparity histogram image indicates the
density of disparities for each image row v, whereas the u-disparity image shows the density of
disparities for each image column u. Fig. 14 shows an example of the u-disparity images and
the corresponding v-disparity images generated from the color-stereo and infrared-stereo
disparity maps in Fig. 13.
Fig. 13 Disparity Image
Fig. 14 U- & V-Disparity Image from Color-Stereo Images

Notice that the u-disparity images in Fig. 14 show three distinct horizontal regions
corresponding to the three pedestrians in the scene. These are the regions to be detected in
order to build object areas. The region spanning the entire length at the top of the
u-disparity image indicates the background plane and can be filtered out of processing.
Similarly, the v-disparity images in Fig. 14 show vertical peaks of high density for both the
background plane and the range of disparities containing pedestrians. These regions also need
to be detected to build objects. Additionally, the downward-sloping trend across the rows of
the v-disparity image is exploited to estimate the ground plane in the scene [13].
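The two histograms defined in this section can be built with a direct loop over the disparity
map. The sketch below assumes an integer-valued disparity map stored as a NumPy array; the
function name uv_disparity and the array layout (disparity along the rows of the u-disparity
image and along the columns of the v-disparity image) are illustrative choices.

```python
import numpy as np

def uv_disparity(disp, max_disp):
    """Build u- and v-disparity histograms from an integer disparity map.

    u_disp[d, u] counts pixels in image column u that have disparity d.
    v_disp[v, d] counts pixels in image row v that have disparity d.
    """
    h, w = disp.shape
    u_disp = np.zeros((max_disp + 1, w), dtype=np.int32)
    v_disp = np.zeros((h, max_disp + 1), dtype=np.int32)
    for v in range(h):
        for u in range(w):
            d = int(disp[v, u])
            if 0 <= d <= max_disp:   # ignore invalid or out-of-range values
                u_disp[d, u] += 1
                v_disp[v, d] += 1
    return u_disp, v_disp

# Toy disparity map: a 2x2 "object" at disparity 3 in front of a background at 0.
toy = np.zeros((4, 5), dtype=np.int32)
toy[1:3, 2:4] = 3
u_hist, v_hist = uv_disparity(toy, max_disp=4)
```

In the toy example the object appears as a short horizontal run in row 3 of the u-disparity
histogram and as a short vertical run in column 3 of the v-disparity histogram, mirroring the
horizontal regions and vertical peaks described above.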
3.3 OBJECT BOUNDING BOX GENERATION:

Bounding-box objects can be extracted from regions of interest (ROIs) in the u- and
v-disparity images. The ROIs in the u-disparity image are extracted by scanning the rows of
the image for continuous spans where the histogram value exceeds a given threshold. Fig.
14(a) and (b) overlay the extracted regions on the u-disparity image. The ROIs are extracted
from the v-disparity image by selecting columns where the sum of the histogram values above
the ground plane is greater than the threshold. Each ROI spans from the ground plane to the
highest point in the column that exceeds the given threshold. Fig. 14(a) and (b) show the
extracted regions on the v-disparity image. The object bounding boxes are selected from the
ROIs in the u- and v-disparity images based on their disparity values. For a given disparity
d, the widths of the bounding boxes are determined by the ROIs found in the u-disparity
image, and the heights are derived from the ROIs in the v-disparity image. Large bounding
boxes associated with background regions are filtered out, and the remaining objects are
shown in Fig. 15.
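The row scan used to extract ROIs from the u-disparity image can be sketched as follows. This
covers only the u-disparity step (the widths of the boxes); the function name row_spans and
the threshold value are hypothetical, and the input is assumed to be indexed as
u_disp[d][u] with one row per disparity value.

```python
def row_spans(u_disp, threshold):
    """Return (d, u_start, u_end) for every continuous span above threshold.

    Each row d of the u-disparity image is scanned left to right, and a
    span is opened where the histogram value exceeds the threshold and
    closed where it falls back to or below it.
    """
    spans = []
    for d, row in enumerate(u_disp):
        start = None
        for u, count in enumerate(row):
            if count > threshold and start is None:
                start = u                      # span begins
            elif count <= threshold and start is not None:
                spans.append((d, start, u - 1))  # span ends before this column
                start = None
        if start is not None:                  # span runs to the image border
            spans.append((d, start, len(row) - 1))
    return spans

u_image = [
    [0, 0, 0, 0, 0],   # disparity 0: nothing above threshold
    [0, 5, 6, 0, 7],   # disparity 1: two separate spans
]
rois = row_spans(u_image, threshold=2)
```

Each returned span gives the horizontal extent of a candidate box at disparity d; the
corresponding height would come from the v-disparity image as described above.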
Fig. 15 Bounding-box objects with color-stereo images.
3.4 OBJECT FILTERING AND MERGING:

As shown in Fig. 15, multiple overlapping object bounding boxes are often generated. This
occurs because the disparities associated with a single object span a range of values,
particularly as the object moves closer to the camera. Significantly overlapping objects are
merged if the disparities associated with their bounding boxes are close. The final object
bounding boxes are shown in Fig. 16. Notice how the overlapping candidates have merged into
the correct bounding boxes corresponding to the pedestrians (objects) in the scene.
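The merging rule can be sketched as a simple greedy pass over the candidates. The source does
not specify how "significant" overlap or "close" disparities are measured, so the overlap
measure (intersection area over the smaller box), the thresholds min_overlap and
max_disp_diff, and the function names are all assumptions made for illustration.

```python
def overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller box.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    """
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return (ix * iy) / smaller if smaller > 0 else 0.0

def merge_boxes(candidates, min_overlap=0.5, max_disp_diff=2):
    """Greedily merge (box, disparity) candidates that overlap and are close in disparity."""
    merged = []
    for box, d in candidates:
        for k, (mbox, md) in enumerate(merged):
            if (overlap_ratio(box, mbox) >= min_overlap
                    and abs(d - md) <= max_disp_diff):
                # Replace the stored box by the union of the two boxes.
                union = (min(box[0], mbox[0]), min(box[1], mbox[1]),
                         max(box[2], mbox[2]), max(box[3], mbox[3]))
                merged[k] = (union, (d + md) / 2)
                break
        else:
            merged.append((box, d))
    return merged

candidates = [((0, 0, 10, 20), 10), ((2, 0, 12, 20), 11), ((50, 0, 60, 20), 10)]
final_boxes = merge_boxes(candidates)
```

The first two candidates overlap strongly and differ by one disparity level, so they collapse
into a single box, while the third stays separate. A greedy single pass is the simplest
choice; a fuller implementation might repeat the pass until no further merges occur.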
Fig. 16 Final selection of Objects after bounding box merging with color-stereo images.
3.5 OBJECT EXTRACTION:

Once an object has been detected, information about it can be extracted, such as its shape,
location, and direction of movement. Applying segmentation to the output gives the shape of
the object, as shown in Fig. 17.
Fig. 17 Outlined foreground extraction for color images using Color segmentation
The distance of the object from the camera can also be measured with the help of the
disparity map. Because the disparity map encodes an object's distance as its brightness,
objects that are very near or far away can be identified. This information is useful in
applications such as pedestrian detection and tracking.
For shape extraction, background subtraction or inter-frame subtraction can also be used, as
discussed earlier with reference [6].
Another application of the stereo approach is face detection in a stereo environment [14].
With the help of disparity maps, a 3D depth map of an object can be created and represented
on a grid. The grid map of a face can then be used to create a 3D face, as shown in Fig. 18.
Fig. 18 creating 3D Face using Disparity Maps: (a) left frame of a stereo image, (b) reconstructed disparity map, (c)
corresponding depth-map on a 3D grid, (d) reconstructed 3D head.
A 3D view of any object can also be created with the help of a disparity map, but the
disparity map must then be of very high quality, giving a wide range of distances in terms of
its intensity.
3.6 SUMMARY
This chapter has discussed one approach to detecting objects using stereo cameras with the
help of disparity maps. It covered how the disparity map is obtained on the basis of
triangulation geometry. The disparity image is then summarized into u- and v-disparity
images, which are histograms of the disparity map and help to determine the exact object
location in the scene. Object boxes are then obtained through object bounding box generation
followed by object filtering and merging. Finally, it was shown how an object's shape and
distance can be computed with the help of disparity maps, providing more information about
the object.
CHAPTER 4
CONCLUSION
We have presented a technique for object detection with the help of stereo cameras. The
report discussed various approaches to finding objects in the scene captured by the cameras.
With the correspondence-matching algorithm by Konolige [4], the disparity map is obtained.
Stereo vision makes it possible to achieve greater accuracy in object detection than the
earlier approaches. With this technique, not only is the object detected, but 3D information
about the object and the scene as a whole is obtained as well. The result of this technique,
from reference [7], is shown below.
This stereo vision technique can therefore be used in applications such as automated systems,
robotics, pedestrian detection for cars, 3D face detection [14], aerial surveys for the
calculation of contour maps, geometry extraction for 3D building mapping, and the calculation
of 3D heliographical information such as that obtained by the NASA STEREO project.
CHAPTER 5
REFERENCES
[1] R. Cucchiara, C. Grana, A. Prati, G. Tardini, R. Vezzani, “Using computer vision techniques
for dangerous situation detection in domotic applications,” Proc. of the IDSS04, London, Great
Britain, pages 1–5, February 23, 2004.
[2] R.Fablet, P.Bouthemy, M.Gelgon, “Moving object detection in color image sequences using region-
level graph labeling,” ICIP, 1999.
[3] A. Ess, B. Leibe, K. Schindler, L. van Gool, “Moving Obstacle Detection in Highly
Dynamic Scenes” IEEE International Conference on Robotics and Automation, 2009.
[4] K. Konolige, “Small vision systems: Hardware and implementation,” in Proc. 8th Int.
Symp. Robot. Res., 1997, pp. 111–116.
[5] R. Labayrade, D. Aubert, and J.-P. Tarel, “Real time obstacle detection in stereovision on
non flat road geometry through “v-disparity” representation.” in IEEE Conference on
Intelligent Vehicles, 2002.
[6] Masakazu MORIMOTO, Yasuhiro MITO, Kensaku FUJII, “AN OBJECT DETECTION
AND EXTRACTION METHOD USING STEREO CAMERA” in Automation Congress,
2008. WAC 2008.
[7] Stephen J. Krotosky and Mohan Manubhai Trivedi “On Color-, Infrared-, and Multimodal-
Stereo Approaches to Pedestrian Detection” IEEE TRANSACTIONS ON INTELLIGENT
TRANSPORTATION SYSTEMS, VOL. 8, NO. 4, DECEMBER 2007
[8] Jordan S. Nguyen, Thanh H. Nguyen, Hung T. Nguyen, “Semi-autonomous Wheelchair
System Using Stereoscopic Cameras” 31st Annual International Conference of the IEEE
EMBS Minneapolis, 2009.
[9] T. H. Nguyen, J. S. Nguyen, D. M. Pham, and H. T. Nguyen, "Real-Time Obstacle
Detection for an Autonomous Wheelchair Using Stereoscopic Cameras," in The 29th
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society, 2007.
[10]Rostam Affendi Hamzah, Rosman Abd Rahim, Zarina Mohd Noh, “Sum of Absolute
Differences Algorithm in Stereo Correspondence Problem for Stereo Matching in
Computer Vision Application” IEEE, 2010
[11]Zeng-Fu Wang, Zhi-Gang Zheng , “A Region Based Stereo Matching Algorithm Using
Cooperative Optimization”
[12]http://disparity.wikidot.com/
[13]K. Konolige, “Small vision systems: Hardware and implementation,” in Proc. 8th Int.
Symp. Robot. Res., 1997
[14] Sergey Kosov, Kristina Scherbaum, Kamil Faber, Thorsten Thormählen, Hans-Peter
Seidel, “RAPID STEREO-VISION ENHANCED FACE DETECTION” ICIP 2009