
A Real-Time Multiple-Vehicle Detection and Tracking System with Prior Occlusion Detection and Resolution



Bing-Fei Wu, Shin-Ping Lin, Yuan-Hsin Chen

Department of Electrical and Control Engineering, National Chiao Tung University,
1001 Ta-Hsueh Road, Hsinchu 30050, Taiwan
Email: bwu@cc.nctu.edu.tw

Abstract - The proposed multiple-vehicle detection and tracking
(MVDT) system uses a color background to segment moving objects
and exploits relations between the moving objects and existing
trajectories to track vehicles. Initially, the background is extracted
by classification. It is then regularly updated with the previous
moving objects to guarantee robust segmentation under changing
luminance. Background regions that converge wrongly because of
vehicles parked at the roadside are corrected later by checking the
fed-back trajectories, so that false detections are avoided after those
vehicles move away. In the tracking stage, relations based on
distance, or on distance and angle, determine whether to create,
extend, or delete a trajectory. If an occlusion is detected after a
trajectory has been created, it is resolved by rule-based tracking
reasoning; otherwise, lane information is used. Finally, traffic
parameter calculations based on the trajectories are listed, and, for
easy setup, parameter automation for the system is proposed.
Keywords - Detection, segmentation, tracking, occlusion, rule-
based reasoning, traffic parameter.

I. INTRODUCTION

Visual sensing systems have many applications in
intelligent transportation systems (ITS). Compared with
traditional sensing systems, they are easier to set up, cheaper,
and more versatile, but they require more processing time.
A real-time multiple-vehicle detection and tracking (MVDT)
system is therefore proposed to reduce the processing time.
In general, an MVDT system consists of vehicle detection
and vehicle tracking processing. The following two
paragraphs review previous work on these two stages.
In vehicle detection processing, the virtual slit [1] and the
virtual loop [2] exploit the concept of the inductive loop [3],
detecting passing vehicles by monitoring illumination
changes in pre-specified regions of a frame. Because this
kind of processing examines only those pre-specified regions,
it is fast; however, it is hard to set up, expensive, and limited
in function. An alternative uses the double-difference
operator [4] with gradient magnitude to detect vehicles.
Although this processing is more complicated than the
previous one, it gathers more vehicle information; it adapts
poorly, though, to the luminance changes caused by daylight,
weather, or an automatic electric shutter (AES).
Consequently, optical-flow-based techniques, which estimate
the intensity motion between two subsequent frames, have
been used to overcome luminance change [5], [6], but they
need much time to find an optimal solution. Hence, Smith et
al. [7], Gupte et al. [8], and Koller et al. [9] dynamically
update an estimated background to detect moving objects
and can adapt to luminance changes in real time; these
techniques, however, depend on an initial background
without vehicles inside.
In vehicle tracking processing, maximum a posteriori
(MAP) estimation is used to prove or disprove a given
hypothesis (trajectory) by Bayesian inference. For instance,
Kamijo et al. [1] and Tao et al. [6] use MAP to track
occluded vehicles with a spatio-temporal Markov random
field (ST-MRF) and with a dynamic-layer shape, motion, and
appearance model, respectively. MAP, however, needs much
computational power. To reduce the complexity, Li et al. [5]
use sequential importance sampling (SIS), which belongs to
the class of Monte Carlo methods, and Smith et al. [7] use
sum-of-squared differences (SSD) with dynamic pyramiding.
Even with these improvements, MAP only approaches real-
time operation. In comparison, techniques based on the
extended Kalman filter (EKF) are faster [2], [10]; they
estimate the positions and velocities (states) of vehicles
represented by dynamic models. Although robust, such
techniques converge to wrong states when vehicles are
occluded. For these reasons, rule-based reasoning is adopted
here to reduce the processing time and to overcome the
occlusion problem; the methods of [4], [8], however, spend
much time detecting occlusion.
In this study, dynamic segmentation and rule-based
tracking reasoning are proposed for the detection and
tracking stages, respectively. Both techniques take processing
speed, precision, and robustness into consideration. The
details are described in the following two sections.

II. SYSTEM OVERVIEW

The proposed MVDT system consists of dynamic
segmentation and rule-based tracking reasoning. First,
dynamic segmentation uses the current video frame, the
previous moving objects, and the previous trajectories to
segment the current moving objects. Then, rule-based
tracking reasoning relates the current moving objects to the
previous trajectories to find the current trajectories. The
block diagram of the proposed system is shown in Fig. 1.


[Fig. 1 block diagram: frames enter the dynamic segmentation block, which outputs moving objects to the rule-based tracking reasoning block, which outputs trajectories; two delay blocks feed the previous moving objects and the previous trajectories back as inputs.]
Fig. 1. The block diagram of the proposed MVDT system


III. DYNAMIC SEGMENTATION AND RULE-BASED
TRACKING REASONING

To meet the real-time requirement, dynamic segmentation
reduces the vehicle detection problem to a subtraction
between the current frame and a statistically maintained
color background, and rule-based tracking reasoning
simplifies the tracking problem to relating the centers of the
segmented moving objects to the centers of the last trajectory
nodes. For precision, the rule-based tracking reasoning uses
spatial and spatio-temporal filters to eliminate falsely
detected and vibrating moving objects. For robustness, the
dynamic segmentation exploits background compensation to
maintain the color background, and the rule-based tracking
reasoning handles prior occlusions and mis-detected objects.
A. Color Background Extraction

The concept of color background extraction is to exploit
the appearance probability (AP) of each pixel's colors to
extract the background: over a sufficiently long time, the
color with the maximum AP most probably belongs to the
background. However, keeping the AP of every distinct pixel
color requires a lot of memory and still needs de-noising, so
the AP of each pixel's color classes is used instead.
A color class located at coordinate (x, y) has an ordinal
number c that uniquely identifies it. To calculate the AP and
to classify pixel colors, a counter CC(x, y, c) and a color
mean CM(x, y, c) = [CM_R(x, y, c), CM_G(x, y, c),
CM_B(x, y, c)]^T are created. The total number of classes at
the same coordinate is denoted NC(x, y).
Initially, there is only one class per pixel. Let the pixel
located at (x, y) and sampled at time instance t be denoted
f(x, y, t) = [f_R(x, y, t), f_G(x, y, t), f_B(x, y, t)]^T. The 0-th
color mean, counter, and number of classes are initialized as
CM(x, y, 0) = f(x, y, 0), CC(x, y, 0) = 1, and NC(x, y) = 1,
respectively. In addition, a color background BG(x, y) =
[BG_R(x, y), BG_G(x, y), BG_B(x, y)]^T is set to
[-1, -1, -1]^T to indicate that no pixel has converged yet.
A decision function then uses the sum of absolute
differences (SAD), SAD(x, y, c), given in Eq. (1), to
determine whether to classify the current pixel color into the
c-th class or to create a new class.
SAD(x,y,c) = |f_R(x,y,t) - CM_R(x,y,c)| + |f_G(x,y,t) - CM_G(x,y,c)| + |f_B(x,y,t) - CM_B(x,y,c)|    (1)
First, the decision function assigns the pixel to the class j
given by Eq. (2). The corresponding SAD(x, y, j) is then
compared with a fixed threshold TH1: if SAD(x, y, j) is less
than TH1, CM(x, y, j) and CC(x, y, j) are updated according
to Eq. (3); otherwise, a new class is created according to
Eq. (4).
j = \arg\min_{0 \le c < NC(x,y)} SAD(x,y,c)    (2)

CM(x,y,j) ← (CC(x,y,j) CM(x,y,j) + f(x,y,t)) / (CC(x,y,j) + 1),    CC(x,y,j) ← CC(x,y,j) + 1    (3)

CM(x,y,NC(x,y)) ← f(x,y,t),    CC(x,y,NC(x,y)) ← 1,    NC(x,y) ← NC(x,y) + 1    (4)
As time goes by, the counter of the class that belongs to
the background increases rapidly. The appearance probability
AP(x, y, c) of each class is defined in Eq. (5), and the k-th
class, the one most likely to be background, is given by
Eq. (6). CM(x, y, k) is then rounded into the background
BG(x, y) only if the AP of that class exceeds a dynamic
threshold TH2.
AP(x,y,c) = CC(x,y,c) / \sum_{c'=0}^{NC(x,y)-1} CC(x,y,c') = CC(x,y,c) / (t + 1)    (5)

k = \arg\max_{0 \le c < NC(x,y)} CC(x,y,c)    (6)
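
To make the update concrete, the following Python sketch applies Eqs. (1)-(6) at one pixel position. The values of TH1 and TH2 are placeholders chosen for illustration; the paper does not state them.

```python
import numpy as np

TH1 = 30    # class-matching threshold (assumed value)
TH2 = 0.7   # background-acceptance AP threshold (assumed value)

def update_classes(pixel, means, counts, t):
    """Classify one RGB pixel into its nearest color class or open a new one.

    means  : list of length-3 float arrays (the CM vectors of one pixel)
    counts : list of ints (the CC counters of the same pixel)
    Returns the index of the most probable background class, or -1 while
    no class has AP > TH2 (the background has not converged yet).
    """
    pixel = np.asarray(pixel, dtype=float)
    # Eqs. (1)-(2): nearest class by sum of absolute differences.
    sads = [np.abs(pixel - m).sum() for m in means]
    j = int(np.argmin(sads))
    if sads[j] < TH1:
        # Eq. (3): running color mean and counter update.
        means[j] = (counts[j] * means[j] + pixel) / (counts[j] + 1)
        counts[j] += 1
    else:
        # Eq. (4): create a new class seeded with this pixel.
        means.append(pixel.copy())
        counts.append(1)
    # Eqs. (5)-(6): the dominant class becomes background once its AP is large.
    k = int(np.argmax(counts))
    return k if counts[k] / (t + 1) > TH2 else -1
```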
B. Moving Objects Segmentation

With the extracted background, moving objects are
detected by checking the sum of differences between the
background and the input frame. The per-pixel sum of
differences is defined in Eq. (7). A binary mask of moving
objects, MM(x, y), equal to the complement of a background
mask BM(x, y), is obtained by Eq. (8); there, MTH_L and
MTH_H are the dynamic thresholds described in the next
paragraph. The moving-object mask is passed to the vehicle
tracking processing to track trajectories. The background
mask selects the regions of the background to be updated,
with a predefined n, by Eq. (9). If n is too large, the
background adapts poorly to slow illumination changes; if n
is too small, the background is easily contaminated by
moving objects. The selection of n is therefore important; in
our experience, n = 8 satisfies both concerns.
MSD(x,y) = (f_R(x,y,t) - BG_R(x,y)) + (f_G(x,y,t) - BG_G(x,y)) + (f_B(x,y,t) - BG_B(x,y))    (7)

MM(x,y) = \overline{BM}(x,y) = 1 if MSD(x,y) < MTH_L or MSD(x,y) > MTH_H; 0 otherwise    (8)
If BM(x,y) = 1:    BG(x,y) ← ((n - 1) BG(x,y) + f(x,y,t)) / n    (9)
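
A minimal whole-frame sketch of Eqs. (7)-(9) follows. It assumes the Eq. (7) difference is signed (the text says sum-of-difference, not absolute difference) and that frames and background are H x W x 3 float arrays.

```python
import numpy as np

def segment_and_update(frame, bg, mth_l, mth_h, n=8):
    """Moving-object mask and background update following Eqs. (7)-(9)."""
    # Eq. (7): per-pixel sum of signed channel differences.
    msd = (frame - bg).sum(axis=2)
    # Eq. (8): a pixel moves when its difference leaves the background band.
    mm = (msd < mth_l) | (msd > mth_h)
    bm = ~mm
    # Eq. (9): blend the frame into the background on background pixels only;
    # n = 8 balances adaptation speed against contamination, as in the text.
    bg[bm] = ((n - 1) * bg[bm] + frame[bm]) / n
    return mm, bg
```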
As the background is updated at each time instance, the
proposed dynamic segmentation can cope with slow
illumination changes, such as those of daylight or weather,
even with a fixed threshold. For rapid illumination changes,
such as the effect of the AES, a fixed threshold causes false
detections. An adaptive thresholding method is therefore
proposed that finds the low valley VL = [VL_R, VL_G,
VL_B]^T and the high valley VH = [VH_R, VH_G, VH_B]^T
of the filtered difference distribution FD(n) = [FD_R(n),
FD_G(n), FD_B(n)]^T between the background and the
subsequent frame. The filtered difference distribution is the
difference distribution D(n) = [D_R(n), D_G(n), D_B(n)]^T of
Eq. (10) smoothed by Eq. (11). D(n) is not used to find the
valleys directly because it is noisy, and the noise would
prevent the Laplacian operator in Eq. (12) from locating the
correct valleys. The valley search on FD(n) rests on one
observation: whether or not a frame is affected by the AES,
the most frequent differences in D(n) most probably belong
to the background. With the valleys, the dynamic thresholds
MTH_L and MTH_H mentioned above are obtained by
Eq. (13).
D_C(n) ← D_C(n) + 1,  where n = f_C(x,y,t) - BG_C(x,y) and C ∈ {R, G, B}    (10)
FD(n) = (1 / (2p+1)) \sum_{i=n-p}^{n+p} D(i),  where 2p+1 is the number of taps    (11)
\nabla^2 FD(n) = FD(n+1) - 2 FD(n) + FD(n-1),
(VL_C, VH_C) = \arg_n { \nabla^2 FD_C(n) = 0 },  with VL_C < VH_C and C ∈ {R, G, B}    (12)
MTH_L = VL_R + VL_G + VL_B,    MTH_H = VH_R + VH_G + VH_B    (13)
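
The valley search of Eqs. (10)-(13) might be sketched as follows. The tap parameter p and the use of the per-side minima of the smoothed histogram, standing in for an explicit Laplacian zero test, are our assumptions.

```python
import numpy as np

def dynamic_thresholds(frame, bg, p=2):
    """Per-channel valley search on the smoothed difference histogram."""
    mth_l, mth_h = 0.0, 0.0
    for c in range(3):
        diff = (frame[..., c].astype(int) - bg[..., c].astype(int)).ravel()
        # Eq. (10): histogram of signed differences, offset to bins 0..510.
        d = np.bincount(diff + 255, minlength=511).astype(float)
        # Eq. (11): (2p+1)-tap moving-average smoothing.
        fd = np.convolve(d, np.ones(2 * p + 1) / (2 * p + 1), mode="same")
        # Eq. (12): the main peak is the background; take the valley on
        # each side of it (here simply the minimum of each side).
        peak = int(np.argmax(fd))
        left, right = fd[:peak], fd[peak + 1:]
        vl = (int(np.argmin(left)) - 255) if left.size else -255
        vh = (peak + 1 + int(np.argmin(right)) - 255) if right.size else 255
        mth_l += vl
        mth_h += vh
    # Eq. (13): the thresholds are the channel sums of the valleys.
    return mth_l, mth_h
```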

C. Background Compensation

When part of the background converges wrongly because
of vehicles parked at the roadside, trajectories fed back from
the vehicle tracking processing decide whether the
corresponding moving objects are falsely detected. If they
are, the following three situations all occur:
1. The centers of the moving objects do not change much
over a period of time;
2. The centers of the first trajectory nodes are not near the
boundary of the detection zone;
3. There are no edges near the contour of the moving objects.
If the trajectory of a moving object satisfies all three
conditions (see the sketch below), the region of the moving
object is set as background.
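
A hypothetical rendering of these three tests is given below. Every constant in it (max_drift, min_age, the 5-pixel zone margin, the 8-pixel edge window) is an illustrative assumption rather than a value from the paper.

```python
def is_false_detection(traj, zone, edge_map, max_drift=3, min_age=30):
    """traj: list of (cx, cy) node centers, oldest first;
    zone: (x0, y0, x1, y1) detection-zone rectangle;
    edge_map: boolean image, True where an edge pixel lies."""
    if len(traj) < min_age:
        return False
    (fx, fy), (lx, ly) = traj[0], traj[-1]
    # 1. The center barely moves over the observation period.
    stationary = abs(lx - fx) + abs(ly - fy) < max_drift
    # 2. The first node did not appear at the detection-zone boundary.
    x0, y0, x1, y1 = zone
    born_inside = x0 + 5 < fx < x1 - 5 and y0 + 5 < fy < y1 - 5
    # 3. No edges near the object contour (here: around the last center).
    cx, cy = int(lx), int(ly)
    no_edges = not edge_map[max(cy - 8, 0):cy + 8,
                            max(cx - 8, 0):cx + 8].any()
    return stationary and born_inside and no_edges
```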

D. Filter Out False Detected Objects

In general, some falsely detected objects can be
eliminated using the spatial properties obtained after
connected-component labeling. The spatial properties used
are the top-most coordinate T(l), left-most coordinate L(l),
bottom-most coordinate B(l), right-most coordinate R(l), area
A(l), width W(l), height H(l), aspect ratio AR(l), size S(l),
and density D(l). The first five can be obtained during
connected-component labeling; the others are derived from
them. These properties are then used to filter out falsely
detected objects by thresholding. The method uses two
statistical moments, the mean and the variance of the
vehicles' spatial properties, as references for threshold
assignment. The thresholding operator in Eq. (14) filters out
falsely detected objects by the width of the bounding box,
where WM(l, t) and WV(l, t) are the mean and variance of
the widths of all moving objects found so far. The operators
for the other spatial properties are similar.

Eliminate the l-th component if |W(l) - WM(l,t)| > 2 \sqrt{WV(l,t)}; do nothing otherwise    (14)
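
One way to realize Eq. (14) is an online-moment filter per spatial property. The sketch below keeps the running mean and variance with Welford's update, a choice of ours; the paper only requires the two moments.

```python
class PropertyFilter:
    """Rejects a component whose property lies more than two standard
    deviations from the running mean, as in Eq. (14)."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, value):
        # Welford's incremental mean/variance update.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def is_outlier(self, value):
        if self.n < 2:
            return False
        std = (self.m2 / (self.n - 1)) ** 0.5
        return abs(value - self.mean) > 2.0 * std
```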

E. Prior Splitting by Lane Information

If vehicles are occluded just as they enter the frame, the
tracking processing might be confused. Fortunately, most
such vehicles are occluded side by side, horizontally across
adjacent lanes. Consequently, prior occlusion detection and
resolution with the help of lane information is proposed.
First, a lane mask H(x, y) with values -1 (ignored), 0
(separation), 1 (first lane), and so on is made, as shown in
Fig. 2(b). Each moving object is assigned a label ID l in a
label-ID image g(x, y) after connected-component labeling.
Next, each moving object is scanned by Eq. (15) to obtain a
histogram S(l, h) over the lane IDs h. Occlusion is then
detected by Eq. (16) with a reasonable threshold TH3 = 5.
Finally, the occlusion is resolved by splitting the moving
object along H(x, y).
If g(x,y) = l and H(x,y) ≥ 1:    S(l, H(x,y)) ← S(l, H(x,y)) + 1    (15)

If S(l,h) > TH3 and S(l,h+1) > TH3:    do occlusion resolution    (16)
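
In code, the lane-histogram test of Eqs. (15)-(16) might look as follows, assuming g and H are integer label images of the same size.

```python
import numpy as np

TH3 = 5  # minimum per-lane support, as given in the text

def detect_prior_occlusion(g, H, l):
    """Return (occluded, h): True plus the left lane ID when object l
    has more than TH3 pixels on two adjacent lanes."""
    # Eq. (15): count the object's pixels falling on each lane (H >= 1).
    lanes = H[(g == l) & (H >= 1)]
    if lanes.size == 0:
        return False, None
    hist = np.bincount(lanes)
    # Eq. (16): adjacent lanes h and h+1 both well supported -> occlusion.
    for h in range(1, hist.size - 1):
        if hist[h] > TH3 and hist[h + 1] > TH3:
            return True, h
    return False, None
```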

Fig. 2. (a) A background obtained after color background extraction, with a
detection zone bounded by a magenta rectangle; (b) visual representation of
the lane information based on the background shown in (a). Different
gray-level regions indicate different lanes, white regions are separations
between lanes, and black regions are ignored.

F. Update Trajectories and Eliminate Vibrated Moving
Objects

To reduce the computational load, the centers of
trajectories are used to relate the current moving objects to
the existing trajectories. The relation that decides whether a
moving object should be attached to an existing trajectory or
should start a new one is the distance between the center of
the current moving object and the center of the last trajectory
node (the node of the k-th trajectory at time instance t is
denoted l(k, t)). If the trajectory has more than one node, the
angle criterion AC(l, k, t) in Eq. (17) is also checked; it
compares the vector A(l, k, t), from the center of the last
node C(l(k, t-1)) to the center of the moving object C(l), with
the vector B(l, k, t), from the center of the second-last node
C(l(k, t-2)) to the center of the last node. If AC(l, k, t) > 0,
the l-th moving object satisfies the angle constraint (at most
60°) with the k-th trajectory at time instance t.
A(l,k,t) = C(l) - C(l(k,t-1))
B(l,k,t) = C(l(k,t-1)) - C(l(k,t-2))
AC(l,k,t) = A(l,k,t) · B(l,k,t) - ||A(l,k,t)|| ||B(l,k,t)|| cos 60°    (17)
If a moving object satisfies the distance constraint but not
the angle constraint, a vibration counter associated with the
trajectory is increased by 1. If the counter exceeds 3, the
trajectory is regarded as a vibrating moving object and is
ignored.
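
The angle constraint of Eq. (17) reduces to a dot-product test, sketched below; the handling of zero-length vectors is our addition.

```python
import numpy as np

def satisfies_angle(c_obj, c_last, c_prev, max_deg=60.0):
    """AC = A.B - |A||B| cos(max_deg) > 0 holds exactly when the angle
    between the previous step B and the new step A is below max_deg."""
    a = np.subtract(c_obj, c_last)    # A(l,k,t): last node -> object center
    b = np.subtract(c_last, c_prev)   # B(l,k,t): second-last -> last node
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return True  # degenerate step: no direction to compare
    ac = float(np.dot(a, b)) - na * nb * np.cos(np.radians(max_deg))
    return ac > 0.0
```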

G. Resolve Multiple-Vehicle Occlusions

In case the prior occlusion detection and resolution of
sub-section III.E fails, a post occlusion detection and
resolution technique based on the trajectories is used. The
following steps decide whether a moving object is formed by
occluded vehicles (a sketch of step 2 follows the list):
1. If the moving object cannot be related to any existing
trajectory, go to step 2. Otherwise, add the moving object to
that trajectory.
2. If the region of the moving object, shifted by an offset
estimated from the centers of the last two trajectory nodes, is
a superset of the last trajectory node, go to step 3. Otherwise,
create a new trajectory for the moving object.
3. Split the moving object into two moving objects.
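
Step 2 can be pictured as a bounding-box test, as in the following sketch. Representing regions by axis-aligned boxes is an assumption; the paper does not say how the superset test is implemented.

```python
def post_occlusion_check(obj_box, traj_boxes):
    """obj_box: (x0, y0, x1, y1) of the unmatched moving object;
    traj_boxes: node boxes of one trajectory, newest last (>= 2 nodes).
    Returns True when the object looks like two occluded vehicles."""
    (lx0, ly0, lx1, ly1), (px0, py0, px1, py1) = traj_boxes[-1], traj_boxes[-2]
    # Offset estimated from the centers of the last two trajectory nodes.
    dx = (lx0 + lx1 - px0 - px1) / 2.0
    dy = (ly0 + ly1 - py0 - py1) / 2.0
    # Predict where the last node should be now, then ask whether the
    # object's region is a superset of that prediction (step 2).
    pred = (lx0 + dx, ly0 + dy, lx1 + dx, ly1 + dy)
    x0, y0, x1, y1 = obj_box
    return x0 <= pred[0] and y0 <= pred[1] and x1 >= pred[2] and y1 >= pred[3]
```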

H. Calculate Traffic Parameters

In general, most traffic parameters can be derived from
the tracked trajectories. However, each trajectory has to be
classified to a lane ID before the traffic parameters are
calculated. The classification rule is given in Eq. (18), where
S(l, h) is obtained as in Eq. (15) with l extended to l(k, t-1),
the node of the k-th trajectory at time instance t-1.
h*(k,t) = \arg\max_h S(l(k,t-1), h)    (18)
The equations used to calculate the traffic parameters are
listed in Table 1. Note that the traffic parameters are
computed at the moment a trajectory is deleted; the last node
of the just-deleted trajectory is therefore at time instance t-1.
Table 1. Equations used to calculate traffic parameters.

Speed VS(h*):
    VS(h*(k,t)) ← (7/8) VS(h*(k,t)) + (1/8) [ ||C(l(k,t-1)) - C(l(k,t-N_t(k,t-1)))|| · 0.005 · FPH ] / [ N_t(k,t-1) \bar{W}(k) ]
    where FPH is the number of frames per hour; N_t(k, t-1) is the number of
    nodes in the k-th trajectory at time instance t-1; C(l) is the center of
    the l-th moving object; and \bar{W}(k) is the average width of the nodes
    in the k-th trajectory.

Quantity VQ(h*):
    If N_t(k, t-1) > 3, then VQ(h*(k,t)) ← VQ(h*(k,t)) + 1.

Headway VH(h*):
    First, initialize t_H(h) = 0 for every lane ID h. Then
    VH(h*(k,t)) ← VS(h*(k,t)) (t - t_H(h*(k,t))) / FPH,  and  t_H(h*(k,t)) ← t.

Volume VV(h*):
    VV(h*(k,t)) ← VQ(h*(k,t)) FPH / t.

Occupancy VO:
    If N_t(k, t-1) > 3, then OF ← OF + 1, and VO ← (OF / t) × 100%.
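
Reading the VS row as an exponentially smoothed pixel-to-kilometre conversion gives the sketch below. Treating the 0.005 km factor as an assumed 5 m real vehicle width that scales pixels to distance is our interpretation of the formula.

```python
import math

def lane_speed(prev_vs, centers, avg_width_px,
               fph=30 * 3600, vehicle_width_km=0.005):
    """prev_vs: current smoothed lane speed in km/h;
    centers: trajectory-node centers (x, y), oldest first;
    avg_width_px: average bounding-box width over the trajectory."""
    (x0, y0), (x1, y1) = centers[0], centers[-1]
    px = math.hypot(x1 - x0, y1 - y0)          # displacement in pixels
    km = px * vehicle_width_km / avg_width_px  # pixels -> kilometres
    hours = len(centers) / fph                 # N_t frames -> hours
    # 7/8 : 1/8 exponential blend into the running lane speed.
    return (7.0 / 8.0) * prev_vs + (1.0 / 8.0) * (km / hours)
```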



I. Parameter Automation

To adapt the MVDT system to different capture-view
conditions, all parameters used in the system have to be
decided automatically. In this work, the system parameters
are tuned from a weighted mean of the small-vehicle width or
height. Before system start-up, vehicle width and height
statistics are gathered and stored. The mean vehicle width (or
height) is taken as the separation between small and large
vehicles, and the small-vehicle width (or height) mean is then
obtained by averaging the widths (or heights) between 0 and
that separation. When the small-vehicle width mean is
updated, only vehicles whose widths are less than the current
mean are taken into the average.
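
A minimal sketch of this update rule follows; the incremental running-mean form is our choice, and startup_widths stands for the widths gathered before start-up.

```python
class SmallVehicleWidth:
    """Running small-vehicle width mean, per Section III.I."""

    def __init__(self, startup_widths):
        # Start-up: the overall mean separates small from large vehicles,
        # and the widths below it seed the small-vehicle mean.
        sep = sum(startup_widths) / len(startup_widths)
        small = [w for w in startup_widths if w <= sep]
        self.n = len(small)
        self.mean = sum(small) / self.n

    def update(self, width):
        # Afterwards, fold in only widths below the current mean.
        if width < self.mean:
            self.n += 1
            self.mean += (width - self.mean) / self.n
```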

IV. EXPERIMENTAL RESULTS

Two types of image sequences (shown in Fig. 3 and
Fig. 4), captured at the 99 km mark of National Highway
No. 1, are tested. Each sequence contains 10500 frames, each
image is 320×240, and the frame rate is 30 fps. The average
processing time is 148 ms per frame. The proposed system
was developed on a Windows XP platform with a Pentium 4
2.8 GHz CPU and 512 MB of RAM.
Fig. 3 and Fig. 4 give two examples of occlusion
resolution. In Fig. 3, the two occluded vehicles on the right
side are split by the lane mask; in Fig. 4, the two occluded
vehicles are split based on their trajectories. Fig. 5(a)-(c)
shows three sets of traffic parameters recorded at different
time instances, and the accuracy rates of the traffic
parameters are listed in Table 2. Specifically, the parameters
listed are the total vehicle quantity, the total small-vehicle
quantity, and the total large-vehicle quantity over all lanes at
time instances 1000, 2000, and 3000.


Fig. 3. An example of prior occlusion resolution


Fig. 4. An example of post occlusion resolution


Traffic parameters    (a) frame 1000    (b) frame 2000    (c) frame 3000
VS(1)                 98.4 km/hr        98.2 km/hr        105.0 km/hr
VQ(1)                 12                34                49
VQS(1)                12                29                41
VQL(1)                0                 5                 8
VH(1)                 74 m              48 m              54 m
VV(1)                 1383/hr           1896/hr           1802/hr
VO                    85.0%             90.2%             89.9%

Fig. 5. Traffic parameters of the first lane (the left-most lane) at (a) the
1000-th frame, (b) the 2000-th frame, and (c) the 3000-th frame.

Table 2. Accuracy rates of the total, small-vehicle, and large-vehicle quantities.
Traffic Parameters                    Accuracy Rate
Total quantity                        96.6%
Total quantity of small vehicles      98.2%
Total quantity of large vehicles      95.0%


V. CONCLUSIONS

In this work, an MVDT system comprising parameter
automation, vehicle detection, prior splitting by lane
information, vehicle tracking, post splitting, and
comprehensive traffic-parameter calculation is proposed.
First, a color background extraction technique based on
spatio-temporal statistics, with luminance adaptation and
compensation of wrong convergence, is used to segment
moving objects robustly. Next, prior splitting by lane
information resolves occluded vehicles just as they enter the
detection zone. Some vehicles, however, become occluded in
the middle of the detection zone because of lane changes;
hence, after tracking vehicles by a distance-based or
distance-and-angle-based relation, a post-splitting technique
is applied. Finally, traffic parameters based on the tracked
trajectories are calculated for traffic monitoring. The
experimental results show that the processing speed of the
proposed system meets the real-time requirement with high
accuracy. In addition, the proposed system can be set up
without any environment information given in advance,
except the lane mask; a method that automatically detects
lanes from the extracted background will therefore be studied
in the future.

ACKNOWLEDGMENT

This work was supported by the National Science Council,
Taiwan, under Grant no. NSC94-2213-E-009-062.

REFERENCES

[1] S. Kamijo, Y. Matsushita, K. Ikeuchi, and M. Sakauchi, "Traffic
monitoring and accident detection at intersections," IEEE Transactions on
Intelligent Transportation Systems, vol. 1, no. 2, pp. 108-118, Jun. 2000.
[2] A. H. S. Lai and N. H. C. Yung, "Vehicle-type identification through
automated virtual loop assignment and block-based direction-biased motion
estimation," IEEE Transactions on Intelligent Transportation Systems,
vol. 1, no. 2, pp. 86-97, Jun. 2000.
[3] D.-W. Lim, S.-H. Choi, and J.-S. Jun, "Automated detection of all
kinds of violations at a street intersection using real time individual vehicle
tracking," IEEE International Conference on Image Analysis and
Interpretation, pp. 126-129, Apr. 2002.
[4] R. Cucchiara, M. Piccardi, and P. Mello, "Image analysis and
rule-based reasoning for a traffic monitoring system," IEEE Transactions on
Intelligent Transportation Systems, vol. 1, no. 2, pp. 119-130, Jun. 2000.
[5] B. Li and R. Chellappa, "A generic approach to simultaneous tracking
and verification in video," IEEE Transactions on Image Processing, vol. 11,
no. 5, pp. 530-544, May 2002.
[6] H. Tao, H. S. Sawhney, and R. Kumar, "Object tracking with Bayesian
estimation of dynamic layer representations," IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 24, no. 1, pp. 75-89, Jan. 2002.
[7] C. E. Smith, S. A. Brandt, and N. P. Papanikolopoulos, "Visual
tracking for intelligent vehicle-highway systems," IEEE Transactions on
Vehicular Technology, vol. 45, no. 4, pp. 744-759, Nov. 1996.
[8] S. Gupte, O. Masoud, R. F. K. Martin, and N. P. Papanikolopoulos,
"Detection and classification of vehicles," IEEE Transactions on Intelligent
Transportation Systems, vol. 3, no. 1, pp. 37-47, Mar. 2002.
[9] D. Koller, J. Weber, and J. Malik, "Robust multiple car tracking with
occlusion reasoning," Third European Conference on Computer Vision,
Springer-Verlag, pp. 186-196, 1994.
[10] T. Bücher, C. Curio, J. Edelbrunner, C. Igel, D. Kastrup, I. Leefken,
G. Lorenz, A. Steinhage, and W. von Seelen, "Image processing and
behavior planning for intelligent vehicles," IEEE Transactions on Industrial
Electronics, vol. 50, no. 1, pp. 62-75, Feb. 2003.
