I. INTRODUCTION

II. RELATED WORK
Traffic light recognition was studied early on in [4], where a non-real-time, viewpoint-dependent approach is used to formulate hidden Markov models. Traffic light recognition with still cameras is proposed for intelligent transportation systems (ITS) in [5]. [6] also presents a video-based system for automatic red light runner detection, which likewise works with fixed cameras. Omachi et al. [7] use pure image processing techniques, including color filtering and edge extraction, to detect the traffic light.
Fig. 2. Block diagram of the proposed system.
Real-time approaches to traffic light recognition are presented in [3], [8]. [3] proposes a real-time recognition algorithm that identifies the color of the traffic light under various detection conditions and assists drivers with color vision deficiency. [8] abandons color-based detection techniques and adopts luminance to filter out false detections. By applying spot light detection and template matching for suspended and supported traffic lights, the method runs in real time on personal computers. [9] presents a non-real-time algorithm with high accuracy that learns models of image features from traffic light regions.
A recognition technique that tracks the traffic lights was proposed early in [10]. The authors introduce a detect-track-classify architecture for traffic light recognition and compare three different detectors, e.g., a high-end HDR camera sensor, each trading off speed against accuracy. Another tracking approach, which uses the CAMSHIFT algorithm to obtain temporal information about the traffic light, is presented in [11]. The tracking algorithm also narrows the search areas to improve image processing performance. Our system revises the architecture in [10], proposes advanced detection and tracking models that eliminate the costly color detector and template matching during tracking, and recognizes realistic scenes accurately in real time on computationally constrained mobile devices.
III. SYSTEM DESIGN
A. Architecture
Figure 2 presents the block diagram of our system, which is designed with a multi-stage architecture. To achieve real-time performance on a mobile device, whose camera typically captures images at a rate of 30 frames/s or lower, the processing time for each frame must be less than 1/30 second. Therefore, the algorithm of each block is designed to avoid complicated computation.
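To make this budget concrete, the following Python sketch times one pass through a stand-in multi-stage pipeline against the 1/30-second deadline. Every stage function below is a trivial placeholder named after a block in Figure 2, not the paper's implementation:

import time
import numpy as np

FRAME_BUDGET_S = 1.0 / 30  # a 30 frame/s camera leaves ~33 ms per frame

# Placeholder stages mirroring the blocks in Figure 2; the bodies are
# trivial stand-ins, not the actual algorithms.
def rgb_color_filter(frame):
    return frame[..., 0] > 200          # keep bright-red pixels (illustrative)

def label_and_merge(mask):
    return [mask]                       # would return labeled, merged blobs

def assign_candidates(blobs):
    return blobs                        # would apply Eq. (5) and the area test

def geometry_filter(candidates):
    return candidates                   # trajectory and projection models

def classify(candidates):
    return len(candidates) > 0          # decision tree in the real system

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy RGB frame
start = time.perf_counter()
classify(geometry_filter(assign_candidates(label_and_merge(rgb_color_filter(frame)))))
elapsed = time.perf_counter() - start
print(f"{elapsed * 1e3:.2f} ms used of {FRAME_BUDGET_S * 1e3:.1f} ms budget")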
As shown in Figure 2, video frames read from the built-in camera are processed in the RGB domain¹ through a few simple stages applied in sequence.

¹On the Android smartphone used in our experiments, the raw data obtained from the camera is in the RGB color domain.
Fig. 3. The output images after each processing stage; upper-left: original image, upper-right: RGB-filtered image, lower-left: connected-component labeling and merging, and lower-right: trajectory model.
The camera exposure should be set so that the traffic light is not under-exposed, since for traffic light recognition the only object of interest is the traffic light. To do so, the system could estimate the ambient lighting conditions and adjust the exposure setting accordingly. It is also worth mentioning that setting the exposure to the minimum provides both a pre-filtering mechanism that eliminates low-luminance areas whose hue values could resemble the traffic light's, and a way to prevent motion blur.
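The color-filtering rule itself (Equation (1)) is not reproduced above, so the following is only a plausible per-pixel sketch of the RGB filtering stage shown in Fig. 3; every threshold value here is an assumption chosen for illustration:

import numpy as np

def rgb_filter_red(frame, r_min=180, margin=60):
    """Binary mask of pixels that could be a lit red traffic light.

    Illustrative rule (not the paper's Equation (1)): a pixel passes if its
    R channel is bright and clearly dominates both G and B. With minimum
    exposure, dim regions fail r_min automatically, matching the
    pre-filtering effect described above.
    """
    r = frame[..., 0].astype(np.int16)
    g = frame[..., 1].astype(np.int16)
    b = frame[..., 2].astype(np.int16)
    return (r > r_min) & (r - g > margin) & (r - b > margin)

frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[1, 1] = (250, 40, 30)            # one bright red pixel
print(rgb_filter_red(frame)[1, 1])     # True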
C. Connected-Component Labeling and Merging
To distinguish individual blobs, a connected-component labeling algorithm with 8-connectivity is applied to the binary image to find the area and position of each blob. Each blob is assigned an identification (ID) number, and its area and position in the frame are recorded along with the ID number. In addition, the smallest bounding rectangle that contains the blob is found. Let h_{b_i} and w_{b_i} denote the height and the width of this rectangle for the blob with ID i, and (x_{b_i}, y_{b_i}) denote the center position of this rectangle. Noise can break contiguous components apart because light diffuses non-uniformly in the real world. Therefore, such small blobs should be merged into a whole. In our system, for each blob b_i we check whether there are other blobs b_j that are sufficiently close. If so, they are marked with the same ID number to merge them into a larger blob. The following criterion is used to merge two blobs:
"
"
"
(xbi xbj )2 + (ybi ybj )2 1.8( h2bi + wb2i + h2bj + wb2j )
(4)
.
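As reconstructed, Equation (4) compares the center distance of two blobs against 1.8 times the sum of their bounding-rectangle diagonals. A minimal sketch, with each blob given as an (x, y, h, w) tuple:

import math

def should_merge(blob_i, blob_j, k=1.8):
    """Equation (4) as reconstructed above: merge when the center distance
    is at most k times the sum of the two bounding-rectangle diagonals."""
    xi, yi, hi, wi = blob_i
    xj, yj, hj, wj = blob_j
    center_dist = math.hypot(xi - xj, yi - yj)
    diag_sum = math.hypot(hi, wi) + math.hypot(hj, wj)
    return center_dist <= k * diag_sum

print(should_merge((100, 50, 8, 8), (106, 52, 6, 6)))    # True: fragments of one light
print(should_merge((100, 50, 8, 8), (400, 300, 6, 6)))   # False: unrelated blobs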
D. Candidate Assignment and Criteria Filtering
From our observation of the experimental data, traffic lights are generally circular in shape and of moderate area. Therefore, we apply the dimension-ratio filtering criterion from [8],
\[
\max(h_{b_i}, w_{b_i}) < 1.8\,\min(h_{b_i}, w_{b_i}), \qquad (5)
\]
and filter out blobs with an area larger than 3600 pixels, so that blobs unlikely to be a traffic light are removed. Note that the constant 1.8 in Equation (5) is chosen to keep recall at a certain level, i.e., to avoid discarding the intended traffic light, while filtering out as many false detections as possible. The remaining blobs are defined as traffic light candidates.
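Both tests reduce to a one-line predicate. In this sketch the blob area is approximated by its bounding-rectangle area, which is an assumption (the paper may count blob pixels instead):

def is_candidate(h, w, max_ratio=1.8, max_area=3600):
    """Equation (5) plus the area cut: keep roughly square, moderately
    sized blobs as traffic light candidates."""
    return max(h, w) < max_ratio * min(h, w) and h * w <= max_area

print(is_candidate(20, 18))   # True: near-circular, moderate size
print(is_candidate(60, 10))   # False: too elongated
print(is_candidate(80, 80))   # False: 6400 pixels exceeds the 3600 limit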
E. Geometry-Based Filtering
After the previous ltering schemes, the candidates are
similar in color, luminance intensity, shape, and size. Thus,
it is rather hard to distinguish them from the appearance.
To overcome this problem, we proposed two geometry-based
models to lter out most false detections. In addition, for both
models, temporal information is required since each candidate
appears in consecutive frames and its position is strongly
related to that of the frames before and after. As a result,
the same candidate in consecutive frames is recognized as a
whole and the spatial-temporal information, i.e., the position
of the candidate in the frame at a particular time, is recorded
for further use. To arrange the appearances (detections) of
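The trajectory model's equations are not recoverable in this copy, but Table I lists its sub-steps as a low-pass filter followed by linear regression, and Table II lists regression line parameters (c1, c2) among the features. One plausible reading, sketched under those assumptions:

import numpy as np

def trajectory_features(xs, ys, alpha=0.3):
    """Hypothetical trajectory model consistent with Table I's sub-steps:
    exponentially smooth the recorded positions (low-pass filter), then fit
    a regression line y = c1*x + c2 over the smoothed track."""
    sx, sy = [xs[0]], [ys[0]]
    for x, y in zip(xs[1:], ys[1:]):
        sx.append(alpha * x + (1 - alpha) * sx[-1])
        sy.append(alpha * y + (1 - alpha) * sy[-1])
    c1, c2 = np.polyfit(sx, sy, deg=1)     # slope and intercept
    return c1, c2

xs = np.linspace(100, 190, 10)                               # candidate drifting right...
ys = np.linspace(50, 140, 10) + np.random.normal(0, 1, 10)   # ...and down, with noise
print(trajectory_features(xs, ys))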
Fig. 4. (axes: candidate y coordinate versus x, and versus time (s))

Fig. 5.
Fig. 6.
TABLE II. The features included in each feature set (I–IV): the frame coordinates (x, y), the estimated traffic light position (s, t), and the regression line parameters (c1, c2).³
Another inaccuracy is caused by inaccurate vehicle coordinates reported by the GPS. [13] reports that the error can be up to 10 meters with a consumer-grade GPS device. This GPS error shifts the estimated coordinates higher or lower. Fortunately, for traffic light recognition the error does not significantly affect performance, since it affects both the intended traffic light and the other false detections.
Fig. 7. The coordinate system of the image plane and the real world.
The corrected coordinates are obtained by applying a rotation matrix to the original coordinates (x, y). However, another difficulty is that the elevation angle is unknown.

The elevation angle is approximately fixed as a vehicle approaches the intersection. As a result, the estimated height of the traffic light should stay fixed during the approach. Utilizing this fact, one simple approach is to evaluate all possible angles in a range and select the one for which the estimated height of the traffic light is closest to a constant. It is reasonable to assume that the elevation angle lies in the range from −5° to +5°, since the elevation angle is calibrated manually before recording. Moreover, the resolution for searching for an optimal estimate is set to one degree, as vibration of the device also creates inaccuracy and a finer resolution does not yield better results. Figure 8 shows an example of estimating the elevation angle. Note that the estimated height varies because vehicle vibration causes the elevation angle to vary.
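A minimal sketch of this grid search, assuming each tracked detection provides image coordinates and the vehicle's current distance, and using a hypothetical pinhole-style height estimate in place of the paper's projection model:

import numpy as np

FOCAL = 700.0   # assumed focal length in pixels (not from the paper)

def rotate(x, y, theta_deg):
    """Correct image coordinates for an assumed elevation angle."""
    t = np.radians(theta_deg)
    return x * np.cos(t) - y * np.sin(t), x * np.sin(t) + y * np.cos(t)

def estimate_height(x, y, distance):
    """Hypothetical stand-in for the projection model: h ~ y * d / f."""
    return y * distance / FOCAL

def best_elevation_angle(track, angles=range(-5, 6)):
    """Search -5..+5 degrees in 1-degree steps for the angle that makes the
    estimated traffic light height most nearly constant over the approach."""
    def height_variance(theta):
        heights = [estimate_height(*rotate(x, y, theta), d) for x, y, d in track]
        return np.var(heights)
    return min(angles, key=height_variance)

# (x, y, distance in meters) for one candidate over three frames.
track = [(320, 120, 40.0), (321, 130, 37.0), (322, 141, 34.0)]
print(best_elevation_angle(track))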
EXPERIMENTAL RESULTS
³The check mark (✓) means the inclusion of this feature in that particular feature set.
TABLE I. Computation time of each processing stage.

Stage                 Sub-step                        Duration (ms)
RGB color domain      RGB color filtering             2.602
                      Connected-component labeling    11.247
                      Merging²                        0.012
Candidate assigning   Candidates assigning            0.072
Trajectory model      Low-pass filter                 0.011
                      Linear regression               1.201
Projection model      Camera projection               0.009
                      Angle correction                0.222
Decision tree model   Classification                  0.324
Total                                                 15.700
Fig. 9.
Fig. 10.
A. Computation Time
The computation time of each stage in the proposed system is measured and shown in Table I. One can observe that the combined execution time for a frame is only 15.7 ms, about half of the 1/30 second required for real-time processing on a mobile device and half of the times reported in [8], which were obtained on personal computers.
B. Recognition Performance
For all 32 intersections, video sequences and recognized candidate features are collected in advance. Afterwards, the class of each candidate is labeled manually. With the features and the labeled class of each candidate, a data set of about 20,000 samples is constructed. Each sample is a possible traffic light in a specific frame; thus, a possible traffic light that appears in several frames contributes several samples.
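As a sketch of the final classification stage, assuming a scikit-learn decision tree and the six features of feature set IV in Table II (the data below is synthetic, not the paper's data set):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# One row per candidate-in-a-frame; columns follow feature set IV of
# Table II: (x, y), (s, t), and (c1, c2). Labels: 1 = ITL, -1 = non-ITL.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                  # synthetic feature vectors
y = rng.choice([1, -1], size=200)              # synthetic labels

clf = DecisionTreeClassifier(max_depth=5)      # depth is an arbitrary choice
clf.fit(X, y)
print(clf.predict(X[:3]))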
Fig. 11. Visualization of estimated coordinates and regression line parameters (1: ITL; −1: non-ITL).
TABLE III. Recognition performance.

Filtering        Traffic light⁴   Non-traffic light   Precision   Recall   F1-score
w/o filtering    59 (59)          0 (296)             0.166       1.000    0.285
w/ filtering⁵    1222 (1500)      1333 (1500)         0.882       0.815    0.845
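The precision, recall, and F1 columns follow directly from the counts; for example, without filtering all 296 non-traffic-light candidates survive as false positives, so
\[
P = \frac{59}{59 + 296} \approx 0.166, \qquad
R = \frac{59}{59} = 1.000, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.166 \times 1.000}{1.166} \approx 0.285.
\]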
CONCLUSION
In this paper, we proposed a real-time traffic light recognition system that can be implemented on a mobile device with limited computational resources. By using straightforward, simplified detection schemes and processing in the RGB color domain, the computation time is greatly reduced. Although the simple detection scheme produces many false detections, the proposed geometry-based filtering scheme eliminates most of them. Moreover, the system performs well in realistic conditions where the camera experiences vibration while shooting the video. The overall F1-score of our system is as high as 0.85 on average, and our proposed system can process each frame within 20 milliseconds. It is also worth mentioning that our proposed filtering scheme can be combined with other detection schemes to eliminate false detections in a similarly time-efficient way.
ACKNOWLEDGMENT
This work was supported in part by the National Science Council, National Taiwan University, and Intel Corporation under Grants NSC-101-2911-I-002-001 and NTU-102R7501.
REFERENCES
[1]
[2] E. Koukoumidis, L.-S. Peh, and M. R. Martonosi, "SignalGuru: leveraging mobile phones for collaborative traffic signal schedule advisory," in Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, ser. MobiSys '11. New York, NY, USA: ACM, 2011, pp. 127–140.
[3] Y. Kim, K. Kim, and X. Yang, "Real time traffic light recognition system for color vision deficiencies," in Mechatronics and Automation, 2007. ICMA 2007. International Conference on, 2007, pp. 76–81.
[4] Z. Tu and R. Li, "Automatic recognition of civil infrastructure objects in mobile object mapping imagery using a Markov random field model," in Proc. the 19th International Society for Photogrammetry and Remote Sensing (ISPRS) Congress, 2000.
[5] Y.-C. Chung, J.-M. Wang, and S.-W. Chen, "A vision-based traffic light system at intersections," Journal of Taiwan Normal University: Mathematics, Science, and Technology, vol. 47, no. 1, pp. 67–86, 2002.
[6] H. S. Lai and N. H. C. Yung, "A video-based system methodology for detecting red light runners," in Proceedings of the IAPR Workshop on Machine Vision Applications, 1998, pp. 23–26.
[7] M. Omachi and S. Omachi, "Traffic light detection with color and edge information," in Proc. IEEE International Conference on Computer Science and Information Technology (ICCSIT), 2009, pp. 284–287.
[8] R. de Charette and F. Nashashibi, "Real time visual traffic lights recognition based on spot light detection and adaptive traffic lights templates," in Proc. IEEE Intelligent Vehicles Symposium, 2009, pp. 358–363.
[9] Y. Shen, U. Ozguner, K. Redmill, and J. Liu, "A robust video based traffic light detection algorithm for intelligent vehicles," in IEEE Intelligent Vehicles Symposium, 2009, pp. 521–526.
[10] F. Lindner, U. Kressel, and S. Kaelberer, "Robust recognition of traffic signals," in IEEE Intelligent Vehicles Symposium, 2004, pp. 49–53.
[11] J. Gong, Y. Jiang, G. Xiong, C. Guan, G. Tao, and H. Chen, "The recognition and tracking of traffic lights based on color segmentation and CAMSHIFT for intelligent vehicles," in IEEE Intelligent Vehicles Symposium, 2010, pp. 431–435.
[12] J. J. Moré, "The Levenberg-Marquardt algorithm: implementation and theory," Lecture Notes in Mathematics, vol. 630, pp. 169–173, 1978.
[13] M. G. Wing, A. Eklund, and L. D. Kellogg, "Consumer-grade Global Positioning System (GPS) accuracy and reliability," Journal of Forestry, vol. 103, no. 4, pp. 169–173, June 2005.
[14] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[15] G. Dupret and M. Koda, "Bootstrap re-sampling for unbalanced data in supervised learning," European Journal of Operational Research, vol. 134, no. 1, pp. 141–156, 2001.