You are on page 1of 14

1118 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 27, NO.

5, MAY 2017

Real-Time Traffic Light Recognition


Based on Smartphone Platforms
Wei Liu, Shuang Li, Jin Lv, Bing Yu, Ting Zhou, Huai Yuan, and Hong Zhao

Abstract— Traffic light recognition is of great significance assistance or even autonomous driving [1]–[8]. For example,
for driver assistance or autonomous driving. In this paper, in order to drive safely through road intersections, Google’s
a traffic light recognition system based on smartphone platforms self-driving car has mounted a camera positioned near the
is proposed. First, an ellipsoid geometry threshold model in Hue
Saturation Lightness color space is built to extract interesting rear-view mirror for traffic light recognition [8]. In recent
color regions. These regions are further screened with a post- years, with the increase of computation power, the application
processing step to obtain candidate regions that satisfy both of smartphones in the driving assistance has gradually become
color and brightness conditions. Second, a new kernel function a hot research field [9]–[12]. Compared with the commercial
is proposed to effectively combine two heterogeneous features, driving assistance systems that use dedicated hardware, the
histograms of oriented gradients and local binary pattern, which
is used to describe the candidate regions of traffic light. A kernel driving assistance application based on smartphone has several
extreme learning machine (K-ELM) is designed to validate these advantages such as low cost, easier usability, and upgradability.
candidate regions and simultaneously recognize the phase and Several interesting works for traffic light recognition on
type of traffic lights. Furthermore, a spatial-temporal analysis smartphone platforms have been reported. For instance,
framework based on a finite-state machine is introduced to Roters et al. [9] present a mobile vision system to detect
enhance the reliability of the recognition of the phase and type
of traffic light. Finally, a prototype of the proposed system is pedestrian light in live video streams to help pedestrians with
implemented on a Samsung Note 3 smartphone. To achieve a visual impairment cross roads. In [10], a real-time red traffic
real-time computational performance of the proposed K-ELM, light recognition method is proposed on mobile platforms. The
a CPU–GPU fusion-based approach is adopted to accelerate the method consists of real-time traffic light localization, circular
execution. The experimental results on different road environ- region detection, and traffic light recognition.
ments show that the proposed system can recognize traffic lights
accurately and rapidly. Due to the ego movement of the vehicle as well as the
variety of outdoor conditions, accurate traffic light recognition
Index Terms— Finite-state machine, geometry threshold model, is still faced with various challenges [5], [6], [9]:
kernel extreme learning machine (K-ELM), smartphone, traffic
light recognition. 1) varying unknown environment;
2) the interference of other light sources, such as billboards
I. I NTRODUCTION and street lamps;
3) the impact of different weather and illumination
T YPICAL traffic scenes contain a lot of traffic information,
such as road signs, road markings, and traffic lights.
Usually, it is not easy for the drivers to keep attention to
conditions;
4) the change of viewing angles and sizes of traffic lights
the various presenting traffic information. The distraction, due to the ego motion of the vehicle;
visual fatigue, and understanding errors of the drivers can 5) various appearances of traffic lights, e.g. with or without
lead to severe traffic accidents. Especially, as the traffic the countdown timer;
lights are used to direct the pedestrians and vehicles to pass 6) the existence of different types of traffic lights, which
the intersections orderly and safely, it is of great impor- indicate different meanings, such as the traffic light with
tance to recognize and understand them accurately. Therefore, a round lamp and the one with an arrow lamp;
many research institutions are striving to recognize the traffic 7) the functions of autofocus and automatic white balance
lights using in-car cameras to assist the driver to under- of on-board cameras or smartphones that may result in
stand driving conditions. This function is critical to driving color cast or blur;
8) the requirement of the real-time processing behavior of
Manuscript received December 19, 2014; revised July 13, 2015 and
October 13, 2015; accepted December 22, 2015. Date of publication January 6, the traffic light recognition algorithm.
2016; date of current version May 3, 2017. This work was supported In order to solve the above problems, we present a traffic
in part by the National Natural Science Foundation of China under light recognition system on smartphone platforms. Different
Grant 61273239 and in part by the Fundamental Research Funds for the
Central Universities of China under Grant 151802001. This paper was from [9], the smartphone is fixed on the front windshield of the
recommended by Associate Editor A. Kokaram. ego vehicle with a bracket. The system recognizes the traffic
W. Liu, H. Yuan, and H. Zhao are with the Research Academy, Northeastern light, including its phase (red or green) and type (round and
University, Shenyang 110179, China (e-mail: lwei@neusoft.com).
S. Li, J. Lv, B. Yu, and T. Zhou are with the Advanced Automotive straight arrow) information, and reminds the driver to follow
Electronics Technology Research Center, Neusoft Corporation, the indications of traffic lights.
Shenyang 110179, China. The system consists of three stages: 1) candidate region
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. extraction; 2) recognition; and 3) spatial-temporal analysis.
Digital Object Identifier 10.1109/TCSVT.2016.2515338 In the stage of candidate region extraction, an ellipsoid
1051-8215 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
LIU et al.: REAL-TIME TRAFFIC LIGHT RECOGNITION BASED ON SMARTPHONE PLATFORMS 1119

CIE chromaticity diagram. Color feature is therefore widely


used to detect traffic lights. There are some common
color spaces, such as red–green–blue (RGB) [5], [6],
[9], [13]–[15], Hue Saturation Intensity (HSI) [16], [17],
Fig. 1. Vertical-type traffic lights.
and HSV [3], [18]–[20]. The RGB color space is not
robust against the change of illuminations. Comparing
geometry threshold model in Hue Saturation Lightness (HSL) with RGB color space, HSI and HSV color spaces are
color space is built to extract interesting color regions, insensitive to luminance fluctuation and similar to the color
which can resolve the incorrect segmentation problem in perception of human. Thus, many researchers use these color
the existing linear color threshold method and avoid the spaces as a measure for distinguishing predefined colors of
problem of color cast to a certain extent. Meanwhile, traffic lights [3], [16]. Besides the color spaces mentioned
these regions are further screened with a postprocessing above, other color spaces such as YCbCr [21] and CIELab [22]
step and the candidate regions that simultaneously sat- are also used to extract traffic light candidate regions.
isfy both color and brightness conditions are obtained. In Although the color spaces used are different, most of
the stage of recognition, a new nonlinear kernel func- the existing methods determine whether a pixel is an inter-
tion is proposed to effectively combine two heterogeneous esting color pixel or not by the linear color threshold
features [histograms of oriented gradients (HOG) and local method [1], [3], [4], [14], [15]. That is to say, given the
binary pattern (LBP)], and a kernel extreme learning predefined range thresholds Tmin and Tmax in a certain color
machine (K-ELM) is designed to verify if a candidate region channel, a pixel is an interesting color pixel if its correspond-
is a traffic light or not and to simultaneously recognize the ing component value Tv satisfies Tmin < Tv < Tmax . The
phase and type of traffic lights. In the stage of spatial-temporal color-threshold-based segmentation method has a common
analysis, a multiframe recognition framework based on tradeoff problem: an increased false positive rate (FPR) due
finite-state machine is introduced to further increase the to the wide color threshold range or a decreased true posi-
reliability of recognition over a period of time. tive rate (TPR) due to a narrow color threshold range [23].
Besides, this system has been implemented on a smartphone Especially, this situation could get worse if color cast appears
platform. For real-time performances, some additional work due to the white balance problem or the severe exposure
is also done, including a quick lookup table (LUT)-based of camera to external illumination. Consequently, a robust
color candidate region extraction and a CPU–GPU-based color segmentation method is required that considers various
acceleration of the K-ELM execution. illumination conditions.
The remainder of this paper is organized as follows. The Color feature alone is inadequate to generate traffic light
related work is presented in Section II. The system framework candidate regions because of the existence of some inter-
is presented in Section III. The details of the proposed system ference such as billboards and trees. In addition, complex
are described: candidate region extraction (Section IV), scenes and cluttered backgrounds may cause many false pos-
recognition in single images (Section V), spatial-temporal itives. It is necessary to use additional features to distinguish
analysis (Section VI), and the system implementation on traffic lights and overcome interference effects. Thus, many
smartphone platforms (Section VII). In Section VIII, the researchers integrate some specific features to eliminate the
experimental results are provided in comparison with the interference regions, such as the geometrical feature and the
state-of-the-art methods. Finally, conclusions and future work brightness feature. Yu et al. [1] and Shen et al. [24] use
are made in Section IX. the geometrical features (aspect ratio, area, and pixel density)
to obtain precise candidate regions of lamps. In [25], the color,
II. R ELATED W ORK brightness, and structural features are employed individually
Traffic lights are very different across the world. A typical to obtain a set of traffic light candidate locations. Lindner
traffic light consists of three lamps arranged vertically. et al. [26] propose to eliminate some interference regions by
The colors of the lamps are red, yellow, and green from detecting the circular edge of the lamp candidate region.
top to bottom, and they are either round or arrow shaped In addition, several interesting works have been reported
(see Fig. 1). The color of the active lamp represents the for detection on the candidate regions. For instance, in [27],
phase of the traffic light, i.e., red and green, and indicates the the use of inter-component difference information for effective
passable condition of the corresponding lane. In recent years, color edge detection is proposed. In [28], a novel framework
there have been traffic lights with count-down timers. Of these for saliency detection, which first models the background with
traffic lights, the yellow lamp in the middle is replaced with deep learning architecture and then separates salient objects
a count-down timer indicating the time remaining. Currently, from the background, is proposed.
most of the existing traffic light recognition approaches are In recent years, in order to shorten the computation cost
composed of two main processes: generation and verification and reduce the risk of getting incorrect candidate regions,
of the traffic light candidate regions. some techniques adopt additional sensors such as the global
positioning system (GPS) and pre-existing maps containing
A. Generation of Traffic Light Candidate Regions traffic light locations. For example, Fairfield and Urmson [8]
According to the CIE standard, each color of traf- propose a method to predict the positions of traffic light with
fic lights is usually defined in a specific area in the a prior map. The predicted positions are then projected into
1120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 27, NO. 5, MAY 2017

the image frame using a camera model, and serve as traffic


light candidate regions. To improve the recognition accuracy,
in [29], an on-board GPS sensor is employed to identify the
traffic light candidate regions.

B. Verification of Traffic Light Candidate Regions


Although the simple geometric verification can remove
some nontraffic light candidate regions, some interference
regions that are similar to traffic lights still exist, for example,
the car tail light. Therefore, it needs a further verification to
check the candidate regions. The most common verification
methods are template matching and machine learning. The
Fig. 2. Overview of the proposed traffic light recognition system.
former method uses some templates of traffic lights, which
are predefined by a priori knowledge, to verify the candidate
regions, as in [2] and [4]. The advantage of this method
is its simplicity, while the disadvantages lie in its strict
demands of the accuracy of the candidate regions’ boundaries
and its low robustness. Compared with the former method,
the latter method has the advantages of high accuracy and
robustness. For instance, in order to verify the candidate
regions of traffic lights, the AdaBoost algorithm and the Fig. 3. Extraction process of the traffic light candidate region.
Haar-like feature are adopted in [3]. Jang et al. [23] employ
a support vector machine (SVM) with the HOG feature for
the traffic light verification. In [29], a convolutional neural represents a transient phase between the green light and the
network (CNN) is used to recognize the phase of the traffic red light and it serves as a warning signal. It is even absent
lights, i.e., red or green under normal illumination conditions. for the traffic light with countdown timer. Fig. 2 describes the
In [30], the H component in HSI space is classified with Back proposed recognition system.
Propagation (BP) neural network to verify the traffic light The recognition system in this paper includes three stages:
candidate regions. Chiang et al. [31] employ an SVM with 1) candidate region extraction based on color and brightness
the LBP feature for traffic light verification. Recently, a new information; 2) K-ELM-based single-frame traffic light
classification method in [32] based on neural networks and recognition; and 3) multiframe traffic light recognition with
midlevel features has shown promising results. Nevertheless, spatial-temporal analysis. The details of the proposed system
due to the complexity and the real-time capability, this method are presented in the following sections.
is still a challenge when applied on mobile platforms.
To resolve the issues of low accuracy and reliability in the IV. E XTRACTION OF T RAFFIC L IGHT
single-frame-based traffic light recognition algorithm, some C ANDIDATE R EGIONS
researchers expand the recognition method into video stream-
Since the traffic light is brighter than most of the back-
ing (multiframe). For example, the CAMSHIFT algorithm ground and has special color, the recognition system proposed
is applied to track the candidate regions of traffic lights in
in this paper combines a color segmentation with a bright
[3]. Gómez et al. [6] use a motion estimation method to
region extraction algorithm to generate the traffic light candi-
track the traffic light in the video streaming and feedback date regions. The system has three stages: 1) the extraction of
the current phase of traffic light. In [24], a video-sequence-
color candidate region; 2) the extraction of lamp candidate
based decision scheme is proposed. It avoids the temporary region; and 3) the generation of the traffic light candidate
inconsistency in the verification of candidate regions. Gómez
region. The extraction process is shown in Fig. 3.
et al. [6] present an Hidden Markov Model to find the optimal
state sequence associated with the given observation sequence,
aiming to obtain the best performance in the determination of A. Color Candidate Region Extraction Based
the traffic light phases. In [33], an interacting multiple-model on Ellipsoid Geometry Threshold Model
filter is used to track the traffic light through the time and to In this paper, the HSL color space is adopted to extract
increase traffic light recognition performances. These methods the traffic light candidate regions. The space is better matched
described above improve both the reliability and the precision to visual perception, with less correlated color channel [34].
of traffic light recognition. Shen et al. [24] have pointed out that the color appearances of
the traffic lights are concentrated around several predetermined
III. T RAFFIC L IGHT R ECOGNITION S YSTEM specific colors so that they can be well described by Gaussian
A traffic light recognition system based on smartphone distributions. Therefore, similar to [24], we model the color
platforms is proposed to recognize the vertical traffic lights features of the traffic lights as 1D Gaussian distributions.
in urban environment. We are only interested in recognizing First, we model the hue, saturation, and lightness according
the red and green traffic light, since the yellow traffic light to 1D Gaussian distributions. The value of the color channel k
LIU et al.: REAL-TIME TRAFFIC LIGHT RECOGNITION BASED ON SMARTPHONE PLATFORMS 1121

Fig. 4. Statistical distributions of the pixels in manually labeled traffic light


regions. (a) Statistical distribution of red pixels. (b) Statistical distribution of
green pixels.

Fig. 5. Ellipsoid geometry threshold models. (a) Ellipsoid geometry threshold


at each pixel is defined as Ck , k = 1, 2, 3, Ck ∼ N(μk , σk2 ). model of red. (b) Ellipsoid geometry threshold model of green.
μk and σk2 are the mean and variance of color channel k,
respectively.
Then, the interesting pixels of red and green traffic light of red pixels (Hr , Sr , L r ) and green pixels (Hg , Sg , L g ) are
candidate regions are generated by the following equations: expressed as
⎧   l   

⎪ 1, if H ∈ H , H h ∪ S ∈ Sl , S h


r

r1 r1  r r1 r1 (Hr − Hri )2 (Sr − Sri )2 (L r − L ri )2

⎪ ∪ L r ∈ L lr1 , L r1h + + ≤ 1, i = 1, 2
⎨   l    h ri2 2
sri lri2
br = or Hr ∈ Hr2 , Hr2h ∪ S ∈ Sl , S h
r (1)

⎪   r2 r2 (3)

⎪ ∪ L r ∈ L r2 , L r2
l h

⎪ (Hg − Hg0)2 (Sg − Sg0 )2 (L g − L g0 )2
⎩0, else + + ≤1 (4)
⎧ h 2g sg2 l g2
  l   

⎨1, if Hg ∈ Hg , Hg ∪
h Sg ∈ Sgl , Sgh
 where Hr ∈ (Hril , Hrih ), Sr ∈ (Sri l
, Sri
h
), L r ∈ (L lri , L ri
h
),
bg = ∪ L g ∈ L lg , L hg (2)

⎩ Hg ∈ (Hg , Hg ), Sg ∈ (Sg , Sg ), and L g ∈ (L g , L g ). The
l h l h l h
0, else.
centers of the ellipsoids are coincident with the centers of
Here, the value range in color channel k is the cubes expressed as (3) and (4). The other parameters of
(μk − λ · σk , μk + λ · σk ) and λ = 3. the ellipsoids can be calculated as follows:
In order to learn parameters μk and σk , the training images ⎧  h 
⎪ l 2
⎨(2h ri ) = 4 Hri − H ri 
2
with red and green traffic lights are collected, respectively, 
and the traffic lights regions are labeled manually. With all (2sri )2 = Sri h − Sl 2 + L h − L l 2 (5)

⎩  ri  ri ri 
the pixels in the manually labeled traffic light regions, the (2lri )2 = Sri h − Sl 2 + H h − H l 2
ri ri ri
parameters μk and σk can be estimated. As the samples are ⎧  h 2
collected from different weather and illumination conditions, ⎪
⎨(2h g ) = 4 Hg − Hg 
2 l
2 2
the parameters can adapt to different environments. With the (2sg )2 = Sgh − Sgl + L hg − L lg (6)

⎩   2  2
ranges determined above, three cubes (two for red and one (2l g )2 = Sgh − Sgl + Hgh − Hgl .
for green) can be determined, and the centers of the cubes
are denoted by (Hri , Sri , L ri ), i = 1, 2, and (Hg0, Sg0 , L g0 ), Fig. 5 shows the visualization of the ellipsoid geometry
respectively. In Fig. 4(a) and (b), the statistical distributions of threshold models and the statistical distributions of the pixels
the pixels in manually labeled traffic lights regions are shown in manually labeled traffic light regions. Compared with the
for the red and the green pixels, respectively. cubes built by the traditional linear color threshold method,
From the results in Fig. 4, one can see that the pixels are the proposed ellipsoid geometry model can eliminate a large
gathering in the compact regions around the centers of the amount of the color pixels that are uninteresting to our
cubes, instead of filling them. It is also clear from Fig. 4 algorithm. In addition, more than 99% of the interesting
that the colors of the pixels in the corners of the cubes are color pixels from the training images are contained within the
very different from the colors of those at the center. Since ellipsoids defined by the above parameters. More test images
a cube without corners resembles an ellipsoid, an ellipsoid with traffic lights have also been collected and the statistic of
geometry threshold model is proposed in this paper. The model these images reveals a similar conclusion.
1122 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 27, NO. 5, MAY 2017

Fig. 6. Sketch image of regions. (a) Color candidate region. (b) Expanded
region.

Compared with the traditional methods, the proposed


ellipsoid geometry threshold model has the following
advantages.
1) It resolves the incorrect segmentation problem in
traditional methods and improves the accuracy of the
candidate color pixel extraction.
2) It avoids the problem of traffic light color cast to a
certain extent. Fig. 7. Results of extraction of lamp candidate regions. (a) Original color
3) It saves the processing time for the subsequent color image. (b) Color candidate regions (marked in yellow). (c) Image after top-hat
transform. (d) Lamp candidate regions (marked in yellow).
candidate region extraction and verification process, as
it filters out some uninteresting color regions.
After extracting the interesting color pixels, we conduct an ones in the region Ri . M is a threshold and M = WH/4. σ Ri
eight-connected-component labeling to generate lamp candi- and σ Ri represent the average gray values of region Ri and
date regions. Similar to [1], several geometrical features of region Ri , respectively. They are calculated as follows:
each candidate region are computed: the pixel density, the
(x,y)∈Ri f (x, y)
aspect ratio, and the area. Using these geometrical features, σ Ri = (8)
N Ri
some color regions that unlikely belong to lamp candidate
regions are eliminated. As shown in Fig. 7(d), the vehicle’s (x,y)∈RiE f (x, y) − (x,y)∈Ri f (x, y)
σ Ri = (9)
right tail light is eliminated as it does not satisfy the aspect N R E − N Ri
i
ratio constraint.
where f (x, y) represents the gray value of pixel (x, y) and
B. Extraction of Lamp Candidate Regions N R E represents the total number of pixels in the expanded
i
Due to the influence of complex background, weather, region RiE.
illumination conditions and other light sources, some inter- Fig. 7 shows some results of the lamp candidate regions.
ference regions still exist in the obtained color candidate In Fig. 7(d), it can be observed that some regions that do not
regions. Considering that an active lamp is brighter than the satisfy the brightness condition are removed.
surrounding local region, this characteristic can be used as a It is notable that for the purpose of illustration, in Fig. 7(c),
postprocessing step to extract the lamp candidate region while the top-hat transform result of the whole image is provided.
removing the interference regions. The extraction process is However, in the real-world application, considering the com-
as follows. putational performance, the top-hat transform is applied only
First, an expanded region RiE is built for the color candidate to the expanded region RiE.
regions Ri , i = 1, 2, . . . , N. As shown in Fig. 6, the height
and width of the minimum enclosing rectangle of the color C. Generation of Traffic Light Candidate Regions
candidate region are denoted by H and W , respectively. The
As we know, a typical traffic light consists of three lamps
region between the boundaries of RiE and Ri is denoted by Ri .
arranged vertically with equal sizes and the order of the three
Then, the original color image of the expanded region RiE is lamps is fixed as red, yellow(or count-down timer), and green
converted into a grayscale image. The top-hat transform is from top to bottom, as shown in Fig. 1. Considering that a
applied to eliminate the influence of uneven illumination. Here, backboard is often around a traffic light in most traffic scenes,
a square structuring element whose width is 11 pixels is chosen this structural information can also be used to select real red
for the top-hat filter. and green traffic lights from the candidates. Here, according to
Finally, a color candidate region Ri is labeled a lamp can- the relationship (the relative positions and size ratios) between
didate region if the region satisfies the following conditions: active lamps and the backboard, we can generate the traffic

Ni < M light candidate regions.
(7) It should be noted that for the traffic light with count-down
σ Ri > σ Ri
timer, since the active lamp and the count-down timer have
where Ni = N  E − N Ri , N Ri represents the numbers of pixels the same color and are extremely close to each other, these
Ri
in the region Ri , and N  E represents the number of pixels two regions might be probably extracted as only one single
Ri region. This region would be easily treated as one single
in the expanded region RiE whose color is the same as the lamp, and it will lead to the generation of a wrong traffic
LIU et al.: REAL-TIME TRAFFIC LIGHT RECOGNITION BASED ON SMARTPHONE PLATFORMS 1123

light candidate region. To solve this problem, we generate the where K (xi , x j ) represents the proposed kernel function, xi is
traffic light candidate region RiTL according to the aspect ratio the feature vector of sample i , xi = [x iHOG , x iLBP ], and
Ai (Ai = H /W ) of the obtained lamp candidate region, as x iHOG , x iLBP represent the feature vectors of HOG and LBP,
shown in respectively. β is a combination coefficient, which determines
⎧ the contribution of each feature, and β ∈ [0, 1]. By (13), the

⎪ X L = max(1, xl − K )

⎨ X = min(c, x + K ) if R is a red lamp HOG feature and the LBP can be combined with different β.
R r i It is worth noting that the method of combinative HOG–LBP
(10)

⎪ YT = max(1, yt − K ) candidate region features in [35] is only a special case of the proposed method,


Y B = min(r, yt + 7K ) which is equal to β = 0.5. More details can be seen from the
⎧ experimental results in Section VIII.

⎪ X L = max(1, xl − K )

⎨ X = min(c, x + K ) if R is a green lamp In this paper, a traffic light candidate region is first converted
R r i
(11) into grayscale and is then scaled to a size of 20 × 40 pixels,

⎪ Y = max(1, y − 7K ) candidate region which is used to extract the features of HOG and LBP.


T b
Y B = min(r, yb + K ) For HOG feature, the block size is 10, the cell size is 5,
and the orientation bin number is 9; for LBP feature, we
where (X L , YT ) and (X R , Y B ), respectively, denote the extract 58D uniform patterns and 1D nonuniform pattern per
left-top and right-bottom vertices of the traffic light candidate block, and the feature vectors of all the blocks are concate-
region RiTL , (xl , yt ) and (xr , yb ), respectively, represent the nated as the LBP feature of the candidate region. For each
left-top and right-bottom vertices of the minimum enclosing traffic light candidate region, the dimension of the feature
rectangle of the lamp candidate region Ri , and r and c, vector is 1995. In order to reduce the computation burden,
respectively, represent the height and width of the whole the between-category to within-category sums of squares
image. Here, K can be determined as follows: (BW) method [38] is adopted to reduce feature dimensions.
 The strategy of BW is to select the features with large
K = W/2, if Ai ≥ 1.5
(12) between-category distances and small within-category dis-
K = (W + H )/4, else. tances. Considering the algorithmic acceleration based on
OpenCL (to be described in Section VII-B), the dimension
V. R ECOGNITION OF THE T RAFFIC of the feature vector is reduced to 256, and then it is input to
L IGHT IN A S INGLE I MAGE the K-ELM with the proposed kernel function.
After the above procedures of traffic light candidate region B. Recognition of Traffic Light Candidate Regions
extraction, there also exists the influence of interfering light
sources, such as car tail lights. In order to verify whether a Extreme learning machine (ELM) is a machine learn-
candidate region is a traffic light or a background and simulta- ing method with fast training speed and suitable for the
neously recognize the type of the traffic light (round, straight multicategory classification task [39]. Many research results
arrow, left-turn arrow, right-turn arrow, etc.), in this section, show that ELM produces comparable or better classification
a new nonlinear kernel function is proposed to effectively accuracies with implementation complexity compared with
combine two heterogeneous features, HOG and LBP, which artificial neural networks and support vector machines [40].
is used to describe the traffic lights. In addition, a K-ELM is Furthermore, it has been pointed out that K-ELM achieves
designed to recognize the candidate region. good generalization performance; meanwhile, there is no ran-
domness in assigning connection weights between input and
hidden layer and the number of hidden nodes does not need to
A. Feature Extraction of Traffic Light Candidate Regions be given [41], [42]. Therefore, we select the K-ELM to verify
HOG and LBP are two heterogeneous features with com- whether a candidate region is a traffic light or the background
plementary information. The combination of the two features and recognize the phase and type of the traffic light. The output
can extract contour and texture information simultaneously function of K-ELM can be written compactly as
and has obtained effective results in the applications such  −1
T I
as pedestrian detection and face recognition [35]. Thus, we f (x) = h(x)H + HH T
T
λ
use HOG–LBP feature to describe the traffic light candidate ⎡ ⎤T
region. In [36], HOG and LBP are directly concatenated to K (x, x 1 )  −1
⎢ .. ⎥ I
form a feature vector, while the contributions of each feature =⎣ . ⎦ +  ELM T (14)
are not considered, and the descriptive ability of the features λ
K (x, x N )
is not fully exploited. Inspired by [37], a new nonlinear
kernel function is proposed to combine the two heterogeneous where f (x) is the output of K-ELM; N is the number of
features training samples; x i (i = 1, 2, . . . , N) expresses the feature
vector of training samples; x is the feature vector of a traffic
K (xi , x j ) light candidate region, i.e., the input to the K-ELM classifier;
⎛ 2 ⎞
(1−β) x iHOG −x HOG +β x LBP −x LBP 2 ELM is the kernel matrix for the classifier; ELM = H H T :
= exp⎝− ⎠ (13)
j i j
ELMi j = h(x i )h(x j ) = K (x i , x j ); K (x i , x j ) represents the
γ proposed kernel function in (13); T = [t1 , t2 , . . . , ti , . . . , t N ]T
1124 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 27, NO. 5, MAY 2017

is the target vector; ti = [ti1 , ti2 , . . . , tim ] is the output


vector of the i th training sample; and λ is the regularization
coefficient.
In this paper, two K-ELM classifiers are trained according
to the color information of the candidate regions: one for
the recognition of green traffic lights and the other for the
recognition of red traffic lights. For each color in the traffic
lights, the output of K-ELM is designed as six classes, i.e.,
m = 6, which correspond to nontraffic lights and five different
types of traffic lights that are round, straight arrow, left-turn
Fig. 8. Maintenance process of the information queue.
arrow, right-turn arrow, and unknown type. The unknown type
indicates the traffic light whose type cannot be determined.
Once a candidate region is verified by the corresponding
K-ELM classifier, it is regarded as a traffic light. The phase of
this traffic light is determined to be the color of the candidate
region, and its type is recognized by the K-ELM’s output
vector via the maximum operation.

VI. R ECOGNITION OF THE T RAFFIC L IGHT BASED


ON S PATIAL -T EMPORAL A NALYSIS

A. Traffic Light Phase Recognition Fig. 9. Framework of a finite-state machine.


Using a Finite-State Machine
The recognition of the traffic light in single images was locations are associated with L by the NN rule. In this paper,
described in Sections IV and V. The traffic lights in the image the Lucas-Kanade algorithm [43] is adopted for tracking the
were extracted and recognized, and the phase and type of traffic lights. All tracking points are chosen at the positions
recognized traffic light can also be given. However, some false where obvious features could be extracted, such as the corners
positive recognition results might exist. In order to increase the of the traffic light backboard. It should be noted that for each
reliability of recognition over a period of time, in this section, of the unassociated recognized traffic lights, a new queue is
the traffic light recognition is extended from single images to established.
multiframe images. After establishing the information queues, a spatial-temporal
1) The number of phases of the traffic light is limited analysis framework based on a finite-state machine is intro-
(red or green) and each phase will last for a certain duced to enhance the reliability of the recognition of traffic
period of time. In fact, there exists an optional yellow lights. For each recognized traffic light, four states exist in
or count-down timer phase, but in our system, we ignore its life cycle: 1) initial state; 2) candidate state; 3) validation
this phase. state; and 4) end state. The finite-state machine is able to
2) For each traffic light, it can only be at one particular clearly describe the transitions between these states (shown
phase at one time, namely, either red or green light is in Fig. 9) and the required conditions of these transitions.
switched ON. In this paper, only the traffic lights at validation state have
The above rules can be applied to improve the recogni- phase recognition results. Thus, some occasional single-frame
tion performance of traffic light in a single frame. In this false positive recognition results can be reduced.
section, we introduce an information queue Si , which allows This finite-state machine can be interpreted as follows.
a verification by multiframe spatial-temporal analysis. The For a new recognized light L j , which has not been associ-
information queue Si is used to record the recognition results ated with any existing queues, a new queue is established and
of the recent Q size times of the i th recognized traffic light, its state information is initialized. At this moment, the light L j
and the recognition results consist of phase, type, and loca- is at the initial state. This state is a temporary state. Once
tion. The Q size denotes the size of the information queue. the initialization is complete, it transits to the candidate state.
The maintenance process of the information queue is shown This process corresponds to the state transition process T1
in Fig. 8. in Fig. 10.
First, the existing information queues S = {S1 , S2 , . . . , For a traffic light L j at candidate state or validation state,
Si , . . . , S K } and the recognized traffic lights L = it will enter the end state when there are NV consecutive
{L 1 , L 2 , . . . , L j , . . . , L N } at time t + 1 are associated by the void flags in the queue, which means the traffic light is
nearest neighbor (NN) rule. Then, according to the association not successfully associated continuously. The queue will be
results, the information queues will be updated: if Si is deleted afterward. This process corresponds to T2 .
associated to L j , a new piece of information corresponding For a traffic light L j at candidate state, it will turn into the
to L j is added; otherwise, a void flag is pushed. validation state if the validation condition is met. This process
Here, in order to perform the association, the traffic lights corresponds to T3 . Otherwise, it will maintain the current
represented by S need to be tracked. Then the updated tracked candidate state, which corresponds to T4 . The validation
LIU et al.: REAL-TIME TRAFFIC LIGHT RECOGNITION BASED ON SMARTPHONE PLATFORMS 1125

where k ∈ {1, 2, 3, 4, 5} represents the type of the traffic light


to be round, straight arrow, left-turn arrow, right-turn arrow,
and unknown type, respectively. Ct represents the type of the
traffic light at time t.

VII. S YSTEM I MPLEMENTATION ON


S MARTPHONE P LATFORMS
In our research, this recognition system is implemented on
a Samsung Note 3 smartphone. The Note 3 is equipped with
a quad-core Krait 400 architecture CPU at up to 2.3 GHz per
core and an Adreno 330 GPU with a frequency of 450 MHz
and 3G RAM.
With limited computing resources on smartphone platforms,
more efficient solutions need to be explored to achieve
real-time performances.

A. Quick Extraction on Color Candidate Regions


As the image sequence captured by the Samsung Note 3
smartphone is in the YUV (YUV4:2:0) color space, it needs to
be converted into HSL space. Then one can apply the proposed
ellipsoid geometry threshold model to judge if a pixel is an
interesting color pixel or not. However, the process of color
space conversion and judgment will cause a sharp increase in
computation cost. To reduce computation load on the device,
Fig. 10. Acceleration process of the K-ELM. (a) Proposed acceleration
approach for one candidate region. (b) Parallel process of multiple candidate we combine both color space conversion and interesting color
regions. pixel judgment process in an LUT. The storage structure of the
LUT is C[Y ][U ][V ] = C V , where C V ∈ {0, 1, 2} represents
condition is as follows: the number of the most frequent that the pixel is green, red, and an uninteresting color pixel,
appearing phase Ns in the recent Q size times in the information respectively. The size of the LUT is 256 × 256 × 256.
queue should not be less than the preset threshold Q min . Therefore, by simply looking up the conversion table, a given
For a traffic light L j at validation state, the output phase pixel can be judged quickly if it is an interesting color pixel
after the multiframe spatial-temporal analysis is or not.


⎨Green, if Ns = Ng
phase = Red, if Ns = Nr (15) B. Acceleration of K-ELM Algorithm Using OpenCL


Unknown, otherwise From the formula (13) it can be seen that the output f (x)
of K-ELM consists of two parts: one part is
where Ns = max(Nr , Ng ), and Nr and Ng represent the
A N×m = ((I /λ) + ELM )−1 T , which is only related to
number of phases as red and green in the recent Q size times,
the training samples and can be calculated offline, and thus
respectively.
it does not consume any online computation resource, and
When the information queue of one validated traffic light
the other part is B1×N = [K (x, x 1), . . . , K (x, x N )], which
no longer meets the validation condition, its state turns from
is related to the feature vector of the candidate region and
the validation state into the candidate state. This process
the feature vectors of the training samples. This part needs
corresponds to T5 .
to be calculated online with the proposed kernel function
B. Type Recognition of the Traffic Light in (12), and thus it is time consuming. Considering that the
computation of each dimension of B1×N is independent, it
After the recognition of the phase of the traffic light,
is suitable for parallel optimization. In order to reduce the
a simple voting approach is adopted to determine the type
computation time, a CPU–GPU fusion-based approach is
of the traffic light
⎧ adopted to accelerate the proposed K-ELM algorithm using

⎨k ∗ , T ∗ OpenCL. For the recognition of a given candidate region, its
if L kt > Q min
Type = t =1 acceleration process is shown in Fig. 10.

⎩Unknown, otherwise First, the calculation and dimension reduction of the
candidate region’s feature vector x is performed on CPU.

T
To facilitate the computation of GPU, considering the sugges-
k ∗ = arg max L kt
k tion of [37] and the number of the GPU’s processing elements,
t =1
 the dimension of the feature vector is reduced to 256.
1, if Ct = k Then, the needed data for calculating f (x) is copied from
L kt = (16)
0, otherwise the CPU memory to the global memory of GPU. The data
1126 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 27, NO. 5, MAY 2017

include the feature vector x, the off-line calculated feature TABLE I


vectors X 256×N = [x 1T x 2T · · · x NT ] of training samples, G ROUND -T RUTH S TATISTICS OF THE T EST V IDEOS
and A N×m . Here, X 256×N and A N×m are treated as con-
stant matrices and copied only once at the initialization.
Next, f (x) is calculated on GPU. To calculate B1×N =
[K (x, x 1), . . . , K (x, x N )], the global execution space of GPU
is divided into n work groups, n = N/M
. Here,
indicates
the ceiling operation, M indicates the number of work items in
each work-group, and totally, M ×n work items are generated.
In this paper, M = 256, which corresponds to the dimension of
the feature vector. For each work item, one K (x, x i ) operation
is executed. In the process of calculating matrix B, as the
feature vector x of the candidate region is used by all the work
items, it is copied from global memory to the shared local
memory of each work group. In addition, all the work items captured with the vehicle-mounted Samsung Note 3 phone and
are programmed to perform coalesced access to the global the images from the Internet.
memory so that the memory access is accelerated. To quantitatively evaluate the effectiveness of the proposed
Finally, the output f (x) is copied to the CPU memory. system, 150 test videos have been collected from different
Within the framework of OpenCL, a pipeline scheme is illumination conditions (morning, daytime, and twilight) and
also designed to parallel process multiple candidate regions, different weather conditions (sunny, cloudy, and rainy) by the
as shown in Fig. 10(b). It can be seen that the calculation Samsung Note 3 phone installed on the test vehicle. Each
and copy of the feature vectors of these candidate regions video contains at least 100 frames of images with a resolution
are pipelined, as well as the calculation and copy of the of 1280 × 720. The collected videos contain several types of
corresponding outputs. These operations will hide the time traffic lights, as shown in Fig. 1, including: 1) round lights
used for copying data between the memories of CPU and GPU. with a count-down timer; 2) round lights without a count-
The proposed acceleration approach has been implemented down timer; and 3) various types of arrow lights (left-turn,
and evaluated on the Samsung Note 3 smartphone. From the right-turn, and straight).
test results, it can be seen that the proposed acceleration A ground-truth label was made manually, storing the phases
approach achieves 0.75 ms per candidate region, which is five of all visible traffic lights. Furthermore, the types of traffic
times faster than the only CPU version of implementation. This lights that can be visually distinguished by human are also
meets the requirement of the system’s real-time behavior. given. For those traffic lights whose types are visually indis-
tinguishable, they are labeled unknown types. In Table I, the
statistics of the videos are presented.
VIII. E XPERIMENTAL R ESULTS
In this section, we first analyze the contribution of the
B. Contribution Analysis of the Nonlinear Kernel Function
nonlinear kernel function that is used to combine two het-
erogeneous features. And then, we compare quantitatively In the proposed heterogeneous feature combination method,
the proposed K-ELM recognition method with state-of-the-art the coefficient β acts as an important role and directly affects
methods. Furthermore, we verify the effectiveness of the pro- the description power of the feature combination. To demon-
posed system via the experimental results of spatial-temporal strate the contribution of parameter β, with the training data set
analysis. Finally, the runtime performance of the system is introduced in Section VIII-A, a family of K-ELM classifiers
evaluated as well. with various β values is trained and evaluated on the validation
data set. Considering both recall and precision, F-measure is
adopted as the evaluation criteria
A. Experimental Data
RE ∗ PR
In order to analyze the contribution of the heterogeneous F =2∗ (17)
feature combination method and validate the performance RE + PR
of K-ELM, two data sets are built up for the training and where RE = (TP/TP + FN) and PR = (TP/TP + FP).
testing purposes. For the training data set, two subsets are Here, RE and PR represent recall and precision, respec-
collected, one for each color of the traffic lights. In each subset, tively. TP is the total number of correctly recognized traffic
we collect 5000 samples of traffic lights and 5000 samples of light samples, FN is the total number of missed traffic light
nontraffic lights. Of each subset, 50% of the samples serve for samples, and (TP + FN) indicates the total number of traffic
training and 50% are used as a validation data set to determine light samples in the ground truth. FP is the total number
the optimal parameters of K-ELM (see Section VIII-B). of misrecognized nontraffic light samples and (TP + FP)
Similar to the training data set, a test data set is collected. indicates the total number of recognized traffic lights.
The test data set also has two subsets, one for each color of the It should be noted that the proposed K-ELM outputs both
traffic lights. In each subset, we collect 4000 samples of traffic the phase and the type of the traffic lights simultaneously.
lights and 4000 samples of nontraffic lights. It is notable that Here, in order to conveniently demonstrate the contribution
the test samples are collected from two sources: the images of the parameter β, only the recall and precision of the phase
LIU et al.: REAL-TIME TRAFFIC LIGHT RECOGNITION BASED ON SMARTPHONE PLATFORMS 1127

Fig. 11. Evaluation results of red traffic light recognition with various Fig. 12. Comparisons of different feature combinations. (a) Results on test
β values. (a) Curves of RE and PR with different β values. (b) Curve of data set of red light recognition. (b) Results on test data set of green light
F-measure with various β values, and the red square marker indicates the recognition.
point with the maximal F-measure value.
several feature combinations with different β values whose
information are considered. This is done by treating the output descriptive power is superior to that of β = 0.5. This exhibits
of the K-ELM as a binary classification output—traffic light the contribution of parameter β. The feature combination with
and nontraffic light, regardless of the type information. β = 0.8 has the highest score of F. Therefore, for red lights,
Besides the parameter β, there are two other parameters β = 0.8 is chosen as the combination coefficient in this paper.
in K-ELM: the kernel parameter γ and the regularization Similarly, the optimal combination coefficient β can also be
parameter λ. The choice of the values of these two parameters obtained for green lights.
can also affect the performance of the classifier. Especially, to Furthermore, in order to further validate the effectiveness
analyze the contribution of β, we take the HOG–LBP features of the proposed heterogeneous feature combination method,
in [35] as the baseline of comparison, which corresponds four different feature combinations are tested and compared
to the case with β = 0.5. In this case, the optimal values on the test data set, with the Receiver Operating Characteristic
of γ and λ are first determined by applying multiple exper- (ROC) curves shown in Fig. 12. The horizontal axis shows
iments with a grid search strategy on the validation data the FPR, and the vertical axis shows the TPR. Four selected
set. Then, the variation of F-measure with respect to the parameters β = 0, β = 1, β = 0.5, and β = 0.8, respectively,
various values of β is analyzed using the determined optimal represent the HOG, LBP, HOG + LBP [35], and the proposed
values of γ and λ. In this paper, the search range is defined HOG-LBP combination feature. From these results, one can
as {2−10 , 2−9 , . . . , 24 } for γ and {2−5 , 2−4 , . . . , 210 } for λ. see that the proposed feature combination method outperforms
The optimal values of parameter γ and λ on the validation the single feature and the feature combination method of [36].
data set are determined and γ = 1 and λ = 16. This shows the effectiveness of the heterogeneous features
Then, the variation of F-measure with respect to various with the proposed combination method.
values of β (from 0 to 1) is analyzed using the determined
optimal values of γ and λ. The analytical results on the C. Quantitative Comparison Between
validation data set of red lights are shown in Fig. 11. From K-ELM and Other Methods
the results, it can be seen that different values of β have A traffic light contains both phase and type information.
different effects on the red light recognition, when using the First, we evaluate the contribution of the proposed K-ELM on
optimal values of parameters γ and λ. In Fig. 11(b), there exist the phase recognition performance. The proposed K-ELM is
1128 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 27, NO. 5, MAY 2017

TABLE II
T YPE R ECOGNITION R ATES OF D IFFERENT M ETHODS

best knowledge, this work is the first prototype of traffic


light type recognition in the literature. In order to compare,
several other methods are also implemented by us, including:
1) linear SVM classifier; 2) BP network with multiclass output;
3) ELM with multiclass output; and 4) NN. In the SVM
case, we have followed the one-versus-one vote scheme for
multiclass classification. To perform a fair comparison, for
all the methods, the same heterogeneous feature combination
is adopted, and two classifiers are trained per method to,
respectively, recognize the types of red and green lights.
Table II shows the type recognition rates obtained
on the test data sets. As can be seen, the proposed
K-ELM-based type recognition method outperforms all the
competing methods, providing type recognition rates equal
to 92.59% and 93.81% for the green light data set and the
red light data set, respectively. The corresponding confusion
matrixes are shown in Fig. 14(a) and (b), respectively. From
the results, one can see that the recognition accuracy of arrow
lights (straight, left-turn, and right-turn) is higher than that
Fig. 13. ROC curves of different recognition methods. (a) ROC curve of
red light recognition. (b) ROC curve of green light recognition. of round lights. This is because the arrow lights’ clear edges
and pointing directions highly benefit the recognition. As for
the false recognition rate, the round lights are easily confused
compared with other state-of-the-art methods: 1) AdaBoost +
with left-turn, right-turn, and unknown types of lights. This is
Haar-like [2]; 2) BP network [30]; 3) SVM + LBP [31];
due to the fact that with such an image resolution, it is very
4) SVM + HOG [23]; and 5) CNN [29]. Here, in comparison,
difficult to distinguish the types of the traffic lights at a far
the result of SVM + HOG + LBP is also given. To perform a
distance.
quantitative evaluation, we test different methods on test data
sets. The ROC curves are shown in Fig. 13.
Here, the evaluation result of another method is also D. Spatial-Temporal Analysis
provided: an ELM classifier with the proposed heterogeneous We also quantitatively evaluate the effectiveness of the pro-
feature combination, labeled ELM + k-HOG-LBP. To ensure posed system via the experiments results of spatial-temporal
a fairer comparison, for all the methods, the same training analysis on test videos. In this paper, the parameters of the
data set and test data set (as mentioned above) are used, and information queue are set as Q size = 7 and Q min = 4,
their optimal parameters are also determined on the validation due to the reason that the output delay is expected to be as
data set. From the results, one can see that the proposed small as possible while a reliable output is guaranteed. As the
K-ELM method performs much better than the state-of-the-art average processing speed is 20 frames/s (to be described in
methods. Section VIII-E) with Q min = 4, the output delay is limited
It has been previously mentioned that the proposed within 0.5 s, which is smaller than the common reaction time
K-ELM outputs both the phase and the type of the traffic light of drivers. Therefore, the influence of the output delay can
simultaneously, while in the state-of-the-art methods, only the be neglected. If Q size is set too large, too many outdated
phase recognition result is provided. These methods perform a light recognition results will be stored in the queue. This not
binary classification by outputting the traffic light as positive only influences the recognition performance but also consumes
and nontraffic light as negative. For the sake of comparison, more memory. Therefore, in this paper, Q size = 7.
only the output of the phase information is considered. This Table III shows the phase recognition results of traffic
is done by treating the outputs of all types of traffic lights as lights after the spatial-temporal analysis. Here, in order
positive and the outputs of nontraffic light as negative. to demonstrate the improvement by the proposed spatial-
Furthermore, we evaluate the contribution of the pro- temporal analysis, the single-frame recognition results are also
posed K-ELM on type recognition performance. To our given for a comparison. In Table III, TPR represents the
LIU et al.: REAL-TIME TRAFFIC LIGHT RECOGNITION BASED ON SMARTPHONE PLATFORMS 1129

TABLE III
R ECOGNITION R ESULTS OF THE T RAFFIC L IGHT P HASE

Fig. 15. Typical test results of the traffic light recognition system on a mobile
platform.

TABLE V
P ROCESSING T IME OF THE P ROPOSED S YSTEM

seen from the results, the accuracy of the type recognition of


traffic lights is markedly improved with the proposed spatial-
temporal analysis.
Fig. 15 shows some recognition results of typical traffic
scenes. To provide a clearer view of the results, only the parts
of images around the traffic lights are shown in Fig. 15. It can
be seen that the proposed system can recognize traffic lights
of different types correctly.
Fig. 14. Confusion matrix. (a) Confusion matrix for type estimation of E. Runtime Performance Evaluation
red lights. (b) Confusion matrix for type estimation of green lights.
We evaluate the processing time of the proposed system on
the Samsung Note 3 platform. The processing time is given
TABLE IV
in Table V. From Table V, one can see that the processing
R ECOGNITION R ESULTS OF T RAFFIC L IGHT T YPE
Process Total time of the proposed system with OpenCL
acceleration is about 47 ms/f, which means our system can
achieve about 20 frames/s.
IX. C ONCLUSION
In this paper, a traffic light recognition system based on
smartphone platforms is proposed and several contributions
have been made.
1) To avoid the influences of the camera color cast, the
complex background, weather, and illumination condi-
true positive rate, also known as recognition rate or recall. tions, an ellipsoid geometry threshold model in HSL
FPR represents the false positive rate and gives the percentage color space is built to extract interesting color regions.
of misrecognized nontraffic lights in the total number of Meanwhile, a postprocessing step is applied to obtain
traffic lights recognized by the system. In the case of candidate regions of traffic lights.
FPR measurement, the smaller the value is, the better the 2) A new kernel function is proposed to effectively com-
accuracy is. As can be seen from the results, the accuracy of bine two heterogeneous features (HOG and LBP), and
the phase recognition of traffic lights is effectively improved a K-ELM is designed to verify if a candidate region is
with the proposed spatial-temporal analysis. a traffic light or not.
Table IV shows the type recognition results of traffic lights 3) To further increase the reliability of recognition over a
after the spatial-temporal analysis. The single-frame recogni- period of time, a spatial-temporal analysis framework
tion results are also given for a comparison. Here, the statistics based on finite-state machine is introduced to recognize
of the unknown type traffic lights are not included. As can be the phase and type of traffic lights.
1130 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 27, NO. 5, MAY 2017

4) A CPU–GPU fusion-based approach is adopted to accel- [15] M. Diaz-Cabrera, P. Cerri, and J. Sanchez-Medina, “Suspended traf-
erate the execution of the proposed K-ELM so that fic lights detection and distance estimation using color features,” in
Proc. 15th Int. IEEE Conf. Intell. Transp. Syst. (ITSC), Sep. 2012,
a computational performance that is five times faster pp. 1315–1320.
than the only CPU version of implementation can be [16] H. Tae-Hyun, J. In-Hak, and C. Seong-Ik, “Detection of traf-
achieved. fic lights for vision-based car navigation system,” in Advances in
Image and Video Technology. Heidelberg, Germany: Springer, 2006,
The test results of real scenes show that the proposed system pp. 682–691.
can simultaneously accurately recognize the phase and type [17] Y. Jie, C. Xiaomin, G. Pengfei, and X. Zhonglong, “A new traffic
of traffic lights compared with the existing methods. Besides, light detection and recognition algorithm for electronic travel aid,” in
Proc. 4th Int. Conf. Intell. Control Inf. Process. (ICICIP), Jun. 2013,
the response of the system is rapid and a feedback can be pp. 644–648.
given in less than a second. It is also worth pointing out that [18] J. Choi, B. T. Ahn, and I. S. Kweon, “Crosswalk and traffic light detec-
the recognition of traffic lights (especially the recognition of tion via integral framework,” in Proc. 19th Korea-Jpn. Joint Workshop
Frontiers Comput. Vis. (FCV), Jan./Feb. 2013, pp. 309–312.
arrow lights) is not useful unless the results are associated [19] J. Levinson, J. Askeland, J. Dolson, and S. Thrun, “Traffic light mapping,
with the lane information. There may be multiple lights in the localization, and state detection for autonomous vehicles,” in Proc. Int.
cross road that has many lanes in the same direction. In such Conf. Robot. Autom., May 2011, pp. 5784–5791.
[20] Z. Li-Tian, F. Meng-Yin, Y. Yi, and W. Mei-Ling, “A framework
a scenario, each traffic light indicates the traffic situation of of traffic lights detection, tracking and recognition based on motion
its corresponding lane. Therefore, the recognition results of models,” in Proc. IEEE 17th Int. Conf. Intell. Transp. Syst. (ITSC),
traffic lights must be associated with the lanes. In the future, Oct. 2014, pp. 2298–2303.
we will try to fuse the recognition results with GPS navigation [21] H.-K. Kim, J. H. Park, and H.-Y. Jung, “Effective traffic lights recogni-
tion method for real time driving assistance system in the daytime,” in
information. By considering both the trajectory planning and Proc. 59th World Acad. Sci. Eng. Technol., 2011, pp. 1–4.
the current location information, the recognition results will [22] S. Sooksatra and T. Kondo, “Red traffic light detection using fast radial
be reasonably interpreted and utilized. symmetry transform,” in Proc. 11th Int. Conf. Elect. Eng./Electron.,
Comput., Telecommun. Inf. Technol. (ECTI-CON), May 2014, pp. 1–6.
[23] C. Jang, C. Kim, D. Kim, M. Lee, and M. Sunwoo, “Multiple exposure
R EFERENCES images based traffic light recognition,” in Proc. IEEE Intell. Vehicles
Symp., Jun. 2014, pp. 1313–1318.
[1] C. Yu, C. Huang, and Y. Lang, “Traffic light detection during day and [24] Y. Shen, U. Ozguner, K. Redmill, and J. Liu, “A robust video based
night conditions by a camera,” in Proc. IEEE 10th Int. Conf. Signal traffic light detection algorithm for intelligent vehicles,” in Proc. IEEE
Process. (ICSP), Oct. 2010, pp. 821–824. Intell. Vehicles Symp., Jun. 2009, pp. 521–526.
[2] R. de Charette and F. Nashashibi, “Traffic light recognition using image [25] Y. Zhang, J. Xue, G. Zhang, Y. Zhang, and N. Zheng, “A multi-feature
processing compared to learning processes,” in Proc. IEEE/RSJ Int. fusion based traffic light recognition algorithm for intelligent vehicles,”
Conf. Intell. Robots Syst. (IROS), Oct. 2009, pp. 333–338. in Proc. 33rd Chin. Control Conf. (CCC), Jul. 2014, pp. 4924–4929.
[3] J. Gong, Y. Jiang, G. Xiong, C. Guan, G. Tao, and H. Chen, “The [26] F. Lindner, U. Kressel, and S. Kaelberer, “Robust recognition of
recognition and tracking of traffic lights based on color segmentation traffic signals,” in Proc. IEEE Intell. Vehicles Symp., Jun. 2004,
and CAMSHIFT for intelligent vehicles,” in Proc. IEEE Intell. Vehicles pp. 49–53.
Symp. (IV), Jun. 2010, pp. 431–435. [27] J. Ren, J. Jiang, D. Wang, and S. S. Ipson, “Fusion of intensity and
[4] R. de Charette and F. Nashashibi, “Real time visual traffic lights inter-component chromatic difference for effective and robust colour
recognition based on spot light detection and adaptive traffic lights edge detection,” IET Image Process., vol. 4, no. 4, pp. 294–301,
templates,” in Proc. IEEE Intell. Vehicles Symp., Jun. 2009, pp. 358–363. Aug. 2010.
[5] M. Omachi and S. Omachi, “Traffic light detection with color and [28] J. Han, D. Zhang, X. Hu, L. Guo, J. Ren, and F. Wu, “Background
edge information,” in Proc. 2nd IEEE Int. Conf. Comput. Sci. Inf. prior-based salient object detection via deep reconstruction residual,”
Technol. (ICCSIT), Aug. 2009, pp. 284–287. IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 8, pp. 1309–1321,
[6] A. E. Gómez, F. A. R. Alencar, P. V. Prado, F. S. Osório, and Aug. 2015.
D. F. Wolf, “Traffic lights detection and state estimation using hidden [29] V. John, K. Yoneda, B. Qi, Z. Liu, and S. Mita, “Traffic light recognition
Markov models,” in Proc. IEEE Intell. Vehicles Symp., Jun. 2014, in varying illumination using deep learning and saliency map,” in
pp. 750–755. Proc. IEEE 17th Int. Conf. Intell. Transp. Syst. (ITSC), Oct. 2014,
[7] J. Baber, J. Kolodko, T. Noel, M. Parent, and L. Vlacic, “Cooperative pp. 2286–2291.
autonomous driving: Intelligent vehicles sharing city roads,” IEEE [30] W. Hong-Jiang et al., “Research on unmanned vehicle traffic signal
Robot. Autom. Mag., vol. 12, no. 1, pp. 44–49, Mar. 2005. recognition technology,” in Proc. Int. Conf. Intell. Syst. Design Eng.
[8] N. Fairfield and C. Urmson, “Traffic light mapping and detection,” Appl. (ISDEA), vol. 2. Oct. 2010, pp. 298–301.
in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2011, [31] C.-C. Chiang, M.-C. Ho, H.-S. Liao, A. Pratama, and W.-C. Syu,
pp. 5421–5426. “Detecting and recognizing traffic lights by genetic approximate ellipse
[9] J. Roters, X. Jiang, and K. Rothaus, “Recognition of traffic lights in detection and spatial texture layouts,” Int. J. Innov. Comput. Inf. Control,
live video streams on mobile devices,” IEEE Trans. Circuits Syst. Video vol. 7, no. 12, pp. 6919–6934, 2011.
Technol., vol. 21, no. 10, pp. 1497–1511, Oct. 2011. [32] G. Cheng, J. Han, L. Guo, Z. Liu, S. Bu, and J. Ren, “Effective
[10] Y.-T. Chiu, D.-Y. Chen, and J.-W. Hsieh, “Real-time traffic light detec- and efficient midlevel visual elements-oriented land-use classification
tion on resource-limited mobile platform,” in Proc. IEEE Int. Conf. using VHR remote sensing images,” IEEE Trans. Geosci. Remote Sens.,
Consum. Electron.-Taiwan (ICCE-TW), May 2014, pp. 211–212. vol. 53, no. 8, pp. 4238–4249, Aug. 2015.
[11] A. Acharya, J. Lee, and A. Chen, “Real time car detection and tracking in [33] G. Trehard, E. Pollard, B. Bradai, and F. Nashashibi, “Tracking both
mobile devices,” in Proc. Int. Conf. Connected Vehicles Expo (ICCVE), pose and status of a traffic light via an interacting multiple model filter,”
Dec. 2012, pp. 239–240. in Proc. 17th Int. Conf. Inf. Fusion (FUSION), Jul. 2014, pp. 1–7.
[12] C.-W. Tang, K.-T. Feng, P.-H. Tseng, C.-H. Chen, and J.-W. Guo, [34] R. Pan, W. Gao, and J. Liu, “Color clustering analysis of yarn-
“A pitch-aided lane tracking algorithm for driver assistance system dyed fabric in HSL color space,” in Proc. WRI World Congr. Softw.
with insufficient observations,” in Proc. IEEE Wireless Commun. Netw. Eng. (WCSE), vol. 2. May 2009, pp. 273–278.
Conf. (WCNC), Apr. 2012, pp. 3261–3266. [35] W.-J. Park, D.-H. Kim, Suryanto, C.-G. Lyuh, T. M. Roh, and
[13] Y. K. Kim, K. W. Kim, and X. Yang, “Real time traffic light recognition S.-J. Ko, “Fast human detection using selective block-based HOG-LBP,”
system for color vision deficiencies,” in Proc. Int. Conf. Mechatronics in Proc. 19th IEEE Int. Conf. Image Process. (ICIP), Sep./Oct. 2012,
Autom. (ICMA), Aug. 2007, pp. 76–81. pp. 601–604.
[14] M. Omachi and S. Omachi, “Detection of traffic light using structural [36] X. Wang, T. X. Han, and S. Yan, “An HOG-LBP human detector with
information,” in Proc. IEEE 10th Int. Conf. Signal Process. (ICSP), partial occlusion handling,” in Proc. IEEE 12th Int. Conf. Comput. Vis.,
Oct. 2010, pp. 809–812. Sep./Oct. 2009, pp. 32–39.
LIU et al.: REAL-TIME TRAFFIC LIGHT RECOGNITION BASED ON SMARTPHONE PLATFORMS 1131

[37] Y.-X. Li, S. Ji, S. Kumar, J. Ye, and Z.-H. Zhou, “Drosophila Bing Yu received the B.S. degree from Shanghai
gene expression pattern annotation through multi-instance multi-label Jiao Tong University, Shanghai, China, in 2010, and
learning,” IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 9, no. 1, the M.S. degree in automation from the Institut de
pp. 98–112, Jan./Feb. 2012. Recherche en Communications et Cybernétique de
[38] Z.-L. Sun, H. Wang, W.-S. Lau, G. Seet, and D. Wang, “Application of Nantes, Nantes, France, in 2012.
BW-ELM model on traffic sign recognition,” Neurocomputing, vol. 128, He is currently a Research Engineer with
pp. 153–159, Mar. 2014. the Advanced Automotive Electronics Technology
[39] G.-B. Huang, D. H. Wang, and Y. Lan, “Extreme learning machines: Research Center, Neusoft Corporation, Shenyang,
A survey,” Int. J. Mach. Learn. Cybern., vol. 2, no. 2, pp. 107–122, China. His current research interests include image
2011. processing, computer vision, and machine learning.
[40] S. S. Baboo and S. Sasikala, “Multicategory classification using
an extreme learning machine for microarray gene expression can-
cer diagnosis,” in Proc. IEEE Int. Conf. Commun. Control Comput.
Technol. (ICCCCT), Oct. 2010, pp. 748–757.
[41] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan,
“A fast and accurate online sequential learning algorithm for feedforward
networks,” IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1411–1423,
Nov. 2006. Ting Zhou received the B.S. degree in automa-
[42] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, “Extreme learning tion and the M.S. degree in control engineering
machine for regression and multiclass classification,” IEEE Trans. Syst., from Northeastern University, Shenyang, China,
Man, Cybern. B, Cybern., vol. 42, no. 2, pp. 513–529, Apr. 2012. in 2011 and 2013, respectively.
[43] B. Peasley and S. Birchfield, “Replacing projective data association She is currently a Research Engineer with
with Lucas–Kanade for KinectFusion,” in Proc. IEEE Int. Conf. Robot. the Advanced Automotive Electronics Technology
Autom. (ICRA), May 2013, pp. 638–645. Research Center, Neusoft Corporation, Shenyang.
Her current research interests include machine
learning, computer vision, and image semantic
Wei Liu received the M.S. and Ph.D. degrees segmentation.
in control theory and control engineering from
Northeastern University, Shenyang, China, in
2001 and 2005, respectively.
He is currently a Professor-Level Senior
Engineer with the Research Academy, Northeastern
University. He is also the Director of the Intelligent
Vision Laboratory with Neusoft Corporation,
Shenyang. His current research interests include Huai Yuan received the B.S. degree in computer
computer vision, image processing, and pattern software from Nankai University, Tianjin, China,
recognition with applications to intelligent video in 1983, and the M.S. degree in computer software
surveillance and advanced driver assistance systems. from Northeastern University, Shenyang, China,
in 1986.
He is currently an Associate Professor with
Shuang Li received the master’s degree in Northeastern University. He is also the Director
applied mathematics from Northeastern University, of the Advanced Automotive Electronics Technol-
Shenyang, China, in 2014. ogy Research Center with Neusoft Corporation,
She is currently a Software Engineer with Neusoft Shenyang. His current research interests include
Corporation, Shenyang. Her current research inter- computer vision, image processing, and intelligent
ests include computer vision, image processing, and vehicles.
pattern recognition.

Jin Lv received the B.S. degree in electronic and Hong Zhao received the M.S. and Ph.D. degrees
information engineering from the Shenyang Univer- in computer science from Northeastern University,
sity of Technology, Shenyang, China, in 2008, and Shenyang, China, in 1984 and 1991, respectively.
the M.S. degree in pattern recognition and intelligent He has been a Professor with Northeastern Uni-
systems from Northeastern University, Shenyang, versity since 1994. He is currently the Director
in 2010. of the National Engineering Research Center
She is currently a Research Engineer with for Digital Medical Imaging Device, Shenyang.
the Advanced Automotive Electronics Technology His current research interests include computer
Research Center, Neusoft Corporation, Shenyang. multimedia systems, distributed computer systems,
Her current research interests include image process- image processing, and computer vision.
ing, computer vision, and machine learning.

You might also like