
Design of traffic light and traffic sign detection system for autonomous vehicle

Thesis
Submitted in partial fulfilment of the requirements of
BITS C421T/422T Thesis
By
Rachit Bhargava
ID No. (2011A4TS232P)
Under the supervision of
Dr. K Madhava Krishna
Associate Professor, IIIT Hyderabad
&
Dr R.K. Mittal
Professor, BITS Pilani

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE,


PILANI
(9th May, 2015)

2011A4TS232P

Thesis report 2014-15

Acknowledgement
Firstly, I would like to thank the Head of the Mechanical Department,
Dr. Sai Jagan Mohan for granting his permission to allow me to explore
my interests in robotics and pursue a semester off campus in one of
the leading robotics facilities in India at IIIT, Hyderabad. I appreciate
the freedom he has given me during the duration of my thesis to
pursue what I am interested in.
I would also like to thank Dr. R.K. Mittal, who has been my on campus
advisor for the duration of this thesis. I am very thankful to him for his
support during this period and more importantly, for spending his
precious time with me to guide me on how to pursue my interests in
robotics.
Finally and most importantly, I would like to thank my supervisor, Dr.
K Madhava Krishna for believing in me and trusting me with projects
of vast importance to him even when I had no prior research
experience in this field. With the help of my mentor, Harit Pandya and
Dr. Madhava, I have been able to explore the highly exciting field of
computer vision and gain valuable practical exposure in the same. This
experience has made me a lot more confident to explore further in the
highly complex, diverse and ever evolving field of robotics.
My time here has been very enjoyable and I hope my work has been
up to your standards and has reflected my enthusiasm to work in this
field.


CERTIFICATE
This is to certify that the Thesis entitled, The design of traffic light and traffic
sign recognition systems for autonomous vehicles, and submitted by
Rachit Bhargava ID No. 2011A4TS232P in partial fulfilment of the requirement
of BITS C421T/422T Thesis embodies the work done by him/her under my
supervision.

Signature of the Supervisor


Date:
Name: Dr. K Madhava Krishna
Designation: Head of the Robotics Research Lab, IIIT Hyderabad


Project abstract
Traffic accidents claim more than a million lives each year, and the majority of these accidents are due to driver error. This figure can be significantly reduced if we are able to provide information that lets the driver react more quickly, or even have the automobile react on the driver's behalf. Thus there is a need for an intelligent driver safety system. This project deals with two modules of such a system, namely the traffic sign and traffic light recognition systems. The aim of the project is to develop a system that takes in images of dense urban environments, detects traffic lights and traffic signs and, more importantly, recognises them correctly. An even more important requirement is that this must be achieved in real time for it to be implemented successfully in an autonomous vehicle. This is achieved using basic image processing techniques like segmentation, dilation and erosion, and machine learning techniques like cascade training and SVMs (Support Vector Machines).


Table of contents
1. Design of traffic light detection system for autonomous vehicle
   1.1 Introduction
   1.2 The image and the RGB and HSV colour space
   1.3 Geometric constraints and morphological operations
   1.4 Haar descriptors and Cascade training
   1.5 The pipeline
   1.6 Results
   1.7 Challenges
2. Design of traffic sign detection system for autonomous vehicles
   2.1 Introduction
   2.2 The image and the RGB and HSV colour space
   2.3 HOG features (Histogram of Oriented Gradients)
   2.4 Multiclass SVM with RBF kernel and E-SVM
   2.5 The pipeline
   2.6 Results
   2.7 Challenges
3. Conclusion
4. References


1. Design of traffic light detection system for autonomous vehicle


1.1 Introduction
In today's world traffic scenes are complex and carry a lot of information, and keeping constant attention on traffic signs is not an easy task for drivers. Some traffic data (signs) can therefore be missed for several reasons, such as the complexity of the road scene, the large amount of visual information, or even the driver's stress or visual fatigue. To assist with this task, several driver assistance systems have been suggested in past years, using either database information (e.g. learned Geographic Information Systems) or on-vehicle sensors (e.g. laser, camera, etc.) to provide various environment information such as traffic signs, speed limits, traffic lights, crosswalks, or even the presence of pedestrians or obstacles. The specific functionality of traffic light detection is very useful, since the position and state of traffic lights provide good knowledge of the traffic environment, such as a high probability of a crossroad or crosswalk, a dangerous area, etc. Furthermore, detecting traffic lights with an on-board camera could also be used to improve the fusion of GPS and camera visual data in order to project road information onto the windshield.
Most existing algorithms work only on suspended traffic lights. Our aim in this project was to achieve traffic light detection in even the densest urban scenarios in real time. After further improvement, the generated algorithm will play an important part in helping the autonomous vehicle being developed by the team at IIIT Hyderabad for the Mahindra "Spark the Rise" Driverless Car Challenge to detect the traffic lights in its surrounding environment, recognise whether they correspond to a "go" or a "stop" state, and react accordingly. The detection and recognition are achieved using basic image processing and machine learning techniques, which will be explained in later sections. Once we are clear about the tools to be implemented, the pipeline, or outline of the algorithm, will be explained.


Fig 1: The required result of the problem

1.2 The image and the RGB and HSV colour space
Any image input to a computer is stored as a 3-dimensional matrix, with each point in the matrix represented by 3 channels of red, green and blue intensities. The intensities range from 0 (extremely dark) to 255 (extremely bright). It is from this matrix that we extract information indicating the position of the object we wish to find. For instance, traffic lights can be distinguished from other objects in an image by the facts that they can be of only 2 colours, namely red and green, and that their intensity is extremely high in comparison to their surroundings. Thus our first step will be to use this information to our advantage through colour segmentation of the image.
Colour segmentation algorithms include the histogram threshold method, the feature-space clustering method, the region-based method, the edge-based detection method, the fuzzy set method, neural networks, methods based on physical models and so on. Of these, the threshold method has good real-time performance, and the hue of each kind of light is basically fixed: when a light turns on, its intensity is largely fixed.


Therefore, the threshold segmentation method based on the HSV colour space is suitable
for recognition of traffic lights.

Fig 2: 3D view of HSV colour space

The HSV colour space is a colour model based on human vision, and many studies have used colour thresholds with the HSI/HSV colour model. It can be used as a measure for distinguishing the predefined colours of traffic lights. In this work, a segmentation method in HSV colour space was used. About 900 samples with different lighting conditions, different background environments and different brightness were selected to calculate the H, S and V statistical curves of the red and green lights. The characteristic parameters of the threshold segmentation in HSV colour space were obtained from these statistical curves and by trial and error. By various trials we were finally able to decide upon a region of H, S and V that minimized both the number of candidates for the next step and the number of false negatives.
The resultant HSV ranges taken in the final algorithm were:
For Red light:
H: 165-34
S: 192-255
V: 192-255


For Green Light:


H: 79-89
S: 65-166
V: 204-255
The resultant image obtained was:

Fig 3: Results from colour segmentation

1.3 Geometric constraints and morphological operations


The resulting image after colour segmentation is what is known as a binary image, where the matrix points that satisfy the thresholds are assigned the value 1 and the rest are assigned 0. As we can see, it is still very difficult to ascertain the spotlight of the traffic light from the above binary image. This is attributed partly to other objects in the surroundings, such as neon lights, and partly to the presence of noise in the image.
1. Dealing with the noise: Image noise is random (not present in the object imaged) variation of brightness or colour information in images, and is usually an aspect of electronic noise. This noise may not be filtered out during colour segmentation and must be removed through what are known as morphological operations, namely dilation and erosion. By choosing an appropriate structural element it is possible to erode out what we don't require (mostly noise) and dilate those characteristics which are more favourable to us. (For better understanding, read the source material and the OpenCV documentation.)


Fig 4: Dilation for a binary image

Fig 5: Erosion for a binary image

2. Reducing the number of candidates for the recognition phase: Let us return to the question of what is unique about the shape of traffic lights: the spotlight is circular, of course! So we find the contour associated with each blob of connected components and obtain the bounding box surrounding each blob. This is done using predefined functions in the OpenCV library, by measuring the centroid of the blob by means of moments and finding the extreme corners of the blob. Once we have the bounding boxes we get a measure of their aspect ratio. This results in the removal of blobs which do not satisfy the circularity condition, thereby further reducing the number of candidates for the spotlight recognition phase.
The resultant of these two steps is:

Fig 6: The result after noise removal and applying geometric constraints


The next step is extremely important. We now concentrate on the structure of the
traffic light as shown below.

Fig 7: Schematic of the traffic light used to decide the region of interest
From each blob candidate we are able to figure out its height and width as well as its centroid. This is done with the image moments m00, m01 and m10. The general equation of the image moment Mij is

    Mij = Σx Σy (x^i)(y^j) I(x, y)

where x and y are the image coordinates (matrix column and row) and I(x, y) is the pixel or intensity value. The centroid is then (m10/m00, m01/m00).
By studying the schematic representation of the traffic lights in the area, we used the centroid, width and height information of the blob (the traffic spotlight) to generate a region of interest (by expanding the bounding box using ratios of dimensions attained from the schematic representation) that will contain the entire traffic light and not just the spotlight. It is this region of interest that is passed on to the recognition phase.
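The moment computation and the ROI expansion can be sketched in plain Python as below. The expansion ratios (kx, k_up, k_down) are hypothetical placeholders; the real values come from the schematic proportions in Fig 7 (and differ for a red spotlight, near the top of the housing, versus a green one near the bottom).

```python
def raw_moment(img, i, j):
    """M_ij = sum over x, y of x^i * y^j * I(x, y), for a 2-D list of
    pixel intensities (x = column index, y = row index)."""
    return sum((x ** i) * (y ** j) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))

def centroid(img):
    """Blob centroid from the first-order moments: (m10/m00, m01/m00)."""
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00

def expand_roi(cx, cy, w, h, kx=1.5, k_up=1.5, k_down=4.5):
    """Grow the spotlight's box into a full traffic-light ROI.

    kx/k_up/k_down are illustrative placeholders, not the thesis's
    exact schematic ratios. Returns (x0, y0, x1, y1).
    """
    return (cx - kx * w / 2.0, cy - k_up * h,
            cx + kx * w / 2.0, cy + k_down * h)
```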


So, in the image that we have been following we get the following regions of interest:

Fig 7: Determining the region of interest from the binary image

1.4 Haar descriptors and Cascade training


Training the Haar cascade
A Haar cascade is an object detection algorithm used to locate the traffic lights in our region of interest (Kumar R. & Bindu A.; 2006). In Haar-cascade training, the system is provided with a number of positive images (in our case, pictures of traffic lights) and negative images (images that are not traffic lights but can be anything else, like a chair, table or wall, as long as they clearly possess a different structure), and feature selection is done along with classifier training using AdaBoost and integral images.


Features used by the Haar cascade

In general, three kinds of features are used. The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions. These regions have the same shape and size and are horizontally or vertically adjacent, as shown in Fig 8. A three-rectangle feature is computed by taking the sum over two outside rectangles and subtracting the sum over a centre rectangle. A four-rectangle feature computes the difference between diagonal pairs of rectangles (Viola P. & Jones M.; 2001).

Fig 8: Example rectangle features used in the Haar cascade. The sum of pixels in the white rectangles is subtracted from the sum of the pixels in the grey rectangles. Here A and B are two-rectangle features, and C and D are three- and four-rectangle features (Viola P. & Jones M.; 2001).
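The integral image mentioned above is what makes these rectangle sums, and hence every Haar feature, an O(1) operation regardless of rectangle size. A minimal plain-Python sketch (not the OpenCV internals):

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img over rows < y, cols < x.
    Using an (h+1) x (w+1) table keeps the rectangle-sum formula simple."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum over a w x h rectangle with top-left (x, y), in O(1)."""
    return (ii[y + h][x + w] - ii[y][x + w]
            - ii[y + h][x] + ii[y][x])

def two_rect_feature(ii, x, y, w, h):
    """A horizontal two-rectangle Haar feature: pixel sum of the left
    half minus the pixel sum of the adjacent right half."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)
```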

Learning Classification Functions


The classification learning process requires a set of positive and negative images for training, and a set of features is selected using AdaBoost to train the classifier. The AdaBoost algorithm is used to increase the performance of a simple learning algorithm (sometimes called a weak learner), and it provides guarantees in several procedures. Boosting works by learning a single simple classifier and reweighting the data, giving higher weights where errors were made. Afterwards a second simple classifier is learned on the reweighted data, and the data is reweighted on the


combination of the 1st and 2nd classifiers, and so on until the final classifier is learned. The final classifier is therefore the combination of all previous n classifiers, as shown in Fig 9. The AdaBoost cascade of classifiers is one of the fastest and most robust methods of detection and characterization; however, it presents some limitations in complex scenes, especially with objects that change shape (Sialat et al; 2009).
The final contribution (or weight) of each of the n classifiers depends linearly on the ratio of the true positives to the false positives attained in each case.

Fig 9: AdaBoost learning process. The (+) indicates a positive training sample and (-) a negative training sample.

Detection using the Haar cascade is based upon training a classifier using a number of positive images that represent the object to be recognized and an even larger number of negative images that represent objects or features not to be detected. OpenCV already provides a program for training a classifier to recognize any object, known as the HaarTraining utility.


1.5 The pipeline

Once we are familiar with all the tools we can recap the entire process that goes into building a robust traffic light detection and recognition system. The following are the steps involved:

- The image is stored in the form of a 3-dimensional matrix in which the dimensions represent the red, green and blue channels respectively.
- The image is then converted to an HSV image because it is easier to handle and process (reasons explained before).
- The image is then thresholded within the given constraints. The constraints were decided through statistical analysis of sample data.
- We now have one binary image each for the green threshold and the red threshold.
- Morphological transformations and geometric constraints are applied to the binary images so that noise is removed and the number of candidate blobs is reduced.
- The centroid is calculated via moments (m00, m01 and m10), and hence so are the width and height of the individual blobs.
- Using the schematics of the standard traffic light we expand the region of interest beyond the width and height of the blob.
- The regions of interest are then extracted from the greyscale version of the original image and their Haar descriptors are calculated.
- These Haar descriptors serve as the input for the cascade classifier generated using AdaBoost and the sample positive and negative images.
- The colour information of each blob, stored during colour segmentation, is retained.
- Coupling this information with the windows detected by the cascade classifier, we are able to successfully detect and recognise traffic lights in an input image.


Fig 10: Procedure of traffic light detection and recognition

1.6 Results
We first need to understand how results are computed in the field of computer vision. Two values are used to determine the efficiency of the system, namely precision and recall.

Fig 11: Precision and recall

Precision: the ratio of the number of relevant records retrieved (true positives) to the total number of relevant and irrelevant records retrieved (i.e. true positives + false positives, or the number of detections). It is usually expressed as a percentage.
Recall: the ratio of the number of relevant records retrieved (true positives) to the total number of relevant records in the database (i.e. true positives + false negatives, or the total number of positive data). It is usually expressed as a percentage.
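The two definitions reduce to a pair of one-liners. In the test values below, the counts are invented purely to illustrate the arithmetic; they are not the system's actual tallies.

```python
def precision(tp, fp):
    """Relevant detections / all detections: tp / (tp + fp)."""
    return tp / float(tp + fp)

def recall(tp, fn):
    """Relevant detections / all relevant records: tp / (tp + fn)."""
    return tp / float(tp + fn)
```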


In a system built for driver safety, precision, recall, speed of detection and accuracy of recognition are of utmost importance. For the system in its current stage the following results were achieved:

- Precision: 90%
- Recall: 72%
- Recognition: 100% (since it is purely based on colour segmentation)
- Time to process a single frame (or image): 38 milliseconds

Below are a few test images of the algorithm:

Fig 12: Four examples of results from our system on the LARA traffic light detection dataset


1.7 Challenges
Of course, since we are dealing with real-life images we have to deal with obstacles that arise from exposure to the environment.
Diverse illumination conditions: This issue is central to any computer vision problem. Detection and recognition depend entirely on our ability to extract the features of the object accurately. But in outdoor conditions the amount of light available is not always ideal. In extremely bright conditions specular highlights may occur (where reflected light follows a specular model rather than a Lambertian model of reflection), while in darkness there is no light available for the object to reflect. In either case there may be a loss of essential features, leading to false negatives.

Fig 13: Example of the loss of relevant features (like the colour of the object) and the addition of irrelevant ones (like image gradients around the white spot even though the object is actually uniform) due to specular highlighting

Motion blurring: The roads are not always smooth, and neither are conditions always ideal to operate a camera smoothly; there are bound to be many vibrations. Thus the camera may not capture a steady image of the environment, and the resulting blurring may lead to inaccurate feature extraction and a large amount of noise.

Fig 14: Effect of motion blurring


Similar background: The objects we may find in the surroundings of an object of interest are always unpredictable. The background of the object of interest may possess a colour within the threshold constraints and hence not be filtered out. As a result the object and the background appear fused in the binary image as one blob, which may then be filtered out because it no longer satisfies the geometric constraints.
Partial occlusions: We will not always have a clear view of the objects. A large vehicle or a tree may block our view of the traffic light. This partial hiding of the object behind obstacles is what is known as occlusion. Obviously it is impossible to capture all the features of an object that is partially hidden from view, and the loss of essential features may lead to it not being detected.
Lack of robustness to location: As mentioned earlier, positive samples are required to train the cascade classifier. These positive samples serve as a template for the classifier to decide whether our region of interest actually contains the object of interest. But there is no uniformity in the colour, shape, structure, etc. of traffic lights around the world; in a country like India such features may even differ from city to city! So it is not possible to train a cascade classifier that is robust to the location of the car or, more importantly, robust to changes in the features of the object of interest.

2. Design of traffic sign detection system for autonomous vehicles


2.1 Introduction
Detecting traffic lights is very important, but most traffic lights are positioned and designed in such a way that they are easily visible to the driver. The same cannot be said about traffic signs. Anyone who has driven a car is aware that traffic signs are very easy to miss, even in broad daylight. Traffic signs ensure the safety of the driver by giving information about the safe speed limit, when to stop and check at an intersection, or what lies ahead, such as a sharp turn or a school zone.


Fig 15: The required result of the problem


Thus, when designing an intelligent driver safety system, a traffic sign detection and recognition system becomes very important. A very important requirement for such a system (as before) is that it must run in real time. Most of the techniques used before, like template matching and neural networks, are extremely accurate but also complex and very slow, so there has to be a trade-off between the accuracy and the speed of the system. The technique we have works fairly quickly in a dense urban environment. It is still in the development phase, and a lot of work remains to be done to raise the accuracy of the system.

2.2 The image and the RGB and HSV colour space
The concept applied is the same as for the traffic lights (refer to section 1.2). Of course, the sample data taken is different, and hence the thresholds here differ from those observed for the traffic lights. When it comes to colour there are primarily two types of traffic signs that we take into account, namely red and blue signs. The following were the thresholds in the HSV space:
For red signs:
H=160-15
S=40-255
V=0-255


For blue signs:


H=84-124
S=100-255
V=20-255
An example of the image thresholding (colour segmentation) is given below:

Fig 16: The binary images attained after the blue threshold (on the left) and the red threshold (on the right)

If you observe the traffic signs in the image you will see how well they have been extracted from the dense urban background.

2.3 HOG features (Histogram of Oriented Gradients)

The HOG feature descriptor is fairly simple to understand. One of the main reasons for this is that it uses a global feature to describe the object of interest rather than a collection of local features. Put simply, this means that the entire object is represented by a single feature vector, as opposed to many feature vectors representing smaller parts of the object.


To compute the HOG descriptor, we operate on 10x10 pixel blocks within the detection window. These blocks overlap by 50 percent, and each block is divided into 2x2 cells, with each cell having a size of 5x5 pixels. For each cell we calculate the image gradient at each pixel, as shown below.

Fig 17: Calculating Image Gradients

The gradient is calculated by sliding a gradient mask (an example can be seen below) over the entire image and computing the values by 2-dimensional discrete convolution.

Fig 18: Sobel gradient masks in the x and y direction respectively

The general formula for the convolution of a mask E over an image C is given by

    A(x, y) = Σi Σj E(i, j) C(x − i, y − j)

where A is the resultant gradient image.
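The convolution formula can be checked against a direct (and deliberately naive) implementation. Note that true convolution flips the mask, so the Sobel x-mask responds to a rising edge with a negative value; many image libraries instead compute correlation, which flips the sign.

```python
def convolve2d(image, mask):
    """Direct 2-D discrete convolution A = E * C (valid region only):
    A(x, y) = sum over (i, j) of E(i, j) * C(x - i, y - j)."""
    mh, mw = len(mask), len(mask[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(mh - 1, ih):
        row = []
        for x in range(mw - 1, iw):
            acc = 0
            for j in range(mh):
                for i in range(mw):
                    acc += mask[j][i] * image[y - j][x - i]
            row.append(acc)
        out.append(row)
    return out

# Sobel mask for the x-direction gradient (as in Fig 18)
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
```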


Each image gradient has two parts: the magnitude and the direction of greatest change. Thus each cell's gradients can be plotted in a histogram such as the one below:


Fig 19: Example of a gradient histogram of a cell with 8 bins, with the x-axis representing the angle of maximum increase and the y-axis representing the magnitude

For each gradient vector, its contribution to the histogram is given by the magnitude of the vector (so stronger gradients have a bigger impact on the histogram). We split the contribution between the two closest bins. So, for example, if a gradient vector has an angle of 85 degrees, then we add 1/4 of its magnitude to the bin centered at 70 degrees, and 3/4 of its magnitude to the bin centered at 90 degrees.
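The split-vote rule can be written out explicitly. This sketch assumes 9 unsigned-gradient bins over 0-180 degrees with centres at 10, 30, ..., 170 degrees, which is consistent with the 70/90-degree example above.

```python
def vote(angle, magnitude, nbins=9):
    """Split one gradient vote between the two nearest bin centres.

    Bins cover 0-180 degrees (unsigned gradients); with 9 bins the
    centres sit 20 degrees apart at 10, 30, ..., 170. Returns the
    resulting histogram as a list of length nbins.
    """
    width = 180.0 / nbins                       # 20 degrees per bin
    hist = [0.0] * nbins
    # index of the bin whose centre lies at or below the angle
    lo = int((angle - width / 2.0) // width) % nbins
    hi = (lo + 1) % nbins
    centre = lo * width + width / 2.0
    frac = ((angle - centre) % 180.0) / width   # distance past lower centre
    hist[lo] += (1.0 - frac) * magnitude        # nearer centre gets more
    hist[hi] += frac * magnitude
    return hist
```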
So for each cell we have a 9-dimensional vector. The total length of the vector for the entire 30x30 image will therefore be:
Total number of blocks * Number of cells per block * Number of bins
which in my system is:
25 * 4 * 9 = a 900-dimensional global vector describing our image

2.4 Multiclass SVM with RBF kernel and E-SVM


While the method used to extract the features may be simple, the method used to determine their class (i.e. type) from those features certainly is not. SVM stands for Support Vector Machine. In this report I will only explain the two-class linear SVM and give a brief idea of the so-called "kernel trick". As mentioned above, we now have a 900-length vector giving information about the (gradient-based) features of the image. How do we know whether this is a positive sample (one which contains a traffic sign) or a negative sample based on this global feature vector?


We first assume a 900-dimensional space with each point, of course, represented by a 900-dimensional vector. We then collect positive samples (images of traffic signs in our case) and an even greater number of negative samples (images of anything other than traffic signs). We then calculate the HOG features (the 900-length feature vector) of each sample and plot them in this space along with their class labels (+1 for positive and -1 for negative). We get a figure as below, with n = 900 in this case:

Fig 19: The infinite linear hyper-planes separating the data


Since we are dealing with the linearly separable case, we want to find a hyper-plane (an 899-dimensional plane) which separates the positive from the negative samples. There are infinitely many planes that might separate the data (as is evident from the image above). We are in pursuit of the plane that maximizes the gap, which is formally known as the margin. The points on the dotted lines are the support vectors.
So given the data points x and their labels y (as in the figure above) we can model the problem as below: the two parallel planes through the support vectors are w·x − b = +1 and w·x − b = −1, and the value of the gap D is the distance between these 2 planes, which by simple coordinate geometry is D = 2/||w||.

Fig 20: The optimal hyper-plane which maximizes the margin


Here w is the normal vector to the plane and b determines its offset from the origin. Both b and w are normalized so that the two parallel hyper-planes on which the support vectors lie are at unit distance from the separating hyper-plane.
Thus this becomes an optimization problem of maximizing the value of D, or equivalently minimizing ||w||. In summary, the primal form of our problem becomes:

    minimize (1/2)||w||^2 subject to yi (w·xi − b) ≥ 1 for all i

The problem is then converted to a dual problem, and on applying the Karush-Kuhn-Tucker conditions it is observed that the solution depends only on the support vectors and a few parameters (like the penalty factor and the Gaussian factor), not discussed here. When a testing sample z is input, its signed distance is calculated by substituting z for x in the equation of the optimal hyper-plane. Depending on the sign and magnitude of this value we can determine the probability of the testing image belonging to each class.
Of course, in real situations linearly separable data is not available; instead we have something like the figure below (left-hand side):

Fig 21: The kernel trick

So, we increment the number of dimensions from 2 to 3 as observed in the figure above (900 to 901 in our case), using information from the vector of the training image. This is done


by means of the application of a kernel. The one I have used is shown in the figure above and is known as the Radial Basis Function, given by:

    K(x, z) = exp(−γ ||x − z||²)

The addition of the new dimension based on the information present in the vectors helps generate a mapping in which a viable hyper-plane exists. This is the advantage of the kernel. It is a very interesting and vast topic; please check the references for further information.
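The Radial Basis Function itself is a one-line computation; the sketch below uses plain Python, and the gamma value is an arbitrary illustrative choice (in practice it is tuned together with the SVM penalty factor).

```python
import math

def rbf_kernel(x, z, gamma=0.1):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2).

    gamma here is an illustrative default, not a tuned value.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

Note that K(x, x) = 1 for any x, and the kernel value decays toward 0 as the two vectors move apart, which is what lets the implicit feature map separate data that is not linearly separable in the original space.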
In my traffic sign detection system there are 38 different classes, which have been trained in two different ways:
Multiclass non-linear SVM with an RBF kernel: Similar to the process explained above, just involving 38 different classes. Similar images belong to a particular class, and the testing image is matched against a class.
Exemplar-SVM: A recently developed method in which each sample is a class of its own, and a hyper-plane exists between that sample and the rest. Here the testing data is matched against each sample and not a class.

2.5 The pipeline


As in the case of traffic light recognition, the entire process can be imagined as an
assembly line where the image serves as the raw material.
The following are the steps involved:

The image is stored in the form of a 3 dimensional matrix in which each dimension
represents the red, green and blue channel respectively.

The image is then converted to an HSV image because it is easier to handle and process.
(Reasons explained before)

The image is then thresholded within the given constraints. The constraints were decided
through statistical analysis of the constraints on sample data.

We now have a binary image each for the blue threshold and red threshold.

27

2011A4TS232P

Thesis report 2014-15

We then find the blobs in the image and find the approximate polygon corresponding to
each blob and hence classify them as circles and triangles (2 common shapes of traffic
signs).

We then apply certain geometric constraints (size, aspect ratio etc.) on the contours
attained and hence reduce the number of testing candidates. The bounding boxes of the
remaining candidates is attained.

We then train 3 multiclass non-linear SVMs, one for each of the following cases:

1. Blue traffic signs (containing 8 signs / positive classes and 1 negative class)
2. Red triangle traffic signs (containing 16 signs / positive classes and 1 negative class)
3. Red circle traffic signs (containing 14 signs / positive classes and 1 negative class)

The bounding boxes are used to extract the regions of interest from the grayscale
version of the original image. Each region is resized to 30x30 so that it can be tested
with the appropriate classifier (one of the 3 mentioned above).
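The crop-and-resize step can be sketched as follows, with a nearest-neighbour resize standing in for OpenCV's cv2.resize; the image and box values are made up for illustration.

```python
import numpy as np

def extract_and_resize(gray, box, size=30):
    """Crop box = (x, y, w, h) from a grayscale image and resize to size x size."""
    x, y, w, h = box
    roi = gray[y:y + h, x:x + w]
    # nearest-neighbour index maps from the size x size grid back into the ROI
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return roi[rows[:, None], cols]

gray = np.arange(100 * 100).reshape(100, 100)   # dummy grayscale image
patch = extract_and_resize(gray, (20, 40, 50, 60))
print(patch.shape)   # (30, 30)
```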

If a region of interest matches any of the positive classes, its bounding box is
marked on the original image along with its label.

2.6 Results
This system is still at a very early stage, so precision-recall measurements cannot yet
be made. The computation time was measured at roughly 0.29 seconds per image,
owing to the higher complexity compared with the traffic light recognition
problem. Some of the result images have been compiled in the attached folder titled
Traffic Signs (the images are too large for MS Word).

2.7 Challenges
The challenges involved are the same as those observed in the traffic light system
(refer to section 1.7) but are amplified by the more complex structure of traffic signs.
A few challenges specific to traffic signs are listed below.

Differentiating between the speed limit signs, i.e. between numbers such as 120 and
100 or 50 and 30, is extremely difficult, and a quicker and more accurate process is
required to distinguish such numbers.


Unlike traffic lights, traffic signs are not as well maintained in the urban setup and are
prone to rusting, which can lead to the loss of essential features.

Traffic signs have highly reflective surfaces and are very susceptible to specular
highlights.

Traffic signs are so diverse in structure and colour that it is very difficult to build a
system that works reliably even within a single city.

Traffic signs are placed at the sides of roads and are hence more likely to be occluded
or to merge with their surroundings.

3. Conclusion
Autonomous vehicles (driverless cars) are going to be available in the not-too-distant
future. Detecting objects of interest in an unpredictable outdoor environment remains
an open problem that researchers are trying to solve. The system presented above is an
example of one of the real-time solutions available for this problem.
A large number of problems remain to be tackled before such a system reaches a level
at which it can be implemented in commercial cars, as sections 1.7 and 2.7 of this
report make evident. A major reason for these problems is that such systems rely on
the infrastructure they interact with. In a country like India, where roads are not smooth
(causing more motion blur) and traffic lights and signs are not well maintained, developing an
intelligent system to deal with such a high degree of unpredictability is an altogether
different and highly complex problem.
However, rapid developments are being made in computer architecture, and if Moore's
Law holds, we may in the near future be able to emulate a human brain capable of
solving such complex problems. For now, until that day arrives, we will have to rely on
systems like the one presented above, which have been designed assuming nearly
perfect environmental conditions and fairly uniform infrastructure (in shape and
features). As the results show, the system works well and provides a base upon which
future work can be done.


4. References
Traffic light recognition
1. Conference paper: The Recognition and Tracking of Traffic Lights Based on Color
Segmentation and CAMSHIFT for Intelligent Vehicles. Jianwei Gong, Yanhua Jiang,
Guangming Xiong, Chaohua Guan, Gang Tao and Huiyan Chen.
2. Conference paper: Traffic Light Recognition using Image Processing Compared to
Learning Processes. Raoul de Charette, Fawzi Nashashibi.
3. Website (for training dataset): http://www.lara.prd.fr/lara
4. Conference paper: Rapid Object Detection using a Boosted Cascade of Simple
Features. Michael Jones and Paul Viola.
5. Website: http://www.kirupa.com/design/little_about_color_hsv_rgb.htm
6. Website: http://docs.opencv.org/
7. Website: http://in.mathworks.com/help/images/morphology-fundamentals-dilationand-erosion.html

Traffic sign recognition
1. Conference paper: Traffic Sign Recognition - How far are we from the solution?
Markus Mathias, Radu Timofte, Rodrigo Benenson, and Luc Van Gool.
2. Conference paper: Road-Sign Detection and Recognition Based on Support Vector
Machines. Saturnino Maldonado-Bascón, Hilario Gómez-Moreno.
3. Journal paper: Automatic Road-Sign Detection and Classification Based on Support
Vector Machines and HOG Descriptors. A. Adam, C. Ioannidis.
4. Conference paper: Histograms of Oriented Gradients for Human Detection. Navneet
Dalal and Bill Triggs.


5. Website (for training dataset): http://benchmark.ini.rub.de/

