Keshav Shenoy
Table of Contents
Rationale of Study ……………………………………………………………………………… 3
Concept Map …………………………………………………………………………………… 4
D. Definition of Terms ………………………………………………………………………… 7
E. Assumptions ………………………………………………………………………………… 8
G. Importance of Study ……………………………………………………………………… 10
Chapter 3: Methodology ……………………………………………………………………… 15
E. Data Tables ………………………………………………………………………………… 20
References ……………………………………………………………………………………… 21
USING CAPSNETS FOR TRAFFIC LIGHT IMAGE RECOGNITION 3
Rationale of Study
Autonomous driving is a growing field of study with large numbers of potential commercial
applications. Many current implementations use Convolutional Neural Networks (CNNs) to recognize traffic light signals, but CNNs have
significant flaws, most notably in their ability to evaluate positional data. As a result, this
research will investigate the possible benefits of implementing Capsule Neural Networks
(CapsNets) in place of CNNs. Specifically, the research will look for improvements in final
accuracy with faster minimization of loss. The basis for this hope can be found in the work of
Kumar, Arthika, and Parameswaran (2018), who implemented CapsNets in traffic sign
classification with positive results and a 97.6% accuracy (p. 4546). CNNs have been the leading
edge of image recognition for a long time and, as such, an alternative offering significant
improvements could reshape the development of autonomous vehicles. The researchers wish to
observe the benefits and the drawbacks of CapsNet architecture relative to that of CNNs. The
comparison is multifaceted and could motivate further research.
a. Overarching Question
How can CapsNets be used to improve traffic light image recognition applications for
autonomous driving?
This study will determine the potential of CapsNets to improve the final accuracy of traffic light
image recognition. This necessitates that the CapsNet can minimize loss at a faster rate than
current CNNs.
Currently, according to Hinton (2018), CNNs are the predominant machine learning technique
being used for detecting and identifying objects within images (7:00). This has led to their
inclusion within multiple language libraries like Keras and TensorFlow. Unfortunately, like any
other emerging technology, there are a number of flaws with CNNs. Hinton (2018) has proposed
that many of these flaws can be remedied through the alteration of CNNs into a new, similar
neural network structure called a CapsNet (3:09). This research applies Hinton’s assertion to the
field of autonomous driving, where CNNs are used to assist autonomous motor vehicles in
detecting traffic light signals.
Foundation Sub-problem 1: What image recognition applications are currently the most utilized
for autonomous driving?
Hypothesis: The most utilized current image recognition applications are CNN models, which
dominate the detection and identification of objects within images.
Foundation Sub-problem 2: What is the benefit of Capsule Neural Networks over current image
recognition applications?
Hypothesis: CapsNets will have more accurate results because of positional data preservation.
Applied Sub-problem 1: How do CapsNets perform when implemented in traffic light image
recognition?
Hypothesis: A trained CapsNet will recognize traffic lights from multiple autonomous driving
related image datasets with more than 85% final validation accuracy.
Independent Variables: The model, build, and design of the machine learning (ML) system that
is implemented.
Dependent Variable: Final accuracy of CapsNet operating on validation data after training.
D. Definition of Terms
a. Terms
− Artificial Intelligence: “…that activity devoted to making machines intelligent…” (Nilsson, as cited in Stone et al., 2016, p. 12)
− Artificial Neural Network: A machine learning system composed of interconnected
“neurons,” loosely based on the organization of certain neurons in human brains (Rawat & Wang, 2017, p. 2354).
− Convolutional Neural Network: An artificial neural network built
from layers of convolutional and pooling layers (Rawat & Wang, 2017, p. 2354).
− Capsule Neural Network: A type of artificial neural network that modifies convolutional
neural networks by segmenting groups of neurons into capsules for the better evaluation
of positional data.
− Image Recognition (or Image Classification): “…the task of categorizing images into one
of several predefined classes…”
− Convolutional Layers: “…serve as feature extractors, and thus they learn the feature
representations of their input images” (Rawat & Wang, 2017).
− Machine Learning: “…the design of learning algorithms, as well as scaling existing
algorithms, to work with extremely large data sets.” (Stone et al., 2016, p. 9)
− Pooling Layer: LeCun et al. (1989a), LeCun et al. (1989b), LeCun et al. (1998), and
Ranzato et al. (2007) claimed that pooling layers “…reduce the spatial resolution of the
feature maps and thus achieve spatial invariance to input distortions and translations” (as
cited in Rawat & Wang, 2017).
− Pose: Information about an entity’s position, orientation, scale, deformation, velocity,
color, and more, which is recorded by CapsNets (Hinton, 2018, 3:23).
− TF: TensorFlow
E. Assumptions
It is assumed that the datasets accurately represent the population of traffic lights that
autonomous motor vehicles would encounter in practice. While traffic lights are not very
varied in design, the datasets are assumed to capture the regional differences that do exist.
It is assumed that the performance of the produced CapsNet after training accurately models the
performance it would achieve in practical operation.
It is assumed that the dataset developers annotated the datasets with the correct bounding boxes
and signal states.
It is assumed that the power of the Central Processing Unit and processors of the computer used
is sufficient to train and test the network within the project timeframe.
It is assumed that the ML algorithm can be created and tested at full potential within the TF API.
F. Limitations
There are many different types of ML algorithms and neural networks. The research will be
confined to the performance of CNNs and CapsNets due to their relevance and current use within
the field.
The research will limit itself to the study of accuracy, with the understanding that a high final
accuracy indicates the ability for optimization in terms of performance and speed on more
advanced hardware.
The research will only investigate the performance of CapsNets within the TF framework and
will not attempt to reconstruct the design within Caffe or any other ML framework.
The research will limit itself to a few levels of image quality and dimensions with the
understanding that practically applied autonomous driving applications will have similar or
higher image quality.
While planning to attempt to identify relatively small traffic lights with artificial neural
networks, the research will set a minimum pixel size at around 4px width, given the futility of
recognizing lights smaller than that.
The research is limited to the ML area of artificial intelligence and will not examine other areas
of the field.
G. Importance of Study
By showing the performance of CapsNet technology within traffic light image recognition in
autonomous driving, this research can support or fail to support a shift in resources towards
further CapsNet research. The potential for a more powerful and accurate alternative to CNNs is
very significant, because CNNs are currently at the forefront of object recognition (Hinton, 2018,
7:00). Improving upon the capabilities of CNNs with CapsNets could change how researchers
approach image recognition problems and push forward the global adoption of autonomous
motor vehicles, as well as the incorporation of artificial intelligence and ML into everyday
objects.
Currently, the field of image object detection and recognition within ML is increasing in
importance for a number of different applications. Specifically, Fairfield and Urmson (2011)
discuss its growing significance in the field of autonomous driving, where it has been used to
build perception systems in combination with cameras (p. 1). They specifically cite the issue of
traffic light image recognition, which cannot be performed by alternative measures like sonar or
radar (p. 1), because it requires knowledge of color. As such, a large amount of development has
gone into designing the best learning algorithms for traffic light image recognition problems. So
far, Huang et al. (2017) found that the leading models used are CNNs (p. 1). Lim et al. (2017)
discuss this, describing CNN architecture as one where image data is fed through a series of
deep (convolutional) and pooling layers, as well as a kernel, to extract features for classification
(p. 11). They explain that CNN technology is state-of-the-art, needing only one network to
perform both feature extraction and classification.
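As a minimal sketch of the architecture Lim et al. describe (convolutional feature extractors alternating with pooling layers, followed by a classifier), a toy Keras model might look like the following. The layer widths, input size, and three-class output are illustrative assumptions, not values from the cited work.

```python
import tensorflow as tf

def build_cnn(input_shape=(64, 64, 3), num_classes=3):
    """Toy CNN: convolution layers extract features, pooling layers
    subsample them, and a dense softmax head classifies the result.
    Sizes here are assumptions for illustration only."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),  # feature extraction
        tf.keras.layers.MaxPooling2D(),                    # spatial subsampling
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Note that the max pooling layers here are exactly the subsampling step that the CapsNet literature discussed below criticizes.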
Despite this, there are still significant issues with the CNN model. One significant
problem Liu et al. (2016) identify is balancing speed performance and accuracy (p. 1). To
alleviate some of this, Liu et al. (2016) suggest SSD (Single Shot MultiBox Detector) – a “deep
network based object detector that does not resample pixels or features for bounding box
hypotheses and is as accurate as approaches that do” (p. 2). By replacing bounding box
proposals with a convolutional filter, Liu et al. (2016) are able to construct a model that operates
at higher frames per second than previous approaches with Faster R-CNN (p. 16). However, in
contrast with Liu et al.’s research, Huang et al. (2017) suggests that Faster R-CNN can be more
accurate when detecting very small objects, a task SSD struggles with (p. 14). Meanwhile, in the similar field of traffic
sign image recognition, Lim et al. (2017) took a unique approach to the optimization problem by
combining a Support Vector Machine (SVM) model – an ML system which does not utilize
neural networks – with CNN technology to improve results (p. 2). SVMs were utilized first to
verify the sign and a CNN afterwards to classify the sign (Lim et al., 2017, p. 2). Lim et al.’s
(2017) combination succeeded, forming a system able to classify images in real time with
97.9% average accuracy and with improved accuracy specifically in poor lighting (p. 19). It is
difficult to compare Lim et al.’s (2017) sign model to the traffic light models of Liu et al. (2016)
or Huang et al. (2017), but the improvements of Huang et al. and Lim et al. over Liu et al. in
such a short time frame show the speed of significant advancements occurring within the field.
Outside of the actual model usage, multiple researchers have attempted to make image
recognition systems more efficient at a structural level. Prime examples of
this are Fairfield and Urmson (2011), who show the ability for mapped traffic lights to improve
detection results within a model (p. 6). By mapping the location of traffic lights against current
location of the vehicle, a network can predict when it should expect to detect traffic lights and
when it should expect not to, reducing false positives and false negatives (Fairfield & Urmson,
2011, p. 6). Ghahramani (2015) takes a more technical approach, exploring the ability for
probabilistic frameworks – models which “make predictions about future data, and take
decisions that are rational given these predictions” (p. 1) – to increase accuracy. Tyukin et al.
(2018) mirror this by considering the use of multiple ML models within a teacher-student
model, speeding up the training of classification algorithms and improving the universality of
models in application to data (p. 1). They improve on previous work in the field by creating a
framework for the teacher-student model which requires less raw data and training (Tyukin et al.,
2018, p. 2). Though not implemented within the context of automated driving, the success of the
model within CNN image recognition suggests its potential for the field.
More than anything else, however, the biggest challenge that has been issued against
CNNs is from Hinton (2018), who references their lack of structure as a major flaw in their
performance in handling positional data (1:47). As a way to fix this, Hinton (2011; 2018)
proposes CapsNets, similar to CNNs but with layers loosely replaced with “capsules” (p. 2;
3:09). According to Hinton (2018), capsules would output the likelihood that a feature is present
and “pose” information, which would include a large amount of positional information (3:09).
First, Hinton (2018) claims, capsules would improve massively on the current CNN
practice of max pooling, which reduces the available information in a subsampling procedure
(6:57). CapsNets get rid of pooling completely, instead using coincidence filtering to find
clusters of inputs at high dimensions, removing unwanted background inputs while keeping
useful data (Hinton, 2018, 5:26). Secondly, Sabour, Frosst, and Hinton (2017) point out the
benefits of capsules for the dynamic routing of information, specializing specific capsules for
certain tasks (p. 2). This contrasts with max pooling, which Sabour et al. (2017) state will
“throw away information about the precise position of the entity within the region” (p. 2);
dynamic routing instead considers multiple input vectors, not just the most active one. These two effects, the
removal of subsampling and the introduction of dynamic routing, could lead to improvements in
a number of fields, including: digit segmentation and separation, like that performed by Hinton
et al. (2000, p. 1) and Sabour et al. (2017); traffic sign image recognition, like that
done by Lim et al. (2017); and shape analysis, like that described by Hinton (2018, 15:15). In
fact, Kumar, Arthika, and Parameswaran (2018) have already implemented CapsNets in traffic
sign image recognition with strong results: 97.6% accuracy and 0.0311038 loss at the end of
validation (p. 4546). Unfortunately, it does not seem like CapsNets have yet been applied to the
primary issue of this research, traffic light recognition. Based on the results of Kumar et al.
(2018), however, the CapsNet architecture should have a strong accuracy rating when applied to
traffic light recognition.
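The vector outputs described above can be illustrated with the “squash” nonlinearity from Sabour et al. (2017), which rescales a capsule’s raw output vector so that its length behaves like a probability that a feature is present. This NumPy sketch is a minimal reproduction of that one function, not the full routing algorithm.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squash nonlinearity from Sabour et al. (2017):
    v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    Preserves the vector's direction (the pose) while mapping its
    length (the presence probability) into the interval [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# A strongly activated capsule keeps its direction, length approaches 1;
# a weakly activated capsule is suppressed toward 0.
long_v = squash(np.array([10.0, 0.0]))
short_v = squash(np.array([0.1, 0.0]))
```

The direction of the output vector is what carries the positional (“pose”) information that max pooling would discard.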
Discussion
From the literature, it becomes clear that there are numerous areas for potential
improvement within CapsNets that do not exist in CNNs. These include the elimination of
information loss from down-sampling suggested by Hinton et al. (2011, p. 7) and by Hinton
(2018, 6:55), as well as within dynamic routing between capsules to enable specialization
(Sabour et al., 2017, p. 2). Sabour et al. (2017) go so far as to state that, “The fact that a simple
capsules system already gives unparalleled performance at segmenting overlapping digits is an
early indication that capsules are a direction worth exploring” (p. 9). This supports the
conclusion that CapsNets, if developed to the same level as CNNs have enjoyed, should become
the dominant architecture for image recognition.
Further Research: Beyond just this research’s exploration of the utilization of CapsNets
in traffic light imaging, research should also be conducted into applying the improvements made
within CNN architecture to CapsNets. As an example of that, the emulation of Fairfield and
Urmson’s (2011) traffic light mapping (p. 1) or Lim et al.’s (2017) utilization of SVMs as a
pre-processing measure (p. 1) within a CapsNet framework could provide valuable evidence
towards the viability of CapsNets in autonomous driving.
Chapter 3: Methodology
Applied Sub-Problem 1: How do CapsNets perform when implemented in traffic light image
recognition?
Need: A trained CapsNet will recognize traffic lights from multiple autonomous driving related
image datasets with more than 85% final validation accuracy.
Research Basis: This need provides a good basis from which to begin examination, because it
establishes clear proof of concept from which the CapsNet can improve. As detailed in the
research paper, state-of-the-art image recognition technology has reached the point of greater
than 90% accuracy after a reasonable number of iterations. Hinton (2018) describes how CNNs
have been extensively developed and improved by researchers for many years now (7:02). As
such, 85% accuracy is an ambitious, but reasonable level of accuracy to expect from an emerging
model for learning. Reaching that level supports the idea that there is potential for CapsNet
architecture to improve to the point of replacing CNN architecture in traffic light image
recognition applications in autonomous driving systems, but is not too high a bar for the newer
architecture.
Independent Variable: The model, build, and design of the machine learning (ML) system that
is implemented.
Dependent Variable: Final accuracy of CapsNet operating on validation data after training.
Type of Design: The research will utilize the Engineering Design Process to implement a
CapsNet ML system within traffic light image recognition. If the design does not meet the
evaluation criteria, another iteration will be introduced until the best CapsNet structure
achievable within the timeframe is reached. The design process assumes that the
product is possible to construct and that the successful implementations of previous researchers
will cross apply to this work, as well as the assumptions listed previously in Chapter 1.
Type of data: The data is numerical. The final accuracy is a single number taken at the end of
validation from a table of accuracy over iteration, while loss will be measured per epoch as a
residual sum of squares. Final accuracy is the number being used to evaluate the success of the
product, while loss will simply inform the researchers of how the model’s accuracy increased
over training and validation. The data is descriptive, because it is summarizing the success of the
model in classification. It also encompasses the whole scope of the network, not just a sample of
it.
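As an illustration of the two measurements described above, the following sketch computes a residual-sum-of-squares loss and a final accuracy from a made-up validation batch; the labels and scores are hypothetical values, not real model output.

```python
import numpy as np

def rss_loss(y_true, y_pred):
    """Residual sum of squares between one-hot labels and predicted scores."""
    return float(np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class matches the label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Hypothetical validation batch: 3 traffic-light classes, 4 samples.
labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
scores = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.4, 0.5, 0.1]])  # last sample misclassified

final_accuracy = accuracy(labels, scores)  # 3 of 4 correct -> 0.75
epoch_loss = rss_loss(labels, scores)
```

In the study, `epoch_loss` would be recorded once per epoch, while `final_accuracy` after the last validation pass is the single number used to judge the design against the 85% criterion.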
Testing: The model evaluation is done as part of its operation, during the validation section of
the code. This section will test the code against images it has not yet seen, but that are of the
same type as those the model was trained on. The accuracy of the model in recognizing traffic
lights in these unseen images will be recorded as the final validation accuracy.
Analysis: The only way to analyze the model’s accuracy data is by direct greater-than, less-than
comparison of accuracies to previous versions and the evaluation objective, because the network
is built entirely around minimizing loss and increasing accuracy. As such, it would not make
sense to analyze the model by any metric other than its own accuracy. A model which
reaches the threshold of 85% final accuracy is successful, while a model which does not fails
the evaluation objective.
Validity:
− Internal Validity: Internal validity will be increased by the randomization of all possible
assignments within the design process. These include the order in which examples are read
during training and validation, which subsamples of data are used for training, and which
subsamples are used for validation. It will also be improved by controlling as many variables
as possible, including the number of steps allowed within training and validation and the
dataset configuration.
− External Validity: This will be increased by trying to have as universal a coverage as possible
of traffic light images. By incorporating every type of traffic light, the model will be
applicable to almost all of the subject. This can be done by using multiple datasets, as this
research will, and by using datasets with many diverse sets of images from multiple regions
and conditions.
− Criterion-related Validity: If the results of the produced CapsNet in traffic light image
recognition are similar to results of other CapsNets produced for traffic sign image
recognition or other perception problems, it suggests that the model is operating correctly in
line with established results.
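The randomized assignments discussed under internal validity could be implemented with a seeded shuffle-and-split along these lines; the 80/20 ratio and the seed value are illustrative choices, not parameters taken from the study.

```python
import random

def shuffled_split(examples, train_fraction=0.8, seed=42):
    """Shuffle example order with a fixed seed (reproducible randomization),
    then split into training and validation subsamples."""
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)
    cut = int(len(pool) * train_fraction)
    return pool[:cut], pool[cut:]

train, val = shuffled_split(range(100))
```

Fixing the seed is also what makes the test-retest reliability check below meaningful: two runs under identical, controlled conditions should produce the same assignments.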
Reliability:
− Test-Retest Reliability: If the model, run twice on the same dataset, produces the same
results, it suggests that the model’s measurements are reliable.
− Inter-rater Reliability: If the model, run on two different datasets, returns similar results, it
suggests that the model is operating correctly and is not overfitted to a specific set of data
samples.
Consistency:
Both forms of reliability addressed above and the factors discussed within internal validity can
be applied to measure the consistency of the model. Additionally, because the program executes
the same sequence of steps on each run, as long as testing conditions (including any random
seeds) are controlled, the program should operate in the same manner every time.
Feasibility:
Kumar et al.’s (2018) successful construction of a CapsNet model for traffic sign image
recognition (p. 4547) demonstrates the feasibility of the system within artificial intelligence and
ML.
Week 1-2: Building on Empirical Examples: The first step is to examine capsule neural
networks and convolutional neural networks previously implemented for similar problems.
By founding the most basic areas of the design from models shown to previously have success,
the research can establish the model on a stable basis from which to start the design process.
Reference implementations are available from academic researchers and industry
employees on GitHub.
Week 3: Implement Data Processing: The first step within both CNNs and CapsNet Frameworks
is the processing of the input data into a format understandable by the neural networks. This is
achieved through a Python program which reads each pixel of the images from the traditional file
format into a 3-dimensional array of pixel values. Each image will be represented by one array,
with height, width, and RGB making up the 3 dimensions. The final input dimensions would be
Height × Width × 3. The pre-annotated datasets utilized by this research have this data already
established with labels and ground truths within a JSON, config, or similar file. The program will
read the labelling and truth information from the file and send it to the neural network for the
training and validation stages.
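A minimal sketch of this processing step is shown below. The raw-byte decoding and the JSON keys (`label`, `box`) are assumptions about the dataset format, and a real pipeline would typically use NumPy and an image library rather than nested lists.

```python
import json

def decode_image(raw_bytes, height, width):
    """Turn a flat buffer of RGB bytes into a height x width x 3
    nested list of pixel values (a stand-in for a NumPy array)."""
    assert len(raw_bytes) == height * width * 3
    it = iter(raw_bytes)
    return [[[next(it) for _ in range(3)]        # R, G, B per pixel
             for _ in range(width)]
            for _ in range(height)]

def read_annotations(json_text):
    """Parse labels and ground-truth boxes from a JSON annotation string.
    The keys used here ('label', 'box') are assumed, not from a real dataset."""
    return [(a["label"], tuple(a["box"])) for a in json.loads(json_text)]

# 2x2 dummy image and one hypothetical annotation.
image = decode_image(bytes(range(12)), height=2, width=2)
annotations = read_annotations('[{"label": "green", "box": [0, 0, 2, 2]}]')
```

Each decoded image is one Height × Width × 3 structure, matching the input dimensions described above.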
Week 4-12: Implement, Train, and Test Capsule Neural Network: The second step of the design
process is to create the actual Capsule Network. This network will have the implementation in
Python for both the training and the validation portions of the Neural Network done using the TF
framework. At the end of the validation, code will be included to produce and plot an
accuracy and loss curve, as well as to record the data into a CSV or similar data file. If the
objective of 85% accuracy is not reached, the researchers will analyze further where losses in
performance could have occurred and renovate the CapsNet, iterating the design until 85%
accuracy is reached.
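Independently of the network itself, the recording and evaluation steps described above can be sketched as follows; the history dictionary mimics the shape of the object returned by Keras training, with placeholder values rather than real results.

```python
import csv, io

def history_to_csv(history):
    """Write per-epoch accuracy and loss to CSV text
    (in practice this would go to a file for later plotting)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["epoch", "accuracy", "loss"])
    for epoch, (acc, loss) in enumerate(
            zip(history["accuracy"], history["loss"]), 1):
        writer.writerow([epoch, acc, loss])
    return buf.getvalue()

def meets_objective(history, threshold=0.85):
    """Design-iteration check: did final validation accuracy reach 85%?"""
    return history["accuracy"][-1] >= threshold

# Placeholder history, shaped like the dict tf.keras model.fit() returns.
history = {"accuracy": [0.60, 0.78, 0.88], "loss": [1.20, 0.55, 0.30]}
csv_text = history_to_csv(history)
```

If `meets_objective` returns `False`, the design loop described above repeats with a revised CapsNet.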
Tools: The entire project is done within Python, a popular machine learning language.
Additionally, the research utilizes the TF Python framework, which implements a large number
of classes, functions, and objects for ML. The TF framework provides simple, pre-implemented
methods for developing the ML algorithm, measuring the change in the dependent variable
(accuracy), and checkpointing the artificial neural network every few steps.
E. Data Tables
CapsNet Accuracy

Epochs                   1   2   3   4   5   6   7   8   9
Training Accuracy (%)
Testing Accuracy (%)
Note: The actual data table will have more than this number of epochs, depending on the
iteration amount chosen and number of data samples. The whole table is not shown for ease of
viewing.
CapsNet Loss

Epochs                   1   2   3   4   5   6   7   8   9
Training Loss (no units)
Testing Loss (no units)
Note: The actual data table will have more than this number of epochs, depending on the
iteration amount chosen and number of data samples. The whole table is not shown for ease of
viewing.
References
Fairfield, N., & Urmson, C. (2011). Traffic light mapping and detection. IEEE International
Conference on Robotics and Automation (ICRA).
Hinton, G. E., Ghahramani, Z., & Teh, Y. W. (2000). Learning to parse images. Advances in
Neural Information Processing Systems. Retrieved from the NIPS Proceedings database.
Hinton, G. E., Krizhevsky, A., & Wang, S. D. (2011). Transforming auto-encoders. Lecture
Notes in Computer Science: Artificial Neural Networks and Machine Learning – ICANN
2011, 44-51. doi:10.1007/978-3-642-21735-7_6
Hinton, G. E. (2018, April 12). What's wrong with convolutional nets? [Video file]. Retrieved
from https://techtv.mit.edu/collections/bcs/videos/30698-what-s-wrong-with-convolutional-nets
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., . . . Murphy, K. (2017).
Speed/accuracy trade-offs for modern convolutional object detectors. 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2017.351
Kumar, A. D., Arthika, R. K., & Parameswaran, L. (2018). Novel deep learning model for traffic
sign detection using capsule networks. International Journal of Pure and Applied
Mathematics.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D.
(1989a). Backpropagation applied to handwritten zip code recognition. Neural
Computation, 1(4), 541-551.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D.
(1989b). Handwritten digit recognition with a back-propagation network. Advances in
Neural Information Processing Systems.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to
document recognition. Proceedings of the IEEE, 86(11), 2278-2324. doi:10.1109/5.726791
Lim, K., Hong, Y., Choi, Y., & Byun, H. (2017). Real-time traffic sign recognition based on a
general purpose GPU and deep-learning. PLoS ONE, 12(3). doi:10.1371/journal.pone.0173317
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C., & Berg, A. C. (2016). SSD:
Single shot MultiBox detector. Computer Vision – ECCV 2016, Lecture Notes in
Computer Science.
Nilsson, N. J. (2010). The quest for artificial intelligence: A history of ideas and achievements.
Cambridge University Press.
Ranzato, M. A., Huang, F. J., Boureau, Y., & LeCun, Y. (2007). Unsupervised learning of
invariant feature hierarchies with applications to object recognition. 2007 IEEE
Conference on Computer Vision and Pattern Recognition. doi:10.1109/CVPR.2007.383157
Rawat, W., & Wang, Z. (2017). Deep convolutional neural networks for image classification: A
comprehensive review. Neural Computation, 29(9), 2352-2449. doi:10.1162/neco_a_00990
Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. 31st
Conference on Neural Information Processing Systems. Retrieved from the NIPS
Proceedings database.
Stone, P., Brooks, R., Brynjolfsson, E., Calo, R., Etzioni, O., Hager, G., … Teller, A. (2016,
September). Artificial intelligence and life in 2030. One Hundred Year Study on
Artificial Intelligence: Report of the 2015-2016 Study Panel. Stanford University.
Tyukin, I. Y., Gorban, A. N., Sofeykov, K. I., & Romanenko, I. (2018, August 13). Knowledge
transfer between artificial intelligence systems. Frontiers in Neurorobotics.
doi:10.3389/fnbot.2018.00049