Aditya Agarwal
2K13/SE/007
Certificate
DEPARTMENT OF SOFTWARE
ENGINEERING
This is to certify that this Seminar report entitled Neural Networks and Machine
Learning submitted by Aditya Agarwal (2K13/SE/007) in partial fulfillment of the
requirements for the award of the Bachelor of Technology Degree in Software Engineering
(SE) at Delhi Technological University is an authentic work carried out by the student
under my supervision and guidance.
To the best of my knowledge, the matter embodied in the report has not been submitted to
any other university or institute for the award of any degree or diploma.
Acknowledgement
The successful completion of any task would be incomplete without acknowledging the
people who made it possible and whose constant guidance and encouragement secured my
success.
First of all, I am grateful to the Almighty for enabling me to complete this self-study
assignment. I owe a debt to our faculty, Ms. Kusum Lata (Assistant Professor, COE
Department), for instilling in me the idea of a creative self-study project, for helping me in
undertaking this project, and for being there whenever I needed her assistance.
I also place on record my sense of gratitude to one and all who, directly or indirectly,
have lent their helping hand in this venture.
Last, but not least, I thank my parents for being with me, in every sense.
Abstract
The goal of the field of Machine Learning is to build computer systems that learn from
experience and that are capable of adapting to their environments. Learning techniques and
methods developed by researchers in this field have been successfully applied to a variety
of learning tasks in a broad range of areas, including, for example, text classification, gene
discovery, financial forecasting, credit card fraud detection, collaborative filtering, design
of adaptive web agents and others.
Neural Networks are an innovation in the field of machine learning and Artificial
Intelligence that was originally motivated by the goal of having machines that can mimic
the brain. A Neural Network is a representation of the brain's learning approach: the brain
operates as a massively parallel processor with rich interconnections between neurons, and
a Neural Network can likewise be described as a "parallel distributed processing" system.
Neural Networks were very widely used throughout the 1980s and 1990s, but their
popularity diminished in the late 1990s for various reasons. More recently, Neural
Networks have had a major resurgence, partly because computers became fast enough to
run large-scale Neural Networks and partly for a few other technical reasons which we
will discuss later. Modern Neural Networks today are the state-of-the-art technique for
many applications such as speech recognition and text detection. Digit recognition is an
application of Neural Networks which has been dealt with in this project.
Table of Contents

Chapter 1: Introduction
    Machine Learning
    Supervised Learning
    Unsupervised Learning
    Neural Networks
Chapter 2: Literature Survey
Chapter 3: Discussion
    Model Representation
    Architecture
    Algorithms
    Other Applications
Conclusion
References
Chapter-1
Introduction
Machine Learning
Machine Learning is the field of study that gives computers the ability to learn without
being explicitly programmed.
A computer program is said to learn from experience E with respect to some task T and
some performance measure P, if its performance on T, as measured by P, improves with
experience E.
Various examples and applications exist:
Database mining: large datasets arising from the growth of automation and the web,
e.g., web click data, medical records, biology, engineering.
Self-customizing programs: e.g., Amazon and Netflix product recommendations.
Machine learning algorithms fall broadly into two classes: supervised learning and
unsupervised learning.
Supervised learning
The term supervised learning refers to the fact that we gave the algorithm a data set in
which the "right answers" were given. Such a data set is commonly called a training data
set.
Two types of supervised learning problems exist:
Regression: the goal is to predict a continuous-valued output. Say you want to
predict housing prices by collecting and plotting data of price versus features of
houses; the learning algorithm might put a straight line through the data and use it
to predict the price of a new house.
Classification: the goal is to predict a discrete-valued output, for example whether
a tumour is malignant or benign.
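The housing-price example can be sketched in a few lines of code. This is a minimal illustration, not part of the report's project: the sizes and prices below are made-up numbers, and ordinary least squares stands in for the learning algorithm.

```python
import numpy as np

# Hypothetical training set: house sizes (sq. ft.) and prices (in $1000s).
sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
prices = np.array([200.0, 290.0, 410.0, 500.0, 610.0])

# Fit a straight line, price = theta0 + theta1 * size, by least squares.
X = np.column_stack([np.ones_like(sizes), sizes])  # prepend an intercept column
theta, *_ = np.linalg.lstsq(X, prices, rcond=None)

# Use the fitted line to predict the price of a new, unseen house.
predicted = theta[0] + theta[1] * 1750.0
print(round(float(predicted), 1))
```

Because the "right answers" (the prices) are supplied with every training example, this is supervised learning.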
Unsupervised learning
In machine learning, the problem of unsupervised learning is that of trying to find hidden
structure in unlabeled data. Since the examples given to the learner are unlabeled, there is
no error or reward signal to evaluate a potential solution. This distinguishes unsupervised
learning from supervised learning and reinforcement learning.
Approaches to unsupervised learning include clustering.
For example, clustering is used in Google News (news.google.com). Every day, Google
News looks at tens of thousands or hundreds of thousands of news stories on the web and
groups them into cohesive news stories.
Similarly, with DNA microarray data, the idea is to take a group of different individuals
and, for each of them, measure how much they do or do not express a certain gene;
clustering can then group the individuals by their gene-expression patterns.
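A minimal clustering sketch, in the spirit of the Google News example: plain k-means with k = 2 on synthetic 2-D points standing in for article feature vectors. The data and the deterministic initialization are assumptions for illustration; real k-means typically initializes the centroids randomly.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic blobs of 2-D points, standing in for document feature vectors.
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([cluster_a, cluster_b])

# k-means with k = 2: alternate between assigning points to the nearest
# centroid and moving each centroid to the mean of its assigned points.
centroids = points[[0, 50]]  # one seed point per blob (normally chosen at random)
for _ in range(10):
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

print(np.sort(centroids[:, 0]).round(1))  # one centroid near x = 0, one near x = 5
```

No labels are given anywhere; the grouping emerges purely from the structure of the data.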
Neural Networks
In computer science, artificial neural networks (ANNs) are forms of computer architecture
inspired by biological neural networks (the central nervous systems of animals, in
particular the brain) and are used to estimate or approximate functions that can depend on a
large number of inputs and are generally unknown. Artificial neural networks are generally
presented as systems of interconnected "neurons" which can compute values from inputs,
and are capable of machine learning as well as pattern recognition thanks to their adaptive
nature.
Examinations of the human central nervous system inspired the concept of neural
networks. In an Artificial Neural Network, simple artificial nodes, known as "neurons",
"neurodes", "processing elements" or "units", are connected together to form a network
which mimics a biological neural network.
There is no single formal definition of what an artificial neural network is. However, a class
of statistical models may commonly be called "neural" if it possesses the following
characteristics:
1. It consists of sets of adaptive weights, i.e. numerical parameters that are tuned by a
learning algorithm, and
2. It is capable of approximating non-linear functions of its inputs.
Non-Linear Hypothesis
For many machine learning problems, the number of features n will be pretty large. For
example, consider the problem of computer vision, where every pixel of an image is a feature.
So, if we were to try to learn a nonlinear hypothesis by including all the quadratic features,
that is, all the terms of the form x_i times x_j, then with 2500 pixels we would end up with
a total of roughly three million features. That is just too large to be reasonable; the
computation would be very expensive to find and to represent all of these three million
features per training example.
So, simple logistic regression together with adding in maybe the quadratic or the cubic
features - that's just not a good way to learn complex nonlinear hypotheses when n is large
because you just end up with too many features.
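The three-million figure can be checked directly: with n features, the quadratic terms x_i * x_j with i <= j number n(n+1)/2. (The 2500 pixels would correspond to, say, a 50 x 50 grayscale patch; the patch size is an assumption here.)

```python
n = 2500  # number of pixel features

# All quadratic terms x_i * x_j with i <= j:
# n square terms (x_i * x_i) plus n*(n-1)/2 cross terms, i.e. n*(n+1)/2 total.
quadratic_terms = n * (n + 1) // 2
print(quadratic_terms)  # 3126250, roughly three million
```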
The problem can be stated thus: it is difficult to design an algorithm to do what the brain
does, especially when the number of features is large. The solution is hence to model the
brain itself.
Chapter-2
Literature Survey
Warren McCulloch and Walter Pitts (1943) created a computational model for neural
networks based on mathematics and algorithms. They called this model threshold logic.
The model paved the way for neural network research to split into two distinct approaches.
One approach focused on biological processes in the brain and the other focused on the
application of neural networks to artificial intelligence.
Frank Rosenblatt (1958) created the perceptron, an algorithm for pattern recognition
based on a two-layer learning computer network using simple addition and subtraction.
With mathematical notation, Rosenblatt also described circuitry not in the basic
perceptron, such as the exclusive-or circuit, a circuit whose mathematical computation
could not be processed until after the back propagation algorithm was created by Paul
Werbos (1975).
In the 1990s, neural networks were overtaken in popularity in machine learning by support
vector machines and other, much simpler methods such as linear classifiers. Renewed
interest in neural nets was sparked in the 2000s by the advent of deep learning.
Between 2009 and 2012, the recurrent neural networks and deep feed forward neural
networks developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab
IDSIA have won eight international competitions in pattern recognition and machine
learning.
Such neural networks also were the first artificial pattern recognizers to achieve
human-competitive or even superhuman performance on benchmarks such as traffic sign
recognition (IJCNN 2012), or the MNIST handwritten digits problem of Yann LeCun and
colleagues at NYU.
This work is in direct correspondence with the recent multi-layered neural network
architecture, and with its algorithms and applications in handwritten digit recognition.
Chapter-3
Discussion
Model Representation
In neuro-rewiring experiments, a sensory input is routed to a different area of the brain,
and that piece of brain tissue learns to process the new kind of signal. There is this sense
that if the same piece of physical brain tissue can process sight or sound or touch, then
maybe there is one learning algorithm that can process sight or sound or touch.
And instead of needing to implement a thousand different programs or a thousand different
algorithms to do the thousand wonderful things that the brain does, maybe what we need to
do is figure out some approximation to whatever the brain's learning algorithm is,
implement that, and let the system learn by itself how to process these different types of
data.
This is the logistic model of a neuron, with x1, x2 and x3 being the three features, x0 being
the bias unit equal to 1, and h_θ(x) being the sigmoid (logistic) activation function,
h_θ(x) = 1 / (1 + e^(−θᵀx)), which uses the feature vector x and the parameter vector θ.
Here, θ0, θ1, θ2, θ3 are the parameters, or the weights, assigned to x0, x1, x2, x3
respectively.
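The single neuron described above can be computed in a few lines. The weight values here are arbitrary stand-ins, chosen only to illustrate the calculation:

```python
import numpy as np

def sigmoid(z):
    # The logistic activation function g(z) = 1 / (1 + e^(-z)).
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 0.5, 0.5, 0.5])  # weights for x0 (bias), x1, x2, x3
x = np.array([1.0, 2.0, 0.0, 0.0])       # feature vector with bias unit x0 = 1

h = sigmoid(theta @ x)  # h(x) = g(theta^T x)
print(round(float(h), 3))  # 0.5, since theta^T x = 0 for these values
```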
Architecture
If the network has s_j units in layer j and s_{j+1} units in layer j+1, then Θ(j), the matrix
of weights mapping layer j to layer j+1, will be of dimension s_{j+1} × (s_j + 1).
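As a quick check of the dimension rule, suppose layer j has s_j = 3 units and layer j+1 has s_{j+1} = 4 units; the extra column accounts for the bias unit:

```python
import numpy as np

s_j, s_j_plus_1 = 3, 4
# Theta(j) maps the s_j activations plus the bias unit to the s_{j+1} units.
Theta_j = np.zeros((s_j_plus_1, s_j + 1))
print(Theta_j.shape)  # (4, 4): s_{j+1} rows, (s_j + 1) columns
```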
There are various architectures of neural networks possible:
Feed-forward neural networks
The first layer is the input and the last layer is the output.
If there is more than one hidden layer, we call them deep neural networks.
The activities of the neurons in each layer are a non-linear function of the
activities in the layer below.
Recurrent Networks
These have directed cycles in their connection graph. That means you can
sometimes get back to where you started by following the arrows.
They can have complicated dynamics, and this can make them very difficult to train.
Symmetrically connected networks
These are like recurrent networks, but the connections between units are
symmetrical (they have the same weight in both directions).
John Hopfield (and others) realized that symmetric networks are much
easier to analyze than recurrent networks.
They are also more restricted in what they can do because they obey an
energy function.
Symmetrically connected nets without hidden units are called Hopfield nets.
Algorithms
Forward propagation algorithm
The process of computing h(x) is called forward propagation where we start off with the
activations of the input-units and then we sort of forward-propagate that to the hidden layer
and compute the activations of the hidden layer and then of the output layer. A vectorized
implementation of this procedure is given below.
1. Add the bias unit to the activations of the current layer.
2. Compute the next layer's activations by applying the sigmoid activation function to
the weighted sum of the current layer's activations.
3. Repeat, computing the hidden layers from the input layer and the output layer from
the last hidden layer.
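A vectorized forward pass for a small 3-4-2 network might look like the sketch below. The layer sizes and the random weights are assumptions for illustration; trained weights would be used in practice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
Theta1 = rng.normal(size=(4, 3 + 1))  # hidden layer: 4 units, 3 inputs + bias
Theta2 = rng.normal(size=(2, 4 + 1))  # output layer: 2 units, 4 hidden units + bias

def forward_propagate(x):
    a1 = np.concatenate([[1.0], x])   # input activations, bias unit prepended
    a2 = sigmoid(Theta1 @ a1)         # hidden-layer activations
    a2 = np.concatenate([[1.0], a2])  # prepend the bias unit again
    return sigmoid(Theta2 @ a2)       # output-layer activations, h(x)

h = forward_propagate(np.array([0.5, -1.0, 2.0]))
print(h.shape)  # (2,): one sigmoid output per output unit
```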
Once the gradient computation for the unregularized case is verified to be correct, the
gradient for the regularized neural network can be implemented.
When training neural networks, it is important to randomly initialize the parameters for
symmetry breaking. One effective strategy for random initialization is to select values for
Θ(l) uniformly at random in the range [−0.12, 0.12]. This range of values ensures that the
parameters are kept small and makes the learning more efficient.
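The initialization strategy can be sketched as follows; the layer sizes are arbitrary examples:

```python
import numpy as np

EPSILON_INIT = 0.12  # the range bound suggested above

def random_initialize(l_out, l_in):
    # Weights for a layer with l_in inputs (plus one bias column) and l_out
    # outputs, drawn uniformly from [-EPSILON_INIT, EPSILON_INIT] so that
    # no two units start out identical (symmetry breaking).
    return np.random.uniform(-EPSILON_INIT, EPSILON_INIT, size=(l_out, l_in + 1))

Theta1 = random_initialize(4, 3)
print(Theta1.shape)  # (4, 4)
```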
Given a training example (x(t), y(t)), we first run a forward pass to compute all the
activations throughout the network, including the output value of the hypothesis h_θ(x).
Then, for each node j in layer l, we compute an error term δ(l)_j that measures how much
that node was responsible for any errors in the output.
For an output node, we can directly measure the difference between the network's
activation and the true target value, and use that to define δ(3)_j (since layer 3 is the
output layer). For the hidden units, δ(l)_j is computed from a weighted average of the
error terms of the nodes in layer (l + 1).
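For a three-layer network, the error terms and the resulting gradients can be sketched as below. The weights, input, and one-hot target are arbitrary values chosen only to make the shapes concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
Theta1 = rng.normal(size=(4, 4))  # hidden layer: 4 units, 3 inputs + bias
Theta2 = rng.normal(size=(2, 5))  # output layer: 2 units, 4 hidden units + bias

x = np.array([0.5, -1.0, 2.0])
y = np.array([1.0, 0.0])          # one-hot target for this training example

# Forward pass, keeping every layer's activations.
a1 = np.concatenate([[1.0], x])
z2 = Theta1 @ a1
a2 = np.concatenate([[1.0], sigmoid(z2)])
a3 = sigmoid(Theta2 @ a2)         # h(x); layer 3 is the output layer

# Backward pass: the output error is measured directly; the hidden error is
# the weighted sum of the layer-3 errors, scaled by the sigmoid derivative.
delta3 = a3 - y
delta2 = (Theta2[:, 1:].T @ delta3) * sigmoid(z2) * (1.0 - sigmoid(z2))

# Unregularized gradients of the cost with respect to each weight matrix.
Theta2_grad = np.outer(delta3, a2)
Theta1_grad = np.outer(delta2, a1)
print(Theta1_grad.shape, Theta2_grad.shape)  # (4, 4) (2, 5)
```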
Other applications
1. Integration of fuzzy logic into neural networks
Fuzzy logic is a type of logic that recognizes more than simple true and false
values, and hence better simulates the real world. For example, the statement "today is
sunny" might be 100% true if there are no clouds, 80% true if there are a few clouds,
50% true if it is hazy, and 0% true if it rains all day. Hence, it takes into account
concepts like "usually", "somewhat", and "sometimes".
Fuzzy logic and neural networks have been integrated for uses as diverse as
automotive engineering, applicant screening for jobs, the control of a crane, and the
monitoring of glaucoma.
2. Pulsed neural networks
Most practical applications of artificial neural networks are based on a
computational model involving the propagation of continuous variables from one
processing unit to the next. In recent years, data from neurobiological experiments
have made it increasingly clear that biological neural networks, which
communicate through pulses, use the timing of the pulses to transmit
information and perform computation. This realization has stimulated
significant research on pulsed neural networks, including theoretical analyses and
model development, neurobiological modeling, and hardware implementation.
3. NNs might, in the future, allow:
robots that can see, feel, and predict the world around them
composition of music
trends found in the human genome to aid in the understanding of the data
compiled by the Human Genome Project
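The graded truth values in the fuzzy-logic example above can be written as a fuzzy membership function. The particular linear mapping from cloud cover to truth degree is a hypothetical choice for illustration:

```python
def sunny_truth(cloud_cover):
    # Degree in [0.0, 1.0] to which "today is sunny" is true, given the
    # fraction of sky covered by clouds; a hypothetical linear membership.
    return max(0.0, min(1.0, 1.0 - cloud_cover))

print(sunny_truth(0.0), sunny_truth(0.2), sunny_truth(1.0))  # 1.0 0.8 0.0
```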
Conclusion
Perhaps the greatest advantage of Neural Networks is their ability to be used as an arbitrary
function approximation mechanism that 'learns' from observed data. However, using them
is not so straightforward, and a relatively good understanding of the underlying theory is
essential.
Choice of model: This will depend on the data representation and the application.
Overly complex models tend to lead to problems with learning.
Robustness: If the model, cost function and learning algorithm are selected
appropriately the resulting ANN can be extremely robust.
With the correct implementation, ANNs can be used naturally in online learning and large
data set applications. Their simple implementation and the existence of mostly local
dependencies exhibited in the structure allows for fast, parallel implementations in
hardware.
References
1. class.coursera.org/ml-007/lecture
2. cs.stanford.edu/people/eroberts/courses/soco/projects/neuralnetworks/Future/index.
html
3. ima.ac.uk/slides/nzk-02-06-2009.pdf
4. L. Neumann and J. Matas, "A method for text localization and recognition in
real-world images," in Computer Vision – ACCV 2010, ser. Lecture Notes in Computer
Science, R. Kimmel, R. Klette, and A. Sugimoto, Eds. Springer Berlin /
Heidelberg, 2011, vol. 6494, pp. 770–783.
5. papers.nips.cc/paper/293-handwritten-digit-recognition-with-a-back-propagation-network.pdf
6. Steven Bell, "Text Detection and Recognition in Natural Images," CS 231A
(Computer Vision), Stanford University, 2011.