
8/12/2014

Neural Networks and Machine Learning

Self-study Seminar Report

Aditya Agarwal
2K13/SE/007

Certificate

DEPARTMENT OF SOFTWARE
ENGINEERING

This is to certify that this Seminar report entitled Neural Networks and Machine
Learning, submitted by Aditya Agarwal (2K13/SE/007) in partial fulfillment of the
requirements for the award of the Bachelor of Technology Degree in Software Engineering
(SE) at Delhi Technological University, is an authentic work carried out by the student
under my supervision and guidance.
To the best of my knowledge, the matter embodied in this report has not been submitted to
any other university or institute for the award of any degree or diploma.

Ms. Kusum Lata


(Assistant Professor)
Dept. of Computer Engineering
Delhi Technological University
Place: DTU, Bawana Road, Delhi-110042
Date: 08/12/2014

Acknowledgement

The successful completion of any task would be incomplete without acknowledging the
people who made it possible and whose constant guidance and encouragement ensured my
success.
First of all, I am grateful to the Almighty for enabling me to complete this self-study
assignment. I owe a debt to our faculty, Ms. Kusum Lata (Assistant Professor, COE
Department), for instilling in me the idea of a creative self-study project, for helping me in
undertaking this project, and for being there whenever I needed her assistance.
I also place on record my sense of gratitude to one and all who, directly or indirectly, have
lent their helping hand in this venture.
Last, but not least, I thank my parents for being with me, in every sense.

Abstract
The goal of the field of Machine Learning is to build computer systems that learn from
experience and are capable of adapting to their environments. Learning techniques and
methods developed by researchers in this field have been successfully applied to a variety
of learning tasks in a broad range of areas, including, for example, text classification, gene
discovery, financial forecasting, credit card fraud detection, collaborative filtering, design
of adaptive web agents and others.
Neural Networks are an innovation in the fields of machine learning and Artificial
Intelligence originally motivated by the goal of having machines that can mimic the brain.
A Neural Network is a representation of the brain's learning approach: the brain operates
as a massively parallel processor with rich interconnections, and Neural Networks are
accordingly also described as "parallel distributed processing" systems.
Neural Networks were widely used throughout the 1980s and 1990s, and for various
reasons their popularity diminished in the late 1990s. More recently, however, they have
had a major resurgence, partly because computers have become fast enough to run
large-scale Neural Networks and partly for other technical reasons discussed later. Today,
modern Neural Networks are the state-of-the-art technique for many applications such as
speech recognition and text detection. Digit recognition, an application of Neural
Networks, is dealt with in this project.

Table of Contents

Chapter 1: Introduction
    Machine Learning
    Supervised Learning
    Unsupervised Learning
    Neural Networks
Chapter 2: Literature Survey
Chapter 3: Discussion
    Model Representation
    Architecture
    Algorithms
    Hand-written Digit Recognition
    Other Applications
Conclusion
References

Chapter-1
Introduction
Machine Learning
Machine Learning is the field of study that gives computers the ability to learn without
being explicitly programmed.
A computer program is said to learn from experience E with respect to some task T and
some performance measure P, if its performance on T, as measured by P, improves with
experience E.
Various examples and applications exist:

Database mining: large datasets from the growth of automation and the web.
E.g., web click data, medical records, biology, engineering.

Applications that cannot be programmed by hand.
E.g., autonomous helicopter, handwriting recognition, most of Natural Language
Processing (NLP), Computer Vision.

Self-customizing programs.
E.g., Amazon and Netflix product recommendations.

Understanding human learning (brain, real AI).

Two types of learning:

Supervised learning

Unsupervised learning

Supervised learning
The term supervised learning refers to the fact that we give the algorithm a data set in
which the "right answers" are provided. Such a data set is commonly called a training data
set.
Two types of supervised learning problems:

Regression problem, where the goal is to predict a continuous-valued output. Say you
want to predict housing prices by collecting and plotting data of price versus features of a
house. The learning algorithm might put a straight line through the data and use it to
predict the price of a new house.

Classification problem (e.g., logistic regression), where the goal is to predict a
discrete-valued output. Say you want to look at medical records and try to predict whether
a breast cancer is malignant or benign. The past medical records help the algorithm
produce a discrete output: malignant or benign.

Unsupervised learning
In machine learning, the problem of unsupervised learning is that of trying to find hidden
structure in unlabeled data. Since the examples given to the learner are unlabeled, there is
no error or reward signal to evaluate a potential solution. This distinguishes unsupervised
learning from supervised learning and reinforcement learning.
Approaches to unsupervised learning include:

Clustering (e.g., k-means, mixture models, hierarchical clustering)

Hidden Markov models

Blind signal separation using feature extraction techniques for dimensionality reduction
(e.g., principal component analysis, independent component analysis, non-negative matrix
factorization, singular value decomposition)

For example, clustering is used in Google News; if you have not seen this before, you can
go to news.google.com to take a look. Every day, Google News looks at tens of thousands
or hundreds of thousands of news stories on the web and groups them into cohesive news
stories.
Similarly, with DNA microarray data, the idea is to take a group of different individuals
and, for each of them, measure how much they do or do not express a certain gene.
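
To make clustering concrete, below is a minimal k-means sketch in Python; scikit-learn is
assumed to be available, and the toy data and the choice of three clusters are invented
purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: three loose groups of points (made up for illustration).
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# Fit k-means with k = 3; no labels are given, so this is unsupervised.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)   # learned group centres
print(kmeans.labels_[:10])       # cluster assignment of the first 10 points
```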

Neural Networks
In computer science, artificial neural networks (ANNs) are forms of computer architecture
inspired by biological neural networks (the central nervous systems of animals, in
particular the brain) and are used to estimate or approximate functions that can depend on a
large number of inputs and are generally unknown. Artificial neural networks are generally
presented as systems of interconnected "neurons" which can compute values from inputs,
and are capable of machine learning as well as pattern recognition thanks to their adaptive
nature.
Examinations of the human central nervous system inspired the concept of neural
networks. In an Artificial Neural Network, simple artificial nodes, known as "neurons",
"neurodes", "processing elements" or "units", are connected together to form a network
which mimics a biological neural network.
There is no single formal definition of what an artificial neural network is. However, a class
of statistical models may commonly be called "neural" if its members possess the following
characteristics:
1. They consist of sets of adaptive weights, i.e. numerical parameters that are tuned by a
learning algorithm, and
2. They are capable of approximating non-linear functions of their inputs.

Non-Linear Hypothesis

For many machine learning problems, the number of features n will be pretty large. For
example, consider a computer vision problem where the input is a 50 x 50 pixel image,
giving n = 2500 pixel features.
If we were to try to learn a nonlinear hypothesis by including all the quadratic features,
that is, all the terms of the form x_i times x_j, then with 2500 pixels we would end up with
a total of about three million features. That is just too large to be reasonable; it would be
computationally very expensive to find and to represent all of these three million features
per training example.
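
A quick count makes this concrete (a worked calculation using the n = 2500 figure above):

$$\binom{n}{2} + n \;=\; \frac{n(n+1)}{2} \;=\; \frac{2500 \times 2501}{2} \;\approx\; 3.1 \times 10^{6} \text{ quadratic features.}$$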
So simple logistic regression, even together with added quadratic or cubic features, is not
a good way to learn complex nonlinear hypotheses when n is large, because you end up
with too many features.
The problem, then, is that it is difficult to design an algorithm to do what the brain does
when the number of features is large. The solution is hence to model the brain itself.

Chapter-2
Literature Survey
Warren McCulloch and Walter Pitts (1943) created a computational model for neural
networks based on mathematics and algorithms. They called this model threshold logic.
The model paved the way for neural network research to split into two distinct approaches.
One approach focused on biological processes in the brain and the other focused on the
application of neural networks to artificial intelligence.
Frank Rosenblatt (1958) created the perceptron, an algorithm for pattern recognition
based on a two-layer learning computer network using simple addition and subtraction.
With mathematical notation, Rosenblatt also described circuitry not in the basic
perceptron, such as the exclusive-or circuit, a circuit whose mathematical computation
could not be processed until after the back propagation algorithm was created by Paul
Werbos (1975).
In the 1990s, neural networks were overtaken in popularity in machine learning by support
vector machines and other, much simpler methods such as linear classifiers. Renewed
interest in neural nets was sparked in the 2000s by the advent of deep learning.
Between 2009 and 2012, the recurrent neural networks and deep feedforward neural
networks developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab
IDSIA won eight international competitions in pattern recognition and machine learning.
Such neural networks were also the first artificial pattern recognizers to achieve
human-competitive or even superhuman performance on benchmarks such as traffic sign
recognition (IJCNN 2012), or the MNIST handwritten digits problem of Yann LeCun and
colleagues at NYU.
This work directly corresponds to this recent multi-layered neural network architecture,
its algorithms, and its applications in handwritten digit recognition.

Chapter-3
Discussion

Model Representation

Neuro-rewiring experiments suggest that if the same piece of physical brain tissue can
process sight or sound or touch, then maybe there is one learning algorithm that can
process sight or sound or touch. Instead of needing to implement a thousand different
programs or a thousand different algorithms to do the thousand wonderful things that the
brain does, maybe what we need to do is figure out some approximation to the brain's
learning algorithm and implement that, letting the system learn by itself how to process
these different types of data.

This is the logistic model of a neuron, with x1, x2 and x3 being the three input features,
x0 being the bias unit (always equal to 1), and h_θ(x) being the sigmoid (logistic)
activation function, h_θ(x) = 1 / (1 + e^(−θᵀx)), applied to the feature vector x and the
parameter vector θ. Here θ0, θ1, θ2, θ3 are the parameters, or the weights, assigned to
x0, x1, x2, x3 respectively.
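
A minimal sketch of this single logistic neuron in Python (NumPy assumed; the feature
values and weights below are arbitrary illustration values):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, theta):
    """Single logistic neuron: h_theta(x) = g(theta^T x).

    x     -- feature vector including the bias unit x0 = 1
    theta -- parameter (weight) vector of the same length
    """
    return sigmoid(theta @ x)

x = np.array([1.0, 0.5, -1.2, 3.0])       # x0 (bias), x1, x2, x3
theta = np.array([0.1, 0.8, -0.4, 0.2])   # arbitrary example weights
print(neuron(x, theta))                   # a value in (0, 1)
```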

Above, several such neurons strung together form a neural network.


The first layer, also called the input layer, is where we input our features x1, x2, x3.
The final layer, also called the output layer, has the neuron that outputs the final value
computed by the hypothesis.
Layer two, in between, is called the hidden layer; its neurons represent the features learnt
by the neural network from the input features and the learnt parameters.

Architecture

If a network has s_j units in layer j and s_(j+1) units in layer j+1, then Θ^(j), the matrix
of weights mapping layer j to layer j+1, will be of dimension s_(j+1) x (s_j + 1).
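
As a worked instance of this formula (with illustrative layer sizes, not taken from the
text): for s_1 = 3 input units and s_2 = 4 hidden units,

$$\Theta^{(1)} \in \mathbb{R}^{\,s_2 \times (s_1 + 1)} = \mathbb{R}^{\,4 \times 4},$$

where the extra column accounts for the bias unit of layer 1.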
There are various architectures of neural networks possible:
Feed-forward neural networks

These are the commonest type of neural network in practical applications.
The first layer is the input and the last layer is the output.
If there is more than one hidden layer, we call them deep neural networks.
They compute a series of transformations that change the similarities between cases.
The activities of the neurons in each layer are a non-linear function of the activities in
the layer below.

Recurrent networks

These have directed cycles in their connection graph; that means you can sometimes get
back to where you started by following the arrows.
They can have complicated dynamics, and this can make them very difficult to train.
There is a lot of interest at present in finding efficient ways of training recurrent nets.
They are more biologically realistic.

Symmetrically connected networks

These are like recurrent networks, but the connections between units are symmetrical
(they have the same weight in both directions).
John Hopfield (and others) realized that symmetric networks are much easier to analyze
than recurrent networks.
They are also more restricted in what they can do, because they obey an energy function.
For example, they cannot model cycles.
Symmetrically connected nets without hidden units are called Hopfield nets.

Algorithms
Forward propagation algorithm
The process of computing h_θ(x) is called forward propagation: we start with the
activations of the input units, forward-propagate those to compute the activations of the
hidden layer, and then compute the activations of the output layer. A vectorised
implementation of this procedure is given below.

1. Calculation of activations.

2. Vectorisation of the input features and the activations.

3. Forward propagation step: calculating the hidden layers from the input layer, and the
output layer from the last hidden layer, using the sigmoid activation function.
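
As a sketch of the three steps above, a vectorised forward pass for a 3-layer network
might look as follows in Python (NumPy assumed; Theta1 and Theta2 stand for the weight
matrices of the two layer-to-layer mappings):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """Compute h_theta(x) for a 3-layer network (input, hidden, output).

    x      -- input feature vector (without the bias unit)
    Theta1 -- weights from input layer to hidden layer
    Theta2 -- weights from hidden layer to output layer
    """
    a1 = np.concatenate(([1.0], x))            # add bias unit to input layer
    z2 = Theta1 @ a1                           # weighted inputs to hidden layer
    a2 = np.concatenate(([1.0], sigmoid(z2)))  # hidden activations plus bias
    z3 = Theta2 @ a2                           # weighted inputs to output layer
    return sigmoid(z3)                         # output activations h_theta(x)
```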

Back propagation algorithm


The main objective is to find parameters θ that minimize the cost function J(θ), using
either gradient descent or one of the advanced optimization algorithms.
The following are the steps:
1. First convert the discrepancy between each output and its target value into an error
derivative.
2. Then compute error derivatives in each hidden layer from error derivatives in the
layer above.
3. Then use error derivatives w.r.t. activities to get error derivatives w.r.t. the
incoming weights.
4. Finally use gradient descent or any other technique to minimize the error cost
function.
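
As an illustration of step 4, a plain gradient-descent loop might look like this in Python
(cost and gradient are hypothetical placeholder functions standing for the error cost and
the derivatives computed in steps 1-3; the learning rate alpha and the iteration count are
arbitrary):

```python
def gradient_descent(theta, cost, gradient, alpha=0.1, iters=1000):
    """Minimise cost(theta) using its gradient, as in step 4 above.

    theta    -- initial parameter vector (NumPy array)
    cost     -- function returning J(theta)       (placeholder)
    gradient -- function returning dJ/dtheta      (placeholder)
    """
    for _ in range(iters):
        theta = theta - alpha * gradient(theta)  # step downhill
    return theta
```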

Handwritten Digit Recognition


We can use multi-class logistic regression to recognize handwritten digits. However,
logistic regression cannot form more complex hypotheses, as it is only a linear classifier.
Thus one can implement a neural network to recognize handwritten digits using the
MNIST database of handwritten digits. The neural network is able to represent complex
models that form non-linear hypotheses. One goal is to implement the feed-forward
propagation algorithm, using already-given weights for prediction. The next goal is to
write the back-propagation algorithm for learning the neural network parameters.
Model representation
Our neural network is shown in Figure 2. It has 3
layers: an input layer, a hidden layer and an output
layer. Our inputs are pixel values of digit images. Since
the images are of size 20x20, this gives us 400 input
layer units (excluding the extra bias unit which always
outputs +1). There are 5000 training examples in
ex3data1.mat. Each pixel is represented by a floating
point number indicating the grayscale intensity at that
location. The 20 by 20 grid of pixels is unrolled into a 400-dimensional vector. Each of
these training examples becomes a single row in our data matrix X. This gives us a 5000
by 400 matrix X where every row is a training example for a handwritten digit image.
The second part of the training set is a 5000-dimensional vector y that contains labels for
the training set. We have mapped the digit zero to the value ten, while the digits "1" to "9"
are labeled as 1 to 9 in their natural order.

Feed-forward Propagation and Prediction


Feed-forward propagation for the neural network is implemented in predict.m, which
returns the neural network's prediction.
The feed-forward computation computes h_θ(x^(i)) for every example i and returns the
associated predictions. The predict function is called using the loaded set of parameters
Theta1 and Theta2; the resulting accuracy on the training set is about 97.5%.
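
A rough Python analogue of what predict.m computes, under the conventions described
above (X is the 5000 x 400 data matrix; Theta1 and Theta2 are the provided weight
matrices; this is a sketch, not the assignment's actual Octave code):

```python
import numpy as np

def predict(Theta1, Theta2, X):
    """Return the predicted digit label (1..10, with 10 meaning zero)."""
    m = X.shape[0]
    A1 = np.hstack([np.ones((m, 1)), X])           # add bias column
    A2 = 1.0 / (1.0 + np.exp(-(A1 @ Theta1.T)))    # hidden activations
    A2 = np.hstack([np.ones((m, 1)), A2])          # add bias column
    A3 = 1.0 / (1.0 + np.exp(-(A2 @ Theta2.T)))    # output activations
    return np.argmax(A3, axis=1) + 1               # labels are 1-indexed
```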
Cost Function
The cost function for the neural network (without regularization) is

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y_k^{(i)}\log\big((h_\theta(x^{(i)}))_k\big) - \big(1 - y_k^{(i)}\big)\log\big(1 - (h_\theta(x^{(i)}))_k\big)\Big]$$

where h_θ(x^(i)) is computed by forward propagation and K = 10 is the total number of
possible labels.

The regularized cost function adds a penalty over the non-bias weights:

$$J_{reg}(\theta) = J(\theta) + \frac{\lambda}{2m}\Big[\sum_{j,k}\big(\Theta^{(1)}_{j,k}\big)^2 + \sum_{j,k}\big(\Theta^{(2)}_{j,k}\big)^2\Big]$$

where the sums exclude the weights multiplying the bias units. With lambda = 1, the cost
on the provided weights is about 0.383770.
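
A sketch of this cost computation in Python, assuming Y holds the labels one-hot encoded
as an m x K matrix and H holds the forward-propagated outputs h_θ(x^(i)) row by row:

```python
import numpy as np

def nn_cost(H, Y, Theta1, Theta2, lam, m):
    """Regularized neural-network cost J(theta).

    H -- m x K matrix of network outputs; Y -- m x K one-hot labels.
    """
    cross_entropy = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularize all weights except the bias columns (first column).
    reg = (lam / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2)
                             + np.sum(Theta2[:, 1:] ** 2))
    return cross_entropy + reg
```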


Back propagation
We implement the backpropagation algorithm to compute the gradients of the cost with
respect to the parameters of the (unregularized) neural network. After verifying that the
gradient computation for the unregularized case is correct, we implement the gradient for
the regularized neural network.
When training neural networks, it is important to randomly initialize the parameters for
symmetry breaking. One effective strategy for random initialization is to randomly select
values for theta (l) uniformly in the range -0.12 to 0.12. This range of values ensures that
the parameters are kept small and makes the learning more efficient.
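
A sketch of this initialization in Python (epsilon = 0.12 as in the text; L_in and L_out are
the sizes of the layers on either side of the weight matrix):

```python
import numpy as np

def rand_initialize_weights(L_in, L_out, epsilon=0.12):
    """Uniform random weights in [-epsilon, epsilon] to break symmetry.

    Returns an L_out x (L_in + 1) matrix; the +1 column is for the bias.
    """
    return np.random.uniform(-epsilon, epsilon, size=(L_out, L_in + 1))
```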

Given a training example (x^(t), y^(t)), we first run a forward pass to compute all the
activations throughout the network, including the output value of the hypothesis h_θ(x).
Then, for each node j in layer l, we compute an error term δ_j^(l) that measures how
much that node was responsible for any errors in our output.
For an output node, we can directly measure the difference between the network's
activation and the true target value, and use that to define δ_j^(3) (since layer 3 is the
output layer). For the hidden units, we compute δ_j^(l) based on a weighted average of
the error terms of the nodes in layer l + 1.
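
Putting the forward pass and the error terms together, a minimal backpropagation sketch
for a single training example in this 3-layer network might look as follows (same NumPy
conventions as the earlier sketches; y is assumed to be one-hot encoded):

```python
import numpy as np

def backprop_one_example(x, y, Theta1, Theta2):
    """Gradients of the cost w.r.t. Theta1 and Theta2 for one example."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Forward pass: compute all activations.
    a1 = np.concatenate(([1.0], x))
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))
    a3 = sigmoid(Theta2 @ a2)

    # Output-layer error term delta(3): activation minus target.
    delta3 = a3 - y
    # Hidden-layer error term delta(2): weighted errors from the layer
    # above, scaled by the sigmoid gradient g'(z2) = g(z2) * (1 - g(z2)).
    delta2 = (Theta2[:, 1:].T @ delta3) * sigmoid(z2) * (1 - sigmoid(z2))

    # Gradients are outer products of error terms and activations.
    grad1 = np.outer(delta2, a1)
    grad2 = np.outer(delta3, a2)
    return grad1, grad2
```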

Other applications
1. Integration of fuzzy logic into neural networks
Fuzzy logic is a type of logic that recognizes more than simple true and false values,
hence better simulating the real world. For example, the statement "today is sunny"
might be 100% true if there are no clouds, 80% true if there are a few clouds, 50% true
if it is hazy, and 0% true if it rains all day. Fuzzy logic thus takes into account concepts
like "usually", "somewhat", and "sometimes".
Fuzzy logic and neural networks have been integrated for uses as diverse as automotive
engineering, applicant screening for jobs, the control of a crane, and the monitoring of
glaucoma.
2. Pulsed neural networks
Most practical applications of artificial neural networks are based on a computational
model involving the propagation of continuous variables from one processing unit to the
next. In recent years, data from neurobiological experiments have made it increasingly
clear that biological neural networks, which communicate through pulses, use the timing
of the pulses to transmit information and perform computation. This realization has
stimulated significant research on pulsed neural networks, including theoretical analyses
and model development, neurobiological modeling, and hardware implementation.
3. NNs might, in the future, allow:

robots that can see, feel, and predict the world around them
improved stock prediction
common usage of self-driving cars
composition of music
handwritten documents to be automatically transformed into formatted word processing
documents
trends found in the human genome to aid in the understanding of the data compiled by
the Human Genome Project
self-diagnosis of medical problems using neural networks
and much more!

Conclusion
Perhaps the greatest advantage of Neural Networks is their ability to be used as an arbitrary
function approximation mechanism that 'learns' from observed data. However, using them
is not so straightforward, and a relatively good understanding of the underlying theory is
essential.

Choice of model: This will depend on the data representation and the application. Overly
complex models tend to lead to problems with learning.

Learning algorithm: There are numerous trade-offs between learning algorithms. Almost
any algorithm will work well with the correct hyperparameters for training on a particular
fixed data set. However, selecting and tuning an algorithm for training on unseen data
requires a significant amount of experimentation.

Robustness: If the model, cost function and learning algorithm are selected appropriately,
the resulting ANN can be extremely robust.

With the correct implementation, ANNs can be used naturally in online learning and large
data set applications. Their simple implementation and the existence of mostly local
dependencies exhibited in the structure allows for fast, parallel implementations in
hardware.

References

1. class.coursera.org/ml-007/lecture
2. cs.stanford.edu/people/eroberts/courses/soco/projects/neuralnetworks/Future/index.html
3. ima.ac.uk/slides/nzk-02-06-2009.pdf
4. L. Neumann and J. Matas, "A method for text localization and recognition in real-world
images," in Computer Vision - ACCV 2010, ser. Lecture Notes in Computer Science,
R. Kimmel, R. Klette, and A. Sugimoto, Eds. Springer Berlin / Heidelberg, 2011,
vol. 6494, pp. 770-783.
5. papers.nips.cc/paper/293-handwritten-digit-recognition-with-a-back-propagation-network.pdf
6. Steven Bell, "Text Detection and Recognition in Natural Images," CS 231A (Computer
Vision), Stanford University, 2011.
