In this article, I'll show you a toy example that learns the XOR logical
function. My objective is to make it as easy as possible for you
to see how the basic ideas work, and to provide a basis from
which you can experiment further. In real applications, you would
not write these programs from scratch (although we do use numpy
for the low-level number crunching); you would use libraries such
as Keras, TensorFlow, scikit-learn, etc.
For background on the concepts, see Artificial Neural Networks. When you have read this post, you
might like to visit A Neural Network in Python, Part 2: activation
functions, bias, SGD, etc.
This less-than-20-lines program learns how the exclusive-or logic function works. This function is true only if
both inputs are different. Here is the truth-table for xor:
a   b   a xor b
0   0   0
0   1   1
1   0   1
1   1   0
Main variables:
- Wh and Wz are the weight matrices, of dimension previous layer size * next layer size.
- X is the input matrix, of dimension 4 * 2: all combinations of 2 truth values.
- Y holds the corresponding target values, XOR of the 4 pairs of values in X.
- Z is the vector of learned values for XOR.
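Here is a minimal sketch of such a program, using the variable names above. The layer sizes and epoch count follow the walk-through below, but the random seed, the uniform weight initialisation and the implicit learning rate of 1 are my own choices, so treat this as one possible implementation and expect your numbers to differ:

```python
import numpy as np

def sigmoid(x):
    # squash any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

def sigmoid_(z):
    # sigmoid derivative, expressed in terms of the sigmoid's output z
    return z * (1 - z)

np.random.seed(42)                 # optional: makes runs reproducible
epochs = 60000
inputLayerSize, hiddenLayerSize, outputLayerSize = 2, 3, 1

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # all truth-value pairs
Y = np.array([[0], [1], [1], [0]])               # XOR of each pair

Wh = np.random.uniform(-1, 1, (inputLayerSize, hiddenLayerSize))
Wz = np.random.uniform(-1, 1, (hiddenLayerSize, outputLayerSize))

for _ in range(epochs):
    H = sigmoid(X @ Wh)             # forward prop: input -> hidden
    Z = sigmoid(H @ Wz)             # forward prop: hidden -> output (the guess)
    E = Y - Z                       # error against the training data
    dZ = E * sigmoid_(Z)            # delta at the output layer
    dH = (dZ @ Wz.T) * sigmoid_(H)  # delta pushed back to the hidden layer
    Wz += H.T @ dZ                  # weight updates (implicit learning rate of 1)
    Wh += X.T @ dH

print(Z)                            # learned approximations to Y
```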
Walk-through
1. We use numpy, because we'll be using matrices and vectors. There are no 'neuron' objects in the code;
rather, the neural network is encoded in the weight matrices.
2. Our hyperparameters (fancy word in AI for tuning parameters) are the number of epochs (lots) and the layer sizes. Since the input data
comprises 2 operands for the XOR operation, the input layer devotes 1 neuron per operand. The result of the
XOR operation is one truth value, so we have one output node. The hidden layer can have any number of
nodes; 3 seems sufficient, but you should experiment with this.
3. The successive values of our training data add another dimension (one row per example) to each layer's matrix, so the input matrix
X is 4 * 2, representing all possible combinations of truth-value pairs. The training data Y is the 4 values
corresponding to the result of XOR on those combinations.
4. An activation function corresponds to the biological phenomenon of a neuron 'firing', i.e. triggering a nerve
signal when the neuron's inputs combine in some appropriate way. It has to be chosen so that small changes
of input produce reasonably proportionate outputs within a small range. We'll use the very popular
sigmoid function, but note that there are others. We also need the sigmoid's derivative for backpropagation.
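As a sketch, the sigmoid and its derivative can be written as below. Note that sigmoid_ here takes the sigmoid's *output* as its argument, which is the convenient form during backprop, where the activated values are already to hand:

```python
import numpy as np

def sigmoid(x):
    # squashes any real input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def sigmoid_(z):
    # derivative of the sigmoid, expressed in terms of its output z = sigmoid(x);
    # the slope peaks at x = 0 (where z = 0.5, slope 0.25) and falls off either side
    return z * (1 - z)
```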
5. Initialise the weights. Setting them all to the same value, e.g. zero, would be a poor choice, because the
weights need to end up different from each other; random initialisation provides this 'symmetry-breaking'.
6. The learning loop runs for the chosen number of epochs:
a. Forward propagation from the input to the hidden layer: H = sigmoid(X . Wh).
b. Forward propagation from the hidden layer to the output, giving the guess: Z = sigmoid(H . Wz).
c. Now we compare the guess with the training data, i.e. Y – Z, giving E.
d. Finally, backpropagation. This comprises computing changes (deltas) which are multiplied (specifically,
via the dot product) with the values at the hidden and input layers, to provide increments for the
appropriate weights. If any neuron values are zero or very close, then they aren’t contributing much and
might as well not be there. The sigmoid derivative (greatest at zero) used in the backprop will help to
push values away from zero. The sigmoid activation function shapes the output at each layer.
E is the final error Y – Z.
dZ is a change factor dependent on this error, magnified by the slope of Z: if it's steep we need to
change more; if close to zero, not much. The slope is sigmoid_(Z).
dH is the corresponding change factor for the hidden layer, found by passing dZ back through Wz
and scaling it by the slope of H.
Finally, Wz and Wh are adjusted by applying those deltas to the inputs at their layers, because the
larger the inputs are, the more the weights need to be tweaked to absorb the effect of the next forward
prop. The input values scale the gradient that is being descended; we're moving the
weights down towards the minimum value of the cost function.
If you want to understand the code at more than a hand-wavy level, study a mathematical derivation
of the backpropagation algorithm, such as this one or this one, so you appreciate the delta rule,
which is used to update the weights. Essentially, it's the chain rule for partial derivatives doing the
backprop grunt work. Even if you don't fully grok the math derivation, at least check out the 4
equations of backprop, e.g. as listed here (click on the Backpropagation button near the bottom)
and here, because those are where the code ultimately derives from.
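For convenience, the four backprop equations (written here in the notation of Nielsen's book, linked in Further Reading) are roughly:

```latex
\delta^{L} = \nabla_{a} C \odot \sigma'(z^{L})                                  % error at the output layer
\delta^{l} = \left( (w^{l+1})^{\mathsf T} \delta^{l+1} \right) \odot \sigma'(z^{l})  % error propagated back one layer
\frac{\partial C}{\partial b^{l}_{j}} = \delta^{l}_{j}                          % gradient w.r.t. the biases
\frac{\partial C}{\partial w^{l}_{jk}} = a^{l-1}_{k}\, \delta^{l}_{j}           % gradient w.r.t. the weights
```

The first gives the error at the output, the second propagates it backwards one layer, and the last two turn those errors into gradients for the biases and weights (our toy code has no biases yet; they arrive in Part 2).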
The matrix multiplication going from the input layer to the hidden layer looks like this:
[X00 X01]                       [X00*Wh00 + X01*Wh10   X00*Wh01 + X01*Wh11   X00*Wh02 + X01*Wh12]
[X10 X11]   [Wh00 Wh01 Wh02]    [X10*Wh00 + X11*Wh10   X10*Wh01 + X11*Wh11   X10*Wh02 + X11*Wh12]
[X20 X21] * [Wh10 Wh11 Wh12] =  [X20*Wh00 + X21*Wh10   X20*Wh01 + X21*Wh11   X20*Wh02 + X21*Wh12]
[X30 X31]                       [X30*Wh00 + X31*Wh10   X30*Wh01 + X31*Wh11   X30*Wh02 + X31*Wh12]
The X matrix holds the training data, excluding the required output values. Visualise it being rotated 90
degrees clockwise and fed one pair at a time into the input layer (X00 and X01, etc). They go across each
column of the weight matrix Wh for the hidden layer to produce the first row of the result H, then the next etc,
until all rows of the input data have gone in. H is then fed into the activation function, ready for the
corresponding step from the hidden to the output layer Z.
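In numpy the whole table above is a single matrix product. A quick sanity check of the shapes (illustrative random weights only, names as above):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # 4 x 2 input matrix
Wh = np.random.uniform(-1, 1, (2, 3))           # 2 x 3 input-to-hidden weights

H_pre = X @ Wh        # (4 x 2) @ (2 x 3) -> 4 x 3: one row per training pair
print(H_pre.shape)    # (4, 3)
```

Note that the row for the input pair (0, 0) is all zeros before activation, and the pair (0, 1) simply picks out the second row of Wh.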
The program prints the learned values Z, something like:

[[ 0.01288433]
 [ 0.99223799]
 [ 0.99223787]
 [ 0.00199393]]
You won’t get the exact same results, but the first and last numbers should be close to zero, while the 2 inner
numbers should be close to 1. You might have preferred exact 0s and 1s, but our learning process is
analogue rather than digital; you could always just insert a final test to convert ‘nearly 0’ to 0, and ‘nearly 1’ to
1!
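Such a final digitising step could be as simple as thresholding at 0.5 (Z here is one example run's output):

```python
import numpy as np

# example learned outputs from a run of the network above
Z = np.array([[0.01288433], [0.99223799], [0.99223787], [0.00199393]])

Z_digital = (Z > 0.5).astype(int)   # 'nearly 0' -> 0, 'nearly 1' -> 1
print(Z_digital.ravel())            # [0 1 1 0]
```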
Here's an improved version, inspired by SimpleXOR, mentioned in the Reddit post in Further Reading, below.
It uses no activation (equivalently, a linear activation) on the output layer and gets more accurate results, faster.
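A sketch of that variant (my own reconstruction: the only structural change from the first program is the linear output layer, and I've added an explicit learning rate, which helps keep the linear layer stable):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(42)
epochs, lr = 60000, 0.05            # explicit learning rate (my choice)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])
Wh = np.random.uniform(-1, 1, (2, 3))
Wz = np.random.uniform(-1, 1, (3, 1))

for _ in range(epochs):
    H = sigmoid(X @ Wh)             # hidden layer still uses the sigmoid
    Z = H @ Wz                      # linear (identity) output layer
    E = Y - Z
    dZ = E                          # identity's derivative is 1, so delta = error
    dH = (dZ @ Wz.T) * H * (1 - H)  # backprop through the sigmoid hidden layer
    Wz += lr * H.T @ dZ
    Wh += lr * X.T @ dH

print(Z)
```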
Output should look something like this:

[[ 6.66133815e-15]
 [ 1.00000000e+00]
 [ 1.00000000e+00]
 [ 8.88178420e-15]]
Part 2 will build on this example, introducing biases, graphical visualisation, learning a math function (sine),
etc…
Further Reading
Artificial Neural Networks, Wikipedia
A Neural Network in 11 lines of Python (Part 1)
A Neural Network in 13 lines of Python (Part 2 – Gradient Descent)
Neural Networks and Deep Learning (Michael Nielsen)
Implementing a Neural Network from Scratch in Python
Python Tutorial: Neural Networks with backpropagation for XOR using one hidden layer
Neural network with numpy
Can anyone share a simplest neural network from scratch in python? (Reddit)
Neural Networks Demystified (Youtube)
A Neural Network in Python, Part 2: activation functions, bias, SGD, etc
Copyright 2018 Python3 Codes