
A Neural Network in Python, Part 1: sigmoid function, gradient descent & backpropagation
By Monty, 31st January 2017

In this article, I’ll show you a toy example that learns the XOR logical
function. My objective is to make it as easy as possible for you to
see how the basic ideas work, and to provide a basis from
which you can experiment further. In real applications, you would
not write these programs from scratch (though we do use numpy
for the low-level number crunching); you would use libraries such
as Keras, TensorFlow, scikit-learn, etc.

What do you need to know to understand the code here? Python 3,
numpy, and some linear algebra (e.g. vectors and matrices). If
you want to proceed deeper into the topic, some calculus, e.g.
partial derivatives, would be very useful, if not essential. If you
aren’t already familiar with the basic principles of ANNs, please
read the sister article over on AILinux.net: A Brief Introduction to
Artificial Neural Networks. When you have read this post, you
might like to visit A Neural Network in Python, Part 2: activation
functions, bias, SGD, etc.

This less-than-20-lines program learns the exclusive-or (XOR) logic function, which is true only if
its two inputs differ. Here is the truth table for XOR:

a   b   a xor b
0   0   0
0   1   1
1   0   1
1   1   0

Main variables:
Wh & Wz are the weight matrices, of dimension previous layer size * next layer size.
X is the input matrix, dimension 4 * 2 = all combinations of 2 truth values.
Y is the corresponding target value of XOR of the 4 pairs of values in X.
Z is the vector of learned values for XOR.

#   XOR.py - A very simple neural network to do exclusive or.
import numpy as np

epochs = 60000                                  # Number of iterations
inputLayerSize, hiddenLayerSize, outputLayerSize = 2, 3, 1

X = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = np.array([ [0],   [1],   [1],   [0]])

def sigmoid (x): return 1/(1 + np.exp(-x))      # activation function
def sigmoid_(x): return x * (1 - x)             # derivative of sigmoid
                                                # weights on layer inputs
Wh = np.random.uniform(size=(inputLayerSize, hiddenLayerSize))
Wz = np.random.uniform(size=(hiddenLayerSize, outputLayerSize))

for i in range(epochs):
    H = sigmoid(np.dot(X, Wh))                  # hidden layer results
    Z = sigmoid(np.dot(H, Wz))                  # output layer results
    E = Y - Z                                   # how much we missed (error)
    dZ = E * sigmoid_(Z)                        # delta Z
    dH = dZ.dot(Wz.T) * sigmoid_(H)             # delta H
    Wz += H.T.dot(dZ)                           # update output layer weights
    Wh += X.T.dot(dH)                           # update hidden layer weights

print(Z)                                        # what have we learnt?

Walk-through
1. We use numpy because we’ll be working with matrices and vectors. There are no ‘neuron’ objects in the code;
instead, the neural network is encoded entirely in the weight matrices.

2. Our hyperparameters (the fancy AI word for the settings we choose by hand, as opposed to the weights the
network learns) are the number of epochs (lots) and the layer sizes. Since the input data comprises 2 operands
for the XOR operation, the input layer devotes 1 neuron per operand. The result of the XOR operation is one
truth value, so we have one output node. The hidden layer can have any number of nodes; 3 seems sufficient,
but you should experiment with this.

3. The four training cases stack up as rows, adding another dimension to each layer (or matrix): the input matrix
X is 4 * 2, representing all possible combinations of truth-value pairs, and the target Y holds the 4 values
corresponding to the result of XOR on those combinations.

4. An activation function corresponds to the biological phenomenon of a neuron ‘firing’, i.e. triggering a nerve
signal when the neuron’s inputs combine in some appropriate way. It has to be chosen so as to cause
reasonably proportionate outputs within a small range, for small changes of input. We’ll use the very popular
sigmoid function, but note that there are others. We also need the sigmoid derivative for backpropagation.
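
As a quick check you can run on its own (only numpy is assumed), the snippet below shows the sigmoid squashing its input into (0, 1) and its derivative being largest when the input is zero. Note that sigmoid_ in the listing above takes a value that has already been through the sigmoid, which is why the main program can pass H and Z straight into it.

import numpy as np

def sigmoid (x): return 1/(1 + np.exp(-x))
def sigmoid_(x): return x * (1 - x)       # derivative, given the *output* of sigmoid

x = np.array([-4.0, 0.0, 4.0])
s = sigmoid(x)
print(s)                                  # approx [0.018 0.5 0.982]: squashed into (0, 1)
print(sigmoid_(s))                        # approx [0.018 0.25 0.018]: slope peaks where the input is 0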

5. Initialise the weights. Setting them all to the same value, e.g. zero, would be a poor choice: neurons with
identical weights compute identical outputs and receive identical updates, so they could never become different
from each other. Random initialisation provides the necessary ‘symmetry-breaking’.
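
Here is a minimal sketch of that symmetry problem, using the same network as above but with every hidden weight set to the same value (the constant 0.5 for Wz is just an arbitrary choice for illustration):

import numpy as np

def sigmoid (x): return 1/(1 + np.exp(-x))
def sigmoid_(x): return x * (1 - x)

X = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = np.array([ [0],   [1],   [1],   [0]])

Wh = np.zeros((2, 3))                     # every hidden neuron starts identical
Wz = np.full((3, 1), 0.5)

H  = sigmoid(np.dot(X, Wh))               # all three hidden columns are the same
Z  = sigmoid(np.dot(H, Wz))
dZ = (Y - Z) * sigmoid_(Z)
dH = dZ.dot(Wz.T) * sigmoid_(H)
print(X.T.dot(dH))                        # three identical columns: every hidden
                                          # neuron receives exactly the same update

Because the update has identical columns, the hidden neurons stay clones of each other no matter how many epochs you run.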

6. Now for the learning process:


a. We make an initial guess by propagating the input through the hidden layer: take the dot product of the
input matrix of truth-value pairs with the (random, for now) hidden weights. Recall that a matrix-vector
multiplication proceeds along each row, multiplying each element by the corresponding element down
the vector, and then summing. The resulting matrix goes into the sigmoid function to produce H. So
H = sigmoid(X * Wh)

b. Same for the Z (output) layer, Z = sigmoid(H * Wz)

c. Now we compare the guess with the training data, i.e. Y – Z, giving E.

d. Finally, backpropagation. This comprises computing changes (deltas) which are multiplied (specifically,
via the dot product) with the values at the hidden and input layers, to provide increments for the
appropriate weights. If any neuron values are zero or very close, then they aren’t contributing much and
might as well not be there. The sigmoid derivative (greatest when a neuron’s input is zero) used in the
backprop will help to push values away from zero. The sigmoid activation function shapes the output at
each layer.
E is the final error Y – Z.

dZ is a change factor dependent on this error, magnified by the slope of Z; if it’s steep we need to
change more, if close to zero, not much. The slope is sigmoid_(Z).

dH is dZ backpropagated through the weights Wz, amplified by the slope of H.

Finally, Wz and Wh are adjusted by applying those deltas to the inputs at their layers, because the
larger the inputs are, the more the weights need to be tweaked to absorb the effect of the next forward
prop. These increments are the gradient being descended; we’re moving the weights down towards
the minimum value of the cost function.

If you want to understand the code at more than a hand-wavey level, study a mathematical derivation
of the backpropagation algorithm, such as this one or this one, so you appreciate the delta rule,
which is used to update the weights. Essentially, it’s the partial derivative chain rule doing the
backprop grunt work. Even if you don’t fully grok the math derivation, at least check out the 4
equations of backprop, e.g. as listed here (click on the Backpropagation button near the bottom)
and here, because those are where the code ultimately derives from.
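
For reference, the updates as this listing computes them can be written compactly as below. This is a restatement of the code rather than a full derivation: there are no bias terms and no learning rate in this first version, ⊙ is element-wise multiplication, and σ'(a) = a(1 − a) is applied to values that have already been through the sigmoid.

\begin{aligned}
H &= \sigma(X W_h), \qquad Z = \sigma(H W_z), \qquad E = Y - Z \\
\delta_Z &= E \odot \sigma'(Z) \\
\delta_H &= (\delta_Z W_z^{\top}) \odot \sigma'(H) \\
W_z &\leftarrow W_z + H^{\top}\delta_Z, \qquad W_h \leftarrow W_h + X^{\top}\delta_H
\end{aligned}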

The matrix multiplication going from the input layer to the hidden layer looks like this:

| X00 X01 |                        | X00*Wh00 + X01*Wh10   X00*Wh01 + X01*Wh11   X00*Wh02 + X01*Wh12 |
| X10 X11 |   | Wh00 Wh01 Wh02 |   | X10*Wh00 + X11*Wh10   X10*Wh01 + X11*Wh11   X10*Wh02 + X11*Wh12 |
| X20 X21 | * | Wh10 Wh11 Wh12 | = | X20*Wh00 + X21*Wh10   X20*Wh01 + X21*Wh11   X20*Wh02 + X21*Wh12 |
| X30 X31 |                        | X30*Wh00 + X31*Wh10   X30*Wh01 + X31*Wh11   X30*Wh02 + X31*Wh12 |

The X matrix holds the training data, excluding the required output values. Visualise it being rotated 90
degrees clockwise and fed one pair at a time into the input layer (X00 and X01, etc). Each pair goes across each
column of the weight matrix Wh for the hidden layer to produce the first row of the result, then the next, etc.,
until all rows of the input data have gone in. The result is then fed into the activation function to give H, ready
for the corresponding step from the hidden to the output layer Z.
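
If the row-by-column picture is hard to hold in your head, a quick shape check (matching the program above) may help: each of the 4 rows of the product is one training pair pushed through the 3 hidden neurons.

import numpy as np

X  = np.array([[0,0], [0,1], [1,0], [1,1]])       # 4 x 2
Wh = np.random.uniform(size=(2, 3))               # 2 x 3

print(X.shape, Wh.shape, np.dot(X, Wh).shape)     # (4, 2) (2, 3) (4, 3)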

If you run this program, you should get something like:

[[ 0.01288433]
 [ 0.99223799]
 [ 0.99223787]
 [ 0.00199393]]

You won’t get the exact same results, but the first and last numbers should be close to zero, while the 2 inner
numbers should be close to 1. You might have preferred exact 0s and 1s, but our learning process is
analogue rather than digital; you could always just insert a final test to convert ‘nearly 0’ to 0, and ‘nearly 1’ to
1!
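
If you do want hard 0s and 1s, one simple way (a sketch, not part of the original program) is to round or threshold the final output:

import numpy as np

Z = np.array([[0.01288433], [0.99223799], [0.99223787], [0.00199393]])
print(np.round(Z))                # [[0.] [1.] [1.] [0.]]
print((Z > 0.5).astype(int))      # same thing, as integers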

Here’s an improved version, inspired by SimpleXOR mentioned in the Reddit post in Further Reading, below.
It has no (or linear) activation on the output layer and gets more accurate results faster.

#   XOR.py - A very simple neural network to do exclusive or.
#   sigmoid activation for hidden layer, no (or linear) activation for output

import numpy as np

epochs = 20000                                  # Number of iterations
inputLayerSize, hiddenLayerSize, outputLayerSize = 2, 3, 1
L = .1                                          # learning rate

X = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = np.array([ [0],   [1],   [1],   [0]])

def sigmoid (x): return 1/(1 + np.exp(-x))      # activation function
def sigmoid_(x): return x * (1 - x)             # derivative of sigmoid
                                                # weights on layer inputs
Wh = np.random.uniform(size=(inputLayerSize, hiddenLayerSize))
Wz = np.random.uniform(size=(hiddenLayerSize, outputLayerSize))

for i in range(epochs):
    H = sigmoid(np.dot(X, Wh))                  # hidden layer results
    Z = np.dot(H, Wz)                           # output layer, no activation
    E = Y - Z                                   # how much we missed (error)
    dZ = E * L                                  # delta Z
    Wz += H.T.dot(dZ)                           # update output layer weights
    dH = dZ.dot(Wz.T) * sigmoid_(H)             # delta H
    Wh += X.T.dot(dH)                           # update hidden layer weights

print(Z)                                        # what have we learnt?

Output should look something like this:

[[ 6.66133815e-15]
[ 1.00000000e+00]
[ 1.00000000e+00]
[ 8.88178420e-15]]
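
Once training finishes, the learned weights are all you need to evaluate new inputs. A sketch of a forward-pass helper, assuming it is appended to the end of the improved script above so that numpy, sigmoid, Wh and Wz are already defined:

def predict(x, Wh, Wz):
    return np.dot(sigmoid(np.dot(x, Wh)), Wz)   # forward pass only, no learning

print(predict(np.array([[1, 0]]), Wh, Wz))      # should print a value close to 1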

Part 2 will build on this example, introducing biases, graphical visualisation, learning a math function (sine),
etc…

Further Reading
Artificial Neural Networks, Wikipedia
A Neural Network in 11 lines of Python (Part 1)
A Neural Network in 13 lines of Python (Part 2 – Gradient Descent)
Neural Networks and Deep Learning (Michael Nielsen)
Implementing a Neural Network from Scratch in Python
Python Tutorial: Neural Networks with backpropagation for XOR using one hidden layer
Neural network with numpy
Can anyone share a simplest neural network from scratch in python? (Reddit)
Neural Networks Demystified (Youtube)
A Neural Network in Python, Part 2: activation functions, bias, SGD, etc



