
INTEGER SEQUENCE LEARNING USING
RECTIFIED LINEAR UNITS IN RECURRENT NEURAL NETWORKS

INNOVATIVE AND CREATIVE PROJECT

Submitted by

SHRIRAMVISHWANATH S. (13BCS022)
SARAVANAN V.A. (13BCS084)
VENKATAKRISHNAN N. (13BCS102)
NAGARAJ P. (14BCS306)

in partial fulfillment for the award of the degree

of

Bachelor of Engineering

in

Computer Science and Engineering

Dr. Mahalingam College of Engineering and Technology


Pollachi - 642003
An Autonomous Institution

Affiliated to Anna University, Chennai - 600 025

OCTOBER 2018
Dr. Mahalingam College of Engineering and Technology
Pollachi - 642003
An Autonomous Institution

Affiliated to Anna University, Chennai - 600 025

BONAFIDE CERTIFICATE

Certified that this project report, “INTEGER SEQUENCE LEARNING USING RECTIFIED LINEAR UNITS IN RECURRENT NEURAL NETWORKS”, is the bonafide work of

SHRIRAMVISHWANATH S. (13BCS022)
SARAVANAN V.A. (13BCS084)
VENKATAKRISHNAN N. (13BCS102)
NAGARAJ P. (14BCS306)

who carried out the project work under my supervision.

SUPERVISOR
Dr. G. Anupriya
Associate Professor
Dept. of Computer Science and Engineering
Dr. Mahalingam College of Engineering and Technology, Pollachi – 642003

HEAD OF THE DEPARTMENT
Dr. G. Anupriya
Dept. of Computer Science and Engineering
Dr. Mahalingam College of Engineering and Technology, Pollachi – 642003

Submitted for the Autonomous End Semester Examination Innovative and Creative
Project Viva-voce held on _______________________

INTERNAL EXAMINER EXTERNAL EXAMINER


INTEGER SEQUENCE LEARNING USING RECTIFIED LINEAR
UNITS IN RECURRENT NEURAL NETWORKS

ABSTRACT
Integer sequence prediction is very popular in aptitude tests which measure a person's quantitative and numerical reasoning abilities. Extended applications include time series prediction and DNA sequencing. Given a sequence of numbers, the subject must understand the underlying function and predict the next number in the sequence. Since this task requires logical thinking, it serves as a test bed for Artificial General Intelligence systems. Inductive reasoning systems have been attempted earlier, but they require manual programming. Since artificial neural networks can approximate almost any function, they can be used to solve this problem. Existing research has explored artificial neural networks and simple recurrent networks with regard to learning integer sequences. An attempt is made to improve the performance of recurrent networks by using rectified linear units as the activation function, which is known to produce better results in many other applications such as computer vision and speech recognition. The Online Encyclopedia of Integer Sequences holds a vast collection of number sequences and has been used as a benchmark to test the performance of our system.
ACKNOWLEDGEMENT

First and foremost, we wish to express our deep gratitude to our institution and our department for providing us a chance to fulfill our long-cherished dream of becoming Computer Science engineers.

We express our sincere thanks to our honorable Secretary, Dr. C. Ramaswamy, for providing us with the required amenities.

We sincerely thank our director, Dr. Ranga Palaniswamy, for his moral support and encouragement for our project.

We wish to express our hearty thanks to Dr. A. Rathinavelu, Principal of our college, for his constant motivation and continual encouragement regarding our project work.

We are grateful to Dr. G. Anupriya, Head of the Department, Computer Science and Engineering, for her direction delivered at all times required. We also thank her for her tireless and meticulous efforts in bringing this project to its logical conclusion.

Our hearty thanks to our guide, Dr. G. Anupriya, Associate Professor, for her constant support and guidance during the course of our project, and to all the noble hearts that gave us immense encouragement towards its completion.

We are deeply grateful to our project coordinator, Ms. A. Brunda, for her guidance, patience and support. We also thank our review panel members ________________, _____________________ and _____________________ for their continuous support and guidance.

ii
LIST OF ABBREVIATIONS (in alphabetical order)

ANN : Artificial Neural Network

FFN : Feed Forward Network

LSTM : Long Short Term Memory

OEIS : Online Encyclopedia of Integer Sequences

ReLU : Rectified Linear Unit

RNN : Recurrent Neural Network

SRN : Simple Recurrent Network

iii
TABLE OF CONTENTS

1. INTRODUCTION ......................................................................................................................1
1.1 Objective ............................................................................................................................2
1.2 Overview ............................................................................................................................2
2. LITERATURE SURVEY ..............................................................................................................3
2.1 An AI Approach to Predict Number Sequences .................................................................3
2.2 Comparing Computer Models Solving Number Series Problems ......................................4
2.3 Recurrent Neural Networks ...............................................................................................5
2.4 Existing System - Solving Number Series with Simple Recurrent Networks ......................6
2.5 Initializing Recurrent Networks of Rectified Linear Units ..................................................7
2.6 Summary ............................................................................................................................8
3. METHODOLOGY .....................................................................................................................9
3.1 Preprocessing of input data ...............................................................................................9
3.2 Autoencoder ................................................................................................................... 10
3.3 Architecture of Simple Recurrent Network .................................................................... 11
3.4 Gradient Descent Algorithm ........................................................................................... 11
3.5 Activation functions ........................................................................................................ 12
4. RESULTS............................................................................................................................... 13
4.1 Dataset ............................................................................................................................ 13
4.2 Evaluation Metric ............................................................................................................ 13
4.3 Experiments & Results .................................................................................................... 14
4.4 Summary of Results ........................................................................................................ 16
5. CONCLUSION ....................................................................................................................... 17
REFERENCES ................................................................................................................................ 18
APPENDIX A: SAMPLE CODE ............................................................................................ A.1
APPENDIX B: SCREENSHOTS ............................................................................................ B.1

iv
LIST OF FIGURES

Figure 1 Block Diagram .................................................................................................... 9


Figure 2 Structure of Autoencoder .................................................................................. 11
Figure 3 FFN vs. SRN..................................................................................................... 11
Figure 4 Illustration of Gradient descent ........................................................................ 12
Figure 5 Benchmark dataset ............................................................................................ 13
Figure 6 Result of SRN with Autoencoder on benchmark set ...................................... B.1
Figure 7 Result of SRN without Autoencoder on benchmark set ................................. B.1

v
LIST OF TABLES

Table 1 Pattern generation .............................................................................................. 10


Table 2 Accuracy on benchmark data without Autoencoder .......................................... 14
Table 3 Accuracy on benchmark data with Autoencoder ............................................... 14
Table 4 Accuracy on OEIS data without Autoencoder ................................................... 15
Table 5 Accuracy on OEIS data with Autoencoder ........................................................ 15

vi
1. INTRODUCTION

Integer sequence prediction refers to the process of predicting the next number of
an integer sequence. This is very popular in aptitude tests which are used to measure a
person’s numerical reasoning ability. Artificial General Intelligence (AGI) refers to
Artificial Intelligence (AI) systems that can successfully perform any intellectual task
that a human can. Since humans have the logical ability to deduce number sequences,
integer sequence prediction serves as a good test-bed for such AGI systems. Neural
networks have shown promising results in the quest for AGI, as they can be applied to
several different problems like speech recognition and computer vision without much
change in their architecture. If they can also be applied to this task successfully, it will
be another step towards reaching AGI.

The subject will be given a sequence of numbers, and is asked to predict the next
number in the sequence. These sequences will have some underlying function, which
has to be identified by the subject in order to successfully predict the next number. This
function may be of several types, ranging from simple arithmetic calculations such as
addition, subtraction, multiplication or division to complex recursive functions that may
require inputs given several time steps ago. Earlier research has pointed towards
inductive reasoning and artificial neural networks for solving this problem. Inductive
reasoning systems such as IGOR2[1] require manual programming which might not be
feasible for large datasets. In contrast, artificial neural networks are universal
approximators, which can, in theory, be used to approximate any function. Hidden units
in these networks use non-linearities such as hyperbolic tangent or sigmoid function.
Thus, they can model even non-linear functions. From earlier research, it can be seen
that the performance of neural networks and humans differs drastically. Some sequences
that could not be solved by any neural network have been solved by some human
subjects, and vice versa [2]. However, further research is required in this area, as only
simple architectures have been tested on this problem.

Ragni and Klein investigated the use of simple neural networks using
backpropagation of errors to predict integer sequences[3]. Wendemuth et al. used
simple recurrent networks for this problem, and reported better performance on a test

1
dataset of 20 sequences[4]. These networks use hyperbolic tangent or linear activation
functions which cause the vanishing/exploding gradient problem. Rectified linear units,
on the other hand, have become very popular recently, and have been known to improve
the performance of neural networks by eliminating the vanishing/exploding gradient
problem[5]. They are used extensively in computer vision and speech recognition.

1.1 Objective

• The purpose of this project is to improve the performance of neural networks in the task of integer sequence prediction, using rectified linear units in recurrent neural networks.

• The accuracy of various network architectures on two different datasets will be compared.

• The performance of the networks with and without an autoencoder will be compared.

1.2 Overview

Chapter 2 presents a brief overview of the existing literature on this topic. ANN
approaches to solving number series problems, architecture of Recurrent Neural
Network (RNN), and the initialization of rectified linear units in RNNs are
discussed. Chapter 3 describes the methodology used to solve the integer sequence
learning problem. This includes preprocessing the input data to a form more suitable for
training neural networks, and ways to initialize weights effectively using unsupervised
learning methods such as autoencoder. An overview of the experiments conducted and
the hyperparameters used is also described. Chapter 4 presents the results of the
experiments. The performance of the networks using various architectures and weight
initialization methods is tabulated. Chapter 5 describes the interpretation of the results,
and gives a formal conclusion to the project.

2
2. LITERATURE SURVEY

A brief summary of earlier research is outlined below.

2.1 An AI Approach to Predict Number Sequences

Number series problems are an interesting test bed for Artificial Intelligence
systems because the underlying function includes but is not limited to addition,
subtraction, multiplication and division. Artificial Neural Networks can be used to solve
such sequences and predict the next number in the sequence. The Online Encyclopedia
of Integer Sequences (OEIS), which consists of a vast collection of number sequences,
is used for benchmarking [6]. An ANN with a single hidden layer and error back-propagation
has been used. Hyperbolic tangent is used as the activation function. The numbers in
each sequence are normalized to the range of 0 to 1, for efficient network optimization.
Learning rate, number of units in the input and hidden layers and number of iterations
are systematically varied to identify the best configuration.

Around 57,000 sequences from the OEIS database, with values within +/- 1000, are
chosen as the dataset. Another dataset of 20 sequences is also used for testing. The input
data, which is in the form of a sequence of integers, is used to generate patterns based on
the number of input units. These are used as training data, and the last pattern is used as
test data. Out of the 20 sequences in one experiment, 17 could be solved, i.e., the last
number of the sequence could be predicted correctly, even if it is not used during
training. In the OEIS dataset, 26,951 out of 57,524 sequences could be solved. The best
architecture was with 4 input and 2 hidden nodes, which could solve 12,764 sequences.

Thus, it can be inferred that the architecture of the ANN has a significant role in
the prediction accuracy of a neural network. The best configuration for the ANNs is
about 2-4 input nodes and 5-6 hidden nodes. The maximum accuracy obtained is 12,764
out of 57,524, which is about 22% [7].

3
2.2 Comparing Computer Models Solving Number Series Problems

Inductive reasoning is the process of finding a general rule which fits the given
instance. Many IQ tests have number series prediction as a component. Given a
sequence of numbers, the subject must learn the underlying function and use it to
predict the next number. Since this is a measure of human intelligence, it can also be
used as a test bed for Artificial General Intelligence. Number series in intelligence tests
are usually restricted to the four basic mathematical operations and use small values for
easy calculations. Each number in the sequence may depend on one or more numbers
which occurred before. Thus, the underlying function may range from simple arithmetic
to complex recursive ones. Number series may be characterized according to features
such as necessary background knowledge, numerical values, structural complexity and
existence of a closed formula.

AI systems solving number series include SEEKWHENCE[8], which uses


general principles such as pattern recognition and analogy to solve sequences. Sanghi
and Dowe used automated theorem proving to solve number series problems[9]. Some
rule-based systems use a semi-analytical approach, where the term structure is guessed
by heuristic enumeration. IGOR2 is an inductive programming
system which learns functional programs from small sets of input/output examples.
However, for complex series, mathematical operations have to be pre-defined so that the
system can use them for rule construction.

All the above approaches are based on symbolic computation. They are not only
able to produce the next number, but also give the underlying function. However, the set
of functions which can be learned is restricted. In contrast, artificial neural networks can
approximate any arbitrary function. Patterns are generated from the sequences and used
as input for network optimization. The last pattern is used to predict the next number.
The architecture of the network, learning rate and number of iterations are
systematically varied to find the optimum hyperparameters. It is observed that out of 20
sequences, 6 could not be solved by IGOR2, while only 3 could not be solved by ANNs.
One series could not be solved by either approach. Inductive reasoning systems require
manual programming but can be applied to several sequences without modification.

4
ANNs do not require manual programming, but individual networks have to be trained
for each sequence.

2.3 Recurrent Neural Networks

Recurrent neural networks may have different architectures such as fully


interconnected, partially recurrent and Long Short Term Memory (LSTM) networks.
Partially recurrent neural networks have been used to learn strings of characters.
Although some nodes are part of a feedforward structure, other nodes provide the
sequential context and receive feedback from other nodes. Weights from the context
units (C1 and C2) are processed like those for the input units, for example, using
backpropagation. The context units receive time delayed feedback from the second layer
units. Training data consists of inputs and their desired successor outputs. The net can
be trained to predict the next letter in a string of characters and to validate a string of
characters.

Two fundamental ways can be used to add feedback into feedforward multilayer
neural networks. Elman[10] introduced feedback from the hidden layer to the context
portion of the input layer. This approach pays more attention to the sequence of input
values. Jordan recurrent neural networks [11] use feedback from the output layer to the
context nodes of the input layer and give more emphasis to the sequence of output
values. Gradient descent is a key concept in neural network optimization. Error
backpropagation in neural networks is based on this technique. While backpropagation
is relatively simple to implement, several problems can occur in its use in practical
applications, including the difficulty of the network getting trapped in some local
minima. Researchers have developed a variety of schemes by which gradient methods,
and in particular backpropagation learning, can be extended to recurrent neural
networks [12]. The backpropagation through time approach approximates a recurrent
neural network as a sequence of static networks using gradient methods. In another
approach, a master neural network is used to identify suitable dynamical slave networks
for processing the given data.
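
To make the two feedback schemes concrete, the following short NumPy sketch contrasts one time step of an Elman-style and a Jordan-style network. It is an illustration written for this report rather than code from the cited work; the function names, weight shapes and the use of a tanh activation are assumptions made only for the example.

import numpy as np

def elman_step(x_t, h_prev, W_in, W_ctx, W_out, b_h, b_y):
    # Elman recurrence: the context units hold the previous hidden state,
    # which is fed back into the hidden layer together with the new input.
    h_t = np.tanh(x_t @ W_in + h_prev @ W_ctx + b_h)
    y_t = h_t @ W_out + b_y
    return y_t, h_t

def jordan_step(x_t, y_prev, W_in, W_ctx, W_out, b_h, b_y):
    # Jordan recurrence: the context units hold the previous output instead,
    # so the emphasis is on the sequence of output values.
    h_t = np.tanh(x_t @ W_in + y_prev @ W_ctx + b_h)
    y_t = h_t @ W_out + b_y
    return y_t, h_t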

5
Thus, recurrent neural networks are suitable for working with data which has
long-term dependencies, as they can “remember” previous inputs for longer periods of
time. Backpropagation technique can also be applied to RNNs for network optimization.

2.4 Existing System - Solving Number Series with Simple Recurrent Networks

Simple Recurrent Networks (SRNs) are known to be a useful tool in cognitive


modeling of sequence learning. One advantage of SRNs over feed-forward networks is
their ability to implicitly learn the temporal characteristics of a given sequence.
Recurrent connections to a context layer provide the network with a dynamic memory.
SRNs do not need the assumption of a fixed time window of relevant information. They
are also able to learn an internal representation of the input sequence.

A standard Feed-Forward Network (FFN) is also able to develop internal


representations of the input patterns using hidden units. But the main difference
between an FFN and an SRN lies in their structure. In an SRN, besides the hidden layer, a
new layer called the context layer is introduced. This layer stores the internal state of the
hidden layer at time t. At the next time step t+1, this internal state is fed back into the
hidden layer. This simple addition has a huge effect on the processing in the network. As
the context layer stores the previous internal state and provides this information to the
hidden layer, the hidden units get a broader task. In an SRN the external input and the
previous internal state have to be mapped to the desired output. The hidden layer must
find a representation of some input pattern and, at the same time, find a reasonable
encoding for the sequential structure of these representations.

Since neural networks converge much more effectively when using normalized
inputs, the numerical values are normalized to the range [0,1] before processing. The
linear activation function is used in this approach. Instead of using random initial
weights, unsupervised pre-training with an autoencoder is used for weight initialization. This
means the network was trained to reproduce the input (numbers of the series) at the
output. Such a pre-training procedure can help to guide the parameters of the layers
towards regions in parameter space where solutions are allowed; that is, near a solution
that captures statistical structure of the input [4].

6
The network was trained for a maximum of 1000 iterations, omitting the last
element. After every 10 training cycles the network was tested on the complete series. If
it could predict the final element of the series, it was considered to have successfully
learned the rule underlying the series. For training, as for pre-training, the scaled
conjugate gradient back propagation algorithm was used.

In this experiment, 100 SRNs were trained on each of the 20 series. Thus, the
chance of starting from some unfavorable initial weights is minimized, and a measure of
the general difficulty of the task is discovered. It is seen that a simple network with one
input and one hidden unit could solve 18 of the 20 series, and with three input units, all
20 series could be solved. Thus, it can be concluded that recurrence in neural networks
is important for cognitive modeling, because recurrence is a fundamental concept in
human cognition.

2.5 Initializing Recurrent Networks of Rectified Linear Units

Recurrent Neural Networks are used in several areas such as speech recognition,
machine translation and sequence prediction tasks like language modeling. However,
training RNNs using back-propagation can be difficult because vanishing and exploding
gradients cause great difficulty in learning long-term dependencies [12]. Hessian-Free
optimization and stochastic gradient descent with momentum are some of the
approaches proposed for overcoming this difficulty. However, the most successful of
them all is the LSTM recurrent neural network, which uses stochastic gradient descent
but changes the hidden units in such a way that the backpropagated gradients are much
better behaved.

LSTM replaces logistic or tanh hidden units with “memory cells” that can store an
analog value. Each memory cell has its own input and output gates that control when
inputs are allowed to add to the stored analog value and when this value is allowed to
influence the output. These gates are logistic units with their own learned weights on
connections coming from the input and also the memory cells at the previous time-step.
There is also a forget gate with learned weights that controls the rate at which the analog
value stored in the memory cell decays. For periods when the input and output gates are
off and the forget gate is not causing decay, a memory cell simply holds its value over

7
time so the gradient of the error with respect to its stored value stays constant when
backpropagated over those periods.

LSTMs have seen success in several tasks such as unconstrained handwriting


recognition, handwriting generation, image captioning etc. Recent research on deep feed
forward networks has shown that rectified linear units (ReLUs) are easier to train than
the logistic or tanh units that have been used for many years [5]. At first, ReLUs might
seem inappropriate for RNNs since they do not have an upper bound and thus can
explode quite easily. But experimental results show that they are quite effective.

With the right initialization of weights, RNNs composed of rectified linear units
are relatively easy to train. Their performance on test data is comparable with LSTMs in
certain tasks like predicting the next word in a very large corpus of text. The recurrent
weight matrix is initialized to be the identity matrix and biases to be zero. Identity
initialization has the very desirable property that when the error derivatives for the
hidden units are back propagated through time they remain constant provided no extra
error-derivatives are added. This is the same behavior as LSTMs when their forget gates
are set so that there is no decay and it makes it easy to learn very long-range temporal
dependencies.
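
The following NumPy sketch illustrates this initialization scheme for a small ReLU RNN. It is not taken from the cited paper; the function names, the standard deviation of 0.01 used for the non-recurrent weights and the weight shapes are assumptions introduced only for the illustration.

import numpy as np

def init_irnn(n_in, n_hidden, n_out, seed=0):
    # Identity initialization: recurrent weights start as the identity matrix
    # and biases start at zero, so back-propagated error derivatives are
    # preserved across time steps until learning changes the weights.
    rng = np.random.RandomState(seed)
    W_xh = rng.normal(0.0, 0.01, (n_in, n_hidden))   # input-to-hidden
    W_hh = np.eye(n_hidden)                          # recurrent weights = identity
    W_hy = rng.normal(0.0, 0.01, (n_hidden, n_out))  # hidden-to-output
    b_h = np.zeros(n_hidden)
    b_y = np.zeros(n_out)
    return W_xh, W_hh, W_hy, b_h, b_y

def irnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    # Forward pass over a sequence xs (one input vector per time step)
    # using rectified linear hidden units.
    h = np.zeros(W_hh.shape[0])
    for x_t in xs:
        h = np.maximum(0.0, x_t @ W_xh + h @ W_hh + b_h)
    return h @ W_hy + b_y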

Therefore, it is proposed that rectified linear units be used as the activation


function in recurrent neural networks for the task of integer sequence prediction. The
recurrent weight matrix shall also be initialized as the identity matrix in order to
improve the network performance.

2.6 Summary

Thus, it can be inferred that neural networks can be used to predict integer
sequences. Simple recurrent networks are able to solve a considerable number of
sequences even with a very basic architecture. By using rectified linear units with
identity initialization of recurrent weight matrix, better accuracy may be achieved in
integer sequence learning.

8
3. METHODOLOGY

Each input sequence is preprocessed to generate a set of patterns as training data.


The autoencoder is then used to pre-train the initial weights of the network. The network is
then trained on the data, withholding the last pattern. The last pattern is used to predict
the next number of the series.

Figure 1 shows the overall block diagram of the process. Training data is
generated from the input sequences. Weight matrices are initialized randomly or using
autoencoder. The activation function is used to introduce non-linearity, so that the
network can learn non-linear functions also. The cost function calculates the difference
between the predicted and actual outputs. The gradient descent algorithm is used for
weight optimization. Finally, the optimized weights are used to predict the last number
of the series. The implementation was done in an Ubuntu 16.04 environment using Python
with the TensorFlow library.

Figure 1 Block Diagram

3.1 Preprocessing of input data

Each sequence in the dataset consists of several integers in order. Since neural
networks work better when the input range is fixed, the numbers were scaled to the
range (0, 1). This was done using the function f(xᵢ) = xᵢ / 10^len(n), where n is the largest
number in the sequence. The output of the network was scaled back using the inverse of

9
the same function. Patterns are generated from the sequences and used as training data
for the neural network.

Table 1 Pattern generation

Pattern   N1   N2   N3   N4   N5   N6   N7   N8
P1        V1   V2   V3   T
P2             V2   V3   V4   T
P3                  V3   V4   V5   T
P4                       V4   V5   V6   T
P5                            V5   V6   V7   ?

Table 1 shows an example of pattern generation for a network with 3 input
units, from a sequence of 8 numbers. P1 to P5 denote pattern numbers, N1 to N8 denote
position of values in the sequence, and V1 to V7 denote the actual values. T denotes
target values used for training. The last pattern was used only for predicting the last
number.
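
The scaling and pattern generation can be summarized by the following Python sketch, which mirrors the inputsplit routine listed in Appendix A; the function name make_patterns and the example sequence are introduced here only for illustration.

import math

def make_patterns(seq, n_inputs):
    # Scale the values by 10**digits (digits = ceil(log10) of the largest value)
    # so they fall roughly in (0, 1), then slide a window of n_inputs values
    # over the scaled sequence; each window's successor is its target.
    digits = 1
    m = abs(max(seq))
    if m not in (0, 1):
        digits = int(math.ceil(math.log10(m)))
    scaled = [v / 10.0 ** digits for v in seq]
    X = [scaled[i:i + n_inputs] for i in range(len(scaled) - n_inputs)]
    y = scaled[n_inputs:]
    return X, y, digits

# A sequence of 8 values with 3 input units yields the five patterns of Table 1;
# the last pattern is withheld from training and used to predict the next number.
X, y, digits = make_patterns([1, 2, 3, 5, 8, 13, 21, 34], 3)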

3.2 Autoencoder

An autoencoder, as shown in Figure 2, is an artificial neural network used for


unsupervised learning. Here, the number of units in the output layer is matched with that
in the input layer. The autoencoder consists of two parts: the encoder and the decoder.
The encoder attempts to reduce the dimensionality of the input by compressing it. The
decoder then tries to reconstruct the input from the compressed representation. In our
experiment, tied weights were used, i.e., the transpose of the encoder weight matrix was
used for decoding. These new weights were used for training the neural network.
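
A minimal NumPy sketch of this tied-weight pre-training is given below. It is a simplified stand-in for the TensorFlow implementation in Appendix A: the function name, the full-batch gradient computation and the default parameter values are assumptions for the sketch (the experiments in Section 4.3 used 1000 iterations and a learning rate of 0.3).

import numpy as np

def pretrain_tied_autoencoder(X, n_hidden, lr=0.3, iters=1000, seed=0):
    # X is a (patterns x inputs) array of scaled training patterns.  The learned
    # W and b are reused as the initial input-to-hidden weights of the network.
    rng = np.random.RandomState(seed)
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, (n_in, n_hidden))   # encoder weights
    b = np.zeros(n_hidden)                       # encoder bias
    b_out = np.zeros(n_in)                       # decoder bias
    for _ in range(iters):
        H = X @ W + b                # encode (linear units)
        Z = H @ W.T + b_out          # decode with tied (transposed) weights
        G = 2.0 * (Z - X) / X.size   # gradient of mean squared reconstruction error w.r.t. Z
        # W appears in both the encoder and the decoder, so its gradient has two terms.
        grad_W = X.T @ (G @ W) + G.T @ H
        grad_b = (G @ W).sum(axis=0)
        grad_b_out = G.sum(axis=0)
        W -= lr * grad_W
        b -= lr * grad_b
        b_out -= lr * grad_b_out
    return W, b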

10
Figure 2 Structure of Autoencoder

3.3 Architecture of Simple Recurrent Network

A simple recurrent network is very similar to a normal neural network, but it has
an extra context layer connected to the hidden layer. This context layer stores the output
of the hidden layer from one time step (t) and feeds it to the hidden layer during the next
time step (t+1). The difference between their architectures is shown in Figure 3 below.

Figure 3 FFN vs. SRN
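
One time step of such a network can be sketched in NumPy as follows; this is an illustrative example rather than the project code, and the function name, weight names and the default linear activation are assumptions made for the sketch.

import numpy as np

def srn_step(x_t, context, W_xh, W_ch, W_hy, b_h, b_y, act=lambda z: z):
    # One time step of a simple recurrent network: the context layer holds the
    # hidden activations from the previous step and is fed back into the
    # hidden layer together with the new input pattern.
    h_t = act(x_t @ W_xh + context @ W_ch + b_h)   # hidden layer
    y_t = h_t @ W_hy + b_y                         # output layer
    return y_t, h_t                                # h_t is the next context

Passing np.tanh or the rectified linear function of Section 3.5 as act yields the non-linear variants.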

3.4 Gradient Descent Algorithm

The gradient descent algorithm, along with the backpropagation technique, is


used to optimize the weights of the neural network. During the forward pass, the
network uses the weights to predict the output. The “cost” or error value, i.e., the
difference between the actual and predicted output is backpropagated through the
network and the gradients are used to update the weight matrices. During each step, the
gradient descent algorithm takes a small step in the direction of steepest descent, i.e., the
negative of the gradient. This is repeated until the cost converges to a minimum (ideally the global minimum), and thus, the
network is optimized. Equations 1 and 2 show the computations to get the gradients for
each layer, from right to left. δ^(l) denotes the error values of nodes in layer l, Θ^(l) denotes
the weight matrix from layer l to layer l+1, g is the activation function, z^(l) denotes the
input values to layer l, and a^(l) is the activation at layer l.

δ^(l) = ((Θ^(l))^T δ^(l+1)) ∗ g′(z^(l))                                            (1)

g′(z^(l)) = a^(l) ∗ (1 − a^(l))                                                    (2)

11
Figure 4 shows the visualization of cost where the global minimum is at the
centre. J(w) denotes the cost for the weights w. The steps taken towards reaching the
minimum are highlighted in black.

Figure 4 Illustration of Gradient descent
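
A minimal NumPy sketch of Equations 1 and 2 is shown below. It assumes sigmoid hidden units (so that g′(z) = a ∗ (1 − a), as in Equation 2) and row-vector activations for a single training pattern; the function names and the outer-product weight update are introduced only for illustration.

import numpy as np

def backprop_deltas(delta_next, Theta_l, a_l):
    # Equations (1) and (2): propagate the error terms of layer l+1 back to
    # layer l.  Theta_l has shape (units in l, units in l+1); a_l and
    # delta_next are 1-D vectors for a single training pattern.
    g_prime = a_l * (1.0 - a_l)                    # Equation (2), sigmoid units
    return (delta_next @ Theta_l.T) * g_prime      # Equation (1)

def gradient_step(Theta_l, a_l, delta_next, lr):
    # Gradient descent update for the weights between layers l and l+1.
    return Theta_l - lr * np.outer(a_l, delta_next)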

3.5 Activation functions

The activation function is used to introduce non-linearity in the network. Without


non-linearity, the output can only be a linear combination of the inputs. But the dataset
may contain sequences whose underlying function cannot be approximated to a linear
combination of inputs. Thus, non-linearity is required to successfully learn such
sequences. Common activation functions include hyperbolic tangent, shown in Equation
3, and the rectified linear function, shown in Equation 4 [13]. Here, z denotes the linear
combination of weights and inputs in each unit of the network.

tanh(z) = 2 / (1 + e^(−2z)) − 1                                                   (3)

relu(z) = max(0, z)                                                               (4)
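
For reference, both functions can be written directly in NumPy; this short sketch is illustrative and not part of the project code.

import numpy as np

def tanh(z):
    # Equation (3): hyperbolic tangent, squashes z into (-1, 1).
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0

def relu(z):
    # Equation (4): rectified linear unit, zero for negative inputs.
    return np.maximum(0.0, z)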

12
4. RESULTS

4.1 Dataset

The neural network approach is tested on two different data sets. One is the
benchmark set of 20 sequences [7] presented in Figure 5 below, and the other is a set of 5000
random sequences taken from the OEIS database[6], such that the minimum length is 8,
and all the values are in the range +/- 1000.

Figure 5 Benchmark dataset

4.2 Evaluation Metric

The networks predict floating point numbers, which were rounded off to the
nearest integer. In the OEIS dataset, to get a better picture of the accuracy, the count of
sequences whose predictions differed by up to +/- 10 from the actual numbers was also recorded. For
each network architecture, the number of sequences solved is used as the evaluation
metric.
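
The bookkeeping behind this metric can be sketched as follows, mirroring the counters used in the Appendix A script; the function name and its arguments are assumptions made for the example.

def score_prediction(actual, predicted, exact, near5, near10):
    # Round the network's floating point output to the nearest integer and
    # compare it with the actual next number; also count predictions that
    # fall within +/-5 and +/-10 of the actual value.
    if round(predicted) == actual:
        exact += 1
    if abs(actual - predicted) < 5:
        near5 += 1
    if abs(actual - predicted) < 10:
        near10 += 1
    return exact, near5, near10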

13
4.3 Experiments & Results

In our experiment, the number of input units was fixed at 4, as it yielded the best
results. The number of hidden units was varied from 1 to 5 and the results were
recorded. For one part, the weights were initialized randomly with mean 0 and standard
deviation of 0.1. In another part, an autoencoder was used for weight initialization. The
autoencoder was trained for 1000 iterations with a learning rate of 0.3. The SRN was
then trained for a maximum of 1000 iterations with a learning rate of 0.9. After every 10
iterations, the network was tested on the last pattern. If it could predict the last number
correctly, it was considered to have successfully solved the sequence.
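
For reference, the settings quoted above can be collected in a single configuration dictionary; this is only an illustrative restatement of the stated hyperparameters, not code taken from the project.

# Hyperparameters quoted in the text, collected in one place for reference.
CONFIG = {
    "input_units": 4,                 # fixed, best-performing value
    "hidden_units": [1, 2, 3, 4, 5],  # varied (1 to 7 for the OEIS runs)
    "init_mean": 0.0,                 # random weight initialization
    "init_std": 0.1,
    "autoencoder_iterations": 1000,
    "autoencoder_learning_rate": 0.3,
    "srn_max_iterations": 1000,
    "srn_learning_rate": 0.9,
    "test_interval": 10,              # check the held-out pattern every 10 iterations
}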

Table 2 Accuracy on benchmark data without Autoencoder

Num_hidden 1 2 3 4 5
Trial 1 30% 35% 45% 40% 45%
Trial 2 35% 45% 45% 35% 50%
Trial 3 30% 35% 40% 50% 50%
Trial 4 35% 35% 50% 40% 40%
Trial 5 35% 35% 45% 35% 45%
Average 33% 37% 45% 40% 46%

Table 2 shows the number of sequences out of the 20-sequence benchmark set solved by an
SRN with 4 input units and a varying number of hidden units. It is seen that the network
with 5 hidden units solves 46% of sequences on average.

Table 3 Accuracy on benchmark data with Autoencoder

Num_hidden 1 2 3 4 5
Trial 1 55% 40% 55% 35% 45%
Trial 2 55% 35% 45% 55% 50%
Trial 3 35% 55% 40% 45% 40%
Trial 4 35% 45% 55% 50% 50%
Trial 5 30% 45% 50% 50% 50%
Average 42% 44% 49% 47% 47%

14
Table 3 shows the number of sequences solved by the same SRN mentioned
above, but with unsupervised pre-training using autoencoder. It is seen that the network
with 3 hidden units has the best performance, with an average of 49%.

Table 4 Accuracy on OEIS data without Autoencoder

Num_hidden Exact +/- 5 +/- 10


1 36% 43% 48%
2 38% 44% 49%
3 40% 46% 50%
4 41% 46% 51%
5 42% 46% 50%
6 42% 46% 50%
7 42% 45% 49%

Table 4 shows the performance of the SRN without autoencoder, on the OEIS
dataset of 5000 sequences, with the number of hidden units varied from 1 to 7. It is seen
that the networks with 5-7 hidden units perform the best, solving 42% of sequences.

Table 5 Accuracy on OEIS data with Autoencoder

Num_hidden Exact +/- 5 +/- 10


1 32% 39% 43%
2 29% 33% 37%
3 29% 35% 40%
4 28% 33% 37%
5 30% 34% 37%
6 30% 34% 38%
7 31% 34% 37%

Table 5 shows the number of sequences solved by the SRN with autoencoder, out of
5000 sequences from the OEIS dataset, with hidden units varied from 1 to 7. The
network with 1 hidden unit performs best, solving 32% of sequences. Increasing the
number of hidden units decreases the performance of the network.

15
4.4 Summary of Results

From the above tables, it is seen that an SRN with 3 hidden units works best for
the benchmark dataset, solving about 45% of sequences on average. The SRN with
autoencoder solves about 50% of sequences on average. However, for the OEIS dataset, with
autoencoder, 1 hidden unit works best, solving 32% of sequences, with additional units
gradually decreasing the performance of the network. It is also seen that when the
tolerance is relaxed to +/- 10, the number solved increases to 43%. Without
autoencoder, the network solves up to 42% of sequences when the hidden layer size is 6.
When the tolerance is relaxed to +/- 10, the number increases up to 51%. On the OEIS
dataset, it is seen that random weight initialization works better than using an
autoencoder.

16
5. CONCLUSION

Thus, simple recurrent networks can be used to predict integer sequences. The
presence of an autoencoder improves the performance of the network by a small margin on
the benchmark dataset. However, it does not work well for the OEIS dataset. With an
autoencoder, the SRN is able to solve about 32% of the sequences, but when the
weights are randomly initialized, 42% of the sequences can be solved.

17
REFERENCES

[1] U. Schmid and E. Kitzelmann, "Inductive rule learning on the knowledge level,"
Cognitive Systems Research, pp. 237-248, 2011.
[2] U. Schmid and M. Ragni, "Comparing Computer Models Solving Number Series
Problems," Lecture Notes in Computer Science, vol. 9205, pp. 352-361, 2015.
[3] M. Ragni and A. Klein, "Predicting Numbers: An AI Approach to Solving Number
Series," Lecture Notes in Artificial Intelligence, vol. 7006, pp. 255-259, 2011.
[4] S. Glüge and A. Wendemuth, "Solving Number Series with Simple Recurrent
Networks," Lecture Notes in Computer Science, vol. 7930, pp. 412-420, 2013.
[5] Q. V. Le, N. Jaitly and G. E. Hinton, "A Simple Way to Initialize Recurrent Networks
of Rectified Linear Units," arXiv preprint arXiv:1504.00941, 2015.
[6] "The Online Encyclopedia of Integer Sequences," [Online]. Available:
http://oeis.org. [Accessed June 2016].
[7] M. Ragni and A. Klein, "Solving number series - Architectural Properties of
Successful Artificial Neural Networks," Neural Computation Theory &
Applications, pp. 224-229, 2011.
[8] M. Meredith, "Seek-whence: a model of pattern perception," Technical report,
Indiana Univ., Bloomington (USA), 1986.
[9] P. Sanghi and D. Dowe, "A computer program capable of passing I.Q. tests," in 7th
Conf. of the Australasian Society for Cognitive Science, Sydney, Australia, 2003.
[10] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 179, 1990.
[11] M. Jordan, "Generic constraints on underspecified target trajectories," in
International Joint Conference on Neural Networks, 1989.
[12] L. R. Medsker, Recurrent Neural Networks: Design and Applications, 2001, pp. 12-15.
[13] "Activation Function," Wikipedia, [Online]. Available:
https://en.wikipedia.org/wiki/Activation_function. [Accessed August 2016].

18
APPENDIX A: SAMPLE CODE

Pattern Generation: inputsplit.py

import math
import numpy as np

def inputsplit(Xin, input_layer_size):
    # Scale the sequence values by 10**count (count = ceil(log10) of the largest
    # value) so they fall roughly in (0, 1), then build the sliding-window
    # pattern matrix X and the target vector y.
    m = abs(max(Xin))
    count = 1
    if not (m == 0 or m == 1):
        count = math.ceil(math.log10(abs(max(Xin))))
    Xin = [a / pow(10, count) for a in Xin]
    X = []
    for i in range(0, len(Xin) - input_layer_size):
        X.append(Xin[i:i + input_layer_size])
    X = np.matrix(X)
    y = np.matrix(Xin[input_layer_size:]).T
    return X, y, count

SRN with Autoencoder: srn_ae.py

# Runs with Python 2 and the TensorFlow 1.x API used at the time of the project.
import tensorflow as tf
import numpy as np
from inputsplit import inputsplit

input_layer_size = 4
hidden_layer_size = 3

def init_weights(shape):
    # Random weight initialization with mean 0 and standard deviation 0.1.
    return tf.Variable(tf.random_normal(shape, stddev=0.1))

def model(X, w_h, w_o, b1, b2, h_prev, w_hprev):
    # SRN forward pass: the previous hidden state h_prev acts as the context layer.
    a2 = tf.matmul(X, w_h) + b1 + tf.matmul(h_prev, w_hprev)
    # a2 = tf.nn.tanh(a2)
    a3 = tf.matmul(a2, w_o) + b2
    # a3 = tf.nn.tanh(a3)
    yHat = a3
    return yHat, a2, w_hprev

def autoencode(X, W, b, Wprime, bprime):
    # Tied-weight autoencoder: Wprime is the transpose of the encoder weights W.
    Y = tf.matmul(X, W) + b
    Z = tf.matmul(Y, Wprime) + bprime
    return Z

f = open('oeis_sample.csv')
correct, nseq, near5, near10 = 0, 0, 0, 0

while True:
    sess = tf.InteractiveSession()
    x = f.readline()
    if x == "":
        break
    sequence = [int(y) for y in x.rstrip('\r\n').split(',')]
    nseq += 1
    if nseq > 5000:
        break
    Xin, yout, count = inputsplit(sequence, input_layer_size)

    trX = Xin[0:-1]
    trY = yout[0:-1]
    teX = Xin[-1]

    X = tf.placeholder("float", [None, input_layer_size])
    Y = tf.placeholder("float", [None, 1])

    W1 = init_weights([input_layer_size, hidden_layer_size])
    W2 = init_weights([hidden_layer_size, 1])
    h = tf.Variable(tf.ones([1, hidden_layer_size]))
    Wh = init_weights([hidden_layer_size, hidden_layer_size])
    b1 = tf.Variable(tf.zeros(hidden_layer_size))
    b2 = tf.Variable(tf.zeros(1))
    Wprime = tf.transpose(W1)
    bprime = tf.Variable(tf.zeros(input_layer_size))

    sess.run(tf.initialize_all_variables())

    # Unsupervised pre-training of W1 and b1 with the tied-weight autoencoder.
    Z = autoencode(X, W1, b1, Wprime, bprime)
    aecost = tf.reduce_mean(tf.square(Z - X))
    aetrain_op = tf.train.GradientDescentOptimizer(0.1).minimize(aecost)
    for i in range(1000):
        sess.run(aetrain_op, feed_dict={X: trX})
    # print(sess.run(h), sess.run(Wh))

    # Supervised training of the SRN, withholding the last pattern (teX).
    py_x, h, Wh = model(X, W1, W2, b1, b2, h, Wh)
    cost = tf.reduce_mean(tf.square(py_x - Y))
    train_op = tf.train.GradientDescentOptimizer(0.9).minimize(cost)
    predict_op = py_x * pow(10, count)
    for i in xrange(0, 1001, 10):
        sess.run(train_op, feed_dict={X: trX, Y: trY})
        if float(sequence[-1]) == round(float(sess.run(predict_op, feed_dict={X: teX}))):
            correct += 1
            break
    # print(correct, int(sequence[-1]), float(sess.run(predict_op, feed_dict={X: teX})))

    predicted_num = float(sess.run(predict_op, feed_dict={X: teX}))
    if abs(float(sequence[-1]) - predicted_num) < 5:
        near5 += 1
    if abs(float(sequence[-1]) - predicted_num) < 10:
        near10 += 1
    if nseq % 10 == 0:
        print nseq, correct, near5, near10
    # print(int(sequence[-1]), float(sess.run(predict_op, feed_dict={X: teX})))

    tf.reset_default_graph()
    sess.close()

A.2
APPENDIX B: SCREENSHOTS

Figure 6 Result of SRN with Autoencoder on benchmark set

Figure 7 Result of SRN without Autoencoder on benchmark set

B.1
