CS 476: Networks of Neural Computation

WK5 Dynamic Networks: Time Delayed & Recurrent Networks

Dr. Stathis Kasderidis
Dept. of Computer Science
University of Crete

Spring Semester, 2009
Contents

- Sequence Learning
- Time Delayed Networks I: Implicit Representation
- Time Delayed Networks II: Explicit Representation
- Recurrent Networks I: Elman + Jordan Networks
- Recurrent Networks II: BackPropagation Through Time
- Conclusions


Sequence Learning

- MLP & RBF networks are static networks, i.e. they learn a mapping from a single input signal to a single output response, for an arbitrarily large number of pairs.

- Dynamic networks learn a mapping from a single input signal to a sequence of response signals, for an arbitrary number of (signal, sequence) pairs.

- Typically the input signal to a dynamic network is an element of the sequence, and the network then produces as a response the rest of the sequence.

- To learn sequences we need to include some form of memory (short-term memory) in the network.
Sequence Learning II

- We can introduce memory effects in two principal ways:
  - Implicit: e.g. a time-lagged signal presented as input to a static network, or recurrent connections.
  - Explicit: e.g. the Temporal Backpropagation method.

- In the implicit form, we assume that the environment from which we collect examples of (input signal, output sequence) is stationary. In the explicit form the environment can be non-stationary, i.e. the network can track changes in the structure of the signal.
Time Delayed Networks I

- The time delayed approach includes two basic types of networks:
  - Implicit Representation of Time: we combine a memory structure in the input layer of the network with a static network model.
  - Explicit Representation of Time: we explicitly allow the network to code time, by generalising the network weights from scalars to vectors, as in TBP (Temporal Backpropagation).

- Typical forms of memory that are used are the Tapped Delay Line and the Gamma Memory family.
Time Delayed Networks I

- The Tapped Delay Line form of memory is shown below for an input signal x(n):

  [Figure: Tapped Delay Line memory for the input signal x(n)]

- The Gamma form of memory is defined by the impulse response:

  $g_p(n) = \binom{n-1}{p-1}\,\mu^p (1-\mu)^{n-p}, \qquad n \ge p$

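A minimal numpy sketch of a p-stage gamma memory, assuming the usual leaky recursion $g_k(n) = (1-\mu)\,g_k(n-1) + \mu\,g_{k-1}(n-1)$ with $g_0(n) = x(n)$; for $\mu = 1$ the structure degenerates to an ordinary tapped delay line. The function names and the check at the end are illustrative only.

```python
import numpy as np
from math import comb

def gamma_impulse_response(p, mu, n_max):
    """Impulse response g_p(n) = C(n-1, p-1) * mu^p * (1-mu)^(n-p), for n >= p."""
    g = np.zeros(n_max + 1)
    for n in range(p, n_max + 1):
        g[n] = comb(n - 1, p - 1) * mu**p * (1 - mu) ** (n - p)
    return g

def gamma_memory(x, p, mu):
    """Run a p-stage gamma memory over the signal x.

    Returns an array of shape (len(x), p+1) holding the memory taps
    [g_0(n), g_1(n), ..., g_p(n)] at every time step n."""
    taps = np.zeros((len(x), p + 1))
    state = np.zeros(p + 1)
    for n, xn in enumerate(x):
        new_state = np.empty_like(state)
        new_state[0] = xn                                     # g_0(n) = x(n)
        new_state[1:] = (1 - mu) * state[1:] + mu * state[:-1]
        state = new_state
        taps[n] = state
    return taps

# For mu = 1 the gamma memory reduces to a tapped delay line:
x = np.random.randn(50)
taps = gamma_memory(x, p=3, mu=1.0)
assert np.allclose(taps[10], [x[10], x[9], x[8], x[7]])
```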

Time Delayed Networks I

- The Gamma Memory is shown below:

  [Figure: The Gamma Memory structure]


Time Delayed Networks I

- In the implicit representation approach we combine a static network (e.g. MLP / RBF) with a memory structure (e.g. tapped delay line). An example is shown below (the NETtalk network):

  [Figure: The NETtalk network]


Time Delayed Networks I

- We present the data in a sliding window. For example, in NETtalk the middle group of input neurons presents the letter in focus. The rest of the input groups, three before & three after, present context.

- The purpose is to predict, for example, the next symbol in the sequence.

- The NETtalk model (Sejnowski & Rosenberg, 1987) has:
  - 203 input nodes
  - 80 hidden neurons
  - 26 output neurons
  - 18629 weights
  - Used the BP method for training
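A minimal sketch of a NETtalk-style sliding-window input encoding: a window of 7 symbols, each one-hot coded over a 29-symbol group (26 letters plus 3 boundary/punctuation marks), giving the 7 × 29 = 203 input nodes quoted above. The exact symbol set and the blank padding are assumptions for illustration.

```python
import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + [" ", ".", ","]  # assumed 29-symbol group
SYM2IDX = {s: i for i, s in enumerate(ALPHABET)}
WINDOW = 7                      # letter in focus plus 3 letters of context on each side
HALF = WINDOW // 2

def encode_window(text, focus):
    """One-hot encode the 7-symbol window centred on position `focus`."""
    vec = np.zeros(WINDOW * len(ALPHABET))
    for k in range(-HALF, HALF + 1):
        pos = focus + k
        symbol = text[pos] if 0 <= pos < len(text) else " "  # pad with blanks at the ends
        vec[(k + HALF) * len(ALPHABET) + SYM2IDX[symbol]] = 1.0
    return vec

x = encode_window("hello world", focus=4)  # window centred on the letter 'o'
print(x.shape)                             # (203,) -> input to the static MLP
```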
Time Delayed Networks II

- In explicit time representation, neurons have a spatio-temporal structure, i.e. each synapse arriving at a neuron is not a scalar number but a vector of weights, which is used to convolve the time-delayed signal coming from the previous neuron with the synapse.

- A schematic representation of such a neuron is given below:

  [Figure: A neuron with vector-valued synapses]


Time Delayed Networks II

- The output of the neuron in this case is given by:

  $y_j(n) = \varphi\!\left(\sum_{l=0}^{p} w_j(l)\,x(n-l) + b_j\right)$

- In the case of a whole network, for example assuming a single output node and a linear output layer, the response is given by:

  $y(n) = \sum_{j=1}^{m_1} w_j\,y_j(n) + b_0 = \sum_{j=1}^{m_1} w_j\,\varphi\!\left(\sum_{l=0}^{p} w_j(l)\,x(n-l) + b_j\right) + b_0$

- Where p is the depth of the memory and $b_0$ is the bias of the output neuron.


Time Delayed Networks II

- In the more general case, where we have multiple neurons at the output layer, we have for neuron j of any layer:

  $y_j(n) = \varphi\!\left(\sum_{i=1}^{m_0}\sum_{l=0}^{p} w_{ji}(l)\,x_i(n-l) + b_j\right)$

- The output of any synapse is given by the convolution sum:

  $s_{ji}(n) = \mathbf{w}_{ji}^{T}\mathbf{x}_i(n) = \sum_{l=0}^{p} w_{ji}(l)\,x_i(n-l)$


Time Delayed Networks II

- Where the state vector $\mathbf{x}_i(n)$ and weight vector $\mathbf{w}_{ji}$ for synapse i are defined as follows:

  $\mathbf{x}_i(n) = [x_i(n), x_i(n-1), \ldots, x_i(n-p)]^{T}$

  $\mathbf{w}_{ji} = [w_{ji}(0), w_{ji}(1), \ldots, w_{ji}(p)]^{T}$

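A minimal numpy sketch of the convolution sum and of a neuron with vector-valued synapses, following the definitions above; the tanh activation and the toy sizes are assumptions for illustration.

```python
import numpy as np

def synapse_output(w_ji, x_i):
    """Convolution sum of one synapse:
       s_ji(n) = w_ji^T x_i(n) = sum_{l=0}^{p} w_ji(l) * x_i(n-l)."""
    return np.dot(w_ji, x_i)

def neuron_output(W_j, b_j, X, phi=np.tanh):
    """Output of neuron j:  y_j(n) = phi( sum_i w_ji^T x_i(n) + b_j ).

    W_j: (m0, p+1) array, row i holds the weight vector w_ji.
    X:   (m0, p+1) array, row i holds the state vector [x_i(n), ..., x_i(n-p)]."""
    v_j = np.sum(W_j * X) + b_j          # sum of the m0 convolution sums plus bias
    return phi(v_j)

# Toy example with m0 = 2 incoming synapses and memory depth p = 3
rng = np.random.default_rng(0)
W_j = rng.normal(size=(2, 4))
X = rng.normal(size=(2, 4))
print(neuron_output(W_j, b_j=0.1, X=X))
```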

Time Delayed Networks II: Learning Law

- To train such a network we use the Temporal BackPropagation algorithm. We present the algorithm below.

- Assume that neuron j lies in the output layer and its response at time n is denoted by $y_j(n)$, while its desired response is given by $d_j(n)$.

- We can define an instantaneous value for the sum of squared errors produced by the network as follows:

  $E(n) = \frac{1}{2}\sum_j e_j^2(n)$


Time Delayed Networks II: Learning Law-1

- The error signal at the output layer is defined by:

  $e_j(n) = d_j(n) - y_j(n)$

- The idea is to minimise an overall cost function, calculated over all time:

  $E_{total} = \sum_n E(n)$

- We could proceed as usual by calculating the gradient of the cost function over the weights. This implies that we need to calculate the instantaneous gradient:

  $\frac{\partial E_{total}}{\partial \mathbf{w}_{ji}} = \sum_n \frac{\partial E(n)}{\partial \mathbf{w}_{ji}}$


Time Delayed Networks II: Learning Law-2

- However, for this approach to work we need to unfold the network in time (i.e. to convert it to an equivalent static network and then calculate the gradient). This option presents a number of disadvantages:
  - A loss of symmetry between the forward and backward pass for the calculation of the instantaneous gradient;
  - No nice recursive formula for the propagation of error terms;
  - The need for global bookkeeping to keep track of which static weights are actually the same in the equivalent network.


Time Delayed Networks II: Learning Law-3

- For these reasons we prefer to calculate the gradient of the cost function as follows:

  $\frac{\partial E_{total}}{\partial \mathbf{w}_{ji}} = \sum_n \frac{\partial E_{total}}{\partial v_j(n)}\,\frac{\partial v_j(n)}{\partial \mathbf{w}_{ji}}$

- Note that, in general:

  $\frac{\partial E_{total}}{\partial v_j(n)}\,\frac{\partial v_j(n)}{\partial \mathbf{w}_{ji}} \neq \frac{\partial E(n)}{\partial \mathbf{w}_{ji}}$

- The equality is correct only when we take the sum over all time.


Time Delayed Networks II: Learning Law-4

- To calculate the weight update we use the steepest descent method:

  $\mathbf{w}_{ji}(n+1) = \mathbf{w}_{ji}(n) - \eta\,\frac{\partial E_{total}}{\partial v_j(n)}\,\frac{\partial v_j(n)}{\partial \mathbf{w}_{ji}(n)}$

- Where $\eta$ is the learning rate.

- We calculate the terms in the above relation as follows:

  $\frac{\partial v_j(n)}{\partial \mathbf{w}_{ji}(n)} = \mathbf{x}_i(n)$

- This follows from the definition of the induced local field, $v_j(n) = \sum_i \mathbf{w}_{ji}^{T}(n)\,\mathbf{x}_i(n)$.


Time Delayed Networks II: Learning Law-5

- We define the local gradient as:

  $\delta_j(n) = -\frac{\partial E_{total}}{\partial v_j(n)}$

- Thus we can write the weight update equations in the familiar form:

  $\mathbf{w}_{ji}(n+1) = \mathbf{w}_{ji}(n) + \eta\,\delta_j(n)\,\mathbf{x}_i(n)$

- We need to calculate the $\delta_j(n)$ for the cases of output and hidden layers.


Time Delayed Networks II: Learning Law-6

- For the output layer the local gradient is given by:

  $\delta_j(n) = -\frac{\partial E_{total}}{\partial v_j(n)} = -\frac{\partial E(n)}{\partial v_j(n)} = e_j(n)\,\varphi'(v_j(n))$

- For a hidden layer we assume that neuron j is connected to a set A of neurons in the next layer (hidden or output). Then we have:

  $\delta_j(n) = -\frac{\partial E_{total}}{\partial v_j(n)} = -\sum_{r \in A}\sum_{k} \frac{\partial E_{total}}{\partial v_r(k)}\,\frac{\partial v_r(k)}{\partial v_j(n)}$


Time Delayed Networks II: Learning Law-7

- By re-writing we get the following:

  $\delta_j(n) = \sum_{r \in A}\sum_{k} \delta_r(k)\,\frac{\partial v_r(k)}{\partial v_j(n)} = \sum_{r \in A}\sum_{k} \delta_r(k)\,\frac{\partial v_r(k)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)}$

- Finally, putting it all together, we get:

  $\delta_j(n) = \varphi'(v_j(n)) \sum_{r \in A}\sum_{k=n}^{n+p} \delta_r(k)\,w_{rj}(k-n) = \varphi'(v_j(n)) \sum_{r \in A}\sum_{l=0}^{p} \delta_r(n+l)\,w_{rj}(l)$

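A minimal numpy sketch of the hidden-layer local gradient and the weight update, as reconstructed above; the array layout (one row per neuron r in A) and the tanh derivative are assumptions for illustration.

```python
import numpy as np

def hidden_delta(v_j_n, deltas_next, W_next_j, phi_prime=lambda v: 1.0 - np.tanh(v) ** 2):
    """Local gradient of hidden neuron j at time n under temporal backpropagation:

       delta_j(n) = phi'(v_j(n)) * sum_{r in A} sum_{l=0}^{p} delta_r(n+l) * w_rj(l)

    deltas_next: (|A|, p+1) array, row r holds [delta_r(n), ..., delta_r(n+p)]
    W_next_j:    (|A|, p+1) array, row r holds the filter weights w_rj(0..p)"""
    return phi_prime(v_j_n) * np.sum(deltas_next * W_next_j)

def weight_update(w_ji, eta, delta_j_n, x_i_n):
    """Familiar LMS-like form:  w_ji(n+1) = w_ji(n) + eta * delta_j(n) * x_i(n)."""
    return w_ji + eta * delta_j_n * x_i_n

# Toy numbers: 2 downstream neurons, memory depth p = 3
rng = np.random.default_rng(1)
d_j = hidden_delta(0.3, rng.normal(size=(2, 4)), rng.normal(size=(2, 4)))
w_new = weight_update(rng.normal(size=4), eta=0.05, delta_j_n=d_j, x_i_n=rng.normal(size=4))
```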

Time Delayed Networks II: Learning Law-8

- Here l = k - n indexes the taps of the synaptic filters (l = 0, ..., p) at the given layer level; the recursion is applied layer by layer, propagating the local gradients backwards as in standard backpropagation.


Recurrent I

- A network is called recurrent when there are connections which feed back to previous layers or neurons, including self-connections. An example is shown next:

  [Figure: A recurrent network with feedback connections]

- Successful early models of recurrent networks are:
  - The Jordan Network
  - The Elman Network


Recurrent I

- The Jordan Network has the structure of an MLP plus additional context units. The output neurons feed back to the context neurons in a 1-1 fashion. The context units also feed back to themselves.

- The network is trained by using the Backpropagation algorithm.

- A schematic is shown in the next figure:

  [Figure: The Jordan network]


Recurrent I

- The Elman Network also has the structure of an MLP plus additional context units. The hidden neurons feed back to the context neurons in a 1-1 fashion. The hidden neurons' connections to the context units are constant and equal to 1. It is also called a Simple Recurrent Network (SRN).

- The network is trained by using the Backpropagation algorithm.

- A schematic is shown in the next figure:

  [Figure: The Elman network (SRN)]

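A minimal numpy sketch of one forward step of an Elman SRN: the context units hold a copy of the previous hidden activations (copy connections fixed at 1) and feed the hidden layer together with the current input. The sizes, tanh hidden layer and linear output are assumptions for illustration; in a Jordan network the context would instead copy the previous output (and leak back onto itself).

```python
import numpy as np

def elman_step(u, context, W_in, W_ctx, W_out, b_h, b_o):
    """One forward step of an Elman (Simple Recurrent) network."""
    h = np.tanh(W_in @ u + W_ctx @ context + b_h)  # hidden layer sees input + context
    y = W_out @ h + b_o                            # output layer (linear here)
    return y, h                                    # the new context is the hidden state

# Hypothetical sizes: 3 inputs, 5 hidden/context units, 2 outputs
rng = np.random.default_rng(2)
W_in, W_ctx = rng.normal(size=(5, 3)), rng.normal(size=(5, 5))
W_out, b_h, b_o = rng.normal(size=(2, 5)), np.zeros(5), np.zeros(2)

context = np.zeros(5)
for u in rng.normal(size=(10, 3)):                 # a short input sequence
    y, context = elman_step(u, context, W_in, W_ctx, W_out, b_h, b_o)
```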

Recurrent II

- More complex forms of recurrent networks are possible. We can start by extending an MLP as a basic building block.

- Typical paradigms of complex recurrent models are:
  - The Nonlinear Autoregressive with Exogenous Inputs Network (NARX)
  - The State Space Model
  - The Recurrent Multilayer Perceptron (RMLP)

- Schematic representations of the networks are given in the next slides.


Recurrent II-1

- The structure of the NARX model includes:
  - A static MLP network;
  - The current input u(n) and its delayed versions up to lag q;
  - Time delayed versions of the current output y(n), which feed back to the input layer. The memory of the delayed output vector is in general p.

- The output is calculated as:

  $y(n+1) = F(y(n), \ldots, y(n-p+1), u(n), \ldots, u(n-q+1))$

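A minimal numpy sketch of one NARX prediction step, feeding the regressor [y(n), ..., y(n-p+1), u(n), ..., u(n-q+1)] into a static network; the one-hidden-layer MLP, the sizes and the random weights are stand-ins for a trained model.

```python
import numpy as np

def narx_step(y_hist, u_hist, F):
    """y(n+1) = F( y(n), ..., y(n-p+1), u(n), ..., u(n-q+1) ),
    with y_hist and u_hist ordered newest-first and F a static network."""
    return F(np.concatenate([y_hist, u_hist]))

# Hypothetical static MLP F with one tanh hidden layer
rng = np.random.default_rng(3)
p, q = 3, 2
W1, b1 = rng.normal(size=(8, p + q)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
F = lambda z: (W2 @ np.tanh(W1 @ z + b1) + b2)[0]

y_hist, u_hist = np.zeros(p), np.zeros(q)
for u_n in rng.normal(size=20):
    u_hist = np.concatenate(([u_n], u_hist[:-1]))     # shift the input delay line
    y_next = narx_step(y_hist, u_hist, F)
    y_hist = np.concatenate(([y_next], y_hist[:-1]))  # feed the new output back
```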

Recurrent II-2

- A schematic of the NARX model is as follows:

  [Figure: The NARX model]


Recurrent II-3

- The structure of the State Space model includes:
  - An MLP network with a single hidden layer;
  - The hidden neurons define the state of the network;
  - A linear output layer;
  - A feedback of the hidden layer to the input layer, assuming a memory of q lags.

- The output is determined by the coupled equations:

  $x(n+1) = f(x(n), u(n))$

  $y(n+1) = C\,x(n+1)$
Recurrent II-4

- Where f is a suitable nonlinear function characterising the hidden layer. x is the state vector, as produced by the hidden layer, and has q components. y is the output vector and has p components. The input vector u has m components.

- A schematic representation of the network is given below:

  [Figure: The state-space recurrent model]

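A minimal numpy sketch of the state-space recurrent model, realising f as a single tanh hidden layer; the dimensions (q = 4, m = 2, p = 1) and random weights are assumptions for illustration.

```python
import numpy as np

def state_space_step(x, u, W_x, W_u, b, C):
    """x(n+1) = f(x(n), u(n)) with a tanh hidden layer;  y(n+1) = C x(n+1)."""
    x_next = np.tanh(W_x @ x + W_u @ u + b)  # nonlinear hidden (state) layer
    y_next = C @ x_next                      # linear output layer
    return x_next, y_next

# Hypothetical dimensions: q = 4 state units, m = 2 inputs, p = 1 output
rng = np.random.default_rng(4)
W_x, W_u, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 2)), np.zeros(4)
C = rng.normal(size=(1, 4))

x = np.zeros(4)
for u in rng.normal(size=(15, 2)):
    x, y = state_space_step(x, u, W_x, W_u, b, C)
```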

Recurrent II-5

- The structure of the RMLP includes:
  - One or more hidden layers;
  - Feedback around each layer;
  - The general structure of a static MLP network.

- The output is calculated as follows (assuming that $x_I$, $x_{II}$ and $x_O$ are the first, second and output layer outputs):

  $x_I(n+1) = \varphi_I(x_I(n), u(n))$

  $x_{II}(n+1) = \varphi_{II}(x_{II}(n), x_I(n+1))$

  $x_O(n+1) = \varphi_O(x_O(n), x_{II}(n+1))$
Recurrent II-6

- Where the functions $\varphi_I(\cdot)$, $\varphi_{II}(\cdot)$ and $\varphi_O(\cdot)$ denote the activation functions of the corresponding layers.

- A schematic representation is given below:

  [Figure: The Recurrent Multilayer Perceptron (RMLP)]

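A minimal numpy sketch of one step of a two-hidden-layer RMLP, where each $\varphi_*$ is realised as a tanh layer acting on its own fed-back state plus the feedforward input; the weight shapes and the tanh choice are assumptions for illustration.

```python
import numpy as np

def rmlp_step(x_I, x_II, x_O, u, params, phi=np.tanh):
    """One step of an RMLP with feedback around each layer:
       x_I(n+1)  = phi_I ( x_I(n),  u(n)      )
       x_II(n+1) = phi_II( x_II(n), x_I(n+1)  )
       x_O(n+1)  = phi_O ( x_O(n),  x_II(n+1) )"""
    (W1f, W1b, b1), (W2f, W2b, b2), (W3f, W3b, b3) = params
    x_I  = phi(W1b @ x_I  + W1f @ u    + b1)   # first hidden layer with self-feedback
    x_II = phi(W2b @ x_II + W2f @ x_I  + b2)   # second hidden layer with self-feedback
    x_O  = phi(W3b @ x_O  + W3f @ x_II + b3)   # output layer with self-feedback
    return x_I, x_II, x_O

# Hypothetical sizes: 2 inputs, 4 units per hidden layer, 1 output
rng = np.random.default_rng(5)
mk = lambda *s: rng.normal(size=s)
params = [(mk(4, 2), mk(4, 4), np.zeros(4)),
          (mk(4, 4), mk(4, 4), np.zeros(4)),
          (mk(1, 4), mk(1, 1), np.zeros(1))]
x_I, x_II, x_O = np.zeros(4), np.zeros(4), np.zeros(1)
for u in rng.normal(size=(10, 2)):
    x_I, x_II, x_O = rmlp_step(x_I, x_II, x_O, u, params)
```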

Recurrent II-7

- Some theorems on the computational power of recurrent networks:

  - Thm 1: All Turing machines may be simulated by fully connected recurrent networks built on neurons with sigmoid activation functions.

  - Thm 2: NARX networks with one layer of hidden neurons with bounded, one-sided saturated (BOSS) activation functions and a linear output neuron can simulate fully connected recurrent networks with bounded, one-sided saturated activation functions, except for a linear slowdown.


Recurrent II-8

- Corollary: NARX networks with one hidden layer of neurons with BOSS activation functions and a linear output neuron are Turing equivalent.


Recurrent II-9

- The training of recurrent networks can be done with two methods:
  - BackPropagation Through Time
  - Real-Time Recurrent Learning

- We can train a recurrent network with either epoch-based or continuous training operation. However, an epoch in recurrent networks does not mean the presentation of all learning patterns, but rather denotes the length of a single sequence that we use for training. So an epoch in a recurrent network corresponds to presenting only one pattern to the network.

- At the end of an epoch the network stabilises.
Recurrent II-10

- Some useful heuristics for the training are given below:
  - Lexicographic order of training samples should be followed, with the shortest strings of symbols being presented to the network first;
  - The training should begin with a small training sample and then its size should be incrementally increased as the training proceeds;
  - The synaptic weights of the network should be updated only if the absolute error on the training sample currently being processed by the network is greater than some prescribed criterion;
Recurrent II-11

  - The use of weight decay during training is recommended; weight decay was discussed in WK3.

- The BackPropagation Through Time algorithm proceeds by unfolding a network in time. To be more specific:
  - Assume that we have a recurrent network N which is required to learn a temporal task starting from time n0 and going all the way to time n.
  - Let N* denote the feedforward network that results from unfolding the temporal operation of the recurrent network N.
Recurrent II-12

- The network N* is related to the original network N as follows:
  - For each time step in the interval (n0, n], the network N* has a layer containing K neurons, where K is the number of neurons contained in network N;
  - In every layer of network N* there is a copy of each neuron in network N;
  - For every time step $l \in [n_0, n]$, the synaptic connection from neuron i in layer l to neuron j in layer l+1 of the network N* is a copy of the synaptic connection from neuron i to neuron j in the network N.

- The following example explains the idea of unfolding:
Recurrent II-13

- We assume that we have a network with two neurons which is unfolded for a number of steps, n:

  [Figure: A two-neuron recurrent network unfolded in time]


Recurrent II-14

- We present now the method of Epochwise BackPropagation Through Time.

- Let the dataset used for training the network be partitioned into independent epochs, with each epoch representing a temporal pattern of interest. Let n0 denote the start time of an epoch and n1 denote its end time.

- We can define the following cost function:

  $E_{total}(n_0, n_1) = \frac{1}{2}\sum_{n=n_0}^{n_1}\sum_{j \in A} e_j^2(n)$


Recurrent II-15

- Where A is the set of indices j pertaining to those neurons in the network for which desired responses are specified, and $e_j(n)$ is the error signal at the output of such a neuron, measured with respect to some desired response.


Recurrent II-16

- The algorithm proceeds as follows:

  1. For a given epoch, the recurrent network starts running from some initial state until it reaches a new state, at which point the training is stopped and the network is reset to an initial state for the next epoch. The initial state doesn't have to be the same for each epoch of training. Rather, what is important is for the initial state of the new epoch to be different from the state reached by the network at the end of the previous epoch;


Recurrent II-17

  2. First a single forward pass of the data through the network for the interval (n0, n1) is performed. The complete record of input data, network state (i.e. synaptic weights), and desired responses over this interval is saved;

  3. A single backward pass over this past record is performed to compute the values of the local gradients:

     $\delta_j(n) = -\frac{\partial E_{total}(n_0, n_1)}{\partial v_j(n)}$

     for all $j \in A$ and $n_0 < n \le n_1$. This computation is performed by the formula:
Recurrent II-18

     $\delta_j(n) = \varphi'(v_j(n))\,e_j(n)$, for $n = n_1$

     $\delta_j(n) = \varphi'(v_j(n))\left[e_j(n) + \sum_{k \in A} w_{jk}\,\delta_k(n+1)\right]$, for $n_0 < n < n_1$

     Where $\varphi'(\cdot)$ is the derivative of the activation function with respect to its argument, and $v_j(n)$ is the induced local field of neuron j.

     The use of the above formula is repeated, starting from time n1 and working back, step by step, to time n0; the number of steps involved here is equal to the number of time steps contained in the epoch.
Recurrent II-19

  4. Once the computation of back-propagation has been performed back to time n0+1, the following adjustment is applied to the synaptic weight $w_{ji}$ of neuron j:

     $\Delta w_{ji} = -\eta\,\frac{\partial E_{total}}{\partial w_{ji}} = \eta \sum_{n=n_0+1}^{n_1} \delta_j(n)\,x_i(n-1)$

     Where $\eta$ is the learning rate parameter and $x_i(n-1)$ is the input applied to the i-th synapse of neuron j at time n-1.

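A minimal numpy sketch of one epoch of epochwise BPTT for a small fully recurrent network, following steps 1-4 above; the tanh dynamics, the convention W[k, j] = weight from neuron j to neuron k, and the assumption that all neurons are visible (the set A) are choices made for illustration.

```python
import numpy as np

def epochwise_bptt(W, W_in, u_seq, d_seq, eta=0.01):
    """One epoch of epochwise BPTT on a fully recurrent tanh network:
        v(n) = W y(n-1) + W_in u(n),   y(n) = tanh(v(n)),   n = 1..n1,   y(0) = 0."""
    n1, K = len(u_seq), W.shape[0]
    y, v, e = np.zeros((n1 + 1, K)), np.zeros((n1 + 1, K)), np.zeros((n1 + 1, K))

    # Step 2: single forward pass over the epoch, recording states and errors
    for n in range(1, n1 + 1):
        v[n] = W @ y[n - 1] + W_in @ u_seq[n - 1]
        y[n] = np.tanh(v[n])
        e[n] = d_seq[n - 1] - y[n]

    # Step 3: single backward pass computing the local gradients; each neuron
    # collects the future gradients flowing back through its outgoing weights
    phi_prime = lambda vv: 1.0 - np.tanh(vv) ** 2
    delta = np.zeros((n1 + 2, K))
    for n in range(n1, 0, -1):
        delta[n] = phi_prime(v[n]) * (e[n] + W.T @ delta[n + 1])

    # Step 4: accumulate the weight adjustments over the whole epoch:
    #   dw_ji = eta * sum_{n=n0+1}^{n1} delta_j(n) * x_i(n-1)
    dW = eta * sum(np.outer(delta[n], y[n - 1]) for n in range(1, n1 + 1))
    dW_in = eta * sum(np.outer(delta[n], u_seq[n - 1]) for n in range(1, n1 + 1))
    return W + dW, W_in + dW_in

# Toy epoch: K = 2 neurons, one external input, a target sequence of length 5
rng = np.random.default_rng(6)
W, W_in = 0.1 * rng.normal(size=(2, 2)), 0.1 * rng.normal(size=(2, 1))
u_seq, d_seq = rng.normal(size=(5, 1)), rng.normal(size=(5, 2))
W, W_in = epochwise_bptt(W, W_in, u_seq, d_seq)
```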

Recurrent II-20

- There is a potential problem with the method, which is called the Vanishing Gradients Problem, i.e. the corrections calculated for the weights are not large enough when using methods based on steepest descent.

- However, this is currently a research problem and one has to consult the literature for details.


Conclusions

- Dynamic networks learn sequences, in contrast to the static mappings of MLP and RBF networks.

- Time representation takes place explicitly or implicitly.

- The implicit form includes time-delayed versions of the input vector, with a static network model used afterwards, or the use of recurrent networks.

- The explicit form uses a generalisation of the MLP model where a synapse is now modelled as a weight vector and not as a single number. The synapse activation is no longer the product of the synapse's weight with the output of a previous neuron, but rather the inner product of the synaptic weight vector with the vector of time-delayed outputs of the previous neuron.
Conclusions I

- The extended MLP networks with explicit temporal structure are trained with the Temporal BackPropagation algorithm.

- The recurrent networks include a number of simple and complex architectures. In the simpler cases we train the networks using the standard BackPropagation algorithm.

- In the more complex cases we first unfold the network in time and then train it using the BackPropagation Through Time algorithm.