11/18/2004
[Figure: a fully connected recurrent network with state neurons x_1, x_2, ..., x_i, ..., x_N and external inputs u_1, ..., u_j, ..., u_N applied to the input neurons.]

The state of the network evolves according to the dynamics

    dx_i/dt = -x_i + f(\sum_j w_{ji} x_j + u_i),   i = 1..N    (7.1.1)

where w_{ji} is the weight of the connection from neuron j to neuron i, u_i is the external input and f is the activation function. At an equilibrium the states satisfy

    x_i = f(\sum_j w_{ji} x_j + u_i)    (7.1.2)
When the k-th training sample is applied to the network, the inputs are set as

    u_i = u_i^k,   i = 1..N    (7.1.3)

Therefore, we desire the network with an initial state x(0) = x^{k0} to converge to x^k(\infty) = x^k = y^k whenever u^k is applied as input to the network.    (7.1.4)

The error on the k-th sample is defined as

    e^k = (1/2) \sum_i \gamma_i (y_i^k - x_i^k)^2    (7.1.5)
Define

    \varepsilon_i^k = \gamma_i (y_i^k - x_i^k)    (7.1.8)

In equation (7.1.8) the coefficient \gamma_i is used to discriminate between the output neurons and the others, by setting its value as

    \gamma_i = 1 if neuron i is an output neuron, 0 otherwise    (7.1.9)

Therefore, the neurons that are not output neurons have no effect on the error.
The weights are adapted in the opposite direction of the error gradient,

    \Delta w_{ij} = -\eta \, \partial e^k / \partial w_{ij}    (7.1.11)

or, in continuous-time form,

    d w_{ij}/dt = -\eta \, \partial e^k / \partial w_{ij}    (7.1.12)

Here \eta is a positive constant named the learning rate, which should be chosen small.
Since \partial e^k / \partial x_i^k = -\gamma_i (y_i^k - x_i^k) = -\varepsilon_i^k, the partial derivative of e^k given in Eq. (7.1.5) with respect to w_{sr} becomes

    \partial e^k / \partial w_{sr} = \sum_i (\partial e^k / \partial x_i^k)(\partial x_i^k / \partial w_{sr})    (7.1.13)

that is,

    \partial e^k / \partial w_{sr} = -\sum_i \varepsilon_i^k \, \partial x_i^k / \partial w_{sr}    (7.1.14)

so the gradient rule (7.1.12) takes the form

    d w_{sr}/dt = \eta \sum_i \varepsilon_i^k \, \partial x_i^k / \partial w_{sr}    (7.1.15)
At the fixed point the states satisfy

    x_i^k = f(\sum_j w_{ji} x_j^k + u_i^k)    (7.1.16)

Defining the net activation

    a_i^k = \sum_j w_{ji} x_j^k + u_i^k    (7.1.17)

and differentiating the fixed-point equation with respect to w_{sr} gives

    \partial x_i^k / \partial w_{sr} = f'(a_i^k) \sum_j ( (\partial w_{ji}/\partial w_{sr}) x_j^k + w_{ji} \, \partial x_j^k / \partial w_{sr} )    (7.1.18)
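The forward relaxation and its fixed-point condition can be checked numerically. The sketch below is a minimal illustration, not part of the original text: all names are hypothetical, tanh is assumed as the activation f, and the weights are kept small so that the relaxation contracts to a stable fixed point. It integrates Eq. (7.1.1) by Euler's method and verifies Eq. (7.1.16):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
# Small random weights (an assumption made here so a stable fixed point exists)
W = 0.2 * rng.standard_normal((N, N))   # W[j, i] plays the role of w_ji
u = rng.standard_normal(N)              # external inputs u_i
f = np.tanh                             # a typical choice for the activation f

# Euler integration of dx_i/dt = -x_i + f(sum_j w_ji x_j + u_i), Eq. (7.1.1)
x = np.zeros(N)
dt = 0.1
for _ in range(2000):
    a = W.T @ x + u                     # a_i = sum_j w_ji x_j + u_i
    x = x + dt * (-x + f(a))

# At the fixed point, Eq. (7.1.16) holds: x_i = f(a_i)
print(np.max(np.abs(x - f(W.T @ x + u))))
```

The printed residual is essentially zero, confirming that the relaxed state satisfies the fixed-point equation used in the derivation that follows.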
Since \partial w_{ji}/\partial w_{sr} = \delta_{js} \delta_{ir},

    \sum_j (\partial w_{ji}/\partial w_{sr}) x_j^k = \delta_{ir} x_s^k    (7.1.20)

Hence,

    \partial x_i^k / \partial w_{sr} = f'(a_i^k) ( \delta_{ir} x_s^k + \sum_j w_{ji} \, \partial x_j^k / \partial w_{sr} )    (7.1.21)
Collecting the derivative terms on the left-hand side,

    \partial x_i^k / \partial w_{sr} - f'(a_i^k) \sum_j w_{ji} \, \partial x_j^k / \partial w_{sr} = f'(a_i^k) \delta_{ir} x_s^k    (7.1.22)

or,

    \sum_j ( \delta_{ij} - w_{ji} f'(a_i^k) ) \, \partial x_j^k / \partial w_{sr} = \delta_{ir} f'(a_i^k) x_s^k    (7.1.25)
Defining

    L_{ij}^k = \delta_{ij} - w_{ji} f'(a_i^k)    (7.1.26)

and

    R_i^k = \delta_{ir} f'(a_i^k)    (7.1.27)

equation (7.1.25) becomes

    \sum_j L_{ij}^k \, \partial x_j^k / \partial w_{sr} = R_i^k x_s^k    (7.1.28)

Solving this linear system for the derivatives,

    \partial x_i^k / \partial w_{sr} = \sum_j (L^k)^{-1}_{ij} R_j^k x_s^k    (7.1.31)

we obtain

    \partial x_i^k / \partial w_{sr} = (L^k)^{-1}_{ir} f'(a_r^k) x_s^k    (7.1.32)
Insertion of (7.1.32) in equation (7.1.14), and then in (7.1.12), results in

    d w_{sr}/dt = \eta \sum_i \varepsilon_i^k (L^k)^{-1}_{ir} f'(a_r^k) x_s^k    (7.1.33)

Defining

    \Delta_r^k = f'(a_r^k) \sum_i \varepsilon_i^k (L^k)^{-1}_{ir}    (7.1.34)

the update rule may be written compactly as

    d w_{sr}/dt = \eta \, \Delta_r^k x_s^k    (7.1.35)
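The closed-form derivative (7.1.32) can be validated against a finite-difference estimate. The sketch below is an illustrative check, not from the original text: all names are hypothetical, tanh is assumed for f, and the fixed point is found by the same Euler relaxation used for the forward dynamics:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
W = 0.2 * rng.standard_normal((N, N))   # W[j, i] plays the role of w_ji
u = rng.standard_normal(N)
f = np.tanh
fp = lambda a: 1.0 - np.tanh(a) ** 2    # f'(a) for tanh

def fixed_point(W, u, steps=4000, dt=0.05):
    """Relax dx/dt = -x + f(W^T x + u) to its fixed point."""
    x = np.zeros(len(u))
    for _ in range(steps):
        x = x + dt * (-x + f(W.T @ x + u))
    return x

x = fixed_point(W, u)
a = W.T @ x + u
# L_ij = delta_ij - w_ji f'(a_i); with W[j, i] = w_ji this is I - diag(f'(a)) W^T
L = np.eye(N) - np.diag(fp(a)) @ W.T
Linv = np.linalg.inv(L)

s, r = 1, 2
# dx_i/dw_sr = (L^{-1})_{ir} f'(a_r) x_s, Eq. (7.1.32)
analytic = Linv[:, r] * fp(a[r]) * x[s]

# Finite-difference estimate: perturb w_sr and re-relax
eps = 1e-6
W2 = W.copy(); W2[s, r] += eps
numeric = (fixed_point(W2, u) - x) / eps
print(np.max(np.abs(analytic - numeric)))
```

The two estimates agree to several decimal places, which is a useful sanity check before implementing the full learning rule.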
so that the weight change per sample is

    \Delta w_{sr} = \eta \, \Delta_r^k x_s^k,   s = 1..N, r = 1..N    (7.1.36)

The recurrent backpropagation algorithm for a recurrent neural network is summarized in the following: relax the network to its fixed point x^k, compute \Delta_r^k = f'(a_r^k) \sum_i \varepsilon_i^k (L^k)^{-1}_{ir}, update the weights by \Delta w_{sr} = \eta \, \Delta_r^k x_s^k, and repeat with new samples until the mean error is sufficiently small.
[Figure: the recurrent network and its adjoint network; in the adjoint network the direction of every connection w_{ji} is reversed.]
The computation of \Delta_r^k in (7.1.34) requires the inverse matrix (L^k)^{-1}. This inversion can be avoided by writing

    \Delta_r^k = f'(a_r^{k*}) v_r^k    (7.2.1)

by introducing a new vector variable v into the system, whose rth component is defined by the equation

    v_r^k = \sum_i (L^{k*})^{-1}_{ir} \varepsilon_i^{k*}    (7.2.2)

in which * is used in the right-hand side to denote the fixed values of the recurrent network, in order to prevent confusion with the fixed points of the adjoint network; these quantities have constant values in the derivations related to the fixed point v^k of the adjoint dynamic system.
Multiplying both sides of (7.2.2) by L^{k*} gives

    \sum_j L_{jr}^{k*} v_j^k = \varepsilon_r^{k*}    (7.2.5)
Inserting the definition of L given in (7.1.26),

    \sum_j ( \delta_{jr} - f'(a_j^{k*}) w_{rj} ) v_j^k = \varepsilon_r^{k*}    (7.2.6)

that is,

    0 = -v_r^k + \sum_j f'(a_j^{k*}) w_{rj} v_j^k + \varepsilon_r^{k*}    (7.2.7)

Such a set of equations may be regarded as the fixed-point condition of the dynamical system defined by the equation

    dv_r/dt = -v_r + \sum_j f'(a_j^{k*}) w_{rj} v_j + \varepsilon_r^{k*}    (7.2.8)
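That the relaxation of this adjoint system reproduces the matrix-inverse solution (7.2.2) can be checked numerically. The sketch below is illustrative only: the names are hypothetical, tanh-style derivative values f'(a^{k*}) are stand-ins rather than values from an actual relaxed network, and the weights are kept small so the adjoint relaxation converges:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
W = 0.2 * rng.standard_normal((N, N))            # W[j, i] plays the role of w_ji
fp_a = 1.0 - np.tanh(rng.standard_normal(N))**2  # stand-in values for f'(a^{k*})
eps_vec = rng.standard_normal(N)                 # stand-in values for epsilon^{k*}

# Relax dv_r/dt = -v_r + sum_j f'(a_j) w_rj v_j + eps_r, Eq. (7.2.8)
v = np.zeros(N)
dt = 0.05
for _ in range(4000):
    v = v + dt * (-v + W @ (fp_a * v) + eps_vec)

# The fixed point must satisfy sum_j L_jr v_j = eps_r (Eq. 7.2.5),
# i.e. L^T v = eps with L = I - diag(f'(a)) W^T (Eq. 7.1.26)
L = np.eye(N) - np.diag(fp_a) @ W.T
print(np.max(np.abs(v - np.linalg.solve(L.T, eps_vec))))
```

The relaxed v matches the direct linear solve, so the backward phase indeed replaces the explicit computation of (L^{k*})^{-1}.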
Therefore v^k, and then \Delta^k in equation (7.2.1), can be obtained by the relaxation of this adjoint dynamical system instead of computing L^{-1}. Hence, a backward phase is introduced to recurrent backpropagation, as summarized in the following:
CHAPTER VII: Learning in Recurrent Networks
7.2 Backward Phase: Recurrent BP with a Backward Phase
Step 0. Initialize weights: set them to small random values.

Step 1. Apply a sample: apply to the input a sample vector u^k having desired output vector y^k.

Step 2. Forward phase: let the network relax according to the state transition equation

    d x_i^k(t)/dt = -x_i^k + f(\sum_j w_{ji} x_j^k + u_i^k)

to a fixed point x^k.

Step 3. Compute:

    a_i^{k*} = a_i^k = \sum_j w_{ji} x_j^k + u_i^k
    f'(a_i^{k*}) = \partial f / \partial a evaluated at a = a_i^{k*}
    \varepsilon_i^{k*} = \varepsilon_i^k = \gamma_i (y_i^k - x_i^k),   i = 1..N

Step 4. Backward phase: compute

    \Delta_r^k = f'(a_r^{k*}) v_r^k

where v_r^k is the fixed-point solution of the dynamic system defined by the equation

    dv_r/dt = -v_r + \sum_j f'(a_j^{k*}) w_{rj} v_j(t) + \varepsilon_r^{k*}

and update the weights by \Delta w_{sr} = \eta \, \Delta_r^k x_s^k.

Step 5. Repeat steps 1-4 for the next sample k+1, until the mean error

    e = <e^k> = < (1/2) \sum_i \gamma_i (y_i^k - x_i^k)^2 >

is sufficiently small.
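The steps above can be sketched in code. The following is a minimal single-sample illustration, not the original implementation: all names are hypothetical, tanh is assumed for f, two output neurons are chosen arbitrarily, and both relaxations use simple Euler integration:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
outputs = np.array([0, 1])                  # assumed output-neuron indices
gamma = np.zeros(N); gamma[outputs] = 1.0   # gamma_i, Eq. (7.1.9)

f = np.tanh
fp = lambda a: 1.0 - np.tanh(a) ** 2

def relax(W, u, steps=2000, dt=0.05):
    """Step 2: forward phase, relax to the fixed point x^k."""
    x = np.zeros(N)
    for _ in range(steps):
        x = x + dt * (-x + f(W.T @ x + u))
    return x

W = 0.1 * rng.standard_normal((N, N))       # Step 0: small random weights
u = rng.standard_normal(N)                  # one training sample u^k
y_full = np.zeros(N); y_full[outputs] = [0.5, -0.3]   # desired outputs y^k

eta = 0.2
errors = []
for epoch in range(300):
    x = relax(W, u)                         # Step 2
    a = W.T @ x + u                         # Step 3
    e_vec = gamma * (y_full - x)            # epsilon_i = gamma_i (y_i - x_i)
    errors.append(0.5 * np.sum(gamma * (y_full - x) ** 2))
    v = np.zeros(N)                         # Step 4: backward phase
    for _ in range(2000):
        v = v + 0.05 * (-v + W @ (fp(a) * v) + e_vec)
    delta = fp(a) * v                       # Delta_r = f'(a_r) v_r, Eq. (7.2.1)
    W += eta * np.outer(x, delta)           # Delta w_sr = eta * Delta_r * x_s

print(errors[0], errors[-1])                # error decreases toward zero
```

Per-sample Euler relaxation in both phases is a design choice made here for transparency; in practice any ODE solver or iterated map that reaches the fixed points would do.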
Due to the difficulty of constructing a Lyapunov function for recurrent backpropagation, a local stability analysis [Almeida 87] is provided in the following. In recurrent backpropagation we have two dynamic systems, the network and its adjoint, defined by Eqs. (7.1.1) and (7.2.8). Let x* and v* be stable attractors of these systems. Now we will introduce small disturbances \Delta x and \Delta v at these stable attractors and observe the behavior of the systems.
Consider the system dynamics

    dx_i/dt = -x_i + f(\sum_j w_{ji} x_j + u_i)    (7.3.1)

with the stable attractor x* satisfying

    x_i^* = f(\sum_j w_{ji} x_j^* + u_i)    (7.3.2)
If the disturbance \Delta x is small enough, then a function g(.) at x* + \Delta x can be linearized approximately by using the first two terms of the Taylor expansion of the function around x*, which is

    g(x* + \Delta x) \approx g(x*) + \nabla g(x*)^T \Delta x    (7.3.3)

where \nabla g(x*) denotes the gradient of g evaluated at x*.    (7.3.4)

Applying this to the activation function in (7.3.1),

    f(\sum_j w_{ji}(x_j^* + \Delta x_j) + u_i) \approx f(\sum_j w_{ji} x_j^* + u_i) + f'(\sum_j w_{ji} x_j^* + u_i) \sum_j w_{ji} \Delta x_j    (7.3.5)

Writing x_i = x_i^* + \Delta x_i (7.3.6) and a_i^* = \sum_j w_{ji} x_j^* + u_i (7.3.7), and inserting equations (7.3.6) and (7.3.7) in equation (7.3.1), it becomes

    d\Delta x_i/dt = -\Delta x_i + f'(a_i^*) \sum_j w_{ji} \Delta x_j    (7.3.8)

that is,

    d\Delta x_i/dt = -\sum_j ( \delta_{ij} - f'(a_i^*) w_{ji} ) \Delta x_j = -\sum_j L_{ij}^* \Delta x_j    (7.3.10)
For the adjoint system

    dv_i/dt = -v_i + \sum_j f'(a_j^*) w_{ij} v_j + \varepsilon_i^*    (7.3.11)

the stable attractor v* satisfies

    v_i^* = \sum_j f'(a_j^*) w_{ij} v_j^* + \varepsilon_i^*    (7.3.12)

When the disturbance \Delta v is small enough, linearization of Eq. (7.3.11) around v* results in

    d\Delta v_i/dt = -\sum_j ( \delta_{ji} - f'(a_j^*) w_{ij} ) \Delta v_j    (7.3.13)

In matrix notation the two linearized systems are

    d\Delta x/dt = -L^* \Delta x    (7.3.15)
    d\Delta v/dt = -(L^*)^T \Delta v    (7.3.16)

so the stability of both is governed by the eigenvalues of L^*, since a matrix and its transpose have the same eigenvalues.
If the matrix L^* has distinct eigenvalues, then the complete solution of the system of homogeneous linear differential equations given by (7.3.15) is of the form

    \Delta x(t) = \sum_j c_j \phi_j e^{-\lambda_j t}    (7.3.17)

where \phi_j is the eigenvector corresponding to the eigenvalue \lambda_j of L^*, and c_j is a real constant determined by the initial condition.
and similarly for \Delta v(t) in terms of the eigenvectors of (L^*)^T.    (7.3.18)

If each \lambda_j has a positive real part, then the convergence of both \Delta x(t) and \Delta v(t) to the vector 0 is guaranteed. It should be noticed that if the weight matrix W is symmetric, then L^* has real eigenvalues. Since L^* can be written as

    L^* = D(1) - D(f'(a_i^*)) W    (7.3.19)

where D(c_i) represents the diagonal matrix having c_i as its ith diagonal entry, real eigenvalues of W imply that the eigenvalues of L^* are also real.
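This stability condition can be illustrated numerically. The sketch below is an example only: the names are hypothetical, tanh is assumed for f, and the symmetric weight matrix is kept small, in which case L* comes out with real, positive eigenvalues, so the linearized perturbations of both systems decay:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5
S = 0.15 * rng.standard_normal((N, N))
W = (S + S.T) / 2                       # symmetric weight matrix, as in the remark above
u = rng.standard_normal(N)
f = np.tanh
fp = lambda a: 1.0 - np.tanh(a) ** 2

x = np.zeros(N)                         # relax to the stable attractor x*
for _ in range(3000):
    x = x + 0.05 * (-x + f(W.T @ x + u))
a = W.T @ x + u

# L*_ij = delta_ij - w_ji f'(a_i); with W[j, i] = w_ji this is
# I - D(f'(a)) W^T, the index form of Eq. (7.3.19)
L = np.eye(N) - np.diag(fp(a)) @ W.T
lam = np.linalg.eigvals(L)
print(lam)   # real up to round-off, with positive real parts
```

With every eigenvalue's real part positive, both \Delta x(t) and \Delta v(t) decay to 0 as the analysis predicts; larger weights could push an eigenvalue of L* negative and destroy local stability.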