$$x_c = \sum_{j \in N} w_{cj}\, y_j(\tau-1)$$
where $y_j(\tau-1)$ is the activation of the network input $x_j(\tau-1)$.
Input Gate:
$$x_{in} = \sum_{j \in N} w_{\iota j}\, y_j(\tau-1) + \sum_{c \in C} w_{\iota c}\, s_c(\tau-1)$$
$$y_{in} = f(x_{in})$$
Forget Gate:
$$x_{for} = \sum_{j \in N} w_{\phi j}\, y_j(\tau-1) + \sum_{c \in C} w_{\phi c}\, s_c(\tau-1)$$
$$y_{for} = f(x_{for})$$
The cell value, which is similar to the hidden node in an RNN:
$$s_c = y_{for}\, s_c(\tau-1) + y_{in}\, g(x_c)$$
Output Gate:
$$x_{out} = \sum_{j \in N} w_{\omega j}\, y_j(\tau-1) + \sum_{c \in C} w_{\omega c}\, s_c(\tau)$$
$$y_{out} = f(x_{out})$$
Cell Output:
$$\forall c \in C,\quad y_c = y_{out}\, h(s_c)$$
Output Layer:
$$x_k = \sum_{c \in C} w_{kc}\, y_c$$
$$y_k = \mathrm{softmax}(x)_k$$
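The forward pass above can be sketched for a single cell as follows. This is only an illustration under stated assumptions: the text does not fix the squashing functions, so the logistic sigmoid for f and tanh for g and h are assumed here, and the function name `lstm_forward_step` and the weight dictionary keys are hypothetical.

```python
import math

def sigmoid(x):
    # Assumed gate squashing function f (not specified in the text).
    return 1.0 / (1.0 + math.exp(-x))

def weighted_sum(weights, activations):
    # sum_j w_j * y_j(tau-1)
    return sum(w * y for w, y in zip(weights, activations))

def lstm_forward_step(y_prev, s_prev, w):
    """One time step of a one-cell LSTM block.

    y_prev: activations y_j(tau-1) of the units in N
    s_prev: cell state s_c(tau-1)
    w: hypothetical dict with input weights 'c', 'in', 'for', 'out'
       (lists over j) and peephole weights 'in_c', 'for_c', 'out_c'
    """
    x_c = weighted_sum(w["c"], y_prev)
    y_in = sigmoid(weighted_sum(w["in"], y_prev) + w["in_c"] * s_prev)
    y_for = sigmoid(weighted_sum(w["for"], y_prev) + w["for_c"] * s_prev)
    # Assumed g = tanh.
    s_c = y_for * s_prev + y_in * math.tanh(x_c)
    # The output-gate peephole sees the current state s_c(tau).
    y_out = sigmoid(weighted_sum(w["out"], y_prev) + w["out_c"] * s_c)
    # Assumed h = tanh.
    y_c = y_out * math.tanh(s_c)
    return s_c, y_c
```

With all weights set to zero every gate evaluates to 0.5 and g(x_c) is 0, so the new state is simply half of the previous state.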
The error function is the cross-entropy for a softmax output layer:
$$E_{total} = \sum_{\tau} E(\tau)$$
$$E(\tau) = -\sum_{k=1}^{n} \sum_{i=c}^{C} t_{ki} \log(y_{ki})$$
Using BPTT, propagate the output errors (the difference between output and target) backwards through the net.
define
$$\delta_k(\tau) = \frac{\partial E(\tau)}{\partial x_k}$$
$$\delta_k(\tau) = y_k(\tau) - t_k(\tau), \quad k \in \text{output units}$$
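The identity that the output delta equals y_k minus t_k can be checked numerically. The sketch below compares it against a central finite difference of the cross-entropy for a single time step; the helper names (`cross_entropy`, `delta_output`, `numeric_grad`) are hypothetical, and the targets are assumed to sum to 1 (for example a one-hot vector).

```python
import math

def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(xs, ts):
    # E = -sum_k t_k * log(y_k), with y = softmax(x)
    return -sum(t * math.log(y) for t, y in zip(ts, softmax(xs)))

def delta_output(xs, ts):
    # Analytic result: dE/dx_k = y_k - t_k
    return [y - t for y, t in zip(softmax(xs), ts)]

def numeric_grad(xs, ts, step=1e-6):
    # Central finite difference of E with respect to each x_k.
    grads = []
    for k in range(len(xs)):
        up = xs[:k] + [xs[k] + step] + xs[k + 1:]
        down = xs[:k] + [xs[k] - step] + xs[k + 1:]
        grads.append((cross_entropy(up, ts) - cross_entropy(down, ts)) / (2 * step))
    return grads
```

Calling `delta_output` and `numeric_grad` on the same inputs should give values that agree to several decimal places.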
For each LSTM block, the $\delta$ terms are calculated as follows:
Cell Output:
$\forall c \in C$, define
$$\epsilon_c(\tau) = \frac{\partial E(\tau)}{\partial y_c} = \sum_{j \in N} w_{jc}\, \delta_j(\tau)$$
Output Gate:
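$$\delta_{out}(\tau) = \frac{\partial E(\tau)}{\partial x_{out}} = \sum_{c \in C} \frac{\partial E(\tau)}{\partial y_c} \frac{\partial y_c}{\partial y_{out}} \frac{\partial y_{out}}{\partial x_{out}} = f'(x_{out}) \sum_{c \in C} \epsilon_c(\tau)\, h(s_c)$$
where
$$\frac{\partial y_{out}}{\partial x_{out}} = f'(x_{out}), \qquad \frac{\partial y_c}{\partial y_{out}} = h(s_c)$$
$$\Rightarrow \nabla_{x_{out}^{(\tau)}} E = f'(x_{out}) \sum_{c \in C} \nabla_{y_c^{(\tau)}} E \; h(s_c)$$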
States:
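$$\begin{aligned}
\frac{\partial E(\tau)}{\partial s_c}
&= \frac{\partial E(\tau)}{\partial y_c}\frac{\partial y_c}{\partial s_c}
 + \frac{\partial E(\tau)}{\partial s_c(\tau+1)}\frac{\partial s_c(\tau+1)}{\partial s_c}
 + \frac{\partial E(\tau)}{\partial x_{in}(\tau+1)}\frac{\partial x_{in}(\tau+1)}{\partial s_c} \\
&\quad + \frac{\partial E(\tau)}{\partial x_{for}(\tau+1)}\frac{\partial x_{for}(\tau+1)}{\partial s_c}
 + \frac{\partial E(\tau)}{\partial x_{out}}\frac{\partial x_{out}}{\partial s_c} \\
&= \epsilon_c\, y_{out}\, h'(s_c)
 + \frac{\partial E(\tau)}{\partial s_c(\tau+1)}\, y_{for}(\tau+1)
 + \delta_{in}(\tau+1)\, w_{\iota c}
 + \delta_{for}(\tau+1)\, w_{\phi c}
 + \delta_{out}\, w_{\omega c}
\end{aligned}$$
$$\nabla_{s_c^{(\tau)}} E = \nabla_{y_c^{(\tau)}} E \; y_{out}\, h'(s_c) + \nabla_{s_c^{(\tau+1)}} E \; y_{for}(\tau+1) + \nabla_{x_{in}^{(\tau+1)}} E \; w_{\iota c} + \nabla_{x_{for}^{(\tau+1)}} E \; w_{\phi c} + \nabla_{x_{out}^{(\tau)}} E \; w_{\omega c}$$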
This result is different from the plain RNN. The recursive term is multiplied by the forget gate value, which controls whether the hidden layer remembers its state or not.
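The effect of the forget gate on the recursive term can be illustrated in isolation. The sketch below is a deliberate simplification, not the full state derivative: it keeps only the recursive term (the state gradient at step tau equals the state gradient at tau+1 times the forget gate value at tau+1) and ignores the cell-output and peephole terms. The function name `state_gradient_through_time` is hypothetical.

```python
def state_gradient_through_time(grad_last, forget_gates):
    """Propagate the state gradient backwards using only the recursive term.

    grad_last:    dE/ds_c at the final time step T
    forget_gates: [y_for(1), ..., y_for(T)], the forget gate activations
    returns:      [dE/ds_c(0), ..., dE/ds_c(T)]
    """
    grads = [grad_last]
    # Walk from the last step to the first: grad(t) = grad(t+1) * y_for(t+1)
    for y_for_next in reversed(forget_gates):
        grads.append(grads[-1] * y_for_next)
    grads.reverse()
    return grads

# Forget gates fully open: the gradient is carried back unchanged.
print(state_gradient_through_time(1.0, [1.0, 1.0, 1.0]))
# Forget gates half open: the gradient halves at every step back in time.
print(state_gradient_through_time(1.0, [0.5, 0.5, 0.5]))
```

When the forget gate stays near 1 the error signal survives over many steps, and when it is near 0 the block discards both the stored state and the gradient flowing through it.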
Cells:
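$$\forall c \in C,\quad \delta_c(\tau) = \frac{\partial E(\tau)}{\partial x_c} = \frac{\partial E(\tau)}{\partial s_c}\frac{\partial s_c}{\partial x_c} = y_{in}\, g'(x_c)\, \frac{\partial E(\tau)}{\partial s_c}$$
$$\Rightarrow \nabla_{x_c^{(\tau)}} E = y_{in}\, g'(x_c)\, \nabla_{s_c^{(\tau)}} E$$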
Forget Gate:
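$$\delta_{for}(\tau) = \frac{\partial E(\tau)}{\partial x_{for}} = \sum_{c \in C} \frac{\partial E(\tau)}{\partial s_c} \frac{\partial s_c}{\partial y_{for}} \frac{\partial y_{for}}{\partial x_{for}} = f'(x_{for}) \sum_{c \in C} \frac{\partial E(\tau)}{\partial s_c}\, s_c(\tau-1)$$
where
$$\frac{\partial s_c}{\partial y_{for}} = s_c(\tau-1), \qquad \frac{\partial y_{for}}{\partial x_{for}} = f'(x_{for})$$
$$\Rightarrow \nabla_{x_{for}^{(\tau)}} E = f'(x_{for}) \sum_{c \in C} \frac{\partial E(\tau)}{\partial s_c}\, s_c(\tau-1)$$
Input Gate:
$$\delta_{in}(\tau) = \frac{\partial E(\tau)}{\partial x_{in}} = \sum_{c \in C} \frac{\partial E(\tau)}{\partial s_c} \frac{\partial s_c}{\partial y_{in}} \frac{\partial y_{in}}{\partial x_{in}} = f'(x_{in}) \sum_{c \in C} \frac{\partial E(\tau)}{\partial s_c}\, g(x_c)$$
where
$$\frac{\partial s_c}{\partial y_{in}} = g(x_c), \qquad \frac{\partial y_{in}}{\partial x_{in}} = f'(x_{in})$$
$$\Rightarrow \nabla_{x_{in}^{(\tau)}} E = f'(x_{in}) \sum_{c \in C} \frac{\partial E(\tau)}{\partial s_c}\, g(x_c)$$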
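The block deltas can be sanity-checked on a single cell at the last time step, where the terms involving tau+1 vanish. The sketch below is a minimal illustration under assumptions not made in the text: f is taken to be the logistic sigmoid, g and h are tanh, and the error derivative with respect to the cell output (`eps_c`) is supplied directly. The names `block_output` and `block_deltas` are hypothetical, and `x_out_ext` denotes the output gate input without its peephole term.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def block_output(x_c, x_in, x_for, x_out_ext, s_prev, w_out_c):
    """Forward pass of one cell, returning the cell output y_c."""
    y_in, y_for = sigmoid(x_in), sigmoid(x_for)
    s_c = y_for * s_prev + y_in * math.tanh(x_c)
    y_out = sigmoid(x_out_ext + w_out_c * s_c)  # peephole on current state
    return y_out * math.tanh(s_c)

def block_deltas(x_c, x_in, x_for, x_out_ext, s_prev, w_out_c, eps_c=1.0):
    """Deltas for the cell and its gates at the last time step."""
    y_in, y_for = sigmoid(x_in), sigmoid(x_for)
    g = math.tanh(x_c)
    s_c = y_for * s_prev + y_in * g
    h = math.tanh(s_c)
    y_out = sigmoid(x_out_ext + w_out_c * s_c)
    # Output gate: f'(x_out) * eps_c * h(s_c), with f' = y * (1 - y)
    d_out = y_out * (1.0 - y_out) * eps_c * h
    # State: eps_c * y_out * h'(s_c) + delta_out * w_out_c (no tau+1 terms)
    dE_ds = eps_c * y_out * (1.0 - h * h) + d_out * w_out_c
    # Cell: y_in * g'(x_c) * dE/ds_c
    d_c = y_in * (1.0 - g * g) * dE_ds
    # Forget gate: f'(x_for) * dE/ds_c * s_c(tau-1)
    d_for = y_for * (1.0 - y_for) * dE_ds * s_prev
    # Input gate: f'(x_in) * dE/ds_c * g(x_c)
    d_in = y_in * (1.0 - y_in) * dE_ds * g
    return {"c": d_c, "in": d_in, "for": d_for, "out": d_out}
```

With `eps_c = 1.0` the error is just the cell output itself, so each delta can be compared with a finite difference of `block_output` with respect to the matching net input.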
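Now use the $\delta$ terms to get the partial derivatives of the cumulative sequence error:
$$E_{total}(S) = \sum_{\tau} E(\tau)$$
define
$$\nabla_{ij}(S) = \frac{\partial E_{total}(S)}{\partial w_{ij}} = \sum_{\tau} \delta_i(\tau)\, y_j(\tau-1)$$
where, for the peephole weights, $y_j(\tau-1)$ is replaced by the cell state: $s_c(\tau-1)$ for the input and forget gates, and $s_c(\tau)$ for the output gate, which pairs $\delta_{out}(\tau)$ with $s_c(\tau)$.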
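Accumulating a weight gradient over a sequence can be sketched as below. This assumes, as in the gate equations, that each net input is a weighted sum of the previous step's activations, so the chain rule pairs the delta of the receiving unit at step tau with the activation of the sending unit at step tau-1. The function name `accumulate_weight_gradient` and the list layout are hypothetical.

```python
def accumulate_weight_gradient(deltas, activations):
    """Sum delta_i(tau) * y_j(tau-1) over the sequence.

    deltas:      [delta_i(1), ..., delta_i(T)]
    activations: [y_j(0), y_j(1), ..., y_j(T)]
    """
    # zip pairs deltas[tau-1] (i.e. delta_i(tau)) with activations[tau-1]
    # (i.e. y_j(tau-1)); the final activation y_j(T) is not used.
    return sum(d * y for d, y in zip(deltas, activations))

# Three time steps: 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0
print(accumulate_weight_gradient([0.5, -1.0, 2.0], [1.0, 2.0, 3.0, 4.0]))  # 4.5
```

For the output gate peephole weight the same accumulation would pair the gate delta with the cell state of the same time step instead.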