Sigmoid Unit
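A threshold unit outputs
$$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}$$
A sigmoid unit replaces the hard threshold with the smooth function $s$:
$$z = \sum_{i=0}^{n} w_i x_i, \qquad y = s(z) = \frac{1}{1 + e^{-z}}$$
where $x_0 = -1$ is a constant input, so the weight $w_0$ acts as the threshold.
[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn, plus a constant input of -1 with weight w0, are summed into z, which passes through s(z) to give y.]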
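A minimal Python sketch of the two units above, assuming the convention from the figure that the bias weight w0 multiplies a constant input of -1. The names `weighted_sum`, `threshold_unit`, `sigmoid` and `sigmoid_unit` are hypothetical helpers, not part of the original material.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """s(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_sum(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """z = sum_{i=0}^{n} w_i x_i with x_0 = -1 (weights = [w0, w1, ..., wn])."""
    if len(weights) != len(inputs) + 1:
        raise ValueError("expected one more weight than inputs (w0 is the bias)")
    xs = [-1.0, *inputs]
    return sum(w * x for w, x in zip(weights, xs))


def threshold_unit(weights: Sequence[float], inputs: Sequence[float]) -> int:
    """Hard threshold: 1 if z > 0, else 0."""
    return 1 if weighted_sum(weights, inputs) > 0 else 0


def sigmoid_unit(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Smooth version: y = s(z)."""
    return sigmoid(weighted_sum(weights, inputs))


# w0 = 1, w1 = 2, x1 = 0.5 gives z = -1 + 1 = 0 for both units.
print(threshold_unit([1.0, 2.0], [0.5]))  # 0
print(sigmoid_unit([1.0, 2.0], [0.5]))    # 0.5
```

At z = 0 the threshold unit has to pick one side, while the sigmoid unit outputs 0.5. Because s is differentiable, the error can be minimized by gradient descent, which is what the rest of this section relies on.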
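Goal: find the weight vector that minimizes the error
$$E = \tfrac{1}{2}(y - y^*)^2,$$
where $y^*$ is the desired output.
Approach: gradient descent.
[Figure: a chain of two sigmoid units. The input x is weighted by w1 to give z1 and y1 = s(z1); y1 is weighted by w2 to give z2 and the output y = s(z2).]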
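Recall $z_2 = w_2 y_1$, so $\partial z_2 / \partial w_2 = y_1$ and
$$\frac{\partial y}{\partial w_2} = \frac{\partial s(z_2)}{\partial w_2} = \frac{\partial s(z_2)}{\partial z_2}\,\frac{\partial z_2}{\partial w_2} = \frac{\partial s(z_2)}{\partial z_2}\, y_1$$
$$\frac{\partial E}{\partial w_2} = (y - y^*)\,\frac{\partial s(z_2)}{\partial z_2}\, y_1$$
For $w_1$ the chain rule runs through both units, using $z_1 = w_1 x$ and $\partial z_2 / \partial s(z_1) = w_2$:
$$\frac{\partial y}{\partial w_1} = \frac{\partial s(z_2)}{\partial z_2}\,\frac{\partial z_2}{\partial s(z_1)}\,\frac{\partial s(z_1)}{\partial z_1}\,\frac{\partial z_1}{\partial w_1} = \frac{\partial s(z_2)}{\partial z_2}\, w_2\, \frac{\partial s(z_1)}{\partial z_1}\, x$$
$$\frac{\partial E}{\partial w_1} = \delta_2\, w_2\, \frac{\partial s(z_1)}{\partial z_1}\, x, \qquad \text{where } \delta_2 = (y - y^*)\,\frac{\partial s(z_2)}{\partial z_2}$$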
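The derivation can be checked numerically. This sketch assumes the bias-free chain in the figure ($z_1 = w_1 x$, $z_2 = w_2 y_1$) and uses the sigmoid identity $ds/dz = s(z)(1 - s(z))$; the helper names and sample values are illustrative only. It compares the closed-form gradients with central finite differences of $E$.

```python
import math


def s(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1: float, w2: float, x: float):
    """Chain x -> (w1) -> s -> (w2) -> s, with no bias terms."""
    z1 = w1 * x
    y1 = s(z1)
    z2 = w2 * y1
    y = s(z2)
    return y1, y


def error(w1: float, w2: float, x: float, y_star: float) -> float:
    _, y = forward(w1, w2, x)
    return 0.5 * (y - y_star) ** 2


def analytic_grads(w1: float, w2: float, x: float, y_star: float):
    y1, y = forward(w1, w2, x)
    ds1 = y1 * (1.0 - y1)               # ds(z1)/dz1
    ds2 = y * (1.0 - y)                 # ds(z2)/dz2
    delta2 = (y - y_star) * ds2
    dE_dw2 = delta2 * y1                # (y - y*) ds(z2)/dz2 * y1
    dE_dw1 = delta2 * w2 * ds1 * x      # delta2 * w2 * ds(z1)/dz1 * x
    return dE_dw1, dE_dw2


def numeric_grads(w1: float, w2: float, x: float, y_star: float, h: float = 1e-6):
    """Central differences of E with respect to w1 and w2."""
    dE_dw1 = (error(w1 + h, w2, x, y_star) - error(w1 - h, w2, x, y_star)) / (2 * h)
    dE_dw2 = (error(w1, w2 + h, x, y_star) - error(w1, w2 - h, x, y_star)) / (2 * h)
    return dE_dw1, dE_dw2


print(analytic_grads(0.3, -0.7, 1.5, 1.0))
print(numeric_grads(0.3, -0.7, 1.5, 1.0))
```

The two printed pairs should agree closely; a mismatch would point to a missing factor in one of the chain-rule products.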
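Descent rule:
$$w_{i\to j} \leftarrow w_{i\to j} - r\,\delta_j\, y_i$$
where $r$ is the learning rate and $y_i$ is the output of unit $i$ (an input value $x_i$ when $i$ is an input, and $-1$ for a bias weight).
Backpropagation rule:
$$\delta_j = \frac{ds(z_j)}{dz_j} \sum_k \delta_k\, w_{j\to k}$$
For an output unit, $\delta_j = \frac{ds(z_j)}{dz_j}\,(y_j - y_j^*)$. For the sigmoid, $\frac{ds(z)}{dz} = s(z)\,(1 - s(z))$, so $\frac{ds(z_j)}{dz_j} = y_j(1 - y_j)$.
[Figure: unit i, with output y_i, feeds unit j through weight w_{i→j}; unit j, with output y_j and error term δ_j, feeds each downstream unit k through weight w_{j→k}.]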
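The two rules generalize to any layered network of sigmoid units. The sketch below is one possible pure-Python implementation, assuming fully connected layers, the -1 bias input, and $E = \frac{1}{2}\sum_k (y_k - y_k^*)^2$ over the output units; the weight layout `layers[l][j] = [w0j, w1j, ...]` is a hypothetical choice made for this example. Deltas are computed from the output layer backward, and every weight is then updated with deltas computed from the old weights.

```python
import math
from typing import List, Sequence, Tuple

# layers[l][j] = [w0j, w1j, ..., wnj]: weights into unit j of layer l.
# w0j multiplies a constant input of -1 (the bias convention used above).
Layers = List[List[List[float]]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(layers: Layers, x: Sequence[float]) -> List[List[float]]:
    """Return [x, outputs of layer 0, outputs of layer 1, ...]."""
    outputs = [list(x)]
    for layer in layers:
        prev = [-1.0] + outputs[-1]
        outputs.append([sigmoid(sum(w * y for w, y in zip(unit, prev))) for unit in layer])
    return outputs


def backprop_step(layers: Layers, x: Sequence[float], y_star: Sequence[float],
                  r: float) -> Tuple[Layers, List[List[float]]]:
    """One gradient-descent step; returns the new weights and each layer's deltas."""
    outputs = forward(layers, x)
    # Output units: delta_j = y_j (1 - y_j) (y_j - y*_j)
    deltas = [[y * (1.0 - y) * (y - t) for y, t in zip(outputs[-1], y_star)]]
    # Hidden units, last hidden layer first:
    # delta_j = y_j (1 - y_j) * sum_k delta_k w_{j->k}
    for l in range(len(layers) - 1, 0, -1):
        ys = outputs[l]            # outputs of layer l - 1
        downstream = layers[l]     # weights from layer l - 1 into layer l
        d_next = deltas[0]         # deltas of layer l
        deltas.insert(0, [
            ys[j] * (1.0 - ys[j])
            * sum(d_next[k] * downstream[k][j + 1] for k in range(len(downstream)))
            for j in range(len(ys))
        ])
    # Descent rule: w_{i->j} <- w_{i->j} - r delta_j y_i, with y_0 = -1
    new_layers = []
    for l, layer in enumerate(layers):
        prev = [-1.0] + outputs[l]
        new_layers.append([[w - r * deltas[l][j] * y for w, y in zip(unit, prev)]
                           for j, unit in enumerate(layer)])
    return new_layers, deltas


layers = [[[0.1, -0.2, 0.3], [0.05, 0.4, -0.1]],  # two hidden units, two inputs
          [[-0.3, 0.2, 0.6]]]                     # one output unit
new_layers, deltas = backprop_step(layers, x=[0.5, -1.0], y_star=[1.0], r=1.0)
```

Since $\delta_j = \partial E / \partial z_j$, each weight change equals $-r\,\partial E / \partial w$, which a finite-difference check on $E$ can confirm.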
Example of Backpropagation
[Figure: a 2-2-1 network. Inputs x1 and x2 feed hidden unit 1 (weights w11, w21) and hidden unit 2 (weights w12, w22); hidden outputs y1 and y2 feed output unit 3 (weights w13, w23), whose output is y3. Each unit j also has a bias weight w0j on a constant input of -1.]
Initial Conditions: all weights are zero, and the learning rate is r = 8. Input: (x1, x2) = (0, 1).
Backpropagation rule:
$$\delta_3 = y_3(1 - y_3)(y_3 - y_3^*)$$
$$\delta_2 = y_2(1 - y_2)\,\delta_3\, w_{23}$$
$$\delta_1 = y_1(1 - y_1)\,\delta_3\, w_{13}$$
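Descent rule:
$$w_{03} \leftarrow w_{03} - r\,\delta_3\,(-1), \qquad w_{13} \leftarrow w_{13} - r\,\delta_3\, y_1, \qquad w_{23} \leftarrow w_{23} - r\,\delta_3\, y_2$$
$$w_{02} \leftarrow w_{02} - r\,\delta_2\,(-1), \qquad w_{12} \leftarrow w_{12} - r\,\delta_2\, x_1, \qquad w_{22} \leftarrow w_{22} - r\,\delta_2\, x_2$$
$$w_{01} \leftarrow w_{01} - r\,\delta_1\,(-1), \qquad w_{11} \leftarrow w_{11} - r\,\delta_1\, x_1, \qquad w_{21} \leftarrow w_{21} - r\,\delta_1\, x_2$$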
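Desired output and forward pass:
$$y^* = 1, \quad z_1 = 0, \quad z_2 = 0, \quad y_1 = \tfrac{1}{2}, \quad y_2 = \tfrac{1}{2}, \quad z_3 = 0, \quad y_3 = \tfrac{1}{2}$$
Deltas:
$$\delta_3 = \tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\left(\tfrac{1}{2} - 1\right) = -\tfrac{1}{8}, \qquad \delta_2 = 0, \qquad \delta_1 = 0$$
($\delta_2$ and $\delta_1$ vanish because $w_{23} = w_{13} = 0$.)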
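As a check on the arithmetic, the sketch below replays the example in Python with the weights named as on the slide ($w_{ij}$ is the weight from input or unit $i$ into unit $j$, and $w_{0j}$ multiplies the constant -1). The function `example_step` and the constant `WEIGHT_NAMES` are hypothetical helpers; besides the activations and deltas above, the sketch also applies the descent rule to obtain the weights after one step.

```python
import math

WEIGHT_NAMES = ["w01", "w11", "w21", "w02", "w12", "w22", "w03", "w13", "w23"]


def s(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def example_step(w, x1, x2, y_star, r):
    """One backpropagation step on the 2-2-1 network of the example.

    w maps names such as "w13" (from unit 1 into unit 3) to weights;
    w0j is the bias weight of unit j on the constant input -1.
    """
    # Forward pass
    y1 = s(-w["w01"] + w["w11"] * x1 + w["w21"] * x2)
    y2 = s(-w["w02"] + w["w12"] * x1 + w["w22"] * x2)
    y3 = s(-w["w03"] + w["w13"] * y1 + w["w23"] * y2)
    # Backpropagation rule (uses the current, not yet updated, weights)
    d3 = y3 * (1 - y3) * (y3 - y_star)
    d2 = y2 * (1 - y2) * d3 * w["w23"]
    d1 = y1 * (1 - y1) * d3 * w["w13"]
    # Descent rule: w <- w - r * delta_j * (input feeding that weight)
    updates = {
        "w03": (d3, -1.0), "w13": (d3, y1), "w23": (d3, y2),
        "w02": (d2, -1.0), "w12": (d2, x1), "w22": (d2, x2),
        "w01": (d1, -1.0), "w11": (d1, x1), "w21": (d1, x2),
    }
    new_w = {name: w[name] - r * d * inp for name, (d, inp) in updates.items()}
    return (y1, y2, y3), (d1, d2, d3), new_w


w = {name: 0.0 for name in WEIGHT_NAMES}   # all weights zero
(y1, y2, y3), (d1, d2, d3), new_w = example_step(w, x1=0.0, x2=1.0, y_star=1.0, r=8.0)
print(y3, d3)                                     # 0.5 -0.125
print(new_w["w03"], new_w["w13"], new_w["w23"])   # -1.0 0.5 0.5
```

Because $\delta_1 = \delta_2 = 0$ in this first step, only the output unit's weights change: $w_{03}$ becomes $-1$, and $w_{13}$ and $w_{23}$ become $\tfrac{1}{2}$.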