
Neural Nets for Dummies

Training: Choose connection weights that minimize error
Prediction: Propagate input feature values through the network of artificial neurons
Advantages: Fast prediction; does feature weighting; very generally applicable
Disadvantages: Very slow training; overfitting is easy

Perceptron Unit


(Unit diagram: inputs x_1, x_2, ..., x_n with weights w_1, w_2, ..., w_n, plus a constant -1 input weighted by w_0, all feeding a single output y.)

y = 1 if \sum_{i=0}^{n} w_i x_i > 0, else y = 0

Creates a decision plane (line) in feature space

Sigmoid Unit

(Same unit diagram, but the weighted sum z is passed through the sigmoid s(z).)

z = \sum_{i=0}^{n} w_i x_i

y = s(z) = 1 / (1 + e^{-z})

Creates a soft decision plane (line) in feature space
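
A minimal sketch of the two unit types, assuming the bias is handled as a constant -1 input weighted by w_0 as in the diagrams; the function names and example weights are illustrative, not from the notes:

```python
import math

def perceptron_unit(x, w, w0):
    """Hard-threshold unit: outputs 1 iff the weighted sum exceeds 0.
    The bias weight w0 multiplies a constant -1 input."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w0 * (-1)
    return 1 if z > 0 else 0

def sigmoid_unit(x, w, w0):
    """Soft-threshold unit: y = s(z) = 1 / (1 + e^(-z))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w0 * (-1)
    return 1.0 / (1.0 + math.exp(-z))

# Example with hand-picked weights (illustrative values only)
print(perceptron_unit([0.5, 2.0], [1.0, 1.0], w0=1.0))  # 1, since z = 1.5 > 0
print(sigmoid_unit([0.5, 2.0], [1.0, 1.0], w0=1.0))     # ~0.82, a "soft" 1
```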


Adapted from Tomás Lozano-Pérez's 6.034 Recitation Notes

The simplest two-layer sigmoid Neural Net


(Network diagram: x --w_1--> z_1 --s--> y_1 --w_2--> z_2 --s--> y.)

y = F(\vec{x}, \vec{w})      where \vec{w} = vector of weights, \vec{x} = vector of inputs, y* = desired output

E = \frac{1}{2} (y* - F(\vec{x}, \vec{w}))^2

\partial E / \partial w_j = -(y* - y) \, \partial y / \partial w_j

y = F(\vec{x}, \vec{w}) = s(z_2) = s(w_2 y_1) = s(w_2 s(z_1)) = s(w_2 s(w_1 x))
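
A sketch of the forward pass and the error for this scalar two-layer net (the function names and sample numbers are mine, not from the notes):

```python
import math

def s(z):
    """Sigmoid: s(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """Forward pass of the scalar two-layer net: y = s(w2 * s(w1 * x))."""
    z1 = w1 * x
    y1 = s(z1)
    z2 = w2 * y1
    y = s(z2)
    return z1, y1, z2, y

def error(y_star, y):
    """Squared error E = 1/2 * (y* - y)^2."""
    return 0.5 * (y_star - y) ** 2

# Illustrative values only
_, _, _, y = forward(x=1.0, w1=0.5, w2=-0.3)
print(y, error(1.0, y))
```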

Goal: find the weight vector that minimizes the error
Approach: Gradient Descent (How does the error change as we twiddle the weights?)

\partial y / \partial w_2 = (\partial s(z_2) / \partial z_2)(\partial z_2 / \partial w_2) = (\partial s(z_2) / \partial z_2) \, y_1
    (recall z_2 = w_2 y_1, so \partial z_2 / \partial w_2 = y_1)

\partial E / \partial w_2 = (y - y*) \, (\partial s(z_2) / \partial z_2) \, y_1

\partial y / \partial w_1 = (\partial s(z_2) / \partial z_2)(\partial z_2 / \partial s(z_1))(\partial s(z_1) / \partial z_1)(\partial z_1 / \partial w_1) = (\partial s(z_2) / \partial z_2) \, w_2 \, (\partial s(z_1) / \partial z_1) \, x
    (recall z_2 = w_2 s(z_1), so \partial z_2 / \partial s(z_1) = w_2; and z_1 = w_1 x, so \partial z_1 / \partial w_1 = x)

\partial E / \partial w_1 = \delta_2 \, w_2 \, (\partial s(z_1) / \partial z_1) \, x,    where \delta_2 = (y - y*) \, \partial s(z_2) / \partial z_2
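
A sketch of these two gradients and one gradient-descent step in code (all names and numbers are illustrative):

```python
import math

def s(z):
    """Sigmoid s(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def ds_dz(z):
    """Sigmoid derivative: s'(z) = s(z) * (1 - s(z))."""
    sz = s(z)
    return sz * (1.0 - sz)

def gradients(x, y_star, w1, w2):
    """Chain-rule gradients of E = 1/2 (y* - y)^2 for y = s(w2 * s(w1 * x))."""
    z1 = w1 * x
    y1 = s(z1)
    z2 = w2 * y1
    y = s(z2)
    delta2 = (y - y_star) * ds_dz(z2)        # (y - y*) * ds(z2)/dz2
    dE_dw2 = delta2 * y1                      # dE/dw2
    dE_dw1 = delta2 * w2 * ds_dz(z1) * x      # dE/dw1
    return dE_dw1, dE_dw2

# One gradient-descent step (learning rate r and all values are illustrative)
w1, w2, r = 0.5, -0.3, 1.0
g1, g2 = gradients(x=1.0, y_star=1.0, w1=w1, w2=w2)
w1, w2 = w1 - r * g1, w2 - r * g2
```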

Descent rule:    w_{i \to j} \leftarrow w_{i \to j} - r \, \delta_j \, y_i

Backpropagation rule:    \delta_j = (ds(z_j)/dz_j) \sum_k w_{j \to k} \, \delta_k
(for the output unit, \delta = (ds(z)/dz)(y - y*))

(Diagram: unit i, with output y_i, feeds unit j through weight w_{i \to j}; unit j, with output y_j and delta \delta_j, feeds downstream units k through weights w_{j \to k}.)

y_i is x_i for the input layer
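
Stated as code, assuming sigmoid units so that ds(z)/dz = y(1 - y); the helper names are mine, not from the notes:

```python
def output_delta(y, y_star):
    """Delta for a sigmoid output unit: (ds(z)/dz)(y - y*) = y (1 - y)(y - y*)."""
    return y * (1.0 - y) * (y - y_star)

def hidden_delta(y_j, downstream):
    """Backpropagation rule for a sigmoid unit j:
    delta_j = (ds(z_j)/dz_j) * sum_k w_{j->k} * delta_k, with ds/dz = y_j (1 - y_j).
    `downstream` is a list of (w_jk, delta_k) pairs."""
    return y_j * (1.0 - y_j) * sum(w_jk * d_k for w_jk, d_k in downstream)

def update_weight(w_ij, r, delta_j, y_i):
    """Descent rule: w_{i->j} <- w_{i->j} - r * delta_j * y_i
    (y_i is x_i for the input layer, and -1 for a bias connection)."""
    return w_ij - r * delta_j * y_i
```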

Example of Backpropagation

(Network diagram: inputs x1 and x2 feed two sigmoid hidden units with weighted sums z1, z2 and outputs y1, y2; the first hidden unit has weights w11, w21 and bias weight w01 on a constant -1 input, the second has w12, w22, w02. The hidden outputs feed a sigmoid output unit with weighted sum z3 and output y3, weights w13, w23, and bias weight w03 on a constant -1 input.)

Backpropagation rule:    \delta_3 =    \delta_2 =    \delta_1 =

Descent rule:    w03 =    w13 =    w23 =    w02 =    w12 =    w22 =    w01 =    w11 =    w21 =

Initial Conditions: all weights are zero, learning rate is 8. Input: (x1, x2) = (0, 1)

Fill in:
y* =    z1 =    z2 =    y1 =    y2 =    z3 =    y3 =
\delta_3 =    \delta_2 =    \delta_1 =
w03 =    w02 =    w01 =
w13 =    w12 =    w11 =    w23 =    w22 =    w21 =

Example of Backpropagation (worked)

(Same network as above: x1, x2 feed sigmoid hidden units y1, y2; y1, y2 feed sigmoid output unit y3; each unit has a bias weight on a constant -1 input.)

Backpropagation rule:
\delta_3 = y3 (1 - y3)(y3 - y*)
\delta_2 = y2 (1 - y2) \delta_3 w23
\delta_1 = y1 (1 - y1) \delta_3 w13

Descent rule:
w03 \leftarrow w03 - r \delta_3 (-1)    w13 \leftarrow w13 - r \delta_3 y1    w23 \leftarrow w23 - r \delta_3 y2
w02 \leftarrow w02 - r \delta_2 (-1)    w12 \leftarrow w12 - r \delta_2 x1    w22 \leftarrow w22 - r \delta_2 x2
w01 \leftarrow w01 - r \delta_1 (-1)    w11 \leftarrow w11 - r \delta_1 x1    w21 \leftarrow w21 - r \delta_1 x2

Initial Conditions: all weights are zero, learning rate r = 8. Input: (x1, x2) = (0, 1)

y* = 1    z1 = 0    z2 = 0    y1 = 1/2    y2 = 1/2    z3 = 0    y3 = 1/2

\delta_3 = -1/8    \delta_2 = 0    \delta_1 = 0

w03 = -1    w02 = 0    w01 = 0

w13 = 1/2    w12 = 0    w11 = 0    w23 = 1/2    w22 = 0    w21 = 0
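
A quick numeric check of this example, coded directly from the rules above (variable names mirror the slide):

```python
import math

def s(z):
    """Sigmoid s(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Initial conditions from the example: all weights zero, learning rate r = 8
r = 8.0
x1, x2, y_star = 0.0, 1.0, 1.0
w01 = w11 = w21 = w02 = w12 = w22 = w03 = w13 = w23 = 0.0

# Forward pass (each unit also sees a constant -1 input weighted by its bias weight)
z1 = w01 * (-1) + w11 * x1 + w21 * x2;  y1 = s(z1)   # z1 = 0, y1 = 1/2
z2 = w02 * (-1) + w12 * x1 + w22 * x2;  y2 = s(z2)   # z2 = 0, y2 = 1/2
z3 = w03 * (-1) + w13 * y1 + w23 * y2;  y3 = s(z3)   # z3 = 0, y3 = 1/2

# Deltas
d3 = y3 * (1 - y3) * (y3 - y_star)   # -1/8
d2 = y2 * (1 - y2) * d3 * w23        # 0
d1 = y1 * (1 - y1) * d3 * w13        # 0

# Descent-rule updates
w03 -= r * d3 * (-1)                  # -1
w13 -= r * d3 * y1                    # 1/2
w23 -= r * d3 * y2                    # 1/2
w02 -= r * d2 * (-1);  w12 -= r * d2 * x1;  w22 -= r * d2 * x2   # all stay 0
w01 -= r * d1 * (-1);  w11 -= r * d1 * x1;  w21 -= r * d1 * x2   # all stay 0

print(d3, w03, w13, w23)   # -0.125  -1.0  0.5  0.5
```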
