weight matrix W = {wij}:

      ( 0   5   2 )
  W = ( 3   0   4 )     rows of W are row vectors; columns of W are column vectors
      ( 1   6  -1 )

vectors of weights:
  w.j = (w1j, w2j, w3j): the jth column, the weights that come into unit j
  wi. = (wi1, wi2, wi3): the ith row, the weights that go out of unit i

[Figure: a network of units 1, 2, 3 with the weights of W labeling the connections between them]
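A small numpy sketch of this indexing convention (the matrix values are the ones above; the reading that wij connects unit i to unit j follows from the row/column description):

import numpy as np

# weight matrix W = {wij}; column j holds the weights coming into unit j,
# row i holds the weights going out of unit i
W = np.array([[0, 5, 2],
              [3, 0, 4],
              [1, 6, -1]])

j = 1                      # "unit 2" in 0-based indexing
into_j = W[:, j]           # weights into unit j   -> array([5, 0, 6])
i = 1
out_of_i = W[i, :]         # weights out of unit i -> array([3, 0, 4])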
notation:
  η: the learning rate (specifies the scale of the weight change; usually small)
  x = (x1, ..., xn): input vector (for computation)
  t = (t1, ..., tm): training (or target) output vector
  s = (s1, ..., sn): training input vector
  W = {wij}: weight matrix
  wij(new) = wij(old) + Δwij: learning/training
Review of Matrix Operations
Vector: a sequence of elements (the order is important)
e.g., x=(2, 1) denotes a vector
length = sqrt(2*2+1*1)
orientation angle = a
x = (x1, x2, ..., xn), an n-dimensional vector
a point in an n-dimensional space
[Figure: the point x = (2, 1) plotted in the plane, with its length and orientation angle a]

column vector vs. row vector:

      ( 1 )
  x = ( 2 )   is a column vector;   y = x^T = (1 2 5 8) is the corresponding row vector
      ( 5 )
      ( 8 )

transpose: (x^T)^T = x
norms of a vector (magnitude):
  L1 norm:  ||x||_1 = |x1| + |x2| + ... + |xn|
  L2 norm:  ||x||_2 = (x1^2 + x2^2 + ... + xn^2)^(1/2)
  L∞ norm:  ||x||_∞ = max_i |xi|

vector operations:
  scalar multiplication: r x = (r x1, r x2, ..., r xn)^T, where r is a scalar and x a column vector
  inner (dot) product: x^T y = x1 y1 + x2 y2 + ... + xn yn, where x and y are column vectors of the same dimension n
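These definitions translate directly into numpy; a quick sketch with made-up values:

import numpy as np

x = np.array([1.0, -2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

l1 = np.sum(np.abs(x))          # L1 norm         -> 5.0
l2 = np.sqrt(np.sum(x ** 2))    # L2 norm         -> 3.0
linf = np.max(np.abs(x))        # L-infinity norm -> 2.0

r = 2.5
scaled = r * x                  # scalar multiplication
dot = np.dot(x, y)              # inner (dot) product x^T y -> 11.0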
Cross product: x × y defines another vector orthogonal to the plane formed by x and y.
matrix: A_{m×n} = {aij}

      ( a11  a12  ...  a1n )
  A = (  :    :         :  )
      ( am1  am2  ...  amn )

  aij: the element on the ith row and jth column
  aii: a diagonal element
  wij: a weight in a weight matrix W
  each row or column is a vector:
    a.j = (a1j, ..., amj)^T: the jth column vector
    ai. = (ai1, ..., ain): the ith row vector
  a column vector of dimension m is a matrix of m×1
transpose: A^T is n×m; the jth column of A becomes the jth row of A^T

        ( a11  a21  ...  am1 )
  A^T = (  :    :         :  )
        ( a1n  a2n  ...  amn )

square matrix: A_{n×n}

identity matrix: I = {aij}, where aij = 1 if i = j and 0 otherwise

      ( 1  0  ...  0 )
  I = ( 0  1  ...  0 )
      ( 0  0  ...  1 )
symmetric matrix: m = n and A = A^T, i.e. ai. = a.i for all i, or aij = aji for all i, j

matrix operations:
  scalar multiplication: rA = {r aij}
  vector-matrix product: x^T A = (x1, ..., xm)(a.1, ..., a.n) = (x^T a.1, ..., x^T a.n)
  The result is a row vector, each element of which is an inner product of x^T and a column vector a.j
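A numpy sketch of the vector-matrix product as a row of inner products (the values are made up):

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])       # m = 2, n = 3
x = np.array([1, -1])           # an m-dimensional column vector

row = x @ A                     # x^T A -> array([-3, -3, -3])
# element j is the inner product of x with the jth column of A
same = np.array([x @ A[:, j] for j in range(A.shape[1])])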
product of two matrices: A_{m×n} B_{n×p} = C_{m×p}, where cij = ai. · b.j
  (the inner product of the ith row of A and the jth column of B)
  A_{m×n} I_{n×n} = A_{m×n}

vector outer product:
          ( x1 )                     ( x1 y1  x1 y2  ...  x1 yn )
  x y^T = (  : ) (y1, y2, ..., yn) = (   :                  :   )
          ( xm )                     ( xm y1  xm y2  ...  xm yn )
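A corresponding numpy sketch of the matrix product and the outer product (example values are mine):

import numpy as np

A = np.array([[1, 0],
              [2, 3],
              [4, 5]])          # m x n = 3 x 2
B = np.array([[1, 2, 0],
              [0, 1, 1]])       # n x p = 2 x 3

C = A @ B                       # C is m x p = 3 x 3
assert C[1, 2] == A[1, :] @ B[:, 2]       # cij = (ith row of A) . (jth column of B)
assert np.array_equal(A @ np.eye(2), A)   # A I = A

x = np.array([1, 2, 3])
y = np.array([4, 5])
outer = np.outer(x, y)          # 3 x 2 matrix with entries xi * yj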
Calculus and Differential Equations
xi(t): each variable xi is a function of time t, giving the state (x1(t), ..., xn(t))
derivatives with respect to t:
  x1'(t) = f1(t), ..., xn'(t) = fn(t)
Multi-variable calculus:
partial derivative: ∂y/∂xi gives the direction and speed of change of y with respect to xi, where
  y(t) = f(x1(t), x2(t), ..., xn(t))
example: a function y of x1, x2, x3 (combining products of the variables, sin/cos, and an
exponential term) has three partial derivatives, ∂y/∂x1, ∂y/∂x2, and ∂y/∂x3
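As a concrete illustration (this particular function is my own choice, not necessarily the example in the original notes), sympy computes such partial derivatives symbolically:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
y = x1 * x2 + sp.sin(x3)        # illustrative function of three variables

dy_dx1 = sp.diff(y, x1)         # -> x2
dy_dx2 = sp.diff(y, x2)         # -> x1
dy_dx3 = sp.diff(y, x3)         # -> cos(x3)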
Chain-rule: y is a function of the xi, and each xi is a function of t:
  y(t) = f(x1(t), x2(t), ..., xn(t))

the total derivative:
  dy/dt = (∂f/∂x1) x1'(t) + ... + (∂f/∂xn) xn'(t)

Gradient of f:
  ∇f = (∂f/∂x1, ..., ∂f/∂xn)^T
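A quick numerical check of the total-derivative formula (f, x1(t), and x2(t) below are my own illustrative choices):

import numpy as np

def f(a, b):  return a ** 2 * b        # f(x1, x2)
def x1(t):    return np.sin(t)
def x2(t):    return t
def y(t):     return f(x1(t), x2(t))

t = 0.7
# chain rule: dy/dt = (df/dx1) x1'(t) + (df/dx2) x2'(t)
chain = (2 * x1(t) * x2(t)) * np.cos(t) + (x1(t) ** 2) * 1.0

h = 1e-6
numeric = (y(t + h) - y(t - h)) / (2 * h)   # finite-difference estimate
print(chain, numeric)                       # the two values agree closely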
dynamic system:
  x1'(t) = f1(x1, ..., xn)
      :
  xn'(t) = fn(x1, ..., xn)
  the change of each xi depends on the current state (x1, ..., xn);
  if every xi' = 0, the state (x1, ..., xn) no longer changes (the system is at an equilibrium)
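A minimal sketch of simulating such a system with Euler steps (the particular f is an assumption of mine, chosen so the equilibrium is easy to see):

def f(x):
    return -x + 1.0        # x'(t) = f(x); equilibrium where f(x) = 0, i.e. x = 1

x, dt = 0.0, 0.01
for _ in range(2000):
    x += dt * f(x)         # Euler integration step

print(x)                   # approaches 1.0, where x' = 0 and the state stops changing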
Chapter 2: Simple Neural Networks for
Pattern Classification
General discussion
Linear separability
Hebb nets
Perceptron
Adaline
General discussion
Pattern recognition
Pattern classification:
General architecture
Single layer
net input to Y:
  net = b + Σ_{i=1..n} xi wi
bias b is treated as the weight from a special unit with constant output 1.
threshold θ related to Y
output:
  y = f(net) = 1   if net ≥ θ
            = -1   if net < θ
classify (x1, ..., xn) into one of the two classes

[Figure: a single output unit Y with inputs x1, ..., xn, weights w1, ..., wn, and a bias b from a unit with constant output 1]
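A direct transcription of this unit (the weights, bias, and inputs are made-up values):

import numpy as np

def classify(x, w, b, theta=0.0):
    net = b + np.dot(x, w)            # net = b + sum_i xi*wi
    return 1 if net >= theta else -1  # threshold output: +1 / -1

w = np.array([0.5, -0.3])
b = 0.1
print(classify(np.array([1.0, 1.0]), w, b))   # net = 0.3 >= 0, so output 1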
Decision region/boundary
n = 2, b != 0, θ = 0:
  b + x1 w1 + x2 w2 = 0,  or  x2 = -(w1/w2) x1 - b/w2,
  is a line, called the decision boundary, which partitions the plane into two decision regions.
  If a point/pattern (x1, x2) is in the positive region, then b + x1 w1 + x2 w2 ≥ 0, and the
  output is 1 (belongs to class one).
  Otherwise, b + x1 w1 + x2 w2 < 0, and the output is -1 (belongs to class two).
n = 2, b = 0, θ != 0 would result in a similar partition.

[Figure: the decision boundary line in the (x1, x2) plane, with the positive (+) region on one side and the negative (-) region on the other]
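A small sketch of checking which decision region a point falls in, with arbitrary weights of my own:

w1, w2, b = 1.0, -2.0, 0.5

def region(x1, x2):
    # +1 if the point is in the positive region, -1 otherwise
    return 1 if b + x1 * w1 + x2 * w2 >= 0 else -1

# the boundary is the line x2 = -(w1/w2)*x1 - b/w2
x1 = 1.0
x2_boundary = -(w1 / w2) * x1 - b / w2      # 0.75
print(region(1.0, 0.0), region(1.0, 2.0))   # 1 -1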
Linear Separability Problem
If the two classes can be separated by a decision boundary b + Σ_{i=1..n} xi wi = 0, they are
said to be linearly separable. If such a decision boundary does not exist, then the two classes
are said to be linearly inseparable.
For example, the XOR function (with bipolar inputs and targets) requires
  (1) b + w1 + w2 < 0
  (2) b + w1 - w2 ≥ 0
  (3) b - w1 + w2 ≥ 0
  (4) b - w1 - w2 < 0
Adding (2) and (3) gives b ≥ 0, while adding (1) and (4) gives b < 0, a contradiction; no such
decision boundary exists, so the two classes are linearly inseparable.
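The contradiction can also be checked by brute force; this sketch (my own) scans a grid of (b, w1, w2) values and finds none that satisfy all four conditions:

import numpy as np

grid = np.linspace(-2.0, 2.0, 41)
found = False
for b in grid:
    for w1 in grid:
        for w2 in grid:
            if (b + w1 + w2 < 0 and b + w1 - w2 >= 0 and
                    b - w1 + w2 >= 0 and b - w1 - w2 < 0):
                found = True
print(found)    # False: no weights and bias satisfy (1)-(4)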
By Rosenblatt (1962)
the squared error over the P training samples:
  E = (1/P) Σ_{p=1..P} (t(p) - y_in(p))^2

the gradient of E:
  ∇E = (∂E/∂w1, ..., ∂E/∂wn)

  ∂E/∂wi = (1/P) Σ_{p=1..P} ∂/∂wi [ (t(p) - y_in(p))^2 ]
         = (1/P) Σ_{p=1..P} [ -2 (t(p) - y_in(p)) ] xi(p)     (since ∂ y_in(p)/∂wi = xi(p))
         = -(2/P) Σ_{p=1..P} (t(p) - y_in(p)) xi(p)

Therefore
  Δwi = -η ∂E/∂wi = η (2/P) Σ_{p=1..P} (t(p) - y_in(p)) xi(p)
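A compact numpy sketch of batch gradient descent with this update (the data, targets, and learning rate are my own illustrative choices, not from the notes):

import numpy as np

X = np.array([[ 1.0,  1.0],
              [ 1.0, -1.0],
              [-1.0,  1.0],
              [-1.0, -1.0]])          # P = 4 bipolar input patterns
t = np.array([1.0, 1.0, 1.0, -1.0])   # bipolar OR targets

w, b, eta, P = np.zeros(2), 0.0, 0.1, len(t)
for epoch in range(100):
    y_in = b + X @ w                      # net inputs for all patterns
    err = t - y_in                        # t(p) - y_in(p)
    w += eta * (2.0 / P) * (X.T @ err)    # delta wi = eta*(2/P)*sum_p err(p)*xi(p)
    b += eta * (2.0 / P) * err.sum()      # bias: weight from a constant input of 1

print(w, b)   # approaches w = [0.5, 0.5], b = 0.5; thresholding y_in reproduces t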
Notes: