CSL346: Perceptron

Winter 2012
Luke Zettlemoyer

Slides adapted from Dan Klein
Who needs probabilities?
Previously: model data with distributions
  Joint: P(X,Y)
    e.g. Naive Bayes
  Conditional: P(Y|X)
    e.g. Logistic Regression
But wait, why probabilities?
Let's try to be error-driven!
mpg   cylinders  displacement  horsepower  weight  acceleration  model year  maker
good  4          97            75          2265    18.2          77          asia
bad   6          199           90          2648    15            70          america
bad   4          121           110         2600    12.8          77          europe
bad   8          350           175         4100    13            73          america
bad   6          198           95          3102    16.5          74          america
bad   4          108           94          2379    16.5          73          asia
bad   4          113           95          2228    14            71          asia
bad   8          302           139         3570    12.8          78          america
:     :          :             :           :       :             :           :
good  4          120           79          2625    18.6          82          america
bad   8          455           225         4425    10            70          america
good  4          107           86          2464    15.5          76          europe
bad   5          131           103         2830    15.9          78          europe
Generative vs. Discriminative
Generative classifiers:
  E.g. Naive Bayes
  A joint probability model with evidence variables
  Query model for causes given evidence
Discriminative classifiers:
  No generative model, no Bayes rule, often no probabilities at all!
  Try to predict the label Y directly from X
  Robust, accurate with varied features
  Loosely: mistake driven rather than model driven
Linear Classifiers
Inputs are feature values
Each feature has a weight
Sum is the activation:
  activation_w(x) = Σ_i w_i · f_i(x)
If the activation is:
  Positive, output class +1
  Negative, output class -1
[Diagram: inputs f_1, f_2, f_3 with weights w_1, w_2, w_3, feeding a sum Σ and a threshold test >0?]
Example: Spam
Imagine 3 features (spam is the positive class):
  free (number of occurrences of "free")
  money (occurrences of "money")
  BIAS (intercept, always has value 1)

Weights w:       Features f(x) for "free money":
  BIAS  : -3       BIAS  : 1
  free  :  4       free  : 1
  money :  2       money : 1
  ...              ...

w · f(x) > 0  →  SPAM!!!
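The decision above can be sketched in a few lines. The weight values are taken from the slide; the dict-based feature encoding and function names are my own, for illustration.

```python
# A minimal sketch of the slide's spam example: activation = w · f(x).
w = {"BIAS": -3, "free": 4, "money": 2}

def features(text):
    # BIAS always has value 1; word features count occurrences.
    f = {"BIAS": 1}
    for word in ("free", "money"):
        f[word] = text.split().count(word)
    return f

def activation(weights, f):
    return sum(weights.get(k, 0) * v for k, v in f.items())

a = activation(w, features("free money"))
print(a, "SPAM" if a > 0 else "HAM")  # -3 + 4 + 2 = 3 > 0 -> SPAM
```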
Binary Decision Rule
In the space of feature vectors:
  Examples are points
  Any weight vector is a hyperplane
  One side corresponds to Y = +1
  The other corresponds to Y = -1

  BIAS  : -3
  free  :  4
  money :  2
  ...

[Plot: decision boundary in the (free, money) plane; +1 = SPAM on one side, -1 = HAM on the other]
Binary Perceptron Algorithm
Start with zero weights
For each training instance (x, y*):
  Classify with current weights:  y = +1 if w · f(x) > 0, else -1
  If correct (i.e., y = y*), no change!
  If wrong: update  w = w + y* f(x)
(Recall, from the Gaussian Naive Bayes derivation of logistic regression:)

  ln [ (1/(√(2π)σ_i)) e^{−(x_i − μ_{i0})²/(2σ_i²)} / (1/(√(2π)σ_i)) e^{−(x_i − μ_{i1})²/(2σ_i²)} ]
    = −(x_i − μ_{i0})²/(2σ_i²) + (x_i − μ_{i1})²/(2σ_i²)
    = ((μ_{i0} − μ_{i1})/σ_i²) x_i + (μ_{i1}² − μ_{i0}²)/(2σ_i²)

  w_0 = ln((1 − θ)/θ) + Σ_i (μ_{i1}² − μ_{i0}²)/(2σ_i²)
  w_i = (μ_{i0} − μ_{i1})/σ_i²

Logistic regression update (batch, probabilistic):
  w = w + Σ_j [y*_j − p(y*_j | x_j, w)] f(x_j)
Perceptron update (online, error-driven):
  w = w + [y* − y(x; w)] f(x)
  w = w + y* f(x)
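The binary algorithm above can be sketched as a short training loop. The update rule is the slide's (w = w + y* f(x) on a mistake); the toy data set and feature encoding are my own.

```python
# Sketch of the binary perceptron: classify with current weights,
# and on a mistake update w = w + y* f(x).
def train_perceptron(data, n_features, passes=10):
    w = [0.0] * n_features          # start with zero weights
    for _ in range(passes):
        for f, y_star in data:      # y* in {+1, -1}
            act = sum(wi * fi for wi, fi in zip(w, f))
            y = 1 if act > 0 else -1
            if y != y_star:         # wrong: move weights toward the answer
                w = [wi + y_star * fi for wi, fi in zip(w, f)]
    return w

# Tiny separable example: features are [bias, x]; positive iff x >= 2.
data = [([1, 0], -1), ([1, 1], -1), ([1, 2], 1), ([1, 3], 1)]
w = train_perceptron(data, 2)
```

Because this toy set is separable, the loop stops changing the weights once every example is classified correctly, as the convergence property later in the lecture guarantees.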
Examples: Perceptron
Separable Case
http://isl.ira.uka.de/neuralNetCourse/2004/VL_11_5/Perceptron.html

Examples: Perceptron
Inseparable Case
http://isl.ira.uka.de/neuralNetCourse/2004/VL_11_5/Perceptron.html
From Logistic Regression to the Perceptron: 2 easy steps!
Logistic Regression (in vector notation; y is {0,1}):
  w = w + Σ_j [y*_j − p(y*_j | x_j, w)] f(x_j)
Perceptron (y is {0,1}; y(x; w) is the prediction given w):
  w = w + [y* − y(x; w)] f(x)
Differences?
  Drop the Σ_j over training examples: online vs. batch learning
  Drop the dist'n: probabilistic vs. error-driven learning
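The two steps can be seen side by side in code. This is a sketch in my own notation: the logistic-regression step sums a probabilistic error over all examples, while the perceptron step touches one example and uses the hard prediction.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lr_batch_step(w, data, eta=0.1):
    # Logistic regression: one BATCH gradient step; the error is
    # probabilistic, y* - p(y = 1 | x, w), summed over all examples j.
    grad = [0.0] * len(w)
    for f, y_star in data:                                  # y* in {0, 1}
        p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
        grad = [g + (y_star - p) * fi for g, fi in zip(grad, f)]
    return [wi + eta * g for wi, g in zip(w, grad)]

def perceptron_step(w, f, y_star):
    # Perceptron: ONLINE step on a single example; the error uses the
    # hard prediction y(x; w) in {0, 1} -- no probabilities at all.
    y = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
    return [wi + (y_star - y) * fi for wi, fi in zip(w, f)]
```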
Multiclass Decision Rule
If we have more than two classes:
  Have a weight vector for each class: w_y
  Calculate an activation for each class: w_y · f(x)
  Highest activation wins: y = argmax_y w_y · f(x)
Example
Three per-class weight vectors (values left blank on the slide):
  BIAS: __   win: __   game: __   vote: __   the: __   ...
  BIAS: __   win: __   game: __   vote: __   the: __   ...
  BIAS: __   win: __   game: __   vote: __   the: __   ...
Inputs: "win the vote", "win the election", "win the game"
Example
Per-class weight vectors:
  BIAS: -2   win: 4   game: 4   vote: 0   the: 0   ...
  BIAS:  1   win: 2   game: 0   vote: 4   the: 0   ...
  BIAS:  2   win: 0   game: 2   vote: 0   the: 0   ...
Input: "win the vote"
Feature vector f(x):
  BIAS:  1   win: 1   game: 0   vote: 1   the: 1   ...
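Working the example through: the weight values and feature vector are from the slide, but the slide does not name the three classes, so this sketch just indexes them 0–2.

```python
# Per-class weight vectors and the feature vector for "win the vote".
W = [
    {"BIAS": -2, "win": 4, "game": 4, "vote": 0, "the": 0},
    {"BIAS":  1, "win": 2, "game": 0, "vote": 4, "the": 0},
    {"BIAS":  2, "win": 0, "game": 2, "vote": 0, "the": 0},
]
f = {"BIAS": 1, "win": 1, "game": 0, "vote": 1, "the": 1}

# One activation per class; the highest activation wins.
scores = [sum(w[k] * v for k, v in f.items()) for w in W]
best = max(range(len(W)), key=lambda c: scores[c])
print(scores, best)  # [2, 7, 2] -> class 1 wins
```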
The Multiclass Perceptron Alg.
Start with zero weights
Iterate over training examples:
  Classify with current weights
  If correct, no change!
  If wrong: lower score of wrong answer, raise score of right answer
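The steps above can be sketched as follows; the update is the slide's (subtract f(x) from the wrong class's weights, add it to the right class's), while the list-based representation is my own.

```python
def multiclass_perceptron(data, n_classes, n_features, passes=10):
    # One weight vector per class, all starting at zero.
    W = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(passes):
        for f, y_star in data:
            # Classify: highest activation wins.
            scores = [sum(wi * fi for wi, fi in zip(w, f)) for w in W]
            y = scores.index(max(scores))
            if y != y_star:
                # Lower the wrong answer's score, raise the right answer's.
                W[y] = [wi - fi for wi, fi in zip(W[y], f)]
                W[y_star] = [wi + fi for wi, fi in zip(W[y_star], f)]
    return W
```

Note that with two classes this reduces to the binary perceptron: subtracting f(x) from one vector and adding it to the other is the same as a single w = w + y* f(x) update on their difference.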
Properties of Perceptrons
Separability: some parameters get the training set perfectly correct
Convergence: if the training set is separable, the perceptron will eventually converge (binary case)
Mistake Bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability
[Figures: a separable and a non-separable data set]
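The slide only says the mistake bound is "related to the margin"; the standard statement of that bound (Novikoff's theorem, not given on the slide) is:

```latex
% Assume every example satisfies \|f(x)\| \le R, and some unit vector w^*
% separates the data with margin
%   \gamma = \min_j \, y_j^* \,(w^* \cdot f(x_j)) > 0.
% Then the number of mistakes k the perceptron makes is bounded by
k \;\le\; \frac{R^2}{\gamma^2}
```

So the larger the margin (the more separable the data), the fewer mistakes, and hence the faster convergence.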
Problems with the Perceptron
Noise: if the data isn't separable, weights might thrash
  Averaging weight vectors over time can help (averaged perceptron)
Mediocre generalization: finds a barely separating solution
Overtraining: test / held-out accuracy usually rises, then falls
  Overtraining is a kind of overfitting
Linear Separators
Which of these linear separators is optimal?

Support Vector Machines
Maximizing the margin: good according to intuition, theory, practice
Support vector machines (SVMs) find the separator with max margin
[Figure: SVM max-margin separator]
Three Views of Classification
(more to come later in the course!)
Naive Bayes:
  Parameters from data statistics
  Parameters: probabilistic interpretation
  Training: one pass through the data
Logistic Regression:
  Parameters from gradient ascent
  Parameters: linear, probabilistic model, and discriminative
  Training: one pass through the data per gradient step, use validation to stop
The perceptron:
  Parameters from reactions to mistakes
  Parameters: discriminative interpretation
  Training: go through the data until held-out accuracy maxes out
[Diagram: data split into Training Data, Held-Out Data, Test Data]