CSL346: Perceptron

Winter 2012
Luke Zettlemoyer

Slides adapted from Dan Klein
Who needs probabilities?
Previously: model data with distributions
  Joint: P(X,Y)
    e.g. Naive Bayes
  Conditional: P(Y|X)
    e.g. Logistic Regression
But wait, why probabilities?
Let's try to be error-driven!
mpg   cylinders  displacement  horsepower  weight  acceleration  model year  maker
good  4          97            75          2265    18.2          77          asia
bad   6          199           90          2648    15            70          america
bad   4          121           110         2600    12.8          77          europe
bad   8          350           175         4100    13            73          america
bad   6          198           95          3102    16.5          74          america
bad   4          108           94          2379    16.5          73          asia
bad   4          113           95          2228    14            71          asia
bad   8          302           139         3570    12.8          78          america
:     :          :             :           :       :             :           :
good  4          120           79          2625    18.6          82          america
bad   8          455           225         4425    10            70          america
good  4          107           86          2464    15.5          76          europe
bad   5          131           103         2830    15.9          78          europe
Generative vs. Discriminative
Generative classifiers:
  E.g. Naive Bayes
  A joint probability model with evidence variables
  Query model for causes given evidence
Discriminative classifiers:
  No generative model, no Bayes rule, often no probabilities at all!
  Try to predict the label Y directly from X
  Robust, accurate with varied features
  Loosely: mistake driven rather than model driven
Linear Classifiers
Inputs are feature values
Each feature has a weight
Sum is the activation:
  activation_w(x) = Σ_i w_i · f_i(x)
If the activation is:
  Positive, output class +1
  Negative, output class -1
[Diagram: inputs f_1, f_2, f_3 with weights w_1, w_2, w_3, feeding a sum Σ and a threshold test >0?]
Example: Spam
Imagine 3 features (spam is the positive class):
  free (number of occurrences of "free")
  money (occurrences of "money")
  BIAS (intercept, always has value 1)

Weights w:       Features f(x) for "free money":
  BIAS  : -3       BIAS  : 1
  free  :  4       free  : 1
  money :  2       money : 1
  ...              ...

w · f(x) > 0  →  SPAM!!!
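The decision above can be sketched in a few lines. The weight values are taken from the slide; the dict-based feature encoding and function names are my own, for illustration.

```python
# A minimal sketch of the slide's spam example: activation = w · f(x).
w = {"BIAS": -3, "free": 4, "money": 2}

def features(text):
    # BIAS always has value 1; word features count occurrences.
    f = {"BIAS": 1}
    for word in ("free", "money"):
        f[word] = text.split().count(word)
    return f

def activation(weights, f):
    return sum(weights.get(k, 0) * v for k, v in f.items())

a = activation(w, features("free money"))
print(a, "SPAM" if a > 0 else "HAM")  # -3 + 4 + 2 = 3 > 0 -> SPAM
```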
Binary Decision Rule
In the space of feature vectors:
  Examples are points
  Any weight vector is a hyperplane
  One side corresponds to Y = +1
  The other corresponds to Y = -1

  BIAS  : -3
  free  :  4
  money :  2
  ...

[Plot: decision boundary in the (free, money) plane; +1 = SPAM on one side, -1 = HAM on the other]
Binary Perceptron Algorithm
Start with zero weights
For each training instance (x, y*):
  Classify with current weights:  y = +1 if w · f(x) > 0, else -1
  If correct (i.e., y = y*), no change!
  If wrong: update  w = w + y* f(x)
(Recall, from the Gaussian Naive Bayes derivation of logistic regression:)

  ln [ (1/(√(2π)σ_i)) e^{−(x_i − μ_{i0})²/(2σ_i²)} / (1/(√(2π)σ_i)) e^{−(x_i − μ_{i1})²/(2σ_i²)} ]
    = −(x_i − μ_{i0})²/(2σ_i²) + (x_i − μ_{i1})²/(2σ_i²)
    = ((μ_{i0} − μ_{i1})/σ_i²) x_i + (μ_{i1}² − μ_{i0}²)/(2σ_i²)

  w_0 = ln((1 − θ)/θ) + Σ_i (μ_{i1}² − μ_{i0}²)/(2σ_i²)
  w_i = (μ_{i0} − μ_{i1})/σ_i²

Logistic regression update (batch, probabilistic):
  w = w + Σ_j [y*_j − p(y*_j | x_j, w)] f(x_j)
Perceptron update (online, error-driven):
  w = w + [y* − y(x; w)] f(x)
  w = w + y* f(x)
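The binary algorithm above can be sketched as a short training loop. The update rule is the slide's (w = w + y* f(x) on a mistake); the toy data set and feature encoding are my own.

```python
# Sketch of the binary perceptron: classify with current weights,
# and on a mistake update w = w + y* f(x).
def train_perceptron(data, n_features, passes=10):
    w = [0.0] * n_features          # start with zero weights
    for _ in range(passes):
        for f, y_star in data:      # y* in {+1, -1}
            act = sum(wi * fi for wi, fi in zip(w, f))
            y = 1 if act > 0 else -1
            if y != y_star:         # wrong: move weights toward the answer
                w = [wi + y_star * fi for wi, fi in zip(w, f)]
    return w

# Tiny separable example: features are [bias, x]; positive iff x >= 2.
data = [([1, 0], -1), ([1, 1], -1), ([1, 2], 1), ([1, 3], 1)]
w = train_perceptron(data, 2)
```

Because this toy set is separable, the loop stops changing the weights once every example is classified correctly, as the convergence property later in the lecture guarantees.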
Examples: Perceptron
Separable Case
http://isl.ira.uka.de/neuralNetCourse/2004/VL_11_5/Perceptron.html

Examples: Perceptron
Inseparable Case
http://isl.ira.uka.de/neuralNetCourse/2004/VL_11_5/Perceptron.html
From Logistic Regression to the Perceptron: 2 easy steps!
Logistic Regression (in vector notation; y is {0,1}):
  w = w + Σ_j [y*_j − p(y*_j | x_j, w)] f(x_j)
Perceptron (y is {0,1}; y(x; w) is the prediction given w):
  w = w + [y* − y(x; w)] f(x)
Differences?
  Drop the Σ_j over training examples: online vs. batch learning
  Drop the dist'n: probabilistic vs. error-driven learning
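The two steps can be seen side by side in code. This is a sketch in my own notation: the logistic-regression step sums a probabilistic error over all examples, while the perceptron step touches one example and uses the hard prediction.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lr_batch_step(w, data, eta=0.1):
    # Logistic regression: one BATCH gradient step; the error is
    # probabilistic, y* - p(y = 1 | x, w), summed over all examples j.
    grad = [0.0] * len(w)
    for f, y_star in data:                                  # y* in {0, 1}
        p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
        grad = [g + (y_star - p) * fi for g, fi in zip(grad, f)]
    return [wi + eta * g for wi, g in zip(w, grad)]

def perceptron_step(w, f, y_star):
    # Perceptron: ONLINE step on a single example; the error uses the
    # hard prediction y(x; w) in {0, 1} -- no probabilities at all.
    y = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
    return [wi + (y_star - y) * fi for wi, fi in zip(w, f)]
```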
Multiclass Decision Rule
If we have more than two classes:
  Have a weight vector for each class: w_y
  Calculate an activation for each class: w_y · f(x)
  Highest activation wins: y = argmax_y w_y · f(x)
Example
Three per-class weight vectors (values left blank on the slide):
  BIAS: __   win: __   game: __   vote: __   the: __   ...
  BIAS: __   win: __   game: __   vote: __   the: __   ...
  BIAS: __   win: __   game: __   vote: __   the: __   ...
Inputs: "win the vote", "win the election", "win the game"
Example
Per-class weight vectors:
  BIAS: -2   win: 4   game: 4   vote: 0   the: 0   ...
  BIAS:  1   win: 2   game: 0   vote: 4   the: 0   ...
  BIAS:  2   win: 0   game: 2   vote: 0   the: 0   ...
Input: "win the vote"
Feature vector f(x):
  BIAS:  1   win: 1   game: 0   vote: 1   the: 1   ...
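Working the example through: the weight values and feature vector are from the slide, but the slide does not name the three classes, so this sketch just indexes them 0–2.

```python
# Per-class weight vectors and the feature vector for "win the vote".
W = [
    {"BIAS": -2, "win": 4, "game": 4, "vote": 0, "the": 0},
    {"BIAS":  1, "win": 2, "game": 0, "vote": 4, "the": 0},
    {"BIAS":  2, "win": 0, "game": 2, "vote": 0, "the": 0},
]
f = {"BIAS": 1, "win": 1, "game": 0, "vote": 1, "the": 1}

# One activation per class; the highest activation wins.
scores = [sum(w[k] * v for k, v in f.items()) for w in W]
best = max(range(len(W)), key=lambda c: scores[c])
print(scores, best)  # [2, 7, 2] -> class 1 wins
```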
The Multiclass Perceptron Alg.
Start with zero weights
Iterate over training examples:
  Classify with current weights
  If correct, no change!
  If wrong: lower score of wrong answer, raise score of right answer
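The steps above can be sketched as follows; the update is the slide's (subtract f(x) from the wrong class's weights, add it to the right class's), while the list-based representation is my own.

```python
def multiclass_perceptron(data, n_classes, n_features, passes=10):
    # One weight vector per class, all starting at zero.
    W = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(passes):
        for f, y_star in data:
            # Classify: highest activation wins.
            scores = [sum(wi * fi for wi, fi in zip(w, f)) for w in W]
            y = scores.index(max(scores))
            if y != y_star:
                # Lower the wrong answer's score, raise the right answer's.
                W[y] = [wi - fi for wi, fi in zip(W[y], f)]
                W[y_star] = [wi + fi for wi, fi in zip(W[y_star], f)]
    return W
```

Note that with two classes this reduces to the binary perceptron: subtracting f(x) from one vector and adding it to the other is the same as a single w = w + y* f(x) update on their difference.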
Properties of Perceptrons
Separability: some parameters get the training set perfectly correct
Convergence: if the training set is separable, the perceptron will eventually converge (binary case)
Mistake Bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability
[Figures: a separable and a non-separable data set]
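The slide only says the mistake bound is "related to the margin"; the standard statement of that bound (Novikoff's theorem, not given on the slide) is:

```latex
% Assume every example satisfies \|f(x)\| \le R, and some unit vector w^*
% separates the data with margin
%   \gamma = \min_j \, y_j^* \,(w^* \cdot f(x_j)) > 0.
% Then the number of mistakes k the perceptron makes is bounded by
k \;\le\; \frac{R^2}{\gamma^2}
```

So the larger the margin (the more separable the data), the fewer mistakes, and hence the faster convergence.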
Problems with the Perceptron
Noise: if the data isn't separable, weights might thrash
  Averaging weight vectors over time can help (averaged perceptron)
Mediocre generalization: finds a barely separating solution
Overtraining: test / held-out accuracy usually rises, then falls
  Overtraining is a kind of overfitting
Linear Separators
Which of these linear separators is optimal?

Support Vector Machines
Maximizing the margin: good according to intuition, theory, practice
Support vector machines (SVMs) find the separator with max margin
[Figure: SVM max-margin separator]
Three Views of Classification
(more to come later in the course!)
Naive Bayes:
  Parameters from data statistics
  Parameters: probabilistic interpretation
  Training: one pass through the data
Logistic Regression:
  Parameters from gradient ascent
  Parameters: linear, probabilistic model, and discriminative
  Training: one pass through the data per gradient step, use validation to stop
The perceptron:
  Parameters from reactions to mistakes
  Parameters: discriminative interpretation
  Training: go through the data until held-out accuracy maxes out
[Diagram: data split into Training Data, Held-Out Data, Test Data]