Logistic Regression
Le Song
Machine Learning
CS 7641, CSE/ISYE 6740, Fall 2015
Classification
Represent the data as input-label pairs $(x^i, y^i)$, $i = 1, \dots, m$, with $x \in \mathbb{R}^n$ and label $y \in \{-1, +1\}$.

Classifier: a function $h(x)$ that predicts the label $y$ for a new input point $x$.
Bayes decision rule: compute the posterior

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$$

where $p(y)$ is the prior and $p(x)$ is the normalization constant.

If $p(y = 1 \mid x) > p(y = -1 \mid x)$, then $\hat{y} = 1$, otherwise $\hat{y} = -1$.

Alternatively: if the ratio

$$\frac{p(x \mid y = 1)\, p(y = 1)}{p(x \mid y = -1)\, p(y = -1)} > 1,$$

then $\hat{y} = 1$, otherwise $\hat{y} = -1$.
Equivalently, compare the log of the ratio to zero:

$$f(x) = \ln \frac{p(x \mid y = 1)\, p(y = 1)}{p(x \mid y = -1)\, p(y = -1)}$$

If $f(x) > 0$, then $\hat{y} = 1$, otherwise $\hat{y} = -1$.

The form of $f(x)$ depends on the class-conditional model: e.g. $p(x \mid y = 1)$ is Gaussian, or $p(x \mid y)$ is fully factorized (the Naive Bayes assumption), with per-class parameters $(\mu, \sigma^2)$.

Instead of modeling $f(x)$ through the class conditionals, discriminative methods estimate the decision function directly:

Logistic regression
Neural networks

When $p(x \mid y)$ is fully factorized Gaussian with shared variances, $f(x)$ is linear in $x$: the first term of the expansion supplies the coefficients, the second the constant offset.
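To make the generative rule concrete, here is a minimal sketch (not from the slides; function and variable names are illustrative) of a fully factorized Gaussian classifier that predicts by the sign of the log-odds:

import numpy as np

def fit_gnb(X, y):
    # Estimate the prior, per-dimension means, and variances for each class.
    params = {}
    for c in (-1, 1):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),       # prior p(y = c)
                     Xc.mean(axis=0),        # means of the factorized Gaussians
                     Xc.var(axis=0) + 1e-9)  # variances (jittered for stability)
    return params

def log_joint(x, prior, mu, var):
    # ln p(x | y) p(y) under a fully factorized Gaussian class-conditional.
    return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def predict_gnb(params, x):
    # Predict by the sign of the log-odds f(x).
    f = log_joint(x, *params[1]) - log_joint(x, *params[-1])
    return 1 if f > 0 else -1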
K-nearest neighbors
k-nearest neighbor classifier: assign a label by taking a majority vote over the $k$ training points closest to $x$.

For $k > 1$, the k-nearest neighbor rule generalizes the nearest neighbor rule.

To define this more mathematically: let $N_k(x)$ be the set of indices of the $k$ training points closest to $x$. The label is assigned as:

$$\hat{y} = \operatorname{sign}\left( \sum_{i \in N_k(x)} y^i \right)$$
[Figures: k-NN decision boundaries on the same data set for K = 1, 3, 5, 25, 51, and 101.]
Computations in K-NN
Similar to KDE, there is essentially no training or learning phase; all computation happens when applying the classifier.

Memory: $O(mn)$ to store the $m$ training points in $n$ dimensions.
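A minimal k-NN sketch under the same setup (illustrative names; Euclidean distance assumed):

import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # No training phase: the classifier just stores the data (O(mn) memory).
    dists = np.linalg.norm(X_train - x, axis=1)  # distance from x to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    return np.sign(np.sum(y_train[nearest]))     # majority vote for labels in {-1, +1}

With labels in $\{-1, +1\}$ and odd $k$, the sign of the sum is exactly the majority vote.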
Discriminative classifier
Directly estimate the decision boundary $h(x)$ or the posterior distribution $p(y \mid x)$.

The log-odds $\ln \frac{p(y = 1 \mid x)}{p(y = -1 \mid x)}$ is a function of $x$, and we let it take a linear form $w^\top x$. Solving for the posterior gives the logistic (sigmoid) function:

$$p(y = 1 \mid x, w) = \frac{1}{1 + \exp(-w^\top x)}$$
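As a quick sanity check, the logistic function is one line of (illustrative) code; it maps any value of $w^\top x$ into $(0, 1)$:

import numpy as np

def sigmoid(z):
    # p(y = 1 | x, w) with z = w^T x; the output always lies in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))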
Maximum likelihood estimation

Given training data $\{(x^i, y^i)\}_{i=1}^m$ with labels $y^i \in \{0, 1\}$, maximize the conditional log-likelihood

$$l(w) = \sum_{i=1}^m \log p(y^i \mid x^i, w)$$

Plug in $p(y = 1 \mid x, w) = \dfrac{1}{1 + \exp(-w^\top x)}$:

$$l(w) = \sum_{i=1}^m \left[ y^i\, w^\top x^i - \log\left(1 + \exp(w^\top x^i)\right) \right]$$
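A direct transcription of this log-likelihood (a sketch, assuming labels in $\{0, 1\}$; np.logaddexp gives a numerically stable log(1 + exp(z))):

import numpy as np

def log_likelihood(w, X, y):
    # l(w) = sum_i [ y_i w^T x_i - log(1 + exp(w^T x_i)) ], labels y_i in {0, 1}.
    z = X @ w
    return np.sum(y * z - np.logaddexp(0.0, z))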
The gradient of $l(w)$

$$\frac{\partial l(w)}{\partial w} = \frac{\partial}{\partial w} \sum_{i=1}^m \left[ y^i\, w^\top x^i - \log\left(1 + \exp(w^\top x^i)\right) \right] = \sum_{i=1}^m \left( y^i - \frac{\exp(w^\top x^i)}{1 + \exp(w^\top x^i)} \right) x^i$$

Since $\frac{\exp(w^\top x^i)}{1 + \exp(w^\top x^i)} = p(y = 1 \mid x^i, w)$, each term is the prediction error for point $i$ times its input $x^i$.
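The same gradient in code (same $\{0, 1\}$ label convention; a sketch, not the course's reference implementation):

import numpy as np

def gradient(w, X, y):
    # sum_i (y_i - p(y=1 | x_i, w)) x_i: prediction error times input.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (y - p)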
Gradient descent/ascent
One way to solve an unconstrained optimization problem is gradient descent.

Given an initial guess, we iteratively refine the guess by taking the direction of the negative gradient.

Think about going down a hill by taking the steepest direction at each step.
Update rule:

$$x^{t+1} = x^t - \eta_t\, \nabla f(x^t)$$

$\eta_t$ is called the step size or learning rate.
Gradient ascent for logistic regression: to maximize $l(w)$, step along the positive gradient:

$$w^{t+1} = w^t + \eta_t \sum_{i=1}^m \left( y^i - \frac{\exp\left((w^t)^\top x^i\right)}{1 + \exp\left((w^t)^\top x^i\right)} \right) x^i$$
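Putting the update rule together, a sketch of full-batch gradient ascent; the step size eta and the iteration count are arbitrary illustrative choices:

import numpy as np

def fit_logistic(X, y, eta=0.1, iters=1000):
    # Full-batch gradient ascent on the log-likelihood l(w).
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # p(y = 1 | x_i, w) for every point
        w = w + eta * (X.T @ (y - p))       # w^{t+1} = w^t + eta_t * gradient
    return w

To include the intercept $w_0$, prepend a column of ones to X.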
Number of parameters

For input $x = (x_1, \dots, x_n)$ and label $y \in \{-1, +1\}$:

Naive Bayes: $2n + 1$ parameters when all random variables are binary; $4n + 1$ for Gaussians: $2n$ means, $2n$ variances, and 1 for the prior.

Logistic regression: $n + 1$ parameters: $w_0, w_1, w_2, \dots, w_n$.
Multiclass logistic regression

For $K$ classes, model the posterior with the softmax function:

$$p(y = c \mid x, W) = \pi_c(x) = \frac{\exp(w_c^\top x)}{\sum_{c'=1}^{K} \exp(w_{c'}^\top x)}$$

With each label $y^i$ encoded as a one-of-$K$ indicator $(y_1^i, \dots, y_K^i)$, the conditional log-likelihood is

$$l(W) = \sum_{i=1}^m \sum_{c=1}^K y_c^i \log \pi_c(x^i)$$
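A matching softmax sketch (assuming the rows of W are the per-class weight vectors $w_c$; subtracting the max is a standard overflow guard, not from the slides):

import numpy as np

def softmax_posterior(W, x):
    # pi_c(x) = exp(w_c^T x) / sum_c' exp(w_c'^T x), one row of W per class.
    z = W @ x
    z = z - z.max()   # softmax is shift-invariant; this avoids overflow
    e = np.exp(z)
    return e / e.sum()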