[Figure: a test example placed among Professional documents and Culture documents. What class should it be assigned? Panels show the decision using its 1, 2, 3, and 8 nearest neighbors.]
Controlling COMPLEXITY in k-NN
Ingredient   Sweetness   Crunchiness   Food type
apple        10          9             fruit
bacon        1           4             protein
banana       10          1             fruit
carrot       7           10            vegetable
celery       3           10            vegetable
cheese       1           1             protein
Measuring similarity with distance
Locating the tomato's nearest neighbors requires a distance function: a formula that measures the similarity between two instances.
There are many different ways to calculate distance. Traditionally, the k-NN
algorithm uses Euclidean distance, which is the distance one would measure
if it were possible to use a ruler to connect two points, illustrated in the previous
figure by the dotted lines connecting the tomato to its neighbors.
Euclidean distance
Euclidean distance is specified by the following formula, where p and q are the examples to be compared, each having n features. The term p1 refers to the value of the first feature of example p, while q1 refers to the value of the first feature of example q:

$$\mathrm{dist}(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}$$
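To make the formula concrete, here is a minimal Python sketch that computes Euclidean distance over the sweetness/crunchiness table above and classifies a new ingredient by majority vote among its k nearest neighbors. The tomato's feature values (sweetness 6, crunchiness 4) are assumed for illustration; they are not given in the table.

```python
import math

# (sweetness, crunchiness) -> food type, from the table above
foods = {
    "apple":  ((10, 9),  "fruit"),
    "bacon":  ((1, 4),   "protein"),
    "banana": ((10, 1),  "fruit"),
    "carrot": ((7, 10),  "vegetable"),
    "celery": ((3, 10),  "vegetable"),
    "cheese": ((1, 1),   "protein"),
}

def euclidean(p, q):
    # dist(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(x, k):
    # Take the k labelled examples closest to x and return the majority label.
    nearest = sorted(foods.values(), key=lambda fv: euclidean(x, fv[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Tomato feature values are assumed here purely for illustration.
print(knn_classify((6, 4), k=3))
```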
Applications of k-NN
Note: The probabilities of all possible outcomes of a trial must always sum to 1.
Understanding probability (cont.)
For example, given the value P(spam) = 0.20, we can calculate
P(ham) = 1 – 0.20 = 0.80
[Figure: of all e-mails, 20% are spam and 80% are ham; 5% contain the word Lottery.]
Understanding joint probability
[Venn diagram: Lottery appearing in spam; Lottery appearing in ham; Lottery without appearing in spam.]
Estimate the probability that both spam and Lottery occur, which can be written as P(spam ∩ Lottery). The notation A ∩ B refers to the event in which both A and B occur.
Calculating P(spam ∩ Lottery) depends on the joint probability of the two events, that is, on how the probability of one event is related to the probability of the other.
If the two events are totally unrelated, they are called independent events. Because 20 percent of all messages are spam and 5 percent of all e-mails contain the word Lottery, we could assume that 1 percent of all messages are spam containing the term Lottery: P(spam ∩ Lottery) = 0.20 × 0.05 = 0.01.
In general, for independent events:

$$P(X_1, X_2) = P(X_1)\,P(X_2)$$

Class conditional independence assumption: the features need not be independent overall,

$$P(X_1, X_2) \neq P(X_1)\,P(X_2)$$

but they are assumed to be independent given the class:

$$P(X_1, X_2 \mid C) = P(X_1 \mid C)\,P(X_2 \mid C)$$
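As a quick numeric check of the independence product from the spam example above (a minimal sketch using the slide's values):

```python
p_spam = 0.20     # P(spam)
p_lottery = 0.05  # P(Lottery)

# For independent events, the joint probability is the product of the marginals.
p_joint = p_spam * p_lottery
print(p_joint)  # 0.01, i.e. 1% of all messages are spam containing "Lottery"
```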
Naïve Bayes Classifier
Conditional Independence among variables given Classes!
$$P(C \mid X_1, X_2, \ldots, X_D) = \frac{P(C)\,P(X_1, X_2, \ldots, X_D \mid C)}{\sum_{C'} P(C')\,P(X_1, X_2, \ldots, X_D \mid C')} = \frac{P(C)\prod_{d=1}^{D} P(X_d \mid C)}{\sum_{C'} P(C')\prod_{d=1}^{D} P(X_d \mid C')}$$
Simplifying assumption
$$\log P(C \mid X_1, X_2, \ldots, X_D) \propto \log P(C) + \sum_{d=1}^{D} \log P(X_d \mid C)$$
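Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow. A minimal sketch of this log-posterior score (function name and inputs are illustrative, not from the slides):

```python
import math

def log_posterior_score(prior, likelihoods):
    """Unnormalised log P(C | X_1, ..., X_D):
    log P(C) plus the sum over d of log P(X_d | C)."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

# Pick the class with the highest score; the normalising constant cancels.
```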
Naïve Bayes Classifier for Categorical-Valued Variables
Let’s Naïve Bayes!
$$\log P(C \mid X_1, X_2, \ldots, X_D) \propto \log P(C) + \sum_{d=1}^{D} \log P(X_d \mid C)$$
Class prior parameters:

P(Like = Y) = ???
P(Like = N) = ???

Class-conditional likelihoods:

P(Color = Red | Like = Y) = ???
P(Color = Red | Like = N) = ???
...
P(Shape = Triangle | Like = N) = ???

#EXMPLS   COLOR   SHAPE      LIKE
20        Red     Square     Y
10        Red     Circle     Y
10        Red     Triangle   N
10        Green   Square     N
5         Green   Circle     Y
5         Green   Triangle   N
10        Blue    Square     N
10        Blue    Circle     N
20        Blue    Triangle   Y
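A sketch of how those parameters can be estimated from the count table by maximum likelihood; the fractions, e.g. P(Like = Y) = 55/100, follow directly from the counts, and the variable names are illustrative:

```python
from collections import defaultdict

# (count, color, shape, like) rows from the table above
rows = [
    (20, "Red", "Square", "Y"), (10, "Red", "Circle", "Y"),
    (10, "Red", "Triangle", "N"), (10, "Green", "Square", "N"),
    (5, "Green", "Circle", "Y"), (5, "Green", "Triangle", "N"),
    (10, "Blue", "Square", "N"), (10, "Blue", "Circle", "N"),
    (20, "Blue", "Triangle", "Y"),
]

total = sum(n for n, *_ in rows)   # 100 examples in all
class_count = defaultdict(int)     # like -> count
color_count = defaultdict(int)     # (color, like) -> count
shape_count = defaultdict(int)     # (shape, like) -> count
for n, color, shape, like in rows:
    class_count[like] += n
    color_count[(color, like)] += n
    shape_count[(shape, like)] += n

# Class priors: P(Like = Y) = 55/100 = 0.55, P(Like = N) = 45/100 = 0.45
prior = {c: class_count[c] / total for c in class_count}

# Class-conditional likelihoods, e.g. P(Color = Red | Like = Y) = 30/55
def p_color(color, like):
    return color_count[(color, like)] / class_count[like]

def p_shape(shape, like):
    return shape_count[(shape, like)] / class_count[like]

print(prior["Y"], p_color("Red", "Y"), p_shape("Triangle", "N"))
```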
Naïve Bayes Classifier for Text Classification
Text Classification Example
[Figure: example Promotions e-mails scored by the classifier, e.g. P(spam | doc2) = 0.94.]
Bag-of-Words representation:
Ignore the sequential order of words
Represent each document as a weighted set: the term frequency of each term
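A minimal bag-of-words sketch; the tokeniser here is an assumption, and the slide's counts for doc1 imply that "shirts" has already been normalised to "shirt":

```python
from collections import Counter
import re

def bag_of_words(text):
    # Lowercase, split into word tokens, and count term frequencies.
    # Sequential order is discarded; only the weighted set of terms remains.
    return Counter(re.findall(r"[a-z]+", text.lower()))

print(bag_of_words("buy two shirt get one shirt half off"))
# Counter({'shirt': 2, 'buy': 1, 'two': 1, 'get': 1, 'one': 1, 'half': 1, 'off': 1})
```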
Naïve Bayes Classifier with BoW
$$P(\textit{doc}_1 \mid \text{promo}) = P(\text{buy}{:}1,\ \text{two}{:}1,\ \text{shirt}{:}2,\ \text{get}{:}1,\ \text{one}{:}1,\ \text{half}{:}1,\ \text{off}{:}1 \mid \text{promo})$$

The count for a word such as "free" in the promo class is the sum of its term frequencies over all promo documents $doc_1, doc_2, doc_3, \ldots, doc_n$:

$$N(\text{free}, \text{promo}) = \sum_{doc \,\in\, \text{promo}} \mathrm{tf}(\text{free}, doc)$$
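A sketch of a multinomial Naïve Bayes trainer over such term-frequency bags. Laplace (add-alpha) smoothing is an assumption added here so unseen words do not zero out a class; the slides only define the raw counts N(w, class):

```python
import math
from collections import Counter

def train_multinomial_nb(docs_by_class, alpha=1.0):
    """docs_by_class maps class -> list of term-frequency Counters.
    Returns log-priors and smoothed log-likelihoods log P(w | class)."""
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
    log_prior, log_lik = {}, {}
    for c, docs in docs_by_class.items():
        log_prior[c] = math.log(len(docs) / total_docs)
        counts = Counter()
        for d in docs:
            counts.update(d)   # N(w, c): sum of tf(w, doc) over docs in class c
        denom = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik, vocab

def classify(bag, log_prior, log_lik, vocab):
    # Score each class as log P(C) + sum_w tf(w) * log P(w | C); pick the best.
    scores = {c: log_prior[c] + sum(tf * log_lik[c][w]
                                    for w, tf in bag.items() if w in vocab)
              for c in log_prior}
    return max(scores, key=scores.get)
```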
Bayesian Classifier
Multivariate real-valued data
Bayes Rule
$$P(c \mid \mathbf{x}) = \frac{P(c)\,P(\mathbf{x} \mid c)}{P(\mathbf{x})}, \qquad \mathbf{x} \in \mathbb{R}^D$$
Simple Bayesian Classifier
$$P(c \mid \mathbf{x}) = \frac{P(c)\,P(\mathbf{x} \mid c)}{P(\mathbf{x})}$$

$$P(\mathbf{x} \mid c) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_c, \Sigma_c) = \frac{1}{(2\pi)^{D/2}\,|\Sigma_c|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_c)^{T}\,\Sigma_c^{-1}\,(\mathbf{x} - \boldsymbol{\mu}_c)\right)$$
Sum: $\displaystyle\int_{\mathbf{x} \in \mathbb{R}^D} P(\mathbf{x} \mid c)\, d\mathbf{x} = 1$

Mean: $\displaystyle\int_{\mathbf{x} \in \mathbb{R}^D} \mathbf{x}\, P(\mathbf{x} \mid c)\, d\mathbf{x} = \boldsymbol{\mu}_c$

Covariance: $\displaystyle\int_{\mathbf{x} \in \mathbb{R}^D} (\mathbf{x} - \boldsymbol{\mu}_c)(\mathbf{x} - \boldsymbol{\mu}_c)^{T}\, P(\mathbf{x} \mid c)\, d\mathbf{x} = \Sigma_c$
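A compact NumPy sketch of this Gaussian Bayes classifier: fit a per-class prior, mean, and covariance, then score a point with the log of the density above. It assumes each class has enough samples for an invertible covariance matrix:

```python
import numpy as np

class GaussianBayes:
    """One multivariate Gaussian N(mu_c, Sigma_c) per class, combined by Bayes' rule."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior, self.mu, self.sigma = {}, {}, {}
        for c in self.classes:
            Xc = X[y == c]
            self.prior[c] = len(Xc) / len(X)           # P(c)
            self.mu[c] = Xc.mean(axis=0)               # mu_c
            self.sigma[c] = np.cov(Xc, rowvar=False)   # Sigma_c
        return self

    def _log_density(self, x, c):
        # log N(x | mu_c, Sigma_c), from the density formula above
        D = len(x)
        diff = x - self.mu[c]
        _, logdet = np.linalg.slogdet(self.sigma[c])
        return -0.5 * (D * np.log(2 * np.pi) + logdet
                       + diff @ np.linalg.inv(self.sigma[c]) @ diff)

    def predict_one(self, x):
        # argmax_c log P(c) + log P(x | c); P(x) is the same for every class
        scores = {c: np.log(self.prior[c]) + self._log_density(x, c)
                  for c in self.classes}
        return max(scores, key=scores.get)
```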
Controlling COMPLEXITY