Classification
Prepared by Raymond Wong (raywong@cse). The examples used in Decision Tree are borrowed from L.W. Chan's notes.
COMP537
Classification

Suppose there is a person:

  Race   Income  Child  Insurance
  white  high    no     ?

Decision tree:

  root
    child = yes:  100% Yes, 0% No
    child = no:
      Income = high:  100% Yes, 0% No
      Income = low:   0% Yes, 100% No
The record of the new person (whose Insurance label is unknown) is the test set; the table of past records from which the decision tree is built is the training set.
Applications

Insurance
Marketing
Bank Loan
Network
Software

Same/Difference

Classification vs. Clustering (classification is supervised: the training records carry class labels; clustering is unsupervised)

Classification Methods

Decision Trees
Entropy

Example 1: Consider a random variable with a uniform distribution over 32 outcomes. To identify an outcome, we need a label that can take 32 different values, so 5-bit strings suffice as labels (log2 32 = 5).
Entropy

Entropy is used to measure how informative a node is. Given a probability distribution P = (p1, p2, ..., pn), the information conveyed by this distribution, also called the entropy of P, is:

  I(P) = -(p1 log p1 + p2 log p2 + ... + pn log pn)

All logarithms here are in base 2.
For example:

  If P is (0.5, 0.5), then I(P) = 1.
  If P is (0.67, 0.33), then I(P) = 0.92.
  If P is (1, 0), then I(P) = 0.

Entropy measures the amount of uncertainty: the smaller the entropy, the purer, and hence the more informative, the node.
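As a quick check, the entropy formula and the three examples above can be computed in a few lines of Python (a sketch; `entropy` is a helper name introduced here, not a library function):

```python
import math

def entropy(probs):
    """I(P) = -sum p * log2(p), with 0 * log 0 taken as 0."""
    s = -sum(p * math.log2(p) for p in probs if p > 0)
    return s + 0.0  # avoids returning -0.0 for pure distributions

print(entropy([0.5, 0.5]))            # 1.0
print(round(entropy([2/3, 1/3]), 4))  # 0.9183 (the slide's 0.92)
print(entropy([1.0, 0.0]))            # 0.0
```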
Entropy

Training set T:

     Race   Income  Child  Insurance
  1  black  high    no     yes
  2  white  high    yes    yes
  3  white  low     yes    yes
  4  white  low     yes    yes
  5  black  low     no     no
  6  black  low     no     no
  7  black  low     no     no
  8  white  low     no     no

Insurance: 4 Yes, 4 No.

Info(T) = -1/2 log 1/2 - 1/2 log 1/2 = 1
For attribute Race:
  Info(T_black) = -1/4 log 1/4 - 3/4 log 3/4 = 0.8113
  Info(T_white) = -3/4 log 3/4 - 1/4 log 1/4 = 0.8113
  Info(Race, T) = 4/8 x 0.8113 + 4/8 x 0.8113 = 0.8113
  Gain(Race, T) = Info(T) - Info(Race, T) = 1 - 0.8113 = 0.1887

For attribute Income:
  Info(T_high) = -1 log 1 - 0 log 0 = 0
  Info(T_low)  = -1/3 log 1/3 - 2/3 log 2/3 = 0.9183
  Info(Income, T) = 2/8 x 0 + 6/8 x 0.9183 = 0.6887
  Gain(Income, T) = Info(T) - Info(Income, T) = 1 - 0.6887 = 0.3113
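The root-level gain computations can be verified with a short script over the 8-record training set from the slides (a sketch; `info` and `gain` are helper names introduced here):

```python
import math
from collections import Counter

# Training set from the slides: (Race, Income, Child, Insurance).
DATA = [
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]
ATTRS = {"Race": 0, "Income": 1, "Child": 2}

def info(rows):
    """Info(T): entropy of the Insurance labels in rows (base-2 logs)."""
    counts = Counter(r[3] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows))
                for c in counts.values())

def gain(attr, rows):
    """Gain(A, T) = Info(T) - sum over values v of |T_v|/|T| * Info(T_v)."""
    i = ATTRS[attr]
    weighted = 0.0
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        weighted += len(subset) / len(rows) * info(subset)
    return info(rows) - weighted

for a in ATTRS:
    print(a, round(gain(a, DATA), 4))
# Race 0.1887, Income 0.3113, Child 0.5488 -> Child gives the largest gain.
```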
For attribute Child:
  Info(T_yes) = -1 log 1 - 0 log 0 = 0
  Info(T_no)  = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219
  Info(Child, T) = 3/8 x 0 + 5/8 x 0.7219 = 0.4512
  Gain(Child, T) = Info(T) - Info(Child, T) = 1 - 0.4512 = 0.5488

Gain(Child, T) is the largest, so Child is chosen for the root:
  child = yes:  100% Yes, 0% No
  child = no:   {1, 5, 6, 7, 8}, 20% Yes, 80% No

We recurse on the child = no subset T' = {1, 5, 6, 7, 8} (Insurance: 1 Yes, 4 No):

Info(T') = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Race:
  Info(T'_black) = -1/4 log 1/4 - 3/4 log 3/4 = 0.8113
  Info(T'_white) = 0
  Info(Race, T') = 4/5 x 0.8113 + 1/5 x 0 = 0.6490
  Gain(Race, T') = 0.7219 - 0.6490 = 0.0729
For attribute Income (on T' = {1, 5, 6, 7, 8}):
  Info(T'_high) = -1 log 1 - 0 log 0 = 0
  Info(T'_low)  = -0 log 0 - 1 log 1 = 0
  Info(Income, T') = 1/5 x 0 + 4/5 x 0 = 0
  Gain(Income, T') = Info(T') - Info(Income, T') = 0.7219 - 0 = 0.7219

Gain(Income, T') > Gain(Race, T'), so Income is chosen to split T'.
Splitting T' on Income:
  Income = high: {1}, Insurance: 1 Yes, 0 No (100% Yes)
  Income = low:  {5, 6, 7, 8}, Insurance: 0 Yes, 4 No (100% No)
Decision tree:

  root
    child = yes:  100% Yes, 0% No
    child = no:
      Income = high:  100% Yes, 0% No
      Income = low:   0% Yes, 100% No

The new person (Race = white, Income = high, Child = no) follows child = no and then Income = high, so we predict Yes: he will buy insurance.
Termination Criteria?
Decision Trees

ID3:  impurity measurement by information gain (entropy), as above
C4.5: impurity measurement by gain ratio
Entropy (C4.5)

On the same training set T (4 Yes, 4 No), Info(T) = -1/2 log 1/2 - 1/2 log 1/2 = 1.

For attribute Race:
  SplitInfo(Race) = -4/8 log 4/8 - 4/8 log 4/8 = 1
  Gain(Race, T) = (Info(T) - Info(Race, T)) / SplitInfo(Race)
                = (1 - 0.8113) / 1 = 0.1887
For attribute Income:
  Info(T_high) = 0, Info(T_low) = 0.9183, Info(Income, T) = 0.6887
  SplitInfo(Income) = -2/8 log 2/8 - 6/8 log 6/8 = 0.8113
  Gain(Income, T) = (Info(T) - Info(Income, T)) / SplitInfo(Income)
                  = (1 - 0.6887) / 0.8113 = 0.3837

For attribute Child:
  Info(Child, T) = 0.4512
  SplitInfo(Child) = -3/8 log 3/8 - 5/8 log 5/8 = 0.9544
  Gain(Child, T) = (1 - 0.4512) / 0.9544 = 0.5750
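A minimal sketch of the C4.5 gain-ratio arithmetic, working directly from the class splits (`H`, `gain_race`, etc. are names introduced here):

```python
import math

def H(ps):
    """Entropy in bits, treating 0 * log 0 as 0."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Information gains at the root (class split is 4 Yes / 4 No).
gain_race   = H([0.5, 0.5]) - (4/8 * H([1/4, 3/4]) + 4/8 * H([3/4, 1/4]))
gain_income = H([0.5, 0.5]) - (2/8 * H([1.0]) + 6/8 * H([1/3, 2/3]))

# C4.5 normalises each gain by the attribute's own split information.
split_race   = H([4/8, 4/8])  # Race has 4 black, 4 white records
split_income = H([2/8, 6/8])  # Income has 2 high, 6 low records

print(round(gain_race / split_race, 4))      # 0.1887
print(round(gain_income / split_income, 4))  # 0.3837
```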
Decision Trees

CART: impurity measurement by the Gini index
Gini

On the same training set T (4 Yes, 4 No):

Info(T) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini

For attribute Race:
  Info(T_black) = 1 - (1/4)^2 - (3/4)^2 = 0.375
  Info(T_white) = 1 - (3/4)^2 - (1/4)^2 = 0.375
  Info(Race, T) = 4/8 x 0.375 + 4/8 x 0.375 = 0.375
  Gain(Race, T) = Info(T) - Info(Race, T) = 0.5 - 0.375 = 0.125

For attribute Income:
  Info(T_high) = 0
  Info(T_low)  = 1 - (1/3)^2 - (2/3)^2 = 0.444
  Info(Income, T) = 1/4 x Info(T_high) + 3/4 x Info(T_low) = 0.333
  Gain(Income, T) = Info(T) - Info(Income, T) = 0.5 - 0.333 = 0.167

For attribute Child:
  Info(T_yes) = 0
  Info(T_no)  = 1 - (1/5)^2 - (4/5)^2 = 0.32
  Info(Child, T) = 3/8 x 0 + 5/8 x 0.32 = 0.2
  Gain(Child, T) = 0.5 - 0.2 = 0.3
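The Gini computation can be checked the same way (a sketch; `gini` is a helper defined here):

```python
def gini(ps):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1 - sum(p * p for p in ps)

gini_T = gini([4/8, 4/8])  # 0.5 at the root (4 Yes, 4 No)

# Weighted Gini after splitting on Income (2 high records, 6 low records).
gini_income = 2/8 * gini([1.0]) + 6/8 * gini([2/6, 4/6])

print(gini_T)                          # 0.5
print(round(gini_income, 4))           # 0.3333
print(round(gini_T - gini_income, 4))  # 0.1667 (the Gini gain)
```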
Classification Methods

Bayesian Classifier
Conditional Probability

  P(A | B) = P(A, B) / P(B)

Bayes Rule

  P(A | B) = P(B | A) P(A) / P(B)
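A small numeric sanity check of Bayes rule, using exact fractions (the joint probabilities below are invented for illustration, not taken from the slides):

```python
from fractions import Fraction as F

# An illustrative joint distribution over events A and B.
p_ab, p_a_nb, p_na_b, p_na_nb = F(3, 10), F(2, 10), F(1, 10), F(4, 10)

p_a = p_ab + p_a_nb       # P(A) = 1/2
p_b = p_ab + p_na_b       # P(B) = 2/5
p_b_given_a = p_ab / p_a  # P(B | A) = P(A, B) / P(A) = 3/5

# Bayes rule: P(A | B) = P(B | A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 3/4 -- matches the direct definition P(A, B) / P(B)
```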
Naive Bayes Classifier

Independence assumption: given the class (Insurance), the attributes are conditionally independent:

  P(Race, Income, Child | Insurance)
    = P(Race | Insurance) x P(Income | Insurance) x P(Child | Insurance)

Suppose there is a new person: Race = white, Income = high, Child = no.

From the training set:

  P(Yes) = 1/2    P(No) = 1/2

For attribute Race:
  P(Race = black | Yes) = 1/4    P(Race = white | Yes) = 3/4
  P(Race = black | No)  = 3/4    P(Race = white | No)  = 1/4

For attribute Income:
  P(Income = high | Yes) = 1/2   P(Income = low | Yes) = 1/2
  P(Income = high | No)  = 0     P(Income = low | No)  = 1

For attribute Child:
  P(Child = yes | Yes) = 3/4     P(Child = no | Yes) = 1/4
  P(Child = yes | No)  = 0       P(Child = no | No)  = 1

P(Race = white, Income = high, Child = no | Yes)
  = P(Race = white | Yes) x P(Income = high | Yes) x P(Child = no | Yes)
  = 3/4 x 1/2 x 1/4 = 0.09375

P(Race = white, Income = high, Child = no | No)
  = P(Race = white | No) x P(Income = high | No) x P(Child = no | No)
  = 1/4 x 0 x 1 = 0
P(Yes | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | Yes) x P(Yes)
    / P(Race = white, Income = high, Child = no)
  = 0.09375 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0.046875 / P(Race = white, Income = high, Child = no)

P(No | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | No) x P(No)
    / P(Race = white, Income = high, Child = no)
  = 0 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0

Since P(Yes | Race = white, Income = high, Child = no) > P(No | Race = white, Income = high, Child = no), we predict that the new person (Race = white, Income = high, Child = no) will buy insurance.
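The whole naive Bayes computation on this training set fits in a short script (a sketch; `nb_score` is a helper name introduced here):

```python
# Training set from the slides: ((Race, Income, Child), Insurance).
DATA = [
    (("black", "high", "no"),  "yes"),
    (("white", "high", "yes"), "yes"),
    (("white", "low",  "yes"), "yes"),
    (("white", "low",  "yes"), "yes"),
    (("black", "low",  "no"),  "no"),
    (("black", "low",  "no"),  "no"),
    (("black", "low",  "no"),  "no"),
    (("white", "low",  "no"),  "no"),
]

def nb_score(x, label):
    """P(label) * P(x | label) under the naive independence assumption."""
    rows = [feats for feats, y in DATA if y == label]
    score = len(rows) / len(DATA)  # prior P(label)
    for i, v in enumerate(x):
        score *= sum(1 for r in rows if r[i] == v) / len(rows)
    return score

new_person = ("white", "high", "no")
print(nb_score(new_person, "yes"))  # 0.5 * 3/4 * 1/2 * 1/4 = 0.046875
print(nb_score(new_person, "no"))   # 0.5 * 1/4 * 0 * 1 = 0.0
# nb_score is larger for "yes", so we predict the person buys insurance.
```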
Bayesian Classifier

Independence Assumption
Some attributes are dependent on other attributes, e.g., doing exercise may reduce the probability of suffering from heart disease. A Bayesian belief network models such dependencies explicitly.

[Table of example records over Exercise, Diet, Heartburn, Blood Pressure, Chest Pain, and Heart Disease.]
Bayesian Belief Network

Structure: Exercise (E) and Diet (D) are parents of Heart Disease (HD); D is a parent of Heartburn (Hb); HD is a parent of Blood Pressure (BP); HD and Hb are parents of Chest Pain (CP).

Conditional probability tables:

  P(E = Yes) = 0.7
  P(D = Healthy) = 0.25

  P(HD = Yes | E, D):
    E = Yes, D = Healthy:   0.25
    E = Yes, D = Unhealthy: 0.45
    E = No,  D = Healthy:   0.55
    E = No,  D = Unhealthy: 0.75

  P(Hb = Yes | D):
    D = Healthy:   0.85
    D = Unhealthy: 0.2

  P(BP = High | HD):
    HD = Yes: 0.85
    HD = No:  0.2

  P(CP = Yes | HD, Hb):
    HD = Yes, Hb = Yes: 0.8
    HD = Yes, Hb = No:  0.6
    HD = No,  Hb = Yes: 0.4
    HD = No,  Hb = No:  0.1
Conditional Independence

Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if:

  P(X | Y, Z) = P(X | Z)

Property: a node is conditionally independent of its non-descendants if its parents are known.

e.g., P(BP = High | HD = Yes, D = Healthy) = P(BP = High | HD = Yes):
  BP = High is conditionally independent of D = Healthy given HD = Yes.
e.g., P(BP = High | HD = Yes, CP = Yes) = P(BP = High | HD = Yes):
  BP = High is conditionally independent of CP = Yes given HD = Yes.
Suppose there is a new person and I want to know whether he is likely to have Heart Disease.

P(HD = Yes)
  = Σ_{x ∈ {Yes, No}} Σ_{y ∈ {Healthy, Unhealthy}} P(HD = Yes | E = x, D = y) x P(E = x, D = y)
  = Σ_x Σ_y P(HD = Yes | E = x, D = y) x P(E = x) x P(D = y)     (E and D are independent)
  = 0.25 x 0.7 x 0.25 + 0.45 x 0.7 x 0.75 + 0.55 x 0.3 x 0.25 + 0.75 x 0.3 x 0.75
  = 0.49

P(HD = No) = 1 - P(HD = Yes) = 1 - 0.49 = 0.51
Now suppose the person is known to have high blood pressure (BP = High).

P(BP = High)
  = Σ_{x ∈ {Yes, No}} P(BP = High | HD = x) x P(HD = x)
  = 0.85 x 0.49 + 0.2 x 0.51 = 0.5185

P(HD = Yes | BP = High)
  = P(BP = High | HD = Yes) x P(HD = Yes) / P(BP = High)
  = 0.85 x 0.49 / 0.5185 = 0.8033
Now suppose we also know that he exercises and has a healthy diet (E = Yes, D = Healthy).

P(HD = Yes | BP = High, D = Healthy, E = Yes)
  = P(BP = High | HD = Yes, D = Healthy, E = Yes) x P(HD = Yes | D = Healthy, E = Yes)
    / P(BP = High | D = Healthy, E = Yes)
  = P(BP = High | HD = Yes) x P(HD = Yes | D = Healthy, E = Yes)
    / Σ_{x ∈ {Yes, No}} P(BP = High | HD = x) x P(HD = x | D = Healthy, E = Yes)
  = 0.85 x 0.25 / (0.85 x 0.25 + 0.2 x 0.75)
  = 0.2125 / 0.3625 = 0.5862

(The first step is Bayes rule; the second uses the conditional independence of BP from E and D given HD.)
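The inference steps above can be replayed in a few lines (a sketch; the CPT values are the ones from the slides, the variable names are ours):

```python
from itertools import product

# CPTs from the slides (True stands for Yes / Healthy / High).
P_E = {True: 0.7, False: 0.3}    # P(Exercise)
P_D = {True: 0.25, False: 0.75}  # P(Diet = Healthy)
P_HD = {(True, True): 0.25, (True, False): 0.45,   # P(HD = Yes | E, D)
        (False, True): 0.55, (False, False): 0.75}
P_BP = {True: 0.85, False: 0.2}  # P(BP = High | HD)

# P(HD = Yes): sum out Exercise and Diet (they are independent).
p_hd = sum(P_HD[(e, d)] * P_E[e] * P_D[d]
           for e, d in product([True, False], repeat=2))

# P(BP = High), then Bayes rule for P(HD = Yes | BP = High).
p_bp = P_BP[True] * p_hd + P_BP[False] * (1 - p_hd)
p_hd_given_bp = P_BP[True] * p_hd / p_bp

# With E = Yes and D = Healthy also observed, only one CPT row is relevant.
p_hd_ed = P_HD[(True, True)]  # P(HD = Yes | E = Yes, D = Healthy) = 0.25
p_cond = (P_BP[True] * p_hd_ed
          / (P_BP[True] * p_hd_ed + P_BP[False] * (1 - p_hd_ed)))

print(round(p_hd, 4), round(p_bp, 4),
      round(p_hd_given_bp, 4), round(p_cond, 4))
# 0.49 0.5185 0.8033 0.5862
```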
Classification Methods

Nearest Neighbor Classifier

[Scatter plot of past customers by Computer score and History score, with points labeled + (bought the book) and - (did not buy).]
Nearest Neighbor Classifier:
  Step 1: Find the nearest neighbor.
  Step 2: Use the label of this neighbor.

Suppose there is a new person:

  Computer  History  Buy Book?
  95        35       ?
k-Nearest Neighbor Classifier:
  Step 1: Find the k nearest neighbors.
  Step 2: Use the majority of the labels of the neighbors.

Suppose there is a new person:

  Computer  History  Buy Book?
  95        35       ?
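A minimal k-NN sketch. The slide only shows a scatter plot, so the training coordinates below are invented to match its shape ("+" buyers score high in Computer); only the new person's scores (95, 35) come from the slide:

```python
import math
from collections import Counter

# Hypothetical training points (Computer score, History score) -> Buy Book?
TRAIN = [((90, 30), "+"), ((85, 40), "+"), ((95, 25), "+"),
         ((20, 80), "-"), ((30, 90), "-"), ((25, 70), "-")]

def knn_predict(x, k):
    """Majority label among the k training points nearest to x (Euclidean)."""
    nearest = sorted(TRAIN, key=lambda t: math.dist(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

new_person = (95, 35)  # Computer = 95, History = 35
print(knn_predict(new_person, 1))  # "+" for k = 1
print(knn_predict(new_person, 3))  # "+" for k = 3 (majority vote)
```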