Classification
Prepared by Raymond Wong (raywong@cse). The examples used in Decision Tree are borrowed from L.W. Chan's notes.
COMP537
Classification

Suppose there is a person:

  Race   Income  Child  Insurance
  white  high    no     ?

Decision tree:

  root
    child = yes:  100% Yes, 0% No
    child = no:
      Income = high:  100% Yes, 0% No
      Income = low:   0% Yes, 100% No
The record of the new person (whose Insurance label is unknown) is the test set; the table of past records from which the decision tree is built is the training set.
Applications

Insurance
Marketing
Bank Loan
Network
Software

Same/Difference

Classification vs. Clustering (classification is supervised: the training records carry class labels; clustering is unsupervised)

Classification Methods

Decision Trees
Entropy

Example 1: Consider a random variable with a uniform distribution over 32 outcomes. To identify an outcome, we need a label that can take 32 different values, so 5-bit strings suffice as labels (log2 32 = 5).
Entropy

Entropy is used to measure how informative a node is. Given a probability distribution P = (p1, p2, ..., pn), the information conveyed by this distribution, also called the entropy of P, is:

  I(P) = -(p1 log p1 + p2 log p2 + ... + pn log pn)

All logarithms here are in base 2.
For example:

  If P is (0.5, 0.5), then I(P) = 1.
  If P is (0.67, 0.33), then I(P) = 0.92.
  If P is (1, 0), then I(P) = 0.

Entropy measures the amount of uncertainty: the smaller the entropy, the purer, and hence the more informative, the node.
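As a quick check, the entropy formula and the three examples above can be computed in a few lines of Python (a sketch; `entropy` is a helper name introduced here, not a library function):

```python
import math

def entropy(probs):
    """I(P) = -sum p * log2(p), with 0 * log 0 taken as 0."""
    s = -sum(p * math.log2(p) for p in probs if p > 0)
    return s + 0.0  # avoids returning -0.0 for pure distributions

print(entropy([0.5, 0.5]))            # 1.0
print(round(entropy([2/3, 1/3]), 4))  # 0.9183 (the slide's 0.92)
print(entropy([1.0, 0.0]))            # 0.0
```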
Entropy

Training set T:

     Race   Income  Child  Insurance
  1  black  high    no     yes
  2  white  high    yes    yes
  3  white  low     yes    yes
  4  white  low     yes    yes
  5  black  low     no     no
  6  black  low     no     no
  7  black  low     no     no
  8  white  low     no     no

Insurance: 4 Yes, 4 No.

Info(T) = -1/2 log 1/2 - 1/2 log 1/2 = 1
For attribute Race:
  Info(T_black) = -1/4 log 1/4 - 3/4 log 3/4 = 0.8113
  Info(T_white) = -3/4 log 3/4 - 1/4 log 1/4 = 0.8113
  Info(Race, T) = 4/8 x 0.8113 + 4/8 x 0.8113 = 0.8113
  Gain(Race, T) = Info(T) - Info(Race, T) = 1 - 0.8113 = 0.1887

For attribute Income:
  Info(T_high) = -1 log 1 - 0 log 0 = 0
  Info(T_low)  = -1/3 log 1/3 - 2/3 log 2/3 = 0.9183
  Info(Income, T) = 2/8 x 0 + 6/8 x 0.9183 = 0.6887
  Gain(Income, T) = Info(T) - Info(Income, T) = 1 - 0.6887 = 0.3113
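The root-level gain computations can be verified with a short script over the 8-record training set from the slides (a sketch; `info` and `gain` are helper names introduced here):

```python
import math
from collections import Counter

# Training set from the slides: (Race, Income, Child, Insurance).
DATA = [
    ("black", "high", "no",  "yes"),
    ("white", "high", "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("white", "low",  "yes", "yes"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("black", "low",  "no",  "no"),
    ("white", "low",  "no",  "no"),
]
ATTRS = {"Race": 0, "Income": 1, "Child": 2}

def info(rows):
    """Info(T): entropy of the Insurance labels in rows (base-2 logs)."""
    counts = Counter(r[3] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows))
                for c in counts.values())

def gain(attr, rows):
    """Gain(A, T) = Info(T) - sum over values v of |T_v|/|T| * Info(T_v)."""
    i = ATTRS[attr]
    weighted = 0.0
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        weighted += len(subset) / len(rows) * info(subset)
    return info(rows) - weighted

for a in ATTRS:
    print(a, round(gain(a, DATA), 4))
# Race 0.1887, Income 0.3113, Child 0.5488 -> Child gives the largest gain.
```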
For attribute Child:
  Info(T_yes) = -1 log 1 - 0 log 0 = 0
  Info(T_no)  = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219
  Info(Child, T) = 3/8 x 0 + 5/8 x 0.7219 = 0.4512
  Gain(Child, T) = Info(T) - Info(Child, T) = 1 - 0.4512 = 0.5488

Gain(Child, T) is the largest, so Child is chosen for the root:
  child = yes:  100% Yes, 0% No
  child = no:   {1, 5, 6, 7, 8}, 20% Yes, 80% No

We recurse on the child = no subset T' = {1, 5, 6, 7, 8} (Insurance: 1 Yes, 4 No):

Info(T') = -1/5 log 1/5 - 4/5 log 4/5 = 0.7219

For attribute Race:
  Info(T'_black) = -1/4 log 1/4 - 3/4 log 3/4 = 0.8113
  Info(T'_white) = 0
  Info(Race, T') = 4/5 x 0.8113 + 1/5 x 0 = 0.6490
  Gain(Race, T') = 0.7219 - 0.6490 = 0.0729
For attribute Income (on T' = {1, 5, 6, 7, 8}):
  Info(T'_high) = -1 log 1 - 0 log 0 = 0
  Info(T'_low)  = -0 log 0 - 1 log 1 = 0
  Info(Income, T') = 1/5 x 0 + 4/5 x 0 = 0
  Gain(Income, T') = Info(T') - Info(Income, T') = 0.7219 - 0 = 0.7219

Gain(Income, T') > Gain(Race, T'), so Income is chosen to split T'.
Splitting T' on Income:
  Income = high: {1}, Insurance: 1 Yes, 0 No (100% Yes)
  Income = low:  {5, 6, 7, 8}, Insurance: 0 Yes, 4 No (100% No)
Decision tree:

  root
    child = yes:  100% Yes, 0% No
    child = no:
      Income = high:  100% Yes, 0% No
      Income = low:   0% Yes, 100% No

The new person (Race = white, Income = high, Child = no) follows child = no and then Income = high, so we predict Yes: he will buy insurance.
Termination Criteria?
Decision Trees

ID3:  impurity measurement by information gain (entropy), as above
C4.5: impurity measurement by gain ratio
Entropy (C4.5)

On the same training set T (4 Yes, 4 No), Info(T) = -1/2 log 1/2 - 1/2 log 1/2 = 1.

For attribute Race:
  SplitInfo(Race) = -4/8 log 4/8 - 4/8 log 4/8 = 1
  Gain(Race, T) = (Info(T) - Info(Race, T)) / SplitInfo(Race)
                = (1 - 0.8113) / 1 = 0.1887
For attribute Income:
  Info(T_high) = 0, Info(T_low) = 0.9183, Info(Income, T) = 0.6887
  SplitInfo(Income) = -2/8 log 2/8 - 6/8 log 6/8 = 0.8113
  Gain(Income, T) = (Info(T) - Info(Income, T)) / SplitInfo(Income)
                  = (1 - 0.6887) / 0.8113 = 0.3837

For attribute Child:
  Info(Child, T) = 0.4512
  SplitInfo(Child) = -3/8 log 3/8 - 5/8 log 5/8 = 0.9544
  Gain(Child, T) = (1 - 0.4512) / 0.9544 = 0.5750
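A minimal sketch of the C4.5 gain-ratio arithmetic, working directly from the class splits (`H`, `gain_race`, etc. are names introduced here):

```python
import math

def H(ps):
    """Entropy in bits, treating 0 * log 0 as 0."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Information gains at the root (class split is 4 Yes / 4 No).
gain_race   = H([0.5, 0.5]) - (4/8 * H([1/4, 3/4]) + 4/8 * H([3/4, 1/4]))
gain_income = H([0.5, 0.5]) - (2/8 * H([1.0]) + 6/8 * H([1/3, 2/3]))

# C4.5 normalises each gain by the attribute's own split information.
split_race   = H([4/8, 4/8])  # Race has 4 black, 4 white records
split_income = H([2/8, 6/8])  # Income has 2 high, 6 low records

print(round(gain_race / split_race, 4))      # 0.1887
print(round(gain_income / split_income, 4))  # 0.3837
```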
Decision Trees

CART: impurity measurement by the Gini index
Gini

On the same training set T (4 Yes, 4 No):

Info(T) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini

For attribute Race:
  Info(T_black) = 1 - (1/4)^2 - (3/4)^2 = 0.375
  Info(T_white) = 1 - (3/4)^2 - (1/4)^2 = 0.375
  Info(Race, T) = 4/8 x 0.375 + 4/8 x 0.375 = 0.375
  Gain(Race, T) = Info(T) - Info(Race, T) = 0.5 - 0.375 = 0.125

For attribute Income:
  Info(T_high) = 0
  Info(T_low)  = 1 - (1/3)^2 - (2/3)^2 = 0.444
  Info(Income, T) = 1/4 x Info(T_high) + 3/4 x Info(T_low) = 0.333
  Gain(Income, T) = Info(T) - Info(Income, T) = 0.5 - 0.333 = 0.167

For attribute Child:
  Info(T_yes) = 0
  Info(T_no)  = 1 - (1/5)^2 - (4/5)^2 = 0.32
  Info(Child, T) = 3/8 x 0 + 5/8 x 0.32 = 0.2
  Gain(Child, T) = 0.5 - 0.2 = 0.3
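The Gini computation can be checked the same way (a sketch; `gini` is a helper defined here):

```python
def gini(ps):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1 - sum(p * p for p in ps)

gini_T = gini([4/8, 4/8])  # 0.5 at the root (4 Yes, 4 No)

# Weighted Gini after splitting on Income (2 high records, 6 low records).
gini_income = 2/8 * gini([1.0]) + 6/8 * gini([2/6, 4/6])

print(gini_T)                          # 0.5
print(round(gini_income, 4))           # 0.3333
print(round(gini_T - gini_income, 4))  # 0.1667 (the Gini gain)
```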
Classification Methods

Bayesian Classifier
Conditional Probability

  P(A | B) = P(A, B) / P(B)

Bayes Rule

  P(A | B) = P(B | A) P(A) / P(B)
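A small numeric sanity check of Bayes rule, using exact fractions (the joint probabilities below are invented for illustration, not taken from the slides):

```python
from fractions import Fraction as F

# An illustrative joint distribution over events A and B.
p_ab, p_a_nb, p_na_b, p_na_nb = F(3, 10), F(2, 10), F(1, 10), F(4, 10)

p_a = p_ab + p_a_nb       # P(A) = 1/2
p_b = p_ab + p_na_b       # P(B) = 2/5
p_b_given_a = p_ab / p_a  # P(B | A) = P(A, B) / P(A) = 3/5

# Bayes rule: P(A | B) = P(B | A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 3/4 -- matches the direct definition P(A, B) / P(B)
```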
Naive Bayes Classifier

Independence assumption: given the class (Insurance), the attributes are conditionally independent:

  P(Race, Income, Child | Insurance)
    = P(Race | Insurance) x P(Income | Insurance) x P(Child | Insurance)

Suppose there is a new person: Race = white, Income = high, Child = no.

From the training set:

  P(Yes) = 1/2    P(No) = 1/2

For attribute Race:
  P(Race = black | Yes) = 1/4    P(Race = white | Yes) = 3/4
  P(Race = black | No)  = 3/4    P(Race = white | No)  = 1/4

For attribute Income:
  P(Income = high | Yes) = 1/2   P(Income = low | Yes) = 1/2
  P(Income = high | No)  = 0     P(Income = low | No)  = 1

For attribute Child:
  P(Child = yes | Yes) = 3/4     P(Child = no | Yes) = 1/4
  P(Child = yes | No)  = 0       P(Child = no | No)  = 1

P(Race = white, Income = high, Child = no | Yes)
  = P(Race = white | Yes) x P(Income = high | Yes) x P(Child = no | Yes)
  = 3/4 x 1/2 x 1/4 = 0.09375

P(Race = white, Income = high, Child = no | No)
  = P(Race = white | No) x P(Income = high | No) x P(Child = no | No)
  = 1/4 x 0 x 1 = 0
P(Yes | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | Yes) x P(Yes)
    / P(Race = white, Income = high, Child = no)
  = 0.09375 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0.046875 / P(Race = white, Income = high, Child = no)

P(No | Race = white, Income = high, Child = no)
  = P(Race = white, Income = high, Child = no | No) x P(No)
    / P(Race = white, Income = high, Child = no)
  = 0 x 0.5 / P(Race = white, Income = high, Child = no)
  = 0

Since P(Yes | Race = white, Income = high, Child = no) > P(No | Race = white, Income = high, Child = no), we predict that the new person (Race = white, Income = high, Child = no) will buy insurance.
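The whole naive Bayes computation on this training set fits in a short script (a sketch; `nb_score` is a helper name introduced here):

```python
# Training set from the slides: ((Race, Income, Child), Insurance).
DATA = [
    (("black", "high", "no"),  "yes"),
    (("white", "high", "yes"), "yes"),
    (("white", "low",  "yes"), "yes"),
    (("white", "low",  "yes"), "yes"),
    (("black", "low",  "no"),  "no"),
    (("black", "low",  "no"),  "no"),
    (("black", "low",  "no"),  "no"),
    (("white", "low",  "no"),  "no"),
]

def nb_score(x, label):
    """P(label) * P(x | label) under the naive independence assumption."""
    rows = [feats for feats, y in DATA if y == label]
    score = len(rows) / len(DATA)  # prior P(label)
    for i, v in enumerate(x):
        score *= sum(1 for r in rows if r[i] == v) / len(rows)
    return score

new_person = ("white", "high", "no")
print(nb_score(new_person, "yes"))  # 0.5 * 3/4 * 1/2 * 1/4 = 0.046875
print(nb_score(new_person, "no"))   # 0.5 * 1/4 * 0 * 1 = 0.0
# nb_score is larger for "yes", so we predict the person buys insurance.
```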
Bayesian Classifier

Independence Assumption
Some attributes are dependent on other attributes, e.g., doing exercise may reduce the probability of suffering from heart disease. A Bayesian belief network models such dependencies explicitly.

[Table of example records over Exercise, Diet, Heartburn, Blood Pressure, Chest Pain, and Heart Disease.]
Bayesian Belief Network

Structure: Exercise (E) and Diet (D) are parents of Heart Disease (HD); D is a parent of Heartburn (Hb); HD is a parent of Blood Pressure (BP); HD and Hb are parents of Chest Pain (CP).

Conditional probability tables:

  P(E = Yes) = 0.7
  P(D = Healthy) = 0.25

  P(HD = Yes | E, D):
    E = Yes, D = Healthy:   0.25
    E = Yes, D = Unhealthy: 0.45
    E = No,  D = Healthy:   0.55
    E = No,  D = Unhealthy: 0.75

  P(Hb = Yes | D):
    D = Healthy:   0.85
    D = Unhealthy: 0.2

  P(BP = High | HD):
    HD = Yes: 0.85
    HD = No:  0.2

  P(CP = Yes | HD, Hb):
    HD = Yes, Hb = Yes: 0.8
    HD = Yes, Hb = No:  0.6
    HD = No,  Hb = Yes: 0.4
    HD = No,  Hb = No:  0.1
Conditional Independence

Let X, Y, Z be three random variables. X is said to be conditionally independent of Y given Z if:

  P(X | Y, Z) = P(X | Z)

Property: a node is conditionally independent of its non-descendants if its parents are known.

e.g., P(BP = High | HD = Yes, D = Healthy) = P(BP = High | HD = Yes):
  BP = High is conditionally independent of D = Healthy given HD = Yes.
e.g., P(BP = High | HD = Yes, CP = Yes) = P(BP = High | HD = Yes):
  BP = High is conditionally independent of CP = Yes given HD = Yes.
Suppose there is a new person and I want to know whether he is likely to have Heart Disease.

P(HD = Yes)
  = Σ_{x ∈ {Yes, No}} Σ_{y ∈ {Healthy, Unhealthy}} P(HD = Yes | E = x, D = y) x P(E = x, D = y)
  = Σ_x Σ_y P(HD = Yes | E = x, D = y) x P(E = x) x P(D = y)     (E and D are independent)
  = 0.25 x 0.7 x 0.25 + 0.45 x 0.7 x 0.75 + 0.55 x 0.3 x 0.25 + 0.75 x 0.3 x 0.75
  = 0.49

P(HD = No) = 1 - P(HD = Yes) = 1 - 0.49 = 0.51
Now suppose the person is known to have high blood pressure (BP = High).

P(BP = High)
  = Σ_{x ∈ {Yes, No}} P(BP = High | HD = x) x P(HD = x)
  = 0.85 x 0.49 + 0.2 x 0.51 = 0.5185

P(HD = Yes | BP = High)
  = P(BP = High | HD = Yes) x P(HD = Yes) / P(BP = High)
  = 0.85 x 0.49 / 0.5185 = 0.8033
Now suppose we also know that he exercises and has a healthy diet (E = Yes, D = Healthy).

P(HD = Yes | BP = High, D = Healthy, E = Yes)
  = P(BP = High | HD = Yes, D = Healthy, E = Yes) x P(HD = Yes | D = Healthy, E = Yes)
    / P(BP = High | D = Healthy, E = Yes)
  = P(BP = High | HD = Yes) x P(HD = Yes | D = Healthy, E = Yes)
    / Σ_{x ∈ {Yes, No}} P(BP = High | HD = x) x P(HD = x | D = Healthy, E = Yes)
  = 0.85 x 0.25 / (0.85 x 0.25 + 0.2 x 0.75)
  = 0.2125 / 0.3625 = 0.5862

(The first step is Bayes rule; the second uses the conditional independence of BP from E and D given HD.)
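The inference steps above can be replayed in a few lines (a sketch; the CPT values are the ones from the slides, the variable names are ours):

```python
from itertools import product

# CPTs from the slides (True stands for Yes / Healthy / High).
P_E = {True: 0.7, False: 0.3}    # P(Exercise)
P_D = {True: 0.25, False: 0.75}  # P(Diet = Healthy)
P_HD = {(True, True): 0.25, (True, False): 0.45,   # P(HD = Yes | E, D)
        (False, True): 0.55, (False, False): 0.75}
P_BP = {True: 0.85, False: 0.2}  # P(BP = High | HD)

# P(HD = Yes): sum out Exercise and Diet (they are independent).
p_hd = sum(P_HD[(e, d)] * P_E[e] * P_D[d]
           for e, d in product([True, False], repeat=2))

# P(BP = High), then Bayes rule for P(HD = Yes | BP = High).
p_bp = P_BP[True] * p_hd + P_BP[False] * (1 - p_hd)
p_hd_given_bp = P_BP[True] * p_hd / p_bp

# With E = Yes and D = Healthy also observed, only one CPT row is relevant.
p_hd_ed = P_HD[(True, True)]  # P(HD = Yes | E = Yes, D = Healthy) = 0.25
p_cond = (P_BP[True] * p_hd_ed
          / (P_BP[True] * p_hd_ed + P_BP[False] * (1 - p_hd_ed)))

print(round(p_hd, 4), round(p_bp, 4),
      round(p_hd_given_bp, 4), round(p_cond, 4))
# 0.49 0.5185 0.8033 0.5862
```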
Classification Methods

Nearest Neighbor Classifier

[Scatter plot of past customers by Computer score and History score, with points labeled + (bought the book) and - (did not buy).]
Nearest Neighbor Classifier:
  Step 1: Find the nearest neighbor.
  Step 2: Use the label of this neighbor.

Suppose there is a new person:

  Computer  History  Buy Book?
  95        35       ?
k-Nearest Neighbor Classifier:
  Step 1: Find the k nearest neighbors.
  Step 2: Use the majority of the labels of the neighbors.

Suppose there is a new person:

  Computer  History  Buy Book?
  95        35       ?
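A minimal k-NN sketch. The slide only shows a scatter plot, so the training coordinates below are invented to match its shape ("+" buyers score high in Computer); only the new person's scores (95, 35) come from the slide:

```python
import math
from collections import Counter

# Hypothetical training points (Computer score, History score) -> Buy Book?
TRAIN = [((90, 30), "+"), ((85, 40), "+"), ((95, 25), "+"),
         ((20, 80), "-"), ((30, 90), "-"), ((25, 70), "-")]

def knn_predict(x, k):
    """Majority label among the k training points nearest to x (Euclidean)."""
    nearest = sorted(TRAIN, key=lambda t: math.dist(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

new_person = (95, 35)  # Computer = 95, History = 35
print(knn_predict(new_person, 1))  # "+" for k = 1
print(knn_predict(new_person, 3))  # "+" for k = 3 (majority vote)
```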