Weekend  Weather  Parents  Money  Decision
W1       Sunny    Yes      Rich   Cinema
W2       Sunny    No       Rich   Tennis
W3       Windy    Yes      Rich   Cinema
W4       Rainy    Yes      Poor   Cinema
W5       Rainy    No       Rich   Stay in
W6       Rainy    Yes      Poor   Cinema
W7       Windy    No       Poor   Cinema
W8       Windy    No       Rich   Shopping
W9       Windy    Yes      Rich   Cinema
W10      Sunny    No       Rich   Tennis
First, we should find the split attribute for the root node:
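$$\mathrm{Entropy}(S) = -p_{\mathrm{cinema}}\log_2 p_{\mathrm{cinema}} - p_{\mathrm{tennis}}\log_2 p_{\mathrm{tennis}} - p_{\mathrm{shopping}}\log_2 p_{\mathrm{shopping}} - p_{\mathrm{stay\_in}}\log_2 p_{\mathrm{stay\_in}}$$
$$= -0.6\log_2 0.6 - 0.2\log_2 0.2 - 0.1\log_2 0.1 - 0.1\log_2 0.1 = 1.571$$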
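The value 1.571 can be checked numerically. The sketch below is a minimal illustration. The function name `entropy` and the list `decisions` are not from the source; the list is simply the Decision column of the weekend table, in order W1 to W10.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Decision column of the weekend table, W1..W10
decisions = ["Cinema", "Tennis", "Cinema", "Cinema", "Stay in",
             "Cinema", "Cinema", "Shopping", "Cinema", "Tennis"]

print(round(entropy(decisions), 3))  # 1.571
```

A pure subset, such as all Cinema, has entropy 0.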
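$$\mathrm{Entropy}(S, \mathrm{weather}) = \frac{|S_{\mathrm{sun}}|}{10}\mathrm{Entropy}(S_{\mathrm{sun}}) + \frac{|S_{\mathrm{wind}}|}{10}\mathrm{Entropy}(S_{\mathrm{wind}}) + \frac{|S_{\mathrm{rain}}|}{10}\mathrm{Entropy}(S_{\mathrm{rain}})$$
$$= -0.3\left(\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right) - 0.4\left(0.75\log_2 0.75 + 0.25\log_2 0.25\right) - 0.3\left(\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right)$$
$$= 0.3(0.9183) + 0.4(0.8113) + 0.3(0.9183) = 0.8755$$
$$\mathrm{Gain}(S, \mathrm{weather}) = 1.571 - 0.8755 = 0.70$$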
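$$\mathrm{Entropy}(S, \mathrm{parents}) = \frac{|S_{\mathrm{yes}}|}{10}\mathrm{Entropy}(S_{\mathrm{yes}}) + \frac{|S_{\mathrm{no}}|}{10}\mathrm{Entropy}(S_{\mathrm{no}}) = 0.5(0) + 0.5(1.922) = 0.961$$
$$\mathrm{Gain}(S, \mathrm{parents}) = 1.571 - 0.961 = 0.61$$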
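$$\mathrm{Entropy}(S, \mathrm{money}) = \frac{|S_{\mathrm{rich}}|}{10}\mathrm{Entropy}(S_{\mathrm{rich}}) + \frac{|S_{\mathrm{poor}}|}{10}\mathrm{Entropy}(S_{\mathrm{poor}}) = 0.7(1.8424) + 0.3(0) = 1.2897$$
$$\mathrm{Gain}(S, \mathrm{money}) = 1.571 - 1.2897 = 0.2813$$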
Weather has the highest gain (0.70), so it is the attribute tested at the root node of the decision tree.
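The root choice can be reproduced by computing the gain of every attribute and taking the largest. This is a sketch, not the author's method. The names `ROWS`, `ATTRS` and `gain` are hypothetical. Each row is (weather, parents, money, decision), copied from the weekend table.

```python
from collections import Counter
from math import log2

# Weekend table rows W1..W10: (weather, parents, money, decision)
ROWS = [
    ("Sunny", "Yes", "Rich", "Cinema"),
    ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"),
    ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"),
    ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"),
    ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"),
    ("Sunny", "No", "Rich", "Tennis"),
]
ATTRS = {"weather": 0, "parents": 1, "money": 2}


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    """Information gain of splitting `rows` on column `col`."""
    n = len(rows)
    expected = 0.0  # weighted entropy after the split, i.e. Entropy(S, A)
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        expected += len(subset) / n * entropy(subset)
    return entropy([r[-1] for r in rows]) - expected


gains = {name: gain(ROWS, col) for name, col in ATTRS.items()}
for name, g in gains.items():
    print(f"Gain(S, {name}) = {g:.4f}")
print("root:", max(gains, key=gains.get))  # root: weather
```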
For the Sunny branch, we look only at the rows whose Weather value is Sunny: S_sunny = {W1, W2, W10}.
Weekend  Weather  Parents  Money  Decision
W1       Sunny    Yes      Rich   Cinema
W2       Sunny    No       Rich   Tennis
W10      Sunny    No       Rich   Tennis
From the previous step, Entropy(S_sunny) = 0.918. Next we find the gain of Parents and Money on this subset:
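Gain(S_sunny, parents) = 0.918 - 0 = 0.918 (Parents = Yes gives {Cinema} and Parents = No gives {Tennis, Tennis}; both subsets are pure)
Gain(S_sunny, money) = 0.918 - 0.918 = 0 (all three rows are Rich, so the split changes nothing)
Parents has the higher gain, so it is the test on the Sunny branch: Yes leads to Cinema and No leads to Tennis.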
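The same step, "pick the highest-gain attribute, then repeat on each subset," is ID3 applied recursively. Below is a minimal sketch. Two things in it are assumptions, not taken from the source. First, ties are broken by attribute order. Second, a majority vote is used when no attributes remain. The names `id3`, `COLUMNS` and `ROWS` are hypothetical.

```python
from collections import Counter
from math import log2

ROWS = [  # (weather, parents, money, decision) for W1..W10
    ("Sunny", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"), ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
]
COLUMNS = ["weather", "parents", "money"]  # positions 0..2; decision is last


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    n = len(rows)
    expected = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        expected += len(subset) / n * entropy(subset)
    return entropy([r[-1] for r in rows]) - expected


def id3(rows, attrs):
    """Build a tree as nested dicts {attribute: {value: subtree}}; leaves are decisions."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:          # pure subset -> leaf
        return labels[0]
    if not attrs:                      # no attribute left -> majority decision
        return Counter(labels).most_common(1)[0][0]
    # max() keeps the first attribute in `attrs` when gains tie
    best = max(attrs, key=lambda a: gain(rows, COLUMNS.index(a)))
    col = COLUMNS.index(best)
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[col] == v], rest)
                   for v in sorted({r[col] for r in rows})}}


tree = id3(ROWS, COLUMNS)
print(tree["weather"]["Sunny"])  # {'parents': {'No': 'Tennis', 'Yes': 'Cinema'}}
```

The Sunny subtree matches the hand computation above. The Windy and Rainy branches are built the same way.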
Q2.
Transaction ID  Items bought
T1              {I1, I2, I5}
T2              {I2, I4}
T3              {I2, I3}
T4              {I1, I2, I4}
T5              {I1, I3}
T6              {I2, I5}
T7              {I1, I3}
T8              {I1, I2, I3, I5}
T9              {I1, I2, I3}
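Item/transaction matrix (1 = the transaction contains the item):
Item  T1  T2  T3  T4  T5  T6  T7  T8  T9
I1    1   0   0   1   1   0   1   1   1
I2    1   1   1   1   0   1   0   1   1
I3    0   0   1   0   1   0   1   1   1
I4    0   1   0   1   0   0   0   0   0
I5    1   0   0   0   0   1   0   1   0

Candidate itemsets counted against this matrix (12 stands for {I1, I2}, and so on):
Level 2: 12, 13, 14, 15, 23, 24, 25, 34, 35, 45
Level 3: 123, 125, 135, 234, 235, 245
Level 4: 1235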
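With one bit row per item, the support count of an itemset is the number of 1 bits left after ANDing its item rows together. The sketch below assumes the rows are read as 9-bit integers, T1 leftmost. The names `MATRIX`, `BITS` and `support_count` are hypothetical.

```python
# One bit string per item, T1..T9 from left to right (1 = item bought)
MATRIX = {
    1: "100110111",
    2: "111101011",
    3: "001010111",
    4: "010100000",
    5: "100001010",
}
BITS = {item: int(row, 2) for item, row in MATRIX.items()}
ALL = (1 << 9) - 1  # mask with all nine transactions set


def support_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    mask = ALL
    for item in itemset:
        mask &= BITS[item]  # keep only transactions that also contain `item`
    return bin(mask).count("1")


CANDIDATES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5),
              (3, 4), (3, 5), (4, 5), (1, 2, 3), (1, 2, 5), (1, 3, 5),
              (2, 3, 4), (2, 3, 5), (2, 4, 5), (1, 2, 3, 5)]
for c in CANDIDATES:
    print("".join(map(str, c)), support_count(c))
```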
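Notice that at levels 3 and 4 the candidates should be pruned before their support is counted: a candidate is dropped if any of its subsets is not frequent. Here the prune step removes {1,3,5}, {2,3,4}, {2,3,5} and {2,4,5} at level 3 (each contains an infrequent 2-itemset). At level 4 it removes {1,2,3,5}, because that itemset contains {1,3,5}. As a result, level 4 yields no candidates. The candidates 14, 34, 35, 45, 135, 234, 235, 245 and 1235 have support below the minimum.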
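The join and prune steps can be sketched as follows. The function name `apriori_gen` and its `prune` flag are hypothetical. Itemsets are assumed to be sorted tuples of item numbers.

```python
from itertools import combinations


def apriori_gen(prev_frequent, prune=True):
    """Generate k-item candidates from frequent (k-1)-itemsets (sorted tuples)."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for a, b in combinations(prev, 2):
        if a[:-1] != b[:-1]:  # join step: first k-2 items must match
            continue
        cand = a + (b[-1],)
        # prune step: every (k-1)-subset of the candidate must be frequent
        if prune and not all(s in prev_set for s in combinations(cand, len(a))):
            continue
        candidates.append(cand)
    return candidates


L2 = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5)]
print(apriori_gen(L2, prune=False))
# [(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5)]
print(apriori_gen(L2))                      # [(1, 2, 3), (1, 2, 5)]
print(apriori_gen([(1, 2, 3), (1, 2, 5)]))  # []
```

Without pruning, the join alone reproduces the six level-3 candidates listed above. With pruning, only {1,2,3} and {1,2,5} survive.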
The frequent itemsets are L = {{1},{2},{3},{4},{5},{1,2},{1,3},{1,5},{2,3},{2,4},{2,5},{1,2,3},{1,2,5}}.
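The whole level-by-level search can be sketched in a few lines. The minimum support count of 2 is an assumption: the source does not state it. It is inferred because {1,5}, with a count of 2, is frequent while {1,4}, with a count of 1, is not. The names `TRANSACTIONS`, `support_count` and `apriori` are hypothetical.

```python
from itertools import combinations

TRANSACTIONS = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 5}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]


def support_count(itemset, transactions):
    return sum(1 for t in transactions if set(itemset) <= t)


def apriori(transactions, min_count):
    """Return all frequent itemsets (sorted tuples), level by level."""
    items = sorted({i for t in transactions for i in t})
    level = [(i,) for i in items if support_count((i,), transactions) >= min_count]
    frequent = list(level)
    while level:
        prev = set(level)
        # join (k-1)-itemsets sharing a prefix, then prune by the Apriori property
        candidates = [a + (b[-1],) for a, b in combinations(sorted(level), 2)
                      if a[:-1] == b[:-1]
                      and all(s in prev for s in combinations(a + (b[-1],), len(a)))]
        level = [c for c in candidates if support_count(c, transactions) >= min_count]
        frequent.extend(level)
    return frequent


# min_count=2 is an assumed threshold (see the lead-in)
print(apriori(TRANSACTIONS, min_count=2))
```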
12 rules can be created from the frequent 2-itemsets (6 itemsets × 2 rules each), and 12 rules can be created from the frequent 3-itemsets (2 itemsets × 6 rules each).
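A k-itemset yields 2^k - 2 rules: one for each non-empty proper subset used as the left-hand side. A small sketch that enumerates them follows. The name `rules_from` is hypothetical.

```python
from itertools import combinations


def rules_from(itemset):
    """All rules lhs => rhs with non-empty, disjoint sides whose union is `itemset`."""
    rules = []
    for r in range(1, len(itemset)):
        for lhs in combinations(itemset, r):
            rhs = tuple(i for i in itemset if i not in lhs)
            rules.append((lhs, rhs))
    return rules


FREQUENT_2 = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5)]
FREQUENT_3 = [(1, 2, 3), (1, 2, 5)]
print(sum(len(rules_from(s)) for s in FREQUENT_2))  # 12
print(sum(len(rules_from(s)) for s in FREQUENT_3))  # 12
```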
We check the rules generated from {1,2} and {1,2,3} to see which, if any, are strong:
Association rule  Support  Confidence  Strong rule?
1 ⇒ 2             4/9      4/6         N
2 ⇒ 1             4/9      4/7         N
1 ⇒ 2,3           2/9      2/6         N
2 ⇒ 1,3           2/9      2/7         N
3 ⇒ 1,2           2/9      2/5         N
1,2 ⇒ 3           2/9      2/4         N
1,3 ⇒ 2           2/9      2/4         N
2,3 ⇒ 1           2/9      2/3         N
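The support and confidence columns can be recomputed directly from the transactions. The source does not give the minimum confidence. The sketch uses a hypothetical threshold of 0.7. Any threshold above 2/3 reproduces the all-N column, because the highest confidence in the table is 2/3. Both itemsets are already frequent, so only confidence decides whether a rule is strong here. The names `rule_table` and `MIN_CONF` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

TRANSACTIONS = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 5}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
N = len(TRANSACTIONS)
MIN_CONF = Fraction(7, 10)  # hypothetical threshold; the source does not state one


def count(itemset):
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)


def rule_table(itemset, min_conf):
    """(lhs, rhs, support, confidence, strong) for every rule built from `itemset`."""
    rows = []
    for r in range(1, len(itemset)):
        for lhs in combinations(itemset, r):
            rhs = tuple(i for i in itemset if i not in lhs)
            support = Fraction(count(itemset), N)              # P(lhs and rhs)
            confidence = Fraction(count(itemset), count(lhs))  # P(rhs | lhs)
            rows.append((lhs, rhs, support, confidence, confidence >= min_conf))
    return rows


for itemset in [(1, 2), (1, 2, 3)]:
    for lhs, rhs, sup, conf, strong in rule_table(itemset, MIN_CONF):
        print(f"{lhs} => {rhs}: support {sup}, confidence {conf}, strong: {strong}")
```

`Fraction` reduces its values, so a confidence of 4/6 prints as 2/3.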