
Assignment 2

Q1.

Weekend  Weather  Parents  Money  Decision
W1       Sunny    Yes      Rich   Cinema
W2       Sunny    No       Rich   Tennis
W3       Windy    Yes      Rich   Cinema
W4       Rainy    Yes      Poor   Cinema
W5       Rainy    No       Rich   Stay in
W6       Rainy    Yes      Poor   Cinema
W7       Windy    No       Poor   Cinema
W8       Windy    No       Rich   Shopping
W9       Windy    Yes      Rich   Cinema
W10      Sunny    No       Rich   Tennis

First, we should find the split attribute for the root node:
Entropy(S) = -p_cinema*log2(p_cinema) - p_tennis*log2(p_tennis) - p_shopping*log2(p_shopping) - p_stay_in*log2(p_stay_in)
= -(0.6*log2(0.6) + 0.2*log2(0.2) + 0.1*log2(0.1) + 0.1*log2(0.1))
= 1.571
Entropy(S, weather) = (|Ssunny|/10)*Entropy(Ssunny) + (|Swindy|/10)*Entropy(Swindy) + (|Srainy|/10)*Entropy(Srainy)
= 0.3*(-(2/3)*log2(2/3) - (1/3)*log2(1/3)) + 0.4*(-(0.75)*log2(0.75) - 0.25*log2(0.25)) + 0.3*(-(2/3)*log2(2/3) - (1/3)*log2(1/3))
= 0.8755
Gain(S, weather) = 1.571 - 0.8755 = 0.70
Entropy(S, parents) = (|Syes|/10)*Entropy(Syes) + (|Sno|/10)*Entropy(Sno)
= 0.961
Gain(S, parents) = 1.571 - 0.961 = 0.61
Entropy(S, money) = (|Srich|/10)*Entropy(Srich) + (|Spoor|/10)*Entropy(Spoor)
= 1.2897
Gain(S, money) = 1.571 - 1.2897 = 0.281
Weather has the highest information gain, so it becomes the root node of the decision tree.
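The figures above can be checked with a short script. This is only a verification sketch; `entropy` and `info_gain` are local helpers written for this check, not library functions:

```python
from collections import Counter
from math import log2

# Dataset from the table: (weather, parents, money, decision)
rows = [
    ("Sunny", "Yes", "Rich", "Cinema"),   # W1
    ("Sunny", "No",  "Rich", "Tennis"),   # W2
    ("Windy", "Yes", "Rich", "Cinema"),   # W3
    ("Rainy", "Yes", "Poor", "Cinema"),   # W4
    ("Rainy", "No",  "Rich", "Stay in"),  # W5
    ("Rainy", "Yes", "Poor", "Cinema"),   # W6
    ("Windy", "No",  "Poor", "Cinema"),   # W7
    ("Windy", "No",  "Rich", "Shopping"), # W8
    ("Windy", "Yes", "Rich", "Cinema"),   # W9
    ("Sunny", "No",  "Rich", "Tennis"),   # W10
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """Information gain of splitting `rows` on attribute index `attr`."""
    labels = [r[-1] for r in rows]
    gain = entropy(labels)
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print(round(entropy([r[-1] for r in rows]), 3))  # 1.571
print(round(info_gain(rows, 0), 2))              # weather: 0.7
print(round(info_gain(rows, 1), 2))              # parents: 0.61
print(round(info_gain(rows, 2), 2))              # money:   0.28
```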

For the first branch, we only look at the rows where Weather = Sunny: Ssunny = {W1, W2, W10}.

Weekend  Weather  Parents  Money  Decision
W1       Sunny    Yes      Rich   Cinema
W2       Sunny    No       Rich   Tennis
W10      Sunny    No       Rich   Tennis

From the previous step, the entropy of Ssunny is 0.918. We now find the gains for parents and money:
Gain(Ssunny, parents) = 0.918 - 0 = 0.918
Gain(Ssunny, money) = 0.918 - 0.918 = 0
Parents has the higher gain, so it is the split attribute for the sunny branch: its Yes branch leads to Cinema (W1) and its No branch to Tennis (W2, W10).
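The sunny-branch gains can be recomputed the same way; this sketch restricts the data to the Ssunny rows (the helpers are local, as before):

```python
from collections import Counter
from math import log2

# Sunny-weather subset from the table: (parents, money, decision)
sunny = [
    ("Yes", "Rich", "Cinema"),  # W1
    ("No",  "Rich", "Tennis"),  # W2
    ("No",  "Rich", "Tennis"),  # W10
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """Information gain of splitting `rows` on attribute index `attr`."""
    labels = [r[-1] for r in rows]
    gain = entropy(labels)
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print(round(info_gain(sunny, 0), 3))  # parents: 0.918
print(round(info_gain(sunny, 1), 3))  # money:   0.0
```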

Q2.

Transaction ID   Items bought
T1               {I1, I2, I5}
T2               {I2, I4}
T3               {I2, I3}
T4               {I1, I2, I4}
T5               {I1, I3}
T6               {I2, I5}
T7               {I1, I3}
T8               {I1, I2, I3, I5}
T9               {I1, I2, I3}

First, we should find the frequent item set:


The following table lists the candidate item-sets for levels 1-4 together with their support counts (the number of transactions containing each candidate). Candidates marked with * have support below the minimum support count of 2 and are not frequent.

Level 1:  {1}: 6    {2}: 7    {3}: 5    {4}: 2    {5}: 3
Level 2:  {1,2}: 4   {1,3}: 4   {1,4}: 1*  {1,5}: 2   {2,3}: 3   {2,4}: 2   {2,5}: 3   {3,4}: 0*  {3,5}: 1*  {4,5}: 0*
Level 3:  {1,2,3}: 2   {1,2,5}: 2   {1,3,5}: 1*  {2,3,4}: 0*  {2,3,5}: 1*  {2,4,5}: 0*
Level 4:  {1,2,3,5}: 1*

Notice that for levels 3 and 4 the candidates should be pruned before counting support: a candidate is dropped if any of its (k-1)-subsets is infrequent. Here pruning already removes {1,3,5}, {2,3,4}, {2,3,5}, and {2,4,5} at level 3 (each contains an infrequent pair such as {3,4}, {3,5}, or {4,5}) and {1,2,3,5} at level 4; their support counts are shown only for completeness.
The frequent Item-sets L = {{1},{2},{3},{4},{5},{1,2},{1,3},{1,5},{2,3},{2,4},{2,5},{1,2,3},{1,2,5}}
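The search can be reproduced with a compact Apriori sketch (a minimum support count of 2 is assumed, as implied by the frequent item-sets listed above; items Ik are written as the integer k):

```python
from itertools import combinations

transactions = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 5}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
MIN_SUPPORT = 2  # assumed minimum support count

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Level-wise search: start from frequent 1-itemsets, grow by one item per level.
items = sorted({i for t in transactions for i in t})
frequent = []
level = [frozenset([i]) for i in items if support({i}) >= MIN_SUPPORT]
while level:
    frequent.extend(level)
    # Join: unite frequent k-itemsets that share k-1 items.
    candidates = {a | b for a, b in combinations(level, 2)
                  if len(a | b) == len(a) + 1}
    # Prune: drop candidates with any infrequent (k-1)-subset.
    candidates = {c for c in candidates
                  if all(frozenset(s) in level
                         for s in combinations(c, len(c) - 1))}
    level = [c for c in sorted(candidates, key=sorted)
             if support(c) >= MIN_SUPPORT]

print(sorted(sorted(s) for s in frequent))
```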
A frequent k-itemset yields 2^k - 2 non-trivial rules, so 2 rules can be created from a 2-itemset and 6 from a 3-itemset.
We will check whether any of the rules generated from {1,2} and {1,2,3} are strong:
Association rule   Support   Confidence   Strong rule?
1 -> 2             4/9       4/6          N
2 -> 1             4/9       4/7          N
1 -> 2,3           2/9       2/6          N
2 -> 1,3           2/9       2/7          N
3 -> 1,2           2/9       2/5          N
1,2 -> 3           2/9       2/4          N
1,3 -> 2           2/9       2/4          N
2,3 -> 1           2/9       2/3          N

None of the rules meets the minimum confidence threshold, so no strong rules are generated from these item-sets.
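The support and confidence values in the table can be recomputed with a short script (`support_count` is a local helper, not a library function; support is count(A ∪ C)/N and confidence is count(A ∪ C)/count(A)):

```python
# Transactions from the table in Q2 (Ik written as the integer k).
transactions = [
    {1, 2, 5}, {2, 4}, {2, 3}, {1, 2, 4}, {1, 3},
    {2, 5}, {1, 3}, {1, 2, 3, 5}, {1, 2, 3},
]
N = len(transactions)

def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# The eight rules checked above, as (antecedent, consequent) pairs.
rules = [({1}, {2}), ({2}, {1}), ({1}, {2, 3}), ({2}, {1, 3}),
         ({3}, {1, 2}), ({1, 2}, {3}), ({1, 3}, {2}), ({2, 3}, {1})]

for antecedent, consequent in rules:
    both = support_count(antecedent | consequent)
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
          f"support {both}/{N}, confidence {both}/{support_count(antecedent)}")
```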
