Performance measures:

• TP: True Positive
• TN: True Negative
• FP: False Positive
• FN: False Negative

[Figure: instances of two classes (X, ◦) separated by a decision boundary]

• Success rate σ:

    σ = (TP + TN)/N

• Error rate ε: ε = 1 − σ
• TPR (= recall ρ) True Positive Rate:

    TPR = TP/(TP + FN)

• FNR False Negative Rate: FNR = 1 − TPR
• FPR False Positive Rate:

    FPR = FP/(FP + TN)

• TNR True Negative Rate: TNR = 1 − FPR
• Precision π:

    π = TP/(TP + FP)

• F-measure:

    F = (2 · ρ · π)/(ρ + π) = 2TP/(2TP + FP + FN)

The contact-lens dataset used in the examples:

Age             Spectacle      Ast   Tear             Lens
                prescription         production rate
young           myope          no    reduced          none
young           myope          no    normal           soft
young           myope          yes   reduced          none
young           myope          yes   normal           hard
young           hypermetrope   no    reduced          none
young           hypermetrope   no    normal           soft
young           hypermetrope   yes   reduced          none
young           hypermetrope   yes   normal           hard
pre-presbyopic  myope          no    reduced          none
pre-presbyopic  myope          no    normal           soft
pre-presbyopic  myope          yes   reduced          none
pre-presbyopic  myope          yes   normal           hard
pre-presbyopic  hypermetrope   no    reduced          none
pre-presbyopic  hypermetrope   no    normal           soft
pre-presbyopic  hypermetrope   yes   reduced          none
pre-presbyopic  hypermetrope   yes   normal           none
presbyopic      myope          no    reduced          none
presbyopic      myope          no    normal           none
presbyopic      myope          yes   reduced          none
presbyopic      myope          yes   normal           hard
presbyopic      hypermetrope   no    reduced          none
presbyopic      hypermetrope   no    normal           soft
presbyopic      hypermetrope   yes   reduced          none
presbyopic      hypermetrope   yes   normal           none

Ast = Astigmatism
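As a sketch, these measures can be computed directly from the four counts of a binary confusion matrix. The function name is mine; the sample counts come from treating soft as the positive class in the OneR confusion matrix reported further below (TP = 5, FN = 0, FP = 4 + 3 = 7, TN = 12).

```python
# Minimal sketch: the performance measures above, computed from the
# four counts of a binary confusion matrix.

def measures(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    rho = tp / (tp + fn)                  # TPR = recall
    pi = tp / (tp + fp)                   # precision
    sigma = (tp + tn) / n                 # success rate
    return {
        "success_rate": sigma,
        "error_rate": 1 - sigma,          # epsilon = 1 - sigma
        "TPR": rho,
        "FNR": 1 - rho,
        "FPR": fp / (fp + tn),
        "TNR": 1 - fp / (fp + tn),
        "precision": pi,
        "F": 2 * rho * pi / (rho + pi),   # = 2*TP / (2*TP + FP + FN)
    }

# 'soft' as the positive class in the OneR confusion matrix below:
m = measures(tp=5, tn=12, fp=7, fn=0)
print(round(m["FPR"], 3), round(m["precision"], 3), round(m["F"], 3))
# -> 0.368 0.417 0.588  (the soft row of WEKA's per-class table)
```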
Rule representation and reasoning

ZeroR: baseline that always predicts the most frequent class value (no conditions).

OneR(classvar)
{
  R ← ∅
  for each var ∈ Vars do
    for each value ∈ Domain(var) do
      classvar.most-freq-value ← MostFreq(var.value, classvar)
      rule ← MakeRule(var.value, classvar.most-freq-value)
      R ← R ∪ {rule}
  for each r ∈ R do
    CalculateErrorRate(r)
  R ← SelectBestRulesForSingleVar(R)
}

WEKA output for OneR on the lens data:

• 17/24 instances correct
• Correctly classified instances: 17 (70.83%)
• Incorrectly classified instances: 7 (29.17%)

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
1        0.368    0.417      1       0.588      soft
0        0        0          0       0          hard
0.8      0        1          0.8     0.889      none

=== Confusion Matrix ===
 a  b  c   <-- classified as
 5  0  0 |  a = soft
 4  0  0 |  b = hard
 3  0 12 |  c = none
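The OneR loop can be sketched in Python. The names are mine and the dataset is rebuilt from the table above (class values listed in row order); the sketch reproduces the 17/24 correct instances of the single-variable rule on Tears.

```python
# Sketch of OneR: one rule per value of each variable, predicting the
# most frequent class among matching instances; keep the variable whose
# rule set classifies the most instances correctly.
from collections import Counter
from itertools import product

VARS = ["Age", "Spectacles", "Astigmatism", "Tears"]
LENSES = ("none soft none hard none soft none hard none soft none hard "
          "none soft none none none none none hard none soft none none").split()
# Rebuild the 24-row table; Tears varies fastest, Age slowest.
DATA = [dict(zip(VARS, vals), Lens=lens)
        for vals, lens in zip(product(["young", "pre-presbyopic", "presbyopic"],
                                      ["myope", "hypermetrope"],
                                      ["no", "yes"],
                                      ["reduced", "normal"]),
                              LENSES)]

def one_r(data, class_var="Lens"):
    best = None
    for var in VARS:
        counts = {}                       # value -> class frequencies
        for row in data:
            counts.setdefault(row[var], Counter())[row[class_var]] += 1
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        if best is None or correct > best[1]:
            best = (var, correct,
                    {v: c.most_common(1)[0][0] for v, c in counts.items()})
    return best

var, correct, rules = one_r(DATA)
print(var, correct, rules)
# -> Tears 17 {'reduced': 'none', 'normal': 'soft'}   (17/24 = 70.83%)
```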
Generalisation: separate-and-cover

General principles:
1. Choose class value, e.g. hard
2. Construct rule Condition → Lens = hard
3. Determine accuracy α = p/t for all possible conditions, where
   – t: total number of instances covered by the rule
   – p: covered instances with the right (positive) class value
4. Select best condition (here: 4/12)

Covering of classes:
• Rule-set generation for each class value separately
• Peeling: box compression – instances are peeled off (fall outside the box) one face at a time
• PRISM algorithm

Example: choosing contact lenses
Recommended contact lenses: none, soft, hard

Condition                    α = p/t
Age = young                  2/8
Age = pre-presbyopic         1/8
Age = presbyopic             1/8
Spectacles = myope           3/12
Spectacles = hypermetrope    1/12
Astigmatism = no             0/12
Astigmatism = yes            4/12
Tears = reduced              0/12
Tears = normal               4/12

Rule:
• RHS: Lens = hard
• LHS: Astigmatism = yes, with α = 4/12

Not very accurate; expanded rule:
(Astigmatism = yes ∧ New-condition) → Lens = hard
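Step 3 can be checked mechanically. This sketch (names are mine; the dataset is rebuilt from the table above) recomputes α = p/t for every condition, with Lens = hard as the positive class.

```python
# Sketch: compute alpha = p/t for every condition of every variable,
# for the target class Lens = hard.
from itertools import product

VARS = ["Age", "Spectacles", "Astigmatism", "Tears"]
LENSES = ("none soft none hard none soft none hard none soft none hard "
          "none soft none none none none none hard none soft none none").split()
DATA = [dict(zip(VARS, vals), Lens=lens)
        for vals, lens in zip(product(["young", "pre-presbyopic", "presbyopic"],
                                      ["myope", "hypermetrope"],
                                      ["no", "yes"],
                                      ["reduced", "normal"]),
                              LENSES)]

def alpha(data, var, value, target="hard"):
    covered = [row for row in data if row[var] == value]   # t instances
    p = sum(row["Lens"] == target for row in covered)      # positives
    return p, len(covered)

for var in VARS:
    for value in sorted({row[var] for row in DATA}):
        p, t = alpha(DATA, var, value)
        print(f"{var} = {value}: {p}/{t}")
# Astigmatism = yes gives the best ratio, 4/12.
```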
SC(classvar, D)
{
  R ← ∅
  for each val ∈ Domain(classvar) do
    E ← D
    while E contains instances with val do
      rule ← MakeRule(rhs(classvar.val), lhs(∅))
      IR ← ∅
      until rule is perfect do
        for each var ∈ Vars, ∀rule ∈ IR : var ∉ rule do
          for each value ∈ Domain(var) do
            inter-rule ← Add(rule, lhs(var.value))
            IR ← IR ∪ {inter-rule}
        rule ← SelectRule(IR)
      R ← R ∪ {rule}
      RC ← InstancesCoveredBy(rule, E)
      E ← E \ RC
}

Instances with Astigmatism = yes:

Age             Spectacle      Ast   Tear             Lens
                prescription         production rate
young           myope          yes   reduced          none
young           myope          yes   normal           hard
young           hypermetrope   yes   reduced          none
young           hypermetrope   yes   normal           hard
pre-presbyopic  myope          yes   reduced          none
pre-presbyopic  myope          yes   normal           hard
pre-presbyopic  hypermetrope   yes   reduced          none
pre-presbyopic  hypermetrope   yes   normal           none
presbyopic      myope          yes   reduced          none
presbyopic      myope          yes   normal           hard
presbyopic      hypermetrope   yes   reduced          none
presbyopic      hypermetrope   yes   normal           none
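The inner loop of SC can be sketched for a single rule: grow a conjunction for Lens = hard greedily by α = p/t, breaking ties by the larger number of positives p (a common choice), until the rule is perfect. Names are mine; this illustrates the idea, not WEKA's exact PRISM implementation, and it assumes a perfect rule exists.

```python
# Sketch: grow ONE rule, separate-and-cover style, for Lens = hard.
from itertools import product

VARS = ["Age", "Spectacles", "Astigmatism", "Tears"]
LENSES = ("none soft none hard none soft none hard none soft none hard "
          "none soft none none none none none hard none soft none none").split()
DATA = [dict(zip(VARS, vals), Lens=lens)
        for vals, lens in zip(product(["young", "pre-presbyopic", "presbyopic"],
                                      ["myope", "hypermetrope"],
                                      ["no", "yes"],
                                      ["reduced", "normal"]),
                              LENSES)]

def grow_rule(data, target="hard"):
    rule = {}                          # conjunction of var = value tests
    covered = data
    while any(r["Lens"] != target for r in covered):   # not yet perfect
        best = None
        for var in [v for v in VARS if v not in rule]:
            for value in {r[var] for r in covered}:
                sub = [r for r in covered if r[var] == value]
                p = sum(r["Lens"] == target for r in sub)
                score = (p / len(sub), p)              # (alpha, tie-breaker)
                if best is None or score > best[0]:
                    best = (score, var, value, sub)
        _, var, value, covered = best
        rule[var] = value
    return rule

rule = grow_rule(DATA)
print(rule)  # Astigmatism = yes, Tears = normal, Spectacles = myope
```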
WEKA Results

Rules:
If Astigmatism = no and Tears = normal
   and Spectacles = hypermetrope then Lens = soft
If Astigmatism = no and Tears = normal
   and Age = young then Lens = soft
If Age = pre-presbyopic and Astigmatism = no
   and Tears = normal then Lens = soft
If Astigmatism = yes and Tears = normal
   and Spectacles = myope then Lens = hard
If Age = young and Astigmatism = yes
   and Tears = normal then Lens = hard
If Tears = reduced then Lens = none
If Age = presbyopic and Tears = normal
   and Spectacles = myope
   and Astigmatism = no then Lens = none
If Spectacles = hypermetrope
   and Astigmatism = yes
   and Age = pre-presbyopic then Lens = none
If Age = presbyopic and Spectacles = hypermetrope
   and Astigmatism = yes then Lens = none

=== Confusion Matrix ===
 a  b  c   <-- classified as
 5  0  0 |  a = soft
 0  4  0 |  b = hard
 0  0 15 |  c = none

Correctly classified instances: 24 (100%)

Limitations:
• Adding one condition at a time is a greedy search ('optimal' state may be missed)
• Accuracy α = p/t promotes overfitting: the more 'correct' the rule is (the higher p compared to t), the higher α
• Resulting rules cover all instances perfectly

Example:
Consider rule r1 with accuracy α1 = 1/1 and rule r2 with accuracy α2 = 19/20; then r1 is considered superior to r2.

Alternative 1: information gain
Alternative 2: probabilistic measure
Information gain

    ID(r) = p′ · [log(p′/t′) − log(p/t)]

where
• α = p/t is the accuracy before adding a condition to r
• α′ = p′/t′ is the accuracy after a condition has been added to r

Example:
Consider rule r′ with α′ = 1/1 and rule r″ with accuracy α″ = 19/20, both modifications of r with α = 20/200. Then r′ is considered superior to r″ according to accuracy, but

    ID(r′) = 1 · [log(1/1) − log(20/200)] = 1
    ID(r″) = 19 · [log(19/20) − log(20/200)] ≈ 18.6

hence r′ is inferior to r″ according to information gain.

Comparison: accuracy versus information gain

Information gain I:
• Emphasis is on a large number of positive instances
• High-coverage cases first, special cases later
• Resulting rules cover all instances perfectly

Accuracy α:
• Takes the number of positive instances into account only to break ties
• Special cases first, high-coverage cases later
• Resulting rules cover all instances perfectly
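The example can be checked directly, assuming base-10 logarithms (these reproduce the slide's numbers, in particular ID(r′) = 1 exactly):

```python
# The ID(r) worked example above, with base-10 logarithms (assumption).
from math import log10

def info_gain(p_new, t_new, p_old, t_old):
    # ID(r) = p' * [log(p'/t') - log(p/t)]
    return p_new * (log10(p_new / t_new) - log10(p_old / t_old))

id_r1 = info_gain(1, 1, 20, 200)      # r':  alpha' = 1/1
id_r2 = info_gain(19, 20, 20, 200)    # r'': alpha'' = 19/20
print(id_r1, round(id_r2, 1))  # -> 1.0 18.6
```

So information gain, unlike accuracy, prefers r″ because of its far larger number of positive instances.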
[Figure: decision tree for the lens data. Root Tears: reduced → none; normal → Astigmatism. Under Astigmatism = no: Age (young → soft; pre-presbyopic → soft; presbyopic → Spectacle prescription). Under Astigmatism = yes: Spectacle prescription (myope → hard; hypermetrope → Age: young, pre-presbyopic, presbyopic). Leaves: none, soft, hard, none, none.]

[Figure: a dataset D with binary variable X and class variable C, split on X into the subsets DX=yes and DX=no.]

Dataset D:

    D = DX=yes ∪ DX=no
Learning decision trees:
• R. Quinlan: ID3, C4.5 and C5.0
• L. Breiman: CART (Classification and Regression Trees)

Entropy:

    HC(X = x) = − Σc P(C = c | X = x) ln P(C = c | X = x)

Expected entropy:

    EHC(X) = Σx P(X = x) HC(X = x)
Information gain (again)

The information gain of a variable X is the reduction in class entropy obtained by splitting on it:

    G(X) = H(⊤) − EH(X)

Example: contact lenses recommendation

Class variable is Lens:

    P(Lens = soft) = 5/24,  P(Lens = hard) = 4/24,  P(Lens = none) = 15/24

Entropy at the root:

    H(⊤) = −(5/24) ln(5/24) − (4/24) ln(4/24) − (15/24) ln(15/24) ≈ 0.92

Example (continued)

For variable Ast (Astigmatism), using the convention 0 ln 0 = 0:

    H(Ast = yes) = −(0/12) ln(0/12) − (4/12) ln(4/12) − (8/12) ln(8/12) ≈ 0.64

    ⇒ EH(Ast) = (1/2) H(Ast = no) + (1/2) H(Ast = yes)
              = 1/2 (0.68 + 0.64) = 0.66

Information gain:

    ⇒ G(Tears) = H(⊤) − EH(Tears) = 0.92 − 0.55 = 0.37

Comparison:
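These entropy numbers can be recomputed mechanically (natural logarithm, 0 · ln 0 = 0). Names are mine and the dataset is rebuilt from the table; note that at full precision G(Tears) ≈ 0.38, the slide's 0.37 coming from intermediate rounding of EH(Tears).

```python
# Sketch: class entropy, expected entropy after a split, and the
# information gain of each variable at the root of the lens tree.
from collections import Counter
from itertools import product
from math import log

VARS = ["Age", "Spectacles", "Astigmatism", "Tears"]
LENSES = ("none soft none hard none soft none hard none soft none hard "
          "none soft none none none none none hard none soft none none").split()
DATA = [dict(zip(VARS, vals), Lens=lens)
        for vals, lens in zip(product(["young", "pre-presbyopic", "presbyopic"],
                                      ["myope", "hypermetrope"],
                                      ["no", "yes"],
                                      ["reduced", "normal"]),
                              LENSES)]

def entropy(rows):
    # H = -sum_c P(c) ln P(c); absent classes contribute 0 (0 ln 0 = 0)
    n = len(rows)
    return -sum(c / n * log(c / n)
                for c in Counter(r["Lens"] for r in rows).values())

def expected_entropy(var):
    total = 0.0
    for value in {r[var] for r in DATA}:
        sub = [r for r in DATA if r[var] == value]
        total += len(sub) / len(DATA) * entropy(sub)
    return total

h_top = entropy(DATA)
gains = {v: h_top - expected_entropy(v) for v in VARS}
print(round(h_top, 2), round(expected_entropy("Astigmatism"), 2))
# -> 0.92 0.66
print(max(gains, key=gains.get))  # -> Tears (best first split)
```

The maximal gain on Tears is consistent with Tears being the root of the decision tree shown above.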