• Discretization step:
– place a breakpoint wherever the class changes
• This leads to a rule.
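The discretization step above can be sketched in a few lines. The attribute values and class labels below are illustrative assumptions, not data from the lecture:

```python
# 1R-style discretization sketch: sort instances by a numeric attribute and
# place a breakpoint at the midpoint wherever the class label changes.
# (values/classes are assumed example data, not from the slides)
values = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
classes = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no"]

breakpoints = []
for i in range(1, len(values)):
    if classes[i] != classes[i - 1]:
        # midpoint between the two adjacent attribute values
        breakpoints.append((values[i - 1] + values[i]) / 2)

print(breakpoints)  # [64.5, 66.5, 70.5, 73.5, 77.5, 80.5, 84.0]
```

Each interval between consecutive breakpoints then becomes one condition of the rule, labeled with its majority class.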
2. Statistical modeling
• 1R uses a single attribute as the basis of the decision
• What about using all attributes as the basis of the decision?
– All attributes contribute to the decision
• Attributes are equally important
• Attributes are independent of one another
• A simple method based on probability: Bayesian Classification
Bayesian Classification
Basic Concept (1)
Likelihood of an item being classified GREEN = number of GREEN items in the neighborhood / total number of GREEN items
Likelihood of an item being classified RED = number of RED items in the neighborhood / total number of RED items
• Prior probability:
– Prior prob of yes = 9/14
– Prior prob of no = 5/14
• Likelihood probability:
– Likelihood prob of yes = 2/9 x 3/9 x 3/9 x 3/9 ≈ 0.0082
– Likelihood prob of no = 3/5 x 1/5 x 4/5 x 3/5 ≈ 0.0576
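The two likelihood products can be checked with exact fractions:

```python
from fractions import Fraction

# Verify the prior and likelihood arithmetic above; the individual
# per-attribute fractions are taken as given from the slide.
prior_yes = Fraction(9, 14)
prior_no = Fraction(5, 14)

# Naive Bayes likelihood: product of per-attribute conditional probabilities
likelihood_yes = Fraction(2, 9) * Fraction(3, 9) * Fraction(3, 9) * Fraction(3, 9)
likelihood_no = Fraction(3, 5) * Fraction(1, 5) * Fraction(4, 5) * Fraction(3, 5)

print(round(float(likelihood_yes), 4))  # 0.0082
print(round(float(likelihood_no), 4))   # 0.0576
```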
Implementation of the Bayesian classification concept
• Posterior probability:
P(Ci | X) = P(X | Ci) · P(Ci) / P(X)
where X is the instance to classify and Ci is the i-th class.
Example
• Dataset for deciding whether a person will buy a computer, based on the attributes age, income, student status, and credit rating.
• Classify the instance with:
age <= 30, income = medium, student = yes, credit_rating = fair, buys_computer = ?

age     income   student   credit_rating   buys_computer
<=30    High     No        Fair            No
<=30    High     No        Excellent       No
31…40   High     No        Fair            Yes
>40     Medium   No        Fair            Yes
>40     Low      Yes       Fair            Yes
>40     Low      Yes       Excellent       No
31…40   Low      Yes       Excellent       Yes
<=30    Medium   No        Fair            No
<=30    Low      Yes       Fair            Yes
>40     Medium   Yes       Fair            Yes
<=30    Medium   Yes       Excellent       Yes
31…40   Medium   No        Excellent       Yes
31…40   High     Yes       Fair            Yes
>40     Medium   No        Excellent       No
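The whole classification can be sketched directly from counts over this table. This is a minimal Naïve Bayes by hand (no smoothing), not a library implementation:

```python
from collections import Counter

# The 14 training rows: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "High", "No", "Fair", "No"),
    ("<=30", "High", "No", "Excellent", "No"),
    ("31…40", "High", "No", "Fair", "Yes"),
    (">40", "Medium", "No", "Fair", "Yes"),
    (">40", "Low", "Yes", "Fair", "Yes"),
    (">40", "Low", "Yes", "Excellent", "No"),
    ("31…40", "Low", "Yes", "Excellent", "Yes"),
    ("<=30", "Medium", "No", "Fair", "No"),
    ("<=30", "Low", "Yes", "Fair", "Yes"),
    (">40", "Medium", "Yes", "Fair", "Yes"),
    ("<=30", "Medium", "Yes", "Excellent", "Yes"),
    ("31…40", "Medium", "No", "Excellent", "Yes"),
    ("31…40", "High", "Yes", "Fair", "Yes"),
    (">40", "Medium", "No", "Excellent", "No"),
]

def classify(x):
    """Return the class maximizing P(class) * prod_i P(attr_i = x_i | class)."""
    class_counts = Counter(row[-1] for row in data)
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / len(data)        # prior P(c)
        for i, v in enumerate(x):      # conditional P(x_i | c), by counting
            n_match = sum(1 for row in data if row[-1] == c and row[i] == v)
            score *= n_match / n_c
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Query from the slide: age <= 30, income = medium, student = yes, credit_rating = fair
print(classify(("<=30", "Medium", "Yes", "Fair")))  # Yes
```

For this query the unnormalized scores are 9/14 · 2/9 · 4/9 · 6/9 · 6/9 ≈ 0.028 for yes versus 5/14 · 3/5 · 2/5 · 1/5 · 2/5 ≈ 0.007 for no, so the instance is classified as yes.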
Example Implementation
• Let:
– C1: buys_computer = yes
– C2: buys_computer = no
• Compute P(Ci), the prior probability of each class:
– P(buys_computer = yes) = 9/14 = 0.643
– P(buys_computer = no) = 5/14 = 0.357
• Thus,
– Posterior prob of yes = 2/9 x 0.0340 x 0.0221 x 3/9 x 9/14 = 0.000036
– Posterior prob of no = 3/5 x 0.0279 x 0.0381 x 3/5 x 5/14 = 0.000137
• Since 0.000137 > 0.000036, the instance is classified as no.
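The two posterior scores can be reproduced by multiplying the stated factors; the values 0.0340, 0.0221, 0.0279, and 0.0381 are taken as given from the slide:

```python
# Unnormalized posterior scores: likelihood factors x prior, as on the slide
post_yes = 2/9 * 0.0340 * 0.0221 * 3/9 * 9/14
post_no = 3/5 * 0.0279 * 0.0381 * 3/5 * 5/14

print(round(post_yes, 6))  # 3.6e-05
print(round(post_no, 6))   # 0.000137

# The larger score wins, so the predicted class is "no"
print(post_no > post_yes)  # True
```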
Advantages and Disadvantages of Naïve Bayes
Advantages:
• Easy to implement
• Gives good results in most cases
Disadvantages:
• It must assume that there is no relationship between one attribute and another, while in practice attributes are sometimes related. This problem is addressed by an extension of Naïve Bayes called Bayesian Belief Networks.
However, dependencies may exist…
• Bayesian (belief) network
– A graphical model of causal relationships
– A trained Bayesian network can be used for classification.
– Two components:
• A directed acyclic graph
• A set of conditional probability tables (CPTs); each variable has one CPT.
• Each node represents a random variable, which may correspond to
– Attributes of D
– Hidden variables believed to form a relationship
• The CPT for a variable Y specifies the conditional distribution P(Y | Parents(Y)).
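A CPT lookup and the network's chain rule can be sketched with plain dictionaries. The network structure (Smoker → LungCancer) and all the numbers below are hypothetical illustrations, not from the lecture:

```python
# Minimal belief-network sketch: each variable's CPT maps a tuple of parent
# values to P(var = True | parents). Hypothetical network: Smoker -> LungCancer.
cpt = {
    "Smoker": {(): 0.3},                        # no parents: just a prior
    "LungCancer": {(True,): 0.1, (False,): 0.01},
}

def p_true(var, parent_values):
    """Look up P(var = True | Parents(var) = parent_values) in the CPT."""
    return cpt[var][parent_values]

# Joint probability of a full assignment, factored along the DAG:
# P(Smoker=True, LungCancer=True) = P(Smoker=True) * P(LungCancer=True | Smoker=True)
joint = p_true("Smoker", ()) * p_true("LungCancer", (True,))
print(round(joint, 4))  # 0.03
```

This factorization over the DAG is what lets a trained network answer classification queries: score each class value by the product of its CPT entries and pick the largest.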
Training Bayesian Networks
• Network topology (layout of nodes and arcs)
– Given the observable variables, several algorithms exist for learning the network topology.
– A human expert in the field of analysis may help with the network design.